cs.LG @ 2025-07-18: 1417
00 07-17 (4) Hierarchical Rectified Flow Matching with Mini-Batch Couplings Hierarchischer rektifizierter Fluss passend zu Mini-Batch-Kupplungen 与小批量相匹配的梯级校正流程 2507.13350v1 -
01 07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning VisionThink: Intelligentes und effizientes Vision-Sprachmodell durch Verstärkungslernen 远景设想:通过强化学习建立聪明、高效的愿景语言模式 2507.13348v1 -
02 07-17 Latent Policy Steering with Embodiment-Agnostic Pretrained World Models Latent Policy Steering mit prätrainierten Weltmodellen der Embodiment-Agnostik 利用具身无关的预训练世界模型进行潜在策略引导 2507.13340v1 -
03 07-17 Training Transformers with Enforced Lipschitz Constants Trainingstransformatoren mit verstärkter Lipschitz-Konstanten 培训具有强制立利普施茨常数的变革者 2507.13338v1 -
04 07-17 GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM GeoReg: Gewichtsbeschränkte Few-Shot-Regression für sozioökonomische Schätzung mit LLM GeoReg: 使用LLM进行社会经济估算的权重约束少样本回归 2507.13323v1 -
05 07-17 Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence Föderiertes Lernen: Eine Umfrage zum Datenschutz-Schutz Kollaborativer Intelligenz 联邦学习:保护隐私合作情报调查 2504.17703v2 -
06 07-17 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos EgoVLA: Vision-Language-Action-Modelle von egozentrischen menschlichen Videos lernen EgoVLA:从以以地球为中心的人类视频中学习愿景-语言-行动模式 2507.12440v2 -
07 07-17 Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models Von reward-free Offline-Daten lernen: Ein Fall für die Planung mit latenten Dynamics-Modellen 从无回报脱离线数据中学习:利用隐时动态模型进行规划的一个案例 2502.14819v2 -
08 07-17 Boosting Team Modeling through Tempo-Relational Representation Learning Teammodellierung durch Tempo-Relationales Repräsentationslernen fördern 通过Tempo-关系代表制学习促进团队模拟 2507.13305v1 -
09 07-17 Retraining-Free Merging of Sparse MoE via Hierarchical Clustering Retraining-Free Merging von Sparse MoE über Hierarchical Clustering 通过等级式集束式集成,无培训地重新合并粗微中小部 2410.08589v3 -
10 07-17 Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model Förderung der saisonalen Vorhersage Tropischer Zyklonaktivität mit einem Hybrid-KI-Physik-Klimamodell 采用AI-物理混合气候模型推进热带气旋活动季节性预测 2505.01455v2 -
11 07-17 SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks SIDDA: SInkhorn Dynamische Domain-Anpassung für die Bildklassifizierung mit Gleichwertigen Neuronalen Netzwerken SIDDA: 利用等质神经网络进行图像分类的SInkhorn动态域域适应 2501.14048v2 -
12 07-17 crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels crowd-hpo: Realistische Hyperparameter-Optimierung und Benchmarking zum Lernen von Crowds mit Noisy-Labels 现实主义超超参数最佳化和基准化,用噪音标签从人群中学习 2504.09085v2 -
13 07-17 Optimal Empirical Risk Minimization under Temporal Distribution Shifts Optimale Empirische Risikominimierung unter zeitlichen Verteilungsverschiebungen 时间分布变化下最佳实证风险最小化 2507.13287v1 -
14 07-17 Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises Stochastische schwach konvexe Optimierung unter Heavy-Tailed-Rauschen 重尾噪声下的随机弱凸优化 2507.13283v1 -
15 07-17 Generative Diffusion Models for Resource Allocation in Wireless Networks Generative Diffusionsmodelle zur Ressourcenallokation in drahtlosen Netzwerken 无线网络资源分配生成传播模型 2504.20277v2 -
16 07-17 Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour Bewertung von Stärkungslernen Algorithmen für die Navigation in simulierten Roboter-Quadrupen: Eine vergleichende Studie, inspiriert von Guide Dog Behaviour 评价模拟机器人四重干扰模拟机器人四重干扰导航中强化学习的教学比值:受导狗行为启发的比较研究 2507.13277v1 -
17 07-17 Do you know what q-means? Weißt du, was q-bedeutet? 你知道什么是q - means吗? 2308.09701v3 -
18 07-17 Merge Kernel for Bayesian Optimization on Permutation Space Zusammenführen Kernel für Bayesian Optimierung auf Permutationsraum Bayesian Permodation 空间优化合并核心圈 2507.13263v1 -
19 07-17 Automating Steering for Safe Multimodal Large Language Models Automatisierungslenkung für sichere multimodale große Sprachmodelle 安全多式联运大语言模式自动化指导 2507.13255v1 -
20 07-17 A Roadmap for Climate-Relevant Robotics Research Ein Fahrplan für die klimarelevante Robotikforschung 气候相关机器人研究路线图 2507.11623v2 -
21 07-17 Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets Nutzung asynchroner grenzübergreifender Marktdaten für eine verbesserte Tagesprognose der Strompreise in den europäischen Märkten 利用非同步跨界市场数据改进欧洲市场日间电力价格预测 2507.13250v1 -
22 07-17 Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform Annäherungssätze für Shallow ReLU$^k$ Neurale Netze auf Sobolev-Räumen über die Radon-Transformation Sobolev空间上浅层ReLU$^k$神经网络经由Radon变换的逼近率 2408.10996v2 -
23 07-17 The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics? Die CO2-Kosten der Materialentdeckung: Kann maschinelles Lernen die Entdeckung neuer Photovoltaik wirklich beschleunigen? 材料发现的碳成本:机器学习能否真正加速新光伏发电的发现? 2507.13246v1 -
24 07-17 VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models VectorFit : Adaptive Singular & Bias Vector Fine-Tuning von vortrainierten Foundation-Modellen 矢量Fit:培训前基金会模型的适应性单单项和比亚斯矢量微调 2503.19530v2 -
25 07-17 Multiple-Frequencies Population-Based Training Populationsbasiertes Training mit mehreren Frequenzen 多频率的基于种群的训练 2506.03225v2 -
26 07-17 Computational-Statistical Tradeoffs from NP-hardness Computational-Statistical Tradeoffs von NP-Härte 对NP-硬度的计算-统计取舍 2507.13222v1 -
27 07-17 V-Max: A Reinforcement Learning Framework for Autonomous Driving V-Max: Ein Rahmen für verstärktes Lernen für autonomes Fahren V-Max:加强自主驾驶学习框架 2503.08388v3 -
28 07-17 Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models Komponativ diskreter latenter Code für High Fidelity, Produktive Diffusionsmodelle 高菲力、生产性扩散模型、生产性扩散模型 2507.12318v2 -
29 07-17 MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling MoTM: Auf dem Weg zu einem Basismodell für Zeitreihen Imputation basierend auf kontinuierlicher Modellierung MoTM:建立基于连续建模的时间序列计算基础模型 2507.13207v1 -
30 07-17 Branching Stein Variational Gradient Descent for sampling multimodal distributions Verzweigung Stein Variational Gradient Descent für die Probenahme multimodaler Verteilungen 用于多模态分布采样的分支Stein变分梯度下降 2506.13916v2 -
31 07-17 Relation-Aware Slicing in Cross-Domain Alignment Verhältnis-Bewusstsein-Slicing in Cross-Domain-Alignment 跨域对齐中的关系软件切切 2507.13194v1 -
32 07-17 Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis Jüngste Fortschritte bei der simulationsbasierten Schlussfolgerung für die Analyse von Gravitationswellendaten 引力波数据分析模拟推导法最近的进展 2507.11192v2 -
33 07-17 GradNetOT: Learning Optimal Transport Maps with GradNets GradNetOT: Optimale Transportkarten mit GradNets lernen GradNetOT: 与 GradNets一起学习最佳交通地图 2507.13191v1 -
34 07-17 Bounding the Worst-class Error: A Boosting Approach Den Fehler der schlechtesten Klasse zu überwinden: Ein Boosting-Ansatz 绕过最坏的错误 : 推动方法 2310.14890v3 -
35 07-17 Spectral Bellman Method: Unifying Representation and Exploration in RL Spektral Bellman-Methode: Vereinheitliche Darstellung und Exploration in RL 光谱钟门方法:统一代表与探索 2507.13181v1 -
36 07-17 SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks SHIELD: Ein sicheres und hochverstärktes integriertes Lernen für robuste Deepfake-Erkennung gegen feindliche Angriffe SHIELD: 可靠和高度强化的综合学习,以强有力地发现深假,防止反向攻击 2507.13170v1 -
37 07-17 Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models Orbis: Herausforderungen der Langzeit-Vorhersage bei treibenden Weltmodellen überwinden Orbis:克服在推动世界模式方面长期预测的挑战 2507.13162v1 -
38 07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Inverse Stärkung Lernen trifft auf großes Sprachmodell Post-Training: Grundlagen, Fortschritte und Chancen 逆强化学习遇上大语言模型后训练:基础、进展与机遇 2507.13158v1 -
39 07-17 NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech 非口头翻译:一个以文本为主的非口头演唱的英文公共单位,带有文字对语音情感说明 2507.13155v1 -
40 07-17 NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation NGTM: Substrukturbasiertes Neural Graph Topic Model für die interpretierbare Graphengenerierung NGTM: 以次级结构为基础的可解释图形生成神经图专题模型 2507.13133v1 -
41 07-17 PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data PINT: Physik-informierte Neuralzeit-Serienmodelle mit Anwendungen zur langfristigen Schlussfolgerung auf WeatherBench 2m-Temperaturdaten PINT: 应用气象区2m-温度数据长期推断的物理化神经时间序列模型 2502.04018v2 -
42 07-17 Search for Z/2 eigenfunctions on the sphere using machine learning Suche nach Z/2 Eigenfunktionen auf der Kugel mittels maschinellem Lernen 使用机器学习在球体上搜索 Z/2 电子元件 2507.13122v1 -
43 07-17 RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images RS-TinyNet: Stage-wise Feature Fusion Network zur Erkennung winziger Objekte in Bildern der Fernerkundung RS-TinyNet:在遥感图像中探测小物体的分阶段地貌融合网络 2507.13120v1 -
44 07-17 Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression Task-Circuit Quantization: Nutzung von Wissen Lokalisierung und Dolmetschbarkeit für Komprimierung 任务-环境环境定量:利用知识本地化和压缩解释 2504.07389v2 -
45 07-17 Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction Deep Learning-based Fetal Lung Segmentation aus diffusionsgewichteten MRT-Bildern und Lungenreife-Evaluierung für fetale Wachstumsbeschränkung 从传播加权磁RI图像和对胎儿生长限制的肺期评估中分离出的深学习-基于学习的胎儿肺部切片 2507.13106v1 -
46 07-17 SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts SemCSE: Semantische kontrastive Satzeinbettungen mit LLM-generierten Zusammenfassungen für wissenschaftliche Abstracts SemCSE: 使用LLM生成的摘要为科学摘要构建语义对比句嵌入 2507.13105v1 -
47 07-17 Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle 大型视觉语言模型统一三维级幻觉评价 2410.23114v4 -
48 07-17 Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction Uni-Instruct: 通过统一扩散散度指令实现单步扩散模型 2505.20755v2 -
49 07-17 Unsupervised Ground Metric Learning Unüberwachtes metrisches Lernen am Boden 不受监督的地面计量学习 2507.13094v1 -
50 07-17 Truthful Elicitation of Imprecise Forecasts Wahrheitsgemäße Erhebung ungenauer Prognosen 不精确预测的真实引出 2503.16395v4 -
51 07-17 Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces Ungewissheitsbewusste Cross-Modal Knowledge Destillation mit Prototypenlernen für multimodale Gehirn-Computer-Schnittstellen 与多式脑-计算机界面的原型学习相结合的不确定-软件软件的跨模式知识蒸馏 2507.13092v1 -
52 07-17 Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine Super Auflösung für erneuerbare Energien Ressourcendaten mit Wind Von der Reanalyse Daten und Anwendung in die Ukraine 乌克兰可再生能源资源数据利用风向再分析数据和应用于乌克兰的超级分辨率 2407.19086v2 -
53 07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI MUPAX: Multidimensionales Problem Agnostic eXplainable KI MUPAX: 多维、与问题无关的可解释AI 2507.13090v1 -
54 07-17 DASViT: Differentiable Architecture Search for Vision Transformer DASViT: Differenzierbare Architektursuche für Vision Transformer DASViT:面向Vision Transformer的可微架构搜索 2507.13079v1 -
55 07-17 On the Effectiveness of the z-Transform Method in Quadratic Optimization Über die Wirksamkeit der z-Transform Methode in der quadratischen Optimierung 关于四压压优化中z变形方法有效性问题 2507.03404v2 -
56 07-17 MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs MedPix 2.0: Umfassender multimodaler biomedizinischer Datensatz für fortgeschrittene KI-Anwendungen mit retrieval Augmented Generation und Wissensgraphen MedPix 2.0:一套综合多式生物医学数据集,用于高级AI应用,并附有回收增加的生成和知识图 2407.02994v5 -
57 07-17 On statistical learning of graphs Statistisches Erlernen von Schaubildern 关于统计学图表 2507.13054v1 -
58 07-17 Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification Unsicherheits-Quantifizierung für White Matter Hyperintensitätssegmentierung erkennt leise Ausfälle und verbessert die automatisierte Fazekas-Quantifizierung 白色物质超密度分离的不确定性量化,可检测静态故障,改进自动Fazekas量化 2411.17571v2 -
59 07-17 The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting Die Kraft der Architektur: Tiefgehen in Transformer-Architekturen für langfristige Zeitreihen 建筑力量:为长期时间序列预测而向变形结构深度下潜 2507.13043v1 -
60 07-17 Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery Confidence-Filtered Relevance (CFR): Ein interpretierbares und unsicheres Machine Learning Framework für die Bewertung von Natürlichkeit in Satellitenbildern 信任改变的相关性:卫星图像中自然评估的 解释性和不确定性和不确定性-智能学习框架 2507.13034v1 -
61 07-17 (Exhaustive) Symbolic Regression and model selection by minimum description length (Erschöpfend) Symbolische Regression und Modellauswahl nach minimaler Beschreibungslänge 按最低描述长度分列的符号回归和模型选择 2507.13033v1 -
62 07-17 When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values Wenn Pattern-by-Pattern arbeitet: Theoretische und Empirische Einblicke für Logistische Modelle mit fehlenden Werten 代代办法:缺少价值的后勤模式理论和经验透视 2507.13024v1 -
63 07-17 Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers Fehlererkennung und Diagnose für das elektrische Motorsystem eines Raumwerfers basierend auf einem zeitlich konvolutionären Autoencoder und kalibrierten Klassifikatoren 以时富集自动编码器和校准分类器为基础的空间发射装置发动机电气系统的故障检测和诊断 2507.13022v1 -
64 07-17 The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks Die Spätphasen-Trainingsdynamik des (stochastischen) subgradienten Abstiegs auf homogenen neuronalen Netzwerken 在同质神经网络上的(随机)亚梯级下降的后阶段培训动态 2502.05668v3 -
65 07-17 SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs SMART: Beziehungsorientiertes Lernen geometrischer Darstellungen für Wissensgraphen SMART:知识图表几何表示法关系-知识学习 2507.13001v1 -
66 07-17 Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning Differential-informierte Probenauswahl beschleunigt multimodales kontrastives Lernen 不同知情的抽样甄选加速多模式差异学习 2507.12998v1 -
67 07-17 (Almost) Free Modality Stitching of Foundation Models (Fast) Freie Modalitätsstiche von Stiftungsmodellen (几乎) 基金会模型的免费方式 2507.10015v3 -
68 07-17 Teach Old SAEs New Domain Tricks with Boosting Lehren Sie alte SAEs neue Domain Tricks mit Förderung 教授旧的 SAEs 新域圈套 2507.12990v1 -
69 07-17 Variance-Based Pruning for Accelerating and Compressing Trained Networks Varianzbasiertes Pruning für beschleunigte und komprimierende Ausgebildete Netzwerke 加快和压缩经过训练的网络 2507.12988v1 -
70 07-17 FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient FedGA: Ein faires, auf dem Gini-Koeffizienten basierendes Föderated Learning Framework FedGA:基于基尼系数的公平联邦学习框架 2507.12983v1 -
71 07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints Ein verteilter generativer KI-Ansatz für heterogene Multi-Domain-Umgebungen unter Datenfreigabebeschränkungen 在数据共享制约下,对异种多领域不同环境采取分散的AI方法 2507.12979v1 -
72 07-17 WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring WaveletInception-Netzwerke für Drive-by-Vibrationsbasierte Infrastruktur-Gesundheitsüberwachung 驱动振动基础设施健康监测波动感知网络 2507.12969v1 -
73 07-17 Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19 Untersuchung von Prognosemodellen für Pandemieinfektionen unter Verwendung heterogener Datenquellen: Eine 2-jährige Studie mit COVID-19 利用异源数据源调查利用异源数据对传染病的预测模型:COVID-19的两年期研究 2507.12966v1 -
74 07-17 Demographic-aware fine-grained classification of pediatric wrist fractures Demografiebewusste feinkörnige Klassifizierung von pädiatrischen Handgelenkfrakturen 人口意识小儿科手腕骨折细细细分分类 2507.12964v1 -
75 07-17 A Spectral Interpretation of Redundancy in a Graph Reservoir Eine spektrale Interpretation der Redundanz in einem Graph Reservoir 图表储量中剩余性的旁观解释 2507.12963v1 -
76 07-17 Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning Dynamische Stabilität des stochastischen Gradienten Absinkens im überparameterisierten Lernen charakterisierend 将过度量化的学习中存储层渐变源的动态稳定化特性化 2407.20209v3 -
77 07-17 A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing Ein Progressives Bildwiederherstellungsnetzwerk für High-Order Degradation Imaging in Remote Sensing 遥感中高顺序退化成像的逐步图像恢复网络 2412.07195v2 -
78 07-17 A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion Eine Gehirntumor-Segmentierungsmethode basierend auf CLIP und 3D U-Net mit Cross-Modal Semantic Guidance und Multi-Level-Feature Fusion 以CLIP和3D U-Net为基础的脑肿瘤分解法,并配有跨模式语义指导和多功能融合 2507.09966v2 -
79 07-17 cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration cIDIR: Bedingte implizite Neuraldarstellung für regularisierte, deformierbare Bildregistrierung cIDIR: 定期变形图像注册的有条件的、隐含的神经代表 2507.12953v1 -
80 07-17 Signal Recovery Using a Spiked Mixture Model Signalwiederherstellung mit einem Spiked Mixture Model 使用斯派混合混合模型恢复信号 2501.01840v2 -
81 07-17 MMOne: Representing Multiple Modalities in One Scene MMOne: Darstellung mehrerer Modalitäten in einer Szene MMOne: 在一个场景中表示多种模态 2507.11129v2 -
82 07-17 Insights into a radiology-specialised multimodal large language model with sparse autoencoders Einblicke in ein radiologisch spezialisiertes multimodales Großsprachmodell mit spärlichen Autoencodern 深入观察放射学专门化多式联运大型语言模型,无甚多的自动编码器 2507.12950v1 -
83 07-17 Probabilistic Soundness Guarantees in LLM Reasoning Chains Probabilistische Solidität garantiert in LLM-Aufklärungsketten LLM 理赔链条的概率稳妥性保障 2507.12948v1 -
84 07-17 Global urban visual perception varies across demographics and personalities Globale urbane visuelle Wahrnehmung variiert je nach Demografie und Persönlichkeit 全球城市视觉认识因人口和个性而异 2505.12758v3 -
85 07-17 MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration MC$^2$A: Algorithm-Hardware Co-Design für effiziente Markov-Kette Monte Carlo Beschleunigung MC$^2$A: 面向高效马尔可夫链蒙特卡洛加速的算法-硬件协同设计 2507.12935v1 -
86 07-17 DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization DMQ: Ausreißer von Diffusionsmodellen für die Quantisierung nach dem Training DMQ: 解剖培训后量化传播模型的外源离子 2507.12933v1 -
87 07-17 From a Mixed-Policy Perspective: Improving Differentiable Automatic Post-editing Optimization Aus einer Mixed-Policy-Perspektive: Verbesserung der differenzierbaren automatischen Post-Editing-Optimierung 从混合政策角度看:改进可区别的自动编辑后优化 2507.12931v1 -
88 07-17 Trace Reconstruction with Language Models Trace Rekonstruktion mit Sprachmodellen 使用语言模式进行追踪重建 2507.12927v1 -
89 07-17 Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI Robuste Erklärungen durch Unsicherheitszersetzung: Ein Weg zu vertrauensvoller KI 通过不确定性的分解作出有力的解释:通往信托的路径 AI 2507.12913v1 -
90 07-17 LaViPlan : Language-Guided Visual Path Planning with RLVR LaViPlan : Sprachgeführte visuelle Pfadplanung mit RLVR LaViPlan: 基于RLVR的语言引导视觉路径规划 2507.12911v1 -
91 07-17 Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services Fremer: Leichter und effektiver Frequenztransformator für Workload-Prognose in Cloud Services Fremer:云服务工作量预测的轻型和有效频率变压器 2507.12908v1 -
92 07-17 Learning to Reject Low-Quality Explanations via User Feedback Lernen, Low-Quality-Erklärungen per User Feedback abzulehnen 通过用户反馈学习拒绝低质量解释 2507.12900v1 -
93 07-17 A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering Ein Spaltengenerierungsalgorithmus mit dynamischer Constraint-Aggregation für minimale Summe von Quadraten 为最小平方和组合组合组合组合而具有动态约束聚合的列生成算法 2410.06187v2 -
94 07-17 Generalist Bimanual Manipulation via Foundation Video Diffusion Models Generalist Bimanual Manipulation über Stiftung Video Diffusion Modelle 通过基金会录像传播模型进行通用二手操作 2507.12898v1 -
95 07-17 VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks VAR-MATH: Wahre mathematische Vernunft in großen Sprachmodellen anhand symbolischer Multi-Instance-Benchmarks VAR-MATH:通过符号性多因基准在大语言模型中验证真实的数学理由 2507.12885v1 -
96 07-17 Autonomous Resource Management in Microservice Systems via Reinforcement Learning Autonomes Ressourcenmanagement in Mikroservice-Systemen durch Verstärkungslernen 通过加强学习,对微小服务系统进行自主资源管理 2507.12879v1 -
97 07-17 Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes Bayesische Modellierung und Abschätzung von linearen Zeitvariantsystemen unter Verwendung neuraler Netzwerke und Gaußschen Prozessen 利用神经网络和高斯进程模拟和估计线性时间变化系统 2507.12878v1 -
98 07-17 Topology-Aware Activation Functions in Neural Networks Topologie-Bewusst-Aktivierungsfunktionen in neuralen Netzwerken 神经网络中的地形-软件启动功能 2507.12874v1 -
99 07-17 An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System Untersuchung von Ohr-EEG-Signalen für ein neuartiges biometrisches Authentifizierungssystem 关于新生物测定鉴定系统耳电信号的调查 2507.12873v1 -
100 07-17 WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding WhoFi: 通过 Wi-Fi 频道信号编码来识别深层人的身份 2507.12869v1 -
101 07-17 Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) Beaufsichtigte Feinabstimmung auf kuratierten Daten ist Verstärktes Lernen (und kann verbessert werden) 受监督的 “ 封闭数据 “ 微调微调是 “ 强化学习 “ (并可以改进) 2507.12856v1 -
102 07-17 Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v3 -
103 07-17 Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations Transformerbasierte Personenidentifikation über Wi-Fi CSI Amplitude und Phasenstörungen 通过Wi-Fi CSI进行基于变压器的人的识别 2507.12854v1 -
104 07-17 Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants Site-Level Feintuning mit Progressive Layer Freezing: Auf dem Weg zur robusten Vorhersage der Bronchopulmonalen Dysplasie von Tag 1 Brustradiographen bei extrem prätermen Säuglingen 与累进层冷冻有关的地点级微调级微调:对极期前婴儿每日1号胸前无线电报上的布朗-希波本二元病原体进行强有力的预测 2507.12269v2 -
105 07-17 Formalising causal inference as prediction on a target population Formalisierende kausale Schlussfolgerungen als Vorhersage für eine Zielpopulation 将因果推断正规化,作为对目标人口的预测 2407.17385v3 -
106 07-17 Dataset resulting from the user study on comprehensibility of explainable AI algorithms Datensatz aus der Nutzerstudie zur Verständlichkeit erklärbarer KI-Algorithmen 用户关于可解释的AI算法的可理解性研究产生的数据集 2411.02419v2 -
107 07-17 A Kernel Distribution Closeness Testing Eine Näherungsprüfung der Kernelverteilung A 内核分布近距离测试 2507.12843v1 -
108 07-17 Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v2 -
109 07-17 We should avoid the assumption of data-generating probability distributions in social settings Wir sollten die Annahme von datengenerierenden Wahrscheinlichkeitsverteilungen in sozialen Settings vermeiden 我们应该避免假设在社会环境中产生数据的概率分布 2407.17395v4 -
110 07-17 Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines Bridging the Gap: Leveraging Retrieval-Augmented Generation zu besser verstehen öffentliche Bedenken über Impfstoffe 缩小差距:利用利用回收-养殖一代来更好地了解公众对疫苗的关切 2507.12840v1 -
111 07-17 Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability Die Evolution des neuralen Tangentenkerns am Rande der Stabilität verstehen 了解稳定边缘的内心内核核心的演变 2507.12837v1 -
112 07-17 MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results MVA 2025 Kleines Multi-Objekt-Tracking für die Vogelbeobachtung Herausforderung: Datensatz, Methoden und Ergebnisse MVA 2025 发现鸟类挑战小型多目标跟踪:数据集、方法和结果 2507.12832v1 -
113 07-17 Autoregressive Speech Enhancement via Acoustic Tokens Autoregressive Sprachverbesserung durch akustische Token 通过声调声调增强自动递减语音 2507.12825v1 -
114 07-17 Assessing adaptive world models in machines with novel games Bewertung adaptiver Weltmodelle in Maschinen mit neuartigen Spielen 评估用新游戏机器制作的适应性世界模式 2507.12821v1 -
115 07-17 Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect Self Balancing Neural Network: Eine neuartige Methode zur Schätzung des durchschnittlichen Behandlungseffekts 自我平衡神经网络:估计平均治疗效果的新办法 2507.12818v1 -
116 07-17 From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning Von der Neuheit zur Imitation: Selbstdestillierte Belohnungen für Offline-Verstärkungslernen 从新闻到消化:为脱线强化学习自行提炼奖项 2507.12815v1 -
117 07-17 RONOM: Reduced-Order Neural Operator Modeling RONOM: Reduzierte Neuraloperator-Modellierung RONOM: 降低轨道神经操作员模型 2507.12814v1 -
118 07-17 ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space ZClassifier: Temperatur-Tuning und Manifold-Annäherung über KL Divergenz auf Logit Space ZClassifier: 通过logit空间上的KL散度进行温度调节与流形逼近 2507.10638v2 -
119 07-17 Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives Holistix: Ein Datensatz für ganzheitliche Wellness-Dimensionen Analyse in psychischen Gesundheits-Erzählungen Holistix:心理健康叙事中整体健康层面分析数据集 2507.09565v2 -
120 07-17 Quantum Long Short-Term Memory for Drug Discovery Quantenlanges Kurzzeitgedächtnis für die Drogenentdeckung 药物发现长期短期记忆 2407.19852v2 -
121 07-17 Deep Q-Learning with Gradient Target Tracking Deep Q-Learning mit gradientem Target Tracking 与渐进目标跟踪进行深度学习 2503.16700v2 -
122 07-17 Large Language Models’ Internal Perception of Symbolic Music Die innere Wahrnehmung symbolischer Musik durch große Sprachmodelle 大语言模型内部对符号音乐的感知 2507.12808v1 -
123 07-17 PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database PMKLC: Parallele Multi-Knowledge Learning-basierte Lossless-Kompression für großformatige Genomics-Datenbank PMKLC: 大型基因组数据库的平行多知识学习-无损失压缩 2507.12805v1 -
124 07-17 Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation Physik-informiertes Linearmodell (PILM): Analytische Darstellungen und Anwendung auf Crustal Strain Rate Abschätzung 物理内建线性模型(PILM):对结壳定流速率估计的分析说明和应用 2507.12218v2 -
125 07-17 FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction FLDmamba: Integration von Fourier und Laplace-Transformationszersetzung mit Mamba für verbesserte Zeitreihenvorhersage FLDmamba:将Fourier和Laple变形变形变形与Mamba结合,以提高时间序列预测 2507.12803v1 -
126 07-17 ReCode: Updating Code API Knowledge with Reinforcement Learning ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen ReCode:更新法规API知识与强化学习 2506.20495v2 -
127 07-17 MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment MPO: Ein effizientes Post-Processing-Framework zum Mischen unterschiedlicher Präferenzen MPO: 混合多种优惠协调的高效处理后框架 2502.18699v2 -
128 07-17 Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises NEEQ企业金融风险预测多通道图图神经网络 2507.12787v1 -
129 07-17 Compact Vision Transformer by Reduction of Kernel Complexity Kompakter Vision Transformer durch Reduktion der Kernelkomplexität 减少内核复杂度,实现全球契约愿景转型 2507.12780v1 -
130 07-17 Demystifying MuZero Planning: Interpreting the Learned Model MuZero-Planung entmystifizieren: Das gelernte Modell interpretieren 消除神秘的 “ 零零规划 “ :解释 “ 总结经验 “ 模式 2411.04580v2 -
131 07-17 A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models Eine umfassende Umfrage zur elektronischen Gesundheitsdatenmodellierung: Von Deep Learning Ansätzen bis hin zu großen Sprachmodellen 《电子健康记录模型综合调查:从深学习方法到大语言模式》 2507.12774v1 -
132 07-17 Sample-Constrained Black Box Optimization for Audio Personalization Sample-Constrained Black Box Optimierung für Audio-Personalisierung 优化音频个性化 2507.12773v1 -
133 07-17 AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation AnyPos: Automatisierte Task-Agnostische Aktionen zur bimanuellen Manipulation AnyPos: 用于双手操作的自动化任务无关动作 2507.12768v1 -
134 07-17 Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations Ebenentrennung Deep Learning Modell mit Hilfsvariablen für partielle Differentialgleichungen 图层分离深学习模型,带有局部差异等量的辅助变量 2507.12766v1 -
135 07-17 Golden Noise for Diffusion Models: A Learning Framework Goldene Geräusche für Diffusionsmodelle: Ein Lernrahmen 传播模型的黄金噪音:学习框架 2411.09502v5 -
136 07-17 TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph TBDetector:Transformer-basierter Detektor für erweiterte persistente Bedrohungen mit Provenienzgraph TBDetector:基于Transformer、利用溯源图检测高级持续性威胁的检测器 2304.02838v2 -
137 07-17 World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving Weltmodellbasierte End-to-End-Szenengenerierung für Unfallvorhersage im autonomen Fahren 以世界模式为基础的在自主驾驶中事故预防端至终点到终点示范景点一代 2507.12762v1 -
138 07-17 Faster and Space Efficient Indexing for Locality Sensitive Hashing Schnellere und raumsparende Indexierung für Lokalitätssensitive Hashing 地方敏感散列更快和空间高效索引编制 2503.06737v2 -
139 07-17 A Comprehensive Survey of Synthetic Tabular Data Generation Eine umfassende Übersicht über die Erstellung von synthetischen Tabellendaten 合成图表数据生成综合调查 2504.16506v3 -
140 07-17 Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation Domain-Enhanced Dual-Branch-Modell für effiziente und interpretierbare Unfallvorhersage 高效和可解释的意外事故预测的强化双重-双重-双重强化模式 2507.12755v1 -
141 07-17 Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning Multimodal geführtes dynamisches Datenset Pruning für robustes und effizientes datenzentrales Lernen 灵活、高效、高效的数据中心学习的多式指导动态数据集 2507.12750v1 -
142 07-17 Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion Lernen von universellen Mobilitätsmustern mit einem Basismodell für die domänenübergreifende Datenfusion 具有跨领域数据融合基础模型的学习通用人类流动模式 2503.15779v2 -
143 07-17 How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction Wie wirkt sich Beschriftungsfehler auf das kontrasive Lernen aus? Eine Perspektive aus der Datendimensionalitätsreduktion 标签错误影响差异影响学习如何进行? 减少数据多维度的视角 2507.11161v2 -
144 07-17 Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen 统一动态系统中可解释的异常探测和根本原因分析 2502.12086v3 -
145 07-17 BEARCUBS: A benchmark for computer-using web agents BEARCUBS: Benchmark für computergestützte Web-Agenten BEARCUBS:计算机使用网络代理器的基准 2503.07919v2 -
146 07-17 Rethinking Inductive Bias in Geographically Neural Network Weighted Regression Induktive Bias im geographisch neuralen Netzwerk neu denken Gewichtete Regression 重新思考在地理神经网络中诱导的偏见 2507.09958v2 -
147 07-17 Multi-View Node Pruning for Accurate Graph Representation Multi-View-Knotenschnitt für eine exakte Graphendarstellung 多查看节点 精确图表代表 2503.11737v4 -
148 07-17 Scaling Trends for Data Poisoning in LLMs Skalierungstrends für Datenvergiftungen in LLMs LLMs中数据中毒的规模化趋势 2408.02946v6 -
149 07-17 From SGD to Spectra: A Theory of Neural Network Weight Dynamics Von SGD zu Spectra: Eine Theorie der neuralen Netzwerkgewichtsdynamik 从SGD到Spectra:神经网络权重动态理论 2507.12709v1 -
150 07-17 PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform PinFM: Gründungsmodell für Benutzeraktivität Sequenzen auf einer Visual Discovery Platform im Milliardenmaßstab PinFM:十亿规模视觉发现平台用户活动序列基础模型 2507.12704v1 -
151 07-16 (3) Benchmarking Deception Probes via Black-to-White Performance Boosts Benchmarking Deception Probes über Black-to-White Performance Boosts 通过黑到白性性能促进手段的欺骗性探测 2507.12691v1 -
152 07-16 Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights Finite-Dimensional Gaussian Approximation für tiefe neurale Netzwerke: Universalität in zufälligen Gewichten 深神经网络的简单多功能高斯近似度:随机重量的普遍性 2507.12686v1 -
153 07-16 Data Transformation Strategies to Remove Heterogeneity Strategien zur Datentransformation zur Entfernung von Heterogenität 消除异异性的数据转换战略 2507.12677v1 -
154 07-16 Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography Vergleichende Auswertung von Radiomik und Deep-Learning-Modellen zur Erkennung von Krankheiten in der Brustradiographie 比较评价用于在胸针射电摄影中检测疾病辐射学和深学习模型的比较评价 2504.12249v2 -
155 07-16 Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models Fly, Fail, Fix: Iterative Spiel Reparatur mit Verstärkung Lernen und große multimodale Modelle Fly, Fail, Fix:利用强化学习和大型多模态模型进行迭代式游戏修复 2507.12666v1 -
156 07-16 UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning UPCORE: Nutzenerhaltende Coreset-Auswahl für ausgewogenes Unlearning UPCORE: 面向平衡遗忘学习的效用保持核心集选择 2502.15082v2 -
157 07-16 Physics constrained learning of stochastic characteristics Physik bedingtes Lernen stochastischer Merkmale 随机特征的物理约束学习 2507.12661v1 -
158 07-16 Data-driven rainfall prediction at a regional scale: a case study with Ghana Datengesteuerte Niederschlagsprognose auf regionaler Ebene: eine Fallstudie mit Ghana 区域规模以数据驱动的降雨预测:加纳案例研究 2410.14062v3 -
159 07-16 DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback DOPL: Direktes Online-Preference-Lernen für ruhelose Banditen mit Preference Feedback DOPL: 提供首选反馈的无休眠强盗直接在线优先学习 2410.05527v2 -
160 07-16 Improving physics-informed neural network extrapolation via transfer learning and adaptive activation functions Verbesserung der Physik-informierten neuronalen Netzwerk-Extrapolation durch Transfer-Lernen und adaptive Aktivierungsfunktionen 通过转让学习和适应性启动功能,改进物理学知情神经网络的外推法 2507.12659v1 -
161 07-16 Distributional Reinforcement Learning on Path-dependent Options Distributionelle Stärkung Lernen über pathabhängige Optionen 关于依赖道路的选项的分布强化分发学习 2507.12657v1 -
162 07-16 Federated Learning in Open- and Closed-Loop EMG Decoding: A Privacy and Performance Perspective Federated Learning in Open- and Closed-Loop EMG Decodierung: Eine Datenschutz- und Performanceperspektive 开环和闭环EMG解码中的联邦学习:隐私和性能视角 2507.12652v1 -
163 07-16 Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v3 -
164 07-16 A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis Eine neuartige Datenvergrößerungsstrategie für robustes Deep Learning Klassifizierung biomedizinischer Zeitreihendaten: Anwendung auf EKG- und EEG-Analysen 生物医学时间序列数据深入学习分类:应用ECG和EEG分析的新颖数据增强战略 2507.12645v1 -
165 07-16 Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows Fine-Tune ein SLM oder Prompt ein LLM? Der Fall der Erzeugung von Low-Code Workflows 微调SLM还是提示LLM? 生成低代码工作流的案例 2505.24189v2 -
166 07-16 Cross-Layer Discrete Concept Discovery for Interpreting Language Models Cross-Layer Discrete Concept Discovery für Interpretationssprachmodelle 解释语言模型的跨语言监听概念发现 2506.20040v2 -
167 07-16 On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations Über die lineare Beschleunigung des personalisierten Federated Verstärkungslernens mit geteilten Repräsentationen 在线加快个人化联邦强化学习,共用代表 2411.15014v2 -
168 07-16 VLMgineer: Vision Language Models as Robotic Toolsmiths VLMgineer: Vision Language Models als Roboterwerkzeugmacher VLMGineer:作为机器人工具匠的愿景语言模型 2507.12644v1 -
169 07-16 Manify: A Python Library for Learning Non-Euclidean Representations Manify: Eine Python-Bibliothek zum Lernen nicht-euklidischer Repräsentationen Manify:用于学习非欧几里得表示的Python库 2503.09576v2 -
170 07-16 Multi-task retriever fine-tuning for domain-specific and efficient RAG Multi-Task Retriever Feinabstimmung für domänenspezifische und effiziente RAG 多任务检索器微调,用于特定领域和高效率的RAG 2501.04652v2 -
171 07-16 Reasoning-Finetuning Repurposes Latent Representations in Base Models Reasoning-Finetuning Repurposes Latente Darstellungen in Basismodellen 基础模型中的重新目的前期代表 2507.12638v1 -
172 07-16 LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization LoRA Done RITE: Robuste Invariante Transformations-Equilibration für LoRA-Optimierung LoRA Done RITE: 用于LoRA优化的鲁棒不变变换均衡 2410.20625v2 -
173 07-16 Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models 脱离柏拉图的洞穴:调整独立培训的愿景和语言模式的JAM 2507.01201v4 -
174 07-16 Conformal inference for regression on Riemannian Manifolds Konforme Schlussfolgerung zur Regression auf Riemannische Manifolds 里伊曼尼马内佛山回归的正规推论 2310.08209v2 -
175 07-16 Active Human Feedback Collection via Neural Contextual Dueling Bandits Aktive menschliche Feedback-Sammlung über neurale Kontext-Duellbanditen 通过神经环境授权强盗收集活性人类反馈 2504.12016v2 -
176 07-16 Hamiltonian Neural Networks approach to fuzzball geodesics Hamiltonian Neural Networks Ansatz für Fuzzball Geodäsie 汉密尔顿神经网络法 2502.20881v3 -
177 07-16 BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training BootSeer: Analysieren und Abmildern von Initialisierungsengpässen im großformatigen LLM-Training BootSeer:大规模LLM培训中分析和减缓初始化瓶颈 2507.12619v1 -
178 07-16 Boolformer: Symbolic Regression of Logic Functions with Transformers Boolformer: Symbolische Regression von logischen Funktionen mit Transformern Boolformer: 带有变换器的逻辑函数的符号回归 2309.12207v2 -
179 07-16 Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition Quantum HyperNetworks: Training von Binary Neural Networks in der Quantenüberlagerung 量子超超网络:在量子叠置方面培训二元神经网络 2301.08292v2 -
180 07-16 Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning Lernen, was zählt: Probabilistische Aufgabenauswahl über Gegenseitige Informationen zur Modellfeinsteuerung 学习什么重要:通过相互信息选择任务概率选择,用于示范微调 2507.12612v1 -
181 07-16 Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken 混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策 2403.10996v6 -
182 07-16 The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix and Tensor Factorization The Target Polish: Ein neuer Ansatz für ausreißerresistente nicht-negative Matrix- und Tensorfaktorisierung Target Polish:抗离群值的非负矩阵和张量分解新方法 2507.10484v2 -
183 07-16 Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? Sind Encoder in der Lage, Landmarken für den Warmstart der Hyperparameter-Optimierung zu lernen? 编码员能否学习取暖启动超参数优化的地标? 2507.12604v1 -
184 07-16 A Survey of Explainable Reinforcement Learning: Targets, Methods and Needs Eine Übersicht über das Erklärbare Verstärkte Lernen: Ziele, Methoden und Bedürfnisse 《可解释的强化学习调查:目标、方法和需要》 2507.12599v1 -
185 07-16 SCULPT: Systematic Tuning of Long Prompts SCULPT: Systematisches Tuning von langen Prompts SCULPT: 长期提示系统图示 2410.20788v3 -
186 07-16 Nonparametric IPSS: Fast, flexible feature selection with false discovery control Nichtparametrischer IPSS: Schnelle, flexible Feature-Auswahl mit falscher Discovery-Steuerung 非参数IPSS:采用虚假发现控制快速、灵活地选择特征 2410.02208v3 -
187 07-16 Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach Cross-Problem-Parameter-Transfer in Quanten Ungefähre Optimierungs-Algorithmus: Ein Ansatz zum maschinellen Lernen 量子中交叉问题参数转移 近最佳优化算法:机械学习方法 2504.10733v3 -
188 07-16 Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows Best Practices für großformatige, Pixel-Wise-Crop-Mapping- und Transfer-Lern-Workflows 大型、像素-威氏作物绘图和转移学习性工作流程最佳做法 2507.12590v1 -
189 07-16 Second-Order Bounds for [0,1]-Valued Regression via Betting Loss Schranken zweiter Ordnung für [0,1]-wertige Regression über Betting Loss 通过投注损失实现[0,1]值回归的二阶界 2507.12584v1 -
190 07-16 Ranking Vectors Clustering: Theory and Applications Ranking Vektoren Clustering: Theorie und Anwendungen 病媒分类组合:理论和应用 2507.12583v1 -
191 07-16 Deep Bilinear Koopman Model for Real-Time Vehicle Control in Frenet Frame Tiefes Bilineares Koopman-Modell für Echtzeit-Fahrzeugsteuerung im Frenet-Rahmen Frenet框架中实时车辆控制深海双线性库普曼模型 2507.12578v1 -
192 07-16 Assay2Mol: large language model-based drug design using BioAssay context Assay2Mol: großsprachiges, modellbasiertes Arzneimitteldesign unter Verwendung von BioAssay-Kontexten Assay2Mol:使用BioAssay环境的大型语言示范药物设计 2507.12574v1 -
193 07-16 IncA-DES: An incremental and adaptive dynamic ensemble selection approach using online K-d tree neighborhood search for data streams with concept drift IncA-DES: Ein inkrementeller und adaptiver dynamischer Ensemble-Auswahlansatz mit Online-K-d-Baum Nachbarschaftssuche nach Datenströmen mit Konzeptdrift IncA-DES:使用在线K-d树区搜索带有概念漂移的数据流的渐进和适应性动态动态混合选择方法 2507.12573v1 -
194 07-16 Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates Bewertung von Neuralen Surrogaten für die physikalische Modellierung der Synthese nichtlinearer elastischer Platten 评价非线性电磁板物理模拟合成神经悬浮体评价 2507.12563v1 -
195 07-16 Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases Rel-HNN: Paralleles Hypergraphen-Neurales Netzwerk zum Lernen auf relationalen Datenbanken Rel-HNN: 用于在关系数据库中学习的分平行超时图神经网络 2507.12562v1 -
196 07-16 Monocular 3D Hand Pose Estimation with Implicit Camera Alignment Monokulare 3D-Hand Pose-Schätzung mit Impliziter Kameraausrichtung 带有隐性相机对齐的手动脉动估计 2506.11133v2 -
197 07-16 Neural stochastic Volterra equations: learning path-dependent dynamics Neural stochastische Volterra-Gleichungen: Lernpfad-abhängige Dynamiken 神经随机伏变方程式:学习依赖路径的动态 2407.19557v2 -
198 07-16 Machine Learning Systems: A Survey from a Data-Oriented Perspective Machine Learning Systems: Eine Umfrage aus datenorientierter Perspektive 机械学习系统:从数据导向的角度进行调查 2302.04810v3 -
199 07-16 Improving Transformer World Models for Data-Efficient RL Verbesserung von Transformer-Weltmodellen für dateneffiziente RL 改进数据效率RL世界模型 2502.01591v3 -
200 07-16 Can Mental Imagery Improve the Thinking Capabilities of AI Systems? Kann Mental Imagery die Denkfähigkeiten von KI-Systemen verbessern? 精神形象能提高人工智能系统的思考能力吗? 2507.12555v1 -
201 07-16 The Serial Scaling Hypothesis Die serienmäßige Skalierungshypothese 序列缩放假设 2507.12549v1 -
202 07-16 Language Models Improve When Pretraining Data Matches Target Tasks Sprachmodelle verbessern, wenn die Vorschulung von Daten zu Zielaufgaben passt 培训前数据匹配目标任务时改进语言模式 2507.12466v1 -
203 07-16 CytoSAE: Interpretable Cell Embeddings for Hematology CytoSAE: Interpretierbare Zelleinbettungen für die Hämatologie CytoSAE: 热病学的解释性细胞嵌入 2507.12464v1 -
204 07-16 Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models Diffused Responsibility: Analyse des Energieverbrauches von generativen Text-zu-Audio-Diffusionsmodellen 分散的责任:分析生成式文本到音频扩散模型的能源消耗 2505.07615v2 -
205 07-16 Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training Scaling Up RL: Unlocking Diverse Reasoning in LLMs durch längeres Training 提升RL:通过长期培训解锁LLMs的多样化理由 2507.12507v1 -
206 07-16 Cost-aware Stopping for Bayesian Optimization Kostenbewusstes Stoppen für die Bayesian-Optimierung Bayesian最佳最佳化的成本意识停止 2507.12453v1 -
207 07-16 MARS: Unleashing the Power of Variance Reduction for Training Large Models MARS: Die Kraft der Varianzreduktion für das Training großer Modelle freisetzen MARS:释放减少差异的力量,用于培训大型模式 2411.10438v3 -
208 07-16 S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling S2WTM: Spherical Sliced-Wasserstein Autoencoder für Themenmodellierung S2WTM: 用于专题建模的球球锯子-Wasserstein自动编码器 2507.12451v1 -
209 07-16 Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning Navigation auf der sozialen Wohlfahrtsgrenze: Portfolios für multi-objektives Stärkungslernen 引导社会福利前沿:多目标加强学习一揽子计划 2502.09724v2 -
210 07-16 The Utility of the Virtual Imaging Trials Methodology for Objective Characterization of AI Systems and Training Data Die Nützlichkeit der Virtual Imaging Trials Methodik zur objektiven Charakterisierung von KI-Systemen und Trainingsdaten AI系统和培训数据客观定性虚拟成像试验方法的效用 2308.09730v5 -
211 07-16 Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length Charakterisieren von State Space Model (SSM) und SSM-Transformer Hybrid Language Model Performance mit langer Kontextlänge 确定国家空间模型(SSM)和SSM-过渡混合语言模型长内性性能特点 2507.12442v1 -
212 07-16 Describe Anything Model for Visual Question Answering on Text-rich Images Beschreiben Sie alles Modell für die visuelle Frage Antwort auf Text-reiche Bilder 描述在丰富文本图像上视觉问答的 “ 任何东西 “ 模式 2507.12441v1 -
213 07-16 PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy PBM-VFL: Vertical Federated Learning mit Feature und Sample Privacy PBM-VFL: 具有特色和抽样隐私的垂直联邦学习 2501.13916v3 -
214 07-16 A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning Ein bayesischer Anreizmechanismus für toxisch-resilientes Federated Learning 贝耶斯州具有毒性抗毒性的联邦学习激励机制 2507.12439v1 -
215 07-16 Targeted Deep Architectures: A TMLE-Based Framework for Robust Causal Inference in Neural Networks Gezielte Tiefenarchitekturen: Ein TMLE-basiertes Framework für robuste Kausalableitung in neuralen Netzwerken 定向深层建筑:以TMLE为基础的神经网络硬性诱因推断框架 2507.12435v1 -
216 07-16 DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications DUNIA: Pixel-Sized-Embeddings über Cross-Modal Alignment für Erdbeobachtungsanwendungen DUNIA:通过对地观测应用的跨模式一致利用像素化嵌入 2502.17066v2 -
217 07-16 Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models Können wir eine Ausrichtung voraussagen, bevor Modelle das Denken beenden? Auf dem Weg zur Überwachung fehlausgerichteter Reasoning-Modelle 我们能否在模型完成思考之前预测其对齐情况? 迈向对未对齐推理模型的监测 2507.12428v1 -
218 07-16 Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation Einheitsbasierte Histopathologie Tissue-Segmentierung über Multi-Level-Feature-Darstellung 通过多级地物代表制进行分类 2507.12427v1 -
219 07-16 Mixture of Raytraced Experts Mischung von Raytraced Experts 光线追踪专家混合体 2507.12419v1 -
220 07-16 AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models AutoVDC: Automatisierte Vision-Datenreinigung mit Vision-Sprachenmodellen AutoVDC:利用视觉语言模型自动清理视觉数据 2507.12414v1 -
221 07-16 MirrorCBO: A consensus-based optimization method in the spirit of mirror descent MirrorCBO: Eine Konsens-basierte Optimierungsmethode im Geiste des Spiegelabstiegs MirrorCBO: 本着镜像下降精神的基于共识的优化方法 2501.12189v2 -
222 07-16 NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data NOCTA: Nicht-gierige Zielkosten-Tradeoff-Erfassung für Längsschnittdaten NOCTA: 用于纵向数据的非贪心目标成本权衡采集 2507.12412v1 -
223 07-16 Simple Mechanistic Explanations for Out-Of-Context Reasoning Einfache mechanistische Erklärungen für Out-of-Context Reasoning 外部逻辑理由的简单机械解释 2507.08218v2 -
224 07-16 Large Language Models are Unreliable for Cyber Threat Intelligence Große Sprachmodelle sind für Cyber Threat Intelligence unzuverlässig 大语言模型在网络威胁情报中不可靠 2503.23175v2 -
225 07-16 Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts Neurale Netzwerk-geführte symbolische Regression für interpretierbare Deskriptor-Entdeckung in Perovskite-Katalysatoren Perovskite催化器中可解释描述器发现器的神经网络-导导符号回归 2507.12404v1 -
226 07-16 ROC-n-reroll: How verifier imperfection affects test-time scaling ROC-n-Reroll: Wie die Unvollkommenheit der Prüfer die Skalierung der Testzeit beeinflusst ROC-n-reroll:核查不完善如何影响测试时间的缩放 2507.12399v1 -
227 07-16 BondMatcher: H-Bond Stability Analysis in Molecular Systems BondMatcher: H-Bond Stabilitätsanalyse in molekularen Systemen BondMatcher:H-Bond 分子系统稳定分析 2504.03205v2 -
228 07-16 Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries Vertrauenswürdiges Tree-based Machine Learning mit $MoS_2$ Flash-basierter analoger CAM mit inhärenten weichen Grenzen 基于$MoS_2$闪存的模拟CAM实现的可信赖树模型机器学习,具有固有软边界 2507.12384v1 -
229 07-16 Improving Reinforcement Learning Sample-Efficiency using Local Approximation Verbesserung des Ausbaus des Lernens anhand lokaler Näherungswerte 利用当地接近率改进强化学习学习抽样效率 2507.12383v1 -
230 07-16 Heat Kernel Goes Topological Wärme-Kernel wird topologisch 热核走向拓扑 2507.12380v1 -
231 07-16 Towards Understanding Link Predictor Generalizability Under Distribution Shifts Auf dem Weg zum Verständnis von Link Predictor Verallgemeinerbarkeit unter Verteilungsverschiebungen 实现对分配变化下的可通用性 2406.08788v3 -
232 07-16 Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques Erforschen und Analysieren von Wildland-Feuerdaten über maschinelle Lerntechniken 探索和分析荒野火灾数据 2311.05128v2 -
233 07-16 Distilling Invariant Representations with Dual Augmentation Destillieren von Invarianten Darstellungen mit Dual Augmentation 具有双重加增的蒸馏变异表示式 2410.09474v4 -
234 07-16 Bridging Predictive Coding and MDL: A Two-Part Code Framework for Deep Learning Bridging Predictive Coding und MDL: Ein zweiteiliges Code-Framework für Deep Learning 架桥预测编码和MDL:深层学习两部分守则框架 2505.14635v2 -
235 07-16 Planning-Aware Code Infilling via Horizon-Length Prediction Planning-Aware Code Infilling via Horizon-Length Prediction 通过地平线-地球预测填充规划-软件代码 2410.03103v3 -
236 07-16 Sparse Orthogonal Parameters Tuning for Continual Learning Sparse Orthogonale Parameter Tuning für kontinuierliches Lernen 用于持续学习的 简单正纵向参数图示 2411.02813v2 -
237 07-16 Active Deep Kernel Learning of Molecular Properties: Realizing Dynamic Structural Embeddings Aktives tiefes Kernel-Lernen von molekularen Eigenschaften: Dynamische strukturelle Einbettungen realisieren 活跃的分子属性深核学习:实现动态结构嵌入 2403.01234v2 -
238 07-16 Nonlinear Concept Erasure: a Density Matching Approach Nichtlineare Konzeptauslöschung: ein Density-Matching-Ansatz 非线性概念擦除:密度匹配方法 2507.12341v1 -
239 07-16 GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning GHPO: Adaptive Anleitung für stabiles und effizientes LLM-Verstärkungslernen GHPO: 稳定有效的LLM强化学习适应性指导 2507.10628v2 -
240 07-16 Neural Polar Decoders for Deletion Channels Neurale Polardecoder für Löschkanäle Dedeletion 通道的神经极极代碼器 2507.12329v1 -
241 07-16 Quantifying calibration error in modern neural networks through evidence based theory Quantifizierung von Kalibrierfehlern in modernen neuronalen Netzwerken durch evidenzbasierte Theorie 通过基于证据的理论对现代神经网络中的校准错误进行量化 2411.00265v2 -
242 07-16 MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball MVP-Shapley:评估篮球中最有价值球员的基于特征的建模 2506.04602v2 -
243 07-16 Thought Purity: Defense Paradigm For Chain-of-Thought Attack Thought Purity: Verteidigungsparadigm für den Ketten-of-Thought-Angriff 思想纯度: 研究链攻击的防御范式 2507.12314v1 -
244 07-16 PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning PROL : Probefreies kontinuierliches Lernen in Streaming-Daten über Prompt Online-Lernen PROL: 通过即时在线学习在流数据中进行排练免费持续学习 2507.12305v1 -
245 07-16 AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization AnnoPage Datensatz: Datensatz nicht-textlicher Elemente in Dokumenten mit feinkörniger Kategorisierung AnnoPage 数据集: 精细分类文档中非形式元素数据集 2503.22526v3 -
246 07-16 ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy 维一致:细胞显微镜:扩大生物代表性学习 2411.02572v2 -
247 07-16 RegCL: Continual Adaptation of Segment Anything Model via Model Merging RegCL: Kontinuierliche Anpassung des Segments an alles Modell über Modellverschmelzung RegCL:通过模型合并不断调整区段 “ 任何东西 “ 模式 2507.12297v1 -
248 07-16 Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding Text-ADBench: Text-Anomaly Detection Benchmark basierend auf LLMs Einbetten Text-ADBench:基于嵌入LLMs的文本异常检测基准 2507.12295v1 -
249 07-16 On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension Über die statistischen Eigenschaften generativer Adversarialmodelle für die geringe intrinsische Datendimension 关于低内在数据层面的生成反逆模型的统计属性 2401.15801v2 -
250 07-16 RACER: Rational Artificial Intelligence Car-following-model Enhanced by Reality RACER: Rationale Künstliche Intelligenz Car-following-Modell durch Realität verbessert RACER: 合理人工人工智能汽车跟踪模型 2312.07003v2 -
251 07-16 Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning Unsicherheit Quantifizierung für Motor Imagery BCI – Machine Learning vs. Deep Learning 机动图像BCI – – 机器学习与深层学习 2507.07511v2 -
252 07-16 What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift Was zieht die Strings? Bewertung von Integrität und Attribution in KI-Training und Schlussfolgerung durch Konzeptverschiebung 什么是拉弦?在AI培训和推论中通过概念转变评估诚信和归属。 2504.21042v3 -
253 07-16 Structured and Balanced Multi-Component and Multi-Layer Neural Networks Strukturierte und ausgewogene Multi-Komponenten- und Multi-Layer-Neural-Netzwerke 结构化和平衡的多组件多层神经网络 2407.00765v3 -
254 07-16 A Framework for Nonstationary Gaussian Processes with Neural Network Parameters Ein Framework für nichtstationäre Gauß-Prozesse mit neuralen Netzwerkparametern 带有神经网络参数的非静止高斯进程框架 2507.12262v1 -
255 07-16 Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty Proaktive Agenten für Multi-Turn-Text-to-Image-Generierung unter Unsicherheit 多发文本到图像在不确定情况下生成的活性剂 2412.06771v2 -
256 07-16 A Thorough Assessment of the Non-IID Data Impact in Federated Learning Eine gründliche Bewertung der Auswirkungen von nicht-IID-Daten auf das Federated Learning 彻底评估非IID数据对联邦学习的影响 2503.17070v2 -
257 07-16 Robust Causal Discovery in Real-World Time Series with Power-Laws Robuste Causal Discovery in der Real-World Time Series mit Power-Laws 具有权力法的 “ 真实世界时间系列 “ 中强有力的因果发现 2507.12257v1 -
258 07-16 Surrogate Quantum Circuit Design for the Lattice Boltzmann Collision Operator Surrogate Quantum Circuit Design für den Lattice Boltzmann Collision Operator Lattice Boltzmann 碰撞操作员的代管量子电路设计 2507.12256v1 -
259 07-16 Comparative Analysis of CNN Performance in Keras, PyTorch and JAX on PathMNIST Vergleichende Analyse der CNN-Leistung in Keras, PyTorch und JAX auf PathMNIST CNN在Keras、PyTorch和JAX在 “ 路运 “ 上的表现比较分析 2507.12248v1 -
260 07-16 Universal Fourier Neural Operators for Micromechanics Universal Fourier-Neural-Betreiber für Mikromechanik 通用微型机械天体神经操作员 2507.12233v1 -
261 07-16 FADE: Why Bad Descriptions Happen to Good Features FADE: Warum guten Features schlechte Beschreibungen widerfahren FADE:为什么不良描述发生在好特征上 2502.16994v2 -
262 07-16 Holistic analysis on the sustainability of Federated Learning across AI product lifecycle Ganzheitliche Analyse der Nachhaltigkeit von Federated Learning über den gesamten Lebenszyklus von KI-Produkten hinweg 关于全AI性产品生命周期中联邦学习可持续性的全面分析 2312.14628v3 -
263 07-16 Optimizers Qualitatively Alter Solutions And We Should Leverage This Optimierer verändern Lösungen qualitativ, und wir sollten dies nutzen 优化器会定性地改变解,我们应当加以利用 2507.12224v1 -
264 07-16 Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities Fehlergrenzen für Partikelgradientenabstieg und Erweiterungen der log-Sobolev- und Talagrand-Ungleichheiten 粒子梯度下降错误的界限,以及log-Sobolev 和 Talagrand 不平等的延伸 2403.02004v3 -
265 07-16 Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control Sparse Autoencoder für sequentielle Empfehlungsmodelle: Interpretation und flexible Steuerung 序列建议模型:解释和灵活控制 2507.12202v1 -
266 07-16 Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms Prominente Rollen bedingt Invarianter Komponenten in der Domänenanpassung: Theorie und Algorithmen 有条件的不变化构件在适应域中的主要作用:理论和数值 2309.10301v3 -
267 07-16 Selective Quantization Tuning for ONNX Models Selektive Quantisierungstuning für ONNX-Modelle ONNX 模型选择性量化图 2507.12196v1 -
268 07-16 Explainable Evidential Clustering Erklärbares Evidential Clustering 可解释的证明人群集 2507.12192v1 -
269 07-16 BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search BenchRL-QAS: Benchmarking Bewehrung Lernalgorithmen für die Quantenarchitektursuche BenchRL-QAS:为量子结构搜索确定强化学习算法的基准 2507.12189v1 -
270 07-16 MTF-Grasp: A Multi-tier Federated Learning Approach for Robotic Grasping MTF-Grasp: Multi-Tier-Federated Learning Approach for Robotic Grasping MTF-Grasp: 一种多阶段联邦的机器人采掘学习方法 2507.10158v2 -
271 07-16 2.5D Object Detection for Intelligent Roadside Infrastructure 2.5D-Objekterkennung für intelligente Straßeninfrastruktur 2.5D 智能路边基础设施物体探测 2507.03564v2 -
272 07-16 LHU-Net: a Lean Hybrid U-Net for Cost-efficient, High-performance Volumetric Segmentation LHU-Net: Ein schlankes Hybrid-U-Net für kosteneffiziente, leistungsstarke Volumetric-Segmentierung LHU-Net:低成本效益、高性能量量分解的精混合U-Net 2404.05102v3 -
273 07-16 NeuTSFlow: Modeling Continuous Functions Behind Time Series Forecasting NeuTSFlow: Modellierung kontinuierlicher Funktionen hinter Zeitreihen Prognose NeuTSFlow: 时间序列预测背后的连续函数建模 2507.09888v2 -
274 07-16 RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection RUMAA: Repeat-Aware Unified Music Audio Analyse zur Ausrichtung, Transkription und Fehlererkennung RUMAA: 用于计分业绩协调、追踪和误差探测的重复软件统一音乐音频分析 2507.12175v1 -
275 07-16 Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data Sharing is CAIRing: Charakterisierende Prinzipien und Bewertung der Eigenschaften der universellen Datenschutzbewertung für synthetische Tabellendaten 共享是CAIR:确定合成图表数据通用隐私评价的原则和特性评估 2312.12216v2 -
276 07-16 Governance of Generative Artificial Intelligence for Companies Governance generativer Künstlicher Intelligenz für Unternehmen 公司创造人工情报的治理 2403.08802v4 -
277 07-16 RadioDiff-3D: A 3D$\times$3D Radio Map Dataset and Generative Diffusion Based Benchmark for 6G Environment-Aware Communication RadioDiff-3D: Ein 3D$\times$3D Radio Map Datensatz und Generative Diffusionsbasierter Benchmark für 6G Environment-Aware Kommunikation RadioDiff-3D: 面向6G环境感知通信的3D$\times$3D无线电地图数据集和基于生成扩散的基准 2507.12166v1 -
278 07-16 Multi-Component VAE with Gaussian Markov Random Field Multi-Komponent VAE mit Gaussian Markov Random Field 带有 Gaussian Markov 随机字段的多功能 VAE 2507.12165v1 -
279 07-16 Patherea: Cell Detection and Classification for the 2020s Patherea: Zellerkennung und Klassifizierung für die 2020er Jahre Patherea:2020年代细胞检测和分类 2412.16425v2 -
280 07-16 Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training Schutz urheberrechtlich geschützter Materialien mit einzigartigen Identifikatoren in großsprachlichen Modellschulungen 在大语言模式培训中以独特标识人保护版权材料 2403.15740v3 -
281 07-16 Data Augmentation in Time Series Forecasting through Inverted Framework Datenvergrößerung in Zeitreihen Vorhersage durch umgekehrtes Framework 通过反向框架预测时间序列中的数据增加值 2507.11439v2 -
282 07-16 Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery Complexity-Aware Training Deep Neural Networks für eine optimale Struktur-Discovery 为发现最佳结构最佳结构而进行深神经网络的复杂度知识培训 2411.09127v2 -
283 07-16 PRISM: Distributed Inference for Foundation Models at Edge PRISM: Verteilte Schlussfolgerung für Stiftungsmodelle am Rand PRISM: 边缘基础模型分布式推理 2507.12145v1 -
284 07-16 FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale FourCastNet 3: Ein geometrischer Ansatz zur probabilistischen maschinellen Wettervorhersage im Maßstab FourCastNet 3: 大规模概率机器学习天气预报的几何方法 2507.12144v1 -
285 07-16 RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization RiemannLoRA: Ein einheitliches Riemann-Rahmenwerk für die ambiguitätsfreie LoRA-Optimierung RiemannLoRA:无歧义LoRA优化的统一黎曼框架 2507.12142v1 -
286 07-16 Neural Human Pose Prior Neuronaler Prior für menschliche Posen 神经人体姿态先验 2507.12138v1 -
287 07-16 FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model FedRef: Kommunikationseffiziente Bayesian Feinabstimmung mit Referenzmodell FedRef: 带参考模型的通信高效贝叶斯微调 2506.23210v2 -
288 07-16 HyDRA: A Hybrid Dual-Mode Network for Closed- and Open-Set RFFI with Optimized VMD HyDRA: Hybrides Dual-Mode-Netzwerk für geschlossenes und offenes RFFI mit optimiertem VMD HYDRA: 具有优化VMD的封闭式和开放式RFFI混合双模式网络 2507.12133v1 -
289 07-16 Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks Selbstadaptives und robustes Federated Spectrum Sensing ohne Benign Majority für Zelluläre Netzwerke 细胞网络的自我适应和强力联邦光谱测量,不以优美多数进行细胞网络 2507.12127v1 -
290 07-16 Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis Iterative Augmentation mit Summarization Refinement (IASR) Evaluation für unstrukturierte Umfragedaten Modellierung und Analyse 对无结构调查数据建模和分析的抽样改进(IASR)评价 2507.12126v1 -
291 07-16 From Observational Data to Clinical Recommendations: A Causal Framework for Estimating Patient-level Treatment Effects and Learning Policies Von Beobachtungsdaten zu klinischen Empfehlungen: Ein ursächlicher Rahmen für die Schätzung von Behandlungseffekten und Lernstrategien auf Patientenebene 从观察数据到临床建议:估计病人治疗效果和学习政策的结果框架 2507.11381v2 -
292 07-16 Learning to Reason at the Frontier of Learnability Vernunft lernen an der Grenze der Lernfähigkeit 学习在可学习的前沿学习理性 2502.12272v4 -
293 07-16 Multimodal Coordinated Online Behavior: Trade-offs and Strategies Multimodal koordiniertes Online-Verhalten: Kompromisse und Strategien 多式联运协调在线行为:取舍和战略 2507.12108v1 -
294 07-16 A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy Ein Privacy-Preserving Framework für Werbung Personalisierung Einschließlich Federated Learning und Differential Privacy 包含联邦学习和不同隐私的隐私保护框架 2507.12098v1 -
295 07-16 Measuring Informativeness Gap of (Mis)Calibrated Predictors Messung der Informativitätslücke von (Miss)Kalibrierten Vorhersagern 测量(米)已校算的预测人的信息差距 2507.12094v1 -
296 07-16 Improved Analysis for Sign-based Methods with Momentum Updates Verbesserte Analyse für signbasierte Methoden mit Momentum-Updates 改进对基于信号方法的最新动态分析 2507.12091v1 -
297 07-16 Emergence of Quantised Representations Isolated to Anisotropic Functions Entstehung quantifizierter Repräsentationen isoliert mit anisotropen Funktionen 孤立到非尼斯代职能的量化代表的出现情况 2507.12070v1 -
298 07-16 Enhancing RLHF with Human Gaze Modeling Verbesserung der RLHF mit dem Modellieren von Human Gaze 利用人体盖盖模型模型增强RLHF 2507.09016v2 -
299 07-16 StylOch at PAN: Gradient-Boosted Trees with Frequency-Based Stylometric Features StylOch bei PAN: Gradient-Boosted Trees mit frequenzbasierten stylometrischen Eigenschaften PAN的StylOch:带以频率为基础的音量特征的梯度-波状树 2507.12064v1 -
300 07-16 FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling FloGAN: Szenariobasierte Urban Mobility Flow Generation über bedingte GANs und dynamische Region Entkopplung FloGAN:通过有条件的GANs和动态区域脱钩,根据设想情况产生城市流动 2507.12053v1 -
301 07-16 Information-Theoretic Generalization Bounds of Replay-based Continual Learning Information-Theoretische Verallgemeinerung Grenzen des replay-basierten kontinuierlichen Lernens 基于重放的连续不断学习的信息理论一般化环球 2507.12043v1 -
302 07-16 Granular feedback merits sophisticated aggregation Granular Feedback verdient anspruchsvolle Aggregation 精密的汇总值得考虑 2507.12041v1 -
303 07-16 A Computational Theory and Semi-Supervised Algorithm for Clustering A Computational Theory und semi-überwachten Algorithmus für Clustering 集束法的计算理论和半有效比值 2306.06974v2 -
304 07-16 MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model MVAR: MultiVariate AutoRegressive Luftverunreinigungs-Prognosemodell MVAR: 多变自动递减空气污染物预测模型 2507.12023v1 -
305 07-16 Incorporating Fairness Constraints into Archetypal Analysis Einschließlich Fairness-Einschränkungen in die Archetypische Analyse 将公平制约因素纳入大区分析 2507.12021v1 -
306 07-16 DUSE: A Data Expansion Framework for Low-resource Automatic Modulation Recognition based on Active Learning DUSE: Ein Datenerweiterungs-Framework für die automatische Modulationserkennung mit geringer Ressource basierend auf aktivem Lernen DUSE:基于积极学习的低资源自动调整识别数据扩展框架 2507.12011v1 -
307 07-16 How does Watermarking Affect Visual Language Models in Document Understanding? Wie wirkt sich Watermarking auf visuelle Sprachmodelle im Dokumentenverständnis aus? 文件理解中的视觉语言模型如何影响水标记? 2504.01048v2 -
308 07-16 Expanding ML-Documentation Standards For Better Security Erweiterung der ML-Dokumentationsstandards für bessere Sicherheit 扩大多L-文件标准以增进安全 2507.12003v1 -
309 07-16 Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing In-Person-Gespräche in geräuschvollen Real-World-Umgebungen mit Smartwatch Audio und Motion Sensing erkennen 利用智能监视音频和运动遥感,在噪音真实世界环境中检测人间谈话 2507.12002v1 -
310 07-16 Labels Generated by Large Language Models Help Measure People’s Empathy in Vitro Etiketten, die durch große Sprachmodelle erzeugt werden, helfen, die Empathie der Menschen in Vitro zu messen 以大语言模型生成的标签 帮助测量体外民众的共鸣 2501.00691v2 -
311 07-16 Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection Können LLMs Betrüger finden? Mehrstufige LLM-Verbesserung der Graphenbetrugserkennung 多级LLM强化图形欺诈探测 2507.11997v1 -
312 07-16 Predictable Scale: Part I, Step Law – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Vorhersehbare Skala: Teil I, Schrittgesetz – Optimales Hyperparameter-Skalierungsgesetz im großen Sprachmodell Vorschulung 可预测比例:第一部分,步法 – – 大语言示范培训前,最佳超参数缩放法 2503.04715v6 -
313 07-16 Dataset-Adaptive Dimensionality Reduction Datensatz-Adaptive Dimensionalitätsreduktion 数据集-适应多维度减少 2507.11984v1 -
314 07-16 Recent results on searches with boosted Higgs bosons at CMS Aktuelle Ergebnisse bei Suchanfragen mit Higgs-Bosonen am CMS 最近在CMS 使用增强的 Higgs bosons 搜索结果 2507.11977v1 -
315 07-16 Online Training and Pruning of Deep Reinforcement Learning Networks Online-Training und Pruning von Deep Verstärkung Learning Networks 深强化学习网络的在线培训和配置 2507.11975v1 -
316 07-16 Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models Vorhersehbare Skala: Teil II, Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen 可预见规模:第二部分,Farseer:改进大语言模式中的规模法 2506.10972v3 -
317 07-16 Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization Regret-Analyse der auf Posterior-Sampling basierenden erwarteten Verbesserung für Bayes'sche Optimierung 对巴耶斯最佳优化的预期改进情况进行基于实际抽样结果的遗憾分析 2507.09828v2 -
318 07-16 Simplifying Graph Kernels for Efficient Vereinfachende Graphenkerne für effizientes Arbeiten 简化用于高效的图形内核 2507.03560v2 -
319 07-16 Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Decoder-Hybrid-Decoder-Architektur für effizientes Nachdenken mit langer Generation 用于长生成高效推理的Decoder-Hybrid-Decoder结构 2507.06607v2 -
320 07-16 PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings PATCH: eine Methode des tiefen Lernens zur Beurteilung der Heterogenität der künstlerischen Praxis in historischen Gemälden 评估历史绘画艺术实践多样性的深层学习方法 2502.01912v3 -
321 07-16 BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v5 -
322 07-16 Rethinking Data Protection in the (Generative) Artificial Intelligence Era Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken 在人工(人工)情报时代重新思考数据保护问题 2507.03034v2 -
323 07-16 d-DQIVAR: Data-centric Visual Analytics and Reasoning for Data Quality Improvement d-DQIVAR: datenzentrierte visuelle Analyse und Begründung zur Verbesserung der Datenqualität d-DQIVAR:以数据为中心的提高数据质量的视觉分析和理由 2507.11960v1 -
324 07-16 PoTPTQ: A Two-step Power-of-Two Post-training for LLMs PoTPTQ: Zweistufige Kraft von zwei Nachschulungen für LLMs PoTPTQ:为LLMs提供两步二级培训后培训 2507.11959v1 -
325 07-16 Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees Tuning algorithmischer und architektonischer Hyperparameter im grafisch fundierten semi-überwachten Lernen mit nachweisbaren Garantien 在以图表为基础的半监测学习中以可实现的担保进行算法和建筑建筑超参数 2502.12937v2 -
326 07-16 The benefits of query-based KGQA systems for complex and temporal questions in LLM era Die Vorteile von anfragebasierten KGQA-Systemen für komplexe und zeitliche Fragen im LLM-Zeitalter 基于查询的KGQA系统对LLM时代复杂和时间问题的益处 2507.11954v1 -
327 07-16 IAM: Efficient Inference through Attention Mapping between Different-scale LLMs IAM: Effiziente Schlussfolgerung durch Aufmerksamkeitsmapping zwischen unterschiedlichen LLMs IAM:通过在不同规模的LLMs之间绘制注意绘图,有效推论 2507.11953v1 -
328 07-16 RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery RNAMunin: Ein Deep Machine Learning Modell für die nicht-kodierende RNA Discovery RNAMunin:一个非编码 RNA 探索的深机器学习模型 2507.11950v1 -
329 07-16 Kevin: Multi-Turn RL for Generating CUDA Kernels Kevin: Multi-Turn RL für die Erzeugung von CUDA-Kerneln Kevin: 生成 CUDA 核心多发RL 2507.11948v1 -
330 07-16 A Survey of Deep Learning for Geometry Problem Solving Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen 解决几何问题深层学习调查 2507.11936v1 -
331 07-16 Truncated Kernel Stochastic Gradient Descent on Spheres Beschnittener Kern Stochastischer Gradient Abstieg auf Sphären 球体上被排出核心内核岩层渐变源 2410.01570v6 -
332 07-16 Complex non-backtracking matrix for directed graphs Komplexe Nicht-Rückverfolgungsmatrix für gerichtete Graphen 定向图表的复杂非后跟踪矩阵表 2507.12503v1 -
333 07-16 Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning Beschleunigung des RF-Leistungsverstärkers über intelligente Probenahme und ML-basierte Parameter-Tuning 通过智能取样和以 ML 为基础的参数图集加速 RF 功率放大器设计 2507.11928v1 -
334 07-16 From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning Von Generativ zu Episodisch: Muster-Effizient Replicable Verstärkungslernen 从产生到起源:抽样有效复制强化学习 2507.11926v1 -
335 07-16 TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images TextDestroyer: Eine trainings- und annotationsfreie Diffusionsmethode zum Zerstören anomaler Texte aus Bildern 文字破坏:一个销毁图像中的非原子文字的无培训和注注解-不附带说明的传播方法 2411.00355v3 -
336 07-16 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models? 快速困难能否预测为加速理据模型的RL微调而在线化? 2507.04632v2 -
337 07-16 AFPM: Alignment-based Frame Patch Modeling for Cross-Dataset EEG Decoding AFPM: Alignmentbasierte Rahmenpatch-Modellierung für Cross-Dataset-EEG-Dekodierung AFPM: 跨数据交换电子EEG的对齐框架补全模型 2507.11911v1 -
338 07-16 Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v6 -
339 07-16 Epic-Sounds: A Large-scale Dataset of Actions That Sound Epic-Sounds: Ein großer Datensatz von Aktionen, die klingen 超声波:声响的大规模行动数据集 2302.00646v3 -
340 07-16 Resampling strategies for imbalanced regression: a survey and empirical analysis Strategien für eine unausgewogene Regression: eine Erhebung und empirische Analyse 恢复不平衡回归的战略:调查和实证分析 2507.11902v1 -
341 07-16 Imbalanced Regression Pipeline Recommendation Unausgewogene Regressionspipeline-Empfehlung 不平衡的递减管道建议 2507.11901v1 -
342 07-16 Newfluence: Boosting Model interpretability and Understanding in High Dimensions Newfluence: Verbesserung der Interpretationsfähigkeit und des Verständnisses von Modellen in hohen Dimensionen 新流:在高维度方面促进模型解释和理解 2507.11895v1 -
343 07-16 Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work? Den besseren Bandit-Algorithmus unter Datenfreigabe wählen: Wann funktionieren A/B-Experimente? 在数据共享:A/B实验何时奏效? 2507.11891v1 -
344 07-16 HueManity: Probing Fine-Grained Visual Perception in MLLMs HueManity: Erzeugen einer feinkörnigen visuellen Wahrnehmung in MLLMs 优才:在MLLMs中探究精美的视觉感知 2506.03194v2 -
345 07-16 GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning GeoChain: Multimodale Kette von Ideen für geographische Vernunft Geo Chain:为地理原因寻求的多式联运谈判链 2506.00785v2 -
346 07-16 A Policy-Improved Deep Deterministic Policy Gradient Framework for the Discount Order Acceptance Strategy of Ride-hailing Drivers Ein Policy-Improved Deep Deterministic Policy Gradient Framework für die Discount Order Acceptance Strategy of Ride-hailing Drivers 改善乘乘驾驶员折扣令接受战略的 政策改进深确定性政策分级框架 2507.11865v1 -
347 07-16 METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation METIS: Schnelle, qualitätsbewusste RAG-Systeme mit Konfigurationsanpassung METIS:具有配置适应的快速质量软件RAG系统 2412.10543v2 -
348 07-16 OrdShap: Feature Position Importance for Sequential Black-Box Models OrdShap: Feature Position Bedeutung für sequentielle Black-Box-Modelle OrdShap: 序列黑盒模型的特性位置重要性 2507.11855v1 -
349 07-16 Some remarks on gradient dominance and LQR policy optimization Einige Bemerkungen zur Gradientendominanz und zur LQR-Politikoptimierung 关于梯度支配地位和LQR政策优化的一些评论 2507.10452v2 -
350 07-16 Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential Ihr LLM kennt die Zukunft: Sein Multi-Token-Prognosepotenzial enthüllen 您的LLM 了解未来: 发掘其多功能预测潜力 2507.11851v1 -
351 07-16 Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update Generalisierte Lineare Banditen: Fast optimales Bedauern mit One-Pass-Aktualisierung 通用线性直线强盗: 几乎最佳的误差, 带有单纸条更新 2507.11847v1 -
352 07-16 ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving ReAL-AD: Auf dem Weg zu menschlicher Vernunft im autonomen Fahren Ende-zu-Ende ReAL-AD:在最终至最终自治驾驶中争取同人相同的理由 2507.12499v1 -
353 07-16 CTSR: Cartesian tensor-based sparse regression for data-driven discovery of high-dimensional invariant governing equations CTSR: Kartesische Tensor-basierte spärliche Regression für die datengetriebene Entdeckung hochdimensionaler Invariant-Regulierungsgleichungen CTSR: 由数据驱动的发现高维变异调节方程式的 数据驱动的高度异变方程的 笛卡尔斯偏差微弱回归 2504.07618v2 -
354 07-16 Understanding Pan-Sharpening via Generalized Inverse Pan-Sharpening über generalisierte Inverse verstehen 通过一般化反向 2310.02718v3 -
355 07-16 CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching CosmoFlow: Scale-Aware Representation Learning für die Kosmologie mit Flow Matching CosmoFlow: 以流动匹配方式进行宇宙学规模- 软件代表制学习 2507.11842v1 -
356 07-16 Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM Protenix-Mini: Effizienter Strukturvorhersage über kompakte Architektur, wenige Schritte Diffusion und umschaltbare pLM Protenix-Mini:通过集约结构结构、很少批发和可转接的PLM, 高效的结构预测器 2507.11839v1 -
357 07-16 HyperEvent:Learning Cohesive Events for Large-scale Dynamic Link Prediction HyperEvent:Learning Cohesive Events für groß angelegte dynamische Link-Vorhersage HyperEvent: 大型动态链接预测学习共聚活动 2507.11836v1 -
358 07-16 MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search MatRL: Wahrscheinlich verallgemeinerbare iterative Algorithmen Entdeckung über Monte-Carlo Baumsuche MatRL: 通过蒙特-卡洛树搜索 发现可普遍实现的迭代性电算算法 2507.03833v2 -
359 07-16 Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI Arctic Inferenz mit Shift Parallelismus: Schnelles und effizientes Open Source Inferenzsystem für Enterprise AI 北极与转移平行主义的推论:企业AI快速有效的开放源码推断系统 2507.11830v1 -
360 07-16 A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks Eine gruppentheoretische Analyse der Symmetrien, die Basiszusatz und ihre Erlernbarkeit durch neurale Netzwerke sind 神经网络对基底添加的对称及其可学习性进行小组理论分析 2507.10678v2 -
361 07-16 Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features Erweiterung OL-MDISF: Online-Lernen von Mix-Typed, Drifted und Unvollständige Streaming-Funktionen OL-MDISF:从Mix-Typed、Drifted和不完全流的特征网上学习 2507.10594v2 -
362 07-16 Regret Analysis for Randomized Gaussian Process Upper Confidence Bound Regret-Analyse für die randomisierte Gaußprozess-Upper-Confidence-Bound 对随机调整高斯进程最高信任圈的遗憾分析 2409.00979v3 -
363 07-16 Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving Proaktive Intra-GPU-Disaggregation von Prefill und Decode in LLM Serving 预填和解除LLM服务中编码的预填和分解 2507.06608v4 -
364 07-16 BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part I: PDE-Constrained Optimization BiLO: Zweistufiges lokales Operator-Lernen für inverse PDE-Probleme. Teil I: PDE-Kontrainierte Optimierung BILO: 双级当地操作员学习PDE反问题。 第一部分:受PDE约束的优化 2404.17789v5 -
365 07-16 Symbiosis: Multi-Adapter Inference and Fine-Tuning Symbiose: Multi-Adapter-Schlussfolgerung und Feinabstimmung 共生关系:多位开发商的推断和精准调整 2507.03220v2 -
366 07-16 MNIST-Gen: A Modular MNIST-Style Dataset Generation Using Hierarchical Semantics, Reinforcement Learning, and Category Theory MNIST-Gen: Eine modulare MNIST-Style-Datensatz-Generation mit Hierarchischer Semantik, Verstärkungslernen und Kategorietheorie MNIST-Gen:利用等级的语义、强化学习和分类理论,形成一个Modular MNIST-Style数据集 2507.11821v1 -
367 07-16 SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling SynCoGen: Synthesizable 3D-Molekül-Generation über Gelenkreaktion und Koordinatenmodellierung SynCoGen:通过联合反应和协调建模,同步可3D分子生成 2507.11818v1 -
368 07-16 Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models Nachvollziehen von Fakten oder nur Kopien? Eine kritische Untersuchung der Wettbewerbe von Mechanismen in großen Sprachmodellen 对大语言模式机制竞争情况的重要调查 2507.11809v1 -
369 07-16 CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategie zum Lernen mit lauteren Etiketten CLID-MU:跨行业信息差异:基于跨行业信息差异的Met Met 最新学习战略,有噪音标签 2507.11807v1 -
370 07-16 MOFSimBench: Evaluating Universal Machine Learning Interatomic Potentials In Metal–Organic Framework Molecular Modeling MOFSimBench: Bewertung der interatomaren Potentiale des universellen maschinellen Lernens in Metall–Organic Framework Molecular Modeling MOFSimBench:评价金属-有机框架中的通用机器学习和相互作用潜力 2507.11806v1 -
371 07-15 (2) Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation Verstärkung der latenten euklidischen Geometrie in Single-Cell VAEs für Manifold Interpolation 在单细胞VAEs中强制潜在欧几里得几何以用于流形插值 2507.11789v1 -
372 07-15 Lost in Transmission: When and Why LLMs Fail to Reason Globally Verloren in der Übertragung: Wann und warum LLMs weltweit nicht vernünftig sind 传输中的迷失:LLMs何时和为何未能进行全局推理 2505.08140v3 -
373 07-15 Metalic: Meta-Learning In-Context with Protein Language Models Metalic: Meta-Learning im Kontext mit Protein-Sprachmodellen Metalic:使用蛋白素语言模型的元学习内文 2410.08355v3 -
374 07-15 Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks Implizite Bias des gradienten Abstiegs für nicht-homogene Deep Networks 非齐次深层网络梯度下降的隐性偏差 2502.16075v2 -
375 07-15 Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions Grundlagenmodelle für Gehirnsignale: Ein kritischer Überblick über aktuelle Fortschritte und zukünftige Richtungen 脑信号基础模型:对当前进展和未来方向的重要审查 2507.11783v1 -
376 07-15 Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction Generalisierte Venn- und Venn-Abers-Kalibrierung mit Anwendungen in konformer Vorhersage 通用文文和文安-用非正式预测对应用进行校准 2502.05676v3 -
377 07-15 Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing Schlussfolgerung zu optimalen Policy Values und anderen irregulären Funktionen durch Glätten 通过平滑对最佳政策价值和其他不正常功能的推论 2507.11780v1 -
378 07-15 Predicting Delayed Trajectories Using Network Features: A Study on the Dutch Railway Network Vorhersage verzögerter Bahnen mit Netzwerkmerkmalen: Eine Studie zum niederländischen Eisenbahnnetz 利用网络特点预测延迟轨道:关于荷兰铁路网的研究 2507.11776v1 -
379 07-15 Scaling laws for activation steering with Llama 2 models and refusal mechanisms Skalierungsgesetze für die Aktivierungssteuerung mit Llama 2 Modellen und Ablehnungsmechanismen 以Llama 2模式和拒绝机制启动指导的法律 2507.11771v1 -
380 07-15 LLMs are Bayesian, in Expectation, not in Realization LLMs sind Bayesian, in Erwartung, nicht in der Realisierung LLMs是巴耶斯人、期望、而不是实现的巴耶斯人。 2507.11768v1 -
381 07-15 Differentially Private Conformal Prediction via Quantile Binary Search Differential private konforme Vorhersage über Quantile Binary Search 通过量度二进制搜索的不同私人 2507.12497v1 -
382 07-15 SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation SAMO: Ein leicht schärfer und bewusster Ansatz für die Multi-Task-Optimierung mit gemeinsamer Global-Local-Perturbation SAMO: 与全球-地方联合干扰进行多任务优化的轻量级锐锐利软件方法 2507.07883v3 -
383 07-15 Torsional-GFN: a conditional conformation generator for small molecules Torsional-GFN: ein konditionaler Exterieurgenerator für kleine Moleküle Torsional-GFN:小型分子的有条件整装发电机 2507.11759v1 -
384 07-15 FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making FOUNDER: Erdungs-Stiftungsmodelle in Weltmodellen für offene, einkörperige Entscheidungsfindung FOUNDER: 以世界不限名额作出不限 2507.12496v1 -
385 07-15 A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction Ein Graph-in-Graph-Lernrahmen für die Vorhersage von Drogen-Target-Interaktion 药物-目标互动预测图示-格图学习框架 2507.11757v1 -
386 07-15 AKReF: An argumentative knowledge representation framework for structured argumentation AKReF: Ein argumentativer Wissensvertretungsrahmen für strukturierte Argumentation AKReF: 结构化论证的理论知识代表框架 2506.00713v3 -
387 07-15 Problem-dependent convergence bounds for randomized linear gradient compression Problemabhängige Konvergenzgrenzen für randomisierte lineare Gradientenkompression 随机的线性梯度压缩 2411.12898v3 -
388 07-15 Sparse Identification of Nonlinear Dynamics with Conformal Prediction Sparse Identifikation von nichtlinearen Dynamiken mit konformer Vorhersage 以非正式预测对非线性动态的简单识别 2507.11739v1 -
389 07-15 Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning 以编码器嵌入式嵌入为改进节点学习提供动力的神经网络 2507.11732v1 -
390 07-15 Globalization for Scalable Short-term Load Forecasting Globalisierung für skalierbare kurzfristige Lastprognosen 全球化促进可伸缩的短期负载预测 2507.11729v1 -
391 07-15 Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop Benchmarking und Evaluation von KI-Modellen in der Biologie: Ergebnisse und Empfehlungen aus dem CZI Virtual Cells Workshop 衡量和评价AI 生物学模型的基准和评估:CZI虚拟单元讲习班的成果和建议 2507.10502v2 -
392 07-15 Subgraph Generation for Generalizing on Out-of-Distribution Links Subgraphengenerierung für die Verallgemeinerung von Out-of-Distribution-Links 通用分配外链接的子集 2507.11710v1 -
393 07-15 Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise 处理量子噪音的量子环境中零星的联邦学习方法 2507.12492v1 -
394 07-15 Reinforcement Learning from Adversarial Preferences in Tabular MDPs Verstärkung des Lernens von Adversarial Preferences in Tabular MDPs 从表列MDP的反向优惠中学习 2507.11706v1 -
395 07-15 Time series classification of satellite data using LSTM networks: an approach for predicting leaf-fall to minimize railroad traffic disruption Zeitreihenklassifizierung von Satellitendaten unter Verwendung von LSTM-Netzen: ein Ansatz zur Vorhersage von Blattfallen zur Minimierung von Verkehrsunterbrechungen im Eisenbahnverkehr 利用LSTM网络对卫星数据进行时间序列分类:预测落叶的方法,以尽量减少铁路交通中断 2507.11702v1 -
396 07-15 Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering Spatially Grounded Erklärungen in Vision Language Models for Document Visual Question Answering 用于文件视觉问题解答的愿景语言模型中的基于空间的解释 2507.12490v1 -
397 07-15 Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space Variationale Kombinatorial Sequentielle Monte Carlo für Bayesische Phylogenetik im Hyperbolischen Raum 双曲空间巴耶斯动力基因组学变异组合序列蒙特卡洛 2501.17965v2 -
398 07-15 Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees Any-Property-Conditional Molecule Generation mit Selbst-Kritik mit Spanning Trees 使用横贯树木进行自批评的 任何有条件的分子代 2407.09357v3 -
399 07-15 Galaxy image simplification using Generative AI Galaxy Bildvereinfachung mit Generative KI 利用创用AI简化银河系统图像 2507.11692v1 -
400 07-15 The Impact of Coreset Selection on Spurious Correlations and Group Robustness Die Auswirkungen der Coreset-Auswahl auf Spurious Correlations und Group Robustness Coreset 选择对污损和群体强势的影响 2507.11690v1 -
401 07-15 Composing Linear Layers from Irreducibles Das Komponieren von linearen Schichten aus Irreduzierbaren 将来自不灵异的线性图层合成成线性图层 2507.11688v1 -
402 07-15 MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization MetaLint: Generalisierbare idiomatische Code-Qualitätsanalyse durch instruction-following und einfach-zu-harte Verallgemeinerung MetaLint: 通过执行指示和易于协调的通用化,可通用的单性守则质量分析 2507.11687v1 -
403 07-15 PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training PGT-I: Scaling Spatiotemporal GNNs mit speichereffizienter verteilter Ausbildung PGT-I: 具有记忆有效分配培训的Spatiotemporal GNNs 2507.11683v1 -
404 07-15 AI for Explosive Ordnance Detection in Clearance Operations: The State of Research KI für explosive Ordnance Detection in Clearing-Operationen: Der Stand der Forschung 清除行动中爆炸性弹药侦测的AI:研究状况 2411.05813v2 -
405 07-15 Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives Kolmogorov-Arnold-Netzwerke: Annäherungs- und Lerngarantien für Funktionen und deren Derivate Kolmogorov-Arnold网络:功能及其衍生工具的近似和学习保障 2504.15110v2 -
406 07-15 Machine Learning-Driven Compensation for Non-Ideal Channels in AWG-Based FBG Interrogator Machine Learning-Driven Kompensation für nicht-ideale Kanäle im AWG-basierten FBG-Interrogator 特设工作组FBG 干涉器中非理想通道的机器学习驱动补偿 2506.13575v2 -
407 07-15 Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification Lassen Sie uns in zwei Schritten denken: Abmildern Vereinbarung Bias in MLLMs mit selbst-gerundete Verifikation 让我们思考两步:在MLLMs中减少协议与自我核查的偏见 2507.11662v1 -
408 07-15 Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory Mathematische Einführung in Deep Learning: Methoden, Implementierungen und Theorie 深层学习数学介绍:方法、实施和理论 2310.20360v3 -
409 07-15 STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics STAGED: Ein multi-agent-neurales Netzwerk zum Lernen zellulärer Interaktionsdynamik STAGED: 学习细胞互动动态多要素神经网络 2507.11660v1 -
410 07-15 ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs ZKP-FedEval: Überprüfbare und datenschutzschonende Federated Evaluation mit Null-Wissensnachweisen ZKP-FedEval:使用零知识证明进行可核查和隐私保护的联邦评价 2507.11649v1 -
411 07-15 Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation Auf dem Weg zum Grokking: Einbettungen, Dropout und Netzwerkaktivierung 追踪通往格罗金的道路:嵌入、辍学和网络启动 2507.11645v1 -
412 07-15 Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment Posture-Driven Action Intent Inferenz für Spielstil und Müdigkeit Bewertung 游戏风格和Fatigue评估的推论 2507.11642v1 -
413 07-15 Deep Generative Methods and Tire Architecture Design Tiefe generative Methoden und Reifenarchitektur Design 深生成方法和轮胎结构设计 2507.11639v1 -
414 07-15 Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders Interpretable Vorhersage von Lymphknotenmetastasen bei rektaler KrebsmRT mit variablen Autoencodern 利用变化式自动电解器对直肠癌MRI中淋巴结结的代谢值进行可解释的预测 2507.11638v1 -
415 07-15 JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs JSQA: Sprachqualitätsbewertung mit Wahrnehmungs-Inspired Contrastive Pretraining basierend auf JND Audio Pairs JSQA:根据JND音频对音频对调,用自觉受启发的违反规定前训练进行语言质量评估 2507.11636v1 -
416 07-15 Multi-view biomedical foundation models for molecule-target and property prediction Multi-View biomedizinische Stiftungsmodelle für Molekül-Ziel- und Eigenschaftsvorhersage 分子目标和财产预测多视角生物医学基础模型 2410.19704v4 -
417 07-15 A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs Ein rechnerisch frugales Open-Source-Stiftungsmodell für Thorax-Erkennung in Lungenkrebs-Screening-Programmen 肺癌筛查方案中胸腔酸疾病检测的计算节节制开源基础模型 2507.01881v2 -
418 07-15 MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering MapIQ: Benchmarking multimodaler Großsprachenmodelle für Kartenfrageantworten MapIQ:为地图回答问题确定多式大语言模式基准 2507.11625v1 -
419 07-15 Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification Lernrepräsentationen der Veranstaltungszeitreihe mit Sparse-Autoencodern für Anomalieerkennung, Ähnlichkeitssuche und unbeaufsichtigte Klassifizierung 与用于异常探测、相似搜索和无监督分类的粗皮自动编码器一起进行的 活动时间系列学习说明 2507.11620v1 -
420 07-15 Streaming 4D Visual Geometry Transformer Streaming 4D Visuelle Geometrie Transformer 流动 4D 视觉几何变换器 2507.11539v1 -
421 07-15 Canonical Bayesian Linear System Identification Canonical Bayesian Linear System Identification Canonical Bayesian 线性系统识别 2507.11535v1 -
422 07-15 Langevin Flows for Modeling Neural Latent Dynamics Langevin-Ströme für die Modellierung neuraler Latent-Dynamik 模拟神经内流动态的Langevin流程 2507.11531v1 -
423 07-15 EXPO: Stable Reinforcement Learning with Expressive Policies EXPO: Stabiles Stärkungslernen mit ausdrucksstarker Politik 出口促进: 采用表达式政策进行稳定的加强学习 2507.07986v2 -
424 07-15 CATVis: Context-Aware Thought Visualization CATVis: Kontext-Bewusste Gedankenvisualisierung CATVis:背景意识思想视觉化 2507.11522v1 -
425 07-15 Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models Hi Robot: Open Ended Instruction mit Hierarchical Vision-Language-Action-Modellen 高机器人:不限名额教学,采用等级愿景-语言-行动模式 2502.19417v2 -
426 07-15 AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air AirLLM:传播基于政策的适应性LORA,用于远距离微调LLM在空中的LLM 2507.11515v1 -
427 07-15 Are DeepSeek R1 And Other Reasoning Models More Faithful? Sind DeepSeek R1 und andere vernünftige Modelle treuer? DeepSeek R1和其他理由模型更可信吗? 2501.08156v5 -
428 07-15 Large Language Models Engineer Too Many Simple Features For Tabular Data Large Language Models Engineer Zu viele einfache Funktionen für Tabellendaten 大语言模型为表格数据设计了过多的简单特征 2410.17787v2 -
429 07-15 Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques Elk: Erforschung der Effizienz von Intercore-vernetzten KI-Chips mit Deep Learning Compiler-Techniken Elk:探索与深学习汇编者技术一起的机构间连接的AI芯片的效率 2507.11506v1 -
430 07-15 A Mathematical Theory of Discursive Networks Eine mathematische Theorie diskursiver Netzwerke 讨论网络的数学理论 2507.06565v3 -
431 07-15 ComFairGNN: Community Fair Graph Neural Network ComFairGNN: Gemeinschaftsgerechtes Diagramm-Neural-Netzwerk ComFairGNN:社区公平图形神经网络 2411.04371v3 -
432 07-15 Searching Latent Program Spaces Suche nach latenten Programmräumen 搜索隐藏程序空间 2411.08706v2 -
433 07-15 Reinforcement Learning with Action Chunking Verstärktes Lernen mit Action Chunking 强化学习与行动决赛 2507.07969v2 -
434 07-15 Exploring the robustness of TractOracle methods in RL-based tractography Erforschung der Robustheit von TractOracle-Methoden in der RL-basierten Traktographie 探索基于RL的地形图象学中的Tract Oracle方法的稳健性 2507.11486v1 -
435 07-15 Model See Model Do: Speech-Driven Facial Animation with Style Control Modell siehe Modell Do: Sprachgesteuerte Gesichtsanimation mit Stilsteuerung 见示范 do:带有样式控制的语音驱动动画模型 2505.01319v2 -
436 07-15 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety Chain of Thought Monitorability: Eine neue und fragile Chance für KI-Sicherheit 《思想链可监测性:AI安全的新机会和脆弱机会》 2507.11473v1 -
437 07-15 D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data D3FL: Datenverteilung und Detrending für robustes Federated Learning in nichtlinearen Zeitreihendaten D3FL:非线性时间序列数据中硬性联邦学习的数据分配和分流 2507.11471v1 -
438 07-15 Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection Gram-Schmidt Methoden zur unüberwachten Feature-Extraktion und -Auswahl 不受监督地物采掘和选择的Gram-Schmidt方法 2311.09386v4 -
439 07-15 Training neural control variates using correlated configurations Ausbildung von Neuralsteuerungsvariaten mit korrelierten Konfigurationen 使用相关配置的培训神经控制变异 2505.07719v3 -
440 07-15 LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer LRMR: LLM-getriebenes relationales Multiknoten-Ranking für Lymphknotenmetastasis-Abschätzung bei rektaler Krebserkrankung LRMR: 红外癌症中淋巴结结结节元值评估的LLM-Driven 关系多节分级 2507.11457v1 -
441 07-15 A Generative Approach to LLM Harmfulness Detection with Special Red Flag Tokens Eine generative Annäherung an LLM Harmfulness Detection mit speziellen roten Flaggen-Tokens 利用特别红旗拳生成LLM 无害性探测法 2502.16366v3 -
442 07-15 Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs Synthetische Datensätze für maschinelles Lernen auf Spatio-Temporalen Graphen mit PDEs 利用PDEs在斯帕蒂奥-时空图上进行机器学习的合成数据集 2502.04140v2 -
443 07-15 Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control Langsame Entscheidungshäufigkeiten in der kontinuierlichen Kontrolle überwinden: Modellbasiertes Sequenz-Verstärkungs-Lernen für modellfreie Steuerung 克服持续控制中缓慢决定因素:无模式控制的示范序列强化学习 2410.08979v4 -
444 07-15 Implementing Adaptations for Vision AutoRegressive Model Implementierung von Anpassungen für das AutoRegressive Vision Modell 实施适应展望自动递减模式 2507.11441v1 -
445 07-15 Toward Improving fNIRS Classification: A Study on Activation Functions in Deep Neural Architectures Zur Verbesserung der fNIRS-Klassifikation: Eine Studie über Aktivierungsfunktionen in tiefen neuralen Architekturen 努力改进FNIRS分类:关于深神经结构中激活功能的研究 2507.11436v1 -
446 07-15 FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning FLsim: Ein modulares und bibliotheks-agnostisches Simulations-Framework für Federated Learning FLsim: 联邦学习模式和图书馆-不可知模拟框架 2507.11430v1 -
447 07-15 Matrix Is All You Need Matrix ist alles, was Sie brauchen 母体是所有你需要的 2506.01966v2 -
448 07-15 Improving sub-seasonal wind-speed forecasts in Europe with a non-linear model Verbesserung der Windgeschwindigkeitsprognosen innerhalb der Saison in Europa mit einem nichtlinearen Modell 利用非线性模型改进欧洲季节性风速次风速预报 2411.19077v2 -
449 07-15 Better Regret Rates in Bilateral Trade via Sublinear Budget Violation Bessere Bedauernsraten im bilateralen Handel durch sublineare Haushaltsverletzung 双边贸易中因次线性预算违反规定而出现更好的遗憾率 2507.11419v1 -
450 07-15 A Resource Efficient Quantum Kernel Ein ressourceneffizienter Quantenkern 资源效率高的量子核心 2507.03689v2 -
451 07-15 DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation DFRot: Erzielen von aussergewöhnlicher und massiver Aktivierungsfrei für rotierte LLMs mit raffinierter Rotation DFRot: 实现无外源和无大规模激励-无源于经过精炼旋转的旋转LMLMs 2412.00648v4 -
452 07-15 Seq vs Seq: An Open Suite of Paired Encoders and Decoders Seq vs Seq: Eine offene Suite aus koppelten Encodern und Decodern Seq vs Seq:一个开放的套件,其中含有子元编码器和代碼器。 2507.11412v1 -
453 07-15 Robust-Multi-Task Gradient Boosting Robust-Multi-Task-Gradienten-Boosting 强力多任务梯级推动 2507.11411v1 -
454 07-15 Temporal Chunking Enhances Recognition of Implicit Sequential Patterns Temporales Chunking verbessert die Anerkennung von impliziten Sequenzmustern 增强对隐性序列模式的认识 2506.00588v2 -
455 07-15 Gaussian mixture models as a proxy for interacting language models Gaußsche Mischungsmodelle als Proxy für interagierende Sprachmodelle Gaussian 混合模型作为交互语言模型的替代 2506.00077v3 -
456 07-15 Moderate Adaptive Linear Units (MoLU) Mäßige adaptive Lineareinheiten (MoLU) 适应性线性线性单位(MoLU) 2302.13696v7 -
457 07-15 Stochastic Entanglement Configuration for Constructive Entanglement Topologies in Quantum Machine Learning with Application to Cardiac MRI Stochastische Verschränkungskonfiguration für konstruktive Verschränkungstopologien im Quantum Machine Learning mit Anwendung auf Herz-MRT 量子机器学习中构造性纠缠拓扑的随机纠缠配置及其在心脏磁共振成像中的应用 2507.11401v1 -
458 07-15 X Hacking: The Threat of Misguided AutoML X Hacking: Die Bedrohung durch fehlgeleitete AutoML X Hacking:被误导的AutoML的威胁 2401.08513v3 -
459 07-15 The model is the message: Lightweight convolutional autoencoders applied to noisy imaging data for planetary science and astrobiology Das Modell ist die Botschaft: Leichte konvolutionäre Autoencoder, die auf laute Bilddaten für die Planetenwissenschaft und Astrobiologie angewendet werden 模型就是信息:轻量级变速自动电解码器,用于行星科学和天体生物学的噪音成像数据。 2507.11400v1 -
460 07-15 Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors Inverse Verstärkung Lernen mit wechselnden Belohnungen und Geschichte Abhängigkeit für die Charakterisierung von Tierverhalten 反强化学习,转换奖励和对动物行为定性的动物行为的历史依赖 2501.12633v3 -
461 07-15 A Neural Network Model of Complementary Learning Systems: Pattern Separation and Completion for Continual Learning Ein neurales Netzwerkmodell für komplementäre Lernsysteme: Mustertrennung und -vervollständigung für kontinuierliches Lernen 补充学习系统神经网络模型:持续学习的模式分离和完成 2507.11393v1 -
462 07-15 Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques Synthetische tabellarische Datengenerierung: Eine vergleichende Erhebung für moderne Techniken 制作合成图表数据:现代技术比较调查 2507.11590v1 -
463 07-15 From Kinetic Theory to AI: a Rediscovery of High-Dimensional Divergences and Their Properties Von der Kinetischen Theorie zur KI: Eine Wiederentdeckung hochdimensionaler Divergenzen und ihrer Eigenschaften 从动从理论到AI:重现高度多元差异及其属性 2507.11387v1 -
464 07-15 Einstein Fields: A Neural Perspective To Computational General Relativity Einstein-Felder: Eine neurale Perspektive zur Berechnung allgemeiner Relativität 爱因斯坦领域:从神经角度看待对一般相对论的比较 2507.11589v1 -
465 07-15 Joint space-time wind field data extrapolation and uncertainty quantification using nonparametric Bayesian dictionary learning Gemeinsame Raum-Zeit-Windfelddaten-Extrapolation und Unsicherheits-Quantifizierung mit nichtparametrischem Bayesian Wörterbuch-Lernen 使用非参数贝耶斯词典学习法进行联合时空风场数据外推和不确定性量化 2507.11385v1 -
466 07-15 SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics SToFM: ein Multi-Skala-Stiftungsmodell für räumliche Transkriptomik SToFM:空间转换学多规模基础模型 2507.11588v1 -
467 07-15 An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator Ein volldigitaler 8,6-nJ/Frame 65-nm Tsetlin Maschineneinteilung Beschleuniger 全数8.6-nJ/Frame 65nm Tsetlin 机器图像分类加速器 2501.19347v3 -
468 07-15 Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs Schrittweise Richtlinie für Wissen über seltene Werkzeuge (SPaRK): Offline-RL, die vielfältige Werkzeugnutzung in LLMs antreibt 有限工具知识(SPARK)的逐步政策:驱动在LLM中使用多样化工具的离线RL 2507.11371v1 -
469 07-15 Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning Lokales Paarweise Abstand passend für Backpropagation-freie Verstärkungs-Lernen 后推进-无强化学习的地方对等相近距离匹配 2507.11367v1 -
470 07-15 A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent Ein parallelisierbarer Ansatz zur Charakterisierung von NE in Null-Sum-Spielen nach einer linearen Anzahl von Iterationen von gradienten Abstieg 在 “ 累进后裔线性迭代数后零苏姆运动会 “ 中将NE定性的可平行办法 2507.11366v1 -
471 07-15 DeInfoReg: A Decoupled Learning Framework for Better Training Throughput DeInfoReg: Ein entkoppelter Lernrahmen für besseren Trainingsdurchsatz DeInfoReg:一个分离的学习框架,以改善培训工作量 2506.18193v2 -
472 07-15 Neurosymbolic Reasoning Shortcuts under the Independence Assumption Neurosymbolische Begründung Kurzbefehle unter der Unabhängigkeitsaufnahme 独立假设下的神经曲脚解释快捷键 2507.11357v1 -
473 07-15 Beyond Predictions: A Participatory Framework for Multi-Stakeholder Decision-Making Beyond Predictions: Ein partizipatorischer Rahmen für Entscheidungsfindung mit mehreren Interessenträgern 超越预测:多方利益攸关方决策参与框架 2502.08542v2 -
474 07-15 Guiding LLM Decision-Making with Fairness Reward Models Leitende LLM-Entscheidungs-Making mit Fairness-Reward-Modelle 以公平奖励模式作出指导性LLM决策 2507.11344v1 -
475 07-15 Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding Intervening in Black Box: Konzept Engpass-Modell für die Verbesserung der menschlichen neuralen Netzwerk gegenseitiges Verständnis 黑盒干预:增强人类神经网络相互了解的概念瓶颈模式 2506.22803v2 -
476 07-15 Universal rates of ERM for agnostic learning Universelle WKM-Raten für agnostisches Lernen 用于不可不可知性学习的企业风险管理普遍比率 2506.14110v2 -
477 07-15 Internal Value Alignment in Large Language Models through Controlled Value Vector Activation Interne Wertausrichtung in großen Sprachmodellen durch kontrollierte Wert-Vektor-Aktivierung 通过控制值矢量激活,大语言模型的内部价值对齐 2507.11316v1 -
478 07-15 Contrast All the Time: Learning Time Series Representation from Temporal Consistency Kontrast die ganze Zeit: Zeitreihendarstellung von zeitlicher Konsistenz lernen 时间一致性的学习时间序列代表 2410.15416v2 -
479 07-15 Supercharging Floorplan Localization with Semantic Rays Supercharging Grundriss Lokalisierung mit Semantic Rays 配有语义雷的本地化 2507.09291v2 -
480 07-15 Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation Greifen einer Handful: Sequentielle Multi-Object Dexterous Grasp Generation 绘制手巧的 : 序列式多对象脱色重力生成 2503.22370v3 -
481 07-15 FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung FeDa4fair:公平评价客户-联邦数据集 2506.21095v2 -
482 07-15 Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study Energieeffizienz in KI für 5G und darüber hinaus: Eine DeepRx-Fallstudie 5G 及5G 以上的AI 能源效率:深Rx 案例研究 2507.10409v2 -
483 07-15 Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime Schnelle letzte Konvergenz der SGD im glatten Interpolationssystem SGD在平滑的内插制度中的汇合 2507.11274v1 -
484 07-15 Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors Parametereffizientes Feintuning mit Kreisel- und Diagonalvektoren 具有圆环和对角矢量的高效参数精密喷射 2505.00580v2 -
485 07-15 Gaussian Loss Smoothing Enables Certified Training with Tight Convex Relaxations Gaussian Loss Smoothing ermöglicht zertifiziertes Training mit engen Convex-Entspannungen Gausian 滑动损失平滑使经认证的培训具有紧固封顶宽宽度 2403.07095v4 -
486 07-15 BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v4 -
487 07-15 Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound Sand zu Gold machen: Recyclingdaten überbrücken On-Policy- und Off-Policy-Lernen über Causal Bound 点沙成金:通过因果界回收数据,连接同策略与异策略学习 2507.11269v1 -
488 07-15 Privacy Against Agnostic Inference Attacks in Vertical Federated Learning Datenschutz gegen agnostische Inferenzangriffe im vertikalen Föderierten Lernen 在垂直联邦学习中针对精神推断攻击的隐私 2302.05545v3 -
489 07-15 Block Circulant Adapter for Large Language Models Block Circulant Adapter für große Sprachmodelle 用于大语言模型的块环相适应器 2505.00582v2 -
490 07-15 Learning Safe Numeric Planning Action Models Sichere numerische Planungs-Aktionsmodelle lernen 学习安全数字规划行动模式 2312.10705v2 -
491 07-15 LyAm: Robust Non-Convex Optimization for Stable Learning in Noisy Environments LyAm: Robuste Non-Convex-Optimierung für stabiles Lernen in lauten Umgebungen LyAm: 在噪音环境中稳定学习的强力非Convex优化 2507.11262v1 -
492 07-15 The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter Die Pragmatischen Rahmen von Puriösen Korrelationen im maschinellen Lernen: Verdolmetschen, wie und warum sie wichtig sind 机器学习中净化的相互校正的实用框架:解释这些框架如何和为何重要 2411.04696v4 -
493 07-15 Fairness-Aware Grouping for Continuous Sensitive Variables: Application for Debiasing Face Analysis with respect to Skin Tone Fairness-Aware-Gruppierung für kontinuierliche Sensitive Variablen: Anwendung für die Debiasing Face Analysis in Bezug auf Hautton 持续敏感变量的公平意识群集:关于皮肤色调的贬低面分析申请 2507.11247v1 -
494 07-15 Generative Click-through Rate Prediction with Applications to Search Advertising Generative Click-through-Rate-Vorhersage mit Anwendungen zur Suche Werbung 利用搜索广告应用程序生成点击率预测 2507.11246v1 -
495 07-15 Shared Global and Local Geometry of Language Model Embeddings Gemeinsame globale und lokale Geometrie von Sprachmodellen 共同的全球和地方语言对地测量 2503.21073v3 -
496 07-15 Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation Wenig scharfe Radarsignalerkennung durch selbstüberwachtes Lernen und Funkfrequenz-Domänenanpassung 通过自我监督学习和无线电频域的适应,很少点热雷达信号识别 2501.03461v3 -
497 07-15 Improved sampling algorithms and Poincaré inequalities for non-log-concave distributions Verbesserte Sampling-Algorithmen und Poincaré-Ungleichheiten für Nicht-Log-Konkaven-Distributionen 改进取样算法和波因卡雷非卷卷混集分布分布的不平等情况 2507.11236v1 -
498 07-15 DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion DuetGraph: Coarse-to-Fine-Wissensgrafik mit Dual-Pathway Global-Local Fusion 迪特格格:粗到精知识图,与双路全球-本地融合 2507.11229v1 -
499 07-15 TorchCP: A Python Library for Conformal Prediction TorchCP: Eine Python-Bibliothek für konforme Vorhersagen 火炬CP:皮顿综合预测图书馆 2402.12683v3 -
500 07-15 Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere? Gradient Descent on Logistic Regression: Arbeiten große Schrittgrößen mit Daten auf der Sphäre? 物流倒退的梯度:大步级系统是否与球体数据相配合? 2507.11228v1 -
501 07-15 On Equivariant Model Selection through the Lens of Uncertainty Bei gleicher Modellauswahl durch das Lens of Uncertainty 通过不确定性的镜头进行等同模型选择 2506.18629v2 -
502 07-15 On the Effect of Instruction Tuning Loss on Generalization Auf die Auswirkungen der Instruktion Tuning Verlust auf die Verallgemeinerung 指示计票损失对普遍化的影响的影响 2507.07817v2 -
503 07-15 Stylometry recognizes human and LLM-generated texts in short samples Stylometrie erkennt menschliche und LLM-generierte Texte in kurzen Proben 文体计量学在短样本中识别人类和LLM生成的文本 2507.00838v2 -
504 07-15 Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks Beweisbare Robustheit von (Graph) Neuronalen Netzwerken gegen Datenvergiftung und Hintertürangriffe (图)神经网络针对数据中毒和后门攻击的可证明鲁棒性 2407.10867v3 -
505 07-15 A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v5 -
506 07-15 A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition Ein robuster, unvollständiger multimodaler Low-Rank-Anpassungsansatz für die Emotionserkennung 强烈的承认情感的不完全的多式低Rank适应办法 2507.11202v1 -
507 07-15 Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why Feature-based vs. GAN-based Learning from Demonstrations: Wann und warum 从示范活动中学习:何时和为何 2507.05906v2 -
508 07-15 Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models Datengetriebene Differentialentwicklung in Reifenindustrie Extrusion: Hebelwirkung von Surrogate-Modellen 轮胎工业振荡中数据驱动的差别变化:杠杆化代金模型 2507.11191v1 -
509 07-15 Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms Perfekte Balance: Schutz der Privatsphäre bei gleichzeitiger Steigerung der Nützlichkeit in kollaborativen medizinischen Vorhersageplattformen 实现完美平衡:在合作医疗预测平台中维护隐私,同时促进效用 2507.11187v1 -
510 07-15 An Explainable AI-Enhanced Machine Learning Approach for Cardiovascular Disease Detection and Risk Assessment Ein erklärbarer KI-verbesserter maschineller Lernansatz für die Erkennung und Risikobewertung von Herz-Kreislauf-Erkrankungen 用于心血管疾病检测和风险评估的可解释的AI增强的机器学习方法 2507.11185v1 -
511 07-15 Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications Quantisierte Rangreduzierung: Ein kommunikativ-effizientes Federated Learning Scheme für netzwerk-kritische Anwendungen 减少数量级:网络-英国应用通信-效率高的联邦学习计划 2507.11183v1 -
512 07-15 Mixture of Experts in Large Language Models Mixtur von Experten in großen Sprachmodellen 大语言模式专家混合 2507.11181v1 -
513 07-15 Gradient Regularization-based Neural Granger Causality Gradient Regularisierung-basierte Neural Granger Kausalität 以神经重力为主的神经固态致果性 2507.11178v1 -
514 07-15 Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Verstärkung Learning Based UAV Deconfliction 在以无人驾驶航空器为基础的强化学习中潜入 2507.11173v1 -
515 07-15 Improving Wi-Fi Network Performance Prediction with Deep Learning Models Verbesserung der Wi-Fi-Netzwerk-Performance-Vorhersage mit Deep-Learning-Modellen 利用深学习模式改进无线网络绩效预测 2507.11168v1 -
516 07-15 GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks GRAPES: Lernen zu Mustergraphen für skalierbare Graphen-Neural-Netzwerke GRAPES: 学习可缩放图形神经网络样本图 2310.03399v3 -
517 07-15 EASTER: Embedding Aggregation-based Heterogeneous Models Training in Vertical Federated Learning EASTER: Einbettung von Aggregationsbasierten Heterogenen Modellen Training in vertikales Federated Learning EASTER:在纵向联邦学习中嵌入基于聚合的异种模式培训 2310.13367v3 -
518 07-15 Fast Fourier Correlation is a Highly Efficient and Accurate Feature Attribution Algorithm from the Perspective of Control Theory and Game Theory Fast Fourier Correlation ist ein hocheffizientes und präzises Feature Attribution Algorithmus aus der Perspektive der Steuerungstheorie und Spieltheorie 从控制理论和游戏理论的角度看,快速的四面形关联是一种高效和准确的地物归属比值。 2504.02016v2 -
519 07-15 RMAU-NET: A Residual-Multihead-Attention U-Net Architecture for Landslide Segmentation and Detection from Remote Sensing Images RMAU-NET: Eine residual-Multihead-Aufmerksamkeit U-Net-Architektur für Erdrutschsegmentierung und Detektion von Fernerkundungsbildern RMAU-NET:从遥感图像中分离和探测滑坡的剩余-多头-注意 U-网络结构 2507.11143v1 -
520 07-15 CLA: Latent Alignment for Online Continual Self-Supervised Learning CLA: Latent Alignment for Online Continual Self-Supervised Learning CLA:在线持续自监督学习的潜在对齐 2507.10434v2 -
521 07-15 Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking Verschüttetes Wasserzeichen als Filter: Bekämpfung von Schmieden und Überschreiben von Angriffen bei gewichtsbasiertem Neural Network Watermarking 流域水印作为过滤器:在以重量为基础的神经网络水印中击退伪造和推翻攻击 2507.11137v1 -
522 07-15 Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection Interpretierbare Bayesian Tensor Netzwerk-Kernel-Maschinen mit automatischer Rang- und Feature-Auswahl 具有自动排级和特选功能的可解释贝耶斯泰瑟网络中枢机 2507.11136v1 -
523 07-15 What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests Was sollten LLMs vergessen? Quantifizierung personenbezogener Daten in LLMs für rechts-zu-vergessene Anfragen 普法女士应忘记什么? 将个人数据量化为 “ 有权被遗忘的请求 “ 的 “ 普法女士 “ 中的 “ 个人数据 “ 。 2507.11128v1 -
524 07-15 TAB: Unified Benchmarking of Time Series Anomaly Detection Methods TAB: Unified Benchmarking von Methoden zur Erkennung von Anomalien in der Zeitreihe TAB: 不同探测方法的时间序列统一基准 2506.18046v2 -
525 07-15 PPA-Game: Characterizing and Learning Competitive Dynamics Among Online Content Creators PPA-Game: Charakterisieren und Lernen wettbewerbsfähige Dynamik unter Online Content Creators PPA-Game:确定和学习在线内容创建者之间的竞争动态 2403.15524v2 -
526 07-15 Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Dynamisches Chunking für die end-to-end-Hierarchische Sequenzmodellierung 端端到末端等级序列建模动态震动 2507.07955v2 -
527 07-15 Context-Aware Deep Lagrangian Networks for Model Predictive Control Context-Aware Deep Lagrangian Networks für Modellvorhersagesteuerung 用于模型预测控制的深拉格朗江网络 2506.15249v2 -
528 07-15 Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs Multi-Trigger-Vergiftung verstärkt Sicherheitslücken in LLMs 多触发中毒行为放大了LLM 的后门脆弱性 2507.11112v1 -
529 07-15 A Mathematical Optimization Approach to Multisphere Support Vector Data Description Ein mathematischer Optimierungsansatz zur Multisphärenunterstützung Vektordatenbeschreibung 多重支持矢量数据描述的数学优化方法 2507.11106v1 -
530 07-15 Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v3 -
531 07-15 LaCoOT: Layer Collapse through Optimal Transport LaCoOT: Layer Collapse durch optimalen Transport LaCOOT: 通过最佳迁移折叠图层 2406.08933v3 -
532 07-15 Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently Tree-Structured Parzen Estimator kann Black-Box Kombinatorische Optimierung effizienter lösen 树结构化 Parzen 模拟器能够更有效地解决黑色Box组合优化 2507.08053v2 -
533 07-15 SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation SketchDNN: Joint Continuous-Discrete Diffusion für CAD Sketch Generation SketchDNN:用于CAD草图生成的联合连续-离散扩散 2507.11579v1 -
534 07-15 LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection LogTinyLLM: Kleine, große Sprachmodelle auf Basis von Kontext-Loganomalie-Erkennung LogTinyLLM:基于上下文原对地探测的小型大语言模型 2507.11071v1 -
535 07-15 A Distance Metric for Mixed Integer Programming Instances Ein Abstandsmetrik für gemischte Integer-Programmierungsinstanzen 混合整数方案拟订实例远程计量 2507.11063v1 -
536 07-15 Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction Comply: Lernen von Sätzen mit komplexen Gewichten inspiriert von Fruit Fly Olfaction 遵守:受果蝇运动启发的具有复杂重力的学习判决 2502.01706v3 -
537 07-15 Generalising Battery Control in Net-Zero Buildings via Personalised Federated RL Verallgemeinerung der Batteriesteuerung in Net-Zero-Gebäuden durch personalisierte Federated RL 通过个性化联式RL对净零楼的通用电池控制 2412.20946v2 -
538 07-15 Solar Flare Prediction Using Long Short-term Memory (LSTM) and Decomposition-LSTM with Sliding Window Pattern Recognition Solarflare-Vorhersage mit Langzeit-Kurzzeitspeicher (LSTM) und Zersetzung-LSTM mit Schiebefenstermustererkennung 使用长期短期内存(LSTM)和分解(SLSTM)的太阳光线预测和用滑式窗口模式识别的分解(SLTM) 2507.05313v2 -
539 07-15 GATE: Graph Attention Neural Networks with Real-Time Edge Construction for Robust Indoor Localization using Mobile Embedded Devices GATE: Grafik-Achtung Neurale Netzwerke mit Echtzeit-Edge-Konstruktion für robuste Indoor-Lokalisierung mit mobilen Embedded-Geräten GATE:利用移动嵌入装置实时边缘建设硬式室内本地化的图形关注神经网络 2507.11053v1 -
540 07-15 The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products Der Preis der Freiheit: Exploring Expressivity und Runtime Tradeoffs in gleichwertigen Tensor-Produkten 《自由的代价:探讨平等出租产品中的表达性和时间取舍》 2506.13523v2 -
541 07-15 ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification ReVISE: Verfeinern lernen zur Testzeit durch Intrinsische Selbstverifizierung REVISE:通过内在自我核查学习在试验时进行精炼 2502.14565v2 -
542 07-15 Learning from Label Proportions and Covariate-shifted Instances Lernen von Etikettenproportionen und Kovariate-verschiebten Instanzen 从标签比例和共同变换情况中学习 2411.12334v2 -
543 07-15 Relative Entropy Pathwise Policy Optimization Relative Entropie pfadweise politische Optimierung 相对 Entrop 路径式政策优化 2507.11019v1 -
544 07-15 Structured Preconditioners in Adaptive Optimization: A Unified Analysis Strukturierte Vorkonditionierer in adaptiver Optimierung: Eine einheitliche Analyse 适应性优化的结构性先决条件:统一分析 2503.10537v2 -
545 07-15 First-Order Error Matters: Accurate Compensation for Quantized Large Language Models First-Order Error Matters: Genaue Kompensation für Quantisierte große Sprachmodelle 第一顺序误差事项:量化大语言模型的准确补偿 2507.11017v1 -
546 07-15 Leveraging Advanced Machine Learning to Predict Turbulence Dynamics from Temperature Observations at an Experimental Prescribed Fire Nutzung von fortgeschrittenem maschinellem Lernen zur Vorhersage von Turbulenzdynamiken aus Temperaturbeobachtungen bei einem experimentellen vorgeschriebenen Feuer 利用先进机器学习利用实验定火条件下温度观测产生的预测扰动动力学 2507.11012v1 -
547 07-15 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v2 -
548 07-15 On the Similarities of Embeddings in Contrastive Learning Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen 关于差异学习中的嵌入相似性 2506.09781v2 -
549 07-15 AdaMuon: Adaptive Muon Optimizer AdaMuon: Adaptiver Muon-Optimierer AdaMuon:适应性 Muon 最佳优化剂 2507.11005v1 -
550 07-15 Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data Erzeugung nicht wahrnehmbarer On-Manifold-Adversarial-Angriffe für tabellarische Daten 为表格数据构造难以察觉的流形上对抗攻击 2507.10998v1 -
551 07-15 Patch-wise Structural Loss for Time Series Forecasting Patch-weise strukturelle Verluste für die Zeitreihenvorhersage 时间序列预测的补补结构损失 2503.00877v2 -
552 07-15 Misalignment from Treating Means as Ends Fehlausrichtung aus der Behandlung von Mitteln als Enden 与 “ 最终 “ 处理手段的不协调 2507.10995v1 -
553 07-15 Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective Erforschung und Verbesserung der Initialisierung für tiefe Graphen-Neural-Netzwerke: Eine Signalverbreitungsperspektive 探索和改进深图神经网络的初始化:信号传动视角 2506.16790v2 -
554 07-15 Fully Data-driven but Interpretable Human Behavioural Modelling with Differentiable Discrete Choice Model Vollständig datengesteuerte, aber interpretierbare menschliche Verhaltensmodellierung mit differenzierbarem diskretes Wahlmodell 完全由数据驱动但可解释的人类行为模型与差异分辨选择模型 2412.19403v3 -
555 07-15 Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback Online-Intrinsische Belohnungen für Entscheidungsträger aus großen Sprachmodellen Feedback 来自大语言模式反馈的决策者在线内部奖励 2410.23022v3 -
556 07-15 BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection BMDetect: Ein multimodales Deep Learning Framework für eine umfassende biomedizinische Fehlverhaltenserkennung BMDetect:综合生物医学不当行为检测的多式深层学习框架 2505.05763v2 -
557 07-15 High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization 通过适应政策同步化进行适应性政策同步化 2507.10990v1 -
558 07-15 Trajectory Imputation in Multi-Agent Sports with Derivative-Accumulating Self-Ensemble Trajektorien-Imputation im Multi-Agenten-Sport mit demivativ-akkumulierendem Selbst-Ensemble 多机构体育中具有衍生-累积自我集合功能的多机构体育 2408.10878v4 -
559 07-15 StellarF: A Lora-Adapter Integrated Large Model Framework for Stellar Flare Forecasting with Historical & Statistical Data StellarF: Ein Lora-Adapter integriertes Large Model Framework für Stellar Flare-Prognose mit historischen und statistischen Daten StellarF: 利用历史和统计数据预测Stellar 火焰的Lora-Adapter综合大型模型框架 2507.10986v1 -
560 07-15 Physics-Informed Neural Networks For Semiconductor Film Deposition: A Review Physik-informierte Neuronale Netzwerke für Halbleiterfilmabscheidung: Ein Rückblick 半导体电影沉积的物理内建神经网络:回顾 2507.10983v1 -
561 07-15 Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators Distributionsfreies Unsichersein-Bewusstsein Virtuelles Sensing über konformisierte Neuraloperatoren 通过正规神经操作员进行分布式无不流通的不确定性-软件虚拟遥感 2507.11574v1 -
562 07-15 Unified ODE Analysis of Smooth Q-Learning Algorithms Einheitliche ODE-Analyse von glatten Q-Learning-Algorithmen 对平滑的Q-学习算法进行UI ODE分析 2404.14442v4 -
563 07-15 Seeding neural network quantum states with tensor network states Neurale Netzwerk-Quantenzustände mit Tensor-Netzwerkzuständen aussäen 用张量网络态为神经网络量子态播种 2506.23550v2 -
564 07-15 Is Training Data Quality or Quantity More Impactful to Small Language Model Performance? Ist Training Daten Qualität oder Quantität Impactful to Small Language Model Performance? 培训数据质量或数量是否对小型语言模范业绩更有影响? 2411.15821v4 -
565 07-15 GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering GOLFS: Feature-Auswahl durch Kombination sowohl globaler als auch lokaler Informationen für hochdimensionales Clustering GOLFS:通过将全球和地方信息相结合,为高维度集束组合组合选择特选 2507.10956v1 -
566 07-15 Diffusion Decoding for Peptide De Novo Sequencing Diffusionsdekodierung für Peptid De Novo Sequenzierung 用于新先令Peptide的分解 2507.10955v1 -
567 07-15 Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach Enthüllen von Unterschieden in generativen Modellen: Ein skalierbarer Differential-Clustering-Ansatz 创创型模型中无法消除的差别:可缩放差异群集办法 2405.02700v3 -
568 07-15 Rethinking the Foundations for Continual Reinforcement Learning Umdenken über die Grundlagen des kontinuierlichen Ausbaus des Lernens 重新思考不断加强学习的基础 2504.08161v3 -
569 07-15 SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection SimAD: Ein einfacher, auf Dissimilarität basierender Ansatz zur Erkennung von Zeitreihenanomalien SimAD: 一种基于时间序列异常探测的简单差异法 2405.11238v2 -
570 07-15 Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models Auf dem Weg zu einem praktischen Benchmarking von Datenreinigungstechniken: Authentische Fehler über große Sprachmodelle generieren 制定数据清理技术实用基准:通过大语言模式产生真实错误 2507.10934v1 -
571 07-15 Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout Effizientes Federated Learning mit heterogenen Daten und adaptivem Dropout 采用异种数据和适应性辍学的高效联邦学习 2507.10430v2 -
572 07-15 Compositional Flows for 3D Molecule and Synthesis Pathway Co-design Kompositionsflüsse für 3D-Molekül und Synthese Pathway Co-Design 三维分子和综合途径共同设计的组成流程 2504.08051v2 -
573 07-15 Representation Bending for Large Language Model Safety Darstellungsbiegen für große Sprachmodellsicherheit 大语文示范语文安全示范语文代表名单 2504.01550v3 -
574 07-15 A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge Ein Lernrahmen zur kooperativen Kollision Vermeidung von UAV-Schwärmen Nutzung von Domain-Wissen 合作协作避免无人驾驶航空飞行器冲冲冲器利用域域知识学习框架 2507.10913v1 -
575 07-15 View Invariant Learning for Vision-Language Navigation in Continuous Environments Invariantes Lernen für Vision-Language-Navigation in kontinuierlichen Umgebungen anzeigen 查看持续环境中愿景-语言导航变量学习 2507.08831v2 -
576 07-15 Class-Proportional Coreset Selection for Difficulty-Separable Data Klasse-Proportionale Coreset-Auswahl für schwer trennbare Daten 难分离数据的类类( Palportal) 核心集选择 2507.10904v1 -
577 07-15 LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning LiLM-RDB-SFC: Leichtes Sprachmodell mit relationaler Datenbank-geführter DRL für optimierte SFC-Provisionierung LILM-RDB-SFC:为优化SFC供应而与关系数据库-指导DRL 优化SFC供应的轻量语言模型 2507.10903v1 -
578 07-15 Constrained Online Convex Optimization with Polyak Feasibility Steps Beschränkte Online Convex-Optimierung mit Polyak-Feasibility-Schritten 以聚氨酯可行性步骤实现优化 2502.13112v2 -
579 07-15 Commuting Distance Regularization for Timescale-Dependent Label Inconsistency in EEG Emotion Recognition Pendeldistanz-Regularisierung für zeitabhängige Label-Inkonsistenz bei der EEG-Emotionserkennung EEG情感识别中时间尺度依赖性标签不一致的远程常规化迁移 2507.10895v1 -
580 07-15 SurgeryLSTM: A Time-Aware Neural Model for Accurate and Explainable Length of Stay Prediction After Spine Surgery SurgeryLSTM: Ein zeitbewusstes Neuralmodell für genaue und erklärbare Dauer der Vorhersage nach Spine Surgery 手术LSTM: 脊柱外科后准确和可解释的停留时间预测时间长度的时器神经模型 2507.11570v1 -
581 07-15 Modernizing CNN-based Weather Forecast Model towards Higher Computational Efficiency Modernisierung des CNN-basierten Wettervorhersagemodells hin zu höherer rechnerischer Effizienz 使基于CNN的天气预报模型现代化,实现更高的计算效率 2507.10893v1 -
582 07-15 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning ZebraLogic: Auf den Skalierungsgrenzen von LLMs für logische Vernunft ZebraLogic:关于逻辑理由解释的LLMs限制限度 2502.01100v2 -
583 07-15 Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation Robuste semi-überwachte CT-Radiomics für Lungenkrebs-Prognose: Kosteneffizientes Lernen mit limitierten Etiketten und SHAP-Interpretation 强力半半强化CT 肺癌预测的放射感应器:利用有限标签和SHAP解释进行成本-成本-效益高的学习 2507.08189v2 -
584 07-15 Outbound Modeling for Inventory Management Outbound-Modellierung für die Bestandsverwaltung 库存管理外部示范 2507.10890v1 -
585 07-15 SA-GDA: Spectral Augmentation for Graph Domain Adaptation SA-GDA: Spektrale Augmentation für Graph Domain Adaption SA-GDA:图域适应的光谱增强 2408.09189v2 -
586 07-15 How to Protect Models against Adversarial Unlearning? Wie kann man Modelle gegen das Unlernen von Widersachern schützen? 如何保护模型防止反向学习不学习? 2507.10886v1 -
587 07-15 Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model Von unvollkommenen Daten lernen: Robuste Schlussfolgerung dynamischer Systeme mit simulationsbasiertem Generativem Modell 从不完美数据中学习:使用模拟生成模型对动态系统进行有力的推论 2507.10884v1 -
588 07-15 Domain-Adaptive Small Language Models for Structured Tax Code Prediction Domain-adaptive kleine Sprachmodelle für die strukturierte Steuercode-Vorhersage 用于结构化税码预测的领域自适应小型语言模型 2507.10880v1 -
589 07-15 BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes BioScore: Eine grundlegende Scoring-Funktion für vielfältige biomolekulare Komplexe 生物核心:多样性生物分子复合体的基础测量功能 2507.10877v1 -
590 07-15 The Odyssey of the Fittest: Can Agents Survive and Still Be Good? Die Odyssee der Fittest: Können Agenten überleben und immer noch gut sein? 《适龄者的奥德赛:代理能生存和保持良好吗? 2502.05442v3 -
591 07-15 GALDS: A Graph-Autoencoder-based Latent Dynamics Surrogate model to predict neurite material transport GALDS: Ein auf Graph-Autoencoder basierendes Latent Dynamics Surrogate-Modell zur Vorhersage des Neurit-Materialtransports GALDS:一个基于图形自动电解码器的冷流动态探测模型,用于预测中程材料的迁移 2507.10871v1 -
592 07-14 (1) PhysiX: A Foundation Model for Physics Simulations PhysiX: Ein Grundlagenmodell für Physiksimulationen PhysiX:物理模拟基础模型 2506.17774v2 -
593 07-14 Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning Trainingsdynamik hinter den Skalierungsgesetzen von Sprachmodellen: Verlustverlangsamung und Nullsummen-Lernen 语言模型缩放定律背后的训练动态:损失减速与零和学习 2506.05447v2 -
594 07-14 Visually grounded emotion regulation via diffusion models and user-driven reappraisal Optisch geerdete Emotionsregulation über Diffusionsmodelle und benutzergesteuerte Neubewertung 通过传播模型和用户驱动的重新评价,以视觉为基础的情感调控 2507.10861v1 -
595 07-14 Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding RoPE neu denken: Ein mathematischer Blueprint für N-dimensionales Positional Embedding 重新思考ROPE: N维定位嵌入的数学蓝图 2504.06308v2 -
596 07-14 PhreshPhish: A Real-World, High-Quality, Large-Scale Phishing Website Dataset and Benchmark PhreshPhish: Ein echter, hochwertiger, großformatiger Phishing-Website-Datensatz und Benchmark PhreshPhish:一个现实世界、高质量、大规模搜索网站数据集和基准 2507.10854v1 -
597 07-14 Prediction via Shapley Value Regression Vorhersage durch Shapley-Wert-Regression 通过Shapley值回归进行预测 2505.04775v2 -
598 07-14 FairTargetSim: An Interactive Simulator for Understanding and Explaining the Fairness Effects of Target Variable Definition FairTargetSim: Ein interaktiver Simulator zum Verstehen und Erklären der Fairness-Effekte von Target Variable Definition FairtargetSim: 理解和解释目标变量定义的公平影响的交互式模拟器 2403.06031v2 -
599 07-14 HEIMDALL: a grapH-based sEIsMic Detector And Locator for microseismicity HEIMDALL: ein grapH-basierter SEIsMic-Detektor und Ortung für Mikroseismizität HEMDALL: 一种基于 grapH 的微型地震探测器和定位器 2507.10850v1 -
600 07-14 Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization Winsor-CAM: Human-Tunable Visuelle Erklärungen aus Deep Networks über Layer-Wise Winsorization Winsor-CAM:通过图层-Wise Winsorization从深网络获得的人类可视解释 2507.10846v1 -
601 07-14 Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps Offline-Verstärkungslernen mit Wasserstein-Regularisierung über optimale Transportabbildungen 基于最优传输映射的瓦瑟斯坦正则化离线强化学习 2507.10843v1 -
602 07-14 EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices EntroLLM: Entropie-kodierte Gewichtskompression für effiziente großsprachliche Modellableitung auf Edge-Geräten EntroLLM: Entropy Encod Weight 压缩,以高效大语言模型推导边缘设备 2505.02380v3 -
603 07-14 Geometric Learning Dynamics Geometrische Lerndynamik 几何学习动态 2504.14728v2 -
604 07-14 Entity-Specific Cyber Risk Assessment using InsurTech Empowered Risk Factors Cyber-Risikobewertung von Unternehmen mit InsurTech Empowered Risk Factors 利用科学、技术、赋权风险因素进行具体实体具体网络风险评估 2507.08193v2 -
605 07-14 Functional Neural Wavefunction Optimization Funktionelle Neuralwellenfunktionsoptimierung 功能神经波函数优化 2507.10835v1 -
606 07-14 From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems Von klein zu groß: Ein Graph Convolutional Network Ansatz zur Lösung von Sortieroptimierungsproblemen 从小到大:解决各类优化问题图集网络方法 2507.10834v1 -
607 07-14 FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE FLAME: Auf dem Weg zu Federated Fine-Tuning großen Sprachmodellen durch adaptive SMoE FLAME:通过适应性SMOE,走向联邦微调大语言模式 2506.16600v2 -
608 07-14 Semantic Context for Tool Orchestration Semantischer Kontext für Werkzeug-Orchestrierung 工具管弦化的语义背景 2507.10820v1 -
609 07-14 Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood Prognose von intermittierenden Zeitreihen mit Gaußschen Prozessen und Tweedie-Wahrscheinlichkeit 利用高斯进程和特韦迪可能性预测间歇时间序列 2502.19086v4 -
610 07-14 Quantum Transfer Learning to Boost Dementia Detection Quantentransfer Lernen, um Demenzerkennung zu fördern 学习加速痴呆症检测的量子传输 2507.12485v1 -
611 07-14 Uncovering Causal Relation Shifts in Event Sequences under Out-of-Domain Interventions Entdecken von Kausalrelation-Shifts in Ereignissequenzen unter Out-of-Domain-Interventionen 场外干预下事件序列中未覆盖的因果关系变化变化 2507.10809v1 -
612 07-14 Multi-Armed Sampling Problem and the End of Exploration Multi-Armed Sampling Problem und das Ende der Exploration 多军备抽样问题和探索的结束 2507.10797v1 -
613 07-14 Multilayer Artificial Benchmark for Community Detection (mABCD) Multilayer Artificial Benchmark für Gemeinschaftserkennung (mABCD) 社区探测多人基准(MIABCD) 2507.10795v1 -
614 07-14 A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments Ein generalisierbares physik-verbessertes Zustands-Raummodell für die Langzeit-Dynamik-Prognose in komplexen Umgebungen 综合环境中长期动态预测空间模型 2507.10792v1 -
615 07-14 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung 流动对等: 轨迹- 重新分类、 转换- 无流动图像编辑 2505.23145v3 -
616 07-14 Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions Score-of-Mixture Training: Training Ein-Schritt-Generative Modelle einfach gemacht via Score-Abschätzung von Mixture-Distributionen 混合计分培训:通过对混合分发品进行记分估计而简单化的单级生成模型培训 2502.09609v3 -
617 07-14 XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v4 -
618 07-14 Transfer Learning Analysis of Variational Quantum Circuits Transfer Learning Analyse von Variationalen Quantenkreisen 变化量子电路变化性量子电路的转移学习分析 2501.01507v3 -
619 07-14 Accounting for multiplicity in machine learning benchmark performance Bilanzierung der Vielfältigkeit in der Benchmark-Leistung bei maschinellem Lernen 机器学习基准业绩多重性核算 2303.07272v6 -
620 07-14 Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift Einschließliche Interventionale Unabhängigkeit verbessert Robustheit gegen Interventionale Verteilungsverschiebung 纳入干预性独立 增强抵御干预性分配转变的力度 2507.05412v2 -
621 07-14 Spatial Reasoners for Continuous Variables in Any Domain Räumliche Reasoner für kontinuierliche Variablen in jeder Domäne 任何域中连续变量的空间理由 2507.10768v1 -
622 07-14 IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models IoT Malware Netzwerk Traffic Detection mit Deep Learning und GraphSAGE-Modellen 利用深深学习和图形分析模型来探测 Iot 恶意网络流量 2507.10758v1 -
623 07-14 A Benchmarking Framework for AI models in Automotive Aerodynamics Ein Benchmarking-Rahmen für KI-Modelle in der Automobilaerodynamik 汽车空气动力学AI模型基准框架 2507.10747v1 -
624 07-14 Language Models for Adult Service Website Text Analysis Sprachmodelle für Erwachsene Service Website Textanalyse 成人服务语言模式网站文本分析 2507.10743v1 -
625 07-14 Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language Ground-Compose-Reinforce: Verstärktes Lernen durch formale Sprache 地面综合部队:通过正式语文指定加强学习代理 2507.10741v1 -
626 07-14 The Trust Calibration Maturity Model for Characterizing and Communicating Trustworthiness of AI Systems Das Modell der Treuhandkalibrierungsreife zur Charakterisierung und Kommunikation der Vertrauenswürdigkeit von KI-Systemen AI系统确定和传播信任度的信托校准期限模型 2503.15511v2 -
627 07-14 Extracting Document Relations from Search Corpus by Marginalizing over User Queries Extrahieren von Dokumentenbeziehungen aus dem Suchkorpus durch Marginalisierung über Benutzerfragen 将文件关系从搜索 Corpus 中提取, 将其边缘化于用户查询 2507.10726v1 -
628 07-14 Group-wise oracle-efficient algorithms for online multi-group learning Gruppenweise Orakel-effiziente Algorithmen für Online-Multigruppen-Lernen 用于在线多小组学习的群集法或手法效率算法 2406.05287v2 -
629 07-14 State-Constrained Offline Reinforcement Learning Zustandsbeschränktes Offline-Verstärkungslernen 状态约束的离线强化学习 2405.14374v2 -
630 07-14 Distributionally Robust Optimization with Adversarial Data Contamination Verteilungsstarke Optimierung mit Adversarial Data Contamination 使用反对数据污染优化分布强力优化 2507.10718v1 -
631 07-14 Real-time, Adaptive Radiological Anomaly Detection and Isotope Identification Using Non-negative Matrix Factorization Echtzeit-, adaptive radiologische Anomalienerkennung und Isotopenidentifizierung mittels nicht-negativer Matrixfaktorisierung 利用非负矩阵化系数进行实时适应性辐射异常探测和同位素识别 2507.10715v1 -
632 07-14 A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models Eine einfache ungefähre Bayesian Inferenz Neural Surrogate für stochastische Petri Net Modelle 用于Stochastic Petrii 网模型的简单近近贝耶斯导引神经基体巡天模型 2507.10714v1 -
633 07-14 Imitation Learning from a Single Temporally Misaligned Video Imitation Lernen von einem einzigen temporär fehlgeleiteten Video 从单一临时错配视频中学习 2502.05397v2 -
634 07-14 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving DroidSpeak: KV Cache Sharing für Cross-LLM Kommunikation und Multi-LLM Serving DroidSpeak: KV 共享缓存, 用于跨 LLM 通信和多 LLM 服务 2411.02820v4 -
635 07-14 Robust Multi-Manifold Clustering via Simplex Paths Robustes Multi-Manifold-Clustering über Simplex-Pfade 通过 Simlipx 路径进行强力多功能集成集成 2507.10710v1 -
636 07-14 Kernel Learning for Mean-Variance Trading Strategies Kernel-Lernen für Mittlere Varianz-Trading-Strategien 平均变化贸易战略核心学习 2507.10701v1 -
637 07-14 Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v4 -
638 07-14 Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder Selbstüberwachtes Lernen auf der Kamera Trap Footage führt zu einem starken universellen Gesicht Embedder 由自我监督的关于摄影机陷阱脚脚的自监学习 强烈的通用面部嵌入器 2507.10552v1 -
639 07-14 Quantize-then-Rectify: Efficient VQ-VAE Training Quantize-then-Rectify: Effiziente VQ-VAE-Schulung 量化-随后确定:有效的VQ-VAE培训 2507.10547v1 -
640 07-14 Disentangling Neural Disjunctive Normal Form Models Entwirren neural disjunktiver Normalformmodelle 分离神经分相正常格式模型 2507.10546v1 -
641 07-14 Fusing LLM Capabilities with Routing Data Verschmelzung von LLM-Fähigkeiten mit Routing-Daten 利用路由数据融合LLM能力 2507.10540v1 -
642 07-14 Graph World Model Graph-Weltmodell 图世界模型 2507.10539v1 -
643 07-14 On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance Zur Performance der differenzierten privaten Optimierung mit Heavy-Tail-Klasse-Unwucht 以重赛级的不平衡进行有区别的私人优化 2507.10536v1 -
644 07-14 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Begründung oder Erinnerung? Unzuverlässige Ergebnisse des Verstärkungslernens aufgrund von Datenkontamination 理由或记忆化?由于数据污染而加强学习的不可靠结果 2507.10532v1 -
645 07-14 Expert-level validation of AI-generated medical text with scalable language models Validierung von KI-generierten medizinischen Texten auf Expertenebene mit skalierbaren Sprachmodellen 专家一级对AI产生的带有可缩放语言模型的可缩放语言模型的医学文本进行鉴定 2507.03152v2 -
646 07-14 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Mixture-of-Recursions: Dynamische Rekursive Tiefen für adaptive Token-Level-Computation lernen 混合流流流:学习适应调控级计算法的动态回流深度 2507.10524v1 -
647 07-14 Ark: An Open-source Python-based Framework for Robot Learning Ark: Ein Open-Source-Python-basiertes Framework für das Roboterlernen Ark:一个基于开放源码的机器人学习 Python 框架 2506.21628v2 -
648 07-14 Visual Test-time Scaling for GUI Agent Grounding Visual Test-Time Scaling für GUI Agent Grounding GUI 代理定位的视觉测试时间缩放 2505.00684v2 -
649 07-14 A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation Eine einheitliche Sicht auf das Lernen unnormalisierter Verteilungen durch Noise-Contrastive Estimation 关于通过噪声对比估计学习非归一化分布的统一观点 2409.18209v2 -
650 07-14 Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing Verbesserte Offline-Kontext-Banditen mit Second-Order-Bounds: Wetten und Einfrieren 改善有二阶边界的离线环境强盗:打赌和冻结 2502.10826v2 -
651 07-14 National level satellite-based crop field inventories in smallholder landscapes Nationale satellitengestützte Feldbestände in Kleinbauernlandschaften 国家一级基于卫星的小农地貌景观作物实地清单 2507.10499v1 -
652 07-14 Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing Split passiert: Mit Split Learning und Function Secret Sharing gegen fortgeschrittene Bedrohungen 分化事件:通过分化学习和职能秘密分享来对抗先进威胁 2507.10494v1 -
653 07-14 On the Robustness Tradeoff in Fine-Tuning Über die Robustheit im Feintuning 关于强健的决断 2503.14836v2 -
654 07-14 BenchReAD: A systematic benchmark for retinal anomaly detection BenchReAD: Ein systematischer Benchmark für Netzhautanomaliendetektion BenchReAD: 视视网膜异常现象探测系统基准 2507.10492v1 -
655 07-14 Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset Ermöglichen von Advanced Land Cover Analytics: Eine integrierte Datenextraktionspipeline für vorausschauende Modellierung mit dem Dynamic World Dataset 扶持性先进土地覆盖分析分析:利用动态世界数据集进行预测模拟的综合数据提取管道 2410.09135v2 -
656 07-14 Overcoming catastrophic forgetting in neural networks Überwindung des katastrophalen Vergessens in neuronalen Netzwerken 克服神经网络中的灾难性遗忘 2507.10485v1 -
657 07-14 Random Erasing vs. Model Inversion: A Promising Defense or a False Hope? Zufällige Auslöschung gegen Modellumkehr: Eine vielversprechende Verteidigung oder eine falsche Hoffnung? 随机反射与模型反射:有希望的防御还是虚幻的希望? 2409.01062v2 -
658 07-14 From BERT to Qwen: Hate Detection across architectures Von BERT bis Qwen: Hasserkennung über Architekturen hinweg 从BERT到Quw:跨结构的仇恨检测 2507.10468v1 -
659 07-14 An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation Eine interoperable Machine Learning Pipeline für die Abschätzung des Kinderleibsrisikos 用于小儿产科风险估计的可互操作的机器学习管道 2412.10454v2 -
660 07-14 Discrimination-free Insurance Pricing with Privatized Sensitive Attributes Diskriminierungsfreie Versicherungspreise mit privatisierten sensiblen Attributen 与私有化的敏感敏感属性挂钩的无歧视无歧视保险 2504.11775v2 -
661 07-14 RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening RAPNet: Ein rezeptives, adaptives, konvolutionäres Neuralnetzwerk für Pansharpening RAPNet: 泛码头受体-战地适应性革命神经网络 2507.10461v1 -
662 07-14 Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v2 -
663 07-14 TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models TaylorPODA: Eine Taylor Expansion-basierte Methode zur Verbesserung der Post-Hoc-Attributionen für Opaque-Modelle 泰勒·泰勒:以扩大泰勒为基础的方法,改进不透明模式的后住房分配办法 2507.10643v1 -
664 07-14 First-of-its-kind AI model for bioacoustic detection using a lightweight associative memory Hopfield neural network First-of-its-Art-KI-Modell für die bioakustische Erkennung mit einem leichten assoziativen Speicher Hopfield neuronalen Netzwerk 使用轻量级联合内存Hopfield神经网络进行生物声学探测的首类AI型AI型生物声学探测模型 2507.10642v1 -
665 07-14 Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems Logic Layer Prompt Control Injection (LPCI): Eine neuartige Sicherheitslückenklasse in Agentensystemen 逻辑层快速控制喷射(LPCI): 剂系统中的新安全脆弱程度类别 2507.10457v1 -
666 07-14 Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction Rollen Sie die Würfel & Blick, bevor Sie springen: Gehen über die kreativen Grenzen der Next-Token-Vorhersage 跳跃前的骰子滚动和看一看:超越了次声预测的创造性极限 2504.15266v3 -
667 07-14 Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data Nicht austauschbare konforme Vorhersagen mit optimalem Verkehr: Umschaltung von Verteilungsverschiebungen mit unmarkierten Daten 采用最佳运输方式的非正规非正式预测:用无标签数据处理分配变化 2507.10425v1 -
668 07-14 SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning SentiDrop: Ein Multi Modal Machine Learning Modell zur Vorhersage von Ausfällen im Fernunterricht SentiDROp:用于预测远程学习辍学的多模式机械学习模式 2507.10421v1 -
669 07-14 Multiple Choice Learning of Low Rank Adapters for Language Modeling Multiple Choice-Lernen von Low-Rank-Adaptern für die Sprachmodellierung 低级别语言建模适应者多选择学习 2507.10419v1 -
670 07-14 Anticipating the Selectivity of Cyclization Reaction Pathways with Neural Network Potentials Die Selektivität von Zyklisierungsreaktionspfaden mit neuralen Netzwerkpotentialen antizipieren 预测具有神经网络潜力的循环反应路径的选择性 2507.10400v1 -
671 07-14 SEAL: Towards Safe Autonomous Driving via Skill-Enabled Adversary Learning for Closed-Loop Scenario Generation SEAL: Auf dem Weg zu einem sicheren autonomen Fahren durch qualifikationsfähiges, gewinnbringendes Lernen für die Closed-Loop-Szenario-Erzeugung SEAL:通过技能-有技能的对抗性学习实现安全自主驾驶,促进闭路电视假想一代人的安全自主驾驶 2409.10320v3 -
672 07-14 Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems Bypassing LLM Guardrails: Eine empirische Analyse von Evasionsangriffen gegen prompte Injektions- und Jailbreak-Detektionssysteme 绕过LLM 护卫车:对攻击即时射入和越狱侦察系统的逃难攻击经验分析 2504.11168v3 -
673 07-14 Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors Deep Learning Beschleunigte Quantentransportsimulationen in der Nanoelektronik: Von Break Junctions zu Field-Effect Transistors 纳米电子中的深度学习加速量子传输模拟:从断裂交叉点到实地影响晶体体体 2411.08800v3 -
674 07-14 Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model Extrahieren wichtiger Token in E-Commerce Abfragen mit einem Tag Interaction-Aware Transformer Modell 使用标签互动软件变换模型在电子商务查询中提取重要调量 2507.10385v1 -
675 07-14 Dynamical stability for dense patterns in discrete attractor neural networks Dynamische Stabilität für dichte Muster in diskreten neuronalen Attraktorennetzen 离散吸引性神经网络中密度型态动态稳定的动态稳定 2507.10383v1 -
676 07-14 Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis Nutzung von RAG-LLMs für Simulation und Analyse der urbanen Mobilität 为城市流动模拟和分析利用RAG-LLMs进行城市流动模拟和分析 2507.10382v1 -
677 07-14 Improving Remote Sensing Classification using Topological Data Analysis and Convolutional Neural Networks Verbesserung der Klassifikation der Fernerkundung mittels topologischer Datenanalyse und konvolutionärer neuraler Netzwerke 利用地形数据分析和进化神经网络改进遥感分类 2507.10381v1 -
678 07-14 EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration EVOLvE: Bewertung und Optimierung von LLMs für In-Context Exploration EVOLvE: 评估和优化用于内衣探索的LMs LMs 2410.06238v2 -
679 07-14 Test-Time Canonicalization by Foundation Models for Robust Perception Test-Time Canonicalization durch Foundation Models für robuste Wahrnehmung 强力感知基础模型的试验时罐化 2507.10375v1 -
680 07-14 Enhanced DeepONet for 1-D consolidation operator learning: an architectural investigation Verbesserte DeepONet für 1-D-Konsolidierungsoperator Lernen: eine architektonische Untersuchung 1D整合操作员学习的强化深水卫星:建筑调查 2507.10368v1 -
681 07-14 HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong HKGAI-V1: Auf dem Weg zu einem regionalen Souveränen Großsprachenmodell für Hongkong HKGAI-V1:为香港建立区域主权大语言模式 2507.11502v1 -
682 07-14 SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications SENSOR: Ein ML-erweitertes Online-Annotations-Tool, um Datenschutz-Bedenken aus User Reviews in Social-Media-Anwendungen zu enthüllen SENSOR:一个ML-加强在线说明工具,以从社会-媒体应用中的用户审查中发现隐私问题。 2507.10640v1 -
683 07-14 Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation Durchschnittlicher Kalibrierungsfehler: Ein differenzierbarer Verlust für verbesserte Zuverlässigkeit in der Bildsegmentierung 平均校准误差:图像分割法可靠性提高的可区别损失 2403.06759v4 -
684 07-14 TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting TAT: Temporal ausgerichteter Transformer für Multi-Horizon-Peak-Nachfrageprognosen TAT: 多霍里宗峰需求预测的时向调整变换器 2507.10349v1 -
685 07-14 Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning Feature Destillation ist die bessere Wahl für modell-heterogenes Federated Learning 精化是示范-异种联邦学习的更好选择。 2507.10348v1 -
686 07-14 Parallel Sampling of Diffusion Models on $SO(3)$ Paralleles Sampling von Diffusionsmodellen auf $SO(3)$ $SO(3)$ 上扩散模型的并行采样 2507.10347v1 -
687 07-14 Faster Reinforcement Learning by Freezing Slow States Schnelleres Verstärkungslernen durch Einfrieren langsamer Zustände 通过冻结慢状态加速强化学习 2301.00922v3 -
688 07-14 Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions Einige Super-Annäherungsraten der ReLU-Neuralnetze für Korobov-Funktionen Korobov 函数的 ReLU 神经网络的某些超接近率 2507.10345v1 -
689 07-14 MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data MoCap-Impute: Umfassender Benchmark und vergleichende Analyse von Imputationsmethoden für IMU-basierte Motion Capture Daten MoCap-Capute:以IMU为基础的运动捕获数据估算方法综合基准和比较分析 2507.10334v1 -
690 07-14 Constructing Extreme Heatwave Storylines with Differentiable Climate Models Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren 以差异气候模型构建极端热浪线 2506.10660v2 -
691 07-14 Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach Überbrückung von Robustheit und Verallgemeinerung gegen Wortersatzangriffe in NLP über den Ansatz der Wachstumsbound Matrix 通过 “ 增长组合矩阵方法 “ ,在NLP中架起桥梁,反对用词替代袭击的有力性和普遍性 2507.10330v1 -
692 07-14 Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints Zero-Shot Cyclic Peptid Design über kompostierbare geometrische Einschränkungen 利用可合成几何限制设计零热阴极球化物 2507.04225v2 -
693 07-14 Convergence of Agnostic Federated Averaging Konvergenz der agnostischen Föderierten Durchschnittswerte Agnostic Federal 波动的趋同 2507.10325v1 -
694 07-14 LEXam: Benchmarking Legal Reasoning on 340 Law Exams LEXam: Benchmarking der rechtlichen Begründung von 340 Rechtsprüfungen LEXam:340项法律考试的法律依据基准 2505.12864v3 -
695 07-14 Recognizing Dementia from Neuropsychological Tests with State Space Models Demenz aus neuropsychologischen Tests mit State Space Models erkennen 利用国家空间模型进行神经心理测试的痴呆症 2507.10311v1 -
696 07-14 DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning DESIGN: Verschlüsselte GNN-Inferenz über Server-Side Input Graph Pruning design:通过服务器- Side 输入图路透图加密的 GNN 推论 2507.05649v2 -
697 07-14 TKAN: Temporal Kolmogorov-Arnold Networks TKAN: Temporale Kolmogorov-Arnold-Netzwerke TKAN: 时间性科尔莫戈罗夫-阿诺尔德网络 2405.07344v4 -
698 07-14 MF-GLaM: A multifidelity stochastic emulator using generalized lambda models MF-GLaM: Ein multifidelity stochastischer Emulator mit generalisierten Lambda-Modellen MF-GLAM:使用通用羊羔模型的多纤维性随机模拟模拟器 2507.10303v1 -
699 07-14 Low Resource Reconstruction Attacks Through Benign Prompts Niedrige Ressourcen-Wiederaufbau Angriffe durch Benign Prompts 通过慈善提示进行低资源重建袭击 2507.07947v2 -
700 07-14 Average Sensitivity of Hierarchical $k$-Median Clustering Durchschnittliche Sensitivität des hierarchischen $k$-Median-Clusterings 层次化 $k$-中位数聚类的平均敏感度 2507.10296v1 -
701 07-14 Application of RESNET50 Convolution Neural Network for the Extraction of Optical Parameters in Scattering Media Anwendung von RESNET50 Convolution Neural Network zur Extraktion optischer Parameter in Streumedien RESNET50 利用革命神经网络在散散射媒体中提取光学参数 2404.16647v2 -
702 07-14 Conditional Chemical Language Models are Versatile Tools in Drug Discovery Bedingte chemische Sprachmodelle sind vielseitige Werkzeuge in der Drug Discovery 有条件的化学语言模型是药物发现中易感工具 2507.10273v1 -
703 07-14 Asymptotic regularity of a generalised stochastic Halpern scheme Asymptotische Regelmäßigkeit eines generalisierten stochastischen Halpern-Systems 普通的口切性Halpern计划无症状的常规性 2411.04845v2 -
704 07-14 DNS Tunneling: Threat Landscape and Improved Detection Solutions DNS Tunneling: Bedrohungslandschaft und verbesserte Erkennungslösungen DNS 隧道建设:威胁景观和改进探测解决方案 2507.10267v1 -
705 07-14 On the asymptotic behaviour of stochastic processes, with applications to supermartingale convergence, Dvoretzky’s approximation theorem, and stochastic quasi-Fejér monotonicity Über das asymptotische Verhalten stochastischer Prozesse, mit Anwendungen zur Supermartingale Konvergenz, Dvoretzkys Näherungssatz und stochastische Quasi-Fejér-Monotonizität 关于随机过程的无症状行为,应用到超海趋同、Dvoretzky的近似理论,以及随机准菲杰尔单音性。 2504.12922v2 -
706 07-14 A Simple Baseline for Stable and Plastic Neural Networks Eine einfache Basis für stabile und plastische Neuralnetze 稳定神经网络和可塑神经网络的简单基线 2507.10637v1 -
707 07-14 Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals Transformer können nicht-lineare und nicht-markowsche Filterprobleme in kontinuierlicher Zeit für bedingt gaussische Signale lösen 变换器可以在连续时间解答非滑动和非马尔科维的过滤问题, 以用于有条件的高斯信号 2310.19603v4 -
708 07-14 DepViT-CAD: Deployable Vision Transformer-Based Cancer Diagnosis in Histopathology DepViT-CAD: Deployable Vision Transformerbasierte Krebsdiagnose in der Histopathologie DepVVT-CAD: 在病理学中可部署的愿景变异器癌症诊断 2507.10250v1 -
709 07-14 GeoHopNet: Hopfield-Augmented Sparse Spatial Attention for Dynamic UAV Site Location Problem GeoHopNet: Hopfield-Augmented Sparse Räumliche Aufmerksamkeit für dynamische UAV-Standort-Problem GeoHopNet:动态无人驾驶飞行器现场位置问题 2507.10636v1 -
710 07-14 Kernel-Adaptive PI-ELMs for Forward and Inverse Problems in PDEs with Sharp Gradients Kernel-Adaptive PI-ELMs für vorwärts und inverse Probleme bei PDEs mit scharfen Gradienten 具有尖锐梯度的PDE中前方问题和反问题核心适应性 PI-ELMs 2507.10241v1 -
711 07-14 Visual Analytics for Explainable and Trustworthy Artificial Intelligence Visual Analytics für erklärbare und vertrauenswürdige Künstliche Intelligenz 可解释和可信赖的人工智能的视觉分析分析 2507.10240v1 -
712 07-14 Spatial Lifting for Dense Prediction Raumheben für dichte Vorhersagen 高度预测空间升空 2507.10222v1 -
713 07-14 A Graph Sufficiency Perspective for Neural Networks Eine grafische Sufficiency-Perspektive für neurale Netzwerke 图 神经网络的量化透视图 2507.10215v1 -
714 07-14 Formal Verification of Variational Quantum Circuits Formale Überprüfung von Variations-Quantenkreisen 变量量电路的正式核查 2507.10635v1 -
715 07-14 Learning to Quantize and Precode in Massive MIMO Systems for Energy Reduction: a Graph Neural Network Approach Quantisieren und Vorkodieren in massiven MIMO-Systemen zur Energiereduzierung lernen: ein Graph Neuronaler Netzwerkansatz 学习如何量化和预先编码巨量海事组织大规模减少能源系统:图表神经网络方法 2507.10634v1 -
716 07-14 History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions Geschichte Passend unter Ungewissheit geologischer Szenarien mit impliziter geologischer Realismuskontrolle mit generativem Deep Learning und Graph Convolutions 历史在地质情景与隐隐隐的地质现实控制与产生深层学习和图案革命的不确定性的不确定性下匹配的历史 2507.10201v1 -
717 07-14 Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Trinity-RFT: Ein allgemein angelegtes und einheitliches Rahmenwerk zur Verstärkung der Feinsteuerung großer Sprachmodelle 三一-RFT:加强大语言模式精美应用的一般目的和统一框架 2505.17826v2 -
718 07-14 Learning Private Representations through Entropy-based Adversarial Training Private Repräsentationen lernen durch eine auf Entropie basierende Adversarial-Schulung 通过以英文为基础的反向培训进行学习私人代表 2507.10194v1 -
719 07-14 T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs T-GRAB: Ein synthetischer Diagnose-Benchmark für das Lernen auf zeitlichen Graphen T-GRAB: 时间图学习的合成诊断基准 2507.10183v1 -
720 07-14 Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving Pimba: Eine Verarbeitungs-in-Memory-Beschleunigung für Post-Transformer-Großsprachmodell-Servieren Pimba:在外向后大语文示范服务中快速处理后大语文示范服务 2507.10178v1 -
721 07-14 Token-based Audio Inpainting via Discrete Diffusion Token-basierte Audio-Inpainting über Discrete Diffusion 以 Tokon 为基调的音频通过分解传播油漆 2507.08333v2 -
722 07-14 Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? Sollten wir jemals Entscheidungstransformator für Offline-Verstärkung Lernen bevorzugen? 我们是否应该更偏爱离线强化学习的决策变异器? 2507.10174v1 -
723 07-14 Play Style Identification Using Low-Level Representations of Play Traces in MicroRTS Wiedergabestil-Identifizierung mit Low-Level-Darstellungen von Spielspuren in MicroRTS 使用微小RTS游戏轨迹的低层次代表的游戏样式识别 2507.10172v1 -
724 07-14 Understanding the Rank of Tensor Networks via an Intuitive Example-Driven Approach Den Rang der Tensor-Netzwerke über einen intuitiven Beispiel-getriebenen Ansatz verstehen 通过直观的 “ 实例转化办法 “ 了解Tensor网络的排名 2507.10170v1 -
725 07-14 Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning Entgraben von Edelsteinen aus Steinen: Politikoptimierung mit negativer Probenvergrößerung für LLM-Reasoning 从石石中挖出金宝石:政策优化,对LLM理由的负抽样增加 2505.14403v3 -
726 07-14 Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation Domain-Grenzen gibt es mit Föderated Few-Shot-Anpassung überschritten werden 与联邦几热量适应措施交界的域域边界 2507.10160v1 -
727 07-14 Simulating Biases for Interpretable Fairness in Offline and Online Classifiers Simulation von Biasen für interpretierbare Fairness in Offline- und Online-Klassifikatoren 模拟离线和在线分类中的可解释公平比数 2507.10154v1 -
728 07-14 Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials Konzentration von Messwerten für nichtlineare Zufallsmatrizen mit Anwendungen in neuronalen Netzwerken und nicht-kommutativen Polynomen 非线性随机随机矩阵的测量浓度,该矩阵应用到神经网络和非模拟多元复合体 2507.07625v2 -
729 07-14 Deep Recurrence for Dynamical Segmentation Models Tiefe Wiederholung für dynamische Segmentierungsmodelle 动态分割模型的深度重现 2507.10143v1 -
730 07-14 Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review Anpassungsfähigkeit im Mehr-Agenten-Verstärkungs-Lernen: Ein Rahmen und eine einheitliche Überprüfung 多机构加强学习中的适应性:框架和统一审查 2507.10142v1 -
731 07-14 Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality Large-Scale Graph Building in dynamischen Umgebungen: geringe Latenz und hohe Qualität 动态环境中的大比例图建设:低长期和高质量 2507.10139v1 -
732 07-14 Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting Wavelet-Enhanced Neural ODE und Graphen-Achtung für interpretierbare Energieprognosen 用于可解释性能源预测的增强的神经数字和图示注意 2507.10132v1 -
733 07-14 Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution 使用高效球道球道配送的超球变异自动编码器 2506.21278v2 -
734 07-14 A Variance-Reduced Cubic-Regularized Newton for Policy Optimization Ein varianzreduzierter kubisch-regularisierter Newton für politische Optimierung 用于政策优化的 差异缩放立方( Cubic- Reculized 牛顿) 2507.10120v1 -
735 07-14 Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration Analyse von KI-Techniken für das Orchestrieren von Edge-Cloud-Anwendungsmigration AI: AI: 拼接边城-边城应用移民应用技术分析 2507.10119v1 -
736 07-14 DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models DiaTool-DPO: Multi-Turn Direct Preference Optimierung für Tool-Augmented Large Language Models DiaTool-DPO:多发直接首选优化工具增强型大语言模型 2504.02882v2 -
737 07-14 Riemannian Time Warping: Multiple Sequence Alignment in Curved Spaces Riemannian Time Warping: Mehrere Sequenzen richten sich in gekrümmten Räumen Riemannian 时间扭曲: 曲线空间中的多个序列对齐 2506.01635v3 -
738 07-14 A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications Eine umfassende Übersicht über die direkte Präferenzoptimierung: Datensätze, Theorien, Varianten und Anwendungen 直接优先优化综合调查:数据集、理论、变式和应用 2410.15595v3 -
739 07-14 Explaining the Impact of Training on Vision Models via Activation Clustering Erklären der Auswirkungen von Schulungen auf Vision-Modelle durch Aktivierungs-Clustering 解释培训通过启动集群化对愿景模型的影响 2411.19700v4 -
740 07-14 Structuring Radiology Reports: Challenging LLMs with Lightweight Models Structuring Radiology Reports: Herausfordernde LLMs mit Leichtbaumodellen 结构化放射学报告:用轻量级模型对LMS提出挑战 2506.00200v2 -
741 07-14 FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training FRUGAL: Speichereffiziente Optimierung durch Reduzierung des Zustands-Overheads für skalierbares Training FRUGAL:通过减少状态开销实现可扩展训练的内存高效优化 2411.07837v2 -
742 07-14 Class-Aware PillarMix: Can Mixed Sample Data Augmentation Enhance 3D Object Detection with Radar Point Clouds? Klasse-Aware-SäuleMix: Kann gemischte Probendatenvergrößerung die 3D-Objekterkennung mit Radarpunktwolken verbessern? 类警用支柱混合:混合抽样数据增强能够用雷达点云加强3D物体探测吗? 2503.02687v2 -
743 07-14 Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering Auf dem Weg zu einem hochüberwachten Lernprogramm zur Schulung von Datengenerierung: Datenkorrektur und Spaltenumstellung 数据生成:数据调节和整列重新排序 2507.10088v1 -
744 07-14 A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets 遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究 2507.10084v1 -
745 07-14 Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding Kodezi Chronos: Ein Debugging-First Language Model für Repository-Scale, Memory-Driven Code Understanding Kodezi Chronos:调试第一语言模型,用于存储库规模、记忆驱动代码理解 2507.12482v1 -
746 07-14 RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer reisender Salesman-Probleme RsGCN: 提高全球氯化萘的通用化,以解决可缩放旅行销售员问题 2506.00533v3 -
747 07-14 Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction Komprimierungsmethode für das Deep Diagonal State Space Model basierend auf $H^2$-optimaler Reduktion 基于 $H^2$ 最优降阶的深度对角状态空间模型压缩方法 2507.10078v1 -
748 07-14 Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information Qualität über Quantität: Eine effektive großräumige Datenreduktionsstrategie basierend auf pointwise V-Informationen 质量高于数量:基于点五信息的有效大型数据减少战略 2507.00038v2 -
749 07-14 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism ElasticMM: Effiziente multimodale LLMs mit elastischer multimodaler Parallelität ElasticMM: 基于弹性多模态并行的高效多模态LLM服务 2507.10069v1 -
750 07-14 A Vector-Quantized Foundation Model for Patient Behavior Monitoring Ein Vector-Quantisiertes Foundation-Modell für Patientenverhaltensüberwachung 病人行为监测矢量定量基础模型 2503.15221v2 -
751 07-14 PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization PRISM: Feinkörniges Papier-zu-Papier-Retrieval mit Multi-Aspect-Aware-Abfrageoptimierung PRISM: 配有多频谱软件查询优化的精细读纸到纸检索器 2507.10057v1 -
752 07-14 Scalable Unsupervised Segmentation via Random Fourier Feature-based Gaussian Process Skalierbare unüberwachte Segmentierung über Random Fourier Feature-based Gaussian Process 通过基于随机傅里叶特征的高斯过程进行可扩展的无监督分割 2507.10632v1 -
753 07-14 Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning Leichtes Modell für die Erkennung von Geflügelkrankheiten von Fäkalienbildern mit Multi-Color-Raum-Feature-Optimierung und maschinellem Lernen 利用多层空间地物优化和机器学习法从地表图像中检测禽类疾病轻量模型 2507.10056v1 -
754 07-14 Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts Selbstaufmerksamer Transformer für schnelle und genaue Nachbearbeitung von Temperatur- und Windgeschwindigkeitsprognosen 用于快速准确的温度和风速预报后处理的自注意力Transformer 2412.13957v2 -
755 07-14 IPAD: Inverse Prompt for AI Detection – A Robust and Explainable LLM-Generated Text Detector IPAD: Inverse Aufforderung zur KI-Erkennung – ein robuster und erklärbarer LLM-generierter Textdetektor IPAD: AI 检测反光提示 – – 强力和可解释的LLM-发光文本检测器 2502.15902v2 -
756 07-14 On the Learning with Augmented Class via Forests Über das Lernen mit Augmented Class über Wälder 通过森林进修学习 2505.09294v2 -
757 07-14 On the Efficiency of Training Robust Decision Trees Über die Effizienz des Trainings Robuste Entscheidungsbäume 提高培训效率的有力决策树 2507.10048v1 -
758 07-14 Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R Effiziente Implementierung von Vision-Language-Modellen auf mobilen Geräten: Eine Fallstudie zu OnePlus 13R 高效部署移动设备愿景-语言模型:关于OnePlus 13R的案例研究 2507.08505v2 -
759 07-14 First-ish Order Methods: Hessian-aware Scalings of Gradient Descent Erste-ish-Order-Methoden: Hessisch-bewusste Skalierungen des gradienten Abstiegs 第一至一等秩序方法:逐渐后裔的赫西安人觉醒规模 2502.03701v3 -
760 07-14 Towards Applying Large Language Models to Complement Single-Cell Foundation Models Zur Anwendung großer Sprachmodelle zur Ergänzung von Single-Cell-Stiftungsmodellen 努力应用大语言模型来补充单一行业基金会模型 2507.10039v1 -
761 07-14 Memory-Efficient Personalization of Text-to-Image Diffusion Models via Selective Optimization Strategies Speichereffiziente Personalisierung von Text-zu-Bild-Diffusions-Modellen über selektive Optimierungsstrategien 通过选择性优化战略实现文本到图像传播模型的记忆有效个化 2507.10029v1 -
762 07-14 Integrated Gradient Correlation: a Dataset-wise Attribution Method Integrierte Gradientenkorrelation: eine datensatzweise Attributionsmethode 集成梯度关联:数据集自定义方法 2404.13910v2 -
763 07-14 STRAP: Spatial-Temporal Risk-Attentive Vehicle Trajectory Prediction for Autonomous Driving STRAP: Raum-Temporale Risiko-Attentive Fahrzeug-Trajektorie Vorhersage für autonomes Fahren STRAP: 面向自动驾驶的时空风险关注车辆轨迹预测 2507.08563v2 -
764 07-14 Collaboration Promotes Group Resilience in Multi-Agent RL Zusammenarbeit fördert Gruppenresistenz in Multi-Agent RL 协作促进多机构RL中的团体复原力 2111.06614v3 -
765 07-14 Forecasting Coccidioidomycosis (Valley Fever) in Arizona: A Graph Neural Network Approach Prognose der Kokzidioidomykose (Valley Fever) in Arizona: Ein Graph-Neural-Netzwerk-Ansatz 亚利桑那州球孢子菌病 (Valley Fever) 预测: 图神经网络方法 2507.10014v1 -
766 07-14 Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models Defense-as-a-Service: Black-Box-Abschirmung gegen hintertürige Graphenmodelle 防卫即服务:防止后门图表模型的黑箱防护 2410.04916v2 -
767 07-14 Effects of structural properties of neural networks on machine learning performance Auswirkungen struktureller Eigenschaften neuronaler Netze auf die Leistungsfähigkeit des maschinellen Lernens 神经网络结构特性对机器学习绩效的影响 2507.10005v1 -
768 07-14 Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code LLM zur Vernunft bringen: Stärkung Lernen aus algorithmischen Problemen ohne Code 教LLM到理由:加强从没有法典的等级问题中学习 2507.07498v2 -
769 07-14 Two-cluster test Zwei-Cluster-Prüfung 两组测试 2507.08382v2 -
770 07-14 Player-Team Heterogeneous Interaction Graph Transformer for Soccer Outcome Prediction Spieler-Team Heterogene Interaktion Graph Transformer für Fußball Outcome Vorhersage 用于足球结果预测的玩家团队 2507.10626v1 -
771 07-14 Compliance Minimization via Physics-Informed Gaussian Processes Compliance Minimierung durch physikinformierte Gaußsche Prozesse 通过物理系统化高斯进程最大限度地减少遵守规定的情况 2507.09968v1 -
772 07-14 Text-Driven Causal Representation Learning for Source-Free Domain Generalization Text-getriebene Kausaldarstellungs-Lernen für quellfreie Domain-Verallgemeinerung 为无源域普遍化进行文字-文字-文字-事业代表性学习 2507.09961v1 -
773 07-14 Radial Neighborhood Smoothing Recommender System Radial Nachbarschaft Smoothing Recommender System 辐射邻居平滑建议系统 2507.09952v1 -
774 07-14 Hierarchical Job Classification with Similarity Graph Integration Hierarchische Jobklassifikation mit Ähnlichkeitsgrafikintegration 具有相似图集集成的等级职务分类 2507.09949v1 -
775 07-14 Iceberg: Enhancing HLS Modeling with Synthetic Data Iceberg: Verbesserung der HLS-Modellierung mit synthetischen Daten 冰山:加强利用合成数据建立HLS模型 2507.09948v1 -
776 07-14 Predicting Graph Structure via Adapted Flux Balance Analysis Vorhersage der Graphenstruktur über angepasste Flux-Balance-Analyse 通过经调整的通量平衡分析实现的预测图结构 2507.05806v2 -
777 07-14 DeepGesture: A conversational gesture synthesis system based on emotions and semantics DeepGesture: Ein dialogisches Gesten-Synthesesystem basierend auf Emotionen und Semantik DeepGesture:基于情感和语义的谈话手势合成系统 2507.03147v2 -
778 07-14 Long-Tailed Data Classification by Increasing and Decreasing Neurons During Training Langzeit-Datenklassifikation durch zunehmende und abnehmende Neuronen während des Trainings 培训期间通过增加和减少中微量增加和减少长期数据分类 2507.09940v1 -
779 07-14 EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective EVALOOP: Bewertung der Robustheit von LLM in der Programmierung aus einer Perspektive der Selbstkonsistenz EVALOOP: 从自统一的角度评估方案拟订中的LLM强力 2505.12185v3 -
780 07-14 Memorization Sinks: Isolating Memorization during LLM Training Memorization Sinks: Isolation der Memorization während des LLM-Trainings 记忆记忆辛克:在LLLM培训期间隔离记忆 2507.09937v1 -
781 07-14 Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications Mechanistische Interpretierbarkeit von LoRA-adaptierten Sprachmodellen für Anwendungen in der Reaktorsicherheit 用于核反应堆安全应用的LoRA适配语言模型的机制可解释性 2507.09931v1 -
782 07-14 Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization Generative Sprachverbesserung mit menschlichen Präferenzen über direkte Präferenzoptimierung ausrichten 通过直接普惠制优化,使发创性话语增强与人类偏爱一致 2507.09929v1 -
783 07-14 Extracting Cause-Effect Pairs from a Sentence with a Dependency-Aware Transformer Model Extrahieren von Ursache-Wirkungs-Paaren aus einem Satz mit einem Dependency-Aware-Transformer-Modell 利用依赖软件变换模型从判决中提取因果对等 2507.09925v1 -
784 07-14 MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora MixLoRA-DSI: Dynamisch erweiterbare Mixture-of-LoRA-Experten für ein probenfreies generatives Retrieval über dynamische Korpora MixLoRA-DSI: 面向动态语料库无排练生成式检索的动态可扩展LoRA专家混合 2507.09924v1 -
785 07-14 Towards Efficient Quantity Retrieval from Text:An Approach via Description Parsing and Weak Supervision Auf dem Weg zu einer effizienten Menge Abrufen von Text:Ein Ansatz über Beschreibung Parsing und Schwache Überwachung 实现从文本中有效获取数量:通过描述分析和薄弱监督的一种方法 2507.08322v2 -
786 07-14 Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts Erforschen von Sparse Adaptern für skalierbare Zusammenführung von Parameter-Effizienten Experten 探索可缩放的参数集成高效专家的分散适配器 2507.07140v2 -
787 07-14 Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images Erweiterte U-Net-Architekturen mit CNN-Backbones für automatisierte Lungenkrebserkennung und Segmentierung in Brust CT-Bildern 使用有线电视新闻网用于肺癌自动检测和切斯特CT图象分割的U-Net高级建筑 2507.09898v1 -
788 07-14 Algorithm Development in Neural Networks: Insights from the Streaming Parity Task Algorithmenentwicklung in neuralen Netzwerken: Einblicke aus der Streaming Parity-Aufgabe 神经网络中的算法发展:流动均等任务中的透视 2507.09897v1 -
789 07-14 Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning Verständnis ohne Kompetenz: architektonische Grenzen von LLMs in symbolischer Computation und Vernunft 无权限理解:符号计算和理由中LLMs的建筑界限 2507.10624v1 -
790 07-14 Sequence-Model-Guided Measurement Selection for Quantum State Learning Sequence-Modell-geführte Messauswahl für Quantum State Learning 量子州学习的测量选择 2507.09891v1 -
791 07-14 Soft Graph Clustering for single-cell RNA Sequencing Data Weiches Graphen-Clustering für Einzelzell-RNA-Sequenzierungsdaten RNA 单细胞测序数据软图图群集 2507.09890v1 -
792 07-14 TolerantECG: A Foundation Model for Imperfect Electrocardiogram TolerantECG: Ein Grundmodell für ein imperfektes Elektrokardiogramm 缩放式ECG:不完美心电图基金会模型 2507.09887v1 -
793 07-14 Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference MLPs zum Master Heterogenes Graph-Strukturiertes Wissen für effiziente und genaue Schlussfolgerungen zu bringen 向异异质图形结构知识硕士教授多功能模型,以便高效和准确推断 2411.14035v2 -
794 07-14 AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications AdaBrain-Bench: Benchmarking Brain Foundation Modelle für Brain-Computer Interface Anwendungen AdaBrain-Bench:脑-计算机界面应用基准脑基础模型 2507.09882v1 -
795 07-14 Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching Bridging the Last Mile of Prediction: Verbesserung der Zeitreihenvorhersage mit konditional gesteuertem Flow Matching 连接预测的最后一环:加强时间序列预测与有条件的引导流动匹配 2507.07192v2 -
796 07-14 Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition Funktionsinduktion und Aufgabenverallgemeinerung: Eine Interpretationsstudie mit Off-by-One-Addition 职能上岗和任务一般化:解释性研究 2507.09875v1 -
797 07-14 External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation Externes Large Foundation Modell: Wie Sie effizient dienen Trillionen von Parametern für Online-Anzeigen Empfehlung 外部大型基金会模式:如何有效服务数以万计的在线咨询建议参数 2502.17494v7 -
798 07-14 A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization Eine vollständige Verlustlandschaftsanalyse der Regularisierten Tiefenmatrix-Fabrikierung 对正规化深母体因子化的全损全损地貌分析 2506.20344v2 -
799 07-14 Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks Task Priors: Verbesserung der Modellbewertung unter Berücksichtigung des gesamten Raumes von Downstream-Aufgaben 任务前期:考虑到下游任务的全部空间,加强示范评价 2507.09871v1 -
800 07-14 Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO Intersektion von Verstärkungslernen und Bayesian-Optimierung zur intelligenten Steuerung industrieller Prozesse: Ein sicheres MPC-basiertes DPG mit Multi-Objective BO 强化学习和巴耶斯优化优化对工业加工的明智控制:使用多目标BB,以MPC为基础的安全DPG 2507.09864v1 -
801 07-14 Flows and Diffusions on the Neural Manifold Strömungen und Diffusionen auf der Neuralmanifolde 神经元层的流量和扩散 2507.10623v1 -
802 07-14 On the Local Complexity of Linear Regions in Deep ReLU Networks Über die lokale Komplexität linearer Regionen in Deep ReLU-Netzwerken 深RELU网络线性区域局部复杂程度 2412.18283v3 -
803 07-14 REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v6 -
804 07-14 CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischem Prompting für Multi-Organ-Segmentierung CRISP-SAM2:具有跨模态交互和语义提示的SAM2,用于多器官分割 2506.23121v3 -
805 07-14 Spectral Feature Extraction for Robust Network Intrusion Detection Using MFCCs Spektrale Feature-Extraktion für robuste Netzwerkintrusionserkennung mit MFCCs 利用 MFCCs 进行强力网络入侵探测的光谱特征采掘 2507.10622v1 -
806 07-14 A General Framework for Inference-time Scaling and Steering of Diffusion Models Ein allgemeiner Rahmen für Schlussfolgerungs-Zeit-Skalierung und Steuerung von Diffusionsmodellen 传播模型的推推时间缩放和引导总框架 2501.06848v4 -
807 07-14 A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques Eine datengestützte Überprüfung von Fernerkundungsbasierter Datenfusion in der Präzisions-Landwirtschaft von der Grundlagen- bis zur Transformer-basierten Technik 对精密农业中从基础技术到变换技术的遥感数据融合的数据驱动审查 2410.18353v2 -
808 07-14 Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training Through the River: Den Nutzen planfreier Methoden für das Sprachmodelltraining verstehen 通过河道:了解语文示范培训的无附条件方法的益处 2507.09846v1 -
809 07-14 Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms Puristische Stationarität und Härte Ergebnisse für Bregman Proximale Algorithmen Bregman Proximal-Type 的纯净持久性和硬性结果 2404.08073v2 -
810 07-14 Dataset Distillation-based Hybrid Federated Learning on Non-IID Data Datensatzdestillationsbasiertes Hybrid-Federated-Learning auf Nicht-IID-Daten 基于数据集蒸馏的非独立同分布(Non-IID)数据混合联邦学习 2409.17517v2 -
811 07-14 Subgroups Matter for Robust Bias Mitigation Untergruppen sind wichtig für robuste Bias-Minderung 子群体对稳健的偏差缓解至关重要 2505.21363v3 -
812 07-14 Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs Rethinking Prompt Optimization: Verstärkung, Diversifizierung und Migration in Blackbox LLMs 重新思考即时优化:加强、多样化和黑盒LMS中的移民 2507.09839v1 -
813 07-14 A Pre-training Framework for Relational Data with Information-theoretic Principles Ein Vorausbildungsrahmen für relationale Daten mit informationstheoretischen Prinzipien 带有信息理论原则的关系数据培训前框架 2507.09837v1 -
814 07-14 Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems Multi-Residual Mixture of Experts Learning for Cooperative Control in Multi-Vehicle Systems 多车辆系统合作控制专家学习 2507.09836v1 -
815 07-13 (7) Generative Cognitive Diagnosis Generative Kognitive Diagnose 认知诊断 2507.09831v1 -
816 07-13 Hierarchical Abstraction Enables Human-Like 3D Object Recognition in Deep Learning Models Hierarchische Abstraktion ermöglicht die Erkennung von Menschen wie 3D-Objekten in Deep Learning-Modellen 在深学习模型中,等级式抽象抽象化使人类能够识别3D等3D对象 2507.09830v1 -
817 07-13 LLMs Meet Cross-Modal Time Series Analytics: Overview and Directions LLMs treffen auf Cross-Modal Time Series Analytics: Übersicht und Anfahrt 跨模式时间序列分析分析:概览和方向 2507.10620v1 -
818 07-13 Conditional Data Synthesis Augmentation Bedingte Daten Synthese Augmentation 有条件数据合成增强 2504.07426v2 -
819 07-13 Bridging Neural Networks and Dynamic Time Warping for Adaptive Time Series Classification Überbrückung von Neuronalen Netzwerken und dynamisches Zeitwarping für adaptive Zeitreihenklassifikation 架桥神经网络和适应性时间序列分类动态时间调整 2507.09826v1 -
820 07-13 Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization Beyond Multiple Choice: Bewertung von Steuerungsvektoren für adaptive Freiform-Zusammenfassung 超越多重选择:评估适应性自由形式总结指导矢量 2505.24859v2 -
821 07-13 Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding Annäherung an Ratenverzerrungsgrenzen bei Neuralkompression mit Lattice Transform Coding 采用拉蒂节变换编码,在神经压缩中接近比率扭曲限制 2403.07320v2 -
822 07-13 Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization Nesterov findet GRAAL: Optimale und adaptive Gradienten-Methode zur Convex-Optimierung Nesterov Finds GRAAL: 最优化和适应性梯度法 2507.09823v1 -
823 07-13 Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing Entwirren der komplexen Multiplexed DIA Spectra in De Novo Peptide Sequenzierung 拆分新佩普迪德省复杂的多氧化DIA分层 2411.15684v5 -
824 07-13 Coupled Entropy: A Goldilocks Generalization for Nonextensive Statistical Mechanics Gepaarte Entropie: Verallgemeinerung von Goldilocks für Nonextensive Statistical Mechanics Goldilocks 通用非广延性统计机械学 2506.17229v2 -
825 07-13 Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem Komprimierte Berechnung: Dichte Schaltungen in einem Spielzeugmodell des Universal-AND-Problems 压缩计算:普遍问题玩具模型中的密集电路 2507.09816v1 -
826 07-13 Interpretable Time Series Autoregression for Periodicity Quantification Interpretierbare Zeitreihen-Autoregression für Periodizitätsquantifizierung 用于周期性量化的可解释时间序列自回归 2506.22895v2 -
827 07-13 Federated Learning with Graph-Based Aggregation for Traffic Forecasting Föderiertes Lernen mit Graphen-basierter Aggregation für Verkehrsprognosen 使用基于图表的交通流量预测汇总的联邦学习 2507.09805v1 -
828 07-13 Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks Meta-Reinforcement-Lernen für schnelle und dateneffiziente Frequenzallokation in dynamischen drahtlosen Netzwerken 动态无线网络快速和数据有效频谱分配元加强学习 2507.10619v1 -
829 07-13 Compute Requirements for Algorithmic Innovation in Frontier AI Models Berechnung der Anforderungen an algorithmische Innovationen bei Frontier-KI-Modellen 边境AI 模型的计算方法分析创新要求 2507.10618v1 -
830 07-13 A Scalable and Efficient Signal Integration System for Job Matching Ein skalierbares und effizientes Signalintegrationssystem für Job Matching 用于匹配工作的可缩放和高效信号集成系统 2507.09797v1 -
831 07-13 LASER: Attention with Exponential Transformation LASER: Aufmerksamkeit bei exponentieller Transformation LASER: 关注感官转变 2411.03493v2 -
832 07-13 NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection NegRefine: Verfeinerung negativer Label-basierter Zero-Shot-OOD-Erkennung NegRefine: 改进以标签为基的零热 OOOD 检测 2507.09795v1 -
833 07-13 Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster Leveraging Distribution Passend, um annähernde Maschine Unlearning schneller zu machen 利用配配配配的配送让近似机器更快退出学习 2507.09786v1 -
834 07-13 Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow Effiziente molekulare Konformer-Generation mit SO(3)-gemitteltem Flow Matching und Reflow 具有SO(3)-可预见流动匹配和回流的高效分子前代分子 2507.09785v1 -
835 07-13 Physics-informed neural networks for high-dimensional solutions and snaking bifurcations in nonlinear lattices Physik-informierte neuronale Netzwerke für hochdimensionale Lösungen und snaking bifurkations in nichtlinearen Gittern 物理知情神经网络,用于高维溶液和在非线性顶层中截断双硫 2507.09782v1 -
836 07-13 Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces Wahrscheinlich adaptive durchschnittliche Belohnung Verstärkung Lernen für Metrische Räume 可调适性平均增益学习,用于计量空间 2410.19919v2 -
837 07-13 DataDecide: How to Predict Best Pretraining Data with Small Experiments DataDecide: Wie man die besten Pretraining-Daten mit kleinen Experimenten vorhersagt DataDecide:如何利用小型实验预测最佳预训练数据 2504.11393v2 -
838 07-13 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v4 -
839 07-13 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Vision-geführtes Chunking ist alles, was Sie brauchen: Verbesserung der RAG durch multimodales Dokumentenverständnis 愿景引导的决赛是您所需要的:用多模式文件理解加强RAG 2506.16035v2 -
840 07-13 Knowing When to Quit: Probabilistic Early Exits for Speech Separation Zu wissen, wann man aufhören soll: probabilistische frühe Ausgänge für Sprachtrennung 了解何时退出:语言分离的概率早期出场 2507.09768v1 -
841 07-13 Toward accurate RUL and SOH estimation using reinforced graph-based PINNs enhanced with dynamic weights Zur genauen RUL- und SOH-Schätzung mit verstärkten graphbasierten PINNs mit dynamischen Gewichten 使用强化的以图表为基础的活性净净化网,加上动态权重,实现准确的RUL和SOH估算 2507.09766v1 -
842 07-13 Cascade Speculative Drafting for Even Faster LLM Inference Cascade Spekulative Drafting für noch schnellere LLM-Inferenz 连速度更快LLM推论的连带连带性投机起草 2312.11462v5 -
843 07-13 Data-Centric Human Preference with Rationales for Direct Preference Alignment Daten-Centric Human Preference mit Rationales für direkte Präferenzausrichtung 数据中心人类首选与直接优先调整的理由说明 2407.14477v4 -
844 07-13 Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding Ihr prätrainiertes Modell erzählt die Schwierigkeit selbst: Ein selbstadaptives Curriculum Lernen Paradigma für das natürliche Sprachverständnis 您训练有素的模型告诉困难本身:学习自然语言理解的自适应课程学习范式 2507.09758v1 -
845 07-13 Energy Dissipation Rate Guided Adaptive Sampling for Physics-Informed Neural Networks: Resolving Surface-Bulk Dynamics in Allen-Cahn Systems Energieableitungsrate Geführte adaptive Probenahme für physikinformierte Neuronale Netzwerke: Auflösen von Oberflächen-Bulk-Dynamik in Allen-Cahn-Systemen 物理内成形神经网络的能源损耗率向导适应性抽样抽样:Allen-Cahn系统中的表面-柱体动力学的解决方案 2507.09757v1 -
846 07-13 DiPT: Enhancing LLM reasoning through diversified perspective-taking DiPT: Verbesserung des LLM-Reasonings durch diversifizierte Perspektivübernahme DiPT:通过多样化的视角采择加强LLM推理 2409.06241v2 -
847 07-13 Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts Erklärbare KI in der Genomik: Transkriptionsfaktor Bindung Site Prediction mit Mischung von Experten 在基因组学中可解释的AI:与专家混合的转移要素约束性现场预测 2507.09754v1 -
848 07-13 Do we need equivariant models for molecule generation? Brauchen wir äquivariante Modelle für die Molekülgenerierung? 我们需要分子生成的等同模型吗? 2507.09753v1 -
849 07-13 Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them Scalpel vs. Hammer: GRPO verstärkt bestehende Fähigkeiten, SFT ersetzt sie 手术刀与锤子:GRPO 放大现有能力,SFT 则取而代之 2507.10616v1 -
850 07-13 MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients MB-RIRs: ein Synthetischer Raumimpuls-Ansprechdatensatz mit Frequenzabhängigen Absorptionskoeffizienten MB-RIRs:一个具有频率依赖吸收系数的合成室电动脉冲反应数据集 2507.09750v1 -
851 07-13 Fair Domain Generalization: An Information-Theoretic View Fair Domain Generalization: Eine informationstheoretische Ansicht 公平域一般化:信息理论观点 2507.05823v2 -
852 07-13 Discovering Governing Equations in the Presence of Uncertainty Entdeckt regierende Gleichungen in der Gegenwart von Ungewissheit 不确定性存在时的发现等值 2507.09740v1 -
853 07-13 Accelerating Constrained Sampling: A Large Deviations Approach Beschleunigte Probenahme beschleunigen: Ein großer Abweichungsansatz 加速受控抽样:大偏离方法 2506.07816v2 -
854 07-13 Universal Physics Simulation: A Foundational Diffusion Approach Universelle Physik Simulation: Ein grundlegender Diffusionsansatz 宇宙物理模拟:基础扩散方法 2507.09733v1 -
855 07-13 Continental scale habitat modelling with artificial intelligence and multimodal earth observation Lebensraummodellierung im kontinentalen Maßstab mit künstlicher Intelligenz und multimodaler Erdbeobachtung 利用人工智能和多式地球观测进行大陆规模的大陆生境建模 2507.09732v1 -
856 07-13 Signed Graph Learning: Algorithms and Theory Unterzeichnetes Graphenlernen: Algorithmen und Theorie 签署图表学习:算法和理论 2507.09717v1 -
857 07-13 TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v3 -
858 07-13 Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner Task-Agnostic Pre-Training und Task-Guided Fine-Tuning für vielseitige Diffusion Planner Versatile Difatile 扩散规划器任务不可知性培训前和任务指导微调 2409.19949v3 -
859 07-13 Phase transition of the Sinkhorn-Knopp algorithm Phasenübergang des Sinkhorn-Knopp-Algorithmus Sinkhorn- Knopp 算法的阶段过渡 2507.09711v1 -
860 07-13 Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces Große Sprachmodelle kodieren Semantik in Low-Dimensional Linear Subspaces 低多维线性线性子空间中大语言模型编码语义学 2507.09709v1 -
861 07-13 BiDepth: A Bidirectional-Depth Neural Network for Spatio-Temporal Prediction BiDepth: Ein bidirektional-depth-Neurales Netzwerk für Spatio-Temporale Vorhersagen 双向 – – 双向 – – 外心神经网络 2501.08411v3 -
862 07-13 CCDM: Continuous Conditional Diffusion Models for Image Generation CCDM: Continuous Conditional Diffusion Models für die Bildgenerierung CCDM: 图像生成持续有条件传播模型 2405.03546v3 -
863 07-13 EPT-2 Technical Report EPT-2 Technischer Bericht EPT-2 技术报告 2507.09703v1 -
864 07-13 Latent Functional Maps: a spectral framework for representation alignment Latent Functional Maps: ein spektraler Rahmen für die Darstellungsausrichtung 原始功能地图:代表调整的光谱框架 2406.14183v4 -
865 07-13 VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling VideoChat-Flash: Hierarchische Komprimierung für die Langkontext-Videomodellierung VideoChat-Flash:长文本视频建模的等级压缩 2501.00574v4 -
866 07-13 Frequency-aware Surrogate Modeling With SMT Kernels For Advanced Data Forecasting Frequency-aware Surrogate Modellierung mit SMT-Kerneln für erweiterte Datenvorhersage 利用SMT内核建模甚高能代谢模型,用于高级数据预报 2507.09694v1 -
867 07-13 Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Clinical Assessment of Diabetic Retinopathy Severity Umfassende Bewertung der OCT-basierten Automatisierten Segmentierung von Netzhautschicht, Flüssigkeit und Hyperreflektiver Foci: Auswirkungen auf die klinische Beurteilung von severity diabetischer Retinopathie 综合评价基于OCT的视网膜、流体和超反光谱系的视网膜、流体和超反光谱系的自动分解:对诊断性糖尿病病理严重性评估的影响 2503.01248v4 -
868 07-13 Post-Training Quantization of Generative and Discriminative LSTM Text Classifiers: A Study of Calibration, Class Balance, and Robustness Post-Training Quantization of Generative and Discriminative LSTM Text Klassifikatoren: Eine Studie zur Kalibrierung, Klassenbilanz und Robustheit 培训后对产生和区别的LSTM文字分类的量化:校准、分类平衡和强力研究 2507.09687v1 -
869 07-13 Symptom-Driven Personalized Proton Pump Inhibitors Therapy Using Bayesian Neural Networks and Model Predictive Control Symptom-getriebene personalisierte Protonenpumpenhemmer Therapie mit Bayesian Neural Networks und Modell Predictive Control 利用贝耶斯神经网络和模型预测控制进行治疗 2507.09685v1 -
870 07-13 Networked Information Aggregation via Machine Learning Vernetzte Informationsaggregation über maschinelles Lernen 通过机器学习建立网络信息聚合 2507.09683v1 -
871 07-13 Towards Reliable Forgetting: A Survey on Machine Unlearning Verification Zuverlässiges Vergessen: Eine Umfrage über die Überprüfung des maschinellen Lernens 实现可靠地遗忘:关于机械不学习核查的调查 2506.15115v2 -
872 07-13 Conformal Prediction for Privacy-Preserving Machine Learning Conformal Prediction for Privacy-Preserving Machine Learning 隐私保护机器学习的正规预测 2507.09678v1 -
873 07-13 Fine-tuning Large Language Model for Automated Algorithm Design Feinabstimmung Großsprachiges Modell für automatisiertes Algorithmen-Design 自动算法设计大语言模型 2507.10614v1 -
874 07-13 Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs Sub-Scaling-Gesetze: Zur Rolle der Datendichte und Ausbildungsstrategien in LLMs 次级衡量法律:关于数据密度的作用和培训战略 2507.10613v1 -
875 07-13 Machine-Precision Prediction of Low-Dimensional Chaotic Systems Maschinenpräzisionsvorhersage von niederdimensionalen Chaotischen Systemen 低多功能卫生系统机器精确预测 2507.09652v1 -
876 07-13 Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset Pluralismus in der algorithmischen Monokultur kultivieren: Der Datensatz zur Gemeinschaftsausrichtung 在高农业单体养殖中培养多元主义:社区协调数据集 2507.09650v1 -
877 07-13 Learning Flexible Forward Trajectories for Masked Molecular Diffusion Flexible Forward-Trajektorien für maskierte molekulare Diffusion lernen 蒙面分子扩散学习灵活前向轨迹 2505.16790v3 -
878 07-13 Disentanglement and Assessment of Shortcuts in Ophthalmological Retinal Imaging Exams Entflechtung und Beurteilung von Abkürzungen bei ophthalmologischen Retina-Imaging-Prüfungen 眼视视网膜成像Exams 中快捷键的分解和评估 2507.09640v1 -
879 07-13 Limits of Discrete Energy of Families of Increasing Sets Grenzen der diskreten Energie von Familien zunehmender Sets 增加组家庭不同能源限度的限制 2504.11302v3 -
880 07-13 Lightweight Deep Learning-Based Channel Estimation for RIS-Aided Extremely Large-Scale MIMO Systems on Resource-Limited Edge Devices Leichte Deep Learning-basierte Kanalschätzung für RIS-geförderte extrem großräumige MIMO-Systeme auf ressourcenschonenden Edge-Geräten 对资源限制边缘装置的RIS帮助极大型IMIM系统进行基于深深学习的频道估计 2507.09627v1 -
881 07-13 WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models WeGeFT: Gewicht-Generative Feintuning für die effiziente Anpassung großer Modelle WeGeFT: 使大型模型的多面高效适应的重量-弹性微调 2312.00700v5 -
882 07-13 CAN-Trace Attack: Exploit CAN Messages to Uncover Driving Trajectories CAN-Trace Attack: CAN-Nachrichten nutzen, um Fahrbahnen zu entdecken Can- Trace 攻击: 将 CAN 信件开发到无法覆盖的驱动轨迹 2507.09624v1 -
883 07-13 The Full-scale Assembly Simulation Testbed (FAST) Dataset Der Full-Scale Assembly Simulation Testbed (FAST) Datensatz 全规模大会模拟模拟试验数据集 2403.08969v2 -
884 07-13 Regret Analysis of Policy Optimization over Submanifolds for Linearly Constrained Online LQG Regret-Analyse der Politikoptimierung über Untermannigfaltigkeiten für linear eingeschränktes Online-LQG 线性约束在线LQG在子流形上的策略优化的遗憾分析 2403.08553v2 -
885 07-13 MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression MLoRQ: Bridging Low-Rank und Quantisierung für Transformer-Kompression MLORQ: 连接低兰克和变压压缩量化 2507.09616v1 -
886 07-13 Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior Ihre absorbierende diskrete Diffusion modelliert heimlich den Bayes'schen Posterior 您的吸收型离散扩散暗中建模了贝叶斯后验 2507.07586v2 -
887 07-13 Prediction-Augmented Mechanism Design for Weighted Facility Location Vorhersagegestütztes Mechanismus-Design für die gewichtete Standortplanung 面向加权设施选址的预测增强机制设计 2507.06509v3 -
888 07-13 DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences DRAGD: Ein Federated Unlearning Data Reconstruction Attack basierend auf Gradientenunterschieden DRAGD:基于梯度差异的联邦遗忘数据重建攻击 2507.09602v1 -
889 07-13 Denoising and Reconstruction of Nonlinear Dynamics using Truncated Reservoir Computing Denoising und Rekonstruktion von nichtlinearen Dynamiken mit verkürztem Reservoir Computing 使用流动储量计算法进行非线性动态的衰减和重建 2504.13355v2 -
890 07-13 Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems Offline-Metriken identifizieren, die Online-Impact voraussagen: Eine Pragmatische Strategie für Real-World-Empfängersysteme 查明预测在线影响的离线下矩阵:现实世界建议系统实用战略 2507.09566v1 -
891 07-13 CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning CATP-LLM: Große Sprachmodelle für die kostenbewusste Werkzeugplanung CATP-LLM:增强成本软件工具规划大语言模型能力 2411.16313v3 -
892 07-13 A modular framework for automated evaluation of procedural content generation in serious games with deep reinforcement learning agents Ein modularer Rahmen für die automatisierte Bewertung der prozeduralen Inhaltsgenerierung in Serious Games mit Deep-Reinforcement-Learning-Agenten 利用深度强化学习智能体对严肃游戏中的程序化内容生成进行自动评估的模块化框架 2505.16801v2 -
893 07-13 Is Intermediate Fusion All You Need for UAV-based Collaborative Perception? Ist Intermediate Fusion alles, was Sie für UAV-basierte Collaborative Perception benötigen? 对于基于无人机的协同感知,中间融合就是你所需要的一切吗? 2504.21774v2 -
894 07-13 Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v3 -
895 07-13 Reinforced Reasoning for Embodied Planning Verstärkte Begründung für die körperbetonte Planung 强化规划强化理由 2505.22050v2 -
896 07-13 Lightweight Federated Learning over Wireless Edge Networks Leichtes Federated Learning über drahtlose Edge-Netzwerke 对无线边缘网络进行轻量量量联邦学习 2507.09546v1 -
897 07-13 Assessing reliability of explanations in unbalanced datasets: a use-case on the occurrence of frost events Beurteilung der Zuverlässigkeit von Erklärungen in unausgewogenen Datensätzen: ein Anwendungsfall zum Auftreten von Frostereignissen 评估不平衡数据集中解释解释的可靠性:发生霜冻事件的情况 2507.09545v1 -
898 07-13 FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise FedGSCA: Medizinisches Federated Learning mit Global Sample Selector und Client Adaptive Justierer unter Label Noise FedGSCA:与全球抽样选择者和标签噪音下的客户适应调整器进行医学联合会学习 2507.10611v1 -
899 07-13 Quantum Curriculum Learning Quantum Curriculum Lernen 量量课程学习 2407.02419v4 -
900 07-13 Consistency Trajectory Planning: High-Quality and Efficient Trajectory Optimization for Offline Model-Based Reinforcement Learning Konsequente Trajektorienplanung: Hochqualitative und effiziente Trajektorienoptimierung für Offline-Modellbasiertes Verstärkungslernen 一致性轨迹规划:离线示范强化学习的高质量和高效率轨迹优化 2507.09534v1 -
901 07-13 Monte Carlo Tree Diffusion for System 2 Planning Monte Carlo Tree Diffusion für System 2 Planung 用于系统2规划的蒙特卡洛树传播 2502.07202v6 -
902 07-13 VDInstruct: Zero-Shot Key Information Extraction via Content-Aware Vision Tokenization VDInstruct: Zero-Shot-Schlüsselinformationsextraktion über Content-Aware Vision Tokenization VDInstruct: 通过内容软件愿景提取零热关键信息 2507.09531v1 -
903 07-13 A Feed-Forward Artificial Intelligence Pipeline for Sustainable Desalination under Climate Uncertainties: UAE Insights Eine Feed-Forward-Pipeline für künstliche Intelligenz zur nachhaltigen Entsalzung unter Klimaunsicherheiten: VAE-Insights 在气候不确定性下实现可持续脱盐的进餐前人工智能管道:阿联酋观察 2507.10609v1 -
904 07-13 An Analysis of Action-Value Temporal-Difference Methods That Learn State Values Eine Analyse von Aktions-Wert-Temporal-Difference-Methoden, die State Values lernen 《学习国家价值观的行动—-重视时间—-差异方法分析》 2507.09523v1 -
905 07-13 The Shape of Deceit: Behavioral Consistency and Fragility in Money Laundering Patterns Die Form der Täuschung: Verhaltenskonsistenz und Fragilität in Geldwäschemustern 欺骗的形状:洗钱模式中的行为一致性和脆弱性 2507.10608v1 -
906 07-13 Neural Expectation Operators Neurale Erwartungen Betreiber 神经期待运算符 2507.10607v1 -
907 07-13 DALI-PD: Diffusion-based Synthetic Layout Heatmap Generation for ML in Physical Design DALI-PD: Diffusionsbasiertes Synthetisches Layout Heatmap Generation für ML in Physical Design DALI-PD:在物理设计中为ML制造以扩散为基础的合成布局热电图 2507.10606v1 -
908 07-13 Neural Two-Stage Stochastic Optimization for Solving Unit Commitment Problem Neurale Zwei-Stufen-Stochastische Optimierung zur Lösung von Unit Commitment Problem 用于解决单位承诺问题的神经双层两层斯托卡优化 2507.09503v1 -
909 07-13 Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization Verbesserte Regret Bounds für Gaussian Prozess Oberes Vertrauen in Bayesian Optimierung 改善对巴耶斯最佳优化高山进程最高信任圈的遗憾区 2506.01393v2 -
910 07-13 An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects Ein Algorithmus zur Identifizierung von interpretierbaren Untergruppen mit erhöhten Behandlungseffekten 确定具有更高治疗效果的解释分组的数值 2507.09494v1 -
911 07-13 Auditing Prompt Caching in Language Model APIs Auditieren von Prompt-Caching in Sprachmodell-APIs 语言模式APIP中快速抓取 2502.07776v2 -
912 07-13 LEP-QNN: Loan Eligibility Prediction using Quantum Neural Networks LEP-QNN: Kreditfähigkeitsvorhersage über Quantum-Neural-Netzwerke LEP-QNN:利用量子神经网络预测贷款资格 2412.03158v2 -
913 07-13 QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection QFNN-FFD: Quantum Federated Neural Network for Financial Betrug Detection QFNN-FFD:金融欺诈侦查量子联邦神经网络 2404.02595v5 -
914 07-13 Learning Expressive Random Feature Models via Parametrized Activations Expressive Zufalls-Feature-Modelle über parametrisierte Aktivierungen lernen 通过半美化动能进行学习表达式随机特质模型 2411.19468v2 -
915 07-13 Learning-Order Autoregressive Models with Application to Molecular Graph Generation Autoregressive Modelle mit Anwendung auf die molekulare Graphengenerierung lernen-Ordnen 适用于分子图生成的学习顺序自动递减模型 2503.05979v2 -
916 07-13 Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v3 -
917 07-13 Neural Architecture Search generated Phase Retrieval Net for Real-time Off-axis Quantitative Phase Imaging Neurale Architektur Suche erzeugtes Phasen-Retrieval-Netz für Echtzeit-Off-Axis Quantitative Phasen-Imaging 实时非轴外定量成像的神经结构搜索生成阶段回收网 2210.14231v2 -
918 07-13 Discrete Differential Principle for Continuous Smooth Function Representation Diskrete Differentialprinzip für kontinuierliche glatte Funktionsdarstellung 连续平滑职能代表的不区分原则 2507.09480v1 -
919 07-13 Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints Anreiz-Aware Dynamische Ressourcenzuweisung unter langfristigen Kosteneinschränkungen 长期成本制约因素下的奖励性-软件动态资源分配 2507.09473v1 -
920 07-13 La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching La-Proteina: Atomistische Proteinerzeugung über teilweise Latent Flow Matching La-Proteina:通过局部延迟流动配对产生的原子蛋白质 2507.09466v1 -
921 07-13 Enhancing ALS Progression Tracking with Semi-Supervised ALSFRS-R Scores Estimated from Ambient Home Health Monitoring Verbesserung der ALS-Progressionsverfolgung mit semi-überwachten ALSFRS-R Punktzahl Geschätzt von Ambient Home Health Monitoring 环境家庭健康监测估计的半超ALSFRS-R分数加强ALS进展跟踪 2507.09460v1 -
922 07-13 Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks Aequa: Faire Modellprämien im kollaborativen Lernen über schlanke Netzwerke Aequa:通过可恢复网络合作学习的公平示范奖励 2502.04850v2 -
923 07-13 RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services RedOne: Enthüllen von Domain-spezifischen LLM-Post-Trainings in Social Networking Services 红一:在社会联网服务培训后推出特定域域LLM 2507.10605v1 -
924 07-13 Fourier Basis Mapping: A Time-Frequency Learning Framework for Time Series Forecasting Fourier Basis Mapping: Ein Zeit-Frequenz-Lernrahmen für Zeitreihenprognosen Fourier地基绘图:时间序列预测时间-期限学习框架 2507.09445v1 -
925 07-13 Securing Transformer-based AI Execution via Unified TEEs and Crypto-protected Accelerators Sicherung transformerbasierter KI-Execution über Unified TEEs und Crypto-geschützte Beschleuniger 通过统一TEE和加密保护加速器实施基于安全变压器的 AI 执行 2507.03278v2 -
926 07-13 Toward Developing Machine-Learning-Aided Tools for the Thermomechanical Monitoring of Nuclear Reactor Components Entwicklung von maschinenlernenden Werkzeugen für die thermomechanische Überwachung nuklearer Reaktorkomponenten 逐步开发用于对核反应堆部件进行热机械机械机械监测的机械学习辅助工具 2507.09443v1 -
927 07-13 Next-token pretraining implies in-context learning Pretraining im Rahmen von Next-token impliziert das In-Context-Lernen 下一级培训前的学习意味着通俗的学习 2505.18373v2 -
928 07-13 Transformers Don’t In-Context Learn Least Squares Regression Transformer lernen nicht im Kontext Least Squares Regression 变换者不要在知识中学习最小平方倒退 2507.09440v1 -
929 07-13 Dynamic Sparse Causal-Attention Temporal Networks for Interpretable Causality Discovery in Multivariate Time Series Dynamische Sparse Causal-Aufmerksamkeit Temporale Netzwerke für interpretierbare Kausalitäts-Entdeckung in multivariaten Zeitreihen 多变量时间序列中可解释性诱因发现时空网络 2507.09439v1 -
930 07-13 Neural networks leverage nominally quantum and post-quantum representations Neurale Netzwerke nutzen nominal Quanten- und Post-Quantum-Darstellungen 神经网络在名义上对数量和数量后代表的杠杆作用发挥杠杆作用 2507.07432v2 -
931 07-13 Modern approaches to building interpretable models of the property market using machine learning on the base of mass cadastral valuation Moderne Ansätze für den Aufbau interpretierbarer Modelle des Immobilienmarkts mit maschinellem Lernen auf Basis der Massenkadastralbewertung 采用现代方法,利用根据质量地籍估价进行的机器学习,建立可解释的财产市场模型 2506.15723v2 -
932 07-13 Sensitivity Analysis of Transport and Radiation in NeuralPlasmaODE for ITER Burning Plasmas Sensitivitätsanalyse von Transport und Strahlung in NeuralPlasmaODE für ITER-Brennplasma ITER 燃烧日光虫的神经PlasmaODE内运输和辐射感敏分析 2507.09432v1 -
933 07-12 (6) Optimizing External Sources for Controlled Burning Plasma in Tokamaks with Neural Ordinary Differential Equations Optimierung externer Quellen für kontrolliertes Brennplasma in Tokamaks mit neuralen normalen Differentialgleichungen 利用神经普通差异等同优化托卡马克受控燃烧等离外部源的最佳利用 2507.09431v1 -
934 07-12 Causal Discovery-Driven Change Point Detection in Time Series Causal Discovery-Driven Change Point Detection in der Zeitreihe 时间序列中因果发现 - 驱动变化点探测 2407.07290v2 -
935 07-12 On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization Über Informationsgeometrie und iterative Optimierung in der Modellkompression: Operator Factorization 关于模型压缩中信息几何和迭代优化的信息优化:操作者化 2507.09428v1 -
936 07-12 Domain Adaptation and Multi-view Attention for Learnable Landmark Tracking with Sparse Data Domain-Anpassung und Multi-View-Achtung für erlernbares Landmark-Tracking mit Sparse-Daten 利用简化数据进行可学习土地标记跟踪的域适应和多视角关注 2507.09420v1 -
937 07-12 On Supernet Transfer Learning for Effective Task Adaptation Auf Supernet Transfer Learning für effektive Aufgabenanpassung 用于有效任务适应的超级网传输学习 2407.20279v3 -
938 07-12 Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge Intelligente Orchestrierung der verteilten Large Foundation Model Inferenz am Rande 分散在边缘的大基金会模型推断 2504.03668v3 -
939 07-12 Insuring Uninsurable Risks from AI: Government as Insurer of Last Resort Unversicherbare Risiken von KI versichern: Der Staat als Versicherer letzter Instanz 为AI带来的不可保风险承保:政府作为最后保险人 2409.06672v3 -
940 07-12 GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups GreenCrossingAI: Eine Kamerafalle/Computer Vision Pipeline für Forschungsgruppen der Umweltwissenschaften GreenCrossingAI:环境科学研究小组的相机陷阱/计算机视觉管道 2507.09410v1 -
941 07-12 Divergence of Empirical Neural Tangent Kernel in Classification Problems Unterschiedlichkeit des empirischen neuralen Tangenten-Kernels bei Klassifizierungsproblemen 在分类问题方面经验性神经神经下层核心的差别 2504.11130v2 -
942 07-12 Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning Score Attack: Eine Technik für untere Schranken für optimales Differentially Private Learning 得分攻击:最优差分隐私学习的下界技术 2303.07152v2 -
943 07-12 New Statistical and Computational Results for Learning Junta Distributions Neue statistische und rechnerische Ergebnisse für Junta-Distributionen 学习军军分发的新的统计和计算结果 2505.05819v3 -
944 07-12 Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers Adversarial Activation Patching: Ein Framework zur Erkennung und Abmilderung von Emergent Deception in sicherheitsorientierten Transformern 反反向启动补补补:在安全自动变形器中发现和减轻新出现欺骗的框架 2507.09406v1 -
945 07-12 Deep learning lattice gauge theories Deep Learning für Gittereichtheorien 深度学习格点规范理论 2405.14830v2 -
946 07-12 No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms Nein, natürlich kann ich! Tiefere Feinabstimmungsangriffe, die Token-Level-Sicherheitsmechanismen umgehen 不,我当然可以!绕过词元级安全机制的更深层微调攻击 2502.19537v5 -
947 07-12 Scaling Laws for Optimal Data Mixtures Skalierungsgesetze für optimale Datenmischungen 优化数据混合法的缩放法 2507.09404v1 -
948 07-12 Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference Bayesische Bewusstseinstheorie als auswechselbare Emotion-Kognition-Schlussfolgerung 贝叶斯人的觉悟理论,作为可交流的情感 – – 情绪 – – 气氛推论 2407.09488v3 -
949 07-12 A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention Eine zufällige Matrix-Theorie Perspektive auf die Lerndynamik von mehrköpfiger latenter Aufmerksamkeit 多头端注意学习动态的随机矩阵理论视角 2507.09394v1 -
950 07-12 Geometric Generative Modeling with Noise-Conditioned Graph Networks Geometrische Generative Modellierung mit lärmkonditionierten Graphennetzen 带有噪音、有条件条件的图形网络的生成模型 2507.09391v1 -
951 07-12 Multi-Player Zero-Sum Markov Games with Networked Separable Interactions Multi-Player Zero-Sum Markov Spiele mit vernetzten Separable Interaktionen 多层零Sum Markov 游戏, 带有网络化分离互动 2307.09470v3 -
952 07-12 Credit Card Fraud Detection Using RoFormer Model With Relative Distance Rotating Encoding Kreditkartenbetrugserkennung mit RoFormer-Modell und Rotating Encoding mit relativer Distanz 使用具有相对距离旋转编码的RoFormer模型检测信用卡欺诈 2507.09385v1 -
953 07-12 Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields Echtzeit-Adaptive Motion-Planung über Point Cloud-geführte, energiebasierte Diffusion und potenzielle Felder 通过点云引导、基于能源的传播和潜在领域进行实时适应性运动规划 2507.09383v1 -
954 07-12 Don’t be so negative! Score-based Generative Modeling with Oracle-assisted Guidance Seien Sie nicht so negativ! Score-basierte Generative Modellierung mit Oracle-assisted Guidance 不要这么消极!基于分数的生成建模与预言机辅助引导 2307.16463v2 -
955 07-12 Fair CCA for Fair Representation Learning: An ADNI Study Faire CCA für Fair Representative Learning: Eine ADNI-Studie 公平代表性学习公平共同国家评析:ADNI研究 2507.09382v1 -
956 07-12 ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans ESPFormer: Doppelstochastische Aufmerksamkeit mit erwarteten Sliced Transport Plans ESP Former: 带有预期切片运输计划的多孔蒸汽关注 2502.07962v2 -
957 07-12 Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers Untere Grenzen für die Ketten-of-Thought-Reasoning in Hard-Attention Transformers 硬注意力变换器中寻求链引因的下下下界宽度 2502.02393v3 -
958 07-12 Prune ‘n Predict: Optimizing LLM Decision-making with Conformal Prediction Prune ‘n Predict: Optimierung der LLM-Entscheidungsfindung mit konformer Vorhersage 普鲁奈预测:利用非正式预测优化LLM决策 2501.00555v2 -
959 07-12 TabDPT: Scaling Tabular Foundation Models on Real Data TabDPT: Scaling Tabular Foundation Models on Real Data TabDPT: 真实数据缩放表表基建模型 2410.18164v2 -
960 07-12 Meta-autoencoders: An approach to discovery and representation of relationships between dynamically evolving classes Meta-Autoencoder: Ein Ansatz zur Entdeckung und Darstellung von Beziehungen zwischen dynamisch sich entwickelnden Klassen Meta-autoencoders:发现动态演变中的类别之间的关系并体现这种关系的方法 2507.09362v1 -
961 07-12 Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts Vermeiden von Leckagevergiftungen: Konzeptinterventionen unter Verteilungsverschiebungen 避免漏漏毒:分配变更下的概念干预 2504.17921v2 -
962 07-12 Impute With Confidence: A Framework for Uncertainty Aware Multivariate Time Series Imputation Impute With Confidence: Ein Framework für Unsicherheit im Bewusstsein multivariate Zeitreihen Imputation 充满信心的含义:不确定性意识多变时间序列计算框架 2507.09353v1 -
963 07-12 Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Wasserzeichen degradiert Ausrichtung in Sprachmodellen: Analyse und Milderung 语言模型的分级调整:分析和减轻影响 2506.04462v3 -
964 07-12 Unified Linear Parametric Map Modeling and Perception-aware Trajectory Planning for Mobile Robotics Einheitliche lineare Parametrische Kartenmodellierung und Wahrnehmungs-Bewusst-Planung für mobile Robotik 移动机器人学统一线性参数测深图建模和感知感测轨迹规划 2507.09340v1 -
965 07-12 An Introduction to Flow Matching and Diffusion Models Eine Einführung in Flow Matching- und Diffusionsmodelle 流动匹配和推广模型介绍 2506.02070v2 -
966 07-12 WellPINN: Accurate Well Representation for Transient Fluid Pressure Diffusion in Subsurface Reservoirs with Physics-Informed Neural Networks WellPINN: Präzise Well Representation für Transient Fluid Pressure Diffusion in unterirdischen Reservoirs mit physikinformierten Neuronalen Netzwerken WellPINN: 物理成形神经网络在次表层储层中中流水压力扩散的准确代表性 2507.09330v1 -
967 07-12 AGFS-Tractometry: A Novel Atlas-Guided Fine-Scale Tractometry Approach for Enhanced Along-Tract Group Statistical Comparison Using Diffusion MRI Tractography AGFS-Traktometrie: Ein neuartiger Atlas-geführter Fine-Scale-Traktometrie-Ansatz für einen verbesserten along-Tract-Gruppen-Statistikvergleich mit Diffusions-MRT-Traktographie AGFS-Tracto量测:利用扩散MRI轨迹测量法,采用新式阿特拉斯综合地图集指导的微规模微规模轨迹测量方法,加强联合接触小组统计比较 2507.10601v1 -
968 07-12 Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? Bongard im Wunderland: Visuelle Puzzles, die KI immer noch verrückt machen? Bongard in Wonderland:仍然让AI疯掉的视觉图解? 2410.19546v4 -
969 07-12 LLM Agents Are the Antidote to Walled Gardens LLM-Agenten sind das Gegenmittel zu ummauerten Gärten LLM 智能体是围墙花园的解药 2506.23978v2 -
970 07-12 Uncovering symmetric and asymmetric species associations from community and environmental data Entdeckung symmetrischer und asymmetrischer Artenverbände aus Gemeinschafts- und Umweltdaten 利用社区和环境数据覆盖对称和不对称物种协会 2507.09317v1 -
971 07-12 Emergence of Hierarchical Emotion Organization in Large Language Models Entstehung der Hierarchischen Emotionsorganisation in großen Sprachmodellen 大语言模式中等级情感组织的出现 2507.10599v1 -
972 07-12 DAA*: Deep Angular A Star for Image-based Path Planning DAA*: Deep Angular Ein Stern für bildbasierte Pfadplanung DAA*:基于图像的路径规划深角A星 2507.09305v1 -
973 07-12 ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation ViT-ProtoNet für die Wenig-Schuss-Bildklassifikation: Eine Multi-Benchmark-Bewertung 鲜热图像分类Vit-ProtoNet:多基准评价 2507.09299v1 -
974 07-12 Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming Lernbasiertes Multiuser-Scheduling in MIMO-OFDM-Systemen mit Hybrid-Beamforming 采用混合波束成形的MIMO-OFDM系统中基于学习的多用户调度 2506.08263v2 -
975 07-12 ClaritySpeech: Dementia Obfuscation in Speech ClaritySpeech: Demenz-Verschleierung in der Sprache ClaritySpeech:语音中的痴呆症混淆 2507.09282v1 -
976 07-12 A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving Eine Überprüfung der Belohnungsfunktionen für die Stärkung des Lernens im Kontext des autonomen Fahrens 在自主驾驶的情况下审查加强学习的奖励职能 2405.01440v2 -
977 07-12 Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations Ansteuerbare Patching für die Berechnung adaptive Surrogate Modellierung von partiellen Differentialgleichungen 局部差别等量计算-加速替代模型可控补补丁 2507.09264v1 -
978 07-12 TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding TPP-SD: Beschleunigung der Transformer-Punkt-Prozedursampling mit spekulativer Dekodierung TPP-SD:加速变速点进程与投机代号抽样 2507.09252v1 -
979 07-12 GRAG: Graph Retrieval-Augmented Generation GRAG: Graph Retrieval-Augmented Generation GRAG: 图表检索-提款一代 2405.16506v3 -
980 07-12 Shaping Laser Pulses with Reinforcement Learning Laserpulse mit Verstärkungslernen gestalten 利用强化学习制造激光脉动 2503.00499v2 -
981 07-12 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs Bepflanzt in der Vorausbildung, durch Finetuning abgeschwächt: Eine Fallstudie über die Herkunft von Kognitiv-Biasen in LLMs 编在培训前编,《微调:关于LLM中认知性双星起源的个案研究》,《微调摇摇晃》 2507.07186v2 -
982 07-12 PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution PanoDiff-SR: Dental Panoramic Radiographen mit Diffusion und Super-Auflösung synthetisieren PanoDiff-SR:利用传播和超分辨率合成牙科全无光辐射 2507.09227v1 -
983 07-12 Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen 语言模型中强化研究链理由的特征采掘和指南 2505.15634v4 -
984 07-12 Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift Kalibrierte und robuste Fundamentierungsmodelle für Vision-Sprache und medizinische Bildaufgaben unter Verteilungsverschiebung 分配变化下的愿景语言和医疗图像任务模型 2507.09222v1 -
985 07-12 Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications Optimierung der Basisfunktionsauswahl in konstruktiven Wavelet-Neuralnetzwerken und deren Anwendungen 在建设性动态神经网络及其应用中优化基础功能选择 2507.09213v1 -
986 07-12 Warm Starts Accelerate Generative Modelling Warmer Start beschleunigt generative Modellierung 温度起温加速生成模型 2507.09212v1 -
987 07-12 Capturing Unseen Spatial Extremes Through Knowledge-Informed Generative Modeling Ungesehene räumliche Extreme durch wissensbasierte generative Modellierung erfassen 通过知识化创创创型模型获取不见的空间极端 2507.09211v1 -
988 07-12 Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data Diffusion Datensatzkondensation: Training Ihres Diffusionsmodells schneller mit weniger Daten 传播数据集集中: 训练您的传播模型, 以更少数据更快的速度 2507.05914v2 -
989 07-12 XiChen: An observation-scalable fully AI-driven global weather forecasting system with 4D variational knowledge XiChen: Ein beobachtungs-skalierbares, voll KI-gesteuertes globales Wettervorhersagesystem mit 4D-Variationswissen XiChen:一个观测可扩展、完全由AI驱动并具有4D变分知识的全球天气预报系统 2507.09202v1 -
990 07-12 Learning from M-Tuple Dominant Positive and Unlabeled Data Lernen von M-Tuple Dominant Positive und unmarkierte Daten 从 M- Tiple 主导正和非标签数据中学习 2506.15686v2 -
991 07-12 Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models Erkennen und Beschneiden Prominenter, aber detrimentaler Neuronen in großen Sprachmodellen 在大语言模型中检测和预视突出但有偏偏的神经元 2507.09185v1 -
992 07-12 CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models CASCADE Ihre Datensätze für Cross-Mode Knowledge Retrieval von Sprachmodellen CASCADE 语言模型跨模式知识检索数据集 2504.01450v2 -
993 07-12 Continual Reinforcement Learning by Planning with Online World Models Weiterbildung durch Planung mit Online-Weltmodellen 通过规划与在线世界模式持续加强学习 2507.09177v1 -
994 07-12 DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten DeltaSHAP:利用Shapley值解释在线患者监测中的预测演变 2507.02342v2 -
995 07-12 Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations Auf dem Weg zu einer interpretierbaren Drogen- und Drogen-Interaktion Vorhersage: Graphenbasierter Ansatz mit molekularen und netzwerkbasierten Erklärungen 迈向可解释的药物-药物-药物相互作用的预测:以图表为基础的方法与分子和网络一级的解释 2507.09173v1 -
996 07-12 Logits are All We Need to Adapt Closed Models Logits sind alles, was wir brauchen, um geschlossene Modelle anzupassen 只需登录即可,我们只需调整已关闭的模型 2502.06806v4 -
997 07-12 An Epistemic and Aleatoric Decomposition of Arbitrariness to Constrain the Set of Good Models Eine epistemische und aleatorische Zersetzung der Willkür, um das Set guter Modelle zu beschränken 向约束一套良好模型的可变性分解 2302.04525v2 -
998 07-12 Investigating the Robustness of Extreme Precipitation Super-Resolution Across Climates Untersuchung der Robustheit extremer Niederschlags-Super-Resolution über Klima hinweg 调查极端降水性超强 超分辨率 横跨气候 2507.09166v1 -
999 07-12 Tactile-VLA: Unlocking Vision-Language-Action Model’s Physical Knowledge for Tactile Generalization Tactile-VLA: Das physische Wissen des Vision-Sprache-Action-Modells für die Taktile Generalisierung 触觉-VLA:解锁视觉-语言-行动模型的物理知识促进触觉一般化 2507.09160v1 -
1000 07-12 Regularization-based Framework for Quantization-, Fault- and Variability-Aware Training Regularisierungsbasiertes Framework für Quantization-, Fehler- und Variability-Aware Training 量化、失责和易变-软件培训规范化框架 2503.01297v3 -
1001 07-12 AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator AdRo-FL: Informierte und sichere Kundenauswahl für das Federated Learning in der Gegenwart von Adversarial Aggregator ADRO-FL:在存在反versarial聚合体的情况下,为联邦学习进行知情和安全的客户选择 2506.17805v2 -
1002 07-12 Advanced Health Misinformation Detection Through Hybrid CNN-LSTM Models Informed by the Elaboration Likelihood Model (ELM) Fortschrittliche Erkennung von Gesundheits-Fehlinformationen durch Hybrid-CNN-LSTM-Modelle auf Grundlage des Elaboration Likelihood Model (ELM) 基于精细加工可能性模型(ELM)的混合CNN-LSTM模型的高级健康错误信息检测 2507.09149v1 -
1003 07-12 A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation Ein randomisierter Algorithmus für Sparse PCA auf Basis der Basic SDP Relaxation 基于基本SDP松弛的稀疏PCA随机化算法 2507.09148v1 -
1004 07-12 Continuous Spiking Graph Neural Networks Kontinuierliche Spiking Graph Neuronale Netzwerke 连续Spiking 图形神经网络 2404.01897v2 -
1005 07-12 HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving HedraRAG: Koordinierung der LLM-Erzeugung und Datenbankwiederherstellung im heterogenen RAG-Servieren HedraRAG:在异基因RAG服务中协调LLM生成和数据库检索 2507.09138v1 -
1006 07-12 POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution POIFormer: Ein transformerbasierter Rahmen für präzise und skalierbare Point-of-Interest Attribution POIFormer:基于Transformer的准确且可扩展的兴趣点归因框架 2507.09137v1 -
1007 07-12 Dynamic Spiking Framework for Graph Neural Networks Dynamisches Spiking-Framework für Graphen-Neural-Netzwerke 图形神经网络动态Spiking框架 2401.05373v4 -
1008 07-12 MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian MSVD-Indonesier: Benchmark für multimodale Video-Text-Aufgaben auf Indonesisch MSVD-印度尼西亚文:印度尼西亚多式视频文字任务基准 2306.11341v2 -
1009 07-12 CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification CoCo: Ein gekoppeltes kontrastives Framework für unüberwachte domänenadaptive Graphenklassifikation CoCo:用于无监督域自适应图分类的耦合对比框架 2306.04979v4 -
1010 07-12 Heterogeneous Graph Prompt Learning via Adaptive Weight Pruning Heterogenes Graphen-Prompt-Lernen durch adaptive Gewichtsprüfung 通过适应性弱力缓冲快速学习 2507.09132v1 -
1011 07-12 Learning Traffic Anomalies from Generative Models on Real-Time Observations Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen 实时观测生成模型的学习交通异常现象 2502.01391v5 -
1012 07-12 DuSEGO: Dual Second-order Equivariant Graph Ordinary Differential Equation DuSEGO: Duale äquivariante gewöhnliche Graph-Differentialgleichung zweiter Ordnung DuSEGO:对偶二阶等变图常微分方程 2411.10000v2 -
1013 07-12 A Generalization Theory for Zero-Shot Prediction Eine Verallgemeinerungstheorie für Null-Shot-Vorhersage 零热预测通用理论 2507.09128v1 -
1014 07-12 Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs Divide-Then-Rule: Ein clustergetriebener Hierarchischer Interpolator für Attribute-Missing Graphen 区分后规则: 用于属性映射图的集成驱动等级式内插工具 2507.10595v1 -
1015 07-12 A Study of Value-Aware Eigenoptions Eine Studie über wertbewusste Eigenoptionen 价值-知识Eigen备选方法研究 2507.09127v1 -
1016 07-12 Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning Mind the Gap: Erhalten und Kompensieren der Modalitätslücke im CLIP-basierten kontinuierlichen Lernen 牢记差距:维护和补偿基于CLIP的不断学习模式差距 2507.09118v1 -
1017 07-12 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding KodCode: Ein vielfältiger, anspruchsvoller und überprüfbarer synthetischer Datensatz für die Codierung KodCode:用于编码的多样化、挑战性和可核查合成数据集 2503.02951v2 -
1018 07-12 Deep Neural Network Based Accelerated Failure Time Models using Rank Loss Deep Neural Network Based Accelerated Failure Time Models mit Rang Loss 基于深神经网络的深神经网络加速失败时间模型 2206.05974v2 -
1019 07-12 CoVAE: Consistency Training of Variational Autoencoders CoVAE: Konsistenztraining von Variational Autoencodern CoVAE:变分自编码器的一致性训练 2507.09103v1 -
1020 07-12 AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model AHCPTQ: Genaue und hardwarekompatible Post-Training-Quantisierung für das Segment-Anything-Modell AHCPTQ:面向Segment Anything模型的准确且硬件兼容的训练后量化 2503.03088v3 -
1021 07-12 S2SRec2: Set-to-Set Recommendation for Basket Completion with Recipe S2SRec2: Set-to-Set Empfehlung für Korb Fertigstellung mit Rezept S2SRec2:关于配有食谱的篮子补全的设置到设置建议 2507.09101v1 -
1022 07-12 On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving Zur Fragilität der multimodalen Wahrnehmung zur zeitlichen Fehlausrichtung im autonomen Fahren 自主驾驶时时时失调的多模式观念的易变性 2507.09095v1 -
1023 07-12 Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization Optimale Hochwahrscheinlichkeit Konvergenz von nichtlinearer SGD unter stark gestaffelter Geräuschentwicklung durch Symmetrisierung 非线性SGD在通过平衡化的重尾噪音下达到最佳高概率一致 2507.09093v1 -
1024 07-12 MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks MI CAM: Mit gegenseitiger Information gewichtetes Aktivierungsmapping für kausale visuelle Erklärungen von Convolutional Neural Networks MI CAM:用于卷积神经网络因果视觉解释的互信息加权激活映射 2507.09092v1 -
1025 07-12 Continuous-Time Signal Decomposition: An Implicit Neural Generalization of PCA and ICA Kontinuierliche Zeitsignalzersetzung: Eine implizite neuronale Verallgemeinerung von PCA und ICA 连续时间信号分解:PCA和ICA的隐式神经泛化 2507.09091v1 -
1026 07-12 Deep Reinforcement Learning with Gradient Eligibility Traces Tiefe Verstärkung Lernen mit gradienten Berechtigungsspuren 具有渐进资格追踪的深强化学习 2507.09087v1 -
1027 07-12 Queue up for takeoff: a transferable deep learning framework for flight delay prediction Warteschlange für Start: ein übertragbares Deep-Learning-Framework für die Flugverzögerungsvorhersage 飞行延迟预测的可转让深程学习框架 2507.09084v1 -
1028 07-11 (5) Infinite Video Understanding Unendliches Video-Verständnis 无限视频理解 2507.09068v1 -
1029 07-11 HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization HYPEROFA: Erweitern von LLM Vokabeln auf neue Sprachen über Hypernetwork-basierte Einbettung in Initialisierung HYPEROFA:通过基于超网络的嵌入式初始化,将LLM词汇扩大到新语言 2504.21018v2 -
1030 07-11 Risk Bounds For Distributional Regression Risikogrenzen für distributive Regression 分布性倒退的风险临界值 2505.09075v3 -
1031 07-11 SetupBench: Assessing Software Engineering Agents’ Ability to Bootstrap Development Environments SetupBench: Bewertung der Fähigkeit von Software-Engineering-Agenten zu Bootstrap-Entwicklungsumgebungen 设置基准:评估软件工程代理器的能力,以建立发展环境 2507.09063v1 -
1032 07-11 Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction Imitation Learning in Continuous Action Spaces: Compounding Fehler ohne Wechselwirkungen 连续行动空间的模拟学习:没有相互作用的减缓化合物错误 2507.09061v1 -
1033 07-11 Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins Konformations-Aware-Struktur Vorhersage von Antigen-Erkennung Immunproteine 抗原识别免疫素蛋白的预测 2507.09054v1 -
1034 07-11 Can Contrastive Learning Improve Class-Imbalanced Diffusion Model? Kann Kontrastives Lernen das Klassen-Imbalanced Diffusion Model verbessern? 差异学习能改善班级平衡传播模式吗? 2507.09052v1 -
1035 07-11 GPS-Aided Deep Learning for Beam Prediction and Tracking in UAV mmWave Communication GPS-gestütztes Deep Learning für Strahlvorhersage und Tracking in UAV mmWave Kommunikation GPS 辅助的无人驾驶飞行器波段通信光束预测和跟踪深层学习 2505.17530v2 -
1036 07-11 A Method for Learning to Solve Parametric Bilevel Optimization with Coupling Constraints Eine Methode zum Lösen parametrischer Bilevel-Optimierung mit Koppelungsbeschränkungen 学会解决双级优化和组合制约的 参数参数优化方法 2507.09050v1 -
1037 07-11 Shortening the Trajectories: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation Verkürzung der Trajektorien: Identity-Aware Gaussian Approximation für effiziente 3D-Molekulargeneration 缩短轨迹:为高效的三维分子生成而使身份-软件高斯近似化 2507.09043v1 -
1038 07-11 Behavioral Exploration: Learning to Explore via In-Context Adaptation Verhaltensforschung: Lernen, durch In-Context-Anpassung zu erkunden 行为探索:学习通过内容内适应探索 2507.09041v1 -
1039 07-11 BrainLesion Suite: A Flexible and User-Friendly Framework for Modular Brain Lesion Image Analysis BrainLesion Suite: Ein flexibles und benutzerfreundliches Framework für die modulare Gehirn-Lesions-Bildanalyse 脑悬浮套件:模块脑悬浮图像分析灵活和用户友好框架 2507.09036v1 -
1040 07-11 Confounder-Free Continual Learning via Recursive Feature Normalization Confounder-Free Continual Learning via Rekursive Feature Normalisierung 通过递归性地貌正常化实现连续学习 2507.09031v1 -
1041 07-11 Model Parallelism With Subnetwork Data Parallelism Modell-Parallelität mit Subnetzwerk-Daten-Parallelität 与亚网络数据平行的模型平行主义 2507.09029v1 -
1042 07-11 On the Gradient Domination of the LQG Problem Zur Gradienten-Domination des LQG-Problems LQG 问题的渐变多变 2507.09026v1 -
1043 07-11 Lizard: An Efficient Linearization Framework for Large Language Models Lizard: Ein effizienter Linearisierungsrahmen für große Sprachmodelle Lizard:大型语言模型的高效线性框架 2507.09025v1 -
1044 07-11 Adaptive Non-local Observable on Quantum Neural Networks Adaptive nicht-lokale Beobachtung auf Quantum-Neural-Netzwerken 在量子神经网络上可观测的非当地可观测 2504.13414v3 -
1045 07-11 On Evaluating Performance of LLM Inference Serving Systems Zur Bewertung der Leistung von LLM-Inferenz-Serviersystemen 评价LLLM LM 推断服务系统的性能 2507.09019v1 -
1046 07-11 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration Ausschüttung von Kompetenzen aus nicht gekennzeichneten vorherigen Daten für effiziente Online-Exploration 从未贴标签的先前数据中利用技能以进行有效的在线探索 2410.18076v4 -
1047 07-11 Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography Multimodales kardiovaskuläres Risiko Profilieren mittels selbstüberwachtem Lernen der Polysomnographie 利用对多光谱学进行自我监督学习的多式多模式心血管风险分析 2507.09009v1 -
1048 07-11 Surprisingly High Redundancy in Electronic Structure Data Überraschend hohe Redundanz in elektronischen Strukturdaten 电子结构数据冗余率之高令人惊讶 2507.09001v1 -
1049 07-11 Fixed-Confidence Multiple Change Point Identification under Bandit Feedback Fixed-Confidence Multiple Change Point Identification unter Bandit Feedback 土匪反馈下的多变点识别 2507.08994v1 -
1050 07-11 Physics-Based Machine Learning Closures and Wall Models for Hypersonic Transition-Continuum Boundary Layer Predictions Physikbasiertes maschinelles Lernen von Schließungen und Wandmodellen für Hypersonic Transition-Continuum Boundary Layer Vorhersagen 基于物理的机器学习封闭和超音速过渡-连续边界层预测墙模型 2507.08986v1 -
1051 07-11 Exploiting Leaderboards for Large-Scale Distribution of Malicious Models Ausnutzung von Leaderboards für die großräumige Verbreitung von bösartigen Modellen 利用恶意模式大规模分布模式主导板 2507.08983v1 -
1052 07-11 VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models VIP: Visueller Informationsschutz durch feindliche Angriffe auf Vision-Sprachen-Modelle VIP:通过对视觉语言模型的对抗攻击保护视觉信息 2507.08982v1 -
1053 07-11 Learning Diffusion Models with Flexible Representation Guidance Diffusionsmodelle mit flexibler Darstellungsführung lernen 具有灵活代表制指导的学习传播模式 2507.08980v1 -
1054 07-11 PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection PRISM: Reduzieren von sauberen Impliziten in Vision-Sprachenmodellen mit LLM-geführter Einbettung PRISM: 利用LLM-引导嵌入式预测减少视觉-语言模型中的纯净隐含比喻 2507.08979v1 -
1055 07-11 Exploration Behavior of Untrained Policies Explorationsverhalten ungeübter Politiken 未经过培训的政策的探索行为 2506.22566v2 -
1056 07-11 Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery Simulation als Aufsicht: Mechanistische Vorausbildung für wissenschaftliche Entdeckung 模拟监督:科学发现机械预科训练 2507.08977v1 -
1057 07-11 Simulating Three-dimensional Turbulence with Physics-informed Neural Networks Simulation von dreidimensionalen Turbulenzen mit physikinformierten Neuronalen Netzwerken 用物理知情神经网络模拟三维振动 2507.08972v1 -
1058 07-11 ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha ToxBench: Ein Benchmark zur Vorhersage der Bindungsaffinität mit AB-FEP-berechneten Labels für den menschlichen Östrogenrezeptor Alpha ToxBench:针对人类雌激素受体α、带有AB-FEP计算标签的结合亲和力预测基准 2507.08966v1 -
1059 07-11 Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models Theorie-informierte Verbesserungen an klassifikatorfreier Anleitung für diskrete Diffusionsmodelle 对分辨扩散模型的无分类/无分类指南的理论化改进 2507.08965v1 -
1060 07-11 Stochastic Approximation with Block Coordinate Optimal Stepsizes Stochastische Annäherung mit Blockkoordinaten Optimale Stufengrößen 带有块坐标坐标最佳步进的斯托步相近 2507.08963v1 -
1061 07-11 From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis Vom Video zum EEG: Anpassung der gemeinsamen Einbettung von vorausschauender Architektur an die Entdeckung visueller Konzepte in der Gehirnsignalanalyse 从视频到EEG:使联合嵌入的预测结构适应脑信号分析中的不可见视觉概念 2507.03633v4 -
1062 07-11 Individual Causal Inference with Structural Causal Model Individueller Kausalzusammenhang mit strukturellem Kausalmodell 与结构因果模型的个体因果推断 2506.17300v2 -
1063 07-11 How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs Wie man einen Führer ausbildet: Hierarchische Vernunft in multi-agenten LLMs 如何培训领导者:多机构LLM中的等级原因 2507.08960v1 -
1064 07-11 Graph Neural Network Enhanced Sequential Recommendation Method for Cross-Platform Ad Campaign Diagramm Neuronales Netzwerk Verbesserte sequentielle Empfehlungsmethode für plattformübergreifende Werbekampagnen 跨平台运动的神经网络强化序列建议方法 2507.08959v1 -
1065 07-11 Beyond Scores: Proximal Diffusion Models Beyond Scores: Proximale Diffusionsmodelle 超过分数: 快速扩散模型 2507.08956v1 -
1066 07-11 Spectral Manifold Harmonization for Graph Imbalanced Regression Spektrale Manifold Harmonisierung für Graph Imbalanced Regression 图I平衡回归的光谱蒙面协调 2507.01132v2 -
1067 07-11 Drowning in Documents: Consequences of Scaling Reranker Inference Ertrinken in Dokumenten: Konsequenzen der Skalierungs-Reranker-Schlussfolgerung 文件中淹没:扩大重新排序者推断的后果 2411.11767v2 -
1068 07-11 The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? Das nicht-lineare Repräsentations-Dilemma: Reicht die Kausale Abstraktion für die mechanistische Interpretationsfähigkeit? “非碱性代表:因果抽象是否足以进行机械解释?” 2507.08802v1 -
1069 07-11 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models NeuralOS: Auf dem Weg zur Simulation von Betriebssystemen über neurale Generative Modelle NeuralOS:通过神经生成模型模拟操作系统 2507.08800v1 -
1070 07-11 Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists Filter Equivariant Funktionen: Eine symmetrische Darstellung der Längen-allgemeinen Extrapolation auf Listen 过滤器等同函数 : 列表中长度一般外推法的对称账户 2507.08796v1 -
1071 07-11 One Token to Fool LLM-as-a-Judge Ein Token, um LLM-as-a-Judge zu täuschen 一个token即可欺骗LLM-as-a-Judge 2507.08794v1 -
1072 07-11 Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning Optimistische Exploration für risikoabhängiges verstärktes Lernen 最佳探索,以进行风险与风险相关的强化学习 2507.08793v1 -
1073 07-11 MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation MH-FSF: Ein einheitliches Framework zur Überwindung von Benchmarking und Reproduzierbarkeitsbeschränkungen in der Feature Selection Evaluation MH-FSF:在地物选择评价中克服基准设定和可复制限制的统一框架 2507.10591v1 -
1074 07-11 Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks Lernen-unterstützte Bigraph Matching Ansatz zur Multi-Crew Wiederherstellung beschädigter Stromnetze mit Straßentransport-Netzwerke gekoppelt 与公路运输网相结合的多组恢复受损电力网的学习辅助活书匹配方法 2506.19703v2 -
1075 07-11 Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures Effiziente Quantifizierung von Modellierungsunsicherheiten mit differenzierten physikinformierten Machine Learning-Architekturen 探索对以不同物理和机械化学习架构建模的不确定性模型化进行高效率的量化 2506.18247v2 -
1076 07-11 Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees Greedy Low-Rank Gradient Compression für verteiltes Lernen mit Konvergenzgarantien 利用聚合担保分配学习的贪婪低频梯度压缩 2507.08784v1 -
1077 07-11 Predicting Barge Presence and Quantity on Inland Waterways using Vessel Tracking Data: A Machine Learning Approach Vorhersagen von Barge Präsenz und Menge auf Binnenwasserstraßen mit Vessel Tracking Daten: Ein Ansatz zum maschinellen Lernen 利用船舶跟踪数据预测内陆水道的内河水道存在和数量:机械学习方法 2501.00615v2 -
1078 07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity BlockFFN: Auf dem Weg zur End-Side Acceleration-Friendly Mixture-of-Experts mit Chunk-Level-Aktivierung Sparsity 块块FFN: 向具有整块级激活分级的 终端- 双极加速- 友好混合混合专家方向 2507.08771v1 -
1079 07-11 A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification Hybrides Multiwell-Hopfield-CNN mit Feature Extraction und K-Means für die MNIST-Klassifikation 多功能Hopfield-CNN混合型多功能井-CNN,具有用于MNIST分类的地貌采掘和K-MISM-Means 2507.08766v1 -
1080 07-11 Local Flow Matching Generative Models Lokale Flow-Matching Generative Modelle 本地流程匹配生成模型 2410.02548v3 -
1081 07-11 The Bayesian Approach to Continual Learning: An Overview Der Bayesische Ansatz zum kontinuierlichen Lernen: Ein Überblick Bayesian 持续学习方法:概览 2507.08922v1 -
1082 07-11 Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data Straffung undurchführbarer Aktionen und Belohnungsskalierung im Ausbau des Lernens mit Offline-Daten 利用离线数据在加强学习中处罚不可行的行动和奖励措施 2507.08761v1 -
1083 07-11 The Value of Prediction in Identifying the Worst-Off Der Wert der Vorhersage bei der Identifizierung des Schlimmsten 预测在查明最有害的 2501.19334v3 -
1084 07-11 ML-Based Automata Simplification for Symbolic Accelerators ML-basierte Automata-Vereinfachung für symbolische Beschleuniger ML 符号加速器的基于 ML 的自动数据简化 2507.08751v1 -
1085 07-11 Modeling Partially Observed Nonlinear Dynamical Systems and Efficient Data Assimilation via Discrete-Time Conditional Gaussian Koopman Network Modellierung teilweise beobachtete nichtlineare dynamische Systeme und effiziente Datenassimilation über diskret-zeitbedingtes Gaußian Koopman Network 通过分立时间条件性高斯扬库普曼网络模拟部分观测的非线性非线性动态系统和有效的数据同化 2507.08749v1 -
1086 07-11 Partitioned Hybrid Quantum Fourier Neural Operators for Scientific Quantum Machine Learning Partitionierte Hybrid-Quantum Fourier-Neural-Betreiber für das wissenschaftliche Quantenmaschinenlernen 用于科学量子机器学习的四级神经操作员 2507.08746v1 -
1087 07-11 Hashing for Fast Pattern Set Selection Hashing für schnelle Muster Set Auswahl 仓促快速模式集选择 2507.08745v1 -
1088 07-11 Discovering Algorithms with Computational Language Processing Algorithmen mit numerischer Sprachverarbeitung entdecken 使用计算语言语言处理发现算法 2507.03190v2 -
1089 07-11 Adaptive Nonlinear Vector Autoregression: Robust Forecasting for Noisy Chaotic Time Series Adaptive nichtlineare Vektor-Autoregression: Robuste Prognose für lärmende Chaotische Zeitreihen 非线性适应性非线性矢量自动递减: 噪声拖拉时间序列的强力预报 2507.08738v1 -
1090 07-11 Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling Katastrophisches Vergessen der Milderung durch Plateau-Phasen-Aktivität Profiling 通过高原阶段活动分析,通过高原阶段减轻灾难 2507.08736v1 -
1091 07-11 Bias-Aware Mislabeling Detection via Decoupled Confident Learning Bias-Aware-Mislabeling-Erkennung durch entkoppeltes vertrauensvolles Lernen 通过解开信任学习解开错误标签检测 2507.07216v2 -
1092 07-11 Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v2 -
1093 07-11 Monitoring Risks in Test-Time Adaptation Überwachung von Risiken bei der Anpassung an die Testzeit 监测试验时间适应中的风险 2507.08721v1 -
1094 07-11 On the Effect of Regularization in Policy Mirror Descent Auf die Auswirkungen der Regularisierung im politischen Spiegelabbruch 对政策从属来源正规化的影响的影响 2507.08718v1 -
1095 07-11 On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing Auf Lernfunktionen über biologischen Sequenzraum: Gaußsche Prozessvorhersage, Regularisierung und Messwertfixierung 生物序列空间学习功能方面的学习功能:与高斯进程前期、正规化和测量确定有关 2504.19034v2 -
1096 07-11 Rethinking Approximate Gaussian Inference in Classification Ungefähre gaussische Schlussfolgerung in der Klassifizierung neu denken 重新思考约近高斯在分类中的推理 2502.03366v2 -
1097 07-11 SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations SPLASH! Probeneffizientes Inverse Verstärkungslernen auf Präferenzbasis für langhorizontige Adversarialaufgaben aus suboptimalen Hierarchischen Demonstrationen SPLASH!基于次优分层示范、面向长时程对抗任务的样本高效偏好逆强化学习 2507.08707v1 -
1098 07-11 Conditional regression for the Nonlinear Single-Variable Model Bedingte Regression für das nichtlineare Single-Variable Modell 非线性单一可变模式有条件回归 2411.09686v2 -
1099 07-11 SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting SEREP: Semantische Gesichtsausdruck-Darstellung für robustes In-the-Wild-Capture und Retargeting SEREP: 野外强力捕捉和重新瞄准目标的语义法表达表 2412.14371v3 -
1100 07-11 Domain-Informed Operation Excellence of Gas Turbine System with Machine Learning Domain-informierte Operation Exzellenz des Gasturbinensystems mit maschinellem Lernen 采用机器学习的天然气涡轮系统内部一体化英才行动 2507.08697v1 -
1101 07-11 Learnable quantum spectral filters for hybrid graph neural networks Erlernbare Quantenspektralfilter für hybride Graphen-Neuralnetzwerke 用于混合图形神经网络的可学习量子光谱过滤器 2507.05640v2 -
1102 07-11 PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors PREAMBLE: Private und effiziente Aggregation über Block Sparse Vektoren PREAMBLE: 通过块稀疏向量进行私密和高效聚合 2503.11897v2 -
1103 07-11 Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation Vergessen Sie mich nicht: Gegen lokales Überpassen mit Wissensfusion und Destillation kämpfen 忘记我,不要忘记我,不要在本地与知识融合和蒸馏的重叠作斗争。 2507.08686v1 -
1104 07-11 Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness Wiederkehrende Konvergenz: Umwerfende Komplexität jenseits von Lipschitz Smoothness 重新审视趋同:利普施茨平滑之后的复杂程度 2507.08913v1 -
1105 07-11 Open Materials Generation with Stochastic Interpolants Offene Materialgenerierung mit stochastischen Interpolanten 与室内内刑警一起制造开放材料 2502.02582v2 -
1106 07-11 Fair-FLIP: Fair Deepfake Detection with Fairness-Oriented Final Layer Input Prioritising Fair-FLIP: Faire Deepfake-Erkennung mit Fairness-orientiertem Final Layer Input Priorisierung Fair-FLIP:以公平为导向、以公平为导向的最后层投入为优先的公平深海探测 2507.08912v1 -
1107 07-11 Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs Modellkollaps ist kein Fehler, sondern ein Feature in Machine Unlearning für LLMs 模型崩溃不是缺陷,而是LLM机器遗忘中的一项特性 2507.04219v2 -
1108 07-11 Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry Feature Learning beyond the Lazy-Rich Dichotomie: Einblicke aus der Repräsentationsgeometrie 超越Lazy-Rich二分切开术的特征学习:代表式几何的透视 2503.18114v2 -
1109 07-11 The Impact of Automatic Speech Transcription on Speaker Attribution Die Auswirkungen der automatischen Sprachtranskription auf die Sprecherzuweisung 自动发言限制对议长权力的影响 2507.08660v1 -
1110 07-11 Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees Sicheres tiefes Stärkungslernen für Ressourcenallokation mit Spitzenzeit der Informationsverletzungsgarantien 安全深强化学习,以进行违反信息达到高峰年龄的违反信息保障的资源分配 2507.08653v1 -
1111 07-11 Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA) Skalierung der Aufmerksamkeit auf sehr lange Sequenzen in linearer Zeit mit Wavelet-erweiterter Zufallsspektral-Achtung (WERSA) 以波浪增强随机光谱注意, 将注意力转向线性时间的甚长序列( WERSA) 2507.08637v1 -
1112 07-11 Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security Verschränkte Bedrohungen: Ein einheitliches Kill Chain Modell für Quantum Machine Learning Security 相互纠缠的威胁:量子机器学习安全的统一杀手链模式 2507.08623v1 -
1113 07-11 Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference Mind the Memory Gap: Enthüllen von GPU-Flaschenhalsen in großflächiger LLM-Inferenz 牢记记忆差距:大型批量LLM 推理中的 GPU 堆积点 2503.08311v2 -
1114 07-11 A Malliavin calculus approach to score functions in diffusion generative models Ein Malliavin Kalkül Ansatz, um Funktionen in Diffusion generative Modelle punkten 以Malliavin微积分法在传播基因变异模型中计分功能 2507.05550v2 -
1115 07-11 Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift Auf dem Weg zu kollaborativer Fairness im Federated Learning under Imbalanced Covariate Shift 实现在平衡的共变调整下实现联邦学习合作公平 2507.08617v1 -
1116 07-11 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs AgentsNet: Koordination und kollaborative Reasoning in Multi-Agent LLMs AgentsNet:多智能体LLM中的协调与协作推理 2507.08616v1 -
1117 07-11 Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data Emergent Natural Language mit Kommunikationsspielen zur Verbesserung der Bildbeschriftung Fähigkeiten ohne zusätzliche Daten 新兴自然语言与交流运动会:在没有额外数据的情况下提高图像能力交流运动会 2507.08610v1 -
1118 07-11 Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data Zuordnungszuweisung für tiefgenerative Sequenzmodelle ermöglicht eine Interpretationsanalyse mit Positiv-Only-Daten 深遗传序列模型的归属分配,使得能够使用只使用正数数据的可解释性分析 2506.23182v2 -
1119 07-11 MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs MedSegFactory: Textgeführte Generation medizinischer Image-Mask-Paare MedSegFactory: 以文本指导方式制作的医学图像图像-面面对称 2504.06897v2 -
1120 07-11 Remote Sensing Reveals Adoption of Sustainable Rice Farming Practices Across Punjab, India Fernerkundung offenbart Annahme nachhaltiger Rice Farming-Praktiken in Punjab, Indien 在印度旁遮普省各地采用可持续的稻米耕作做法 2507.08605v1 -
1121 07-11 ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection ADAPT: Ein Pseudo-Labeling-Ansatz zur Bekämpfung von Konzept Drift bei Malware-Erkennung ADAPT: 一种以优多为标签的方法,以对抗马利软件探测中的漂流概念 2507.08597v1 -
1122 07-11 The Engineer’s Dilemma: A Review of Establishing a Legal Framework for Integrating Machine Learning in Construction by Navigating Precedents and Industry Expectations Das Dilemma des Ingenieurs: Eine Überprüfung der Schaffung eines rechtlichen Rahmens für die Integration von maschinellem Lernen in den Bau durch Navigieren von Vor- und Industrieerwartungen 工程师的难题:审查建立一个法律框架,通过控制先例和工业预期,将机械学习纳入建筑的法律框架 2507.08908v1 -
1123 07-11 On the Gaussian process limit of Bayesian Additive Regression Trees Auf der Gaußschen Prozessgrenze von Bayesian Additive Regression Trees Bayesian Additive 倒退树的高斯进程极限 2410.20289v2 -
1124 07-11 What should a neuron aim for? Designing local objective functions based on information theory Was sollte ein Neuron anstreben? Auf der Grundlage der Informationstheorie lokale objektive Funktionen entwerfen 神经神经元的目标应该是什么?根据信息理论设计当地客观功能 2412.02482v4 -
1125 07-11 AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling AbbIE: Autoregressiver Blockbasierter iterativer Encoder für effiziente Sequenzmodellierung AbbIE: 用于高效序列建模的自回归分块迭代编码器 2507.08567v1 -
1126 07-11 Data-driven system identification using quadratic embeddings of nonlinear dynamics Datengesteuerte Systemidentifikation mittels quadratischer Einbettungen nichtlinearer Dynamik 利用非线性动态的二次嵌入进行数据驱动系统识别 2501.08202v2 -
1127 07-11 LITE: Efficiently Estimating Gaussian Probability of Maximality LITE: Effiziente Bewertung der Gaußschen Wahrscheinlichkeit von Maximalität LITE:有效估计高斯人最大化的概率 2501.13535v3 -
1128 07-11 GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction GNN-ACLP:基于图神经网络的模拟电路链接预测 2504.10240v3 -
1129 07-11 Leveraging priors on distribution functions for multi-arm bandits Nutzung von Vorabinformationen über Verteilungsfunktionen für Mehrarmbanditen 利用多武器强盗分配功能的前身 2503.04518v2 -
1130 07-11 SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2 SAM2RL: Auf dem Weg zu einer verstärkten Gedächtnissteuerung im Segment Anything Modell 2 SAM2RL: 争取加强第2部分 “ 任何内容 “ 模式中的学习记忆控制 2507.08548v1 -
1131 07-11 Quantum Algorithms for Projection-Free Sparse Convex Optimization Quantenalgorithmen für projektionsfreie Sparse Convex-Optimierung 用于无投射无孔式微粒电解优化的量图量算法 2507.08543v1 -
1132 07-11 CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes CircFormerMoE: Ein durchgängiges Deep-Learning-Framework für die kreisförmige RNA-Splice-Site-Erkennung und -Pairing in Pflanzengenomen CircFormerMoE: 用于植物基因组中环状RNA剪接位点检测与配对的端到端深度学习框架 2507.08542v1 -
1133 07-11 Recursive Reward Aggregation Rekursive Prämienaggregation 递递回报聚合 2507.08537v1 -
1134 07-11 Multiaccuracy and Multicalibration via Proxy Groups Multiakkuratität und Multikalibrierung über Proxy-Gruppen 通过代理集团实现多准确度和多校准 2503.02870v3 -
1135 07-11 Binary and Ternary Quantization Can Enhance Feature Discrimination Binäre und Ternäre Quantisierung kann Feature-Diskriminierung verbessern 二进制和三进制量化能够增强特征歧视 2504.13792v2 -
1136 07-11 Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures Gemeinschaften im Kuramoto-Modell: Dynamik und Erkennung über Pfadsignaturen 仓本模式中的社区:动态和通过路径签名探测 2503.17546v3 -
1137 07-11 REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives REGEN: Ein Datensatz und Benchmarks mit natürlichen Sprachkritiken und Erzählungen REGEN: 一套具有自然语种背景和叙述的数据集和基准 2503.11924v2 -
1138 07-11 Data Depth as a Risk Datentiefe als Risiko 数据深度作为风险 2507.08518v1 -
1139 07-11 SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation SFedKD: Sequentielles Föderales Lernen mit Diskrepanz-Bewusst-Multi-Lehrer-Wissensdestillation SFedKD: 分级的联邦学习与差异-软件软件多教学员知识蒸馏 2507.08508v1 -
1140 07-11 Physics-informed machine learning: A mathematical framework with applications to time series forecasting Physik-informiertes maschinelles Lernen: Ein mathematisches Rahmenwerk mit Anwendungen zur Zeitreihenvorhersage 物理知情机机学习:一个数学框架,可应用于时间序列预测 2507.08906v1 -
1141 07-11 One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning One-Pass to Reason: Token-Duplikation und Block-Spar-Maske für effizientes Feintuning auf Multi-Turn-Reasoning 单向理由:在多向理由上高效精美调整的相重复和块分割掩码 2504.18246v2 -
1142 07-11 Universal Approximation Theorem for a Single-Layer Transformer Universelles Approximationstheorem für einen Single-Layer Transformer 单层变形器的通用近光理论论 2507.10581v1 -
1143 07-11 Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit spektral teilnehmenden Gasen 关于CNN和MLP用于含光谱参与性气体的二维炉内辐射传热的可行性研究 2506.08033v3 -
1144 07-11 SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction SynBridge: Überbrückungsreaktionszustände über diskreten Fluss für bidirektionale Reaktionsvorhersage SynBridge:通过分向流为双向反应预测进行连接反应国家 2507.08475v1 -
1145 07-11 Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model Squeeze the Soaked Sponge: Effiziente Off-Policy-Verstärkung Feinsteuerung für großes Sprachmodell 挤压海绵:高效非政策强化大语言模式的高效非政策改进微调 2507.06892v3 -
1146 07-11 Evaluating SAE interpretability without explanations Bewertung der SAE-Interpretation ohne Erklärungen 评估是否可无解释地对SAE进行可解释性评估 2507.08473v1 -
1147 07-11 Predicting Air Pollution in Cork, Ireland Using Machine Learning Vorhersage der Luftverschmutzung in Cork, Irland durch maschinelles Lernen 利用机器学习预测爱尔兰科克的空气污染 2507.04196v2 -
1148 07-11 Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings Neural Concept Verifier: Scaling Prover-Verifier Spiele über Concept Encodings 神经概念验证符:通过概念编码来缩放Prover-Ver化游戏 2507.07532v2 -
1149 07-11 Pre-Training LLMs on a budget: A comparison of three optimizers Pre-Training LLMs auf einem Budget: Ein Vergleich von drei Optimierern 预算培训前LLMLM项目:三个优化器的比较 2507.08472v1 -
1150 07-11 Last Layer Hamiltonian Monte Carlo Letzte Schicht Hamiltonian Monte Carlo 汉密尔顿·蒙特卡洛 2507.08905v1 -
1151 07-11 Sculpting Quantum Landscapes: Fubini-Study Metric Conditioning for Geometry Aware Learning in Parameterized Quantum Circuits Sculpting Quantum Landscapes: Fubini-Studie Metric Conditioning for Geometry Aware Learning in parameterized Quantum Circuits 雕刻量子地貌:在可计量量子电路进行几何认知学习的 Fubini-Study 测量测量条件 2506.21940v3 -
1152 07-11 Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds Ranked Set Sampling-based Multilayer Perceptron: Verbesserung der Generalisierung durch variance-based Bounds 按等级排列的基于抽样的多层概念:通过基于差异的边界改进普遍化 2507.08465v1 -
1153 07-11 Collaborative filtering based on nonnegative/binary matrix factorization Kollaborative Filterung auf der Grundlage nichtnegativer/binärer Matrixfaktorisierung 基于非负负/二进制矩阵因子化的合作过滤 2410.10381v3 -
1154 07-11 Space filling positionality and the Spiroformer Raumfüllpositionalität und der Spiroformer 空间填充定位和空间 2507.08456v1 -
1155 07-11 Why this and not that? A Logic-based Framework for Contrastive Explanations Warum das und nicht das? Ein logisch-basiertes Framework für kontrastive Erklärungen 为什么这样而不是这样?基于逻辑的矛盾解释框架 2507.08454v1 -
1156 07-11 Field Matching: an Electrostatic Paradigm to Generate and Transfer Data Field Matching: ein elektrostatisches Paradigma zur Generierung und Übertragung von Daten 字段匹配:生成和传输数据的电静电模型 2502.02367v2 -
1157 07-11 KGRAG-Ex: Explainable Retrieval-Augmented Generation with Knowledge Graph-based Perturbations KGRAG-Ex: Erklärbare retrieval-erweiterte Generation mit wissensgraphbasierten Störungen KGRAG-Ex: 具有基于知识图表的扰动作用的可解释的检索增强型生成器 2507.08443v1 -
1158 07-11 Optimal and Practical Batched Linear Bandit Algorithm Optimaler und praktischer Batched Linear Bandit Algorithmus 最佳和实用的 Batched 线性强盗 2507.08438v1 -
1159 07-11 FonTS: Text Rendering with Typography and Style Controls FonTS: Text Rendering mit Typografie und Style Controls FonTS: 带有打字和样式控控管的文字成文 2412.00136v3 -
1160 07-11 Answer Generation for Questions With Multiple Information Sources in E-Commerce Antwortgenerierung für Fragen mit mehreren Informationsquellen im E-Commerce 电子商务中具有多种信息来源问题的答案生成问题 2111.14003v2 -
1161 07-11 RTNinja: a generalized machine learning framework for analyzing random telegraph noise signals in nanoelectronic devices RTNinja: ein generalisierter Rahmen für maschinelles Lernen zur Analyse von zufälligen Telegraphenrauschsignalen in nanoelektronischen Geräten RTNinja:用于分析纳米电子设备随机电报噪音信号的通用机器学习框架 2507.08424v1 -
1162 07-11 Minerva: A File-Based Ransomware Detector Minerva: Ein dateibasierter Ransomware-Detektor Minerva: 以文件为基础的序列器检测器 2301.11050v4 -
1163 07-11 Towards AI-Native RAN: An Operator’s Perspective of 6G Day 1 Standardization Auf dem Weg zu KI-Native RAN: Die Perspektive des Betreibers von 6G Tag 1 Standardisierung 面向AI-Native RAN:运营商对6G日1标准化的看法 2507.08403v1 -
1164 07-11 SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding SPINT: Raumpermutations-Invarianter Neuraltransformator für konsistente intrakortikale Motordekodierung SPINT: 空间变异-内变量内神经变异器,用于连贯一致的异质内装配机动车代号 2507.08402v1 -
1165 07-11 DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving DriveTransformer: Unified Transformer für skalierbares autonomes Fahren 驱动器变换: 用于可缩放的终端到终端自动驱动的统一变换器 2503.07656v2 -
1166 07-11 Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling Inferenz-Zeit-Skalierung von Diffusions-Sprachmodellen mit Partikel Gibbs-Sampling 配有粒子Gibbs抽样的传播语言模型的推推-时间缩放 2507.08390v1 -
1167 07-11 Online Pre-Training for Offline-to-Online Reinforcement Learning Online-Vorschulung für Offline-to-Online-Verstärkung 离线至在线强化学习在线培训前培训 2507.08387v1 -
1168 07-11 Estimation of conditional average treatment effects on distributed confidential data Schätzung der bedingten durchschnittlichen Behandlungseffekte auf verteilte vertrauliche Daten 对分发的机密数据进行有条件平均待遇影响的估计 2402.02672v4 -
1169 07-11 Advances in Machine Learning: Where Can Quantum Techniques Help? Fortschritte beim maschinellen Lernen: Wo können Quantentechniken helfen? 机器学习的进步:量子技术能帮助哪里? 2507.08379v1 -
1170 07-11 Sampling from Your Language Model One Byte at a Time Proben aus Ihrem Sprachmodell ein Byte zu einer Zeit 一次抽取您语言模式一字节的样本 2506.14123v2 -
1171 07-11 Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation Erlernen der Polstrukturen von Hadronischen Staaten mittels vorausschauender Unsicherheitsabschätzung 使用预测性不确定性估计值的 强力国家学习极极结构 2507.07668v2 -
1172 07-11 Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer Vorhersage von Lane Change Absichten menschlicher Treiber mit einem LSTM, einem CNN und einem Transformer 使用LSTM、CNN和变形器预测人驾驶员的车道改变意图 2507.08365v1 -
1173 07-11 A Plea for History and Philosophy of Statistics and Machine Learning Ein Plädoyer für Geschichte und Philosophie der Statistik und des maschinellen Lernens 统计和机器学习历史和哲学 2506.22236v2 -
1174 07-11 Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text Nutzung von maschinellem Lernen und verbesserte Parallelitätserkennung für die BPMN-Modellgenerierung aus Text 利用机器学习和强化平行探测,从文字中生成BPMN模型 2507.08362v1 -
1175 07-11 scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling scE$^2$TM: Auf dem Weg zur Interpretierbaren Single-Cell-Einbettung über Topic Modeling scE$^2$TM:通过主题建模实现可解释的单细胞嵌入 2507.08355v1 -
1176 07-11 DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting DRAN: Ein Vertriebs- und Beziehungsadaptives Netzwerk für die räumlich-zeitliche Vorhersage DRAN: 空间时预报分布和关系适应网络 2504.01531v3 -
1177 07-11 Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting Galerkin-ARIMA: Ein zweistufiges Polynom-Regressions-Framework für schnelles Ein-Schritt-Vorhersagen Galerkin-ARIMA:一个双级多级倒退框架,用于快速滚动单步单步预告 2507.07469v2 -
1178 07-11 Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances Verbesserung der Verteilungs Robustheit in der Hauptkomponentenanalyse durch Wasserstein-Abstände 提高瓦塞斯坦距离主要构成部分分析的分布强度 2503.02494v2 -
1179 07-11 Interpretability-Aware Pruning for Efficient Medical Image Analysis Dolmetschbarkeits-Vorsicht für effiziente medizinische Bildanalyse 高效医学图像分析的解释性软件 2507.08330v1 -
1180 07-11 An Adaptive Volatility-based Learning Rate Scheduler Eine adaptive Volatilität-basierte Lernrate Scheduler 基于适应性波动的学习率计划表 2507.10575v1 -
1181 07-11 Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection Emoji-Angriff: Verstärkung von Jailbreak-Angriffen gegen Richter LLM-Erkennung Emoji攻击:加强针对LLM法官的越狱袭击 2411.01077v4 -
1182 07-11 EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees EvalTree: Profiling Language Model Schwächen über Hierarchische Fähigkeiten Bäume EvalTree:通过等级能力树分析语言模型弱点 2503.08893v2 -
1183 07-11 Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v5 -
1184 07-11 A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction Ein umfassend adaptives architektonisches Optimierungs- und Quantum-Neural-Netzwerkmodell für Cloud Workloads Vorhersage 全面适应性建筑建筑优化-植入量子云工作量预测神经网络模型 2507.08317v1 -
1185 07-11 CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering CAS Kondensiertes und Beschleunigtes Silhouette: Eine effiziente Methode zur Bestimmung des Optimalen K in K-Means Clustering CAS 集中和加速的西尔休埃特:确定K-Means集群中最佳K的高效方法 2507.08311v1 -
1186 07-11 M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning M2-Reasoning: Stärkung von MLLMs mit einheitlicher allgemeiner und räumlicher Vernunft M2-反应:以统一的一般和空间理由,赋予MLLMs权力 2507.08306v1 -
1187 07-11 Amortized Posterior Sampling with Diffusion Prior Distillation Amortisierte Posterior-Probenahme mit Diffusions-Prior-Destillation 基于扩散先验蒸馏的摊销后验采样 2407.17907v2 -
1188 07-11 Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers Bandit-Based Prompt Design Strategy Selection verbessert Prompt Optimizers 基于强盗的即时设计战略选择改进即时优化 2503.01163v2 -
1189 07-11 Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces Dualformer: Kontrollierbares schnelles und langsames Denken durch Lernen mit Randomized Reasoning Traces 二进制:可控制快速和慢思维,通过学习随机调整理性路径进行思考 2410.09918v3 -
1190 07-11 Granular Ball Twin Support Vector Machine Granular Ball Twin Unterstützung Vektor Maschine 颗粒球双双支持矢量机 2410.04774v3 -
1191 07-11 Distributional Soft Actor-Critic with Diffusion Policy Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik 采用扩散策略的分布式软演员-评论家 2507.01381v3 -
1192 07-11 Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training Leichte Sicherheits-Guardrails über Synthetische Daten und RL-geführtes Adversarial Training 通过合成数据和RL制导反向训练轻量安全护卫车 2507.08284v1 -
1193 07-11 Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood Prädiktive Kausalableitung über Spatio-Temporale Modellierung und Penalized Empirical Likelihood 通过SPATIO-临时模拟和惩罚性实证可能性,预测性因果推断 2507.08896v1 -
1194 07-11 MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts MIRRAMS: Auf dem Weg zu Trainingsmodellen Robuste bis fehlende Verteilungsverschiebungen MIRRAMS:努力建立培训模式,以强化缺失分布分布变化 2507.08280v1 -
1195 07-11 Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets Pocket2Mol: Effiziente molekulare Probenahme auf Basis von 3D Protein Pockets Pocket2Mol:基于 3D 蛋白质薄片的有效分子取样 2205.07249v2 -
1196 07-11 Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators Lokales Transfer-Lernen Gaußsche Prozessmodellierung, mit Anwendungen zur Ersatzmodellierung von teuren Computersimulatoren 当地转移学习学习 高斯进程建模,并应用昂贵计算机模拟器替代模型 2410.12690v3 -
1197 07-11 EmissionNet: Air Quality Pollution Forecasting for Agriculture EmissionNet: Vorhersage der Luftqualität für die Landwirtschaft 排放网:农业空气质量污染预测 2507.05416v2 -
1198 07-11 A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration Eine neuartige formbewusste Topologische Darstellung für GPR-Daten mit DNN-Integration 与 DNN 融合的GPR数据新元形状- 工具地形代表 2506.06311v2 -
1199 07-11 Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization datengetriebene Dimensionssynthese unterschiedlicher planarer Vier-Leiter-Funktionsgenerierungsmechanismen über direkte Parametrierung 通过直接参数化实现的多层平板四巴函数生成机制数据驱动多维度合成 2507.08269v1 -
1200 07-11 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning Ein praktisches Zwei-Stufen-Rezept für mathematische LLMs: Maximierung der Genauigkeit mit SFT und Effizienz mit Verstärkungslernen 数学LLM的实用两阶段方案:以SFT最大化准确率,以强化学习提升效率 2507.08267v1 -
1201 07-11 Task Arithmetic Through The Lens Of One-Shot Federated Learning Aufgabe Arithmetik durch die Linse des ein-shot-Federated Learning 通过单层联邦学习的镜头进行任务自真 2411.18607v2 -
1202 07-11 Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration Navigieren Sie das Unbekannte: Verbesserung der LLM-Vernunft mit intrinsischer Motivation geführte Exploration 导航未知:利用内在动力性引导探索加强LLM 2505.17621v3 -
1203 07-11 Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks Zulässigkeit des Steinschrumpfens für die Batch-Normalisierung in Gegenwart von Adversarialangriffen 是否允许施泰因·施特里奇在出现对立攻击时进行批次正常化 2507.08261v1 -
1204 07-11 Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) Quantenbeschleunigte neurale Imputation mit großen Sprachmodellen (LLMs) 与大语言模型(LLMs)的量度加速神经量算 2507.08255v1 -
1205 07-11 Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models Raptor: Skalierbare Train-Free-Embeddings für 3D medizinische Volumes Leveraging Pretrained 2D Foundation Models 3D医疗量利用预先训练的2D基础模型 2507.08254v1 -
1206 07-11 Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs Algorithmische Kontiguität aus Low-Grad-Konjektur und Anwendungen in korrelierten Zufallsgraphen 低度推测和相关随机图中应用的低度推断和 2502.09832v3 -
1207 07-11 Thinner Latent Spaces: Detecting Dimension and Imposing Invariance with Conformal Autoencoders Dünnere Latent Spaces: Dimension erkennen und Invarianz mit konformen Autoencodern imposieren 细边空格: 检测尺寸和与普通自动编码器的不协调情况 2408.16138v2 -
1208 07-11 SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths SpecDec++: Spekulative Dekodierung durch adaptive Kandidatenlängen steigern SpecDec++:通过适应性候选时间长度促进投机替代 2405.19715v3 -
1209 07-11 PAC-Bayes Analysis for Recalibration in Classification PAC-Bayes Analyse zur Rekalibrierung in der Klassifizierung PAC-Bayes分类重新计算分析 2406.06227v2 -
1210 07-11 Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification Transfer-Lernen und Mischen für die feinkörnige Wenig-Hot-Fungi-Klassifikation 微粒少沙托菌菌类分类的转移学习和混合学习和混合 2507.08248v1 -
1211 07-11 A Survey on State-of-the-art Deep Learning Applications and Challenges Eine Umfrage zu aktuellen Anwendungen und Herausforderungen des Deep Learning 关于最先进的深深学习应用和挑战的调查 2403.17561v8 -
1212 07-11 CoreSPECT: Enhancing Clustering Algorithms via an Interplay of Density and Geometry CoreSPECT: Verbesserung der Clustering-Algorithmen durch ein Interplay von Dichte und Geometrie 核心内容:通过密度和几何的相互作用加强群集比 2507.08243v1 -
1213 07-11 An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems Ausblick auf die Chancen und Herausforderungen multiagenter KI-Systeme 关于多机构AI系统机会和挑战的展望 2505.18397v2 -
1214 07-11 Exploring Gender Differences in Chronic Pain Discussions on Reddit Erforschung geschlechtsspezifischer Unterschiede bei chronischen Schmerzdiskussionen auf Reddit 探讨关于康复的慢性疼痛讨论中的性别差异 2507.08241v1 -
1215 07-11 On the Principles of ReLU Networks with One Hidden Layer Über die Prinzipien von ReLU-Netzwerken mit einer verborgenen Ebene 关于 “ 同一层 “ RELU网络原则 2411.06728v2 -
1216 07-11 Data Generation without Function Estimation Datenerstellung ohne Funktionsabschätzung 无函数估算的生成数据 2507.08239v1 -
1217 07-11 Self-Supervised Learning-Based Multimodal Prediction on Prosocial Behavior Intentions Selbstüberwachte multimodale Lernvorhersage über prosoziale Verhaltensabsichten 对有利社会行为行为的自我监督学习的多模式预测 2507.08238v1 -
1218 07-11 InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems Insight 建筑:智能建筑系统中的LLM能动原因推理 2507.08235v1 -
1219 07-10 (4) Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning Verdolmetschen von großen Text-zu-Bild-Diffusions-Modellen mit Wörterbuch-Lernen 解释具有字典学习的大文本到图像传播模型 2505.24360v3 -
1220 07-10 ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation ZKTorch: Kompilieren von ML-Inferenz zu Null-Wissens-Proofs durch parallele Proof-Kumulation ZKTorch:通过平行证据累积,将ML推论编成零知识证据 2507.07031v2 -
1221 07-10 Extracting memorized pieces of (copyrighted) books from open-weight language models Extrahieren von auswendig gelernten Stücken von Büchern aus Open-Wight-Sprachmodellen 从开放重量级语言模式中提取(复印权)书籍 2505.12546v2 -
1222 07-10 On the Necessity of Output Distribution Reweighting for Effective Class Unlearning Über die Notwendigkeit der Neugewichtung der Output-Distribution für effektives Klassenunlernen 有效班级取消学习时必须增加产出分配的加权 2506.20893v2 -
1223 07-10 EvA: Evolutionary Attacks on Graphs EvA: Evolutionäre Angriffe auf Graphen EvA:对图表的进化攻击 2507.08212v1 -
1224 07-10 Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding Deep Learning-based Forecasting von Boarding-Patienten zählt ED Overcrowding Adresse 对寄宿病人计数进行深入的基于学习的预测,以解决ED过度拥挤问题 2505.14765v2 -
1225 07-10 Signed Diverse Multiplex Networks: Clustering and Inference Signierte Vielfältige Multiplex-Netzwerke: Clustering und Schlussfolgerung 已签署的多元多重网络:集群和推断 2402.10242v3 -
1226 07-10 Compositional Risk Minimization Zusammensetzungelle Risikominimierung 尽量减少风险 2410.06303v3 -
1227 07-10 EP-GAT: Energy-based Parallel Graph Attention Neural Network for Stock Trend Classification EP-GAT: Energiebasierte parallele Graphen-Achtung Neuronales Netzwerk für die Bestandstrendklassifikation EP-GAT:基于能源的库存趋势分类平行图形关注神经网络 2507.08184v1 -
1228 07-10 Shifting Work Patterns with Generative AI Verschiebende Arbeitsmuster mit generativer KI 生成式人工智能带来的工作模式转变 2504.11436v3 -
1229 07-10 Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm Cloud Computing Energieverbrauch Vorhersage auf Basis von Kernel Extreme Learning Machine Algorithm Verbessert durch Vector Gewichteter Durchschnitt Algorithm 以内核极端学习机器算法为基础,用矢量加权平均算法改进的云计算 云能消耗预测值 2503.04088v3 -
1230 07-10 Parametrized Quantum Circuit Learning for Quantum Chemical Applications Parametrisiertes Quantum Circuit Lernen für Quantum Chemical Anwendungen 量子化学应用量子电路学习 2507.08183v1 -
1231 07-10 State Estimation Using Sparse DEIM and Recurrent Neural Networks Staatliche Schätzung mit Sparse DEIM und recurrenten Neuronalen Netzwerken 使用简缩的DEIM和经常性神经网络的状态估计 2410.15982v2 -
1232 07-10 CTRLS: Chain-of-Thought Reasoning via Latent State-Transition CTRLS: Gedankliche Veranlagung durch Latent State-Transition CTRLS:通过中端国家-过渡进行的研究链理由 2507.08182v1 -
1233 07-10 Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations Wissenschaftliches maschinelles Lernen chaotischer Systeme entdeckt regierende Gleichungen für neurale Bevölkerungen 神经人口等分的麻风系统发现科学机学 2507.03631v2 -
1234 07-10 Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity Spatio-Temporale Anomalie-Erkennung neu denken: Eine Vision für ursächliche Cybersicherheit 重新思考时空空间异常探测:驱动力-驱动网络安全愿景 2507.08177v1 -
1235 07-10 Emotion Recognition in Older Adults with Quantum Machine Learning and Wearable Sensors Emotionserkennung bei älteren Erwachsenen mit Quantum Machine Learning und tragbaren Sensoren 具有量子机器学习和穿戴感应器的老年人的情感认同 2507.08175v1 -
1236 07-10 Reconstructing Galaxy Cluster Mass Maps using Score-based Generative Modeling Rekonstruieren von Galaxy Cluster Massenkarten mit Score-basierte Generative Modellierung 使用计分生成模型重建银河群群群地图 2410.02857v2 -
1237 07-10 Emotion Detection in Older Adults Using Physiological Signals from Wearable Sensors Emotionserkennung bei älteren Erwachsenen mit physiologischen Signalen von tragbaren Sensoren 使用穿戴感应器的生理信号在老年人体内检测情感 2507.08167v1 -
1238 07-10 Grokking Beyond the Euclidean Norm of Model Parameters Grokking jenseits der euklidischen Norm von Modellparametern 示范参数欧洲标准 2506.05718v2 -
1239 07-10 Adaptive Diffusion Denoised Smoothing : Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion Adaptive Diffusion Denoised Glättung : Zertifizierte Robustheit durch Randomized Glättung mit Differential Private Guided Denoising Diffusion 适应性扩散 脱节滑动:通过有差异的私人制导滑动,通过随机化滑动,证明强力 2507.08163v1 -
1240 07-10 Hybrid machine learning based scale bridging framework for permeability prediction of fibrous structures Hybrides maschinelles Lernen auf Basis von Skalenüberbrückungsrahmen für die Permeabilitätsvorhersage von faserigen Strukturen 用于预测纤维结构渗透性的混合机 机床学习比例过渡框架 2502.05044v2 -
1241 07-10 Just Read the Question: Enabling Generalization to New Assessment Items with Text Awareness Lesen Sie einfach die Frage: Ermöglichung der Generalisierung zu neuen Bewertungsgegenständen mit Text-Bewusstsein 只需读一读问题:在有文本意识的情况下,使新的评估项目能够普遍化。 2507.08154v1 -
1242 07-10 ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction ALCo-FM: Adaptives Long-Context-Stiftungsmodell für Unfallvorhersage ALCO-FM:适应性长全文基金会事故预测模型 2507.08153v1 -
1243 07-10 Downscaling Extreme Precipitation with Wasserstein Regularized Diffusion Downscaling Extreme Niederschlag mit Wasserstein Regularized Diffusion 利用瓦塞斯坦正则化扩散对极端降水进行降尺度 2410.00381v3 -
1244 07-10 CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk CLEAR: Kalibriertes Lernen für epistemisches und aleatorisches Risiko CLEAR: 流行和感知风险校准学习 2507.08150v1 -
1245 07-10 Convergence of Natural Policy Gradient for a Family of Infinite-State Queueing MDPs Konvergenz des Gradienten der Naturpolitik für eine Familie unendlicher Staaten, die MDPs in Anspruch nehmen 自然政策 “ 进步 “ 与 “ 无限国家排队多DP家庭 “ 的趋同 2402.05274v3 -
1246 07-10 UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching UmbraTTS:用流匹配使文本转语音适应环境语境 2506.09874v2 -
1247 07-10 Stochastic Operator Network: A Stochastic Maximum Principle Based Approach to Operator Learning Stochastic Operator Network: Ein stochastisches Maximum-Prinzip basierte Ansatz zum Operator-Lernen 存储操作员网络:运行操作员学习的存储最大原则性方法 2507.10401v1 -
1248 07-10 Graph Convolutional Branch and Bound Graphischer konvolutionärer Zweig und Bound 革命分支和圆环 2406.03099v3 -
1249 07-10 Assessing the Chemical Intelligence of Large Language Models Bewertung der chemischen Intelligenz großer Sprachmodelle 评估大语言模型的化学情报 2505.07735v2 -
1250 07-10 Physics-Informed Neural Networks with Hard Nonlinear Equality and Inequality Constraints Physik-informierte neurale Netzwerke mit harten nichtlinearen Gleichstellungs- und Ungleichheitsbeschränkungen 具有硬非线性平等和不平等制约因素的物理内立神经网络 2507.08124v1 -
1251 07-10 Quasi-Random Physics-informed Neural Networks Quasi-Random Physik-informierte Neuronale Netzwerke 准环境网 物理-知情神经网络 2507.08121v1 -
1252 07-10 PDE-aware Optimizer for Physics-informed Neural Networks PDE-aware Optimizer für physikinformierte Neuronale Netzwerke PDE-觉醒物理知情神经网络优化器 2507.08118v1 -
1253 07-10 Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation Mallows-Modell mit Lerndistanz-Metriken: Probenahme und maximale Likelihood-Schätzung 边远计量:抽样和最大可能性估计 2507.08108v1 -
1254 07-10 Predicting Flow Dynamics using Diffusion Models Vorhersage von Strömungsdynamiken mit Diffusionsmodellen 利用传播模型预测流动动态 2507.08106v1 -
1255 07-10 PIAD-SRNN: Physics-Informed Adaptive Decomposition in State-Space RNN PIAD-SRNN: Physik-informierte Adaptive Zersetzung im State-Space RNN PIAD-SRNN: 国家空间空间网中的物理系统化适应性分解 2412.00994v2 -
1256 07-10 Low-rank Momentum Factorization for Memory Efficient Training Low-rank Momentum Factorization für ein speichereffizientes Training 记忆高效培训的低调动力化 2507.08091v1 -
1257 07-10 Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models Auswirkungen von Pretraining Word Co-occurrence auf die kompositorische Generalisierung in multimodalen Modellen 预训练词共现对多模态模型组合泛化的影响 2507.08000v1 -
1258 07-10 Single-pass Adaptive Image Tokenization for Minimum Program Search Single-Pass Adaptive Image Tokenization für minimale Programmsuche 用于最低程序搜索的单一被动图像适配 2507.07995v1 -
1259 07-10 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Überspringen Sie eine Ebene oder Schleifen Sie es? Test-Zeit Tiefe Anpassung von vorgebildeten LLMs 跳过图层或循环它? 预设 LLM 的测试时间深度适应 2507.07996v1 -
1260 07-10 Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012 Verwendung von KI zur Zusammenfassung der US-Präsidentschaftskampagne TV-Werbung Videos, 1952-2012 利用人工智能总结1952-2012年美国总统竞选电视广告视频 2503.22589v2 -
1261 07-10 Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions Quantile Reward Policy Optimierung: Ausrichtung mit punktweisen Regressions- und Exaktpartitionsfunktionen 量化奖退利政策优化:与点回归和精密分区函数一致 2507.08068v1 -
1262 07-10 KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors KinDEL: DNA-kodierter Bibliotheks-Datensatz für Kinase-Inhibitoren KinDEL: 激酶抑制剂的DNA编码库数据集 2410.08938v2 -
1263 07-10 Why is Your Language Model a Poor Implicit Reward Model? Warum ist Ihr Sprachmodell ein schlechtes Implizit-Reward-Modell? 为什么您的语言模式 是一个贫穷的隐含奖赏模式? 2507.07981v1 -
1264 07-10 Prospective Learning in Retrospect Zukunftsorientiertes Lernen im Nachhinein 回溯中的未来学习 2507.07965v1 -
1265 07-10 TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices TinierHAR: Auf dem Weg zu ultraleichten Deep-Learning-Modellen für effiziente menschliche Aktivitätserkennung auf Edge-Geräten TinierHAR:迈向超轻量深深学习模型,以便有效识别人类在边缘装置方面的活动 2507.07949v1 -
1266 07-10 BarcodeBERT: Transformers for Biodiversity Analysis BarcodeBERT: Transformer für Biodiversitätsanalyse 条码BERT:生物多样性分析变异器 2311.02401v3 -
1267 07-10 Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice Towards Continuous Home Cage Monitoring: Eine Bewertung von Tracking- und Identifikationsstrategien für Labor-Mäuse 逐步实现家用钥匙持续监测:对实验室老鼠跟踪和识别战略的评价 2507.07929v1 -
1268 07-10 A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search Eine Theorie der Schlussfolgerung Berechnung Scaling: Vernunft durch gerichtete stochastische Fähigkeiten Suche 推断计算尺度理论论:通过定向斯托卡技能搜索推理 2507.00004v2 -
1269 07-10 No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning Keine $D_{\text{train}}$: Modell-agnostische Gegenfaktische Erklärungen mit Verstärkungslernen 无 $D_{\text{train}}$:利用强化学习模型-不可允许的反事实解释 2405.18563v2 -
1270 07-10 Plausible Counterfactual Explanations of Recommendations Plausible gegenfaktische Erklärungen der Empfehlungen 对建议的反事实解释 2507.07919v1 -
1271 07-10 A statistical physics framework for optimal learning Statistischer Physikrahmen für optimales Lernen 促进最佳学习的统计物理框架 2507.07907v1 -
1272 07-10 Agentic Retrieval of Topics and Insights from Earnings Calls Agentische Retrieval von Themen und Erkenntnisse aus Earnings Calls 收入呼吁的主题和透视的 Agent 检索 2507.07906v1 -
1273 07-10 Enhancing Cross Entropy with a Linearly Adaptive Loss Function for Optimized Classification Performance Verbesserung der Kreuzentropie mit einer linearen adaptiven Verlustfunktion für optimierte Klassifizierungsleistung 优化分类绩效的线性适应性损失函数 2507.10574v1 -
1274 07-10 Efficient Causal Discovery for Autoregressive Time Series Effiziente Causal Discovery für autoregressive Zeitreihen 自动递减时间序列高效因果发现 2507.07898v1 -
1275 07-10 Sampling Imbalanced Data with Multi-objective Bilevel Optimization Probenahme ausgewogener Daten mit multi-objektiver Bilevel-Optimierung 具有多目标双一级最佳优化的数据 2506.11315v2 -
1276 07-10 Masked Image Modeling: A Survey Maskenbildmodellierung: Eine Umfrage 蒙面图像建模:调查 2408.06687v3 -
1277 07-10 A Bilevel Optimization Framework for Imbalanced Data Classification Ein Bilevel-Optimierungsrahmen für die unausgewogene Datenklassifikation 平衡数据分类双级优化框架 2410.11171v3 -
1278 07-10 UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs UnIT: Skalierbare unstrukturierte Schlussfolgerungs-Zeit-Rechnung für MAC-effiziente Neuralinferenz auf MCUs UnIT:MAC 高效神经引力对多边协调单位的可缩放无结构的推推力-时间节制 2507.07885v1 -
1279 07-10 Can AI-predicted complexes teach machine learning to compute drug binding affinity? Können KI-vorhergesehene Komplexe maschinelles Lernen beibringen, um Arzneimittelbindungsaffinität zu berechnen? 人工智能预测综合体能教机器学习如何计算药物绑定的亲缘关系吗? 2507.07882v1 -
1280 07-10 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models Was hat ein Stiftungsmodell gefunden? Mit induktiven Bias zur Untersuchung von Weltmodellen ” 基金会模式 “ 有何发现? 2507.06952v2 -
1281 07-10 Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models Edge-ASR: Auf dem Weg zur Low-Bit Quantisierung von automatischen Spracherkennungsmodellen 边缘-ASR:实现自动语音识别模式的低比量量化 2507.07877v1 -
1282 07-10 Fair Uncertainty Quantification for Depression Prediction Faire Unsicherheit Quantifizierung für Depression Vorhersage 预测萧条预测的公平不确定性量化 2505.04931v2 -
1283 07-10 Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v1 -
1284 07-10 Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking Abmildernde Wasserzeichen-Stealing-Angriffe in generativen Modellen über Multi-Key-Wasserzeichen 通过多钥匙划水标记,在产生模型时通过多钥匙划水标记减轻盗用盗用水标志袭击 2507.07871v1 -
1285 07-10 Parametric Scaling Law of Tuning Bias in Conformal Prediction Parametrisches Skalierungsgesetz des Tuning Bias in konformer Vorhersage 非正规预测中计票比价的参数衡量法 2502.03023v2 -
1286 07-10 Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders Re-Bottleneck: Latent Re-Structuring für Neural Audio Autoencoder 重新装瓶器:神经音频自动自动编码器前端重新结构 2507.07867v1 -
1287 07-10 Predicting and generating antibiotics against future pathogens with ApexOracle Vorhersage und Generierung von Antibiotika gegen zukünftige Krankheitserreger mit ApexOracle 预测并产生抗生素,用ApexOracle来防治未来的病原体 2507.07862v1 -
1288 07-10 Studying and Improving Graph Neural Network-based Motif Estimation Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung 研究和改善图形神经网络基于Motif 估计 2506.15709v3 -
1289 07-10 Principled Foundations for Preference Optimization Prinzipierte Grundlagen für die Preference-Optimierung 最优化原则基金会 2507.07855v1 -
1290 07-10 Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v1 -
1291 07-10 Optimization Guarantees for Square-Root Natural-Gradient Variational Inference Optimierungsgarantien für Square-Root Natural-Gradient Variational Inferenz 平方-极极自然-梯度变动性推断的最佳保障 2507.07853v1 -
1292 07-10 Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective Pre-Trained AI Model Assisted Online Entscheidungsfindung unter fehlenden Kovariaten: Eine theoretische Perspektive 在失踪的共变之下协助作出在线决策的模式:理论视角 2507.07852v1 -
1293 07-10 Revisiting the Predictability of Performative, Social Events Über die Vorhersagbarkeit von performativen, gesellschaftlichen Veranstaltungen 重新审视表演性、社会活动的可预测性 2503.11713v2 -
1294 07-10 “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents “So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents “告诉我你们的政策……:从深强化学习机构那里提炼可解释的政策”。 2507.07848v1 -
1295 07-10 Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities Response Wide Shut? Überraschende Beobachtungen in grundlegenden Vision Sprachmodell Fähigkeiten 在基本愿景语言模型能力中的令人惊讶的观察 2507.10442v1 -
1296 07-10 Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components Bewertung der Einhaltung der Hierarchischen Sicherheitsgrundsätze durch LLM-Agenten: Ein leichter Maßstab für die Erprobung grundlegender Steuerungskomponenten 遵守等级安全原则:基础控制组成部分检验的轻量基准 2506.02357v2 -
1297 07-10 Unsupervised Morphological Tree Tokenizer Unüberwachter morphologischer Baum Tokenizer 不受监督的病理树化器 2406.15245v2 -
1298 07-10 Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model Statistische Physik-Analyse von Graphen-Neuronalen-Netzwerken: Annäherung an die Optimität im kontextuellen stochastischen Blockmodell 图形神经网络的统计物理学分析:在背景随机区块模型中接近最佳性 2503.01361v2 -
1299 07-10 Towards Benchmarking Foundation Models for Tabular Data With Text Auf dem Weg zu Benchmarking-Grundlagenmodellen für tabellarische Daten mit Text 建立文字表格数据基准基准基础模型 2507.07829v1 -
1300 07-10 An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications Eine empirische Bernsteinungleichheit für abhängige Daten in Hilbert-Räumen und Anwendungen 希尔伯特空间和应用中依赖数据方面的不平等问题 2507.07826v1 -
1301 07-10 Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution Symmetrie entdecken Breaking in Physical Systems mit entspannter Gruppenkonvolution 发现物理系统中的对称断裂与放松的集团革命 2310.02299v8 -
1302 07-10 MAEBE: Multi-Agent Emergent Behavior Framework MAEBE: Multi-Agent Emergent Behavior Framework 多边代理新兴行为框架 2506.03053v2 -
1303 07-10 An Algorithm for Learning Smaller Representations of Models With Scarce Data Ein Algorithmus für das Lernen kleinerer Darstellungen von Modellen mit knappen Daten 学习缺乏数据模型较小比例模型的计算方法 2010.07990v2 -
1304 07-10 AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift KI sollte besser fühlen, nicht nur größer skalieren: Adaptive Sensing als Paradigmenverschiebung AI 应当更好,而不仅仅是规模更大:将适应性遥感作为范式转变 2507.07820v1 -
1305 07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving MoSE: Skill-by-Skill-Mixture-of-Expert-Lernen für autonomes Fahren MOSE: 自主驾驶专家技能与技能混合学习 2507.07818v1 -
1306 07-10 Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers Achten Sie auf Aufmerksamkeit Verteilung: Eine neue lokale Lipschitz Bound für Transformatoren ” 注意注意分发 “ : “ 变革者新地方利普施奇茨圆环 “ 。 2507.07814v1 -
1307 07-10 “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models “I am bad”: Verdolmetschen von Stealthy, Universal und Robust Audio Jailbreaks in Audio-Language-Modellen “我是坏人”:在音频语言模型中解释隐形、通用和强势音频牢房破损 2502.00718v2 -
1308 07-10 Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks Tiefe Überlebensanalyse in multimodalen medizinischen Daten: Ein parametrischer und probabilistischer Ansatz mit kompetitiven Risiken 多模式医疗数据深度生存分析:与相竞风险的参数和概率分析方法 2507.07804v1 -
1309 07-10 Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning Kontextuelle Banditen in der Zahlungsabwicklung: Nicht einheitliche Exploration und überwachtes Lernen 付款处理:非统一探索和监督学习 2412.00569v2 -
1310 07-10 Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models Raumfüllende Regularisierung für robuste und interpretierbare nichtlineare State Space Modelle 强力和可解释的非线性国家空间模型的空间巡空常规化 2507.07792v1 -
1311 07-10 Understanding Chain-of-Thought in LLMs through Information Theory Verständnis der in LLMs durch Informationstheorie gesuchten Gedankenkette 通过信息理论在LLM 中探索了解链 2411.11984v2 -
1312 07-10 Unsupervised Automata Learning via Discrete Optimization Unüberwachtes Automata-Lernen über Diskrete Optimierung 通过 Discrete 优化化学习不受监督的自动自动数据 2303.14111v2 -
1313 07-10 Learning Algorithms in the Limit Algorithmen lernen an der Grenze 在限制范围内学习算法 2506.15543v2 -
1314 07-10 Approximation Depth of Convex Polytopes Näherungstiefe von Konvex-Polytopen 电解多面的近似深度 2507.07779v1 -
1315 07-10 Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten 同步任务行为: 测试时训练中对齐多个任务 2507.07778v1 -
1316 07-10 Deep Learning is Not So Mysterious or Different Deep Learning ist nicht so geheimnisvoll oder anders 深深学习不是那么神秘或不同 2503.02113v2 -
1317 07-10 A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision Ein einheitliches empirisches Risikominimierungs-Framework für flexible N-Tuples Schwache Überwachung 灵活N-Tuples弱监督统一经验风险最小化框架 2507.07771v1 -
1318 07-10 BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v1 -
1319 07-10 TRIX- Trading Adversarial Fairness via Mixed Adversarial Training TRIX- Trading-Adversarial Fairness durch gemischte Adversarial Training TRIX-通过混合反向培训进行贸易反向公平 2507.07768v1 -
1320 07-10 Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape Verteilte und dezentralisierte Ausbildung: Technische Governance-Herausforderungen in einer sich verändernden KI-Landschaft 分散和分散化培训:AI 横向变化中的技术治理挑战 2507.07765v1 -
1321 07-10 OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v1 -
1322 07-10 Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks Effiziente und skalierbare Abschätzung der Verteilungseffekte mit multi-Task Neuronalen Netzwerken 与多任务神经神经网络一道高效和可缩放地估算分布式治疗效应 2507.07738v1 -
1323 07-10 GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing GuardVal: Dynamic Large Language Model Jailbreak Evaluation für umfassende Sicherheitstests 警卫:综合安全测试动态大语言示范监狱防爆评价 2507.07735v1 -
1324 07-10 Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model Robuste, federführende, personalisierte mittlere Schätzung für das Gaussian Mixture Model Gaussian Mixture 模型的联邦硬性个人化平均平均估计值 2504.19955v2 -
1325 07-10 Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization Stabile Preference-Optimierung für LLMs: Ein zweistufiger Ansatz über die direkte Preference-Optimierung hinaus LLM的稳定偏好优化:超越直接偏好优化的双层方法 2507.07723v1 -
1326 07-10 Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v6 -
1327 07-10 Discrete Optimal Transport and Voice Conversion Diskreter Optimaler Transport und Sprachumwandlung 分辨最佳传输和语音转换 2505.04382v2 -
1328 07-10 Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v1 -
1329 07-10 Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning Ausbalancieren der Vergangenheit und Gegenwart: Ein koordiniertes Replay-Framework für das Federated Class-Incremental Learning 平衡过去和现在的平衡:联邦级强化学习协调重现框架 2507.07712v1 -
1330 07-10 Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors Shapley-based Data Valuation mit gegenseitiger Information: Ein Schlüssel zu veränderten K-Nächsten Nachbarn 与相互信息一起进行基于虚光的数据估值:修改 K- 最近邻的密钥 2312.01991v4 -
1331 07-10 Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Rationale-Enhanced Decodierung für multimodale Chain-of-Thought 多式联运谈判链附加说明 2507.07685v1 -
1332 07-10 Accelerating Transposed Convolutions on FPGA-based Edge Devices Beschleunigung transponierter Konvolutionen auf FPGA-basierten Edge-Geräten 加速基于 FPGA 的边缘设备的转换变速 2507.07683v1 -
1333 07-10 Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis Jenseits von Cox-Modellen: Bewertung der Leistungsfähigkeit von Machine-Learning-Methoden bei nichtproportionalen Gefahren und nichtlinearer Überlebensanalyse 超越考克斯模型:评估机器学习方法在非季节性危险和无林性生存分析方面的性能 2504.17568v2 -
1334 07-10 Implicit Counterfactual Data Augmentation for Robust Learning Implizite gegenfaktische Datenvergrößerung für robustes Lernen 强力学习所需的反事实数据放大 2304.13431v4 -
1335 07-10 Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Wide-ReLU-Netzwerken 关于有限宽度 RELU 网络中多层有效尺寸振动的一些理论结果 2507.07675v1 -
1336 07-10 Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL Uncovering RL Integration in SSL Loss: Zielspezifische Implikationen für dateneffiziente RL SSL损失中未覆盖的 RL 整合:对数据高效RL的客观具体影响 2410.17428v3 -
1337 07-10 Curriculum Negative Mining For Temporal Networks Curriculum Negative Mining für zeitliche Netzwerke 时间网络负面采矿课程 2407.17070v2 -
1338 07-10 Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor Machine Learning-Assisted Surrogate Modellierung mit multi-objektiver Optimierung und Entscheidungsfindung eines Dampfmethan-Reformreaktors 利用蒸气甲烷改造反应堆的多目标优化和决策 2507.07641v1 -
1339 07-10 HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric HLF-FSL. Eine dezentrale, föderierte Split-Learning-Lösung für IoT auf Hyperledger Fabric HLF-FSL. 基于Hyperledger Fabric的物联网去中心化联邦拆分学习解决方案 2507.07637v1 -
1340 07-10 Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights Vergleichende Stimmungsanalyse der öffentlichen Wahrnehmung: Monkeypox vs. COVID-19 Verhaltenseinblicke 对公众感知的比较情绪分析:猴痘与COVID-19的行为洞察 2505.07430v2 -
1341 07-10 Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks Erforschung der Grenzen der Modellkompression in LLMs: Eine Studie zur Wissensdestillation über QA-Aufgaben 探索LLMM中模型压缩的限度:关于质量保证任务的知识积累研究 2507.07630v1 -
1342 07-10 TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection TransformEEG: Auf dem Weg zur Verbesserung der Modellgeneralisierbarkeit bei der Deep-Learning-basierten EEG-Erkennung der Parkinson-Krankheit TransformEEG:提升基于深度学习的脑电图帕金森病检测模型的泛化能力 2507.07622v1 -
1343 07-10 Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation Sparse Causal Discovery mit generativer Intervention für unüberwachte Graphen-Domänenanpassung 以未受监督的图形域适应的生成干预生成的简单原因发现 2507.07621v1 -
1344 07-10 Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0 Sparse Selbstgebundenes Lernen für energieeffiziente kooperative Intelligenz in der Gesellschaft 5.0 社会节能合作情报学会 2507.07613v1 -
1345 07-10 S2FGL: Spatial Spectral Federated Graph Learning S2FGL: Raumspektrales Federiertes Graphenlernen S2FGL: 空间光谱联邦图表学习 2507.02409v2 -
1346 07-10 Offline Trajectory Optimization for Offline Reinforcement Learning Offline-Trajektorienoptimierung für Offline-Verstärkungslernen 离线轨迹优化用于离线强化学习 2404.10393v2 -
1347 07-10 Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis Synthetische MC über biologische Transmitter: Therapeutische Modulation der Gut-Brain-Achse 通过生物传播器进行MC:古特脑轴体的治疗变化 2507.07604v1 -
1348 07-10 Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning Drücken Sie nicht auf den Knopf! Erforschen von Daten Leckage Risiken im maschinellen Lernen und Transfer Lernen 不要按按钮! 探索机器学习和传输学习中的数据泄漏风险 2401.13796v4 -
1349 07-10 Context Pooling: Query-specific Graph Pooling for Generic Inductive Link Prediction in Knowledge Graphs Kontextpooling: Abfragespezifische Graphenpooling für generische Induktive Link-Vorhersage in Wissensgraphen 背景集合:知识图中通用感应链接预测的查询特定图集 2507.07595v1 -
1350 07-10 Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations Überprüfung der Likelihood-basierten Out-of-Distribution-Erkennung durch Modellierung von Repräsentationen 通过建模代表机构重新审视以可能性为基础的分销外探测 2504.07793v3 -
1351 07-10 Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data Stressüberwachung im Gesundheitswesen: Ein Ensemble Machine Learning Framework mit tragbaren Sensordaten 保健中压力监测:使用穿戴感感应数据的综合机械学习框架 2507.07589v1 -
1352 07-10 Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench Jenseits von Überkorrektur: Bewertung von Diversität in T2I-Modellen mit DivBench 超越过度纠正:在DivBench的T2I模型中评估多样性 2507.03015v2 -
1353 07-10 Improving Clustering on Occupational Text Data through Dimensionality Reduction Verbesserung der Clusterbildung auf berufsbezogenen Textdaten durch Dimensionalitätsreduzierung 通过减少分量改进职业文本数据集群化 2507.07582v1 -
1354 07-10 CHOMET: Conditional Handovers via Meta-Learning CHOMET: Bedingte Übergaben über Meta-Learning CHOMET: 通过元学习实现条件切换 2507.07581v1 -
1355 07-10 COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation COALA: Numerisch stabiles und effizientes Framework für kontextabhängige Low-Rank-Annäherung COALA: 低 Rank 上下低敏度接近度的数值稳定、高效框架 2507.07580v1 -
1356 07-10 On Trustworthy Rule-Based Models and Explanations Über vertrauenswürdige regelbasierte Modelle und Erklärungen 关于可信赖、有可信赖的、基于规则的模型和解释 2507.07576v1 -
1357 07-10 Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning Künstliche Generäle Intelligenz: Meistern von Generälen.io mit Verstärkungslernen 人造将军情报:掌握将军,加强学习 2507.06825v2 -
1358 07-10 Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound Lösung probabilistischer Verifikationsprobleme von neuralen Netzen mittels Branch und Bound 利用分支和边界解决神经网络的概率核查问题 2405.17556v3 -
1359 07-10 Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series Echtzeit-Dekorrelation-basierte Anomalieerkennung für multivariate Zeitreihen 用于多变量时间序列的基于实时显示关系异常探测 2507.07559v1 -
1360 07-10 TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference TokenWeave: Effiziente Compute-Communication Overlap für verteilte LLM-Inferenz TokenWeave: 有效计算分布式LLM 推理的通信重叠 2505.11329v2 -
1361 07-10 LARP: Learner-Agnostic Robust Data Prefiltering LARP: Learner-Agnostic Robuste Datenvorfilterung LARP: 学习者-不可知强力数据预过滤 2506.20573v3 -
1362 07-10 Position: We Need An Algorithmic Understanding of Generative AI Position: Wir brauchen ein algorithmisches Verständnis der Generativen KI 立场:我们需要对 “ 创造的人工智能 “ 的定量理解。 2507.07544v1 -
1363 07-10 Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series Nicht falsch machen: Wie man tiefe visuelle Interpretationen auf Zeitreihen anwendet 不要误会我: 如何将深视判读应用到时间序列 2203.07861v3 -
1364 07-10 Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models Thought Crime: Hintertüren und Emergent-Missausrichtung in vernünftigen Modellen 思想犯罪:后门和合理理由模型中新出现的不协调现象 2506.13206v2 -
1365 07-10 Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process Ableitung von Output-Korrelations-Schlussfolgerungen für Multi-Output (aka Multi-Task) Gaussian-Prozess 多种产出(又称多任务)的多产出(高斯)进程输出相关关系推断的衍生结果 2501.07964v4 -
1366 07-10 Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer Testen der Spin-Bad-Sicht auf Selbstaufmerksamkeit: Eine Hamiltonian Analyse von GPT-2 Transformer 检验自注意力的自旋浴观点:GPT-2 Transformer的哈密顿量分析 2507.00683v3 -
1367 07-10 Recurrent U-Net-Based Graph Neural Network (RUGNN) for Accurate Deformation Predictions in Sheet Material Forming Recurrent U-Net-based Graph Neural Network (RUGNN) für genaue Deformationsvorhersagen in Blattmaterialformung 经常性 U-Net-基于U-Net的制表材料成型准确变形预测的图形神经网络(RUGNN) 2507.11547v1 -
1368 07-10 Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition Robuste und effiziente Schreib-Unabhängige IMU-basierte Handschriftenerkennung 强有力和高效率的独立作家、独立作家、以IMU为基础的手写识别 2502.20954v2 -
1369 07-10 Lightweight Cloud Masking Models for On-Board Inference in Hyperspectral Imaging Leichte Cloud-Maskierungsmodelle für On-Board-Inferenzen in der Hyperspektralen Bildgebung 超光谱成像中超光谱成像中在板上推断的轻型云面遮云模型 2507.08052v1 -
1370 07-10 Divergence Minimization Preference Optimization for Diffusion Model Alignment Divergenz-Minimierungspräferenz-Optimierung für Diffusionsmodellausrichtung 用于扩散模型对齐的散度最小化偏好优化 2507.07510v1 -
1371 07-10 An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis Ein verbessertes Datenschutz-erhaltendes Föderated Few-shot Learning Framework für die Diagnose von Atemwegserkrankungen 强化的隐私保护联邦呼吸道疾病诊断学习框架 2507.08050v1 -
1372 07-10 Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation Semi-überwachtes Lernen und Integration von Multi-Sequenz-MR-Bildern für karotide Gefäßwand- und Plaquesegmentierung 在半监督下学习和整合对折合体船只壁壁和隔板的多序列MMM-图像的半监督学习和集成 2507.07496v1 -
1373 07-10 Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning Aufgabenzuweisung und Explorationsoptimierung für UAV-Rescue mit geringer Höhe über Generative KI Enhanced Multi-Agent Verstärkungs-Lernen 通过创新的AI增强型多剂强化学习,为低高空无人驾驶航空器救援工作分配任务和探索优化 2504.13554v2 -
1374 07-10 Affordable AI Assistants with Knowledge Graph of Thoughts Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken 具有知识思想知识图的负担得起的AI助理 2504.02670v5 -
1375 07-10 Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v1 -
1376 07-10 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Machine Bullshit: Charakterisieren der Emergenten Missachtung der Wahrheit in großen Sprachmodellen 机器胡说:在大语言模型中突出新人无视真相的特点 2507.07484v1 -
1377 07-10 Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences Adaptive Randomisierte Glättung: Zertifizierte Adversarial Robustheit für Multi-Step-Verteidigungen 适应性随机调整平滑:多步骤防御的证明反向强力 2406.10427v3 -
1378 07-10 Mixture of Group Experts for Learning Invariant Representations Mixtur von Gruppenexperten für Learning Invariante Repräsentationen 学习不稳定代表小组专家混合 2504.09265v2 -
1379 07-10 ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining ixi-GEN: Effiziente industrielle sLLMs durch Domain Adaptive Continual Pretraining ixi-GEN:通过远程适应性连续训练前,提高工业低温生产效率 2507.06795v2 -
1380 07-10 A Hybrid Multilayer Extreme Learning Machine for Image Classification with an Application to Quadcopters Eine Hybrid-Multilayer-Extreme-Lernmaschine für die Bildklassifizierung mit einer Anwendung auf Quadcopter 用于图像分类的混合多层极端学习机,并适用于四重拳击机 2507.08047v1 -
1381 07-10 Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals Hess-MC2: Sequentielle Monte Carlo mit Hessischen Informationen und Vorschlägen für die zweite Ordnung Hess-MC2:使用黑森信息和第二顺序提案的顺序蒙特卡洛广场 2507.07461v1 -
1382 07-10 General purpose models for the chemical sciences Allgemeine Zweckmodelle für die Chemiewissenschaften 化学科学通用模型 2507.07456v1 -
1383 07-10 C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v4 -
1384 07-10 ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense ARBoids: Adaptives Residual-Verstärkungs-Lernen mit Boids-Modell für kooperative Multi-USV-Zielverteidigung ARBoids:基于Boids模型的自适应残差强化学习,用于多USV协同目标防御 2502.18549v2 -
1385 07-10 ODIA: Oriented Distillation for Inline Acceleration of LLM-based Function Calling ODIA: Orientierte Destillation zur Inline-Beschleunigung des LLM-basierten Funktionsaufrufs ODIA:以LLM为基础的功能调用为内联加速进行定向蒸馏 2507.08877v1 -
1386 07-10 Harmonic Loss Trains Interpretable AI Models Harmonischer Verlust trainiert interpretierbare KI-Modelle 谐波损失训练可解释的 AI 模型 2502.01628v2 -
1387 07-10 Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm Probabilistische annähernde Optimierung: Eine neue Variation des Monte Carlo-Algorithmus 概率近似优化:新的变异性蒙特卡洛算法 2507.07420v1 -
1388 07-10 Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation Autonomes KI-basiertes Cybersecurity Framework für kritische Infrastruktur: Echtzeit-Bedrohungsmilderung 以AI为基础的关键基础设施自动网络安全框架:减少实时威胁 2507.07416v1 -
1389 07-10 Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks Hybride LLM-verstärkte Intrusionserkennung für Zero-Day-Bedrohungen in IoT-Netzwerken 在IoT网络零日威胁下加强入侵探测 2507.07413v1 -
1390 07-10 Determinant Estimation under Memory Constraints and Neural Scaling Laws Determinante Abschätzung unter Gedächtnisbeschränkungen und neuralen Skalierungsgesetzen 根据记忆限制和神经扩增法对决定因素进行估算 2503.04424v2 -
1391 07-10 Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models Phishing Detection in der Gen-AI Ära: Quantisierte LLMs gegen klassische Modelle Gen-AI 时代中的幻影探测:量化的LMs 与古典模型 2507.07406v1 -
1392 07-10 HGMP:Heterogeneous Graph Multi-Task Prompt Learning HGMP:Heterogenes Graph-Multi-Task-Prompt-Lernen HGMP: 异基因图多任务快速学习 2507.07405v1 -
1393 07-10 Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization Generalized Tree Edit Distance (GTED): Ein treues Bewertungsmetrikum für die Autoformalisierung von Aussagen 通用树版编辑距离(GTED):声明自动正规化的忠实评价度量 2507.07399v1 -
1394 07-10 IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing IML-Spikeformer: Multi-Level Spiking Transformer für die Sprachverarbeitung IML-Spikeex: 用于语音处理的具有投入意识的多层Spiking变换器 2507.07396v1 -
1395 07-10 Learning Collective Variables from Time-lagged Generation Kollektive Variablen aus der zeitverzögerten Generation lernen 时间滞后一代的学习集体变量 2507.07390v1 -
1396 07-10 ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction ST-GRIT: Spatio-Temporal Graph Transformer für interne Eisschichtdicke Vorhersage ST-GRIT: 内部冰层厚度预测的时空图变异器 2507.07389v1 -
1397 07-10 GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction GRIT: Graph Transformer für interne Eisschichtdicke Vorhersage GRIT: 内部冰层厚度预测的图形变形器 2507.07388v1 -
1398 07-10 HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning HeLo:情感分布学习中带有标签关联的异变多模式融合 2507.06821v2 -
1399 07-10 Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay Online Continual Learning über Spiking Neuronal Networks mit Schlaf Enhanced Latent Replay 通过Spiking神经网络在线持续学习,并配有睡眠强化前端重播 2507.02901v2 -
1400 07-10 Unifews: You Need Fewer Operations for Efficient Graph Neural Networks Unifews: Sie brauchen weniger Operationen für effiziente Graphen-Neural-Netzwerke Unifews: 高效图形神经网络需要更少操作 2403.13268v2 -
1401 07-10 User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection Benutzerbasierte sequentielle Modellierung mit Transformer-Encodern für Insider Threat Detection 以用户为基础的序列模型,使用变换器编码器进行内部威胁探测 2506.23446v2 -
1402 07-10 An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework Ein automatisierter Klassifikator schädlicher Gehirnaktivitäten für die klinische Anwendung basierend auf einem Vision-Inspired Pre-trained Framework 以 “ 愿景引导的预培训框架 “ 为基础,对临床使用的有害脑活动进行自动分类 2507.08874v1 -
1403 07-10 BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems BountyBench: Dollar-Impact von KI-Agenten-Angreifern und Verteidigern auf reale Cybersicherheitssysteme BountyBench: AI代理攻击者和防御者对现实世界网络安全系统的美元影响 2505.15216v2 -
1404 07-10 A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines Ein Multi-Granularität überwacht Kontrastive Rahmen für das Bleiben nützlicher Lebensvorhersage von Aero-Motoren 空气-发动机剩余使用寿命预测多族监督多族监督违规框架 2411.00461v3 -
1405 07-10 Bradley-Terry and Multi-Objective Reward Modeling Are Complementary Bradley-Terry und Multi-Objective Reward Modeling sind komplementär Bradley-Terry和多目标奖励建模具有互补作用 2507.07375v1 -
1406 07-10 Atherosclerosis through Hierarchical Explainable Neural Network Analysis Atherosklerose durch hierarchische erklärende neurale Netzwerkanalyse 通过可解释的神经网络分析,通过高层次解析神经网络分析,实现天体硬化 2507.07373v1 -
1407 07-10 Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification Datengesteuerte kinematische Modellierung in Soft Robots: Systemidentifikation und Unsicherheitsquantifizierung 软机器人中数据驱动的虚拟模型:系统识别和不确定性量化 2507.07370v1 -
1408 07-10 Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization Kontrastive Sprach-Image Pre-Training Modellbasierte Semantische Kommunikationsleistung Optimierung 基于语义交流交流绩效优化的示范示范 2507.08873v1 -
1409 07-10 A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning Eine kryptografische Perspektive auf Mitigation vs. Detection in Machine Learning 关于减缓与机械学习中的探测的加密视角 2504.20310v2 -
1410 07-10 Platform for Representation and Integration of multimodal Molecular Embeddings Plattform für Repräsentation und Integration multimodaler molekularer Einbettungen 多式联运分子嵌入的 代表性和一体化平台 2507.07367v1 -
1411 07-10 Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning Zielorientiertes sequentielles Bayesian Experimental Design für das kausale Lernen 以目标为导向、按顺序排列的Bayesian 因果关系学习实验设计 2507.07359v1 -
1412 07-10 Learning from positive and unlabeled examples -Finite size sample bounds Aus positiven und unmarkierten Beispielen lernen -Finite-Size-Probengrenzen 从正面和未贴标签的例子中学习 - 微小大小抽样范围 2507.07354v1 -
1413 07-10 Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience Mehrstufige MD-Workflows mit maschinellem Lernen: Die Mini-MuMMI-Erfahrung 由学习驱动的机械式学习驱动的多规模MD工作流程:微型MIMI经验 2507.07352v1 -
1414 07-10 Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts Zero-Shot-Context-Verallgemeinerung in der Verstärkung Lernen aus wenigen Trainingskontexten 从少见的培训背景中加强学习的零零零片背景概括化 2507.07348v1 -
1415 07-10 It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung 很难正常:噪音对结构-不可计量估计的影响 2507.02275v2 -
1416 07-10 Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures Viel mehr als die Summe ihrer Teile: Von statistischen zu strukturellen Mischungen 超出其部分总和:从统计到结构混合 2507.07343v1
Article 0
Title@2025-07-17 (4): Hierarchical Rectified Flow Matching with Mini-Batch Couplings
Title: Hierarchical Rectified Flow Matching with Mini-Batch Couplings | Hierarchischer rektifizierter Fluss passend zu Mini-Batch-Kupplungen | 与小批量相匹配的梯级校正流程 2507.13350v1 |
Authors (4): Yichi Zhang, Yici Yan, Alex Schwing, Zhizhen Zhao
Flow matching has emerged as a compelling generative modeling approach that is widely used across domains. To generate data via a flow matching model, an ordinary differential equation (ODE) is numerically solved via forward integration of the modeled velocity field. To better capture the multi-modality that is inherent in typical velocity fields, hierarchical flow matching was recently introduced. It uses a hierarchy of ODEs that are numerically integrated when generating data. This hierarchy of ODEs captures the multi-modal velocity distribution, just as vanilla flow matching can model a multi-modal data distribution. While this hierarchy makes it possible to model multi-modal velocity distributions, the complexity of the modeled distribution remains identical across levels of the hierarchy. In this paper, we study how to gradually adjust the complexity of the distributions across different levels of the hierarchy via mini-batch couplings. We show the benefits of mini-batch couplings in hierarchical rectified flow matching via compelling results on synthetic and imaging data. Code is available at https://riccizz.github.io/HRF_coupling.
流匹配已成为一种引人注目的生成式建模方法,被广泛应用于各个领域。为了通过流匹配模型生成数据,常微分方程(ODE)通过对建模速度场进行前向积分来数值求解。为了更好地捕捉典型速度场所固有的多模态性,最近提出了层级流匹配。它使用一个ODE层级,在生成数据时对其进行数值积分。这一ODE层级能够捕捉多模态速度分布,正如普通流匹配能够对多模态数据分布建模一样。虽然这种层级能够对多模态速度分布建模,但所建模分布的复杂性在层级的各层之间保持不变。在本文中,我们研究如何通过小批量耦合逐步调整层级不同层上分布的复杂性。我们通过合成数据和成像数据上令人信服的结果,展示了小批量耦合在层级校正流匹配中的好处。代码可在 https://riccizz.github.io/HRF_coupling 上查阅。
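The generation step described in the abstract above (numerically solving an ODE by forward integration of a velocity field) can be sketched with a fixed-step forward Euler solver. This is a minimal illustration under stated assumptions, not the authors' implementation: `euler_integrate` and the toy `toward_target` field are hypothetical names, and in practice a trained model would supply the velocity.

```python
from typing import Callable


def euler_integrate(
    velocity: Callable[[float, float], float],
    x0: float,
    num_steps: int = 100,
) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    dt = 1.0 / num_steps
    x = x0
    for k in range(num_steps):
        t = k * dt  # the last evaluated time is 1 - dt, never exactly 1
        x = x + dt * velocity(x, t)
    return x


def toward_target(target: float) -> Callable[[float, float], float]:
    """Toy velocity field (an assumption for illustration): it points from the
    current x to `target`, scaled so a straight-line path arrives at t=1."""
    def v(x: float, t: float) -> float:
        return (target - x) / (1.0 - t)
    return v


# Start from a "noise" sample at -1.0 and transport it to the target 3.0.
sample = euler_integrate(toward_target(3.0), x0=-1.0, num_steps=50)
```

A one-dimensional state keeps the sketch dependency-free; the same loop applies to vectors if `x` and the velocity output are arrays.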
Article 1
Title@2025-07-17 (4): VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Title: VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | VisionThink: Intelligentes und effizientes Vision-Sprachmodell durch Verstärkungslernen | 远景设想:通过强化学习建立聪明、高效的愿景语言模式 2507.13348v1 |
Authors (6): Senqiao Yang, Junyi Li, Xin Lai, Bei Yu, Hengshuang Zhao, Jiaya Jia
Recent advancements in vision-language models (VLMs) have improved performance by increasing the number of visual tokens, which are often significantly longer than text tokens. However, we observe that most real-world scenarios do not require such an extensive number of visual tokens. While the performance drops significantly in a small subset of OCR-related tasks, models still perform accurately in most other general VQA tasks with only 1/4 resolution. Therefore, we propose to dynamically process distinct samples with different resolutions, and present a new paradigm for visual token compression, namely, VisionThink. It starts with a downsampled image and smartly decides whether it is sufficient for problem solving. If not, the model can output a special token to request the higher-resolution image. Compared to existing efficient VLM methods that compress tokens using fixed pruning ratios or thresholds, VisionThink autonomously decides whether to compress tokens case by case. As a result, it demonstrates strong fine-grained visual understanding capability on OCR-related tasks, while saving a substantial number of visual tokens on simpler tasks. We adopt reinforcement learning and propose the LLM-as-Judge strategy to successfully apply RL to general VQA tasks. Moreover, we carefully design a reward function and penalty mechanism to achieve a stable and reasonable image resize call ratio. Extensive experiments demonstrate the superiority, efficiency, and effectiveness of our method. Our code is available at https://github.com/dvlab-research/VisionThink.
最近在视觉语言模型(VLMs)方面的进步通过增加视觉标记的数量而提升了性能,视觉标记序列往往比文本标记长得多。然而,我们注意到,大多数真实世界场景并不需要如此大量的视觉标记。虽然在与OCR有关的一小部分任务中性能显著下降,但在大多数其他通用VQA任务中,模型仅用1/4的分辨率仍能准确地完成任务。因此,我们提议以不同分辨率动态处理不同的样本,并提出一种视觉标记压缩的新范式,即VisionThink。它从下采样图像开始,并智能地判断该图像是否足以解决问题。如果不足,模型可以输出一个特殊标记来请求更高分辨率的图像。与现有使用固定剪枝比例或阈值压缩标记的高效VLM方法相比,VisionThink能够逐例自主决定是否压缩标记。因此,它在OCR相关任务上表现出很强的细粒度视觉理解能力,同时在较简单的任务上节省了大量视觉标记。我们采用强化学习,并提出LLM-as-Judge策略,将强化学习成功应用于通用VQA任务。此外,我们精心设计了奖励函数和惩罚机制,以实现稳定、合理的图像缩放调用比例。大量实验证明了我们方法的优越性、效率和有效性。代码可在 https://github.com/dvlab-research/VisionThink 查阅。
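The control flow described in the abstract above (answer from a downsampled image first, and fetch the higher-resolution image only when the model asks for it) can be sketched as follows. Everything named here is a hypothetical stand-in: the `REQUEST_HIGH_RES` token string, the `answer_adaptively` helper, and the stub model are assumptions for illustration, not the VisionThink API.

```python
from typing import Callable, Tuple

# Hypothetical special token; the real token used by the model is not given in the abstract.
REQUEST_HIGH_RES = "<request_high_res>"


def answer_adaptively(
    model: Callable[[str, str], str],
    question: str,
    low_res_image: str,
    high_res_image: str,
) -> Tuple[str, bool]:
    """Return (answer, used_high_res).

    The model is queried with the low-resolution image first. Only if it
    replies with the special request token is it queried again with the
    high-resolution image.
    """
    first = model(question, low_res_image)
    if first.strip() != REQUEST_HIGH_RES:
        return first, False
    return model(question, high_res_image), True


def stub_model(question: str, image: str) -> str:
    """Toy stand-in for a VLM: text-reading questions need the full-size image."""
    if "read" in question and image == "low":
        return REQUEST_HIGH_RES
    return f"answer({question}, {image})"


easy = answer_adaptively(stub_model, "what color is the car", "low", "high")
hard = answer_adaptively(stub_model, "read the sign", "low", "high")
```

In this sketch the cost saving comes from the first branch: questions answered from the low-resolution image never touch the larger input.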
Article 2
Title@2025-07-17 (4): Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
Title: Latent Policy Steering with Embodiment-Agnostic Pretrained World Models | Latent Policy Steering mit Embodiment-agnostischen vortrainierten Weltmodellen | 基于与具身形态无关的预训练世界模型的潜在策略引导 2507.13340v1 |
Authors (3): Yiqi Wang, Mrinal Verghese, Jeff Schneider
Learning visuomotor policies via imitation has proven effective across a wide range of robotic domains. However, the performance of these policies is heavily dependent on the number of training demonstrations, which requires expensive data collection in the real world. In this work, we aim to reduce data collection efforts when learning visuomotor robot policies by leveraging existing or cost-effective data from a wide range of embodiments, such as public robot datasets and datasets of humans playing with objects (human data from play). Our approach leverages two key insights. First, we use optic flow as an embodiment-agnostic action representation to train a World Model (WM) across multi-embodiment datasets, and fine-tune it on a small amount of robot data from the target embodiment. Second, we develop a method, Latent Policy Steering (LPS), to improve the output of a behavior-cloned policy by searching in the latent space of the WM for better action sequences. In real-world experiments, we observe significant improvements in the performance of policies trained with a small amount of data (over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations) by combining the policy with a WM pretrained on two thousand episodes sampled from the existing Open X-embodiment dataset across different robots or a cost-effective human dataset from play.
通过仿造法学习相对机体的政策在广泛的机器人领域证明是有效的。然而,这些政策的绩效在很大程度上取决于培训示范活动的数量,这需要在现实世界中收集昂贵的数据。在这项工作中,我们的目标是在学习相对机体机器人政策时减少数据收集工作,方法是利用现有或成本效益高的数据,来自各种各样的化体,例如公共机器人数据集和玩物(游戏中的人类数据)的人类玩物(游戏中的人类数据)的数据集。我们的方法利用了两个主要的洞察力。首先,我们利用光学流作为化-神学行动代表来培训多式数据集的世界模型(WM),并将它微小的机器人数据从目标成形体中微量。第二,我们开发了一种方法,即远程政策指导(LPS),通过在WM的潜伏空间搜索来改进行为组合政策的输出。在现实世界实验中,我们观察到了以少量数据(从30个演示前的相对改进50%以上世界模型,或从50多张高压版的人类数据结合了50个不同成本的开放式模型) 的政策表现有显著改善。
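The idea of searching a world model's latent space for better action sequences, as described in the abstract above, can be illustrated with a generic candidate-selection loop. This is only a sketch under loud assumptions: the abstract does not say how candidates are proposed or scored, so `rollout`, `steer`, the scalar latent, and the scoring function are all hypothetical and should not be read as the LPS algorithm itself.

```python
from typing import Callable, List, Sequence


def rollout(
    world_model: Callable[[float, float], float],
    latent: float,
    actions: Sequence[float],
) -> float:
    """Predict the final latent state after applying `actions` in order."""
    for action in actions:
        latent = world_model(latent, action)
    return latent


def steer(
    candidates: Sequence[List[float]],
    world_model: Callable[[float, float], float],
    score: Callable[[float], float],
    latent: float,
) -> List[float]:
    """Pick the candidate action sequence whose predicted outcome scores best."""
    return max(candidates, key=lambda seq: score(rollout(world_model, latent, seq)))


# Toy setup (assumption): additive latent dynamics and a goal at latent value 1.0.
toy_world_model = lambda z, a: z + a
toy_score = lambda z: -abs(z - 1.0)
policy_samples = [[0.1, 0.1], [0.5, 0.5], [1.0, 1.0]]

best = steer(policy_samples, toy_world_model, toy_score, latent=0.0)
```

Here the candidates stand in for action sequences sampled from a behavior-cloned policy; the world model only ranks them and never generates actions on its own.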
Article 3
Title@2025-07-17 (4): Training Transformers with Enforced Lipschitz Constants
Title: Training Transformers with Enforced Lipschitz Constants | Training von Transformern mit erzwungenen Lipschitz-Konstanten | 训练具有强制利普施茨常数的Transformer 2507.13338v1 |
Authors (6): Laker Newhouse, R. Preston Hess, Franz Cesista, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola
Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and overfitting. To combat these problems, past research has looked at building neural networks entirely from Lipschitz components. However, these techniques have not matured to the point where researchers have trained a modern architecture such as a transformer with a Lipschitz certificate enforced beyond initialization. To explore this gap, we begin by developing and benchmarking novel, computationally efficient tools for maintaining norm-constrained weight matrices. Applying these tools, we are able to train transformer models with Lipschitz bounds enforced throughout training. We find that optimizer dynamics matter: switching from AdamW to Muon improves standard methods – weight decay and spectral normalization – allowing models to reach equal performance with a lower Lipschitz bound. Inspired by Muon’s update having a fixed spectral norm, we co-design a weight constraint method that improves the Lipschitz vs. performance tradeoff on MLPs and 2M parameter transformers. Our 2-Lipschitz transformer on Shakespeare text reaches 60% validation accuracy. Scaling to 145M parameters, our 10-Lipschitz transformer reaches 21% accuracy on internet text. However, to match the NanoGPT baseline validation accuracy of 39.4%, our Lipschitz upper bound increases to 10^264. Nonetheless, our Lipschitz transformers train without stability measures such as layer norm, QK norm, and logit tanh softcapping.
神经网络通常对输入和重量的扰动非常敏感。 这种敏感性已经与病理学相关, 比如易受对抗性例子的影响, 不同的培训和过度适应。 为了解决这些问题, 过去的研究已经考察了完全从利普施茨组件建造神经网络。 但是, 这些技术还没有成熟到研究人员训练现代结构的地步, 比如一个具有利普施茨证书的变压器, 在初始阶段之后实施利普施茨证书。 为了探索这一差距, 我们开始开发并设定用于维持标准约束性重量矩阵的新型、 计算效率高的工具。 应用这些工具, 我们有能力在培训过程中用利普施茨界限来训练变压器模型。 我们发现优化器的动态很重要: 从AdamW切换到Muon可以改进权重衰减和谱归一化等标准方法, 使模型以更低的利普施茨界达到同等性能。 受Muon更新具有固定谱范数的启发, 我们协同设计了一种权重约束方法, 改善了MLP和200万参数变压器上利普施茨界与性能之间的权衡。 我们在莎士比亚文本上训练的2-Lipschitz变压器达到60%的验证准确率。 扩展到1.45亿参数后, 我们的10-Lipschitz变压器在互联网文本上达到21%的准确率。 然而, 要达到NanoGPT基线39.4%的验证准确率, 我们的利普施茨上界会增大到10^264。 尽管如此, 我们的利普施茨变压器无需层归一化、 QK归一化和logit tanh软截断等稳定性措施即可训练。
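To make the link between norm-constrained weight matrices and a Lipschitz bound concrete, the sketch below estimates each matrix's spectral norm by power iteration and multiplies the results. It relies on a standard fact rather than on the paper's method: for a plain stack of linear layers with 1-Lipschitz activations (such as ReLU), the product of the layers' spectral norms upper-bounds the network's Lipschitz constant. The function names are hypothetical, and attention layers, which the paper handles, are not covered here.

```python
import math
from typing import List

Matrix = List[List[float]]


def spectral_norm(w: Matrix, num_iters: int = 200) -> float:
    """Estimate the largest singular value of `w` by power iteration on w^T w.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector; a random start would be safer in general.
    """
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(num_iters):
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        v_new = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # W^T u
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v_new]
    u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| for the unit vector v


def lipschitz_upper_bound(layers: List[Matrix]) -> float:
    """Product of spectral norms: an upper bound for linear layers composed
    with 1-Lipschitz activations."""
    bound = 1.0
    for w in layers:
        bound *= spectral_norm(w)
    return bound


layers = [[[3.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 2.0]]]
bound = lipschitz_upper_bound(layers)
```

Because the bound is a product over layers, keeping every layer's spectral norm near 1 is what stops it from growing with depth.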
Article 4
Title@2025-07-17 (4): GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM
Title: GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM | GeoReg: Gewichtsbeschränkte Few-Shot-Regression für sozioökonomische Schätzung mit LLM | GeoReg:使用LLM进行社会经济估算的权重约束少样本回归 2507.13323v1 |
Authors (9): Kyeongjin Ahn, Sungwon Han, Seungeon Lee, Donghyun Ahn, Hyoshin Kim, Jungwon Kim, Jihee Kim, Sangyoon Park, Meeyoung Cha
Socio-economic indicators like regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into a linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
区域GDP、人口和教育水平等社会经济指标对决策的形成和促进可持续发展至关重要。这一研究引入了GeoReg 回归模型,该模型整合了各种数据来源,包括卫星图像和网络地理空间信息,以估算这些指标,甚至对于发展中国家等数据偏缺区域也是如此。我们的方法利用大语言模型(LLM)的先前知识,解决标签数据稀缺的问题,而LLM是数据工程师,通过提取信息功能,在微小的环境下进行有效估算。具体地说,我们的模型在数据特征和目标指标之间建立了背景关系,将其相关性分类为正数、负数、混合或无关。这些特征随后被输入线性天线性估计器,对每一类别都有特定的重量限制。为了捕捉非线性模式,该模型还确定了有意义的特征互动,并将之与非线性转变结合起来。在不同发展阶段的三个国家进行的实验表明,我们的模型在估算社会经济指标方面超过了基线,即使是数据提供有限的低收入国家也是如此。
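One plausible reading of "a linear estimator with tailored weight constraints for each category" is a least-squares fit with sign constraints, solved by projected gradient descent. The sketch below follows that reading as an explicit assumption: positive features get non-negative weights, negative features non-positive weights, mixed features are unconstrained, and irrelevant features are fixed at zero. The abstract does not specify the constraints, so `project` and `fit_constrained` are hypothetical.

```python
from typing import List


def project(weight: float, category: str) -> float:
    """Project one weight onto the constraint set of its category (assumed mapping)."""
    if category == "positive":
        return max(weight, 0.0)
    if category == "negative":
        return min(weight, 0.0)
    if category == "irrelevant":
        return 0.0
    return weight  # "mixed": left unconstrained


def fit_constrained(
    X: List[List[float]],
    y: List[float],
    categories: List[str],
    lr: float = 0.1,
    steps: int = 5000,
) -> List[float]:
    """Minimize mean squared error by gradient steps followed by projection."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [2.0 / n * sum(residuals[i] * X[i][j] for i in range(n)) for j in range(d)]
        w = [project(w[j] - lr * grad[j], categories[j]) for j in range(d)]
    return w


# Toy data: y = 2*x0 - 1*x1; the third feature carries no signal.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
weights = fit_constrained(X, y, ["positive", "negative", "irrelevant"])
```

With only a handful of labeled regions, constraints like these shrink the hypothesis space, which is the few-shot motivation the abstract gives for using LLM-derived categories.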
Article 5
Title@2025-07-17 (4): Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
Title: Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence | Föderiertes Lernen: Eine Umfrage zum Datenschutz-Schutz Kollaborativer Intelligenz | 联邦学习:保护隐私合作情报调查 2504.17703v2 |
Authors (3): Nusrat Jahan, Ratun Rahman, Michel Wang
Federated Learning (FL) has emerged as a transformative paradigm in the field of distributed machine learning, enabling multiple clients such as mobile devices, edge nodes, or organizations to collaboratively train a shared global model without the need to centralize sensitive data. This decentralized approach addresses growing concerns around data privacy, security, and regulatory compliance, making it particularly attractive in domains such as healthcare, finance, and smart IoT systems. This survey provides a concise yet comprehensive overview of Federated Learning, beginning with its core architecture and communication protocol. We discuss the standard FL lifecycle, including local training, model aggregation, and global updates. A particular emphasis is placed on key technical challenges such as handling non-IID (non-independent and identically distributed) data, mitigating system and hardware heterogeneity, reducing communication overhead, and ensuring privacy through mechanisms like differential privacy and secure aggregation. Furthermore, we examine emerging trends in FL research, including personalized FL, cross-device versus cross-silo settings, and integration with other paradigms such as reinforcement learning and quantum computing. We also highlight real-world applications and summarize benchmark datasets and evaluation metrics commonly used in FL research. Finally, we outline open research problems and future directions to guide the development of scalable, efficient, and trustworthy FL systems.
联邦学习联合会(FL)已成为分布式机器学习领域的变革范例,使移动设备、边缘节点或组织等多个客户能够合作培训一个共同的全球模式,而无需集中敏感数据; 这种分散化的做法解决了数据隐私、安全和监管合规方面日益令人关切的问题,使其在保健、金融和智能IoT系统等领域特别具有吸引力;这项调查从核心架构和通信协议开始,简要而全面地概述了FL学习联合会研究的新趋势;我们讨论了标准的FL生命周期,包括当地培训、模型汇总和全球更新;我们特别强调了关键技术挑战,例如处理非IID(不独立和同样分布)数据、减轻系统和硬件差异性、减少通信间接费用、通过差异隐私和安全汇总等机制确保隐私。此外,我们研究了FL研究的新趋势,包括个性化FL、交叉缺陷和跨筒环境,以及与其他模式的整合,例如强化学习和量子计算。我们还强调了现实世界应用,并总结了基准数据集和基准,以用于FL研究中常用的开放性研究方向、FL系统的未来可理解性指南。
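The lifecycle named in the survey abstract (local training, model aggregation, global update) can be sketched in a few lines. The abstract does not name a specific aggregation rule, so this assumes sample-size-weighted averaging in the style of FedAvg on a hypothetical one-parameter linear model; the client data and hyperparameters are made up for illustration.

```python
# Sketch of one federated learning loop: clients train locally on private
# data, and the server only ever sees model parameters, never raw samples.

def local_update(global_w, data, lr=0.1, epochs=5):
    """Local training: a few gradient steps on y = w * x using one client's data."""
    w = global_w
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients):
    """Aggregation: average client models, weighted by local sample count."""
    total = sum(len(data) for data in clients)
    updates = [(local_update(global_w, data), len(data)) for data in clients]
    return sum(w * n for w, n in updates) / total


# Three clients with different amounts of data, all generated by y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (3.0, 9.0), (2.0, 6.0)],
]
global_w = 0.0
for _ in range(20):  # global update rounds
    global_w = federated_round(global_w, clients)
```

The clients here share one underlying relationship, so averaging works well; with non-IID data the local optima diverge, which is the challenge the survey highlights.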
Article 6
Title@2025-07-17 (4): EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Title: EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos | EgoVLA: Vision-Language-Action-Modelle von egozentrischen menschlichen Videos lernen | EgoVLA:从以以地球为中心的人类视频中学习愿景-语言-行动模式 2507.12440v2 |
Authors (13): Ruihan Yang, Qinxi Yu, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang
Real robot data collection for imitation learning has led to significant advancements in robotic manipulation. However, the requirement for robot hardware in the process fundamentally constrains the scale of the data. In this paper, we explore training Vision-Language-Action (VLA) models using egocentric human videos. The benefit of using human videos is not only for their scale but more importantly for the richness of scenes and tasks. With a VLA trained on human video that predicts human wrist and hand actions, we can perform Inverse Kinematics and retargeting to convert the human actions to robot actions. We fine-tune the model using a few robot manipulation demonstrations to obtain the robot policy, namely EgoVLA. We propose a simulation benchmark called Ego Humanoid Manipulation Benchmark, where we design diverse bimanual manipulation tasks with demonstrations. We fine-tune and evaluate EgoVLA with Ego Humanoid Manipulation Benchmark and show significant improvements over baselines and ablate the importance of human data. Videos can be found on our website: https://rchalyang.github.io/EgoVLA
用于模仿学习的真实机器人数据收集工作已导致机器人操作方面的重大进步。 然而,对机器人硬件在工艺过程中的要求从根本上限制了数据的规模。 在本文中,我们探索使用以自我为中心的人类视频来培训视觉-语言-动作模型(VLA)。使用人类视频的好处不仅在于其规模,而且更重要的是其场景和任务的丰富性。通过对人视频进行人类视频培训以预测人类手腕和手动动作,我们可以进行反光学和重新定位,将人类行动转换为机器人行动。我们用一些机器人操作演示来微调模型,以获得机器人政策,即EgoVLA。我们提出了一个模拟基准,称为“Ego 人类手动操纵基准 ” 。 我们用演示设计了多种多样的二元操纵任务。我们用“Ego 人类手动操纵基准” 进行微调和评价EgoVLA, 并显示基线的重大改进和人类数据的重要性。视频可以在我们的网站上找到: https://rchalyang.github.io/EgoVLA
Article 7
Title@2025-07-17 (4): Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Title: Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Von reward-free Offline-Daten lernen: Ein Fall für die Planung mit latenten Dynamics-Modellen | 从无回报脱离线数据中学习:利用隐时动态模型进行规划的一个案例 2502.14819v2 |
Authors (6): Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, Yann LeCun
A long-standing goal in AI is to build agents that can solve a variety of tasks across different environments, including previously unseen ones. Two dominant approaches tackle this challenge: (i) reinforcement learning (RL), which learns policies through trial and error, and (ii) optimal control, which plans actions using a learned or known dynamics model. However, their relative strengths and weaknesses remain underexplored in the setting where agents must learn from offline trajectories without reward annotations. In this work, we systematically analyze the performance of different RL and control-based methods under datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot approaches. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. We study how dataset properties, such as data diversity, trajectory quality, and environment variability, affect the performance of these approaches. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency. Notably, planning with a latent dynamics model emerges as a promising approach for zero-shot generalization from suboptimal data.
AI的一个长期目标是建立能够解决不同环境(包括以前不为人知的环境)的各种任务的代理商。两种主要方法应对这一挑战:(一) 强化学习(RL),通过试验和错误学习政策,以及(二) 最佳控制,利用一个已知或已知的动态模型规划行动;然而,在代理商必须从离线轨迹中学习而不附加奖励说明的环境下,其相对优缺点仍未得到充分探讨。在这项工作中,我们在质量不同的数据集下,系统分析不同的RL和控制方法的绩效。在RL方面,我们考虑有目标的、零效果的方法。在控制方面,我们利用联合嵌入式预测架构(JEPA)来培训一个潜在动态模型,并利用它进行规划。我们研究了数据设置属性的方式,例如数据多样性、轨迹质量和环境变异性影响这些方法的绩效。我们的结果显示,在可获得丰富、高质量的数据时,没有模型的RLL优异性,而基于模型的规划则优于新式的环境布局、轨迹、轨迹、总体数据动态规划,从高水平看如何进行。
Article 8
Title@2025-07-17 (4): Boosting Team Modeling through Tempo-Relational Representation Learning
Title: Boosting Team Modeling through Tempo-Relational Representation Learning | Teammodellierung durch Tempo-Relationales Repräsentationslernen fördern | 通过Tempo-关系代表制学习促进团队模拟 2507.13305v1 |
Authors (3): Vincenzo Marco De Luca, Giovanna Varni, Andrea Passerini
Team modeling remains a fundamental challenge at the intersection of Artificial Intelligence and the Social Sciences. Social Science research emphasizes the need to jointly model dynamics and relations, while practical applications demand unified models capable of inferring multiple team constructs simultaneously, providing interpretable insights and actionable recommendations to enhance team performance. However, existing works do not meet these practical demands. To bridge this gap, we present TRENN, a novel tempo-relational architecture that integrates: (i) an automatic temporal graph extractor, (ii) a tempo-relational encoder, (iii) a decoder for team construct prediction, and (iv) two complementary explainability modules. TRENN jointly captures relational and temporal team dynamics, providing a solid foundation for MT-TRENN, which extends TRENN by replacing the decoder with a multi-task head, enabling the model to learn shared Social Embeddings and simultaneously predict multiple team constructs, including Emergent Leadership, Leadership Style, and Teamwork components. Experimental results demonstrate that our approach significantly outperforms approaches that rely exclusively on temporal or relational information. Additionally, experimental evaluation has shown that the explainability modules integrated in MT-TRENN yield interpretable insights and actionable suggestions to support team improvement. These capabilities make our approach particularly well-suited for Human-Centered AI applications, such as intelligent decision-support systems in high-stakes collaborative environments.
社会科学研究强调,需要联合模拟动态和关系,而实用应用则需要能够同时推导多个团队构建的统一模型,提供可解释的见解和可操作的建议,以提高团队绩效;然而,现有工作无法满足这些实际需求。为了缩小这一差距,我们提出了TREN,这是一个新型的节奏关系架构,整合了:(一) 自动时间图提取器,(二) 动态关系编码器,(三) 团队构建预测的解码器,以及(四) 两个互补的解释模块。TRENN 联合捕捉关系和时间团队动态,为MT-TRENN提供了坚实的基础,它通过以多功能头顶取代解码器,使模型能够学习共享的社会嵌入,同时预测多个团队构建,包括新兴领导、领导风格和团队工作构成。实验结果显示,我们的方法大大超越了完全依赖时间或关系层面的预测方法,特别是在时间或关系层面的应用程序中,为MERN提供了关联性和时间-时间-时间性团队动态动态动态动态动态动态,为MERN提供了坚实的扩展基础。
Article 9
Title@2025-07-17 (4): Retraining-Free Merging of Sparse MoE via Hierarchical Clustering
Title: Retraining-Free Merging of Sparse MoE via Hierarchical Clustering | Retraining-Free Merging von Sparse MoE über Hierarchical Clustering | 通过等级式集束式集成,无培训地重新合并粗微中小部 2410.08589v3 |
Authors (6): I-Chun Chen, Hsu-Shen Liu, Wei-Fang Sun, Chen-Hao Chao, Yen-Chang Hsu, Chun-Yi Lee
Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale architectures. We provide theoretical analysis and comprehensive evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE’s effectiveness in state-of-the-art models including Qwen and Mixtral. The experimental results validate HC-SMoE’s superior performance and practical applicability for real-world deployments.
专家的简单混合模型(SMoE)通过高效使用参数,在大型语言模型(LLM)开发方面取得了显著进步。这些模型在降低推论成本后取得了显著的绩效改进。然而,SMoE模型的部署面临资源有限环境中专家组成部分广泛记忆要求的限制。为了解决这些限制,本文件介绍了专家简单激活混合模型(HC-SMoE)的等级组合,这是一个不进行再培训而削减参数的任务性专家合并框架。HC-SMoE采用了基于专家产出的新颖的等级分组办法,以确保与路由决定无关的稳健性相结合。拟议的基于产出的集群方法能够有效地捕捉到大型结构专家之间的功能关系。我们提供理论分析和全面评价,涉及多种零发语言任务,以证明HC-SMoE在包括Quwen和Mixtral在内的最新技术模型中的有效性。实验结果证实了HC-SMoE的优异性业绩和对现实世界部署的实际适用性。
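The idea of grouping experts by their outputs and then merging each group can be sketched as follows. The abstract states only that HC-SMoE uses hierarchical clustering on expert outputs; the average-linkage criterion, Euclidean distance, and plain weight averaging used here are assumptions for illustration, and the tiny vectors stand in for real expert outputs and parameters.

```python
# Sketch: agglomerative (bottom-up) clustering of experts by output similarity,
# followed by merging each cluster into one expert by averaging its weights.

def distance(a, b):
    """Euclidean distance between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_experts(expert_outputs, target_groups):
    """Repeatedly merge the two clusters with the smallest average pairwise distance."""
    clusters = [[i] for i in range(len(expert_outputs))]
    while len(clusters) > target_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(distance(expert_outputs[i], expert_outputs[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def merge_weights(expert_weights, clusters):
    """Replace each cluster with the element-wise mean of its members' weights."""
    return [[sum(col) / len(group) for col in zip(*(expert_weights[i] for i in group))]
            for group in clusters]


# Hypothetical mean outputs of four experts on a calibration set.
expert_outputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Hypothetical flattened expert parameters.
expert_weights = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]

clusters = cluster_experts(expert_outputs, target_groups=2)
merged = merge_weights(expert_weights, clusters)
```

Because the grouping depends only on what the experts compute, not on how often the router selects them, it matches the abstract's claim of being independent of routing decisions.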
Article 10
Title@2025-07-17 (4): Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model
Title: Advancing Seasonal Prediction of Tropical Cyclone Activity with a Hybrid AI-Physics Climate Model | Förderung der saisonalen Vorhersage Tropischer Zyklonaktivität mit einem Hybrid-KI-Physik-Klimamodell | 采用AI-物理混合气候模型推进热带气旋活动季节性预测 2505.01455v2 |
Authors (4): Gan Zhang, Megha Rao, Janni Yuval, Ming Zhao
Machine learning (ML) models are successful with weather forecasting and have shown progress in climate simulations, yet leveraging them for useful climate predictions needs exploration. Here we demonstrate this feasibility using the Neural General Circulation Model (NeuralGCM), a hybrid ML-physics atmospheric model developed by Google, for seasonal predictions of large-scale atmospheric variability and Northern Hemisphere tropical cyclone (TC) activity. Inspired by physical model studies, we simplify boundary conditions, assuming sea surface temperature (SST) and sea ice follow their climatological cycle but persist anomalies present at the initialization time. With such forcings, NeuralGCM can generate 100 simulation days in ~8 minutes with a single Graphics Processing Unit (GPU), while simulating realistic atmospheric circulation and TC climatology patterns. This configuration yields useful seasonal predictions (July to November) for the tropical atmosphere and various TC activity metrics. Notably, the predicted and observed TC frequencies in the North Atlantic and East Pacific basins are significantly correlated during 1990 to 2023 (r=~0.7), suggesting prediction skill comparable to existing physical GCMs. Despite challenges associated with model resolution and simplified boundary forcings, the model-predicted interannual variations demonstrate significant correlations with observations, including the sub-basin TC tracks (p<0.1) and basin-wide accumulated cyclone energy (p<0.01) of the North Atlantic and North Pacific basins. These findings highlight the promise of leveraging ML models with physical insights to model TC risks and deliver seamless weather-climate predictions.
机器学习模型(ML)在天气预报方面是成功的,在气候模拟方面已经显示进步,但在气候模拟方面需要探索。在这里,我们展示了使用谷歌开发的神经通用环流模型(NeuralGCM)这一混合ML物理大气模型(Neural General Cirma)的可行性,该模型用于对大规模大气变异和北半球热带气旋活动的季节性预测;在物理模型研究的启发下,我们简化了边界条件,假设海面温度和海冰遵循其气候周期,但在初始化时一直存在着不完全的直观性。在这种压力下,NeuralGCMM(NeuralGCM)可以在~8分钟内与单一的图形处理股(GPU)产生100天的模拟日,同时模拟现实的大气环环流和TC气候学模式(MCMR)模式为热带大气层和各种技合活动指标。
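The simplified boundary condition in the NeuralGCM abstract (climatological cycle plus the anomaly present at initialization, held fixed) amounts to a one-line formula. The sketch below is only an illustration of that formula on a scalar toy series; the function name, the daily indexing, and the numbers are assumptions, and real SST forcing is a gridded field rather than a single value.

```python
# Sketch: forcing(t) = climatology(t) + [observed(t0) - climatology(t0)],
# i.e. the initial anomaly is persisted while the seasonal cycle evolves.

def persisted_anomaly_forcing(climatology, init_value, init_day, n_days):
    """Return n_days of forcing starting at init_day.

    climatology: one value per day of a repeating cycle (hypothetical data)
    init_value:  observed value on init_day
    """
    cycle = len(climatology)
    anomaly = init_value - climatology[init_day % cycle]
    return [climatology[(init_day + k) % cycle] + anomaly for k in range(n_days)]


# Toy 4-day "seasonal cycle" and an observation 1.5 degrees above climatology.
climatology = [20.0, 21.0, 22.0, 23.0]
forcing = persisted_anomaly_forcing(climatology, init_value=22.5, init_day=1, n_days=4)
```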
Article 11
Title@2025-07-17 (4): SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks
Title: SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks | SIDDA: SInkhorn Dynamische Domain-Anpassung für die Bildklassifizierung mit Gleichwertigen Neuronalen Netzwerken | SIDDA: 利用等质神经网络进行图像分类的SInkhorn动态域域适应 2501.14048v2 |
Authors (5): Sneh Pandya, Purvik Patel, Brian D. Nord, Mike Walmsley, Aleksandra Ćiprijanović
Modern neural networks (NNs) often do not generalize well in the presence of a “covariate shift”; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels remains unchanged. In such cases, NN generalization can be reduced to a problem of learning more domain-invariant features. Domain adaptation (DA) methods include a range of techniques aimed at achieving this; however, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observations. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs). We find that SIDDA enhances the generalization capabilities of NNs, achieving up to a $\approx40\%$ improvement in classification accuracy on unlabeled target data. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group $D_N$, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA enhances model calibration on both source and target data, achieving over an order of magnitude improvement in the ECE and Brier score. SIDDA’s versatility, combined with its automated approach to domain alignment, has the potential to advance multi-dataset studies by enabling the development of highly generalizable models.
现代神经网络(NNS)通常在存在“ 变换” 的情况下并不全面, 也就是说, 在培训和测试数据分布不同的情况下, 有条件的分类标签分布保持不变。 在这样的情况下, NNS 常规化可以降低到一个学习更多域异性特征的问题。 域适应(DA) 方法包括一系列旨在实现这一目标的技术; 然而, 这些方法与广泛超立度调制的需要不相适应, 而这又需要大量计算成本。 在这项工作中, 我们引入了SIDRAD, 这是在Sinkhorn 目标分布不同的情况下, 一个在测试数据分布上, 在最小超立度调整和计算顶端之间, 能够实现有效的域对齐。 我们通过简单形状、高手写数字和真正的天文观测, SIDRA 方法与各种NE 方法相兼容, 并且它特别有助于改进分类的精确度和模型化, 在不易变异性联合的内基数据网络上, 提高SINDA的性能, 也能够提高SANDA 和G 数据在总级上, 我们发现SIDAA 的升级, 和GNEDADA 的升级。
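The two calibration metrics named in the SIDDA abstract, the Brier score and the expected calibration error (ECE), can be computed as in the following sketch. These are the standard textbook definitions for the binary case with equal-width confidence bins; the bin count and the toy predictions are assumptions and are not taken from the paper.

```python
# Sketch: Brier score and expected calibration error (ECE) for binary predictions.

def brier_score(probs, labels):
    """Mean squared difference between predicted P(class 1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(confidences, correct, n_bins=5):
    """Sample-weighted average of |mean confidence - accuracy| over confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c <= hi) or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece


# Toy example: predicted P(class 1) and true labels.
probs = [0.9, 0.8, 0.3, 0.6]
labels = [1, 1, 0, 0]
brier = brier_score(probs, labels)

# Confidence in the predicted class, and whether that prediction was right.
confidences = [0.9, 0.8, 0.7, 0.6]
correct = [1, 1, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

Lower is better for both metrics, so an order-of-magnitude improvement means the reported scores shrink by roughly a factor of ten.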
Article 12
Title@2025-07-17 (4): crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels
Title: crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels | crowd-hpo: Realistische Hyperparameter-Optimierung und Benchmarking zum Lernen von Crowds mit Noisy-Labels | 现实主义超超参数最佳化和基准化,用噪音标签从人群中学习 2504.09085v2 |
Authors (4): Marek Herde, Lukas Lührs, Denis Huseljic, Bernhard Sick
Crowdworking is a cost-efficient solution for acquiring class labels. Since these labels are subject to noise, various approaches to learning from crowds have been proposed. Typically, these approaches are evaluated with default hyperparameter configurations, resulting in unfair and suboptimal performance, or with hyperparameter configurations tuned via a validation set with ground truth class labels, representing an often unrealistic scenario. Moreover, both setups can produce different approach rankings, complicating study comparisons. Therefore, we introduce crowd-hpo as a framework for evaluating approaches to learning from crowds in combination with criteria to select well-performing hyperparameter configurations with access only to noisy crowd-labeled validation data. Extensive experiments with neural networks demonstrate that these criteria select hyperparameter configurations that improve the generalization performance of learning-from-crowds approaches, measured on separate test sets with ground truth labels. Hence, incorporating such criteria into experimental studies is essential for enabling fairer and more realistic benchmarking.
人群拥挤是获取类类标签的具有成本效益的解决方案。 由于这些标签受到噪音的影响, 已经提出了向人群学习的各种方法。 通常, 以默认的超参数配置来评估这些方法, 造成不公平和亚优度的性能, 或以超参数配置通过带有地面真相类标签的验证组合来调整, 这往往是一种不切实际的设想。 此外, 这两种设置可以产生不同的方法排序, 使研究比较复杂化。 因此, 我们引入了人群聚居作为框架, 用于评价向人群学习的方法, 并结合标准来选择业绩良好的超参数配置, 以选择仅能获取噪音的人群标签验证数据。 与神经网络进行的广泛实验表明, 这些标准选择超参数配置, 改进从人群类标签的通用性表现中学习, 以与地面真相标签分开的测试组测量。 因此, 将这类标准纳入实验研究对于更公平、更现实的基准至关重要 。
Article 13
Title@2025-07-17 (4): Optimal Empirical Risk Minimization under Temporal Distribution Shifts
Title: Optimal Empirical Risk Minimization under Temporal Distribution Shifts | Optimale Empirische Risikominimierung unter zeitlichen Verteilungsverschiebungen | 时间分布变化下最佳实证风险最小化 2507.13287v1 |
Authors (4): Yujin Jeong, Ramesh Johari, Dominik Rothenhäusler, Emily Fox
Temporal distribution shifts pose a key challenge for machine learning models trained and deployed in dynamically evolving environments. This paper introduces RIDER (RIsk minimization under Dynamically Evolving Regimes), which derives optimally weighted empirical risk minimization procedures under temporal distribution shifts. Our approach is theoretically grounded in the random distribution shift model, where random shifts arise as a superposition of numerous unpredictable changes in the data-generating process. We show that common weighting schemes, such as pooling all data, exponentially weighting data, and using only the most recent data, emerge naturally as special cases in our framework. We demonstrate that RIDER consistently improves out-of-sample predictive performance when applied as a fine-tuning step on the Yearbook dataset, across a range of benchmark methods in Wild-Time. Moreover, we show that RIDER outperforms standard weighting strategies in two other real-world tasks: predicting stock market volatility and forecasting ride durations in NYC taxi data.
时间分布变化对在动态变化的环境中培训和部署的机器学习模式构成一个关键挑战。本文介绍了RIDER(动态演变制度下的RIsk 最小化),在时间分布变化中产生最佳加权的经验风险最小化程序。我们的方法理论上以随机分布变化模式为基础,随机变化是数据生成过程中许多不可预测的变化的叠加。我们表明,共同加权计划,如汇集所有数据、指数加权数据以及仅使用最新数据,自然会在我们的框架里作为特例出现。我们证明,RIDER在作为《年鉴》数据集的微调步骤应用时,在野时时代的一系列基准方法中,始终在改进超出全面的预测性预测性性性性性性性性能。此外,我们表明,RIDER在另外两项现实世界任务中(预测股票市场波动和预测纽约州出租车数据中的载量期限),优于标准加权战略。
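The three weighting schemes the RIDER abstract lists as special cases (pooling, exponential weighting, most-recent-only) can all be written as weighted empirical risk minimization over time periods. The sketch below shows this for the simplest possible model, a scalar predictor under squared loss, where the minimizer has a closed form. It does not implement RIDER's optimal weights, which the abstract does not specify; all function names and the toy data are assumptions.

```python
# Sketch: weighted ERM over time-indexed batches. For squared loss and a
# constant predictor theta, minimising sum_t w_t * mean_i (theta - y_{t,i})^2
# gives theta = sum_t w_t * mean(batch_t) / sum_t w_t.

def weighted_erm_mean(batches, weights):
    """Closed-form weighted ERM solution for a scalar predictor."""
    total = sum(weights)
    return sum(w * sum(b) / len(b) for w, b in zip(weights, batches)) / total


def pooled_weights(batches):
    """Pooling all data: each period is weighted by its sample count."""
    return [float(len(b)) for b in batches]


def exponential_weights(batches, decay=0.5):
    """Exponential weighting: the newest period has weight 1, older ones decay."""
    last = len(batches) - 1
    return [decay ** (last - t) for t in range(len(batches))]


def recent_only_weights(batches):
    """Using only the most recent period."""
    return [0.0] * (len(batches) - 1) + [1.0]


# Toy data with an upward drift across three time periods (oldest first).
batches = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]
theta_pooled = weighted_erm_mean(batches, pooled_weights(batches))
theta_exp = weighted_erm_mean(batches, exponential_weights(batches))
theta_recent = weighted_erm_mean(batches, recent_only_weights(batches))
```

The three estimates differ only in the weight vector, which is the sense in which a framework that chooses weights can recover each scheme as a special case.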
Article 14
Title@2025-07-17 (4): Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises
Title: Stochastic Weakly Convex Optimization Under Heavy-Tailed Noises | Stochastisch schwache Konvex-Optimierung unter schwerfälligen Geräuschen | 在重故障噪音下优化 2507.13283v1 |
Authors (3): Tianxi Zhu, Yi Xu, Xiangyang Ji
An increasing number of studies have focused on stochastic first-order methods (SFOMs) under heavy-tailed gradient noises, which have been observed in the training of practical deep learning models. In this paper, we focus on two types of gradient noises: one is sub-Weibull noise, and the other is noise under the assumption that it has a bounded $p$-th central moment ($p$-BCM) with $p\in (1, 2]$. The latter is more challenging due to the occurrence of infinite variance when $p\in (1, 2)$. Under these two gradient noise assumptions, the in-expectation and high-probability convergence of SFOMs have been extensively studied in the contexts of convex optimization and standard smooth optimization. However, for weakly convex objectives (a class that includes all Lipschitz-continuous convex objectives and smooth objectives), our understanding of the in-expectation and high-probability convergence of SFOMs under these two types of noises remains incomplete. We investigate the high-probability convergence of the vanilla stochastic subgradient descent (SsGD) method under sub-Weibull noises, as well as the high-probability and in-expectation convergence of clipped SsGD under the $p$-BCM noises. Both analyses are conducted in the context of weakly convex optimization. For weakly convex objectives that may be non-convex and non-smooth, our results demonstrate that the theoretical dependence of vanilla SsGD on the failure probability and number of iterations under sub-Weibull noises does not degrade compared to the case of smooth objectives. Under $p$-BCM noises, our findings indicate that the non-smoothness and non-convexity of weakly convex objectives do not impact the theoretical dependence of clipped SsGD on the failure probability relative to the smooth case; however, the sample complexity we derived is worse than a well-known lower bound for smooth optimization.
越来越多的研究侧重于在重尾梯度低效的噪声下出现的随机先令方法(SFOMs),这在实际深层学习模型的培训中已经观察到。在本文中,我们侧重于两种类型的梯度噪音:一种是次Weibull噪声,而另一种是噪音,假设它有一个约束的中央时刻($p$-BCM),有1美元、2美元。后者由于在重尾梯度低效的噪声下出现无限差异而更具挑战性。在这两种梯度低度的噪声假设下,SFOMs的不预知性和高概率趋同性目标在Convex优化和标准平稳优化的背景下得到了广泛研究。然而,对于一个包含Lipschitz- convoly convex目标的弱点,以及对于SFOMMs在这两种噪音下的预知性变异性变异性变异性变异性变异性,我们研究的是甚易变异性变异性变异性变异性在Sgevilla-Squaltal-dealation上的结果。
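A toy version of clipped stochastic subgradient descent on a weakly convex objective may help make the setting concrete. The objective $f(x)=|x^2-1|$ is a standard example that is non-smooth and non-convex yet weakly convex; the symmetric Pareto noise with tail index 1.5 has zero mean and infinite variance, so it only has bounded $p$-th moments for $p<1.5$. The step-size schedule, clipping threshold, and noise model are assumptions chosen for illustration and are not the paper's settings.

```python
# Sketch: clipped stochastic subgradient descent (clipped SsGD) on
# f(x) = |x^2 - 1| with heavy-tailed gradient noise.
import random


def clip(g, tau):
    """Clamp a scalar stochastic subgradient to the interval [-tau, tau]."""
    return max(-tau, min(tau, g))


def subgradient(x):
    """One subgradient of f(x) = |x^2 - 1| (zero is used at the kinks)."""
    sign = (x * x > 1.0) - (x * x < 1.0)
    return 2.0 * x * sign


def heavy_tailed_noise(rng, scale):
    """Symmetric Pareto noise, tail index 1.5: zero mean, infinite variance."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return scale * sign * rng.paretovariate(1.5)


def clipped_ssgd(x0, steps, tau=1.0, noise_scale=0.0, seed=0):
    """Run clipped SsGD with step size 0.1 / sqrt(k + 1)."""
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        g = subgradient(x) + heavy_tailed_noise(rng, noise_scale)
        x -= 0.1 / (k + 1) ** 0.5 * clip(g, tau)
    return x


x_clean = clipped_ssgd(3.0, steps=2000)                   # noise-free reference run
x_noisy = clipped_ssgd(3.0, steps=2000, noise_scale=0.5)  # heavy-tailed noise
```

Clipping bounds how far any single noisy step can move the iterate, which is what keeps occasional extreme noise samples from derailing the run.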
Article 15
Title@2025-07-17 (4): Generative Diffusion Models for Resource Allocation in Wireless Networks
Title: Generative Diffusion Models for Resource Allocation in Wireless Networks | Generative Diffusionsmodelle zur Ressourcenallokation in drahtlosen Netzwerken | 无线网络资源分配生成传播模型 2504.20277v2 |
Authors (4): Yigit Berkay Uslu, Samar Hadou, Shirin Saeedi Bidokhti, Alejandro Ribeiro
This paper proposes a supervised training algorithm for learning stochastic resource allocation policies with generative diffusion models (GDMs). We formulate the allocation problem as the maximization of an ergodic utility function subject to ergodic Quality of Service (QoS) constraints. Given samples from a stochastic expert policy that yields a near-optimal solution to the constrained optimization problem, we train a GDM policy to imitate the expert and generate new samples from the optimal distribution. We achieve near-optimal performance through the sequential execution of the generated samples. To enable generalization to a family of network configurations, we parameterize the backward diffusion process with a graph neural network (GNN) architecture. We present numerical results in a case study of power control.
本文建议了一种以基因扩散模型(GDMs)学习随机资源分配政策的监管培训算法。我们将分配问题表述为在服务质量(QOS)限制下最大限度地发挥ERgodic效用功能。鉴于从随机随机专家政策样本中得出接近最佳的优化优化问题解决方案,我们培训了一种GDM政策,以模仿专家,并从最佳分布中产生新的样本。我们通过相继执行生成的样本,实现接近最佳的性能。为了使网络配置的大家庭能够普遍化,我们用一个图形神经网络(GNN)结构对后向扩散过程进行参数化。我们在一项权力控制案例研究中提出了数字结果。
Article 16
Title@2025-07-17 (4): Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour
Title: Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | Bewertung von Stärkungslernen Algorithmen für die Navigation in simulierten Roboter-Quadrupen: Eine vergleichende Studie, inspiriert von Guide Dog Behaviour | 评价模拟机器人四重干扰模拟机器人四重干扰导航中强化学习的教学比值:受导狗行为启发的比较研究 2507.13277v1 |
Authors (1): Emma M. A. Harrison
Robots are increasingly integrated across industries, particularly in healthcare. However, many valuable applications for quadrupedal robots remain overlooked. This research explores the effectiveness of three reinforcement learning algorithms in training a simulated quadruped robot for autonomous navigation and obstacle avoidance. The goal is to develop a robotic guide dog simulation capable of path following and obstacle avoidance, with long-term potential for real-world assistance to guide dogs and visually impaired individuals. It also seeks to expand research into medical ‘pets’, including robotic guide and alert dogs. A comparative analysis of thirteen related research papers shaped key evaluation criteria, including collision detection, pathfinding algorithms, sensor usage, robot type, and simulation platforms. The study focuses on sensor inputs, collision frequency, reward signals, and learning progression to determine which algorithm best supports robotic navigation in complex environments. Custom-made environments were used to ensure fair evaluation of all three algorithms under controlled conditions, allowing consistent data collection. Results show that Proximal Policy Optimization (PPO) outperformed Deep Q-Network (DQN) and Q-learning across all metrics, particularly in average and median steps to goal per episode. By analysing these results, this study contributes to robotic navigation, AI and medical robotics, offering insights into the feasibility of AI-driven quadruped mobility and its role in assistive robotics.
然而,许多关于四重机器人的宝贵应用仍然被忽视。这项研究探索了三种强化学习算法在培训模拟四重机器人以进行自主导航和避免障碍方面的有效性。目标是开发一个机器人导狗模拟,能够沿途和避免障碍,长期具有现实世界协助指导狗和视力受损者的潜力。它还力求扩大对医学“小贝”的研究,包括机器人指南和警犬。对13份相关研究文件的比较分析形成了关键评价标准,包括碰撞探测、路由算法、传感器使用、机器人类型和模拟平台。研究侧重于传感器输入、碰撞频率、奖励信号和学习进展,以确定哪种算法最能支持复杂环境中的机器人导航。定制环境被用来确保在受控制的条件下对所有三种算法进行公平评价,从而能够进行一致的数据收集。结果显示,Proximus政策优化(PPPO)优于深度Q-Network(DQN),以及在所有计量标准中学习,特别是碰撞探测、碰撞频率、奖赏信号和学习过程,以便确定哪些算法最适合复杂环境中的机器人导航和移动能力。通过这些步骤来分析这一机器人驱动的自动定位。
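Of the three algorithms compared in the study above, tabular Q-learning is the simplest to show in full. The sketch below applies the standard Q-learning update to a hypothetical five-cell corridor, not to the quadruped simulation from the paper; the environment, the reward of 1 at the goal, and all hyperparameters are assumptions made for illustration.

```python
# Sketch: tabular Q-learning on a 1-D corridor with the goal in the last cell.
import random

N_STATES, GOAL = 5, 4   # cells 0..4, goal in cell 4
ACTIONS = (-1, +1)      # action 0 moves left, action 1 moves right


def step(state, action):
    """Deterministic move; walls clamp the position, reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values from a uniformly random behaviour policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            a = rng.randrange(2)
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])  # Q-learning update
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy action index in each non-goal cell (1 means "move right").
greedy = [row.index(max(row)) for row in q_table[:GOAL]]
```

A table like this stops being practical once the state includes continuous sensor readings, which is the usual motivation for function-approximation methods such as DQN and PPO.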
Article 17
Title@2025-07-17 (4): Do you know what q-means?
Title: Do you know what q-means? | Weißt du, was q-bedeutet? | 你知道什么是q - means吗? 2308.09701v3 |
Authors (4): Arjan Cornelissen, Joao F. Doriguello, Alessandro Luongo, Ewin Tang
Clustering is one of the most important tools for analysis of large datasets, and perhaps the most popular clustering algorithm is Lloyd’s algorithm for $k$-means. This algorithm takes $n$ vectors $V=[v_1,\dots,v_n]\in\mathbb{R}^{d\times n}$ and outputs $k$ centroids $c_1,\dots,c_k\in\mathbb{R}^d$; these partition the vectors into clusters based on which centroid is closest to a particular vector. We present a classical $\varepsilon$-$k$-means algorithm that performs an approximate version of one iteration of Lloyd’s algorithm with time complexity $\tilde{O}\big(\frac{\|V\|_F^2}{n}\frac{k^{2}d}{\varepsilon^2}(k + \log{n})\big)$, exponentially improving the dependence on the data size $n$ and matching that of the “$q$-means” quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (NeurIPS’19). Moreover, we propose an improved $q$-means quantum algorithm with time complexity $\tilde{O}\big(\frac{\|V\|_F}{\sqrt{n}}\frac{k^{3/2}d}{\varepsilon}(\sqrt{k}+\sqrt{d})(\sqrt{k} + \log{n})\big)$ that quadratically improves the runtime of our classical $\varepsilon$-$k$-means algorithm in several parameters. Our quantum algorithm does not rely on quantum linear algebra primitives of prior work, but instead only uses QRAM to prepare simple states based on the current iteration’s clusters and multivariate quantum amplitude estimation. Finally, we provide classical and quantum query lower bounds, showing that our algorithms are optimal in most parameters.
集群是分析大型数据集的最重要工具之一,也许最受欢迎的集群算法是用于 $k$-means 的 Lloyd 算法。该算法以 $n$ 个向量 $V=[v_1,\dots,v_n]\in\mathbb{R}^{d\times n}$ 为输入,输出 $k$ 个中心点 $c_1,\dots,c_k\in\mathbb{R}^d$;这些中心点按照与各向量最接近的中心点将向量划分为若干集群。我们提出了一个经典的 $\varepsilon$-$k$-means 算法。
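For reference, one exact iteration of Lloyd's algorithm, the step that the abstract's $\varepsilon$-$k$-means and $q$-means algorithms approximate, looks as follows. This is the textbook procedure and costs time proportional to $nkd$; it is not the paper's sampling-based or quantum algorithm. Keeping an empty cluster's old centroid is one common convention and is an assumption here.

```python
# Sketch: one exact iteration of Lloyd's algorithm for k-means.

def lloyd_iteration(vectors, centroids):
    """Assign each vector to its nearest centroid, then recompute the centroids."""
    clusters = [[] for _ in centroids]
    for v in vectors:
        # Squared Euclidean distance to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
        clusters[dists.index(min(dists))].append(v)

    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # empty cluster: keep the old centroid
    return new_centroids


vectors = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[1.0, 1.0], [8.0, 1.0]]
updated = lloyd_iteration(vectors, centroids)
```

The assignment loop touches all $n$ vectors, which is the linear dependence on data size that the approximate algorithms in the abstract reduce to a logarithmic one.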
Article 18
Title@2025-07-17 (4): Merge Kernel for Bayesian Optimization on Permutation Space
Title: Merge Kernel for Bayesian Optimization on Permutation Space | Zusammenführen Kernel für Bayesian Optimierung auf Permutationsraum | Bayesian Permodation 空间优化合并核心圈 2507.13263v1 |
Authors (2): Zikai Xie, Linjiang Chen
The Bayesian Optimization (BO) algorithm is a standard tool for black-box optimization problems. The current state-of-the-art BO approach for permutation spaces relies on the Mallows kernel, an $\Omega(n^2)$ representation that explicitly enumerates every pairwise comparison. Inspired by the close relationship between the Mallows kernel and pairwise comparison, we propose a novel framework for generating kernel functions on permutation space based on sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from bubble sort. Further, we introduce the Merge Kernel constructed from merge sort, which replaces the quadratic complexity with $\Theta(n\log n)$ to achieve the lowest possible complexity. The resulting feature vector is significantly shorter, can be computed in linearithmic time, yet still efficiently captures meaningful permutation distances. To boost robustness and right-invariance without sacrificing compactness, we further incorporate three lightweight, task-agnostic descriptors: (1) a shift histogram, which aggregates absolute element displacements and supplies a global misplacement signal; (2) a split-pair line, which encodes selected long-range comparisons by aligning elements across the two halves of the whole permutation; and (3) sliding-window motifs, which summarize local order patterns that influence near-neighbor objectives. Our empirical evaluation demonstrates that the proposed kernel consistently outperforms the state-of-the-art Mallows kernel across various permutation optimization benchmarks. Results confirm that the Merge Kernel provides a more compact yet more effective solution for Bayesian optimization in permutation space.
Bayesian Optimization (BO) 算法是黑盒优化问题的标准工具。 目前对变异空间的BO 最新状态方法依赖于 Mallows 内核 $\Omega(n^2)$ 代表法,该代表法明确罗列了每对对配对的比较。 由Mallows 内核与对配对比较之间的密切关系所启发, 我们提议了一个基于排序算法在变异空间生成内流函数的新框架。 在这个框架内, Mallows 上下基螺旋内核可以被视为源自泡沫类的特殊实例。 此外, 我们引入了从合并分类中构建的 Merge Kernel 方法, 以 $\Theta(n\log n)$ 来取代二次变形复杂性。 由此产生的特性矢量非常短, 可以用线性时间来计算, 但仍高效地捕捉到有意义的变色距离。
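The baseline that the abstract compares against, the Mallows kernel with its quadratic pairwise-comparison representation, can be written down directly. The sketch below uses the standard definition $k(\sigma,\sigma')=\exp(-\lambda\, d_\tau(\sigma,\sigma'))$, where $d_\tau$ is the Kendall tau distance (the number of item pairs ordered differently, which equals the number of adjacent swaps bubble sort needs). The value of $\lambda$ and the function names are assumptions, and the Merge Kernel itself is not implemented because the abstract does not give its construction.

```python
# Sketch: Mallows kernel on permutations via explicit pairwise-comparison features.
import math
from itertools import combinations


def pairwise_features(perm):
    """One feature per item pair (i, j), i < j: +1 if i precedes j, else -1."""
    position = {item: idx for idx, item in enumerate(perm)}
    return [1 if position[i] < position[j] else -1
            for i, j in combinations(sorted(perm), 2)]


def kendall_tau_distance(p, q):
    """Number of item pairs that the two permutations order differently."""
    return sum(a != b for a, b in zip(pairwise_features(p), pairwise_features(q)))


def mallows_kernel(p, q, lam=0.5):
    """Mallows kernel: exp(-lam * Kendall tau distance)."""
    return math.exp(-lam * kendall_tau_distance(p, q))


identity = [0, 1, 2, 3]
one_swap = [1, 0, 2, 3]
reversed_perm = [3, 2, 1, 0]

n_features = len(pairwise_features(identity))     # n * (n - 1) / 2 comparisons
d_swap = kendall_tau_distance(identity, one_swap)
d_rev = kendall_tau_distance(identity, reversed_perm)
k_self = mallows_kernel(identity, identity)
```

The feature vector has one entry per pair of items, which is the quadratic growth that a merge-sort-based construction aims to reduce to $\Theta(n\log n)$.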
Article 19
Title@2025-07-17 (4): Automating Steering for Safe Multimodal Large Language Models
Title: Automating Steering for Safe Multimodal Large Language Models | Automatisierungslenkung für sichere multimodale große Sprachmodelle | 安全多式联运大语言模式自动化指导 2507.13255v1 |
Authors (7): Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
Recent progress in Multimodal Large Language Models (MLLMs) has unlocked powerful cross-modal reasoning abilities, but also raised new safety concerns, particularly when faced with adversarial multimodal inputs. To improve the safety of MLLMs during inference, we introduce a modular and adaptive inference-time intervention technology, AutoSteer, without requiring any fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model’s internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected. Experiments on LLaVA-OV and Chameleon across diverse safety-critical benchmarks demonstrate that AutoSteer significantly reduces the Attack Success Rate (ASR) for textual, visual, and cross-modal threats, while maintaining general abilities. These findings position AutoSteer as a practical, interpretable, and effective framework for safer deployment of multimodal AI systems.
在多模态大语言模型(MLLM)方面最近取得的进展释放了强大的跨模态推理能力,但也提出了新的安全关切,特别是在面临对抗性多模态输入时。为了在推理期间提高MLLMs的安全性,我们采用了模块化和适应性的推理时干预技术AutoSteer, 无需对基本模型作任何微调。AutoSteer包含三个核心组成部分:(1) 一个新的安全意识评分(SAS),该评分自动确定该模型内部各层之间与安全最相关的区别;(2) 经过训练的适应性安全探测器,用于根据中间表示估计有毒输出的可能性;(3) 轻量级拒绝头,在发现安全风险时有选择地干预调节生成。在LLaVA-OV和Chameleon上针对各种安全关键基准的实验表明,AutoSteer在保持一般能力的同时,大大降低了对文字、视觉和跨模态威胁的攻击成功率(ASR)。这些研究结果表明,AutoSteer是一个实用、可解释和有效的框架,可以更安全地部署多模态AI系统。
Article 20
Title@2025-07-17 (4): A Roadmap for Climate-Relevant Robotics Research
Title: A Roadmap for Climate-Relevant Robotics Research | Ein Fahrplan für die klimarelevante Robotikforschung | 气候相关机器人研究路线图 2507.11623v2 |
Authors (28): Alan Papalia, Charles Dawson, Laurentiu L. Anton, Norhan Magdy Bayomi, Bianca Champenois, Jung-Hoon Cho, Levi Cai, Joseph DelPreto, Kristen Edwards, Bilha-Catherine Githinji, Cameron Hickert, Vindula Jayawardana, Matthew Kramer, Shreyaa Raghavan, David Russell, Shide Salimi, Jingnan Shi, Soumya Sudhakar, Yanwei Wang, Shouyi Wang, Luca Carlone, Vijay Kumar, Daniela Rus, John E. Fernandez, Cathy Wu, George Kantor, Derek Young, Hanumant Singh
Climate change is one of the defining challenges of the 21st century, and many in the robotics community are looking for ways to contribute. This paper presents a roadmap for climate-relevant robotics research, identifying high-impact opportunities for collaboration between roboticists and experts across climate domains such as energy, the built environment, transportation, industry, land use, and Earth sciences. These applications include problems such as energy systems optimization, construction, precision agriculture, building envelope retrofits, autonomous trucking, and large-scale environmental monitoring. Critically, we include opportunities to apply not only physical robots but also the broader robotics toolkit - including planning, perception, control, and estimation algorithms - to climate-relevant problems. A central goal of this roadmap is to inspire new research directions and collaboration by highlighting specific, actionable problems at the intersection of robotics and climate. This work represents a collaboration between robotics researchers and domain experts in various climate disciplines, and it serves as an invitation to the robotics community to bring their expertise to bear on urgent climate priorities.
气候变化是21世纪的决定性挑战之一,许多机器人社区正在寻找方法。本文件提出了气候相关机器人研究的路线图,确定了机器人学家和专家在能源、建筑环境、交通、工业、土地使用和地球科学等气候领域开展合作的高效机会。这些应用包括能源系统优化、建筑、精密农业、建筑翻新、自动卡车和大规模环境监测等问题。关键的是,我们包括了不仅应用物理机器人,而且应用更广泛的机器人工具包(包括规划、认知、控制和估算算法)解决气候相关问题的机会。路线图的中心目标是通过突出机器人和气候交汇点上的具体、可操作的问题,激发新的研究方向和协作。这项工作代表了机器人研究人员与各种气候学科领域专家的合作,并且作为邀请机器人社区在紧迫的气候优先事项上运用其专门知识。
Article 21
Title@2025-07-17 (4): Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets
Title: Leveraging Asynchronous Cross-border Market Data for Improved Day-Ahead Electricity Price Forecasting in European Markets | Nutzung asynchroner grenzübergreifender Marktdaten für eine verbesserte Tagesprognose der Strompreise in den europäischen Märkten | 利用非同步跨界市场数据改进欧洲市场日间电力价格预测 2507.13250v1 |
Authors (4): Maria Margarida Mascarenhas, Jilles De Blauwe, Mikael Amelin, Hussain Kazmi
Accurate short-term electricity price forecasting is crucial for strategically scheduling demand and generation bids in day-ahead markets. While data-driven techniques have shown considerable prowess in achieving high forecast accuracy in recent years, they rely heavily on the quality of input covariates. In this paper, we investigate whether asynchronously published prices as a result of differing gate closure times (GCTs) in some bidding zones can improve forecasting accuracy in other markets with later GCTs. Using a state-of-the-art ensemble of models, we show significant improvements of 22% and 9% in forecast accuracy in the Belgian (BE) and Swedish (SE3) bidding zones respectively, when including price data from interconnected markets with earlier GCTs (Germany-Luxembourg, Austria, and Switzerland). This improvement holds under both general and extreme market conditions. Our analysis also yields further important insights: frequent model recalibration is necessary for maximum accuracy but comes at substantial additional computational cost, and using data from more markets does not always lead to better performance, a fact we examine in more depth through interpretability analysis of the forecast models. Overall, these findings provide valuable guidance for market participants and decision-makers aiming to optimize bidding strategies within increasingly interconnected and volatile European energy markets.
准确的短期电力价格预测对于在白昼市场战略性地安排供求和产生投标至关重要。近年来,数据驱动技术在高预测准确性方面显示出相当的勇气,但它们在很大程度上依赖投入共差的质量。在本文中,我们调查由于一些投标区关闭门的时间不同,公布的价格是否不稳,能够提高其他市场的预测准确性,而后又采用GCT。 使用最先进的模型组合,我们显示比利时(BE)和瑞典(SE3)投标区的预测准确性分别大幅提高22%和9%,包括早先GCT的互联市场(德国-卢森堡、奥地利和瑞士)的价格数据。 在本文中,这种改进既保留了一般市场条件,也保留了极端市场条件。我们的分析还得出了更加重要的见解:经常的模型重新校正对于最高准确性是必要的,但又带来大量额外的计算成本。 使用来自更多市场的数据并不总是导致更好的业绩――一个事实,我们通过预测模型的可解释性分析,更深入地分析了比利时(SE3)和瑞典(SEE3)的预测性改进了比利时(BE-L)的预测性分析,我们更深入地深入地了解了预测模型,为欧洲的能源的投资者提供了不断优化的判断性决定的投资者提供了宝贵的判断。
Article 22
Title@2025-07-17 (4): Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform
Title: Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform | Annäherungssätze für Shallow ReLU$^k$ Neurale Netze auf Sobolev-Räumen über die Radon-Transformation | 基于Radon变换的Sobolev空间上浅层ReLU$^k$神经网络的逼近率 2408.10996v2 |
Authors (3): Tong Mao, Jonathan W. Siegel, Jinchao Xu
Let $\Omega\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(\Omega))$ with error measured in the $L_q(\Omega)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
设 $\Omega\subset \mathbb{R}^d$ 为有界域。我们考虑的问题是, 使用 ReLU$^k$ 激活函数的浅层神经网络能以怎样的效率逼近 Sobolev 空间 $W^s(L_p(\Omega))$ 中的函数, 误差以 $L_q(\Omega)$ 范数度量。利用 Radon 变换和差异理论的最新结果, 我们对多种情形下的近似最优逼近率给出了简单证明, 包括 $q\leq p$、$p\geq 2$ 以及 $s \leq k + (d+1)/2$ 的情形。我们得到的逼近率在对数因子意义下是最优的, 并显著推广了现有结果。一个有趣的推论是, 浅层 ReLU$^k$ 神经网络的自适应性使其能够对直至 $s = k + (d+1)/2$ 阶的光滑性取得最优逼近率, 尽管它们表示的是固定次数 $k$ 的分片多项式。
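As a concrete picture of the function class, a shallow ReLU$^k$ network computes $f(x) = \sum_i a_i \max(0, \langle w_i, x\rangle + b_i)^k$. The sketch below evaluates such a network in plain Python. The parameter names (`weights`, `biases`, `coeffs`) are illustrative and not taken from the paper, and $k \geq 1$ is assumed.

```python
def relu_k(t, k):
    """ReLU^k activation, max(0, t) ** k, for an integer k >= 1."""
    return max(0.0, t) ** k

def shallow_relu_k(x, weights, biases, coeffs, k):
    """Evaluate f(x) = sum_i a_i * relu_k(<w_i, x> + b_i) for x in R^d."""
    total = 0.0
    for w, b, a in zip(weights, biases, coeffs):
        pre_activation = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += a * relu_k(pre_activation, k)
    return total

# Two neurons with k = 2 in dimension d = 1 reproduce x^2 on the whole line,
# because max(0, x)^2 + max(0, -x)^2 = x^2.
weights = [[1.0], [-1.0]]
biases = [0.0, 0.0]
coeffs = [1.0, 1.0]
values = [shallow_relu_k([x], weights, biases, coeffs, k=2) for x in (-2.0, 0.5, 3.0)]
```

Each neuron contributes a piecewise polynomial of degree $k$, which is why the abstract describes the networks as piecewise polynomials of fixed degree.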
Article 23
Title@2025-07-17 (4): The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics?
Title: The carbon cost of materials discovery: Can machine learning really accelerate the discovery of new photovoltaics? | Die CO2-Kosten der Materialentdeckung: Kann maschinelles Lernen die Entdeckung neuer Photovoltaik wirklich beschleunigen? | 材料发现的碳成本:机器学习能否真正加速新光伏发电的发现? 2507.13246v1 |
Authors (2): Matthew Walker, Keith T. Butler
Computational screening has become a powerful complement to experimental efforts in the discovery of high-performance photovoltaic (PV) materials. Most workflows rely on density functional theory (DFT) to estimate electronic and optical properties relevant to solar energy conversion. Although more efficient than laboratory-based methods, DFT calculations still entail substantial computational and environmental costs. Machine learning (ML) models have recently gained attention as surrogates for DFT, offering drastic reductions in resource use with competitive predictive performance. In this study, we reproduce a canonical DFT-based workflow to estimate the maximum efficiency limit and progressively replace its components with ML surrogates. By quantifying the CO$_2$ emissions associated with each computational strategy, we evaluate the trade-offs between predictive efficacy and environmental cost. Our results reveal multiple hybrid ML/DFT strategies that optimize different points along the accuracy–emissions front. We find that direct prediction of scalar quantities, such as maximum efficiency, is significantly more tractable than using predicted absorption spectra as an intermediate step. Interestingly, ML models trained on DFT data can outperform DFT workflows using alternative exchange–correlation functionals in screening applications, highlighting the consistency and utility of data-driven approaches. We also assess strategies to improve ML-driven screening through expanded datasets and improved model architectures tailored to PV-relevant features. This work provides a quantitative framework for building low-emission, high-throughput discovery pipelines.
计算筛选已成为高性能光伏(PV)材料发现中实验工作的有力补充。大多数工作流程都依靠密度泛函理论(DFT)来估计与太阳能转换有关的电子和光学特性。虽然DFT计算比实验室方法效率更高,但仍然需要大量的计算和环境成本。机器学习(ML)模型最近作为DFT的替代模型而引起注意,在保持有竞争力的预测性能的同时大幅减少资源使用。在这项研究中,我们复现了一个基于DFT的典型工作流程,以估计最高效率极限,并逐步用ML替代模型取代其组成部分。通过量化每种计算策略的CO$_2$排放量,我们评估了预测效力和环境成本之间的取舍。我们的结果显示,多种混合的ML/DFT策略在精度-排放前沿上优化了不同的点。我们发现,直接预测标量,例如最高效率,比使用预测的吸收光谱作为中间步骤要容易得多。有趣的是,在筛选应用中,基于DFT数据训练的ML模型可以优于使用替代交换-相关泛函的DFT工作流程,这凸显了数据驱动方法的一致性和实用性。我们还评估了通过扩充数据集和改进针对PV相关特征的模型架构来提升ML驱动筛选的策略。这项工作为构建低排放、高通量的发现流程提供了定量框架。
Article 24
Title@2025-07-17 (4): VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Title: VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models | VectorFit : Adaptive Singular & Bias Vector Fine-Tuning von vortrainierten Foundation-Modellen | 矢量Fit:培训前基金会模型的适应性单单项和比亚斯矢量微调 2503.19530v2 |
Authors (3): Suhas G Hegde, Shilpy Kaur, Aruna Tiwari
Popular PEFT methods reduce trainable parameter count for fine-tuning by parameterizing new low-rank or sparse trainable weights in parallel to the frozen pre-trained weights $W$. However, these weights are trained from scratch, and there exists a performance gap between these methods and full fine-tuning, especially in low-budget settings. We introduce VectorFit, a new way of parameterization that efficiently utilizes the existing knowledge embedded in $W$ by adaptively training their singular vectors and biases. We show that utilizing the structural and transformational properties of $W$ in this way can lead to high-rank incremental weight matrices $\Delta W$, comparable to that of full fine-tuning. VectorFit delivers superior results with \textbf{9$\boldsymbol\times$} fewer trainable parameters than the leading PEFT methods. Through comprehensive experiments across 19 datasets covering a wide range of language and vision tasks such as natural language understanding and generation, question answering, image classification, and image generation, we demonstrate that VectorFit surpasses baselines in terms of performance as a function of parameter-efficiency.
流行的PEFT方法通过对单向矢量和偏差进行适应性的培训,减少了用于微调的可训练参数,将新的低级或稀少的可训练重量与冷冻预先训练的重量相提并论,从而降低用于微调的可训练参数值。然而,这些重量是从零开始训练的,这些方法与全面微调之间存在性能差距,特别是在低预算情况下。我们引入了矢量Fit(Vector Fitt),这是一种新的参数化方法,它有效地利用了以W美元嵌入的现有知识,对单向矢量和偏差进行了适应性的培训。我们表明,以这种方式利用W$的结构性和转型性能可以导致高级增量重量矩阵($\Delta W$),与完全微调相当。VectorFit(VictorFit)提供优异的结果,与领先的PEEFTFT方法相比, 少一些可训练参数。通过涵盖广泛语言理解和生成、回答、图像分类和图像生成等一系列任务的全面试验,我们证明VectFittFit在参数效率方面超过了基线。
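The parameterization described above can be illustrated with a small NumPy sketch. It assumes that "singular vectors" refers to the vector of singular values of $W$, which is an interpretation rather than something the abstract states. The matrix sizes and the random "trained" perturbations are made up for illustration; no real training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
W = rng.standard_normal((m, n))   # stands in for a frozen pre-trained weight
b = np.zeros(m)                   # bias of the same layer

# Thin SVD: W = U @ diag(s) @ Vt. U and Vt stay frozen in this sketch.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Hypothetical result of fine-tuning: only s and b have moved.
s_new = s + 0.1 * rng.standard_normal(s.shape)
b_new = b + 0.1 * rng.standard_normal(b.shape)

W_new = U @ np.diag(s_new) @ Vt
delta_W = W_new - W               # equals U @ diag(s_new - s) @ Vt

trainable = s.size + b.size       # parameters updated in this sketch
full = W.size + b.size            # parameters updated by full fine-tuning
rank_delta = np.linalg.matrix_rank(delta_W)
```

Here `delta_W` has as many non-zero singular values as there are perturbed entries of `s`, so the update is not restricted to a low rank even though only `min(m, n) + m` numbers are trained. This is one way to read the abstract's claim about high-rank $\Delta W$.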
Article 25
Title@2025-07-17 (4): Multiple-Frequencies Population-Based Training
Title: Multiple-Frequencies Population-Based Training | Mehrfachhäufigkeiten bevölkerungsbasierte Ausbildung | 多频率基于种群的训练 2506.03225v2 |
Authors (6): Waël Doulazmi, Auguste Lehuger, Marin Toromanoff, Valentin Charraut, Thibault Buhet, Fabien Moutarde
Reinforcement Learning’s high sensitivity to hyperparameters is a source of instability and inefficiency, creating significant challenges for practitioners. Hyperparameter Optimization (HPO) algorithms have been developed to address this issue; among them, Population-Based Training (PBT) stands out for its ability to generate hyperparameter schedules instead of fixed configurations. PBT trains a population of agents, each with its own hyperparameters, frequently ranking them and replacing the worst performers with mutations of the best agents. These intermediate selection steps can cause PBT to focus on short-term improvements, leading it to get stuck in local optima and eventually fall behind vanilla Random Search over longer timescales. This paper studies how this greediness issue is connected to the choice of evolution frequency, the rate at which the selection is done. We propose Multiple-Frequencies Population-Based Training (MF-PBT), a novel HPO algorithm that addresses greediness by employing sub-populations, each evolving at a distinct frequency. MF-PBT introduces a migration process to transfer information between sub-populations, with an asymmetric design to balance short- and long-term optimization. Extensive experiments on the Brax suite demonstrate that MF-PBT improves sample efficiency and long-term performance, even without actually tuning hyperparameters.
超光谱优化算法(HPO)已经开发出来,以解决这一问题,其中包括基于人口的培训(PBT)表明它有能力生成超光谱表,而不是固定配置。PBT培训一批具有超光度计的物剂,每个都有超光度计,经常排位,用最差的物剂变异来取代最差的性能者。这些中间选择步骤可以导致PBT注重短期改进,导致它陷入当地opima,最终落后于香草随机搜索,超过更长的时间尺度。本文研究这一贪婪问题如何与进化频率的选择(即完成选择的速度)相联系。我们提议多频度基于人口的培训(MF-PBT),这是一种新颖的HPO算法,通过使用亚光度人口来解决贪婪问题,每个变化频率不同。MBT引入了一种迁移过程,在子群之间转移信息,甚至不进行不对称的随机随机随机随机搜索,以显示BRAMF的短期和长期性能调整。
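The selection step that the abstract describes for plain PBT (rank the agents, replace the worst with mutated copies of the best) can be sketched as follows. This is a minimal illustration, not the MF-PBT algorithm: the agent layout, the truncation fraction `frac`, and the perturbation factors are assumptions chosen for the example.

```python
import random

def pbt_step(population, rng, frac=0.25, perturb=(0.8, 1.25)):
    """One selection step: the bottom `frac` of agents copy a top agent and
    mutate its hyperparameters.

    `population` is a list of dicts with keys "score", "weights", "hparams".
    """
    ranked = sorted(population, key=lambda agent: agent["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        loser["weights"] = list(winner["weights"])        # exploit: copy weights
        loser["hparams"] = {                              # explore: perturb
            name: value * rng.choice(perturb)
            for name, value in winner["hparams"].items()
        }
    return population

rng = random.Random(0)
population = [
    {"score": float(i), "weights": [float(i)], "hparams": {"lr": 0.01 * (i + 1)}}
    for i in range(4)
]
best_lr = population[3]["hparams"]["lr"]
pbt_step(population, rng)
```

In this picture, the evolution frequency is simply how many training steps elapse between two calls to `pbt_step`. Calling it often favours agents that improve quickly, which is the greediness the paper studies.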
Article 26
Title@2025-07-17 (4): Computational-Statistical Tradeoffs from NP-hardness
Title: Computational-Statistical Tradeoffs from NP-hardness | Computational-Statistical Tradeoffs von NP-Härte | 对NP-硬度的计算-统计取舍 2507.13222v1 |
Authors (4): Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan
A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case assumptions, but since statistical problems are average-case in nature, it has been a challenge to base them on standard worst-case assumptions. In PAC learning where such tradeoffs were first studied, the question is whether computational efficiency can come at the cost of using more samples than information-theoretically necessary. We base such tradeoffs on $\mathsf{NP}$-hardness and obtain: $\circ$ Sharp computational-statistical tradeoffs assuming $\mathsf{NP}$ requires exponential time: For every polynomial $p(n)$, there is an $n$-variate class $C$ with VC dimension $1$ such that the sample complexity of time-efficiently learning $C$ is $\Theta(p(n))$. $\circ$ A characterization of $\mathsf{RP}$ vs. $\mathsf{NP}$ in terms of learning: $\mathsf{RP} = \mathsf{NP}$ iff every $\mathsf{NP}$-enumerable class is learnable with $O(\mathrm{VCdim}(C))$ samples in polynomial time. The forward implication has been known since (Pitt and Valiant, 1988); we prove the reverse implication. Notably, all our lower bounds hold against improper learners. These are the first $\mathsf{NP}$-hardness results for improperly learning a subclass of polynomial-size circuits, circumventing formal barriers of Applebaum, Barak, and Xiao (2008).
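计算机科学和统计学的一个中心问题是, 高效算法能否达到统计问题的信息论极限。许多计算-统计取舍已在平均情形假设下得到证明, 但由于统计问题本质上是平均情形的, 将其建立在标准的最坏情形假设之上一直是一个挑战。在最早研究这类取舍的PAC学习中, 问题在于计算效率是否必须以使用多于信息论所需的样本为代价。我们将这类取舍建立在 $\mathsf{NP}$ 困难性之上, 并得到: $\circ$ 在 $\mathsf{NP}$ 需要指数时间的假设下的尖锐计算-统计取舍: 对每个多项式 $p(n)$, 存在一个VC维为 $1$ 的 $n$ 元概念类 $C$, 使得在时间高效的条件下学习 $C$ 的样本复杂度为 $\Theta(p(n))$。 $\circ$ 从学习角度对 $\mathsf{RP}$ 与 $\mathsf{NP}$ 的刻画: $\mathsf{RP} = \mathsf{NP}$ 当且仅当每个 $\mathsf{NP}$ 可枚举的类都能在多项式时间内用 $O(\mathrm{VCdim}(C))$ 个样本学习。正向蕴含自 (Pitt and Valiant, 1988) 起即为已知; 我们证明了反向蕴含。值得注意的是, 我们所有的下界对非正常(improper)学习器同样成立。这是关于非正常学习多项式规模电路子类的首批 $\mathsf{NP}$ 困难性结果, 绕过了 Applebaum、Barak 和 Xiao (2008) 的形式化障碍。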
Article 27
Title@2025-07-17 (4): V-Max: A Reinforcement Learning Framework for Autonomous Driving
Title: V-Max: A Reinforcement Learning Framework for Autonomous Driving | V-Max: Ein Rahmen für verstärktes Lernen für autonomes Fahren | V-Max:加强自主驾驶学习框架 2503.08388v3 |
Authors (4): Valentin Charraut, Waël Doulazmi, Thomas Tournaire, Thibault Buhet
Learning-based decision-making has the potential to enable generalizable Autonomous Driving (AD) policies, reducing the engineering overhead of rule-based approaches. Imitation Learning (IL) remains the dominant paradigm, benefiting from large-scale human demonstration datasets, but it suffers from inherent limitations such as distribution shift and imitation gaps. Reinforcement Learning (RL) presents a promising alternative, yet its adoption in AD remains limited due to the lack of standardized and efficient research frameworks. To this end, we introduce V-Max, an open research framework providing all the necessary tools to make RL practical for AD. V-Max is built on Waymax, a hardware-accelerated AD simulator designed for large-scale experimentation. We extend it using ScenarioNet’s approach, enabling the fast simulation of diverse AD datasets.
以学习为基础的决策有可能促成普遍适用的自主驱动政策,减少基于规则的方法的工程间接费用。 模拟学习(IL)仍然是主导模式,受益于大规模的人类示范数据集,但受到内在限制,如分布转移和模仿差距等。强化学习(RL)是一个有希望的替代方案,但由于缺乏标准化和有效的研究框架,在应用过程中的采用仍然有限。为此,我们引入了V-Max,这是一个开放的研究框架,为AD提供所有必要的工具,使RL实用。V-Max建在Waymax上,这是为大规模实验设计的硬件加速的自动模拟器。我们利用假想网络的方法扩展它,使多种反倾销数据集能够快速模拟。
Article 28
Title@2025-07-17 (4): Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Title: Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models | Komponativ diskreter latenter Code für High Fidelity, Produktive Diffusionsmodelle | 面向高保真、生产性扩散模型的组合式离散潜码 2507.12318v2 |
Authors (3): Samuel Lavoie, Michael Noukhovitch, Aaron Courville
We argue that diffusion models’ success in modeling complex distributions is, for the most part, coming from their input conditioning. This paper investigates the representation used to condition diffusion models from the perspective that ideal representations should improve sample fidelity, be easy to generate, and be compositional to allow out-of-training sample generation. We introduce Discrete Latent Code (DLC), an image representation derived from Simplicial Embeddings trained with a self-supervised learning objective. DLCs are sequences of discrete tokens, as opposed to the standard continuous image embeddings. They are easy to generate and their compositionality enables sampling of novel images beyond the training distribution. Diffusion models trained with DLCs have improved generation fidelity, establishing a new state of the art for unconditional image generation on ImageNet. Additionally, we show that composing DLCs allows the image generator to produce out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we showcase how DLCs can enable text-to-image generation by leveraging large-scale pretrained language models. We efficiently finetune a text diffusion language model to generate DLCs that produce novel samples outside of the image generator training distribution.
我们认为,传播模型在模拟复杂分布方面的成功在很大程度上来自输入调节。本文调查了用于限定传播模型的表述方式,其视角是理想的表示方式应当提高样本的忠诚度,易于生成,并且能够组成,以允许在培训外生成样本。我们引入了来自自监督学习目标的简易嵌入式图像代表器(DLC ) 。DLC 是离散符号序列,而不是标准的连续图像嵌入。它们很容易生成,其组成性使得能够对培训分布之外的新图像进行取样。在DLCs培训的传播模型提高了新一代的忠诚度,为在图像网络上无条件生成图像创造了新的最新艺术。此外,我们展示了由DLCs构成的图像生成过程,使分配样本能够以不同方式一致地结合图像的语义。最后,我们展示了DLCs如何通过利用大规模预培训语言模型来帮助生成文本到新版本。我们高效地将模型用于制作新版本的LCSmal 。
Article 29
Title@2025-07-17 (4): MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling
Title: MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling | MoTM: Auf dem Weg zu einem Basismodell für Zeitreihen Imputation basierend auf kontinuierlicher Modellierung | MoTM:建立基于连续建模的时间序列计算基础模型 2507.13207v1 |
Authors (3): Etienne Le Naour, Tahar Nabil, Ghislain Agoua
Recent years have witnessed a growing interest in time series foundation models, with a strong emphasis on the forecasting task. Yet, the crucial task of out-of-domain imputation of missing values remains largely underexplored. We propose a first step to fill this gap by leveraging implicit neural representations (INRs). INRs model time series as continuous functions and naturally handle various missing data scenarios and sampling rates. While they have shown strong performance within specific distributions, they struggle under distribution shifts. To address this, we introduce MoTM (Mixture of Timeflow Models), a step toward a foundation model for time series imputation. Building on the idea that a new time series is a mixture of previously seen patterns, MoTM combines a basis of INRs, each trained independently on a distinct family of time series, with a ridge regressor that adapts to the observed context at inference. We demonstrate robust in-domain and out-of-domain generalization across diverse imputation scenarios (e.g., block and pointwise missingness, variable sampling rates), paving the way for adaptable foundation imputation models.
近些年来,人们对时间序列基础模型的兴趣日益浓厚,并着重强调了预测任务。然而,对缺失值进行外出计算的关键任务在很大程度上仍未得到充分探讨。我们提出第一步,通过利用隐含神经表(INRs)来填补这一空白。IRS模型时间序列是连续的功能,自然处理各种缺失的数据假设和取样率。虽然它们在特定分布中表现出很强的性能,但在分布变化中挣扎。为了解决这个问题,我们引入了MTM(时间流模型混合),这是向时间序列估算基础模型迈出的一步。基于新时间序列是以前所见模式的混合体这一想法,IMS将每个独立地在不同的时间序列中培训的IRS的基础与一个山脊回归器结合起来,以适应所观察到的背景。我们展示了各种估算假设(例如块和点缺失率、可变抽样率)中坚固的内部和外部外总体化,为可调整的基础估算模型铺平了道路。
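The "basis plus ridge regressor" idea can be sketched with a toy example. Everything here is a stand-in: `basis_features` uses three fixed functions in place of the outputs of pretrained INRs, and the series, the missing block and `alpha` are invented for illustration. Only the closed-form ridge solution is standard.

```python
import numpy as np

def basis_features(t):
    """Hypothetical stand-in for K pretrained INRs evaluated at times t."""
    t = np.asarray(t, dtype=float)
    return np.stack([np.sin(t), np.cos(t), np.ones_like(t)], axis=1)

def ridge_fit(Phi, y, alpha=1e-3):
    """Closed-form ridge regression: solve (Phi^T Phi + alpha I) w = Phi^T y."""
    n_features = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + alpha * np.eye(n_features), Phi.T @ y)

t = np.linspace(0.0, 6.0, 61)
series = 2.0 * np.sin(t) - 0.5 * np.cos(t) + 1.0
observed = np.ones(t.shape, dtype=bool)
observed[20:35] = False            # a block of missing values

# Fit the mixing weights on the observed context only, then impute the gap.
w = ridge_fit(basis_features(t[observed]), series[observed])
imputed = basis_features(t[~observed]) @ w
max_err = float(np.max(np.abs(imputed - series[~observed])))
```

Because the basis functions are continuous in `t`, they can be evaluated at any timestamp, so the same fit covers block gaps, pointwise gaps, and irregular sampling.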
Article 30
Title@2025-07-17 (4): Branching Stein Variational Gradient Descent for sampling multimodal distributions
Title: Branching Stein Variational Gradient Descent for sampling multimodal distributions | Verzweigung Stein Variational Gradient Descent für die Probenahme multimodaler Verteilungen | 用于多峰分布采样的分支Stein变分梯度下降 2506.13916v2 |
Authors (3): Isaías Bañales, Arturo Jaramillo, Joshué Helí Ricalde-Guerrero
We propose a novel particle-based variational inference method designed to work with multimodal distributions. Our approach, referred to as Branched Stein Variational Gradient Descent (BSVGD), extends the classical Stein Variational Gradient Descent (SVGD) algorithm by incorporating a random branching mechanism that encourages the exploration of the state space. In this work, a theoretical guarantee for the convergence in distribution is presented, as well as numerical experiments to validate the suitability of our algorithm. Performance comparisons between the BSVGD and the SVGD are presented using the Wasserstein distance between samples and the corresponding computational times.
我们提出一种新的粒子变推法,用于多式联运分配。我们的方法被称为“分形斯坦因斯坦变异梯子(BSVGD) ” , 通过纳入鼓励探索国家空间的随机分支机制,扩展了古典斯坦因斯坦变异梯子(SVGD)算法。在这项工作中,提出了分配趋同的理论保证,以及验证我们算法是否适合的数值实验。BSVGD和SVGD之间的性能比较是用抽样之间的瓦塞斯坦距离和相应的计算时间进行的。
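For readers unfamiliar with the baseline, the classical SVGD update that BSVGD extends can be sketched in one dimension. This is only the standard SVGD step with an RBF kernel; the random branching mechanism of the paper is not implemented. The step size, bandwidth, target (a standard normal) and initial particles are assumptions for the example.

```python
import math

def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One classical SVGD update for 1-D particles with an RBF kernel.

    Each particle moves along
        phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i) ].
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * bandwidth ** 2))
            grad_k = -(xj - xi) / bandwidth ** 2 * k   # derivative w.r.t. xj
            phi += k * grad_log_p(xj) + grad_k          # attraction + repulsion
        updated.append(xi + step * phi / n)
    return updated

# Target: standard normal, so grad log p(x) = -x. Particles start far from the mode.
particles = [3.0 + 0.1 * i for i in range(10)]
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
mean = sum(particles) / len(particles)
```

The first term pulls particles towards high-density regions and the second pushes them apart. With a multimodal target, this deterministic flow can leave all particles in one mode, which is the limitation the branching mechanism is meant to address.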
Article 31
Title@2025-07-17 (4): Relation-Aware Slicing in Cross-Domain Alignment
Title: Relation-Aware Slicing in Cross-Domain Alignment | Verhältnis-Bewusstsein-Slicing in Cross-Domain-Alignment | 跨域对齐中的关系软件切切 2507.13194v1 |
Authors (4): Dhruv Sarkar, Aprameyo Chakrabartty, Anish Chakrabarty, Swagatam Das
The Sliced Gromov-Wasserstein (SGW) distance, aiming to relieve the computational cost of solving a non-convex quadratic program that is the Gromov-Wasserstein distance, utilizes projecting directions sampled uniformly from unit hyperspheres. This slicing mechanism incurs unnecessary computational costs due to uninformative directions, which also affects the representative power of the distance. However, finding a more appropriate distribution over the projecting directions (slicing distribution) is often an optimization problem in itself that comes with its own computational cost. In addition, with more intricate distributions, the sampling itself may be expensive. As a remedy, we propose an optimization-free slicing distribution that provides fast sampling for the Monte Carlo approximation. We do so by introducing the Relation-Aware Projecting Direction (RAPD), effectively capturing the pairwise association of each of two pairs of random vectors, each following their ambient law. This enables us to derive the Relation-Aware Slicing Distribution (RASD), a location-scale law corresponding to sampled RAPDs. Finally, we introduce the RASGW distance and its variants, e.g., IWRASGW (Importance Weighted RASGW), which overcome the shortcomings experienced by SGW. We theoretically analyze its properties and substantiate its empirical prowess using extensive experiments on various alignment tasks.
Sliced Gromov-Wasserstein(SGW)距离(SGW)距离,旨在降低解决非Convex象形方案的计算成本,即Gromov-Wasserstein距离,使用单位超光谱统一抽样的投影方向。这种切片机制由于缺乏信息化的方向而产生不必要的计算成本,这也影响到距离的代表性。然而,在投影方向(切片分布)上找到一个更适当的分布,这本身往往是一个最优化的问题,而其本身的计算成本也随之而来。此外,由于分布更为复杂,取样本身可能很昂贵。作为一种补救措施,我们建议采用不优化的剪切片分布,为蒙特卡洛接近提供快速抽样。我们这样做的方法是引入Relation-Award预测方向(RADDD),有效地捕捉到每对两对随机矢量的配对的配对关系,每个配方都遵循其环境法律。这使我们能够从一个与抽样的RAPDS(RASD)相对的分布(RASD)法律(RASD),一个位置尺度尺度法律,一个与RAPA(W)对应的缩略地法律。我们用各种的变体分析了RASG(RASG(W)的变体),我们用其历史变体)的变体),我们用其变体的变体分析其变体。
Article 32
Title@2025-07-17 (4): Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis
Title: Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis | Jüngste Fortschritte bei der simulationsbasierten Schlussfolgerung für die Analyse von Gravitationswellendaten | 引力波数据分析模拟推导法最近的进展 2507.11192v2 |
Authors (2): Bo Liang, He Wang
The detection of gravitational waves by the LIGO-Virgo-KAGRA collaboration has ushered in a new era of observational astronomy, emphasizing the need for rapid and detailed parameter estimation and population-level analyses. Traditional Bayesian inference methods, particularly Markov chain Monte Carlo, face significant computational challenges when dealing with the high-dimensional parameter spaces and complex noise characteristics inherent in gravitational wave data. This review examines the emerging role of simulation-based inference methods in gravitational wave astronomy, with a focus on approaches that leverage machine-learning techniques such as normalizing flows and neural posterior estimation. We provide a comprehensive overview of the theoretical foundations underlying various simulation-based inference methods, including neural posterior estimation, neural ratio estimation, neural likelihood estimation, flow matching, and consistency models. We explore the applications of these methods across diverse gravitational wave data processing scenarios, from single-source parameter estimation and overlapping signal analysis to testing general relativity and conducting population studies. Although these techniques demonstrate speed improvements over traditional methods in controlled studies, their model-dependent nature and sensitivity to prior assumptions are barriers to their widespread adoption. Their accuracy, which is similar to that of conventional methods, requires further validation across broader parameter spaces and noise conditions.
LIGO-Virgo-KAGRA合作探测引力波,这带来了一个新的观测天文学时代,突出了快速和详细参数估计和人口层面分析的必要性。传统的巴伊西亚推断方法,特别是Markov链 Monte Carlo,在处理引力波数据中固有的高维参数空间和复杂噪音特征时,面临重大的计算挑战。本审查审查了模拟推力波方法在引力波天文学中的新作用,重点是利用机器学习技术的方法,如正常流和神经远洋估计。我们全面概述了各种模拟推力方法所依据的理论基础,包括神经后层估计、神经比率估计、神经概率估计、流动匹配和一致性模型。我们探索这些方法在各种引力波数据处理假设中的应用,从单一源参数估计和重叠信号分析到测试一般相对性和进行人口研究。尽管这些技术在控制研究中展示了对传统方法的快速改进,但其模型依赖性性质和参数的敏感度是其先前的精确度,而其先前的精确度则要求采用更为广泛的精确性。
Article 33
Title@2025-07-17 (4): GradNetOT: Learning Optimal Transport Maps with GradNets
Title: GradNetOT: Learning Optimal Transport Maps with GradNets | GradNetOT: Optimale Transportkarten mit GradNets lernen | GradNetOT: 与 GradNets一起学习最佳交通地图 2507.13191v1 |
Authors (3): Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
Monotone gradient functions play a central role in solving the Monge formulation of the optimal transport problem, which arises in modern applications ranging from fluid dynamics to robot swarm control. When the transport cost is the squared Euclidean distance, Brenier’s theorem guarantees that the unique optimal map is the gradient of a convex function, namely a monotone gradient map, and it satisfies a Monge-Ampère equation. In [arXiv:2301.10862] [arXiv:2404.07361], we proposed Monotone Gradient Networks (mGradNets), neural networks that directly parameterize the space of monotone gradient maps. In this work, we leverage mGradNets to directly learn the optimal transport mapping by minimizing a training loss function defined using the Monge-Ampère equation. We empirically show that the structural bias of mGradNets facilitates the learning of optimal transport maps and employ our method for a robot swarm control problem.
单色梯度函数在解决从流体动态到机器人群控等现代应用中产生的最佳运输问题的蒙古式配方方面发挥着中心作用。 当运输成本为平方的 Euclidean 距离时, Brenier 的理论保证了独特的最佳地图是单色梯度函数的梯度, 即单色梯度图, 满足了蒙古- Ampere 等式。 在 [ar Xiv:2301.10862][arXiv:2404. 07361] 中, 我们提议单色度梯度网络( mGradNets), 直接将单色度梯度地图的空间参数化的神经网络。 在这项工作中, 我们利用 mGradNets 直接学习最佳的运输图绘制方法, 以尽量减少使用 Monge- Ampre 等式定义的培训损失函数。 我们从经验中显示, mGradNet 的结构偏差有利于学习最佳运输图, 并使用我们的方法解决机器人温控问题。
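For reference, the equation mentioned in the abstract can be written out. This is the standard form under the squared Euclidean cost, assuming a source density $p$ and a target density $q$; these symbols are introduced here for illustration and do not appear in the abstract.

```latex
T = \nabla \varphi, \quad \varphi \text{ convex}, \qquad
\det\!\big(\nabla^2 \varphi(x)\big) = \frac{p(x)}{q\big(\nabla \varphi(x)\big)}.
```

Since $\nabla^2 \varphi$ is positive semidefinite for convex $\varphi$, the map $\nabla \varphi$ is monotone, which is the class of maps that mGradNets are said to parameterize. A loss built from the residual of this equation is one natural reading of "a training loss function defined using the Monge-Ampère equation"; the abstract does not specify the exact form.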
Article 34
Title@2025-07-17 (4): Bounding the Worst-class Error: A Boosting Approach
Title: Bounding the Worst-class Error: A Boosting Approach | Den Fehler der schlechtesten Klasse zu überwinden: Ein Boosting-Ansatz | 绕过最坏的错误 : 推动方法 2310.14890v3 |
Authors (4): Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro
This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10%, 10%, and 40% has a worst-class error rate of 40%, whereas the average is 20% under the class-balanced condition. The worst-class error is important in many applications. For example, in a medical image classification task, it would not be acceptable for the malignant tumor class to have a 40% error rate, while the benign and healthy classes have 10% error rates. To avoid overfitting in worst-class error minimization using Deep Neural Networks (DNNs), we design a problem formulation for bounding the worst-class error instead of achieving zero worst-class error. Moreover, to correctly bound the worst-class error, we propose a boosting approach that ensembles DNNs. We give training and generalization worst-class-error bounds. Experimental results show that the algorithm lowers worst-class test error rates while avoiding overfitting to the training set. The code is available at https://github.com/saito-yuya/Bounding-the-Worst-class-error-A-Boosting-Approach.
本文解决了最差级错误率问题, 而不是所有类别的平均标准错误率。 例如, 3级分类任务, 等级错误率为10%、 10%和40%, 等级错误率为10%、 10%和40%, 差差率为40%, 而平均差率为20%, 在等级平衡条件下为20% 。 在许多应用中, 最差级错误很重要 。 例如, 在医学图像分类任务中, 恶性肿瘤类的误率为40%, 而良性健康类的差率为10% 。 实验结果显示, 使用深神经网络( DNNNS) 避免在最差级错误中过度适应, 我们设计了一个问题配对最差级错误的配方, 而不是达到最差级差差差的差率 。 此外, 为了正确约束最差级错误, 我们建议一种推进方法, 将 DNNS 组合成 DNNS 。 我们提供培训和一般化最坏级错误约束。 实验结果显示, 算法降低最差级测试错误率, 同时避免过度适应训练设置 。 这个代码可在 https://Wrma- probarmas- parto- / bors- pard- 。 amus- 。
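The three-class example in the abstract can be reproduced with a few lines of Python. The labels below are made up so that the class-wise error rates come out at 10%, 10% and 40%; the helper name `class_error_rates` is illustrative.

```python
def class_error_rates(y_true, y_pred):
    """Per-class error rate: fraction of each true class that is misclassified."""
    totals, errors = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        errors[t] = errors.get(t, 0) + (t != p)
    return {c: errors[c] / totals[c] for c in totals}

# Class-balanced toy data: 10 samples per class.
y_true = [0] * 10 + [1] * 10 + [2] * 10
y_pred = [0] * 9 + [1] + [1] * 9 + [2] + [2] * 6 + [0] * 4

rates = class_error_rates(y_true, y_pred)
worst = max(rates.values())                    # worst-class error rate
average = sum(rates.values()) / len(rates)     # class-balanced average
```

The average hides the weak class: it comes out at about 0.2 while `worst` is 0.4, which is the gap the paper's objective targets.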
Article 35
Title@2025-07-17 (4): Spectral Bellman Method: Unifying Representation and Exploration in RL
Title: Spectral Bellman Method: Unifying Representation and Exploration in RL | Spektral Bellman-Methode: Vereinheitliche Darstellung und Exploration in RL | 光谱钟门方法:统一代表与探索 2507.13181v1 |
Authors (4): Ofir Nabati, Bo Dai, Shie Mannor, Guy Tennenholtz
The effect of representation has been demonstrated in reinforcement learning, through both theoretical and empirical successes. However, existing representation learning is mainly induced from model-learning aspects, which misaligns it with RL tasks. This work introduces Spectral Bellman Representation, a novel framework derived from the Inherent Bellman Error (IBE) condition, which aligns with the fundamental structure of Bellman updates across a space of possible value functions and is therefore directly geared towards value-based RL. Our key insight is the discovery of a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This spectral connection yields a new, theoretically-grounded objective for learning state-action features that inherently capture this Bellman-aligned covariance. Our method requires a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration, by aligning feature covariance with Bellman dynamics, and improve overall performance, particularly in challenging hard-exploration and long-horizon credit assignment tasks. Our framework naturally extends to powerful multi-step Bellman operators, further broadening its impact. Spectral Bellman Representation offers a principled and effective path toward learning more powerful and structurally sound representations for value-based reinforcement learning.
从理论和实证的成功经验中,在强化学习中都可以看出代表性的效果。然而,现有的代表性学习主要是从模型学习方面产生的,与我们的RL任务不相符。这项工作引入了Spectral Bellman Supplement(Spectral Bellman Suble),这是源于Intherent Bellman错误(IBE)的一个新框架,它与Bellman更新可能具有价值功能的空间的基本结构相匹配,因此直接面向基于价值的RL。我们的主要见解是发现一种基本的光谱关系:在零IBE条件下,贝尔曼运营商价值分配功能的转变与特征共变异结构有着内在的联系。这种光谱连接产生了一个新的、基于理论上的学习国家行动特点的新目标,而这种差异本身就反映了Bellman与共变异的内在差异。我们的方法需要简单修改现有的算法。我们所学的表述方式能够进行结构性的探索,将特征的变异性与Bellman动态相匹配,并改进总体业绩,特别是在挑战硬质探索和长期正向信贷分配任务方面。我们的框架自然延伸到强大的多级Bellman操作,进一步扩大了Bellman操作的加强原则代表,从而扩大了了方向的学习方式。
Article 36
Title@2025-07-17 (4): SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks
Title: SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | SHIELD: Ein sicheres und hochverstärktes integriertes Lernen für robuste Deepfake-Erkennung gegen feindliche Angriffe | SHIELD: 可靠和高度强化的综合学习,以强有力地发现深假,防止反向攻击 2507.13170v1 |
Authors (4): Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik
Audio plays a crucial role in applications like speaker verification, voice-enabled smart devices, and audio conferencing. However, audio manipulations, such as deepfakes, pose significant risks by enabling the spread of misinformation. Our empirical analysis reveals that existing methods for detecting deepfake audio are often vulnerable to anti-forensic (AF) attacks, particularly those attacked using generative adversarial networks. In this article, we propose a novel collaborative learning method called SHIELD to defend against generative AF attacks. To expose AF signatures, we integrate an auxiliary generative model, called the defense (DF) generative model, which facilitates collaborative learning by combining input and output. Furthermore, we design a triplet model to capture correlations for real and AF attacked audios with real-generated and attacked-generated audios using auxiliary generative models. The proposed SHIELD strengthens the defense against generative AF attacks and achieves robust performance across various generative models. The proposed AF significantly reduces the average detection accuracy from 95.49% to 59.77% for ASVspoof2019, from 99.44% to 38.45% for In-the-Wild, and from 98.41% to 51.18% for HalfTruth for three different generative models. The proposed SHIELD mechanism is robust against AF attacks and achieves an average accuracy of 98.13%, 98.58%, and 99.57% in match, and 98.78%, 98.62%, and 98.85% in mismatch settings for the ASVspoof2019, In-the-Wild, and HalfTruth datasets, respectively.
音频在语音校验、语音智能装置和音频会议等应用中发挥着关键作用。 但是,音频操纵,如深假等,通过传播错误信息而构成重大风险。 我们的实证分析显示,现有的深假音频探测方法往往容易受到反法(AF)攻击,特别是使用基因对抗网络袭击的反法(AF)攻击。 在本篇文章中,我们提议了一种新的合作学习方法,称为SHIELD,以抵御变形AF攻击。为了曝光AF的签名,我们整合了一个辅助基因化模型,称为国防(DF)基因模型,通过合并输入和输出促进协作学习。此外,我们设计了一个三重模型,以真实和被攻击的音频探测到真实的音频。 拟议的SHIELD加强防变形A攻击的防御能力,在各种基因化模型中实现强效性功能。 ABFID的平均探测精确度从95.49%到59.77%,AVSWiforpof19,从99.44%到38.45%到38.45%, 在Wildald-Iald-ILD 平均为98.%,在98. 和SID中,在98.
Article 37
Title@2025-07-17 (4): Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Title: Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models | Orbis: Herausforderungen der Langzeit-Vorhersage bei treibenden Weltmodellen überwinden | Orbis:克服在推动世界模式方面长期预测的挑战 2507.13162v1 |
Authors (5): Arian Mousakhan, Sudhanshu Mittal, Silvio Galesso, Karim Farid, Thomas Brox
Existing world models for autonomous driving struggle with long-horizon generation and generalization to challenging scenarios. In this work, we develop a model using simple design choices, and without additional supervision or sensors, such as maps, depth, or multiple cameras. We show that our model yields state-of-the-art performance, despite having only 469M parameters and being trained on 280h of video data. It particularly stands out in difficult scenarios like turning maneuvers and urban traffic. We test whether discrete token models possibly have advantages over continuous models based on flow matching. To this end, we set up a hybrid tokenizer that is compatible with both approaches and allows for a side-by-side comparison. Our study concludes in favor of the continuous autoregressive model, which is less brittle on individual design choices and more powerful than the model built on discrete tokens. Code, models and qualitative results are publicly available at https://lmb-freiburg.github.io/orbis.github.io/.
现有世界自主驱动模型, 长视距生成并概括到具有挑战性的情景。 在这项工作中, 我们开发了一个模型, 使用简单的设计选择, 不需要额外的监督或传感器, 如地图、深度或多摄像头。 我们显示, 我们的模型能产生最先进的性能, 尽管我们只有469M参数, 并受过280h的视频数据培训。 它在变换动作和城市交通等困难情况下尤为突出。 我们测试离散的象征性模型是否比基于流量匹配的连续模型有优势。 为此, 我们设置了一个混合符号, 既符合各种方法, 也允许进行平行比较。 我们的研究结论是支持连续的自动递减模型, 它对个体设计选择的难度较小, 比以离散符号为基础的模型更强大。 代码、 模型和定性结果可在 https://lmb- freiburg.github.io/orbis.github.io/ 上公开查阅 。
Article 38
Title@2025-07-17 (4): Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Title: Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Inverse Stärkung Lernen trifft auf großes Sprachmodell Post-Training: Grundlagen, Fortschritte und Chancen | 培训后培训:基础、进步和机会 2507.13158v1 |
Authors (2): Hao Sun, Mihaela van der Schaar
In the era of Large Language Models (LLMs), alignment has emerged as a fundamental yet challenging problem in the pursuit of more reliable, controllable, and capable machine intelligence. The recent success of reasoning models and conversational AI systems has underscored the critical role of reinforcement learning (RL) in enhancing these systems, driving increased research interest at the intersection of RL and LLM alignment. This paper provides a comprehensive review of recent advances in LLM alignment through the lens of inverse reinforcement learning (IRL), emphasizing the distinctions between RL techniques employed in LLM alignment and those in conventional RL tasks. In particular, we highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift. We begin by introducing fundamental concepts in RL to provide a foundation for readers unfamiliar with the field. We then examine recent advances in this research agenda, discussing key challenges and opportunities in conducting IRL for LLM alignment. Beyond methodological considerations, we explore practical aspects, including datasets, benchmarks, evaluation metrics, infrastructure, and computationally efficient training and inference techniques. Finally, we draw insights from the literature on sparse-reward RL to identify open questions and potential research directions. By synthesizing findings from diverse studies, we aim to provide a structured and critical overview of the field, highlight unresolved challenges, and outline promising future directions for improving LLM alignment through RL and IRL techniques.
在大语言模型(LLM)时代,在追求更可靠、可控制和更有能力的机器智能方面,调整已成为一个根本性但具有挑战性的问题。最近,推理模型和对话性AI系统的成功突出了强化学习(RL)在加强这些系统方面的关键作用,从而促使研究对RL和LLM的交叉点产生更大的兴趣。本文从反强化学习(IRL)的角度全面审查LLM调整的最新进展,强调LLM调整所使用的技术与常规RL任务之间的差别。特别是,我们强调必须从人类数据中建立神经奖赏模型,并讨论这种范式转变的正式和实际影响。我们首先在RL引入基本概念,为不熟悉实地的读者奠定基础。然后我们审视这一研究议程的最新进展,讨论为LLM调整进行IRL的关键挑战和机遇。除了方法考虑外,我们还探讨实际问题,包括数据集、基准、评价指标、基础设施、以及计算高效的培训和推导技术。最后,我们从关于Slob-R-L的文献中,从结构式研究方向,通过RL-L的全局性研究,从提供我们未解决的前沿研究方向,然后提出一个开放问题。
Article 39
Title@2025-07-17 (4): NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
Title: NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech | 非口头翻译:一个以文本为主的非口头演唱的英文公共单位,带有文字对语音情感说明 2507.13155v1 |
Authors (3): Maksim Borisov, Egor Spirin, Daria Diatlova
Current expressive speech synthesis models are constrained by the limited availability of open-source datasets containing diverse nonverbal vocalizations (NVs). In this work, we introduce NonverbalTTS (NVTTS), a 17-hour open-access dataset annotated with 10 types of NVs (e.g., laughter, coughs) and 8 emotional categories. The dataset is derived from popular sources, VoxCeleb and Expresso, using automated detection followed by human validation. We propose a comprehensive pipeline that integrates automatic speech recognition (ASR), NV tagging, emotion classification, and a fusion algorithm to merge transcriptions from multiple annotators. Fine-tuning open-source text-to-speech (TTS) models on the NVTTS dataset achieves parity with closed-source systems such as CosyVoice2, as measured by both human evaluation and automatic metrics, including speaker similarity and NV fidelity. By releasing NVTTS and its accompanying annotation guidelines, we address a key bottleneck in expressive TTS research. The dataset is available at https://huggingface.co/datasets/deepvk/NonverbalTTS.
现有表达式语音合成模型受开放源数据集有限的限制,这些数据集包含多种非口头声音(NVs),在这项工作中,我们引入了NonverbalTTS(NVTTS),这是一个17小时开放访问数据集,附加10种NVS(例如,笑声、咳嗽)和8种情感类别。该数据集来自流行来源,VoxCeleb和Expresso,使用自动检测,然后由人类验证。我们建议建立一个综合自动语音识别(ASR)、NV标记、情感分类和聚合算法的综合管道,以合并多个发音器的转录。对NVTTS数据集的开放源文本到语音(TTS)模型进行微调,实现了与CosyVoice2等封闭源系统的平等,这些系统由人类评估和自动测量,包括语音相似性和NVeality。我们通过释放NVTTS及其附带的注释指南,在明示TIS研究中处理一个关键的瓶颈问题。数据设置可在 https/OVlevkask/ODDATS。
Article 40
Title@2025-07-17 (4): NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation
Title: NGTM: Substructure-based Neural Graph Topic Model for Interpretable Graph Generation | NGTM: Substrukturbasiertes Neural Graph Topic Model für die interpretierbare Graphengenerierung | NGTM: 以次级结构为基础的可解释图形生成神经图专题模型 2507.13133v1 |
Authors (3): Yuanxin Zhuang, Dazhong Shen, Ying Sun
Graph generation plays a pivotal role across numerous domains, including molecular design and knowledge graph construction. Although existing methods achieve considerable success in generating realistic graphs, their interpretability remains limited, often obscuring the rationale behind structural decisions. To address this challenge, we propose the Neural Graph Topic Model (NGTM), a novel generative framework inspired by topic modeling in natural language processing. NGTM represents graphs as mixtures of latent topics, each defining a distribution over semantically meaningful substructures, which facilitates explicit interpretability at both local and global scales. The generation process transparently integrates these topic distributions with a global structural variable, enabling clear semantic tracing of each generated graph. Experiments demonstrate that NGTM achieves competitive generation quality while uniquely enabling fine-grained control and interpretability, allowing users to tune structural features or induce biological properties through topic-level adjustments.
图表生成在多个领域发挥着关键作用,包括分子设计和知识图解构建。虽然现有方法在生成现实图表方面取得了相当大的成功,但其可解释性仍然有限,往往掩盖了结构性决定背后的理由。为了应对这一挑战,我们提议了神经图形专题模型(NGTM),这是一个由自然语言处理中专题建模所启发的新型发型框架。NGTM将图表作为潜在专题的混合物,每个图解都定义了具有语义意义的子结构的分布,便于在地方和全球范围内进行明确的解释。生成过程以透明的方式将这些专题分布与全球结构变量相结合,使每个生成的图表能够清晰的语义跟踪。实验表明,NGTM实现了有竞争力的生成质量,同时具有独特性,能够细微的控制和可解释性,让用户通过主题层面的调整来调节结构特征或诱发生物特性。
Article 41
Title@2025-07-17 (4): PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data
Title: PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data | PINT: Physik-informierte Neuralzeit-Serienmodelle mit Anwendungen zur langfristigen Schlussfolgerung auf WeatherBench 2m-Temperaturdaten | PINT: 应用气象区2m-温度数据长期推断的物理化神经时间序列模型 2502.04018v2 |
Authors (3): Keonvin Park, Jisu Kim, Jaemin Seo
This paper introduces PINT (Physics-Informed Neural Time Series Models), a framework that integrates physical constraints into neural time series models to improve their ability to capture complex dynamics. We apply PINT to the ERA5 WeatherBench dataset, focusing on long-term forecasting of 2m-temperature data. PINT incorporates the Simple Harmonic Oscillator Equation as a physics-informed prior, embedding its periodic dynamics into RNN, LSTM, and GRU architectures. This equation’s analytical solutions (sine and cosine functions) facilitate rigorous evaluation of the benefits of incorporating physics-informed constraints. By benchmarking against a linear regression baseline derived from its exact solutions, we quantify the impact of embedding physical principles in data-driven models. Unlike traditional time series models that rely on future observations, PINT is designed for practical forecasting. Using only the first 90 days of observed data, it iteratively predicts the next two years, addressing challenges posed by limited real-time updates. Experiments on the WeatherBench dataset demonstrate PINT’s ability to generalize, capture periodic trends, and align with physical principles. This study highlights the potential of physics-informed neural models in bridging machine learning and interpretable climate applications. Our models and datasets are publicly available on GitHub: https://github.com/KV-Park.
本文介绍了 PINT (物理、内建神经时间序列模型) , 该框架将物理限制纳入神经时间序列模型, 以提高其捕捉复杂动态的能力。 我们将 PINT 应用到 ERA5 气象基准数据集, 重点是对 2m 温度数据进行长期预测。 PINT 将简单调和振荡器作为物理知情的前期数据, 将其定期动态嵌入 RNN、 LSTM 和 GRU 结构中。 此等式的分析解决方案( 线和连线功能) 有助于严格评估纳入物理知情限制的好处。 我们根据精确解决方案对线性回归基线进行基准测试, 我们量化了将物理原则嵌入数据驱动模型的影响。 与依赖未来观测的传统时间序列模型不同, PINT 设计了实际预测。 仅使用观察数据的头90天, 反复预测了未来两年, 应对有限的实时更新所带来的挑战。 天气- 数据设置的实验显示 PINT 的常规化、 捕捉取周期性模型的能力, 以及我们现有的物理物理原则 。
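The abstract above mentions a linear regression baseline derived from the exact sine and cosine solutions of the Simple Harmonic Oscillator Equation, x'' + ω²x = 0. A minimal sketch of such a baseline is given below. It assumes an annual angular frequency ω = 2π/365 per day, adds a constant offset term, and uses a synthetic noise-free series in place of ERA5 data; none of these choices come from the paper, and all function names are hypothetical.

```python
import math

OMEGA = 2 * math.pi / 365.0  # assumed annual angular frequency (rad/day)


def features(t):
    # Constant offset plus the two exact solutions of x'' + OMEGA^2 x = 0.
    return [1.0, math.cos(OMEGA * t), math.sin(OMEGA * t)]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_harmonic(ts, ys):
    """Least-squares fit of y ~ c0 + a*cos(OMEGA t) + b*sin(OMEGA t)."""
    rows = [features(t) for t in ts]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict(coef, t):
    return sum(c * f for c, f in zip(coef, features(t)))


def truth(t):
    # Synthetic stand-in for a daily 2m-temperature series (not real data).
    return 12.0 + 8.0 * math.cos(OMEGA * t) + 3.0 * math.sin(OMEGA * t)


# Fit on the first 90 days only, then extrapolate over the next two years.
train_days = list(range(90))
coef = fit_harmonic(train_days, [truth(t) for t in train_days])
horizon_error = max(abs(predict(coef, t) - truth(t)) for t in range(90, 90 + 730))
print(coef, horizon_error)
```

Because the synthetic series lies exactly in the span of the features, the fit extrapolates almost perfectly; on real temperature data the same baseline would only capture the periodic component.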
Article 42
Title@2025-07-17 (4): Search for Z/2 eigenfunctions on the sphere using machine learning
Title: Search for Z/2 eigenfunctions on the sphere using machine learning | Suche nach Z/2 Eigenfunktionen auf der Kugel mittels maschinellem Lernen | 使用机器学习在球面上搜索 Z/2 本征函数 2507.13122v1 |
Authors (2): Andriy Haydys, Willem Adriaan Salm
We use machine learning to search for examples of Z/2 eigenfunctions on the 2-sphere. For this, we created a multivalued version of a feedforward deep neural network and implemented it using the JAX library. We found Z/2 eigenfunctions for three cases: in the first two cases, we fixed the branch points at the vertices of a tetrahedron and of a cube, respectively. In the third case, we allowed the AI to move the branch points around and, in the end, it positioned the branch points at the vertices of a squashed tetrahedron.
我们用机器学习来寻找 2 层上的 Z/2 机能的例子。 为此, 我们创建了一个多值版本的进化深神经网络, 我们使用 JAX 库执行它。 我们发现 Z/2 机能有三个案例: 在前两个案例中, 我们分别固定了四面形和立方体顶部的分支点。 在第三个案例中, 我们允许 AI 移动分支点, 最后, 它将分支点定位在 压压压四面体顶部的分支点 。
Article 43
Title@2025-07-17 (4): RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images
Title: RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images | RS-TinyNet: Stage-wise Feature Fusion Network zur Erkennung winziger Objekte in Bildern der Fernerkundung | RS-TinyNet:在遥感图像中探测小物体的分阶段地貌融合网络 2507.13120v1 |
Authors (3): Xiaozheng Jiang, Wei Zhang, Xuerui Mao
Detecting tiny objects in remote sensing (RS) imagery has been a long-standing challenge due to their extremely limited spatial information, weak feature representations, and dense distributions across complex backgrounds. Despite numerous efforts, mainstream detectors still underperform in such scenarios. To bridge this gap, we introduce RS-TinyNet, a multi-stage feature fusion and enhancement model explicitly tailored for tiny object detection in various RS scenarios. RS-TinyNet comes with two novel designs: tiny object saliency modeling and feature integrity reconstruction. Guided by these principles, we design three step-wise feature enhancement modules. Among them, the multi-dimensional collaborative attention (MDCA) module employs multi-dimensional attention to enhance the saliency of tiny objects. Additionally, the auxiliary reversible branch (ARB) and progressive fusion detection head (PFDH) modules are introduced to preserve information flow and fuse multi-level features, bridging semantic gaps and retaining structural detail. Comprehensive experiments on the public RS dataset AI-TOD show that RS-TinyNet surpasses existing state-of-the-art (SOTA) detectors by 4.0% AP and 6.5% AP75. Evaluations on the DIOR benchmark dataset further validate its superior detection performance in diverse RS scenarios. These results demonstrate that the proposed multi-stage feature fusion strategy offers an effective and practical solution for tiny object detection in complex RS environments.
遥感图像中小物体的探测是一项长期挑战,因为其空间信息极为有限,特征表现薄弱,而且分布分布复杂。尽管作出了许多努力,但主流探测器在这种情景中仍然表现不佳。为了缩小这一差距,我们引入了RS-TinyNet,一个为在各种RS情景中探测小物体而专门设计的多阶段特征聚合和增强模型。RS-TinyNet有两种新颖的设计:微小的天体特征建模和特征完整性重建。根据这些原则,我们设计了三个分步骤增强功能模块。其中包括多维协作关注模块,采用多维关注,以提高小物体的显著性。此外,还引入了辅助可逆分支和渐进式聚变探测头模块,以维护信息流动,并结合多级特征,以弥合语系差距,保留结构细节细节。关于公共遥感目标设定AI-TOD的全面实验显示,我们的RS-TinyNet超越了现有的状态增强功能模块。在4.0% AP和6.5 REVIM 多重定位模型中,提出了一套拟议的先进性战略定位模型,用以进一步测试。
Article 44
Title@2025-07-17 (4): Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Title: Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Task-Circuit Quantization: Nutzung von Wissen Lokalisierung und Dolmetschbarkeit für Komprimierung | 任务-环境环境定量:利用知识本地化和压缩解释 2504.07389v2 |
Authors (4): Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal
Post-training quantization (PTQ) reduces a model’s memory footprint by mapping full precision weights into low bit weights without costly retraining, but can degrade its downstream performance especially in low 2- to 3-bit settings. We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery, directly conditioning the quantization process on specific weight circuits – which we define as sets of weights associated with downstream task performance. These weights are kept as 16-bit weights, while others are quantized, maintaining performance while only adding a marginal memory cost. Specifically, TaCQ contrasts unquantized model weights with a uniformly-quantized model to estimate the expected change in weights due to quantization and uses gradient information to predict the resulting impact on task performance, allowing us to preserve task-specific weights. We compare TaCQ-based quantization to existing mixed-precision quantization methods when conditioning both on general-purpose and task-specific data. Across QA, math reasoning, and text-to-SQL tasks for both Llama-3 and Qwen2.5, we find that TaCQ outperforms baselines using the same calibration data and a lower weight budget, achieving major improvements in the 2 and 3-bit regime. With only 3.1 bits we are able to recover 96% of Llama-3-8B-Instruct’s unquantized 16-bit MMLU performance, obtaining a 5.25% absolute improvement over SPQR. We also observe consistently large gains over existing methods in the 2-bit regime, with an average gain of 14.74% over the strongest baseline, SliM-LLM. Moreover, we observe a 7.20% gain without conditioning on specific tasks, showing TaCQ’s ability to identify important weights is not limited to task-conditioned settings.
训练后量化(PTQ)通过将全精度权重映射为低比特权重来减少模型的内存占用,无需代价高昂的重新训练,但可能降低其下游性能,尤其是在2至3比特的低比特设置下。我们提出了一种新的混合精度PTQ方法,即任务电路量化(Task-Circuit Quantization,TaCQ),它借鉴自动电路发现的思路,直接以特定的权重电路为条件进行量化,我们将权重电路定义为与下游任务性能相关的权重集合。这些权重保持为16比特,其余权重则被量化,从而在仅增加少量内存开销的情况下保持性能。具体而言,TaCQ将未量化的模型权重与均匀量化的模型进行对比,以估计量化引起的权重预期变化,并利用梯度信息预测由此对任务性能产生的影响,从而保留任务特定的权重。我们在以通用数据和特定任务数据为条件的两种情形下,将基于TaCQ的量化与现有的混合精度量化方法进行比较。在Llama-3和Qwen2.5的问答、数学推理和文本到SQL任务上,我们发现TaCQ在使用相同校准数据和更低权重预算的情况下优于基线方法,在2比特和3比特设置下取得了显著改进。仅用3.1比特,我们就能恢复Llama-3-8B-Instruct未量化16比特MMLU性能的96%,比SPQR获得5.25%的绝对提升。我们还观察到在2比特设置下相对现有方法持续的大幅增益,比最强基线SliM-LLM平均高出14.74%。此外,在不以特定任务为条件的情况下,我们仍观察到7.20%的增益,表明TaCQ识别重要权重的能力并不限于任务条件化的设置。
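The selection idea described in the abstract above (estimate the weight change caused by quantization, combine it with gradient information, and keep the most affected weights in 16-bit) can be illustrated with a toy first-order sketch. This is a simplification under stated assumptions, not the TaCQ algorithm itself: the saliency formula |gradient × (quantized − original)|, the min-max uniform quantizer, the function names and the example gradients are all hypothetical.

```python
def quantize_uniform(weights, bits):
    """Round each weight to the nearest level of a uniform grid over [min, max]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    if hi == lo:
        return list(weights)
    step = (hi - lo) / levels
    return [lo + round((w - lo) / step) * step for w in weights]


def mixed_precision(weights, grads, bits, keep):
    """Quantize everything except the `keep` weights with the largest saliency."""
    quantized = quantize_uniform(weights, bits)
    # First-order estimate of the loss change caused by quantizing weight i.
    saliency = [abs(g * (q - w)) for w, q, g in zip(weights, quantized, grads)]
    ranked = sorted(range(len(weights)), key=lambda i: saliency[i], reverse=True)
    protected = set(ranked[:keep])
    mixed = [w if i in protected else q
             for i, (w, q) in enumerate(zip(weights, quantized))]
    return mixed, protected


weights = [-1.0, -0.35, 0.1, 0.45, 1.0]
grads = [0.5, 4.0, 0.1, 0.2, 0.5]  # hypothetical task-loss gradients
mixed, protected = mixed_precision(weights, grads, bits=2, keep=1)
print(mixed, protected)
```

In this toy case the protected weight is not the one with the largest rounding error but the one whose small rounding error is multiplied by a large gradient, which is the intuition behind conditioning the choice on task performance rather than on weight magnitude alone.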
Article 45
Title@2025-07-17 (4): Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction
Title: Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction | Deep Learning-based Fetal Lung Segmentation aus diffusionsgewichteten MRT-Bildern und Lungenreife-Evaluierung für fetale Wachstumsbeschränkung | 从传播加权磁RI图像和对胎儿生长限制的肺期评估中分离出的深学习-基于学习的胎儿肺部切片 2507.13106v1 |
Authors (10): Zhennan Xiao, Katharine Brudkiewicz, Zhen Yuan, Rosalind Aughwane, Magdalena Sokolska, Joanna Chappell, Trevor Gaunt, Anna L. David, Andrew P. King, Andrew Melbourne
Fetal lung maturity is a critical indicator for predicting neonatal outcomes and the need for post-natal intervention, especially for pregnancies affected by fetal growth restriction. Intra-voxel incoherent motion analysis has shown promising results for non-invasive assessment of fetal lung development, but its reliance on manual segmentation is time-consuming, thus limiting its clinical applicability. In this work, we present an automated lung maturity evaluation pipeline for diffusion-weighted magnetic resonance images that consists of a deep learning-based fetal lung segmentation model and a model-fitting lung maturity assessment. A 3D nnU-Net model was trained on manually segmented images selected from the baseline frames of 4D diffusion-weighted MRI scans. The segmentation model demonstrated robust performance, yielding a mean Dice coefficient of 82.14%. Next, voxel-wise model fitting was performed based on both the nnU-Net-predicted and manual lung segmentations to quantify IVIM parameters reflecting tissue microstructure and perfusion. The results suggested no differences between the two. Our work shows that a fully automated pipeline is possible for supporting fetal lung maturity assessment and clinical decision-making.
胎儿肺部成熟度是预测新生儿结果和产后干预需要的关键指标,特别是对受胎儿生长限制影响的妊娠而言,胎儿肺部成熟度是预测新生儿结果和产后干预需要的关键指标。在对胎儿肺部发育的非侵入性评估方面,异形混凝土运动分析显示,在对胎儿肺部发育进行非侵入性评估方面,结果大有希望,但依赖人工分解是耗时的,因此限制了临床应用。在这项工作中,我们为扩散加权磁共振成像展示了一种自动肺部成熟度评价管道,其中包括一个深层学习的胎儿肺部分解模型和一个适合模型的肺部成熟度评估。一个3D nnU-Net模型对从4D扩散加权的MRI扫描基准框中挑选的人工片段图像进行了培训。分解模型显示了强健的性表现,得出了82.14%的平均骰子系数。接下来,基于nnU-Net预测值和人工肺部分解以量化反映组织微结构和渗透的IVIM参数。结果表明,两者之间没有差异。我们的工作表明,完全自动化的管道管道可以支持胎儿肺部成熟度评估和临床决定。
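The segmentation result above is reported as a mean Dice coefficient. For readers unfamiliar with the metric, a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flattened binary masks follows. The masks are made-up toy values, not data from the study.

```python
def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted = [0, 1, 1, 1, 0, 0, 1, 0]  # toy automatic segmentation
manual = [0, 1, 1, 0, 0, 1, 1, 0]     # toy manual segmentation
score = dice(predicted, manual)
print(score)
```

A value of 1.0 means the automatic and manual masks coincide exactly, and 0.0 means they do not overlap at all.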
Article 46
Title@2025-07-17 (4): SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
Title: SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts | SemCSE: Semantische kontrastive Satzeinbettungen mit LLM-generierten Zusammenfassungen für wissenschaftliche Abstracts | SEMCSE: 使用LLM创制的科学摘要摘要 2507.13105v1 |
Authors (2): Marc Brinner, Sina Zarriess
We introduce SemCSE, an unsupervised method for learning semantic embeddings of scientific texts. Building on recent advances in contrastive learning for text embeddings, our approach leverages LLM-generated summaries of scientific abstracts to train a model that positions semantically related summaries closer together in the embedding space. This resulting objective ensures that the model captures the true semantic content of a text, in contrast to traditional citation-based approaches that do not necessarily reflect semantic similarity. To validate this, we propose a novel benchmark designed to assess a model’s ability to understand and encode the semantic content of scientific texts, demonstrating that our method enforces a stronger semantic separation within the embedding space. Additionally, we evaluate SemCSE on the comprehensive SciRepEval benchmark for scientific text embeddings, where it achieves state-of-the-art performance among models of its size, thus highlighting the benefits of a semantically focused training approach.
我们引入了SemCSE, 这是学习科学文本语义嵌入的一种不受监督的SemCSE, 这是一种学习科学文本语义嵌入的一种方法。 我们的方法利用LLM产生的科学摘要摘要摘要,对一种模型进行训练,这种模型将与语义相关的摘要更紧密地放在嵌入空间中。 由此产生的目标确保模型捕捉文本的真正语义内容, 与传统的以引用为基础的方法相比, 它不一定反映语义相似性。 为了验证这一点, 我们提出了一个新的基准, 旨在评估模型理解和编码科学文本语义内容的能力, 表明我们的方法在嵌入空间内加强了语义分离。 此外, 我们评估SEMCSE, 有关科学文本嵌入空间的SciRepEval综合基准, 在那里, 它在规模不同的模型中取得了最新技术表现, 从而突出了语义集中的培训方法的好处。
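The abstract above describes training a model so that embeddings of semantically related summaries lie close together. One common way to express such an objective is an InfoNCE-style contrastive loss with in-batch negatives, sketched below. The abstract does not state the exact loss used by SemCSE, so the loss form, the temperature value and the toy two-dimensional embeddings here are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss; positives[i] matches anchors[i], other rows are negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)
        # Numerically stable log-sum-exp over all candidates in the batch.
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: each anchor and its positive stand for two summaries of one abstract.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(anchors, positives)
shuffled = info_nce(anchors, list(reversed(positives)))
print(aligned, shuffled)
```

The loss is small when each summary embedding is nearest to its own pair and large when the pairs are mismatched, which is the behaviour a semantically focused embedding objective is meant to reward.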
Article 47
Title@2025-07-17 (4): Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Title: Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models | Unified Triplet-Level Halluzination Evaluation für große Vision-Sprache Modelle | 大型视觉语言模型统一三维级幻觉评价 2410.23114v4 |
Authors (4): Junjie Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung
Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure the object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations via (object, relation, object) triplets extracted from LVLMs’ responses, making it easily generalizable to different vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs. Our dataset and code for the reproduction of our experiments are available publicly at https://github.com/wujunjie1998/Tri-HE.
尽管在视觉语言推理方面表现出色,大型视觉语言模型(LVLM)可能会产生在特定图像中不存在的幻觉内容。大多数现有的LVLM幻觉基准都不得不评估与目标有关的幻觉。然而,对两个对象之间的关系的潜在幻觉,即关系幻觉,仍然缺乏调查。为了纠正这一点,我们设计了一个统一框架,以同时测量LVLMs中的对象和关系幻觉。我们框架的核心思想是通过LVMs答复中提取的幻觉(对象、关系、对象)三重幻觉,使LVLMS易于将其推广到不同的视觉语言任务中。此外,我们根据我们的框架,进一步引入了Tri-HE,一个全新的三重幻觉评价标准,可以同时用于研究对象和关系幻觉。在对Tri-HE的全面评价中,我们观察到,与LVLMMs之间的关系比目标幻觉问题更为严重,突出了以前被忽视的LVLMS的问题。此外,我们根据我们的研究结果,设计了一个简单的培训/MLVMS/J 有效减少我们现有的数据。
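The triplet-level evaluation described above can be sketched as follows: extract (object, relation, object) triplets from a response and compare them with facts known about the image. The sketch uses exact string matching against a reference set, which is an assumption made for brevity; the abstract does not specify the matching procedure, and the function names and example triplets are hypothetical.

```python
def classify_triplets(extracted, facts):
    """Split triplets into supported, object-hallucinated and relation-hallucinated."""
    fact_set = set(facts)
    entities = {e for s, _, o in fact_set for e in (s, o)}
    result = {"supported": [], "object": [], "relation": []}
    for triplet in extracted:
        subject, _, obj = triplet
        if triplet in fact_set:
            result["supported"].append(triplet)
        elif subject not in entities or obj not in entities:
            # At least one object does not appear in the image at all.
            result["object"].append(triplet)
        else:
            # Both objects exist, but the stated relation is unsupported.
            result["relation"].append(triplet)
    return result


def hallucination_rate(extracted, facts):
    if not extracted:
        return 0.0
    groups = classify_triplets(extracted, facts)
    return (len(groups["object"]) + len(groups["relation"])) / len(extracted)


facts = [("man", "riding", "horse"), ("horse", "on", "beach")]
extracted = [
    ("man", "riding", "horse"),
    ("man", "holding", "horse"),
    ("dog", "on", "beach"),
]
groups = classify_triplets(extracted, facts)
rate = hallucination_rate(extracted, facts)
print(groups, rate)
```

Separating the two failure types in this way is what allows object and relation hallucination to be measured at the same time from a single set of extracted triplets.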
Article 48
Title@2025-07-17 (4): Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Title: Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction | Uni-Instruct: Einstufiges Diffusionsmodell durch Unified Diffusion Divergence Instruction | Uni- Instruct: 通过统一扩散分散指令单步扩散模型 2505.20755v2 |
Authors (6): Yifei Wang, Weimin Bai, Colin Zhang, Debing Zhang, Weijian Luo, He Sun
In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, $f$-distill, etc., inside a theory-driven framework which we name Uni-Instruct. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. Then we introduce key theories that overcome the intractability issue of the original expanded $f$-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The novel unification introduced by Uni-Instruct not only offers new theoretical contributions that help understand existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performances. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Frechet Inception Distance (FID) values of 1.46 for unconditional generation and 1.38 for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of 1.02, which outperforms its 79-step teacher diffusion with a significant improvement margin of 1.33 (1.02 vs 2.35). We also apply Uni-Instruct on broader tasks like text-to-3D generation. For text-to-3D generation, Uni-Instruct gives decent results, which slightly outperforms previous methods, such as SDS and VSD, in terms of both generation quality and diversity. Both the solid theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transferring of diffusion models.
在本文中,我们将Diff-Instruct、DMD、SIM、SiD、$f$-distill等10多种现有的单步扩散蒸馏方法统一到一个理论驱动的框架中,并将其命名为Uni-Instruct。Uni-Instruct的动机来自我们提出的$f$-散度族扩散展开理论。随后我们引入了若干关键理论,克服了原始展开$f$-散度难以处理的问题,得到一个等价且可处理的损失,通过最小化展开的$f$-散度族来有效训练单步扩散模型。Uni-Instruct带来的新统一不仅提供了新的理论贡献,有助于从更高层面理解现有方法,还取得了最先进的单步扩散生成性能。在CIFAR10生成基准上,Uni-Instruct取得了破纪录的Frechet Inception Distance(FID)值:无条件生成为1.46,条件生成为1.38。在ImageNet-$64\times 64$生成基准上,Uni-Instruct取得了新的单步生成SoTA FID 1.02,比其79步教师扩散模型显著提升1.33(1.02对2.35)。我们还将Uni-Instruct应用于文本到3D生成等更广泛的任务。在文本到3D生成上,Uni-Instruct给出了不错的结果,在生成质量和多样性方面均略优于SDS和VSD等已有方法。Uni-Instruct扎实的理论与实证贡献有望帮助未来关于单步扩散蒸馏和扩散模型知识迁移的研究。
Article 49
Title@2025-07-17 (4): Unsupervised Ground Metric Learning
Title: Unsupervised Ground Metric Learning | Unüberwachtes metrisches Lernen am Boden | 不受监督的地面计量学习 2507.13094v1 |
Authors (4): Janis Auffenberg, Jonas Bresch, Oleh Melnyk, Gabriele Steidl
Data classification without access to labeled samples remains a challenging problem. It usually depends on an appropriately chosen distance between features, a topic addressed in metric learning. Recently, Huizing, Cantini and Peyré proposed to simultaneously learn optimal transport (OT) cost matrices between samples and features of the dataset. This leads to the task of finding positive eigenvectors of a certain nonlinear function that maps cost matrices to OT distances. With this basic idea in mind, we consider both the algorithmic and the modeling part of unsupervised metric learning. First, we examine appropriate algorithms and their convergence. In particular, we propose to use the stochastic random function iteration algorithm and prove that it converges linearly in our setting, although our operators are not paracontractive, as has been required for convergence so far. Second, we ask the natural question of whether the OT distance can be replaced by other distances. We show how Mahalanobis-like distances fit into our considerations. Further, we examine an approach via graph Laplacians. In contrast to the previous settings, here we only have to deal with functions that are linear in the sought matrices, so that simple algorithms from linear algebra can be applied.
无法访问标签样本的数据分类仍是一个具有挑战性的问题。 它通常取决于不同特征之间适当选择的距离, 一个在计量学习中涉及的专题。 最近, Huizizing、 Cantini 和 Peyr'e 提议同时学习样本和数据集特征之间的最佳运输成本矩阵。 这导致寻找某种非线性功能的正偏差源, 该非线性功能可以绘制成本矩阵到 OT 距离。 我们考虑到这一基本想法, 我们考虑的是未经监督的计量学习的算法和模型部分。 首先, 我们检查适当的算法及其趋同性。 特别是, 我们提议使用随机函数迭代算法, 并证明它对于我们的设置来说是线性一致的, 尽管我们的操作者在如此远的趋同中并不具有平行性。 其次, 我们问一个自然问题, 是否可以用其它距离来取代OT的距离。 我们展示了马哈拉诺比的距离如何适合我们的考虑 。 此外, 我们检查了一种方法, 通过图 Laplecian 来比较以前的设置。 。 与以前的设置相比, 我们只需要在这里处理直线性矩阵中的直线性算。
Article 50
Title@2025-07-17 (4): Truthful Elicitation of Imprecise Forecasts
Title: Truthful Elicitation of Imprecise Forecasts | Wahrheitsgemäße Erhebung ungenauer Prognosen | 不精确预测的真实引出 2503.16395v4 |
Authors (3): Anurag Singh, Siu Lun Chau, Krikamol Muandet
The quality of probabilistic forecasts is crucial for decision-making under uncertainty. While proper scoring rules incentivize truthful reporting of precise forecasts, they fall short when forecasters face epistemic uncertainty about their beliefs, limiting their use in safety-critical domains where decision-makers (DMs) prioritize proper uncertainty management. To address this, we propose a framework for scoring imprecise forecasts – forecasts given as a set of beliefs. Despite existing impossibility results for deterministic scoring rules, we enable truthful elicitation by drawing a connection to social choice theory and introducing a two-way communication framework in which DMs first share the aggregation rules (e.g., averaging or min-max) that they use in downstream decisions to resolve forecast ambiguity. This, in turn, helps forecasters resolve indecision during elicitation. We further show that truthful elicitation of imprecise forecasts is achievable using proper scoring rules randomized over the aggregation procedure. Our approach allows DMs to elicit and integrate the forecaster’s epistemic uncertainty into their decision-making process, thus improving credibility.
概率预测的质量对于在不确定情况下的决策至关重要。适当的评分规则鼓励真实报告准确的预测,但当预测者面对其信仰的隐含不确定性时,则没有达到正确的评分规则,限制了其在决策者优先进行适当不确定性管理的安全关键领域的使用。为了解决这个问题,我们提议了一个评分不精确的预测的框架 – – 预测作为一套信念作出的预测。尽管确定性评分规则目前不可能产生结果,但我们通过与社会选择理论挂钩和引入双向通信框架,让预测者首先分享其汇总规则(例如平均或微量)用于下游决定,以解决预测的模糊性。这反过来又有助于预测者在引证过程中解决决策问题。我们进一步表明,通过对汇总程序随机进行适当的评分规则,可以实现不准确预测的真实引出。我们的方法使预测者集中的不确定性能够引出并将其纳入决策过程,从而提高可信度。
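The role of a proper scoring rule and of a shared aggregation rule can be illustrated with a small Python sketch. It is not the paper's mechanism: the Brier loss stands in for a generic proper scoring rule, the event is binary, and the function names and the two aggregation rules ("average" and a pessimistic "lower" rule) are hypothetical choices made only for illustration.

```python
def brier_loss(report, outcome):
    """Brier loss of a reported probability for a binary outcome (0 or 1)."""
    return (report - outcome) ** 2


def expected_brier_loss(report, belief):
    """Expected loss of `report` when the event truly occurs with probability `belief`."""
    return belief * brier_loss(report, 1) + (1 - belief) * brier_loss(report, 0)


def best_report(belief, grid_size=101):
    """Grid search for the report that minimizes the expected loss under `belief`."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda r: expected_brier_loss(r, belief))


def aggregate(beliefs, rule):
    """Collapse a set of beliefs into one number (illustrative rules, an assumption)."""
    if rule == "average":
        return sum(beliefs) / len(beliefs)
    if rule == "lower":  # pessimistic stand-in: keep the smallest probability
        return min(beliefs)
    raise ValueError(f"unknown aggregation rule: {rule}")


# A precise forecaster is best off reporting the true belief.
print(best_report(0.3))

# An imprecise forecaster holds a set of beliefs; once the DM announces the
# aggregation rule, the set collapses to a single value that can be scored.
credal_set = [0.2, 0.4]
print(aggregate(credal_set, "average"), aggregate(credal_set, "lower"))
```

The grid search only checks numerically that the expected Brier loss is minimized at the true belief; the announced aggregation rule is what turns a set of beliefs into something a standard scoring rule can evaluate.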
Article 51
Title@2025-07-17 (4): Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces
Title: Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces | Ungewissheitsbewusste Cross-Modal Knowledge Destillation mit Prototypenlernen für multimodale Gehirn-Computer-Schnittstellen | 与多式脑-计算机界面的原型学习相结合的不确定-软件软件的跨模式知识蒸馏 2507.13092v1 |
Authors (3): Hyo-Jeong Jang, Hye-Bin Shin, Seong-Whan Lee
Electroencephalography (EEG) is a fundamental modality for cognitive state monitoring in brain-computer interfaces (BCIs). However, it is highly susceptible to intrinsic signal errors and human-induced labeling errors, which lead to label noise and ultimately degrade model performance. To enhance EEG learning, multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models with rich representations to EEG-based models. Nevertheless, KD faces two key challenges: modality gap and soft label misalignment. The former arises from the heterogeneous nature of EEG and visual feature spaces, while the latter stems from label inconsistencies that create discrepancies between ground truth labels and distillation targets. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies. It aligns feature semantics through a prototype-based similarity module and introduces a task-specific distillation head to resolve label-induced inconsistency in supervision. Experimental results demonstrate that our approach improves EEG-based emotion regression and classification performance, outperforming both unimodal and multimodal baselines on a public multimodal dataset. These findings highlight the potential of our framework for BCI applications.
脑电学是大脑-计算机界面认知状态监测的基本模式。然而,它极易受到内在信号错误和人为标签错误的影响,从而导致标签噪音和最终降低模型性能。为了加强电子小组的学习,已经探索了多式知识蒸馏(KD),以便向基于电子小组的模型转移具有丰富代表性的视觉模型的知识。然而,KD面临两大挑战:模式差距和软标签不匹配。前者源于电子小组和视觉特征空间的多元性,而后者则源于造成地面真实标签和蒸馏目标之间差异的标签不一致。本文述及由模糊特征和定义薄弱的标签造成的语义不确定性。我们提出了一个新的跨模式知识蒸馏框架,以缓解模式和标签不一致性。它通过基于原型的类似模块对特征进行匹配,并引入一个特定任务的蒸馏头,以解决标签引起的监督不一致性。实验结果表明,我们的方法改善了基于EG的情感回归和分类功能与蒸馏目标之间的不一致。这份文件涉及模糊性特征和定义性能超过我们用于非形式和多式联运结果的框架。
Article 52
Title@2025-07-17 (4): Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine
Title: Super Resolution for Renewable Energy Resource Data With Wind From Reanalysis Data and Application to Ukraine | Super Auflösung für erneuerbare Energien Ressourcendaten mit Wind Von der Reanalyse Daten und Anwendung in die Ukraine | 乌克兰可再生能源资源数据利用风向再分析数据和应用于乌克兰的超级分辨率 2407.19086v2 |
Authors (7): Brandon N. Benton, Grant Buster, Pavlo Pinchuk, Andrew Glaws, Ryan N. King, Galen Maclaurin, Ilya Chernyakhovskiy
With a potentially increasing share of the electricity grid relying on wind to provide generating capacity and energy, there is an expanding global need for historically accurate, spatiotemporally continuous, high-resolution wind data. Conventional downscaling methods for generating these data based on numerical weather prediction have a high computational burden and require extensive tuning for historical accuracy. In this work, we present a novel deep learning-based spatiotemporal downscaling method using generative adversarial networks (GANs) for generating historically accurate high-resolution wind resource data from the European Centre for Medium-Range Weather Forecasting Reanalysis version 5 data (ERA5). In contrast to previous approaches, which used coarsened high-resolution data as low-resolution training data, we use true low-resolution simulation outputs. We show that by training a GAN model with ERA5 as the low-resolution input and Wind Integration National Dataset Toolkit (WTK) data as the high-resolution target, we achieved results comparable in historical accuracy and spatiotemporal variability to conventional dynamical downscaling. This GAN-based downscaling method additionally reduces computational costs over dynamical downscaling by two orders of magnitude. We applied this approach to downscale 30 km, hourly ERA5 data to 2 km, 5 min wind data for January 2000 through December 2023 at multiple hub heights over Ukraine, Moldova, and part of Romania. This 24-year data record is the first member of the super-resolution for renewable energy resource data with wind from the reanalysis data dataset (Sup3rWind).
随着依赖风来提供发电能力和能源的电力网份额的潜在增加,全球越来越需要历史上准确、短暂连续、高分辨率的风量数据。根据数字天气预测生成这些数据的常规降尺度方法具有很高的计算负担,需要为历史准确性进行广泛调整。在这项工作中,我们展示了一种新的深层次的基于学习的基于空间的降尺度方法,使用基因化对称网络(GANs)从欧洲中期天气预报再分析中心第五版数据(ERA5)中得出历史上准确的高分辨率风量资源数据。与以前采用粗度高分辨率数据作为低分辨率培训数据的以往方法相比,S使用真正的低分辨率模拟产出。我们通过将ERA5的GAN模型培训为低分辨率输入和风整合国家数据设置工具(WTK)数据作为高分辨率目标,我们取得了历史准确性和波粒度下坡度变化到常规动态降尺度下坡度数据(ERA5 ) 与以往方法相比,2000年12月GAN降级的高分辨率数据降级数据为2公里, 将这一罗马尼亚的双级数据再计算成本。
Article 53
Title@2025-07-17 (4): MUPAX: Multidimensional Problem Agnostic eXplainable AI
Title: MUPAX: Multidimensional Problem Agnostic eXplainable AI | MUPAX: Multidimensionales Problem Agnostic eXplainable KI | MUPAX: 多元问题Agnistic EXlable AI 2507.13090v1 |
Authors (4): Vincenzo Dentamaro, Felice Franchini, Giuseppe Pirlo, Irina Voiculescu
Robust XAI techniques should ideally be simultaneously deterministic, model-agnostic, and guaranteed to converge. We propose MULTIDIMENSIONAL PROBLEM AGNOSTIC EXPLAINABLE AI (MUPAX), a deterministic, model-agnostic explainability technique with guaranteed convergence. MUPAX's measure-theoretic formulation gives principled feature importance attribution through structured perturbation analysis that discovers inherent input patterns and eliminates spurious relationships. We evaluate MUPAX on an extensive range of data modalities and tasks: audio classification (1D), image classification (2D), volumetric medical image analysis (3D), and anatomical landmark detection, demonstrating dimension-agnostic effectiveness. The rigorous convergence guarantees extend to any loss function and arbitrary dimensions, making MUPAX applicable to virtually any problem context for AI. In contrast to other XAI methods, which typically decrease performance when masking, MUPAX not only preserves but actually enhances model accuracy by capturing only the most important patterns of the original data. Extensive benchmarking against the state of the art in XAI demonstrates MUPAX's ability to generate precise, consistent and understandable explanations, a crucial step towards explainable and trustworthy AI systems. The source code will be released upon publication.
在理想情况下,强力XAX技术应同时使用确定性、模型不可知性,并保证汇合。我们建议采用多种确定性、模型性、模型性能和保证能同时使用。我们建议采用多立性、模型性不可知的解释性 AI(MUPAX),这是一种确定性、模型性可解释性的技术,有保证的趋同性能。MUPAX测量理论性配方通过结构化的扰动分析,发现内在输入模式并消除虚假关系,使原则性特征归属具有重要性。我们评估MUPAX的广泛数据模式和任务:音频分类(1D),图像分类(2D),体积医学图像分析(3D),解剖性标志性检测,展示无异性效果。严格的趋同性保证将任何损失功能和任意性层面都包括在内,使MUPAX几乎适用于AI的任何问题背景。与通常会减少性的工作表现的其他XAUPAX方法相比,MUPAX不仅保存而且实际上还会提高模型的准确性,只捕捉取原始数据的最重要模式。根据X艺术状况进行广泛的基准化,显示MUPAX的测量和可理解性源码解释。
Article 54
Title@2025-07-17 (4): DASViT: Differentiable Architecture Search for Vision Transformer
Title: DASViT: Differentiable Architecture Search for Vision Transformer | DASViT: Unterschiedliche Architektur Suche nach Vision Transformer | DASVVT:不同建筑搜索视野变异器 2507.13079v1 |
Authors (3): Pengjin Wu, Ferrante Neri, Zhenhua Feng
Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
设计有效的神经网络是深层学习的基石,神经结构搜索(NAS)已成为这一进程自动化的有力工具。在现有的NAS方法中,差异型建筑搜索(DARTS)因其效率和使用方便而越来越突出,激励了许多进步。自愿景变异器(Viet)崛起以来,研究人员应用NAS来探索ViT结构,往往侧重于宏观搜索空间,并依赖进化算法等离散方法。这些方法确保可靠性,但在发现创新建筑设计、要求大量计算资源以及时间密集方面,他们面临着挑战。为了应对这些限制,我们引入了差异型建筑搜索视野变异器(DASVIT),它弥合了对ViT的不同搜索差距并揭示了新的设计。实验显示,DASVIT提供的建筑打破了传统的变异器编码设计,超越了对多个数据集的VT-B/16,并实现了以较少参数和FLOP的更高效率。
Article 55
Title@2025-07-17 (4): On the Effectiveness of the z-Transform Method in Quadratic Optimization
Title: On the Effectiveness of the z-Transform Method in Quadratic Optimization | Über die Wirksamkeit der z-Transform Methode in der quadratischen Optimierung | 关于四压压优化中z变形方法有效性问题 2507.03404v2 |
Authors (1): Francis Bach
The z-transform of a sequence is a classical tool used within signal processing, control theory, computer science, and electrical engineering. It allows for studying sequences from their generating functions, with many operations that can be equivalently defined on the original sequence and its $z$-transform. In particular, the z-transform method focuses on asymptotic behaviors and allows the use of Taylor expansions. We present a sequence of results of increasing significance and difficulty for linear models and optimization algorithms, demonstrating the effectiveness and versatility of the z-transform method in deriving new asymptotic results. Starting from the simplest gradient descent iterations in an infinite-dimensional Hilbert space, we show how the spectral dimension characterizes the convergence behavior. We then extend the analysis to Nesterov acceleration, averaging techniques, and stochastic gradient descent.
一个序列的 Z 变形是信号处理、控制理论、计算机科学和电气工程中使用的一种古典工具,它允许从生成功能中研究序列,其许多操作可以在原始序列及其$z$的变形上等量定义。特别是,z 变形法侧重于无症状行为,允许泰勒扩张。我们展示了线性模型和优化算法越来越重要和困难的结果序列,显示了Z 变形方法在产生新的无症状结果方面的有效性和多功能。我们从无限维度希尔伯特空间最简单的梯度下沉迭开始,我们展示了这些趋同行为的特征。我们随后将分析扩展至Nesterov加速、平均技术和慢度梯度下降。
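As a worked illustration of the simplest case mentioned above, gradient descent on a quadratic, the following LaTeX sketch shows how the generating function of the iterates takes a closed form. The notation ($H$, $\gamma$, $\theta_*$, $\Theta(z)$) and the convention of summing $z^k$ rather than $z^{-k}$ are assumptions chosen for this sketch, not necessarily the paper's.

```latex
% Quadratic objective with positive semi-definite H and minimizer \theta_*:
%   F(\theta) = \tfrac12 (\theta - \theta_*)^\top H (\theta - \theta_*).
% Gradient descent with step size \gamma gives a linear recursion:
\theta_k - \theta_* = (I - \gamma H)\,(\theta_{k-1} - \theta_*)
\quad\Longrightarrow\quad
\theta_k - \theta_* = (I - \gamma H)^k (\theta_0 - \theta_*).

% Generating function of the error sequence (for |z| small enough):
\Theta(z) \;=\; \sum_{k \ge 0} z^k (\theta_k - \theta_*)
\;=\; \bigl(I - z\,(I - \gamma H)\bigr)^{-1} (\theta_0 - \theta_*).

% Along an eigenvector of H with eigenvalue \lambda this is the scalar series
\sum_{k \ge 0} z^k (1 - \gamma\lambda)^k \;=\; \frac{1}{1 - z\,(1 - \gamma\lambda)},
% whose only singularity sits at z = 1/(1 - \gamma\lambda).
```

The singularity closest to the origin determines the geometric decay rate of the corresponding component, so eigenvalues $\lambda$ near zero push singularities toward $z = 1$. This is where the asymptotic behavior is concentrated and where expansions around $z = 1$ become informative.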
Article 56
Title@2025-07-17 (4): MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
Title: MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs | MedPix 2.0: Umfassender multimodaler biomedizinischer Datensatz für fortgeschrittene KI-Anwendungen mit retrieval Augmented Generation und Wissensgraphen | MedPix 2.0:一套综合多式生物医学数据集,用于高级AI应用,并附有回收增加的生成和知识图 2407.02994v5 |
Authors (5): Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone
The increasing interest in developing Artificial Intelligence applications in the medical domain suffers from the lack of high-quality data sets, mainly due to privacy-related issues. In addition, the recent increase in Vision Language Models (VLMs) leads to the need for multimodal medical data sets, where clinical reports and findings are attached to the corresponding medical scans. This paper illustrates the entire workflow for building the MedPix 2.0 data set. Starting with the well-known multimodal data set MedPix®, mainly used by physicians, nurses, and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data, followed by a manual curation procedure in which noisy samples were removed, thus creating a MongoDB database. Along with the data set, we developed a Graphical User Interface aimed at efficiently navigating the MongoDB instance and obtaining the raw data that can be easily used for training and/or fine-tuning VLMs. To reinforce this point, in this work, we first recall DR-Minerva, a Retrieval Augmented Generation-based VLM trained upon MedPix 2.0. DR-Minerva predicts the body part and the modality used to scan its input image. We also propose the extension of DR-Minerva with a Knowledge Graph that uses Llama 3.1 Instruct 8B, and leverages MedPix 2.0. The resulting architecture can be queried in an end-to-end manner, as a medical decision support system. MedPix 2.0 is available on GitHub.
医学领域对开发人工智能应用的兴趣日益浓厚,因为缺乏高质量的数据集,这主要是由于与隐私有关的问题。此外,最近视野语言模型的增加导致需要多式医疗数据集,临床报告和调查结果附在相应的医疗扫描中。本文说明了建立MedPix 2.0数据集的整个工作流程。从众所周知的多式联运数据集MedPix®开始,主要为医生、护士和医科学生用于继续医学教育目的,开发了一个半自动管道,以提取视觉和文字数据,然后用手动固化程序清除杂音样本,从而创建了MongoDB数据库。我们开发了一个图形用户界面,目的是高效浏览MongoDB实例,并获取可用于培训和/或微调VLM的原始数据。我们首先回顾DR-Minerva,一个基于检索增强生成并在MedPix 2.0上训练的VLM模型。
Article 57
Title@2025-07-17 (4): On statistical learning of graphs
Title: On statistical learning of graphs | Statistisches Erlernen von Schaubildern | 关于统计学图表 2507.13054v1 |
Authors (4): Vittorio Cipriani, Valentino Delle Rose, Luca San Mauro, Giovanni Solda
We study PAC and online learnability of hypothesis classes formed by copies of a countably infinite graph G, where each copy is induced by permuting G’s vertices. This corresponds to learning a graph’s labeling, knowing its structure and label set. We consider classes where permutations move only finitely many vertices. Our main result shows that PAC learnability of all such finite-support copies implies online learnability of the full isomorphism type of G, and is equivalent to the condition of automorphic triviality. We also characterize graphs where copies induced by swapping two vertices are not learnable, using a relaxation of the extension property of the infinite random graph. Finally, we show that, for all G and k>2, learnability for k-vertex permutations is equivalent to that for 2-vertex permutations, yielding a four-class partition of infinite graphs, whose complexity we also determine using tools coming from both descriptive set theory and computability theory.
我们研究PAC和在线学习由可计算到的无限图形G的复制件组成的假设类,每个复制件都是由更替 G 的顶点诱导的。 这相当于学习图形的标签, 了解其结构和标签集。 我们考虑的是, 变位的类别只有有限的许多顶点。 我们的主要结果显示, PAC对所有这些有限支持份的可学习性都意味着可以在线学习G 的全异形类型, 并且与自定义的微点条件相当。 我们还用无限随机图的扩展属性的松散来描述两个顶点的复制件无法学习的图形。 最后, 我们显示, 对于所有 G 和 k> 2 , k- vertex 变形的可学习性相当于 2 垂直变形的可学习性, 产生无限图的四级分区, 我们用描述性集理论和可调和理论的工具来决定其复杂性。
Article 58
Title@2025-07-17 (4): Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification
Title: Uncertainty quantification for White Matter Hyperintensity segmentation detects silent failures and improves automated Fazekas quantification | Unsicherheits-Quantifizierung für White Matter Hyperintensitätssegmentierung erkennt leise Ausfälle und verbessert die automatisierte Fazekas-Quantifizierung | 白色物质超密度分离的不确定性量化,可检测静态故障,改进自动Fazekas量化 2411.17571v2 |
Authors (11): Ben Philps, Maria del C. Valdes Hernandez, Chen Qin, Una Clancy, Eleni Sakka, Susana Munoz Maniega, Mark E. Bastin, Angela C. C. Jochems, Joanna M. Wardlaw, Miguel O. Bernabeu, Alzheimers Disease Neuroimaging Initiative
White Matter Hyperintensities (WMH) are key neuroradiological markers of small vessel disease present in brain MRI. Assessment of WMH is important in research and clinics. However, WMH are challenging to segment due to their high variability in shape, location, and size, their poorly defined borders, and their similar intensity profile to other pathologies (e.g., stroke lesions) and artefacts (e.g., head motion). In this work, we assess the utility and semantic properties of the most effective techniques for uncertainty quantification (UQ) in segmentation for the WMH segmentation task across multiple test-time data distributions. We find that UQ techniques reduce ‘silent failure’ by identifying in UQ maps small WMH clusters in the deep white matter that are left unsegmented by the model. A combination of Stochastic Segmentation Networks with Deep Ensembles also yields the highest Dice and lowest Absolute Volume Difference % (AVD) score and can highlight areas where there is ambiguity between WMH and stroke lesions. We further demonstrate the downstream utility of UQ, proposing a novel method for classification of the clinical Fazekas score using spatial features extracted from voxelwise WMH probability and UQ maps. We show that incorporating WMH uncertainty information improves Fazekas classification performance and calibration. Our model with (UQ and spatial WMH features)/(spatial WMH features)/(WMH volume only) achieves a balanced accuracy score of 0.74/0.67/0.62 and a root Brier score of 0.65/0.72/0.74 in the Deep WMH region, and a balanced accuracy of 0.74/0.73/0.71 and a root Brier score of 0.64/0.66/0.68 in the Periventricular region. We further demonstrate that stochastic UQ techniques with high sample diversity can improve the detection of poor quality segmentations.
白质超重( WM4/ H) 是大脑MRI 中小船舶疾病的关键神经放射标记。 对 WMH 的评估在研究和诊所中很重要。 然而, WMH 因其形状、位置、大小、定义不完善的边界等差异性强,以及与其他病状( 如中风损伤) 和人工制品( 如头部运动) 相似的强度配置, 给部分带来挑战。 在这项工作中, 我们评估了在多个测试时数据发布时对 WMH 部分进行分解( UQ ) 的最有效技术的效用和语义特性。 我们发现 UQ 技术通过在 UQ 地图中绘制与深白物质的小 WMH 群集, 以及与其他病状( 如中心) 一样的强度配置性能配置, 也得出最高的Dice 和最低绝对量比值( AVDD) , 并且能够突出WMH 和中位值 。 我们进一步展示了 UQ 的下游效用, 提出了将OM 和WH 的精确度分解的精确度分解方法。
Article 59
Title@2025-07-17 (4): The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting
Title: The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting | Die Kraft der Architektur: Tiefgehen in Transformer-Architekturen für langfristige Zeitreihen | 建筑力量:为长期时间序列预测而向变形结构深度下潜 2507.13043v1 |
Authors (8): Lefei Shen, Mouxiang Chen, Han Fu, Xiaoxue Ren, Xiaoyun Joy Wang, Jianling Sun, Zhuo Li, Chenghao Liu
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF), yet the variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: What Transformer architecture works best for LTSF tasks? However, existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. To address this, we propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures. Our taxonomy considers key aspects such as attention mechanisms, forecasting aggregations, forecasting paradigms, and normalization layers. Through extensive experiments, we uncover several key insights: bi-directional attention with joint-attention is most effective; more complete forecasting aggregation improves performance; and the direct-mapping paradigm outperforms autoregressive approaches. Furthermore, our combined model, utilizing optimal architectural choices, consistently outperforms several existing models, reinforcing the validity of our conclusions. We hope these findings offer valuable guidance for future research on Transformer architectural designs in LTSF. Our code is available at https://github.com/HALF111/TSF_architecture.
最近,在长期时间序列预测(LTCSF)中,基于变换器的模型最近已成为长期时间序列预测(LTSF)的主要模型,然而,其结构的变异,如只编码器、编码器-代码器和只编码器设计等,提出了一个关键问题:什么变异器结构最适合LTSF的任务?然而,现有模型往往与各种特定时间序列的设计紧密结合,使得难以分离建筑本身的影响。为了解决这个问题,我们提议了一个新的分类学,分离这些设计,使得变异器结构的比较更加清晰和更加统一。我们的分类学考虑了关注机制、预测集、预测范式和正常化层等关键方面。通过广泛的实验,我们发现了几个关键洞察力:联合关注的双向关注最为有效;更完整的预测组合提高了性能;直接绘图范式超越了结构本身的反向性。此外,我们的综合模型利用最佳的建筑选择,始终超越了现有的几个模型,加强了我们的结论的有效性。我们希望这些发现为未来变异器/FLFSF的建筑设计设计设计提供了宝贵的指导。
Article 60
Title@2025-07-17 (4): Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery
Title: Confidence-Filtered Relevance (CFR): An Interpretable and Uncertainty-Aware Machine Learning Framework for Naturalness Assessment in Satellite Imagery | Confidence-Filtered Relevance (CFR): Ein interpretierbares und unsicheres Machine Learning Framework für die Bewertung von Natürlichkeit in Satellitenbildern | 信任改变的相关性:卫星图像中自然评估的 解释性和不确定性和不确定性-智能学习框架 2507.13034v1 |
Authors (2): Ahmed Emam, Ribana Roscher
Protected natural areas play a vital role in ecological balance and ecosystem services. Monitoring these regions at scale using satellite imagery and machine learning is promising, but current methods often lack interpretability and uncertainty-awareness, and do not address how uncertainty affects naturalness assessment. In contrast, we propose Confidence-Filtered Relevance (CFR), a data-centric framework that combines LRP Attention Rollout with Deep Deterministic Uncertainty (DDU) estimation to analyze how model uncertainty influences the interpretability of relevance heatmaps. CFR partitions the dataset into subsets based on uncertainty thresholds, enabling systematic analysis of how uncertainty shapes the explanations of naturalness in satellite imagery. Applied to the AnthroProtect dataset, CFR assigned higher relevance to shrublands, forests, and wetlands, aligning with other research on naturalness assessment. Moreover, our analysis shows that as uncertainty increases, the interpretability of these relevance heatmaps declines and their entropy grows, indicating less selective and more ambiguous attributions. CFR provides a data-centric approach to assess the relevance of patterns to naturalness in satellite imagery based on their associated certainty.
自然保护区在生态平衡和生态系统服务方面发挥着关键作用。利用卫星图像和机器学习对这些地区进行规模监测是很有希望的,但目前的方法往往缺乏解释性和不确定性意识,而且没有解决不确定性如何影响自然评估的问题。相反,我们提议采用以数据为中心的框架,将LRP 关注推出和深确定性不确定性(DDDU)估算结合起来,以分析模型不确定性如何影响相关热图的可解释性。CFR 将数据集分成基于不确定性阈值的子集,以便能够系统分析不确定性如何影响卫星图像的自然特性解释。CFR适用于AnthroProtect数据集,将信任性相关性与灌木地、森林和湿地等自然评估研究结合起来。此外,我们的分析表明,随着不确定性的增加,这些相关热图的下降及其酶增长的可解释性,表明较少选择性和更加模糊性。CFRR提供了一种以数据为中心的方法,用以根据相关确定性评估卫星图像模式与自然相关性的相关性。
Article 61
Title@2025-07-17 (4): (Exhaustive) Symbolic Regression and model selection by minimum description length
Title: (Exhaustive) Symbolic Regression and model selection by minimum description length | (Erschöpfend) Symbolische Regression und Modellauswahl nach minimaler Beschreibungslänge | 按最低描述长度分列的符号回归和模型选择 2507.13033v1 |
Authors (1): Harry Desmond
Symbolic regression is the machine learning method for learning functions from data. After a brief overview of the symbolic regression landscape, I will describe the two main challenges that traditional algorithms face: they have an unknown (and likely significant) probability of failing to find any given good function, and they suffer from ambiguity and poorly-justified assumptions in their function-selection procedure. To address these I propose an exhaustive search and model selection by the minimum description length principle, which allows accuracy and complexity to be directly traded off by measuring each in units of information. I showcase the resulting publicly available Exhaustive Symbolic Regression algorithm on three open problems in astrophysics: the expansion history of the universe, the effective behaviour of gravity in galaxies and the potential of the inflaton field. In each case the algorithm identifies many functions superior to the literature standards. This general purpose methodology should find widespread utility in science and beyond.
符号回归是从数据中学习函数的机器学习方法。 在简要概述象征性回归图景之后,我将描述传统算法所面临的两个主要挑战:它们可能无法找到任何特定良好功能的未知(而且可能相当重要)概率,而且它们在其函数选择程序中受到模糊和不合理假设的影响。为了解决这些问题,我提议按照最低描述长度原则进行详尽的搜索和模型选择,以便通过测量每个信息单位来直接交换准确性和复杂性。我将由此产生的公开提供的关于天体物理学三个公开问题的散射符号回归算法展示出来:宇宙的扩张历史、星系重力的有效行为以及膨胀场的潜力。在每种情况下,算法都确定了许多高于文献标准的功能。这种通用方法应在科学领域和范围以外找到广泛的用途。
Article 62
Title@2025-07-17 (4): When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values
Title: When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values | Wenn Pattern-by-Pattern arbeitet: Theoretische und Empirische Einblicke für Logistische Modelle mit fehlenden Werten | 代代办法:缺少价值的后勤模式理论和经验透视 2507.13024v1 |
Authors (3): Christophe Muller, Erwan Scornet, Julie Josse
Predicting a response with partially missing inputs remains a challenging task even in parametric models, since parameter estimation in itself is not sufficient to predict on partially observed inputs. Several works study prediction in linear models. In this paper, we focus on logistic models, which present their own difficulties. From a theoretical perspective, we prove that a Pattern-by-Pattern strategy (PbP), which learns one logistic model per missingness pattern, accurately approximates Bayes probabilities in various missing data scenarios (MCAR, MAR and MNAR). Empirically, we thoroughly compare various methods (constant and iterative imputations, complete case analysis, PbP, and an EM algorithm) across classification, probability estimation, calibration, and parameter inference. Our analysis provides a comprehensive view of logistic regression with missing values. It reveals that mean imputation can be used as a baseline for low sample sizes, and that improved performance is obtained via nonlinear multiple iterative imputation techniques that include the labels (MICE.RF.Y). For large sample sizes, PbP is the best method for Gaussian mixtures, and we recommend MICE.RF.Y in the presence of nonlinear features.
即便在参数模型中,预测部分缺少投入的反应仍然是一项艰巨的任务,因为参数估计本身不足以预测部分观察到的投入。一些工程研究对线性模型的预测。在本文件中,我们侧重于后勤模型,这些模型本身有困难。从理论角度看,我们证明一个按部就班的战略(PbP),每个缺失模式学习一个后勤模型,精确地接近各种缺失数据假设(MCAR、MAR和MNAR)中的贝斯概率。生动地说,我们彻底比较各种分类、概率估计、校准和参数推算方法(一致和迭代估算、完整案例分析、PbPP和EM算法)。我们的分析为缺少值的后勤回归提供了全面的观点。它表明,平均估算可用作低抽样规模的基准,并通过非线性多迭代估算技术(MICE.RF.Y)获得改进性能。对于大样本尺寸,PbP是高射力混合物的最佳方法,我们建议在高射力混合物中的非磁力。
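The Pattern-by-Pattern idea, one logistic model per missingness pattern, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: missing entries are encoded as `None`, each model is fitted by basic gradient descent, and all function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(rows, labels, steps=2000, lr=0.5):
    """Gradient descent on the mean log-loss; returns [bias, w_1, ..., w_d]."""
    d = len(rows[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for x, y in zip(rows, labels):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w


def pattern_of(x):
    """Missingness pattern: True where the feature is missing."""
    return tuple(v is None for v in x)


def fit_pattern_by_pattern(X, y):
    """Fit one logistic model per missingness pattern, on observed features only."""
    groups = defaultdict(lambda: ([], []))
    for x, label in zip(X, y):
        rows, labels = groups[pattern_of(x)]
        rows.append([v for v in x if v is not None])
        labels.append(label)
    return {p: fit_logistic(rows, labels) for p, (rows, labels) in groups.items()}


def predict_proba(models, x):
    """Route x to the model of its own pattern (KeyError for unseen patterns)."""
    w = models[pattern_of(x)]
    observed = [v for v in x if v is not None]
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], observed)))


# Toy data with two patterns: second feature missing, or fully observed.
X = [[2.0, None], [1.5, None], [-1.0, None], [-2.0, None],
     [1.0, 1.0], [0.5, 2.0], [-1.0, -1.0], [-0.5, -2.0]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
models = fit_pattern_by_pattern(X, y)
print(len(models), predict_proba(models, [1.8, None]), predict_proba(models, [-1.0, -1.5]))
```

Because each pattern gets its own parameters, the number of models can grow exponentially with the number of features, and a pattern never seen in training has no model at all. This is consistent with the abstract's finding that simpler baselines are preferable at low sample sizes.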
Article 63
Title@2025-07-17 (4): Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers
Title: Fault detection and diagnosis for the engine electrical system of a space launcher based on a temporal convolutional autoencoder and calibrated classifiers | Fehlererkennung und Diagnose für das elektrische Motorsystem eines Raumwerfers basierend auf einem zeitlich konvolutionären Autoencoder und kalibrierten Klassifikatoren | 以时富集自动编码器和校准分类器为基础的空间发射装置发动机电气系统的故障检测和诊断 2507.13022v1 |
Authors (4): Luis Basora, Louison Bocquet-Nouaille, Elinirina Robinson, Serge Le Gonidec
In the context of the health monitoring for the next generation of reusable space launchers, we outline a first step toward developing an onboard fault detection and diagnostic capability for the electrical system that controls the engine valves. Unlike existing approaches in the literature, our solution is designed to meet a broader range of key requirements. This includes estimating confidence levels for predictions, detecting out-of-distribution (OOD) cases, and controlling false alarms. The proposed solution is based on a temporal convolutional autoencoder to automatically extract low-dimensional features from raw sensor data. Fault detection and diagnosis are respectively carried out using a binary and a multiclass classifier trained on the autoencoder latent and residual spaces. The classifiers are histogram-based gradient boosting models calibrated to output probabilities that can be interpreted as confidence levels. A relatively simple technique, based on inductive conformal anomaly detection, is used to identify OOD data. We leverage other simple yet effective techniques, such as cumulative sum control chart (CUSUM) to limit the false alarms, and threshold moving to address class imbalance in fault detection. The proposed framework is highly configurable and has been evaluated on simulated data, covering both nominal and anomalous operational scenarios. The results indicate that our solution is a promising first step, though testing with real data will be necessary to ensure that it achieves the required maturity level for operational use.
在对下一代可再利用的空间发射器进行健康监测的背景下,我们概述了为控制发动机阀门的电气系统开发机载故障检测和诊断能力的第一步。与文献中的现有方法不同,我们的解决办法旨在满足更广泛的关键要求。这包括估计预测的可信度水平,检测分配外(OOD)案例,控制虚假警报。拟议解决方案的基础是一个时序自动自动编码器,以自动从原始传感器数据中提取低维特征。分别使用一个二进制和多级分类器对控制发动机阀门进行检测和诊断。与文献中的现有方法不同,我们的解决办法旨在满足更广泛的关键要求。这包括估算预测的可信度水平,探测出分配外(OOOD)案例,控制假警报。我们利用其他简单而有效的技术,例如累积总控制图(CUSUM)来限制虚假警报,并用在自动编码潜在和剩余空间中处理舱载错失的分类分级器进行检测。拟议框架是基于直方图的梯梯梯梯梯梯梯梯梯梯梯梯梯升模型,其真实性数据将经过高额的模拟测试,通过模拟模拟和模拟测试,以模拟方式进行实际测试。
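The use of a cumulative sum control chart (CUSUM) to limit false alarms can be sketched as follows. This is a generic one-sided CUSUM over a stream of anomaly scores, not the authors' code; the function name, the parameter names (`target`, `slack`, `threshold`), and the example values are assumptions.

```python
def cusum_alarms(scores, target, slack, threshold):
    """One-sided upper CUSUM over a sequence of anomaly scores.

    The statistic accumulates how much each score exceeds `target + slack`
    and is floored at zero, so isolated spikes decay while a sustained shift
    keeps adding up. An alarm is raised when the statistic exceeds
    `threshold`, after which it is reset to zero.
    """
    statistic = 0.0
    alarms = []
    for t, score in enumerate(scores):
        statistic = max(0.0, statistic + (score - target - slack))
        if statistic > threshold:
            alarms.append(t)
            statistic = 0.0  # restart monitoring after an alarm
    return alarms


# Hypothetical fault probabilities from a classifier: one isolated spike
# at index 1, then a sustained run of high scores from index 4 onwards.
scores = [0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.9]
print(cusum_alarms(scores, target=0.2, slack=0.1, threshold=1.5))
```

With these values the isolated spike alone does not trigger an alarm, while the sustained run does. Raising `threshold` or `slack` trades detection delay for fewer false alarms.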
Article 64
Title@2025-07-17 (4): The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Title: The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks | Die Spätphasen-Trainingsdynamik des (stochastischen) subgradienten Abstiegs auf homogenen neuronalen Netzwerken | 在同质神经网络上的(随机)亚梯级下降的后阶段培训动态 2502.05668v3 |
Authors (2): Sholom Schechtman, Nicolas Schreuder
We analyze the implicit bias of constant step stochastic subgradient descent (SGD). We consider the setting of binary classification with homogeneous neural networks - a large class of deep neural networks with ReLU-type activation functions such as MLPs and CNNs without biases. We interpret the dynamics of normalized SGD iterates as an Euler-like discretization of a conservative field flow that is naturally associated with the normalized classification margin. Owing to this interpretation, we show that normalized SGD iterates converge to the set of critical points of the normalized margin at late-stage training (i.e., assuming that the data is correctly classified with positive normalized margin). To the best of our knowledge, this is the first extension of the analysis of Lyu and Li (2020) on the discrete dynamics of gradient descent to the nonsmooth and stochastic setting. Our main result applies to binary classification with exponential or logistic losses. We additionally discuss extensions to more general settings.
我们分析了恒定步骤随机亚梯级下移(SGD)的隐含偏差。 我们考虑在同质神经网络中设定二进制分类,这是一大批具有RELU型激活功能的深神经网络,如MLPs和CNN无偏见的深层神经网络。 我们把正常的 SGD 循环的动态解释为与正常分类差自然相关的保守外向流的分解。 由于这一解释, 我们发现, 正常的 SGD 循环在后阶段培训中会与正常差的临界点组合相融合( 假设数据被正确分类为正正正正平差 ) 。 据我们所知, 这是对Lyu和Li (2020年) 的首次扩展分析, 分析梯度下移到非光谱和随机环境的分立动态。 我们的主要结果适用于具有指数或后勤损失的二进制分类。 我们还讨论了更一般环境的扩展。
Article 65
Title@2025-07-17 (4): SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs
Title: SMART: Relation-Aware Learning of Geometric Representations for Knowledge Graphs | SMART: Beziehungsorientiertes Lernen geometrischer Darstellungen für Wissensgraphen | SMART:知识图表几何表示法关系-知识学习 2507.13001v1 |
Authors (6): Kossi Amouzouvi, Bowen Song, Andrea Coletta, Luigi Bellomarini, Jens Lehmann, Sahar Vahdati
Knowledge graph representation learning approaches provide a mapping between symbolic knowledge in the form of triples in a knowledge graph (KG) and their feature vectors. Knowledge graph embedding (KGE) models often represent relations in a KG as geometric transformations. Most state-of-the-art (SOTA) KGE models are derived from elementary geometric transformations (EGTs), such as translation, scaling, rotation, and reflection, or their combinations. These geometric transformations enable the models to effectively preserve specific structural and relational patterns of the KG. However, the current use of EGTs by KGEs remains insufficient because it does not consider relation-specific transformations. Although recent models attempted to address this problem by ensembling SOTA baseline models in different ways, such baselines use only a single or composite geometric transformation to represent all the relations. In this paper, we propose a framework that evaluates how well each relation fits with different geometric transformations. Based on this ranking, the model can: (1) assign the best-matching transformation to each relation, or (2) use majority voting to choose one transformation type to apply across all relations. That is, the model learns a single relation-specific EGT in a low-dimensional vector space through an attention mechanism. Furthermore, we use the correlation between relations and EGTs, which is learned in a low dimension, for relation embeddings in a high-dimensional vector space. The effectiveness of our models is demonstrated through comprehensive evaluations on three benchmark KGs as well as a real-world financial KG, where they achieve performance comparable to leading models.
知识图形学习方法在知识图(KG)及其特性矢量中以三重形式提供的象征性知识之间提供了一种图解。知识图嵌入模型(KGE)往往以几何转换形式代表KG中的关系。大多数最先进的(SOTA)KGE模型来自基本几何转换(EGTs),如翻译、缩放、旋转、反射或组合。这些几何转换使模型能够有效地保存KG的具体结构和关系模式。然而,KGE目前对EGTs的使用仍然不够充分,而没有考虑特定关系的变化。虽然最近的一些模型试图通过以不同方式组合SOTA基线模型来解决这一问题,但这种基准只使用单一或综合版的几何变模型来代表所有关系。在本文中,我们提出了一个框架,用以评估每种关系与不同的几何变化之间的关系。根据这一等级,模型可以:(1) 给每一种关系分配最匹配的转换,或者(2) 多数人选择一种变异类型,作为整个GTF关系中的一种可比较性模型, 一种我们通过一个学习的深度模型, 一种高层次关系。
Article 66
Title@2025-07-17 (4): Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Title: Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning | Differential-informierte Probenauswahl beschleunigt multimodales kontrastives Lernen | 不同知情的抽样甄选加速多模式差异学习 2507.12998v1 |
Authors (8): Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang
The remarkable success of contrastive-learning-based multimodal models has been greatly driven by training on ever-larger datasets with expensive compute consumption. Sample selection, as an alternative efficient paradigm, offers an important direction for accelerating the training process. However, recent advances in sample selection either rely mostly on an oracle model to select a high-quality coreset offline, which is limiting in cold-start scenarios, or focus on online selection based on real-time model predictions, which does not sufficiently or efficiently account for noisy correspondence. To address this dilemma, we propose a novel Differential-Informed Sample Selection (DISSect) method, which accurately and efficiently discriminates the noisy correspondence for training acceleration. Specifically, we rethink the impact of noisy correspondence on contrastive learning and propose that the differential between the predicted correlation of the current model and that of a historical model is more informative for characterizing sample quality. Based on this, we construct a robust differential-based sample selection and analyze its theoretical insights. Extensive experiments on three benchmark datasets and various downstream tasks demonstrate the consistent superiority of DISSect over current state-of-the-art methods. Source code is available at: https://github.com/MediaBrain-SJTU/DISSect.
对比性学习的多式联运模式的显著成功,在很大程度上是由关于昂贵的计算消费的日益扩大的数据集的培训所促成的。抽样选择作为一种高效的替代模式,是加快培训进程的一个重要方向。然而,最近抽样选择的进展主要依靠一个神器模型,以脱线选择高质量的核心集,这种模式在冷开始的情景中是有限的,或者侧重于基于实时模型预测的在线选择,这种模型预测没有充分或有效地考虑到吵闹的通信。为了解决这一难题,我们提出了一种新的差异化抽样选择方法(DISSect),它准确和有效地区分了加快培训速度的吵闹通信。具体地说,我们重新考虑了噪音通信对对比性学习的影响,并提议目前模式与历史模型的预期相关性之间的差别对于确定样本质量的特性更为丰富。在此基础上,我们构建了一个强有力的基于差异的样本选择,并分析其理论洞察力。关于三个基准数据集和各种下游任务的广泛实验表明,DISSect始终优于当前最先进的方法。源代码见: https://github.com/MediaBrain-SJTU/DISSect。
Article 67
Title@2025-07-17 (4): (Almost) Free Modality Stitching of Foundation Models
Title: (Almost) Free Modality Stitching of Foundation Models | (Fast) Freie Modalitätsstiche von Stiftungsmodellen | (几乎) 基金会模型的免费方式 2507.10015v3 |
Authors (4): Jaisidh Singh, Diganta Misra, Boris Knyazev, Antonio Orvieto
Foundation multi-modal models are often designed by stitching together multiple existing pretrained uni-modal models: for example, an image classifier with a text model. This stitching process is performed by training a connector module that aims to align the representation spaces of these uni-modal models towards a multi-modal objective. However, given the complexity of training such connectors on large-scale web-based datasets, coupled with the ever-increasing number of available pretrained uni-modal models, the task of uni-modal model selection and subsequent connector module training becomes computationally demanding. To address this under-studied critical problem, we propose Hypernetwork Model Alignment (Hyma), a novel all-in-one solution for optimal uni-modal model selection and connector training by leveraging hypernetworks. Specifically, our framework utilizes the parameter prediction capability of a hypernetwork to obtain jointly trained connector modules for $N \times M$ combinations of uni-modal models. In our experiments, Hyma reduces the cost of searching for the best performing uni-modal model pair by $10\times$, while matching the ranking and trained connector performance obtained via grid search across a suite of diverse multi-modal benchmarks.
基础型多模式模式往往通过缝合多种经过预先训练的单一模式模式来设计:例如,一个带有文本模型的图像分类器。这一缝合过程是通过培训一个连接器模块来进行的,该模块旨在将这些单一模式模式的代表空间与多模式目标相匹配。然而,鉴于在大型网络数据集中培训这些连接器的复杂性,加上现有经过预先训练的单一模式模型的数量不断增加,单模式选择和随后的连接器模块培训的任务在计算上变得要求很高。为了解决这个研究不足的关键问题,我们提议了超网络模型协调(Hyma),这是利用超网络将这些单一模式模型的最佳选择和连接器培训的新型全方位解决方案。具体地说,我们的框架利用超网络的参数预测能力,获得经过联合培训的单模式元和元模式组合的连接器模块。在我们的实验中,Hyma降低了通过经过培训的多模式搜索的10美元组合进行最佳执行单模式模型测试的成本,同时匹配了经过培训的多级搜索和经过10美元升级的多级标准。
Article 68
Title@2025-07-17 (4): Teach Old SAEs New Domain Tricks with Boosting
Title: Teach Old SAEs New Domain Tricks with Boosting | Lehren Sie alte SAEs neue Domain Tricks mit Förderung | 教授旧的 SAEs 新域圈套 2507.12990v1 |
Authors (6): Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev, Gleb Gerasimov, Nikita Balagansky, Daniil Gavrilov
Sparse Autoencoders have emerged as powerful tools for interpreting the internal representations of Large Language Models, yet they often fail to capture domain-specific features not prevalent in their training corpora. This paper introduces a residual learning approach that addresses this feature blindness without requiring complete retraining. We propose training a secondary SAE specifically to model the reconstruction error of a pretrained SAE on domain-specific texts, effectively capturing features missed by the primary model. By summing the outputs of both models during inference, we demonstrate significant improvements in both LLM cross-entropy and explained variance metrics across multiple specialized domains. Our experiments show that this method efficiently incorporates new domain knowledge into existing SAEs while maintaining their performance on general tasks. This approach enables researchers to selectively enhance SAE interpretability for specific domains of interest, opening new possibilities for targeted mechanistic interpretability of LLMs.
粗略的Autoencolders已成为解释大语言模型内部代表性的有力工具,但它们往往未能捕捉到在培训公司中并不普遍存在的特定领域特征。本文介绍了一种处理这一特异性失明的留级学习方法,而无需经过全面再培训。我们建议专门培训二级SAE,以在特定领域文本上模拟经过预先培训的SAE的重建错误,有效地捕捉主要模式所遗漏的特征。通过在推断过程中对两种模型的产出进行总结,我们展示了在多种专门领域LLM交叉渗透和解释差异性指标方面的重大改进。我们的实验表明,这种方法有效地将新的域知识纳入现有的SAE,同时保持其在一般任务上的绩效。这一方法使研究人员能够有选择地提高SEA在特定领域可解释性,为LMM具有针对性的机械性解释性开辟新的可能性。
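The residual idea in this abstract can be illustrated with a minimal sketch. The abstract does not say what the secondary SAE takes as input, so the sketch assumes it reads the same activation vector as the primary one; `residual_target`, `boosted_reconstruction`, and the toy lambdas are hypothetical names used only for illustration, not the authors' code.

```python
def residual_target(x, primary_sae):
    """Training target for the secondary SAE: the part of x that the
    primary SAE failed to reconstruct."""
    recon = primary_sae(x)
    return [xi - ri for xi, ri in zip(x, recon)]


def boosted_reconstruction(x, primary_sae, secondary_sae):
    """Inference-time output: element-wise sum of both reconstructions."""
    return [p + s for p, s in zip(primary_sae(x), secondary_sae(x))]


# Toy stand-ins (hypothetical): the primary recovers half of each activation,
# and the secondary has learned to predict the remaining half.
primary = lambda x: [0.5 * v for v in x]
secondary = lambda x: [0.5 * v for v in x]

activation = [2.0, -4.0, 1.0]
target = residual_target(activation, primary)
combined = boosted_reconstruction(activation, primary, secondary)
```

In this toy case the secondary model fits the residual exactly, so the summed output reproduces the input; a real secondary SAE would only approximate it.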
Article 69
Title@2025-07-17 (4): Variance-Based Pruning for Accelerating and Compressing Trained Networks
Title: Variance-Based Pruning for Accelerating and Compressing Trained Networks | Varianzbasiertes Pruning für beschleunigte und komprimierende Ausgebildete Netzwerke | 加快和压缩经过训练的网络 2507.12988v1 |
Authors (3): Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the lost accuracy resulting from the structural modifications. Maintaining the provided performance of trained models after structured pruning and thereby avoiding extensive retraining remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple and structured one-shot pruning technique for efficiently compressing networks, with minimal finetuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy while simultaneously reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x.
诸如Vision Transfomerers等越来越昂贵的大型模型培训,如Vision Transfomerers等越来越昂贵的模型培训激励重新使用已经受过训练的先进网络的庞大图书馆。然而,它们的潜伏性、高计算成本和记忆需求对部署提出了重大挑战,特别是在资源受限制的硬件方面。结构化的修剪方法可以减少这些因素,但它们往往需要费用高昂的再培训,有时甚至需要从零到零的训练,以恢复因结构改造而丧失的准确性。在结构化的修剪后保持所提供经过训练的模型的性能,从而避免广泛的再培训,这仍然是一个挑战。为了解决这个问题,我们引入了基于差异的预留式、简单和结构化的一发裁剪裁技术,以高效的压缩网络为目的,且微调程度小。我们的方法首先收集激活数据,用来选择用于修剪剪的神经元。同时,将平均的活化纳入模型,以保持较高的性能。在图像Net-1k识别任务中,我们证明在钻完后直接运行DITBBs后保留了70%以上的原有性能,只需要10个模型,只需要10个模型来进行精化的模型,然后通过35再恢复原的35的精确度,同时将原始的35号恢复到35的精确度。
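A rough sketch of the two steps the abstract describes (select neurons from activation statistics, then integrate mean activations back into the model) is shown below for a single hidden layer. It assumes that low-variance neurons are the ones pruned and that the mean is folded into the bias of the following linear layer; both are assumptions of this sketch, and `prune_by_variance` is a hypothetical helper, not the paper's implementation.

```python
from statistics import fmean, pvariance


def prune_by_variance(acts, w2, b2, keep):
    """acts: calibration samples, each a list of hidden activations.
    w2: next-layer weights as rows [out][hidden]; b2: next-layer bias.
    Returns (kept neuron indices, pruned w2, adjusted b2)."""
    hidden = len(acts[0])
    cols = [[sample[j] for sample in acts] for j in range(hidden)]
    means = [fmean(c) for c in cols]
    variances = [pvariance(c) for c in cols]
    # Keep the `keep` neurons whose activations vary the most.
    order = sorted(range(hidden), key=lambda j: variances[j], reverse=True)
    kept = sorted(order[:keep])
    dropped = sorted(order[keep:])
    # Replace each dropped neuron by its mean activation: a constant
    # contribution that can be absorbed into the next layer's bias.
    new_b2 = [b + sum(row[j] * means[j] for j in dropped)
              for row, b in zip(w2, b2)]
    new_w2 = [[row[j] for j in kept] for row in w2]
    return kept, new_w2, new_b2


acts = [[1.0, 5.0, 0.0], [1.0, 7.0, 2.0]]  # neuron 0 is constant
kept, new_w2, new_b2 = prune_by_variance(acts, [[2.0, 1.0, 1.0]], [0.5], keep=2)
```

Here neuron 0 never changes on the calibration data, so removing it and adding its contribution to the bias leaves the layer output unchanged on those samples.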
Article 70
Title@2025-07-17 (4): FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient
Title: FedGA: A Fair Federated Learning Framework Based on the Gini Coefficient | FedGA: Ein faires, auf dem Gini-Koeffizienten basierendes Föderated Learning Framework | FDGA:基于基尼系数的公平联邦学习框架 2507.12983v1 |
Authors (1): ShanBin Liu
Fairness has emerged as one of the key challenges in federated learning. In horizontal federated settings, data heterogeneity often leads to substantial performance disparities across clients, raising concerns about equitable model behavior. To address this issue, we propose FedGA, a fairness-aware federated learning algorithm. We first employ the Gini coefficient to measure the performance disparity among clients. Based on this, we establish a relationship between the Gini coefficient $G$ and the update scale of the global model ${U_s}$, and use this relationship to adaptively determine the timing of fairness intervention. Subsequently, we dynamically adjust the aggregation weights according to the system’s real-time fairness status, enabling the global model to better incorporate information from clients with relatively poor performance. We conduct extensive experiments on the Office-Caltech-10, CIFAR-10, and Synthetic datasets. The results show that FedGA effectively improves fairness metrics such as variance and the Gini coefficient, while maintaining strong overall performance, demonstrating the effectiveness of our approach.
公平已成为联邦学习的关键挑战之一。在横向联盟环境中,数据差异往往导致客户之间业绩的巨大差异,引起对公平模式行为的关切。为了解决这一问题,我们提议采用公平意识的联邦学习算法FedGA,即公平意识的联邦学习算法。我们首先使用基尼系数来衡量客户之间的业绩差异。在此基础上,我们建立了基尼系数$G美元与全球模型更新规模${U_s}之间的关系,并利用这种关系适应性决定公平干预的时机。随后,我们根据系统的实时公平状况动态调整汇总权重,使全球模式能够更好地纳入业绩较差的客户提供的信息。我们在办公室-Caltech-10、CIFAR-10和合成数据集方面进行了广泛的实验。结果显示,FDGA有效地改进了公平度指标,如差异和基尼系数等,同时保持了强有力的总体绩效,显示了我们方法的有效性。
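For readers unfamiliar with the disparity measure used here, the standard Gini coefficient of a list of non-negative per-client scores can be computed as in the sketch below. The function name and the sample accuracies are hypothetical; how FedGA links $G$ to the update scale ${U_s}$ and to the aggregation weights is not specified in the abstract and is not reproduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values.
    0 means perfectly equal; (n - 1) / n is the maximum for n values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-form identity: G = sum_i (2i - n - 1) * x_i / (n * sum(x)),
    # with i running from 1 to n over the ascending values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


client_accuracy = [0.91, 0.88, 0.52, 0.79]  # hypothetical per-client accuracies
disparity = gini(client_accuracy)
```

A value near zero indicates that clients perform similarly, while larger values indicate that performance is concentrated in a few clients.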
Article 71
Title@2025-07-17 (4): A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints
Title: A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints | Ein verteilter generativer KI-Ansatz für heterogene Multi-Domain-Umgebungen unter Datenfreigabebeschränkungen | 在数据共享制约下,对异种多领域不同环境采取分散的AI方法 2507.12979v1 |
Authors (4): Youssef Tawfilis, Hossam Amer, Minar El-Aasser, Tallal Elshabrawy
Federated Learning has gained increasing attention for its ability to enable multiple nodes to collaboratively train machine learning models without sharing their raw data. At the same time, Generative AI – particularly Generative Adversarial Networks (GANs) – has achieved remarkable success across a wide range of domains, such as healthcare, security, and image generation. However, training generative models typically requires large datasets and significant computational resources, which are often unavailable in real-world settings. Acquiring such resources can be costly and inefficient, especially when many underutilized devices – such as IoT devices and edge devices – with varying capabilities remain idle. Moreover, obtaining large datasets is challenging due to privacy concerns and copyright restrictions, as most devices are unwilling to share their data. To address these challenges, we propose a novel approach for decentralized GAN training that enables the utilization of distributed data and underutilized, low-capability devices while not sharing data in its raw form. Our approach is designed to tackle key challenges in decentralized environments, combining KLD-weighted Clustered Federated Learning to address the issues of data heterogeneity and multi-domain datasets, with Heterogeneous U-Shaped split learning to tackle the challenge of device heterogeneity under strict data sharing constraints – ensuring that no labels or raw data, whether real or synthetic, are ever shared between nodes. Experimental results show consistent and significant improvements across key performance metrics: our approach achieves 1.1x – 2.2x higher image generation scores, an average 10% boost in classification metrics (up to 50% in multi-domain non-IID settings), and much lower latency compared to several benchmarks. Find our code at https://github.com/youssefga28/HuSCF-GAN.
联邦学习组织越来越关注其使多个节点能够在不分享原始数据的情况下合作培训机器学习模式的能力。与此同时,General AI – – 特别是General Aversarial Networks(GANs) – – 在保健、安全和图像生成等广泛领域取得了显著成功。然而,培训基因化模式通常需要大型数据集和大量计算资源,而这些在现实世界环境中往往无法获得。获取此类资源的成本和低效率,特别是许多未充分利用的多功能(如IOT装置和边缘装置等)的自动化升级升级工具仍然空闲。此外,由于隐私关切和版权限制,获得大型数据集具有挑战性,因为大多数装置不愿意分享数据。为了应对这些挑战,我们提出了分散的GAN培训新颖方法,以便能够利用分布的数据和利用率不足、低能力工具,同时又不以原始形式分享数据。我们的方法旨在应对分散环境中的关键挑战,将KLD-加权的多功能化的多功能化分类学习结合到数据高度化的系统环境,而多功能化的标准化和多功能化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化的系统化数据。
Article 72
Title@2025-07-17 (4): WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring
Title: WaveletInception Networks for Drive-by Vibration-Based Infrastructure Health Monitoring | WaveletInception-Netzwerke für Drive-by-Vibrationsbasierte Infrastruktur-Gesundheitsüberwachung | 驱动振动基础设施健康监测波动感知网络 2507.12969v1 |
Authors (3): Reza Riahi Samani, Alfredo Nunez, Bart De Schutter
This paper presents a novel deep learning-based framework for infrastructure health monitoring using drive-by vibration response signals. Recognizing the importance of spectral and temporal information, we introduce the WaveletInception-BiLSTM network. The WaveletInception feature extractor utilizes a Learnable Wavelet Packet Transform (LWPT) as the stem for extracting vibration signal features, incorporating spectral information in the early network layers. This is followed by 1D Inception networks that extract multi-scale, high-level features at deeper layers. The extracted vibration signal features are then integrated with operational conditions via a Long Short-term Memory (LSTM) layer. The resulting feature extraction network effectively analyzes drive-by vibration signals across various measurement speeds without preprocessing and uses LSTM to capture interrelated temporal dependencies among different modes of information and to create feature vectors for health condition estimation. The estimator head is designed with a sequential modeling architecture using bidirectional LSTM (BiLSTM) networks, capturing bi-directional temporal relationships from drive-by measurements. This architecture allows for a high-resolution, beam-level assessment of infrastructure health conditions. A case study focusing on railway track stiffness estimation with simulated drive-by vibration signals shows that the model significantly outperforms state-of-the-art methods in estimating railway ballast and railpad stiffness parameters. Results underscore the potential of this approach for accurate, localized, and fully automated drive-by infrastructure health monitoring.
本文介绍了利用驱动振动响应信号进行基础设施健康监测的新型深层次学习框架。我们认识到光谱和时间信息的重要性,引入了波盘感知-BILSTM网络。波盘感知特性提取器使用可学习的波盘包装变换(LWPT)作为提取振动信号特征的干线,将光谱信息纳入早期网络层。随后是1D感知网络,在更深层提取多级高层次特征。提取的振动信号功能随后通过长期短期内存层与业务条件相结合。由此产生的地貌感测网有效分析各种测量速度的驱动力振动信号,而无需预处理,并使用LSTM来捕捉不同信息模式之间相互关联的时间依赖性,并为健康状况估算创建特征矢量矢量矢量矢量载器。Sitematahead头的设计是使用双向型LSTM(BILSTM)网络的顺序建模结构结构,从驱动力测量中捕捉到双向时间关系。这一结构使得能够通过高分辨率测量,通过高分辨率对各种测度的铁路结构进行高清晰度评估,并展示对铁路状况进行精确度评估。
Article 73
Title@2025-07-17 (4): Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19
Title: Investigating Forecasting Models for Pandemic Infections Using Heterogeneous Data Sources: A 2-year Study with COVID-19 | Untersuchung von Prognosemodellen für Pandemieinfektionen unter Verwendung heterogener Datenquellen: Eine 2-jährige Studie mit COVID-19 | 利用异源数据源调查利用异源数据对传染病的预测模型:COVID-19的两年期研究 2507.12966v1 |
Authors (3): Zacharias Komodromos, Kleanthis Malialis, Panayiotis Kolios
Emerging in December 2019, the COVID-19 pandemic caused widespread health, economic, and social disruptions. Rapid global transmission overwhelmed healthcare systems, resulting in high infection rates, hospitalisations, and fatalities. To minimise the spread, governments implemented several non-pharmaceutical interventions like lockdowns and travel restrictions. While effective in controlling transmission, these measures also posed significant economic and societal challenges. Although the WHO declared COVID-19 no longer a global health emergency in May 2023, its impact persists, shaping public health strategies. The vast amount of data collected during the pandemic offers valuable insights into disease dynamics, transmission, and intervention effectiveness. Leveraging these insights can improve forecasting models, enhancing preparedness and response to future outbreaks while mitigating their social and economic impact. This paper presents a large-scale case study on COVID-19 forecasting in Cyprus, utilising a two-year dataset that integrates epidemiological data, vaccination records, policy measures, and weather conditions. We analyse infection trends, assess forecasting performance, and examine the influence of external factors on disease dynamics. The insights gained contribute to improved pandemic preparedness and response strategies.
2019年12月,COVID-19大流行造成了广泛的健康、经济和社会混乱; 迅速全球传播的保健系统,造成高感染率、住院和死亡; 为了最大限度地减少这种传播,各国政府实施了几项非制药的干预措施,如封锁和旅行限制; 这些措施在有效控制传播方面也带来了重大的经济和社会挑战; 尽管世卫组织宣布COVID-19在2023年5月不再是全球卫生紧急事件,但其影响依然存在,影响影响影响影响着公共卫生战略; 在这种流行病期间收集的大量数据为疾病动态、传播和干预效力提供了宝贵的见解; 利用这些见解可以改进预测模型,加强防备和应对未来疾病爆发,同时减轻其社会和经济影响; 本文介绍了塞浦路斯COVID-19预报的大规模案例研究,利用了一套两年的数据集,综合了流行病数据、疫苗接种记录、政策措施和天气条件; 我们分析了感染趋势,评估了预测业绩,并审查了外部因素对疾病动态的影响; 获得的见解有助于改进大流行病的防备和应对战略。
Article 74
Title@2025-07-17 (4): Demographic-aware fine-grained classification of pediatric wrist fractures
Title: Demographic-aware fine-grained classification of pediatric wrist fractures | Demografiebewusste feinkörnige Klassifizierung von pädiatrischen Handgelenkfrakturen | 人口意识小儿科手腕骨折细细细分分类 2507.12964v1 |
Authors (4): Ammar Ahmed, Ali Shariq Imran, Zenun Kastrati, Sher Muhammad Daudpota
Wrist pathologies are frequently observed, particularly among children, who constitute the majority of fracture cases. However, diagnosing these conditions is time-consuming and requires specialized expertise. Computer vision presents a promising avenue, contingent upon the availability of extensive datasets, a notable challenge in medical imaging. Therefore, reliance solely on one modality, such as images, proves inadequate, especially in an era of diverse and plentiful data types. In this study, we employ a multifaceted approach to address the challenge of recognizing wrist pathologies using an extremely limited dataset. First, we approach the problem as a fine-grained recognition task, aiming to identify subtle X-ray pathologies that conventional CNNs overlook. Second, we enhance network performance by fusing patient metadata with X-ray images. Third, rather than pre-training on a coarse-grained dataset like ImageNet, we utilize weights trained on a fine-grained dataset. While metadata integration has been used in other medical domains, this is a novel application for wrist pathologies. Our results show that a fine-grained strategy and metadata integration improve diagnostic accuracy by 2% with a limited dataset and by over 10% with a larger fracture-focused dataset.
经常观察到骨折病症,特别是在占骨折病例多数的儿童中。然而,诊断这些病症是耗时且需要专门知识的。计算机愿景是一个充满希望的途径,取决于能否提供广泛的数据集,这是医学成像方面的一个显著挑战。因此,仅仅依赖一种模式,例如图像,证明是不充分的,特别是在数据类型多样和种类繁多的时代。在这项研究中,我们采用多方面的方法,利用极为有限的数据集来应对辨别手腕病症的挑战。最初,我们将此问题作为细微的识别任务,目的是查明传统CNN所忽略的微妙X射线病理。第二,我们通过使用X光图像来利用病人的元数据来提高网络的性能。第三,而不是对象图像网这样的粗糙的数据集进行预先训练,我们使用经过精细的数据集培训的权重。虽然在其它医疗领域使用过元集,但这是对手腕病理病理学的一种新应用。我们的结果显示,经过精细的策略和元数据整合后,通过以2 %的断裂率提高诊断性能,通过有限的数据节率提高。
Article 75
Title@2025-07-17 (4): A Spectral Interpretation of Redundancy in a Graph Reservoir
Title: A Spectral Interpretation of Redundancy in a Graph Reservoir | Eine spektrale Interpretation der Redundanz in einem Graph Reservoir | 图表储量中剩余性的旁观解释 2507.12963v1 |
Authors (2): Anna Bison, Alessandro Sperduti
Reservoir computing has been successfully applied to graphs as a preprocessing method to improve the training efficiency of Graph Neural Networks (GNNs). However, a common issue that arises when repeatedly applying layer operators on graphs is over-smoothing, which consists in the convergence of graph signals toward low-frequency components of the graph Laplacian. This work revisits the definition of the reservoir in the Multiresolution Reservoir Graph Neural Network (MRGNN), a spectral reservoir model, and proposes a variant based on a Fairing algorithm originally introduced in the field of surface design in computer graphics. This algorithm provides a pass-band spectral filter that allows smoothing without shrinkage, and it can be adapted to the graph setting through the Laplacian operator. Given its spectral formulation, this method naturally connects to GNN architectures for tasks where smoothing, when properly controlled, can be beneficial, such as graph classification. The core contribution of the paper lies in the theoretical analysis of the algorithm from a random walks perspective. In particular, it shows how tuning the spectral coefficients can be interpreted as modulating the contribution of redundant random walks. Exploratory experiments based on the MRGNN architecture illustrate the potential of this approach and suggest promising directions for future research.
储量计算作为提高图形神经网络(GNNS)培训效率的预处理方法被成功地应用于图表。然而,在反复在图形上应用层操作员时产生的一个共同问题是超移动的,这包括图形信号与Laplacian图中低频组件的趋同。这项工作重新审视了多分辨率储量图像神经网络(MRGNN)中储油层的定义,这是一个光谱储油层模型,并提出了一个基于计算机图形表面设计领域最初引入的公平算法的变方。这种算法提供了一种允许平滑而不缩小的过频带光谱过滤器,可以通过拉普拉cian操作员将其调整为图形设置。鉴于其光谱配制,这种方法自然地连接到GNNN,在多分辨率储气层网络(MGNN)中,如果控制得当它得到适当控制,则可能是有益的。文件的核心贡献在于从随机行走角度对算法进行理论分析。特别是它如何调整光谱系数可以被解释为调整后方空间空间研究方法的潜在方向。探索以展示未来空间研究方向。
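The abstract does not name the fairing algorithm. Assuming it is the two-step λ|μ scheme of Taubin (a shrinking step with λ > 0 followed by an inflating step with μ < −λ), a graph version can be sketched as follows. The random-walk Laplacian, the adjacency-list format, and the default coefficients are choices made for this sketch, not details taken from the paper.

```python
def laplacian_apply(adj, x):
    """Apply the random-walk Laplacian (I - D^-1 A) to signal x.
    adj[i] lists the neighbours of node i; every node needs at least one."""
    return [x[i] - sum(x[j] for j in adj[i]) / len(adj[i])
            for i in range(len(x))]


def taubin_step(adj, x, lam=0.33, mu=-0.34):
    """One pass-band filtering step with spectral response
    f(k) = (1 - lam * k) * (1 - mu * k) on Laplacian eigenvalue k."""
    # Shrinking step (lam > 0): damps every non-constant component.
    x = [xi - lam * li for xi, li in zip(x, laplacian_apply(adj, x))]
    # Inflating step (mu < -lam): restores the low frequencies.
    x = [xi - mu * li for xi, li in zip(x, laplacian_apply(adj, x))]
    return x


two_nodes = [[1], [0]]  # a single edge
smoothed = taubin_step(two_nodes, [1.0, -1.0])
constant = taubin_step(two_nodes, [3.0, 3.0])
```

On the single-edge graph the alternating signal sits at eigenvalue 2 and is attenuated, while the constant signal (eigenvalue 0) passes through unchanged, which is the "smoothing without shrinkage" behaviour the abstract refers to.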
Article 76
Title@2025-07-17 (4): Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Title: Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning | Dynamische Stabilität des stochastischen Gradienten Absinkens im überparameterisierten Lernen charakterisierend | 将过度量化的学习中存储层渐变源的动态稳定化特性化 2407.20209v3 |
Authors (2): Dennis Chemnitz, Maximilian Engel
For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the potential minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable/unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent that depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
对于诸如现代机器学习中发现的超分化优化任务,全球微型模型一般并非独一无二。为了了解这些环境中的概括性,必须研究最起码的优化算法与哪些最起码的组合。在优化算法所施加的动态下,微型模型不稳定的可能性限制了该算法所能找到的潜在小型模型。在本文中,我们描述全球微型模型的特征,这些微型模型动态地稳定/不适于确定性和随机性梯度下降(SGD ) 。特别是,我们引入了一个特性Lyapunov Exponov,它取决于全球最低值周围的当地动态,并严格地证明,该Lyapunov Exponent的标志决定了SGD能否在相应的全球最低值上累积。
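As a toy illustration of how the sign of a Lyapunov exponent separates stable from unstable minima, consider a one-dimensional linearization in which a perturbation is multiplied by $(1 - \eta h)$ at each step, with $h$ the curvature of a randomly sampled loss term. This scalar model, the function name, and the sample curvatures are assumptions of the sketch; the paper's characteristic exponent is defined for the general setting and is not reproduced here.

```python
import math


def lyapunov_exponent_1d(curvatures, lr):
    """Mean log growth rate of a perturbation under the linearized update
    delta <- (1 - lr * h) * delta, with h drawn uniformly from `curvatures`.
    Assumes no factor (1 - lr * h) is exactly zero, since log(0) is undefined."""
    return sum(math.log(abs(1.0 - lr * h)) for h in curvatures) / len(curvatures)


per_sample_curvature = [1.0, 3.0]  # hypothetical second derivatives at the minimum
stable = lyapunov_exponent_1d(per_sample_curvature, lr=0.5)    # negative
unstable = lyapunov_exponent_1d(per_sample_curvature, lr=2.0)  # positive
```

A negative value means perturbations shrink on average, so iterates can settle at the minimum; a positive value means they grow and the minimum is dynamically unstable in this toy model.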
Article 77
Title@2025-07-17 (4): A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing
Title: A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing | Ein Progressives Bildwiederherstellungsnetzwerk für High-Order Degradation Imaging in Remote Sensing | 遥感中高顺序退化成像的逐步图像恢复网络 2412.07195v2 |
Authors (6): Yujie Feng, Yin Yang, Xiaohong Fan, Zhengpeng Zhang, Lijing Bu, Jianping Zhang
Recently, deep learning methods have gained remarkable achievements in the field of image restoration for remote sensing (RS). However, most existing RS image restoration methods focus mainly on conventional first-order degradation models, which may not effectively capture the imaging mechanisms of remote sensing images. Furthermore, many RS image restoration approaches that use deep learning are often criticized for their lack of architecture transparency and model interpretability. To address these problems, we propose a novel progressive restoration network for high-order degradation imaging (HDI-PRNet), to progressively restore different image degradations. HDI-PRNet is developed based on the theoretical framework of degradation imaging, the Markov properties of the high-order degradation process, and maximum a posteriori (MAP) estimation, offering the benefit of mathematical interpretability within the unfolding network. The framework is composed of three main components: a module for image denoising that relies on proximal mapping prior learning, a module for image deblurring that integrates Neumann series expansion with dual-domain degradation learning, and a module for super-resolution. Extensive experiments demonstrate that our method achieves superior performance on both synthetic and real remote sensing images.
最近,在遥感图像恢复领域,深层学习方法取得了显著成就。然而,大多数现有的斯普斯卡图像恢复方法主要侧重于传统的一阶退化模型,这些模型可能无法有效捕捉遥感图像的成像机制。此外,许多使用深层次学习的斯普斯卡图像恢复方法往往因其缺乏结构透明度和模型解释能力而遭到批评。为解决这些问题,我们提议建立一个新型的高层次退化成像渐进恢复网络(HDI-PRNet),以逐步恢复不同图像退化。人的发展行动-PRNet是根据退化成像理论框架、高层次退化过程的Markov特性和最大后遗图估计(MAP)开发的,在正在发展的网络中提供了数学解释的惠益。该框架由三个主要组成部分组成:一个图像淡化模块,依赖先学准的绘图,一个将Neumann系列的扩展与双度退化学习相结合的图像分流模块,以及一个超级分辨率模块。广泛的实验表明,我们的方法在合成和真实遥感图像上都取得了优异性。
Article 78
Title@2025-07-17 (4): A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion
Title: A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion | Eine Gehirntumor-Segmentierungsmethode basierend auf CLIP und 3D U-Net mit Cross-Modal Semantic Guidance und Multi-Level-Feature Fusion | 以CLIP和3D U-Net为基础的脑肿瘤分解法,并配有跨模式语义指导和多功能融合 2507.09966v2 |
Authors (1): Mingda Zhang
Precise segmentation of brain tumors from magnetic resonance imaging (MRI) is essential for neuro-oncology diagnosis and treatment planning. Despite advances in deep learning methods, automatic segmentation remains challenging due to tumor morphological heterogeneity and complex three-dimensional spatial relationships. Current techniques primarily rely on visual features extracted from MRI sequences while underutilizing semantic knowledge embedded in medical reports. This research presents a multi-level fusion architecture that integrates pixel-level, feature-level, and semantic-level information, facilitating comprehensive processing from low-level data to high-level concepts. The semantic-level fusion pathway combines the semantic understanding capabilities of Contrastive Language-Image Pre-training (CLIP) models with the spatial feature extraction advantages of 3D U-Net through three mechanisms: 3D-2D semantic bridging, cross-modal semantic guidance, and semantic-based attention mechanisms. Experimental validation on the BraTS 2020 dataset demonstrates that the proposed model achieves an overall Dice coefficient of 0.8567, representing a 4.8% improvement compared to traditional 3D U-Net, with a 7.3% Dice coefficient increase in the clinically important enhancing tumor (ET) region.
磁共振成像(MRI)对脑肿瘤进行精密分解,对于神经肿瘤诊断和治疗规划至关重要。尽管在深层学习方法方面取得了进展,但自动分解仍具有挑战性,因为肿瘤形态异质性和复杂的三维空间关系。当前技术主要依赖从磁共振序列序列中提取的视觉特征,同时没有充分利用医学报告中嵌入的语义知识。这一研究提供了一个多层次的融合结构,将像素级、特征级和语义级信息结合起来,便利从低层次数据到高层次概念的全面处理。语义分解途径将对比语言图像预培训模型(CLIP)的语义理解能力与3D U-Net的空间特征提取优势结合起来,这三种机制是:3D-2D 语义连接、跨模式性语义导导导导导导导导和基于语义的注意机制。 BRATS 2020 数据集的实验验证表明,拟议的模型实现了0.8567的总体狄氏系数,代表了相对4.8%的临床模型的改善程度,与传统的3.DMERMM的基系数相比,增加了4.8%的U-DMIT-D的模型。
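The Dice coefficient reported above is the standard overlap measure $2|A \cap B| / (|A| + |B|)$ between a predicted and a reference mask. A minimal sketch on flattened binary masks follows; returning 1.0 when both masks are empty is a convention chosen for this sketch, and the function name is hypothetical.

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention for this sketch: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * inter / size


predicted = [1, 1, 0, 0]  # hypothetical voxel labels for one tumor region
reference = [1, 0, 0, 0]
score = dice(predicted, reference)
```

For 3D volumes the same function applies after flattening the voxel grid for each tumor sub-region.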
Article 79
Title@2025-07-17 (4): cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration
Title: cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image Registration | cIDIR: Bedingte implizite Neuraldarstellung für regularisierte, deformierbare Bildregistrierung | cIDIR: 定期变形图像注册的有条件的、隐含的神经代表 2507.12953v1 |
Authors (3): Sidaty El Hadramy, Oumeymah Cherkaoui, Philippe C. Cattin
Regularization is essential in deformable image registration (DIR) to ensure that the estimated Deformation Vector Field (DVF) remains smooth, physically plausible, and anatomically consistent. However, fine-tuning regularization parameters in learning-based DIR frameworks is computationally expensive, often requiring multiple training iterations. To address this, we propose cIDIR, a novel DIR framework based on Implicit Neural Representations (INRs) that conditions the registration process on regularization hyperparameters. Unlike conventional methods that require retraining for each regularization hyperparameter setting, cIDIR is trained over a prior distribution of these hyperparameters, then optimized over the regularization hyperparameters by using the segmentation masks as an observation. Additionally, cIDIR models a continuous and differentiable DVF, enabling seamless integration of advanced regularization techniques via automatic differentiation. Evaluated on the DIR-LAB dataset, cIDIR achieves high accuracy and robustness across the dataset.
在可变形图像登记(DIR)中,为了确保估计的变形矢量场(DVF)保持平稳、物理上看似合理和解剖上的一致性,对基于学习的DIR框架中的正规化参数进行微调是计算上昂贵的,往往需要多次培训迭代。为了解决这个问题,我们提议以隐形神经表(INRs)为基础,建立一个新型的DIR框架,使注册过程符合超参数的正规化条件。与需要对每个正规化超参数设置进行再培训的常规方法不同,CIDIR在事先分配这些超参数方面受过培训,然后通过使用分解面面罩进行观察,优化到正规化超参数。此外,CIDIR模型是一个连续和不同的DVF模型,通过自动区分使先进的正规化技术能够无缝地整合。根据DIR-LAB数据集, $\operatorname{cIDIR}在数据集中进行了高准确性和稳健度评估。
Article 80
Title@2025-07-17 (4): Signal Recovery Using a Spiked Mixture Model
Title: Signal Recovery Using a Spiked Mixture Model | Signalwiederherstellung mit einem Spiked Mixture Model | 使用斯派混合混合模型恢复信号 2501.01840v2 |
Authors (5): Paul-Louis Delacour, Sander Wahls, Jeffrey M. Spraggins, Lukasz Migas, Raf Van de Plas
We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that in low signal-to-noise ratio regimes, and for data types where the SMM is relevant, SMM surpasses the more traditional Gaussian mixture model (GMM) in terms of signal recovery performance. The broad relevance of the SMM and its corresponding EM recovery algorithm is demonstrated by applying the technique to different data types. The first case study is a biomedical research application, utilizing an imaging mass spectrometry dataset to explore the molecular content of a rat brain tissue section at micrometer scale. The second case study demonstrates SMM performance in a computer vision application, segmenting a hyperspectral imaging dataset into underlying patterns. While the measurement modalities differ substantially, in both case studies SMM is shown to recover signals that were missed by traditional methods such as k-means clustering and GMM.
我们引入了悬浮混合模型(SMM),以解决从许多随机和吵闹的观测中估计一系列信号的问题。随后,我们设计了一种新的预期-最大化算法(EM),以恢复SMM的所有参数。数字实验表明,在低信号-噪音比率制度中,对于与SMM相关的数据类型,SMM在信号恢复性能方面超过了较传统的高斯混合模型(GMM)。SMM及其相应的EM恢复算法的广泛相关性通过将该技术应用于不同的数据类型而得到证明。第一个案例研究是生物医学研究应用,利用成象质量光谱学数据集来探索微米尺度鼠脑组织部分的分子内容。第二个案例研究表明,在计算机视觉应用中,SMMM的性能将超光谱成像数据分成基本模式。虽然测量方法大不相同,但在两个案例研究中,SMMMM都表明,恢复了被K-手段集成和GMM等传统方法所遗漏的信号。
Article 81
Title@2025-07-17 (4): MMOne: Representing Multiple Modalities in One Scene
Title: MMOne: Representing Multiple Modalities in One Scene | MMUne: Vertretung mehrerer Modalitäten in einer Szene | MMIO: 在一个场景中代表多种模式 2507.11129v2 |
Authors (2): Zhifeng Gu, Bing Wang
Humans perceive the world through multimodal cues to understand and interact with the environment. Learning a scene representation for multiple modalities enhances comprehension of the physical world. However, modality conflicts, arising from inherent distinctions among different modalities, present two critical challenges: property disparity and granularity disparity. To address these challenges, we propose a general framework, MMOne, to represent multiple modalities in one scene, which can be readily extended to additional modalities. Specifically, a modality modeling module with a novel modality indicator is proposed to capture the unique properties of each modality. Additionally, we design a multimodal decomposition mechanism to separate multi-modal Gaussians into single-modal Gaussians based on modality differences. We address the essential distinctions among modalities by disentangling multimodal information into shared and modality-specific components, resulting in a more compact and efficient multimodal scene representation. Extensive experiments demonstrate that our method consistently enhances the representation capability for each modality and is scalable to additional modalities. The code is available at https://github.com/Neal2020GitHub/MMOne.
人类通过多式联运的提示来看待世界,从而理解和与环境互动; 学习多种模式的现场代表可以增进对物质世界的理解; 然而,由于不同模式之间的内在区别而产生的模式冲突提出了两个关键挑战:财产差异和颗粒差异; 为了应对这些挑战,我们提议了一个总框架,即MMOne, 在一个场面代表多种模式,可以随时扩展为其他模式; 具体地说,提议了一个模式模型模块,配有新模式指标,以捕捉每种模式的独特性; 此外,我们设计一个模式拆解机制,以基于模式差异的方式将多模式高斯人分为单一模式高斯人; 我们通过将多模式信息分解为共享和特定模式的组成部分,从而导致更加紧凑和高效的多式联运场面代表,解决模式冲突之间的根本区别问题; 广泛的实验表明,我们的方法一贯地提高每种模式的代表性能力,并且可以扩增其他模式。 代码可在https://github.com/Neal2020GitHub/MMOne中查阅。
Article 82
Title@2025-07-17 (4): Insights into a radiology-specialised multimodal large language model with sparse autoencoders
Title: Insights into a radiology-specialised multimodal large language model with sparse autoencoders | Einblicke in ein radiologisch spezialisiertes multimodales Großsprachmodell mit spärlichen Autoencodern | 深入观察放射学专门化多式联运大型语言模型,无甚多的自动编码器 2507.12950v1 |
Authors (6): Kenza Bouzid, Shruthi Bannur, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.
解释性可提高AI模型的安全性、透明度和信任性,这对保健应用中决策往往产生重大后果的保健应用特别重要。机械解释性,特别是通过使用稀疏的自动解析器(SAEs),为在大型变压器型模型中发现人的解释性特征提供了一个很有希望的方法。在本研究中,我们将Matryoshka-SAE应用到放射学专业多式联运大语言模型MAIRA-2来解释其内部表述。使用大规模自动解析的SAE特征,我们发现一系列临床相关概念,包括医疗装置(例如线和管的布置、制动能器的存在)、胸膜破碎和心血管、纵向变化和文本特征等病理学,我们进一步通过指导来研究这些特征对模型行为的影响,展示对几代人的指导性控制,并取得喜忧参半的成功。我们的结果揭示了实际和方法方面的挑战,但对MAIRA-2所学的内部概念提供了初步的洞察力。这标志着朝着对放射学适配的多模态大语言模型进行更深入的机制理解和可解释性迈出了一步,并为提高模型透明度铺平了道路。我们发布了训练好的SAE及其解释:https://huggingface.co/microsoft/maira-2-sae。
Article 83
Title@2025-07-17 (4): Probabilistic Soundness Guarantees in LLM Reasoning Chains
Title: Probabilistic Soundness Guarantees in LLM Reasoning Chains | Probabilistische Solidität garantiert in LLM-Aufklärungsketten | LLM 理赔链条的概率稳妥性保障 2507.12948v1 |
Authors (7): Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong
In reasoning chains generated by large language models (LLMs), initial errors often propagate and undermine the reliability of the final conclusion. Current LLM-based error detection methods often fail to detect propagated errors because they do not properly account for how earlier errors might corrupt judgments of downstream reasoning. To better detect such propagated errors, we introduce Autoregressive Reasoning Entailment Stability (ARES), a novel probabilistic framework that prevents error propagation by judging each claim based only on previously-assessed sound premises. This inductive method yields a nuanced score for each step and provides certified statistical guarantees of its soundness, rather than a brittle binary label. ARES achieves state-of-the-art performance across four benchmarks (72.1% Macro-F1, +8.2 points) and demonstrates superior robustness on very long synthetic reasoning chains, where it excels at detecting propagated errors (90.3% F1, +27.6 points).
在由大型语言模型(LLMs)产生的推理链中,最初的错误往往会传播并破坏最后结论的可靠性。目前基于LLM的错误探测方法往往无法发现传播错误,因为它们没有正确解释早期错误如何会腐蚀下游推理的判断。为了更好地发现这种传播错误,我们引入了“自动递减理性稳定”(ARES),这是一个新的概率框架,它防止错误传播,因为它通过仅仅根据以前评估过的音响前提来判断每项索赔。这种推导方法为每个步骤带来细微分,并为每个步骤的健全性提供经认证的统计保证,而不是一个简便的二进制标签。 ARES在四个基准(72.1% 宏观-F1, +8.2点)上达到了最新水平,并在非常长的合成推理链上展示了超强的稳健性,在其中它最擅长发现传播错误(90.3% F1, +27.6点 ) 。
Article 84
Title@2025-07-17 (4): Global urban visual perception varies across demographics and personalities
Title: Global urban visual perception varies across demographics and personalities | Globale urbane visuelle Wahrnehmung variiert je nach Demografie und Persönlichkeit | 全球城市视觉认识因人口和个性而异 2505.12758v3 |
Authors (8): Matias Quintana, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Filip Biljecki
Understanding people’s preferences is crucial for urban planning, yet current approaches often combine responses from multi-cultural populations, obscuring demographic differences and risking amplifying biases. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics – including gender, age, income, education, race and ethnicity, and, for the first time, personality traits – shape perceptions among 1,000 participants with balanced demographics from five countries and 45 nationalities. This dataset, Street Perception Evaluation Considering Socioeconomics (SPECS), reveals demographic- and personality-based differences across six traditional indicators (safe, lively, wealthy, beautiful, boring, depressing) and four new ones (live nearby, walk, cycle, green). Location-based sentiments further shape these preferences. Machine learning models trained on existing global datasets tend to overestimate positive indicators and underestimate negative ones compared to human responses, underscoring the need for local context. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.
理解人们的喜好对于城市规划至关重要,然而,目前的方法往往将多文化人口的反应结合起来,掩盖人口差异,并有可能扩大偏见。我们利用街头观景图像对世界各地的街头景象进行了大规模的城市视觉调查,审查了人口状况 – – 包括性别、年龄、收入、教育、种族和族裔,以及首次对个性特征 – – 如何塑造出来自五个国家和45个民族的1 000名具有平衡人口特征的参与者的观念。这个数据集,即《考虑到社会经济的街面观评价》(SPECS),揭示了六个传统指标(安全、活跃、富足、美丽、无聊、令人沮丧)和四个新指标(在附近、步行、循环、绿色)的人口和个性差异。基于地点的情绪进一步塑造了这些偏好。根据现有全球数据集培训的机械学习模式往往高估了正面指标,并低估了与人类反应相比较的负面指标,强调了对当地环境的需要。我们的研究希望纠正对街头感知觉的短视处理,而很少考虑人口或个性特征。
Article 85
Title@2025-07-17 (4): MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration
Title: MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration | MC$^2$A: Algorithm-Hardware Co-Design für effiziente Markov-Kette Monte Carlo Beschleunigung | MC$^2$A: 提高Markov链节蒙特卡洛速度加速速度的辅助算法-Hardware共同设计 2507.12935v1 |
Authors (6): Shirui Zhao, Jun Yin, Lingyun Yao, Martin Andraud, Wannes Meert, Marian Verhelst
An increasing number of applications are exploiting sampling-based algorithms for planning, optimization, and inference. The Markov Chain Monte Carlo (MCMC) algorithms form the computational backbone of this emerging branch of machine learning. Unfortunately, the high computational cost limits their feasibility for large-scale problems and real-world applications, and the existing MCMC acceleration solutions are either limited in hardware flexibility or fail to maintain efficiency at the system level across a variety of end-to-end applications. This paper introduces MC$^2$A, an algorithm-hardware co-design framework, enabling efficient and flexible optimization for MCMC acceleration. Firstly, MC$^2$A analyzes the MCMC workload diversity through an extension of the processor performance roofline model with a third dimension to derive the optimal balance between the compute, sampling and memory parameters. Secondly, MC$^2$A proposes a parametrized hardware accelerator architecture with flexible and efficient support of MCMC kernels with a pipeline of ISA-programmable tree-structured processing units, reconfigurable samplers and a crossbar interconnect to support irregular access. Thirdly, the core of MC$^2$A is powered by a novel Gumbel sampler that eliminates exponential and normalization operations. In the end-to-end case study, MC$^2$A achieves an overall $307.6\times$, $1.4\times$, $2.0\times$, and $84.2\times$ speedup compared to the CPU, GPU, TPU and state-of-the-art MCMC accelerator, respectively. Evaluated on various representative MCMC workloads, this work demonstrates and exploits the feasibility of general hardware acceleration to popularize MCMC-based solutions in diverse application domains.
越来越多的应用程序正在利用基于采样的算法来进行规划、优化和推断。马尔可夫链蒙特卡洛(MCMC)算法构成了这个新兴机器学习分支的计算主干。不幸的是,高计算成本限制了其在大规模问题和现实世界应用中的可行性,而现有的 MCMC 加速解决方案要么硬件灵活性有限,要么无法在各种端到端应用中保持系统级的效率。本文引入了 MC$^2$A,一种算法-硬件协同设计框架,可对 MCMC 加速进行高效而灵活的优化。首先,MC$^2$A 通过为处理器性能 roofline 模型增加第三个维度来分析 MCMC 工作负载的多样性。
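The abstract above mentions a Gumbel sampler that avoids exponential and normalization operations. The hardware design itself is not described in this digest; the following is only a minimal software sketch of the standard Gumbel-max trick, which is assumed here to be the underlying idea. The function names `gumbel_max_sample` and `empirical_frequencies` are hypothetical.

```python
import math
import random


def gumbel_max_sample(log_weights, rng):
    """Draw index i with probability proportional to exp(log_weights[i]).

    Gumbel-max trick: add independent Gumbel(0, 1) noise to every
    unnormalized log-weight and return the argmax. The weights are never
    exponentiated and no normalizing sum is computed.
    """
    best_index, best_value = -1, -math.inf
    for i, lw in enumerate(log_weights):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        if lw + g > best_value:
            best_index, best_value = i, lw + g
    return best_index


def empirical_frequencies(log_weights, n_draws, seed=0):
    """Fraction of draws landing on each index (illustration helper)."""
    rng = random.Random(seed)
    counts = [0] * len(log_weights)
    for _ in range(n_draws):
        counts[gumbel_max_sample(log_weights, rng)] += 1
    return [c / n_draws for c in counts]
```

With enough draws, the empirical frequencies should approach the softmax of the log-weights, even though the sampler never evaluates a softmax.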
Article 86
Title@2025-07-17 (4): DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Title: DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization | DMQ: Ausreißer von Diffusionsmodellen für die Quantisierung nach dem Training | DMQ: 解剖培训后量化传播模型的外源离子 2507.12933v1 |
Authors (5): Dongyeun Lee, Jiwan Hur, Hyounguk Shon, Jae Young Lee, Junmo Kim
Diffusion models have achieved remarkable success in image generation but come with significant computational costs, posing challenges for deployment in resource-constrained environments. Recent post-training quantization (PTQ) methods have attempted to mitigate this issue by focusing on the iterative nature of diffusion models. However, these approaches often overlook outliers, leading to degraded performance at low bit-widths. In this paper, we propose a DMQ which combines Learned Equivalent Scaling (LES) and channel-wise Power-of-Two Scaling (PTS) to effectively address these challenges. Learned Equivalent Scaling optimizes channel-wise scaling factors to redistribute quantization difficulty between weights and activations, reducing overall quantization error. Recognizing that early denoising steps, despite having small quantization errors, crucially impact the final output due to error accumulation, we incorporate an adaptive timestep weighting scheme to prioritize these critical steps during learning. Furthermore, identifying that layers such as skip connections exhibit high inter-channel variance, we introduce channel-wise Power-of-Two Scaling for activations. To ensure robust selection of PTS factors even with small calibration set, we introduce a voting algorithm that enhances reliability. Extensive experiments demonstrate that our method significantly outperforms existing works, especially at low bit-widths such as W4A6 (4-bit weight, 6-bit activation) and W4A8, maintaining high image generation quality and model stability. The code is available at https://github.com/LeeDongYeun/dmq.
在图像生成方面,传播模型取得了显著的成功,但计算成本很高,给在资源紧张的环境中部署带来了挑战。最近的培训后量化方法(PTQ)试图通过侧重于扩散模型的迭接性来缓解这一问题。然而,这些方法往往忽略了外端模型,导致低位偏差的性能退化。在本文件中,我们提议了一个DMQ, 将知识等效扩增(LES)和渠道型双向动力扩增(PTS)相结合,以有效应对这些挑战。 学习了等价扩大优化了通道错增系数,以重新分配重量和激活之间的分解困难,减少了整体四分解错误。认识到早期分解步骤,尽管有小分解错误,对因错误积累而产生的最终产出产生了关键影响。我们采用了适应性时间加权办法,以便在学习过程中优先安排这些关键步骤。此外,我们提出了诸如跳过连接的六位级间差异,我们引入了频道/双向增强启动能力。确保稳健选择PTS因素,甚至以低位校准方式选择了甚低级的图像。我们采用了甚高校准方法。
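The two scaling ideas named in the abstract above can be illustrated in a few lines. This is a sketch under assumptions, not the DMQ implementation: the convention of dividing activations and multiplying weights by the same per-channel factor is the usual one for equivalent scaling and is assumed here, and the learning of the factors, the timestep weighting, and the voting algorithm are not shown. All function names are hypothetical.

```python
import math


def nearest_power_of_two(scale):
    """Round a positive scale to the nearest power of two in log2 space."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 2.0 ** round(math.log2(scale))


def equivalent_scaling(x, w, s):
    """Rescale per input channel without changing sum_c x[c] * w[c].

    x: activations per channel, w: weights per channel,
    s: positive per-channel scales. Activations are divided by s and
    weights are multiplied by s, so each product x[c] * w[c] is preserved.
    """
    x_scaled = [xc / sc for xc, sc in zip(x, s)]
    w_scaled = [wc * sc for wc, sc in zip(w, s)]
    return x_scaled, w_scaled


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))
```

Restricting a scale to a power of two is attractive because dividing by it amounts to a bit shift in fixed-point hardware; the rescaling moves dynamic range from outlier activation channels into the weights before both are quantized.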
Article 87
Title@2025-07-17 (4): From a Mixed-Policy Perspective: Improving Differentiable Automatic Post-editing Optimization
Title: From a Mixed-Policy Perspective: Improving Differentiable Automatic Post-editing Optimization | Aus einer Mixed-Policy-Perspektive: Verbesserung der differenzierbaren automatischen Post-Editing-Optimierung | 从混合政策角度看:改进可区别的自动编辑后优化 2507.12931v1 |
Authors (1): Hongze Tan
This paper introduces two novel modifications to the Differentiable Automatic Post-editing Optimization (DAPO) algorithm, approached from a mixed-policy perspective. Standard policy gradient methods can suffer from instability and sample inefficiency, particularly in sparse reward settings. To address this, we first propose a method that incorporates a pre-trained, stable guiding policy ($\pi_{\phi}$) to provide off-policy experience, thereby regularizing the training of the target policy ($\pi_{\mathrm{on}}$). This approach improves training stability and convergence speed by adaptively adjusting the learning step size. Secondly, we extend this idea to re-utilize zero-reward samples, which are often discarded by dynamic sampling strategies like DAPO’s. By treating these samples as a distinct batch guided by the expert policy, we further enhance sample efficiency. We provide a theoretical analysis for both methods, demonstrating that their objective functions converge to the optimal solution within the established theoretical framework of reinforcement learning. The proposed mixed-policy framework effectively balances exploration and exploitation, promising more stable and efficient policy optimization.
本文对从混合政策角度处理的差别式自动编辑后优化(DAPO)算法进行了两项新的修改。标准政策梯度方法可能因不稳定和抽样效率低下而受到影响,特别是在微薄的奖励环境中。为了解决这个问题,我们首先提出一种方法,纳入预先培训的稳定指导政策(Piphi$),以提供政策外经验,从而使目标政策的培训($pion$)正规化。这一方法通过适应性调整学习步数来提高培训稳定性和趋同速度。第二,我们扩大这一想法,重新利用经常被像DAPO那样的动态采样战略抛弃的零回报样本。我们将这些样本作为专家政策指导下的不同批量处理,进一步提高采样效率。我们为这两种方法提供理论分析,表明它们的目标功能在强化学习的既定理论框架内与最佳解决办法汇合在一起。拟议的混合政策框架有效地平衡了勘探和开发,有望实现更稳定和高效的政策优化。
Article 88
Title@2025-07-17 (4): Trace Reconstruction with Language Models
Title: Trace Reconstruction with Language Models | Trace Rekonstruktion mit Sprachmodellen | 使用语言模式进行追踪重建 2507.12927v1 |
Authors (3): Franziska Weindel, Michael Girsch, Reinhard Heckel
The general trace reconstruction problem seeks to recover an original sequence from its noisy copies independently corrupted by deletions, insertions, and substitutions. This problem arises in applications such as DNA data storage, a promising storage medium due to its high information density and longevity. However, errors introduced during DNA synthesis, storage, and sequencing require correction through algorithms and codes, with trace reconstruction often used as part of the data retrieval process. In this work, we propose TReconLM, which leverages language models trained on next-token prediction for trace reconstruction. We pretrain language models on synthetic data and fine-tune on real-world data to adapt to technology-specific error patterns. TReconLM outperforms state-of-the-art trace reconstruction algorithms, including prior deep learning approaches, recovering a substantially higher fraction of sequences without error.
一般的追踪重建问题试图从被删除、插入和替换完全腐蚀的杂音中恢复原始序列。这个问题出现在DNA数据存储等应用中,由于信息密度和寿命高,这是一个很有希望的存储介质。然而,DNA合成、储存和排序过程中出现的错误需要通过算法和代码加以纠正,并经常将追踪重建作为数据检索过程的一部分。在这项工作中,我们建议TReconLM利用经过后方预测培训的语言模型来进行追踪重建。我们预先开发合成数据语言模型和对真实世界数据进行微调,以适应技术特定的错误模式。TRECLM超越了最新的追踪重建算法,包括先前的深层次学习方法,无误恢复了相当高的序列。
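To make the problem setting in the abstract above concrete, here is a minimal sketch of how noisy copies ("traces") of a DNA sequence could be simulated with independent insertions, deletions, and substitutions. The channel model, its default error rates, and the names `noisy_copy` and `make_traces` are illustrative assumptions; the paper's synthetic data generator may differ.

```python
import random

ALPHABET = "ACGT"


def noisy_copy(seq, p_del, p_ins, p_sub, rng):
    """Pass seq through a simple insertion/deletion/substitution channel."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            # Insert a uniformly random base before the current one.
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            continue  # the current base is deleted
        if r < p_del + p_sub:
            # Substitute with a base different from the original.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)  # transmitted unchanged
    return "".join(out)


def make_traces(seq, n_traces, p_del=0.05, p_ins=0.05, p_sub=0.05, seed=0):
    """Generate n_traces independently corrupted copies of seq."""
    rng = random.Random(seed)
    return [noisy_copy(seq, p_del, p_ins, p_sub, rng) for _ in range(n_traces)]
```

A trace reconstruction algorithm receives only the output of `make_traces` and must recover `seq`; because deletions and insertions shift positions, a position-wise majority vote over the traces is generally not enough.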
Article 89
Title@2025-07-17 (4): Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI
Title: Robust Explanations Through Uncertainty Decomposition: A Path to Trustworthier AI | Robuste Erklärungen durch Unsicherheitszersetzung: Ein Weg zu vertrauensvoller KI | 通过不确定性的分解作出有力的解释:通往信托的路径 AI 2507.12913v1 |
Authors (5): Chenrui Zhu, Louenas Bounia, Vu Linh Nguyen, Sébastien Destercke, Arthur Hoarau
Recent advancements in machine learning have emphasized the need for transparency in model predictions, particularly as interpretability diminishes when using increasingly complex architectures. In this paper, we propose leveraging prediction uncertainty as a complementary approach to classical explainability methods. Specifically, we distinguish between aleatoric (data-related) and epistemic (model-related) uncertainty to guide the selection of appropriate explanations. Epistemic uncertainty serves as a rejection criterion for unreliable explanations and, in itself, provides insight into insufficient training (a new form of explanation). Aleatoric uncertainty informs the choice between feature-importance explanations and counterfactual explanations. This leverages a framework of explainability methods driven by uncertainty quantification and disentanglement. Our experiments demonstrate the impact of this uncertainty-aware approach on the robustness and attainability of explanations in both traditional machine learning and deep learning scenarios.
最近在机器学习方面的进展强调了模型预测透明度的必要性,特别是在使用日益复杂的结构时,可解释性会减少,因此,在本文件中,我们提出利用预测不确定性作为传统解释方法的补充方法,具体地说,我们区分疏通(数据相关)和迷你(模型相关)不确定性,以指导适当解释的选择。概念不确定性是拒绝不可靠解释的一个标准,其本身为不可靠解释提供了洞察力,对培训不足(一种新的解释形式)提供了洞察力。在特征重要性解释和反事实解释之间作出选择时,可以发现不确定性。这利用了不确定性量化和分解驱动的可解释方法框架。我们的实验显示了这种认识不确定性方法对传统机器学习和深层学习情景解释的健全性和可实现性的影响。
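The abstract above separates aleatoric from epistemic uncertainty but does not say which estimator is used. One common choice, assumed here purely for illustration, is the entropy-based decomposition over an ensemble: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the members, and epistemic uncertainty is the difference. The function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one input.

    member_probs: list of class-probability vectors, one per ensemble member.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the mean prediction,
      aleatoric = mean of the members' entropies,
      epistemic = total - aleatoric (disagreement between members).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric
```

Under this decomposition, members that all predict a 50/50 split give purely aleatoric uncertainty, while members that are individually confident but contradict each other give purely epistemic uncertainty, which is the case the abstract proposes to treat as grounds for rejecting an explanation.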
Article 90
Title@2025-07-17 (4): LaViPlan : Language-Guided Visual Path Planning with RLVR
Title: LaViPlan : Language-Guided Visual Path Planning with RLVR | LaViPlan : Sprachgeführte visuelle Pfadplanung mit RLVR | Laviplan: RLVR 语言引导视觉路径规划 2507.12911v1 |
Authors (1): Hayeon Oh
Out-of-distribution (OOD) scenarios in autonomous driving refer to situations that deviate from the training domain, often leading to unexpected and potentially hazardous behavior from planners that lack prior exposure to such cases. Recently, Vision-Language Models (VLMs) have been introduced into autonomous driving research for their promising generalization capabilities in OOD settings. Early studies demonstrated that VLMs could recognize OOD scenarios and generate user-level decisions such as “go straight” or “turn right.” However, a new challenge has emerged due to the misalignment between the VLM’s high-level decisions or visual reasoning expressed in language, and the low-level predicted trajectories interpreted as actions. In this paper, we propose LaViPlan, a framework that leverages Reinforcement Learning with Verifiable Rewards (RLVR) to optimize VLMs using planning-oriented metrics. This approach addresses the vision-language-action misalignment observed in existing VLMs fine-tuned via supervised learning, which can recognize driving scenarios but often produce context-unaware decisions. Experimental results demonstrate that our method improves situational awareness and decision-making under OOD conditions, highlighting its potential to mitigate the misalignment issue. This work introduces a promising post-training paradigm for VLM agents in the context of autonomous driving.
自主驱动的分布(OOD)外向情景是指与培训领域不同的情况,往往导致规划者没有事先接触此类案例的意外和潜在危险行为。最近,愿景-语言模型(VLMS)被引入自主驱动研究,以在OOD环境中进行有希望的普及能力;早期研究表明,VLMS可以识别OOD情景,并产生“直向”或“右转”等用户层面的决定。然而,由于VLM高层决定或语言表达的视觉推理不协调,以及低层次预测轨迹被解释为行动,出现了新的挑战。在本文件中,我们建议了LaViPlan,这是一个利用可变回报(RLVRVR)来充分利用强化学习的自主驱动能力来优化VLMS模式,这一方法解决了通过监管学习观察到的愿景-语言行动不匹配问题,这既能识别驱动情景,又往往产生背景-无序决定。实验结果表明,我们的方法改进了范式定位,从而降低了其自主决策的定位。
Article 91
Title@2025-07-17 (4): Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services
Title: Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services | Fremer: Leichter und effektiver Frequenztransformator für Workload-Prognose in Cloud Services | Fremer:云服务工作量预测的轻型和有效频率变压器 2507.12908v1 |
Authors (7): Jiadong Chen, Hengyu Ye, Fuxin Jiang, Xiao He, Tieying Zhang, Jianjun Chen, Xiaofeng Gao
Workload forecasting is pivotal in cloud service applications, such as auto-scaling and scheduling, with profound implications for operational efficiency. Although Transformer-based forecasting models have demonstrated remarkable success in general tasks, their computational efficiency often falls short of the stringent requirements in large-scale cloud environments. Given that most workload series exhibit complicated periodic patterns, addressing these challenges in the frequency domain offers substantial advantages. To this end, we propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models; it achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting; and it exhibits robust performance for multi-period series. Furthermore, we collect and release four high-quality open-source workload datasets derived from ByteDance’s cloud services, encompassing workload data from thousands of computing instances. Extensive experiments on both our proprietary datasets and public benchmarks demonstrate that Fremer consistently outperforms baseline models, achieving average improvements of 5.5% in MSE, 4.7% in MAE, and 8.6% in SMAPE over SOTA models, while simultaneously reducing parameter scale and computational costs. Additionally, in a proactive auto-scaling test based on Kubernetes, Fremer improves average latency by 18.78% and reduces resource consumption by 2.35%, underscoring its practical efficacy in real-world applications.
工作负荷预测在云服务应用中至关重要,例如自动缩放和排期等,对业务效率具有深远影响。尽管基于变异器的预测模型在一般任务中表现出显著的成功,但其计算效率往往低于大型云层环境中的严格要求。鉴于大多数工作量系列呈现复杂的周期性模式,应对频率领域的这些挑战具有巨大的优势。为此,我们提议Fremer,一个高效和有效的深度预测模型。 Fremer满足了三大关键要求:它显示出更高的效率,优于大多数基于变异器的预测模型;它实现了超乎寻常的精确性,超过了工作量预测中所有最先进的(SOTA)模型;并且在多周期序列上表现稳健。此外,我们收集并公开来源了来自ByteDance云服务的四个高质量、公开来源的工作量数据集,其中包括来自数千个计算案例的工作量数据。 关于我们专有数据集和公共基准的广泛实验表明,Fremer一贯优于基准模型,在MSE、MAE和SMAPE上相对SOTA模型分别平均提升5.5%、4.7%和8.6%,同时降低了参数规模和计算成本。此外,在基于Kubernetes的主动自动扩缩容测试中,Fremer将平均延迟改善了18.78%,并将资源消耗降低了2.35%,凸显了其在实际应用中的实用效果。
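The abstract above argues that periodic workload series are easier to handle in the frequency domain. The sketch below does not reproduce Fremer's architecture; it only illustrates the premise with a naive discrete Fourier transform that recovers the dominant period of a series. The function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(series):
    """Magnitudes of the discrete Fourier transform for k = 0 .. n // 2."""
    n = len(series)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        mags.append(abs(coeff))
    return mags


def dominant_period(series):
    """Period, in samples, of the strongest non-constant frequency component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (k = 0) term
    mags = dft_magnitudes(centered)
    k_best = max(range(1, len(mags)), key=lambda k: mags[k])
    return n / k_best
```

For an hourly series with a daily cycle, a single frequency bin carries most of the energy, so a model operating on the spectrum sees the periodicity directly instead of having to infer it from long-range dependencies in the time domain.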
Article 92
Title@2025-07-17 (4): Learning to Reject Low-Quality Explanations via User Feedback
Title: Learning to Reject Low-Quality Explanations via User Feedback | Lernen, Low-Quality-Erklärungen per User Feedback abzulehnen | 通过用户反馈学习拒绝低质量解释 2507.12900v1 |
Authors (4): Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis
Machine Learning predictors are increasingly being employed in high-stakes applications such as credit scoring. Explanations help users unpack the reasons behind their predictions, but are not always “high quality”. That is, end-users may have difficulty interpreting or believing them, which can complicate trust assessment and downstream decision-making. We argue that classifiers should have the option to refuse handling inputs whose predictions cannot be explained properly and introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates the quality of explanations. In this problem setting, the key challenges are how to properly define and assess explanation quality and how to design a suitable rejector. Focusing on popular attribution techniques, we introduce ULER (User-centric Low-quality Explanation Rejector), which learns a simple rejector from human ratings and per-feature relevance judgments to mirror human judgments of explanation quality. Our experiments show that ULER outperforms both state-of-the-art and explanation-aware learning-to-reject strategies at LtX on eight classification and regression benchmarks and on a new human-annotated dataset, which we will publicly release to support future research.
解释帮助用户解开预测背后的理由,但并不总是“ 高质量 ” 。 也就是说, 终端用户可能难以解释或相信它们,这可能会使信任评估和下游决策复杂化。 我们主张, 分类人员应该可以选择拒绝处理那些无法正确解释预测的输入, 并引入一个学习框架, 拒绝低质量的解释( LtX ) , 低质量的解释( LtX ) , 其中预测者配备了一位拒绝者, 评估解释的质量。 在这个问题设置中, 关键的挑战是如何正确定义和评估解释质量, 以及如何设计合适的拒绝者。 专注于流行的归属技术, 我们引入 ULER( 以用户为中心的低质量解释拒绝者 ) , 它可以从人类评级中学习一个简单的拒绝者, 并引入一个反常相关判断来反映人类解释质量的判断。 我们的实验显示, ULER 将超越当前最先进的解释者, 和 认知者学习如何拒绝LtX 的八项分类和回归基准战略, 以及新的人类附加的未来数据, 我们将公开支持。
Article 93
Title@2025-07-17 (4): A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering
Title: A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering | Ein Spaltengenerierungsalgorithmus mit dynamischer Constraint-Aggregation für minimale Summe von Quadraten | 为最小平方和组合组合组合组合而具有动态约束聚合的列生成算法 2410.06187v2 |
Authors (2): Antonio M. Sudoso, Daniel Aloise
The minimum sum-of-squares clustering problem (MSSC), also known as $k$-means clustering, refers to the problem of partitioning $n$ data points into $k$ clusters, with the objective of minimizing the total sum of squared Euclidean distances between each point and the center of its assigned cluster. We propose an efficient algorithm for solving large-scale MSSC instances, which combines column generation (CG) with dynamic constraint aggregation (DCA) to effectively reduce the number of constraints considered in the CG master problem. DCA was originally conceived to reduce degeneracy in set partitioning problems by utilizing an aggregated restricted master problem obtained from a partition of the set partitioning constraints into disjoint clusters. In this work, we explore the use of DCA within a CG algorithm for MSSC exact solution. Our method is fine-tuned by a series of ablation studies on DCA design choices, and is demonstrated to significantly outperform existing state-of-the-art exact approaches available in the literature.
最小方位群集问题(MSSC),也称为美元-单位群集问题,是指将n美元数据点分割成k美元组的问题,目的是最大限度地减少每个点与其分配组群中心之间平方欧西里得距离的总和。我们提出了解决大规模MSSC案例的有效算法,将柱子生成与动态制约集合结合起来,以有效减少CG主问题中考虑的限制数量。DCA最初的构想是,利用从将设定的分区制约分割成断开组群中获得的总限总问题,减少在设定分解问题中的偏差性。在这项工作中,我们探索在CG算法中如何使用DCA来精确解决MSSC问题。我们的方法经过一系列关于DCA设计选择的调整研究的调整,并证明我们的方法大大超越了文献中现有的最新精确方法。
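The MSSC objective described in the abstract above is $\sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2$, where $\mu_j$ is the centroid of cluster $C_j$. The sketch below only evaluates this objective for a given assignment; it is not the column generation algorithm, and the function name `mssc_objective` is hypothetical.

```python
def mssc_objective(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    points: list of equal-length coordinate tuples,
    labels: cluster index in 0 .. k-1 for each point,
    k: number of clusters.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Centroid of each non-empty cluster; empty clusters contribute nothing.
    centroids = [[s / counts[c] for s in sums[c]] if counts[c] else None
                 for c in range(k)]
    total = 0.0
    for p, c in zip(points, labels):
        total += sum((p[d] - centroids[c][d]) ** 2 for d in range(dim))
    return total
```

An exact method has to prove that no other partition of the $n$ points into $k$ clusters achieves a lower value of this function, which is what makes large instances hard.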
Article 94
Title@2025-07-17 (4): Generalist Bimanual Manipulation via Foundation Video Diffusion Models
Title: Generalist Bimanual Manipulation via Foundation Video Diffusion Models | Generalist Bimanual Manipulation über Stiftung Video Diffusion Modelle | 通过基金会录像传播模型进行通用二手操作 2507.12898v1 |
Authors (8): Yao Feng, Hengkai Tan, Xinyi Mao, Guodong Liu, Shuhe Huang, Chendong Xiang, Hang Su, Jun Zhu
Bimanual robotic manipulation, which involves the coordinated control of two robotic arms, is foundational for solving challenging tasks. Despite recent progress in general-purpose manipulation, data scarcity and embodiment heterogeneity remain serious obstacles to further scaling up in bimanual settings. In this paper, we introduce VIdeo Diffusion for Action Reasoning (VIDAR), a two-stage framework that leverages large-scale, diffusion-based video pre-training and a novel masked inverse dynamics model for action prediction. We pre-train the video diffusion model on 750K multi-view videos from three real-world bimanual robot platforms, utilizing a unified observation space that encodes robot, camera, task, and scene contexts. Our masked inverse dynamics model learns masks to extract action-relevant information from generated trajectories without requiring pixel-level labels, and the masks can effectively generalize to unseen backgrounds. Our experiments demonstrate that with only 20 minutes of human demonstrations on an unseen robot platform (only 1% of typical data requirements), VIDAR generalizes to unseen tasks and backgrounds with strong semantic understanding, surpassing state-of-the-art methods. Our findings highlight the potential of video foundation models, coupled with masked action prediction, to enable scalable and generalizable robotic manipulation in diverse real-world settings.
尽管最近在一般用途操作方面取得了进展,但数据稀缺和化化异性仍严重妨碍在双体环境下进一步扩展。在本文中,我们引入了VIdeo Difulation for Action Acience(VIDAR),这是一个两阶段框架,利用大规模、基于扩散的视频前培训和新颖的掩盖反动动态模型来进行行动预测,这是协调控制两种机器人的工具,是解决具有挑战性任务的基础。我们预先将三个现实世界双人机器人平台的750K多视视频视频的视频传播模型用于培训,利用一个统一观测空间,对机器人、相机、任务和场景环境进行编码。我们蒙面的反动动态模型学习面具,从产生的轨迹中提取与行动相关的信息,而不需要像素级标签,而面具可以有效地概括到不可见的背景。我们的实验表明,仅用20分钟的人类演示时间在看不见的机器人平台上(只有典型数据要求的1%),VIDAR将人类的任务和背景概括成看不见的任务和背景,同时使用强度的精度理解、摄像、超前制、超前期的影像操作方法。
Article 95
Title@2025-07-17 (4): VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks
Title: VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | VAR-MATH: Wahre mathematische Vernunft in großen Sprachmodellen anhand symbolischer Multi-Instance-Benchmarks | VAR-MATH:通过符号性多因基准在大语言模型中验证真实的数学理由 2507.12885v1 |
Authors (3): Jian Yao, Ran Cheng, Kay Chen Tan
Recent advances in reinforcement learning (RL) have led to substantial improvements in the mathematical reasoning abilities of large language models (LLMs), as measured by standard benchmarks. However, these gains often persist even when models are trained with flawed signals, such as random or inverted rewards, raising a fundamental question: do such improvements reflect true reasoning, or are they merely artifacts of overfitting to benchmark-specific patterns? To address this question, we take an evaluation-centric perspective and identify two critical shortcomings in existing protocols. First, benchmark contamination arises from the public availability of test problems, increasing the risk of data leakage. Second, evaluation fragility stems from the reliance on single-instance assessments, which are highly sensitive to stochastic outputs and fail to capture reasoning consistency. To overcome these limitations, we introduce VAR-MATH, a symbolic evaluation framework designed to probe genuine reasoning ability. By converting fixed numerical problems into symbolic templates and requiring models to solve multiple instantiations of each, VAR-MATH enforces consistent reasoning across structurally equivalent variants, thereby mitigating contamination and improving evaluation robustness. We apply VAR-MATH to transform two popular benchmarks, AMC23 and AIME24, into their symbolic counterparts, VAR-AMC23 and VAR-AIME24. Experimental results reveal substantial performance drops for RL-trained models on the variabilized versions, especially for smaller models, with average declines of 48.0% on AMC23 and 58.3% on AIME24. These findings suggest that many existing RL methods rely on superficial heuristics and fail to generalize beyond specific numerical forms. Overall, VAR-MATH offers a principled, contamination-resistant evaluation paradigm for mathematical reasoning.
强化学习(RL)的最近进展导致大型语言模型数学推理能力(LLM)的大幅提高,以标准基准来衡量;然而,即使模型经过随机或反向奖赏等有缺陷的信号的培训,这些收益也往往会持续,提出一个根本问题:这些改进是否反映了真正的推理,或者仅仅是过于符合基准模式的手工艺?为了解决这个问题,我们从评价中心的角度出发,并找出现有协议的两个关键缺陷。第一, 测试问题的公开性{bemph{benchmark 污染}产生于测试问题的公开性,增加了数据渗漏的风险。第二, emph{评价脆弱性}之所以存在,是因为对单向模型进行有缺陷的信号评估,例如随机或反倒转,无法捕捉到推理的一致性。 为了克服这些局限性,我们引入了{VAR-MA},一个用于探究真实推理能力的象征性的直观评估框架。 通过将固定的数字问题转换成象征性模板,要求模型解决每个模型的多重即快速反应,从而减少数据渗漏风险风险。第二, 超越了AAR-AR脆弱性评估的模型,从而减轻了常规污染,并改进了对A-AR结果的缩。
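The evaluation idea in the abstract above, turning a fixed numerical problem into a symbolic template and requiring correct answers on several instantiations, can be sketched as follows. The template, the all-or-nothing scoring rule, and the names `make_instances` and `consistent_score` are assumptions made for illustration; the benchmark's actual templates and scoring may differ.

```python
import random


def make_instances(n_instances, seed=0):
    """Instantiate the hypothetical template 'What is a*x + b when x = v?'.

    Returns a list of (question, answer) pairs with freshly sampled
    integer values for a, b and v.
    """
    rng = random.Random(seed)
    instances = []
    for _ in range(n_instances):
        a, b, v = rng.randint(2, 9), rng.randint(1, 20), rng.randint(1, 10)
        question = f"What is {a}*x + {b} when x = {v}?"
        instances.append((question, a * v + b))
    return instances


def consistent_score(solver, instances):
    """Return 1 only if the solver answers every instantiation correctly."""
    return int(all(solver(q) == answer for q, answer in instances))
```

A solver that has memorized the answer to one published instance scores 0 under this rule as soon as another instantiation has a different answer, whereas a solver that actually evaluates the expression scores 1.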
Article 96
Title@2025-07-17 (4): Autonomous Resource Management in Microservice Systems via Reinforcement Learning
Title: Autonomous Resource Management in Microservice Systems via Reinforcement Learning | Autonomes Ressourcenmanagement in Mikroservice-Systemen durch Verstärkungslernen | 通过加强学习,对微小服务系统进行自主资源管理 2507.12879v1 |
Authors (6): Yujun Zou, Nia Qi, Yingnan Deng, Zhihao Xue, Ming Gong, Wuyang Zhang
This paper proposes a reinforcement learning-based method for microservice resource scheduling and optimization, aiming to address issues such as uneven resource allocation, high latency, and insufficient throughput in traditional microservice architectures. In microservice systems, as the number of services and the load increase, efficiently scheduling and allocating resources such as computing power, memory, and storage becomes a critical research challenge. To address this, the paper employs an intelligent scheduling algorithm based on reinforcement learning. Through the interaction between the agent and the environment, the resource allocation strategy is continuously optimized. In the experiments, the paper considers different resource conditions and load scenarios, evaluating the proposed method across multiple dimensions, including response time, throughput, resource utilization, and cost efficiency. The experimental results show that the reinforcement learning-based scheduling method significantly improves system response speed and throughput under low load and high concurrency conditions, while also optimizing resource utilization and reducing energy consumption. Under multi-dimensional resource conditions, the proposed method can consider multiple objectives and achieve optimized resource scheduling. Compared to traditional static resource allocation methods, the reinforcement learning model demonstrates stronger adaptability and optimization capability. It can adjust resource allocation strategies in real time, thereby maintaining good system performance in dynamically changing load and resource environments.
本文建议了一种强化的微观服务资源时间安排和优化学习方法,目的是解决资源分配不均、高潜值和传统微观服务结构中产出不足等问题。在微观服务系统中,服务数量和负荷增加、高效安排和分配资源(如计算能力、记忆和储存)已成为一项关键的研究挑战。为解决这一问题,本文件采用了基于强化学习的智能时间安排算法。通过代理和环境之间的互动,资源分配战略不断得到优化。在实验中,本文件考虑了不同的资源条件和负荷假设,评估了拟议方法的多个方面,包括反应时间、吞吐量、资源利用和成本效益。实验结果表明,强化基于学习的时间安排方法大大改进了低负荷和高通货条件下的系统反应速度和吞吐量,同时优化了资源利用和减少能源消耗。在多维资源条件下,拟议方法可以考虑多个目标,实现优化资源时间安排。与传统的静态资源分配方法相比,强化学习模型显示了更强的适应和优化能力。它能够调整实时资源配置战略,从而保持动态资源负荷环境中的良好系统绩效变化。
Article 97
Title@2025-07-17 (4): Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes
Title: Bayesian Modeling and Estimation of Linear Time-Variant Systems using Neural Networks and Gaussian Processes | Bayesische Modellierung und Abschätzung von linearen Zeitvariantsystemen unter Verwendung neuraler Netzwerke und Gaußschen Prozessen | 利用神经网络和高斯进程模拟和估计线性时间变化系统 2507.12878v1 |
Authors (1): Yaniv Shulman
The identification of Linear Time-Variant (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system’s impulse response, $h(t, \tau)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can robustly infer the properties of an LTI system from a single noisy observation, show superior data efficiency compared to classical methods in a simulated ambient noise tomography problem, and successfully track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.
从投入产出数据中确定线性时间变化(LTV)系统是一个根本性但富有挑战性的反向问题。 这项工作引入了一个统一的巴伊西亚框架,将该系统的脉冲反应($h(t),\toau)美元)作为模拟过程。 我们通过一系列实验将反应分解成一个后向值和随机波动术语,这种配方为量化不确定性提供了原则性方法,并自然地定义了一个新的、有用的系统类,我们称之为“线性时间变化预测(LTIE) ” 。为了进行推断,我们利用现代机器学习技术,包括Bayesian神经网络和Gaussian进程,使用可变的变法推断。我们通过一系列实验证明,我们的框架能够将LTI系统的特性强有力地从单一的噪音观测中推断出来,显示与模拟环境噪音摄影问题的经典方法相比,数据效率更高,并且通过使用结构化的标尺,在动态环境中成功跟踪持续变化的LTV脉冲反应。 这项工作为不确定性- 系统识别提供了灵活和稳健的方法。
Article 98
Title@2025-07-17 (4): Topology-Aware Activation Functions in Neural Networks
Title: Topology-Aware Activation Functions in Neural Networks | Topologie-Bewusst-Aktivierungsfunktionen in neuralen Netzwerken | 神经网络中的地形-软件启动功能 2507.12874v1 |
Authors (2): Pavel Snopov, Oleg R. Musin
This study explores novel activation functions that enhance the ability of neural networks to manipulate data topology during training. Building on the limitations of traditional activation functions like $\mathrm{ReLU}$, we propose $\mathrm{SmoothSplit}$ and $\mathrm{ParametricSplit}$, which introduce topology “cutting” capabilities. These functions enable networks to transform complex data manifolds effectively, improving performance in scenarios with low-dimensional layers. Through experiments on synthetic and real-world datasets, we demonstrate that $\mathrm{ParametricSplit}$ outperforms traditional activations in low-dimensional settings while maintaining competitive performance in higher-dimensional ones. Our findings highlight the potential of topology-aware activation functions in advancing neural network architectures. The code is available via https://github.com/Snopoff/Topology-Aware-Activations.
本研究探索新的激活功能,提高神经网络在培训期间操控数据表层的能力。基于传统激活功能的局限性,例如$\mathrm{SmootSplit}$和$\mathrm{ParaticSplit}$,我们提议采用“分层”的地形能力。这些功能使网络能够有效地转换复杂数据元件,改善低维层情景的性能。通过合成和真实世界数据集的实验,我们证明$\mathrm{ParateritSplit}$超越了低维环境中的传统激活功能,同时保持了高维的竞争性性能。我们的发现凸显了推进神经网络结构中的表层-aware激活功能的潜力。代码可通过https://github.com/Snopoff/Toplogy-Aware-Aactivations查阅。
Article 99
Title@2025-07-17 (4): An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System
Title: An Investigation of Ear-EEG Signals for a Novel Biometric Authentication System | Untersuchung von Ohr-EEG-Signalen für ein neuartiges biometrisches Authentifizierungssystem | 关于新生物测定鉴定系统耳电信号的调查 2507.12873v1 |
Authors (6): Danilo Avola, Giancarlo Crocetti, Gian Luca Foresti, Daniele Pannone, Claudio Piciarelli, Amedeo Ranaldi
This work explores the feasibility of biometric authentication using EEG signals acquired through in-ear devices, commonly referred to as ear-EEG. Traditional EEG-based biometric systems, while secure, often suffer from low usability due to cumbersome scalp-based electrode setups. In this study, we propose a novel and practical framework leveraging ear-EEG signals as a user-friendly alternative for everyday biometric authentication. The system extracts an original combination of temporal and spectral features from ear-EEG signals and feeds them into a fully connected deep neural network for subject identification. Experimental results on the only currently available ear-EEG dataset suitable for different purposes, including biometric authentication, demonstrate promising performance, with an average accuracy of 82% in a subject identification scenario. These findings confirm the potential of ear-EEG as a viable and deployable direction for next-generation real-world biometric systems.
这项工作探索了使用通过耳耳EEG获得的近距离电路信号进行生物鉴别的可行性。传统的耳EEG生物鉴别系统虽然安全,但由于基于头皮的电极装置繁琐,往往使用率较低。在本研究中,我们提出了一个新的实用框架,利用耳EEG信号作为日常生物鉴别认证的方便用户的替代方法。该系统从耳EEEG信号中提取了时间和光谱特征的原始组合,并将其输入一个完全连通的深神经网络,以便识别对象。目前唯一适合不同用途(包括生物鉴别认证)的耳EEG数据集的实验结果显示有希望的性能,在主题识别假设中平均准确度为82。这些结果证实了耳EEG作为下一代真实世界生物鉴别系统的可行和可部署方向的潜力。
Article 100
Title@2025-07-17 (4): WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding
Title: WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding | WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding | WhoFi: 通过 Wi-Fi 频道信号编码来识别深层人的身份 2507.12869v1 |
Authors (4): Danilo Avola, Daniele Pannone, Dario Montagnini, Emad Emam
Person Re-Identification is a key and challenging task in video surveillance. While traditional methods rely on visual data, issues like poor lighting, occlusion, and suboptimal angles often hinder performance. To address these challenges, we introduce WhoFi, a novel pipeline that utilizes Wi-Fi signals for person re-identification. Biometric features are extracted from Channel State Information (CSI) and processed through a modular Deep Neural Network (DNN) featuring a Transformer-based encoder. The network is trained using an in-batch negative loss function to learn robust and generalizable biometric signatures. Experiments on the NTU-Fi dataset show that our approach achieves competitive results compared to state-of-the-art methods, confirming its effectiveness in identifying individuals via Wi-Fi signals.
个人再识别是视频监控中一项关键和具有挑战性的任务。虽然传统方法依赖于视觉数据,但光线差、隐蔽和次最佳角度等问题往往阻碍业绩。为了应对这些挑战,我们引入了“谁Fi”这一利用无线-Fi信号进行个人再识别的新管道。从频道国家信息(CSI)中提取了生物测量特征,并通过模块化的深神经网络(DNN)(以变异器为基础的编码器为主 ) 处理。这个网络的培训使用内包负负损失功能学习强健和通用的生物识别特征。 NTU-Fi数据集实验显示,我们的方法与最新方法相比,取得了竞争性的结果,证实了它通过Wi-Fi信号识别个人的有效性。
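The abstract above mentions training with an in-batch negative loss. A minimal NumPy sketch of one common form of such a loss follows; the cosine similarity, the temperature value, and the function name are assumptions and are not taken from the WhoFi paper.

```python
import numpy as np


def in_batch_negative_loss(queries: np.ndarray, gallery: np.ndarray,
                           temperature: float = 0.1) -> float:
    """Cross-entropy over a similarity matrix whose diagonal holds the positives.

    queries[i] and gallery[i] are assumed to be signatures of the same person;
    every other gallery row in the batch serves as a negative for query i.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    logits = (q @ g.T) / temperature                    # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

The loss is small when each query is most similar to its own gallery entry and grows when another identity in the batch is the closest match, so no explicit negative mining is needed.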
Article 101
Title@2025-07-17 (4): Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Title: Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) | Beaufsichtigte Feinabstimmung auf kuratierten Daten ist Verstärktes Lernen (und kann verbessert werden) | 受监督的 “ 封闭数据 “ 微调微调是 “ 强化学习 “ (并可以改进) 2507.12856v1 |
Authors (2): Chongli Qin, Jost Tobias Springenberg
Behavior Cloning (BC) on curated (or filtered) data is the predominant paradigm for supervised fine-tuning (SFT) of large language models, as well as for imitation learning of control policies. Here, we draw on a connection between this successful strategy and the theory and practice of finding optimal policies via Reinforcement Learning (RL). Building on existing literature, we clarify that SFT can be understood as maximizing a lower bound on the RL objective in a sparse reward setting, which lends support to its often observed good performance. From this viewpoint, we realize that a small modification to SFT leads to an importance-weighted variant that behaves closer to training with RL, as it (i) optimizes a tighter bound on the RL objective and (ii) can improve performance compared to SFT on curated data. We refer to this variant as importance-weighted supervised fine-tuning (iw-SFT). We show that it is easy to implement and can be further generalized to training with quality-scored data. The resulting SFT variants are competitive with more advanced RL algorithms for large language models and for training policies in continuous control tasks, for example achieving 66.7% on the AIME 2024 dataset.
在整理(或过滤)数据方面,行为克隆(BC)是大型语言模型监督微调(SFT)的主要范例;以及仿照控制政策。在这里,我们利用这一成功战略与通过强化学习(RL)寻找最佳政策的理论和实践之间的联系。在现有文献的基础上,我们澄清,SFT可以理解为在稀薄的奖励环境中最大限度地降低对RL目标的约束。支持其经常观察到的良好业绩。从这个角度看,我们对SFT的小规模修改导致一个重要加权变体,与RL培训更加接近:i)优化与RL目标的更紧密结合,和,ii)能够改进与SFT在整理数据方面的性能。我们把这一变体称为加权有监督的微调(iw-SFT)的重要性。我们表明,实施和进一步推广质量分数数据的培训是容易的。由此形成的SFT变体具有竞争力,与大语言模型的更先进的RL算法和连续控制任务的培训政策具有竞争力。例如,在2024年实现AIME数据中的66.7%。
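To make the difference between plain SFT and an importance-weighted variant concrete, here is a small Python sketch. The abstract does not give the exact weighting, so the weight `q(tau) / pi_ref(tau)`, the clipping range, and all function names below are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def sft_loss(logp_theta: Sequence[float]) -> float:
    """Plain SFT: negative mean log-likelihood of the curated sequences."""
    return -sum(logp_theta) / len(logp_theta)


def iw_sft_loss(logp_theta: Sequence[float],
                logp_q: Sequence[float],
                logp_ref: Sequence[float],
                clip: Tuple[float, float] = (0.1, 10.0)) -> float:
    """Importance-weighted SFT sketch.

    Each sequence log-likelihood is scaled by a weight w = q(tau) / pi_ref(tau),
    computed from sequence log-probabilities and treated as a constant
    (no gradient flows through it). The weight is clipped for stability.
    """
    lo, hi = clip
    total = 0.0
    for lt, lq, lr in zip(logp_theta, logp_q, logp_ref):
        w = min(max(math.exp(lq - lr), lo), hi)
        total += w * lt
    return -total / len(logp_theta)
```

When the auxiliary distribution `q` equals the reference policy, every weight is 1 and the sketch reduces to plain SFT; sequences that `q` favors over the reference contribute more to the loss.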
Article 102
Title@2025-07-17 (4): Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application
Title: Latent Diffusion Model Based Denoising Receiver for 6G Semantic Communication: From Stochastic Differential Theory to Application | Latent Diffusion Modellbasierter Denoisierungsempfänger für 6G Semantische Kommunikation: Von der stochastischen Differentialtheorie zur Anwendung | 用于 6G 语义通讯: 从斯托卡差异理论到应用的 6G 语义通讯的 以 DEM 为基础的前传播模型模型 2506.05710v3 |
Authors (3): Xiucheng Wang, Honggang Jia, Nan Cheng
In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed to enhance robustness against both channel noise and shifts in the transmitted data distribution. A theoretical foundation is established using stochastic differential equations (SDEs), from which a closed-form mapping between any signal-to-noise ratio (SNR) and the optimal denoising timestep is derived. Moreover, to address distribution mismatch, a mathematical scaling method is introduced to align received semantic features with the training distribution of the GAI. Built on this theoretical foundation, a latent diffusion model (LDM)-based semantic communication framework is proposed that combines a variational autoencoder for semantic feature extraction with a pretrained diffusion model for denoising. The proposed system is a training-free framework that supports zero-shot generalization and achieves superior performance under low-SNR and out-of-distribution conditions, offering a scalable and robust solution for future 6G semantic communication systems. Experimental results demonstrate that the proposed semantic communication framework achieves state-of-the-art performance in both pixel-level accuracy and semantic perceptual quality, consistently outperforming baselines across a wide range of SNRs and data distributions without any fine-tuning or post-training.
本文提出一个新的语义通信框架,通过基因人工智能(GAI)增强对频道噪音和传输数据分布变化的稳健性能; 利用随机差分方程(SDEs)建立理论基础,从中得出任何信号对噪音比率(SNR)和最佳分流时间步骤之间的封闭式映射; 此外,为解决分布不匹配问题,还采用了数学缩放方法,使接收到的语义特征与GAI的培训分布相匹配。 在这个理论基础上,提议了一个基于潜在传播模型(LDM)的语义通信框架,将用于提取语义特征的变异自动校对仪(SDEs)结合起来,在此过程中,使用一种先入为定的传播模型进行分解。 拟议的系统是一个无培训框架,支持零发全局化,并在低调和分配条件下实现优异性业绩,为未来的6G语义通信系统提供了一个可扩缩和稳健的解决方案。 实验结果显示,拟议的语义通信框架在SMAL级后质量上实现了任何恒定的SIS级质量。
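The idea of matching a channel SNR to a denoising timestep can be illustrated with a standard DDPM-style noise schedule, where the forward process at step t has signal-to-noise ratio abar_t / (1 - abar_t). The sketch below uses a numerical lookup rather than the closed-form mapping the abstract refers to; the linear beta schedule, its endpoints, and the function names are assumptions.

```python
import numpy as np


def alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
              beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def timestep_for_snr(snr_db: float, abar: np.ndarray) -> int:
    """Return the step whose forward-process SNR (in dB) is closest to the channel SNR."""
    diffusion_snr_db = 10.0 * np.log10(abar / (1.0 - abar))
    return int(np.argmin(np.abs(diffusion_snr_db - snr_db)))
```

Because the schedule's SNR decreases monotonically with t, a noisier channel maps to a later timestep, so the receiver would start denoising from a point that matches the noise it actually observed.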
Article 103
Title@2025-07-17 (4): Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations
Title: Transformer-Based Person Identification via Wi-Fi CSI Amplitude and Phase Perturbations | Transformerbasierte Personenidentifikation über Wi-Fi CSI Amplitude und Phasenstörungen | 通过Wi-Fi CSI进行基于变压器的人的识别 2507.12854v1 |
Authors (7): Danilo Avola, Andrea Bernardini, Francesco Danese, Mario Lezoche, Maurizio Mancini, Daniele Pannone, Amedeo Ranaldi
Wi-Fi sensing is gaining momentum as a non-intrusive and privacy-preserving alternative to vision-based systems for human identification. However, person identification through wireless signals, particularly without user motion, remains largely unexplored. Most prior wireless-based approaches rely on movement patterns, such as walking gait, to extract biometric cues. In contrast, we propose a transformer-based method that identifies individuals from Channel State Information (CSI) recorded while the subject remains stationary. CSI captures fine-grained amplitude and phase distortions induced by the unique interaction between the human body and the radio signal. To support evaluation, we introduce a dataset acquired with ESP32 devices in a controlled indoor environment, featuring six participants observed across multiple orientations. A tailored preprocessing pipeline, including outlier removal, smoothing, and phase calibration, enhances signal quality. Our dual-branch transformer architecture processes amplitude and phase modalities separately and achieves 99.82% classification accuracy, outperforming convolutional and multilayer perceptron baselines. These results demonstrate the discriminative potential of CSI perturbations, highlighting their capacity to encode biometric traits in a consistent manner. They further confirm the viability of passive, device-free person identification using low-cost commodity Wi-Fi hardware in real-world settings.
Wi-Fi 感测作为一种不受侵入和隐私保护的人类识别系统替代视像系统的替代方法,正在获得势头;然而,通过无线信号,特别是没有用户运动的无线信号,对人的身份识别基本上仍未得到探索;大多数先前的无线方法都依赖行走步步等运动模式,以提取生物鉴别提示;相反,我们提议采用基于变压器的方法,在主题保持静止的情况下,从频道国家信息中识别个人;CSI捕捉人体和无线电信号之间独特互动造成的细微放大和阶段扭曲;为了支持评价,我们引入了在有控制的室内环境中用ESP32装置获得的数据集,有6名参与者观看了多个方向;专门设计的预处理管道,包括外部清除、平滑和阶段校准,提高了信号质量;我们的双轨结构结构进程分散和阶段,实现了99.82的分类准确度,超过了进化和多层次的透视基线;这些结果进一步显示了CSI渗透的潜力,突出了在有控制的室内环境环境环境中的无弹性识别装置的能力;它们以稳定的方式确认了全球范围内的硬质识别装置的可行性。
Article 104
Title@2025-07-17 (4): Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
Title: Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants | Site-Level Feintuning mit Progressive Layer Freezing: Auf dem Weg zur robusten Vorhersage der Bronchopulmonalen Dysplasie von Tag 1 Brustradiographen bei extrem prätermen Säuglingen | 与累进层冷冻有关的地点级微调级微调:对极期前婴儿每日1号胸前无线电报上的布朗-希波本二元病原体进行强有力的预测 2507.12269v2 |
Authors (16): Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho
Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low-birth-weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome are crucial to avoid unnecessary toxicity in low-risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24 hours of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants ($\leq$32 weeks gestation, 401-999g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting, and evaluated CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best-performing model with progressive freezing, linear probing and CutMix achieved an AUROC of 0.78 $\pm$ 0.10, a balanced accuracy of 0.69 $\pm$ 0.10, and an F1-score of 0.67 $\pm$ 0.11. In-domain pretraining significantly outperformed ImageNet initialization (p = 0.031), confirming that domain-specific pretraining is important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 $\pm$ 0.11), confirming the need for learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.
支气管肺发育不良(BPD)是一种慢性肺部疾病,影响 35% 的超低出生体重儿。该病以矫正胎龄 36 周时仍依赖氧气为诊断标准,会导致终生的呼吸系统并发症。然而,预防性干预措施存在严重风险,包括神经发育障碍、呼吸机相关肺损伤和全身性并发症。因此,早期的 BPD 预后判断和结局预测对于避免低风险婴儿承受不必要的毒性至关重要。超早产儿的入院胸片通常在出生后 24 小时内常规拍摄,可作为一种无创的预后工具。在这项工作中,我们使用 163 名超低出生体重儿(胎龄 $\leq$32 周,401-999 克)出生后 24 小时内获得的胸部 X 光片,开发并研究了一种深度学习方法。我们对专门在成人胸片上预训练的 ResNet-50 进行微调,采用渐进式层冻结和判别式学习率来防止过拟合,并评估了 CutMix 增强和线性探测。对于中度/重度 BPD 结局预测,我们表现最好的模型结合了渐进式冻结、线性探测和 CutMix,AUROC 为 0.78 $\pm$ 0.10,平衡准确率为 0.69 $\pm$ 0.10,F1 分数为 0.67 $\pm$ 0.11。领域内预训练显著优于 ImageNet 初始化(p = 0.031),证实了领域特定的预训练对 BPD 结局预测的重要性。常规 IRDS 分级的预后价值有限(AUROC 0.57 $\pm$ 0.11),说明需要学习得到的标志物。我们的方法表明,领域特定的预训练能够从出生第 1 天的常规胸片中准确预测 BPD。借助渐进式冻结和线性探测,该方法在计算上仍适用于站点级部署以及未来的联邦学习部署。
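A schedule for progressive layer freezing with discriminative learning rates can be sketched in plain Python without committing to a framework. This is one possible reading of the abstract above: the layer-group names, the stage length, the freezing order (input side first), and the geometric learning-rate decay are all assumptions, and the paper may schedule these differently.

```python
from typing import Dict, List

# Hypothetical ResNet-like parameter groups, ordered from input to output.
LAYER_GROUPS = ["stem", "layer1", "layer2", "layer3", "layer4", "head"]


def trainable_groups(epoch: int, epochs_per_stage: int = 2) -> List[str]:
    """Freeze one more group from the input side every `epochs_per_stage` epochs.

    The classification head is never frozen, so the final stage is
    equivalent to linear probing on fixed features.
    """
    n_frozen = min(len(LAYER_GROUPS) - 1, epoch // epochs_per_stage)
    return LAYER_GROUPS[n_frozen:]


def discriminative_lrs(groups: List[str], head_lr: float = 1e-3,
                       decay: float = 0.5) -> Dict[str, float]:
    """Assign geometrically smaller learning rates to groups further from the head."""
    last = len(groups) - 1
    return {g: head_lr * decay ** (last - i) for i, g in enumerate(groups)}
```

In a training loop, the two functions would be called once per epoch to decide which parameter groups receive gradients and at what rate, which keeps the number of updated parameters small on a dataset of only 163 infants.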
Article 105
Title@2025-07-17 (4): Formalising causal inference as prediction on a target population
Title: Formalising causal inference as prediction on a target population | Formalisierende kausale Schlussfolgerungen als Vorhersage für eine Zielpopulation | 将因果推断正规化,作为对目标人口的预测 2407.17385v3 |
Authors (2): Benedikt Höltgen, Robert C. Williamson
The standard approach to causal modelling, especially in the social and health sciences, is the potential outcomes framework due to Neyman and Rubin. In this framework, observations are thought to be drawn from a distribution over variables of interest, and the goal is to identify parameters of this distribution. Even though the stated goal is often to inform decision making on some target population, there is no straightforward way to include these target populations in the framework. Instead of modelling the relationship between the observed sample and the target population, the inductive assumptions in this framework take the form of abstract sampling and independence assumptions. In this paper, we develop a version of this framework that construes causal inference as treatment-wise predictions for finite populations where all assumptions are testable in retrospect; this means that one can not only test predictions themselves (without any fundamental problem) but also investigate sources of error when they fail. Due to close connections to the original framework, established methods can still be analysed under the new framework.
特别是在社会和健康科学方面,因果建模的标准方法是奈曼和鲁宾公司的潜在成果框架。在这个框架中,人们认为观测是从利益变量的分布中得出的,目标是确定这种分布的参数。尽管所述目标往往是为某些目标人口的决策提供信息,但没有直接的办法将这些目标人口纳入框架。这个框架中的引因假设不是模拟所观察到的抽样和目标人口之间的关系,而是采取抽象抽样和独立假设的形式。在本文中,我们开发了这一框架的版本,将因果关系推论作为可追溯性检验所有假设都可检验的有限人口的治疗预测;这意味着人们不仅可以测试预测本身(没有任何根本问题),还可以在预测失败时调查错误来源。由于与原始框架的联系密切,在新框架下仍然可以分析既定的方法。
Article 106
Title@2025-07-17 (4): Dataset resulting from the user study on comprehensibility of explainable AI algorithms
Title: Dataset resulting from the user study on comprehensibility of explainable AI algorithms | Datensatz aus der Nutzerstudie zur Verständlichkeit erklärbarer KI-Algorithmen | 用户关于可解释的AI算法的可理解性研究产生的数据集 2411.02419v2 |
Authors (8): Szymon Bobek, Paloma Korycińska, Monika Krakowska, Maciej Mozolewski, Dorota Rak, Magdalena Zych, Magdalena Wójcik, Grzegorz J. Nalepa
This paper introduces a dataset that is the result of a user study on the comprehensibility of explainable artificial intelligence (XAI) algorithms. The study participants were recruited from 149 candidates to form three groups representing experts in the domain of mycology (DE), students with a data science and visualization background (IT) and students from social sciences and humanities (SSH). The main part of the dataset contains 39 transcripts of interviews during which participants were asked to complete a series of tasks and questions related to the interpretation of explanations of decisions of a machine learning model trained to distinguish between edible and inedible mushrooms. The transcripts were complemented with additional data that includes visualizations of explanations presented to the user, results from thematic analysis, participants' recommendations for improving the explanations, and the initial survey results that make it possible to determine each participant's domain knowledge and data analysis literacy. The transcripts were manually tagged to allow for automatic matching between the text and other data related to particular fragments. With the rapid development of XAI techniques, the need for a multidisciplinary qualitative evaluation of explainability is one of the emerging topics in the community. Our dataset not only makes it possible to reproduce the study we conducted, but also opens a wide range of possibilities for the analysis of the material we gathered.
本文件介绍一个数据集,该数据集是用户对可解释人工智能算法的可理解性进行研究的结果,研究参与者从149名候选人中征聘,组成代表神学领域专家、具有数据科学和可视化背景的学生以及社会科学和人文学科学生的三个小组。该数据集的主要部分载有39份访谈记录,在访谈过程中,要求参与者完成一系列任务和问题的解释,这些任务和问题与解释为区分食用蘑菇和不可食蘑菇而培训的机器学习模型的决定有关。这些笔录得到了补充,包括向用户提供的解释的可视化数据、专题分析结果、参与者提供的解释改进建议以及初步调查结果,以便确定参与者的域知识以及数据分析知识。这些笔录经过人工标记,使参与者能够自动匹配文本和与特定碎片有关的其他数据。在进入XAI技术的快速发展领域,对解释性进行多学科定性评价的必要性是社区中正在出现的一个专题,也是我们所收集的材料分析的范围。我们的数据设置仅允许我们进行广泛的研究。
Article 107
Title@2025-07-17 (4): A Kernel Distribution Closeness Testing
Title: A Kernel Distribution Closeness Testing | Eine Näherungsprüfung der Kernelverteilung | A 内核分布近距离测试 2507.12843v1 |
Authors (4): Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu
Distribution closeness testing (DCT) assesses whether the distance between a distribution pair is at least $\epsilon$-far. Existing DCT methods mainly measure discrepancies between a distribution pair defined on discrete one-dimensional spaces (e.g., using total variation), which limits their applications to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measurement of the distributional discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD’s value can be the same for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels for multiple distribution pairs. To mitigate the issue, we design a new measurement of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales MMD’s value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we finally propose the NAMMD-based DCT to assess the closeness levels of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT, with bounded type-I error, which is also validated by extensive experiments on many types of data (e.g., synthetic noise, real images). We also apply the proposed NAMMD to the two-sample testing problem and find that the NAMMD-based two-sample test has higher test power than the MMD-based two-sample test in both theory and experiments.
分布近距离测试 (DCT) 评估分布配对之间的距离是否至少是$\ epsilon$-far。 现有的 DCT 方法主要测量离散单维空间(例如使用全变异)上定义的分配配对之间的差异,这些配对限制其应用到复杂数据(例如图像) 。 要将 DCT 扩大到更多类型的数据,自然的想法是引入最大平均值差异( MMD) , 这是对两种复杂分布分布之间在 DCT 情景中分布差异的有力测量。 然而,我们发现 MMD 的价值对于许多配对分布的配对,对于在同一再生核心Hilbert 空间( RKHSH) 上具有不同规范的配对分布配对差异(例如使用全变异) 。 为了缓解问题,我们发现MDMD(MD-MD) 的配对值值可能是一样的。 我们最后建议基于 NMDMD- MCT 的配对两种基于双基的混和双基的MDMD 测试模型, 的MD- 和MDMD(我们用双基) 测测测测的MD) 。
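A small NumPy sketch can show what scaling MMD by RKHS norms looks like in practice. The abstract does not state the NAMMD formula, so the normalization below, `MMD^2 / (4K - ||mu_P||^2 - ||mu_Q||^2)` with a kernel bounded by `K = 1`, is an assumed form used only for illustration, as are the Gaussian kernel, the bandwidth, and the biased (V-statistic) estimators.

```python
import numpy as np
from typing import Tuple


def gaussian_kernel(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Gaussian kernel matrix; values lie in (0, 1] and k(x, x) = 1."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_and_nammd(x: np.ndarray, y: np.ndarray,
                   bandwidth: float = 1.0) -> Tuple[float, float]:
    """Return biased estimates of MMD^2 and of the assumed norm-adaptive variant."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()  # estimates ||mu_P||^2
    kyy = gaussian_kernel(y, y, bandwidth).mean()  # estimates ||mu_Q||^2
    kxy = gaussian_kernel(x, y, bandwidth).mean()  # estimates <mu_P, mu_Q>
    mmd2 = kxx + kyy - 2.0 * kxy
    # Assumed normalization with kernel bound K = 1; the denominator is at least 2.
    nammd = mmd2 / (4.0 - kxx - kyy)
    return float(mmd2), float(nammd)
```

Under this assumed form, two pairs with the same MMD value receive different NAMMD values whenever their mean embeddings have different norms, which is the property the abstract describes as making closeness levels comparable across pairs.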
Article 108
Title@2025-07-17 (4): Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
Title: Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling | Aufgabenspezifische Generative Datensatzdestillation mit schwer wiegender Probenahme | 利用难于指导的抽样抽样进行任务特定生成数据集蒸馏 2507.03331v2 |
Authors (6): Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama
To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, often overlooking task-specific information that can be critical for optimal downstream performance. In this paper, focusing on the downstream task of classification, we propose a task-specific sampling strategy for generative dataset distillation that incorporates the concept of difficulty to consider the requirements of the target task better. The final dataset is sampled from a larger image pool with a sampling distribution obtained by matching the difficulty distribution of the original dataset. A logarithmic transformation is applied as a pre-processing step to correct for distributional bias. The results of extensive experiments demonstrate the effectiveness of our method and suggest its potential for enhancing performance on other downstream tasks. The code is available at https://github.com/SumomoTaku/DiffGuideSamp.
为了减轻深层神经网络对大型数据集的依赖,数据集蒸馏旨在产生能够达到与原始数据集可比性能的精密、高质量的合成数据集。基因模型的整合大大推动了这一领域的发展。但是,现有的方法主要侧重于使蒸馏数据集与原始数据集的原始数据集相一致,而原始数据集往往忽略了对最佳下游性能至关重要的具体任务信息。本文侧重于分类的下游任务,我们提出了基因数据集蒸馏的具体任务抽样战略,其中纳入了更好地考虑目标任务要求的困难概念。最终数据集从更大的图像集合中取样,通过匹配原始数据集的难分配而获得的抽样分布。对数转换是用于纠正分配偏差的预处理步骤。广泛的实验结果表明我们的方法的有效性,并表明其提高其他下游任务性能的潜力。代码见https://github.com/Summotoku/DiffGuideSamp。
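The sampling step described above, drawing from a larger pool so that the difficulty distribution matches the original dataset, can be sketched with histogram-based weights. The difficulty score (assumed to lie in (0, 1], for example one minus classifier confidence), the placement of the log transform, the number of bins, and the function name are assumptions rather than details from the paper.

```python
import numpy as np


def difficulty_matched_sample(pool_difficulty, target_difficulty, n_samples: int,
                              n_bins: int = 10, seed: int = 0) -> np.ndarray:
    """Sample pool indices so their difficulty histogram follows the target's.

    Difficulties are log-transformed before binning (assumed pre-processing step).
    """
    rng = np.random.default_rng(seed)
    pool_log = np.log(np.asarray(pool_difficulty, dtype=float))
    target_log = np.log(np.asarray(target_difficulty, dtype=float))
    edges = np.histogram_bin_edges(np.concatenate([pool_log, target_log]), bins=n_bins)
    target_hist, _ = np.histogram(target_log, bins=edges)
    pool_hist, _ = np.histogram(pool_log, bins=edges)
    # Bin index of every pool item, consistent with np.histogram's binning.
    pool_bin = np.clip(np.digitize(pool_log, edges[1:-1]), 0, n_bins - 1)
    # Weight = target mass in the item's bin, shared among pool items in that bin.
    weights = target_hist[pool_bin] / np.maximum(pool_hist[pool_bin], 1)
    weights = weights / weights.sum()
    return rng.choice(len(pool_log), size=n_samples, replace=False, p=weights)
```

The sketch assumes the pool contains at least `n_samples` items in bins where the target has mass; pool items in bins the target never visits get zero weight and are never selected.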
Article 109
Title@2025-07-17 (4): We should avoid the assumption of data-generating probability distributions in social settings
Title: We should avoid the assumption of data-generating probability distributions in social settings | Wir sollten die Annahme von datengenerierenden Wahrscheinlichkeitsverteilungen in sozialen Settings vermeiden | 我们应该避免假设在社会环境中产生数据的概率分布 2407.17395v4 |
Authors (2): Benedikt Höltgen, Robert C. Williamson
Machine Learning research, including work promoting fair or equitable algorithms, heavily relies on the concept of a data-generating probability distribution. The standard presumption is that since data points are ‘sampled from’ such a distribution, one can learn from observed data about this distribution and, thus, predict future data points which are also drawn from it. We argue, however, that such true probability distributions do not exist and should not be dealt with uncritically. We show that alternative frameworks focusing directly on relevant populations rather than abstract distributions are available and leave classical learning theory almost unchanged. Furthermore, we argue that the assumption of true probabilities or data-generating distributions can be misleading and obscure both the choices made and the goals pursued in machine learning practice. Based on these considerations, this position paper argues that, at least in social settings, machine learning work should avoid assuming data-generating probability distributions.
机器学习研究,包括促进公平或公平算法的工作,在很大程度上依赖于数据生成概率分布的概念。标准假设是,由于数据点是“摘自”这种分布,人们可以从关于这种分布的观察数据中学习,从而预测未来数据点,而数据点也是从中得出的。然而,我们争辩说,这种真实概率分布并不存在,不应以不批评的方式处理。我们表明,有直接关注相关人群而不是抽象分布的替代框架存在,而传统学习理论几乎保持不变。 此外,我们认为,假设真实概率或数据生成分布可能误导人,模糊了在机器学习实践中所作的选择和追求的目标。 基于这些考虑,本立场文件认为,至少在社会环境中,机器学习工作应当避免假定数据生成概率分布。
Article 110
Title@2025-07-17 (4): Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines
Title: Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines | Bridging the Gap: Leveraging Retrieval-Augmented Generation zu besser verstehen öffentliche Bedenken über Impfstoffe | 缩小差距:利用利用回收-养殖一代来更好地了解公众对疫苗的关切 2507.12840v1 |
Authors (6): Muhammad Javed, Sedigh Khademi Habibabadi, Christopher Palmer, Hazel Clothier, Jim Buttery, Gerardo Luis Dimaguila
Vaccine hesitancy threatens public health, leading to delayed or rejected vaccines. Social media is a vital source for understanding public concerns, but traditional methods like topic modelling often struggle to capture nuanced opinions. Though trained for query answering, Large Language Models (LLMs) often miss current events and community concerns. Additionally, hallucinations in LLMs can compromise public health communication. To address these limitations, we developed a tool (VaxPulse Query Corner) using the Retrieval Augmented Generation technique. It addresses complex queries about public vaccine concerns on various online platforms, aiding public health administrators and stakeholders in understanding public concerns and implementing targeted interventions to boost vaccine confidence. In an analysis of 35,103 Shingrix social media posts, it achieved an answer faithfulness of 0.96 and a relevance of 0.94.
社交媒体是了解公众关注问题的重要来源,而主题建模等传统方法往往难以捕捉细微的意见。大型语言模型虽然接受过问答培训,但往往忽略当前事件和社区关注问题。此外,LLMS中的幻觉会损害公共卫生沟通。为了解决这些限制,我们利用回收循环代际增强技术开发了一个工具(VaxPulse Query Corner)。它解决了各种在线平台上关于公共疫苗关注问题的复杂问题,帮助公共卫生行政人员和利益攸关方了解公众关注问题,并实施有针对性的干预措施以提高疫苗信心。分析35 103 Shingrix社会媒体文章,实现了忠诚(0.96)和相关性(0.94)。
Article 111
Title@2025-07-17 (4): Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
Title: Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability | Die Evolution des neuralen Tangentenkerns am Rande der Stabilität verstehen | 了解稳定边缘的内心内核核心的演变 2507.12837v1 |
Authors (3): Kaiqi Jiang, Jeremy Cohen, Yuanzhi Li
The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically change substantially during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, an understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
近些年来,在深层学习中Neural Tangent Kernels(NTKs)的研究引起了越来越多的关注。在培训期间,NTKs通常会发生积极变化,而且与学习特征有关。与此同时,最近关于渐渐基因(GD)的工作发现了一种叫作稳定边缘(EoS)的现象,在这种现象中,NTK的最大的叶素值与步骤大小成反比,尽管后续工作探索了这种叶质价值行为的基本机制,但在深度方面,对NTK族生物在EOS期间的行为仍然缺乏了解。本文详细分析了NTK族生物在EOS期间的动态。在不同的结构中,我们观察到,较大的学习率导致最终NTK和完整的NTK矩阵的先导体与培训目标更加一致。我们随后研究了这一现象的基本机制,并为两层线性网络提供了理论分析。我们的研究加强了对GD培训动态在深层学习中的了解。
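The alignment between an NTK and the training target can be made concrete with a small sketch. It assumes the common kernel-target alignment score y^T K y / (||K||_F ||y||^2) and a toy two-layer linear model f(x) = a^T W x; the function names and the exact normalization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ntk_two_layer_linear(W, a, X):
    """Empirical NTK of f(x) = a^T W x, with the rows of X as inputs.

    The gradient w.r.t. W is the outer product a x^T and the gradient
    w.r.t. a is W x, so K_ij = (a.a)(x_i.x_j) + (W x_i).(W x_j).
    """
    H = X @ W.T                      # hidden activations, shape (n, h)
    return (a @ a) * (X @ X.T) + H @ H.T

def kernel_target_alignment(K, y):
    """Assumed alignment score: y^T K y / (||K||_F * ||y||^2)."""
    return float(y @ K @ y / (np.linalg.norm(K) * (y @ y)))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # toy inputs
y = rng.normal(size=6)               # toy regression targets
W = rng.normal(size=(4, 3))
a = rng.normal(size=4)

K = ntk_two_layer_linear(W, a, X)
top_eigenvalue = np.linalg.eigvalsh(K)[-1]   # quantity tracked at the Edge of Stability
print(top_eigenvalue, kernel_target_alignment(K, y))
```

For a positive semi-definite kernel this score lies between 0 and 1, and it equals 1 when K is proportional to y y^T, which is one way to read "greater alignment with the training target".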
Article 112
Title@2025-07-17 (4): MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
Title: MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results | MVA 2025 Kleines Multi-Objekt-Tracking für die Vogelbeobachtung Herausforderung: Datensatz, Methoden und Ergebnisse | MVA 2025 发现鸟类挑战小型多目标跟踪:数据集、方法和结果 2507.12832v1 |
Authors (24): Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran
Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real-world conditions, designed to capture motion entanglement where both camera and targets move freely in 3D; (2) SO-HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU-based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline. This work lays a foundation for advancing SMOT in UAV scenarios with applications in bird strike avoidance, agriculture, fisheries, and ecological monitoring.
在目标只占据几十个像素时,小型多物体跟踪(SMOT)尤其具有挑战性,使探测和外观联系不可靠。 本文件在MVA2023 SOD4SB挑战成功的基础上,介绍了SMOOT4SB挑战,利用时间信息解决单一框架探测的局限性。我们的三大贡献是:(1) SMOT4SB数据集,由211个UAV视频序列组成,在不同的现实世界条件下有108 192个附加说明的框架,旨在捕捉3D摄影机和目标自由移动时的运动纠缠;(2) SO-HOTA,将Dot距离与HOTA相结合的新型指标,以降低基于IOU的测量指标对小规模流离失所的敏感度;(3) MVA2025竞争性指标,有78名参与者和308个提交材料,其中获胜的方法在基线上实现了5.1x改进。这项工作为在避免鸟类撞击、农业、渔业和生态监测方面应用,推进无人机体照情景下的SMOT奠定了基础。
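The sensitivity of IoU to small displacements, which motivates SO-HOTA, can be illustrated with a short sketch. It assumes one common form of Dot Distance, exp(-D / S), where D is the distance between box centers and S is an average object size; how SO-HOTA combines this with HOTA is not described in the abstract and is not modeled here.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def dot_distance(a, b, avg_size):
    """Assumed form exp(-D / S): D = center distance, S = average object size."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.exp(-math.hypot(ax - bx, ay - by) / avg_size)

# The same 4-pixel shift applied to a tiny box and to a large box.
small, small_shifted = (0, 0, 8, 8), (4, 0, 12, 8)
large, large_shifted = (0, 0, 80, 80), (4, 0, 84, 80)

print(iou(small, small_shifted))   # about 0.33: IoU collapses for the tiny box
print(iou(large, large_shifted))   # about 0.90: barely affected for the large box
print(dot_distance(small, small_shifted, avg_size=8.0))  # about 0.61
```

A four-pixel offset costs the 8x8 box two thirds of its IoU, while the distance-based score degrades more gradually.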
Article 113
Title@2025-07-17 (4): Autoregressive Speech Enhancement via Acoustic Tokens
Title: Autoregressive Speech Enhancement via Acoustic Tokens | Autoregressive Sprachverbesserung durch akustische Token | 通过声调声调增强自动递减语音 2507.12825v1 |
Authors (3): Luca Della Libera, Cem Subakan, Mirco Ravanelli
In speech processing pipelines, improving the quality and intelligibility of real-world recordings is crucial. While supervised regression is the primary method for speech enhancement, audio tokenization is emerging as a promising alternative for a smooth integration with other modalities. However, research on speech enhancement using discrete representations is still limited. Previous work has mainly focused on semantic tokens, which tend to discard key acoustic details such as speaker identity. Additionally, these studies typically employ non-autoregressive models, assuming conditional independence of outputs and overlooking the potential improvements offered by autoregressive modeling. To address these gaps we: 1) conduct a comprehensive study of the performance of acoustic tokens for speech enhancement, including the effect of bitrate and noise strength; 2) introduce a novel transducer-based autoregressive architecture specifically designed for this task. Experiments on VoiceBank and Libri1Mix datasets show that acoustic tokens outperform semantic tokens in terms of preserving speaker identity, and that our autoregressive approach can further improve performance. Nevertheless, we observe that discrete representations still fall short compared to continuous ones, highlighting the need for further research in this area.
在语音处理管道中,提高真实世界录音的质量和智能至关重要。尽管监督回归是增强语音的主要方法,但音效象征正在成为与其他模式顺利融合的有希望的替代方法。然而,关于使用离散表达方式加强语音的研究仍然有限。以前的工作主要侧重于语义符号,这些符号往往会丢弃关键声学细节,如语音身份等。此外,这些研究通常采用非航空模式,假设产出的有条件独立,并忽视自动回归模型提供的潜在改进。为弥补这些差距,我们:(1) 全面研究语音增强声标的性能,包括比特率和噪声强度的影响;(2) 推出一个专门为这项任务设计的新型基于导导导体的自动反向结构。关于语音银行和Libri1Mix数据集的实验表明,音义标物在维护语音身份方面超越立立立立立立的音符号,而且我们自动回归方法可以进一步改进性能。然而,我们观察到,与持续研究领域的必要性相比,离散表达方式仍然不够。
Article 114
Title@2025-07-17 (4): Assessing adaptive world models in machines with novel games
Title: Assessing adaptive world models in machines with novel games | Bewertung adaptiver Weltmodelle in Maschinen mit neuartigen Spielen | 评估用新游戏机器制作的适应性世界模式 2507.12821v1 |
Authors (14): Lance Ying, Katherine M. Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob D. Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R. Allen, Joshua B. Tenenbaum
Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remains narrow, often focusing on static representations learned from training on massive corpora of data, instead of the efficiency and efficacy of models in learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures; we refer to games of this kind as novel games. We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent’s ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization, a critical component of artificial general intelligence.
人类情报显示,在新的和不熟悉的环境中,快速适应和有效解决问题的能力是惊人的。我们认为,这种深刻的适应能力与有效构建和完善通常称为世界模型的环境内部代表结构有着根本的联系,我们把这种适应机制称为世界模式,称为世界范式。然而,目前对世界人工智能模型的理解和评价仍然狭窄,往往侧重于从大规模数据组合培训中吸取的静态表述,而不是通过在新的环境中互动和探索来学习这些表述的模型的效率和效力。我们从这个角度出发,提出借鉴数十年关于人类如何以如此高效的方式学习和适应的认知科学研究的世界范式介绍;我们随后呼吁建立一个新的评价框架,用于在AI中评估适应性世界模型。具体地说,我们提出一个新的基准范式,以精心设计的游戏套式为基础,真正、深入和不断更新基本游戏结构的新颖的游戏为基础 – – 我们把这种游戏称为新游戏。我们详细介绍了建造这些游戏的关键侧面,并提出了适当的衡量尺度,以明确挑战和评价代理人在快速世界范式上学习和适应的能力;我们希望,新的、稳健的人工智能框架将激励世界范式系统的未来评价。
Article 115
Title@2025-07-17 (4): Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect
Title: Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect | Self Balancing Neural Network: Eine neuartige Methode zur Schätzung des durchschnittlichen Behandlungseffekts | 自我平衡神经网络:估计平均治疗效果的新办法 2507.12818v1 |
Authors (3): Atomsa Gemechu Abdisa, Yingchun Zhou, Yuqi Qiu
In observational studies, confounding variables affect both treatment and outcome. Moreover, instrumental variables also influence the treatment assignment mechanism. This sets an observational study apart from a standard randomized controlled trial, where the treatment assignment is random. As a result, the estimated average treatment effect becomes biased. To address this issue, a standard approach is to incorporate the estimated propensity score when estimating the average treatment effect. However, these methods incur the risk of misspecification in propensity score models. To solve this issue, a novel method called the “Self balancing neural network” (Sbnet), which lets the model itself obtain its pseudo propensity score from the balancing net, is proposed in this study. The proposed method estimates the average treatment effect by using the balancing net as a key part of the feedforward neural network. This formulation resolves the estimation of the average treatment effect in one step. Moreover, the multi-pseudo propensity score framework, which is estimated from the diversified balancing net and used for the estimation of the average treatment effect, is presented. Finally, the proposed methods are compared with state-of-the-art methods on three simulation setups and real-world datasets. The results show that the proposed self-balancing neural network outperforms state-of-the-art methods.
在观察研究中,混杂的变量会影响治疗和结果。此外,工具变量也会影响治疗分配机制。这种情况使得研究不同于标准的随机控制试验,因为治疗分配是随机的。由于这种情况,估计的平均治疗效果会产生偏差。为了解决这一问题,一个标准办法是在估计平均治疗效果时纳入估计的偏差分。不过,这些方法在适应性分数模型中存在偏差的风险。为了解决这个问题,本研究建议采用一种叫作“自平衡神经网络”(Sbnet)的新颖方法,让模型本身从平衡网中获得假的偏差分。拟议方法通过使用平衡网作为进料神经网络的关键部分来估计平均治疗效果。这一方法在估计平均治疗效果时,用一个步骤解决了估计平均治疗效果的估计问题。此外,还提出了多功能偏差指数框架,从多样化的平衡网和用于估计平均治疗效果的模型(Sbnet)中估算出来。最后,拟议方法与三个模拟网络显示的自我表现和真实状态的方法相比较。
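The standard propensity-score approach that the abstract contrasts with can be sketched on simulated data. This is a minimal illustration of normalized inverse propensity weighting, not the Sbnet method; the data-generating process, the true effect of 2.0, and the use of the known true propensity are all assumptions made for the example.

```python
import math
import random

random.seed(0)
TRUE_ATE = 2.0  # assumed ground-truth effect in this simulation

rows = []
for _ in range(20000):
    x = random.gauss(0.0, 1.0)                 # confounder
    e = 1.0 / (1.0 + math.exp(-x))             # true propensity P(T=1 | x)
    t = 1 if random.random() < e else 0        # treatment depends on x
    y = TRUE_ATE * t + 3.0 * x + random.gauss(0.0, 1.0)  # outcome also depends on x
    rows.append((e, t, y))

def naive_ate(rows):
    """Difference of group means; biased because x drives both t and y."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_ate(rows):
    """Normalized (Hajek) inverse-propensity-weighted estimate of the ATE."""
    w1 = sum(t / e for e, t, _ in rows)
    w0 = sum((1 - t) / (1 - e) for e, t, _ in rows)
    m1 = sum(t * y / e for e, t, y in rows) / w1
    m0 = sum((1 - t) * y / (1 - e) for e, t, y in rows) / w0
    return m1 - m0

print(naive_ate(rows), ipw_ate(rows))
```

In practice the propensity is unknown and must be estimated, which is where the misspecification risk mentioned in the abstract enters.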
Article 116
Title@2025-07-17 (4): From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning
Title: From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Von der Neuheit zur Imitation: Selbstdestillierte Belohnungen für Offline-Verstärkungslernen | 从新闻到消化:为脱线强化学习自行提炼奖项 2507.12815v1 |
Authors (2): Gaurav Chaudhary, Laxmidhar Behera
Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods that depend on complex alignment procedures, our approach adapts Random Network Distillation (RND) to generate intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure. First, we train a predictor network to mimic a fixed target network’s embeddings based on expert state transitions. Later, the prediction error between these networks serves as a reward signal for each transition in the static dataset. This mechanism provides a structured reward signal without requiring handcrafted reward annotations. We provide a formal theoretical construct that offers insights into how RND prediction errors effectively serve as intrinsic rewards by distinguishing expert-like transitions. Experiments on the D4RL benchmark demonstrate that ReLOAD enables robust offline policy learning and achieves performance competitive with traditional reward-annotated methods.
离线强化学习(RL)旨在从静态数据集中学习有效政策,而不需要进一步的代理-环境互动,然而,实际采用该套数据往往因需要明确的奖赏说明而受阻,因为这种奖励说明对工程设计来说成本高昂或难以追溯获得。为了解决这个问题,我们提议为离线的RL(RL)提供一个新的奖赏说明框架,即通过蒸馏加强离线的奖赏框架。与依赖复杂校准程序的现有方法不同,我们的方法是调整随机网络蒸馏(RND),以便利用简单而有效的嵌入差异计量,从专家演示中产生内在的奖赏。首先,我们训练一个预测网络,模拟固定目标网络基于专家国家过渡的嵌入。后来,这些网络之间的预测错误成为静态数据集每次过渡的奖赏信号。这一机制提供了结构化的奖赏信号,而不需要手工制作的奖赏说明。我们提供了一个正式的理论概念,通过区分专家类型的过渡,来深入了解RND预测错误如何有效发挥内在的奖赏作用。
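The reward-annotation idea can be sketched with a toy version of Random Network Distillation. The sketch assumes a fixed random tanh embedding as the target network, a linear predictor fitted by least squares on expert data only, and a reward equal to the negative prediction error; the feature vectors stand in for state transitions, and none of these specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))            # weights of the fixed, random target network

def target_embed(x):
    """Frozen target embedding; never trained."""
    return np.tanh(x @ A.T)

# Expert transitions (toy stand-in): features concentrated near the origin.
expert = rng.normal(0.0, 0.1, size=(500, 4))

# Predictor: a linear map trained to mimic the target on expert data only.
B, *_ = np.linalg.lstsq(expert, target_embed(expert), rcond=None)

def rnd_reward(x):
    """Assumed reward: negative squared embedding discrepancy."""
    err = np.sum((x @ B - target_embed(x)) ** 2, axis=-1)
    return -err

expert_like = rng.normal(0.0, 0.1, size=(100, 4))    # resembles the demonstrations
off_distribution = rng.normal(3.0, 1.0, size=(100, 4))

print(rnd_reward(expert_like).mean(), rnd_reward(off_distribution).mean())
```

Transitions that resemble the demonstrations are predicted well and receive rewards near zero, while unfamiliar transitions receive strongly negative rewards.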
Article 117
Title@2025-07-17 (4): RONOM: Reduced-Order Neural Operator Modeling
Title: RONOM: Reduced-Order Neural Operator Modeling | RONOM: Reduzierte Neuraloperator-Modellierung | RONOM: 降低轨道神经操作员模型 2507.12814v1 |
Authors (3): Sven Dummer, Dongwei Ye, Christoph Brune
Time-dependent partial differential equations are ubiquitous in physics-based modeling, but they remain computationally intensive in many-query scenarios, such as real-time forecasting, optimal control, and uncertainty quantification. Reduced-order modeling (ROM) addresses these challenges by constructing a low-dimensional surrogate model but relies on a fixed discretization, which limits flexibility across varying meshes during evaluation. Operator learning approaches, such as neural operators, offer an alternative by parameterizing mappings between infinite-dimensional function spaces, enabling adaptation to data across different resolutions. Whereas ROM provides rigorous numerical error estimates, neural operator learning largely focuses on discretization convergence and invariance without quantifying the error between the infinite-dimensional and the discretized operators. This work introduces the reduced-order neural operator modeling (RONOM) framework, which bridges concepts from ROM and operator learning. We establish a discretization error bound analogous to those in ROM, and get insights into RONOM’s discretization convergence and discretization robustness. Moreover, two numerical examples are presented that compare RONOM to existing neural operators for solving partial differential equations. The results demonstrate that RONOM using standard vector-to-vector neural networks achieves comparable performance in input generalization and superior performance in both spatial super-resolution and discretization robustness, while also offering novel insights into temporal super-resolution scenarios.
以时间为基础的部分差异方程式在物理学基础上的模型中普遍存在,但在许多狭义情景中,如实时预测、最佳控制和不确定性量化,它们仍然在计算上十分密集。 减少排序模型(ROM)通过建立一个低维代代谢模型来应对这些挑战,但依靠固定的离散化,这限制了在评价期间不同中间的灵活度。 神经操作者等操作者学习方法提供了一种替代办法,将无限维功能空间之间的绘图参数化,从而能够适应不同分辨率的数据。 ROM提供了严格的数字误差估计,而神经操作者学习主要侧重于离散趋同和差异性,而没有量化无限维度操作者和离散操作者之间的错误。 这项工作引入了减序神经操作模型(RONOM)框架,该框架将ROM和算子学习的概念联系起来。 我们建立了与ROM中类似的离散化误差界,并深入了解了RONOM的离散化收敛性和离散化鲁棒性。 此外,还提出了两个数值实例,将RONOM与现有的用于求解偏微分方程的神经算子进行比较。 结果表明,使用标准向量到向量神经网络的RONOM在输入泛化方面取得了相当的性能,在空间超分辨率和离散化鲁棒性方面取得了更优的性能,同时也为时间超分辨率场景提供了新的见解。
Article 118
Title@2025-07-17 (4): ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space
Title: ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space | ZClassifier: Temperatur-Tuning und Manifold-Annäherung über KL Divergenz auf Logit Space | ZClasizer: 通过在登录空间的 KL diggence 进行温度调制和调控相近 2507.10638v2 |
Authors (1): Shim Soon Yong
We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Our method simultaneously addresses temperature scaling and manifold approximation by minimizing the Kullback-Leibler (KL) divergence between the predicted Gaussian distributions and a unit isotropic Gaussian. This unifies uncertainty calibration and latent control in a principled probabilistic manner, enabling a natural interpretation of class confidence and geometric consistency. Experiments on CIFAR-10 show that ZClassifier improves over softmax classifiers in robustness, calibration, and latent separation.
我们引入了一个新的分类框架ZClassizer, 它将常规的确定性对数用二角高斯分布式对数取代。 我们的方法同时解决温度缩放和多重近距离问题, 最大限度地缩小高斯分布和单位等离子体之间的 Kullback- Leiber (KL) 差异。 这以原则性概率化的方式统一了不确定性的校准和潜在控制, 使得对等级信任和几何一致性的自然解释成为可能。 对 CIFAR- 10 的实验显示, ZClassorizer 在坚固性、 校准和潜在分离方面对软式分解器的改进了 。
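The KL term between a diagonal Gaussian and a unit isotropic Gaussian has a well-known closed form, 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2). The sketch below evaluates it for a vector of Gaussian logits; parameterizing the variance by its logarithm and the function name are assumptions for illustration, and the abstract does not say how this term is weighted against the classification loss.

```python
import math

def kl_diag_gaussian_to_standard(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# Logits that already match N(0, I) incur no penalty.
print(kl_diag_gaussian_to_standard([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))

# A confident mean on one class with unit variance is penalized by 0.5 * mu^2.
print(kl_diag_gaussian_to_standard([3.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

Because the penalty grows with the squared mean and with any departure of the variance from one, it limits logit scale much as a temperature would.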
Article 119
Title@2025-07-17 (4): Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives
Title: Holistix: A Dataset for Holistic Wellness Dimensions Analysis in Mental Health Narratives | Holistix: Ein Datensatz für ganzheitliche Wellness-Dimensionen Analyse in psychischen Gesundheits-Erzählungen | Holistix:心理健康叙事中整体健康层面分析数据集 2507.09565v2 |
Authors (3): Heba Shakeel, Tanvir Ahmad, Chandni Saxena
We introduce a dataset for classifying wellness dimensions in social media user posts, covering six key aspects: physical, emotional, social, intellectual, spiritual, and vocational. The dataset is designed to capture these dimensions in user-generated content, with a comprehensive annotation framework developed under the guidance of domain experts. This framework allows for the classification of text spans into the appropriate wellness categories. We evaluate both traditional machine learning models and advanced transformer-based models for this multi-class classification task, with performance assessed using precision, recall, and F1-score, averaged over 10-fold cross-validation. Post-hoc explanations are applied to ensure the transparency and interpretability of model decisions. The proposed dataset contributes to region-specific wellness assessments in social media and paves the way for personalized well-being evaluations and early intervention strategies in mental health. We adhere to ethical considerations in constructing our experiments and dataset and in releasing them publicly on GitHub.
我们引入了一个数据集,用于在社交媒体用户岗位上对健康层面进行分类,涵盖六个关键方面:身体、情感、社会、智力、精神和职业;数据集旨在捕捉用户生成内容中的这些层面,在领域专家指导下制定全面说明框架;这一框架允许将文字分类为适当的健康类别;我们评估这一多级分类任务的传统机器学习模式和先进的变压器模型,评估工作表现时使用精确度、回溯度和F1分,平均超过10倍的交叉校验;采用后热解解释,以确保示范决定的透明度和可解释性;拟议数据集有助于社会媒体中针对具体区域的健康评估,并为个人化健康评估和心理健康早期干预战略铺平道路;我们坚持在Github上建立和公布我们的实验和数据集的道德考虑。
Article 120
Title@2025-07-17 (4): Quantum Long Short-Term Memory for Drug Discovery
Title: Quantum Long Short-Term Memory for Drug Discovery | Quantenlanges Kurzzeitgedächtnis für die Drogenentdeckung | 药物发现长期短期记忆 2407.19852v2 |
Authors (5): Liang Zhang, Yin Xu, Mohan Wu, Liang Wang, Hua Xu
Quantum computing combined with machine learning (ML) is a highly promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we present Quantum Long Short-Term Memory (QLSTM), a QML architecture, and demonstrate its effectiveness in drug discovery. We evaluate QLSTM on five benchmark datasets (BBBP, BACE, SIDER, BCAP37, T-47D), and observe consistent performance gains over classical LSTM, with ROC-AUC improvements ranging from 3% to over 6%. Furthermore, QLSTM exhibits improved predictive accuracy as the number of qubits increases, and faster convergence than classical LSTM under the same training conditions. Notably, QLSTM maintains strong robustness against quantum computer noise, outperforming noise-free classical LSTM in certain settings. These findings highlight the potential of QLSTM as a scalable and noise-resilient model for scientific applications, particularly as quantum hardware continues to advance in qubit capacity and fidelity.
与机器学习(ML)相结合的量子计算是一个很有希望的研究领域,许多研究表明,量子机器学习(QML)可望比古典ML更有效地解决科学问题。 在这项工作中,我们展示了Qantum长期短期内存(QLSTM)这一QML架构,并展示了其在药物发现方面的效力。我们根据五个基准数据集(BBBP、BACE、SIDER、BCAP37、T-47D)对QLSTM进行了评估,并观察到古典LSTM的一贯性效绩收益,ROC-AUC改进幅度为3%至6%以上。此外,QLSTM还显示出预测性准确性有所提高,因为qubits数量在增加,在相同的培训条件下比经典LSTM的趋同速度更快。值得注意的是,QLSTM对量子噪音保持很强的坚固性,在某些环境下,优于无噪音的古典LSTM。这些结论突出表明,QLSTM作为可扩度和耐噪性的科学应用模型的潜力,特别是量子硬件在qubitit能力和忠诚度上继续进步。
Article 121
Title@2025-07-17 (4): Deep Q-Learning with Gradient Target Tracking
Title: Deep Q-Learning with Gradient Target Tracking | Deep Q-Learning mit gradientem Target Tracking | 与渐进目标跟踪进行深度学习 2503.16700v2 |
Authors (3): Donghwan Lee, Bum Geun Park, Taeho Lee
This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network’s weights, held fixed for a number of iterations before being periodically replaced via a hard update. While this stabilizes training by providing consistent targets, it introduces a new challenge: the hard update period must be carefully tuned to achieve optimal performance. To address this issue, we propose two gradient-based target update methods: DQN with asymmetric gradient target tracking (AGT2-DQN) and DQN with symmetric gradient target tracking (SGT2-DQN). These methods replace the conventional hard target updates with continuous and structured updates using gradient descent, which effectively eliminates the need for manual tuning. We provide a theoretical analysis proving the convergence of these methods in tabular settings. Additionally, empirical evaluations demonstrate their advantages over standard DQN baselines, suggesting that gradient-based target updates can serve as an effective alternative to conventional target update mechanisms in Q-learning.
本文介绍了使用梯度目标跟踪的Q-学习,这是一个新的强化学习框架,它提供了学习的连续目标更新机制,作为常规硬性更新模式的替代。在标准的深Q网络(DQN)中,目标网络是在线网络的权重的复制件,在通过硬性更新定期更新之前固定了一些迭代。虽然这通过提供一致的目标稳定了培训,但带来了新的挑战:必须仔细调整硬性更新期,以取得最佳业绩。为了解决这一问题,我们提出了两种基于梯度的目标更新方法:DQN,使用不对称的梯度目标跟踪(AGT2-DQN)和DQN,使用对称梯度目标跟踪(SGT2-DQN)。这些方法取代了常规硬性目标更新,使用梯度下降进行连续和结构更新,从而有效地消除了手工调整的需要。我们提供了理论分析,证明这些方法在表格环境中的趋同性。此外,经验评估表明它们比标准DQN基准具有优势,这表明基于梯度的目标更新可以作为Q学习常规目标更新机制的有效替代方法。
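The contrast between a hard target update and a gradient-based one can be sketched as follows. The sketch assumes the target parameters take gradient steps on the squared distance 0.5 * ||target - online||^2; the abstract does not specify the AGT2-DQN or SGT2-DQN losses, so this tracking loss and the function names are illustrative assumptions.

```python
def hard_update(target, online):
    """Conventional DQN: copy the online weights every fixed number of steps."""
    return list(online)

def gradient_tracking_step(target, online, lr):
    """One gradient step on 0.5 * ||target - online||^2 w.r.t. the target.

    The gradient is (target - online), so the target moves a fraction lr
    of the way toward the online weights at every iteration.
    """
    return [t - lr * (t - o) for t, o in zip(target, online)]

online = [1.0, -2.0]          # toy online-network parameters (held fixed here)
target = [0.0, 0.0]           # toy target-network parameters

for _ in range(200):
    target = gradient_tracking_step(target, online, lr=0.1)

print(target)                 # close to the online parameters
```

With a fixed online network, each step shrinks the gap by a factor of (1 - lr), so the target follows continuously and there is no update period to tune.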
Article 122
Title@2025-07-17 (4): Large Language Models’ Internal Perception of Symbolic Music
Title: Large Language Models’ Internal Perception of Symbolic Music | Die innere Wahrnehmung symbolischer Musik durch große Sprachmodelle | 大语言模型内部对符号音乐的感知 2507.12808v1 |
Authors (2): Andrew Shin, Kunitake Kaneko
Large language models (LLMs) excel at modeling relationships between strings in natural language and have shown promise in extending to other symbolic domains like coding or mathematics. However, the extent to which they implicitly model symbolic music remains underexplored. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts describing combinations of genres and styles, and evaluating their utility through recognition and generation tasks. We produce a dataset of LLM-generated MIDI files without relying on explicit musical training. We then train neural networks entirely on this LLM-generated MIDI dataset and perform genre and style classification as well as melody completion, benchmarking their performance against established models. Our results demonstrate that LLMs can infer rudimentary musical structures and temporal relationships from text, highlighting both their potential to implicitly encode musical patterns and their limitations due to a lack of explicit musical context, shedding light on their generative capabilities for symbolic music.
大型语言模型(LLMS)在天然语言字符串之间的建模关系方面非常出色,并表现出向其他象征性领域,如编码或数学扩展的希望。然而,它们隐含的模拟象征性音乐的模范仍然未得到充分探讨。本文调查LLMS如何通过描述不同类型和风格组合的文字提示生成象征性音乐数据,并通过承认和生成任务来评估其效用。我们制作了LM公司生成的MIDI文件的数据集,而不必依赖明确的音乐培训。然后,我们完全用LLM公司生成的MDI数据集来培训神经网络,并进行原型和风格的分类以及旋律的完成,对照既定模型衡量其性能。我们的成果表明LLMS公司可以推导出基本的音乐结构和文本的时空关系,强调其隐含的音乐模式以及由于缺乏明确的音乐背景而导致的局限性。我们用光灯光来显示其象征音乐音乐的基因化能力。
Article 123
Title@2025-07-17 (4): PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database
Title: PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database | PMKLC: Parallele Multi-Knowledge Learning-basierte Lossless-Kompression für großformatige Genomics-Datenbank | PMKLC: 大型基因组数据库的平行多知识学习-无损失压缩 2507.12805v1 |
Authors (8): Hui Sun, Yanfeng Ding, Liping Yi, Huidong Ma, Gang Wang, Xiaoguang Liu, Cheng Zhong, Wentong Cai
Learning-based lossless compressors play a crucial role in large-scale genomic database backup, storage, transmission, and management. However, their 1) inadequate compression ratio, 2) low compression and decompression throughput, and 3) poor compression robustness limit their widespread adoption and application in both industry and academia. To solve these challenges, we propose a novel Parallel Multi-Knowledge Learning-based Compressor (PMKLC) with four crucial designs: 1) we propose an automated multi-knowledge learning-based compression framework as compressors’ backbone to enhance compression ratio and robustness; 2) we design a GPU-accelerated (s,k)-mer encoder to optimize compression throughput and computing resource usage; 3) we introduce data block partitioning and Step-wise Model Passing (SMP) mechanisms for parallel acceleration; 4) we design two compression modes, PMKLC-S and PMKLC-M, to meet complex application scenarios, where the former runs on a resource-constrained single GPU and the latter is multi-GPU accelerated. We benchmark PMKLC-S/M and 14 baselines (7 traditional and 7 learning-based) on 15 real-world datasets with different species and data sizes. Compared to baselines on the testing datasets, PMKLC-S/M achieve average compression ratio improvements of up to 73.609% and 73.480%, and average throughput improvements of up to 3.036x and 10.710x, respectively. In addition, PMKLC-S/M achieve the best robustness and competitive memory cost, indicating greater stability against datasets with different probability distribution perturbations and a strong ability to run on memory-constrained devices.
基于学习的无损压缩器在大规模基因组数据库的备份、存储、传输和管理中发挥着关键作用。然而,它们存在 1) 压缩率不足、2) 压缩与解压吞吐量低、3) 压缩鲁棒性差等问题,限制了其在工业界和学术界的广泛采用和应用。为了解决这些挑战,我们提出了一种新的基于并行多知识学习的压缩器(PMKLC),包含四项关键设计:1) 我们提出一个自动化的基于多知识学习的压缩框架作为压缩器的主干,以提高压缩率和鲁棒性;2) 我们设计了一个GPU加速的 (s,k)-mer 编码器,以优化压缩吞吐量和计算资源使用;3) 我们引入数据块划分和逐步模型传递(SMP)机制以实现并行加速;4) 我们设计了PMKLC-S和PMKLC-M两种压缩模式以满足复杂的应用场景,前者运行在资源受限的单GPU上,后者由多GPU加速。我们在15个包含不同物种和数据规模的真实数据集上,对PMKLC-S/M和14个基线(7个传统方法和7个基于学习的方法)进行了基准测试。与测试数据集上的基线相比,PMKLC-S/M的平均压缩率提升分别高达73.609%和73.480%,平均吞吐量提升分别高达3.036倍和10.710倍。此外,PMKLC-S/M还取得了最佳的鲁棒性和有竞争力的内存开销,表明其对具有不同概率分布扰动的数据集具有更高的稳定性,并且能够很好地在内存受限的设备上运行。
Article 124
Title@2025-07-17 (4): Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation
Title: Physics-Informed Linear Model (PILM): Analytical Representations and Application to Crustal Strain Rate Estimation | Physik-informiertes Linearmodell (PILM): Analytische Darstellungen und Anwendung auf Crustal Strain Rate Abschätzung | 物理内建线性模型(PILM):对结壳定流速率估计的分析说明和应用 2507.12218v2 |
Authors (1): Tomohisa Okazaki
Many physical systems are described by partial differential equations (PDEs), and solving these equations and estimating their coefficients or boundary conditions (BCs) from observational data play a crucial role in understanding the associated phenomena. Recently, a machine learning approach known as physics-informed neural network, which solves PDEs using neural networks by minimizing the sum of residuals from the PDEs, BCs, and data, has gained significant attention in the scientific community. In this study, we investigate a physics-informed linear model (PILM) that uses linear combinations of basis functions to represent solutions, thereby enabling an analytical representation of optimal solutions. The PILM was formulated and verified for illustrative forward and inverse problems including cases with uncertain BCs. Furthermore, the PILM was applied to estimate crustal strain rates using geodetic data. Specifically, physical regularization that enforces elastic equilibrium on the velocity fields was compared with mathematical regularization that imposes smoothness constraints. From a Bayesian perspective, mathematical regularization exhibited superior performance. The PILM provides an analytically solvable framework applicable to linear forward and inverse problems, underdetermined systems, and physical regularization.
许多物理系统通过部分差异方程式(PDEs)来描述,解决这些方程式,并从观测数据中估计其系数或边界条件(BCs),在理解相关现象方面起着关键作用。最近,一种称为物理知情神经网络的机器学习方法,通过尽量减少PDE、BCs和数据残留物的总数,通过神经网络解决PDEs,在科学界引起了极大关注。在本研究中,我们调查了一个物理学知情线性模型,该模型使用基础功能的线性组合来代表解决方案,从而能够对最佳解决方案进行分析性的表述。PILM的制定和核实是为了说明前向和反向问题,包括不确定的不完全的BCs案例。此外,PILM还应用大地测量数据来估计地壳压力率。具体地说,实施速度场弹性平衡的物理规范化与要求平稳的数学规范化相比较。从巴伊西亚的角度看,数学正规化表现优。PILM提供了一个用于线性前向和反向问题、定型系统和物理正规化的分析性的分析性框架。
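Because the model is a linear combination of basis functions, minimizing the squared PDE and boundary residuals reduces to a linear least-squares problem. The sketch below shows this for the toy problem u''(x) = -pi^2 sin(pi x) with u(0) = u(1) = 0, whose exact solution is sin(pi x); the monomial basis, the collocation grid, and the boundary weight are assumptions chosen for illustration, not the paper's setup.

```python
import numpy as np

DEGREE = 8                               # monomial basis 1, x, ..., x^8 (assumed)
xs = np.linspace(0.0, 1.0, 41)           # collocation points for the PDE residual

def phi(x, k):
    """k-th basis function."""
    return x ** k

def phi_dd(x, k):
    """Second derivative of the k-th basis function."""
    return k * (k - 1) * x ** (k - 2) if k >= 2 else np.zeros_like(x)

# PDE rows: sum_k c_k * phi_k''(x_i) = f(x_i), with f = -pi^2 sin(pi x).
A_pde = np.stack([phi_dd(xs, k) for k in range(DEGREE + 1)], axis=1)
b_pde = -np.pi ** 2 * np.sin(np.pi * xs)

# Boundary rows: u(0) = 0 and u(1) = 0, scaled by an assumed weight.
bc_weight = 10.0
A_bc = bc_weight * np.array([[phi(0.0, k) for k in range(DEGREE + 1)],
                             [phi(1.0, k) for k in range(DEGREE + 1)]])
b_bc = np.zeros(2)

# Minimizing the summed squared residuals is one linear least-squares solve.
coef, *_ = np.linalg.lstsq(np.vstack([A_pde, A_bc]),
                           np.concatenate([b_pde, b_bc]), rcond=None)

def u_hat(x):
    x = np.asarray(x, dtype=float)
    return sum(c * phi(x, k) for k, c in enumerate(coef))

max_error = float(np.max(np.abs(u_hat(xs) - np.sin(np.pi * xs))))
print(max_error)
```

Observation rows could be appended to the same system in the same way, which is what makes an analytical treatment of the optimum possible in the linear setting.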
Article 125
Title@2025-07-17 (4): FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction
Title: FLDmamba: Integrating Fourier and Laplace Transform Decomposition with Mamba for Enhanced Time Series Prediction | FLDmamba: Integration von Fourier und Laplace-Transformationszersetzung mit Mamba für verbesserte Zeitreihenvorhersage | FLDmamba:将Fourier和Laple变形变形变形与Mamba结合,以提高时间序列预测 2507.12803v1 |
Authors (8): Qianru Zhang, Chenglei Yu, Haixin Wang, Yudong Yan, Yuansheng Cao, Siu-Ming Yiu, Tailin Wu, Hongzhi Yin
Time series prediction, a crucial task across various domains, faces significant challenges due to the inherent complexities of time series data, including non-stationarity, multi-scale periodicity, and transient dynamics, particularly when tackling long-term predictions. While Transformer-based architectures have shown promise, their quadratic complexity with sequence length hinders their efficiency for long-term predictions. Recent advancements in State-Space Models, such as Mamba, offer a more efficient alternative for long-term modeling, but they cannot capture multi-scale periodicity and transient dynamics effectively. They are also susceptible to data noise in time series. This paper proposes a novel framework, FLDmamba (Fourier and Laplace Transform Decomposition Mamba), addressing these limitations. FLDmamba leverages the strengths of both Fourier and Laplace transforms to effectively capture multi-scale periodicity and transient dynamics within time series data, and to improve the robustness of the model to data noise. Our extensive experiments demonstrate that FLDmamba achieves superior performance on time series prediction benchmarks, outperforming both Transformer-based and other Mamba-based architectures. To promote the reproducibility of our method, we have made both the code and data accessible via the following URL: https://github.com/AI4Science-WestlakeU/FLDmamba.
时间序列预测是跨不同领域的关键任务,由于时间序列数据固有的复杂性,包括非常态、多尺度周期和瞬时动态,特别是在处理长期预测时,时间序列预测面临重大挑战。尽管以变异器为基础的结构显示有希望,但其序列长度的二次复杂性妨碍了长期预测的效率。国家空间模型(如Mamba)最近的进展为长期建模提供了更有效的替代方法,但无法有效捕捉多尺度的周期性和瞬时动态。与此同时,它们容易在时间序列中出现数据噪音问题。本文提出了一个新的框架,即FLDmamba(Fourier和Laplace变换分解Mamba),解决了这些局限性。FLDmamba利用了Fourier和Laplace变换的优势,以有效捕捉时间序列数据中的多尺度周期性和瞬时动态,并提高了模型对数据噪音问题的稳健性。我们的广泛实验表明,FLDmamba在时间序列预测基准上取得了优异的性能,超过了基于Transformer的架构和其他基于Mamba的架构。为了促进我们方法的可复现性,我们已通过以下URL公开代码和数据:https://github.com/AI4Science-WestlakeU/FLDmamba。
Article 126
Title@2025-07-17 (4): ReCode: Updating Code API Knowledge with Reinforcement Learning
Title: ReCode: Updating Code API Knowledge with Reinforcement Learning | ReCode: Aktualisierung von Code-API-Kenntnissen mit Verstärkungslernen | ReCode:更新法规API知识与强化学习 2506.20495v2 |
Authors (5): Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their training data, even with access to current documentation, impedes reliable code generation in dynamic environments. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics human programmer adaptation to API changes. Specifically, we construct a dataset of approximately 2,000 data entries to train the LLMs to perform version migration based on updated information. Then, we introduce a modified string similarity metric for code evaluation as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs’ code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared to supervised fine-tuning, ReCode has less impact on LLMs’ general code generation abilities. We apply ReCode on various LLMs and reinforcement learning algorithms (GRPO and DAPO), all achieving consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
大型语言模型(LLM)具有非凡的代码生成能力,但在适应外部库API的频繁更新时却步履维艰。这一关键限制源于对训练数据中过时API知识的依赖,即使能够查阅最新文档,也会在动态环境中阻碍可靠的代码生成。为了解决这一问题,我们提出ReCode(基于规则的强化学习以更新代码),这是一个模仿人类程序员适应API变化的新框架。具体地说,我们构建了一个约2,000条数据的数据集,用于训练LLM基于更新信息进行版本迁移。然后,我们引入一个修改后的字符串相似度指标用于代码评估,作为强化学习的奖励。我们的实验表明,ReCode大大提升了LLM在动态API场景中的代码生成性能,特别是在未见过的CodeUpdateArena任务上。关键的是,与监督微调相比,ReCode对LLM的一般代码生成能力影响较小。我们将ReCode应用于多种LLM和强化学习算法(GRPO和DAPO),均取得了一致的改进。值得注意的是,训练后的Qwen2.5-Coder-7B超过了32B参数的代码指令微调模型以及相同架构的推理模型。代码见https://github.com/zjunlp/ReCode。
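The idea of using string similarity as a reinforcement learning reward can be illustrated with a minimal sketch. The abstract does not specify the modified metric, so this example substitutes the plain `difflib.SequenceMatcher` ratio from the Python standard library; the whitespace normalisation and the `lib.load` call strings are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity_reward(generated: str, reference: str) -> float:
    """Return a reward in [0, 1]; 1.0 means the normalised strings are identical."""
    # Collapse whitespace so formatting differences are not penalised.
    # This normalisation is an assumption of the sketch, not part of ReCode.
    gen = " ".join(generated.split())
    ref = " ".join(reference.split())
    return SequenceMatcher(None, gen, ref).ratio()


# Hypothetical migration target and two candidate outputs.
reference = "lib.load(path, mode='r', strict=True)"
outdated = "lib.load(path, 'r')"

print(similarity_reward(reference, reference))  # exact match gets the top reward
print(similarity_reward(outdated, reference))   # partial overlap gets a lower reward
```

A dense score of this kind gives partial credit to near-correct migrations, which an exact-match reward would not.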
Article 127
Title@2025-07-17 (4): MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Title: MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment | MPO: Ein effizientes Post-Processing-Framework zum Mischen unterschiedlicher Präferenzen | MPO: 混合多种优惠协调的高效处理后框架 2502.18699v2 |
Authors (5): Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang
Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one, with the weight of each policy computed via batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.
从人类反馈中强化学习(RLHF)在调整大型语言模式方面显示了希望。然而,它依赖单一奖励模式往往忽略了人类偏好的多样性。最近的做法通过利用多维反馈来微调相应的奖赏模式,并利用强化学习来培训LMS来应对这一局限性。然而,这一过程成本高且不稳定,特别是考虑到人类偏好的竞争性和多样性性质。在本文件中,我们提议混合优先优化(MPO),这是一个综合单一目标政策的后处理框架,作为多目标RLHF(MORLHF)和MaxMin-RLHF(MLHF)的替代方案。MPO避免从零开始调整。相反,它将现有政策合并成一个统一的政策,与通过分批相近的镜像下降计算出来的每项政策的权重。经验表明,MPO在各种偏好中取得了平衡的业绩,业绩优于或匹配现有模型,计算成本显著降低。
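A log-linear combination of policies can be sketched on a toy discrete distribution: the mixed policy is proportional to the product of each policy's probabilities raised to its weight. In this sketch the weights are fixed by hand; the abstract says MPO computes them with batch stochastic mirror descent, which is not reproduced here. The function name and the three-outcome policies are illustrative assumptions.

```python
import math


def log_linear_mix(policies, weights):
    """Combine distributions over the same outcomes: p(y) ∝ prod_k p_k(y) ** w_k.

    Assumes every probability is strictly positive (log is undefined at 0).
    """
    n = len(policies[0])
    # Weighted sum of log-probabilities for each outcome.
    log_scores = [
        sum(w * math.log(p[i]) for p, w in zip(policies, weights))
        for i in range(n)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two toy single-objective policies over three candidate responses.
helpful = [0.7, 0.2, 0.1]
harmless = [0.1, 0.2, 0.7]
print(log_linear_mix([helpful, harmless], [0.5, 0.5]))
```

With equal weights the two extreme responses end up with equal probability, which illustrates how the mixture balances competing preferences instead of favouring one policy.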
Article 128
Title@2025-07-17 (4): Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises
Title: Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises | Multi-Channel Graph Neural Network for Financial Risk Prediction of NEEQ Enterprises | NEEQ企业金融风险预测多通道图图神经网络 2507.12787v1 |
Authors (1): Jianyu Zhu
With the continuous evolution of China’s multi-level capital market, the National Equities Exchange and Quotations (NEEQ), also known as the “New Third Board,” has become a critical financing platform for small and medium-sized enterprises (SMEs). However, due to their limited scale and financial resilience, many NEEQ-listed companies face elevated risks of financial distress. To address this issue, we propose a multi-channel deep learning framework that integrates structured financial indicators, textual disclosures, and enterprise relationship data for comprehensive financial risk prediction. Specifically, we design a Triple-Channel Graph Isomorphism Network (GIN) that processes numeric, textual, and graph-based inputs separately. These modality-specific representations are fused using an attention-based mechanism followed by a gating unit to enhance robustness and prediction accuracy. Experimental results on data from 7,731 real-world NEEQ companies demonstrate that our model significantly outperforms traditional machine learning methods and single-modality baselines in terms of AUC, Precision, Recall, and F1 Score. This work provides theoretical and practical insights into risk modeling for SMEs and offers a data-driven tool to support financial regulators and investors.
随着中国多层次资本市场的不断发展,国家股票交易所和配额(NEEQ)(NEEQ)(又称“新的第三理事会”)已成为中小企业的重要融资平台,但由于规模有限、金融复原力有限,许多NEEQ上市公司面临更大的金融困境风险。为了解决这一问题,我们提出了一个多渠道深层次学习框架,将结构化金融指标、文字披露和企业关系数据整合到全面金融风险预测中。具体地说,我们设计了一个三合形图单体化网络(GIN),分别处理数字、文字和图表投入。这些特定模式的表述采用基于关注的机制,然后由一个单位组成,以加强稳健性和预测准确性。7 731家实际NEEEQ公司的数据实验结果表明,我们的模型大大超越了AUC、Precision、Recall和F1评分的传统的机器学习方法和单一模式基线。这项工作为中小企业和风险模型提供了理论和实践上的投资者提供了一种理论和实践上的洞察力。
Article 129
Title@2025-07-17 (4): Compact Vision Transformer by Reduction of Kernel Complexity
Title: Compact Vision Transformer by Reduction of Kernel Complexity | Kompakter Vision Transformer durch Reduktion der Kernelkomplexität | 减少内核复杂度,实现全球契约愿景转型 2507.12780v1 |
Authors (2): Yancheng Wang, Yingzhen Yang
Self-attention and transformer architectures have become foundational components in modern deep learning. Recent efforts have integrated transformer blocks into compact neural architectures for computer vision, giving rise to various efficient vision transformers. In this work, we introduce Transformer with Kernel Complexity Reduction, or KCR-Transformer, a compact transformer block equipped with differentiable channel selection, guided by a novel and sharp theoretical generalization bound. KCR-Transformer performs input/output channel selection in the MLP layers of transformer blocks to reduce the computational cost. Furthermore, we provide a rigorous theoretical analysis establishing a tight generalization bound for networks equipped with KCR-Transformer blocks. Leveraging such strong theoretical results, the channel pruning by KCR-Transformer is conducted in a generalization-aware manner, ensuring that the resulting network retains a provably small generalization error. Our KCR-Transformer is compatible with many popular and compact transformer networks, such as ViT and Swin, and it reduces the FLOPs of the vision transformers while maintaining or even improving the prediction accuracy. In the experiments, we replace all the transformer blocks in the vision transformers with KCR-Transformer blocks, leading to KCR-Transformer networks with different backbones. The resulting KCR-Transformers achieve superior performance on various computer vision tasks, outperforming the original models with fewer FLOPs and parameters.
自我关注和变压器结构已成为现代深层学习的基础组成部分。 最近的努力将变压器块整合为计算机视觉的紧凑神经结构, 产生了各种高效的视觉变异器。 在这项工作中, 我们引入了使用内核复杂度减少的变压器, 或KCR- Transformex, 一个配置有不同频道选择的变压器块, 由新颖和尖锐的理论概括性装订。 KCR- Transformation 在变压器块的MLP层中, 执行输入/ 输出频道选择, 以减少计算成本。 此外, 我们提供严格的理论分析, 为配备了 KCR- Transmext 的网络建立紧紧的超紧的超紧的超紧的神经神经结构, 利用这种强烈的理论结果, KCR- Translentrex 的频道运行方式以一般化方式进行, 确保由此形成的网络保留一个可辨别而微小的概括性能错误。 我们的KCR- Translexext与许多广型变压器网络相兼容, 甚至改进了原变压系统的变压系统。
Article 130
Title@2025-07-17 (4): Demystifying MuZero Planning: Interpreting the Learned Model
Title: Demystifying MuZero Planning: Interpreting the Learned Model | MuZero-Planung entmystifizieren: Das gelernte Modell interpretieren | 消除神秘的 “ 零零规划 “ :解释 “ 总结经验 “ 模式 2411.04580v2 |
Authors (4): Hung Guei, Yan-Ru Ju, Wei-Yu Chen, Ti-Rong Wu
MuZero has achieved superhuman performance in various games by using a dynamics network to predict the environment dynamics for planning, without relying on simulators. However, the latent states learned by the dynamics network make its planning process opaque. This paper aims to demystify MuZero’s model by interpreting the learned latent states. We incorporate observation reconstruction and state consistency into MuZero training and conduct an in-depth analysis to evaluate latent states across two board games: 9x9 Go and Gomoku, and three Atari games: Breakout, Ms. Pacman, and Pong. Our findings reveal that while the dynamics network becomes less accurate over longer simulations, MuZero still performs effectively by using planning to correct errors. Our experiments also show that the dynamics network learns better latent states in board games than in Atari games. These insights contribute to a better understanding of MuZero and offer directions for future research to improve the performance, robustness, and interpretability of the MuZero algorithm. The code and data are available at https://rlg.iis.sinica.edu.tw/papers/demystifying-muzero-planning.
MuZero通过使用动态网络预测规划所需的环境动态而不依赖模拟器,在各种游戏中取得了超人的表现。然而,动态网络所学的潜在状态使得其规划过程变得不透明。本文旨在通过解释所学的潜在状态来揭开MuZero模型的神秘面纱。我们把观测重建和状态一致性纳入MuZero训练中,并进行深入分析,以评估两个棋类游戏(9x9围棋和五子棋)以及三个Atari游戏(Breakout、Ms. Pacman和Pong)中的潜在状态。我们的研究结果显示,虽然动态网络在较长的模拟中变得不那么精确,但MuZero仍然通过利用规划纠正错误来有效运行。我们的实验还表明,动态网络在棋类游戏中学到的潜在状态比在Atari游戏中更好。这些洞察有助于更好地理解MuZero,并为未来的研究提供方向,以改进MuZero算法的性能、稳健性和可解释性。代码和数据见https://rlg.iis.sinica.edu.tw/papers/demystifying-muzero-planning。
Article 131
Title@2025-07-17 (4): A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models
Title: A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models | Eine umfassende Umfrage zur elektronischen Gesundheitsdatenmodellierung: Von Deep Learning Ansätzen bis hin zu großen Sprachmodellen | 《电子健康记录模型综合调查:从深学习方法到大语言模式》 2507.12774v1 |
Authors (5): Weijieying Ren, Jingxi Zhu, Zehao Liu, Tianxiang Zhao, Vasant Honavar
Artificial intelligence (AI) has demonstrated significant potential in transforming healthcare through the analysis and modeling of electronic health records (EHRs). However, the inherent heterogeneity, temporal irregularity, and domain-specific nature of EHR data present unique challenges that differ fundamentally from those in vision and natural language tasks. This survey offers a comprehensive overview of recent advancements at the intersection of deep learning, large language models (LLMs), and EHR modeling. We introduce a unified taxonomy that spans five key design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension, we review representative methods addressing data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. We further highlight emerging trends such as foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning. Finally, we discuss open challenges in benchmarking, explainability, clinical alignment, and generalization across diverse clinical settings. This survey aims to provide a structured roadmap for advancing AI-driven EHR modeling and clinical decision support. For a comprehensive list of EHR-related methods, kindly refer to https://survey-on-tabular-data.github.io/.
人工智能(AI)通过电子健康记录(EHRs)的分析和建模,展示了在改变保健方面的巨大潜力;然而,EHR数据固有的异质性、时间性不规则性和具体领域性质,提出了与愿景和自然语言任务截然不同的独特挑战;这项调查全面概述了在深层次学习、大语言模型(LLMS)和EHR建模等交叉方面的最新进展;我们引入了涵盖五个关键设计层面的统一分类:以数据为中心的方法、神经结构设计、以学习为重点的战略、多式联运学习和以LLM为基础的建模系统。我们在每个层面审查涉及数据质量提高、结构和时间代表性、自我监督学习和与临床知识整合的代表性方法。我们进一步强调了基础模型、LLMM驱动的临床代理和下游推理的EHR对文本翻译等新出现的趋势。最后,我们讨论了不同临床环境在基准、解释性、临床调整和一般化方面的公开挑战。这次调查旨在提供一个结构化的路线图,用于推进AIHR驱动的EHR建模/临床决定支持。
Article 132
Title@2025-07-17 (4): Sample-Constrained Black Box Optimization for Audio Personalization
Title: Sample-Constrained Black Box Optimization for Audio Personalization | Sample-Constrained Black Box Optimierung für Audio-Personalisierung | 优化音频个性化 2507.12773v1 |
Authors (3): Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury
We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$ which, applied to any music or speech, will maximize the user’s satisfaction. This is a black-box optimization problem since the user’s satisfaction function is unknown. Substantive work has been done on this topic where the key idea is to play audio samples to the user, each shaped by a different filter $h_i$, and query the user for their satisfaction scores $f(h_i)$. A family of “surrogate” functions is then designed to fit these scores, and the optimization method gradually refines these functions to arrive at the filter $\hat{h}^*$ that maximizes satisfaction. In certain applications, we observe that a second type of querying is possible where users can tell us the individual elements $h^*[j]$ of the optimal filter $h^*$. Consider an analogy from cooking where the goal is to cook a recipe that maximizes user satisfaction. A user can be asked to score various cooked recipes (e.g., tofu fried rice) or to score individual ingredients (say, salt, sugar, rice, chicken, etc.). Given a budget of $B$ queries, where a query can be of either type, our goal is to find the recipe that will maximize this user’s satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform any one type of querying. Our results are validated through simulations and real-world experiments, where volunteers gave feedback on music/speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization, and that the solutions can benefit other applications beyond audio personalization.
我们考虑的是将音频个人化以最大限度地增加用户经验的问题。 简而言之, 我们的目标是找到一个用于任何音乐或演讲的过滤 $ $ 美元 的过滤器, 以最大限度地提高用户的满意度。 这是一个黑箱优化问题, 因为用户的满意度功能未知。 已经就这个主题做了大量的工作, 关键的想法是向用户播放音频样本, 每个都由不同的过滤器 $ 美元组成, 并询问用户满意度 $ (h_ i) 。 然后, 我们的目标是为用户提供一种能让用户满意度最大化的食谱。 然后, 一种“ urrogate” 功能, 来适应这些评分, 优化方法逐渐完善这些功能, 以到达过滤器 $ $ (hhat{h) 的满意度最大化。 在某些应用程序中, 我们观察到了第二种查询方式, 糖 $ (j) , 最好的过滤器, 和 新的解算方法可以让用户满意度 。
Article 133
Title@2025-07-17 (4): AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Title: AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation | AnyPos: Automatisierte Task-Agnostische Aktionen zur bimanuellen Manipulation | 任何 波 : 用于二手操纵的自动任务- 不可允许动作 2507.12768v1 |
Authors (8): Hengkai Tan, Yao Feng, Xinyi Mao, Shuhe Huang, Guodong Liu, Zhongkai Hao, Hang Su, Jun Zhu
Vision-language-action (VLA) models have shown promise on task-conditioned control in complex settings such as bimanual manipulation. However, the heavy reliance on task-specific human demonstrations limits their generalization and incurs high data acquisition costs. In this work, we present a new task-agnostic action paradigm that decouples action execution from task-specific conditioning, enhancing scalability, efficiency, and cost-effectiveness. To address the data collection challenges posed by this paradigm – such as low coverage density, behavioral redundancy, and safety risks – we introduce ATARA (Automated Task-Agnostic Random Actions), a scalable self-supervised framework that accelerates collection by over $30\times$ compared to human teleoperation. To further enable effective learning from task-agnostic data, which often suffers from distribution mismatch and irrelevant trajectories, we propose AnyPos, an inverse dynamics model equipped with Arm-Decoupled Estimation and a Direction-Aware Decoder (DAD). We additionally integrate a video-conditioned action validation module to verify the feasibility of learned policies across diverse manipulation tasks. Extensive experiments show that the AnyPos-ATARA pipeline yields a 51% improvement in test accuracy and achieves 30-40% higher success rates in downstream tasks such as lifting, pick-and-place, and clicking, using replay-based video validation. Project Page: https://embodiedfoundation.github.io/vidar_anypos
视觉-语言行动(VLA)模型显示,在诸如双体操纵等复杂环境下,对任务限制的控制很有希望。然而,对任务特定人类演示的高度依赖限制了它们的一般化,并导致数据获取成本高。在这项工作中,我们提出了一个任务-不可知行动模式的新概念,它使行动的执行与任务特定调节脱钩,提高了可缩放性、效率和成本效益。为了应对这种模式带来的数据收集挑战,例如低覆盖率、行为冗余和安全风险 – – 我们引入了ATARA(自动任务-亚异性随机行动),这是一个可缩放的自我监督框架,比人类远程操作加速收集超过30美元的时间。为了进一步能够有效地学习常常因分配不匹配和不相关的轨迹而受到影响的任务-任务;为了应对这种模式带来的反动态模式,例如低覆盖率、行为重复性刺激和安全风险 – – 我们进一步整合了一个视频-40型行动验证模块,用以加速收集超过30美元的数据采集率收集。我们建议Any-rassimialalalal disal disal laimal laim laimal lagistrual ex laction extraction 30 astrubal ex ex
Article 134
Title@2025-07-17 (4): Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations
Title: Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations | Ebenentrennung Deep Learning Modell mit Hilfsvariablen für partielle Differentialgleichungen | 图层分离深学习模型,带有局部差异等量的辅助变量 2507.12766v1 |
Authors (2): Yaru Liu, Yiqi Gu
In this paper, we propose a new optimization framework, the layer separation (LySep) model, to improve the deep learning-based methods in solving partial differential equations. Due to the highly non-convex nature of the loss function in deep learning, existing optimization algorithms often converge to suboptimal local minima or suffer from gradient explosion or vanishing, resulting in poor performance. To address these issues, we introduce auxiliary variables to separate the layers of deep neural networks. Specifically, the output and its derivatives of each layer are represented by auxiliary variables, effectively decomposing the deep architecture into a series of shallow architectures. New loss functions with auxiliary variables are established, in which only variables from two neighboring layers are coupled. Corresponding algorithms based on alternating directions are developed, where many variables can be updated optimally in closed forms. Moreover, we provide theoretical analyses demonstrating the consistency between the LySep model and the original deep model. High-dimensional numerical results validate our theory and demonstrate the advantages of LySep in minimizing loss and reducing solution error.
在本文中,我们提出一个新的优化框架,即分层模型(LySep),以改善深层学习解决部分差异方程式的方法。由于深层学习中损失函数的高度非曲线性质,现有的优化算法往往会聚集到不理想的本地迷你,或者受到梯度爆炸或消失的影响,导致性能不佳。为了解决这些问题,我们引入了辅助变量,将深层神经网络的层层分开。具体地说,每个层的输出及其衍生物由辅助变量代表,有效地将深层结构分解成一系列浅层结构。建立了辅助变量的新损失函数,其中只有两个相邻层的变量相互结合。制定了基于交替方向的对应算法,其中许多变量可以最佳地以封闭的形式更新。此外,我们提供了理论分析,表明LySep模型与原始深层模型的一致性。高维数值结果证实了我们的理论,并展示了LySep在尽量减少损失和减少解决方案错误方面的优势。
Article 135
Title@2025-07-17 (4): Golden Noise for Diffusion Models: A Learning Framework
Title: Golden Noise for Diffusion Models: A Learning Framework | Goldene Geräusche für Diffusionsmodelle: Ein Lernrahmen | 传播模型的黄金噪音:学习框架 2411.09502v5 |
Authors (7): Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, Zeke Xie
Text-to-image diffusion model is a popular paradigm that synthesizes personalized images by providing a text prompt and a random Gaussian noise. While people observe that some noises are “golden noises” that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises. To learn golden noises for diffusion sampling, we mainly make three contributions in this paper. First, we identify a new concept termed the noise prompt, which aims at turning a random Gaussian noise into a golden noise by adding a small desirable perturbation derived from the text prompt. Following the concept, we first formulate the noise prompt learning framework that systematically learns “prompted” golden noise associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale noise prompt dataset (NPD) that contains 100k pairs of random noises and golden noises with the associated text prompts. With the prepared NPD as the training dataset, we trained a small noise prompt network (NPNet) that can directly learn to transform a random noise into a golden noise. The learned golden noise perturbation can be considered as a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the impressive effectiveness and generalization of NPNet on improving the quality of synthesized images across various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient controller that acts as a plug-and-play module with very limited additional inference and computational costs, as it just provides a golden noise instead of a random noise without accessing the original pipeline.
文本到图像扩散模型是一种流行模式, 通过提供文本提示和随机高斯噪音, 将个人化图像合成为一种流行模式。 虽然人们观察到一些噪音是“ 黄金噪音” , 能够实现更好的文本图像匹配和人类偏好, 但我们仍然缺乏一个机器学习框架来获取这些黄金噪音。 要学习黄金噪音用于扩散取样, 我们主要在本文中做出三项贡献。 首先, 我们确定一个名为\ textit{ noise passy} 的新概念, 目的是将随机高斯噪音变成一个金噪音, 通过添加源自文本提示的微小触动。 遵循这个概念, 我们首先制定“ 黄金噪音” 声音“ 黄金噪音” , 能够系统地学习与传播模型的文本提示相关的黄金噪音。 其次, 我们设计一个叫声快速的数据收集管道, 并且仅仅收集一个大型的\ textnalentral { entral mess (NPDD) , 能够将随机的噪音和黄金噪音和音音音化变成一个快速的网络 , , 并直接学习NPDD , , , 将它作为训练成一个快速的网络, 学习一个快速的版本。
Article 136
Title@2025-07-17 (4): TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph
Title: TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph | TBDetector:Transformer-basierter Detektor für erweiterte persistente Bedrohungen mit Provenienzgraph | TB 检测器:用证明图测出先进持久性威胁的转移前检测器 2304.02838v2 |
Authors (10): Nan Wang, Xuezhi Wen, Dalin Zhang, Xibin Zhao, Jiahui Ma, Mengxia Luo, Fan Xu, Sen Nie, Shi Wu, Jiqiang Liu
Advanced Persistent Threat (APT) attacks are difficult to detect due to their long-term latency and their covert, slow, multistage attack patterns. To tackle these issues, we propose TBDetector, a transformer-based method for APT attack detection. Considering that provenance graphs provide rich historical information and can correlate historical attack activity to identify anomalies, TBDetector employs provenance analysis for APT detection, which summarizes long-running system execution with space efficiency and utilizes a transformer with a self-attention based encoder-decoder to extract long-term contextual features of system states to detect slow-acting attacks. Furthermore, we introduce anomaly scores to investigate the anomaly of different system states, where each state is assigned an anomaly score computed from its similarity score and isolation score. To evaluate the effectiveness of the proposed method, we have conducted experiments on five public datasets, i.e., streamspot, cadets, shellshock, clearscope, and wget_baseline. Experimental results and comparisons with state-of-the-art methods show that our proposed method performs better.
由于高级持久性威胁(APT)的长期潜伏、隐蔽和缓慢的多阶段攻击模式,难以探测APT的探测。为了解决这些问题,我们提议采用TB探测器,这是用于探测APT攻击的基于变压器的先进持久威胁探测方法。考虑到出处图提供了丰富的历史信息,并且具有强大的攻击历史相关能力,可以识别异常活动,肺结核检测员利用出处分析对APT探测进行长期运行的系统实施空间效率分析,并利用基于自我注意的变压器来提取系统国家的长期背景特征以探测慢动攻击。此外,我们进一步引入异常分数,以调查不同系统国家的反常情况,每个州的计算结果与其相似性分数和隔离分数相匹配。为了评价拟议方法的有效性,我们进行了五个公共数据集的实验,即,溪壶、罐子、贝、贝瑟、贝雷、清晰镜和Wget_ baseline。实验结果和与最新方法的比较显示我们拟议方法的更好表现。
Article 137
Title@2025-07-17 (4): World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving
Title: World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Weltmodellbasierte End-to-End-Szenengenerierung für Unfallvorhersage im autonomen Fahren | 以世界模式为基础的在自主驾驶中事故预防端至终点到终点示范景点一代 2507.12762v1 |
Authors (6): Yanchen Guan, Haicheng Liao, Chengyue Wang, Xingcheng Liu, Jiaxun Zhang, Zhenning Li
Reliable anticipation of traffic accidents is essential for advancing autonomous driving systems. However, this objective is limited by two fundamental challenges: the scarcity of diverse, high-quality training data and the frequent absence of crucial object-level cues due to environmental disruptions or sensor deficiencies. To tackle these issues, we propose a comprehensive framework combining generative scene augmentation with adaptive temporal reasoning. Specifically, we develop a video generation pipeline that utilizes a world model guided by domain-informed prompts to create high-resolution, statistically consistent driving scenarios, particularly enriching the coverage of edge cases and complex interactions. In parallel, we construct a dynamic prediction model that encodes spatio-temporal relationships through strengthened graph convolutions and dilated temporal operators, effectively addressing data incompleteness and transient visual noise. Furthermore, we release a new benchmark dataset designed to better capture diverse real-world driving risks. Extensive experiments on public and newly released datasets confirm that our framework enhances both the accuracy and lead time of accident anticipation, offering a robust solution to current data and modeling limitations in safety-critical autonomous driving applications.
对交通事故的可靠预测对于推进自主驾驶系统至关重要。然而,这一目标受到两个基本挑战的限制:缺乏多样的高质量培训数据,以及由于环境中断或传感器缺陷而经常缺乏关键的物体级提示。为了解决这些问题,我们提议了一个综合框架,将基因化场景增强与适应性时间推理相结合。具体地说,我们开发了一个视频生成管道,利用一个以领域知情的提示为指导的世界模型来制作高分辨率、统计上一致的驾驶假设,特别是丰富边缘案例和复杂互动的覆盖面。同时,我们建立了一个动态预测模型,通过强化图形变异和变异时间操作器来编码时空关系,有效处理数据不完备和瞬时的视觉噪音。此外,我们发布了一套新的基准数据集,旨在更好地捕捉不同现实世界驱动风险。对公众和新发布的数据集进行的广泛实验证实,我们的框架提高了事故预测的准确性和周转时间,为当前数据提供了强有力的解决方案,并模拟了安全批评性自主驾驶应用程序中存在的限制。
Article 138
Title@2025-07-17 (4): Faster and Space Efficient Indexing for Locality Sensitive Hashing
Title: Faster and Space Efficient Indexing for Locality Sensitive Hashing | Schnellere und raumsparende Indexierung für Lokalitätssensitive Hashing | 地方敏感散列更快和空间高效索引编制 2503.06737v2 |
Authors (2): Bhisham Dev Verma, Rameshwar Pratap
This work suggests faster and space-efficient index construction algorithms for LSH for Euclidean distance (a.k.a. \ELSH) and cosine similarity (a.k.a. \SRP). The index construction step of these LSHs relies on grouping data points into several bins of hash tables based on their hashcode. To generate an $m$-dimensional hashcode of a $d$-dimensional data point, these LSHs first project the data point onto a $d$-dimensional random Gaussian vector and then discretise the resulting inner product. The time and space complexity of both \ELSH and \SRP for computing an $m$-sized hashcode of a $d$-dimensional vector is $O(md)$, which becomes impractical for large values of $m$ and $d$. To overcome this problem, we propose two alternative LSH hashcode generation algorithms each for Euclidean distance and cosine similarity, namely \CSELSH and \HCSELSH, and \CSSRP and \HCSSRP, respectively. \CSELSH and \CSSRP are based on count sketch \cite{count_sketch}, while \HCSELSH and \HCSSRP utilize higher-order count sketch \cite{shi2019higher}. These proposals significantly reduce the hashcode computation time from $O(md)$ to $O(d)$. Additionally, \CSELSH and \CSSRP reduce the space complexity from $O(md)$ to $O(d)$, and \HCSELSH and \HCSSRP reduce it from $O(md)$ to $O(N \sqrt[N]{d})$, where $N \geq 1$ denotes the size of the input/reshaped tensor. Our proposals are backed by strong mathematical guarantees, and we validate their performance through simulations on various real-world datasets.
这项工作为 ELClidean 距离 (\ textit{ a.k.a. @ELSH) 和 cosine 相似性 (\ textit{ a.k.a. @ SRP.) 提供了更快和空间效率指数的构建算法。 这些 LSH 的构建步骤依赖于将数据点分组到基于其散数的大麻表格的数箱中。 要生成美元维度数据代码, 这些LSH 首次将数据点投放到 $d$n- 维度数据(\ textit{ a.k.a. ELS) 上, 将数据点投放到 $DLS.
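To make the $O(md)$ versus $O(d)$ contrast concrete, here is a minimal sketch of a count sketch followed by sign discretisation, in the spirit of the count-sketch variant for cosine similarity described above. It is not the authors' algorithm: the abstract does not give the exact construction, so the bucket and sign tables, the function names, and the choice to threshold at zero are all assumptions of this example.

```python
import random


def make_count_sketch(d, m, seed=0):
    """Draw one bucket index and one random sign per input coordinate: O(d) storage."""
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(d)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    return buckets, signs


def count_sketch(x, buckets, signs, m):
    """Compress a d-dimensional vector to m dimensions in a single O(d) pass."""
    out = [0.0] * m
    for xi, b, s in zip(x, buckets, signs):
        out[b] += s * xi
    return out


def sign_hashcode(x, buckets, signs, m):
    """Discretise the sketch to an m-bit code by taking the sign of each entry."""
    return [1 if v >= 0 else 0 for v in count_sketch(x, buckets, signs, m)]


d, m = 16, 4
buckets, signs = make_count_sketch(d, m, seed=42)
x = [float(i - 5) for i in range(d)]
print(sign_hashcode(x, buckets, signs, m))
```

A dense Gaussian projection would need an $m \times d$ matrix and $md$ multiplications per point, whereas this pass touches each coordinate once and stores only two length-$d$ tables. Because only signs are kept, scaling `x` by a positive constant leaves the code unchanged, as expected for a cosine-similarity hash.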
Article 139
Title@2025-07-17 (4): A Comprehensive Survey of Synthetic Tabular Data Generation
Title: A Comprehensive Survey of Synthetic Tabular Data Generation | Eine umfassende Übersicht über die Erstellung von synthetischen Tabellendaten | 合成图表数据生成综合调查 2504.16506v3 |
Authors (6): Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang
Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy concerns, and class imbalance. Synthetic tabular data generation has emerged as a powerful solution, leveraging generative models to learn underlying data distributions and produce realistic, privacy-preserving samples. Although this area has seen growing attention, most existing surveys focus narrowly on specific methods (e.g., GANs or privacy-enhancing techniques), lacking a unified and comprehensive view that integrates recent advances such as diffusion models and large language models (LLMs). In this survey, we present a structured and in-depth review of synthetic tabular data generation methods. Specifically, the survey is organized into three core components: (1) Background, which covers the overall generation pipeline, including problem definitions, synthetic tabular data generation methods, post processing, and evaluation; (2) Generation Methods, where we categorize existing approaches into traditional generation methods, diffusion model methods, and LLM-based methods, and compare them in terms of architecture, generation quality, and applicability; and (3) Applications and Challenges, which summarizes practical use cases, highlights common datasets, and discusses open challenges such as heterogeneity, data fidelity, and privacy protection. This survey aims to provide researchers and practitioners with a holistic understanding of the field and to highlight key directions for future work in synthetic tabular data generation.
图表数据是保健、金融、教育等现实世界应用中最普遍和最重要的数据格式之一。然而,在机器学习中,数据的有效使用往往受到数据稀缺、隐私问题和阶级不平衡的制约。合成表格数据生成已经成为一个强有力的解决方案,利用基因化模型学习基本数据分布,并产生现实的、隐私保存样本。虽然这一领域受到越来越多的关注,但大多数现有调查都狭隘地侧重于具体方法(如GANs或增强隐私技术),缺乏统一和全面的观点,无法整合传播模型和大型语言模型(LLLMs)等最新进展。在本调查中,我们对合成表格数据生成方法进行了结构性和深入的审查。具体地说,调查分为三个核心部分:(1) 背景,涵盖总体生成管道,包括问题定义、合成表格生成方法、后处理和评价;(2) 生成方法,我们将现有方法归为传统生成方法、推广模型方法或基于LLMM方法,在结构、生成质量和适用性方面对其进行比较;(3) 应用与挑战,将数据生成方法的实用性与数据采集和数据操作方法的实地分析,从而对数据进行公开性分析,并分析。
Article 140
Title@2025-07-17 (4): Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation
Title: Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation | Domain-Enhanced Dual-Branch-Modell für effiziente und interpretierbare Unfallvorhersage | 高效和可解释的意外事故预测的强化双重-双重-双重强化模式 2507.12755v1 |
Authors (7): Yanchen Guan, Haicheng Liao, Chengyue Wang, Bonan Wang, Jiaxun Zhang, Jia Hu, Zhenning Li
Developing a precise and computationally efficient traffic accident anticipation system is crucial for contemporary autonomous driving technologies, enabling timely intervention and loss prevention. In this paper, we propose an accident anticipation framework employing a dual-branch architecture that effectively integrates visual information from dashcam videos with structured textual data derived from accident reports. Furthermore, we introduce a feature aggregation method that facilitates seamless integration of multimodal inputs through large models (GPT-4o, Long-CLIP), complemented by targeted prompt engineering strategies to produce actionable feedback and standardized accident archives. Comprehensive evaluations conducted on benchmark datasets (DAD, CCD, and A3D) validate the superior predictive accuracy, enhanced responsiveness, reduced computational overhead, and improved interpretability of our approach, thus establishing a new benchmark for state-of-the-art performance in traffic accident anticipation.
制定准确和计算高效的交通事故预测系统对当代自主驾驶技术至关重要,有利于及时干预和避免损失。本文建议采用一个事故预测框架,采用双部门架构,将破碎摄像头视频的视觉信息与事故报告得出的结构文字数据有效结合起来。此外,我们引入了一种特征汇总方法,通过大型模型(GPT-4o、Long-CLIP),促进多式联运投入的无缝整合,辅之以有针对性的快速工程战略,以产生可采取行动的反馈和标准化事故档案。对基准数据集(DAD、CCD和A3D)进行的全面评价,验证了高预报准确性、更灵敏度、计算间接费用减少以及我们方法的可解释性,从而为交通事故预测的最先进的性能设定了新的基准。
Article 141
Title@2025-07-17 (4): Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning
Title: Multimodal-Guided Dynamic Dataset Pruning for Robust and Efficient Data-Centric Learning | Multimodal geführtes dynamisches Datenset Pruning für robustes und effizientes datenzentrales Lernen | 灵活、高效、高效的数据中心学习的多式指导动态数据集 2507.12750v1 |
Authors (8): Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou
Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance. However, most existing methods rely on static heuristics or task-specific metrics, limiting their robustness and generalizability across domains. In this work, we introduce a dynamic dataset pruning framework that adaptively selects training samples based on both task-driven difficulty and cross-modality semantic consistency. By incorporating supervision from pretrained multimodal foundation models, our approach captures training dynamics while effectively filtering out uninformative samples. Our work highlights the potential of integrating cross-modality alignment for robust sample selection, advancing data-centric learning toward more efficient and robust practices across application domains.
对现代深层模型进行关于大型真实世界数据集的培训,这些数据集的数据质量不同,冗余现象也常见。诸如数据集修剪等以数据为中心的方法在提高培训效率和模型性能方面显示了希望。然而,大多数现有方法都依赖于静态的超自然学或特定任务衡量标准,限制了其坚固性和跨领域的一般性。在这项工作中,我们引入了一个动态数据集修剪框架,根据任务驱动的困难和跨模式的语义一致性,适应性地选择培训样本。通过纳入来自预先培训的多式联运基础模型的监督,我们的方法捕捉了培训动态,同时有效地过滤了无信息规范的样本。我们的工作突出强调了将跨模式统一起来以稳健的样本选择,推进以数据为中心的学习,以便在整个应用领域采用更高效、更稳健的做法的潜力。
Article 142
Title@2025-07-17 (4): Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion
Title: Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion | Lernen von universellen Mobilitätsmustern mit einem Basismodell für die domänenübergreifende Datenfusion | 具有跨领域数据融合基础模型的学习通用人类流动模式 2503.15779v2 |
Authors (7): Haoxuan Ma, Xishun Liao, Yifan Liu, Qinhua Jiang, Chris Stanford, Shangqing Cao, Jiaqi Ma
Human mobility modeling is critical for urban planning and transportation management, yet existing approaches often lack the integration capabilities needed to handle diverse data sources. We present a foundation model framework for universal human mobility patterns that leverages cross-domain data fusion and large language models to address these limitations. Our approach integrates multi-modal data of distinct nature and spatio-temporal resolution, including geographical, mobility, socio-demographic, and traffic information, to construct a privacy-preserving and semantically enriched human travel trajectory dataset. Our framework demonstrates adaptability through domain transfer techniques that ensure transferability across diverse urban contexts, as evidenced in case studies of Los Angeles (LA) and Egypt. The framework employs LLMs for semantic enrichment of trajectory data, enabling comprehensive understanding of mobility patterns. Quantitative evaluation shows that our generated synthetic dataset accurately reproduces mobility patterns observed in empirical data. The practical utility of this foundation model approach is demonstrated through large-scale traffic simulations for LA County, where results align well with observed traffic data. On California’s I-405 corridor, the simulation yields a Mean Absolute Percentage Error of 5.85% for traffic volume and 4.36% for speed compared to Caltrans PeMS observations, illustrating the framework’s potential for intelligent transportation systems and urban mobility applications.
人类流动模型对于城市规划和交通管理至关重要,但现有方法往往缺乏处理不同数据源所需的整合能力。我们为普遍人类流动模式提供了一个基础模型框架,利用跨域数据聚合和大语言模型来应对这些局限性。我们的方法整合了性质不同和时空分辨率的多模式数据,包括地理、流动性、社会人口和交通信息,以构建一个保护隐私和精密丰富的人类旅行轨迹数据集。我们的框架通过域传输技术表现出适应性,确保在不同城市环境中的可转让性,这在洛杉矶和埃及的案例研究中可以证明。框架使用LLMS来对轨迹数据进行语义化浓缩,从而能够全面理解流动模式。定量评估表明,我们生成的合成数据集准确地复制了在实证数据中观察到的流动模式。这一基础模型方法的实用性体现在洛杉矶州大规模交通模拟中,其结果与观测到的交通数据相匹配。在加利福尼亚州I-405走廊上,与Caltrans PeMS观测值相比,模拟得到的交通流量平均绝对百分比误差为5.85%,速度为4.36%,显示了该框架在智能交通系统和城市出行应用中的潜力。
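The Mean Absolute Percentage Error quoted for the I-405 comparison can be computed as in the following minimal sketch. The function name `mape` and the sample volumes are illustrative assumptions, not data or code from the paper.

```python
def mape(observed, simulated):
    """Mean Absolute Percentage Error, in percent.

    Assumes every observed value is non-zero.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - s) / o) for o, s in zip(observed, simulated))
    return 100.0 * total / len(observed)


# Made-up hourly volumes (vehicles/hour), for illustration only.
observed_volume = [1000.0, 2000.0, 4000.0]
simulated_volume = [950.0, 2100.0, 4000.0]
volume_error = mape(observed_volume, simulated_volume)  # about 3.33 (percent)
```

Because each term is divided by the observed value, the metric is undefined when an observation is zero, so low-traffic intervals may need separate handling.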
Article 143
Title@2025-07-17 (4): How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
Title: How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction | Wie wirkt sich Beschriftungsfehler auf das kontrasive Lernen aus? Eine Perspektive aus der Datendimensionalitätsreduktion | 标签错误影响差异影响学习如何进行? 减少数据多维度的视角 2507.11161v2 |
Authors (4): Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying
In recent years, contrastive learning has achieved state-of-the-art performance in self-supervised representation learning. Many previous works have attempted to explain theoretically why contrastive learning succeeds. Almost all of them rely on a default assumption, the label consistency assumption, which may fail in practice because of the strength and randomness of common augmentation strategies such as random resized crop (RRC); the probability of such a failure is called the labeling error. This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, we apply a data dimensionality reduction method (e.g., singular value decomposition, SVD) to the original data to reduce false positive samples, and we evaluate it both theoretically and empirically. We also find that SVD acts as a double-edged sword: it may degrade downstream classification accuracy because it reduces the connectivity of the augmentation graph. Based on these observations, we suggest combining a moderate embedding dimension (such as $512$ or $1024$ in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and small labeling error, and thereby improve model performance.
近些年来,对比式学习在自我监督的代表学习领域取得了最先进的业绩。许多前的工作都试图为对比性学习的成功提供理论理解,几乎所有工作都依赖于默认假设,即标签一致性假设,由于随机调整作物等共同增强战略的强度和随机性(失败概率称为标签误差),这些假设实际上可能无法维持(失败概率称为误差),因为随机调整作物(RRC)等共同增强战略的强度和随机性强。本文调查了标签误差对下游对比性学习的分类性能造成的理论影响。我们首先揭示了下游分类风险标签错误的一些重大负面影响。为了减轻这些影响,数据维度减少方法(例如单值分解法,SVD)在原始数据中应用,以减少假正样,并确定理论性和经验性评价。此外,还发现SVD作为双刃,可能导致下游分类准确性差因增压图的互连通性下降而恶化。根据上述观察,我们给出了几个重大负面的负面负面负面影响。我们给出了增强性试验建议,在10D中,我们应该使用某种适度的增强性数据层面,作为稳定性模型,我们应当使用某种稳定度,将S2,将一些稳定化数据作为稳定性图作为稳定化图。
Article 144
Title@2025-07-17 (4): Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems
Title: Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems | Vereinheitlichung der erklärbaren Anomalienerkennung und der Ursachenanalyse in dynamischen Systemen | 动态系统中不可解释的异常探测和根本原因分析 2502.12086v3 |
Authors (3): Yue Sun, Rick S. Blum, Parv Venkitasubramaniam
Dynamical systems, prevalent in various scientific and engineering domains, are susceptible to anomalies that can significantly impact their performance and reliability. This paper addresses the critical challenges of anomaly detection, root cause localization, and anomaly type classification in dynamical systems governed by ordinary differential equations (ODEs). We define two categories of anomalies: cyber anomalies, which propagate through interconnected variables, and measurement anomalies, which remain localized to individual variables. To address these challenges, we propose the Interpretable Causality Ordinary Differential Equation (ICODE) Networks, a model-intrinsic explainable learning framework. ICODE leverages Neural ODEs for anomaly detection while employing causality inference through an explanation channel to perform root cause analysis (RCA), elucidating why specific time periods are flagged as anomalous. ICODE is designed to simultaneously perform anomaly detection, RCA, and anomaly type classification within a single, interpretable framework. Our approach is grounded in the hypothesis that anomalies alter the underlying ODEs of the system, manifesting as changes in causal relationships between variables. We provide a theoretical analysis of how perturbations in learned model parameters can be utilized to identify anomalies and their root causes in time series data. Comprehensive experimental evaluations demonstrate the efficacy of ICODE across various dynamical systems, showcasing its ability to accurately detect anomalies, classify their types, and pinpoint their origins.
在不同科学和工程领域普遍存在的动态系统很容易受到异常现象的影响,从而对其性能和可靠性产生显著影响。本文件述及在普通差异方程式(ODEs)管理下动态系统中异常现象检测、根本原因本地化和异常类型分类等关键挑战。我们界定了两类异常现象:通过相互关联的变量传播的网络异常现象,以及测量异常现象,这些异常现象仍然与个别变量相本地化。为了应对这些挑战,我们建议采用可解释的可解释的可解释差异性普通差异计算(ICODE)网络,这是一个可解释的示范性学习框架。ICODE利用神经系统来检测异常现象,同时通过解释渠道进行因果关系推断,说明为什么具体时间段被标为反常现象。ICDE旨在同时进行异常检测、RCA和异常类型分类,在单一、可解释的框架内,我们的方法依据的假设是,即异常现象改变系统的基础,表明变量之间的因果关系的变化。我们从理论角度分析了对异常现象进行检测,同时通过解释渠道推断,通过解释渠道进行因果关系推断,通过解释渠道对因果关系进行因果关系推断,解释分析,说明为什么特定时间段段段期被标为异常期,具体标明了具体时间,从而显示IRODE的精确测测测测测测测定了各种的系统。
Article 145
Title@2025-07-17 (4): BEARCUBS: A benchmark for computer-using web agents
Title: BEARCUBS: A benchmark for computer-using web agents | BEARCUBS: Benchmark für computergestützte Web-Agenten | BEARCUBS:计算机使用网络代理器的基准 2503.07919v2 |
Authors (6): Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer
Modern web agents possess computer use abilities that allow them to interact with webpages by sending commands to a virtual keyboard and mouse. While such agents have considerable potential to assist human users with complex tasks, evaluating their capabilities in real-world settings poses a major challenge. To this end, we introduce BEARCUBS, a “small but mighty” benchmark of 111 information-seeking questions designed to evaluate a web agent’s ability to search, browse, and identify factual information from the web. Unlike prior web agent benchmarks, solving BEARCUBS requires (1) accessing live web content rather than synthetic or simulated pages, which captures the unpredictability of real-world web interactions; and (2) performing a broad range of multimodal interactions (e.g., video understanding, 3D navigation) that cannot be bypassed via text-based workarounds. Each question in BEARCUBS has a corresponding short, unambiguous answer and a human-validated browsing trajectory, allowing for transparent evaluation of agent performance and strategies. A human study confirms that BEARCUBS questions are solvable but non-trivial (84.7% human accuracy), revealing domain knowledge gaps and overlooked details as common failure points. By contrast, state-of-the-art computer-using agents underperform, with the best-scoring system (OpenAI’s Operator) reaching only 23.4% accuracy. These results highlight critical areas for improvement, including reliable source selection and more powerful multimodal capabilities. To facilitate future research, BEARCUBS will be updated periodically to replace invalid or contaminated questions, keeping the benchmark fresh for future generations of web agents.
现代网络代理拥有计算机使用能力,使其能够通过向虚拟键盘和鼠标发送指令与网页进行互动。虽然这些代理具有协助人类用户完成复杂任务的巨大潜力,但评估其在现实世界环境中的能力是一项重大挑战。为此,我们引入了“小型但强大的”BEARCUBS,这是111个信息查询问题的一个“小型但强大的”基准,旨在评价网络代理商搜索、浏览和识别网上事实信息的能力。与先前的网络代理商基准不同,解决BEARCUBS需要(1) 访问现场网络内容,而不是合成或模拟网页,这可以捕捉真实世界网络互动的不可预测性;以及(2) 进行广泛的多式联运互动(例如视频理解、3D导航),这是无法通过基于文本的变通办法绕过。 BERCUBS的每个问题都有相应的短、明确答案和具有人性价值的浏览轨迹,能够透明地评估代理商的绩效和战略。 一项人类研究证实,BEARCBS问题需要保持可调的但非初始性(84. 7 % 人类网络互动 ) , 揭示域域域域域域域域域域域信息差距差距, 以及操作操作者将在未来的精确性数据更新。
Article 146
Title@2025-07-17 (4): Rethinking Inductive Bias in Geographically Neural Network Weighted Regression
Title: Rethinking Inductive Bias in Geographically Neural Network Weighted Regression | Induktive Bias im geographisch neuralen Netzwerk neu denken Gewichtete Regression | 重新思考在地理神经网络中诱导的偏见 2507.09958v2 |
Authors (1): Zhenyuan Chen
Inductive bias is a key factor in spatial regression models, determining how well a model can learn from limited data and capture spatial patterns. This work revisits the inductive biases in Geographically Neural Network Weighted Regression (GNNWR) and identifies limitations in current approaches for modeling spatial non-stationarity. While GNNWR extends traditional Geographically Weighted Regression by using neural networks to learn spatial weighting functions, existing implementations are often restricted by fixed distance-based schemes and limited inductive bias. We propose to generalize GNNWR by incorporating concepts from convolutional neural networks, recurrent neural networks, and transformers, introducing local receptive fields, sequential context, and self-attention into spatial regression. Through extensive benchmarking on synthetic spatial datasets with varying heterogeneity, noise, and sample sizes, we show that GNNWR outperforms classic methods in capturing nonlinear and complex spatial relationships. Our results also reveal that model performance depends strongly on data characteristics, with local models excelling in highly heterogeneous or small-sample scenarios, and global models performing better with larger, more homogeneous data. These findings highlight the importance of inductive bias in spatial modeling and suggest future directions, including learnable spatial weighting functions, hybrid neural architectures, and improved interpretability for models handling non-stationary spatial data.
在空间回归模型中,诱导偏差是空间回归模型中的一个关键因素,确定模型能够从有限的数据中学习和捕捉空间模式的很好之处。这项工作重新审视了地理神经网络加权回归(GNNWR)中的诱导偏差,并确定了目前模拟空间非静止方法中的局限性。虽然GNNWR通过使用神经网络学习空间加权功能,扩展了传统的地域加权回归法,但现有的实施往往受到固定的远程计划和有限的诱导偏差的限制。我们建议通过纳入来自横向神经网络、经常性神经网络和变异器的概念,将GNNNWRFM(GNNWR)普遍化为GNNWR,引入本地接收字段、顺序环境以及自我关注到空间回归(GNNNWR)中的偏向偏向,通过对合成空间数据集进行广泛的基准设定,同时使用不同的异质、噪音和样本大小,我们表明GNNNWRWR在捕获非线性和复杂的空间关系方面超越了典型方法。我们的结果还表明,模型的绩效在很大程度上取决于数据特征特征,而当地模型在高度异化或小型的模型中,而全球模型则在空间模型中,在更大、更趋同式的模型中以更高的空间结构结构结构上显示更大的方向上显示。这些结论显示,包括更大的空间结构结构结构结构结构结构的改进的改进了更大的分析。
Article 147
Title@2025-07-17 (4): Multi-View Node Pruning for Accurate Graph Representation
Title: Multi-View Node Pruning for Accurate Graph Representation | Multi-View-Knotenschnitt für eine exakte Graphendarstellung | 多查看节点 精确图表代表 2503.11737v4 |
Authors (6): Hanjin Kim, Jiseong Park, Seojin Kim, Jueun Choi, Doheon Lee, Sung Ju Hwang
Graph pooling, which compresses a whole graph into a smaller coarsened graph, is an essential component of graph representation learning. To compress a given graph efficiently, graph pooling methods often drop nodes according to attention-based scores learned with the task loss. However, this often amounts to simply removing nodes with lower degrees, without considering their feature-level relevance to the given task. To fix this problem, we propose Multi-View Pruning (MVP), a graph pruning method based on a multi-view framework and a reconstruction loss. Given a graph, MVP first constructs multiple graphs for different views, either by using predefined modalities or by randomly partitioning the input features, so as to consider the importance of each node from diverse perspectives. Then, it learns a score for each node by considering both the reconstruction loss and the task loss. MVP can be combined with any hierarchical pooling framework to score the nodes. We validate MVP on multiple benchmark datasets by coupling it with two graph pooling methods, and show that it significantly improves the performance of the base graph pooling method, outperforming all baselines. Further analysis shows that both the encoding of multiple views and the reconstruction loss are key to the success of MVP, and that it indeed identifies nodes that are less important according to domain knowledge.
将整张图压缩成一个小的粗略图, 集成图是图形代表学习的一个基本组成部分。 要有效地压缩一个特定图表, 图形集方法往往会降低节点, 随任务损失的分数而降低。 但是, 这往往导致简单地删除低度节点, 而不考虑与任务任务相关的特性级别。 为了解决这个问题, 我们提议了一个多维观察普鲁宁( 集成) , 这是一种基于多视图框架和重整损失的图形剪切除方法。 如果用一个图表, MVP 首次为不同的观点构建多张图表, 要么使用预设的模式, 要么随机分割输入功能, 以考虑每个节点在不同角度的重要性。 然后, 它会通过考虑重建和任务损失这两个方面来学习每个节点的分数。 为了解决这个问题, 我们用两个图形集方法对多基准数据集进行校准 MVP , 显示它大大改进了基本图表集方法的性能, 超越了所有基线。 进一步的分析显示, 每一个节点的分数的值是, , 重新确定一个关键域的值是, , 它的值的值是, 它的正确值是, 。
Article 148
Title@2025-07-17 (4): Scaling Trends for Data Poisoning in LLMs
Title: Scaling Trends for Data Poisoning in LLMs | Skalierungstrends für Datenvergiftungen in LLMs | LLMM中数据中毒趋势的扩大趋势 2408.02946v6 |
Authors (6): Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine
LLMs produce harmful and undesirable behavior when trained on datasets containing even a small fraction of poisoned data. We demonstrate that GPT models remain vulnerable to fine-tuning on poisoned data, even when safeguarded by moderation systems. Given the persistence of data poisoning vulnerabilities in today’s most capable models, this paper investigates whether these risks increase with model scaling. We evaluate three threat models – malicious fine-tuning, imperfect data curation, and intentional data contamination – across 24 frontier LLMs ranging from 1.5 to 72 billion parameters. Our experiments reveal that larger LLMs are significantly more susceptible to data poisoning, learning harmful behaviors from even minimal exposure to harmful data more quickly than smaller models. These findings underscore the need for leading AI companies to thoroughly red team fine-tuning APIs before public release and to develop more robust safeguards against data poisoning, particularly as models continue to scale in size and capability.
长效驱虫蚊帐在接受包含哪怕一小部分有毒数据的数据集培训时产生有害和不可取的行为。我们证明,即使受温和系统保护,GPT模型仍然易受有毒数据的微调影响。鉴于数据中毒脆弱性在当今最有能力的模型中持续存在,本文件调查这些风险是否随着模型规模的扩大而增加。我们评估了三个威胁模型 – – 恶意微调、数据整理不完善和故意数据污染 – – 跨越24个边界LMS,范围从1.5亿到720亿参数不等。我们的实验显示,较大的长效驱虫蚊帐极易受到数据中毒的影响,从最低程度接触有害数据中学习有害行为的速度比较小的模型要快得多。这些研究结果突出表明,AI公司需要在公开发布之前对API进行彻底的红色团队微调,并针对数据中毒制定更强有力的保障措施,特别是当模型在规模和能力上继续扩大时。
Article 149
Title@2025-07-17 (4): From SGD to Spectra: A Theory of Neural Network Weight Dynamics
Title: From SGD to Spectra: A Theory of Neural Network Weight Dynamics | Von SGD zu Spectra: Eine Theorie der neuralen Netzwerkgewichtsdynamik | 从SGD到Spetra:神经网络强度动态理论 2507.12709v1 |
Authors (5): Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula
Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed ‘bulk+tail’ spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous foundation for understanding why deep learning works.
深神经网络已经革命了机器学习,但其培训动态在理论上仍然不明朗,我们开发了一个持续的时间、矩阵价值的随机差异方程框架,将SGD的微光学动态与重量矩阵中单值光谱的宏观演进紧密连接起来。我们得出精确的SDE,表明正方形单数值跟随Dyson Brownian运动,并带有egenvalue反射,并将固定分布特征描述为带有电法尾巴的伽马型密度,为在经过培训的网络中经经验观测到的“Bulk+tail”光谱结构提供了第一个理论解释。通过对变压器和MLP结构的受控实验,我们验证了我们的理论预测,并展示了基于SDE的预测和观测到的光谱进化之间的定量一致,为理解深层学习为何起作用提供了坚实的基础。
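For orientation, the textbook form of Dyson Brownian motion for values $\lambda_1, \dots, \lambda_N$ is sketched below. This is the standard random-matrix expression, not the paper's exact SDE, which the abstract does not give. The symbols $\beta$ (repulsion strength), $B_i$ (independent standard Brownian motions), and $N$ (number of values) are assumptions introduced here for illustration.

$$
d\lambda_i \;=\; dB_i \;+\; \frac{\beta}{2} \sum_{j \neq i} \frac{dt}{\lambda_i - \lambda_j}, \qquad i = 1, \dots, N.
$$

The drift term grows without bound as two values approach each other and pushes them apart, which is the eigenvalue repulsion the abstract refers to.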
Article 150
Title@2025-07-17 (4): PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform
Title: PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform | PinFM: Gründungsmodell für Benutzeraktivität Sequenzen auf einer Visual Discovery Platform im Milliardenmaßstab | PinFM:十亿规模视觉发现平台用户活动序列基础模型 2507.12704v1 |
Authors (12): Xiangyi Chen, Kousik Rajesh, Matthew Lawhon, Zelun Wang, Hanyu Li, Haomiao Li, Saurabh Vishwas Joshi, Pong Eksombatchai, Jaewon Yang, Yi-Ping Hsu, Jiajing Xu, Charles Rosenberg
User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently coupling it with existing models. While this pretraining-and-fine-tuning approach has been popular in other domains, such as Vision and NLP, its application in industrial recommender systems presents numerous challenges. The foundational model must be scalable enough to score millions of items every second while meeting tight cost and latency constraints imposed by these systems. Additionally, it should capture the interactions between user activities and other features and handle new items that were not present during the pretraining stage. We developed innovative techniques to address these challenges. Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600% on Pinterest internal data. We demonstrate that PinFM can learn interactions between user sequences and candidate items by altering input sequences, leading to a 20% increase in engagement with new items. PinFM is now deployed to help improve the experience of more than a half billion users across various applications.
用户活动序列已成为推荐者系统中最重要的信号之一。 我们展示了一个基础模型, PinFM, 用于在10亿规模的视觉发现平台上理解多个应用程序的用户活动序列。 我们用广泛的用户活动数据预先设计一个20B+参数的变压器模型,然后对它进行微调,将其与具体应用进行精细调整,将其与现有模型高效地结合起来。 虽然这种培训前和调整方法在Vision和NLP等其他领域很受欢迎,但其在工业推荐者系统中的应用却带来许多挑战。 基础模型必须足以每秒得分数百万个项目,同时满足这些系统所施加的严格成本和延缓限制。 此外,它应该捕捉用户活动与其他特点之间的互动,并处理在培训前阶段没有出现的新项目。 我们开发了应对这些挑战的创新性技术。 我们的基础设施与算法优化,如Dedcrediced Cros-Atenyer(DCAT), 将我们的兴趣内部数据的吞吐量提高了600%。 我们证明PinFM现在可以学习用户序列与20 %的应用程序之间的互动, 改进了各种输入程序的经验。
Article 151
Title@2025-07-16 (3): Benchmarking Deception Probes via Black-to-White Performance Boosts
Title: Benchmarking Deception Probes via Black-to-White Performance Boosts | Benchmarking Deception Probes über Black-to-White Performance Boosts | 通过黑到白性性能促进手段的欺骗性探测 2507.12691v1 |
Authors (3): Avi Parrack, Carlo Leonardo Attubato, Stefan Heimersheim
AI assistants will occasionally respond deceptively to user queries. Recently, linear classifiers (called “deception probes”) have been trained to distinguish the internal activations of a language model during deceptive versus honest responses. However, it is unclear how effective these probes are at detecting deception in practice, or whether such probes are resistant to simple counter-strategies from a deceptive assistant who wishes to evade detection. In this paper, we compare white-box monitoring (where the monitor has access to token-level probe activations) to black-box monitoring (without such access). We benchmark deception probes by the extent to which the white-box monitor outperforms the black-box monitor, i.e., the black-to-white performance boost. We find weak but encouraging black-to-white performance boosts from existing deception probes.
AI 助理偶尔会对用户的询问作出欺骗性的反应。 最近,线性分类器(称为“欺骗性探测器 ” ) ( Dedefition 探针 ) 已经接受了培训,以区分在欺骗性回应和诚实回应中语言模型的内部激活。 但是,这些检测器在实际中检测欺骗的效果如何, 以及这些检测器是否对想要逃避检测的欺骗性助手的简单反制策略有抵触性。 在本文中, 我们比较了白箱监测器( 显示器可以使用象征性级别探测器激活) 和黑盒监测器( 没有这种访问) 。 我们用白盒监测器比黑箱监视器( 黑到白) 监测器比黑盒监测器( 黑到白的性能增强) 的程度来衡量欺骗性检测。 我们发现, 黑到鼓励黑到白的性能推动器来自现有的欺骗性探测器。
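The black-to-white performance boost can be illustrated as the gap between a white-box and a black-box monitor under a shared detection metric. The sketch below assumes AUROC as that metric and uses made-up scores. The abstract does not state which metric the paper uses, and the function names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random deceptive example (label 1) receives a
    higher score than a random honest one (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def black_to_white_boost(white_scores, black_scores, labels):
    """Metric of the white-box monitor minus that of the black-box monitor."""
    return auroc(white_scores, labels) - auroc(black_scores, labels)


# Made-up suspicion scores for four responses (1 = deceptive, 0 = honest).
labels = [1, 1, 0, 0]
black_scores = [0.9, 0.4, 0.5, 0.1]  # monitor without probe access
white_scores = [0.9, 0.6, 0.5, 0.1]  # monitor with probe activations
boost = black_to_white_boost(white_scores, black_scores, labels)  # 0.25
```

A boost near zero would mean the probe adds little beyond what the black-box monitor already infers from the text alone.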
Article 152
Title@2025-07-16 (3): Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
Title: Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights | Finite-Dimensional Gaussian Approximation für tiefe neurale Netzwerke: Universalität in zufälligen Gewichten | 深神经网络的简单多功能高斯近似度:随机重量的普遍性 2507.12686v1 |
Authors (2): Krishnakumar Balasubramanian, Nathan Ross
We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-({1}/{6})^{L-1} + \epsilon}$, for any $\epsilon > 0$.
我们研究具有随机初始重量且具有定序时间的深神经网络的有限-二分分布(DFDs),具体地说,我们假设Lipsitz激活功能,允许层宽度以任意相对速率增长至无限,在特殊情况下,所有宽度均与通用比例参数成正比,且有1美元隐藏层,我们获得任何1美元($-)-({1}/{6})//L-1}+ epsilon}的汇合率。
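As a worked instance of the stated rate, the exponent can be evaluated for small depths. This only substitutes values of $L$ into the expression from the abstract; recall that $L-1$ is the number of hidden layers.

$$
L = 2:\quad n^{-\frac{1}{6} + \epsilon}, \qquad
L = 3:\quad n^{-\frac{1}{36} + \epsilon}, \qquad
L = 4:\quad n^{-\frac{1}{216} + \epsilon}.
$$

The exponent shrinks geometrically with depth, so the guaranteed rate in $n$ becomes slower for each added hidden layer.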
Article 153
Title@2025-07-16 (3): Data Transformation Strategies to Remove Heterogeneity
Title: Data Transformation Strategies to Remove Heterogeneity | Strategien zur Datentransformation zur Entfernung von Heterogenität | 消除异异性的数据转换战略 2507.12677v1 |
Authors (11): Sangbong Yoo, Jaeyoung Lee, Chanyoung Yoon, Geonyeong Son, Hyein Hong, Seongbum Seo, Soobin Yim, Chanyoung Jung, Jungsoo Park, Misuk Kim, Yun Jang
Data heterogeneity is a prevalent issue, stemming from various conflicting factors, making its utilization complex. This uncertainty, particularly resulting from disparities in data formats, frequently necessitates the involvement of experts to find resolutions. Current methodologies primarily address conflicts related to data structures and schemas, often overlooking the pivotal role played by data transformation. As the utilization of artificial intelligence (AI) continues to expand, there is a growing demand for a more streamlined data preparation process, in which data transformation plays a central role. It customizes training data to enhance AI learning efficiency and adapts input formats to suit diverse AI models. Selecting an appropriate transformation technique is essential for preserving crucial data details. Despite the widespread integration of AI across various industries, comprehensive reviews concerning contemporary data transformation approaches are scarce. This survey explores the intricacies of data heterogeneity and its underlying sources. It systematically categorizes and presents strategies to address heterogeneity stemming from differences in data formats, shedding light on the inherent challenges associated with each strategy.
数据差异是一个普遍问题,来自各种相互冲突的因素,使得数据使用变得复杂。这种不确定性,特别是数据格式差异造成的不确定性,往往需要专家参与寻找解决办法。目前的方法主要处理与数据结构和计划有关的冲突,往往忽视数据转换的关键作用。随着人工智能(AI)的继续扩大,对更简化数据编制过程的需求不断增加,数据转换变得至关重要。它定制培训数据,以提高AI学习效率,并调整输入格式以适应不同的AI模式。选择适当的转换技术对于保存关键数据细节至关重要。尽管大赦国际在各行业广泛整合,但当代数据转换方法的全面审查仍然很少。这一调查探索数据差异及其基本来源。它系统地分类和提出战略,以解决数据格式差异引起的差异造成的差异,并阐明与每项战略有关的内在挑战。
Article 154
Title@2025-07-16 (3): Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography
Title: Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography | Vergleichende Auswertung von Radiomik und Deep-Learning-Modellen zur Erkennung von Krankheiten in der Brustradiographie | 比较评价用于在胸针射电摄影中检测疾病辐射学和深学习模型的比较评价 2504.12249v2 |
Authors (2): Zhijin He, Alan B. McMillan
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks and vision transformers, learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines, and Multi-Layer Perceptrons for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM with an AUC of 0.762. At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.
人工智能(AI)在医学影像中的应用彻底改变了诊断实践,使放射学数据的高级分析和解读成为可能。本研究全面评估了基于放射组学和基于深度学习的胸部X线疾病检测方法,重点关注COVID-19、肺部不透明和病毒性肺炎。深度学习模型,特别是卷积神经网络和视觉Transformer,直接从图像数据中学习;而基于放射组学的模型提取手工设计的特征,在数据有限的情形下具有潜在优势。我们系统比较了多种AI模型的诊断性能,包括用于放射组学的决策树、梯度提升、随机森林、支持向量机和多层感知机,以及InceptionV3、EfficientNetL和ConvNeXtXLarge等最先进的深度学习模型。性能在多种样本量下进行评估。在24个样本时,EfficientNetL的AUC为0.839,优于AUC为0.762的SVM。在4000个样本时,InceptionV3取得了最高的AUC 0.996,而随机森林为0.885。Scheirer-Ray-Hare检验证实,模型类型和样本量对所有指标均有显著的主效应和交互效应。经Bonferroni校正的事后Mann-Whitney U检验进一步表明,深度学习模型在大多数条件下具有一致的性能优势。这些发现为诊断AI中的模型选择提供了经统计验证、数据驱动的建议。随着数据可用性的增加,深度学习模型表现出更高的性能和更好的可扩展性,而基于放射组学的模型在低数据环境中可能仍然有用。本研究通过为在不同临床环境中部署AI模型提供实用指导,填补了基于AI的诊断研究中的一个关键空白。
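The Bonferroni correction applied to the post hoc Mann-Whitney U tests can be sketched as follows. The p-values are made up for illustration, and the function name and the 0.05 threshold are assumptions rather than details taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each raw p-value by the number of tests (capped at 1.0)
    and flag which adjusted values remain at or below alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Made-up raw p-values from three pairwise model comparisons.
raw_p = [0.001, 0.02, 0.04]
adjusted_p, significant = bonferroni(raw_p)
# Only the first comparison stays significant after correction.
```

The correction controls the family-wise error rate at the cost of power, which matters when many model pairs and sample sizes are compared.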
Article 155
Title@2025-07-16 (3): Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models
Title: Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models | Fly, Fail, Fix: Iterative Spiel Reparatur mit Verstärkung Lernen und große multimodale Modelle | Fly、Fly、fail、Fix:利用强化学习和大型多模式模式进行迭接游戏修理 2507.12666v1 |
Authors (3): Alex Zook, Josef Spjut, Jonathan Tremblay
Game design hinges on understanding how static rules and content translate into dynamic player behavior - something modern generative systems that inspect only a game’s code or assets struggle to capture. We present an automated design iteration framework that closes this gap by pairing a reinforcement learning (RL) agent, which playtests the game, with a large multimodal model (LMM), which revises the game based on what the agent does. In each loop the RL player completes several episodes, producing (i) numerical play metrics and/or (ii) a compact image strip summarising recent video frames. The LMM designer receives a gameplay goal and the current game configuration, analyses the play traces, and edits the configuration to steer future behaviour toward the goal. We demonstrate results that LMMs can reason over behavioral traces supplied by RL agents to iteratively refine game mechanics, pointing toward practical, scalable tools for AI-assisted game design.
游戏设计取决于理解静态规则和内容如何转化为动态玩家行为 — — 这是一种现代基因化系统,只检查游戏代码或资产挣扎以捕捉。我们提出了一个自动设计迭代框架,通过配对强化学习(RL)代理来缩小这一差距,该代理在游戏中进行测试,使用一个大型多式模型(LMM)来测试游戏,该模型根据代理器的行为来修改游戏游戏。在每个循环中,RL播放器完成若干场景,生成(一)数字游戏量度和(或)(二)一个压缩图像条,对最近的视频框架进行总结。 LMM 设计器接收游戏目标和当前游戏配置,分析游戏跟踪,并编辑配置以引导未来行为走向目标。我们展示的结果是,LMMMs可以根据RL代理器提供的行为痕迹来反复改进游戏机械,指向用于 AI 辅助游戏设计的实用且可缩放的工具。
Article 156
Title@2025-07-16 (3): UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Title: UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning | UPCORE: Nutzenschonende Coreset-Auswahl für ausgewogenes Lernen | UPCORE: 平衡退学的核心选择 2502.15082v2 |
Authors (3): Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or “forgetting” a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model’s other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model’s representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. Across three standard unlearning methods, UPCORE consistently achieves a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric, measuring the area-under-the-curve (AUC) across standard metrics. Our results show that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
用户的规格或法律框架往往要求从预先培训的模型中删除信息,包括大型语言模型(LLMs),这要求从已经培训的模型中删除或“忘记”一组数据点,这通常会降低其在其他数据点上的性能。因此,必须在删除信息与保持模型其他能力保持完好之间取得平衡,不能平衡这一取舍,导致删除工作不力或无法使用模式。为此,我们提议采用UPCO(通用-保留核心选择),一个方法-不可知性数据选择框架,用以在不学习期间减轻附带损害。发现模型损害与模型在“忘却套”上的表达方式的差异相关,我们有选择地利用“忘记”来清除外源,从而在不学习后尽量减少模式退化。在三种标准的不学习方法中,UPCORE始终在相互竞争的删除功效和模式保存目标之间取得更佳的平衡。为了更好地评估这一取舍,我们提出了一个新的衡量标准度,衡量区域偏向(AUSC)跨标准度。我们的结果表明,UCORE 改进了标准向外部转移点,同时从正向正向核心点的转移。
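The idea of pruning outliers from the forget set to lower the variance of its representations can be sketched as below. This is a simplified stand-in, not UPCORE itself: it drops the points farthest from the centroid, whereas the abstract does not say which outlier detector the method uses. All names and the toy vectors are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def total_variance(points):
    """Mean squared distance of the points to their centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)


def prune_outliers(representations, drop_fraction=0.2):
    """Return sorted indices kept after dropping the farthest points.

    Assumption: distance to the centroid stands in for an outlier score.
    """
    n = len(representations)
    c = centroid(representations)
    dist = [math.dist(r, c) for r in representations]
    n_drop = int(n * drop_fraction)
    closest_first = sorted(range(n), key=lambda i: dist[i])
    return sorted(closest_first[: n - n_drop])


# Toy 2-D "representations" of a forget set; the last point is an outlier.
reps = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
kept = prune_outliers(reps, drop_fraction=0.2)
core = [reps[i] for i in kept]
```

In this toy case the pruned core has a much smaller variance than the full set, which is the property the abstract links to reduced collateral damage.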
Article 157
Title@2025-07-16 (3): Physics constrained learning of stochastic characteristics
Title: Physics constrained learning of stochastic characteristics | Physik bedingtes Lernen stochastischer Merkmale | 物理约束下的随机特性学习 2507.12661v1 |
Authors (4): Pardha Sai Krishna Ala, Ameya Salvi, Venkat Krovi, Matthias Schmid
Accurate state estimation requires careful consideration of the uncertainty surrounding the process and measurement models; these characteristics are usually not well known, and an experienced designer is needed to select the covariance matrices. An error in the selection of covariance matrices can degrade the accuracy of the estimation algorithm and may sometimes cause the filter to diverge. Identifying noise characteristics has long been a challenging problem due to uncertainty surrounding noise sources and difficulties in systematic noise modeling. Most existing approaches try to identify unknown covariance matrices through an optimization algorithm involving innovation sequences. In recent years, learning approaches have been used to determine the stochastic characteristics of process and measurement models. We present a learning-based methodology with different loss functions to identify noise characteristics and test these approaches’ performance for real-time vehicle state estimation.
准确的状态估计需要仔细考虑过程和测量模型的不确定性;这些特征通常不为人熟知,需要有经验的设计师来选择共变矩阵。选择共变矩阵中的错误会影响估计算法的准确性,有时还可能造成过滤器的差异。由于噪音源的不确定性和系统噪音建模方面的困难,识别噪音特征长期以来是一个具有挑战性的问题。大多数现有方法都试图通过涉及创新序列的优化算法来识别未知的共变矩阵。近年来,学习方法被用来确定过程和测量模型的随机特征。我们提出了一个基于学习的方法,该方法具有不同的损失功能,用以确定噪音特征,测试这些方法在实时车辆状态估测方面的性能。
Article 158
Title@2025-07-16 (3): Data-driven rainfall prediction at a regional scale: a case study with Ghana
Title: Data-driven rainfall prediction at a regional scale: a case study with Ghana | Datengesteuerte Niederschlagsprognose auf regionaler Ebene: eine Fallstudie mit Ghana | 区域规模以数据驱动的降雨预测:加纳案例研究 2410.14062v3 |
Authors (3): Indrajit Kalita, Lucia Vilallonga, Yves Atchade
With a warming planet, tropical regions are expected to experience the brunt of climate change, with more intense and more volatile rainfall events. Currently, state-of-the-art numerical weather prediction (NWP) models are known to struggle to produce skillful rainfall forecasts in tropical regions of Africa. There is thus a pressing need for improved rainfall forecasting in these regions. Over the last decade or so, the increased availability of large-scale meteorological datasets and the development of powerful machine learning models have opened up new opportunities for data-driven weather forecasting. Focusing on Ghana in this study, we use these tools to develop two U-Net convolutional neural network (CNN) models to predict 24h rainfall at 12h and 30h lead-time. The models were trained using data from the ERA5 reanalysis dataset and the GPM-IMERG dataset. Special attention was paid to interpretability. We developed a novel statistical methodology that allowed us to probe the relative importance of the meteorological input variables in our model, offering useful insights into the factors that drive precipitation in the Ghana region. Empirically, we found that our 12h lead-time model performs on par with, and by some measures better than, the 18h lead-time forecasts produced by the ECMWF (as available in the TIGGE dataset). We also found that combining our data-driven model with classical NWP further improves forecast accuracy.
随着地球变暖,热带地区预计将首当其冲地经受气候变化的冲击,降水事件更加密集和更加动荡,目前已知非洲热带地区最先进的数字天气预测模型(NWP)难以得出熟练的降雨预报,因此迫切需要改善这些地区的降雨预报,过去十年左右期间,大规模气象数据集的可用性增加以及开发强大的机器学习模型为数据驱动的天气预报开辟了新的机会。在本研究中,我们以加纳为重点,利用这些工具开发两个U-Net神经网络模型,预测12小时和30小时的24小时降雨量。这些模型经过培训,使用了ERA5再分析数据集和GMM-IMERG数据集的数据。特别注意了可解释性。我们开发了一种新的统计方法,使我们能够调查气象变量投入在我们模型中的相对重要性,为推动加纳地区降水的因素提供了有用的洞察力。我们发现,我们12小时的铅模型与我们18年的预测模型的同步率数据也比我们18年的GGFIS模型的同步。
Article 159
Title@2025-07-16 (3): DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
Title: DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback | DOPL: Direktes Online-Preference-Lernen für ruhelose Banditen mit Preference Feedback | DCPL: 提供首选反馈的无休眠强盗直接在线优先学习 2410.05527v2 |
Authors (5): Guojun Xiong, Ujwal Dinesha, Debajoy Mukherjee, Jian Li, Srinivas Shakkottai
Restless multi-armed bandits (RMAB) have been widely used to model constrained sequential decision-making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMAB crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback rather than scalar reward from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than the scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB that efficiently explores the unknown environment, adaptively collects preference data in an online manner, and directly leverages the preference feedback for decision-making. We prove that DOPL yields sublinear regret. To the best of our knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T\ln T})$ regret for RMAB with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.
大量多武装强盗(RMAB)被广泛用来模拟受限制的顺序决策问题,让每个不安宁的手臂的状况根据Markov链条演变,每个州过渡产生大规模奖赏。然而,RMAB的成功主要取决于奖赏信号的可用性和质量。不幸的是,在实际中具体说明确切的奖赏功能可能具有挑战性,甚至不可行。在本文中,我们引入了一种新的RMAB模式,即新的RMAB模式,在有Textit{pref}信号的情况下,决策者只能从每个决策节点的激活武器中看到双向的优惠反馈,而不是按比例的奖赏。然而,可以说,PregAB的反馈所包含的信息比使Pref-RMAB看起来更加困难的奖赏还少。为了应对这一挑战,我们为Pref-RMAB提供了直接的在线优惠学习算法,以便有效地探索未知的环境,以在线方式适应性地收集优惠数据,并直接将优惠反馈用于决策。我们证明DPLPL会进一步产生亚值的遗憾。
Article 160
Title@2025-07-16 (3): Improving physics-informed neural network extrapolation via transfer learning and adaptive activation functions
Title: Improving physics-informed neural network extrapolation via transfer learning and adaptive activation functions | Verbesserung der Physik-informierten neuronalen Netzwerk-Extrapolation durch Transfer-Lernen und adaptive Aktivierungsfunktionen | 通过转让学习和适应性启动功能,改进物理学知情神经网络的外推法 2507.12659v1 |
Authors (3): Athanasios Papastathopoulos-Katsaros, Alexandra Stavrianidi, Zhandong Liu
Physics-Informed Neural Networks (PINNs) are deep learning models that incorporate the governing physical laws of a system into the learning process, making them well-suited for solving complex scientific and engineering problems. Recently, PINNs have gained widespread attention as a powerful framework for combining physical principles with data-driven modeling to improve prediction accuracy. Despite their successes, however, PINNs often exhibit poor extrapolation performance outside the training domain and are highly sensitive to the choice of activation functions (AFs). In this paper, we introduce a transfer learning (TL) method to improve the extrapolation capability of PINNs. Our approach applies TL within an extended training domain, using only a small number of carefully selected collocation points. Additionally, we propose an adaptive AF that takes the form of a linear combination of standard AFs, which improves both the robustness and accuracy of the model. Through a series of experiments, we demonstrate that our method achieves an average 40% reduction in relative L2 error and an average 50% reduction in mean absolute error in the extrapolation domain, all without a significant increase in computational cost. The code is available at https://github.com/LiuzLab/PINN-extrapolation .
物理信息神经网络(PINNs)是深层次的学习模式,将系统物理法则纳入学习过程,使其适合于解决复杂的科学和工程问题。最近,PINNs作为将物理原理与数据驱动模型相结合以提高预测准确性的一个强大框架得到了广泛关注。尽管取得了成功,但PINNs在培训领域外的外推性能往往表现不佳,并且对激活功能的选择非常敏感。在本文件中,我们采用了一种转移学习(TL)方法,以提高PINNs的外推能力。我们的方法在扩大的培训领域应用了转移学习(TL),只使用少量仔细选择的合用点。此外,我们建议采用标准AFS的线性组合形式,提高模型的稳健性和准确性。我们通过一系列实验,证明我们的方法实现了相对L2错误平均减少40%,在外推领域平均减少50%的绝对误差,而没有显著增加计算成本。代码可在 https://github.com/LiuzLab/PINN-extrapolation 获取。
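The adaptive activation function described in the abstract above is a linear combination of standard activations. A minimal Python sketch of that form follows; the choice of tanh, sine, and ReLU as candidates, the `adaptive_activation` name, and the fixed example weights are assumptions, since the abstract does not list the component functions or how the weights are trained.

```python
import numpy as np

# Candidate standard activations; this particular set is an assumption.
ACTIVATIONS = (np.tanh, np.sin, lambda x: np.maximum(x, 0.0))

def adaptive_activation(x, weights):
    """Weighted sum of standard activations: sum_i weights[i] * f_i(x)."""
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (len(ACTIVATIONS),):
        raise ValueError("need one weight per candidate activation")
    return sum(w * f(x) for w, f in zip(weights, ACTIVATIONS))

x = np.linspace(-2.0, 2.0, 5)
# In a PINN the weights would be trainable and learned with the network;
# here they are fixed only to show the functional form.
y = adaptive_activation(x, [0.5, 0.3, 0.2])
print(y)
```

Setting one weight to 1 and the rest to 0 recovers a plain standard activation, so the combined form can only widen the family of functions available to the network.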
Article 161
Title@2025-07-16 (3): Distributional Reinforcement Learning on Path-dependent Options
Title: Distributional Reinforcement Learning on Path-dependent Options | Distributionelle Stärkung Lernen über pathabhängige Optionen | 关于依赖道路的选项的分布强化分发学习 2507.12657v1 |
Authors (1): Ahmet Umur Özsoy
We reinterpret and propose a framework for pricing path-dependent financial derivatives by estimating the full distribution of payoffs using Distributional Reinforcement Learning (DistRL). Unlike traditional methods that focus on expected option value, our approach models the entire conditional distribution of payoffs, allowing for risk-aware pricing, tail-risk estimation, and enhanced uncertainty quantification. We demonstrate the efficacy of this method on Asian options, using quantile-based value function approximators.
我们重新解释并提出了一个依赖路径的金融衍生物定价框架,方法是利用分配强化学习(DistRL)估算全面分配收益。 与注重预期选择价值的传统方法不同,我们的方法模拟了整个有条件的支付分配,允许风险意识定价、尾风险估算和增强不确定性量化。 我们用基于量化价值的功能近似器展示了这一方法对亚洲选项的有效性。
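To make concrete what modelling the full payoff distribution adds over an expected value, the Python sketch below simulates discounted payoffs of an arithmetic-average Asian call by plain Monte Carlo and reads off quantiles and a tail mean. It is not the DistRL method from the abstract: the geometric Brownian motion dynamics, all parameter values, and the function names are assumptions. The pinball loss is included because it is the standard loss for fitting a quantile estimate.

```python
import numpy as np

def asian_call_payoffs(s0, strike, rate, vol, maturity, n_steps, n_paths, seed=0):
    """Discounted payoffs of an arithmetic-average Asian call under GBM."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_steps = (rate - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(log_steps, axis=1))
    average = paths.mean(axis=1)
    return np.exp(-rate * maturity) * np.maximum(average - strike, 0.0)

def pinball_loss(samples, estimate, tau):
    """Quantile-regression loss; minimized when estimate is a tau-quantile."""
    diff = samples - estimate
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

payoffs = asian_call_payoffs(s0=100.0, strike=100.0, rate=0.05, vol=0.2,
                             maturity=1.0, n_steps=50, n_paths=20000)
taus = np.array([0.5, 0.9, 0.99])
quantiles = np.quantile(payoffs, taus)
price = payoffs.mean()  # the single number an expectation-based method reports
tail_mean = payoffs[payoffs >= quantiles[2]].mean()  # mean of the top 1% of payoffs
print(price, quantiles, tail_mean)
```

A quantile-based value function approximator would learn estimates like `quantiles` as functions of the state rather than computing them from simulated samples after the fact.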
Article 162
Title@2025-07-16 (3): Federated Learning in Open- and Closed-Loop EMG Decoding: A Privacy and Performance Perspective
Title: Federated Learning in Open- and Closed-Loop EMG Decoding: A Privacy and Performance Perspective | Federated Learning in Open- and Closed-Loop EMG Decodierung: Eine Datenschutz- und Performanceperspektive | 开放和闭闭门和闭闭门环境管理集团解释中的联邦学习:隐私和业绩展望 2507.12652v1 |
Authors (3): Kai Malcolm, César Uribe, Momona Yamagami
Invasive and non-invasive neural interfaces hold promise as high-bandwidth input devices for next-generation technologies. However, neural signals inherently encode sensitive information about an individual’s identity and health, making data sharing for decoder training a critical privacy challenge. Federated learning (FL), a distributed, privacy-preserving learning framework, presents a promising solution, but it remains unexplored in closed-loop adaptive neural interfaces. Here, we introduce FL-based neural decoding and systematically evaluate its performance and privacy using high-dimensional electromyography signals in both open- and closed-loop scenarios. In open-loop simulations, FL significantly outperformed local learning baselines, demonstrating its potential for high-performance, privacy-conscious neural decoding. In contrast, closed-loop user studies required adapting FL methods to accommodate single-user, real-time interactions, a scenario not supported by standard FL. This modification resulted in local learning decoders surpassing the adapted FL approach in closed-loop performance, yet local learning still carried higher privacy risks. Our findings highlight a critical performance-privacy tradeoff in real-time adaptive applications and indicate the need for FL methods specifically designed for co-adaptive, single-user applications.
侵入式和非侵入式神经界面作为下一代技术的高带宽输入装置很有希望。然而,神经信号必然会将个人的身份和健康敏感信息编码起来,使用于解码器培训的数据共享成为关键的隐私挑战。联邦学习(FL)是一个分布式的、保护隐私的学习框架,它是一个很有希望的解决方案,但在封闭式环球适应性神经界面中仍然没有探索。在这里,我们引入基于FL的神经解码并系统地评估其性能和隐私,在开放式和闭式环情景中使用高维电传信号。在开放式模拟中,FL大大超过当地学习基线,展示其高性能、有意识的神经解码的潜力。相比之下,封闭式用户研究需要调整FL方法,以适应单一用户,实时互动,一种没有标准FL支持的假想。这种修改导致当地学习解码器在闭式操作中超过了经调整的FL方法,但在本地学习中仍然带有较高的隐私风险。我们的研究结论显示,在高度交易应用中需要一种关键的软化式组合。
Article 163
Title@2025-07-16 (3): Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting
Title: Timing is Important: Risk-aware Fund Allocation based on Time-Series Forecasting | Timing ist wichtig: Risiko-aware Fund Allokation basierend auf Time-Series Forecasting | 时间选择很重要:根据时间-系列预测进行有风险的基金分配 2505.24835v3 |
Authors (9): Fuyuan Lyu, Linfeng Du, Yunpeng Weng, Qiufang Ying, Zhiyan Xu, Wen Zou, Haolun Wu, Xiuqiang He, Xing Tang
Fund allocation has been an increasingly important problem in the financial domain. In reality, we aim to allocate the funds to buy certain assets within a certain future period. Naive solutions such as prediction-only or Predict-then-Optimize approaches suffer from goal mismatch. Additionally, the introduction of the SOTA time series forecasting model inevitably introduces additional uncertainty in the predicted result. To solve both problems mentioned above, we introduce a Risk-aware Time-Series Predict-and-Allocate (RTS-PnO) framework, which makes no prior assumptions about the forecasting models. Such a framework contains three features: (i) end-to-end training with objective alignment measurement, (ii) adaptive forecasting uncertainty calibration, and (iii) agnosticism towards forecasting models. The evaluation of RTS-PnO is conducted over both online and offline experiments. For offline experiments, eight datasets from three categories of financial applications are used: Currency, Stock, and Cryptos. RTS-PnO consistently outperforms other competitive baselines. The online experiment is conducted on the Cross-Border Payment business at FiT, Tencent, and an 8.4% decrease in regret is witnessed when compared with the product-line approach. The code for the offline experiment is available at https://github.com/fuyuanlyu/RTS-PnO.
在金融领域,资金分配已成为一个日益重要的问题。在现实中,我们的目标是分配资金,以便在未来的某一时期内购买某些资产。预测或预测-预测-当时-优化方法等初步解决办法存在目标不匹配的问题。此外,SOTA时间序列预测模型的采用不可避免地增加了预测结果的不确定性。为了解决上述两个问题,我们引入了一个风险意识-了解时间-系列预测和分配(RTS-PnO)框架,该框架对预测模型没有预先假设。这种框架包含三个特点:(一) 目标调整衡量的端到端培训,(二) 适应性预测不确定性校准,以及(三) 预测模型的不可知性。RTS-PnO的评估工作在网上和离线试验中进行。在离线实验中,使用了来自三类金融应用的8个数据集:货币、库存和冷冻。RTS-PNOO始终超越其他竞争性基线。这种框架包含三个特点:(一) 在FT、Tencent/com的跨界支付业务上进行在线试验,在FIT、Tententcent-Ocrual实验时,在产品上减少。
Article 164
Title@2025-07-16 (3): A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis
Title: A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis | Eine neuartige Datenvergrößerungsstrategie für robustes Deep Learning Klassifizierung biomedizinischer Zeitreihendaten: Anwendung auf EKG- und EEG-Analysen | 生物医学时间序列数据深入学习分类:应用ECG和EEG分析的新颖数据增强战略 2507.12645v1 |
Authors (3): Mohammed Guhdar, Ramadhan J. Mstafa, Abdulhakeem O. Mohammed
The increasing need for accurate and unified analysis of diverse biological signals, such as ECG and EEG, is paramount for comprehensive patient assessment, especially in synchronous monitoring. Despite advances in multi-sensor fusion, a critical gap remains in developing unified architectures that effectively process and extract features from fundamentally different physiological signals. Another challenge is the inherent class imbalance in many biomedical datasets, often causing biased performance in traditional methods. This study addresses these issues by proposing a novel and unified deep learning framework that achieves state-of-the-art performance across different signal types. Our method integrates a ResNet-based CNN with an attention mechanism, enhanced by a novel data augmentation strategy: time-domain concatenation of multiple augmented variants of each signal to generate richer representations. Unlike prior work, we scientifically increase signal complexity to achieve future-reaching capabilities, which resulted in the best predictions compared to the state of the art. Preprocessing steps included wavelet denoising, baseline removal, and standardization. Class imbalance was effectively managed through the combined use of this advanced data augmentation and the Focal Loss function. Regularization techniques were applied during training to ensure generalization. We rigorously evaluated the proposed architecture on three benchmark datasets: UCI Seizure EEG, MIT-BIH Arrhythmia, and PTB Diagnostic ECG. It achieved accuracies of 99.96%, 99.78%, and 100%, respectively, demonstrating robustness across diverse signal types and clinical contexts. Finally, the architecture requires ~130 MB of memory and processes each sample in ~10 ms, suggesting suitability for deployment on low-end or wearable devices.
对各种生物信号(如ECG和EEG)进行准确和统一分析的必要性日益增加,对于全面病人评估,特别是同步监测而言,对ECG和EEG等不同生物信号的准确和统一分析的需求日益增加,这对于综合病人评估至关重要。尽管在多传感器聚合方面取得了进展,但在建立能够有效处理和提取根本不同的生理信号特征的统一结构方面,仍然存在一个重大差距。另一个挑战是许多生物医学数据集固有的阶级不平衡,这往往造成传统方法的偏差。本研究通过提出一个创新和统一的深层次学习框架来解决这些问题,使不同信号类型达到最先进的业绩。我们的方法将基于ResNet的CNN与关注机制相结合,并辅之以新的数据增强战略:每个信号的多种增强变异器在时间上相互融合,以产生更丰富的表达。我们不同于以往的工作,我们从科学上增加了信号的复杂性,以实现未来影响能力,这导致与艺术状态相比的最佳预测。预处理步骤包括波粒分解、基线清除和标准化。通过联合使用这一先进的数据增强和降低损失功能来有效地管理。在培训过程中应用常规化技术,以确保通用性环境结构的准确性。我们最后评估了标准。
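Two ingredients named in the abstract above, time-domain concatenation of augmented variants and the Focal Loss, can be sketched in a few lines of Python. The specific augmentations (additive noise and amplitude scaling), the function names, and the parameter values are assumptions; the abstract does not say which variants are concatenated. The focal loss follows its usual definition, -(1 - p_t)^gamma * log(p_t), without the optional class-weighting term.

```python
import numpy as np

def concat_augmented(signal, rng, noise_std=0.05):
    """Concatenate a signal with two augmented variants along the time axis.

    The two augmentations used here are illustrative assumptions.
    """
    noisy = signal + rng.normal(0.0, noise_std, size=signal.shape)
    scaled = 1.1 * signal
    return np.concatenate([signal, noisy, scaled])

def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss for integer class labels.

    probs: (n, n_classes) predicted class probabilities, rows summing to 1.
    """
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, 1e-12, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

rng = np.random.default_rng(0)
beat = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))  # stand-in for one segment
long_input = concat_augmented(beat, rng)            # three times the length

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 0, 1])
cross_entropy = focal_loss(probs, labels, gamma=0.0)  # gamma = 0 is plain CE
focal = focal_loss(probs, labels, gamma=2.0)
print(long_input.shape, cross_entropy, focal)
```

With gamma above zero, confidently correct examples contribute little to the loss, which is why the focal loss is a common choice for imbalanced classes.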
Article 165
Title@2025-07-16 (3): Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
Title: Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows | Fine-Tune ein SLM oder Prompt ein LLM? Der Fall der Erzeugung von Low-Code Workflows | 微调可持续土地管理还是迅速提炼一个LLM? 产生低碳工作流程的案例 2505.24189v2 |
Authors (5): Orlando Marquez Ayala, Patrice Bechard, Emily Chen, Maggie Baird, Jingfei Chen
Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs fall, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications – faster inference, lower costs – may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
大型语言模型(LLMs)如GPT-4o等大型语言模型(LLMs)能够以合适的速度处理一系列复杂的任务。由于象征性成本降低,微调用于现实世界应用的小型语言模型(SLMs)的优点可能不再十分清楚 – – 更快的推论、更低的成本 – – 在这项工作中,我们提出的证据表明,对于需要结构化产出的具体领域任务,可持续土地管理仍具有质量优势。我们比较了SLM的微调,而不是激励LLMs完成以 JSON 格式生成低码工作流程的任务。我们发现,虽然良好的迅速可以产生合理的结果,但微调的质量平均提高10%。我们还进行了系统性的错误分析,以揭示模型的局限性。
Article 166
Title@2025-07-16 (3): Cross-Layer Discrete Concept Discovery for Interpreting Language Models
Title: Cross-Layer Discrete Concept Discovery for Interpreting Language Models | Cross-Layer Discrete Concept Discovery für Interpretationssprachmodelle | 解释语言模型的跨语言监听概念发现 2506.20040v2 |
Authors (4): Ankur Garg, Xuemin Yu, Hassan Sajjad, Samira Ebrahimi Kahou
Uncovering emergent concepts across transformer layers remains a significant challenge because the residual stream linearly mixes and duplicates information, obscuring how features evolve within large language models. Current research efforts primarily inspect neural representations at single layers, thereby overlooking this cross-layer superposition and the redundancy it introduces. These representations are typically either analyzed directly for activation patterns or passed to probing classifiers that map them to a limited set of predefined concepts. To address these limitations, we propose cross-layer VQ-VAE (CLVQ-VAE), a framework that uses vector quantization to map representations across layers and, in the process, collapses duplicated residual-stream features into compact, interpretable concept vectors. Our approach uniquely combines top-k temperature-based sampling during quantization with EMA codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. We further enhance the framework with scaled-spherical k-means++ for codebook initialization, which clusters by directional similarity rather than magnitude, better aligning with semantic structure in word embedding space.
由于剩余流线性混合和重复信息,掩盖了大语言模型中各种特征的演变方式,因此这些未覆盖的变压层新出现概念仍是一项重大挑战。当前研究工作主要检查单层神经显示,从而忽略了这种跨层叠加和它带来的冗余。这些表示通常不是直接分析激活模式,就是通过直接分析将其映射成有限的一组预设概念的检测分类。为了解决这些局限性,我们提议采用跨层VQ-VAE(CLVQ-VAE)(CLVQ-VAE)这一框架,利用矢量定量来绘制各层之间和整个过程的表达方式,将重复的残余流特征映射成紧凑的、可解释的概念矢量。我们的方法在四分化过程中将基于温度的顶部取样与 EMA 代码簿更新结合起来,提供对离散潜伏空间的有控制的探索,同时维护代码簿的多样性。我们进一步强化框架,使代码初始化的宽度K-point-point-point +(CLVQ-VE-VE-VAVE),这个框架使用矢量组合,以方向性组合而不是数量,更好地与文字嵌嵌入空间中的文字嵌入空间的语结构。
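The quantization step described in the abstract above, top-k temperature-based sampling with EMA codebook updates, can be illustrated with the following Python sketch. It is a simplified reading rather than the authors' implementation: the Euclidean distance, the softmax over negative distances, the per-vector EMA rule, and every name and default value are assumptions.

```python
import numpy as np

def sample_code(vector, codebook, k=3, temperature=1.0, rng=None):
    """Sample a code index among the k nearest codes, weighted by softmax(-d/T)."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.linalg.norm(codebook - vector, axis=1)
    top_k = np.argsort(dist)[:k]
    logits = -dist[top_k] / temperature
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(top_k, p=probs))

def ema_update(codebook, index, vector, decay=0.99):
    """Return a copy of the codebook with one code moved toward the vector."""
    updated = codebook.copy()
    updated[index] = decay * codebook[index] + (1.0 - decay) * vector
    return updated

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 hypothetical concept vectors
h = rng.normal(size=4)                # stand-in for one layer representation
idx = sample_code(h, codebook, k=3, temperature=0.5, rng=rng)
new_codebook = ema_update(codebook, idx, h)
print(idx)
```

With `k=1` the sampler reduces to ordinary nearest-neighbour quantization; larger `k` and higher temperature let less-used codes be selected, which is one way to read the abstract's "controlled exploration".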
Article 167
Title@2025-07-16 (3): On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations
Title: On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations | Über die lineare Beschleunigung des personalisierten Federated Verstärkungslernens mit geteilten Repräsentationen | 在线加快个人化联邦强化学习,共用代表 2411.15014v2 |
Authors (4): Guojun Xiong, Shufan Wang, Daniel Jiang, Jian Li
Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without sharing their local trajectories collected during agent-environment interactions. However, in practice, the environments faced by different agents are often heterogeneous, leading to poor performance by the single policy learned by existing FedRL algorithms on individual agents. In this paper, we take a further step and introduce a personalized FedRL framework (PFedRL) by taking advantage of possibly shared common structure among agents in heterogeneous environments. Specifically, we develop a class of PFedRL algorithms named PFedRL-Rep that learns (1) a shared feature representation collaboratively among all agents, and (2) an agent-specific weight vector personalized to its local environment. We analyze the convergence of PFedTD-Rep, a particular instance of the framework with temporal difference (TD) learning and linear representations. To the best of our knowledge, we are the first to prove a linear convergence speedup with respect to the number of agents in the PFedRL setting. To achieve this, we show that PFedTD-Rep is an example of federated two-timescale stochastic approximation with Markovian noise. Experimental results demonstrate that PFedTD-Rep, along with an extension to the control setting based on deep Q-networks (DQN), not only improves learning in heterogeneous settings, but also provides better generalization to new environments.
联邦强化学习(FedRL)使多个代理商能够合作学习一项政策,而不必分享其在代理-环境互动期间收集的当地轨迹;然而,在实践中,不同代理商所面临的环境往往各异,导致现有FedRL对个体代理商的算法所学的单一政策业绩不佳;在本文件中,我们进一步迈出一步,引入了个性化FedRL框架(PFedRL),利用不同环境中代理商之间可能共享的共同结构。具体地说,我们开发了一类名为PFedRL-Rep的PFedRL算法,学习(1) 所有代理商之间共同的特征代表,以及(2) 适合其当地环境的针对特定代理商的权重向量。我们分析了PFedTD-Rep的收敛性,这是框架中采用时间差分(TD)学习和线性表述的一个实例。据我们所知,我们首次证明了PFedRL设置中相对于代理商数量的线性收敛加速。为了实现这一点,我们表明PFedTD-Rep是带有马尔可夫噪声的联邦双时间尺度随机逼近的一个实例。实验结果表明,PFedTD-Rep及其基于深度Q网络(DQN)的控制设置扩展,不仅改进了异构环境中的学习,而且能更好地泛化到新环境。
Article 168
Title@2025-07-16 (3): VLMgineer: Vision Language Models as Robotic Toolsmiths
Title: VLMgineer: Vision Language Models as Robotic Toolsmiths | VLMgineer: Vision Language Models als Roboterwerkzeugmacher | VLMGineer:作为机器人工具匠的愿景语言模型 2507.12644v1 |
Authors (7): George Jiayuan Gao, Tianyu Li, Junyao Shi, Yihan Li, Zizhe Zhang, Nadia Figueroa, Dinesh Jayaraman
Tool design and use reflect the ability to understand and manipulate the physical world through creativity, planning, and foresight. As such, these capabilities are often regarded as measurable indicators of intelligence across biological species. While much of today’s research on robotic intelligence focuses on generating better controllers, inventing smarter tools offers a complementary form of physical intelligence: shifting the onus of problem-solving onto the tool’s design. Given the vast and impressive common-sense, reasoning, and creative capabilities of today’s foundation models, we investigate whether these models can provide useful priors to automatically design and effectively wield such tools. We present VLMgineer, a framework that harnesses the code generation abilities of vision language models (VLMs) together with evolutionary search to iteratively co-design physical tools and the action plans that operate them to perform a task. We evaluate VLMgineer on a diverse new benchmark of everyday manipulation scenarios that demand creative tool design and use. Across this suite, VLMgineer consistently discovers tools and policies that solve tasks more effectively and innovatively, transforming challenging robotics problems into straightforward executions. It also outperforms VLM-generated designs from human specifications and existing human-crafted tools for everyday tasks. To facilitate future research on automated tool invention, we will release our benchmark and code.
工具的设计和使用反映了通过创造性、规划和远见理解和操控物理世界的能力。 因此,这些能力常常被视为生物物种之间情报的可测量指标。 虽然当今关于机器人智能的研究大多侧重于产生更好的控制器,但发明更聪明的工具提供了一种补充性的物理智能形式:将解决问题的重心转移到工具的设计上。鉴于当今基础模型的庞大和令人印象深刻的常识、推理和创造性能力,我们调查这些模型是否能够提供有用的前科,自动设计和有效运用这些工具?我们提出了VLMginer,这个框架利用视觉语言模型(VLMM)的代码生成能力,同时进行进化搜索,以迭代地共同设计物理工具和操作这些工具的行动计划来执行任务。我们评估VLMGineer对需要创造性工具设计和使用的日常操纵情景的多种新基准。我们从这个套件中不断发现一些工具和政策,以更有效和创新的方式解决任务,将挑战性机器人问题转变为直截了的处决。它也超越了我们从现有人类工具中自动设计工具的规格和今后设计工具的自动设计。
Article 169
Title@2025-07-16 (3): Manify: A Python Library for Learning Non-Euclidean Representations
Title: Manify: A Python Library for Learning Non-Euclidean Representations | Manify: Eine Python-Bibliothek zum Lernen nicht-euklidischen Repräsentationen | 拼写:一个用于学习非欧洲语言代表的皮顿图书馆 2503.09576v2 |
Authors (5): Philippe Chlenski, Kaizhu Du, Dylan Satow, Raiyan R. Khan, Itsik Pe’er
We present Manify, an open-source Python library for non-Euclidean representation learning. Leveraging manifold learning techniques, Manify provides tools for learning embeddings in (products of) non-Euclidean spaces, performing classification and regression with data that lives in such spaces, estimating the curvature of a manifold, and more. Manify aims to advance research and applications in machine learning by offering a comprehensive suite of tools for manifold-based data analysis. Our source code, examples, and documentation are available at https://github.com/pchlenski/manify.
我们介绍一个开放源码的Python图书馆“Manify”(一个用于非欧洲语言代表学习的开放源码 Python 图书馆),利用多种学习技术,“Manify”(Manify)提供学习嵌入非欧洲语言空间(产品)的工具,对生活在这些空间的数据进行分类和回归,估计一个多种语言的曲线,等等。“Manify”(Manify)的目的是通过提供一整套基于多种语言的数据分析工具来推进机器学习的研究和应用。我们的源代码、实例和文件可在https://github.com/pchlenski/manify查阅。
Article 170
Title@2025-07-16 (3): Multi-task retriever fine-tuning for domain-specific and efficient RAG
Title: Multi-task retriever fine-tuning for domain-specific and efficient RAG | Multi-Task Retriever Feinabstimmung für domänenspezifische und effiziente RAG | 多任务检索器微调,用于特定领域和高效率的RAG 2501.04652v2 |
Authors (2): Patrice Béchard, Orlando Marquez Ayala
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical limitations such as generating hallucinated or outdated information. However, when building real-world RAG applications, practical issues arise. First, the retrieved information is generally domain-specific. Since it is computationally expensive to fine-tune LLMs, it is more feasible to fine-tune the retriever to improve the quality of the data included in the LLM input. Second, as more applications are deployed in the same real-world system, one cannot afford to deploy separate retrievers. Moreover, these RAG applications normally retrieve different kinds of data. Our solution is to instruction fine-tune a small retriever encoder on a variety of domain-specific tasks to allow us to deploy one encoder that can serve many use cases, thereby achieving low-cost, scalability, and speed. We show how this encoder generalizes to out-of-domain settings as well as to an unseen retrieval task on real-world enterprise use cases.
在部署大语言模型(LLMs)时,检索-加速一代(RAG)已经变得无处不在,因为它可以解决典型的局限性,例如产生幻觉或过时的信息。然而,在建立真实世界的RAG应用程序时,会出现实际问题。首先,检索的信息一般是特定域的信息。由于对微调LMS而言成本昂贵,因此更可行的做法是微调检索器,以提高LLM投入中所含数据的质量。第二,随着更多的应用程序被部署在同一个真实世界的系统中,人们无法使用单独的检索器。此外,这些RAG应用程序通常会检索不同种类的数据。我们的解决办法是,在各种特定域的任务上对小型检索器编码器进行微调,以使我们能够部署一个能为许多使用案例服务的编码器,从而实现低成本、可缩缩放性和速度。我们展示了这个编码器如何向外部环境一般化,以及对于现实世界企业使用案例的无形检索任务。
Article 171
Title@2025-07-16 (3): Reasoning-Finetuning Repurposes Latent Representations in Base Models
Title: Reasoning-Finetuning Repurposes Latent Representations in Base Models | Reasoning-Finetuning Repurposes Latente Darstellungen in Basismodellen | 基础模型中的重新目的前期代表 2507.12638v1 |
Authors (4): Jake Ward, Chuqiao Lin, Constantin Venhoff, Neel Nanda
Backtracking, an emergent behavior elicited by reasoning fine-tuning, has been shown to be a key mechanism in reasoning models’ enhanced capabilities. Prior work has succeeded in manipulating this behavior via steering vectors, but the underlying mechanism remains poorly understood. In this work, we show that the emergence of backtracking in DeepSeek-R1-Distill-Llama-8B is in part driven by a repurposed direction already present in base model activations. Specifically, we identify a direction in base Llama-3.1-8B’s residual stream which systematically induces backtracking when used to steer the distilled reasoning model, and find that the effects of steering with this direction cannot be trivially explained by token-level attributes. We further find that this direction does not induce backtracking in the base model, suggesting that the reasoning finetuning process repurposes pre-existing representations to form new behavioral circuits. Additionally, we hypothesize that this direction is one of several which may work together to mediate backtracking. Our findings offer a compelling picture that reasoning-finetuned models repurpose pre-existing base model representations, rather than learn new capabilities from scratch.
通过推理细微调整而发现的一种后行行为,已被证明是推理模型增强能力的关键机制。先前的工作成功地通过方向矢量控制了这一行为,但基本机制仍然不甚清楚。在这项工作中,我们显示DeepSeek-R1-Distill-Llama-8B的回跟踪出现部分是由基准模型激活中已经存在的重新定位方向驱动的。具体地说,我们确定了Llama-3.1-3.1-8B基础剩余流的方向,该方向在用来引导蒸馏的推理模型时系统地引导回溯跟踪,并发现以这一方向指导的效应不能用象征性的属性来轻描淡地解释。我们进一步发现,这一方向不会诱导基础模型的回溯跟踪,表明推理微调整过程将原有的演示重新定位用于形成新的行为模式。此外,我们假设这个方向是几个可以一起进行回溯跟踪的方向之一。我们的调查结果提供了令人信服的图片,即推理调整模型将原有基本模型显示为新能力,而不是从新的抓中学习新的能力。
Article 172
Title@2025-07-16 (3): LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Title: LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | LoRA Done RITE: Robuste Invariante Transformations-Equilibration für LoRA-Optimierung | Lora Done REITE: 优化 LoRA 的强劲的动态转型平衡 2410.20625v2 |
Authors (8): Jui-Nan Yen, Si Si, Zhao Meng, Felix Yu, Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh, Sanjiv Kumar
Low-rank adaptation (LoRA) is a widely used parameter-efficient finetuning method for LLMs that reduces memory requirements. However, current LoRA optimizers lack transformation invariance, meaning the actual updates to the weights depend on how the two LoRA factors are scaled or rotated. This deficiency leads to inefficient learning and sub-optimal solutions in practice. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization, which can achieve transformation invariance and remain computationally efficient. We provide theoretical analysis to demonstrate the benefit of our method and conduct experiments on various LLM tasks with different models including Gemma 2B, 7B, and mT5-XXL. The results demonstrate consistent improvements against existing optimizers. For example, replacing Adam with LoRA-RITE during LoRA fine-tuning of Gemma-2B yielded a 4.6% accuracy gain on Super-Natural Instructions and a 3.5% accuracy gain across four other LLM benchmarks (HellaSwag, ArcChallenge, GSM8K, OpenBookQA).
低级别适应(LORA)是LLM的一种广泛使用的高效参数微调方法,可以减少记忆要求;然而,目前的LORA优化剂缺乏变异性,这意味着对重量的实际更新取决于两个LORA因素的缩放或旋转方式;这一缺陷导致在实践中学习效率低下和次优的解决方案;本文介绍了LORA优化的一种创新的适应矩阵前导方法,即LORA-REITE,它可以实现变异并保持计算效率;我们提供了理论分析,以证明我们的方法的好处,并用不同的模型对LLLM任务进行实验,包括Gemma 2B、7B和mT5-XXL。结果显示与现有的优化器相比,不断有改进。例如,在LORA微调Gemma-2B期间,用LORA-REITE取代Adam,在超自然教学中取得了4.6的精准收益,在其他四个LM基准(HellaSwag、Arcchallenge、GSM8K、OpenBQA)中实现了3.5的精准收益。
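The lack of transformation invariance mentioned in the abstract above can be seen in a small numerical example. The Python sketch below uses plain gradient descent on the two LoRA factors (not Adam, and not LoRA-RITE itself); the shapes, the random stand-in gradient `G`, the scaling factor, and the learning rate are arbitrary assumptions. Two factorizations of the same weight update start from the same product but produce different products after one step.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))   # LoRA factor, shape (d_out, r)
A = rng.normal(size=(2, 5))   # LoRA factor, shape (r, d_in)
G = rng.normal(size=(6, 5))   # stand-in for dLoss/dW evaluated at W = B @ A

def sgd_step(B, A, G, lr=0.01):
    """One gradient step on the factors, given the gradient G w.r.t. B @ A."""
    grad_B = G @ A.T   # chain rule through W = B @ A
    grad_A = B.T @ G
    return B - lr * grad_B, A - lr * grad_A

s = 10.0
B_scaled, A_scaled = B * s, A / s   # rescaled factors with the same product
same_start = np.allclose(B @ A, B_scaled @ A_scaled)

B1, A1 = sgd_step(B, A, G)
B2, A2 = sgd_step(B_scaled, A_scaled, G)
update_gap = np.linalg.norm(B1 @ A1 - B2 @ A2)
print(same_start, update_gap)
```

Because `G` depends only on the product `B @ A`, both runs see the same loss landscape, so the nonzero `update_gap` comes entirely from how the factors were scaled. A transformation-invariant optimizer would make this gap zero.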
Article 173
Title@2025-07-16 (3): Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models
Title: Escaping Plato’s Cave: JAM for Aligning Independently Trained Vision and Language Models | Escaping Platons Cave: JAM for Aligning Independently Trained Vision and Language Models | 脱离柏拉图的洞穴:调整独立培训的愿景和语言模式的JAM 2507.01201v4 |
Authors (3): Lauren Hyoseo Yoon, Yisong Yue, Been Kim
Independently trained vision and language models inhabit disjoint representational spaces, shaped by their respective modalities, objectives, and architectures. Yet an emerging hypothesis - the Platonic Representation Hypothesis - suggests that such models may nonetheless converge toward a shared statistical model of reality. This compatibility, if it exists, raises a fundamental question: can we move beyond post-hoc statistical detection of alignment and explicitly optimize for it between such disjoint representations? We cast this Platonic alignment problem as a multi-objective optimization task - preserve each modality’s native structure while aligning for mutual coherence. We introduce the Joint Autoencoder Modulator (JAM) framework that jointly trains modality-specific autoencoders on the latent representations of pre-trained single modality models, encouraging alignment through both reconstruction and cross-modal objectives. By analogy, this framework serves as a method to escape Plato’s Cave, enabling the emergence of shared structure from disjoint inputs. We evaluate this framework across three critical design axes: (i) the alignment objective - comparing contrastive loss (Con), its hard-negative variant (NegCon), and our Spread loss, (ii) the layer depth at which alignment is most effective, and (iii) the impact of foundation model scale on representational convergence. Our findings show that our lightweight Pareto-efficient framework reliably induces alignment, even across frozen, independently trained representations, offering both theoretical insight and practical pathways for transforming generalist unimodal foundations into specialist multimodal models.
经过独立训练的愿景和语言模型位于不同代表空间,由各自的模式、目标和结构决定。但新出现的假设—-平板代表假设假设—-表明这些模型可能仍然会趋向于共同的现实统计模式。如果存在这种兼容性,则提出一个根本问题:我们能否超越热后统计检测对一致性的检测,明确优化这种不一致性代表之间的匹配?我们把这种平板对齐问题作为一个多目标优化任务――维护每种模式的本体结构,同时为相互一致而协调。我们引入了联合自动编码模型(JAM)框架,共同培训特定模式的自动调整模型,以了解事先培训过的单一模式模式模式的潜在表现,鼓励通过重建和交叉模式目标加以调整。类推,这一框架可以作为一种摆脱柏托洞穴的方法,使共同结构能够从不协调的投入中出现。我们从三个关键设计轴对这个框架进行了评估:(一) 调整目标――比较性专家模型(Con),其硬性反差变量(Necon),以及我们横向结构结构结构结构的调整,这显示了我们总体结构结构的深度,显示了我们整个结构结构结构结构结构的深度。
Article 174
Title@2025-07-16 (3): Conformal inference for regression on Riemannian Manifolds
Title: Conformal inference for regression on Riemannian Manifolds | Konforme Schlussfolgerung zur Regression auf Riemannische Manifolds | 里伊曼尼马内佛山回归的正规推论 2310.08209v2 |
Authors (3): Alejandro Cholaquidis, Fabrice Gamboa, Leonardo Moreno
Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for non-Euclidean data. Circular data is a classic example, as are data in the space of covariance matrices and data on the Grassmannian manifold obtained from principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios in which the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in a Euclidean space. This extends the concepts delineated in \cite{waser14} to this novel context. In line with traditional principles of conformal inference, these prediction sets are distribution-free, meaning that no specific assumptions are imposed on the joint distribution of $(X,Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
近年来,由于大量应用非欧洲裔数据,对元件的回归以及更广泛的关于元件的统计,近年来已变得非常重要。循环数据是一个典型的例子,但是在共变矩阵空间中的数据也是如此。通过主要组成部分分析等主要组成部分分析获得的格拉斯曼元数据也是如此。在这项工作中,当反应变量(用Y美元表示)位于一个方位时,当反应变量(用X美元表示)位于一个方位时,回归假设的预测数组就显得相当重要,而以X美元表示的可共变值则存在于欧洲裔国家空间中。这把在\cite{waster14}中描述的概念延伸至这个新的背景。与传统原则一致的一致,这些预测组是没有分布的,表明没有对美元(X)Y)美元的联合分配作任何具体假设,而且它们保持非对称性。我们证明,这些区域的经验版本几乎可以肯定地结合到其人口对应方位。这种方法的效率通过全面模拟研究以及涉及真实世界数据的分析来显示。
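As a rough illustration of distribution-free prediction sets for a manifold-valued response, the sketch below applies standard split conformal prediction to circular data, using geodesic (arc-length) distance as the nonconformity score. This is a generic construction under stated assumptions, not necessarily the one used in the paper; the data generator `sample` and the point predictor `predict` are hypothetical.

```python
import numpy as np

def geodesic_distance(a, b):
    """Arc-length distance between angles a and b on the unit circle, in [0, pi]."""
    diff = (a - b + np.pi) % (2 * np.pi) - np.pi
    return np.abs(diff)

def conformal_radius(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))    # rank of the order statistic to use
    if k > n:
        return np.pi                            # too few points: the set is the whole circle
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(0)

def sample(n):
    """Hypothetical data: Euclidean covariate x, angle-valued response y."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = (2.0 * x + rng.normal(scale=0.3, size=n)) % (2 * np.pi)
    return x, y

def predict(x):
    """Hypothetical point predictor; in practice it would be fitted on a separate split."""
    return (2.0 * x) % (2 * np.pi)

x_cal, y_cal = sample(500)
radius = conformal_radius(geodesic_distance(y_cal, predict(x_cal)), alpha=0.1)

# The prediction set at x is the arc {y : geodesic_distance(y, predict(x)) <= radius}.
x_test, y_test = sample(2000)
coverage = np.mean(geodesic_distance(y_test, predict(x_test)) <= radius)
print(f"arc half-width: {radius:.3f} rad, empirical coverage: {coverage:.3f}")
```

The same recipe carries over to other manifolds by swapping in the appropriate geodesic distance, at the cost of prediction sets that are geodesic balls rather than arcs.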
Article 175
Title@2025-07-16 (3): Active Human Feedback Collection via Neural Contextual Dueling Bandits
Title: Active Human Feedback Collection via Neural Contextual Dueling Bandits | Aktive menschliche Feedback-Sammlung über neurale Kontext-Duellbanditen | 通过神经环境授权强盗收集活性人类反馈 2504.12016v2 |
Authors (5): Arun Verma, Xiaoqiang Lin, Zhongxiang Dai, Daniela Rus, Bryan Kian Hsiang Low
Collecting human preference feedback is often expensive, leading recent works to develop principled algorithms to select them more efficiently. However, these works assume that the underlying reward function is linear, an assumption that does not hold in many real-life applications, such as online recommendation and LLM alignment. To address this limitation, we propose Neural-ADB, an algorithm based on the neural contextual dueling bandit framework that provides a principled and practical method for collecting human preference feedback when the underlying latent reward function is non-linear. We theoretically show that when preference feedback follows the Bradley-Terry-Luce model, the worst sub-optimality gap of the policy learned by Neural-ADB decreases at a sub-linear rate as the preference dataset increases. Our experimental results on preference datasets further corroborate the effectiveness of Neural-ADB.
收集人类偏好反馈往往费用高昂,导致最近制定原则性算法以更有效地选择这些算法的工作,然而,这些计算法假定,基本的奖励功能是线性的,这种假设在许多实际应用中并不存在,例如在线建议和LLM对齐。为解决这一局限性,我们提议神经-亚银,这是一种基于神经环境背景的强盗框架的算法,它提供了在潜在的潜在奖励功能非线性时收集人类偏爱反馈的原则性和实用方法。我们理论上表明,当偏爱反馈遵循布拉德利-泰里-卢斯模式时,当偏爱反馈遵循布拉德利-特拉伊-卢斯模式时,神经-亚银所学政策中最差的次优化差距随着偏好数据集的增加以亚线性速递减。我们在偏好数据集方面的实验结果进一步证实了Neural-亚行的有效性。
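The Bradley-Terry-Luce assumption mentioned in this abstract links a latent reward to pairwise preference feedback: the probability that one item is preferred is a logistic function of the reward difference. The sketch below shows only that link, with a hypothetical one-hidden-layer ReLU reward standing in for a non-linear reward function; it is not the Neural-ADB algorithm, and all names are illustrative.

```python
import numpy as np

def latent_reward(x, w_hidden, w_out):
    """Hypothetical non-linear reward: one hidden ReLU layer (not the paper's network)."""
    return float(np.maximum(w_hidden @ x, 0.0) @ w_out)

def btl_preference_prob(reward_a, reward_b):
    """Bradley-Terry-Luce probability that item a is preferred over item b."""
    return 1.0 / (1.0 + np.exp(-(reward_a - reward_b)))

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(8, 3))
w_out = rng.normal(size=8)

# Two candidate items (for example, two responses in the same context).
x_a, x_b = rng.normal(size=3), rng.normal(size=3)
p_a_wins = btl_preference_prob(latent_reward(x_a, w_hidden, w_out),
                               latent_reward(x_b, w_hidden, w_out))

# Simulated binary human feedback: True means "a preferred over b".
feedback = bool(rng.random() < p_a_wins)
print(f"P(a preferred over b) = {p_a_wins:.3f}, sampled feedback: {feedback}")
```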
Article 176
Title@2025-07-16 (3): Hamiltonian Neural Networks approach to fuzzball geodesics
Title: Hamiltonian Neural Networks approach to fuzzball geodesics | Hamiltonian Neural Networks Ansatz für Fuzzball Geodäsie | 汉密尔顿神经网络法 2502.20881v3 |
Authors (5): Andrea Cipriani, Alessandro De Santis, Giorgio Di Russo, Alfredo Grillo, Luca Tabarroni
The recent increase in computational resources and data availability has led to a significant rise in the use of Machine Learning (ML) techniques for data analysis in physics. However, the application of ML methods to solve differential equations capable of describing even complex physical systems is not yet widespread in theoretical high-energy physics. Hamiltonian Neural Networks (HNNs) are tools that minimize a loss function defined to solve Hamilton’s equations of motion. In this work, we implement several HNNs trained to solve, with high accuracy, Hamilton’s equations for a massless probe moving inside a smooth and horizonless geometry known as the D1-D5 circular fuzzball. We study both planar (equatorial) and non-planar geodesics in different regimes according to the impact parameter, some of which are unstable. Our findings suggest that HNNs could eventually replace standard numerical integrators, as they are equally accurate but more reliable in critical situations.
最近计算资源和数据提供量的增加导致物理学数据分析中机器学习技术的使用显著增加。然而,在理论高能物理中,应用ML方法解决能够描述甚至复杂的物理系统的差别方程式尚未完全普及。汉密尔顿神经网络(HNNs)是最大限度地减少一种损失功能的工具,而这种损失功能是用来解决汉密尔顿运动方程式的工具。在这项工作中,我们实施了若干HNS培训,以便以高精准的方式解决无质量探测器的汉密尔顿方程式,该方程式将移动在一个称为D1-D5圆形的平滑和无地平面地质学中。我们根据撞击参数在不同系统中研究平面(赤道)和非平面大地学,其中一些是不稳定的。我们的研究结果表明,HNNS最终可以取代标准的数字化器,因为它们在危急情况下同样准确,但更可靠。
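A loss "defined to solve Hamilton's equations" can be sketched as the mean squared residual of dq/dt = ∂H/∂p and dp/dt = -∂H/∂q along a trajectory. The example below is a minimal illustration under strong assumptions: a one-dimensional harmonic oscillator replaces the fuzzball geometry, closed-form candidate Hamiltonians replace a neural network, and central finite differences replace automatic differentiation. The paper's actual loss may differ.

```python
import numpy as np

def hamiltonian_grads(H, q, p, eps=1e-5):
    """Central finite-difference estimates of dH/dq and dH/dp."""
    dH_dq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
    dH_dp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
    return dH_dq, dH_dp

def hamilton_residual_loss(H, q, p, q_dot, p_dot):
    """Mean squared violation of dq/dt = dH/dp and dp/dt = -dH/dq."""
    dH_dq, dH_dp = hamiltonian_grads(H, q, p)
    return float(np.mean((dH_dp - q_dot) ** 2 + (dH_dq + p_dot) ** 2))

# Exact trajectory of a unit-mass, unit-frequency harmonic oscillator.
t = np.linspace(0.0, 2.0 * np.pi, 200)
q, p = np.cos(t), -np.sin(t)
q_dot, p_dot = -np.sin(t), -np.cos(t)

def H_true(q, p):
    return 0.5 * (q ** 2 + p ** 2)      # the Hamiltonian that generated the data

def H_wrong(q, p):
    return 0.5 * q ** 2 + p ** 2        # mis-specified candidate

loss_true = hamilton_residual_loss(H_true, q, p, q_dot, p_dot)
loss_wrong = hamilton_residual_loss(H_wrong, q, p, q_dot, p_dot)
print(f"residual loss, true H: {loss_true:.2e}; wrong H: {loss_wrong:.2e}")
```

In an HNN, `H_true` and `H_wrong` would be replaced by a parameterized network and the loss would be minimized over its weights.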
Article 177
Title@2025-07-16 (3): BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training
Title: BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training | BootSeer: Analysieren und Abmildern von Initialisierungsengpässen im großformatigen LLM-Training | BoutSeer:大规模LLM培训中分析和减缓初始化瓶颈 2507.12619v1 |
Authors (17): Rui Li, Xiaoyun Zhi, Jinxin Chi, Menghan Yu, Lixin Huang, Jia Zhu, Weilun Zhang, Xing Ma, Wenjia Liu, Zhicheng Zhu, Daowen Luo, Zuquan Song, Xin Yin, Chao Xiang, Shuguang Wang, Wencong Xiao, Gene Cooperman
Large Language Models (LLMs) have become a cornerstone of modern AI, driving breakthroughs in natural language processing and expanding into multimodal jobs involving images, audio, and video. As with most computational software, it is important to distinguish between ordinary runtime performance and startup overhead. Prior research has focused on runtime performance: improving training efficiency and stability. This work focuses instead on the increasingly critical issue of startup overhead in training: the delay before training jobs begin execution. Startup overhead is particularly important in large, industrial-scale LLMs, where failures occur more frequently and multiple teams operate in iterative update-debug cycles. In one of our training clusters, more than 3.5% of GPU time is wasted due to startup overhead alone. In this work, we present the first in-depth characterization of LLM training startup overhead based on real production data. We analyze the components of startup cost, quantify its direct impact, and examine how it scales with job size. These insights motivate the design of Bootseer, a system-level optimization framework that addresses three primary startup bottlenecks: (a) container image loading, (b) runtime dependency installation, and (c) model checkpoint resumption. To mitigate these bottlenecks, Bootseer introduces three techniques: (a) hot block record-and-prefetch, (b) dependency snapshotting, and (c) striped HDFS-FUSE. Bootseer has been deployed in a production environment and evaluated on real LLM training workloads, demonstrating a 50% reduction in startup overhead.
大型语言模型(LLMS)已成为现代AI的基石,推动了自然语言处理的突破,并发展成涉及图像、音频和视频的多式联运工作。与大多数计算软件一样,重要的是区分普通运行时间性能和启动间接费用。先前的研究侧重于运行时间性能:提高培训效率和稳定性。这项工作侧重于培训启动间接费用这一日益紧迫的问题:培训工作开始之前的延误。在大型工业规模LMS中,启动间接费用特别重要,因为失败发生频率更高,多个团队在迭接更新-调试周期运作。在我们的培训集群中,超过3.5%的GPU时间因启动间接费用而浪费。在这项工作中,我们根据实际生产数据对LLMM培训启动间接费用的首次深入描述。我们分析了启动费用的各个组成部分,量化了其直接影响,并考察了它与工作规模的大小。这些洞察力激励了Boutseer的设计,一个系统级优化框架,解决了三个初级启动瓶颈:(a) 集装箱图像装载,(b) 启动前期依赖性工序的SBRADR(c) 启动阶段的升级、升级和升级(c) 升级升级的SBOUDFS) 和升级记录。
Article 178
Title@2025-07-16 (3): Boolformer: Symbolic Regression of Logic Functions with Transformers
Title: Boolformer: Symbolic Regression of Logic Functions with Transformers | Boolformer: Symbolische Regression von logischen Funktionen mit Transformern | 布尔: 带有变换器的逻辑函数的符号回归 2309.12207v2 |
Authors (6): Stéphane d’Ascoli, Arthur Renard, Vassilis Papadopoulos, Samy Bengio, Josh Susskind, Emmanuel Abbé
We introduce Boolformer, a Transformer-based model trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions not seen during training, given their full truth table. Then, we demonstrate that even with incomplete or noisy observations, Boolformer is still able to find good approximate expressions. We evaluate Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modeling the dynamics of gene regulatory networks and show through a benchmark that Boolformer is competitive with state-of-the-art genetic algorithms, with a speedup of several orders of magnitude. Our code and models are available publicly.
我们引入了Bulleon, 这是一种以变压器为基础的模型, 受过训练, 以进行布尔功能的端到端的象征性回归。 首先, 我们显示它可以预测训练期间看不到的复杂功能的紧凑公式, 并且考虑到它们的全部真实性表。 然后, 我们证明即使观测不全或者吵闹, Bolleon 仍然能找到良好的大致表达方式。 我们用一套广泛的真实世界二进制分类数据集来评估Bulleon, 表明它作为经典机器学习方法的可解释替代方法的潜力。 最后, 我们将其应用到基因管理网络动态模型的广泛任务中, 通过一个基准显示, Bulleorent 具有与最新基因算法的竞争力, 并有几级的快速级。 我们的代码和模型可以公开使用。
Article 179
Title@2025-07-16 (3): Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition
Title: Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition | Quantum HyperNetworks: Training von Binary Neural Networks in der Quantenüberlagerung | 量子超超网络:在量子叠置方面培训二元神经网络 2301.08292v2 |
Authors (7): Juan Carrasquilla, Mohamed Hibat-Allah, Estelle Inack, Alireza Makhzani, Kirill Neklyudov, Graham W. Taylor, Giacomo Torlai
Binary neural networks, i.e., neural networks whose parameters and activations are constrained to only two possible values, offer a compelling avenue for the deployment of deep learning models on energy- and memory-limited devices. However, their training, architectural design, and hyperparameter tuning remain challenging as these involve multiple computationally expensive combinatorial optimization problems. Here we introduce quantum hypernetworks as a mechanism to train binary neural networks on quantum computers, unifying the search over parameters, hyperparameters, and architectures in a single optimization loop. Through classical simulations, we demonstrate that our approach effectively finds optimal parameters, hyperparameters, and architectural choices with high probability on classification problems including a two-dimensional Gaussian dataset and a scaled-down version of the MNIST handwritten digits dataset. We represent our quantum hypernetworks as variational quantum circuits, and find that an optimal circuit depth maximizes the probability of finding performant binary neural networks. Our unified approach provides an immense scope for other applications in the field of machine learning.
二线神经网络,即神经网络,其参数和激活仅受两个可能值的限制,为部署能源和内存限制装置的深学习模型提供了一条令人信服的途径。然而,它们的训练、建筑设计和超参数调整仍然具有挑战性,因为这些网络涉及多种计算成本昂贵的组合优化问题。在这里,我们引入了量子超网络,作为在量子计算机上培训双线神经网络的一种机制,这些网络将参数、超参数和结构的搜索统一在一个单一优化循环中。通过古典模拟,我们证明我们的方法有效地找到了最理想的参数、超参数和建筑选择,在分类问题上极有可能找到,包括二维高斯数据集和缩缩放的MNISST手写数字。我们把我们的量子超网络作为变异量量量量量子电路,发现最佳电路深将找到性能双线网络的概率最大化。我们的统一方法为机器学习领域的其他应用提供了巨大的空间。
Article 180
Title@2025-07-16 (3): Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning
Title: Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning | Lernen, was zählt: Probabilistische Aufgabenauswahl über Gegenseitige Informationen zur Modellfeinsteuerung | 学习什么重要:通过相互信息选择任务概率选择,用于示范微调 2507.12612v1 |
Authors (6): Prateek Chanda, Saral Sureka, Parth Pratim Chatterjee, Krishnateja Killamsetty, Nikhil Shivakumar Nayak, Ganesh Ramakrishnan
The performance of finetuned large language models (LLMs) hinges critically on the composition of the training mixture. However, selecting an optimal blend of task datasets remains a largely manual, heuristic-driven process, with practitioners often relying on uniform or size-based sampling strategies. We introduce TASKPGM, a principled and scalable framework for mixture optimization that selects continuous task proportions by minimizing an energy function over a Markov Random Field (MRF). Task relationships are modeled using behavioral divergences such as Jensen-Shannon divergence and pointwise mutual information computed from the predictive distributions of single-task finetuned models. Our method yields a closed-form solution under simplex constraints and provably balances representativeness and diversity among tasks. We provide theoretical guarantees, including weak submodularity for budgeted variants, and demonstrate consistent empirical improvements on Llama 2 and Mistral across evaluation suites such as MMLU and BIGBench. Beyond performance, TASKPGM offers interpretable insights into task influence and mixture composition, making it a powerful tool for efficient and robust LLM finetuning.
微调的大型语言模型(LLMS)的性能取决于培训组合的构成。然而,选择任务数据集的最佳组合在很大程度上仍是一个人工的、累赘驱动的过程,从业者往往依靠统一或大小的抽样战略。我们引入了TASKPGM,这是混合优化的一个原则性和可扩缩的框架,通过在Markov随机场(MRF)上最大限度地减少能源功能来选择连续的任务比例。任务关系采用行为差异的模式,如Jensen Shannon Divergence和根据单项任务调整模型的预测分布计算出的点性相互信息。我们的方法在简单x的限制下产生了一种封闭式的解决方案,并可能平衡了任务之间的代表性和多样性。我们提供了理论保障,包括预算变式的次模式薄弱,并展示了Llama 2和Mistral在诸如MLMLU和BIGBench等评价套房中的持续经验改进。除了业绩外,TASKPGMGM提供可解释的任务影响和混合物构成,使其成为高效和稳健健的LM微的辅助工具。
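The Jensen-Shannon divergence named in this abstract as a behavioral divergence between tasks can be computed as follows. This is a minimal sketch: the three predictive distributions are made-up numbers over a toy four-token vocabulary, and how TASKPGM aggregates such divergences into its energy function is not shown because the abstract does not specify it.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats, ignoring entries where p is zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)                   # mixture; positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical predictive distributions of three single-task finetuned models
# on the same input.
task_predictions = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
])

n_tasks = len(task_predictions)
pairwise_jsd = np.array([
    [jensen_shannon_divergence(task_predictions[i], task_predictions[j])
     for j in range(n_tasks)]
    for i in range(n_tasks)
])
print(np.round(pairwise_jsd, 4))
```

Tasks 0 and 1 behave similarly and get a small divergence, while task 2 is far from both; the value is bounded above by log 2 when natural logarithms are used.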
Article 181
Title@2025-07-16 (3): Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies
Title: Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies | Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken | 混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策 2403.10996v6 |
Authors (3): Chinmay Vilas Samak, Tanmay Vilas Samak, Venkat Narayan Krovi
Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems usually requires a significantly long training time due to their inherent complexity. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physical embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work seeks to address these pain points by presenting a mixed-reality (MR) digital twin (DT) framework capable of: (i) boosting training speeds by selectively scaling parallelized simulation workloads on-demand, and (ii) immersing the MARL policies across hybrid simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer, across both case studies. Results indicate up to 76.3% reduction in training time with the proposed parallelization scheme and sim2real gap as low as 2.9% using the proposed deployment method.
由于网络物理车辆系统的多剂强化学习(MARL)通常需要相当长的培训时间,因为其内在的复杂性,因此,在现实世界中部署经过训练的政策需要具有丰富特点的环境以及多种物理成形剂,由于货币、物理、能源或安全方面的限制,这可能不可行。这项工作力求解决这些痛苦点,办法是提出一个混合现实(MR)数字双胞胎(DT)框架,能够:(一) 通过有选择地根据需求扩大平行模拟工作量,提高培训速度;(二) 在混合模拟到现实(im2real)试验中浸泡出MARL政策,通过两个有代表性的使用案例来强调拟议框架的可行性和绩效,这两个案例涉及MARL问题的合作和竞争性类别。我们研究:(一) 代理和环境对培训时间的平行效应,以及(二) 两种案例研究对零点成双向的模拟转移的系统性域随机化效果。结果显示,与拟议的平行计划的培训时间减少76.3%,与使用拟议部署方法的轻度为2.9%的模拟差距,低于2.9%。
Article 182
Title@2025-07-16 (3): The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix and Tensor Factorization
Title: The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix and Tensor Factorization | Das Zielpolnisch: Ein neuer Ansatz für eine nicht-negative Matrix und Tensor-Fabrikierung | 目标波兰:对外部-外部-相对非消极矩阵和电文因素化的新办法 2507.10484v2 |
Authors (3): Paul Fogel, Christophe Geissler, George Luta
This paper introduces the “Target Polish,” a robust and computationally efficient framework for nonnegative matrix and tensor factorization. Although conventional weighted NMF approaches are resistant to outliers, they converge slowly due to the use of multiplicative updates to minimize the objective criterion. In contrast, the Target Polish approach remains compatible with the Fast-HALS algorithm, which is renowned for its speed, by adaptively smoothing the data with a weighted median-based transformation. This innovation provides outlier resistance while maintaining the highly efficient additive update structure of Fast-HALS. Empirical evaluations using image datasets corrupted with structured (block) and unstructured (salt) noise demonstrate that the Target Polish approach matches or exceeds the accuracy of state-of-the-art robust NMF methods and reduces computational time by an order of magnitude in the studied scenarios.
本文介绍“波兰目标”,这是一个用于非负矩阵和振动因子化的强大和计算效率高的框架。虽然常规的加权NMF方法对离子不具有抗力,但由于使用多倍式更新以尽量减少客观标准,它们缓慢地趋同。相比之下,目标波兰方法仍然与以速度而闻名的快速HALS算法相容,该算法采用加权中位变换法对数据进行适应性平滑。这一创新在保持快速HALS高效添加更新结构的同时,提供了异常的阻力。 使用结构化(区块)和非结构化(盐类)噪音的图像数据集进行的经验评估表明,目标波兰方法符合或超过最先进的稳健NMF方法的准确度,并在研究的情景中以数量顺序减少计算时间。
Article 183
Title@2025-07-16 (3): Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?
Title: Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? | Sind Encoder in der Lage, Landmarken für den Warmstart der Hyperparameter-Optimierung zu lernen? | 编码员能否学习取暖启动超参数优化的地标? 2507.12604v1 |
Authors (2): Antoni Zajko, Katarzyna Woźnica
Effectively representing heterogeneous tabular datasets for meta-learning purposes is still an open problem. Previous approaches rely on representations that are intended to be universal. This paper proposes two novel methods for tabular representation learning tailored to a specific meta-task: warm-starting Bayesian Hyperparameter Optimization. Both follow a requirement that we formulate, which forces the representations to capture the properties of landmarkers. The first approach involves deep metric learning, while the second is based on landmarker reconstruction. We evaluate the proposed encoders in two ways. Besides the gain in the target meta-task, we also use the degree to which the proposed requirement is fulfilled as an evaluation metric. Experiments demonstrate that while the proposed encoders can effectively learn representations aligned with landmarkers, these representations may not directly translate to significant performance gains in the meta-task of HPO warm-starting.
有效代表用于元学习目的的多式表格数据集仍然是一个尚未解决的问题。 以往的做法依赖于意在普遍性的表述。 本文件建议了两种新的列表代表学习方法,这些方法针对特定元任务,即热启动的巴伊西亚超参数优化。两种方法都遵循我们自己制定的具体要求,即强制进行表述,以捕捉地标人的特性。第一种方法涉及深层次的计量学习,而第二种方法则基于地标人的重建。我们以两种方式评估拟议的编码器。除了目标元任务中的好处外,我们还使用实现拟议要求的程度作为评价指标。实验表明,虽然拟议的编码能够有效地学习与地标人相一致的表述,但它们可能不会直接转化为在热启动的元任务中取得显著的业绩收益。
Article 184
Title@2025-07-16 (3): A Survey of Explainable Reinforcement Learning: Targets, Methods and Needs
Title: A Survey of Explainable Reinforcement Learning: Targets, Methods and Needs | Eine Übersicht über das Erklärbare Verstärkte Lernen: Ziele, Methoden und Bedürfnisse | 《可解释的强化学习调查:目标、方法和需要》 2507.12599v1 |
Authors (1): Léo Saulières
The success of recent Artificial Intelligence (AI) models has been accompanied by the opacity of their internal mechanisms, due notably to the use of deep neural networks. In order to understand these internal mechanisms and explain the output of these AI models, a set of methods has been proposed, grouped under the domain of eXplainable AI (XAI). This paper focuses on a sub-domain of XAI, called eXplainable Reinforcement Learning (XRL), which aims to explain the actions of an agent that has learned by reinforcement learning. We propose an intuitive taxonomy based on two questions: “What” and “How”. The first question focuses on the target that the method explains, while the second relates to the way the explanation is provided. We use this taxonomy to provide a state-of-the-art review of over 250 papers. In addition, we present a set of domains close to XRL, which we believe should get attention from the community. Finally, we identify some needs for the field of XRL.
最近人工智能(AI)模式的成功伴随着其内部机制的不透明,这主要是因为使用了深神经网络。为了理解这些内部机制并解释这些光学模型的产出,提出了一套方法,在可氧化的AI(XAI)领域分类。本文件侧重于XAI的一个子领域,称为可氧化的强化学习(XRL),目的是解释一个通过强化学习学到的代理人的行为。我们建议根据“什么”和“如何”这两个问题进行直觉分类。我们提出的第一个问题侧重于该方法解释的目标,而第二个问题则涉及解释的方式。我们利用这一分类对250多份文件进行最新的审查。此外,我们提出了一组与XRL相近的域,我们认为应当引起社区的注意。我们提出了XRL领域的一些需要。
Article 185
Title@2025-07-16 (3): SCULPT: Systematic Tuning of Long Prompts
Title: SCULPT: Systematic Tuning of Long Prompts | SCULPT: Systematisches Tuning von langen Prompts | SCULPT: 长期提示系统图示 2410.20788v3 |
Authors (6): Shanu Kumar, Akhila Yesantarao Venkata, Shubhanshu Khandelwal, Bishal Santra, Parag Agrawal, Manish Gupta
Prompt optimization is essential for effective utilization of large language models (LLMs) across diverse tasks. While existing optimization methods are effective in optimizing short prompts, they struggle with longer, more complex ones, often risking information loss and being sensitive to small perturbations. To address these challenges, we propose SCULPT (Systematic Tuning of Long Prompts), a framework that treats prompt optimization as a hierarchical tree refinement problem. SCULPT represents prompts as tree structures, enabling targeted modifications while preserving contextual integrity. It employs a Critic-Actor framework that generates reflections and applies actions to refine the prompt. Evaluations demonstrate SCULPT’s effectiveness on long prompts, its robustness to adversarial perturbations, and its ability to generate high-performing prompts even without any initial human-written prompt. Compared to existing state-of-the-art methods, SCULPT consistently improves LLM performance by preserving essential task information while applying structured refinements. Both qualitative and quantitative analyses show that SCULPT produces more stable and interpretable prompt modifications, ensuring better generalization across tasks.
快速优化是有效利用大型语言模型(LLMS)完成不同任务的关键。虽然现有的优化方法在优化短效提示方面是有效的,但它们与长效、更复杂的方法挣扎,往往冒着信息丢失的风险,对小扰动很敏感。为了应对这些挑战,我们提议ScULPT(长效提示系统图),这个框架将快速优化视为一个分级的树细化问题。SCULPT代表了树结构的灵敏度,在保持背景完整性的同时进行有针对性的修改。它使用一个Critic-Actor框架来产生反省,并采取行动来改进快速的。评价表明SCULPT在长效上的有效性,它对对抗性扰动的坚固性,以及即使没有初步的人写速度也能产生高性提示的能力。与艺术方法的现有状况相比,SCULPT在使用结构完善的同时通过保存基本的任务信息不断提高LMM的性能。 定性和定量分析都表明,SCULPT产生更稳定且可解释的及时修改,确保任务之间更加普遍化。
Article 186
Title@2025-07-16 (3): Nonparametric IPSS: Fast, flexible feature selection with false discovery control
Title: Nonparametric IPSS: Fast, flexible feature selection with false discovery control | Nichtparametrischer IPSS: Schnelle, flexible Feature-Auswahl mit falscher Discovery-Steuerung | 非参数IPSS:采用虚假发现控制快速、灵活地选择特征 2410.02208v3 |
Authors (3): Omar Melikechi, David B. Dunson, Jeffrey W. Miller
Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives. Here, we introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores. The method is nonparametric whenever the importance scores are nonparametric, and it estimates q-values, which are better suited to high-dimensional data than p-values. We focus on two special cases using importance scores from gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive nonlinear simulations with RNA sequencing data show that both methods accurately control the false discovery rate and detect more true positives than existing methods. Both methods are also efficient, running in under 20 seconds when there are 500 samples and 5000 features. We apply IPSSGB and IPSSRF to detect microRNAs and genes related to cancer, finding that they yield better predictions with fewer features than existing approaches.
然而,现有的特征选择方法要么(一) 依赖线性或通用线性模型等参数学方法,要么(二) 缺乏理论假发现控制,要么(三) 辨别一些真实的正数。在这里,我们采用基于对任意特征重要分数应用综合路径稳定性选择(IPSS)的有限和模样的虚假发现控制方法,采用一般特征选择方法,根据对任意特征重要分数应用固定路径选择(IPSS)来进行一定的虚假发现控制。当重要分数不是参数学分时,这种方法是非参数学分,它估计q值,这些值比p值更适合高维度数据。我们侧重于两个特殊案例,利用梯度加速和随机森林(IPSSRF)的重要分数。用RNA测序数据进行广泛的非线性模拟表明,两种方法都准确控制了虚假发现率,并检测出比现有方法更真实的正数。两种方法的效率也不到20秒,当有500个样本和5000个特征时,我们使用IPSSGB和IPSSRF来检测与癌症有关的微RNA和基因,发现它们具有比现有方法更佳的特性。
Article 187
Title@2025-07-16 (3): Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach
Title: Cross-Problem Parameter Transfer in Quantum Approximate Optimization Algorithm: A Machine Learning Approach | Cross-Problem-Parameter-Transfer in Quanten Ungefähre Optimierungs-Algorithmus: Ein Ansatz zum maschinellen Lernen | 量子中交叉问题参数转移 近最佳优化算法:机械学习方法 2504.10733v3 |
Authors (3): Kien X. Nguyen, Bao Bach, Ilya Safro
Quantum Approximate Optimization Algorithm (QAOA) is one of the most promising candidates to achieve the quantum advantage in solving combinatorial optimization problems. The process of finding a good set of variational parameters in the QAOA circuit has proven to be challenging due to multiple factors, such as barren plateaus. As a result, there is growing interest in exploiting parameter transferability, where parameter sets optimized for one problem instance are transferred to another that could be more complex either to estimate the solution or to serve as a warm start for further optimization. But can we transfer parameters from one class of problems to another? Leveraging parameter sets learned from a well-studied class of problems could help navigate the less studied one, reducing optimization overhead and mitigating performance pitfalls. In this paper, we study whether pretrained QAOA parameters of MaxCut can be used as is or to warm start the Maximum Independent Set (MIS) circuits. Specifically, we design machine learning models to find good donor candidates optimized on MaxCut and apply their parameters to MIS acceptors. Our experimental results show that such parameter transfer can significantly reduce the number of optimization iterations required while achieving comparable approximation ratios.
QAOA 电路中找到一套良好的变异参数的过程已证明由于多种因素,例如高原贫瘠,因此具有挑战性。结果,人们越来越有兴趣利用参数的可转移性,因为一个问题的参数组最优化地转移到另一个可能比较复杂的区域,要么用于估计解决办法,要么作为进一步优化的热点开端。但是,我们能否将某一类问题的参数转移到另一个类别?从一个类别的问题中学到的参数组可以帮助在经过仔细研究的各类问题中找到一套良好的变异参数,减少优化的间接费用和减轻性能陷阱。在本文中,我们研究的是,是否可以像现在这样或温暖地利用MaxCut的QAA参数参数组参数组参数组来启动最大独立集(MIS)电路。具体地说,我们设计机器学习模型,以找到最佳的捐赠者候选人,并将其参数应用到MIS接受者身上。我们的实验结果显示,在达到可比的精确度时,这种参数转换率可以大大降低。
Article 188
Title@2025-07-16 (3): Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows
Title: Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows | Best Practices für großformatige, Pixel-Wise-Crop-Mapping- und Transfer-Lern-Workflows | 大型、像素-威氏作物绘图和转移学习性工作流程最佳做法 2507.12590v1 |
Authors (6): Judy Long, Tao Liu, Sean Alexander Woznicki, Miljana Marković, Oskar Marko, Molly Sears
Crop mapping involves identifying and classifying crop types using spatial data, primarily derived from remote sensing imagery. This study presents the first comprehensive review of large-scale, pixel-wise crop mapping workflows, encompassing both conventional supervised methods and emerging transfer learning approaches. To identify the optimal supervised crop mapping workflows, we conducted systematic experiments, comparing six widely adopted satellite image-based preprocessing methods, alongside eleven supervised pixel-wise classification models. Additionally, we assessed the synergistic impact of varied training sample sizes and variable combinations. Moreover, we identified optimal transfer learning techniques for different magnitudes of domain shift. The evaluation of best methods was conducted across five diverse agricultural sites. Landsat 8 served as the primary satellite data source. Labels came from CDL trusted pixels and field surveys. Our findings reveal three key insights. First, fine-scale interval preprocessing paired with Transformer models consistently delivered optimal performance for both supervised and transferable workflows. RF offered rapid training and competitive performance in conventional supervised learning and direct transfer to similar domains. Second, transfer learning techniques enhanced workflow adaptability, with UDA being effective for homogeneous crop classes while fine-tuning remained robust across diverse scenarios. Finally, workflow choice depends heavily on the availability of labeled samples. With a sufficient sample size, supervised training typically delivers more accurate and generalizable results. Below a certain threshold, transfer learning that matches the level of domain shift is a viable alternative to achieve crop mapping. Repository: Best-Practices-for-Large-Scale-Pixel-Wise-Crop-Mapping-and-Transfer-Learning-Workflows
利用主要来自遥感图像的空间数据确定和分类作物类型。本研究首次全面审查了大规模、像素一样的作物绘图工作流程,包括常规监督方法和新兴的转移学习方法。为了确定最佳监督作物绘图工作流程,我们进行了系统实验,将六种广泛采用的卫星图像预处理方法与11个监督像素分类模型进行了比较。此外,我们评估了不同培训样本规模和可变组合的协同效应。此外,我们为不同规模的域变换确定了最佳转移学习技术。对五种不同农业地点的最佳方法进行了评价。8号土地作为主要卫星数据来源。实验室来自CDL信任的像素和实地调查。我们的调查结果揭示了三个主要的洞见。首先,微规模的预处理前与变异模型一致为受监管和可转移的工作流程提供了最佳的绩效。此外,RFR提供了常规监督学习和直接转移至类似领域的快速培训和竞争性绩效。第二,转让中流技术提高了工作流程的适应性,UDA在统一作物课程中有效,同时进行精细的调整,同时进行主要通过CDLLS和实地抽样分析,最终选择一定的升级。
Article 189
Title@2025-07-16 (3): Second-Order Bounds for [0,1]-Valued Regression via Betting Loss
Title: Second-Order Bounds for [0,1]-Valued Regression via Betting Loss | Zweiter Ordnungsbund für [0,1]-bewertete Regression über Wetting Loss | [0,1]-通过打赌损失导致的有价累退 2507.12584v1 |
Authors (2): Yinan Li, Kwang-Sung Jun
We consider the $[0,1]$-valued regression problem in the i.i.d. setting. In a related problem called cost-sensitive classification, \citet{foster21efficient} have shown that the log loss minimizer achieves an improved generalization bound compared to that of the squared loss minimizer, in the sense that the bound scales with the cost of the best classifier, which can be arbitrarily small depending on the problem at hand. Such a result is often called a first-order bound. For $[0,1]$-valued regression, we first show that the log loss minimizer leads to a similar first-order bound. We then ask if there exists a loss function that achieves a variance-dependent bound (also known as a second-order bound), which is a strict improvement upon first-order bounds. We answer this question in the affirmative by proposing a novel loss function called the betting loss. Our result is "variance-adaptive" in the sense that the bound is attained without any knowledge about the variance, in contrast to approaches that explicitly model the label (or reward) variance or the label distribution itself as part of the function class, such as distributional reinforcement learning.
我们在 i. d. 设置 设置中考虑 $[0,1]美元估价的回归问题。在一个名为成本敏感分类的相关问题中,\citet{foster21valess} 已经表明,与平方损失最小化相比,日志损失最小化实现比平方损失最小化更好的概括化,因为约束比例与最佳分类者的成本相比,根据手头的问题,这种成本可能任意地小一些。这种结果通常被称为第一顺序约束。对于 $0,1,1美元估价的回归,我们首先表明,日志损失最小化导致类似的第一顺序约束。我们然后询问,是否存在一个基于差异的制约(又称为第二顺序约束)的损失函数。这是对第一顺序约束的严格改进。我们肯定这一问题,提出称为赌注损失的新的损失函数。我们的结果是“变式适应性” , 意思是, 约束已经达到\ textititit{ { ,而不知道差异} ,这与标定标签差异(或奖赏) 或标签分配的加强本身是明确学习类别功能的一部分。
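For reference, the log loss discussed in the abstract above extends from binary labels to labels in $[0,1]$ in the usual cross-entropy form. The sketch below assumes that standard form and places it next to the squared loss. It is not the paper's betting loss, whose definition the abstract does not give. The function names and the clipping constant `eps` are illustrative.

```python
import math

def log_loss(pred, label, eps=1e-12):
    """Cross-entropy between a label in [0,1] and a prediction in [0,1]."""
    p = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def squared_loss(pred, label):
    """Squared error between prediction and label."""
    return (pred - label) ** 2

def empirical_risk(loss, preds, labels):
    """Average loss over a sample; the quantity an empirical risk minimizer minimizes."""
    return sum(loss(p, y) for p, y in zip(preds, labels)) / len(labels)
```

For a fixed label, both losses are minimized when the prediction equals the label; they differ in how strongly they penalize errors near 0 and 1.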
Article 190
Title@2025-07-16 (3): Ranking Vectors Clustering: Theory and Applications
Title: Ranking Vectors Clustering: Theory and Applications | Ranking Vektoren Clustering: Theorie und Anwendungen | 病媒分类组合:理论和应用 2507.12583v1 |
Authors (4): Ali Fattahi, Ali Eshragh, Babak Aslani, Meysam Rabiee
We study the problem of clustering ranking vectors, where each vector represents preferences as an ordered list of distinct integers. Specifically, we focus on the k-centroids ranking vectors clustering problem (KRC), which aims to partition a set of ranking vectors into k clusters and identify the centroid of each cluster. Unlike classical k-means clustering (KMC), KRC constrains both the observations and centroids to be ranking vectors. We establish the NP-hardness of KRC and characterize its feasible set. For the single-cluster case, we derive a closed-form analytical solution for the optimal centroid, which can be computed in linear time. To address the computational challenges of KRC, we develop an efficient approximation algorithm, KRCA, which iteratively refines initial solutions from KMC, referred to as the baseline solution. Additionally, we introduce a branch-and-bound (BnB) algorithm for efficient cluster reconstruction within KRCA, leveraging a decision tree framework to reduce computational time while incorporating a controlling parameter to balance solution quality and efficiency. We establish theoretical error bounds for KRCA and BnB. Through extensive numerical experiments on synthetic and real-world datasets, we demonstrate that KRCA consistently outperforms baseline solutions, delivering significant improvements in solution quality with fast computational times. This work highlights the practical significance of KRC for personalization and large-scale decision making, offering methodological advancements and insights that can be built upon in future studies.
我们研究的是集群分级矢量的问题,每个矢量代表偏好,作为不同整数的定序列表。具体地说,我们侧重于 kcentroid 分级矢量群集问题(KRC),目的是将一组排序矢量分解成 k组群,并查明每个组群的中子体。与古典的 k- 平均值群集( KMC) 不同, KRC 将观测和中子体都限制为排序矢量矢量。我们建立了KRC的NP- 硬度,并描述其可行的数据集。对于单组案例,我们为最佳的百分点体分类(可以在线性时间内计算)。为了应对 KRC 的计算挑战,我们开发了一个高效的近点算算算法(KRC ) ,它反复完善了KMC 的初始解决方案( KRC ) 。此外,我们引入了一种在 KRCA 和 BnB 中引入一个控制参数以平衡解决方案的质量和效率来减少实际时间的计算框架。我们为 KRC 和 BnB 设定了一个理论错误框框框框框框框。 通过广泛的数字实验, 展示了未来的模型模型模型模型模型, 并展示了我们在快速的模型模型中展示了 和快速计算方法的模型的模型中 , 展示了大比例式的模型的模型的模型的模型的模型的模型的模型的模型的模型的模型的模型式式式的模型, 。
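The single-cluster case in the abstract above can be illustrated with a short sketch. It assumes ranking vectors are permutations of 1..m and that distance is squared Euclidean, as in k-means; the abstract does not state the metric. Under those assumptions the total squared distance to a candidate centroid differs from its squared distance to the coordinate-wise mean only by a constant factor and offset, so by the rearrangement inequality the best ranking-vector centroid orders positions by their mean rank. This version sorts, so it runs in O(m log m) rather than the linear time the paper reports, and the function names are illustrative.

```python
def ranking_centroid(rankings):
    """Return a permutation of 1..m closest (in squared Euclidean distance)
    to the coordinate-wise mean of the given ranking vectors."""
    m = len(rankings[0])
    sums = [sum(r[j] for r in rankings) for j in range(m)]
    # Positions with a smaller total rank receive a smaller centroid rank;
    # ties are broken by position index.
    order = sorted(range(m), key=lambda j: (sums[j], j))
    centroid = [0] * m
    for rank, j in enumerate(order, start=1):
        centroid[j] = rank
    return centroid

def total_sq_distance(rankings, c):
    """Sum of squared Euclidean distances from every ranking vector to c."""
    return sum(sum((a - b) ** 2 for a, b in zip(r, c)) for r in rankings)
```

Unlike the plain mean used by k-means, the result is itself a valid ranking vector, which is the constraint that separates KRC from KMC.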
Article 191
Title@2025-07-16 (3): Deep Bilinear Koopman Model for Real-Time Vehicle Control in Frenet Frame
Title: Deep Bilinear Koopman Model for Real-Time Vehicle Control in Frenet Frame | Tiefes Bilineares Koopman-Modell für Echtzeit-Fahrzeugsteuerung im Frenet-Rahmen | Frenet框架中实时车辆控制深海双线性库普曼模型 2507.12578v1 |
Authors (4): Mohammad Abtahi, Farhang Motallebi Araghi, Navid Mojahed, Shima Nazari
Accurate modeling and control of autonomous vehicles remain a fundamental challenge due to the nonlinear and coupled nature of vehicle dynamics. While Koopman operator theory offers a framework for deploying powerful linear control techniques, learning a finite-dimensional invariant subspace for high-fidelity modeling continues to be an open problem. This paper presents a deep Koopman approach for modeling and control of vehicle dynamics within the curvilinear Frenet frame. The proposed framework uses a deep neural network architecture to simultaneously learn the Koopman operator and its associated invariant subspace from the data. Input-state bilinear interactions are captured by the algorithm while preserving convexity, which makes it suitable for real-time model predictive control (MPC) application. A multi-step prediction loss is utilized during training to ensure long-horizon prediction capability. To further enhance real-time trajectory tracking performance, the model is integrated with a cumulative error regulator (CER) module, which compensates for model mismatch by mitigating accumulated prediction errors. Closed-loop performance is evaluated through hardware-in-the-loop (HIL) experiments using a CarSim RT model as the target plant, with real-time validation conducted on a dSPACE SCALEXIO system. The proposed controller achieved significant reductions in tracking error relative to baseline controllers, confirming its suitability for real-time implementation in embedded autonomous vehicle systems.
由于车辆动态的非线性及交错性质,对自主车辆的精确模型和控制仍是一个根本性挑战。虽然Koopman操作员理论为部署强大的线性控制技术提供了一个框架,但学习高不贞性模型的有限维维度子空间仍然是一个尚未解决的问题。本文件介绍了在卷轴Frenet框架内对车辆动态进行模型和控制的深Koopman方法。拟议框架使用深神经网络架构,从数据中同时学习Koopman操作员及其相关的变异子空间。输入-状态双线性互动由算法捕获,同时保留共性,使其适合实时模型预测控制(MPC)应用。在培训期间使用了多步预测损失,以确保长偏差预测能力。为进一步加强实时轨跟踪性,该模型与累积错误调节器模块相结合,通过减少累积的预测误差来弥补模型的错错错错。通过硬件-行距(HIL)对投入双线性互动进行评估,同时保存同流,从而适合实时模型预测控制(MPC)应用实时模型(IM)实时定位系统进行实时定位跟踪,将封闭式自动定位系统进行定位系统,以进行大规模定位定位系统定位系统定位,以降低系统。
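A bilinear lifted model of the kind described above is commonly written as z_next = A z + B u + sum_i u_i N_i z. The sketch below implements one prediction step and a multi-step rollout under that assumed form. The matrices `A`, `B`, `N` and the lifted state `z` are placeholders; in the paper the lifting is learned by a neural network, which is not reproduced here.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def bilinear_step(A, B, N, z, u):
    """One step of z_next = A z + B u + sum_i u[i] * (N[i] z)."""
    z_next = [a + b for a, b in zip(matvec(A, z), matvec(B, u))]
    for u_i, N_i in zip(u, N):
        # Input-state interaction term for the i-th input channel.
        z_next = [zn + u_i * w for zn, w in zip(z_next, matvec(N_i, z))]
    return z_next

def rollout(A, B, N, z0, inputs):
    """Multi-step prediction: apply bilinear_step once per input in sequence."""
    traj = [z0]
    for u in inputs:
        traj.append(bilinear_step(A, B, N, traj[-1], u))
    return traj
```

For a fixed lifted state the update is affine in the input `u`, which is one reason this model class is attractive for predictive control.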
Article 192
Title@2025-07-16 (3): Assay2Mol: large language model-based drug design using BioAssay context
Title: Assay2Mol: large language model-based drug design using BioAssay context | Assay2Mol: großsprachiges, modellbasiertes Arzneimitteldesign unter Verwendung von BioAssay-Kontexten | Assay2Mol:使用BioAssay环境的大型语言示范药物设计 2507.12574v1 |
Authors (3): Yifan Deng, Spencer S. Ericksen, Anthony Gitter
Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate the functional responses of candidate molecules against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offers rich information for new drug discovery campaigns but has been untapped because of that unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.
在生物化学学中,分子筛选实验评估候选分子对疾病目标的功能性反应; 描述这些目标运作的生物机制的无结构文本; 实验筛选规程和其他实验属性为新的药物发现运动提供了丰富的信息,但由于这种结构化的形式而尚未开发; 我们介绍了Assay2Mol, 这是一种大型语言模式式工作流程,可以利用现有的大规模生物化学筛选实验,用于早期药物发现; Asssay2Mol检索与新目标类似的现有实验记录,并利用检索到的实验筛选数据进行文体内学习,生成候选分子; Asssay2Mol优于最近的机器学习方法,为目标蛋白结构产生候选的骨浆分子,同时促进更多可合成的分子生成。
Article 193
Title@2025-07-16 (3): IncA-DES: An incremental and adaptive dynamic ensemble selection approach using online K-d tree neighborhood search for data streams with concept drift
Title: IncA-DES: An incremental and adaptive dynamic ensemble selection approach using online K-d tree neighborhood search for data streams with concept drift | IncA-DES: Ein inkrementeller und adaptiver dynamischer Ensemble-Auswahlansatz mit Online-K-d-Baum Nachbarschaftssuche nach Datenströmen mit Konzeptdrift | IncA-DES:使用在线K-d树区搜索带有概念漂移的数据流的渐进和适应性动态动态混合选择方法 2507.12573v1 |
Authors (5): Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr., Robert Sabourin, Rafael M. O. Cruz
Data streams pose challenges not usually encountered in batch-based ML. One of them is concept drift, which is characterized by a change in the data distribution over time. Among the many approaches explored in the literature, the fusion of classifiers has shown good results and is receiving growing attention. Dynamic selection (DS) methods, because the ensemble is instance-based, seem to be an efficient choice under drifting scenarios. However, some attention must be paid to adapting such methods for concept drift. Training must be organized so as to create local experts, and the neighborhood search commonly used in DS may become prohibitively expensive as data arrive continuously. In this work, we propose IncA-DES, which employs a training strategy that promotes the generation of local experts under the assumption that different regions of the feature space become available over time. Additionally, the integration of a concept drift detector supports the maintenance of information and adaptation to a new concept. An overlap-based classification filter is also employed to avoid using the DS method when there is a consensus in the neighborhood, a strategy that we argue every DS method should employ, as it was shown to make them more broadly applicable and faster. Moreover, aiming to reduce the processing time of the kNN, we propose an Online K-d tree algorithm, which can quickly remove instances without becoming inconsistent and addresses the unbalancing issues that may occur in data streams. Experimental results showed that the proposed framework achieved the best average accuracy compared with seven state-of-the-art methods across different levels of label availability, and had the lowest processing time among the most accurate methods. Additionally, the integration of the Online K-d tree improved processing time with a negligible loss in accuracy. We have made our framework available in an online repository.
以批量为基础的 ML 通常不会遇到数据流构成挑战。 其中之一是概念流流,其特点是随着时间推移数据分布的变化而导致概念流变异。在文献中探讨的许多方法中,分类器的融合显示出良好的效果,并日益引起注意。由于组合以实例为基础,DS方法在漂移情景下似乎是一种有效的选择。然而,必须注意调整这种概念流的方法。必须进行这种培训,以便建立当地专家,通常使用的邻里搜索 DS 可能会随着数据的不断到来而变得难以使用。在这项工作中,我们建议使用Inca-DES, 采用一种培训战略,促进当地专家的产生,假设地貌空间的不同区域会随着时间的推移出现。此外,由于概念漂移探测器的融合,这有利于维护信息并适应新的概念。 使用基于重叠的分类过滤器是为了避免在邻里形成共识时使用DS 方法,我们提出的每种DS 方法都应该采用的一种战略,正如它表明这样会使其更精确性更精确性地在在线处理时间流之间变得最不准确。此外,我们提出的时间流流中可以减少一个不固定的K 运结果。我们提出的计算结果在不连续进行不固定的计算结果中可以减少。在不固定的顺序中进行不固定的顺序上的计算。
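The consensus filter mentioned in the abstract above can be sketched in a few lines: if all k nearest neighbours of a query share one label, return it directly and skip dynamic selection. This is a minimal illustration, not the IncA-DES implementation. The brute-force neighbour scan stands in for the Online K-d tree, and `majority_vote` is a hypothetical placeholder for a real DS method.

```python
from collections import Counter

def k_nearest(query, X, y, k):
    """Brute-force k nearest neighbours (a K-d tree would replace this scan)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, x)), i) for i, x in enumerate(X)
    )
    return [y[i] for _, i in dists[:k]]

def classify_with_filter(query, X, y, k, dynamic_selection):
    """Skip dynamic selection when all k neighbours agree on the label."""
    labels = k_nearest(query, X, y, k)
    if len(set(labels)) == 1:
        return labels[0], "consensus"
    return dynamic_selection(query, labels), "dynamic_selection"

def majority_vote(query, labels):
    """Hypothetical stand-in for a dynamic-selection method."""
    return Counter(labels).most_common(1)[0][0]
```

The second element of the returned pair records which path was taken, which makes it easy to measure how often the more expensive selection step is avoided.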
Article 194
Title@2025-07-16 (3): Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates
Title: Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates | Bewertung von Neuralen Surrogaten für die physikalische Modellierung der Synthese nichtlinearer elastischer Platten | 评价非线性电磁板物理模拟合成神经悬浮体评价 2507.12563v1 |
Authors (3): Carlos De La Vega Martin, Rodrigo Diaz Fernandez, Mark Sandler
Physical modelling synthesis aims to generate audio from physical simulations of vibrating structures. Thin elastic plates are a common model for drum membranes. Traditional numerical methods like finite differences and finite elements offer high accuracy but are computationally demanding, limiting their use in real-time audio applications. This paper presents a comparative analysis of neural network-based approaches for solving the vibration of nonlinear elastic plates. We evaluate several state-of-the-art models, trained on short sequences, for prediction of long sequences in an autoregressive fashion. We show some of the limitations of these models, and why it is not enough to look at the prediction error in the time domain. We discuss the implications for real-time audio synthesis and propose future directions for improving neural approaches to model nonlinear vibration.
物理模拟合成旨在从振动结构的物理模拟中产生音频。薄弹性板是鼓膜的常见模型。传统的数字方法,如有限差异和有限元素,具有很高的准确性,但具有计算要求,限制了实时音频应用中的使用。本文对解决非线性弹性板振动的神经网络方法进行了比较分析。我们评估了几种最先进的模型,这些模型经过短序列培训,以自动递减的方式预测长序列。我们展示了这些模型的一些局限性,以及为什么没有足够的时间范围来查看预测错误。我们讨论了实时音频合成的影响,并提出了改进非线性振动模型神经方法的未来方向。
Article 195
Title@2025-07-16 (3): Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases
Title: Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases | Rel-HNN: Paralleles Hypergraphen-Neurales Netzwerk zum Lernen auf relationalen Datenbanken | Rel-HNN: 用于在关系数据库中学习的分平行超时图神经网络 2507.12562v1 |
Authors (4): Md. Tanvir Alam, Md. Ahasanul Alam, Md Mahmudur Rahman, Md. Mosaddek Khan
Relational databases (RDBs) are ubiquitous in enterprise and real-world applications. Flattening the database poses challenges for deep learning models that rely on fixed-size input representations to capture relational semantics from the structured nature of relational data. Graph neural networks (GNNs) have been proposed to address this, but they often oversimplify relational structures by modeling all the tuples as monolithic nodes and ignoring intra-tuple associations. In this work, we propose a novel hypergraph-based framework, that we call rel-HNN, which models each unique attribute-value pair as a node and each tuple as a hyperedge, enabling the capture of fine-grained intra-tuple relationships. Our approach learns explicit multi-level representations across attribute-value, tuple, and table levels. To address the scalability challenges posed by large RDBs, we further introduce a split-parallel training algorithm that leverages multi-GPU execution for efficient hypergraph learning. Extensive experiments on real-world and benchmark datasets demonstrate that rel-HNN significantly outperforms existing methods in both classification and regression tasks. Moreover, our split-parallel training achieves substantial speedups – up to 3.18x for learning on relational data and up to 2.94x for hypergraph learning – compared to conventional single-GPU execution.
在企业和现实世界应用中, 关系数据库( RDBs) 普遍存在于企业和现实世界应用中。 数据库Flattleing 数据库对依靠固定规模投入表示来从关系数据的结构性质中获取关系语义的深层次学习模式提出了挑战。 已经提议了图形神经网络(GNNS)来解决这个问题, 但是它们往往过分简化关系结构, 将所有图例都建为单一节点, 忽视学生内部的关联。 在这项工作中, 我们提议了一个新型的超光速框架, 我们称之为rel- HNNN, 以每个独特的属性值配对为节点, 以及每个图例作为超端, 以捕捉关系。 我们的方法在属性价值、 图普尔和表级别中学习明确的多层次表示。 为了应对大型区域数据库带来的可缩放挑战, 我们进一步引入了一种双向培训算法, 利用多面GPU执行来高效的超音率学习。 在现实世界和基准数据结构中进行广泛的实验, 将每个属性对应数据关系建为超高端, 。 18 对比GNNNNBS 学习系统 , 的单个和基化为新的系统, , 以新的系统化为二进化为新的系统, , 在常规数据回归中, 和基准数据格式化, 将新的系统, 向新的系统, 在常规学习中, 进行大量学习中, 和基化。
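The hypergraph construction described above (one node per unique attribute-value pair, one hyperedge per tuple) can be sketched for a single table as follows. The input format, a list of dictionaries, and the function name are assumptions made for illustration; the multi-table and multi-GPU aspects of rel-HNN are not covered.

```python
def build_hypergraph(rows):
    """rows: list of dicts mapping attribute name -> value (one dict per tuple).

    Returns (node_ids, hyperedges), where node_ids maps each unique
    (attribute, value) pair to a node index and each hyperedge lists the
    node indices that make up one tuple.
    """
    node_ids = {}
    hyperedges = []
    for row in rows:
        edge = []
        for attr, value in row.items():
            key = (attr, value)
            if key not in node_ids:
                node_ids[key] = len(node_ids)  # first time this pair is seen
            edge.append(node_ids[key])
        hyperedges.append(edge)
    return node_ids, hyperedges
```

Two tuples that share a value for the same attribute end up sharing a node, which is how the structure exposes relationships that a one-node-per-tuple graph would hide.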
Article 196
Title@2025-07-16 (3): Monocular 3D Hand Pose Estimation with Implicit Camera Alignment
Title: Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Monokulare 3D-Hand Pose-Schätzung mit Impliziter Kameraausrichtung | 带有隐性相机对齐的手动脉动估计 2506.11133v2 |
Authors (3): Christos Pantazopoulos, Spyridon Thermos, Gerasimos Potamianos
Estimating the 3D hand articulation from a single color image is an important problem with applications in Augmented Reality (AR), Virtual Reality (VR), Human-Computer Interaction (HCI), and robotics. Apart from the absence of depth information, occlusions, articulation complexity, and the need for camera parameters knowledge pose additional challenges. In this work, we propose an optimization pipeline for estimating the 3D hand articulation from 2D keypoint input, which includes a keypoint alignment step and a fingertip loss to overcome the need to know or estimate the camera parameters. We evaluate our approach on the EgoDexter and Dexter+Object benchmarks to showcase that it performs competitively with the state-of-the-art, while also demonstrating its robustness when processing “in-the-wild” images without any prior camera knowledge. Our quantitative analysis highlights the sensitivity of the 2D keypoint estimation accuracy, despite the use of hand priors. Code is available at the project page https://cpantazop.github.io/HandRepo/
从单一颜色图像中估算 3D 手语表达器是应用增强现实(AR)、虚拟现实(VR)、人类-计算机互动(HCI)和机器人方面的一个重要问题。除了缺少深度信息、隔离、表达复杂性以及需要摄像参数知识之外,还带来了更多的挑战。在这项工作中,我们提议了一条优化管道,用 2D 关键点输入来估算 3D 手语表达器,其中包括一个关键点对齐步骤和指尖丢失,以克服了解或估计相机参数的需要。我们评估了EgoDexter 和 Dexter+Object 基准的方法,以展示它与艺术状态竞争的表现,同时在不事先了解任何相机知识的情况下处理“在网络中”图像时也显示了其稳健性。我们的定量分析突出了尽管使用了手前的2D 关键点估计准确度的敏感度。项目网页 https://cpantazop.github.io/HandRepo/ 。
Article 197
Title@2025-07-16 (3): Neural stochastic Volterra equations: learning path-dependent dynamics
Title: Neural stochastic Volterra equations: learning path-dependent dynamics | Neural stochastische Volterra-Gleichungen: Lernpfad-abhängige Dynamiken | 神经随机伏变方程式:学习依赖路径的动态 2407.19557v2 |
Authors (3): Martin Bergerhausen, David J. Prömel, David Scheffels
Stochastic Volterra equations (SVEs) serve as mathematical models for the time evolutions of random systems with memory effects and irregular behaviour. We introduce neural stochastic Volterra equations as a physics-inspired architecture, generalizing the class of neural stochastic differential equations, and provide some theoretical foundation. Numerical experiments on various SVEs, like the disturbed pendulum equation, the generalized Ornstein–Uhlenbeck process, the rough Heston model and a monetary reserve dynamics, are presented, comparing the performance of neural SVEs, neural SDEs and Deep Operator Networks (DeepONets).
斯托克伏尔特拉方程式(SVES)是随机系统时间演变的数学模型,具有记忆效应和不正常行为。我们引入神经切换伏尔特拉方程式作为物理学启发的建筑,对神经切换差异方程式进行总体分析,并提供一些理论基础。 我们介绍了各种SVES的数值实验,如扰动的钟式方程式、通俗的Ornstein-Uhlenbeck工艺、粗糙的赫斯顿模型和货币储备动态,对神经切换、神经SDES和深操作网络(DeepONets)的性能进行比较。
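To make the memory effect of an SVE concrete, the sketch below simulates a scalar equation of the assumed form X_t = x0 + ∫ K(t,s) b(X_s) ds + ∫ K(t,s) σ(X_s) dW_s with a simple Euler scheme. The exact equation class and discretization used in the paper may differ. The function and parameter names are illustrative, and no neural network is involved here.

```python
import math
import random

def simulate_sve(x0, drift, diffusion, kernel, T, n_steps, seed=0):
    """Euler scheme for a scalar stochastic Volterra equation on [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    # Brownian increments, one per time step.
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    path = [x0]
    for n in range(1, n_steps + 1):
        t_n = n * dt
        x = x0
        # Unlike an SDE, the whole history is re-weighted by the kernel at every step.
        for k in range(n):
            w = kernel(t_n, k * dt)
            x += w * (drift(path[k]) * dt + diffusion(path[k]) * dW[k])
        path.append(x)
    return path
```

With a constant kernel the scheme reduces to the Euler-Maruyama method for an ordinary SDE, which matches the abstract's remark that neural SVEs generalize neural SDEs.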
Article 198
Title@2025-07-16 (3): Machine Learning Systems: A Survey from a Data-Oriented Perspective
Title: Machine Learning Systems: A Survey from a Data-Oriented Perspective | Machine Learning Systems: Eine Umfrage aus datenorientierter Perspektive | 机械学习系统:从数据导向的角度进行调查 2302.04810v3 |
Authors (4): Christian Cabrera, Andrei Paleyes, Pierre Thodoroff, Neil D. Lawrence
Engineers are deploying ML models as parts of real-world systems with the upsurge of AI technologies. Real-world environments challenge the deployment of such systems because these environments produce large amounts of heterogeneous data, and users require increasingly efficient responses. These requirements push prevalent software architectures to the limit when deploying ML-based systems. Data-oriented Architecture (DOA) is an emerging style that equips systems better for integrating ML models. Even though papers on deployed ML systems do not mention DOA, their authors made design decisions that implicitly follow DOA. Implicit decisions create a knowledge gap, limiting the practitioners’ ability to implement ML-based systems. This paper surveys why, how, and to what extent practitioners have adopted DOA to implement and deploy ML-based systems. We overcome the knowledge gap by answering these questions and explicitly showing the design decisions and practices behind these systems. The survey follows a well-known systematic and semi-automated methodology for reviewing papers in software engineering. The majority of reviewed works partially adopt DOA. Such an adoption enables systems to address requirements such as Big Data management, low latency processing, resource management, security and privacy. Based on these findings, we formulate practical advice to facilitate the deployment of ML-based systems.
随着AI技术的激增,工程师正在将ML模型作为现实世界系统的一部分加以部署。现实世界环境对此类系统的部署提出了挑战,因为这些环境产生大量不同的数据,用户需要越来越高效的反应。这些要求将流行的软件结构推向部署以ML为基础的系统时的极限。以数据为导向的建筑(DOA)是一种新兴的风格,为整合ML模型提供了更好的系统。即使已部署的ML系统的文件没有提到DOA,但其作者却不言而喻地根据DOA作出了设计决定。隐含的决定造成了知识差距,限制了从业人员实施ML系统的能力。 \hlb{本文调查了为什么、如何以及在何种程度上从业人员采用了DOA来实施和部署以ML为基础的系统。}我们通过回答这些问题并明确展示这些系统背后的设计决定和做法,克服了知识差距。调查遵循了一种众所周知的系统性和半自动化方法来审查软件工程文件。经过审查的大多数作品部分采用DA。这种应用使系统能够满足诸如大数据管理、低密度处理、资源管理、安全和隐私部署发现等要求。
Article 199
Title@2025-07-16 (3): Improving Transformer World Models for Data-Efficient RL
Title: Improving Transformer World Models for Data-Efficient RL | Verbesserung von Transformer-Weltmodellen für dateneffiziente RL | 改进数据效率RL世界模型 2502.01591v3 |
Authors (8): Antoine Dedieu, Joseph Ortiz, Xinghua Lou, Carter Wendelken, Wolfgang Lehrach, J Swaroop Guntupalli, Miguel Lazaro-Gredilla, Kevin Patrick Murphy
We present three improvements to the standard model-based RL paradigm based on transformers: (a) “Dyna with warmup”, which trains the policy on real and imaginary data, but only starts using imaginary data after the world model has been sufficiently trained; (b) “nearest neighbor tokenizer” for image patches, which improves upon previous tokenization schemes, which are needed when using a transformer world model (TWM), by ensuring the code words are static after creation, thus providing a constant target for TWM learning; and (c) “block teacher forcing”, which allows the TWM to reason jointly about the future tokens of the next timestep, instead of generating them sequentially. We then show that our method significantly improves upon prior methods in various environments. We mostly focus on the challenging Craftax-classic benchmark, where our method achieves a reward of 69.66% after only 1M environment steps, significantly outperforming DreamerV3, which achieves 53.2%, and exceeding human performance of 65.0% for the first time. We also show preliminary results on Craftax-full, MinAtar, and three different two-player games, to illustrate the generality of the approach.
我们对基于变压器的标准模型RL范式提出了三项改进:(a)“加热热热腾腾腾”,用于培训真实和想象数据的政策,但只在世界模型经过充分培训后才开始使用假想数据;(b)图像补丁的“近邻代代记号器”,改进了先前的代谢方案,在使用变压器世界模型(TWM)时,需要采用前代代代代号,确保代号在创建后是静止的,从而为TWM学习提供了一个不变的目标;以及(c)“阻隔教师强迫”,使TWM能够共同解释下一个时间步骤的未来标志,而不是按顺序生成。我们随后表明,我们的方法在各种环境中的以往方法有很大改进。我们主要侧重于具有挑战性的Craffag税级基准,即我们的方法在使用变压器世界模型(TWMM)后只获得69.66%的奖励,大大超过DreamerV3,达到53.2%,首次超过65.0%的人类表现。我们还展示了Craftaxfrixfrif-frity、Min and thirnal-plical-gual-pal-pality。
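The idea of a tokenizer whose code words never change after creation can be sketched as follows. The abstract above only states that codes are static once created; the rule used here (add a new code when the nearest existing code is farther than a threshold) and all names are assumptions for illustration.

```python
class NearestNeighborTokenizer:
    """Maps patches to code indices; codes are appended once and never updated."""

    def __init__(self, threshold):
        self.threshold = threshold  # squared-distance threshold for adding a code
        self.codebook = []

    def encode(self, patch):
        """Return the index of the nearest code, creating a new code if none is close."""
        best_idx, best_dist = None, float("inf")
        for idx, code in enumerate(self.codebook):
            dist = sum((a - b) ** 2 for a, b in zip(patch, code))
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is None or best_dist > self.threshold:
            self.codebook.append(list(patch))  # new code, frozen from now on
            return len(self.codebook) - 1
        return best_idx
```

Because existing entries are never modified, a token index keeps the same meaning throughout training, which is the "constant target" property the abstract highlights.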
Article 200
Title@2025-07-16 (3): Can Mental Imagery Improve the Thinking Capabilities of AI Systems?
Title: Can Mental Imagery Improve the Thinking Capabilities of AI Systems? | Kann Mental Imagery die Denkfähigkeiten von KI-Systemen verbessern? | 精神形象能提高人工智能系统的思考能力吗? 2507.12555v1 |
Authors (1): Slimane Larabi
Although existing models can interact with humans and provide satisfactory responses, they lack the ability to act autonomously or engage in independent reasoning. Furthermore, input data in these models is typically provided as explicit queries, even when some sensory data is already acquired. In addition, AI agents, which are computational entities designed to perform tasks and make decisions autonomously based on their programming, data inputs, and learned knowledge, have shown significant progress. However, they struggle with integrating knowledge across multiple domains, unlike humans. Mental imagery plays a fundamental role in the brain’s thinking process, which involves performing tasks based on internal multisensory data, planned actions, needs, and reasoning capabilities. In this paper, we investigate how to integrate mental imagery into a machine thinking framework and how this could be beneficial in initiating the thinking process. Our proposed machine thinking framework integrates a Cognitive thinking unit supported by three auxiliary units: the Input Data Unit, the Needs Unit, and the Mental Imagery Unit. Within this framework, data is represented as natural language sentences or drawn sketches, serving both informative and decision-making purposes. We conducted validation tests for this framework, and the results are presented and discussed.
虽然现有模型可以与人类互动,并作出令人满意的反应,但它们缺乏自主行动或独立推理的能力。此外,这些模型中的输入数据通常是作为明确查询提供的,即使已经获得某些感官数据。此外,AI代理机构,这些是计算实体,旨在根据编程、数据投入和知识知识自主地执行任务和作出决定,已经取得了显著进展。然而,它们与人类不同的是,在将知识融入多个领域方面挣扎不休。精神图像在大脑的思维过程中发挥着根本作用,这包括执行基于内部多感官数据、计划的行动、需求和推理能力的任务。在本文件中,我们调查如何将精神图像纳入机智学思维框架,以及这如何有利于启动思维进程。我们提议的机器思维框架整合了一个由三个辅助单位(输入数据股、需求股以及精神图像股)支持的认知思考股。在这个框架内,数据被表述为自然语言句或绘制的草图,为提供信息和决策目的服务。我们为这一框架进行了验证测试,结果被介绍和讨论。
Article 201
Title@2025-07-16 (3): The Serial Scaling Hypothesis
Title: The Serial Scaling Hypothesis | Die serienmäßige Skalierungshypothese | 序列缩放假设 2507.12549v1 |
Authors (4): Yuxi Liu, Konpat Preechakul, Kananart Kuwaranancharoen, Yutong Bai
While machine learning has advanced through massive parallelization, we identify a critical blind spot: some problems are fundamentally sequential. These “inherently serial” problems, from mathematical reasoning to physical simulations to sequential decision-making, require dependent computational steps that cannot be parallelized. Drawing from complexity theory, we formalize this distinction and demonstrate that current parallel-centric architectures face fundamental limitations on such tasks. We argue that recognizing the serial nature of computation holds profound implications for machine learning, model design, and hardware development. As AI tackles increasingly complex reasoning, deliberately scaling serial computation, not just parallel computation, is essential for continued progress.
虽然机器学习是通过大规模平行化推进的,但我们发现了一个关键的盲点:有些问题根本上是相继的。这些“内在序列”问题从数学推理到物理模拟,到顺序决策-要求无法平行的依附计算步骤。根据复杂理论,我们正式确定这一区别,并表明当前平行中心结构在此类任务上面临根本性的限制。我们争辩说,承认计算序列性质对机器学习、模型设计、硬件开发具有深远影响。大赦国际处理的理由是,不断复杂的推理,有意扩大序列计算,而不仅仅是平行计算,对于继续取得进展至关重要。
Article 202
Title@2025-07-16 (3): Language Models Improve When Pretraining Data Matches Target Tasks
Title: Language Models Improve When Pretraining Data Matches Target Tasks | Sprachmodelle verbessern, wenn die Vorschulung von Daten zu Zielaufgaben passt | 培训前数据匹配目标任务时改进语言模式 2507.12466v1 |
Authors (10): David Mizrahi, Anders Boesen Lindbo Larsen, Jesse Allardice, Suzie Petryk, Yuri Gorokhov, Jeffrey Li, Alex Fang, Josh Gardner, Tom Gunter, Afshin Dehghan
Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine accordingly. This raises a natural question: what happens when we make this optimization explicit? To explore this, we propose benchmark-targeted ranking (BETR), a simple method that selects pretraining documents based on similarity to benchmark training examples. BETR embeds benchmark examples and a sample of pretraining documents in a shared space, scores this sample by similarity to benchmarks, then trains a lightweight classifier to predict these scores for the full corpus. We compare data selection methods by training over 500 models spanning $10^{19}$ to $10^{22}$ FLOPs and fitting scaling laws to them. From this, we find that simply aligning pretraining data to evaluation benchmarks using BETR achieves a 2.1x compute multiplier over DCLM-Baseline (4.7x over unfiltered data) and improves performance on 9 out of 10 tasks across all scales. BETR also generalizes well: when targeting a diverse set of benchmarks disjoint from our evaluation suite, it still matches or outperforms baselines. Our scaling analysis further reveals a clear trend: larger models require less aggressive filtering. Overall, our findings show that directly matching pretraining data to target tasks precisely shapes model capabilities and highlight that optimal selection strategies must adapt to model scale.
每个数据选择方法本身都有目标。在实践中,这些指标往往通过基准驱动的迭代而隐含地出现:研究人员制定选择战略,培训模型,衡量基准业绩,然后进行相应的完善。这提出了一个自然的问题:当我们使优化明确时会发生什么情况?为了对此进行探讨,我们提议基准目标排名(BETR),这是根据与基准培训范例相似的办法来选择培训前文件的简单方法。BETR在共享空间中嵌入基准范例和训练前文件样本,以类似基准的评分,然后训练一个轻量分类员来预测全套的评分。我们通过培训500多个模型来比较数据选择方法,这些模型覆盖10美元至10美元至22美元。我们从中发现,简单地将培训前的数据与评估基准相匹配,而采用与基准培训范例相似,在DCLM-Baseline(4.7x高于未过滤模型的4.7x)的基础上计算乘数乘数乘数乘数,并在所有尺度的10项任务中提高业绩的比值,然后培训一个精细的分数:当针对不同基准的设定标准时,从10美元到10美元到10美元,我们评价套的比值则要比值,我们更精确的比标,我们更精确的比标,我们更精确地显示一个比标的比比比比比比标,我们更更精确的比比标。
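The scoring step of the pipeline described above (embed, score by similarity to benchmarks, then select) can be sketched as below. Embeddings are taken as given lists of floats, and scoring each document by its maximum cosine similarity to any benchmark example is an assumption; the abstract does not specify the aggregation. The lightweight classifier that extends scores to the full corpus is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_documents(doc_embs, bench_embs):
    """Score each document by its highest similarity to any benchmark example."""
    return [max(cosine(d, b) for b in bench_embs) for d in doc_embs]

def select_top_fraction(scores, fraction):
    """Indices (in ascending order) of the top `fraction` of documents by score."""
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])
```

The `fraction` argument corresponds to filtering aggressiveness, the quantity the abstract reports should be relaxed as model scale grows.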
Article 203
Title@2025-07-16 (3): CytoSAE: Interpretable Cell Embeddings for Hematology
Title: CytoSAE: Interpretable Cell Embeddings for Hematology | CytoSAE: Interpretierbare Zelleinbettungen für die Hämatologie | CytoSAE: 热病学的解释性细胞嵌入 2507.12464v1 |
Authors (6): Muhammed Furkan Dasdelen, Hyesu Lim, Michele Buck, Katharina S. Götze, Carsten Marr, Steffen Schneider
Sparse autoencoders (SAEs) emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder which is trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts which we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantified the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering explainability on the sub-cellular level. Source code and model weights are available at https://github.com/dynamical-inference/cytosae.
以变压器为基础的基建模型(SAEs)作为机械解释变压器基础模型(SAEs)的一个很有希望的工具出现。最近,还采用了SAE(SAE),用于视觉领域,发现视觉概念及其在变压器模型中象征物的随机属性。虽然出现了越来越多的医学成像基础模型,但解释其推论的工具仍然缺乏。在这项工作中,我们展示了SAE(SAE)在血液学上的适用性。我们提议CytoSAE(一个稀有的自动编码器),它拥有40,000多个周边的单细胞血液图像。CytoSAE(SAE)在视觉领域普遍采用多种外的数据集,包括骨髓细胞学,从而能够发现与变形相关的概念,我们与医学专家验证了这些概念。此外,我们展示了CytoSAE(CytoSAE)能够产生特定病人和特定疾病概念的情景,从而能够在补接层检测病情细胞细胞和局部细胞异常。我们量化了概念对病人一级亚模型亚型/直径级分类/直径级的状态,同时展示了Cylosa(Cytoal-E)的可判判分级/直径级)概念。
Article 204
Title@2025-07-16 (3): Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Title: Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models | Difused Responsibility: Analyse des Energieverbrauches von generativen Text-zu-Audio-Diffusionsmodellen | 挥散责任:分析产生型号向视听传播模型的能源消耗 2505.07615v2 |
Authors (5): Riccardo Passoni, Francesca Ronchini, Luca Comanducci, Romain Serizel, Fabio Antonacci
Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we conduct an analysis of the energy usage of 7 state-of-the-art text-to-audio diffusion-based generative models, evaluating to what extent variations in generation parameters affect energy consumption at inference time. We also aim to identify an optimal balance between audio quality and energy consumption by considering Pareto-optimal solutions across all selected models. Our findings provide insights into the trade-offs between performance and environmental impact, contributing to the development of more efficient generative audio models.
文字到文字模型最近已成为一种强大的技术,能够从文字描述中产生声音,然而,它们的高计算要求引起了对能源消耗和环境影响的关切。在本文件中,我们分析了7种最先进的文字到文字扩散的基因化模型的能源使用情况,评估了生产参数的变化在多大程度上影响推论时间的能源消耗。我们还力求通过考虑所有选定模型的Pareto最佳解决方案,在声音质量和能源消耗之间找到最佳平衡。我们的调查结果揭示了性能和环境影响之间的权衡,有助于开发更高效的基因化声音模型。
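The Pareto-optimal trade-off mentioned above can be computed with a direct dominance check. In this sketch each configuration is a hypothetical `(name, quality, energy)` triple, with higher quality and lower energy considered better; the paper's actual metrics and measurements are not reproduced.

```python
def pareto_front(points):
    """Return the names of configurations not dominated by any other.

    points: list of (name, quality, energy). A point is dominated when another
    point is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, q, e in points:
        dominated = any(
            (q2 >= q and e2 <= e) and (q2 > q or e2 < e)
            for _, q2, e2 in points
        )
        if not dominated:
            front.append(name)
    return front
```

The quadratic scan is adequate for a handful of models and generation settings; larger sweeps would call for a sort-based method.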
Article 205
Title@2025-07-16 (3): Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training
Title: Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | Scaling Up RL: Unlocking Diverse Reasoning in LLMs durch längeres Training | 提升RL:通过长期培训解锁LLMs的多样化理由 2507.12507v1 |
Authors (22): Mingjie Liu, Shizhe Diao, Jian Hu, Ximing Lu, Xin Dong, Hao Zhang, Alexander Bukharin, Shaokun Zhang, Jiaqi Zeng, Makesh Narsimhan Sreedhar, Gerald Shen, David Mosallanezhad, Di Zhang, Jonas Yang, June Yang, Oleksii Kuchaiev, Guilin Liu, Zhiding Yu, Pavlo Molchanov, Yejin Choi, Jan Kautz, Yi Dong
Recent advancements in reasoning-focused language models such as OpenAI’s O1 and DeepSeek-R1 have shown that scaling test-time computation, through chain-of-thought reasoning and iterative exploration, can yield substantial improvements on complex tasks like mathematics and code generation. These breakthroughs have been driven by large-scale reinforcement learning (RL), particularly when combined with verifiable reward signals that provide objective and grounded supervision. In this report, we investigate the effects of prolonged reinforcement learning on a small language model across a diverse set of reasoning domains. Our work identifies several key ingredients for effective training, including the use of verifiable reward tasks, enhancements to Group Relative Policy Optimization (GRPO), and practical techniques to improve training stability and generalization. We introduce controlled KL regularization, clipping ratio, and periodic reference policy resets as critical components for unlocking long-term performance gains. Our model achieves significant improvements over strong baselines, including +14.7% on math, +13.9% on coding, and +54.8% on logic puzzle tasks. To facilitate continued research, we release our model publicly.
以推理为重点的语言模式,如OpenAI的 O1 和 DeepSeek-R1 的最近进展表明,在数学和代码生成等复杂任务上,按比例测试-测试时间的计算、思维链推理和迭代探索-扫描能够带来重大改进。这些突破是由大规模强化学习(RL)驱动的,特别是当与提供客观和有根据的监督的可核实奖赏信号相结合时。我们在本报告中调查了长期强化学习对一系列不同推理领域的小型语言模式的影响。我们的工作确定了有效培训的若干关键要素,包括使用可核查的奖励任务、增强群体相对政策优化(GROPO)以及改进培训稳定性和一般化的实用技术。我们引入了受控的KL正规化、剪裁率比率和定期参考政策重置作为释放长期绩效收益的关键组成部分。我们的模型在强大的基线上取得了显著的改进,包括数学+14.7%,编码为+13.9%,逻辑拼图为+54.8%。为了便利继续研究,我们公开公布了我们的模型。
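As a rough illustration of the GRPO ingredients named in the abstract (group-relative advantages, a clipping ratio, and a KL penalty), the sketch below uses the commonly described form of the objective. The function names, default values, and the per-sample scalar treatment are assumptions for illustration; the report's actual enhancements are not reproduced here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one prompt's sampled group to zero mean, unit scale."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(
    ratio: float,
    advantage: float,
    clip_low: float = 0.2,   # assumed default, not taken from the report
    clip_high: float = 0.2,  # assumed default, not taken from the report
    kl: float = 0.0,
    kl_coef: float = 0.0,
) -> float:
    """PPO-style clipped surrogate for one sample, minus a KL penalty term."""
    clipped_ratio = max(1.0 - clip_low, min(1.0 + clip_high, ratio))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return surrogate - kl_coef * kl


# Four sampled answers to one prompt, scored by a verifiable (0/1) reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

A periodic reference policy reset, in this picture, would amount to replacing the policy against which `kl` is measured with a recent snapshot of the trained policy.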
Article 206
Title@2025-07-16 (3): Cost-aware Stopping for Bayesian Optimization
Title: Cost-aware Stopping for Bayesian Optimization | Kostenbewusstes Stoppen für die Bayesian-Optimierung | Bayesian最佳最佳化的成本意识停止 2507.12453v1 |
Authors (5): Qian Xie, Linda Cai, Alexander Terenin, Peter I. Frazier, Ziv Scully
In automated machine learning, scientific discovery, and other applications of Bayesian optimization, deciding when to stop evaluating expensive black-box functions is an important practical consideration. While several adaptive stopping rules have been proposed, in the cost-aware setting they lack guarantees ensuring they stop before incurring excessive function evaluation costs. We propose a cost-aware stopping rule for Bayesian optimization that adapts to varying evaluation costs and is free of heuristic tuning. Our rule is grounded in a theoretical connection to state-of-the-art cost-aware acquisition functions, namely the Pandora’s Box Gittins Index (PBGI) and log expected improvement per cost. We prove a theoretical guarantee bounding the expected cumulative evaluation cost incurred by our stopping rule when paired with these two acquisition functions. In experiments on synthetic and empirical tasks, including hyperparameter optimization and neural architecture size search, we show that combining our stopping rule with the PBGI acquisition function consistently matches or outperforms other acquisition-function–stopping-rule pairs in terms of cost-adjusted simple regret, a metric capturing trade-offs between solution quality and cumulative evaluation cost.
在自动机器学习、科学发现和巴伊西亚优化的其他应用中,决定何时停止评估昂贵的黑盒功能是一项重要的实际考虑。虽然已经提出了若干适应性停止规则,但在成本意识设置中,这些规则缺乏保证,确保在产生过多功能评估成本之前停止。我们提出了巴伊西亚优化的成本意识停止规则,该规则适应不同的评价成本,且不进行休眠调整。我们的规则基于与最先进的成本意识获取功能的理论联系,即潘多拉箱Gittins指数(PBGI)和每成本日志的预期改进。我们证明了一种理论保证,约束了我们在与这两个获取功能配对时停止规则产生的预期累积评价成本。在合成和实验性任务中,包括超离子仪优化和神经结构规模搜索实验中,我们表明我们的停止规则与PBGI获取功能的结合一致或优于成本调整后的其他购置功能规则配对,即简单的遗憾、衡量解决方案质量与累积评估成本之间的平衡。
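One simple cost-aware rule in the spirit of the abstract is to stop once no candidate's expected improvement exceeds its evaluation cost. The sketch below illustrates that idea under a Gaussian posterior for a maximization problem; it is an assumption made for illustration and should not be read as the exact rule or guarantee proved in the paper. `expected_improvement` and `should_stop` are hypothetical names.

```python
import math
from typing import Iterable, Tuple


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` for a Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def should_stop(candidates: Iterable[Tuple[float, float, float]], best: float) -> bool:
    """Stop when every candidate (mean, std, cost) has EI no larger than its cost."""
    return all(expected_improvement(m, s, best) <= c for m, s, c in candidates)


# Hypothetical posterior summaries: (mean, std, evaluation cost).
print(should_stop([(0.0, 1.0, 0.5), (-1.0, 0.5, 0.2)], best=0.0))
```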
Article 207
Title@2025-07-16 (3): MARS: Unleashing the Power of Variance Reduction for Training Large Models
Title: MARS: Unleashing the Power of Variance Reduction for Training Large Models | MARS: Die Kraft der Varianzreduktion für das Training großer Modelle freisetzen | MARS:释放减少差异的力量,用于培训大型模式 2411.10438v3 |
Authors (5): Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu
Training deep neural networks, and more recently large models, demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models. Consequently, it has remained a less favored approach in modern AI. In this paper, to unleash the power of variance reduction for efficient training of large models, we propose a unified optimization framework, MARS (Make vAriance Reduction Shine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. Within our framework, we introduce three instances of MARS that leverage preconditioned gradient updates based on AdamW, Lion, and Shampoo, respectively. We also draw a connection between our algorithms and existing optimizers. Experimental results on training GPT-2 models indicate that MARS consistently outperforms AdamW by a large margin. The implementation of MARS is available at https://github.com/AGI-Arena/MARS.
训练深度神经网络以及近来的大型模型,需要高效且可扩展的优化器。Adam、AdamW及其变体等自适应梯度算法一直是这项任务的核心。尽管过去十年来提出了许多方差缩减算法,旨在加速凸和非凸环境中的随机优化,但方差缩减在训练深度神经网络或大型语言模型方面并未取得广泛成功。因此,在现代AI中,它仍然是一种不太受青睐的方法。在本文中,为了释放方差缩减在高效训练大型模型方面的力量,我们提出了一个统一的优化框架MARS(Make vAriance Reduction Shine),它通过一种缩放的随机递归动量技术,将预条件梯度方法与方差缩减相结合。在我们的框架内,我们引入了MARS的三个实例,分别利用基于AdamW、Lion和Shampoo的预条件梯度更新。我们还建立了我们的算法与现有优化器之间的联系。训练GPT-2模型的实验结果表明,MARS始终大幅优于AdamW。MARS的实现可在 https://github.com/AGI-Arena/MARS 获取。
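A scaled stochastic recursive momentum step can be pictured as a gradient correction that compares the current gradient with the gradient of the previous iterate on the same mini-batch. The sketch below is only an illustration of that idea: the scaling constant `gamma * beta1 / (1 - beta1)`, the default values, and the norm clipping are assumptions, and the linked repository remains the reference for the actual algorithm.

```python
import math
from typing import List


def corrected_gradient(
    grad_now: List[float],
    grad_prev_same_batch: List[float],
    beta1: float = 0.9,    # momentum coefficient (assumed default)
    gamma: float = 0.025,  # strength of the variance-reduction term (assumed default)
) -> List[float]:
    """Add a scaled difference between gradients at the current and previous iterates.

    Both gradients are evaluated on the same mini-batch, so their difference
    cancels part of the sampling noise.
    """
    scale = gamma * beta1 / (1.0 - beta1)
    return [g + scale * (g - gp) for g, gp in zip(grad_now, grad_prev_same_batch)]


def clip_by_norm(vec: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale the vector if its Euclidean norm exceeds max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


# The corrected, clipped gradient would then be fed into an AdamW-, Lion-,
# or Shampoo-style preconditioned update in place of the raw gradient.
c = clip_by_norm(corrected_gradient([0.4, -0.2], [0.3, -0.1]))
print(c)
```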
Article 208
Title@2025-07-16 (3): S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling
Title: S2WTM: Spherical Sliced-Wasserstein Autoencoder for Topic Modeling | S2WTM: Spherical Sliced-Wasserstein Autoencoder für Themenmodellierung | S2WTM: 用于专题建模的球球锯子-Wasserstein自动编码器 2507.12451v1 |
Authors (2): Suman Adhya, Debarshi Kumar Sanyal
Modeling latent representations in a hyperspherical space has proven effective for capturing directional similarities in high-dimensional text data, benefiting topic modeling. Variational autoencoder-based neural topic models (VAE-NTMs) commonly adopt the von Mises-Fisher prior to encode hyperspherical structure. However, VAE-NTMs often suffer from posterior collapse, where the KL divergence term in the objective function highly diminishes, leading to ineffective latent representations. To mitigate this issue while modeling hyperspherical structure in the latent space, we propose the Spherical Sliced Wasserstein Autoencoder for Topic Modeling (S2WTM). S2WTM employs a prior distribution supported on the unit hypersphere and leverages the Spherical Sliced-Wasserstein distance to align the aggregated posterior distribution with the prior. Experimental results demonstrate that S2WTM outperforms state-of-the-art topic models, generating more coherent and diverse topics while improving performance on downstream tasks.
在超球空间中建模潜在代表已证明对获取高维文本数据的方向相似性十分有效,有益于专题建模。在对超球结构进行编码之前,VAE-NTM通常采用 von Mises-Fisher 神经专题模型(VAE-NTMs),但是,VAE-NTMs经常受到后球体崩溃的影响,因为目标中的KL差异术语功能会大大缩小,导致无效的潜在代表。为了减轻这一问题,在对潜在空间的超球结构建模时,我们提议采用Sploical Sliced Wasserstein Autencoder 用于专题建模(S2WTM)。S2WTM使用事先对单位超球体支持的分布,并利用Splic-Wasserstein的距离,使汇总的远球体分布与先前的相匹配。实验结果显示,S2WTM(S2WDM)优于最新主题模型,在改进下游任务绩效的同时产生更加一致和多样化的专题。
Article 209
Title@2025-07-16 (3): Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning
Title: Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning | Navigation auf der sozialen Wohlfahrtsgrenze: Portfolios für multi-objektives Stärkungslernen | 引导社会福利前沿:多目标加强学习一揽子计划 2502.09724v2 |
Authors (7): Cheol Woo Kim, Jai Moondra, Shresth Verma, Madeleine Pollack, Lingkai Kong, Milind Tambe, Swati Gupta
In many real-world applications of reinforcement learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.
在许多加强学习的实际应用中,部署的政策对不同的利益攸关方产生了不同的影响,在就如何有效综合其偏好达成共识方面造成了挑战。通用的美元手段形成了一个为此广泛使用的社会福利职能类别,广泛应用于公平的资源分配、大赦国际的调整和决策等。这一类别包括众所周知的福利职能,如Egalitarian、Nash和公用事业福利。然而,选择适当的社会福利职能对决策者来说是困难的,因为最佳政策的结构和结果对于选择美元可能非常敏感。为了应对这一挑战,我们研究了在RL中以美元为单位的近似组合的概念,这是一套在通用美元手段的大家庭中几乎最理想的政策,适用于所有美元[-infty,1]美元。我们提出算法,以计算这种组合,并就近似因素、组合规模和计算效率之间的交易提供理论保证。合成和现实世界数据集的实验结果表明,我们在通过不同的方式有效地改变空间定位,通过不同的方式使空间决策者能够有效地掌握空间定位。
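For reference, the generalized $p$-mean of positive utilities $u_1, \dots, u_n$ is $(\frac{1}{n}\sum_i u_i^p)^{1/p}$, with the limits $p \to 0$ (geometric mean, Nash welfare) and $p \to -\infty$ (minimum, Egalitarian welfare). The short sketch below evaluates it for a few values of $p$; the function name and the example utilities are hypothetical.

```python
import math
from typing import Sequence


def p_mean(utilities: Sequence[float], p: float) -> float:
    """Generalized p-mean of strictly positive utilities, intended for p <= 1."""
    n = len(utilities)
    if p == -math.inf:
        return min(utilities)  # Egalitarian welfare
    if p == 0:
        # Limit p -> 0: geometric mean (Nash welfare).
        return math.exp(sum(math.log(u) for u in utilities) / n)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)


# Two stakeholders with utilities 1 and 4 under one policy.
for p in (1, 0, -1, -math.inf):
    print(p, p_mean([1.0, 4.0], p))
```

The welfare value falls as $p$ decreases, which is why a policy that is optimal for one $p$ can rank poorly for another and why a small portfolio covering all $p$ is useful.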
Article 210
Title@2025-07-16 (3): The Utility of the Virtual Imaging Trials Methodology for Objective Characterization of AI Systems and Training Data
Title: The Utility of the Virtual Imaging Trials Methodology for Objective Characterization of AI Systems and Training Data | Die Nützlichkeit der Virtual Imaging Trials Methodik zur objektiven Charakterisierung von KI-Systemen und Trainingsdaten | AI系统和培训数据客观定性虚拟成像试验方法的效用 2308.09730v5 |
Authors (7): Fakrul Islam Tushar, Lavsen Dahal, Saman Sotoudeh-Paima, Ehsan Abadi, W. Paul Segars, Ehsan Samei, Joseph Y. Lo
Purpose: The credibility of Artificial Intelligence (AI) models for medical imaging continues to be a challenge, affected by the diversity of models, the data used to train the models, and applicability of their combination to produce reproducible results for new data. Approach: In this work we aimed to explore whether the emerging Virtual Imaging Trials (VIT) methodologies can provide an objective resource to approach this challenge. The study was conducted for the case example of COVID-19 diagnosis using clinical and virtual computed tomography (CT) and chest radiography (CXR) processed with convolutional neural networks. Multiple AI models were developed and tested using 3D ResNet-like and 2D EfficientNetv2 architectures across diverse datasets. Results: The performance differences were evaluated in terms of the area under the curve (AUC), with the DeLong method used for AUC confidence intervals. The models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR. Internal testing yielded higher AUC values (0.77 to 0.85 for CT and 0.77 to 1.0 for CXR), highlighting a substantial drop in performance during external validation, which underscores the importance of diverse and comprehensive training and testing data. Most notably, the VIT approach provided an objective assessment of the utility of diverse models and datasets while further providing insight into the influence of dataset characteristics, patient factors, and imaging physics on AI efficacy. Conclusions: The VIT approach can be used to enhance model transparency and reliability, offering nuanced insights into the factors driving AI performance and bridging the gap between experimental and clinical settings.
目的:医学成像人工智能(AI)模型的可信度仍然是一项挑战,受到模型多样性、模型培训所使用的数据以及模型组合的可应用性的影响,为新数据产生可复制的结果。方法:在这项工作中,我们旨在探讨新出现的虚拟成像试验(VIT)方法能否提供客观的资源来应对这一挑战。为COVID-19诊断案例进行的研究,使用了临床和虚拟计算断层摄影(CT)和胸部射电学(CXR),并用动态神经网络处理的病人直径镜(CXR),开发并测试了多种AI模型,使用3D ResNet类的特性和2D 节效Netv2结构,以生成新数据集的可复制结果。结果:我们的目标是从正在形成的虚拟成像试验试验(UAC)试验(VI)和DeLong方法下的区域的性能差异评估。 最多样化的数据集的模型显示最高外部测试性能,ACUC方法从0.73-0.76到多样化直径直径直径直径直到CX-0.70R。 内部测试使AUC的数值更接近高的值(0.77-Res),在测试中提供最新数据。
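The AUC values quoted in the abstract measure the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the scores are made up, and the DeLong confidence intervals used in the study are not implemented here.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores on an internal and an external test set.
internal = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
external = auc([0.9, 0.4, 0.6], [0.5, 0.1, 0.7])
print(internal, external)
```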
Article 211
Title@2025-07-16 (3): Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length
Title: Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length | Charakterisieren von State Space Model (SSM) und SSM-Transformer Hybrid Language Model Performance mit langer Kontextlänge | 确定国家空间模型(SSM)和SSM-过渡混合语言模型长内性性能特点 2507.12442v1 |
Authors (5): Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon
The demand for machine intelligence capable of processing continuous, long-context inputs on local devices is growing rapidly. However, the quadratic complexity and memory requirements of traditional Transformer architectures make them inefficient and often unusable for these tasks. This has spurred a paradigm shift towards new architectures like State Space Models (SSMs) and hybrids, which promise near-linear scaling. While most current research focuses on the accuracy and theoretical throughput of these models, a systematic performance characterization on practical consumer hardware is critically needed to guide system-level optimization and unlock new applications. To address this gap, we present a comprehensive, comparative benchmarking of carefully selected Transformer, SSM, and hybrid models specifically for long-context inference on consumer and embedded GPUs. Our analysis reveals that SSMs are not only viable but superior for this domain, capable of processing sequences up to 220K tokens on a 24GB consumer GPU, approximately 4x longer than comparable Transformers. While Transformers may be up to 1.8x faster at short sequences, SSMs demonstrate a dramatic performance inversion, becoming up to 4x faster at very long contexts (~57K tokens). Our operator-level analysis reveals that custom, hardware-aware SSM kernels dominate the inference runtime, accounting for over 55% of latency on edge platforms, identifying them as a primary target for future hardware acceleration. We also provide detailed, device-specific characterization results to guide system co-design for the edge. To foster further research, we will open-source our characterization framework.
对能够处理当地设备连续、长文本投入的机器情报的需求正在迅速增长,然而,传统变异器结构的四重复杂和记忆要求使得传统变异器结构效率低,往往无法用于这些任务。这促使向国家空间模型(SSMs)和混合体等新结构的范式转变,这些结构有望近线缩放。虽然大多数目前的研究侧重于这些模型的准确性和理论输送量,但对实用消费者硬件的系统性能定性至关重要,以指导系统层面的优化和打开新的应用程序。为弥补这一差距,我们提出了精心选择的变异器、SSSM和混合模型的全面、比较基准化基准,具体针对消费者和嵌入的GPUPs进行长的推断。我们的分析显示,SSMs不仅可行,而且优于此领域,能够对24GB消费者GPU值约4x比可比变异的变异器进行高达220K的顺序处理。虽然变异器在短的顺序上可能达到1.8x开放度优化,但SSMs展示了惊人的反向,在非常长的背景环境背景下的递化,在非常长的背景中将更快地达到4x的递化的递化水平上,在非常的硬化的硬化的系统上显示我们的硬化分析。
Article 212
Title@2025-07-16 (3): Describe Anything Model for Visual Question Answering on Text-rich Images
Title: Describe Anything Model for Visual Question Answering on Text-rich Images | Beschreiben Sie alles Modell für die visuelle Frage Antwort auf Text-reiche Bilder | 描述在丰富文本图像上视觉问答的 “ 任何东西 “ 模式 2507.12441v1 |
Authors (11): Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu
Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). DAM is capable of generating detailed descriptions of any specific image areas or objects without the need for additional localized image-text alignment supervision. We hypothesize that such region-level descriptive capability is beneficial for the task of Visual Question Answering (VQA), especially in challenging scenarios involving images with dense text. In such settings, the fine-grained extraction of textual information is crucial to producing correct answers. Motivated by this, we introduce DAM-QA, a framework with a tailored evaluation protocol, developed to investigate and harness the region-aware capabilities from DAM for the text-rich VQA problem that requires reasoning over text-based information within images. DAM-QA incorporates a mechanism that aggregates answers from multiple regional views of image content, enabling more effective identification of evidence that may be tied to text-related elements. Experiments on six VQA benchmarks show that our approach consistently outperforms the baseline DAM, with a notable 7+ point gain on DocVQA. DAM-QA also achieves the best overall performance among region-aware models with fewer parameters, significantly narrowing the gap with strong generalist VLMs. These results highlight the potential of DAM-like models for text-rich and broader VQA tasks when paired with efficient usage and integration strategies. Our code is publicly available at https://github.com/Linvyl/DAM-QA.git.
在区域认知的视觉-语言模型方面最近取得了进展,特别是在出现Describe Anything模型(DAM)之后。DAM能够生成对任何特定图像区域或对象的详细描述,而不需要额外的本地化图像-文字对齐监督。我们假设这种区域层次的描述能力有利于视觉问答(VQA)的任务,特别是在涉及文本稠密图像的具有挑战性的情形中。在这种环境下,对文本信息进行精细提取对于得出正确答案至关重要。受此驱动,我们引入DAM-QA,这是一个带有定制评价协议的框架,旨在调查和利用DAM对文本丰富的VQA问题的区域认知能力,这类问题要求对图像中基于文本的信息进行推理。DAM-QA包含一个机制,从图像内容的多个区域视图中汇总答案,从而能够更有效地识别可能与文本相关要素联系在一起的证据。在六个VQA基准上的实验表明,我们的方法始终优于基线DAM,在DocVQA上取得了7个点以上的显著提升。DAM-QA还以更少的参数在区域认知模型中取得了最佳总体性能,显著缩小了与强大的通用VLM之间的差距。这些结果突显了类DAM模型在配合高效的使用和集成策略时,对文本丰富及更广泛的VQA任务的潜力。我们的代码公开于 https://github.com/Linvyl/DAM-QA.git。
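The abstract says DAM-QA aggregates answers from multiple regional views but does not spell out the rule. One plausible scheme, shown purely as an assumption, is confidence-weighted voting over normalized answer strings; `aggregate_answers` and the example data are hypothetical and are not taken from the DAM-QA code.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def aggregate_answers(view_answers: List[Tuple[str, float]]) -> Optional[str]:
    """Pick the answer with the highest total confidence across views.

    view_answers holds one (answer, confidence) pair per image view, for
    example the full image plus several crops. Empty answers are ignored.
    """
    totals = defaultdict(float)
    for answer, confidence in view_answers:
        key = answer.strip().lower()  # normalize so identical answers merge
        if key:
            totals[key] += confidence
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical per-view outputs for the question "What is the invoice total?"
views = [("42", 0.6), ("17", 0.5), ("42 ", 0.3), ("", 0.9)]
print(aggregate_answers(views))
```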
Article 213
Title@2025-07-16 (3): PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy
Title: PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy | PBM-VFL: Vertical Federated Learning mit Feature und Sample Privacy | PBM-VFL: 具有特色和抽样隐私的垂直联邦学习 2501.13916v3 |
Authors (4): Linh Tran, Timothy Castiglia, Stacy Patterson, Ana Milanova
We present Poisson Binomial Mechanism Vertical Federated Learning (PBM-VFL), a communication-efficient Vertical Federated Learning algorithm with Differential Privacy guarantees. PBM-VFL combines Secure Multi-Party Computation with the recently introduced Poisson Binomial Mechanism to protect parties’ private datasets during model training. We define the novel concept of feature privacy and analyze end-to-end feature and sample privacy of our algorithm. We compare sample privacy loss in VFL with privacy loss in HFL. We also provide the first theoretical characterization of the relationship between privacy budget, convergence error, and communication cost in differentially-private VFL. Finally, we empirically show that our model performs well with high levels of privacy.
我们介绍了Poisson Binomial机制的纵向联邦学习(PBM-VFL),这是一种具有不同隐私保障的沟通高效的垂直联邦学习算法。PBM-VFL将安全多党计算与最近推出的Poisson Binomial机制结合起来,以便在示范培训期间保护各方的私人数据集。我们定义了新颖的地物隐私概念,分析了我们的算法的端到端特征和抽样隐私特征。我们比较了VFL的抽样隐私损失与HFL的隐私损失。我们还首次从理论上描述隐私预算、趋同错误以及差别私人VLF的通信成本之间的关系。最后,我们从经验上表明,我们的模型在高隐私水平上表现良好。
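The core idea of a Poisson Binomial Mechanism can be sketched as follows: each party maps a bounded value to a binomial success probability, sends only the integer sample, and the server decodes the sum into an unbiased mean estimate. This is a simplified scalar sketch under assumed parameter names (`c` for the value bound, `theta` for the privacy-controlling slope, `m` for the number of trials); it omits secure aggregation and the VFL embedding pipeline described in the paper.

```python
import random
from typing import List


def pbm_encode(x: float, c: float, theta: float, m: int, rng: random.Random) -> int:
    """Encode x in [-c, c] as a Binomial(m, 1/2 + theta * x / c) sample."""
    p = 0.5 + theta * x / c
    return sum(1 for _ in range(m) if rng.random() < p)


def pbm_decode_mean(total: float, n: int, c: float, theta: float, m: int) -> float:
    """Unbiased estimate of the mean of n encoded values from the summed samples."""
    return c * (total - n * m / 2.0) / (n * m * theta)


rng = random.Random(0)
values: List[float] = [0.3] * 50  # hypothetical bounded contributions
c, theta, m = 1.0, 0.25, 200
# In a real deployment the server would only ever see this sum.
total = sum(pbm_encode(v, c, theta, m, rng) for v in values)
print(pbm_decode_mean(total, len(values), c, theta, m))
```

A smaller `theta` or `m` makes each sample noisier, which is the lever that trades privacy against convergence error and communication cost.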
Article 214
Title@2025-07-16 (3): A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning
Title: A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning | Ein bayesischer Anreizmechanismus für toxisch-resilientes Federated Learning | 贝耶斯州具有毒性抗毒性的联邦学习激励机制 2507.12439v1 |
Authors (5): Daniel Commey, Rebecca A. Sarpong, Griffith S. Klogo, Winful Bagyl-Bac, Garth V. Crosby
Federated learning (FL) enables collaborative model training across decentralized clients while preserving data privacy. However, its open-participation nature exposes it to data-poisoning attacks, in which malicious actors submit corrupted model updates to degrade the global model. Existing defenses are often reactive, relying on statistical aggregation rules that can be computationally expensive and that typically assume an honest majority. This paper introduces a proactive, economic defense: a lightweight Bayesian incentive mechanism that makes malicious behavior economically irrational. Each training round is modeled as a Bayesian game of incomplete information in which the server, acting as the principal, uses a small, private validation dataset to verify update quality before issuing payments. The design satisfies Individual Rationality (IR) for benevolent clients, ensuring their participation is profitable, and Incentive Compatibility (IC), making poisoning an economically dominated strategy. Extensive experiments on non-IID partitions of MNIST and FashionMNIST demonstrate robustness: with 50% label-flipping adversaries on MNIST, the mechanism maintains 96.7% accuracy, only 0.3 percentage points lower than in a scenario with 30% label-flipping adversaries. This outcome is 51.7 percentage points better than standard FedAvg, which collapses under the same 50% attack. The mechanism is computationally light, budget-bounded, and readily integrates into existing FL frameworks, offering a practical route to economically robust and sustainable FL ecosystems.
联邦学习(FL) 能够让分散的客户进行合作模式培训,同时保护数据隐私。然而,它的开放参与性质使其暴露在数据渗透攻击中,恶意行为者提交了腐败的模型更新,以贬低全球模型。现有的防御往往是被动的,依靠统计汇总规则,可以计算昂贵,而且通常会占诚实多数。本文引入了一种积极的、经济防御:轻量的巴伊西亚激励机制,使恶意行为在经济上变得不合理。每轮培训都以一个不完整信息的巴伊西亚游戏为模型,服务器作为主,使用一个小型的、私人的验证数据集来在付款前核查质量的更新。设计满足了友善客户的个人合理性(IR),确保他们的参与有利可图,以及激励兼容性(IC),从而毒化了经济主导的战略。 大规模试验对非国际标准化的巴伊伊西亚人和新马尼马尼斯特公司(FMIT)的不合理性行为进行经济上不合理性分析。 该机制保持96.7%的准确性,仅0.3个百分点,比有30 %的贴标签和50 %的软度的美度的F-L标准, 正在向当前预算模式提供更精准。
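The two design properties named in the abstract can be illustrated with a deliberately simplified, deterministic payment rule: the server pays a fixed reward only if an update clears an accuracy threshold on its private validation set. The threshold, reward, and cost values below are hypothetical, and the Bayesian treatment of client types in the paper is not modeled.

```python
def payment(validation_accuracy: float, threshold: float, reward: float) -> float:
    """Pay the reward only when the update passes the server's quality check."""
    return reward if validation_accuracy >= threshold else 0.0


def utility(validation_accuracy: float, threshold: float, reward: float, cost: float) -> float:
    """Client utility: payment received minus the cost of producing the update."""
    return payment(validation_accuracy, threshold, reward) - cost


# Hypothetical numbers: an honest update reaches 95% on the validation set,
# a label-flipped update only 40%.
honest = utility(0.95, threshold=0.9, reward=1.0, cost=0.2)
poisoned = utility(0.40, threshold=0.9, reward=1.0, cost=0.2)
print(honest, poisoned)
```

With these numbers the honest client earns a positive utility (Individual Rationality), while the poisoning client pays its cost and receives nothing, so poisoning is dominated by honest participation (Incentive Compatibility).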
Article 215
Title@2025-07-16 (3): Targeted Deep Architectures: A TMLE-Based Framework for Robust Causal Inference in Neural Networks
Title: Targeted Deep Architectures: A TMLE-Based Framework for Robust Causal Inference in Neural Networks | Gezielte Tiefenarchitekturen: Ein TMLE-basiertes Framework für robuste Kausalableitung in neuralen Netzwerken | 定向深层建筑:以TMLE为基础的神经网络硬性诱因推断框架 2507.12435v1 |
Authors (6): Yi Li, David Mccoy, Nolan Gunter, Kaitlyn Lee, Alejandro Schuler, Mark van der Laan
Modern deep neural networks are powerful predictive tools yet often lack valid inference for causal parameters, such as treatment effects or entire survival curves. While frameworks like Double Machine Learning (DML) and Targeted Maximum Likelihood Estimation (TMLE) can debias machine-learning fits, existing neural implementations rely either on “targeted losses” that do not guarantee solving the efficient influence function equation or on computationally expensive post-hoc “fluctuations” for multi-parameter settings. We propose Targeted Deep Architectures (TDA), a new framework that embeds TMLE directly into the network’s parameter space with no restrictions on the backbone architecture. Specifically, TDA partitions model parameters, freezing all but a small “targeting” subset, and iteratively updates them along a targeting gradient, derived from projecting the influence functions onto the span of the gradients of the loss with respect to weights. This procedure yields plug-in estimates that remove first-order bias and produce asymptotically valid confidence intervals. Crucially, TDA easily extends to multi-dimensional causal estimands (e.g., entire survival curves) by merging separate targeting gradients into a single universal targeting update. Theoretically, TDA inherits classical TMLE properties, including double robustness and semiparametric efficiency. Empirically, on the benchmark IHDP dataset (average treatment effects) and simulated survival data with informative censoring, TDA reduces bias and improves coverage relative to both standard neural-network estimators and prior post-hoc approaches. In doing so, TDA establishes a direct, scalable pathway toward rigorous causal inference within modern deep architectures for complex multi-parameter targets.
现代深心神经网络是强大的预测工具,但往往缺乏对因果参数(如治疗效果或整个生存曲线)的有效推断。虽然双机学习(DML)和定向最大隐性估计(TMLE)等框架可以降低机器学习的偏差,但现有的神经系统实施要么依靠“定向损失”,无法保证解决高效影响函数方程式或计算成本昂贵的多参数设置的后热“波动 ” 。我们提议了目标深度深度结构(TDA),这是一个将TMLE直接嵌入网络参数空间,对骨干结构不加限制。具体地说,TDA分区模型参数参数(除了一个小的“目标”子集)可以降低机器学习的偏差,而按照目标梯度反复更新这些参数,从预测损失梯度范围的影响功能不能保证解决高效影响方程式,或者从计算出昂贵的后热度“波动”的多参数设置。我们建议了目标深度深度结构(TDA)可以轻易地将TMLE直接嵌入网络的参数空间空间空间,对主轴结构进行限制。具体地将多因因果级数据覆盖范围范围(包括直径直径直径直径直径直径分析) 和直判的内,将数据直向直径直径径标,将一个直径测测,将数据结果,将一个直向直测测,将数据直测测测,将一个直测,将数据直向直测。
Article 216
Title@2025-07-16 (3): DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Title: DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | DUNIA: Pixel-Sized-Embeddings über Cross-Modal Alignment für Erdbeobachtungsanwendungen | DUNIA:通过对地观测应用的跨模式一致利用像素化嵌入 2502.17066v2 |
Authors (9): Ibrahim Fayad, Max Zimmer, Martin Schwartz, Fabian Gieseke, Philippe Ciais, Gabriel Belouze, Sarah Brood, Aurelien De Truchis, Alexandre d’Aspremont
Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, most current methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks: canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping. The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low-data regimes. In the fine-tuning setting, we show strong performances near or better than the state-of-the-art on five out of six tasks.
在地球观测应用方面,已作出重大努力,使自我监督的多式学习适应地球观测应用,然而,大多数现行方法产生粗糙的零星嵌入,限制了其效力和与LiDAR等其他模式的融合。为缩小这一差距,我们提出了DUNIA,这是通过图像和全波式LIDAR数据之间的跨模式协调学习像素规模嵌入的方法。由于该模型经过对比性培训,在零射环境中的各种环境监测任务中,嵌入可以直接加以利用。在我们的实验中,我们展示了嵌入七种任务的有效性:冠高绘图、分形树枝覆盖、土地覆盖测绘、树种识别、植物面积指数、作物类型分类和以每像素波形为基础的垂直结构绘图。结果显示,嵌入与零发分解器一起,往往超越了专门的监督模型,即使在低数据系统中也是如此。在微调设置中,我们展示了近于或优于六项任务中状态的强性表现。
Article 217
Title@2025-07-16 (3): Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
Title: Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models | Können wir eine Ausrichtung voraussagen, bevor Modelle das Denken beenden? | 我们能否在模型完成思考之前实现预测一致? 2507.12428v1 |
Authors (3): Yik Siu Chan, Zheng-Xin Yong, Stephen H. Bach
Open-weights reasoning language models generate long chains-of-thought (CoTs) before producing a final response, which improves performance but introduces additional alignment risks, with harmful content often appearing in both the CoTs and the final outputs. In this work, we investigate if we can use CoTs to predict final response misalignment. We evaluate a range of monitoring approaches, including humans, highly-capable large language models, and text classifiers, using either CoT text or activations. First, we find that a simple linear probe trained on CoT activations can significantly outperform all text-based methods in predicting whether a final response will be safe or unsafe. CoT texts are often unfaithful and can mislead humans and classifiers, while model latents (i.e., CoT activations) offer a more reliable predictive signal. Second, the probe makes accurate predictions before reasoning completes, achieving strong performance even when applied to early CoT segments. These findings generalize across model sizes, families, and safety benchmarks, suggesting that lightweight probes could enable real-time safety monitoring and early intervention during generation.
开放加权推理语言模型在作出最终反应之前产生长期的思维链(CoTs),这提高了业绩,但增加了调整风险,有害内容往往出现在CoTs和最终产出中。在这项工作中,我们调查是否可以使用CoTs预测最终反应不匹配。我们评估了一系列监测方法,包括人、高度可控的大型语言模型和文本分类器,使用COT文本或激活。首先,我们发现,在CoT启动方面受过训练的简单线性探测器可以大大超过所有基于文本的方法,从而预测最终反应是否安全或不安全。CoT文本往往不忠,可以误导人类和分类者,而模型潜伏(即Cot激活)则提供更可靠的预测信号。第二,在推理完成之前,在应用早期COT部分时,这些探测结果也会达到很强的性能。这些结果概括了模型大小、家庭和安全基准,表明轻度探测器能够进行实时的安全监测和新一代早期干预。
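A linear probe of the kind described is, in essence, a logistic-regression classifier fitted on activation vectors. The sketch below trains one with plain gradient descent on toy two-dimensional "activations"; the data, labels, and hyperparameters are invented for illustration, and real CoT activations would have thousands of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    steps: int = 200,
) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent; returns (weights, bias)."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_unsafe(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """True if the probe predicts that the final response will be unsafe."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Toy activations: label 1 = final response unsafe, 0 = safe.
X = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0)]
y = [0, 0, 1, 1]
w, b = train_probe(X, y)
print(predict_unsafe(w, b, (1.5, 1.5)), predict_unsafe(w, b, (-1.5, -1.5)))
```

Applying the same probe to activations from only the first part of a chain of thought is what would allow a prediction before reasoning completes.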
Article 218
Title@2025-07-16 (3): Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation
Title: Unit-Based Histopathology Tissue Segmentation via Multi-Level Feature Representation | Einheitsbasierte Histopathologie Tissue-Segmentierung über Multi-Level-Feature-Darstellung | 通过多级地物代表制进行分类 2507.12427v1 |
Authors (7): Ashkan Shakarami, Azade Farshad, Yousef Yeganeh, Lorenzo Nicole, Peter Schuffler, Stefano Ghidoni, Nassir Navab
We propose UTS, a unit-based tissue segmentation framework for histopathology that classifies each fixed-size 32 × 32 tile, rather than each pixel, as the segmentation unit. This approach reduces annotation effort and improves computational efficiency without compromising accuracy. To implement this approach, we introduce a Multi-Level Vision Transformer (L-ViT), which benefits from multi-level feature representation to capture both fine-grained morphology and global tissue context. Trained to segment breast tissue into three categories (infiltrating tumor, non-neoplastic stroma, and fat), UTS supports clinically relevant tasks such as tumor-stroma quantification and surgical margin assessment. Evaluated on 386,371 tiles from 459 H&E-stained regions, it outperforms U-Net variants and transformer-based baselines. Code and Dataset will be available at GitHub.
我们建议采用一个基于单位的组织组织分解框架UTS,用于组织病理学,将每个固定大小32*32平方,而不是每个像素分解为分解单元。这种方法减少了批注努力,提高了计算效率,同时又不损害准确性。为了实施这一方法,我们引入了一个多级愿景变异器(L-ViT),它有利于多级特征代表,以捕捉细微的形态和全球组织背景。培训将乳腺组织分成三类(过滤肿瘤、非乳性塑胶和脂肪),UTS支持与临床有关的任务,如肿瘤-温度量化和手术边距评估。对459个H&E地区386,371瓦进行了评估,它优于U-Net变异体和基于变异体的基线。GitHub将提供数据集。
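To make the unit-based idea concrete, the sketch below enumerates the 32 × 32 tiles of an image region and compares the number of predictions with per-pixel segmentation. The function name is hypothetical, and dropping partial border tiles is an assumption; the abstract does not say how borders are handled.

```python
from typing import List, Tuple


def tile_grid(height: int, width: int, tile: int = 32) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, top, left) for every full tile; partial border tiles are dropped."""
    return [
        (r, c, r * tile, c * tile)
        for r in range(height // tile)
        for c in range(width // tile)
    ]


tiles = tile_grid(256, 256)
# One class label per tile instead of one per pixel.
print(len(tiles), 256 * 256 // len(tiles))
```

For a 256 × 256 region this gives 64 tile-level predictions, each standing in for 1,024 pixel-level ones, which is where the annotation and compute savings come from.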
Article 219
Title@2025-07-16 (3): Mixture of Raytraced Experts
Title: Mixture of Raytraced Experts | Mischung von Raytraced Experts | 探雷专家混合体 2507.12419v1 |
Authors (4): Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti
We introduce a Mixture of Raytraced Experts, a stacked Mixture of Experts (MoE) architecture which can dynamically select sequences of experts, producing computational graphs of variable width and depth. Existing MoE architectures generally require a fixed amount of computation for a given sample. Our approach, in contrast, yields predictions with increasing accuracy as the computation cycles through the experts’ sequence. We train our model by iteratively sampling from a set of candidate experts, unfolding the sequence akin to how Recurrent Neural Networks are trained. Our method does not require load-balancing mechanisms, and preliminary experiments show a reduction in training epochs of 10% to 40% with a comparable or higher accuracy. These results point to new research directions in the field of MoEs, allowing the design of potentially faster and more expressive models. The code is available at https://github.com/nutig/RayTracing
我们引入了一种光线追踪专家混合体(Mixture of Raytraced Experts),这是一种堆叠的专家混合体(MoE)结构,能够动态地选择专家序列,生成可变宽度和深度的计算图。现有的MoE结构通常要求对给定样本进行固定的计算量。相反,我们的方法随着计算沿专家序列循环,产生越来越准确的预测。我们通过从一组候选专家中迭代抽样来训练我们的模型,展开序列的方式与训练循环神经网络的方式相近。我们的方法不需要负载平衡机制,初步实验显示,训练轮数减少10%至40%,且具有可比或更高的精确度。这些结果指向了MoE领域新的研究方向,使得设计可能更快、表达能力更强的模型成为可能。该代码可在 https://github.com/nutig/RayTracing 查阅。
Article 220
Title@2025-07-16 (3): AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models
Title: AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models | AutoVDC: Automatisierte Vision-Datenreinigung mit Vision-Sprachenmodellen | AutoVDC:利用视觉语言模型自动清理视觉数据 2507.12414v1 |
Authors (8): Santosh Vasa, Aditi Ramadwar, Jnana Rama Krishna Darabattula, Md Zafar Anwar, Stanislaw Antol, Andrei Vatavu, Thomas Monninger, Sihao Ding
Training of autonomous driving systems requires extensive datasets with precise annotations to attain robust performance. Human annotations suffer from imperfections, and multiple iterations are often needed to produce high-quality datasets. However, manually reviewing large datasets is laborious and expensive. In this paper, we introduce the AutoVDC (Automated Vision Data Cleaning) framework and investigate the utilization of Vision-Language Models (VLMs) to automatically identify erroneous annotations in vision datasets, thereby enabling users to eliminate these errors and enhance data quality. We validate our approach using the KITTI and nuImages datasets, which contain object detection benchmarks for autonomous driving. To test the effectiveness of AutoVDC, we create dataset variants with intentionally injected erroneous annotations and observe the error detection rate of our approach. Additionally, we compare the detection rates using different VLMs and explore the impact of VLM fine-tuning on our pipeline. The results demonstrate our method’s high performance in error detection and data cleaning experiments, indicating its potential to significantly improve the reliability and accuracy of large-scale production datasets in autonomous driving.
对自主驾驶系统的培训需要大量具有准确说明的数据集,才能取得稳健的性能。人手说明存在缺陷,而制作高质量数据集往往需要多次迭代。然而,人工审查大型数据集既费力又费钱。我们在本文件中引入了AutoVDC(自动愿景数据清理)框架,并调查视觉语言模型(VLM)的使用情况,以自动识别视觉数据集中的错误说明,从而使用户能够消除这些错误,提高数据质量。我们使用包含自动驾驶物体检测基准的KITTI和nuimages数据集验证我们的方法。为了测试AutoVDC的效力,我们创建数据集变体时故意输入错误说明,并遵守我们方法的错误检测率。此外,我们用不同的VLMS来比较探测率,并探索VLM微调对我们的管道的影响。结果表明,我们的方法在错误检测和数据清理实验方面表现得很高,表明它有可能大大提高自主驾驶大规模生产数据集的可靠性和准确性。
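The evaluation protocol described above (inject known annotation errors, then measure how many are caught) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `detection_rate` and the use of integer annotation IDs are assumptions made for the example.

```python
def detection_rate(injected_ids, flagged_ids):
    """Fraction of intentionally corrupted annotations that were flagged.

    injected_ids: IDs of annotations that were deliberately corrupted
                  (hypothetical identifiers, chosen for illustration).
    flagged_ids:  IDs that the cleaning pipeline reported as erroneous.
    """
    injected = set(injected_ids)
    if not injected:
        raise ValueError("no injected errors to evaluate against")
    return len(injected & set(flagged_ids)) / len(injected)


# Four errors were injected; three of them were flagged, plus one false alarm.
rate = detection_rate({3, 7, 11, 20}, {3, 7, 20, 42})
print(rate)
```

Note that this rate ignores false alarms (ID 42 above); a fuller evaluation would also report precision.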
Article 221
Title@2025-07-16 (3): MirrorCBO: A consensus-based optimization method in the spirit of mirror descent
Title: MirrorCBO: A consensus-based optimization method in the spirit of mirror descent | MirrorCBO: Eine konsensbasierte Optimierungsmethode im Geiste des Spiegelabstiegs | MirrorCBO:一种秉承镜像下降思想的基于共识的优化方法 2501.12189v2 |
Authors (4): Leon Bungert, Franca Hoffmann, Dohyeon Kim, Tim Roith
In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual particles and retain the primal particle positions by applying the inverse of the mirror map, which we parametrize as the subdifferential of a strongly convex function $\phi$. In this way, we combine the advantages of a derivative-free non-convex optimization algorithm with those of mirror descent. As a special case, the method extends CBO to optimization problems with convex constraints. Assuming bounds on the Bregman distance associated to $\phi$, we provide asymptotic convergence results for MirrorCBO with explicit exponential rate. Another key contribution is an exploratory numerical study of this new algorithm across different application settings, focusing on (i) sparsity-inducing optimization, and (ii) constrained optimization, demonstrating the competitive performance of MirrorCBO. We observe empirically that the method can also be used for optimization on (non-convex) submanifolds of Euclidean space, can be adapted to mirrored versions of other recent CBO variants, and that it inherits from mirror descent the capability to select desirable minimizers, like sparse ones. We also include an overview of recent CBO approaches for constrained optimization and compare their performance to MirrorCBO.
在这项工作中,我们提出 MirrorCBO,这是一种基于共识的优化(CBO)方法,它推广标准 CBO 的方式与镜像下降推广梯度下降的方式相同。为此,我们将 CBO 方法应用于一群对偶粒子,并通过镜像映射的逆来恢复原始(primal)粒子的位置,其中镜像映射被参数化为强凸函数 $\phi$ 的次微分。这样,我们将无导数非凸优化算法的优势与镜像下降的优势结合起来。作为特例,该方法将 CBO 推广到带凸约束的优化问题。在与 $\phi$ 相关的 Bregman 距离有界的假设下,我们给出了 MirrorCBO 具有显式指数速率的渐近收敛结果。另一项主要贡献是在不同应用场景下对这一新算法进行探索性数值研究,重点关注 (i) 诱导稀疏性的优化和 (ii) 约束优化,展示了 MirrorCBO 具有竞争力的性能。我们从经验上观察到,该方法还可用于欧氏空间(非凸)子流形上的优化,可以推广到其他近期 CBO 变体的镜像版本,并且继承了镜像下降选择理想极小点(例如稀疏极小点)的能力。我们还综述了近期用于约束优化的 CBO 方法,并将其性能与 MirrorCBO 进行了比较。
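The dual/primal mechanics described in the abstract can be sketched in a few lines. This is an illustrative discretization under stated assumptions, not the authors' implementation: it assumes the sparsity-inducing choice $\phi(x) = \tfrac{1}{2}\|x\|^2 + \lambda\|x\|_1$ (whose inverse mirror map is soft-thresholding), anisotropic noise, and an Euler-Maruyama step; the parameter names `alpha`, `drift`, `sigma`, and `dt` are placeholders.

```python
import numpy as np


def soft_threshold(y, lam):
    """Inverse mirror map for phi(x) = 0.5*||x||^2 + lam*||x||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)


def mirror_cbo_step(y, f, lam=0.1, alpha=10.0, drift=1.0, sigma=0.5,
                    dt=0.05, rng=None):
    """One sketched update of the dual particles y (shape: particles x dim)."""
    rng = np.random.default_rng() if rng is None else rng
    x = soft_threshold(y, lam)                   # primal positions
    fx = np.array([f(xi) for xi in x])           # objective per particle
    w = np.exp(-alpha * (fx - fx.min()))         # stabilized Gibbs weights
    m = (w[:, None] * x).sum(axis=0) / w.sum()   # consensus point
    diff = x - m
    noise = rng.standard_normal(y.shape)
    # Drift toward consensus plus component-wise (anisotropic) exploration,
    # applied in the dual space.
    return y - drift * dt * diff + sigma * np.sqrt(dt) * np.abs(diff) * noise


rng = np.random.default_rng(0)
y = rng.standard_normal((20, 5))
for _ in range(100):
    y = mirror_cbo_step(y, lambda x: float(np.sum((x - 1.0) ** 2)), rng=rng)
x_final = soft_threshold(y, 0.1)  # primal particles after the iterations
```

Because the dynamics run on `y` while the objective is evaluated at `x = soft_threshold(y, lam)`, coordinates whose dual value stays below `lam` in magnitude remain exactly zero in the primal particles, which is the sparsity-selection behaviour the abstract attributes to mirror descent.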
Article 222
Title@2025-07-16 (3): NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data
Title: NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data | NOCTA: Nicht-gierige zielorientierte Kosten-Abwägungs-Akquisition für Längsschnittdaten | NOCTA:面向纵向数据的非贪心目标成本权衡获取 2507.12412v1 |
Authors (4): Dzung Dinh, Boqi Chen, Marc Niethammer, Junier Oliva
In many critical applications, resource constraints limit the amount of information that can be gathered to make predictions. For example, in healthcare, patient data often spans diverse features ranging from lab tests to imaging studies. Each feature may carry different information and must be acquired at a respective cost of time, money, or risk to the patient. Moreover, temporal prediction tasks, where both instance features and labels evolve over time, introduce additional complexity in deciding when or what information is important. In this work, we propose NOCTA, a Non-Greedy Objective Cost-Tradeoff Acquisition method that sequentially acquires the most informative features at inference time while accounting for both temporal dynamics and acquisition cost. We first introduce a cohesive estimation target for our NOCTA setting, and then develop two complementary estimators: 1) a non-parametric method based on nearest neighbors to guide the acquisition (NOCTA-NP), and 2) a parametric method that directly predicts the utility of potential acquisitions (NOCTA-P). Experiments on synthetic and real-world medical datasets demonstrate that both NOCTA variants outperform existing baselines.
在许多关键应用中,资源限制限制了为预测而收集的信息数量,例如,在医疗保健方面,病人数据往往包含从实验室测试到成像研究等不同特征,每个特征可能包含不同的信息,必须分别以时间、金钱或风险成本获得患者。此外,时间预测任务,即既具有实例特征又贴标签随时间演变,在决定何时或何种信息重要时会增加复杂性。在这项工作中,我们提议采用非通用目标成本-贸易获取方法NOCTA,即非通用目标-成本-成本获取方法,该方法在推论时间依次获取最丰富的信息特征,同时计算时间动态和获取成本。我们首先为我们的NOCTA设定引入了一致的估计目标,然后开发了两个互补的估测器:1)基于近邻指导获取的无参数方法(NOCTA-NP),2)直接预测潜在获取的效用的参数方法(NOCTA-P)。合成和现实世界医学数据集实验表明,NOCTA变量都比现有基准。
Article 223
Title@2025-07-16 (3): Simple Mechanistic Explanations for Out-Of-Context Reasoning
Title: Simple Mechanistic Explanations for Out-Of-Context Reasoning | Einfache mechanistische Erklärungen für Out-of-Context Reasoning | 上下文外推理的简单机制性解释 2507.08218v2 |
Authors (5): Atticus Wang, Joshua Engels, Oliver Clive-Griffin, Senthooran Rajamanoharan, Neel Nanda
Out-of-context reasoning (OOCR) is a phenomenon in which fine-tuned LLMs exhibit surprisingly deep out-of-distribution generalization. Rather than learning shallow heuristics, they implicitly internalize and act on the consequences of observations scattered throughout the fine-tuning data. In this work, we investigate this phenomenon mechanistically and find that many instances of OOCR in the literature have a simple explanation: the LoRA fine-tuning essentially adds a constant steering vector, steering the model towards a general concept. This improves performance on the fine-tuning task and in many other concept-related domains, causing the surprising generalization. Moreover, we can directly train steering vectors for these tasks from scratch, which also induces OOCR. We find that our results hold even for a task that seems like it must involve conditional behavior (model backdoors); it turns out that unconditionally adding a steering vector is sufficient. Overall, our work presents one explanation of what gets learned during fine-tuning for OOCR tasks, contributing to the key question of why LLMs can reason out of context, an advanced capability that is highly relevant to their safe and reliable deployment.
超文本推理(OOCR)是一种现象,在这种现象中,微调的LLMs在分布上表现出出奇的深刻外向。它们不但没有学习浅重的偏差,反而隐含了内在化,并针对微调数据中分散的观测结果的后果采取行动。在这项工作中,我们机械地调查了这一现象,发现文献中许多OOCR的例子都有一个简单的解释:LORA微调基本上增加了一个不变的指导矢量,将模型引向一个一般概念。这改善了微调任务和许多其他概念相关领域的业绩,导致了令人惊讶的概括化。此外,我们可以直接训练从零开始指导矢量任务,这也引出OOCR。我们发现,我们的结果甚至维持着一项似乎必须包含有条件行为(模范后门)的任务;结果显示,无条件增加方向矢量就足够了。总体而言,我们的工作解释了在对OOCR任务进行微调时所学到的教益的一个解释,从而说明了为什么LMs可以从背景中解释出理由的关键问题,一种先进的能力对于其安全可靠部署具有高度相关性。
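To make the "constant steering vector" idea concrete, the sketch below adds one vector to every token's hidden state and, for comparison, computes the output shift of a rank-1 low-rank update. The rank-1 comparison is an illustrative assumption rather than the paper's analysis: it only shows that when the input projection `hidden @ a` is roughly constant, such an update behaves like adding a fixed vector. All names here are hypothetical.

```python
import numpy as np


def add_steering_vector(hidden, v):
    """Add the same vector v to every token position; hidden is (seq, d)."""
    return hidden + v  # broadcasts v across the sequence dimension


def rank1_update_shift(hidden, a, b):
    """Output shift caused by a rank-1 weight update delta_W = outer(b, a)."""
    return (hidden @ a)[:, None] * b[None, :]


hidden = np.array([[1.0, 0.0],
                   [1.0, 5.0]])
a = np.array([1.0, 0.0])   # input direction; hidden @ a is the same for both tokens
b = np.array([2.0, 3.0])   # output direction

shift = rank1_update_shift(hidden, a, b)
steered = add_steering_vector(np.zeros_like(hidden), b)
print(np.allclose(shift, steered))
```

In this toy case the two coincide exactly because both tokens project to the same value along `a`; with varying projections the shift would still point along `b` but with token-dependent magnitude.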
Article 224
Title@2025-07-16 (3): Large Language Models are Unreliable for Cyber Threat Intelligence
Title: Large Language Models are Unreliable for Cyber Threat Intelligence | Große Sprachmodelle sind für Cyber Threat Intelligence unzuverlässig | 大语言模型在网络威胁情报中不可靠 2503.23175v2 |
Authors (3): Emanuele Mezzi, Fabio Massacci, Katja Tuma
Several recent works have argued that Large Language Models (LLMs) can be used to tame the data deluge in the cybersecurity field by improving the automation of Cyber Threat Intelligence (CTI) tasks. This work presents an evaluation methodology that, besides allowing LLMs to be tested on CTI tasks under zero-shot learning, few-shot learning and fine-tuning, also quantifies their consistency and confidence level. We run experiments with three state-of-the-art LLMs and a dataset of 350 threat intelligence reports and present new evidence of potential security risks in relying on LLMs for CTI. We show that LLMs cannot guarantee sufficient performance on real-size reports and are also inconsistent and overconfident. Few-shot learning and fine-tuning only partially improve the results, casting doubt on the possibility of using LLMs for CTI scenarios, where labelled datasets are lacking and where confidence is a fundamental factor.
最近的一些工作认为,大语言模型(LLMs)可以通过改进网络威胁情报(CTI)任务的自动化,来驯化网络安全领域的数据巨量,这项工作提出的评价方法,除了允许在使用零光学习、短片学习和微调时测试CTI任务的LMs外,还允许测试CTI任务的LMs外,还允许量化其一致性和信任度。我们试验了三个最先进的LMs和350个威胁情报报告的数据集,并提出了在依赖CTILMs时可能存在安全风险的新证据。我们展示了LLMs如何无法保证在实际规模报告上有足够的业绩,同时又前后不一致和过于自信。少见的学习和微调只能部分地改善结果,从而使人们怀疑在CTI情景中使用LMs的可能性,在这些情景中,贴有标签的数据集缺乏,信任是基本因素。
Article 225
Title@2025-07-16 (3): Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts
Title: Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts | Neurale Netzwerk-geführte symbolische Regression für interpretierbare Deskriptor-Entdeckung in Perovskite-Katalysatoren | Perovskite催化器中可解释描述器发现器的神经网络-导导符号回归 2507.12404v1 |
Authors (3): Yeming Xian, Xiaoming Wang, Yanfa Yan
Understanding and predicting the activity of oxide perovskite catalysts for the oxygen evolution reaction (OER) requires descriptors that are both accurate and physically interpretable. While symbolic regression (SR) offers a path to discover such formulas, its performance degrades with high-dimensional inputs and small datasets. We present a two-phase framework that combines neural networks (NN), feature importance analysis, and symbolic regression (SR) to discover interpretable descriptors for OER activity in oxide perovskites. In Phase I, using a small dataset and seven structural features, we reproduce and improve the known $\mu/t$ descriptor by engineering composite features and applying symbolic regression, achieving training and validation MAEs of 22.8 and 20.8 meV, respectively. In Phase II, we expand to 164 features, reduce dimensionality, and identify LUMO energy as a key electronic descriptor. A final formula using $\mu/t$, $\mu$/RA, and LUMO energy achieves improved accuracy (training and validation MAEs of 22.1 and 20.6 meV) with strong physical interpretability. Our results demonstrate that NN-guided symbolic regression enables accurate, interpretable, and physically meaningful descriptor discovery in data-scarce regimes, indicating interpretability need not sacrifice accuracy for materials informatics.
理解和预测氧化物钙钛矿催化剂在析氧反应(OER)中的活性,需要既准确又具有物理可解释性的描述符。符号回归(SR)为发现此类公式提供了途径,但其性能在高维输入和小数据集下会下降。我们提出一个两阶段框架,将神经网络(NN)、特征重要性分析和符号回归(SR)结合起来,以发现氧化物钙钛矿中 OER 活性的可解释描述符。在第一阶段,我们使用一个小数据集和七个结构特征,通过构造复合特征并应用符号回归,复现并改进了已知的 $\mu/t$ 描述符,训练和验证 MAE 分别为 22.8 和 20.8 meV。在第二阶段,我们将特征扩展到 164 个,进行降维,并确定 LUMO 能量是关键的电子描述符。最终公式使用 $\mu/t$、$\mu$/RA 和 LUMO 能量,取得了更高的精度(训练和验证 MAE 分别为 22.1 和 20.6 meV),并具有很强的物理可解释性。我们的结果表明,神经网络引导的符号回归能够在数据稀缺的情况下发现准确、可解释且具有物理意义的描述符,说明在材料信息学中可解释性不必以牺牲准确性为代价。
Article 226
Title@2025-07-16 (3): ROC-n-reroll: How verifier imperfection affects test-time scaling
Title: ROC-n-reroll: How verifier imperfection affects test-time scaling | ROC-n-Reroll: Wie die Unvollkommenheit der Prüfer die Skalierung der Testzeit beeinflusst | ROC-n-reroll:核查不完善如何影响测试时间的缩放 2507.12399v1 |
Authors (4): Florian E. Dorner, Yatong Chen, André F. Cruz, Fanny Yang
Test-time scaling aims to improve language model performance by leveraging additional compute during inference. While many works have empirically studied techniques like Best-of-N (BoN) and rejection sampling that make use of a verifier to enable test-time scaling, there is little theoretical understanding of how verifier imperfection affects performance. In this work, we address this gap. Specifically, we prove how instance-level accuracy of these methods is precisely characterized by the geometry of the verifier’s ROC curve. Interestingly, while scaling is determined by the local geometry of the ROC curve for rejection sampling, it depends on global properties of the ROC curve for BoN. As a consequence when the ROC curve is unknown, it is impossible to extrapolate the performance of rejection sampling based on the low-compute regime. Furthermore, while rejection sampling outperforms BoN for fixed compute, in the infinite-compute limit both methods converge to the same level of accuracy, determined by the slope of the ROC curve near the origin. Our theoretical results are confirmed by experiments on GSM8K using different versions of Llama and Qwen to generate and verify solutions.
测试时间的缩放旨在通过在推论期间利用额外的计算来提高语言模型的性能。 虽然许多作品都对使用验证器进行测试时间缩放的“最佳”(BoN)和拒绝取样等技术进行了经验性研究,但对于校验不完善如何影响性能几乎没有理论上的理解。 在这项工作中,我们缩小了这一差距。具体地说,我们证明这些方法的试度准确性能是如何精确地以校验者ROC曲线的几何性能为特征的。有趣的是,虽然缩放由ROC曲线的当地几何性能决定,但它取决于波恩的ROC曲线的全球特性。当ROC曲线不为人所知时,不可能根据低校验制度推断拒绝取样的性能。此外,尽管在无限测算的极限中,这两种方法的试样性能均与精确度相同,由离源很近的ROC曲线的斜度决定,但我们的理论结果通过使用不同版本的Llama和Qwen的GSSM8K实验得到证实,但我们的理论结果得到证实。
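The link between the verifier's ROC curve and rejection-sampling accuracy can be illustrated with Bayes' rule. This is a simplified sketch, assuming a binary accept/reject verifier at a single operating point and a base generator accuracy `p`; it is not the paper's full instance-level analysis.

```python
def rejection_sampling_accuracy(p, tpr, fpr):
    """Probability that an accepted sample is correct.

    p:   probability that a single generated answer is correct
    tpr: verifier acceptance rate on correct answers (true positive rate)
    fpr: verifier acceptance rate on incorrect answers (false positive rate)
    """
    accepted = p * tpr + (1.0 - p) * fpr
    if accepted == 0.0:
        raise ValueError("verifier never accepts; accuracy is undefined")
    return p * tpr / accepted


# A stricter operating point with a better TPR/FPR ratio gives higher accuracy,
# at the price of more samples drawn before one is accepted.
print(rejection_sampling_accuracy(0.5, 0.8, 0.2))
print(rejection_sampling_accuracy(0.5, 0.4, 0.05))
```

Dividing numerator and denominator by `fpr` shows that the result depends on the operating point only through the ratio `tpr / fpr`, i.e. the slope of the chord from the origin to that point on the ROC curve. As the threshold becomes stricter this chord slope approaches the slope of the ROC curve near the origin, consistent with the infinite-compute limit stated in the abstract.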
Article 227
Title@2025-07-16 (3): BondMatcher: H-Bond Stability Analysis in Molecular Systems
Title: BondMatcher: H-Bond Stability Analysis in Molecular Systems | BondMatcher: H-Bond Stabilitätsanalyse in molekularen Systemen | BondMatcher:H-Bond 分子系统稳定分析 2504.03205v2 |
Authors (3): Thomas Daniel, Malgorzata Olejniczak, Julien Tierny
This application paper investigates the stability of hydrogen bonds (H-bonds), as characterized by the Quantum Theory of Atoms in Molecules (QTAIM). First, we contribute a database of 4544 electron densities associated to four isomers of water hexamers (the so-called Ring, Book, Cage and Prism), generated by distorting their equilibrium geometry under various structural perturbations, modeling the natural dynamic behavior of molecular systems. Second, we present a new stability measure, called bond occurrence rate, associating each bond path present at equilibrium with its rate of occurrence within the input ensemble. We also provide an algorithm, called BondMatcher, for its automatic computation, based on a tailored, geometry-aware partial isomorphism estimation between the extremum graphs of the considered electron densities. Our new stability measure allows for the automatic identification of densities lacking H-bond paths, enabling further visual inspections. Specifically, the topological analysis enabled by our framework corroborates experimental observations and provides refined geometrical criteria for characterizing the disappearance of H-bond paths. Our electron density database and our C++ implementation are available at this address: https://github.com/thom-dani/BondMatcher.
本应用论文研究氢键(H 键)的稳定性,氢键由分子中原子的量子理论(QTAIM)刻画。首先,我们贡献了一个包含 4544 个电子密度的数据库,这些电子密度对应水六聚体的四种异构体(即所谓的 Ring、Book、Cage 和 Prism),通过在各种结构扰动下扭曲其平衡几何构型生成,用以模拟分子体系的自然动态行为。其次,我们提出一种新的稳定性度量,称为键出现率(bond occurrence rate),它将平衡态下存在的每条键径与其在输入集合中的出现率关联起来。我们还提供了一种名为 BondMatcher 的算法用于自动计算该度量,其基础是在所考虑电子密度的极值图之间进行专门设计的、考虑几何信息的部分同构估计。我们的新稳定性度量可以自动识别缺少氢键键径的电子密度,便于进一步的可视化检查。具体而言,我们的框架所支持的拓扑分析印证了实验观察,并为刻画氢键键径的消失提供了更精细的几何判据。我们的电子密度数据库和 C++ 实现可在以下地址获取:https://github.com/thom-dani/BondMatcher。
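Once bond paths have been matched across the ensemble, the bond occurrence rate itself reduces to counting. The sketch below assumes the hard part (the partial-isomorphism matching between extremum graphs) has already been done, so each ensemble member is given as a set of bond identifiers expressed in the equilibrium labelling; the atom-pair tuples are hypothetical identifiers.

```python
from collections import Counter


def bond_occurrence_rate(equilibrium_bonds, ensemble):
    """Rate at which each equilibrium bond path appears in the ensemble.

    equilibrium_bonds: iterable of bond identifiers present at equilibrium
    ensemble: list of sets of bond identifiers, one per perturbed geometry,
              already matched to the equilibrium labelling (an assumption)
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one member")
    reference = set(equilibrium_bonds)
    counts = Counter()
    for bonds in ensemble:
        counts.update(reference & set(bonds))
    return {bond: counts[bond] / len(ensemble) for bond in reference}


equilibrium = [("O1", "H2"), ("O3", "H4")]
ensemble = [
    {("O1", "H2"), ("O3", "H4")},
    {("O1", "H2")},
    {("O1", "H2")},
    set(),
]
rates = bond_occurrence_rate(equilibrium, ensemble)
print(rates)
```

Ensemble members whose set lacks a given bond are the ones the abstract describes as candidates for further visual inspection.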
Article 228
Title@2025-07-16 (3): Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries
Title: Trustworthy Tree-based Machine Learning by $MoS_2$ Flash-based Analog CAM with Inherent Soft Boundaries | Vertrauenswürdiges baumbasiertes maschinelles Lernen mit $MoS_2$-Flash-basiertem analogem CAM mit inhärenten weichen Grenzen | 基于具有固有软边界的 $MoS_2$ 闪存模拟 CAM 的可信树模型机器学习 2507.12384v1 |
Authors (8): Bo Wen, Guoyun Gao, Zhicheng Xu, Ruibin Mao, Xiaojuan Qi, X. Sharon Hu, Xunzhao Yin, Can Li
The rapid advancement of artificial intelligence has raised concerns regarding its trustworthiness, especially in terms of interpretability and robustness. Tree-based models like Random Forest and XGBoost excel in interpretability and accuracy for tabular data, but scaling them remains computationally expensive due to poor data locality and high data dependence. Previous efforts to accelerate these models with analog content addressable memory (CAM) have struggled, due to the fact that the difficult-to-implement sharp decision boundaries are highly susceptible to device variations, which leads to poor hardware performance and vulnerability to adversarial attacks. This work presents a novel hardware-software co-design approach using $MoS_2$ Flash-based analog CAM with inherent soft boundaries, enabling efficient inference with soft tree-based models. Our soft tree model inference experiments on $MoS_2$ analog CAM arrays show this method achieves exceptional robustness against device variation and adversarial attacks while achieving state-of-the-art accuracy. Specifically, our fabricated analog CAM arrays achieve $96\%$ accuracy on Wisconsin Diagnostic Breast Cancer (WDBC) database, while maintaining decision explainability. Our experimentally calibrated model validated only a $0.6\%$ accuracy drop on the MNIST dataset under $10\%$ device threshold variation, compared to a $45.3\%$ drop for traditional decision trees. This work paves the way for specialized hardware that enhances AI’s trustworthiness and efficiency.
人工智能的快速进步引起了人们对人工智能可信度的关切,特别是在可解释性和稳健性方面。随机森林和XGBost等以树为基础的模型在图表数据的解释性和准确性方面十分出色,但是由于数据位置差和数据依赖性高,在计算成本上仍然非常昂贵。由于难以执行的尖锐决定界限极易受到设备变异的影响,导致硬件性能差和易遭受对抗性攻击。这项工作提出了一种新型的硬件软件软件软件共同设计方法,使用$MoS_2$基于闪烁的模拟 CAM,具有内在的软边界,使得能够以软树为基础的模型进行有效的推论。我们关于$MoS_2$的软树模型推论实验显示,这种方法在设备变异和对抗性攻击时非常可靠,同时,我们伪造的CAM模拟软件阵列在威斯康斯坦辛诊断性乳腺癌(WDBB)中只达到96 $%%$的精准度,同时维持以软树基质为基础的模型,并维持一个在10美元标准标准下进行测试的硬度调整的硬度。
Article 229
Title@2025-07-16 (3): Improving Reinforcement Learning Sample-Efficiency using Local Approximation
Title: Improving Reinforcement Learning Sample-Efficiency using Local Approximation | Verbesserung der Stichprobeneffizienz des Reinforcement Learning durch lokale Approximation | 利用局部近似提高强化学习的样本效率 2507.12383v1 |
Authors (2): Mohit Prashant, Arvind Easwaran
In this study, we derive Probably Approximately Correct (PAC) bounds on the asymptotic sample-complexity for RL within the infinite-horizon Markov Decision Process (MDP) setting that are sharper than those in existing literature. The premise of our study is twofold: firstly, the further two states are from each other, transition-wise, the less relevant the value of the first state is when learning the $\epsilon$-optimal value of the second; secondly, the amount of ‘effort’, sample-complexity-wise, expended in learning the $\epsilon$-optimal value of a state is independent of the number of samples required to learn the $\epsilon$-optimal value of a second state that is a sufficient number of transitions away from the first. Inversely, states within each other’s vicinity have values that are dependent on each other and will require a similar number of samples to learn. By approximating the original MDP using smaller MDPs constructed using subsets of the original’s state-space, we are able to reduce the sample-complexity by a logarithmic factor to $O(SA \log A)$ timesteps, where $S$ and $A$ are the state and action space sizes. We are able to extend these results to an infinite-horizon, model-free setting by constructing a PAC-MDP algorithm with the aforementioned sample-complexity. We conclude with showing how significant the improvement is by comparing our algorithm against prior work in an experimental setting.
在本研究中,我们在无限时域马尔可夫决策过程(MDP)设定下,推导出强化学习渐近样本复杂度的可能近似正确(PAC)界,该界比现有文献中的界更紧。我们研究的前提有两点:第一,两个状态在转移意义上相距越远,在学习第二个状态的 $\epsilon$-最优值时,第一个状态的值就越不相关;第二,就样本复杂度而言,学习某个状态的 $\epsilon$-最优值所花费的“努力”,与学习距该状态足够多次转移之外的另一个状态的 $\epsilon$-最优值所需的样本数量无关。反之,彼此邻近的状态的值相互依赖,学习它们所需的样本数量也相近。通过使用由原 MDP 状态空间的子集构造的较小 MDP 来近似原 MDP,我们能够将样本复杂度降低一个对数因子,达到 $O(SA \log A)$ 个时间步,其中 $S$ 和 $A$ 分别是状态空间和动作空间的大小。通过构造具有上述样本复杂度的 PAC-MDP 算法,我们能够将这些结果推广到无限时域、无模型的设定。最后,我们在实验设定下将我们的算法与已有工作进行比较,以展示改进的显著程度。
Article 230
Title@2025-07-16 (3): Heat Kernel Goes Topological
Title: Heat Kernel Goes Topological | Wärme-Kernel wird topologisch | 热核走向拓扑 2507.12380v1 |
Authors (2): Maximilian Krahn, Vikas Garg
Topological neural networks have emerged as powerful successors of graph neural networks. However, they typically involve higher-order message passing, which incurs significant computational expense. We circumvent this issue with a novel topological framework that introduces a Laplacian operator on combinatorial complexes (CCs), enabling efficient computation of heat kernels that serve as node descriptors. Our approach captures multiscale information and enables permutation-equivariant representations, allowing easy integration into modern transformer-based architectures. Theoretically, the proposed method is maximally expressive because it can distinguish arbitrary non-isomorphic CCs. Empirically, it significantly outperforms existing topological methods in terms of computational efficiency. Besides demonstrating competitive performance with the state-of-the-art descriptors on standard molecular datasets, it exhibits superior capability in distinguishing complex topological structures and avoiding blind spots on topological benchmarks. Overall, this work advances topological deep learning by providing expressive yet scalable representations, thereby opening up exciting avenues for molecular classification and property prediction tasks.
地形神经网络已成为图形神经网络的强大后继器。 但是,它们通常涉及更高层次的信息传递,这需要大量计算费用。我们绕过这一问题,采用了一个新的地形框架,在组合复合体(CCs)上引入了拉普拉西亚操作员,从而能够有效地计算用作节点描述器的热内核。我们的方法捕捉了多尺度的信息,并使得变异-等异的表达方式能够容易地融入现代变压器结构。理论上,拟议方法具有最高度的表达性,因为它可以区分任意的非形态CCs。在计算效率方面,它大大优于现有的地形方法。除了在标准分子数据集上展示与最先进的描述器的竞争性能外,它还表现出在区分复杂的表层结构、避免在表层基准上出现盲点方面的更高能力。总体而言,这项工作通过提供明确而可伸缩的表达方式,从而在表层学上推进深层次的深层次学习,从而为分子分类和财产预测任务开辟了令人振奋的渠道。
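The idea of using a heat kernel as a node descriptor can be sketched on an ordinary graph. This is a simplification: the paper defines its Laplacian on combinatorial complexes, whereas the example below substitutes the standard graph Laplacian $L = D - A$ and takes the diagonal of $e^{-tL}$ at several diffusion times as multiscale features. Function names are illustrative.

```python
import numpy as np


def heat_kernel(adj, t):
    """Heat kernel exp(-t L) of the graph Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)          # L is symmetric
    return (evecs * np.exp(-t * evals)) @ evecs.T


def heat_kernel_descriptors(adj, times):
    """Per-node features: heat retained at each node after time t."""
    return np.stack([np.diag(heat_kernel(adj, t)) for t in times], axis=1)


# Path graph 0 - 1 - 2: the two end nodes are structurally equivalent.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
desc = heat_kernel_descriptors(adj, times=[0.1, 1.0])
print(desc)
```

Relabelling the nodes permutes the rows of `desc` in the same way, which is the permutation-equivariance property the abstract mentions; small `t` captures local structure and large `t` more global structure.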
Article 231
Title@2025-07-16 (3): Towards Understanding Link Predictor Generalizability Under Distribution Shifts
Title: Towards Understanding Link Predictor Generalizability Under Distribution Shifts | Auf dem Weg zum Verständnis von Link Predictor Verallgemeinerbarkeit unter Verteilungsverschiebungen | 迈向理解分布偏移下链接预测器的泛化能力 2406.08788v3 |
Authors (3): Jay Revolinsky, Harry Shomer, Jiliang Tang
State-of-the-art link prediction (LP) models demonstrate impressive benchmark results. However, popular benchmark datasets often assume that training, validation, and testing samples are representative of the overall dataset distribution. In real-world situations, this assumption is often incorrect; uncontrolled factors lead new dataset samples to come from a different distribution than training samples. Additionally, the majority of recent work with graph dataset shift focuses on node- and graph-level tasks, largely ignoring link-level tasks. To bridge this gap, we introduce a novel splitting strategy, known as LPShift, which utilizes structural properties to induce a controlled distribution shift. We verify LPShift’s effect through empirical evaluation of SOTA LP models on 16 LPShift variants of original dataset splits, with results indicating drastic changes to model performance. Additional experiments demonstrate graph structure has a strong influence on the success of current generalization methods. Source Code Available Here: https://github.com/revolins/LPShift
最先进的链接预测(LP)模型显示了令人印象深刻的基准成果。然而,流行的基准数据集往往假设培训、验证和测试样本代表了总体数据集分布。在现实世界中,这一假设往往不正确;不受控制的因素导致新的数据集样本来自与培训样本不同的分布;此外,最近关于图表数据集转换的大部分工作侧重于节点和图形层面的任务,在很大程度上忽略了链接层面的任务。为了缩小这一差距,我们引入了一种新的分裂战略,称为LPShift, 利用结构属性促成受控的分布转移。我们通过对16个原始数据集分类的SOTA LP模型的经验性评估,核实LPShift的效应,结果显示模型性能的急剧变化。其他实验显示图表结构对当前通用方法的成功具有重大影响。
Article 232
Title@2025-07-16 (3): Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques
Title: Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques | Erforschen und Analysieren von Wildland-Feuerdaten über maschinelle Lerntechniken | 利用机器学习技术探索和分析野火数据 2311.05128v2 |
Authors (5): Dipak Dulal, Joseph J. Charney, Michael Gallagher, Carmeliza Navasca, Nicholas Skowronski
This research project investigated the correlation between a 10 Hz time series of thermocouple temperatures and turbulent kinetic energy (TKE) computed from wind speeds collected from a small experimental prescribed burn at the Silas Little Experimental Forest in New Jersey, USA. The primary objective of this project was to explore the potential for using thermocouple temperatures as predictors for estimating the TKE produced by a wildland fire. Machine learning models, including Deep Neural Networks, Random Forest Regressor, Gradient Boosting, and Gaussian Process Regressor, are employed to assess the potential for thermocouple temperature perturbations to predict TKE values. Data visualization and correlation analyses reveal patterns and relationships between thermocouple temperatures and TKE, providing insight into the underlying dynamics. The project achieves high accuracy in predicting TKE by employing various machine learning models despite a weak correlation between the predictors and the target variable. The results demonstrate significant success, particularly from regression models, in accurately estimating the TKE. The research findings contribute to fire behavior and smoke modeling science, emphasizing the importance of incorporating machine learning approaches and identifying complex relationships between fine-scale fire behavior and turbulence. Accurate TKE estimation using thermocouple temperatures allows for the refinement of models that can inform decision-making in fire management strategies, facilitate effective risk mitigation, and optimize fire management efforts. This project highlights the valuable role of machine learning techniques in analyzing wildland fire data, showcasing their potential to advance fire research and management practices.
该项目的主要目的是探讨利用热温温度作为预测因素来估计野地火灾产生的TKE温度的10赫兹时间序列与从美国新泽西州Silas Little实验林中小实验点燃的微小实验性燃烧中采集的风速变化动力能源(TKE)之间的关联。该项目的主要目的是探索利用热温温温度作为预测因素的可能性,以估计野地火灾产生的TKE。机器学习模型,包括深神经网络、随机森林退缩模型、渐进式推进器和高斯进程回归器,用来评估温度温度扰动率渗透以预测TKE值的可能性。数据视觉化和相关性分析揭示了温度温度温度与TKE之间的模式和关系。该项目通过使用各种机器学习模型来预测TKEE,在预测器和目标变量之间的相关性较弱。结果表明,特别是在回归模型、优化前进程评估TKE。研究结果有助于火灾行为和模拟科学,强调采用机器学习方法的重要性,并查明在模拟性温度管理过程中的复杂关系。
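For readers unfamiliar with the target variable, turbulent kinetic energy per unit mass is conventionally half the sum of the variances of the three wind components. The sketch below applies that standard definition to one averaging window; the window length and any detrending used in the study are not stated in the abstract, so treating the whole input as a single window is an assumption.

```python
import numpy as np


def turbulent_kinetic_energy(u, v, w):
    """TKE per unit mass: 0.5 * (var(u) + var(v) + var(w)), in m^2/s^2.

    u, v, w: wind-speed components (m/s) sampled over one averaging window.
    """
    components = [np.asarray(c, dtype=float) for c in (u, v, w)]
    # Fluctuations are deviations from the window mean of each component.
    return 0.5 * sum(np.mean((c - c.mean()) ** 2) for c in components)


u = [1.0, 3.0]   # fluctuates by +/- 1 m/s around its mean
v = [0.0, 0.0]   # steady
w = [2.0, 2.0]   # steady
print(turbulent_kinetic_energy(u, v, w))
```

In a regression setting such as the one described, values like this computed from the 10 Hz wind record would serve as targets, with features derived from the thermocouple temperature series as predictors.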
Article 233
Title@2025-07-16 (3): Distilling Invariant Representations with Dual Augmentation
Title: Distilling Invariant Representations with Dual Augmentation | Destillieren von Invarianten Darstellungen mit Dual Augmentation | 利用双重增强蒸馏不变表示 2410.09474v4 |
Authors (2): Nikolaos Giakoumoglou, Tania Stathaki
Knowledge distillation (KD) has been widely used to transfer knowledge from large, accurate models (teachers) to smaller, efficient ones (students). Recent methods have explored enforcing consistency by incorporating causal interpretations to distill invariant representations. In this work, we extend this line of research by introducing a dual augmentation strategy to promote invariant feature learning in both teacher and student models. Our approach leverages different augmentations applied to both models during distillation, pushing the student to capture robust, transferable features. This dual augmentation strategy complements invariant causal distillation by ensuring that the learned representations remain stable across a wider range of data variations and transformations. Extensive experiments on CIFAR-100 demonstrate the effectiveness of this approach, achieving competitive results in same-architecture KD.
知识蒸馏(KD)已被广泛用于将知识从大型、准确模型(教师)向小型、高效模型(学生)转移,最近的方法通过将因果解释纳入蒸馏变异表征,探索了一致性,在这项工作中,我们扩大了这一研究范围,采用了双重增强战略,促进师生模式中的异同特征学习。我们的方法利用了两种模型在蒸馏过程中应用的不同增量,促使学生捕捉到稳健、可转移的特征。这种双重增强战略通过确保更广泛的数据变异和变异中的知识体现保持稳定,补充了因果蒸馏。关于CIFAR-100的广泛实验证明了这一方法的有效性,在相同的结构中取得了竞争性成果。
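The dual-augmentation setup can be sketched as follows: the teacher and the student each see a differently augmented view of the same input, and the student is trained to match the teacher's softened outputs. The loss used here is the standard temperature-scaled KL distillation loss, swapped in for illustration; the abstract does not specify the paper's exact objective, and the `teacher`, `student`, and augmentation callables are hypothetical placeholders.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(temperature ** 2 * kl.mean())


def dual_augmentation_kd(x, teacher, student, aug_teacher, aug_student,
                         temperature=4.0):
    """Distillation loss when each model sees its own augmented view of x."""
    return kd_loss(student(aug_student(x)), teacher(aug_teacher(x)), temperature)


# Toy check with identity "models": identical views give zero loss,
# different views give a positive loss the student must learn to remove.
x = np.array([[1.0, 2.0, 3.0]])
model = lambda batch: batch
print(dual_augmentation_kd(x, model, model, lambda b: b, lambda b: b[:, ::-1]))
```

Driving this loss down across mismatched views pushes the student toward features that do not change under the augmentations, which is the invariance the abstract aims for.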
Article 234
Title@2025-07-16 (3): Bridging Predictive Coding and MDL: A Two-Part Code Framework for Deep Learning
Title: Bridging Predictive Coding and MDL: A Two-Part Code Framework for Deep Learning | Bridging Predictive Coding und MDL: Ein zweiteiliges Code-Framework für Deep Learning | 连接预测编码与 MDL:面向深度学习的两部分编码框架 2505.14635v2 |
Authors (4): Benjamin Prada, Shion Matsumoto, Abdul Malik Zekri, Ankur Mali
We present the first theoretical framework that connects predictive coding (PC), a biologically inspired local learning rule, with the minimum description length (MDL) principle in deep networks. We prove that layerwise PC performs block-coordinate descent on the MDL two-part code objective, thereby jointly minimizing empirical risk and model complexity. Using Hoeffding’s inequality and a prefix-code prior, we derive a novel generalization bound of the form $R(\theta) \le \hat{R}(\theta) + \frac{L(\theta)}{N}$, capturing the tradeoff between fit and compression. We further prove that each PC sweep monotonically decreases the empirical two-part codelength, yielding tighter high-probability risk bounds than unconstrained gradient descent. Finally, we show that repeated PC updates converge to a block-coordinate stationary point, providing an approximate MDL-optimal solution. To our knowledge, this is the first result offering formal generalization and convergence guarantees for PC-trained deep models, positioning PC as a theoretically grounded and biologically plausible alternative to backpropagation.
我们提出了第一个将预测编码(PC)连接起来的理论框架,这是生物启发的本地学习规则,在深网络中采用最低描述长度(MDL)原则。我们证明,分层的PC在MDL双部分代码目标上进行分块协调的下降,从而共同最大限度地减少经验风险和模型复杂性。利用Hoffding的不平等和前缀代码,我们得出了一个新颖的概括性框架,将表格$R(\theta)\hat{R}(theta)+\frac{L(theta){N}$(MDL){N}$(MDL)+frac{(theta){N}$)连接起来,以捕捉适配和压缩之间的交换。我们进一步证明,每个个人计算机扫描单元的单元代码将实验性双部分代码长度降低,产生比未受限制的梯度梯度梯度下降的高概率风险。最后,我们表明,重复的PC更新会汇集到一个块坐标固定点,提供了大致的MDL-最优的解决方案。据我们所知,这是第一个结果,为PC经过后定位的理论基础和生物上可信的替代模型定位。
Article 235
Title@2025-07-16 (3): Planning-Aware Code Infilling via Horizon-Length Prediction
Title: Planning-Aware Code Infilling via Horizon-Length Prediction | Planning-Aware Code Infilling via Horizon-Length Prediction | 通过视界长度预测实现具有规划意识的代码填充 2410.03103v3 |
Authors (6): Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang
Fill-in-the-Middle (FIM), or infilling, has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which performs next-token prediction (NTP) over a reordered sequence, often leads to models struggling to generate content that aligns well with the surrounding context. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different model families and sizes shows that HLP significantly improves FIM performance, by up to 24% in relative terms, on diverse file-level and repository-level benchmarks. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
中途填充(FIM)或填充(FIM)已成为编码语言模型的组成部分,使得在左侧和右侧环境中生成缺失的代码成为了代码模式的组成部分。然而,当前的FIM培训模式,即对顺序重排进行下方预测(NTP)后进行下方预测(NTP)后,往往导致模型难以产生与周围环境相适应的内容。我们假设光是NTP不足以让模型学习以远右环境为条件的有效规划,这是成功填充代码的一个关键因素。为了克服这一点,我们提出了地平线预测(HLP)这一新的培训目标,教给模型来预测每个步骤的剩余中标数。HLP用外观规划推进FIM,使模型能够在不依赖特定数据集后处理的情况下内在地学习为任意的左右环境填充边界。我们对不同模型家族和大小的评价表明,HLP在文件级别和储存库层面的不同基准上大大改进了FIM的绩效,相对提高到24 %。此外,HLP通过HLP推进(HL)推进(HL)系统)的推进(FIP)模型在可计量标准推理算中提高了实际成本。
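The auxiliary target in Horizon-Length Prediction can be illustrated by building the per-step labels for a middle span. This is a sketch under assumptions: the abstract only says the model predicts the number of remaining middle tokens at each step, so the choice to count the current token as remaining and the optional normalization by span length are illustrative, not confirmed details.

```python
def horizon_length_targets(middle_len, normalize=True):
    """Targets for each position of a middle span of `middle_len` tokens.

    At step t (0-indexed) the target is the number of middle tokens still
    to be generated, counting the current one; optionally divided by the
    span length so that targets lie in (0, 1].
    """
    if middle_len <= 0:
        raise ValueError("middle span must contain at least one token")
    remaining = [middle_len - t for t in range(middle_len)]
    if normalize:
        return [r / middle_len for r in remaining]
    return remaining


print(horizon_length_targets(4, normalize=False))
print(horizon_length_targets(4))
```

During training these targets would supervise a small regression head alongside the usual next-token loss; at inference the head can be dropped, which is consistent with the abstract's claim of no additional inference cost.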
Article 236
Title@2025-07-16 (3): Sparse Orthogonal Parameters Tuning for Continual Learning
Title: Sparse Orthogonal Parameters Tuning for Continual Learning | Sparse Orthogonale Parameter Tuning für kontinuierliches Lernen | 用于持续学习的稀疏正交参数调优 2411.02813v2 |
Authors (6): Hai-Jian Ke, Kun-Peng Ning, Yu-Yang Liu, Jia-Yu Yao, Yong-Hong Tian, Li Yuan
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention. These methods typically refrain from updating the pre-trained parameters and instead employ additional adapters, prompts, and classifiers. In this paper, we investigate, from a novel perspective, the benefit of sparse orthogonal parameters for continual learning. We find that merging the sparse orthogonal parameters of models learned from multiple streaming tasks has great potential for addressing catastrophic forgetting. Leveraging this insight, we propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning). We hypothesize that the effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters. Experimental evaluations on diverse CL benchmarks demonstrate the effectiveness of the proposed approach. Notably, SoTU achieves optimal feature representation for streaming data without necessitating complex classifier designs, making it a Plug-and-Play solution.
基于预先培训的模型(PTM)的不断学习方法最近引起了注意,这些方法适应了连续的下游任务,而不会发生灾难性的遗忘。这些方法通常不更新预先培训的参数,而是使用更多的适应器、提示器和分类器。在本文中,我们从新颖的角度来调查稀疏或远方参数对持续学习的好处。我们发现,从多重流任务中学习的分散或多位模型在解决灾难性的遗忘方面有着巨大的潜力。我们利用这一洞察力,提出了一种叫SoTU(Sparse Orthogonal参数连接)的新颖而有效的方法。我们假设,SoTU的效力在于将从多个领域学到的知识转化成交汇或交接三角洲参数。对多种CL基准的实验性评价显示了拟议方法的有效性。值得注意的是,SoTU在不需要复杂的分类设计的情况下,实现了流数据的最佳特征代表,使之成为一个“浮点和点”解决方案。
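The fusion of sparse delta parameters can be sketched as follows. This is a minimal illustration, not the SoTU implementation: the random masking rule, the `keep_ratio` parameter, and the function names are all assumptions. The intuition is that highly sparse deltas from different tasks rarely touch the same coordinates, so they are close to orthogonal and can be added onto the frozen pre-trained weights with little interference.

```python
import random

def sparse_delta(pretrained, finetuned, keep_ratio, seed):
    """Keep a random subset of the fine-tuning delta and zero the rest.

    Random masking is an assumption; other sparsification rules are possible.
    """
    rng = random.Random(seed)
    n = len(pretrained)
    k = max(1, int(n * keep_ratio))
    kept = set(rng.sample(range(n), k))
    return [(f - p) if i in kept else 0.0
            for i, (p, f) in enumerate(zip(pretrained, finetuned))]

def merge(pretrained, deltas):
    """Add every task's sparse delta onto the frozen pre-trained weights."""
    merged = list(pretrained)
    for delta in deltas:
        for i, d in enumerate(delta):
            merged[i] += d
    return merged

def dot(a, b):
    """Inner product, useful for checking how close two deltas are to orthogonal."""
    return sum(x * y for x, y in zip(a, b))
```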
Article 237
Title@2025-07-16 (3): Active Deep Kernel Learning of Molecular Properties: Realizing Dynamic Structural Embeddings
Title: Active Deep Kernel Learning of Molecular Properties: Realizing Dynamic Structural Embeddings | Aktives tiefes Kernel-Lernen von molekularen Eigenschaften: Dynamische strukturelle Einbettungen realisieren | 活跃的分子属性深核学习:实现动态结构嵌入 2403.01234v2 |
Authors (3): Ayana Ghosh, Maxim Ziatdinov, Sergei V. Kalinin
As vast databases of chemical identities become increasingly available, the challenge shifts to how we effectively explore and leverage these resources to study molecular properties. This paper presents an active learning approach for molecular discovery using Deep Kernel Learning (DKL), demonstrated on the QM9 dataset. DKL links structural embeddings directly to properties, creating organized latent spaces that prioritize relevant property information. By iteratively recalculating embedding vectors in alignment with target properties, DKL uncovers concentrated maxima representing key molecular properties and reveals unexplored regions with potential for innovation. This approach underscores DKL’s potential in advancing molecular research and discovery.
随着大量化学特性数据库的日益普及,挑战转向我们如何有效探索和利用这些资源研究分子特性。本文件展示了利用深核心学习(DKL)进行分子发现的积极学习方法,在QM9数据集中展示了这一方法。DKL将结构嵌入直接连接到属性,创建了有组织的潜在空间,将相关财产信息列为优先事项。DKL根据目标特性对嵌入矢量进行迭代重新计算,发现了代表关键分子特性的集中最大值,并揭示了具有创新潜力的未勘探区域。这一方法强调了DKL在推进分子研究和发现方面的潜力。
Article 238
Title@2025-07-16 (3): Nonlinear Concept Erasure: a Density Matching Approach
Title: Nonlinear Concept Erasure: a Density Matching Approach | Nichtlineare Konzeptauslöschung: ein Density-Matching-Ansatz | 非线性概念时代:密度匹配方法 2507.12341v1 |
Authors (2): Antoine Saillenfest, Pirmin Lemberger
Ensuring that neural models used in real-world applications cannot infer sensitive information, such as demographic attributes like gender or race, from text representations is a critical challenge when fairness is a concern. We address this issue through concept erasure, a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible. Our approach involves learning an orthogonal projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to erase indistinguishable after projection. By adjusting the rank of the projector, we control the extent of information removal, while its orthogonality ensures strict preservation of the local structure of the embeddings. Our method, termed $\overline{\mathrm{L}}$EOPARD, achieves state-of-the-art performance in nonlinear erasure of a discrete attribute on classic natural language processing benchmarks. Furthermore, we demonstrate that $\overline{\mathrm{L}}$EOPARD effectively mitigates bias in deep nonlinear classifiers, thereby promoting fairness.
确保现实世界应用中使用的神经模型无法从文本表述中推断出敏感信息,如性别或种族等人口特征,这是一个关键的挑战,因为公平是一个令人关切的问题。我们通过概念删除来解决这个问题,这个过程从分布式表述中去除与特定概念有关的信息,同时尽可能保留其余语义信息。我们的方法是在嵌入空间中学习一个正交投影,目的是使待删除离散概念的类条件特征分布在投影后不可区分。我们通过调整投影器的秩来控制信息删除的程度,而其正交性则确保严格保留嵌入的局部结构。我们的方法叫做$\overline{\mathrm{L}}$EOPARD,在经典自然语言处理基准上对离散属性的非线性删除达到了最先进的性能。此外,我们证明$\overline{\mathrm{L}}$EOPARD有效地减轻了深度非线性分类器的偏差,从而促进了公平性。
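To make the role of the projector concrete, the sketch below removes a chosen set of directions from an embedding with an orthogonal projection. It only illustrates the projection step: in the paper the directions are learned by matching class-conditional densities, whereas here they are simply given as input, and the function names are hypothetical. The number of removed directions plays the part of the rank knob mentioned in the abstract.

```python
def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: turn the given directions into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = sum(a * b for a, b in zip(w, u))
            w = [a - c * b for a, b in zip(w, u)]
        norm = sum(a * a for a in w) ** 0.5
        if norm > tol:  # skip directions already spanned by the basis
            basis.append([a / norm for a in w])
    return basis

def project_out(x, basis):
    """Apply P = I - U U^T, removing the span of the orthonormal basis from x."""
    out = list(x)
    for u in basis:
        c = sum(a * b for a, b in zip(out, u))
        out = [a - c * b for a, b in zip(out, u)]
    return out
```

Because the map is an orthogonal projection, applying it twice changes nothing and it never increases the distance between two embeddings.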
Article 239
Title@2025-07-16 (3): GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
Title: GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning | GHPO: Adaptive Anleitung für stabiles und effizientes LLM-Verstärkungslernen | GHPO: 稳定有效的LLM强化学习适应性指导 2507.10628v2 |
Authors (10): Ziru Liu, Cheng Gong, Xinyu Fu, Yaofang Liu, Ran Chen, Shoubo Hu, Suiyun Zhang, Rui Liu, Qingfu Zhang, Dandan Tu
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for facilitating the self-improvement of large language models (LLMs), particularly in the domain of complex reasoning tasks. However, prevailing on-policy RL methods often contend with significant training instability and inefficiency. This is primarily due to a capacity-difficulty mismatch, where the complexity of training data frequently outpaces the model’s current capabilities, leading to critically sparse reward signals and stalled learning progress. This challenge is particularly acute for smaller, more resource-efficient LLMs. To overcome this, we introduce the Guided Hybrid Policy Optimization (GHPO), a novel difficulty-aware reinforcement learning framework. GHPO dynamically calibrates task difficulty by employing adaptive prompt refinement to provide targeted guidance. This unique approach adaptively balances direct imitation learning for problems currently beyond the model’s reach with exploration-based reinforcement learning for more manageable tasks, effectively creating a smooth and optimized learning curriculum. Extensive experiments demonstrate that GHPO achieves an average performance gain of approximately 5% across six challenging mathematics benchmarks, consistently outperforming strong on-policy reinforcement learning and curriculum learning baselines. Further analysis confirms that our framework significantly enhances both training stability and final reasoning performance, thus offering a scalable and efficient solution for developing powerful and robust reasoning models.
最近,在推动大型语文模式的自我改进方面,特别是在复杂的推理任务领域,普遍的政策性学习方法往往与培训方面的严重不稳定和效率低下相冲突,这主要是因为能力性差异性不匹配,培训数据的复杂性往往超过模型目前的能力,导致奖励信号极为稀少,学习进展停滞不前。对于规模较小、资源效率更高的LLM公司来说,这一挑战尤为严峻。为了克服这一挑战,我们引入了指导性混合政策优化(GHPO),这是一个新颖的难测强化学习框架。GHPO动态地调整了任务难度,采用适应性的即时改进来提供有针对性的指导。这一独特的方法在直接模仿目前超出模型范围的问题的学习时,与探索性的强化学习学习相比,对于更便于管理的任务,有效地创建一种平稳和优化的学习课程。广泛的实验表明,GPO在六项挑战性数学基准中取得了大约5%的平均业绩,持续地超越了强有力的政策性强化学习和强化性学习基准,从而进一步确定了一种强有力的推理学基准。
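The difficulty-aware switch described in the abstract can be pictured with a small sketch. Everything here is an assumption for illustration: the function name, the rule "all rollouts failed", and the use of a prefix of a reference solution as the hint are not taken from the paper, which only states that adaptive prompt refinement provides targeted guidance for problems beyond the model's current reach.

```python
def build_prompt(question, reference_solution, rollout_rewards, hint_fraction=0.5):
    """Add a partial solution as a hint only when every rollout failed.

    Hypothetical rule: if at least one rollout earned a reward, the problem
    is treated as manageable and the plain question is used for ordinary
    exploration-based RL; otherwise a prefix of the reference solution is
    appended as guidance.
    """
    if any(r > 0 for r in rollout_rewards):
        return question
    cut = int(len(reference_solution) * hint_fraction)
    return question + "\nHint: " + reference_solution[:cut]
```

Raising `hint_fraction` step by step for prompts that keep failing would be one way to obtain the smooth curriculum the abstract alludes to.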
Article 240
Title@2025-07-16 (3): Neural Polar Decoders for Deletion Channels
Title: Neural Polar Decoders for Deletion Channels | Neurale Polardecoder für Löschkanäle | Dedeletion 通道的神经极极代碼器 2507.12329v1 |
Authors (2): Ziv Aharoni, Henry D. Pfister
This paper introduces a neural polar decoder (NPD) for deletion channels with a constant deletion rate. Existing polar decoders for deletion channels exhibit high computational complexity of $O(N^4)$, where $N$ is the block length. This limits the application of polar codes for deletion channels to short-to-moderate block lengths. In this work, we demonstrate that employing NPDs for deletion channels can reduce the computational complexity. First, we extend the architecture of the NPD to support deletion channels. Specifically, the NPD architecture consists of four neural networks (NNs), each replicating fundamental successive cancellation (SC) decoder operations. To support deletion channels, we change the architecture of only one. The computational complexity of the NPD is $O(AN\log N)$, where the parameter $A$ represents a computational budget determined by the user and is independent of the channel. We evaluate the new extended NPD for deletion channels with deletion rates $\delta\in\{0.01, 0.1\}$ and we verify the NPD with the ground truth given by the trellis decoder by Tal et al. We further show that due to the reduced complexity of the NPD, we are able to incorporate list decoding and further improve performance. We believe that the extended NPD presented here could have applications in future technologies like DNA storage.
本文提出了一种用于删除率恒定的删除信道的神经极化解码器(NPD)。现有的删除信道极化解码器的计算复杂度高达 $O(N^4)$,其中 $N$ 为码块长度。这使极化码在删除信道上的应用局限于中短码块长度。在这项工作中,我们证明对删除信道使用NPD可以降低计算复杂度。首先,我们扩展了NPD的架构以支持删除信道。具体而言,NPD架构由四个神经网络(NN)组成,每个网络复现串行抵消(SC)解码器的基本操作。为了支持删除信道,我们只改变其中一个网络的架构。NPD的计算复杂度为 $O(AN\log N)$,其中参数 $A$ 表示由用户确定的计算预算,且与信道无关。我们在删除率 $\delta\in\{0.01, 0.1\}$ 的删除信道上评估了扩展后的NPD,并以Tal等人的网格(trellis)解码器给出的基准结果对NPD进行了验证。我们进一步表明,由于NPD的复杂度降低,我们能够引入列表解码并进一步提升性能。我们认为,本文提出的扩展NPD可应用于DNA存储等未来技术。
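To get a feel for the gap between the two complexity orders, the snippet below compares their leading terms. Constants and lower-order terms are ignored, and the budget `a = 64` used in the example is a hypothetical value, not one reported in the abstract.

```python
import math

def trellis_cost(n):
    """Leading term of the O(N^4) trellis-based decoder."""
    return n ** 4

def npd_cost(n, a):
    """Leading term of the O(A N log N) neural polar decoder."""
    return a * n * math.log2(n)

# For n = 1024 and a = 64 the leading terms are 2**40 (about 1.1e12)
# versus 64 * 1024 * 10 = 655360.
ratio = trellis_cost(1024) / npd_cost(1024, 64)
```

Under these assumptions the ratio is on the order of a million, which is consistent with the abstract's point that the trellis decoder is practical only at short-to-moderate block lengths.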
Article 241
Title@2025-07-16 (3): Quantifying calibration error in modern neural networks through evidence based theory
Title: Quantifying calibration error in modern neural networks through evidence based theory | Quantifizierung von Kalibrierfehlern in modernen neuronalen Netzwerken durch evidenzbasierte Theorie | 通过基于证据的理论对现代神经网络中的校准错误进行量化 2411.00265v2 |
Authors (1): Koffi Ismael Ouattara
Trustworthiness in neural networks is crucial for their deployment in critical applications, where reliability, confidence, and uncertainty play pivotal roles in decision-making. Traditional performance metrics such as accuracy and precision fail to capture these aspects, particularly in cases where models exhibit overconfidence. To address these limitations, this paper introduces a novel framework for quantifying the trustworthiness of neural networks by incorporating subjective logic into the evaluation of Expected Calibration Error (ECE). This method provides a comprehensive measure of trust, disbelief, and uncertainty by clustering predicted probabilities and fusing opinions using appropriate fusion operators. We demonstrate the effectiveness of this approach through experiments on MNIST and CIFAR-10 datasets, where post-calibration results indicate improved trustworthiness. The proposed framework offers a more interpretable and nuanced assessment of AI models, with potential applications in sensitive domains such as healthcare and autonomous systems.
神经网络的可信赖性对于在关键应用中部署神经网络至关重要,在关键应用中,可靠性、信心和不确定性在决策中起着关键作用。传统的性能衡量标准,如精确度和精确度等,未能捕捉到这些方面,特别是在模型表现出过度自信的情况下。为克服这些限制,本文件提出了一个新的框架,通过将主观逻辑纳入对预期校准错误(欧洲经委会)的评估,量化神经网络的可信赖性。这种方法通过将预测的概率和观点集中起来,利用适当的聚合操作器来全面衡量信任性、不可置信性和不确定性。我们通过对MNIST和CIFAR-10数据集的实验,展示了这一方法的有效性,在这些数据库中,校准后的结果表明信任性有所提高。拟议框架提供了对AI模型的更可解释性和细微的评估,有可能在诸如保健和自主系统等敏感领域应用。
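For reference, the two ingredients named in the abstract can be sketched separately. The first function is the standard binned Expected Calibration Error; the second maps counts of correct and incorrect predictions to a binomial subjective-logic opinion (belief, disbelief, uncertainty) with the usual non-informative prior weight of 2. How the paper clusters probabilities and fuses the resulting opinions is not specified in the abstract, so that step is not shown, and the function names are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted mean of |accuracy - confidence| over bins."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += abs(accuracy - avg_conf) * len(members) / total
    return ece

def binomial_opinion(n_correct, n_wrong, prior_weight=2.0):
    """Evidence counts -> (belief, disbelief, uncertainty), which sum to 1."""
    denom = n_correct + n_wrong + prior_weight
    return (n_correct / denom, n_wrong / denom, prior_weight / denom)
```

With few samples in a bin the uncertainty mass stays large, which is the kind of information a plain ECE value does not expose.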
Article 242
Title@2025-07-16 (3): MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball
Title: MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball | MVP-Shapley: Featurebasierte Modellierung für die Bewertung des wertvollsten Spielers im Basketball | MVP-Shaplay:评估篮球中最有价值的玩家的基于地物的模型模型 2506.04602v2 |
Authors (8): Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li
The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). The establishment of an explainable and practical MVP evaluation method is very challenging. In our study, we specifically focus on play-by-play data, which records related events during the game, such as assists and points. We aim to address the challenges by introducing a new MVP evaluation framework, denoted as MVP-Shapley, which leverages Shapley values. This approach encompasses feature processing, win-loss model training, Shapley value allocation, and MVP ranking determination based on players’ contributions. Additionally, we optimize our algorithm to align with expert voting results from the perspective of causality. Finally, we substantiated the efficacy of our method through validation using the NBA dataset and the Dunk City Dynasty dataset and implemented online deployment in the industry.
埃斯波特和多玩家在线赌博界的快速增长突显了评价最有价值的玩家(MVP)的至关重要性。 建立一个可解释和实用的MVP评价方法非常具有挑战性。 在我们的研究中,我们特别侧重于记录游戏期间相关事件的逐个游戏数据,例如协助和点。 我们的目标是通过引入一个新的MVP评价框架来应对挑战,这个框架被称为\oursys,它利用了Shapley值。这个方法包括地物处理、赢损模型培训、损益价值分配和基于玩家贡献的MVP排名确定。 此外,我们优化了我们的算法,以便从因果关系的角度与专家投票结果保持一致。 最后,我们通过使用NBA数据集和Dunk City Dynasty数据集验证了我们的方法的有效性,并在行业中实施了在线部署。
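The Shapley allocation step can be illustrated with an exact computation over a small set of contributors. This is the textbook definition (average marginal contribution over all orderings), not the paper's pipeline: the `value` function stands in for a trained win-loss model, and any toy value function passed to it is hypothetical. Exact enumeration grows factorially, so it is only practical for a handful of players or features.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings.

    `value` maps a frozenset of players to a number, e.g. a predicted win
    probability from some model (an assumption made for this sketch).
    """
    totals = {p: 0.0 for p in players}
    count = 0
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
        count += 1
    return {p: t / count for p, t in totals.items()}
```

The values always sum to the worth of the full coalition, so ranking players by them distributes the whole modeled outcome with nothing left over.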
Article 243
Title@2025-07-16 (3): Thought Purity: Defense Paradigm For Chain-of-Thought Attack
Title: Thought Purity: Defense Paradigm For Chain-of-Thought Attack | Thought Purity: Verteidigungsparadigm für den Ketten-of-Thought-Angriff | 思想纯度: 研究链攻击的防御范式 2507.12314v1 |
Authors (9): Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou
While reinforcement learning-trained Large Reasoning Models (LRMs, e.g., Deepseek-R1) demonstrate advanced reasoning capabilities in the evolving Large Language Models (LLMs) domain, their susceptibility to security threats remains a critical vulnerability. This weakness is particularly evident in Chain-of-Thought (CoT) generation processes, where adversarial methods like backdoor prompt attacks can systematically subvert the model’s core reasoning mechanisms. The emerging Chain-of-Thought Attack (CoTA) reveals this vulnerability through exploiting prompt controllability, simultaneously degrading both CoT safety and task performance with low-cost interventions. To address this compounded security-performance vulnerability, we propose Thought Purity (TP): a defense paradigm that systematically strengthens resistance to malicious content while preserving operational efficacy. Our solution achieves this through three synergistic components: (1) a safety-optimized data processing pipeline, (2) reinforcement learning-enhanced rule constraints, and (3) adaptive monitoring metrics. Our approach establishes the first comprehensive defense mechanism against CoTA vulnerabilities in reinforcement learning-aligned reasoning systems, significantly advancing the security-functionality equilibrium for next-generation AI architectures.
虽然经过强化学习培训的大理由模型(LRM,如Deepseek-R1)显示,在不断发展的大型语言模型(LLMS)领域,它们容易受到安全威胁,这仍然是一个严重的弱点,这种弱点在Thought(CoT)的生成过程中特别明显,在这个过程中,诸如后门即时攻击等对抗性方法可以系统地破坏模型的核心推理机制。新兴的 “ 推敲链攻击 “ (CoTA)通过利用迅速的可控性暴露出这种脆弱性,同时降低COT的安全和任务性能,同时采取低成本干预措施。为解决这种复杂的安全性-表现脆弱性,我们提议 “ 思想纯度 “ (TP):一种防御性模式,系统地加强对恶意内容的抵制,同时保持操作效率。我们的解决办法是通过三个协同组成部分实现这一点:(1) 安全优化数据处理管道(2) 强化学习强化规则制约(3) 适应性监测指标。我们的方法建立了第一个防止CTA脆弱性的全面防御机制,以加强学习一致的推理系统,大大推进下一代AI结构的安全性-功能平衡。
Article 244
Title@2025-07-16 (3): PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
Title: PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning | PROL : Probefreies kontinuierliches Lernen in Streaming-Daten über Prompt Online-Lernen | PROL: 通过即时在线学习在流数据中进行排练免费持续学习 2507.12305v1 |
Authors (6): M. Anwar Ma’sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk
The data privacy constraint in online continual learning (OCL), where the data can be seen only once, complicates the catastrophic forgetting problem in streaming data. A common approach among current state-of-the-art (SOTA) methods in OCL is to use a memory that stores exemplars or features from previous classes and replays them in the current task. On the other hand, the prompt-based approach performs excellently in continual learning but at the cost of a growing number of trainable parameters. The first approach may not be applicable in practice due to data openness policy, while the second approach raises throughput issues on streaming data. In this study, we propose a novel prompt-based method for online continual learning that includes four main components: (1) a single lightweight prompt generator as general knowledge, (2) a trainable scaler-and-shifter as specific knowledge, (3) pre-trained model (PTM) generalization preserving, and (4) a hard-soft update mechanism. Our proposed method achieves significantly higher performance than the current SOTAs on the CIFAR100, ImageNet-R, ImageNet-A, and CUB datasets. Our complexity analysis shows that our method requires a relatively smaller number of parameters and achieves moderate training time, inference time, and throughput. For further study, the source code of our method is available at https://github.com/anwarmaxsum/PROL.
在线持续学习(OCL)中的数据隐私限制(OCL)是数据只能看一次的,这使流数据中灾难性的遗忘问题复杂化。在OCL中,目前SOATA采用的一种通用方法是使用记忆保存表象或前几类的特征,在目前的任务中要重复使用。另一方面,基于迅速的方法在持续学习中表现得很好,但费用却在不断增加的可训练参数方面。由于数据开放政策,第一种方法可能无法在实践中适用,而第二种方法则涉及流数据中的吞吐问题。在本研究中,我们提出了一种新的基于快速的在线持续学习方法,其中包括四个主要组成部分:(1) 单一轻量级快速生成器,作为一般知识,(2) 可培训的缩放器和变换器,作为具体知识,(3) 预先培训的模型(PTM) 总体保存,(4) 硬软更新机制。我们提出的方法的绩效可能大大高于当前在CIFAR100、图像网络-R、图像Net-A和CUB数据设置中的SUB。我们的复杂度分析表明,我们的方法需要一种比较小的源代码,在中的时间序列中,通过我们现有的源数和源数的研究。
Article 245
Title@2025-07-16 (3): AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization
Title: AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization | AnnoPage Datensatz: Datensatz nicht-textlicher Elemente in Dokumenten mit feinkörniger Kategorisierung | AnnoPage 数据集: 精细分类文档中非形式元素数据集 2503.22526v3 |
Authors (5): Martin Kišš, Michal Hradiš, Martina Dvořáková, Václav Jiroušek, Filip Kersch
We introduce the AnnoPage Dataset, a novel collection of 7,550 pages from historical documents, primarily in Czech and German, spanning from 1485 to the present, focusing on the late 19th and early 20th centuries. The dataset is designed to support research in document layout analysis and object detection. Each page is annotated with axis-aligned bounding boxes (AABB) marking non-textual elements from 25 categories, such as images, maps, decorative elements, or charts, following the Czech Methodology of image document processing. The annotations were created by expert librarians to ensure accuracy and consistency. The dataset also incorporates pages from multiple, mainly historical, document datasets to enhance variability and maintain continuity. The dataset is divided into development and test subsets, with the test set carefully selected to maintain the category distribution. We provide baseline results using YOLO and DETR object detectors, offering a reference point for future research. The AnnoPage Dataset is publicly available on Zenodo (https://doi.org/10.5281/zenodo.12788419), along with ground-truth annotations in YOLO format.
我们引入了AnnoPage数据集,这是从1485年至今主要以捷克文和德文撰写的历史文献中7 550页的新书,主要以捷克文和德文撰写,侧重于19世纪末和20世纪初,该数据集旨在支持文件布局分析和物体探测方面的研究,每页都配有附加说明的轴边框(AABB),代表25类非文字元素的要素,如图像、地图、装饰元素或图表,采用捷克图像文件处理方法,这些元素由专家图书管理员创建,以确保准确性和一致性。数据集还包含多个页面,主要是历史文档数据集,以加强变异性和维护连续性。数据集分为开发和测试子组,为保持类别分布而仔细选择了测试组。我们使用 YOLO 和 DETR 物体探测器提供基线结果,为未来研究提供了一个参考点。AnnoPage数据集由专家图书管理员创建(https://doi.org/10281/zenodo.127819),连同YOL格式的地面图示。
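Since the ground truth is distributed in YOLO format, a short conversion sketch may help readers who hold pixel-coordinate boxes. It uses the common YOLO label layout (class index, then box center and size, all normalized by image width and height). The function name and the class index in the example are hypothetical, and the exact files in the Zenodo record should be checked before relying on this layout.

```python
def aabb_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space axis-aligned box to a YOLO label line.

    Output layout: "<class> <x_center> <y_center> <width> <height>",
    with all four box values normalized to the range [0, 1].
    """
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```

For example, a box from (100, 200) to (300, 400) on a 1000 x 800 page with class index 3 becomes `3 0.200000 0.375000 0.200000 0.250000`.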
Article 246
Title@2025-07-16 (3): ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Title: ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy | ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy | 维一致:细胞显微镜:扩大生物代表性学习 2411.02572v2 |
Authors (13): Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Marta Fay, Juan Sebastian Rodriguez Vera, Imran S Haque, Oren Kraus
Large-scale cell microscopy screens are used in drug discovery and molecular biology research to study the effects of millions of chemical and genetic perturbations on cells. To use these images in downstream analysis, we need models that can map each image into a feature space that represents diverse biological phenotypes consistently, in the sense that perturbations with similar biological effects have similar representations. In this work, we present the largest foundation model for cell microscopy data to date, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. Compared to a previously published ViT-L/8 MAE, our new model achieves a 60% improvement in linear separability of genetic perturbations and obtains the best overall performance on whole-genome biological relationship recall and replicate consistency benchmarks. Beyond scaling, we developed two key methods that improve performance: (1) training on a curated and diverse dataset; and (2) using biologically motivated linear probing tasks to search across each transformer block for the best candidate representation of whole-genome screens. We find that many self-supervised vision transformers, pretrained on either natural or microscopy images, yield significantly more biologically meaningful representations of microscopy images in their intermediate blocks than in their typically used final blocks. More broadly, our approach and results provide insights toward a general strategy for successfully building foundation models for large-scale biological data.
大型细胞显微镜屏用于药物发现和分子生物学研究,以研究数以百万计的化学和遗传扰动细胞对细胞的影响。在下游分析中使用这些图像。为了在下游分析中使用这些图像,我们需要模型,能够将每个图像映射成一个持续代表不同生物苯型的特征空间,即具有类似生物效果的扰动具有相似的表现形式。在这项工作中,我们提出了迄今为止细胞显微镜数据的最大基础模型,即一个新的19亿兆字节Vit-G/8 MAE,对80多亿个显微镜作物进行了培训。与以前出版的Vit-L/8 MAE相比,我们的新模型在基因扰动的线性分离方面实现了60%的改进,并获得了全基因生物关系回顾和复制一致性基准方面的最佳总体表现。除了缩放外,我们还开发了两种主要方法来提高性能:(1) 精度和多样化数据集的培训;(2) 利用生物动机的线形探测任务,在每个变形块中搜索全基因屏幕的最佳候选表示方式。我们发现,许多自我监督的中间图像的模型通常用于较有说服力的模型。
Article 247
Title@2025-07-16 (3): RegCL: Continual Adaptation of Segment Anything Model via Model Merging
Title: RegCL: Continual Adaptation of Segment Anything Model via Model Merging | RegCL: Kontinuierliche Anpassung des Segments an alles Modell über Modellverschmelzung | RegCL:通过模型合并不断调整区段 “ 任何东西 “ 模式 2507.12297v1 |
Authors (3): Yuan-Chen Shu, Zhiwei Lin, Yongtao Wang
To address the performance limitations of the Segment Anything Model (SAM) in specific domains, existing works primarily adopt adapter-based one-step adaptation paradigms. However, some of these methods are developed specifically for particular domains and may suffer performance degradation when applied to other domains. This catastrophic forgetting severely limits the model’s scalability. To address this issue, this paper proposes RegCL, a novel non-replay continual learning (CL) framework designed for efficient multi-domain knowledge integration through model merging. Specifically, RegCL incorporates the model merging algorithm into the continual learning paradigm by merging the parameters of SAM’s adaptation modules (e.g., LoRA modules) trained on different domains. The merging process is guided by weight optimization, which minimizes prediction discrepancies between the merged model and each of the domain-specific models. RegCL effectively consolidates multi-domain knowledge while maintaining parameter efficiency, i.e., the model size remains constant regardless of the number of tasks, and no historical data storage is required. Experimental results demonstrate that RegCL achieves favorable continual learning performance across multiple downstream datasets, validating its effectiveness in dynamic scenarios.
为解决在特定领域“部分Any Any”模式(SAM)的绩效局限性,现有工作主要采用基于适应器的单步适应模式。但是,其中一些方法是针对特定领域的特定方法。如果在其他领域使用,可能会导致性能退化。这个灾难性的忘记问题严重限制了模型的可缩放性。为解决这一问题,本文件提议了RegCL,这是一个新颖的非重复持续学习(CL)框架,旨在通过模型合并实现高效的多领域知识整合。具体地说,RegCL将模型合并算法纳入持续学习模式,方法是将SAM适应模块(例如LORA模块)的参数合并为不同领域培训。合并过程以加权优化为指导,最大限度地缩小合并模型与每个特定领域模型之间的预测差异。 RegCL在维持参数效率的同时,有效地整合了多领域知识,也就是说,无论任务数量如何,模型的规模保持不变,不需要历史数据储存。实验结果表明,RECL在多个下游数据集中取得了有利的持续学习业绩,从而证实其动态情景的有效性。
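The merging objective can be illustrated on a toy problem. The sketch below is not RegCL itself: it replaces LoRA modules with plain linear models, merges only two of them, and finds the mixing weights by grid search, all of which are simplifying assumptions. What it keeps is the stated criterion, namely choosing merge weights that minimize the prediction discrepancy between the merged model and each domain-specific model on that domain's inputs.

```python
def predict(theta, x):
    """Linear model used as a stand-in for a domain-adapted network."""
    return sum(t * xi for t, xi in zip(theta, x))

def discrepancy(weights, thetas, domain_inputs):
    """Squared prediction gap between the merged model and each domain model."""
    dim = len(thetas[0])
    merged = [sum(w * th[j] for w, th in zip(weights, thetas)) for j in range(dim)]
    return sum((predict(merged, x) - predict(th, x)) ** 2
               for th, xs in zip(thetas, domain_inputs) for x in xs)

def best_two_way_weights(thetas, domain_inputs, steps=100):
    """Grid search over (w, 1 - w) for two models (hypothetical optimizer)."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return min(candidates, key=lambda w: discrepancy(w, thetas, domain_inputs))
```

Only model parameters and the current domain's inputs enter this toy objective, and the merged parameter vector has the same size as a single model.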
Article 248
Title@2025-07-16 (3): Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding
Title: Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding | Text-ADBench: Text-Anomaly Detection Benchmark basierend auf LLMs Einbetten | 文本 – – 亚银:基于嵌入LLMs的文本异常检测基准 2507.12295v1 |
Authors (2): Feng Xiao, Jicong Fan
Text anomaly detection is a critical task in natural language processing (NLP), with applications spanning fraud detection, misinformation identification, spam detection, and content moderation. Despite significant advances in large language models (LLMs) and anomaly detection algorithms, the absence of standardized and comprehensive benchmarks for evaluating the existing anomaly detection methods on text data limits rigorous comparison and development of innovative approaches. This work performs a comprehensive empirical study and introduces a benchmark for text anomaly detection, leveraging embeddings from diverse pre-trained language models across a wide array of text datasets. Our work systematically evaluates the effectiveness of embedding-based text anomaly detection by incorporating (1) early language models (GloVe, BERT); (2) multiple LLMs (LLaMa-2, LLama-3, Mistral, OpenAI (small, ada, large)); (3) multi-domain text datasets (news, social media, scientific publications); (4) comprehensive evaluation metrics (AUROC, AUPRC). Our experiments reveal a critical empirical insight: embedding quality significantly governs anomaly detection efficacy, and deep learning-based approaches demonstrate no performance advantage over conventional shallow algorithms (e.g., KNN, Isolation Forest) when leveraging LLM-derived embeddings. In addition, we observe strong low-rank characteristics in cross-model performance matrices, which enable an efficient strategy for rapid model evaluation (or embedding evaluation) and selection in practical applications. Furthermore, by open-sourcing our benchmark toolkit, which includes the code and all embeddings from the different models, at https://github.com/jicongfan/Text-Anomaly-Detection-Benchmark, this work provides a foundation for future research in robust and scalable text anomaly detection systems.
尽管在大型语言模型(LLMS)和异常检测算法方面取得重大进展,但缺乏标准化和全面的基准来评价文本数据的现有异常探测方法,因此,严格比较和开发创新方法;这项工作进行了全面的实证研究,并引入了文本异常检测基准,利用各种经过预先培训的语言模型在各种文本数据集中的嵌入。我们的工作系统地评价嵌入基于嵌入文本异常检测方法的有效性,包括:(1)早期语言模型(GloVe、BERT);(2)多个LLMS(LLLAMA-2、Lalama-3、Mistral、OpenAI(MLSLAM-2、LOS、ALMA、Ada、大);(3)多方文本数据集(新、社交媒体、科学出版物);(4)综合评价衡量标准(AUROC、AUPRC)。 我们的实验揭示了一种至关重要的经验洞察:嵌入质量显著地规范了基于异常检测的效能和深层次的基于学习的方法,包括:在常规的浅度算算法(egroupal-rational-rmal-comma)上,我们利用了一种快速的系统,从而更准确地利用了我们的数据库基础基础基础化了一种快速的文本。
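The kind of shallow baseline mentioned in the abstract can be sketched in a few lines: score each embedding by its mean distance to its k nearest neighbours and evaluate the ranking with AUROC. This is a generic illustration with hypothetical function names, not the benchmark's own code, and it assumes the embeddings are already available as lists of floats.

```python
import math

def knn_scores(embeddings, k=2):
    """Anomaly score = mean Euclidean distance to the k nearest other points."""
    scores = []
    for i, a in enumerate(embeddings):
        dists = sorted(math.dist(a, b) for j, b in enumerate(embeddings) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def auroc(scores, labels):
    """Probability that a random anomaly (label 1) outscores a random normal (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions are quadratic in the number of samples, which is fine for a sanity check but would call for an indexed neighbour search on large corpora.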
Article 249
Title@2025-07-16 (3): On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension
Title: On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension | Über die statistischen Eigenschaften generativer Adversarialmodelle für die geringe intrinsische Datendimension | 关于低内在数据层面的生成反逆模型的统计属性 2401.15801v2 |
Authors (2): Saptarshi Chakraborty, Peter L. Bartlett
Despite the remarkable empirical successes of Generative Adversarial Networks (GANs), the theoretical guarantees for their statistical accuracy remain rather pessimistic. In particular, the data distributions on which GANs are applied, such as natural images, are often hypothesized to have an intrinsic low-dimensional structure in a typically high-dimensional feature space, but this is often not reflected in the derived rates in the state-of-the-art analyses. In this paper, we attempt to bridge the gap between the theory and practice of GANs and their bidirectional variant, Bi-directional GANs (BiGANs), by deriving statistical guarantees on the estimated densities in terms of the intrinsic dimension of the data and the latent space. We analytically show that if one has access to $n$ samples from the unknown target distribution and the network architectures are properly chosen, the expected Wasserstein-1 distance of the estimates from the target scales as $O\left( n^{-1/d_\mu } \right)$ for GANs and $\tilde{O}\left( n^{-1/(d_\mu+\ell)} \right)$ for BiGANs, where $d_\mu$ and $\ell$ are the upper Wasserstein-1 dimension of the data-distribution and latent-space dimension, respectively. The theoretical analyses not only suggest that these methods successfully avoid the curse of dimensionality, in the sense that the exponent of $n$ in the error rates does not depend on the data dimension but also serve to bridge the gap between the theoretical analyses of GANs and the known sharp rates from optimal transport literature. Additionally, we demonstrate that GANs can effectively achieve the minimax optimal rate even for non-smooth underlying distributions, with the use of interpolating generator networks.
尽管生成对抗网络(GAN)在实证上取得了显著成功,但其统计精度的理论保证仍然相当悲观。特别是,GAN所应用的数据分布(如自然图像)通常被假设为在典型的高维特征空间中具有内在的低维结构,但这一点往往没有反映在最新分析所推导的速率中。在本文中,我们试图弥合GAN及其双向变体双向GAN(BiGAN)在理论与实践之间的差距,方法是根据数据和潜在空间的内在维度推导出估计密度的统计保证。我们从理论上证明,如果能够从未知目标分布中获得 $n$ 个样本,并且网络架构选择得当,那么估计与目标之间的期望Wasserstein-1距离对GAN按 $O\left( n^{-1/d_\mu } \right)$ 缩放,对BiGAN按 $\tilde{O}\left( n^{-1/(d_\mu+\ell)} \right)$ 缩放,其中 $d_\mu$ 和 $\ell$ 分别是数据分布的上Wasserstein-1维度和潜在空间维度。这些理论分析不仅表明这些方法成功避免了维度灾难,即误差率中 $n$ 的指数不依赖于数据维度,而且有助于弥合GAN的理论分析与最优传输文献中已知的尖锐速率之间的差距。此外,我们证明,通过使用插值生成器网络,即使对于非光滑的底层分布,GAN也能有效达到极小极大最优速率。
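A short worked example shows why it matters that the exponent depends on the intrinsic rather than the ambient dimension. The numbers below are hypothetical ($n = 10^6$ samples, intrinsic dimension $d_\mu = 10$, latent dimension $\ell = 5$, ambient dimension $D = 1000$), constants and logarithmic factors are ignored, and the ambient-dimension rate $n^{-1/D}$ is included only as a point of comparison, not as a bound from the paper.

```latex
\begin{align*}
  \text{GAN, intrinsic dimension:}\quad
    & n^{-1/d_\mu} = (10^{6})^{-1/10} = 10^{-0.6} \approx 0.25, \\
  \text{BiGAN, intrinsic plus latent:}\quad
    & n^{-1/(d_\mu+\ell)} = (10^{6})^{-1/15} = 10^{-0.4} \approx 0.40, \\
  \text{ambient dimension (comparison):}\quad
    & n^{-1/D} = (10^{6})^{-1/1000} = 10^{-0.006} \approx 0.99.
\end{align*}
```

With the ambient dimension in the exponent, a million samples would barely move the bound, whereas the intrinsic-dimension rates shrink noticeably.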
Article 250
Title@2025-07-16 (3): RACER: Rational Artificial Intelligence Car-following-model Enhanced by Reality
Title: RACER: Rational Artificial Intelligence Car-following-model Enhanced by Reality | RACER: Rationale Künstliche Intelligenz Car-following-Modell durch Realität verbessert | RACER: 合理人工人工智能汽车跟踪模型 2312.07003v2 |
Authors (3): Tianyi Li, Alexander Halatsis, Raphael Stern
This paper introduces RACER, the Rational Artificial Intelligence Car-following model Enhanced by Reality, a cutting-edge deep learning car-following model that satisfies partial derivative constraints and is designed to predict Adaptive Cruise Control (ACC) driving behavior while remaining theoretically feasible. Unlike conventional models, RACER effectively integrates Rational Driving Constraints (RDCs), crucial tenets of actual driving, resulting in strikingly accurate and realistic predictions. Against established models like the Optimal Velocity Relative Velocity (OVRV), a car-following Neural Network (NN), and a car-following Physics-Informed Neural Network (PINN), RACER excels across key metrics, such as acceleration, velocity, and spacing. Notably, it displays a perfect adherence to the RDCs, registering zero violations, in stark contrast to other models. This study highlights the immense value of incorporating physical constraints within AI models, especially for augmenting safety measures in transportation. It also paves the way for future research to test these models against human driving data, with the potential to guide safer and more rational driving behavior. The versatility of the proposed model, including its potential to incorporate additional derivative constraints and broader architectural applications, enhances its appeal and broadens its impact within the scientific community.
本文介绍了RACER, 即由现实增强的理性人工智能汽车跟踪模型,这是一种先进的深深层学习汽车跟踪模型,符合部分衍生限制,旨在预测适航控制(ACC)驾驶行为,同时保持理论上可行。与传统模型不同,RACER有效地整合了理性驾驶限制(RDC),这是实际驾驶的重要原则,从而得出惊人的准确和现实的预测。与最佳快率相对快率(OVRV),汽车跟踪神经网络(NNN)等既定模型,以及汽车跟踪物理内向型神经网络(PINN),RACER在加速、速度和间距等关键度指标方面优异。值得注意的是,RACER展示了完全坚持RDC(RDC),记录零违反情况,与其他模型形成鲜明对比。这项研究强调了将物理限制纳入AI模型的巨大价值,特别是增强运输安全措施。它还为未来研究这些模型测试人类驾驶数据,并有可能引导更安全、更理性的驱动行为(PINN),以及汽车跟踪神经网络(PINN),RACER优于诸如加速、速度和间隔等关键指标。尤其是它所提出的模型的外观,扩大了其潜在价值,从而扩大了其建筑模型的影响力,从而扩大了了其潜在影响。
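The partial derivative constraints can be checked numerically on any car-following model. The sketch below uses the OVRV form as the model under test and a commonly stated version of the rational driving constraints: acceleration should not decrease with spacing, should not increase with the follower's own speed, and should not decrease with relative speed (leader minus follower). The parameter values are hypothetical, and the exact constraint set used in the paper may differ.

```python
def ovrv_acceleration(spacing, speed, rel_speed,
                      k1=0.02, k2=0.13, eta=20.0, tau=1.5):
    """OVRV model: a = k1 * (s - eta - tau * v) + k2 * dv (illustrative gains)."""
    return k1 * (spacing - eta - tau * speed) + k2 * rel_speed

def satisfies_rdcs(model, spacing, speed, rel_speed, eps=1e-3):
    """Finite-difference check of the sign constraints at one operating point."""
    base = model(spacing, speed, rel_speed)
    d_spacing = (model(spacing + eps, speed, rel_speed) - base) / eps
    d_speed = (model(spacing, speed + eps, rel_speed) - base) / eps
    d_rel = (model(spacing, speed, rel_speed + eps) - base) / eps
    return d_spacing >= 0 and d_speed <= 0 and d_rel >= 0
```

Running such a check over a grid of operating points is one way to count violations for a learned model, in the spirit of the zero-violation result reported for RACER.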
Article 251
Title@2025-07-16 (3): Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning
Title: Uncertainty Quantification for Motor Imagery BCI – Machine Learning vs. Deep Learning | Unsicherheit Quantifizierung für Motor Imagery BCI – Machine Learning vs. Deep Learning | 机动图像BCI – – 机器学习与深层学习 2507.07511v2 |
Authors (4): Joris Suurmeijer, Ivo Pascal de Jong, Matias Valdenegro-Toro, Andreea Ioana Sburlea
Brain-computer interfaces (BCIs) turn brain signals into functionally useful output, but they are not always accurate. A good Machine Learning classifier should be able to indicate how confident it is about a given classification, by giving a probability for its classification. Standard classifiers for Motor Imagery BCIs do give such probabilities, but research on uncertainty quantification has been limited to Deep Learning. We compare the uncertainty quantification ability of established BCI classifiers using Common Spatial Patterns (CSP-LDA) and Riemannian Geometry (MDRM) to specialized methods in Deep Learning (Deep Ensembles and Direct Uncertainty Quantification) as well as standard Convolutional Neural Networks (CNNs). We found that the overconfidence typically seen in Deep Learning is not a problem in CSP-LDA and MDRM. We found that MDRM is underconfident, which we solved by adding Temperature Scaling (MDRM-T). CSP-LDA and MDRM-T give the best uncertainty estimates, but Deep Ensembles and standard CNNs give the best classifications. We show that all models are able to separate between easy and difficult estimates, so that we can increase the accuracy of a Motor Imagery BCI by rejecting samples that are ambiguous.
脑机接口(BCI)将大脑信号转化为功能上有用的输出,但并不总是准确。好的机器学习分类器应当能够通过给出分类概率来表明其对某一分类的置信程度。运动想象BCI的标准分类器确实会给出这种概率,但不确定性量化的研究一直局限于深度学习。我们将基于共同空间模式(CSP-LDA)和黎曼几何(MDRM)的既有BCI分类器的不确定性量化能力,与深度学习中的专门方法(深度集成和直接不确定性量化)以及标准卷积神经网络(CNN)进行比较。我们发现,深度学习中常见的过度自信在CSP-LDA和MDRM中并不是问题。我们发现MDRM置信不足,并通过加入温度缩放(MDRM-T)加以解决。CSP-LDA和MDRM-T给出了最好的不确定性估计,而深度集成和标准CNN给出了最好的分类结果。我们表明,所有模型都能区分容易和困难的估计,因此可以通过拒绝含糊不清的样本来提高运动想象BCI的准确率。
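The two mechanisms named in this abstract, temperature scaling for an underconfident classifier and rejection of ambiguous samples, can be sketched in a few lines. This is only an illustration under stated assumptions: the synthetic `logits` stand in for classifier scores (for MDRM these might be negative distances to class means), the grid search in `fit_temperature` and the 0.9 rejection threshold are hypothetical choices, and none of it is taken from the paper's code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    p = softmax(logits, temperature)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.logspace(-2, 2, 401)):
    """Pick the temperature that minimises NLL on held-out data (plain grid search)."""
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def reject_ambiguous(probs, threshold):
    """Boolean mask of samples whose top-class probability reaches the threshold."""
    return probs.max(axis=1) >= threshold

# Synthetic, hypothetical scores: informative but squashed towards zero (underconfident).
rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n)
margin = rng.normal(loc=2.0, scale=1.5, size=n)      # positive margin means a correct decision
signed = np.where(labels == 1, margin, -margin)
logits = 0.2 * np.stack([-signed / 2, signed / 2], axis=1)

T = fit_temperature(logits, labels)                  # T < 1 sharpens underconfident outputs
probs = softmax(logits, T)
keep = reject_ambiguous(probs, threshold=0.9)
pred = probs.argmax(axis=1)
acc_all = np.mean(pred == labels)
acc_kept = np.mean(pred[keep] == labels[keep])
print(f"T={T:.3f}, accuracy all={acc_all:.3f}, accuracy after rejection={acc_kept:.3f}")
```

A fitted temperature below 1 indicates that the raw scores were underconfident; the accuracy on the retained samples shows the effect of rejecting ambiguous ones.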
Article 252
Title@2025-07-16 (3): What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
Title: What’s Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift | Was zieht die Strings? Bewertung von Integrität und Attribution in KI-Training und Schlussfolgerung durch Konzeptverschiebung | 什么是拉弦?在AI培训和推论中通过概念转变评估诚信和归属。 2504.21042v3 |
Authors (6): Jiamin Chang, Haoyang Li, Hammond Pearce, Ruoxi Sun, Bo Li, Minhui Xue
The growing adoption of artificial intelligence (AI) has amplified concerns about trustworthiness, including integrity, privacy, robustness, and bias. To assess and attribute these threats, we propose ConceptLens, a generic framework that leverages pre-trained multimodal models to identify the root causes of integrity threats by analyzing Concept Shift in probing samples. ConceptLens demonstrates strong detection performance for vanilla data poisoning attacks and uncovers vulnerabilities to bias injection, such as the generation of covert advertisements through malicious concept shifts. It identifies privacy risks in unaltered but high-risk samples, filters them before training, and provides insights into model weaknesses arising from incomplete or imbalanced training data. Additionally, at the model level, it attributes concepts that the target model is overly dependent on, identifies misleading concepts, and explains how disrupting key concepts negatively impacts the model. Furthermore, it uncovers sociological biases in generative content, revealing disparities across sociological contexts. Strikingly, ConceptLens reveals how safe training and inference data can be unintentionally and easily exploited, potentially undermining safety alignment. Our study informs actionable insights to breed trust in AI systems, thereby speeding adoption and driving greater innovation.
越来越多的人工智能(AI)的采用加剧了人们对信任性的担忧,包括完整性、隐私、稳健性和偏见。为了评估和说明这些威胁,我们提议了概念Lens,这是一个通用框架,它利用预先培训的多式联运模式,通过分析测试样本中的概念转变,查明完整性威胁的根源。概念Lens展示了香草数据中毒袭击的有力检测性,并揭示了诱发偏见的弱点,例如通过恶意概念转变生成隐蔽广告。它在未经改变但高风险的样本中发现了隐私风险,在培训前过滤了这些风险,并对不完整或不平衡的培训数据产生的模型弱点提供了见解。此外,在模型一级,它赋予了目标模式过分依赖的概念,确定了误导性概念,并解释了破坏关键概念如何对模型产生负面影响。此外,它揭示了基因内容中的社会偏见,揭示了社会背景的差异。 简而言之,概念Lens揭示了培训和推断数据是如何可以无意和容易地利用的,从而可能破坏安全一致性的。我们的研究为在AI系统中培养信任提供了可操作的见解,从而加速采用和推动更大的创新。
Article 253
Title@2025-07-16 (3): Structured and Balanced Multi-Component and Multi-Layer Neural Networks
Title: Structured and Balanced Multi-Component and Multi-Layer Neural Networks | Strukturierte und ausgewogene Multi-Komponenten- und Multi-Layer-Neural-Netzwerke | 结构化和平衡的多分量多层神经网络 2407.00765v3 |
Authors (4): Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou
In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to accurately and efficiently approximate functions with complex features, in terms of both degrees of freedom and computational cost. The main idea is inspired by a multi-component approach, in which each component can be effectively approximated by a single-layer network, combined with a multi-layer decomposition strategy to capture the complexity of the target function. Although MMNNs can be viewed as a simple modification of fully connected neural networks (FCNNs) or multi-layer perceptrons (MLPs) by introducing balanced multi-component structures, they achieve a significant reduction in training parameters, a much more efficient training process, and improved accuracy compared to FCNNs or MLPs. Extensive numerical experiments demonstrate the effectiveness of MMNNs in approximating highly oscillatory functions and their ability to automatically adapt to localized features.
在这项工作中,我们提出了一个平衡的多部分和多层神经网络(MMNN)结构,在自由度和计算成本两方面,准确和有效地将具有复杂特征的职能相近,主要想法来自一个多部分方法,其中每个组成部分都可有效地被一个单层网络所近似,并辅之以一个多层分解战略,以捕捉目标功能的复杂性。虽然可以通过引入平衡的多部分结构,将MMNN视为完全连通的神经网络(FCNN)或多层感应器(MLPs)的简单修改,但是它们实现了培训参数的大幅度削减,培训过程效率更高,与FCNN或MLPs相比,准确性也有所提高。广泛的数字实验表明MNN在接近高度串联功能方面的有效性及其自动适应局部特征的能力。
Article 254
Title@2025-07-16 (3): A Framework for Nonstationary Gaussian Processes with Neural Network Parameters
Title: A Framework for Nonstationary Gaussian Processes with Neural Network Parameters | Ein Framework für nichtstationäre Gauß-Prozesse mit neuralen Netzwerkparametern | 带有神经网络参数的非静止高斯进程框架 2507.12262v1 |
Authors (2): Zachary James, Joseph Guinness
Gaussian processes have become a popular tool for nonparametric regression because of their flexibility and uncertainty quantification. However, they often use stationary kernels, which limit the expressiveness of the model and may be unsuitable for many datasets. We propose a framework that uses nonstationary kernels whose parameters vary across the feature space, modeling these parameters as the output of a neural network that takes the features as input. The neural network and Gaussian process are trained jointly using the chain rule to calculate derivatives. Our method clearly describes the behavior of the nonstationary parameters and is compatible with approximation methods for scaling to large datasets. It is flexible and easily adapts to different nonstationary kernels without needing to redesign the optimization procedure. Our methods are implemented with the GPyTorch library and can be readily modified. We test a nonstationary variance and noise variant of our method on several machine learning datasets and find that it achieves better accuracy and log-score than both a stationary model and a hierarchical model approximated with variational inference. Similar results are observed for a model with only nonstationary variance. We also demonstrate our approach’s ability to recover the nonstationary parameters of a spatial dataset.
高斯进程因其灵活性和不确定性的量化而成为非参数回归的流行工具。然而,它们往往使用固定内核,这种内核限制了模型的清晰度,而且可能不适合许多数据集。我们提议一个框架,使用非静止内核,其参数在特性空间各异,这些参数建模作为神经网络输出的神经网络,以其特性作为输入。神经网络和高斯进程使用链规则共同培训,以计算衍生物。我们的方法清楚描述非静止参数的行为,并与向大型数据集缩放的近似方法相容。它灵活且容易地适应不同的非静止内核,而不需要重新设计优化程序。我们的方法与GPyTorch图书馆一起实施,可以随时修改。我们用几个机器学习数据集测试我们方法的非静止差异和噪音变异,发现它比固定模型和接近变异的等级模型都更准确和对日志数。我们观察了类似的结果,以恢复模型来显示我们的空间变异性参数。
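The idea of letting a neural network output the nonstationary kernel parameters can be sketched without any GP library. The snippet below is a minimal NumPy illustration, not the authors' GPyTorch implementation: `tiny_mlp`, its random weights, and the choice of a scaled RBF kernel k(x, x') = s(x) s(x') k_RBF(x, x') plus input-dependent noise are all assumptions made for the example. In the described method the network weights would be trained jointly with the GP by differentiating the marginal likelihood; here the likelihood is only evaluated.

```python
import numpy as np

def tiny_mlp(x, params):
    """Hypothetical one-hidden-layer network mapping features to two log-parameters."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2  # columns: log signal std, log noise std

def rbf(x1, x2, lengthscale=1.0):
    """Stationary squared-exponential kernel."""
    d2 = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nonstationary_cov(x, params, lengthscale=1.0):
    """Covariance with input-dependent signal variance and noise variance."""
    out = tiny_mlp(x, params)
    sigma = np.exp(out[:, 0])   # signal std at each input
    tau = np.exp(out[:, 1])     # noise std at each input
    K = sigma[:, None] * rbf(x, x, lengthscale) * sigma[None, :]
    return K + np.diag(tau**2)

def neg_log_marginal_likelihood(y, K):
    """Standard zero-mean GP negative log marginal likelihood via Cholesky."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(3 * x[:, 0])
params = (rng.normal(scale=0.5, size=(1, 8)), np.zeros(8),
          rng.normal(scale=0.5, size=(8, 2)), np.zeros(2))
K = nonstationary_cov(x, params)
value = neg_log_marginal_likelihood(y, K)
print(f"negative log marginal likelihood: {value:.3f}")
```

Scaling a valid kernel by s(x) s(x') keeps it positive semidefinite, which is why this particular form of nonstationary variance is convenient for a sketch.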
Article 255
Title@2025-07-16 (3): Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
Title: Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Proaktive Agenten für Multi-Turn-Text-to-Image-Generierung unter Unsicherheit | 多发文本到图像在不确定情况下生成的活性剂 2412.06771v2 |
Authors (7): Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, Zi Wang
User prompts for generative AI models are often underspecified, leading to a misalignment between the user intent and models’ understanding. As a result, users commonly have to painstakingly refine their prompts. We study this alignment problem in text-to-image (T2I) generation and propose a prototype for proactive T2I agents equipped with an interface to (1) actively ask clarification questions when uncertain, and (2) present their uncertainty about user intent as an understandable and editable belief graph. We build simple prototypes for such agents and propose a new scalable and automated evaluation approach using two agents, one with a ground truth intent (an image) while the other tries to ask as few questions as possible to align with the ground truth. We experiment over three image-text datasets: ImageInWords (Garg et al., 2024), COCO (Lin et al., 2014) and DesignBench, a benchmark we curated with strong artistic and design elements. Experiments over the three datasets demonstrate the proposed T2I agents’ ability to ask informative questions and elicit crucial information to achieve successful alignment with at least 2 times higher VQAScore (Lin et al., 2024) than the standard T2I generation. Moreover, we conducted human studies and observed that at least 90% of human subjects found these agents and their belief graphs helpful for their T2I workflow, highlighting the effectiveness of our approach. Code and DesignBench can be found at https://github.com/google-deepmind/proactive_t2i_agents.
用户使用基因化的AI 模型的提示往往被描述得不够清楚,导致用户意图和模型理解之间的误差。 因此,用户通常不得不仔细改进他们的提示。 我们研究了文本到图像(T2I)生成中的这一匹配问题,并提出了一个具有界面的积极主动的T2I代理器原型:(1) 当不确定时积极提出澄清问题,(2) 提出用户意图的不确定性,作为可理解和可编辑的信仰图表。 我们为这些代理商建立简单的原型,并提出一种新的可缩放和自动化的评价方法,使用两个代理商,一个带有地面真实意图(图像),而另一个用户则试图尽可能少问几个问题,以便与地面真相保持一致。 我们试验了三个图像文本数据集:图像InOds(Garg等人,2024年),CO(Lin等人,2014年)和Defench Bennch,这是我们用强烈的艺术和设计要素拼凑起来的基准。 三个数据集的实验表明,拟议的T2I代理商能够提出信息性的问题,并尽可能地收集关键信息,以便在至少2次的版本中与TAAS-Lseral的版本中,我们所观察到的20 和20年的人类代代代码研究。
Article 256
Title@2025-07-16 (3): A Thorough Assessment of the Non-IID Data Impact in Federated Learning
Title: A Thorough Assessment of the Non-IID Data Impact in Federated Learning | Eine gründliche Bewertung der Auswirkungen von nicht-IID-Daten auf das Federated Learning | 彻底评估非IID数据对联邦学习的影响 2503.17070v2 |
Authors (5): Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
Federated learning (FL) allows collaborative machine learning (ML) model training across decentralized clients’ data while preserving data privacy. Because of its decentralized nature, FL has to deal with non-independent and identically distributed (non-IID) data. This open problem has notable consequences, such as decreased model performance and longer convergence times. Despite its importance, experimental studies systematically addressing all types of data heterogeneity (a.k.a. non-IIDness) remain scarce. We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis. We use the Hellinger Distance (HD) to measure differences in distribution among clients. Our study benchmarks four state-of-the-art strategies for handling non-IID data, including label, feature, quantity, and spatiotemporal skewness, under realistic and controlled conditions. This is the first comprehensive analysis of the spatiotemporal skew effect in FL. Our findings highlight the significant impact of label and spatiotemporal skew non-IID types on FL model performance, with notable performance drops occurring at specific HD thresholds. Additionally, the FL performance is heavily affected mainly when the non-IIDness is extreme. Thus, we provide recommendations for FL research to tackle data heterogeneity effectively. Our work represents the most extensive examination of non-IIDness in FL, offering a robust foundation for future research.
联邦学习(FL)允许在分散客户的信息中进行协作机器学习模式培训,确保数据隐私。FL的分散性质涉及非独立和相同分布(非IID)数据。这一公开问题具有显著后果,例如模型性能下降和更加显著的趋同时间。尽管其重要性,但针对所有类型数据异质性(a.k.a.非IIDness)的实验性研究仍然很少。我们的目的是通过透彻的经验分析评估和量化非IID效应,填补这一差距。我们利用Hellinger距离(HD)来衡量客户间分布的差异。我们的研究基准是处理非II数据的四个最先进的战略,包括标签、特征、数量和广度的时间趋同时间。尽管其重要性很大,但是在现实和控制的条件下,系统地分析FL对FL的广度效应。我们的研究基础和非II对FTopotoporal Skew的非II模型性业绩的重大影响,因此在进行最严格的FHD研究时,我们无法有效地分析。
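As a small illustration of the Hellinger Distance used here to measure distribution differences among clients, the sketch below computes it between per-client label histograms. The client counts are made up, and averaging over all client pairs in `mean_pairwise_hd` is an assumption for the example; the abstract does not say how the paper aggregates distances.

```python
import numpy as np
from itertools import combinations

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise counts to probabilities
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def mean_pairwise_hd(client_label_counts):
    """Average Hellinger distance over all pairs of clients (hypothetical aggregation)."""
    pairs = combinations(client_label_counts, 2)
    return float(np.mean([hellinger(p, q) for p, q in pairs]))

# Hypothetical label counts for three clients over three classes (label skew).
counts = [[50, 50, 0], [10, 80, 10], [0, 30, 70]]
print(f"mean pairwise HD: {mean_pairwise_hd(counts):.3f}")
```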
Article 257
Title@2025-07-16 (3): Robust Causal Discovery in Real-World Time Series with Power-Laws
Title: Robust Causal Discovery in Real-World Time Series with Power-Laws | Robuste Causal Discovery in der Real-World Time Series mit Power-Laws | 具有权力法的 “ 真实世界时间系列 “ 中强有力的因果发现 2507.12257v1 |
Authors (6): Matteo Tusoni, Giuseppe Masi, Andrea Coletta, Aldo Glielmo, Viviana Arrigoni, Novella Bartolini
Exploring causal relationships in stochastic time series is a challenging yet crucial task with a vast range of applications, including finance, economics, neuroscience, and climate science. Many algorithms for Causal Discovery (CD) have been proposed, but they often exhibit a high sensitivity to noise, resulting in misleading causal inferences when applied to real data. In this paper, we observe that the frequency spectra of typical real-world time series follow a power-law distribution, notably due to an inherent self-organizing behavior. Leveraging this insight, we build a robust CD method based on the extraction of power-law spectral features that amplify genuine causal signals. Our method consistently outperforms state-of-the-art alternatives on both synthetic benchmarks and real-world datasets with known causal structures, demonstrating its robustness and practical relevance.
探索随机时间序列中的因果关系是一项具有挑战性但至关重要的任务,涉及广泛的应用,包括金融、经济学、神经科学和气候科学。 已经提出了许多Causal Discovery(CD)的算法,但这些算法往往对噪音具有高度的敏感性,导致在应用到真实数据时产生误导性因果推断。 在本文中,我们观察到典型真实时间序列的频率光谱遵循一种权力法分布,特别是由于一种固有的自我组织行为。利用这一洞察力,我们建立了一种强大的CD CD 方法, 其基础是提取能放大真实因果信号的能量- 法律光谱特征。 我们的方法在合成基准和真实世界数据集方面始终优于已知因果结构的最新替代方法,显示了其稳健性和实际相关性。
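The observation that a spectrum follows a power law, S(f) proportional to f^(-beta), can be checked with a log-log fit of the periodogram. The sketch below generates synthetic power-law noise and recovers its exponent; it illustrates only the spectral-exponent estimate, not the paper's causal discovery method, and both helper functions are hypothetical names.

```python
import numpy as np

def power_law_noise(n, beta, rng):
    """Synthetic series whose power spectrum decays roughly as f**(-beta)."""
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)  # amplitude scales as f**(-beta/2); DC removed
    return np.fft.irfft(spec * scale, n=n)

def spectral_exponent(x):
    """Estimate beta from a least-squares line through the log-log periodogram."""
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    mask = freqs > 0
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return -slope

rng = np.random.default_rng(0)
series = power_law_noise(4096, beta=1.0, rng=rng)
beta_hat = spectral_exponent(series)
print(f"estimated spectral exponent: {beta_hat:.2f}")
```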
Article 258
Title@2025-07-16 (3): Surrogate Quantum Circuit Design for the Lattice Boltzmann Collision Operator
Title: Surrogate Quantum Circuit Design for the Lattice Boltzmann Collision Operator | Surrogate Quantum Circuit Design für den Lattice Boltzmann Collision Operator | Lattice Boltzmann 碰撞操作员的代管量子电路设计 2507.12256v1 |
Authors (2): Monica Lăcătuş, Matthias Möller
Direct numerical simulation of turbulent flows at high Reynolds numbers remains a major challenge for traditional computational fluid dynamics (CFD) tools running on classical computer hardware. This has motivated growing interest in quantum algorithms for CFD that enable flow simulations on quantum computers, which are expected to deliver speed-ups for certain problems. One promising quantum CFD approach is a fully quantum implementation of the lattice Boltzmann method called QLBM. Although efficient quantum routines are now available for the streaming step, implementing the nonlinear, irreversible collision step with a low-depth circuit that avoids additional ancilla qubits, probabilistic post-selection and repeated executions remains a significant challenge. In this study, we address this challenge by introducing a framework for learning a surrogate quantum circuit (SQC) that approximates the full Bhatnagar-Gross-Krook (BGK) collision operator for the D2Q9 lattice. The four-qubit circuit is trained to respect the physical properties of the BGK collision operator, including mass and momentum conservation, D8 equivariance and scale equivariance. When compiled to the gate set used by the IBM Heron processor under the assumption of full qubit connectivity, the 15-block SQC requires only 2,430 native gates and uses neither ancilla qubits nor post-selection or repeated executions. Moreover, its depth is independent of the grid resolution, as collision is a local operation that can exploit quantum parallelism to its full extent. We validate the SQC on two benchmark flows, the Taylor-Green vortex decay and the lid-driven cavity, demonstrating that it accurately captures vortex dissipation and flow recirculation.
在高雷诺数下对湍流进行直接数值模拟,对于运行在经典计算机硬件上的传统计算流体力学(CFD)工具而言仍是一项重大挑战。这促使人们日益关注用于CFD的量子算法,以便在量子计算机上进行流动模拟,因为这类计算机有望在某些问题上带来加速。一种有前景的量子CFD方法是格子Boltzmann方法的全量子实现,称为QLBM。尽管流动(streaming)步骤已有高效的量子例程,但用低深度电路实现非线性、不可逆的碰撞步骤,同时避免额外的辅助量子比特、概率性后选择和重复执行,仍是一项重大挑战。在本研究中,我们引入一个学习代理量子电路(SQC)的框架来应对这一挑战,该电路近似D2Q9格子的完整Bhatnagar-Gross-Krook(BGK)碰撞算子。这一四量子比特电路经过训练,遵守BGK碰撞算子的物理性质,包括质量和动量守恒、D8等变性和尺度等变性。在假设量子比特全连接的情况下,编译到IBM Heron处理器所用的门集后,15个模块的SQC仅需2,430个原生门,且不使用辅助量子比特、后选择或重复执行。此外,其深度与网格分辨率无关,因为碰撞是局部操作,可以充分利用量子并行性。我们在Taylor-Green涡衰减和顶盖驱动方腔流两个基准流动上验证了SQC,表明它能准确刻画涡的耗散和流动回流。
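For reference, the classical operator that the surrogate circuit approximates is short to write down. The sketch below implements the standard BGK collision for one D2Q9 lattice node in NumPy and checks numerically the properties listed in the abstract (mass and momentum conservation, scale equivariance, and equivariance under a 90-degree rotation, one generator of D8). It is a classical illustration only; the relaxation time `tau=0.8`, the velocity ordering in `C`, and the permutation `ROT90` are choices made for this example, and nothing here reflects the quantum circuit itself.

```python
import numpy as np

# D2Q9 lattice velocities and weights (one common ordering; an assumption of this sketch).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """Second-order equilibrium populations for density rho and velocity u."""
    cu = C @ u
    return W * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * (u @ u))

def bgk_collide(f, tau):
    """Relax the populations towards equilibrium with relaxation time tau."""
    rho = f.sum()
    u = (C.T @ f) / rho
    return f - (f - equilibrium(rho, u)) / tau

# Index permutation for a 90-degree rotation: population i moves to slot ROT90[i].
ROT90 = np.array([0, 2, 3, 4, 1, 6, 7, 8, 5])

def rotate(f):
    g = np.empty_like(f)
    g[ROT90] = f
    return g

rng = np.random.default_rng(0)
f = W * (1.0 + 0.1 * rng.standard_normal(9))  # perturbed rest state, all populations positive
f_post = bgk_collide(f, tau=0.8)

mass_error = abs(f_post.sum() - f.sum())
momentum_error = np.abs(C.T @ f_post - C.T @ f).max()
scale_error = np.abs(bgk_collide(3.0 * f, 0.8) - 3.0 * f_post).max()
rotation_error = np.abs(bgk_collide(rotate(f), 0.8) - rotate(f_post)).max()
print(mass_error, momentum_error, scale_error, rotation_error)
```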
Article 259
Title@2025-07-16 (3): Comparative Analysis of CNN Performance in Keras, PyTorch and JAX on PathMNIST
Title: Comparative Analysis of CNN Performance in Keras, PyTorch and JAX on PathMNIST | Vergleichende Analyse der CNN-Leistung in Keras, PyTorch und JAX auf PathMNIST | CNN在Keras、PyTorch和JAX在 “ 路运 “ 上的表现比较分析 2507.12248v1 |
Authors (5): Anida Nezović, Jalal Romano, Nada Marić, Medina Kapo, Amila Akagić
Deep learning has significantly advanced the field of medical image classification, particularly with the adoption of Convolutional Neural Networks (CNNs). Various deep learning frameworks such as Keras, PyTorch and JAX offer unique advantages in model development and deployment. However, their comparative performance in medical imaging tasks remains underexplored. This study presents a comprehensive analysis of CNN implementations across these frameworks, using the PathMNIST dataset as a benchmark. We evaluate training efficiency, classification accuracy and inference speed to assess their suitability for real-world applications. Our findings highlight the trade-offs between computational speed and model accuracy, offering valuable insights for researchers and practitioners in medical image analysis.
深层学习大大推进了医学图像分类领域,特别是采用了革命神经网络(CNNs),各种深层学习框架,如Keras、PyTorrch和JAX,在模型开发和部署方面提供了独特的优势,然而,它们在医学成像任务方面的比较性能仍未得到充分探讨。本研究报告以PathMNIST数据集为基准,全面分析了CNN在这些框架中的实施情况。我们评估培训效率、分类准确性和推断速度,以评估其是否适合现实世界应用。我们的调查结果突出了计算速度和模型准确性之间的权衡,为研究人员和从业人员提供了医学图像分析的宝贵见解。
Article 260
Title@2025-07-16 (3): Universal Fourier Neural Operators for Micromechanics
Title: Universal Fourier Neural Operators for Micromechanics | Universal Fourier-Neural-Betreiber für Mikromechanik | 通用微型机械天体神经操作员 2507.12233v1 |
Authors (2): Binh Huy Nguyen, Matti Schneider
Solving cell problems in homogenization is hard, and available deep-learning frameworks fail to match the speed and generality of traditional computational frameworks. More to the point, it is generally unclear what to expect of machine-learning approaches, let alone single out which approaches are promising. In the work at hand, we advocate Fourier Neural Operators (FNOs) for micromechanics, empowering them by insights from computational micromechanics methods based on the fast Fourier transform (FFT). We construct an FNO surrogate mimicking the basic scheme foundational for FFT-based methods and show that the resulting operator predicts solutions to cell problems with arbitrary stiffness distribution only subject to a material-contrast constraint up to a desired accuracy. In particular, there are no restrictions on the material symmetry like isotropy, on the number of phases and on the geometry of the interfaces between materials. Also, the provided fidelity is sharp and uniform, providing explicit guarantees leveraging our physical empowerment of FNOs. To show the desired universal approximation property, we construct an FNO explicitly that requires no training to begin with. Still, the obtained neural operator complies with the same memory requirements as the basic scheme and comes with runtimes proportional to classical FFT solvers. In particular, large-scale problems with more than 100 million voxels are readily handled. The goal of this work is to underline the potential of FNOs for solving micromechanical problems, linking FFT-based methods to FNOs. This connection is expected to provide a fruitful exchange between both worlds.
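The basic scheme that the FNO surrogate is said to mimic can be illustrated on a deliberately simplified problem. The sketch below solves a 1D periodic conductivity cell problem (not elasticity, and not the paper's construction): the gradient field `e` is updated by applying the Green operator of a homogeneous reference medium `a0` in Fourier space. In 1D that operator reduces to removing the mean of the flux and dividing by `a0`, and the exact effective coefficient is the harmonic mean, which gives a check. The two-phase layout, the reference choice, and the iteration count are assumptions of this example.

```python
import numpy as np

def basic_scheme_1d(a, E=1.0, iters=500):
    """Fixed-point iteration e <- e - Gamma0 * (a e) for a 1D periodic cell problem."""
    a0 = 0.5 * (a.min() + a.max())      # reference medium; ties the rate to the contrast
    e = np.full(a.shape, E, dtype=float)
    for _ in range(iters):
        flux_hat = np.fft.fft(a * e)
        flux_hat[0] = 0.0               # Green operator acts only on fluctuations
        e = e - np.fft.ifft(flux_hat).real / a0
    return e

# Two-phase laminate with material contrast 10 (hypothetical microstructure).
a = np.where(np.arange(64) < 16, 10.0, 1.0)
e = basic_scheme_1d(a)
a_eff = np.mean(a * e) / 1.0            # effective coefficient = mean flux / mean gradient
harmonic = 1.0 / np.mean(1.0 / a)
print(f"effective: {a_eff:.6f}, harmonic mean: {harmonic:.6f}")
```

With this reference medium the error contracts by at most (a_max - a_min) / (a_max + a_min) per iteration, which is one way to see why a material-contrast constraint enters accuracy guarantees for schemes of this kind.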
Article 261
Title@2025-07-16 (3): FADE: Why Bad Descriptions Happen to Good Features
Title: FADE: Why Bad Descriptions Happen to Good Features | FADE: Warum schlechte Beschreibungen gut aussehen | FADE:为什么不良描述发生在好地貌 2502.16994v2 |
Authors (7): Bruno Puri, Aakriti Jain, Elena Golimblevskaia, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Recent advances in mechanistic interpretability have highlighted the potential of automating interpretability pipelines in analyzing the latent representations within LLMs. While this may enhance our understanding of internal mechanisms, the field lacks standardized evaluation methods for assessing the validity of discovered features. We attempt to bridge this gap by introducing FADE: Feature Alignment to Description Evaluation, a scalable model-agnostic framework for automatically evaluating feature-to-description alignment. FADE evaluates alignment across four key metrics - Clarity, Responsiveness, Purity, and Faithfulness - and systematically quantifies the causes of the misalignment between features and their descriptions. We apply FADE to analyze existing open-source feature descriptions and assess key components of automated interpretability pipelines, aiming to enhance the quality of descriptions. Our findings highlight fundamental challenges in generating feature descriptions, particularly for SAEs compared to MLP neurons, providing insights into the limitations and future directions of automated interpretability. We release FADE as an open-source package at: https://github.com/brunibrun/FADE
最近在机械化解释性方面取得的进展突出表明了在分析LLMs内部潜在代表形式方面使解释性管道自动化的潜力。虽然这可能增进我们对内部机制的理解,但实地缺乏评估所发现特征有效性的标准化评价方法。我们试图通过引入FADE(FADE:描述评价的特性调整)来弥补这一差距,FADE(FADE:描述评价的特性调整)是一个可扩缩的模型-不可计量框架,用于自动评估特征到描述的一致性。FADE(FADE)评估了四种关键指标(清晰度、反应性、纯度和忠诚性)的协调统一,并系统地量化了特征及其描述之间不匹配的原因。我们应用FADE(FADE)分析现有的公开源特征描述并评估自动解释性管道的关键组成部分,以提高描述的质量。我们的调查结果突出了生成特征描述方面的基本挑战,特别是SAE(SAE)相对于MP神经系统而言,提供了对自动解释的局限性和未来方向的洞察。我们将FADE作为开放源包发布:https://github.com/brunibrun/FADE)/FADE(FADE)
Article 262
Title@2025-07-16 (3): Holistic analysis on the sustainability of Federated Learning across AI product lifecycle
Title: Holistic analysis on the sustainability of Federated Learning across AI product lifecycle | Ganzheitliche Analyse der Nachhaltigkeit von Federated Learning über den gesamten Lebenszyklus von KI-Produkten hinweg | 关于全AI性产品生命周期中联邦学习可持续性的全面分析 2312.14628v3 |
Authors (1): Hongliu Cao
In light of emerging legal requirements and policies focused on privacy protection, there is a growing trend of companies across various industries adopting Federated Learning (FL). This decentralized approach involves multiple clients or silos, collaboratively training a global model under the coordination of a central server while utilizing their private local data. Unlike traditional methods that necessitate data sharing and transmission, Cross-Silo FL allows clients to share model updates rather than raw data, thereby enhancing privacy. Despite its growing adoption, the carbon impact associated with Cross-Silo FL remains poorly understood due to the limited research in this area. This study seeks to bridge this gap by evaluating the sustainability of Cross-Silo FL throughout the entire AI product lifecycle, extending the analysis beyond the model training phase alone. We systematically compare this decentralized method with traditional centralized approaches and present a robust quantitative framework for assessing the costs and CO2 emissions in real-world Cross-Silo FL environments. Our findings indicate that the energy consumption and costs of model training are comparable between Cross-Silo Federated Learning and Centralized Learning. However, the additional data transfer and storage requirements inherent in Centralized Learning can result in significant, often overlooked CO2 emissions. Moreover, we introduce an innovative data and application management system that integrates Cross-Silo FL and analytics, aiming at improving the sustainability and economic efficiency of IT enterprises.
鉴于新出现的以隐私保护为重点的法律要求和政策,采用联邦学习联合会(FL)的各行业的公司趋势日益明显。这种分散化办法涉及多个客户或筒仓,在中央服务器的协调下合作培训一个全球模式,同时利用其私人本地数据。不同于需要数据共享和传播的传统方法,跨西罗FL允许客户共享模式更新而非原始数据,从而增强隐私。尽管采用该办法的情况越来越多,但跨西罗FL的碳影响仍然不为人所知,因为该领域的研究有限。然而,通过评估跨西罗FL在整个AI产品生命周期的可持续性,将分析范围扩大到示范培训阶段之外,从而缩小这一差距。我们系统地将这种分散化方法与传统的集中化方法进行比较,并提供一个强有力的量化框架,用以评估现实世界跨西罗FL环境中的成本和二氧化碳排放量,从而增强隐私。我们的研究结果表明,由于跨西罗联学习和中央化学习,示范培训的能源消耗和成本是可比的。然而,中央化学习所固有的额外数据转移和储存要求可以导致大量、经常被忽视的CO2排放和FIL企业的革新管理。此外,我们引入了一种创新的信息技术和革新的系统。
Article 263
Title@2025-07-16 (3): Optimizers Qualitatively Alter Solutions And We Should Leverage This
Title: Optimizers Qualitatively Alter Solutions And We Should Leverage This | Optimierer Qualitativ alternative Lösungen und wir sollten diese nutzen | 最优化质的平价平价解决方案,我们应该利用这个 2507.12224v1 |
Authors (9): Razvan Pascanu, Clare Lyle, Ionut-Vlad Modoranu, Naima Elosegui Borras, Dan Alistarh, Petar Velickovic, Sarath Chandar, Soham De, James Martens
Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iterations, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.
由于深神经网络(DNN)的非线性性质,在使用仅依靠当地信息(如SGD)的优化者时,无法保证与独特的全球最低损失水平趋同。 确实,这是对DNN在外勤初期的可行性的主要怀疑来源。 过去几十年的深层次学习进展表明这种怀疑是错的,大量经验证据表明,按照标准培训协议的足够大的DNNN表现出的成熟优化动态,它们与实绩解决方案完全一致。这一成功使社区无法使用convex优化作为学习的心理趋同模式,导致注重培训效率,无论是在要求的迭代、FLOPs或墙时钟时间方面,当改进优化者时,这是对DNNNS可行性的可行性的怀疑性能。 我们指出,虽然这一视角证明极有成效,但另一个具体针对DNNN的视角却很少受到关注:最优化者不仅影响趋同的趋同率,而且学习解决方案的质量特性也更趋近。 最优化者可以把Convex优化的优化的优化性优化性优化作用作为学习方法, 也能够将我们更深刻地判断其直观的正正态的走向。
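A textbook example makes the claim that the optimizer shapes which solution is found concrete. On an underdetermined least-squares problem, plain gradient descent started at zero converges to the minimum-Euclidean-norm interpolant, while the same descent with a fixed diagonal preconditioner converges to a different interpolant. The sketch below is a linear toy, not an experiment from the paper; the preconditioner values and problem sizes are arbitrary assumptions.

```python
import numpy as np

def preconditioned_gd(X, y, precond, steps=50000):
    """Gradient descent on 0.5 * ||Xw - y||^2 from w = 0 with a diagonal preconditioner."""
    w = np.zeros(X.shape[1])
    # Step size from the largest eigenvalue of X D X^T, which keeps the iteration stable.
    lr = 1.0 / np.linalg.eigvalsh((X * precond) @ X.T).max()
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        w -= lr * precond * grad
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))          # 3 equations, 6 unknowns: many exact solutions
y = rng.standard_normal(3)

w_gd = preconditioned_gd(X, y, np.ones(6))                        # plain gradient descent
w_pre = preconditioned_gd(X, y, np.array([4.0, 1, 1, 1, 1, 0.25]))  # hypothetical preconditioner
w_min_norm = np.linalg.pinv(X) @ y

print("both fit the data:", np.allclose(X @ w_gd, y), np.allclose(X @ w_pre, y))
print("norms:", np.linalg.norm(w_gd), np.linalg.norm(w_pre))
```

Both runs reach zero training loss, so a convergence-rate comparison alone would not distinguish them; the difference lies in which interpolating solution each optimizer selects.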
Article 264
Title@2025-07-16 (3): Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities
Title: Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities | Fehlergrenzen für Partikelgradientenabstieg und Erweiterungen der log-Sobolev- und Talagrand-Ungleichheiten | 粒子梯度下降错误的界限,以及log-Sobolev 和 Talagrand 不平等的延伸 2403.02004v3 |
Authors (4): Rocco Caprio, Juan Kuntz, Samuel Power, Adam M. Johansen
We prove non-asymptotic error bounds for particle gradient descent (PGD, Kuntz et al., 2023), a recently introduced algorithm for maximum likelihood estimation of large latent variable models obtained by discretizing a gradient flow of the free energy. We begin by showing that the flow converges exponentially fast to the free energy’s minimizers for models satisfying a condition that generalizes both the log-Sobolev and the Polyak–Łojasiewicz inequalities (LSI and PŁI, respectively). We achieve this by extending a result well-known in the optimal transport literature (that the LSI implies the Talagrand inequality) and its counterpart in the optimization literature (that the PŁI implies the so-called quadratic growth condition), and applying the extension to our new setting. We also generalize the Bakry–Émery Theorem and show that the LSI/PŁI extension holds for models with strongly concave log-likelihoods. For such models, we further control PGD’s discretization error and obtain the non-asymptotic error bounds. While we are motivated by the study of PGD, we believe that the inequalities and results we extend may be of independent interest.
我们证明,对于粒子梯度下降(PGD, Kuntz 等人, 2023年)来说,我们并非是非保护性的错误,这是最近引入的一种算法,对通过分解自由能源的梯度流获得的大型潜伏变量模型进行最大可能性的估计。我们首先要显示,流动指数指数迅速接近自由能源的最小化器,这些模型满足了一个条件,该条件将日志-Sobolev和Polyak-L}ojasiewicz的不平等(分别为LSI和P~L}I)普遍化。我们通过扩大最佳运输文献中众所周知的结果(LSI意味着Talagrand不平等)和优化文献中的对应结果(P~L}我意味着所谓的四边增长条件)来实现这一目标。我们还将这种流动指数指数指数快速接近到自由能源最小化的最小化器。 我们还将Bakry-\'Emery Theorem 和P~L}I扩展功能的模型具有很强的逻辑相似性模型。对于这些模型来说,我们进一步控制PGD的离性错误,并获得非不平等性的兴趣。
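To make the algorithm under analysis concrete, here is a minimal sketch of particle gradient descent on a toy latent variable model chosen for this example: latent x_j ~ N(theta, 1) and observation y_j | x_j ~ N(x_j, 1), for which the log-likelihood is strongly concave and the maximum likelihood estimate is the sample mean of y. The parameter follows the particle-averaged gradient in theta while each particle takes a noisy (Langevin-type) gradient step in x. The update form reflects my reading of PGD and should be treated as an assumption; step size, particle count, and iteration count are arbitrary.

```python
import numpy as np

def pgd_toy(y, n_particles=100, step=0.01, n_steps=3000, seed=0):
    """Particle gradient descent for the toy model x ~ N(theta, I), y | x ~ N(x, I)."""
    rng = np.random.default_rng(seed)
    d = y.shape[0]
    theta = 0.0
    X = rng.standard_normal((n_particles, d))    # one row per particle
    for _ in range(n_steps):
        # Gradient of log p_theta(x, y) in theta, averaged over the particle cloud.
        grad_theta = (X - theta).sum(axis=1).mean()
        # Gradient of log p_theta(x, y) in x, evaluated per particle.
        grad_x = theta + y - 2.0 * X
        theta = theta + step * grad_theta
        X = X + step * grad_x + np.sqrt(2.0 * step) * rng.standard_normal(X.shape)
    return theta

rng = np.random.default_rng(1)
y = 1.5 + np.sqrt(2.0) * rng.standard_normal(50)   # synthetic data with true theta = 1.5
theta_hat = pgd_toy(y)
print(f"PGD estimate: {theta_hat:.3f}, exact MLE (mean of y): {y.mean():.3f}")
```

The remaining gap between the estimate and the exact MLE comes from the finite particle count and the time discretization, which are the two error sources that non-asymptotic bounds of this kind control.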
Article 265
Title@2025-07-16 (3): Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
Title: Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control | Sparse Autoencoder für sequentielle Empfehlungsmodelle: Interpretation und flexible Steuerung | 序列建议模型:解释和灵活控制 2507.12202v1 |
Authors (6): Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev
Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their internals can help understand, influence, and control their behavior, which is very important in a variety of real-world applications. Recently, sparse autoencoders (SAEs) have been shown to be a promising unsupervised approach for extracting interpretable features from language models. These autoencoders learn to reconstruct hidden states of the transformer’s internal layers from sparse linear combinations of directions in their activation space. This paper is focused on the application of SAEs to the sequential recommendation domain. We show that this approach can be successfully applied to the transformer trained on a sequential recommendation task: learned directions turn out to be more interpretable and monosemantic than the original hidden state dimensions. Moreover, we demonstrate that the features learned by SAEs can be used to effectively and flexibly control the model’s behavior, providing end-users with a straightforward method to adjust their recommendations to different custom scenarios and contexts.
目前许多关于顺序建议的先进模型都以变压器结构为基础。这种黑盒模型的解读和解释是一个重要的研究问题,因为更好地了解其内部结构可以帮助理解、影响和控制其行为,这对于现实世界的各种应用非常重要。最近稀疏的自动编码器(SAE)被证明是从语言模型中提取可解释特性的一种有希望的、不受监督的方法。这些自动编码器学会了将变压器内部层的隐藏状态从其激活空间的微薄方向线性组合中重建出来。本文侧重于SAE对相继建议域的应用。我们表明,这一方法可以成功地应用于经过连续建议任务培训的变压器:学习方向变得比最初的隐藏状态维度更具解释性和单一性。此外,我们证明,变压器所学的特征可以用来有效和灵活地控制模型的行为,为终端用户提供直接的方法,使其建议适应不同的自定义情景和背景。
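The mechanics described here, reconstructing a hidden state from a sparse combination of learned directions and then steering the model by adding one direction back, can be sketched as follows. This is an untrained NumPy illustration with random weights; the class layout, the L1 coefficient, and the `steer` helper are assumptions for the example and not the paper's implementation.

```python
import numpy as np

class SparseAutoencoder:
    """Minimal ReLU sparse autoencoder over transformer hidden states (untrained sketch)."""

    def __init__(self, d_model, d_dict, rng):
        self.W_enc = rng.standard_normal((d_model, d_dict)) / np.sqrt(d_model)
        self.b_enc = np.zeros(d_dict)
        W_dec = rng.standard_normal((d_dict, d_model))
        # Unit-norm decoder rows: each row is one direction in activation space.
        self.W_dec = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
        self.b_dec = np.zeros(d_model)

    def encode(self, h):
        """Non-negative feature activations."""
        return np.maximum(0.0, (h - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        """Reconstruction as a linear combination of decoder directions."""
        return f @ self.W_dec + self.b_dec

    def loss(self, h, l1_coeff=1e-3):
        """Reconstruction error plus an L1 sparsity penalty on the features."""
        f = self.encode(h)
        recon_err = np.mean(np.sum((self.decode(f) - h) ** 2, axis=-1))
        sparsity = np.mean(np.sum(np.abs(f), axis=-1))
        return recon_err + l1_coeff * sparsity

def steer(h, sae, feature_idx, alpha):
    """Push hidden states along one learned feature direction (hypothetical control hook)."""
    return h + alpha * sae.W_dec[feature_idx]

rng = np.random.default_rng(0)
sae = SparseAutoencoder(d_model=16, d_dict=64, rng=rng)
h = rng.standard_normal((5, 16))          # stand-in for hidden states of 5 sequence positions
features = sae.encode(h)
steered = steer(h, sae, feature_idx=3, alpha=2.0)
print(features.shape, sae.loss(h))
```

In a real pipeline the steered hidden state would be fed back into the remaining transformer layers, so that the recommendation scores shift towards items associated with the chosen feature.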
Article 266
Title@2025-07-16 (3): Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms
Title: Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms | Prominente Rollen bedingt Invarianter Komponenten in der Domänenanpassung: Theorie und Algorithmen | 有条件的不变化构件在适应域中的主要作用:理论和数值 2309.10301v3 |
Authors (4): Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, Camelyon17, and DomainNet datasets.
域适应(DA)是一个统计学习问题,当用于训练模型的源数据分布不同于用于评估模型的目标数据分布时就会出现。虽然许多DA算法在经验上取得了相当大的成功,但盲目应用这些算法往往会导致在新数据集上性能更差。为了解决这一问题,关键是要澄清DA算法在何种假设下具有良好的目标性能。在这项工作中,我们关注存在条件不变分量(CIC)这一假设,这些分量与预测相关,并且在源数据和目标数据之间保持条件不变。我们证明,可以通过条件不变惩罚(CIP)估计的CIC在为DA提供目标风险保证方面发挥三个突出作用。首先,我们提出了一种基于CIC的新算法,即重要性加权条件不变惩罚(IW-CIP),其目标风险保证超越了协变量偏移和标签偏移等简单设定。其次,我们表明CIC有助于识别其他DA算法的源风险与目标风险之间的较大差异。最后,我们证明,将CIC纳入域不变投影(DIP)算法可以解决其由标签翻转特征引起的失败情形。我们通过在合成数据、MNIST、CelebA、Camelyon17和DomainNet数据集上的数值实验来支持我们的新算法和理论发现。
Article 267
Title@2025-07-16 (3): Selective Quantization Tuning for ONNX Models
Title: Selective Quantization Tuning for ONNX Models | Selektive Quantisierungstuning für ONNX-Modelle | ONNX 模型选择性量化图 2507.12196v1 |
Authors (2): Nikolaos Louloudakis, Ajitha Rajan
Quantization is a process that reduces the precision of deep neural network models to lower model size and computational demands, often at the cost of accuracy. However, fully quantized models may perform below acceptable levels and face deployment challenges on low-end hardware accelerators due to practical constraints. To address these issues, quantization can be applied selectively to only a subset of layers, but choosing which layers to exclude is non-trivial. To this end, we propose TuneQn, a suite enabling selective quantization, deployment and execution of ONNX models across various CPU and GPU devices, combined with profiling and multi-objective optimization. TuneQn generates selectively quantized ONNX models, deploys them on different hardware, measures performance on metrics like accuracy and size, performs Pareto Front minimization to identify the best model candidate and visualizes the results. To demonstrate its effectiveness, we evaluated TuneQn on four ONNX models with two quantization settings across CPU and GPU devices. The results show that TuneQn effectively performs selective quantization and tuning, selecting ONNX model candidates with up to a 54.14% reduction in accuracy loss compared to the fully quantized model, and up to a 72.9% model size reduction compared to the original model.
量化是一种降低深度神经网络模型精度的过程,用以减小模型大小和计算需求,但往往以准确性为代价。然而,完全量化的模型可能表现出低于可接受水平的次优性能,并且由于实际限制,在低端硬件加速器上面临部署挑战。为了解决这些问题,可以只对部分层有选择地进行量化,但选择排除哪些层并非易事。为此,我们提出了TuneQn,这是一个支持在各种CPU和GPU设备上对ONNX模型进行选择性量化、部署和执行的套件,并结合了性能剖析和多目标优化。TuneQn生成选择性量化的ONNX模型,将其部署在不同硬件上,测量准确性和大小等指标上的性能,执行帕累托前沿最小化以确定最佳候选模型,并将结果可视化。为了证明TuneQn的有效性,我们在CPU和GPU设备上使用两种量化设置对四个ONNX模型进行了评估。结果表明,我们的工具能够有效地进行选择性量化和调优,所选出的ONNX候选模型与完全量化的模型相比,准确性损失最多减少54.14%,与原始模型相比,模型大小最多减少72.9%。
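The Pareto-front selection step mentioned in the TuneQn abstract can be illustrated with a minimal sketch. The candidate names and metric values below are invented for illustration and are not taken from the paper; the sketch only assumes that both objectives (accuracy loss and model size) are to be minimized.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is (name, accuracy_loss, size_mb); both metrics are minimized.
    A candidate is dominated if another one is at least as good on both metrics
    and strictly better on at least one.
    """
    front = []
    for name, loss, size in candidates:
        dominated = any(
            (l2 <= loss and s2 <= size) and (l2 < loss or s2 < size)
            for _, l2, s2 in candidates
        )
        if not dominated:
            front.append((name, loss, size))
    return front


# Hypothetical selectively quantized variants of one model (made-up numbers).
candidates = [
    ("fp32", 0.0, 100.0),
    ("full_int8", 4.0, 25.0),
    ("skip_first_layer", 1.5, 30.0),
    ("skip_last_layer", 2.0, 35.0),  # worse than skip_first_layer on both metrics
]
front = pareto_front(candidates)
```

Any point on the resulting front is a defensible trade-off; how a single "best" candidate is picked from it is a separate design decision that the abstract does not specify.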
Article 268
Title@2025-07-16 (3): Explainable Evidential Clustering
Title: Explainable Evidential Clustering | Erklärbares Evidential Clustering | 可解释的证明人群集 2507.12192v1 |
Authors (5): Victor F. Lopes de Souza, Karima Bakhti, Sofiane Ramdani, Denis Mottet, Abdelhak Imoussaten
Unsupervised classification is a fundamental machine learning problem. Real-world data often contain imperfections, characterized by uncertainty and imprecision, which are not well handled by traditional methods. Evidential clustering, based on Dempster-Shafer theory, addresses these challenges. This paper explores the underexplored problem of explaining evidential clustering results, which is crucial for high-stakes domains such as healthcare. Our analysis shows that, in the general case, representativity is a necessary and sufficient condition for decision trees to serve as abductive explainers. Building on the concept of representativity, we generalize this idea to accommodate partial labeling through utility functions. These functions enable the representation of “tolerable” mistakes, leading to the definition of evidential mistakeness as explanation cost and the construction of explainers tailored to evidential classifiers. Finally, we propose the Iterative Evidential Mistake Minimization (IEMM) algorithm, which provides interpretable and cautious decision tree explanations for evidential clustering functions. We validate the proposed algorithm on synthetic and real-world data. Taking into account the decision-maker’s preferences, we were able to provide an explanation that was satisfactory up to 93% of the time.
不受监督的分类是一个根本性的机器学习问题。 现实世界的数据往往包含不完善,其特点是不确定性和不精确,而传统方法处理不善。 基于Dempster-Shafer理论的证明群集群集群群集群集群集,解决了这些挑战。 本文探讨了在解释证据群集结果方面探索不足的问题,这个问题对于保健等高接触领域至关重要。 我们的分析表明,在一般情况下,代表性是决定树成为诱拐性解释师的必要和充分条件。 在代表性概念的基础上,我们推广这个概念,以通过使用功能来容纳部分标签。 这些功能可以代表“可容忍”错误,从而导致对证据群集群集结果进行解释,并构建适合证据分类师的解说师。 最后,我们建议采用隐性误最小化(IEMM)算法,该算法为证据群集功能提供了可解释和谨慎的决定树解释解释解释。 我们验证了合成和现实世界数据的拟议算法。 考虑到决策者的偏好的时间,我们能够提供一个令人满意的解释93 %的解释。
Article 269
Title@2025-07-16 (3): BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
Title: BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search | BenchRL-QAS: Benchmarking Bewehrung Lernalgorithmen für die Quantenarchitektursuche | BenchRL-QAS:为量子结构搜索确定强化学习算法的基准 2507.12189v1 |
Authors (4): Azhar Ikhtiarudin, Aditi Das, Param Thakkar, Akash Kundu
We introduce BenchRL-QAS, a unified benchmarking framework for systematically evaluating reinforcement learning (RL) algorithms in quantum architecture search (QAS) across diverse variational quantum algorithm tasks and system sizes ranging from 2 to 8 qubits. Our study benchmarks nine RL agents, including both value-based and policy-gradient methods, on representative quantum problems such as variational quantum eigensolver, variational quantum state diagonalization, quantum classification, and state preparation, spanning both noiseless and realistic noisy regimes. We propose a weighted ranking metric that balances accuracy, circuit depth, gate count, and computational efficiency, enabling fair and comprehensive comparison. Our results first reveal that RL-based quantum classifiers outperform baseline variational classifiers. We then conclude that no single RL algorithm is universally optimal across a set of QAS tasks; algorithmic performance is highly context-dependent, varying with task structure, qubit count, and noise. This empirical finding provides strong evidence for the “no free lunch” principle in RL-based quantum circuit design and highlights the necessity of tailored algorithm selection and systematic benchmarking for advancing quantum circuit synthesis. This work represents the most comprehensive RL-QAS benchmarking effort to date, and BenchRL-QAS along with all experimental data are made publicly available to support reproducibility and future research at https://github.com/azhar-ikhtiarudin/bench-rlqas.
我们引入了BenchRL-QAS(BencherRL-QAS),这是一个统一的基准框架,用于系统地评价量子结构搜索中各种量子量子算法任务和系统规模的强化学习算法(RL),范围从2到8平方位不等。我们的研究基准为9个量子算法代理商,包括基于价值和政策梯度的代量问题方法,例如变量量量量单、变量量量州分解、量子分类和州制,范围跨越无噪音和现实的噪音制度。我们提出了一个加权等级衡量标准,平衡精度、电路深度、门点数和计算效率,从而能够进行公平和全面的比较。我们的第一个结果显示,基于RL的量子分类法比基线变异性分类器超越基线变异性分类。然后我们得出结论,在考虑一套QAS任务时,没有一个单一的RL值算法是普遍最佳的;算法性表现高度依赖环境,与任务结构、量子计和噪音不同。这一实发现有力地证明了RL电路图设计中的“无免费午餐”原则原则,并强调了为推进量算选择选择选择选择选择选择和系统化的逻辑合成标准。
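A weighted ranking over several cost metrics, as described in the BenchRL-QAS abstract, could look roughly like the sketch below. The min-max normalization, the weights, the agent names, and the numbers are all assumptions made for illustration; the abstract does not give the paper's actual formula.

```python
def weighted_rank(results, weights):
    """Rank agents by a weighted sum of min-max normalized costs (lower is better).

    results: {agent: {metric: value}}, where every metric is a cost.
    weights: {metric: weight}; hypothetical values, not from the paper.
    """
    metrics = list(weights)
    lo = {m: min(r[m] for r in results.values()) for m in metrics}
    hi = {m: max(r[m] for r in results.values()) for m in metrics}
    scores = {}
    for agent, r in results.items():
        score = 0.0
        for m in metrics:
            span = hi[m] - lo[m]
            # A metric on which all agents tie contributes nothing.
            norm = (r[m] - lo[m]) / span if span else 0.0
            score += weights[m] * norm
        scores[agent] = score
    return sorted(scores, key=scores.get)


# Made-up results for three hypothetical agents.
results = {
    "agent_a": {"error": 0.01, "depth": 10, "gates": 20, "seconds": 100},
    "agent_b": {"error": 0.02, "depth": 8, "gates": 15, "seconds": 50},
    "agent_c": {"error": 0.10, "depth": 12, "gates": 30, "seconds": 200},
}
weights = {"error": 0.4, "depth": 0.2, "gates": 0.2, "seconds": 0.2}
ranking = weighted_rank(results, weights)
```

Here the agent with the lowest error is not ranked first, because its circuit is deeper and slower to find; changing the weights changes the order, which is the point of making them explicit.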
Article 270
Title@2025-07-16 (3): MTF-Grasp: A Multi-tier Federated Learning Approach for Robotic Grasping
Title: MTF-Grasp: A Multi-tier Federated Learning Approach for Robotic Grasping | MTF-Grasp: Multi-Tier-Federated Learning Approach for Robotic Grasping | MTF-Grasp: 一种多阶段联邦的机器人采掘学习方法 2507.10158v2 |
Authors (3): Obaidullah Zaland, Erik Elmroth, Monowar Bhuyan
Federated Learning (FL) is a promising machine learning paradigm that enables participating devices to collaboratively train models while preserving privacy. FL has proven its benefits for robotic manipulation tasks. However, grasping tasks remain underexplored in such settings, where robots train a global model without moving data, thereby ensuring data privacy. The main challenge is that each robot learns from data that is not independent and identically distributed (non-IID) and of low quantity. This leads to performance degradation, particularly in robotic grasping. Thus, in this work, we propose MTF-Grasp, a multi-tier FL approach for robotic grasping, acknowledging the unique challenges posed by the non-IID data distribution across robots, including quantitative skewness. MTF-Grasp harnesses data quality and quantity across robots to select a set of “top-level” robots with better data distribution and higher sample count. It then utilizes top-level robots to train initial seed models and distribute them to the remaining “low-level” robots, reducing the risk of model performance degradation in low-level robots. Our approach outperforms the conventional FL setup by up to 8% on the quantity-skewed Cornell and Jacquard grasping datasets.
联邦学习(FL)是一种很有前景的机器学习范式,它使参与设备能够训练保护隐私的协作模型。FL已经在机器人操作任务中证明了其优势。然而,在机器人无需移动数据即可训练全局模型并确保数据隐私的场景中,抓取任务尚缺乏探索。主要挑战在于,每个机器人所学习的数据是非独立同分布(non-IID)的,而且数量较少。这会导致性能下降,在机器人抓取中尤为明显。因此,在这项工作中,我们提出了MTF-Grasp,一种用于机器人抓取的多层FL方法,它考虑了机器人之间非IID数据分布(包括数量偏斜)所带来的独特挑战。MTF-Grasp利用各机器人的数据质量和数量,选出一组数据分布更好、样本数更多的“顶层”机器人,然后利用顶层机器人训练初始种子模型,并将其分发给其余的“低层”机器人,从而降低低层机器人模型性能下降的风险。在数量偏斜的Cornell和Jacquard抓取数据集上,我们的方法比传统FL设置最多高出8%。
Article 271
Title@2025-07-16 (3): 2.5D Object Detection for Intelligent Roadside Infrastructure
Title: 2.5D Object Detection for Intelligent Roadside Infrastructure | 2.5D-Objekterkennung für intelligente Straßeninfrastruktur | 2.5D 智能路边基础设施物体探测 2507.03564v2 |
Authors (6): Nikolai Polley, Yacin Boualili, Ferdinand Mütsch, Maximilian Zipfl, Tobias Fleck, J. Marius Zöllner
On-board sensors of autonomous vehicles can be obstructed, occluded, or limited by restricted fields of view, complicating downstream driving decisions. Intelligent roadside infrastructure perception systems, installed at elevated vantage points, can provide wide, unobstructed intersection coverage, supplying a complementary information stream to autonomous vehicles via vehicle-to-everything (V2X) communication. However, conventional 3D object-detection algorithms struggle to generalize under the domain shift introduced by top-down perspectives and steep camera angles. We introduce a 2.5D object detection framework, tailored specifically for infrastructure roadside-mounted cameras. Unlike conventional 2D or 3D object detection, we employ a prediction approach to detect ground planes of vehicles as parallelograms in the image frame. The parallelogram preserves the planar position, size, and orientation of objects while omitting their height, which is unnecessary for most downstream applications. For training, a mix of real-world and synthetically generated scenes is leveraged. We evaluate generalizability on a held-out camera viewpoint and in adverse-weather scenarios absent from the training set. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions. Model weights and inference code are provided at: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection
自动驾驶车辆的车载传感器可能被遮挡,或受到有限视野的限制,从而使下游驾驶决策复杂化。安装在高处的智能路侧基础设施感知系统可以提供宽阔、无遮挡的路口覆盖,并通过车联网(V2X)通信向自动驾驶车辆提供补充信息流。然而,传统的3D目标检测算法难以在俯视视角和陡峭相机角度带来的域偏移下泛化。我们提出了一个专为路侧安装的基础设施相机设计的2.5D目标检测框架。与传统的2D或3D目标检测不同,我们采用一种预测方法,将车辆的地平面检测为图像帧中的平行四边形。平行四边形保留了物体的平面位置、尺寸和朝向,同时省略了对大多数下游应用而言不必要的高度。训练时使用了真实场景与合成生成场景的混合数据。我们在一个留出的相机视角以及训练集中未出现的恶劣天气场景下评估了泛化能力。结果显示了较高的检测精度、较强的跨视角泛化能力,以及对不同光照和天气条件的鲁棒性。模型权重和推理代码见:https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection
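To make the parallelogram representation from the 2.5D detection abstract concrete, here is a minimal geometric sketch. The parameterization (a center plus two edge vectors in pixel coordinates) is an assumption chosen for illustration; the abstract does not state how the paper encodes its parallelograms.

```python
def parallelogram_corners(center, u, v):
    """Corners of a parallelogram from its center and two edge vectors (pixels)."""
    cx, cy = center
    (ux, uy), (vx, vy) = u, v
    return [
        (cx - ux / 2 - vx / 2, cy - uy / 2 - vy / 2),
        (cx + ux / 2 - vx / 2, cy + uy / 2 - vy / 2),
        (cx + ux / 2 + vx / 2, cy + uy / 2 + vy / 2),
        (cx - ux / 2 + vx / 2, cy - uy / 2 + vy / 2),
    ]


def parallelogram_area(u, v):
    """Area in square pixels: magnitude of the 2D cross product of the edges."""
    (ux, uy), (vx, vy) = u, v
    return abs(ux * vy - uy * vx)


# Hypothetical vehicle footprint as seen by an elevated camera.
corners = parallelogram_corners((100.0, 50.0), (40.0, 0.0), (10.0, 20.0))
area = parallelogram_area((40.0, 0.0), (10.0, 20.0))
```

Six numbers (center and two edge vectors) capture planar position, size, and orientation, while height is simply not represented, which matches the trade-off the abstract describes.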
Article 272
Title@2025-07-16 (3): LHU-Net: a Lean Hybrid U-Net for Cost-efficient, High-performance Volumetric Segmentation
Title: LHU-Net: a Lean Hybrid U-Net for Cost-efficient, High-performance Volumetric Segmentation | LHU-Net: Ein schlankes Hybrid-U-Net für kosteneffiziente, leistungsstarke Volumetric-Segmentierung | LHU-Net:低成本效益、高性能量量分解的精混合U-Net 2404.05102v3 |
Authors (5): Yousef Sadegheih, Afshin Bozorgpour, Pratibha Kumari, Reza Azad, Dorit Merhof
The rise of Transformer architectures has advanced medical image segmentation, leading to hybrid models that combine Convolutional Neural Networks (CNNs) and Transformers. However, these models often suffer from excessive complexity and fail to effectively integrate spatial and channel features, crucial for precise segmentation. To address this, we propose LHU-Net, a Lean Hybrid U-Net for volumetric medical image segmentation. LHU-Net prioritizes spatial feature extraction before refining channel features, optimizing both efficiency and accuracy. Evaluated on four benchmark datasets (Synapse, Left Atrial, BraTS-Decathlon, and Lung-Decathlon), LHU-Net consistently outperforms existing models across diverse modalities (CT/MRI) and output configurations. It achieves state-of-the-art Dice scores while using four times fewer parameters and 20% fewer FLOPs than competing models, without the need for pre-training, additional data, or model ensembles. With an average of 11 million parameters, LHU-Net sets a new benchmark for computational efficiency and segmentation accuracy. Our implementation is available on GitHub: https://github.com/xmindflow/LHUNet
变异器结构的兴起已使医学图像分化先进,导致混合模型,将进化神经网络(CNNs)和变异器结合起来,然而,这些模型往往过于复杂,未能有效地整合空间和频道特征,对精确分化至关重要。为此,我们提议LHU-Net,即用于体积医学图像分化的Lean混合U-Net;LHU-Net在精炼通道功能之前优先进行空间特征提取,以优化效率和准确性。在四个基准数据集(Synopse、左侧Ature、BraTS-Decathlon和Lung-Decathlon)上进行了评价,LHU-Net始终超越了不同模式(CT/MRI)和输出配置的现有模型。它达到最先进的Dice分数,同时使用四倍的参数和20%的FLOPs比竞争模型,不需要预先培训、额外数据或模型封装。在平均1 100万个参数下,LHUNet为计算效率和分化精确度的新基准。我们在GiHLUBs流上可以使用:http://HUBxx。
Article 273
Title@2025-07-16 (3): NeuTSFlow: Modeling Continuous Functions Behind Time Series Forecasting
Title: NeuTSFlow: Modeling Continuous Functions Behind Time Series Forecasting | NeuTSFlow: Modellierung kontinuierlicher Funktionen hinter Zeitreihen Prognose | NeuTSFlow: 时间序列预测背后的模拟连续函数 2507.09888v2 |
Authors (7): Huibo Xu, Likang Wu, Xianquan Wang, Haoning Dang, Chun-Wun Cheng, Angelica I Aviles-Rivero, Qi Liu
Time series forecasting is a fundamental task with broad applications, yet conventional methods often treat data as discrete sequences, overlooking their origin as noisy samples of continuous processes. Crucially, discrete noisy observations cannot uniquely determine a continuous function; instead, they correspond to a family of plausible functions. Mathematically, time series can be viewed as noisy observations of a continuous function family governed by a shared probability measure. Thus, the forecasting task can be framed as learning the transition from the historical function family to the future function family. This reframing introduces two key challenges: (1) How can we leverage discrete historical and future observations to learn the relationships between their underlying continuous functions? (2) How can we model the transition path in function space from the historical function family to the future function family? To address these challenges, we propose NeuTSFlow, a novel framework that leverages Neural Operators to facilitate flow matching for learning path of measure between historical and future function families. By parameterizing the velocity field of the flow in infinite-dimensional function spaces, NeuTSFlow moves beyond traditional methods that focus on dependencies at discrete points, directly modeling function-level features instead. Experiments on diverse forecasting tasks demonstrate NeuTSFlow’s superior accuracy and robustness, validating the effectiveness of the function-family perspective.
时间序列的预测是一项具有广泛应用性的基本任务, 但常规方法往往将数据视为离散序列, 忽略其来源, 忽略其源头为连续过程的杂乱样本。 关键是, 离散的噪音观测无法独有地决定一个连续功能; 相反, 它们与一个具有合理功能的大家庭相对应。 从数学角度来说, 时间序列可以被视为一个由共同概率测量法管理的连续功能大家庭的杂乱观测。 因此, 预测任务可以作为学习从历史函数家庭向未来函数家庭过渡的过程来设计。 这种重新配置带来了两个关键挑战:(1) 我们如何利用离散的历史和未来的观测来学习其基本连续功能之间的关系? (2) 我们如何在功能空间从历史函数家庭到未来函数家庭之间的过渡路径上建模? 为了应对这些挑战, 我们建议 NeutSFFlow, 一个利用神经操作者来帮助学习历史和未来函数家庭之间的测量路径的新框架。 通过对无限功能空间流动的速域进行参数比较, NeutTSFFFlow 运动超越了传统方法, 聚焦于离散点的依赖性、 直接建模功能水平的精确度, 实验显示不同预测任务。
Article 274
Title@2025-07-16 (3): RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
Title: RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection | RUMAA: Repeat-Aware Unified Music Audio Analyse zur Ausrichtung, Transkription und Fehlererkennung | RUMAA: 用于计分业绩协调、追踪和误差探测的重复软件统一音乐音频分析 2507.12175v1 |
Authors (3): Sungkyun Chang, Simon Dixon, Emmanouil Benetos
This study introduces RUMAA, a transformer-based framework for music performance analysis that unifies score-to-performance alignment, score-informed transcription, and mistake detection in a near end-to-end manner. Unlike prior methods addressing these tasks separately, RUMAA integrates them using pre-trained score and audio encoders and a novel tri-stream decoder capturing task interdependencies through proxy tasks. It aligns human-readable MusicXML scores with repeat symbols to full-length performance audio, overcoming traditional MIDI-based methods that rely on manually unfolded score-MIDI data with pre-specified repeat structures. RUMAA matches state-of-the-art alignment methods on non-repeated scores and outperforms them on scores with repeats in a public piano music dataset, while also delivering promising transcription and mistake detection results.
本研究介绍了RUMAA,这是一个基于变压器的音乐性能分析框架,它以近终端至终端的方式统一了分数到业绩的对齐、对分知情的抄录和错误发现。与以前分别处理这些任务的方法不同,RUMAA采用预先培训的分数和音频编码器,以及新颖的三流解码器,通过代理任务捕捉任务的相互依存性。它使人可读MusicXML分数与重复的符号与全长性能听音相匹配,克服了传统的MIDI基方法,这些方法依靠预先指定的重复结构人工展开的分数-MIDI数据。RUMAA匹配了最先进的非重现分数调整方法,并用公共钢琴音乐数据集中的重复数来优胜其分,同时还提供了很有希望的抄录和错误检测结果。
Article 275
Title@2025-07-16 (3): Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data
Title: Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data | Sharing is CAIRing: Charakterisierende Prinzipien und Bewertung der Eigenschaften der universellen Datenschutzbewertung für synthetische Tabellendaten | 共享是CAIR:确定合成图表数据通用隐私评价的原则和特性评估 2312.12216v2 |
Authors (4): Tobias Hyrup, Anton Danholt Lautrup, Arthur Zimek, Peter Schneider-Kamp
Data sharing is a necessity for innovative progress in many domains, especially in healthcare. However, the ability to share data is hindered by regulations protecting the privacy of natural persons. Synthetic tabular data provide a promising solution to address data sharing difficulties but do not inherently guarantee privacy. Still, there is a lack of agreement on appropriate methods for assessing the privacy-preserving capabilities of synthetic data, making it difficult to compare results across studies. To the best of our knowledge, this is the first work to identify properties that constitute good universal privacy evaluation metrics for synthetic tabular data. The goal of universally applicable metrics is to enable comparability across studies and to allow non-technical stakeholders to understand how privacy is protected. We identify four principles for the assessment of metrics: Comparability, Applicability, Interpretability, and Representativeness (CAIR). To quantify and rank the degree to which evaluation metrics conform to the CAIR principles, we design a rubric using a scale of 1-4. Each of the four properties is scored on four parameters, yielding 16 total dimensions. We study the applicability and usefulness of the CAIR principles and rubric by assessing a selection of metrics popular in other studies. The results provide granular insights into the strengths and weaknesses of existing metrics that not only rank the metrics but also highlight areas of potential improvement. We expect that the CAIR principles will foster agreement among researchers and organizations on which universal privacy evaluation metrics are appropriate for synthetic tabular data.
共享数据是在许多领域,特别是在医疗保健领域创新进步的必要条件。然而,共享数据的能力受到保护自然人隐私的条例的阻碍。合成表格数据为解决数据共享困难提供了很有希望的解决办法,但并不能从根本上保障隐私。然而,对于评估合成数据的隐私保护能力的适当方法,尚缺乏一致意见,因此难以对各种研究的成果进行比较。根据我们的最佳知识,这是确定构成合成表格数据良好普遍隐私评价指标的属性的首项工作。普遍适用指标的目标是使各项研究之间具有可比性,使非技术利益攸关方了解隐私如何受到保护。我们确定了评估指标的四项原则:可比性、可适用性、可互换性和代表性。为了量化和评定评价指标符合CAIR原则的程度,我们用1-4的尺度设计一个标注。这四项特性的每一项均按四个参数评分,共产生16个层面。我们研究CAIR原则的适用性和有用性,通过评估通用数据原则的可保护性如何保护隐私。我们确定了评估指标评估的四项原则的四项原则:可比性、可适用性、可解释性、可解释性、代表性(CAIR)和代表性(CA)评估其他指标领域的标准评估结果,我们仅能评估现有指标评估现有指标的弹性评估。
Article 276
Title@2025-07-16 (3): Governance of Generative Artificial Intelligence for Companies
Title: Governance of Generative Artificial Intelligence for Companies | Governance generativer Künstlicher Intelligenz für Unternehmen | 公司创造人工情报的治理 2403.08802v4 |
Authors (4): Johannes Schneider, Pauline Kuss, Rene Abraham, Christian Meske
Generative Artificial Intelligence (GenAI), specifically large language models (LLMs) like ChatGPT, has swiftly entered organizations without adequate governance, posing both opportunities and risks. Despite extensive debates on GenAI’s transformative nature and regulatory measures, limited research addresses organizational governance, encompassing technical and business perspectives. Although numerous frameworks for governance of AI exist, it is not clear to what extent they apply to GenAI. Our review paper fills this gap by surveying recent works with the purpose of better understanding fundamental characteristics of GenAI and adapting prior frameworks specifically to GenAI governance within companies. To do so, it extends Nickerson’s framework development processes to include prior conceptualizations. Our framework outlines the scope, objectives, and governance mechanisms tailored to harness business opportunities as well as mitigate risks associated with GenAI integration. Our research contributes a focused approach to GenAI governance, offering practical insights for companies navigating the challenges of GenAI adoption and highlighting research gaps.
尽管对GenAI的变革性质和监管措施进行了广泛辩论,但研究范围有限,涉及组织治理,包括技术和商业观点。尽管存在许多AI治理框架,但尚不清楚这些框架在多大程度上适用于GenAI。我们的审查文件填补了这一差距,调查了最近的工作,目的是更好地了解GenAI的基本特点,并调整以前框架,具体针对GenAI公司内部治理的框架。为此,它扩展了Nickerson的框架制定过程,以包括先前的概念化。我们的框架概述了为利用商业机会以及减少GenAI一体化带来的风险而专门设计的范围、目标和治理机制。我们的研究为GenAI治理提供了重点方法,为处理GenAI采用方面的挑战的公司提供了实际的见解,并突出了研究差距。
Article 277
Title@2025-07-16 (3): RadioDiff-3D: A 3D$\times$3D Radio Map Dataset and Generative Diffusion Based Benchmark for 6G Environment-Aware Communication
Title: RadioDiff-3D: A 3D$\times$3D Radio Map Dataset and Generative Diffusion Based Benchmark for 6G Environment-Aware Communication | RadioDiff-3D: Ein 3D$\times$3D Radio Map Datensatz und Generative Diffusionsbasierter Benchmark für 6G Environment-Aware Kommunikation | RadioDiff-3D: 6G 环境软件通信的3D$3D无线电地图数据集和基于发源传播的基准3D美元 2507.12166v1 |
Authors (8): Xiucheng Wang, Qiming Zhang, Nan Cheng, Junting Chen, Zezhong Zhang, Zan Li, Shuguang Cui, Xuemin Shen
Radio maps (RMs) serve as a critical foundation for enabling environment-aware wireless communication, as they provide the spatial distribution of wireless channel characteristics. Despite recent progress in RM construction using data-driven approaches, most existing methods focus solely on pathloss prediction in a fixed 2D plane, neglecting key parameters such as direction of arrival (DoA), time of arrival (ToA), and vertical spatial variations. Such a limitation is primarily due to the reliance on static learning paradigms, which hinder generalization beyond the training data distribution. To address these challenges, we propose UrbanRadio3D, a large-scale, high-resolution 3D RM dataset constructed via ray tracing in realistic urban environments. UrbanRadio3D is over 37$\times$ larger than previous datasets across a 3D space with three metrics (pathloss, DoA, and ToA), forming a novel 3D$\times$3D dataset with 7$\times$ more height layers than the prior state-of-the-art (SOTA) dataset. To benchmark 3D RM construction, a UNet with 3D convolutional operators is proposed. Moreover, we further introduce RadioDiff-3D, a diffusion-model-based generative framework utilizing the 3D convolutional architecture. RadioDiff-3D supports both radiation-aware scenarios with known transmitter locations and radiation-unaware settings based on sparse spatial observations. Extensive evaluations on UrbanRadio3D validate that RadioDiff-3D achieves superior performance in constructing rich, high-dimensional radio maps under diverse environmental dynamics. This work provides a foundational dataset and benchmark for future research in 3D environment-aware communication. The dataset is available at https://github.com/UNIC-Lab/UrbanRadio3D.
无线电地图(RMs)是有利于环境的无线通信的重要基础,因为它们提供了无线频道特点的空间分布。尽管最近使用数据驱动的方法在RM建设方面取得了进展,但大多数现有方法仅侧重于固定的 2D 平面上的病理预测,忽视了抵达方向、抵达时间和垂直空间变异等关键参数。这种局限性主要是由于依赖静态学习模式,这阻碍了培训数据分布以外的一般化。为了应对这些挑战,我们提议城市无线电3DD,这是一个大规模、高分辨率的3D RM数据集,在现实的城市环境中通过光线上高级跟踪建立。 城市3D3D比以往3D空间的数据配置大37多3美元,其中3个尺度为病理、DoA和ToA,形成了一个新的3D时间333D数据集,比培训数据发布之前高3美元。 我们为3D RM3D 快速观测设定了3D 快速数据定位,在3D 数据库中引入了3D 快速数据运行。
Article 278
Title@2025-07-16 (3): Multi-Component VAE with Gaussian Markov Random Field
Title: Multi-Component VAE with Gaussian Markov Random Field | Multi-Komponent VAE mit Gaussian Markov Random Field | 带有 Gaussian Markov 随机字段的多功能 VAE 2507.12165v1 |
Authors (5): Fouad Oubari, Mohamed El-Baha, Raphael Meunier, Rodrigue Décatoire, Mathilde Mougeot
Multi-component datasets with intricate dependencies, like industrial assemblies or multi-modal imaging, challenge current generative modeling techniques. Existing Multi-component Variational AutoEncoders typically rely on simplified aggregation strategies, neglecting critical nuances and consequently compromising structural coherence across generated components. To explicitly address this gap, we introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder (GMRF MCVAE), a novel generative framework embedding Gaussian Markov Random Fields into both prior and posterior distributions. This design choice explicitly models cross-component relationships, enabling richer representation and faithful reproduction of complex interactions. Empirically, our GMRF MCVAE achieves state-of-the-art performance on a synthetic Copula dataset specifically constructed to evaluate intricate component relationships, demonstrates competitive results on the PolyMNIST benchmark, and significantly enhances structural coherence on the real-world BIKED dataset. Our results indicate that the GMRF MCVAE is especially suited for practical applications demanding robust and realistic modeling of multi-component coherence.
为了明确解决这一差距,我们引入了多构件数据集,该数据集具有复杂的相互依存性,如工业组件或多式成像,挑战目前的基因模型技术。现有的多构件自动自动编码器通常依赖于简化的聚合战略,忽略了关键微妙的细微差别,从而损害各生成组件之间的结构一致性。为了明确解决这一差距,我们引入了Gausian Markov随机多兼容多元化自动编码器(Gausian Markoov随机场),这是一个新型的基因化框架,将Gaussian Markov随机场嵌入了先前的和后方的分布中。这一设计选择明确地模拟了跨构件关系,使复杂的互动能够有更丰富的代表性和更忠实的复制。我们GMRF MMCVAE在专门为评价复杂组件关系而构建的合成科波拉数据集上取得了最先进的业绩,展示了多构件组合组合式基准的竞争性结果,并大大加强了实际世界BIKED数据集的结构性一致性。我们的结果表明,GMRF MCVAE特别适合实际应用,要求对多构件一致性进行强有力和现实的建模。
Article 279
Title@2025-07-16 (3): Patherea: Cell Detection and Classification for the 2020s
Title: Patherea: Cell Detection and Classification for the 2020s | Patherea: Zellerkennung und Klassifizierung für die 2020er Jahre | Patherea: 2020年代细胞检测和分类 2412.16425v2 |
Authors (6): Dejan Štepec, Maja Jerše, Snežana Đokić, Jera Jeruc, Nina Zidar, Danijel Skočaj
We present Patherea, a unified framework for point-based cell detection and classification that enables the development and fair evaluation of state-of-the-art methods. To support this, we introduce a large-scale dataset that replicates the clinical workflow for Ki-67 proliferation index estimation. Our method directly predicts cell locations and classes without relying on intermediate representations. It incorporates a hybrid Hungarian matching strategy for accurate point assignment and supports flexible backbones and training regimes, including recent pathology foundation models. Patherea achieves state-of-the-art performance on public datasets - Lizard, BRCA-M2C, and BCData - while highlighting performance saturation on these benchmarks. In contrast, our newly proposed Patherea dataset presents a significantly more challenging benchmark. Additionally, we identify and correct common errors in current evaluation protocols and provide an updated benchmarking utility for standardized assessment. The Patherea dataset and code are publicly available to facilitate further research and fair comparisons.
我们提出一个基于点的细胞检测和分类统一框架Pathea, 该框架有助于开发和公平评估最新方法。为了支持这一框架,我们推出一个大规模数据集,复制基-67扩散指数估计的临床工作流程。我们的方法直接预测细胞位置和班级,而不必依赖中间代表。它包含一个匈牙利混合匹配战略,用于准确点分配,支持灵活的骨干和培训制度,包括最近的病理基础模型。Paexia在公共数据集(Lizard、BRCA-M2C和BCData)上取得最新业绩,同时突出这些基准的性能饱和度。与此形成对照的是,我们新提出的Paitea数据集是一个更具挑战性的基准。此外,我们发现并纠正了当前评估协议中常见的错误,并为标准化评估提供了最新的基准工具。Pathea数据集和代码可供公开查阅,以便于进一步研究和公平比较。
Article 280
Title@2025-07-16 (3): Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training
Title: Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training | Schutz urheberrechtlich geschützter Materialien mit einzigartigen Identifikatoren in großsprachlichen Modellschulungen | 在大语言模式培训中以独特标识人保护版权材料 2403.15740v3 |
Authors (4): Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang
A primary concern regarding training large language models (LLMs) is whether they abuse copyrighted online text. With the increasing training data scale and the prevalence of LLMs in daily lives, two problems arise: 1) false positive membership inference results misled by similar examples; 2) membership inference methods are usually too complex for end users to understand and use. To address these issues, we propose an alternative insert-and-detect methodology, advocating that web users and content platforms employ unique identifiers for reliable and independent membership inference. Users and platforms can create their identifiers, embed them in copyrighted text, and independently detect them in future LLMs. As an initial demonstration, we introduce ghost sentences and a user-friendly last-$k$ words test, allowing end users to chat with LLMs for membership inference. Ghost sentences consist primarily of unique passphrases of random natural words, which can come with customized elements to bypass possible filter rules. The last-$k$ words test requires ghost sentences to be repeated a significant number of times ($\ge 10$). For cases with fewer repetitions, we design an extra perplexity test, as LLMs exhibit high perplexity when encountering unnatural passphrases. We also conduct a comprehensive study on the memorization and membership inference of ghost sentences, examining factors such as training data scales, model sizes, repetition times, insertion positions, wordlist of passphrases, alignment, etc. Our study shows the possibility of applying ghost sentences in real scenarios and provides instructions for the potential application.
训练大型语言模型(LLM)的一个主要担忧是它们是否滥用了受版权保护的在线文本。随着训练数据规模的扩大以及LLM在日常生活中的普及,出现了两个问题:1)成员推断结果会被相似样本误导而产生假阳性;2)成员推断方法通常过于复杂,终端用户难以理解和使用。为了解决这些问题,我们提出了另一种“插入并检测”的方法,主张网络用户和内容平台使用唯一标识符来进行可靠且独立的成员推断。用户和平台可以创建自己的标识符,将其嵌入受版权保护的文本中,并在未来的LLM中独立检测它们。作为初步演示,我们引入了幽灵句子(ghost sentences)和一种用户友好的last-$k$词测试,使终端用户可以通过与LLM对话来进行成员推断。幽灵句子主要由随机自然词组成的唯一口令短语构成,并可带有定制元素以绕过可能的过滤规则。last-$k$词测试要求幽灵句子有相当多的重复次数($\ge 10$)。对于重复次数较少的情况,我们设计了额外的困惑度测试,因为LLM在遇到不自然的口令短语时会表现出较高的困惑度。我们还对幽灵句子的记忆和成员推断进行了全面研究,考察了训练数据规模、模型大小、重复次数、插入位置、口令短语词表、对齐等因素。我们的研究表明了在真实场景中应用幽灵句子的可能性,并为潜在应用提供了指导。
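The insert-and-detect idea from the ghost-sentence abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: the word list, the function names, and the exact pass criterion of the last-$k$ words test (the model's continuation must begin with the withheld last $k$ words) are assumptions.

```python
import random


def make_ghost_sentence(wordlist, n_words, seed):
    """Build a passphrase of random natural words (a hypothetical ghost sentence)."""
    rng = random.Random(seed)
    return " ".join(rng.choice(wordlist) for _ in range(n_words))


def split_prompt(ghost, k):
    """Split a ghost sentence into the prompt shown to the model and the last k words."""
    words = ghost.split()
    return " ".join(words[:-k]), " ".join(words[-k:])


def last_k_words_test(ghost, completion, k):
    """Pass if the model's completion starts with the withheld last k words."""
    target = ghost.split()[-k:]
    return completion.split()[:k] == target


# Hypothetical word list; a real one would be much larger.
wordlist = ["amber", "canyon", "velvet", "orbit", "thistle", "lantern", "quartz", "meadow"]
ghost = make_ghost_sentence(wordlist, n_words=8, seed=0)
prompt, target = split_prompt(ghost, k=2)
```

In use, `prompt` would be sent to the LLM in a chat and its reply passed as `completion`; a match suggests the ghost sentence was memorized, which per the abstract requires enough repetitions in the training data.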
Article 281
Title@2025-07-16 (3): Data Augmentation in Time Series Forecasting through Inverted Framework
Title: Data Augmentation in Time Series Forecasting through Inverted Framework | Datenvergrößerung in Zeitreihen Vorhersage durch umgekehrtes Framework | 通过反向框架预测时间序列中的数据增加值 2507.11439v2 |
Authors (4): Hongming Tan, Ting Chen, Ruochong Jin, Wai Kin Chan
Currently, iTransformer is one of the most popular and effective models for multivariate time series (MTS) forecasting. Thanks to its inverted framework, iTransformer effectively captures multivariate correlation. However, the inverted framework still has some limitations: it diminishes temporal interdependency information and introduces noise when variable correlations are non-significant. To address these limitations, we introduce a novel data augmentation method for the inverted framework, called DAIF. Unlike previous data augmentation methods, DAIF stands out as the first real-time augmentation specifically designed for the inverted framework in MTS forecasting. We first define the structure of the inverted sequence-to-sequence framework, then propose two different DAIF strategies, Frequency Filtering and Cross-variation Patching, to address the existing challenges of the inverted framework. Experiments across multiple datasets and inverted models demonstrate the effectiveness of DAIF.
目前, itransferent 是多变时间序列预测中最受欢迎和最有效的模型之一。 由于其反向框架, itransfer 有效捕捉了多变关系。 但是, 反转框架仍然有一些局限性 。 它会减少时间间相互依存信息, 并在非重大可变关系中引入噪音 。 为了解决这些局限性, 我们在反向框架上引入了一种新的数据增强方法, 称为 DAIF 。 与以往的数据增强方法不同, DAIF 与以前的数据增强方法不同, 是第一次专门为MTIS 预测中逆向框架设计的实时增强。 我们首先定义了逆向序列至后序框架的结构, 然后提出了两种不同的 DAIF 战略, 即频率过滤和跨变式补丁战略, 以应对反向框架的现有挑战 。 跨多个数据集和反向模式的实验证明了我们 DAIF 的有效性 。
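The abstract names Frequency Filtering as one DAIF strategy but does not spell out the filter. As a minimal sketch, assuming a plain low-pass filter applied to a single variate (the function names `dft`, `idft`, `low_pass` and the `keep` parameter are illustrative and not taken from the paper), suppressing high-frequency components before the series reaches an inverted model could look like this:

```python
import cmath


def dft(series):
    """Naive discrete Fourier transform of a real-valued series."""
    n = len(series)
    return [
        sum(series[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`; returns the real part of the reconstruction."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(series, keep):
    """Zero every frequency whose index distance from DC is >= keep.

    Assumption: a hard cutoff; the paper's actual filter may differ.
    """
    n = len(series)
    spectrum = dft(series)
    # Bins k and n - k form a conjugate pair, so both are kept or dropped together.
    filtered = [c if min(k, n - k) < keep else 0j for k, c in enumerate(spectrum)]
    return idft(filtered)


smoothed = low_pass([1.0, 2.0, 3.0, 4.0], keep=1)  # only the mean survives
```

With `keep=1` only the DC component is retained, so the output is the series mean at every step; larger values of `keep` let progressively faster oscillations through.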
Article 282
Title@2025-07-16 (3): Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery
Title: Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery | Complexity-Aware Training Deep Neural Networks für eine optimale Struktur-Discovery | 为发现最佳结构最佳结构而进行深神经网络的复杂度知识培训 2411.09127v2 |
Authors (2): Valentin Frank Ingmar Guenter, Athanasios Sideris
We propose a novel algorithm for combined unit and layer pruning of deep neural networks that operates during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit pruning and computational vs. parameter complexity using only three user-defined parameters, which are easy to interpret and tune. We formulate a stochastic optimization problem over the network weights and the parameters of variational Bernoulli distributions for binary random variables that take values 0 or 1 and scale the units and layers of the network. Optimal network structures are found as the solution to this optimization problem. Pruning occurs when a variational parameter converges to 0, rendering the corresponding structure permanently inactive and thus saving computations during both training and prediction. A key contribution of our approach is a cost function that combines the objectives of prediction accuracy and network pruning in a computational/parameter complexity-aware manner, together with the automatic selection of the many regularization parameters. We show that the proposed algorithm converges to solutions of the optimization problem corresponding to deterministic networks. We analyze the ODE system that underlies our stochastic optimization algorithm and establish domains of attraction for the dynamics of the network parameters. These theoretical results lead to practical pruning conditions that avoid the premature pruning of units and layers during training. We evaluate our method on the CIFAR-10/100 and ImageNet datasets using ResNet architectures and demonstrate that it improves pruning ratios and test accuracy over layer-only or unit-only pruning and competes favorably with combined unit and layer pruning algorithms that require pre-trained networks.
我们建议对在训练期间运行且不需要经过训练的网络应用的深神经网络进行组合单元和层调整的新型算法。 我们的算法是最佳交换交换学习精度和修剪水平,同时平衡层对单元的修剪和计算法对参数的复杂性,只使用三个用户定义的参数,这些参数容易解释和调和。 我们在网络重量和Bernoulli变异性分配参数上提出一个随机优化问题,这些变量的值为 0 或 1 并缩放网络的单位和层次。 最佳网络结构被找到作为优化问题的解决方案。 当变异参数趋同到0 使相应的结构永久不活动,从而在培训和预测期间节省计算。 我们的方法的一个关键贡献是界定成本功能,将预测精度和网络的运行目标与计算/度前层的精度复杂性评估方式以及许多正规化参数的自动选择结合起来。 我们显示,拟议的算法与优化条件的解决方案的解决方案相匹配,与精度单位对精度网络的精度尊重度相对的精度网络。 我们用模型对模型的精度测试系统进行了分析, 将模型模型的精度对精度对精度进行系统进行。
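The gating mechanism described in the abstract above, binary random variables that scale units and become permanently inactive once their variational parameter reaches 0, can be illustrated with a small sketch. The names `gated_forward` and `surviving_units` and the tolerance argument are hypothetical; the paper's actual cost function and update rule are not reproduced here.

```python
import random


def gated_forward(unit_outputs, thetas, rng):
    """Scale each unit output by a gate z ~ Bernoulli(theta).

    `thetas` are the variational parameters (probabilities that a unit is on).
    """
    gates = [1 if rng.random() < theta else 0 for theta in thetas]
    outputs = [z * out for z, out in zip(gates, unit_outputs)]
    return outputs, gates


def surviving_units(thetas, tol=0.0):
    """Indices of units whose parameter has not collapsed to (near) zero.

    Assumption: a unit counts as pruned once theta <= tol.
    """
    return [i for i, theta in enumerate(thetas) if theta > tol]


rng = random.Random(0)
outputs, gates = gated_forward([2.0, 3.0, 4.0], [1.0, 0.0, 1.0], rng)
# The middle unit has theta = 0, so its gate is always 0 and it can be removed.
```

A unit with `theta = 0` never fires, which is why it can be dropped from the computation graph during training rather than only afterwards.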
Article 283
Title@2025-07-16 (3): PRISM: Distributed Inference for Foundation Models at Edge
Title: PRISM: Distributed Inference for Foundation Models at Edge | PRISM: Verteilte Schlussfolgerung für Stiftungsmodelle am Rand | PRISM: 边缘基础模型分布式推理 2507.12145v1 |
Authors (3): Muhammad Azlan Qazi, Alexandros Iosifidis, Qi Zhang
Foundation models (FMs) have achieved remarkable success across a wide range of applications, from image classification to natural language processing, but pose significant challenges for deployment at the edge. This has sparked growing interest in developing practical and efficient strategies for bringing foundation models to edge environments. In this work, we propose PRISM, a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. Our method leverages a Segment Means representation to approximate intermediate output features, drastically reducing inter-device communication. Additionally, we restructure the self-attention mechanism to eliminate redundant computations caused by per-device Key/Value calculation in position-wise partitioning and design a partition-aware causal masking scheme tailored for autoregressive models. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets, namely CIFAR-10, CIFAR-100, ImageNet-1k, GLUE, and CBT. Our results demonstrate substantial reductions in communication overhead (up to 99.2% for BERT at compression rate CR = 128) and per-device computation (51.24% for BERT at the same setting), with only minor accuracy degradation. This method offers a scalable and practical solution for deploying foundation models in distributed resource-constrained environments.
基础模型(FMS)在从图像分类到自然兰瓜处理等广泛应用中取得了显著的成功,但对于边缘的部署提出了巨大的挑战。这激发了人们越来越有兴趣制定实用而有效的战略,使基础模型进入边缘环境。在这项工作中,我们提出了PRISM,这是用于在边缘装置上分布式变压器推断的通信高效和计算自测战略。我们的方法将“部分”作为代表手段,以接近中间输出特征,大大减少了不同设备之间的通信。此外,我们调整了自控机制,消除了定位分区分配中因每个装置的钥匙/价值计算造成的冗余计算,并设计了一个适合自动递减模型的分区感知因果掩蔽计划。我们评估了VIT、BERT和GPT-2的PRISM,这是跨越不同数据集的分布式变压器,即CIFAR-10、CIFAR-100、图像Net-1k、GLUE和CBT。我们的结果显示通信间接费用的大幅下降(BERET压缩率最高至128的99.2 % ), 以及每步调计算方法(51.24%)和BER 基础模型的最小的精确降解。
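To make the Segment Means idea from the PRISM abstract concrete, here is a minimal sketch under stated assumptions: each device splits its token features into contiguous segments and transmits only the per-segment mean, so the compression rate is the number of tokens divided by the number of segments. The function name `segment_means` and the contiguous, equal-size segmentation are illustrative choices, not details confirmed by the abstract.

```python
def segment_means(features, num_segments):
    """Approximate a sequence of feature vectors by per-segment means.

    features: list of equal-length lists of floats (one per token).
    num_segments: how many mean vectors to send instead of the full sequence.
    """
    n = len(features)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(features)")
    dim = len(features[0])
    means = []
    for s in range(num_segments):
        # Integer arithmetic spreads any remainder across segments.
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        segment = features[start:end]
        means.append([sum(vec[j] for vec in segment) / len(segment) for j in range(dim)])
    return means


tokens = [[float(i), float(2 * i)] for i in range(8)]
compressed = segment_means(tokens, num_segments=2)  # compression rate 8 / 2 = 4
```

In this toy case eight token vectors shrink to two, which mirrors how a larger compression rate trades approximation quality for less inter-device traffic.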
Article 284
Title@2025-07-16 (3): FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
Title: FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | FourCastNet 3: Ein geometrischer Ansatz zur probabilistischen maschinellen Wettervorhersage im Maßstab | 4CastNet 3: 大规模机学习气象预测概率的几何方法 2507.12144v1 |
Authors (10): Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Michael S. Pritchard, Alexander Keller
FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting. The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales. FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches. In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realistic spectra, even at extended lead times of up to 60 days. All of these advances are realized using a purely convolutional neural network architecture tailored for spherical geometry. Scalable and efficient large-scale training on 1024 GPUs and more is enabled by a novel training paradigm for combined model- and data-parallelism, inspired by domain decomposition methods in classical numerical models. Additionally, FourCastNet 3 enables rapid inference on a single GPU, producing a 90-day global forecast at 0.25°, 6-hourly resolution in under 20 seconds. Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.
4CastNet 3通过采用可缩放的、几何机器学习(ML)方法,推进全球气象模型,通过采用可计量的、几何机器学习(ML)方法进行概率共振预测,推进全球气象模型模型。该方法旨在尊重球形几何学,准确模拟问题的空间相关概率性,从而形成稳定的光谱和现实的多尺度动态。四CastNet 3 提供的预报准确性超过了领先的常规混合模型,与最佳的传播方法相匹配,同时生成的预测速度比这些方法快8至60倍。与其他 ML方法相比, FourCastNet 3 展示了极佳的概率校准,并保留了现实的光谱,甚至在长达60天的较长的引导时间里,甚至保留了现实的光谱。所有这些进步都是利用一个纯粹的革命性神经性网络结构实现的,而这种结构的精确和高效的大型培训在1024 GPPUs上比强,而更多的是一个新的培训模式,在经典数字模型的域分解方法的启发下,在模型和数据模型的域阵列系统下,在10-CNet3显示快速的快速的准确的预测,在20秒内,在一次的预测下,在一次的周期内进行快速的预测,在一次的周期内,在一次的周期的周期的周期的预测下,在一次的周期的周期内,在一次的周期性预测下,在一次的周期性预测能下,在一次的周期的周期的周期内进行快速地进行。
Article 285
Title@2025-07-16 (3): RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
Title: RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization | RiemannLoRA: Ein einheitliches Riemann-Rahmenwerk für die ambiguitätsfreie LoRA-Optimierung | Riemann LoRA:无模糊无洛拉优化的统一里伊曼框架 2507.12142v1 |
Authors (7): Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Denis Bobkov, Vera Soboleva, Aibek Alanov, Maxim Rakhuba
Low-Rank Adaptation (LoRA) has become a widely adopted standard for parameter-efficient fine-tuning of large language models (LLMs), significantly reducing memory and computational demands. However, challenges remain, including finding optimal initialization strategies or mitigating overparametrization in low-rank matrix factorization. In this work, we propose a novel approach that addresses both of the challenges simultaneously within a unified framework. Our method treats a set of fixed-rank LoRA matrices as a smooth manifold. Considering adapters as elements on this manifold removes overparametrization, while determining the direction of the fastest loss decrease along the manifold provides initialization. Special care is taken to obtain numerically stable and computationally efficient implementation of our method, using best practices from numerical linear algebra and Riemannian optimization. Experimental results on LLM and diffusion model architectures demonstrate that RiemannLoRA consistently improves both convergence speed and final performance over standard LoRA and its state-of-the-art modifications.
低兰克适应(LORA)已成为广泛采用的大型语言模型参数高效微调标准,大大减少了记忆和计算需求,但挑战依然存在,包括找到最佳初始化战略或减轻低级矩阵因子化的过度平衡化。在这项工作中,我们提出了一个新颖的办法,在一个统一的框架内同时解决这两个挑战。我们的方法把一套固定的LORA矩阵视为一个光滑的多元体。我们的方法将适应器作为这一元件的元素,消除了过度平衡化,同时确定沿多元体损失减少速度最快的方向提供了初始化。我们特别注意利用数字线性代数和Riemannian优化的最佳做法,实现我们方法的数值稳定和计算效率。LLMM和扩散模型结构的实验结果表明,Riemann LoRA不断提高标准洛拉及其最新修改的趋同速度和最后性能。
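The overparametrization mentioned in the RiemannLoRA abstract can be stated in one line. The notation below ($A$, $B$, $G$, $\mathcal{M}_r$, and the shapes $m$, $n$, $r$) is introduced here for illustration and is not taken from the paper: any invertible $r \times r$ matrix $G$ can be absorbed into the two LoRA factors without changing the weight update, so many factor pairs describe the same point on the fixed-rank manifold.

```latex
\Delta W = A B^{\top}, \qquad A \in \mathbb{R}^{m \times r},\; B \in \mathbb{R}^{n \times r}

\Delta W = (A G)\,(B G^{-\top})^{\top} \quad \text{for every invertible } G \in \mathbb{R}^{r \times r}

\mathcal{M}_r = \{\, X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) = r \,\}
```

Optimizing over $\mathcal{M}_r$ directly, rather than over the pair $(A, B)$, works with $\Delta W$ itself and so does not depend on the choice of $G$.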
Article 286
Title@2025-07-16 (3): Neural Human Pose Prior
Title: Neural Human Pose Prior | Neurale menschliche Pose vor | 人类神经先锋 2507.12138v1 |
Authors (3): Michal Heker, Sefy Kararlitsky, David Tolpin
We introduce a principled, data-driven approach for modeling a neural prior over human body poses using normalizing flows. Unlike heuristic or low-expressivity alternatives, our method leverages RealNVP to learn a flexible density over poses represented in the 6D rotation format. We address the challenge of modeling distributions on the manifold of valid 6D rotations by inverting the Gram-Schmidt process during training, enabling stable learning while preserving downstream compatibility with rotation-based frameworks. Our architecture and training pipeline are framework-agnostic and easily reproducible. We demonstrate the effectiveness of the learned prior through both qualitative and quantitative evaluations, and we analyze its impact via ablation studies. This work provides a sound probabilistic foundation for integrating pose priors into human motion capture and reconstruction pipelines.
我们采用有原则的、由数据驱动的方法来模拟先神经先于人体的模型,使用正常的流动。我们的方法与超常或低表达率的替代方法不同,利用RealNVP来学习6D轮用格式代表的弹性密度。我们通过在培训期间颠倒Gram-Schmidt进程,解决在6D有效轮用多种模式上进行模拟分布的挑战,在保持与轮用框架的下游兼容性的同时,实现稳定的学习。我们的架构和培训管道具有框架性,易于复制。我们通过定性和定量评估,展示了以前学到的知识的有效性,我们通过减价研究分析了其影响。这项工作为将先于者纳入人类运动抓取和重建管道提供了可靠的概率基础。
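For readers unfamiliar with the 6D rotation format mentioned in the abstract above, the sketch below shows the standard forward map: two 3-vectors are orthonormalized with Gram-Schmidt and completed with a cross product. The paper inverts this process during training; that inverse is not reproduced here, and returning the basis vectors as a list of three lists is an assumed convention.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(r6):
    """Map a 6D vector (two stacked 3-vectors) to three orthonormal basis vectors."""
    a1, a2 = r6[:3], r6[3:]
    b1 = _normalize(a1)
    # Remove the component of a2 along b1, then normalize (Gram-Schmidt step).
    proj = _dot(b1, a2)
    b2 = _normalize([x - proj * y for x, y in zip(a2, b1)])
    # The third axis follows from the right-hand rule.
    b3 = _cross(b1, b2)
    return [b1, b2, b3]


basis = rotation_from_6d([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])
```

Because many 6D vectors map to the same rotation, a density model over poses has to decide which 6D representative to train on, which is the difficulty the abstract refers to.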
Article 287
Title@2025-07-16 (3): FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model
Title: FedRef: Communication-Efficient Bayesian Fine Tuning with Reference Model | FedRef: Kommunikation-Effizient Bayesian Feinabstimmung mit Referenzmodell | FedRef: 通信-节能贝ysian精密票,参考模型 2506.23210v2 |
Authors (2): Taehwan Yoon, Bongjun Choi
Federated learning (FL) is used in distributed scenarios to train artificial intelligence (AI) models while ensuring users' privacy. In a federated learning scenario, the server generally never sees users' data, which makes the AI training process effective in terms of data privacy. However, regarding model performance, federated AI models may not sufficiently satisfy AI users' expectations. Furthermore, AI users have a wide range of different needs, and it is not easy to satisfy the needs of all users. These issues can be addressed through AI model optimization, fine-tuning, or personalization to achieve optimal model performance. To address model optimization challenges, we propose reference model-based federated learning for optimal fine-tuning, which overcomes catastrophic forgetting in each round. This method is derived from Bayesian parameter-efficient transfer learning, which includes an optimal proximal term and utilizes a reference model that incorporates previous model parameters. As a result, this method achieves both high model performance and low computing cost for clients.
联邦学习(FL) 用于分布式情景, 用于培训人工智能模型,同时确保用户的隐私。在联合学习情景中,服务器一般从来不知道用户的数据。这种概念使得AI培训过程在数据隐私方面效率高。然而,关于模型性能,联合AI模型可能不能充分满足AI用户的期望。此外,AI用户有着各种各样的不同需要。满足整个用户的需要并非易事。这类问题可以通过AI模型优化、微调或个性化来解决,以实现最佳模型性能。为了应对模型优化挑战,我们建议采用基于参考模型的联合会式学习,以优化微调,克服每轮中灾难性的遗漏。这一方法来自巴伊西亚参数高效转移学习,其中包括一个最佳的精度术语,并利用一个包含先前模型参数的参考模型。因此,这种方法既能达到高模型性能,又能满足客户的低计算成本。
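The role of a proximal term tied to a reference model, as described in the FedRef abstract, can be sketched as a quadratic penalty that pulls the current weights toward the reference. This is a generic illustration: the function name `proximal_objective`, the coefficient `mu`, and the squared-distance form are assumptions, and the paper's "optimal proximal term" may be defined differently.

```python
def proximal_objective(task_loss, weights, reference, mu):
    """Task loss plus (mu / 2) * squared distance to the reference weights."""
    penalty = 0.5 * mu * sum((w - r) ** 2 for w, r in zip(weights, reference))
    return task_loss + penalty


# Weights that drift far from the reference model are penalized more heavily.
value = proximal_objective(task_loss=1.0, weights=[1.0, 2.0], reference=[0.0, 0.0], mu=0.5)
```

Keeping the weights near a reference built from earlier rounds is one common way to limit forgetting between rounds.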
Article 288
Title@2025-07-16 (3): HyDRA: A Hybrid Dual-Mode Network for Closed- and Open-Set RFFI with Optimized VMD
Title: HyDRA: A Hybrid Dual-Mode Network for Closed- and Open-Set RFFI with Optimized VMD | HyDRA: Hybrides Dual-Mode-Netzwerk für geschlossenes und offenes RFFI mit optimiertem VMD | HYDRA: 具有优化VMD的封闭式和开放式RFFI混合双模式网络 2507.12133v1 |
Authors (5): Hanwen Liu, Yuhe Huang, Yifeng Gong, Yanjie Zhai, Jiaxuan Lu
Device recognition is vital for security in wireless communication systems, particularly for applications like access control. Radio Frequency Fingerprint Identification (RFFI) offers a non-cryptographic solution by exploiting hardware-induced signal distortions. This paper proposes HyDRA, a Hybrid Dual-mode RF Architecture that integrates an optimized Variational Mode Decomposition (VMD) with a novel architecture based on the fusion of Convolutional Neural Networks (CNNs), Transformers, and Mamba components, designed to support both closed-set and open-set classification tasks. The optimized VMD enhances preprocessing efficiency and classification accuracy by fixing center frequencies and using closed-form solutions. HyDRA employs the Transformer Dynamic Sequence Encoder (TDSE) for global dependency modeling and the Mamba Linear Flow Encoder (MLFE) for linear-complexity processing, adapting to varying conditions. Evaluation on public datasets demonstrates state-of-the-art (SOTA) accuracy in closed-set scenarios and robust performance in our proposed open-set classification method, effectively identifying unauthorized devices. Deployed on NVIDIA Jetson Xavier NX, HyDRA achieves millisecond-level inference speed with low power consumption, providing a practical solution for real-time wireless authentication in real-world environments.
无线电频率指纹识别(RFFI)通过利用硬件诱发的信号扭曲,提供了非加密解决方案。本文提议HyDRA(一种混合双模式RF架构)结合优化变异模式分解(VMD),其新架构以进化神经网络(CNNs)、变异器和Mamba组合为基础,旨在支持闭合和开放定定级任务的组件。优化的VMD通过确定中心频率和使用封闭式解决方案,提高预处理效率和分类准确性。HyDRA使用变换器动态序列编码器(TDSE)进行全球依赖型建模,而Mamba线性线性变异模式(MLFE)进行线性兼容处理,适应不同的条件。对公共数据集的评价显示闭合和开放定级的情景的准确性,以及我们拟议的开放式分类方法的稳健性能,有效识别未经授权的装置。HyDRA部署在NVIDIA Jetson Xavier NX上,以低功耗实现毫秒级推理速度,为现实环境中的实时无线认证提供了实用的解决方案。
Article 289
Title@2025-07-16 (3): Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks
Title: Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks | Selbstadaptives und robustes Federated Spectrum Sensing ohne Benign Majority für Zelluläre Netzwerke | 细胞网络的自我适应和强力联邦光谱测量,不以优美多数进行细胞网络 2507.12127v1 |
Authors (8): Ngoc Duy Pham, Thusitha Dayaratne, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph
Advancements in wireless and mobile technologies, including 5G-Advanced and the envisioned 6G, are driving exponential growth in wireless devices. However, this rapid expansion exacerbates spectrum scarcity, posing a critical challenge. Dynamic spectrum allocation (DSA), which relies on sensing and dynamically sharing spectrum, has emerged as an essential solution to address this issue. While machine learning (ML) models hold significant potential for improving spectrum sensing, their adoption in centralized ML-based DSA systems is limited by privacy concerns, bandwidth constraints, and regulatory challenges. To overcome these limitations, distributed ML-based approaches such as Federated Learning (FL) offer promising alternatives. This work addresses two key challenges in FL-based spectrum sensing (FLSS). First, the scarcity of labeled data for training FL models in practical spectrum sensing scenarios is tackled with a semi-supervised FL approach, combined with energy detection, enabling model training on unlabeled datasets. Second, we examine the security vulnerabilities of FLSS, focusing on the impact of data poisoning attacks. Our analysis highlights the shortcomings of existing majority-based defenses in countering such attacks. To address these vulnerabilities, we propose a novel defense mechanism inspired by vaccination, which effectively mitigates data poisoning attacks without relying on majority-based assumptions. Extensive experiments on both synthetic and real-world datasets validate our solutions, demonstrating that FLSS can achieve near-perfect accuracy on unlabeled datasets and maintain Byzantine robustness against both targeted and untargeted data poisoning attacks, even when a significant proportion of participants are malicious.
无线和移动技术的进步,包括5G先进和设想的6G技术的进步,正在推动无线装置的指数增长。然而,这种快速扩张加剧了频谱的稀缺性,提出了严峻的挑战。动态频谱分配(DSA) – – 依赖感测和动态共享频谱的动态频谱分配(DSA) – – 依赖感测和动态共享的频谱分配 – – 已成为解决这一问题的一个基本解决办法。虽然机器学习(ML)模型在改进频谱遥感方面有着巨大的潜力,但是在基于ML的中央DSA系统中采用这些模型受到隐私关切、带宽限制和监管挑战的限制。为了克服这些限制,基于ML的方法,如联邦学习(FL)提供了有希望的替代方法。这项工作解决了基于FL的频谱感测(FLS)的两种关键挑战。第一,在实际频谱感测情景下培训FL模型的标签数据缺乏,同时进行能源探测和无标签数据集的模型培训。第二,我们考察FLSS的安全脆弱性,重点是数据中毒攻击的影响。我们的分析突出表明,现有的多数防御防御防御系统在对抗这类攻击的不准确性方法方面有缺陷。我们建议,在不依靠精确的精确的实验室进行数据测试时,要对真实性攻击进行真正的数据测试。
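Energy detection, which the abstract above pairs with semi-supervised FL to cope with unlabeled data, is a classical test: compare the average signal power in a window against a threshold. The sketch below assumes the detector's decision is used as a pseudo-label; the function name `energy_detect` and the threshold value are illustrative, and how the paper actually combines the two is not specified in the abstract.

```python
def energy_detect(samples, threshold):
    """Return True if the mean power of the window exceeds the threshold.

    samples: real or complex baseband samples for one sensing window.
    """
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    return energy > threshold


# Pseudo-labels for two unlabeled windows: noise only vs. occupied channel.
noise_only = energy_detect([0.1, -0.1, 0.1, -0.1], threshold=0.5)
occupied = energy_detect([1.0, -1.0, 1.0, -1.0], threshold=0.5)
```

In practice the threshold would be set from the noise floor and a target false-alarm rate; a fixed constant is used here only to keep the example short.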
Article 290
Title@2025-07-16 (3): Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis
Title: Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis | Iterative Augmentation mit Summarization Refinement (IASR) Evaluation für unstrukturierte Umfragedaten Modellierung und Analyse | 对无结构调查数据建模和分析的抽样改进(IASR)评价 2507.12126v1 |
Authors (3): Payal Bhattad, Sai Manoj Pudukotai Dinakarrao, Anju Gupta
Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can improve input diversity and downstream interpretability, existing techniques often lack mechanisms to ensure semantic preservation during large-scale or iterative generation, leading to redundancy and instability. This work introduces a principled evaluation framework for large language model (LLM)-based text augmentation, comprising two components: (1) Scalability Analysis, which measures semantic consistency as augmentation volume increases, and (2) Iterative Augmentation with Summarization Refinement (IASR), which evaluates semantic drift across recursive paraphrasing cycles. Empirical evaluations across state-of-the-art LLMs show that GPT-3.5 Turbo achieves the best balance of semantic fidelity, diversity, and generation efficiency. Applied to a real-world topic modeling task using BERTopic with GPT-enhanced few-shot labeling, the proposed approach results in a 400% increase in topic granularity and complete elimination of topic overlaps. These findings validate the utility of the proposed frameworks for structured evaluation of LLM-based augmentation in practical NLP pipelines.
增强文本数据是一项广泛使用的减少自然语言处理(NLP)中数据广度的战略,特别是在有限样本阻碍有效语义建模的低资源环境中,减少自然语言处理(NLP)中的数据广度(NLP),特别是在有限样本阻碍有效语义建模的低资源环境中。虽然增强可以改善投入多样性和下游解释性,但现有技术往往缺乏确保大规模或迭代生成中语义保存的机制,导致冗余和不稳定。这项工作为基于大语言模型(LLLM)的文本增强引入了一个原则性评价框架,包括两个组成部分:(1) 可缩放分析,它衡量语义一致性,因为增加量的增加;(2) 与Summarization Refination(IASR)的迭代推法增强(IASR),该方法评估了周期周期内语义性流动,评估了反复流动的语义流学流,并彻底消除了NLM 结构化框架。这些结论验证了拟议对语言忠实、多样性和生成效率的最佳平衡。应用于一个真实世界主题建模任务建模任务。
Article 291
Title@2025-07-16 (3): From Observational Data to Clinical Recommendations: A Causal Framework for Estimating Patient-level Treatment Effects and Learning Policies
Title: From Observational Data to Clinical Recommendations: A Causal Framework for Estimating Patient-level Treatment Effects and Learning Policies | Von Beobachtungsdaten zu klinischen Empfehlungen: Ein ursächlicher Rahmen für die Schätzung von Behandlungseffekten und Lernstrategien auf Patientenebene | 从观察数据到临床建议:估计病人治疗效果和学习政策的结果框架 2507.11381v2 |
Authors (8): Rom Gutman, Shimon Sheiba, Omer Noy Klein, Naama Dekel Bird, Amit Gruber, Doron Aronson, Oren Caspi, Uri Shalit
We propose a framework for building patient-specific treatment recommendation models, building on the large recent literature on learning patient-level causal models and inspired by the target trial paradigm of Hernan and Robins. We focus on safety and validity, including the crucial issue of causal identification when using observational data. We do not provide a specific model, but rather a way to integrate existing methods and know-how into a practical pipeline. We further provide a real world use-case of treatment optimization for patients with heart failure who develop acute kidney injury during hospitalization. The results suggest our pipeline can improve patient outcomes over the current treatment regime.
我们提出了一个框架,用于建立针对病人的治疗建议模式,以最近大量关于学习病人因果模式的文献为基础,并受Hernan和Robins的目标试验模式的启发。我们注重安全和有效性,包括使用观察数据时因果识别的关键问题。我们不提供具体模式,而是将现有方法和专门知识纳入实际管道的一种方式。我们进一步为住院期间患心脏病并患急性肾损伤的病人提供一个真正的世界性最佳治疗模式。结果显示,我们的输油管线可以改善目前治疗制度下的患者结果。
Article 292
Title@2025-07-16 (3): Learning to Reason at the Frontier of Learnability
Title: Learning to Reason at the Frontier of Learnability | Vernunft lernen an der Grenze der Lernfähigkeit | 学习在可学习的前沿学习理性 2502.12272v4 |
Authors (2): Thomas Foster, Jakob Foerster
Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question many times during a single training step and attempt to learn from their successes and failures. However, we demonstrate that throughout training with two popular algorithms (PPO and VinePPO) on two widely used datasets, many questions are either solved by all attempts - meaning they are already learned - or by none - providing no meaningful training signal. To address this, we adapt a method from the reinforcement learning literature - sampling for learnability - and apply it to the reinforcement learning stage of LLM training. Our curriculum prioritises questions with high variance of success, i.e. those where the agent sometimes succeeds, but not always. Our findings demonstrate that this curriculum consistently boosts training performance across multiple algorithms and datasets, paving the way for more efficient and effective reinforcement learning with LLMs.
强化学习现在被广泛作为大型语言模式培训的最后阶段,特别是用于数学问题等推理式任务。典型的情况是,模型在一次培训步骤中多次尝试每个问题,并试图从其成功和失败中吸取教训。然而,我们证明,在用两种广泛使用的数据集进行两种流行算法(PPO和VinePPO)培训的整个过程中,许多问题要么通过所有尝试(即他们已经学习过,要么没有提供有意义的培训信号)得到解决。为了解决这个问题,我们从强化学习文献(即学习能力抽样)中调整了一种方法,并将其应用到LLM培训的强化学习阶段。我们的课程优先考虑成功率差异很大的问题,即代理人有时成功但并不总是成功的问题。我们的研究结果表明,这一课程始终在提高多种算法和数据集的培训绩效,为与LMS一道提高学习效率和成效铺平了道路。
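The curriculum in the abstract above prioritises questions with high variance of success. For a question answered correctly with probability p, the variance of a single pass/fail attempt is p(1 - p), which is zero when every attempt succeeds or every attempt fails. The sketch below uses that quantity as a sampling weight; the function names and the choice to sample with replacement are assumptions, not the paper's exact procedure.

```python
import random


def learnability(success_rate):
    """Variance of a Bernoulli outcome with the given success probability."""
    return success_rate * (1.0 - success_rate)


def sample_questions(success_rates, k, rng):
    """Draw k question ids with probability proportional to learnability.

    success_rates: dict mapping question id -> empirical success rate in [0, 1].
    """
    ids = list(success_rates)
    weights = [learnability(success_rates[q]) for q in ids]
    if sum(weights) == 0:
        raise ValueError("every question is always solved or never solved")
    return rng.choices(ids, weights=weights, k=k)


rates = {"always_solved": 1.0, "sometimes_solved": 0.5, "never_solved": 0.0}
batch = sample_questions(rates, k=4, rng=random.Random(0))
```

Questions that are always or never solved receive zero weight, so the batch concentrates on items that still carry a training signal.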
Article 293
Title@2025-07-16 (3): Multimodal Coordinated Online Behavior: Trade-offs and Strategies
Title: Multimodal Coordinated Online Behavior: Trade-offs and Strategies | Multimodal koordiniertes Online-Verhalten: Kompromisse und Strategien | 多式联运协调在线行为:取舍和战略 2507.12108v1 |
Authors (5): Lorenzo Mannocci, Stefano Cresci, Matteo Magnani, Anna Monreale, Maurizio Tesconi
Coordinated online behavior, which spans from beneficial collective actions to harmful manipulation such as disinformation campaigns, has become a key focus in digital ecosystem analysis. Traditional methods often rely on monomodal approaches, focusing on single types of interactions like co-retweets or co-hashtags, or consider multiple modalities independently of each other. However, these approaches may overlook the complex dynamics inherent in multimodal coordination. This study compares different ways of operationalizing the detection of multimodal coordinated behavior. It examines the trade-off between weakly and strongly integrated multimodal models, highlighting the balance between capturing broader coordination patterns and identifying tightly coordinated behavior. By comparing monomodal and multimodal approaches, we assess the unique contributions of different data modalities and explore how varying implementations of multimodality impact detection outcomes. Our findings reveal that not all the modalities provide distinct insights, but that with a multimodal approach we can get a more comprehensive understanding of coordination dynamics. This work enhances the ability to detect and analyze coordinated online behavior, offering new perspectives for safeguarding the integrity of digital platforms.
协调的在线行为从有益的集体行动到错误信息运动等有害操纵,已成为数字生态系统分析的一个关键重点。传统方法往往依赖单一模式方法,侧重于单一类型的互动,如共变或共变或共变,或考虑相互独立的多种模式。然而,这些方法可能忽视多式联运协调所固有的复杂动态。本研究报告比较了使多式协调行为探测工作运作起来的不同方式。本研究报告审查了薄弱和强集成的多式联运模式模式之间的权衡,强调了在捕捉更广泛的协调模式和确定密切协调行为之间的平衡。通过比较单式和多式方法,我们评估了不同数据模式的独特贡献,并探讨了多式联运影响探测结果的不同实施方式。我们的调查结果显示,并非所有模式都提供了不同的见解,但通过采用多式方法,我们可以更全面地了解协调动态。这项工作增强了检测和分析协调的在线行为的能力,为维护数字平台的完整性提供了新的视角。
Article 294
Title@2025-07-16 (3): A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy
Title: A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy | Ein Privacy-Preserving Framework für Werbung Personalisierung Einschließlich Federated Learning und Differential Privacy | 包含联邦学习和不同隐私的隐私保护框架 2507.12098v1 |
Authors (3): Xiang Li, Yifan Lin, Yuanzhe Zhang
To mitigate privacy leakage and performance issues in personalized advertising, this paper proposes a framework that integrates federated learning and differential privacy. The system combines distributed feature extraction, dynamic privacy budget allocation, and robust model aggregation to balance model accuracy, communication overhead, and privacy protection. Multi-party secure computing and anomaly detection mechanisms further enhance system resilience against malicious attacks. Experimental results demonstrate that the framework achieves dual optimization of recommendation accuracy and system efficiency while ensuring privacy, providing both a practical solution and a theoretical foundation for applying privacy protection technologies in advertisement recommendation.
为了减少个人化广告中的隐私泄漏和性能问题,本文件提出一个将联合学习和差异性隐私相结合的框架,该系统将分布式地物提取、动态隐私预算分配和强有力的模型汇总结合起来,以平衡模型准确性、通信间接费用和隐私保护。多党安全计算和异常现象检测机制进一步加强了系统抵御恶意袭击的能力。实验结果表明,该框架实现了建议准确性和系统效率的双重优化,同时确保隐私,为在广告建议中应用隐私保护技术提供了实用解决方案和理论基础。
Article 295
Title@2025-07-16 (3): Measuring Informativeness Gap of (Mis)Calibrated Predictors
Title: Measuring Informativeness Gap of (Mis)Calibrated Predictors | Messung der Informativitätslücke von (Miss)Kalibrierten Vorhersagern | 测量(米)已校算的预测人的信息差距 2507.12094v1 |
Authors (2): Yiding Feng, Wei Tang
In many applications, decision-makers must choose between multiple predictive models that may all be miscalibrated. Which model (i.e., predictor) is more “useful” in downstream decision tasks? To answer this, our first contribution introduces the notion of the informativeness gap between any two predictors, defined as the maximum normalized payoff advantage one predictor offers over the other across all decision-making tasks. Our framework strictly generalizes several existing notions: it subsumes U-Calibration [KLST-23] and Calibration Decision Loss [HW-24], which compare a miscalibrated predictor to its calibrated counterpart, and it recovers Blackwell informativeness [Bla-51, Bla-53] as a special case when both predictors are perfectly calibrated. Our second contribution is a dual characterization of the informativeness gap, which gives rise to a natural informativeness measure that can be viewed as a relaxed variant of the earth mover’s distance (EMD) between two prediction distributions. We show that this measure satisfies natural desiderata: it is complete and sound, and it can be estimated sample-efficiently in the prediction-only access setting. Along the way, we also obtain novel combinatorial structural results when applying this measure to perfectly calibrated predictors.
在许多应用中, 决策者必须在多种预测模型之间做出选择, 这些模型都可能被错误地校准。 哪种模型( 预测者) 在下游决策任务中更“ 有用 ” ? 为了回答这个问题, 我们的第一种贡献引入了两种预测者之间信息性差距的概念, 被定义为一个预测者在所有决策任务中提供的最大正常报酬优势。 我们的框架严格概括了几个现有概念: 它将U- 校准[ KLST-23] 和校准决定损失[ HW-24] 进行分解, 将一个错误的预测者与其校准的对应方进行比较, 并在两种预测者完全校准时, 它恢复了Blackwell信息性[Bla-51, Bla-53] 。 我们的第二个贡献是两个预测者对信息性差距的双重定性, 由此产生了自然信息性衡量标准, 可以被视为两次预测分布的地球移动者距离( EMD) 和校准决定损失[ HW-24] 。 我们显示, 这一测量标准符合自然偏差: 它是完整和稳妥的, 并且在进行精确的精确的校准后, 能够进行结构校准预测。
Article 296
Title@2025-07-16 (3): Improved Analysis for Sign-based Methods with Momentum Updates
Title: Improved Analysis for Sign-based Methods with Momentum Updates | Verbesserte Analyse für signbasierte Methoden mit Momentum-Updates | 改进对基于信号方法的最新动态分析 2507.12091v1 |
Authors (5): Wei Jiang, Dingzhi Yu, Sifan Yang, Wenhao Yang, Lijun Zhang
In this paper, we present an enhanced analysis of sign-based optimization algorithms with momentum updates. Traditional sign-based methods, under the separable smoothness assumption, guarantee a convergence rate of $\mathcal{O}(T^{-1/4})$, but they either require large batch sizes or assume unimodal symmetric stochastic noise. To address these limitations, we demonstrate that signSGD with momentum can achieve the same convergence rate using constant batch sizes without additional assumptions. Our analysis, under the standard $l_2$-smoothness condition, improves upon the result of the prior momentum-based signSGD method by a factor of $\mathcal{O}(d^{1/2})$, where $d$ is the problem dimension. Furthermore, we explore sign-based methods with majority vote in distributed settings and show that the proposed momentum-based method yields convergence rates of $\mathcal{O}\left( d^{1/2}T^{-1/2} + dn^{-1/2} \right)$ and $\mathcal{O}\left( \max \{ d^{1/4}T^{-1/4}, d^{1/10}T^{-1/5} \} \right)$, which outperform the previous results of $\mathcal{O}\left( dT^{-1/4} + dn^{-1/2} \right)$ and $\mathcal{O}\left( d^{3/8}T^{-1/8} \right)$, respectively. Numerical experiments further validate the effectiveness of the proposed methods.
在本文中, 我们对带动量更新的基于符号的优化算法给出了改进的分析。 传统的基于符号的方法, 在可分平滑假设下, 保证 $\mathcal{O}(T^{-1/4})$ 的收敛率, 但它们要么需要大批量, 要么假设随机噪声是单峰对称的。 为了解决这些限制, 我们证明, 带动量的 signSGD 可以在没有额外假设的情况下, 使用恒定批量大小实现相同的收敛率。 我们的分析在标准 $l_2$ 平滑条件下, 将先前基于动量的 signSGD 方法的结果改进了 $\mathcal{O}(d^{1/2})$ 倍, 其中 $d$ 是问题维度。
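As a reminder of what a sign-based update with momentum looks like, here is a minimal sketch. It assumes the common exponential-moving-average form of momentum and a fixed step size; the names `signsgd_momentum_step`, `lr`, and `beta` are illustrative, and the exact estimator analysed in the paper may differ.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def signsgd_momentum_step(params, grads, momentum, lr, beta):
    """One update: average the gradients, then step along the sign of the average."""
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grads)]
    # Only the sign of each momentum coordinate is used, so every coordinate
    # moves by exactly lr (or stays put when the momentum is zero).
    new_params = [p - lr * sign(m) for p, m in zip(params, new_momentum)]
    return new_params, new_momentum


# One step on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
params, momentum = [1.0, -2.0], [0.0, 0.0]
params, momentum = signsgd_momentum_step(params, params, momentum, lr=0.1, beta=0.9)
```

Because only signs are communicated in the distributed variant, each worker can send one bit per coordinate, which is what makes majority-vote aggregation attractive.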
Article 297
Title@2025-07-16 (3): Emergence of Quantised Representations Isolated to Anisotropic Functions
Title: Emergence of Quantised Representations Isolated to Anisotropic Functions | Entstehung quantifizierter Repräsentationen isoliert mit anisotropen Funktionen | 孤立到非尼斯代职能的量化代表的出现情况 2507.12070v1 |
Authors (1): George Bird
This paper describes a novel methodology for determining representational alignment, developed upon the existing Spotlight Resonance method. Using this, it is found that algebraic symmetries of network primitives are a strong predictor for task-agnostic structure in representations. Particularly, this new tool is used to gain insight into how discrete representations can form and arrange in autoencoder models, through an ablation study where only the activation function is altered. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. These findings corroborate the hypothesis that functional form choices can carry unintended inductive biases which produce task-independent artefactual structures in representations, particularly that contemporary forms induce discretisation of otherwise continuous structure – a quantisation effect. Moreover, this supports a general causal model for one mode in which discrete representations may form, and could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and possibly Superposition. Hence, this tool and proposed mechanism for the influence of functional form on representations may provide several insights into emergent interpretability research. Finally, preliminary results indicate that quantisation of representations appears to correlate with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.
本文描述了一种根据现有亮光共振法开发的确定代表比对的新方法。 使用这个方法,发现网络原始体的代数对称性是显示任务不可知性结构的强有力预测器。 特别是, 这一新工具用于通过通缩研究, 深入了解离散的表示式如何形成和在自动编码模型中安排, 只有激活功能才会改变。 当激活功能通过离散的代数对流- 异性对称性定义定义时, 表示倾向于分解。 相反, 网络原始体的代数对称性是连续的对等性定义。 这些发现证实了这样的假设: 功能形式选择可能会产生意外的暗示性偏差, 产生任务不独立动的视觉结构, 特别是当代形式导致其他连续结构的离散化 – 一种振荡效应。 此外, 这为一种可能形成离异性表的形态提供了一般因果关系模式, 并且可能构成下游解释性现象的先决条件, 包括祖母神经系统、 离心或异性对等等- Q- 结构的分立性分析结果, 最终显示, 直径直径可判性研究可能增加这个结构结构结构的预判性分析机制。
Article 298
Title@2025-07-16 (3): Enhancing RLHF with Human Gaze Modeling
Title: Enhancing RLHF with Human Gaze Modeling | Verbesserung der RLHF mit dem Modellieren von Human Gaze | 利用人体盖盖模型模型增强RLHF 2507.09016v2 |
Authors (3): Karim Galliamov, Ivan Titov, Ilya Pershin
Reinforcement Learning from Human Feedback (RLHF) aligns language models with human preferences but is computationally expensive. We explore two approaches that leverage human gaze modeling to enhance RLHF: (1) gaze-aware reward models and (2) gaze-based distribution of sparse rewards at token level. Our experiments demonstrate that gaze-informed RLHF achieves faster convergence while maintaining or slightly improving performance, thus reducing computational costs during policy optimization. These results show that human gaze provides a valuable and underused signal for policy optimization, pointing to a promising direction for improving RLHF efficiency.
从人类反馈中强化学习(RLHF)使语言模式与人类偏好相一致,但计算成本很高。我们探索了两种利用人类视线模型来提升RLHF的方法:(1)目视奖励模式和(2)视视地分配象征性的微薄奖励。我们的实验预示着,了解视觉的RLHF在保持或略微改善业绩的同时,能够更快地实现趋同,从而降低政策优化过程中的计算成本。这些结果表明,人类视线为政策优化提供了宝贵和未充分利用的信号,指出了提高RLHF效率的有希望的方向。
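As one way to picture the second approach (gaze-based distribution of sparse rewards at token level), the sketch below spreads a single sequence-level reward over tokens in proportion to per-token gaze weights. The function name `distribute_reward`, the proportional normalisation, and the uniform fallback are assumptions made for illustration; the abstract does not specify the exact scheme.

```python
def distribute_reward(sequence_reward, gaze_weights):
    """Split one scalar reward across tokens in proportion to gaze weights.

    gaze_weights: non-negative numbers, one per token (hypothetical input,
    e.g. predicted fixation durations).
    """
    if not gaze_weights:
        return []
    if any(w < 0 for w in gaze_weights):
        raise ValueError("gaze weights must be non-negative")
    total = sum(gaze_weights)
    if total == 0:
        # No gaze signal at all: fall back to a uniform split (an assumption).
        share = sequence_reward / len(gaze_weights)
        return [share] * len(gaze_weights)
    # Each token receives a share proportional to its gaze weight.
    return [sequence_reward * w / total for w in gaze_weights]


if __name__ == "__main__":
    print(distribute_reward(1.0, [0.1, 0.6, 0.3]))
```

The per-token rewards sum to the original sequence reward, so the total signal seen by the policy optimiser is unchanged; only its placement across tokens differs.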
Article 299
Title@2025-07-16 (3): StylOch at PAN: Gradient-Boosted Trees with Frequency-Based Stylometric Features
Title: StylOch at PAN: Gradient-Boosted Trees with Frequency-Based Stylometric Features | StylOch bei PAN: Gradient-Boosted Trees mit frequenzbasierten stylometrischen Eigenschaften | PAN的StylOch:带以频率为基础的音量特征的梯度-波状树 2507.12064v1 |
Authors (4): Jeremi K. Ochab, Mateusz Matias, Tymoteusz Boba, Tomasz Walkowiak
This submission to the binary AI detection task is based on a modular stylometric pipeline, where: public spaCy models are used for text preprocessing (including tokenisation, named entity recognition, dependency parsing, part-of-speech tagging, and morphology annotation) and extracting several thousand features (frequencies of n-grams of the above linguistic annotations); light-gradient boosting machines are used as the classifier. We collect a large corpus of more than 500 000 machine-generated texts for the classifier’s training. We explore several parameter options to increase the classifier’s capacity and take advantage of that training set. Our approach follows the non-neural, computationally inexpensive but explainable approach found effective previously.
提交AI二进制检测任务的本件是基于模块式的tylology管道,其中:在文本预处理(包括象征性化、名称实体识别、依赖分析、部分语音标记和形态说明)和提取数千个特征(上述语言注释n克的频率)时,使用公共微粒模型;使用轻度助推机作为分类器。我们收集了50多万份用于分类器培训的机器生成文本。我们探索了几种参数选项,以提高分类器的能力并利用该成套培训。我们的方法遵循了以前发现的非神经、计算成本低但可以解释的方法。
Article 300
Title@2025-07-16 (3): FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling
Title: FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling | FloGAN: Szenariobasierte Urban Mobility Flow Generation über bedingte GANs und dynamische Region Entkopplung | FloGAN:通过有条件的GANs和动态区域脱钩,根据设想情况产生城市流动流动流动流动 2507.12053v1 |
Authors (4): Seanglidet Yean, Jiazu Zhou, Bu-Sung Lee, Markus Schläpfer
The mobility patterns of people in cities evolve alongside changes in land use and population. This makes it crucial for urban planners to simulate and analyze human mobility patterns for purposes such as transportation optimization and sustainable urban development. Existing generative models borrowed from machine learning rely heavily on historical trajectories and often overlook evolving factors like changes in population density and land use. Mechanistic approaches incorporate population density and facility distribution but assume static scenarios, limiting their utility for future projections where historical data for calibration is unavailable. This study introduces a novel, data-driven approach for generating origin-destination mobility flows tailored to simulated urban scenarios. Our method leverages adaptive factors such as dynamic region sizes and land use archetypes, and it utilizes conditional generative adversarial networks (cGANs) to blend historical data with these adaptive parameters. The approach facilitates rapid mobility flow generation with adjustable spatial granularity based on regions of interest, without requiring extensive calibration data or complex behavior modeling. The promising performance of our approach is demonstrated by its application to mobile phone data from Singapore, and by its comparison with existing methods.
Article 301
Title@2025-07-16 (3): Information-Theoretic Generalization Bounds of Replay-based Continual Learning
Title: Information-Theoretic Generalization Bounds of Replay-based Continual Learning | Information-Theoretische Verallgemeinerung Grenzen des replay-basierten kontinuierlichen Lernens | 基于重放的连续不断学习的信息理论一般化环球 2507.12043v1 |
Authors (6): Wen Wen, Tieliang Gong, Yunjiao Zhang, Zeyu Gao, Weizhan Zhang, Yong-Jin Liu
Continual learning (CL) has emerged as a dominant paradigm for acquiring knowledge from sequential tasks while avoiding catastrophic forgetting. Although many CL methods have been proposed to show impressive empirical performance, the theoretical understanding of their generalization behavior remains limited, particularly for replay-based approaches. In this paper, we establish a unified theoretical framework for replay-based CL, deriving a series of information-theoretic bounds that explicitly characterize how the memory buffer interacts with the current task to affect generalization. Specifically, our hypothesis-based bounds reveal that utilizing the limited exemplars of previous tasks alongside the current task data, rather than exhaustive replay, facilitates improved generalization while effectively mitigating catastrophic forgetting. Furthermore, our prediction-based bounds yield tighter and computationally tractable upper bounds of the generalization gap through the use of low-dimensional variables. Our analysis is general and broadly applicable to a wide range of learning algorithms, exemplified by stochastic gradient Langevin dynamics (SGLD) as a representative method. Comprehensive experimental evaluations demonstrate the effectiveness of our derived bounds in capturing the generalization dynamics in replay-based CL settings.
Article 302
Title@2025-07-16 (3): Granular feedback merits sophisticated aggregation
Title: Granular feedback merits sophisticated aggregation | Granular Feedback verdient anspruchsvolle Aggregation | 精密的汇总值得考虑 2507.12041v1 |
Authors (5): Anmol Kagrecha, Henrik Marklund, Potsawee Manakul, Richard Zeckhauser, Benjamin Van Roy
Human feedback is increasingly used across diverse applications like training AI models, developing recommender systems, and measuring public opinion – with granular feedback often being preferred over binary feedback for its greater informativeness. While it is easy to accurately estimate a population’s distribution of feedback given feedback from a large number of individuals, cost constraints typically necessitate using smaller groups. A simple method to approximate the population distribution is regularized averaging: compute the empirical distribution and regularize it toward a prior. Can we do better? As we will discuss, the answer to this question depends on feedback granularity. Suppose one wants to predict a population’s distribution of feedback using feedback from a limited number of individuals. We show that, as feedback granularity increases, one can substantially improve upon predictions of regularized averaging by combining individuals’ feedback in ways more sophisticated than regularized averaging. Our empirical analysis using questions on social attitudes confirms this pattern. In particular, with binary feedback, sophistication barely reduces the number of individuals required to attain a fixed level of performance. By contrast, with five-point feedback, sophisticated methods match the performance of regularized averaging with about half as many individuals.
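The baseline named above, regularized averaging, can be sketched as follows. This is a minimal illustration assuming a pseudo-count style of shrinkage toward a uniform prior; the function name, the `prior_strength` parameter, and the uniform default are hypothetical choices and not taken from the paper.

```python
from collections import Counter


def regularized_average(responses, num_levels, prior=None, prior_strength=1.0):
    """Shrink the empirical distribution of responses toward a prior.

    responses: list of ints in range(num_levels), one per individual.
    prior: list of probabilities summing to 1 (uniform if omitted).
    prior_strength: pseudo-count weight given to the prior (assumption).
    """
    if prior is None:
        prior = [1.0 / num_levels] * num_levels
    counts = Counter(responses)
    n = len(responses)
    # Blend observed counts with prior pseudo-counts, then normalise.
    return [
        (counts.get(level, 0) + prior_strength * prior[level]) / (n + prior_strength)
        for level in range(num_levels)
    ]


if __name__ == "__main__":
    # Five-point feedback from four individuals.
    print(regularized_average([0, 0, 1, 4], num_levels=5))
```

With few respondents the estimate stays close to the prior, and it approaches the empirical distribution as the number of respondents grows.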
Article 303
Title@2025-07-16 (3): A Computational Theory and Semi-Supervised Algorithm for Clustering
Title: A Computational Theory and Semi-Supervised Algorithm for Clustering | A Computational Theory und semi-überwachten Algorithmus für Clustering | 集束法的计算理论和半有效比值 2306.06974v2 |
Authors (1): Nassir Mohammad
A computational theory for clustering and a semi-supervised clustering algorithm are presented. Clustering is defined to be the obtainment of groupings of data such that each group contains no anomalies with respect to a chosen grouping principle and measure; all other examples are considered to be fringe points, isolated anomalies, anomalous clusters or unknown clusters. More precisely, after appropriate modelling under the assumption of uniform random distribution, any example whose expectation of occurrence is <1 with respect to a group is considered an anomaly; otherwise it is assigned a membership of that group. Thus, clustering is conceived as the dual of anomaly detection. The representation of data is taken to be the Euclidean distance of a point to a cluster median. This choice is motivated by the median's robustness to outliers and its approximation of central location, and it keeps decision boundaries general purpose. The kernel of the clustering method is the perception anomaly detection algorithm, resulting in a parameter-free, fast, and efficient clustering algorithm. Acknowledging that clustering is an interactive and iterative process, the algorithm relies on a small fraction of known relationships between examples. These relationships serve as seeds to define the user’s objectives and guide the clustering process. The method then expands the clusters accordingly, leaving the remaining examples for exploration and subsequent iterations. Results are presented on synthetic and real-world data sets, demonstrating the advantages over the most popular unsupervised and semi-supervised clustering methods.
Article 304
Title@2025-07-16 (3): MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model
Title: MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model | MVAR: MultiVariate AutoRegressive Luftverunreinigungs-Prognosemodell | MVAR: 多变自动递减空气污染物预测模型 2507.12023v1 |
Authors (6): Xu Fan, Zhihao Wang, Yuetan Lin, Yan Zhang, Yang Xiang, Hao Li
Air pollutants pose a significant threat to the environment and human health, so accurately forecasting pollutant concentrations is essential for pollution warnings and policy-making. Existing studies predominantly focus on single-pollutant forecasting, neglecting the interactions among different pollutants and their diverse spatial responses. To address the practical needs of forecasting multivariate air pollutants, we propose the MultiVariate AutoRegressive air pollutants forecasting model (MVAR), which reduces the dependency on long-time-window inputs and boosts data utilization efficiency. We also design the Multivariate Autoregressive Training Paradigm, enabling MVAR to achieve 120-hour long-term sequential forecasting. Additionally, MVAR includes a Meteorological Coupled Spatial Transformer block, enabling the flexible coupling of AI-based meteorological forecasts while learning the interactions among pollutants and their diverse spatial responses. To address the lack of standardized datasets in air pollutant forecasting, we construct a comprehensive dataset covering 6 major pollutants across 75 cities in North China from 2018 to 2023, including ERA5 reanalysis data and FuXi-2.0 forecast data. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods and validate the effectiveness of the proposed architecture.
Article 305
Title@2025-07-16 (3): Incorporating Fairness Constraints into Archetypal Analysis
Title: Incorporating Fairness Constraints into Archetypal Analysis | Einschließlich Fairness-Einschränkungen in die Archetypische Analyse | 将公平制约因素纳入大区分析 2507.12021v1 |
Authors (2): Aleix Alcacer, Irene Epifanio
Archetypal Analysis (AA) is an unsupervised learning method that represents data as convex combinations of extreme patterns called archetypes. While AA provides interpretable and low-dimensional representations, it can inadvertently encode sensitive attributes, leading to fairness concerns. In this work, we propose Fair Archetypal Analysis (FairAA), a modified formulation that explicitly reduces the influence of sensitive group information in the learned projections. We also introduce FairKernelAA, a nonlinear extension that addresses fairness in more complex data distributions. Our approach incorporates a fairness regularization term while preserving the structure and interpretability of the archetypes. We evaluate FairAA and FairKernelAA on synthetic datasets, including linear, nonlinear, and multi-group scenarios, demonstrating their ability to reduce group separability – as measured by maximum mean discrepancy and linear separability – without substantially compromising explained variance. We further validate our methods on the real-world ANSUR I dataset, confirming their robustness and practical utility. The results show that FairAA achieves a favorable trade-off between utility and fairness, making it a promising tool for responsible representation learning in sensitive applications.
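To make the group-separability measurement concrete, the sketch below computes a squared discrepancy between the projections of two groups, assuming the discrepancy measure named in the abstract is the standard maximum mean discrepancy (MMD) with an RBF kernel. The kernel choice, the `gamma` value, and the biased estimator are assumptions for illustration, not details from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        return sum(rbf_kernel(p, q, gamma) for p in a for q in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    group_a = [(0.0, 0.0), (0.1, 0.0)]
    group_b = [(3.0, 3.0), (3.1, 3.0)]
    print(mmd_squared(group_a, group_b))  # large value: groups are separable
    print(mmd_squared(group_a, group_a))  # zero: identical samples
```

A value near zero indicates that the two groups are hard to tell apart in the projected space, which is the direction a fairness regulariser would push toward.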
Article 306
Title@2025-07-16 (3): DUSE: A Data Expansion Framework for Low-resource Automatic Modulation Recognition based on Active Learning
Title: DUSE: A Data Expansion Framework for Low-resource Automatic Modulation Recognition based on Active Learning | DUSE: Ein Datenerweiterungs-Framework für die automatische Modulationserkennung mit geringer Ressource basierend auf aktivem Lernen | DUSE:基于积极学习的低资源自动调整识别数据扩展框架 2507.12011v1 |
Authors (7): Yao Lu, Hongyu Gao, Zhuangzhi Chen, Dongwei Xu, Yun Lin, Qi Xuan, Guan Gui
Although deep neural networks have made remarkable achievements in the field of automatic modulation recognition (AMR), these models often require a large amount of labeled data for training. However, in many practical scenarios, the available target-domain data is scarce and insufficient for model training. The most direct remedy is to collect data manually and have experts annotate it, but the time and labor costs are prohibitive. Another common method is data augmentation. Although it can enrich training samples to a certain extent, it does not introduce new data and therefore cannot fundamentally solve the problem of data scarcity. To address these challenges, we introduce a data expansion framework called Dynamic Uncertainty-driven Sample Expansion (DUSE). Specifically, DUSE uses an uncertainty scoring function to select useful samples from relevant AMR datasets and employs an active learning strategy to continuously refine the scorer. Extensive experiments demonstrate that DUSE consistently outperforms 8 coreset selection baselines in both class-balanced and class-imbalanced settings. In addition, DUSE exhibits strong cross-architecture generalization to unseen models.
Article 307
Title@2025-07-16 (3): How does Watermarking Affect Visual Language Models in Document Understanding?
Title: How does Watermarking Affect Visual Language Models in Document Understanding? | Wie wirkt sich Watermarking auf visuelle Sprachmodelle im Dokumentenverständnis aus? | 文件理解中的视觉语言模型如何影响水标记? 2504.01048v2 |
Authors (5): Chunxue Xu, Yiwei Wang, Bryan Hooi, Yujun Cai, Songze Li
Visual Language Models (VLMs) have become foundational models for document understanding tasks, widely used in the processing of complex multimodal documents across domains such as finance, law, and academia. However, documents often contain noise-like information, such as watermarks, which inevitably leads us to ask: do watermarks degrade the performance of VLMs in document understanding? To address this, we propose a novel evaluation framework to investigate the effect of visible watermarks on VLM performance. We take into account various factors, including different types of document data, the positions of watermarks within documents, and variations in watermark content. Our experimental results reveal that VLM performance can be significantly compromised by watermarks, with performance drop rates reaching up to 36%. We discover that scattered watermarks cause stronger interference than centralized ones, and that semantic content in watermarks creates greater disruption than simple visual occlusion. Through attention mechanism analysis and embedding similarity examination, we find that the performance drops arise mainly because watermarks 1) force widespread attention redistribution and 2) alter semantic representation in the embedding space. Our research not only highlights significant challenges in deploying VLMs for document understanding, but also provides insights towards developing robust inference mechanisms on watermarked documents.
Article 308
Title@2025-07-16 (3): Expanding ML-Documentation Standards For Better Security
Title: Expanding ML-Documentation Standards For Better Security | Erweiterung der ML-Dokumentationsstandards für bessere Sicherheit | 扩大多L-文件标准以增进安全 2507.12003v1 |
Authors (1): Cara Ellen Appel
This article presents the current state of ML-security and of the documentation of ML-based systems, models and datasets in research and practice based on an extensive review of the existing literature. It shows a generally low awareness of security aspects among ML-practitioners and organizations and an often unstandardized approach to documentation, leading to overall low quality of ML-documentation. Existing standards are not regularly adopted in practice and IT-security aspects are often not included in documentation. Due to these factors, there is a clear need for improved security documentation in ML, as one step towards addressing the existing gaps in ML-security. To achieve this, we propose expanding existing documentation standards for ML-documentation to include a security section with specific security relevant information. Implementing this, a novel expanded method of documenting security requirements in ML-documentation is presented, based on the existing Model Cards and Datasheets for Datasets standards, but with the recommendation to adopt these findings in all ML-documentation.
Article 309
Title@2025-07-16 (3): Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
Title: Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing | In-Person-Gespräche in geräuschvollen Real-World-Umgebungen mit Smartwatch Audio und Motion Sensing erkennen | 利用智能监视音频和运动遥感,在噪音真实世界环境中检测人间谈话 2507.12002v1 |
Authors (4): Alice Zhang, Callihan Bertley, Dawei Liang, Edison Thomaz
Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect a foundational aspect of human social interactions, in-person verbal conversations, by leveraging audio and inertial data captured with a commodity smartwatch in acoustically-challenging scenarios. To evaluate our approach, we conducted a lab study with 11 participants and a semi-naturalistic study with 24 participants. We analyzed machine learning and deep learning models with 3 different fusion methods, showing the advantages of fusing audio and inertial data to consider not only verbal cues but also non-verbal gestures in conversations. Furthermore, we perform a comprehensive set of evaluations across activities and sampling rates to demonstrate the benefits of multimodal sensing in specific contexts. Overall, our framework achieved 82.0$\pm$3.0% macro F1-score when detecting conversations in the lab and 77.2$\pm$1.8% in the semi-naturalistic setting.
Article 310
Title@2025-07-16 (3): Labels Generated by Large Language Models Help Measure People’s Empathy in Vitro
Title: Labels Generated by Large Language Models Help Measure People’s Empathy in Vitro | Etiketten, die durch große Sprachmodelle erzeugt werden, helfen, die Empathie der Menschen in Vitro zu messen | 以大语言模型生成的标签 帮助测量体外民众的共鸣 2501.00691v2 |
Authors (7): Md Rakibul Hasan, Yue Yao, Md Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon
Large language models (LLMs) have revolutionised many fields, with LLM-as-a-service (LLMSaaS) offering accessible, general-purpose solutions without costly task-specific training. In contrast to the widely studied prompt engineering for directly solving tasks (in vivo), this paper explores LLMs’ potential for in-vitro applications: using LLM-generated labels to improve supervised training of mainstream models. We examine two strategies - (1) noisy label correction and (2) training data augmentation - in empathy computing, an emerging task to predict psychology-based questionnaire outcomes from inputs like textual narratives. Crowdsourced datasets in this domain often suffer from noisy labels that misrepresent underlying empathy. We show that replacing or supplementing these crowdsourced labels with LLM-generated labels, developed using psychology-based scale-aware prompts, achieves statistically significant accuracy improvements. Notably, the RoBERTa pre-trained language model (PLM) trained with noise-reduced labels yields a state-of-the-art Pearson correlation coefficient of 0.648 on the public NewsEmp benchmarks. This paper further analyses evaluation metric selection and demographic biases to help guide the future development of more equitable empathy computing models. Code and LLM-generated labels are available at https://github.com/hasan-rakibul/LLMPathy.
Article 311
Title@2025-07-16 (3): Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection
Title: Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection | Können LLMs Betrüger finden? Mehrstufige LLM-Verbesserung der Graphenbetrugserkennung | 多级LLM强化图形欺诈探测 2507.11997v1 |
Authors (2): Tairan Huang, Yili Wang
Graph fraud detection has garnered significant attention as Graph Neural Networks (GNNs) have proven effective in modeling complex relationships within multimodal data. However, existing graph fraud detection methods typically use preprocessed node embeddings and predefined graph structures to reveal fraudsters, which ignore the rich semantic cues contained in raw textual information. Although Large Language Models (LLMs) exhibit powerful capabilities in processing textual information, it remains a significant challenge to perform multimodal fusion of processed textual embeddings with graph structures. In this paper, we propose a Multi-level LLM Enhanced Graph Fraud Detection framework called MLED. In MLED, we utilize LLMs to extract external knowledge from textual information to enhance graph fraud detection methods. To integrate LLMs with graph structure information and enhance the ability to distinguish fraudsters, we design a multi-level LLM-enhanced framework that includes a type-level enhancer and a relation-level enhancer. The former enhances the difference between fraudsters and benign entities; the latter enhances the importance of fraudsters in different relations. Experiments on four real-world datasets show that MLED achieves state-of-the-art performance in graph fraud detection as a generalized framework that can be applied to existing methods.
Article 312
Title@2025-07-16 (3): Predictable Scale: Part I, Step Law – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
Title: Predictable Scale: Part I, Step Law – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Vorhersehbare Skala: Teil I, Schrittgesetz – Optimales Hyperparameter-Skalierungsgesetz im großen Sprachmodell Vorschulung | 可预测比例:第一部分,步法 – – 大语言示范培训前,最佳超参数缩放法 2503.04715v6 |
Authors (13): Houyi Li, Wenzhen Zheng, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Zhenyu Ding, Haoying Wang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well-established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the influence of hyperparameters on model performance, a principled and generalizable framework across model architectures and data recipes remains absent. In this study, we conduct an unprecedented empirical investigation, training over 3,700 LLMs from scratch across 100 trillion tokens and consuming nearly one million NVIDIA H800 GPU hours, to establish a universal scaling law for hyperparameter optimization in LLM pre-training, called Step Law. We empirically observe that, under fixed model size ($N$) and dataset size ($D$), the hyperparameter landscape exhibits convexity with a broad optimum, substantially reducing the complexity of hyperparameter search. Building on this insight, we formally define and empirically validate the Step Law: the optimal learning rate follows a power-law relationship with $N$ and $D$, while the optimal batch size is primarily influenced by $D$ and remains largely invariant to $N$. Notably, our estimated optima deviate from the global best performance found via exhaustive search by merely 0.094% on the test set. To the best of our knowledge, Step Law is the first to unify different model shapes and structures, such as Mixture-of-Experts models and dense transformers, and to establish optimal hyperparameter scaling laws across diverse data recipes. We contribute a universal, plug-and-play optimal hyperparameter tool for the community, which is expected to advance efficient LLM training at scale. All experimental code, data and checkpoints are publicly available at https://github.com/step-law/steplaw.
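The two relationships stated in this abstract can be written schematically as below. The symbols $\eta^{*}$ (optimal learning rate) and $B^{*}$ (optimal batch size), and the constants $c_{\eta}$, $\alpha$, $\beta$, $c_{B}$, $\gamma$, are placeholder notation introduced here for illustration; the abstract gives neither the fitted values nor the exact functional form for the batch size, so the power-law form for $B^{*}$ is an assumption.

```latex
\eta^{*}(N, D) = c_{\eta}\, N^{\alpha} D^{\beta},
\qquad
B^{*}(D) \approx c_{B}\, D^{\gamma}
```

Taking logarithms turns the first relation into a linear model, $\log \eta^{*} = \log c_{\eta} + \alpha \log N + \beta \log D$, which is why power laws of this kind are usually fitted by linear regression in log space.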
Article 313
Title@2025-07-16 (3): Dataset-Adaptive Dimensionality Reduction
Title: Dataset-Adaptive Dimensionality Reduction | Datensatz-Adaptive Dimensionalitätsreduktion | 数据集-适应多维度减少 2507.11984v1 |
Authors (6): Hyeon Jeon, Jeongin Park, Soohyun Lee, Dae Hyun Kim, Sungbok Shin, Jinwook Seo
Selecting the appropriate dimensionality reduction (DR) technique and determining the hyperparameter settings that maximize the accuracy of the output projections typically involve extensive trial and error, often resulting in unnecessary computational overhead. To address this challenge, we propose a dataset-adaptive approach to DR optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional spaces are necessary to represent it accurately. Since complex datasets are often inaccurately represented in two-dimensional projections, leveraging these metrics enables us to predict the maximum achievable accuracy of DR techniques for a given dataset, eliminating redundant trials in optimizing DR. We introduce the design and theoretical foundations of these structural complexity metrics. We quantitatively verify that our metrics effectively approximate the ground truth complexity of datasets and confirm their suitability for guiding a dataset-adaptive DR workflow. Finally, we empirically show that our dataset-adaptive workflow significantly enhances the efficiency of DR optimization without compromising accuracy.
Article 314
Title@2025-07-16 (3): Recent results on searches with boosted Higgs bosons at CMS
Title: Recent results on searches with boosted Higgs bosons at CMS | Aktuelle Ergebnisse bei Suchanfragen mit Higgs-Bosonen am CMS | 最近在CMS 使用增强的 Higgs bosons 搜索结果 2507.11977v1 |
Authors (1): Farouk Mokhtar
The study of boosted Higgs bosons at the LHC provides a unique window to probe Higgs boson couplings at high energy scales and search for signs of physics beyond the standard model. In these proceedings, we present recent results on boosted Higgs boson searches at the CMS experiment, highlighting innovative reconstruction and tagging techniques that enhance sensitivity in this challenging regime.
Article 315
Title@2025-07-16 (3): Online Training and Pruning of Deep Reinforcement Learning Networks
Title: Online Training and Pruning of Deep Reinforcement Learning Networks | Online-Training und Pruning von Deep Verstärkung Learning Networks | 深强化学习网络的在线培训和配置 2507.11975v1 |
Authors (2): Valentin Frank Ingmar Guenter, Athanasios Sideris
Scaling the deep neural networks (NNs) of reinforcement learning (RL) algorithms has been shown to enhance performance when feature extraction networks are used, but the gain comes at the significant expense of increased computational and memory complexity. Neural network pruning methods have successfully addressed this challenge in supervised learning. However, their application to RL is underexplored. We propose an approach that integrates simultaneous training and pruning within advanced RL methods, in particular RL algorithms enhanced by the Online Feature Extractor Network (OFENet). Our networks (XiNet) are trained to solve stochastic optimization problems over the RL networks’ weights and the parameters of variational Bernoulli distributions for 0/1 random variables (RVs) $\xi$ scaling each unit in the networks. The stochastic problem formulation induces regularization terms that promote convergence of the variational parameters to 0 when a unit contributes little to the performance. In this case, the corresponding structure is rendered permanently inactive and pruned from its network. We propose a cost-aware, sparsity-promoting regularization scheme tailored to the DenseNet architecture of OFENets, expressing the parameter complexity of the involved networks in terms of the parameters of the RVs in these networks. Then, when matching this cost with the regularization terms, the many hyperparameters associated with them are automatically selected, effectively combining the RL objectives and network compression. We evaluate our method on continuous control benchmarks (MuJoCo) with the Soft Actor-Critic RL agent, demonstrating that OFENets can be pruned considerably with minimal loss in performance. Furthermore, our results confirm that pruning large networks during training produces more efficient and higher-performing RL agents than training smaller networks from scratch.
Article 316
Title@2025-07-16 (3): Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Title: Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models | Vorhersehbare Skala: Teil II, Farseer: Ein verfeinertes Skalierungsgesetz in großen Sprachmodellen | 可预见规模:第二部分,Farseer:改进大语言模式中的规模法 2506.10972v3 |
Authors (11): Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla’s law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla’s law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all $(N,D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. We are comprehensively open-sourcing all models, data, results, and logs at https://github.com/Farseer-Scaling-Law/Farseer to foster further research.
Article 317
Title@2025-07-16 (3): Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization
Title: Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization | Bedauerliche Analyse von posteriorer Sampling-basiert erwartete Verbesserung für Bayesian Optimierung | 对巴耶斯最佳优化的预期改进情况进行基于实际抽样结果的遗憾分析 2507.09828v2 |
Authors (4): Shion Takeno, Yu Inatsu, Masayuki Karasuyama, Ichiro Takeuchi
Bayesian optimization is a powerful tool for optimizing an expensive-to-evaluate black-box function. In particular, the effectiveness of expected improvement (EI) has been demonstrated in a wide range of applications. However, theoretical analyses of EI are limited compared with other theoretically established algorithms. This paper analyzes a randomized variant of EI, which evaluates the EI from the maximum of the posterior sample path. We show that this posterior sampling-based random EI achieves the sublinear Bayesian cumulative regret bounds under the assumption that the black-box function follows a Gaussian process. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.
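As a rough illustration of the quantity being analyzed, the sketch below computes the standard closed-form EI for a Gaussian posterior at a single point. The function name and arguments are hypothetical. In the variant described above, the `reference` value would be the maximum of a posterior sample path; the sketch takes it as an input and does not fit a Gaussian process or draw that sample path.

```python
import math


def expected_improvement(mu, sigma, reference):
    """Closed-form E[max(f - reference, 0)] for f ~ N(mu, sigma^2).

    `reference` is the value to improve upon. Assumption: in the
    posterior-sampling variant it would be the maximum of a sampled
    posterior path rather than the best observed value.
    """
    if sigma <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(mu - reference, 0.0)
    z = (mu - reference) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - reference) * cdf + sigma * pdf
```

A maximization loop would evaluate this function over candidate points and query the one with the largest value.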
Article 318
Title@2025-07-16 (3): Simplifying Graph Kernels for Efficient
Title: Simplifying Graph Kernels for Efficient | Vereinfachende Graphenkerne für effizientes Arbeiten | 简化用于高效的图形内核 2507.03560v2 |
Authors (4): Lin Wang, Shijie Wang, Sirui Huang, Qing Li
While kernel methods and Graph Neural Networks offer complementary strengths, integrating the two has posed challenges in efficiency and scalability. The Graph Neural Tangent Kernel provides a theoretical bridge by interpreting GNNs through the lens of neural tangent kernels. However, its reliance on deep, stacked layers introduces repeated computations that hinder performance. In this work, we introduce a new perspective by designing the simplified graph kernel, which replaces deep layer stacking with a streamlined $K$-step message aggregation process. This formulation avoids iterative layer-wise propagation altogether, leading to a more concise and computationally efficient framework without sacrificing the expressive power needed for graph tasks. Beyond this simplification, we propose another Simplified Graph Kernel, which draws from Gaussian Process theory to model infinite-width GNNs. Rather than simulating network depth, this kernel analytically computes kernel values based on the statistical behavior of nonlinear activations in the infinite limit. This eliminates the need for explicit architecture simulation, further reducing complexity. Our experiments on standard graph and node classification benchmarks show that our methods achieve competitive accuracy while reducing runtime. This makes them practical alternatives for learning on graphs at scale. Full implementation and reproducibility materials are provided at: https://anonymous.4open.science/r/SGNK-1CE4/.
Article 319
Title@2025-07-16 (3): Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Title: Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation | Decoder-Hybrid-Decoder-Architektur für effizientes Nachdenken mit langer Generation | 提高长代人合理性效率的代coder-Hybrid-Decer 结构 2507.06607v2 |
Authors (14): Liliang Ren, Congcong Chen, Haoran Xu, Young Jin Kim, Adam Atkinson, Zheng Zhan, Jiankai Sun, Baolin Peng, Liyuan Liu, Shuohang Wang, Hao Cheng, Jianfeng Gao, Weizhu Chen, Yelong Shen
Recent advances in language modeling have demonstrated the effectiveness of State Space Models (SSMs) for efficient sequence modeling. While hybrid architectures such as Samba and the decoder-decoder architecture, YOCO, have shown promising performance gains over Transformers, prior works have not investigated the efficiency potential of representation sharing between SSM layers. In this paper, we introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs in the cross-decoder to share memory readout states from a Samba-based self-decoder. SambaY significantly enhances decoding efficiency, preserves linear pre-filling time complexity, and boosts long-context performance, all while eliminating the need for explicit positional encoding. Through extensive scaling experiments, we demonstrate that our model exhibits a significantly lower irreducible loss compared to a strong YOCO baseline, indicating superior performance scalability under large-scale compute regimes. Our largest model enhanced with Differential Attention, Phi4-mini-Flash-Reasoning, achieves significantly better performance than Phi4-mini-Reasoning on reasoning tasks such as Math500, AIME24/25, and GPQA Diamond without any reinforcement learning, while delivering up to 10x higher decoding throughput on 2K-length prompts with 32K generation length under the vLLM inference framework. We release our training codebase on open-source data at https://github.com/microsoft/ArchScale.
Article 320
Title@2025-07-16 (3): PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings
Title: PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings | PATCH: eine Methode des tiefen Lernens zur Beurteilung der Heterogenität der künstlerischen Praxis in historischen Gemälden | 评估历史绘画艺术实践多样性的深层学习方法 2502.01912v3 |
Authors (13): Andrew Van Horn, Lauryn Smith, Mahamad Mahmoud, Michael McMaster, Clara Pinchbeck, Ina Martin, Andrew Lininger, Anthony Ingrisano, Adam Lowe, Carlos Bayod, Elizabeth Bolman, Kenneth Singer, Michael Hinczewski
The history of art has seen significant shifts in the manner in which artworks are created, making understanding of creative processes a central question in technical art history. In the Renaissance and Early Modern period, paintings were largely produced by master painters directing workshops of apprentices who often contributed to projects. The masters varied significantly in artistic and managerial styles, meaning different combinations of artists and implements might be seen both between masters and within workshops or even individual canvases. Information on how different workshops were managed and the processes by which artworks were created remains elusive. Machine learning methods have potential to unearth new information about artists’ creative processes by extending the analysis of brushwork to a microscopic scale. Analysis of workshop paintings, however, presents a challenge in that documentation of the artists and materials involved is sparse, meaning external examples are not available to train networks to recognize their contributions. Here we present a novel machine learning approach we call pairwise assignment training for classifying heterogeneity (PATCH) that is capable of identifying individual artistic practice regimes with no external training data, or “ground truth.” The method achieves unsupervised results by supervised means, and outperforms both simple statistical procedures and unsupervised machine learning methods. We apply this method to two historical paintings by the Spanish Renaissance master, El Greco: The Baptism of Christ and Christ on the Cross with Landscape, and our findings regarding the former potentially challenge previous work that has assigned the painting to workshop members. Further, the results of our analyses create a measure of heterogeneity of artistic practice that can be used to characterize artworks across time and space.
Article 321
Title@2025-07-16 (3): BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling
Title: BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling | BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung | BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v5 |
Authors (8): Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian
Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce “Text-Controlled TSG”, a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text description for supporting domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets, and improves controllability by up to 12% on MSE and 6% on MAE compared to generation without text input, highlighting its potential for generating tailored time-series data.
Article 322
Title@2025-07-16 (3): Rethinking Data Protection in the (Generative) Artificial Intelligence Era
Title: Rethinking Data Protection in the (Generative) Artificial Intelligence Era | Datenschutz im Zeitalter der (generativen) Künstlichen Intelligenz neu denken | 在人工(人工)情报时代重新思考数据保护问题 2507.03034v2 |
Authors (11): Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren
The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual harm, underscoring the urgent need to clearly delineate the scope of and rigorously enforce data protection. In this perspective, we propose a four-level taxonomy, including non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.
Article 323
Title@2025-07-16 (3): d-DQIVAR: Data-centric Visual Analytics and Reasoning for Data Quality Improvement
Title: d-DQIVAR: Data-centric Visual Analytics and Reasoning for Data Quality Improvement | d-DQIVAR: datenzentrierte visuelle Analyse und Begründung zur Verbesserung der Datenqualität | d-DQIVAR:以数据为中心的提高数据质量的视觉分析和理由 2507.11960v1 |
Authors (8): Hyein Hong, Sangbong Yoo, SeokHwan Choi, Jisue Kim, Seongbum Seo, Haneol Cho, Chansoo Kim, Yun Jang
Approaches to enhancing data quality (DQ) are classified into two main categories: data- and process-driven. However, prior research has predominantly utilized batch data preprocessing within the data-driven framework, which often proves insufficient for optimizing machine learning (ML) model performance and frequently leads to distortions in data characteristics. Existing studies have primarily focused on data preprocessing rather than genuine data quality improvement (DQI). In this paper, we introduce d-DQIVAR, a novel visual analytics system designed to facilitate DQI strategies aimed at improving ML model performance. Our system integrates visual analytics techniques that leverage both data-driven and process-driven approaches. Data-driven techniques tackle DQ issues such as imputation, outlier detection, deletion, format standardization, removal of duplicate records, and feature selection. Process-driven strategies encompass evaluating DQ and DQI procedures by considering DQ dimensions and ML model performance and applying the Kolmogorov-Smirnov test. We illustrate how our system empowers users to harness expert and domain knowledge effectively within a practical workflow through case studies, evaluations, and user studies.
Article 324
Title@2025-07-16 (3): PoTPTQ: A Two-step Power-of-Two Post-training for LLMs
Title: PoTPTQ: A Two-step Power-of-Two Post-training for LLMs | PoTPTQ: Zweistufige Kraft von zwei Nachschulungen für LLMs | PoTPTQ:为LLMs提供两步二级培训后培训 2507.11959v1 |
Authors (7): Xinyu Wang, Vahid Partovi Nia, Peng Lu, Jerry Huang, Xiao-Wen Chang, Boxing Chen, Yufei Cui
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, their deployment is challenging due to the substantial computational resources required. Power-of-two (PoT) quantization is a general tool to counteract this difficulty. Although weights quantized with previous PoT methods can be efficiently dequantized on CPUs using fixed-point addition, these methods have proven less effective on GPUs. The reason is the entanglement of the sign bit and the sequential bit manipulations needed for dequantization. We propose a novel PoT quantization framework for LLM weights that (i) outperforms state-of-the-art accuracy in extremely low-precision number formats, and (ii) enables faster inference through more efficient dequantization. To maintain the accuracy of the quantized model, we introduce a two-step post-training algorithm: (i) initialize the quantization scales with a robust starting point, and (ii) refine these scales using a minimal calibration set. The performance of our PoT post-training algorithm surpasses the current state-of-the-art in integer quantization, particularly at low precisions such as 2- and 3-bit formats. Our PoT quantization accelerates the dequantization step required for floating-point inference and leads to a $3.67\times$ speedup on an NVIDIA V100, and $1.63\times$ on an NVIDIA RTX 4090, compared to uniform integer dequantization.
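To make the idea of power-of-two quantization concrete, here is a minimal sketch of a generic PoT scheme in which each weight is stored as a sign and a small exponent relative to a shared scale. It is an illustration under stated assumptions, not the two-step algorithm of the paper: the function names, the max-absolute-value scale, and the per-tensor (rather than per-group) scale are all hypothetical choices.

```python
import math


def pot_quantize(weights, bits=3):
    """Map each weight to sign * scale * 2**(-k), with k an integer code.

    Assumption: one bit holds the sign and the remaining bits hold k,
    so k ranges over 0 .. 2**(bits - 1) - 1. Zero is not representable
    and is mapped to the smallest magnitude.
    """
    levels = 2 ** (bits - 1)
    scale = max(abs(w) for w in weights) or 1.0
    codes = []
    for w in weights:
        sign = -1 if w < 0 else 1
        magnitude = abs(w) / scale
        if magnitude == 0:
            k = levels - 1
        else:
            # Round in the log domain, then clamp to the code range.
            k = min(levels - 1, max(0, round(-math.log2(magnitude))))
        codes.append((sign, k))
    return scale, codes


def pot_dequantize(scale, codes):
    """Rebuild floating-point weights from (sign, exponent) codes."""
    return [sign * scale * 2.0 ** (-k) for sign, k in codes]
```

With 3 bits, the weights `[1.0, -0.5, 0.25, 0.1]` are reconstructed as `[1.0, -0.5, 0.25, 0.125]`: only the last value is not an exact power of two times the scale, so it alone incurs rounding error.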
Article 325
Title@2025-07-16 (3): Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees
Title: Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees | Tuning algorithmischer und architektonischer Hyperparameter im grafisch fundierten semi-überwachten Lernen mit nachweisbaren Garantien | 在以图表为基础的半监测学习中以可实现的担保进行算法和建筑建筑超参数 2502.12937v2 |
Authors (3): Ally Yalei Du, Eric Huang, Dravyansh Sharma
Graph-based semi-supervised learning is a powerful paradigm in machine learning for modeling and exploiting the underlying graph structure that captures the relationship between labeled and unlabeled data. A large number of classical as well as modern deep learning based algorithms have been proposed for this problem, often having tunable hyperparameters. We initiate a formal study of tuning algorithm hyperparameters from parameterized algorithm families for this problem. We obtain novel $O(\log n)$ pseudo-dimension upper bounds for hyperparameter selection in three classical label propagation-based algorithm families, where $n$ is the number of nodes, implying bounds on the amount of data needed for learning provably good parameters. We further provide matching $\Omega(\log n)$ pseudo-dimension lower bounds, thus asymptotically characterizing the learning-theoretic complexity of the parameter tuning problem. We extend our study to selecting architectural hyperparameters in modern graph neural networks. We bound the Rademacher complexity for tuning the self-loop weighting in recently proposed Simplified Graph Convolution (SGC) networks. We further propose a tunable architecture that interpolates graph convolutional neural networks (GCN) and graph attention networks (GAT) in every layer, and provide Rademacher complexity bounds for tuning the interpolation coefficient.
Article 326
Title@2025-07-16 (3): The benefits of query-based KGQA systems for complex and temporal questions in LLM era
Title: The benefits of query-based KGQA systems for complex and temporal questions in LLM era | Die Vorteile von anfragebasierten KGQA-Systemen für komplexe und zeitliche Fragen im LLM-Zeitalter | 基于查询的KGQA系统对LLM时代复杂和时间问题的益处 2507.11954v1 |
Authors (6): Artem Alekseev, Mikhail Chaichuk, Miron Butko, Alexander Panchenko, Elena Tutubalina, Oleg Somov
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions. Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers. We explore a multi-stage query-based framework for WikiData QA, proposing a multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks. Through generalization and rejection studies, we evaluate robustness across multi-hop and temporal QA datasets. Additionally, we introduce a novel entity linking and predicate matching method using CoT reasoning. Our results demonstrate the potential of a query-based multi-stage KGQA framework for improving multi-hop and temporal QA with small language models. Code and data: https://github.com/ar2max/NLDB-KGQA-System
Article 327
Title@2025-07-16 (3): IAM: Efficient Inference through Attention Mapping between Different-scale LLMs
Title: IAM: Efficient Inference through Attention Mapping between Different-scale LLMs | IAM: Effiziente Schlussfolgerung durch Aufmerksamkeitsmapping zwischen unterschiedlichen LLMs | IAM:通过在不同规模的LMMs之间绘制注意绘图,有效推论 2507.11953v1 |
Authors (3): Yi Zhao, Zuchao Li, Hai Zhao
LLMs encounter significant challenges in resource consumption nowadays, especially with long contexts. Despite extensive efforts dedicated to enhancing inference efficiency, these methods primarily exploit internal sparsity within the models, without leveraging external information for optimization. We identify the high similarity of attention matrices across different-scale LLMs, which offers a novel perspective for optimization. We first conduct a comprehensive analysis of how to measure similarity, how to select mapping layers, and whether the mapping is consistent. Based on these insights, we introduce the IAM framework, which achieves dual benefits of accelerated attention computation and reduced KV cache usage by performing attention mapping between small and large LLMs. Our experimental results demonstrate that IAM can accelerate prefill by 15% and reduce KV cache usage by 22.1% without appreciably sacrificing performance. Experiments on different series of models show the generalizability of IAM. Importantly, it is also orthogonal to many existing KV cache optimization methods, making it a versatile addition to the current toolkit for enhancing LLM efficiency.
Article 328
Title@2025-07-16 (3): RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery
Title: RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery | RNAMunin: Ein Deep Machine Learning Modell für die nicht-kodierende RNA Discovery | RNAMunin:一个非编码 RNA 探索的深机器学习模型 2507.11950v1 |
Authors (2): Lauren Lui, Torben Nielsen
Functional annotation of microbial genomes is often biased toward protein-coding genes, leaving a vast, unexplored landscape of non-coding RNAs (ncRNAs) that are critical for regulating bacterial and archaeal physiology, stress response and metabolism. Identifying ncRNAs directly from genomic sequence is a paramount challenge in bioinformatics and biology, essential for understanding the complete regulatory potential of an organism. This paper presents RNAMunin, a machine learning (ML) model that is capable of finding ncRNAs using genomic sequence alone. It is also computationally viable for large sequence datasets such as long read metagenomic assemblies with contigs totaling multiple Gbp. RNAMunin is trained on Rfam sequences extracted from approximately 60 Gbp of long read metagenomes from 16 San Francisco Estuary samples. We know of no other model that can detect ncRNAs based solely on genomic sequence at this scale. Since RNAMunin only requires genomic sequence as input, we do not need an ncRNA to be transcribed to find it, i.e., we do not need transcriptomics data. We wrote this manuscript in a narrative style in order to best convey how RNAMunin was developed and how it works in detail. Unlike almost all current ML models, at approximately 1M parameters, RNAMunin is very small and very fast.
Article 329
Title@2025-07-16 (3): Kevin: Multi-Turn RL for Generating CUDA Kernels
Title: Kevin: Multi-Turn RL for Generating CUDA Kernels | Kevin: Multi-Turn RL für die Erzeugung von CUDA-Kerneln | Kevin: 生成 CUDA 核心多发RL 2507.11948v1 |
Authors (5): Carlo Baronio, Pietro Marsella, Ben Pan, Simon Guo, Silas Alberti
Writing GPU kernels is a challenging task and critical for AI systems’ efficiency. It is also highly iterative: domain experts write code and improve performance through execution feedback. Moreover, it presents verifiable rewards like correctness and speedup, making it a natural environment to apply Reinforcement Learning (RL). To explicitly incorporate the iterative nature of this process into training, we develop a flexible multi-turn RL recipe that addresses unique challenges encountered in real-world settings, such as learning from long trajectories and effective reward attribution across turns. We present Kevin - K(ernel D)evin, the first model trained with multi-turn RL for CUDA kernel generation and optimization. In our evaluation setup, Kevin shows significant gains over its base model (QwQ-32B), improving correctness of generated kernels (in pure CUDA) from 56% to 82% and mean speedup from 0.53x to 1.10x of baseline (PyTorch Eager), and surpassing frontier models like o4-mini (0.78x). Finally, we study its behavior across test-time scaling axes: we found scaling serial refinement more beneficial than parallel sampling. In particular, when given more refinement turns, Kevin shows a higher rate of improvement.
Article 330
Title@2025-07-16 (3): A Survey of Deep Learning for Geometry Problem Solving
Title: A Survey of Deep Learning for Geometry Problem Solving | Eine Umfrage über Deep Learning zur Lösung von Geometrieproblemen | 解决几何问题深层学习调查 2507.11936v1 |
Authors (3): Jianzhe Ma, Wenxuan Wang, Qin Jin
Geometry problem solving is a key area of mathematical reasoning, which is widely involved in many important fields such as education, mathematical ability assessment of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.
Article 331
Title@2025-07-16 (3): Truncated Kernel Stochastic Gradient Descent on Spheres
Title: Truncated Kernel Stochastic Gradient Descent on Spheres | Beschnittener Kern Stochastischer Gradient Abstieg auf Sphären | 球体上被排出核心内核岩层渐变源 2410.01570v6 |
Authors (2): Jinhui Bai, Lei Shi
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD introduces a novel regularization strategy by implementing stochastic gradient descent through a closed-form solution of the projection of the stochastic gradient in a low-dimensional subspace. In contrast to traditional kernel SGD, the regularization strategy implemented by T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decaying rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.
Article 332
Title@2025-07-16 (3): Complex non-backtracking matrix for directed graphs
Title: Complex non-backtracking matrix for directed graphs | Komplexe Nicht-Rückverfolgungsmatrix für gerichtete Graphen | 定向图表的复杂非后跟踪矩阵表 2507.12503v1 |
Authors (2): Keishi Sando, Hideitsu Hino
Graph representation matrices are essential tools in graph data analysis. Recently, Hermitian adjacency matrices have been proposed to investigate directed graph structures. Previous studies have demonstrated that these matrices can extract valuable information for clustering. In this paper, we propose the complex non-backtracking matrix that integrates the properties of the Hermitian adjacency matrix and the non-backtracking matrix. The proposed matrix has properties similar to those of the non-backtracking matrix of undirected graphs. We reveal relationships between the complex non-backtracking matrix and the Hermitian adjacency matrix. Also, we provide intriguing insights that this matrix representation holds cluster information, particularly for sparse directed graphs.
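For readers unfamiliar with Hermitian adjacency matrices, the sketch below builds one under a common convention: entry $(u, v)$ is $i$ if only the edge $u \to v$ exists, $-i$ if only $v \to u$ exists, and $1$ if both directions exist. This convention is an assumption here, since the abstract does not say which variant the authors use, and the sketch does not attempt to construct the proposed complex non-backtracking matrix. The function name is hypothetical.

```python
def hermitian_adjacency(num_nodes, edges):
    """Build a Hermitian adjacency matrix as nested lists of complex numbers.

    `edges` is an iterable of directed (u, v) pairs. Assumed convention:
    one-way edge u -> v gives H[u][v] = i and H[v][u] = -i; a pair of
    opposite edges gives 1 in both positions.
    """
    edge_set = set(edges)
    matrix = [[0j] * num_nodes for _ in range(num_nodes)]
    for u, v in edge_set:
        if (v, u) in edge_set:
            matrix[u][v] = 1 + 0j   # bidirectional edge (or self-loop)
        else:
            matrix[u][v] = 1j       # forward direction only
            matrix[v][u] = -1j      # conjugate entry keeps H Hermitian
    return matrix
```

Because the matrix equals its own conjugate transpose, its eigenvalues are real, which is what makes it convenient for spectral methods on directed graphs.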
Article 333
Title@2025-07-16 (3): Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning
Title: Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning | Beschleunigung des RF-Leistungsverstärkers über intelligente Probenahme und ML-basierte Parameter-Tuning | 通过智能取样和以 ML 为基础的参数图集加速 RF 功率放大器设计 2507.11928v1 |
Authors (2): Abhishek Sriram, Neal Tuffy
This paper presents a machine learning-accelerated optimization framework for RF power amplifier design that reduces simulation requirements by 65% while maintaining $\pm0.3$ to $\pm0.4$ dBm accuracy. The proposed method combines MaxMin Latin Hypercube Sampling with CatBoost gradient boosting to intelligently explore multidimensional parameter spaces. Instead of exhaustively simulating all parameter combinations to achieve target P2dB compression specifications, our approach strategically selects approximately 35% of critical simulation points. The framework processes ADS netlists, executes harmonic balance simulations on the reduced dataset, and trains a CatBoost model to predict P2dB performance across the entire design space. Validation across 15 PA operating modes yields an average $R^2$ of 0.901, with the system ranking parameter combinations by their likelihood of meeting target specifications. The integrated solution delivers 58.24% to 77.78% reduction in simulation time through automated GUI-based workflows, enabling rapid design iterations without compromising accuracy standards required for production RF circuits.
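The sampling step can be illustrated with a small, dependency-free sketch of maximin Latin hypercube sampling on the unit cube. It uses a simple random-restart strategy (draw several Latin hypercube designs and keep the one whose closest pair of points is farthest apart), which is one common way to realize the idea and may differ from the authors' implementation. All function names and the candidate count are hypothetical, and the CatBoost surrogate and ADS simulation steps are not shown.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Draw one Latin hypercube design in [0, 1)^n_dims.

    Each dimension is split into n_samples equal strata and every
    stratum is used exactly once.
    """
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two design points."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def maximin_lhs(n_samples, n_dims, n_candidates=50, seed=0):
    """Keep the candidate design with the largest minimum distance.

    Requires n_samples >= 2 so that at least one pair exists.
    """
    rng = random.Random(seed)
    candidates = [
        latin_hypercube(n_samples, n_dims, rng) for _ in range(n_candidates)
    ]
    return max(candidates, key=min_pairwise_distance)
```

In a workflow like the one described, each unit-cube coordinate would then be rescaled to the range of the corresponding circuit parameter before simulation.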
Article 334
Title@2025-07-16 (3): From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning
Title: From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning | Von Generativ zu Episodisch: Muster-Effizient Replicable Verstärkungslernen | 从产生到起源:抽样有效复制强化学习 2507.11926v1 |
Authors (4): Max Hopkins, Sihan Liu, Christopher Ye, Yuichi Yoshida
The epidemic failure of replicability across empirical science and machine learning has recently motivated the formal study of replicable learning algorithms [Impagliazzo et al. (2022)]. In batch settings where data comes from a fixed i.i.d. source (e.g., hypothesis testing, supervised learning), the design of data-efficient replicable algorithms is now more or less understood. In contrast, there remain significant gaps in our knowledge for control settings like reinforcement learning where an agent must interact directly with a shifting environment. Karbasi et al. show that with access to a generative model of an environment with $S$ states and $A$ actions (the RL ‘batch setting’), replicably learning a near-optimal policy costs only $\tilde{O}(S^2A^2)$ samples. On the other hand, the best upper bound without a generative model jumps to $\tilde{O}(S^7 A^7)$ [Eaton et al. (2024)] due to the substantial difficulty of environment exploration. This gap raises a key question in the broader theory of replicability: Is replicable exploration inherently more expensive than batch learning? Is sample-efficient replicable RL even possible? In this work, we (nearly) resolve this problem (for low-horizon tabular MDPs): exploration is not a significant barrier to replicable learning! Our main result is a replicable RL algorithm on $\tilde{O}(S^2A)$ samples, bridging the gap between the generative and episodic settings. We complement this with a matching $\tilde{\Omega}(S^2A)$ lower bound in the generative setting (under the common parallel sampling assumption) and an unconditional lower bound in the episodic setting of $\tilde{\Omega}(S^2)$ showcasing the near-optimality of our algorithm with respect to the state space $S$.
Article 335
Title@2025-07-16 (3): TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images
Title: TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images | TextDestroyer: Eine trainings- und annotationsfreie Diffusionsmethode zum Zerstören anomaler Texte aus Bildern | 文字破坏:一个销毁图像中的非原子文字的无培训和注注解-不附带说明的传播方法 2411.00355v3 |
Authors (2): Mengcheng Li, Fei Chao
In this paper, we propose TextDestroyer, the first training- and annotation-free method for scene text destruction using a pre-trained diffusion model. Existing scene text removal models require complex annotation and retraining, and may leave faint yet recognizable text information, compromising privacy protection and content concealment. TextDestroyer addresses these issues by employing a three-stage hierarchical process to obtain accurate text masks. Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction. During the diffusion denoising process, self-attention key and value are referenced from the original latent to restore the compromised background. Latent codes saved at each inversion step are used for replacement during reconstruction, ensuring perfect background restoration. The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
Article 336
Title@2025-07-16 (3): Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
Title: Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? | Kann Prompt Schwierigkeit Online vorausgesagt werden, um RL zu beschleunigen Finetuning of Reasoning Models? | 快速困难能否预测为加速理据模型的RL微调而在线化? 2507.04632v2 |
Authors (5): Yun Qu, Qi Cheems Wang, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji
Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Appropriate online prompt selection methods reduce iteration steps by prioritizing informative prompts during training, while the pipeline’s reliance on exhaustive prompt evaluation and subset selection for optimization still incurs substantial computational overhead due to frequent LLM inference calls. Distinguished from these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework that online estimates prompt difficulty without requiring costly LLM interactions. Technically, MoPPS models each prompt’s success rate as a latent variable, performs streaming Bayesian inference, and employs posterior sampling in a constructed multi-armed bandit machine, enabling sample efficient and adaptive prompt selection. Extensive experiments across mathematics, planning, and vision-based geometry tasks show that MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced LLM rollouts.
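The idea of treating each prompt's success rate as a latent variable and choosing prompts by posterior sampling can be illustrated with a minimal sketch. It assumes a Beta prior per prompt, binary rollout outcomes, and a preference for prompts whose sampled success rate is near 0.5; the class name, method names, and the `target` parameter are hypothetical and are not taken from the MoPPS code.

```python
import random


class PromptDifficultyBandit:
    """Beta-Bernoulli posterior over each prompt's success rate (illustrative only)."""

    def __init__(self, num_prompts, prior_alpha=1.0, prior_beta=1.0, seed=0):
        # One Beta(alpha, beta) posterior per prompt; Beta(1, 1) is a uniform prior.
        self.alpha = [prior_alpha] * num_prompts
        self.beta = [prior_beta] * num_prompts
        self.rng = random.Random(seed)

    def select(self, batch_size, target=0.5):
        # Posterior sampling: draw one plausible success rate per prompt,
        # without running any new rollouts.
        sampled = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        # Assumption: prompts with sampled success rate closest to `target`
        # are treated as the most informative ones.
        order = sorted(range(len(sampled)), key=lambda i: abs(sampled[i] - target))
        return order[:batch_size]

    def update(self, prompt_id, successes, failures):
        # Streaming Bayesian update from the rollouts that were actually run.
        self.alpha[prompt_id] += successes
        self.beta[prompt_id] += failures

    def posterior_mean(self, prompt_id):
        a, b = self.alpha[prompt_id], self.beta[prompt_id]
        return a / (a + b)


bandit = PromptDifficultyBandit(num_prompts=3)
bandit.update(0, successes=8, failures=0)   # prompt 0 looks easy
chosen = bandit.select(batch_size=2)
```

Only the selected prompts need to be evaluated by the LLM; the remaining prompts keep their current posteriors until they are chosen.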
Article 337
Title@2025-07-16 (3): AFPM: Alignment-based Frame Patch Modeling for Cross-Dataset EEG Decoding
Title: AFPM: Alignment-based Frame Patch Modeling for Cross-Dataset EEG Decoding | AFPM: Alignmentbasierte Rahmenpatch-Modellierung für Cross-Dataset-EEG-Dekodierung | FAFPM: 跨数据交换电子EEG的对齐框架补全模型 2507.11911v1 |
Authors (3): Xiaoqing Chen, Siyang Li, Dongrui Wu
Electroencephalogram (EEG) decoding models for brain-computer interfaces (BCIs) struggle with cross-dataset learning and generalization due to channel layout inconsistencies, non-stationary signal distributions, and limited neurophysiological prior integration. To address these issues, we propose a plug-and-play Alignment-Based Frame-Patch Modeling (AFPM) framework, which has two main components: 1) Spatial Alignment, which selects task-relevant channels based on brain-region priors, aligns EEG distributions across domains, and remaps the selected channels to a unified layout; and, 2) Frame-Patch Encoding, which models multi-dataset signals into unified spatiotemporal patches for EEG decoding. Compared to 17 state-of-the-art approaches that need dataset-specific tuning, the proposed calibration-free AFPM achieves performance gains of up to 4.40% on motor imagery and 3.58% on event-related potential tasks. To our knowledge, this is the first calibration-free cross-dataset EEG decoding framework, substantially enhancing the practicality of BCIs in real-world applications.
Article 338
Title@2025-07-16 (3): Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes
Title: Learning Time-Varying Multi-Region Brain Communications via Scalable Markovian Gaussian Processes | Lernen von zeitvariierenden Multi-Region Gehirnkommunikation über skalierbare Markovian Gaussian Prozesse | 通过可缩放的马尔科维扬高斯进程进行学习、改变时间的多区域脑交流 2407.00397v6 |
Authors (4): Weihan Li, Yule Wang, Chengrui Li, Anqi Wu
Understanding and constructing brain communications that capture dynamic communications across multiple regions is fundamental to modern system neuroscience, yet current methods struggle to find time-varying region-level communications or scale to large neural datasets with long recording durations. We present a novel framework using Markovian Gaussian Processes to learn brain communications with time-varying temporal delays from multi-region neural recordings, named Adaptive Delay Model (ADM). Our method combines Gaussian Processes with State Space Models and employs parallel scan inference algorithms, enabling efficient scaling to large datasets while identifying concurrent communication patterns that evolve over time. This time-varying approach captures how brain region interactions shift dynamically during cognitive processes. Validated on synthetic and multi-region neural recordings datasets, our approach discovers both the directionality and temporal dynamics of neural communication. This work advances our understanding of distributed neural computation and provides a scalable tool for analyzing dynamic brain networks.
Article 339
Title@2025-07-16 (3): Epic-Sounds: A Large-scale Dataset of Actions That Sound
Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound | Epic-Sounds: Ein großer Datensatz von Aktionen, die klingen | 超声波:声响的大规模行动数据集 2302.00646v3 |
Authors (5): Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman
We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from video, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2k non-categorised segments. We train and evaluate state-of-the-art audio recognition and detection models on our dataset, for both audio-only and audio-visual methods. We also conduct analysis on: the temporal overlap between audio events, the temporal and label correlations between audio and visual modalities, the ambiguities in annotating materials from audio-only input, the importance of audio-only labels and the limitations of current models to understand actions that sound.
Article 340
Title@2025-07-16 (3): Resampling strategies for imbalanced regression: a survey and empirical analysis
Title: Resampling strategies for imbalanced regression: a survey and empirical analysis | Strategien für eine unausgewogene Regression: eine Erhebung und empirische Analyse | 恢复不平衡回归的战略:调查和实证分析 2507.11902v1 |
Authors (3): Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz
Imbalanced problems can arise in different real-world situations, and to address this, certain strategies in the form of resampling or balancing algorithms are proposed. This issue has largely been studied in the context of classification, and yet, the same problem features in regression tasks, where target values are continuous. This work presents an extensive experimental study comprising various balancing and predictive models, and which uses metrics to capture important elements for the user and to evaluate the predictive model in an imbalanced regression data context. It also proposes a taxonomy for imbalanced regression approaches based on three crucial criteria: regression model, learning process, and evaluation metrics. The study offers new insights into the use of such strategies, highlighting the advantages they bring to each model’s learning process, and indicating directions for further studies. The code, data and further information related to the experiments performed herein can be found on GitHub: https://github.com/JusciAvelino/imbalancedRegression.
Article 341
Title@2025-07-16 (3): Imbalanced Regression Pipeline Recommendation
Title: Imbalanced Regression Pipeline Recommendation | Unausgewogene Regressionspipeline-Empfehlung | 不平衡的递减管道建议 2507.11901v1 |
Authors (3): Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz
Imbalanced problems are prevalent in various real-world scenarios and are extensively explored in classification tasks. However, they also present challenges for regression tasks due to the rarity of certain target values. A common alternative is to employ balancing algorithms in preprocessing to address dataset imbalance. However, due to the variety of resampling methods and learning models, determining the optimal solution requires testing many combinations. Furthermore, the learning model, dataset, and evaluation metric affect the best strategies. This work proposes the Meta-learning for Imbalanced Regression (Meta-IR) framework, which diverges from existing literature by training meta-classifiers to recommend the best pipeline composed of the resampling strategy and learning model per task in a zero-shot fashion. The meta-classifiers are trained using a set of meta-features to learn how to map the meta-features to the classes indicating the best pipeline. We propose two formulations: Independent and Chained. Independent trains the meta-classifiers to separately indicate the best learning algorithm and resampling strategy. Chained involves a sequential procedure where the output of one meta-classifier is used as input for another to model intrinsic relationship factors. The Chained scenario showed superior performance, suggesting a relationship between the learning algorithm and the resampling strategy per task. Compared with AutoML frameworks, Meta-IR obtained better results. Moreover, compared with baselines of six learning algorithms and six resampling algorithms plus no resampling, totaling 42 (6 X 7) configurations, Meta-IR outperformed all of them. The code, data, and further information of the experiments can be found on GitHub: https://github.com/JusciAvelino/Meta-IR.
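The baseline count of 42 configurations follows from pairing each of six learning algorithms with seven resampling options (six resamplers plus "no resampling"). A small sketch makes the enumeration explicit; the learner and resampler names are placeholders, since the abstract does not list them.

```python
from itertools import product

# Placeholder names: the abstract gives only the counts, not the algorithms.
learners = [f"learner_{i}" for i in range(1, 7)]                  # 6 learning algorithms
resamplers = ["none"] + [f"resampler_{i}" for i in range(1, 7)]   # no resampling + 6 resamplers

# Every (learning algorithm, resampling strategy) pair is one candidate pipeline.
pipelines = list(product(learners, resamplers))

# 6 learners x 7 resampling options = 42 pipelines
num_pipelines = len(pipelines)
```

An exhaustive search would have to train and score all of these pairs per dataset, which is the cost a recommender that predicts the pipeline directly is meant to avoid.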
Article 342
Title@2025-07-16 (3): Newfluence: Boosting Model interpretability and Understanding in High Dimensions
Title: Newfluence: Boosting Model interpretability and Understanding in High Dimensions | Newfluence: Verbesserung der Interpretationsfähigkeit und des Verständnisses von Modellen in hohen Dimensionen | 新流:在高维度方面促进模型解释和理解 2507.11895v1 |
Authors (5): Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki
The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as a popular approach for this purpose. However, the heuristic foundations of influence functions rely on low-dimensional assumptions where the number of parameters $p$ is much smaller than the number of observations $n$. In contrast, modern AI models often operate in high-dimensional regimes with large $p$, challenging these assumptions. In this paper, we examine the accuracy of influence functions in high-dimensional settings. Our theoretical and empirical analyses reveal that influence functions cannot reliably fulfill their intended purpose. We then introduce an alternative approximation, called Newfluence, that maintains similar computational efficiency while offering significantly improved accuracy. Newfluence is expected to provide more accurate insights than many existing methods for interpreting complex AI models and diagnosing their issues. Moreover, the high-dimensional framework we develop in this paper can also be applied to analyze other popular techniques, such as Shapley values.
Article 343
Title@2025-07-16 (3): Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
Title: Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work? | Den besseren Bandit-Algorithmus unter Datenfreigabe wählen: Wann funktionieren A/B-Experimente? | 在数据共享:A/B实验何时奏效? 2507.11891v1 |
Authors (3): Shuangning Li, Chonghuan Wang, Jingyan Wang
We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has shown that the standard difference-in-means estimator is biased in estimating the global treatment effect (GTE) due to a particular form of interference between experimental units. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently train both algorithms, resulting in interference between the two groups. The bias arising from this type of data sharing is known as “symbiosis bias”. In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm. We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.
Article 344
Title@2025-07-16 (3): HueManity: Probing Fine-Grained Visual Perception in MLLMs
Title: HueManity: Probing Fine-Grained Visual Perception in MLLMs | HueManity: Erzeugen einer feinkörnigen visuellen Wahrnehmung in MLLMs | 优才:在MLLMs中探究精美的视觉感知 2506.03194v2 |
Authors (4): Rynaa Grover, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Nilay Pande
Multimodal Large Language Models (MLLMs) excel at high-level visual reasoning, but their performance on nuanced perceptual tasks remains surprisingly limited. We present HueManity, a benchmark designed to assess visual perception in MLLMs. The dataset comprises 83,850 images featuring two-character alphanumeric strings embedded in Ishihara test style dot patterns, challenging models on precise pattern recognition. Our evaluation of nine state-of-the-art MLLMs on HueManity demonstrates a significant performance deficit compared to human and traditional computer vision baselines. The best-performing MLLM achieved a 33.6% accuracy on the numeric 'easy' task and a striking 3% on the alphanumeric 'hard' task. In contrast, human participants achieved near-perfect scores (100% and 95.6%), and a fine-tuned ResNet50 model reached accuracies of 96.5% and 94.5%. These results highlight a critical gap in the visual capabilities of current MLLMs. Our analysis further explores potential architectural and training-paradigm factors contributing to this perceptual gap in MLLMs. We open-source HueManity dataset and code to foster further research in improving perceptual robustness of MLLMs.
Article 345
Title@2025-07-16 (3): GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
Title: GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning | GeoChain: Multimodale Kette von Ideen für geographische Vernunft | Geo Chain:为地理原因寻求的多式联运谈判链 2506.00785v2 |
Authors (4): Sahiti Yerramilli, Nilay Pande, Rynaa Grover, Jayant Sravan Tamarapalli
This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories - visual, spatial, cultural, and precise geolocation - annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of contemporary MLLMs (GPT-4.1 variants, Claude 3.7, Gemini 2.5 variants) on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization, especially as the reasoning complexity escalates. GeoChain offers a robust diagnostic methodology, critical for fostering significant advancements in complex geographic reasoning within MLLMs.
Article 346
Title@2025-07-16 (3): A Policy-Improved Deep Deterministic Policy Gradient Framework for the Discount Order Acceptance Strategy of Ride-hailing Drivers
Title: A Policy-Improved Deep Deterministic Policy Gradient Framework for the Discount Order Acceptance Strategy of Ride-hailing Drivers | Ein Policy-Improved Deep Deterministic Policy Gradient Framework für die Discount Order Acceptance Strategy of Ride-hailing Drivers | 改善乘乘驾驶员折扣令接受战略的 政策改进深确定性政策分级框架 2507.11865v1 |
Authors (5): Hanwen Dai, Chang Gao, Fang He, Congyuan Ji, Yanni Yang
The rapid expansion of platform integration has emerged as an effective solution to mitigate market fragmentation by consolidating multiple ride-hailing platforms into a single application. To address heterogeneous passenger preferences, third-party integrators provide Discount Express service delivered by express drivers at lower trip fares. For the individual platform, encouraging broader participation of drivers in Discount Express services has the potential to expand the accessible demand pool and improve matching efficiency, but often at the cost of reduced profit margins. This study aims to dynamically manage drivers’ acceptance of Discount Express from the perspective of individual platforms. The lack of historical data under the new business model necessitates online learning. However, early-stage exploration through trial and error can be costly in practice, highlighting the need for reliable early-stage performance in real-world deployment. To address these challenges, this study formulates the decision regarding the proportion of drivers’ acceptance behavior as a continuous control task. In response to the high stochasticity, the opaque matching mechanisms employed by third-party integrators, and the limited availability of historical data, we propose a policy-improved deep deterministic policy gradient (pi-DDPG) framework. The proposed framework incorporates a refiner module to boost policy performance during the early training phase, leverages a convolutional long short-term memory network to effectively capture complex spatiotemporal patterns, and adopts a prioritized experience replay mechanism to enhance learning efficiency. A simulator based on a real-world dataset is developed to validate the effectiveness of the proposed pi-DDPG. Numerical experiments demonstrate that pi-DDPG achieves superior learning efficiency and significantly reduces early-stage training losses.
Article 347
Title@2025-07-16 (3): METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
Title: METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation | METIS: Schnelle, qualitätsbewusste RAG-Systeme mit Konfigurationsanpassung | METIS:具有配置适应的快速质量软件RAG系统 2412.10543v2 |
Authors (8): Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang
RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but they fall short in optimizing the tradeoff between the delay and quality of RAG responses. This paper presents METIS, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods, in order to balance quality optimization and response delay reduction. Using 4 popular RAG-QA datasets, we show that compared with the state-of-the-art RAG optimization schemes, METIS reduces the generation latency by $1.64$-$2.54\times$ without sacrificing generation quality.
Article 348
Title@2025-07-16 (3): OrdShap: Feature Position Importance for Sequential Black-Box Models
Title: OrdShap: Feature Position Importance for Sequential Black-Box Models | OrdShap: Feature Position Bedeutung für sequentielle Black-Box-Modelle | OrdShap: 序列黑ox 模型的特性位置重要性 2507.11855v1 |
Authors (6): Davin Hill, Brian L. Hill, Aria Masoomi, Vijay S. Nori, Robert E. Tillman, Jennifer Dy
Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume fixed feature ordering - conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model’s predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sánchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap’s effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.
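The distinction between a feature's value and its position can be made concrete with a crude swap-based probe. This is only a sketch of the general idea of measuring how predictions change when positions are permuted; it is not the OrdShap estimator, and `toy_model` and `position_sensitivity` are hypothetical names introduced for illustration.

```python
def toy_model(seq):
    # Hypothetical position-sensitive model: later positions get larger weights.
    return sum((pos + 1) * value for pos, value in enumerate(seq))


def position_sensitivity(model, seq):
    """Average change in the prediction when element i trades places with another element.

    A simplification for illustration: pairwise swaps only, uniform average,
    no Shapley-style weighting over coalitions.
    """
    base = model(seq)
    n = len(seq)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j == i:
                continue
            swapped = list(seq)
            swapped[i], swapped[j] = swapped[j], swapped[i]
            total += model(swapped) - base
        scores.append(total / (n - 1))
    return scores


# The toy model reacts to reordering; a plain sum does not.
sensitive = position_sensitivity(toy_model, [1.0, 0.0, 0.0])
invariant = position_sensitivity(sum, [1, 2, 3])
```

For a model that ignores order, such as a plain sum, every score is zero, so any nonzero score reflects position rather than value.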
Article 349
Title@2025-07-16 (3): Some remarks on gradient dominance and LQR policy optimization
Title: Some remarks on gradient dominance and LQR policy optimization | Einige Bemerkungen zur Gradientendominanz und zur LQR-Politikoptimierung | 关于梯度支配地位和LQR政策优化的一些评论 2507.10452v2 |
Authors (1): Eduardo D. Sontag
Solutions of optimization problems, including policy optimization in reinforcement learning, typically rely upon some variant of gradient descent. There has been much recent work in the machine learning, control, and optimization communities applying the Polyak-Łojasiewicz Inequality (PLI) to such problems in order to establish an exponential rate of convergence (a.k.a. "linear convergence" in the local-iteration language of numerical analysis) of loss functions to their minima under the gradient flow. Often, as is the case of policy iteration for the continuous-time LQR problem, this rate vanishes for large initial conditions, resulting in a mixed globally linear / locally exponential behavior. This is in sharp contrast with the discrete-time LQR problem, where there is global exponential convergence. That gap between CT and DT behaviors motivates the search for various generalized PLI-like conditions, and this talk will address that topic. Moreover, these generalizations are key to understanding the transient and asymptotic effects of errors in the estimation of the gradient, errors which might arise from adversarial attacks, wrong evaluation by an oracle, early stopping of a simulation, inaccurate and very approximate digital twins, stochastic computations (algorithm "reproducibility"), or learning by sampling from limited data. We describe an "input to state stability" (ISS) analysis of this issue. The second part discusses convergence and PLI-like properties of "linear feedforward neural networks" in feedback control. Much of the work described here was done in collaboration with Arthur Castello B. de Oliveira, Leilei Cui, Zhong-Ping Jiang, and Milad Siami.
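For reference, the standard form of the Polyak-Łojasiewicz inequality and the short argument by which it yields exponential convergence under gradient flow can be written as follows. The symbols $f$ (loss), $f^*$ (its minimum value), and $\mu > 0$ (the PL constant) are notation assumed here, not taken from the abstract, and the generalized PLI-like conditions the talk discusses are not covered by this sketch.

```latex
\begin{align*}
\tfrac{1}{2}\,\|\nabla f(x)\|^2 &\ge \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x, \qquad \mu > 0, \\
\dot{x}(t) = -\nabla f(x(t)) \;&\Longrightarrow\; \frac{d}{dt}\bigl(f(x(t)) - f^*\bigr) = -\|\nabla f(x(t))\|^2 \le -2\mu\,\bigl(f(x(t)) - f^*\bigr), \\
f(x(t)) - f^* &\le e^{-2\mu t}\,\bigl(f(x(0)) - f^*\bigr).
\end{align*}
```

The last line follows from the second by Grönwall's inequality. When the best available constant $\mu$ degrades for large initial conditions, the same argument only gives a weaker rate far from the minimum.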
Article 350
Title@2025-07-16 (3): Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
Title: Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential | Ihr LLM kennt die Zukunft: Sein Multi-Token-Prognosepotenzial enthüllen | 您的LLM 了解未来: 发掘其多功能预测潜力 2507.11851v1 |
Authors (7): Mohammad Samragh, Arnav Kundu, David Harrison, Kumari Nishu, Devang Naik, Minsik Cho, Mehrdad Farajtabar
Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future tokens, combining techniques to realize this potential and enable simultaneous prediction of multiple subsequent tokens. Our approach introduces several key innovations: (1) a masked-input formulation where multiple future tokens are jointly predicted from a common prefix; (2) a gated LoRA formulation that preserves the original LLM’s functionality, while equipping it for multi-token prediction; (3) a lightweight, learnable sampler module that generates coherent sequences from the predicted future tokens; (4) a set of auxiliary training losses, including a consistency loss, to enhance the coherence and accuracy of jointly generated tokens; and (5) a speculative generation strategy that expands tokens quadratically in the future while maintaining high fidelity. Our method achieves significant speedups through supervised fine-tuning on pretrained models. For example, it generates code and math nearly 5x faster, and improves general chat and knowledge tasks by almost 2.5x. These gains come without any loss in quality.
Article 351
Title@2025-07-16 (3): Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Title: Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update | Generalisierte Lineare Banditen: Fast optimales Bedauern mit One-Pass-Aktualisierung | 通用线性直线强盗: 几乎最佳的误差, 带有单纸条更新 2507.11847v1 |
Authors (4): Yu-Jie Zhang, Sheng-An Xu, Peng Zhao, Masashi Sugiyama
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
Article 352
Title@2025-07-16 (3): ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
Title: ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving | ReAL-AD: Auf dem Weg zu menschlicher Vernunft im autonomen Fahren Ende-zu-Ende | Re-AL-AD:在最终至最终自治驾驶中争取同人相同的理由 2507.12499v1 |
Authors (4): Yuhang Lu, Jiadong Tu, Yuexin Ma, Xinge Zhu
End-to-end autonomous driving has emerged as a promising approach to unify perception, prediction, and planning within a single framework, reducing information loss and improving adaptability. However, existing methods often rely on fixed and sparse trajectory supervision, limiting their ability to capture the hierarchical reasoning process that human drivers naturally employ. To bridge this gap, we propose ReAL-AD, a Reasoning-Augmented Learning framework that structures decision-making in autonomous driving based on the three-tier human cognitive model: Driving Strategy, Driving Decision, and Driving Operation, where Vision-Language Models (VLMs) are incorporated to enhance situational awareness and structured reasoning across these levels. Specifically, we introduce: (1) the Strategic Reasoning Injector, which formulates high-level driving strategies by interpreting complex traffic contexts from VLM-generated insights; (2) the Tactical Reasoning Integrator, which refines strategic intent into interpretable tactical choices such as lane changes, overtaking, and speed adjustments; and (3) the Hierarchical Trajectory Decoder, which progressively translates tactical decisions into precise control actions for smooth and human-like trajectory execution. Extensive evaluations show that integrating our framework improves planning accuracy and safety by over 30%, making end-to-end autonomous driving more interpretable and aligned with human-like hierarchical reasoning. The project page can be found at: https://4dvlab.github.io/project_page/realad
Article 353
Title@2025-07-16 (3): CTSR: Cartesian tensor-based sparse regression for data-driven discovery of high-dimensional invariant governing equations
Title: CTSR: Cartesian tensor-based sparse regression for data-driven discovery of high-dimensional invariant governing equations | CTSR: Kartesische Tensor-basierte spärliche Regression für die datengetriebene Entdeckung hochdimensionaler Invariant-Regulierungsgleichungen | CTSR: 由数据驱动的发现高维变异调节方程式的 数据驱动的高度异变方程的 笛卡尔斯偏差微弱回归 2504.07618v2 |
Authors (5): Boqian Zhang, Juanmian Lei, Guoyou Sun, Shuaibing Ding, Jian Guo
Accurate and concise governing equations are crucial for understanding system dynamics. Recently, data-driven methods such as sparse regression have been employed to automatically uncover governing equations from data, representing a significant shift from traditional first-principles modeling. However, most existing methods focus on scalar equations, limiting their applicability to simple, low-dimensional scenarios, and failing to ensure rotation and reflection invariance without incurring significant computational cost or requiring additional prior knowledge. This paper proposes a Cartesian tensor-based sparse regression (CTSR) technique to accurately and efficiently uncover complex, high-dimensional governing equations while ensuring invariance. Evaluations on two two-dimensional (2D) and two three-dimensional (3D) test cases demonstrate that the proposed method achieves superior accuracy and efficiency compared to the conventional technique.
Article 354
Title@2025-07-16 (3): Understanding Pan-Sharpening via Generalized Inverse
Title: Understanding Pan-Sharpening via Generalized Inverse | Pan-Sharpening über generalisierte Inverse verstehen | 通过一般化反向 2310.02718v3 |
Authors (4): Shiqi Liu, Yihua Tan, Yutong Bai, Alan Yuille
Pan-sharpening algorithms utilize a panchromatic image and a multispectral image to generate an image with high spatial and high spectral resolution. However, the optimizations of the algorithms are designed with different standards. We employ a simple matrix equation to describe the Pan-sharpening problem. The conditions for the existence of a solution and the acquisition of spectral and spatial resolution are discussed. A down-sampling enhancement method is introduced to improve the estimation of spatial and spectral down-sample matrices. Using generalized inverse theory, we discovered two kinds of solution spaces of generalized inverse matrix formulations, which correspond to the two prominent classes of Pan-sharpening methods: component substitution and multi-resolution analysis. Specifically, the Gram-Schmidt adaptive method is demonstrated to align with the generalized inverse matrix formulation of component substitution. A model prior of the generalized inverse matrix of the spectral function is rendered. Theoretical errors are analyzed. The diffusion prior is naturally embedded with the help of general solution spaces of the generalized inverse form, enabling the acquisition of refined Pan-sharpening results. Extensive experiments, including comparative, synthetic, real-data ablation, and diffusion-related tests, are conducted. The proposed methods produce qualitatively sharper and superior results in both synthetic and real experiments. The down-sampling enhancement method demonstrates quantitatively and qualitatively better outcomes in real-data experiments. The diffusion prior can significantly improve the performance of our methods across almost all evaluation measures. The generalized inverse matrix theory helps deepen the understanding of Pan-sharpening mechanisms.
Article 355
Title@2025-07-16 (3): CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching
Title: CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching | CosmoFlow: Scale-Aware Representative Learning für die Kosmologie mit Flow Matching | CosmoFlow: 以流动匹配方式进行宇宙学规模- 软件代表制学习 2507.11842v1 |
Authors (4): Sidharth Kannan, Tian Qiu, Carolina Cuesta-Lazaro, Haewon Jeong
Generative machine learning models have been demonstrated to be able to learn low dimensional representations of data that preserve information required for downstream tasks. In this work, we demonstrate that flow matching based generative models can learn compact, semantically rich latent representations of field level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32x smaller than the raw field data, usable for field level reconstruction, synthetic data generation, and parameter inference. Our model also learns interpretable representations, in which different latent channels correspond to features at different cosmological scales.
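The flow matching objective mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common linear interpolation path between a source sample and a data sample; the abstract does not state which path or source distribution CosmoFlow uses, and the names `flow_matching_pair`, `flow_matching_loss`, and `zero_model` are hypothetical.

```python
def flow_matching_pair(x0, x1, t):
    """Return the point x_t on a linear path from x0 to x1 and its target velocity.

    Assumption: linear path x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between the model's predicted velocity and the target."""
    x_t, target = flow_matching_pair(x0, x1, t)
    predicted = model(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


def zero_model(x_t, t):
    """Placeholder velocity model that always predicts zero (illustration only)."""
    return [0.0] * len(x_t)


# Toy usage: x0 stands in for a source sample, x1 for a data sample.
source = [0.0, 1.0]
data = [2.0, 5.0]
loss = flow_matching_loss(zero_model, source, data, 0.25)
```

In practice the velocity model would be a neural network trained by averaging this loss over many sampled pairs and times; the placeholder above only makes the target explicit.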
Article 356
Title@2025-07-16 (3): Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM
Title: Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM | Protenix-Mini: Effizienter Strukturvorhersage über kompakte Architektur, wenige Schritte Diffusion und umschaltbare pLM | Protenix-Mini:通过集约结构结构、很少批发和可转接的PLM, 高效的结构预测器 2507.11839v1 |
Authors (6): Chengyue Gong, Xinshi Chen, Yuxuan Zhang, Yuxuan Song, Hao Zhou, Wenzhi Xiao
Lightweight inference is critical for biomolecular structure prediction and other downstream tasks, enabling efficient real-world deployment and inference-time scaling for large-scale applications. In this work, we address the challenge of balancing model efficiency and prediction accuracy by making several key modifications: 1) the multi-step AF3 sampler is replaced by a few-step ODE sampler, significantly reducing the computational overhead of the diffusion module during inference; 2) in the open-source Protenix framework, a subset of pairformer or diffusion transformer blocks does not contribute to the final structure prediction, presenting opportunities for architectural pruning and lightweight redesign; 3) a model incorporating an ESM module is trained to substitute for the conventional MSA module, reducing MSA preprocessing time. Building on these key insights, we present Protenix-Mini, a compact and optimized model designed for efficient protein structure prediction. This streamlined version incorporates a more efficient architectural design with a two-step Ordinary Differential Equation (ODE) sampling strategy. By eliminating redundant Transformer components and refining the sampling process, Protenix-Mini significantly reduces model complexity with only a slight accuracy drop. Evaluations on benchmark datasets demonstrate that it achieves high-fidelity predictions, with only a negligible 1 to 5 percent decrease in performance compared to its full-scale counterpart. This makes Protenix-Mini an ideal choice for applications where computational resources are limited but accurate structure prediction remains crucial.
Article 357
Title@2025-07-16 (3): HyperEvent: Learning Cohesive Events for Large-scale Dynamic Link Prediction
Title: HyperEvent: Learning Cohesive Events for Large-scale Dynamic Link Prediction | HyperEvent: Learning Cohesive Events für groß angelegte dynamische Link-Vorhersage | HyperEvent: 大型动态链接预测学习共聚活动 2507.11836v1 |
Authors (3): Jian Gao, Jianshe Wu, JingYi Ding
Dynamic link prediction in continuous-time dynamic graphs is a fundamental task for modeling evolving complex systems. Existing node-centric and event-centric methods focus on individual interactions or atomic states, failing to capture the structural cohesion of composite hyper-events, that is, groups of causally related events. To address this, we propose HyperEvent, a framework reframing dynamic link prediction as hyper-event recognition. Central to HyperEvent is the dynamic construction of an association sequence using event correlation vectors. These vectors quantify pairwise dependencies between the query event and relevant historical events, thereby characterizing the structural cohesion of a potential hyper-event. The framework predicts the occurrence of the query event by evaluating whether it collectively forms a valid hyper-event with these historical events. Notably, HyperEvent outperforms state-of-the-art methods on 4 out of 5 datasets on the official leaderboard. For scalability, we further introduce an efficient parallel training algorithm that segments large event streams to enable concurrent training. Experiments validate HyperEvent’s superior accuracy and efficiency on large-scale graphs. In particular, HyperEvent achieves a 6.95% improvement in Mean Reciprocal Rank over the state-of-the-art baseline on the large-scale Flight dataset while using only 10.17% of the training time.
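The Mean Reciprocal Rank metric cited above can be sketched as follows. This is a generic illustration rather than the benchmark's evaluation code: the function names are hypothetical, and ties are broken in favor of the true link as a simplifying assumption that the leaderboard protocol may not share.

```python
def rank_of_positive(positive_score, negative_scores):
    """1-based rank of the true link among its sampled negative candidates.

    Assumption: ties are resolved in favor of the true link.
    """
    return 1 + sum(1 for s in negative_scores if s > positive_score)


def mean_reciprocal_rank(queries):
    """Average of 1 / rank over (positive_score, negative_scores) pairs."""
    reciprocal_ranks = [1.0 / rank_of_positive(p, negs) for p, negs in queries]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Toy scores: the true link ranks 1st, 2nd, and 4th in the three queries.
toy_queries = [
    (0.9, [0.1, 0.2]),
    (0.5, [0.7, 0.3]),
    (0.2, [0.9, 0.8, 0.7]),
]
mrr = mean_reciprocal_rank(toy_queries)
```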
Article 358
Title@2025-07-16 (3): MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search
Title: MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search | MatRL: Wahrscheinlich verallgemeinerbare iterative Algorithmen Entdeckung über Monte-Carlo Baumsuche | MatRL: 通过蒙特-卡洛树搜索 发现可普遍实现的迭代性电算算法 2507.03833v2 |
Authors (4): Sungyoon Kim, Rajat Vadiraj Dwaraknath, Longling Geng, Mert Pilanci
Iterative methods for computing matrix functions have been extensively studied, and their convergence speed can be significantly improved with the right tuning of parameters and by mixing different iteration types. Hand-tuning the design options for optimal performance can be cumbersome, especially in modern computing environments: numerous different classical iterations and their variants exist, each with non-trivial per-step cost and tuning parameters. To this end, we propose MatRL, a reinforcement learning based framework that automatically discovers iterative algorithms for computing matrix functions. The key idea is to treat algorithm design as a sequential decision-making process. Monte-Carlo tree search is then used to plan a hybrid sequence of matrix iterations and step sizes, tailored to a specific input matrix distribution and computing environment. Moreover, we also show that the learned algorithms provably generalize to sufficiently large matrices drawn from the same distribution. Finally, we corroborate our theoretical results with numerical experiments demonstrating that MatRL produces algorithms that outperform various baselines in the literature.
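As a concrete instance of the kind of classical matrix iteration this abstract refers to, the sketch below implements the Newton-Schulz iteration for the matrix inverse in plain Python. It is one fixed, hand-written iteration, not an algorithm discovered by MatRL; the helper names, the test matrix, and the step count are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def newton_schulz_inverse(a, steps=20):
    """Approximate the inverse of a nonsingular square matrix.

    Iteration: X_{k+1} = X_k (2I - A X_k), started from
    X_0 = A^T / (||A||_1 * ||A||_inf), a standard choice that makes
    the initial residual I - A X_0 a contraction.
    """
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    x = [[a[j][i] * scale for j in range(n)] for i in range(n)]  # scaled transpose
    for _ in range(steps):
        ax = matmul(a, x)
        correction = [
            [(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
            for i in range(n)
        ]
        x = matmul(x, correction)
    return x


# Illustrative 2x2 example.
approx_inverse = newton_schulz_inverse([[4.0, 1.0], [2.0, 3.0]])
```

The step count and the starting scale are exactly the kind of tuning choices that the abstract describes as cumbersome to set by hand.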
Article 359
Title@2025-07-16 (3): Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
Title: Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Arctic Inferenz mit Shift Parallelismus: Schnelles und effizientes Open Source Inferenzsystem für Enterprise AI | 北极与转移平行主义的推论:企业AI快速有效的开放源码推断系统 2507.11830v1 |
Authors (8): Samyam Rajbhandari, Mert Hidayetoglu, Aurick Qiao, Ye Wang, Juncheng Yang, Jeff Rasley, Michael Wyatt, Yuxiong He
Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost. Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference. It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments. Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for enterprise AI and is now available to the community.
Article 360
Title@2025-07-16 (3): A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks
Title: A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks | Eine gruppentheoretische Analyse der Symmetrien, die Basiszusatz und ihre Erlernbarkeit durch neurale Netzwerke sind | 神经网络对基底添加的对称及其可学习性进行小组理论分析 2507.10678v2 |
Authors (4): Cutter Dawes, Simon Segert, Kamesh Krishnamurthy, Jonathan D. Cohen
A major challenge in the use of neural networks both for modeling human cognitive function and for artificial intelligence is the design of systems with the capacity to efficiently learn functions that support radical generalization. At the root of this is the capacity to discover and implement symmetry functions. In this paper, we investigate a paradigmatic example of radical generalization through the use of symmetry: base addition. We present a group theoretic analysis of base addition, a fundamental and defining characteristic of which is the carry function – the transfer of the remainder, when a sum exceeds the base modulus, to the next significant place. Our analysis exposes a range of alternative carry functions for a given base, and we introduce quantitative measures to characterize these. We then exploit differences in carry functions to probe the inductive biases of neural networks in symmetry learning, by training neural networks to carry out base addition using different carries, and comparing efficacy and rate of learning as a function of their structure. We find that even simple neural networks can achieve radical generalization with the right input format and carry function, and that learnability is closely correlated with carry function structure. We then discuss the relevance this has for cognitive science and machine learning.
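The standard carry function described in this abstract can be made concrete with a short sketch of digit-wise addition in an arbitrary base. Only the familiar carry is shown; the alternative carry functions the paper analyzes are not reproduced here, and the helper names are hypothetical.

```python
def to_digits(n, base):
    """Digits of a non-negative integer, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits


def from_digits(digits, base):
    """Inverse of to_digits."""
    return sum(d * base ** i for i, d in enumerate(digits))


def add_in_base(a_digits, b_digits, base):
    """Add two digit lists (least significant first) using the standard carry."""
    result = []
    carry = 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        total = a + b + carry
        result.append(total % base)  # digit that stays in this place
        carry = total // base        # amount moved to the next place
    if carry:
        result.append(carry)
    return result


# Illustrative usage in base 10.
total_digits = add_in_base(to_digits(47, 10), to_digits(85, 10), 10)
```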
Article 361
Title@2025-07-16 (3): Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features
Title: Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features | Erweiterung OL-MDISF: Online-Lernen von Mix-Typed, Drifted und Unvollständige Streaming-Funktionen | OL-MDISF:从混ix-Typed、drifted和不完全流的特征网上学习 2507.10594v2 |
Authors (5): Shengda Zhuo, Di Wu, Yi He, Shuqiang Huang, Xindong Wu
Online learning, where feature spaces can change over time, offers a flexible learning paradigm that has attracted considerable attention. However, it still faces three significant challenges. First, the heterogeneity of real-world data streams with mixed feature types presents challenges for traditional parametric modeling. Second, data stream distributions can shift over time, causing an abrupt and substantial decline in model performance. Third, time and cost constraints make it infeasible to label every data instance in a supervised setting. To overcome these challenges, we propose a new algorithm, Online Learning from Mix-typed, Drifted, and Incomplete Streaming Features (OL-MDISF), which aims to relax restrictions on feature types, data distribution, and supervision information. Our approach involves utilizing copula models to create a comprehensive latent space, employing an adaptive sliding window for detecting drift points to ensure model stability, and establishing label proximity information based on geometric structural relationships. To demonstrate the model’s efficiency and effectiveness, we provide theoretical analysis and comprehensive experimental results. This extension serves as a standalone technical reference to the original OL-MDISF method. It provides (i) a contextual analysis of OL-MDISF within the broader landscape of online learning, covering recent advances in mixed-type feature modeling, concept drift adaptation, and weak supervision, and (ii) a comprehensive set of experiments across 14 real-world datasets under two types of drift scenarios. These include full CER trends, ablation studies, sensitivity analyses, and temporal ensemble dynamics. We hope this document can serve as a reproducible benchmark and technical resource for researchers working on nonstationary, heterogeneous, and weakly supervised data streams.
Article 362
Title@2025-07-16 (3): Regret Analysis for Randomized Gaussian Process Upper Confidence Bound
Title: Regret Analysis for Randomized Gaussian Process Upper Confidence Bound | Bedauerliche Analyse für Randomized Gaussian Prozess Oberes Vertrauen Gebunden | 对随机调整高斯进程最高信任圈的遗憾分析 2409.00979v3 |
Authors (3): Shion Takeno, Yu Inatsu, Masayuki Karasuyama
Gaussian process upper confidence bound (GP-UCB) is a theoretically established algorithm for Bayesian optimization (BO), where we assume the objective function $f$ follows a GP. One notable drawback of GP-UCB is that the theoretical confidence parameter $\beta$ increases along with the iterations and is too large. To alleviate this drawback, this paper analyzes the randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB), which uses a confidence parameter generated from the shifted exponential distribution. We analyze the expected regret and conditional expected regret, where the expectation and the probability are taken with respect to $f$ and the noise, and with respect to the randomness of the BO algorithm, respectively. In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite. Furthermore, we show that randomization plays a key role in avoiding an increase in the confidence parameter by showing that GP-UCB using a constant confidence parameter can incur linearly growing expected cumulative regret. Finally, we show numerical experiments using synthetic and benchmark functions and real-world emulators.
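The randomized selection rule described above can be sketched for a finite input domain. The sketch assumes the GP posterior means and standard deviations are already available and treats the shift and rate of the shifted exponential distribution as free parameters; the toy values below are illustrative and are not the constants analyzed in the paper. Function names are hypothetical.

```python
import math
import random


def sample_confidence(shift, rate, rng):
    """Draw beta = shift + Exponential(rate), a shifted exponential sample."""
    return shift + rng.expovariate(rate)


def randomized_ucb_choice(means, stds, shift, rate, rng):
    """Pick the candidate maximizing mean + sqrt(beta) * std with a random beta.

    means, stds: posterior summaries for each point of a finite domain
    (assumed to come from a GP fitted elsewhere). shift must be non-negative.
    """
    beta = sample_confidence(shift, rate, rng)
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, beta


# Toy posterior over three candidate points (illustrative numbers only).
rng = random.Random(0)
choice, beta = randomized_ucb_choice(
    means=[0.1, 0.5, 0.3], stds=[0.4, 0.1, 0.2], shift=1.0, rate=0.5, rng=rng
)
```

Because beta is redrawn at every call, the rule does not rely on a schedule that grows with the iteration count, which is the property the abstract highlights.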
Article 363
Title@2025-07-16 (3): Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving
Title: Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving | Proaktive Intra-GPU-Disaggregation von Prefill und Decode in LLM Serving | 预填和解除LLM服务中编码的预填和分解 2507.06608v4 |
Authors (3): Xiaoxiang Shi, Colin Cai, Junjia Du
Monolithic serving with chunked prefill improves GPU utilization by batching prefill and decode together, but suffers from fine-grained phase interference. Engine-level prefill-decode (PD) disaggregation avoids interference but incurs higher hardware and coordination overhead. Prior intra-GPU disaggregation approaches multiplex prefill and decode within a single GPU, using SLO-based tuning guided by heuristics from offline profiling or reactive feedback loops. However, these methods respond reactively to performance issues rather than anticipating them, limiting adaptability under dynamic workloads. We ask: can we achieve proactive intra-GPU disaggregation that adapts effectively to dynamic workloads? The key challenge lies in managing the conflicting resource demands of prefill and decode under varying conditions. We first show that GPU resources exhibit diminishing returns – beyond a saturation point, more allocation yields minimal latency benefit. Second, we observe that memory bandwidth contention becomes a critical bottleneck. These insights motivate a design that dynamically partitions GPU resources across prefill and decode phases, while jointly considering compute capacity, memory footprint, and bandwidth contention. Evaluated on diverse LLMs and workloads, our system Nexus achieves up to 2.2x higher throughput, 20x lower TTFT, and 2.5x lower TBT than vLLM; outperforms SGLang by up to 2x; and matches or exceeds disaggregated vLLM.
Article 364
Title@2025-07-16 (3): BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part I: PDE-Constrained Optimization
Title: BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part I: PDE-Constrained Optimization | BiLO: Zweistufiges lokales Operator-Lernen für inverse PDE-Probleme. Teil I: PDE-Kontrainierte Optimierung | BILO: 双级当地操作员学习PDE反问题。 第一部分:受PDE约束的优化 2404.17789v5 |
Authors (4): Ray Zirui Zhang, Christopher E. Miles, Xiaohui Xie, John S. Lowengrub
We propose a new neural network based method for solving inverse problems for partial differential equations (PDEs) by formulating the PDE inverse problem as a bilevel optimization problem. At the upper level, we minimize the data loss with respect to the PDE parameters. At the lower level, we train a neural network to locally approximate the PDE solution operator in the neighborhood of a given set of PDE parameters, which enables an accurate approximation of the descent direction for the upper level optimization problem. The lower level loss function includes the L2 norms of both the residual and its derivative with respect to the PDE parameters. We apply gradient descent simultaneously on both the upper and lower level optimization problems, leading to an effective and fast algorithm. The method, which we refer to as BiLO (Bilevel Local Operator learning), is also able to efficiently infer unknown functions in the PDEs through the introduction of an auxiliary variable. We provide a theoretical analysis that justifies our approach. Through extensive experiments over multiple PDE systems, we demonstrate that our method enforces strong PDE constraints, is robust to sparse and noisy data, and eliminates the need to balance the residual and the data loss, which is inherent to the soft PDE constraints in many existing methods.
Article 365
Title@2025-07-16 (3): Symbiosis: Multi-Adapter Inference and Fine-Tuning
Title: Symbiosis: Multi-Adapter Inference and Fine-Tuning | Symbiose: Multi-Adapter-Schlussfolgerung und Feinabstimmung | 共生关系:多位开发商的推断和精准调整 2507.03220v2 |
Authors (4): Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
Parameter-efficient fine-tuning (PEFT) allows model builders to capture task-specific parameters in adapters, which are a fraction of the size of the original base model. The popularity of PEFT techniques for fine-tuning has led to the creation of a large number of adapters for popular Large Language Models (LLMs). However, existing frameworks fall short in supporting inference or fine-tuning with multiple adapters in the following ways. 1) For fine-tuning, each job needs to deploy its own dedicated base model instance, which results in excessive GPU memory consumption and poor GPU utilization. 2) While popular inference platforms can serve multiple PEFT adapters, they do not allow independent resource management or mixing of different PEFT methods. 3) They cannot share resources (such as a base model instance) between inference and fine-tuning jobs. 4) They do not provide privacy to users who may not wish to expose their fine-tuned parameters to service providers. In Symbiosis, we address the above problems by enabling as-a-service deployment of the base model. The base model layers can be shared across multiple inference or fine-tuning processes. Our split-execution technique decouples the execution of client-specific adapters and layers from the frozen base model layers, offering clients the flexibility to manage their resources, select their fine-tuning method, and achieve their performance goals. Our approach is transparent to models and works out-of-the-box for most models in the transformers library. Our evaluation on Llama2-13B shows that, compared to the baseline, Symbiosis can fine-tune 4X more adapters on the same set of GPUs in the same amount of time.
Article 366
Title@2025-07-16 (3): MNIST-Gen: A Modular MNIST-Style Dataset Generation Using Hierarchical Semantics, Reinforcement Learning, and Category Theory
Title: MNIST-Gen: A Modular MNIST-Style Dataset Generation Using Hierarchical Semantics, Reinforcement Learning, and Category Theory | MNIST-Gen: Eine modulare MNIST-Style-Datensatz-Generation mit Hierarchischer Semantik, Verstärkungslernen und Kategorietheorie | MNIST-Gen:利用等级的语义、强化学习和分类理论,形成一个Modular MNIST-Style数据集 2507.11821v1 |
Authors (3): Pouya Shaeri, Arash Karimi, Ariane Middel
Neural networks are often benchmarked using standard datasets such as MNIST, FashionMNIST, or other variants of MNIST, which, while accessible, are limited to generic classes such as digits or clothing items. For researchers working on domain-specific tasks, such as classifying trees, food items, or other real-world objects, these datasets are insufficient and irrelevant. Additionally, creating and publishing a custom dataset can be time consuming, legally constrained, or beyond the scope of individual projects. We present MNIST-Gen, an automated, modular, and adaptive framework for generating MNIST-style image datasets tailored to user-specified categories using hierarchical semantic categorization. The system combines CLIP-based semantic understanding with reinforcement learning and human feedback to achieve intelligent categorization with minimal manual intervention. Our hierarchical approach supports complex category structures with semantic characteristics, enabling fine-grained subcategorization and multiple processing modes: individual review for maximum control, smart batch processing for large datasets, and fast batch processing for rapid creation. Inspired by category theory, MNIST-Gen models each data transformation stage as a composable morphism, enhancing clarity, modularity, and extensibility. As proof of concept, we generate and benchmark two novel datasets, Tree-MNIST and Food-MNIST, demonstrating MNIST-Gen’s utility for producing task-specific evaluation data while achieving 85% automatic categorization accuracy and 80% time savings compared to manual approaches.
Article 367
Title@2025-07-16 (3): SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
Title: SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling | SynCoGen: Synthesizable 3D-Molekül-Generation über Gelenkreaktion und Koordinatenmodellierung | SynCoGen:通过联合反应和协调建模,同步可3D分子生成 2507.11818v1 |
Authors (9): Andrei Rekesh, Miruna Cretu, Dmytro Shevchuk, Vignesh Ram Somnath, Pietro Liò, Robert A. Batey, Mike Tyers, Michał Koziarski, Cheng-Hao Liu
Ensuring synthesizability in generative small molecule design remains a major challenge. While recent developments in synthesizable molecule generation have demonstrated promising results, these efforts have been largely confined to 2D molecular graph representations, limiting the ability to perform geometry-based conditional generation. In this work, we present SynCoGen (Synthesizable Co-Generation), a single framework that combines simultaneous masked graph diffusion and flow matching for synthesizable 3D molecule generation. SynCoGen samples from the joint distribution of molecular building blocks, chemical reactions, and atomic coordinates. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers. SynCoGen achieves state-of-the-art performance in unconditional small molecule graph and conformer generation, and the model delivers competitive performance in zero-shot molecular linker design for protein ligand generation in drug discovery. Overall, this multimodal formulation represents a foundation for future applications enabled by non-autoregressive molecular generation, including analog expansion, lead optimization, and direct structure conditioning.
Article 368
Title@2025-07-16 (3): Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
Title: Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models | Nachvollziehen von Fakten oder nur Kopien? Eine kritische Untersuchung der Wettbewerbe von Mechanismen in großen Sprachmodellen | 对大语言模式机制竞争情况的重要调查 2507.11809v1 |
Authors (4): Dante Campregher, Yanxu Chen, Sander Hoffman, Maria Heuss
This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies, by Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al., that investigate the competition between model-learned facts and contradictory context information through Mechanistic Interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads’ suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, as strengthening them can also inhibit correct facts. Additionally, we show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
Article 369
Title@2025-07-16 (3): CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels
Title: CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels | CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategie zum Lernen mit lauteren Etiketten | CLID-MU:跨行业信息差异:基于跨行业信息差异的Met Met 最新学习战略,有噪音标签 2507.11807v1 |
Authors (4): Ruofan Hu, Dongyu Zhang, Huayi Zhang, Elke Rundensteiner
Learning with noisy labels (LNL) is essential for training deep neural networks with imperfect data. Meta-learning approaches have achieved success by using a clean unbiased labeled set to train a robust model. However, this approach heavily depends on the availability of a clean labeled meta-dataset, which is difficult to obtain in practice. In this work, we thus tackle the challenge of meta-learning for noisy label scenarios without relying on a clean labeled dataset. Our approach leverages the data itself while bypassing the need for labels. Building on the insight that clean samples effectively preserve the consistency of related data structures across the last hidden and the final layer, whereas noisy samples disrupt this consistency, we design the Cross-layer Information Divergence-based Meta Update Strategy (CLID-MU). CLID-MU leverages the alignment of data structures across these diverse feature spaces to evaluate model performance and uses this alignment to guide training. Experiments on benchmark datasets with varying amounts of labels under both synthetic and real-world noise demonstrate that CLID-MU outperforms state-of-the-art methods. The code is released at https://github.com/ruofanhu/CLID-MU.
Article 370
Title@2025-07-16 (3): MOFSimBench: Evaluating Universal Machine Learning Interatomic Potentials In Metal–Organic Framework Molecular Modeling
Title: MOFSimBench: Evaluating Universal Machine Learning Interatomic Potentials In Metal–Organic Framework Molecular Modeling | MOFSimBench: Bewertung der interatomaren Potentiale des universellen maschinellen Lernens in Metall–Organic Framework Molecular Modeling | MOFSimBench:评价金属-有机框架中的通用机器学习和相互作用潜力 2507.11806v1 |
Authors (3): Hendrik Kraß, Ju Huang, Seyed Mohamad Moosavi
Universal machine learning interatomic potentials (uMLIPs) have emerged as powerful tools for accelerating atomistic simulations, offering scalable and efficient modeling with accuracy close to quantum calculations. However, their reliability and effectiveness in practical, real-world applications remain an open question. Metal-organic frameworks (MOFs) and related nanoporous materials are highly porous crystals with critical relevance in carbon capture, energy storage, and catalysis applications. Modeling nanoporous materials presents distinct challenges for uMLIPs due to their diverse chemistry, their structural complexity, including porosity and coordination bonds, and their absence from existing training datasets. Here, we introduce MOFSimBench, a benchmark to evaluate uMLIPs on key materials modeling tasks for nanoporous materials, including structural optimization, molecular dynamics (MD) stability, the prediction of bulk properties, such as bulk modulus and heat capacity, and guest-host interactions. Evaluating over 20 models from various architectures on a chemically and structurally diverse materials set, we find that top-performing uMLIPs consistently outperform classical force fields and fine-tuned machine learning potentials across all tasks, demonstrating their readiness for deployment in nanoporous materials modeling. Our analysis highlights that data quality, particularly the diversity of training sets and inclusion of out-of-equilibrium conformations, plays a more critical role than model architecture in determining performance across all evaluated uMLIPs. We release our modular and extendable benchmarking framework at https://github.com/AI4ChemS/mofsim-bench, providing an open resource to guide the adoption of uMLIPs for nanoporous materials modeling and their further development.
Article 371
Title@2025-07-15 (2): Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation
Title: Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation | Verstärkung der latenten euklidischen Geometrie in Single-Cell VAEs für Manifold Interpolation | 在单细胞VAEs 中执行中流的欧洲立地磷化物几何测量以用于 MManided Indigitation 2507.11789v1 |
Authors (5): Alessandro Palma, Sergei Rybakov, Leon Hetzel, Stephan Günnemann, Fabian J. Theis
Latent space interpolations are a powerful tool for navigating deep generative models in applied settings. An example is single-cell RNA sequencing, where existing methods model cellular state transitions as latent space interpolations with variational autoencoders, often assuming linear shifts and Euclidean geometry. However, unless explicitly enforced, linear interpolations in the latent space may not correspond to geodesic paths on the data manifold, limiting methods that assume Euclidean geometry in the data representations. We introduce FlatVI, a novel training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry, specifically tailored for modelling single-cell count data. By encouraging straight lines in the latent space to approximate geodesic interpolations on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches that assume Euclidean latent geometry. Experiments on synthetic data support the theoretical soundness of our approach, while applications to time-resolved single-cell RNA sequencing data demonstrate improved trajectory reconstruction and manifold interpolation.
Article 372
Title@2025-07-15 (2): Lost in Transmission: When and Why LLMs Fail to Reason Globally
Title: Lost in Transmission: When and Why LLMs Fail to Reason Globally | Verloren in der Übertragung: Wann und warum LLMs weltweit nicht vernünftig sind | LLLM女士何时和为何未能达到全球范围的理由 2505.08140v3 |
Authors (4): Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville
Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one. Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.
Article 373
Title@2025-07-15 (2): Metalic: Meta-Learning In-Context with Protein Language Models
Title: Metalic: Meta-Learning In-Context with Protein Language Models | Metalic: Meta-Learning im Kontext mit Protein-Sprachmodellen | 金属:使用蛋白素语言模型的元学习内文 2410.08355v3 |
Authors (7): Jacob Beck, Shikha Surana, Manus McAuliffe, Oliver Bent, Thomas D. Barrett, Juan Jose Garau Luis, Paul Duckworth
Predicting the biophysical and functional properties of proteins is essential for in silico protein design. Machine learning has emerged as a promising technique for such prediction tasks. However, the relative scarcity of in vitro annotations means that these models often have little, or no, specific data on the desired fitness prediction task. As a result of limited data, protein language models (PLMs) are typically trained on general protein sequence modeling tasks, and then fine-tuned, or applied zero-shot, to protein fitness prediction. When no task data is available, the models make strong assumptions about the correlation between the protein sequence likelihood and fitness scores. In contrast, we propose meta-learning over a distribution of standard fitness prediction tasks, and demonstrate positive transfer to unseen fitness prediction tasks. Our method, called Metalic (Meta-Learning In-Context), uses in-context learning and fine-tuning, when data is available, to adapt to new tasks. Crucially, fine-tuning enables considerable generalization, even though it is not accounted for during meta-training. Our fine-tuned models achieve strong results with 18 times fewer parameters than state-of-the-art models. Moreover, our method sets a new state-of-the-art in low-data settings on ProteinGym, an established fitness-prediction benchmark. Due to data scarcity, we believe meta-learning will play a pivotal role in advancing protein engineering.
Article 374
Title@2025-07-15 (2): Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
Title: Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks | Implizite Bias des gradienten Abstiegs für nicht-homogene Deep Networks | 非齐次深度网络梯度下降的隐式偏差 2502.16075v2 |
Authors (6): Yuhang Cai, Kangjie Zhou, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett
We establish the asymptotic implicit bias of gradient descent (GD) for generic non-homogeneous deep networks under exponential loss. Specifically, we characterize three key properties of GD iterates starting from a sufficiently small empirical risk, where the threshold is determined by a measure of the network’s non-homogeneity. First, we show that a normalized margin induced by the GD iterates increases nearly monotonically. Second, we prove that while the norm of the GD iterates diverges to infinity, the iterates themselves converge in direction. Finally, we establish that this directional limit satisfies the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem. Prior works on implicit bias have focused exclusively on homogeneous networks; in contrast, our results apply to a broad class of non-homogeneous networks satisfying a mild near-homogeneity condition. In particular, our results apply to networks with residual connections and non-homogeneous activation functions, thereby resolving an open problem posed by Ji and Telgarsky (2020).
Article 375
Title@2025-07-15 (2): Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions
Title: Foundation Models for Brain Signals: A Critical Review of Current Progress and Future Directions | Grundlagenmodelle für Gehirnsignale: Ein kritischer Überblick über aktuelle Fortschritte und zukünftige Richtungen | 脑信号基础模型:对当前进展和未来方向的重要审查 2507.11783v1 |
Authors (3): Gayal Kuruppu, Neeraj Wagh, Yogatheesan Varatharajah
Patterns of electrical brain activity recorded via electroencephalography (EEG) offer immense value for scientific and clinical investigations. The inability of supervised EEG encoders to learn robust EEG patterns and their over-reliance on expensive signal annotations have sparked a transition towards general-purpose self-supervised EEG encoders, i.e., EEG foundation models (EEG-FMs), for robust and scalable EEG feature extraction. However, the real-world readiness of early EEG-FMs and the rubric for long-term research progress remain unclear. A systematic and comprehensive review of first-generation EEG-FMs is therefore necessary to understand the current state-of-the-art and identify key directions for future EEG-FMs. To that end, this study reviews 10 early EEG-FMs and presents a critical synthesis of their methodology, empirical findings, and outstanding research gaps. We find that most EEG-FMs adopt a sequence-based modeling scheme that relies on transformer-based backbones and the reconstruction of masked sequences for self-supervision. However, model evaluations remain heterogeneous and largely limited, making it challenging to assess their practical off-the-shelf utility. In addition to adopting standardized and realistic evaluations, future work should demonstrate more substantial scaling effects and make principled and trustworthy choices throughout the EEG representation learning pipeline. We believe that developing benchmarks, software tools, technical methodologies, and applications in collaboration with domain experts may further advance the translational utility and real-world adoption of EEG-FMs.
Article 376
Title@2025-07-15 (2): Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
Title: Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction | Generalisierte Venn- und Venn-Abers-Kalibrierung mit Anwendungen in konformer Vorhersage | 通用文文和文安-用非正式预测对应用进行校准 2502.05676v3 |
Authors (2): Lars van der Laan, Ahmed Alaa
Ensuring model calibration is critical for reliable prediction, yet popular distribution-free methods such as histogram binning and isotonic regression offer only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration that extends Vovk’s approach beyond binary classification to a broad class of prediction problems defined by generic loss functions. Our method transforms any perfectly in-sample calibrated predictor into a set-valued predictor that, in finite samples, outputs at least one marginally calibrated point prediction. These set predictions shrink asymptotically and converge to a single conditionally calibrated prediction, capturing epistemic uncertainty. We further propose Venn multicalibration, a new approach for achieving finite-sample calibration across subpopulations. For quantile loss, our framework recovers group-conditional and multicalibrated conformal prediction as special cases and yields novel prediction intervals with quantile-conditional coverage.
Article 377
Title@2025-07-15 (2): Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing
Title: Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing | Schlussfolgerung zu optimalen Policy Values und anderen irregulären Funktionen durch Glätten | 通过平滑对最佳政策价值和其他不正常功能的推论 2507.11780v1 |
Authors (3): Justin Whitehouse, Morgane Austern, Vasilis Syrgkanis
Constructing confidence intervals for the value of an optimal treatment policy is an important problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Existing approaches for handling this non-differentiability fall roughly into two camps. In one camp are estimators based on constructing smooth approximations of the optimal value. These approaches are computationally lightweight, but typically place unrealistic parametric assumptions on outcome regressions. In another camp are approaches that directly de-bias the non-smooth objective. These approaches don’t place parametric assumptions on nuisance functions, but they either require the computation of intractably-many nuisance estimates, assume unrealistic $L^\infty$ nuisance convergence rates, or make strong margin assumptions that prohibit non-response to a treatment. In this paper, we revisit the problem of constructing smooth approximations of non-differentiable functionals. By carefully controlling first-order bias and second-order remainders, we show that a softmax smoothing-based estimator can be used to estimate parameters that are specified as a maximum of scores involving nuisance components. In particular, this includes the value of the optimal treatment policy as a special case. Our estimator obtains $\sqrt{n}$ convergence rates, avoids parametric restrictions/unrealistic margin assumptions, and is often statistically efficient.
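To make the smoothing idea concrete, here is a minimal sketch of a softmax-weighted surrogate for a maximum of scores. It is an illustration under assumptions, not the paper's estimator: the function name, the temperature parameter `beta`, and the example scores are hypothetical, and the exact smoothing and de-biasing used by the authors may differ.

```python
import numpy as np

def softmax_smoothed_max(scores, beta):
    """Softmax-weighted average of scores; approaches max(scores) as beta grows.

    `beta` is a hypothetical inverse-temperature parameter chosen for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    shifted = beta * (scores - scores.max())
    weights = np.exp(shifted)
    weights /= weights.sum()
    return float(np.dot(weights, scores))

# Hypothetical per-treatment scores (e.g. estimated mean outcomes).
scores = [0.2, 0.5, 0.45]
for beta in (1.0, 10.0, 100.0):
    print(beta, softmax_smoothed_max(scores, beta))
```

The surrogate is differentiable in the scores for any finite `beta`, which is the property that the hard maximum lacks; larger `beta` reduces the smoothing bias at the cost of a less smooth functional.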
Article 378
Title@2025-07-15 (2): Predicting Delayed Trajectories Using Network Features: A Study on the Dutch Railway Network
Title: Predicting Delayed Trajectories Using Network Features: A Study on the Dutch Railway Network | Vorhersage verzögerter Bahnen mit Netzwerkmerkmalen: Eine Studie zum niederländischen Eisenbahnnetz | 利用网络特点预测延迟轨道:关于荷兰铁路网的研究 2507.11776v1 |
Authors (2): Merel Kampere, Ali Mohammed Mansoor Alsahag
The Dutch railway network is one of the busiest in the world, with delays being a prominent concern for the principal passenger railway operator NS. This research addresses a gap in delay prediction studies within the Dutch railway network by employing an XGBoost Classifier with a focus on topological features. Current research predominantly emphasizes short-term predictions and neglects the broader network-wide patterns essential for mitigating ripple effects. This research implements and improves an existing methodology, originally designed to forecast the evolution of the fast-changing US air network, to predict delays in the Dutch Railways. By integrating Node Centrality Measures and comparing multiple classifiers like RandomForest, DecisionTree, GradientBoosting, AdaBoost, and LogisticRegression, the goal is to predict delayed trajectories. However, the results reveal limited performance, especially in non-simultaneous testing scenarios, suggesting the necessity for more context-specific adaptations. Regardless, this research contributes to the understanding of transportation network evaluation and proposes future directions for developing more robust predictive models for delays.
Article 379
Title@2025-07-15 (2): Scaling laws for activation steering with Llama 2 models and refusal mechanisms
Title: Scaling laws for activation steering with Llama 2 models and refusal mechanisms | Skalierungsgesetze für die Aktivierungssteuerung mit Llama 2 Modellen und Ablehnungsmechanismen | 以Llama 2模式和拒绝机制启动指导的法律 2507.11771v1 |
Authors (6): Sheikh Abdur Raheem Ali, Justin Xu, Ivory Yang, Jasmine Xinze Li, Ayse Arslan, Clark Benham
As large language models (LLMs) evolve in complexity and capability, the efficacy of less widely deployed alignment techniques is uncertain. Building on previous work on activation steering and contrastive activation addition (CAA), this paper explores the effectiveness of CAA with model scale using the family of Llama 2 models (7B, 13B, and 70B). CAA works by finding desirable ‘directions’ in the model’s residual stream vector space using contrastive pairs (for example, hate to love) and adding this direction to the residual stream during the forward pass. It directly manipulates the residual stream and aims to extract features from language models to better control their outputs. Using answer matching questions centered around the refusal behavior, we found that (1) CAA is most effective when applied at early-mid layers; (2) the effectiveness of CAA diminishes with model size; and (3) negative steering has more pronounced effects than positive steering across all model sizes.
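The mechanism described above can be sketched with plain arrays. This is an assumption-laden toy rather than the authors' code: the activations are random stand-ins for residual-stream vectors at one layer, and `steering_vector`, `apply_steering`, and `multiplier` are hypothetical names.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Mean difference between activations on the two sides of contrastive pairs."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def apply_steering(residual, vector, multiplier):
    """Add the scaled direction to every token position of the residual stream."""
    return residual + multiplier * vector

rng = np.random.default_rng(0)
d_model = 8  # toy hidden size, not a Llama 2 dimension
# Stand-ins for layer activations collected on contrastive prompt pairs.
pos = rng.normal(loc=1.0, size=(16, d_model))
neg = rng.normal(loc=-1.0, size=(16, d_model))
v = steering_vector(pos, neg)

resid = rng.normal(size=(4, d_model))  # 4 token positions
# A negative multiplier steers away from the "positive" behavior.
steered = apply_steering(resid, v, multiplier=-1.0)
print(steered.shape)
```

In a real model the addition would happen inside the forward pass at a chosen layer; the sign and size of the multiplier correspond to the positive and negative steering compared in the abstract.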
Article 380
Title@2025-07-15 (2): LLMs are Bayesian, in Expectation, not in Realization
Title: LLMs are Bayesian, in Expectation, not in Realization | LLMs sind Bayesian, in Erwartung, nicht in der Realisierung | LLMs是巴耶斯人、期望、而不是实现的巴耶斯人。 2507.11768v1 |
Authors (4): Leon Chlon, Sarah Rashidi, Zein Khamis, MarcAntonio M. Awada
Large language models demonstrate remarkable in-context learning capabilities, adapting to new tasks without parameter updates. While this phenomenon has been successfully modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) we derive the optimal chain-of-thought length as $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, providing a principled approach to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99% of theoretical entropy limits within 20 examples. Our framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and optimizing computational efficiency in deployment.
Article 381
Title@2025-07-15 (2): Differentially Private Conformal Prediction via Quantile Binary Search
Title: Differentially Private Conformal Prediction via Quantile Binary Search | Differential private konforme Vorhersage über Quantile Binary Search | 通过量度二进制搜索的不同私人 2507.12497v1 |
Authors (2): Ogonnaya M. Romanus, Roberto Molinari
Most Differentially Private (DP) approaches focus on limiting privacy leakage from learners based on the data that they are trained on; fewer approaches consider leakage when procedures involve a calibration dataset, which is common in uncertainty quantification methods such as Conformal Prediction (CP). Since there is a limited number of approaches in this direction, in this work we deliver a general DP approach for CP that we call Private Conformity via Quantile Search (P-COQS). The proposed approach adapts an existing randomized binary search algorithm for computing DP quantiles in the calibration phase of CP, thereby guaranteeing privacy of the consequent prediction sets. This, however, comes at the price of slightly under-covering with respect to the desired $(1 - \alpha)$-level when using finite-sample calibration sets (although broad empirical results show that P-COQS generally targets the required level in the considered cases). Confirming properties of the adapted algorithm and quantifying the approximate coverage guarantees of the consequent CP, we conduct extensive experiments to examine the effects of privacy noise, sample size and significance level on the performance of our approach compared to existing alternatives. In addition, we empirically evaluate our approach on several benchmark datasets, including CIFAR-10, ImageNet and CoronaHack. Our results suggest that the proposed method is robust to privacy noise and performs favorably with respect to the current DP alternative in terms of empirical coverage, efficiency, and informativeness. Specifically, the results indicate that P-COQS produces smaller conformal prediction sets while simultaneously targeting the desired coverage and privacy guarantees in all these experimental settings.
Article 382
Title@2025-07-15 (2): SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation
Title: SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation | SAMO: Ein leicht schärfer und bewusster Ansatz für die Multi-Task-Optimierung mit gemeinsamer Global-Local-Perturbation | SAMO: 与全球-地方联合干扰进行多任务优化的轻量级锐锐利软件方法 2507.07883v3 |
Authors (3): Hao Ban, Gokul Ram Subramani, Kaiyi Ji
Multi-task learning (MTL) enables a joint model to capture commonalities across multiple tasks, reducing computation costs and improving data efficiency. However, a major challenge in MTL optimization is task conflicts, where the task gradients differ in direction or magnitude, limiting model performance compared to single-task counterparts. Sharpness-aware minimization (SAM) minimizes task loss while simultaneously reducing the sharpness of the loss landscape. Our empirical observations show that SAM effectively mitigates task conflicts in MTL. Motivated by these findings, we explore integrating SAM into MTL but face two key challenges. While both the average loss gradient and individual task gradients, referred to as global and local information, contribute to SAM, how to combine them remains unclear. Moreover, directly computing each task gradient introduces significant computational and memory overheads. To address these challenges, we propose SAMO, a lightweight Sharpness-Aware Multi-task Optimization approach that leverages a joint global-local perturbation. The local perturbations are approximated using only forward passes and are layerwise normalized to improve efficiency. Extensive experiments on a suite of multi-task benchmarks demonstrate both the effectiveness and efficiency of our method. Code is available at https://github.com/OptMN-Lab/SAMO.
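For readers unfamiliar with the SAM update that SAMO builds on, the following is a minimal sketch of a single-task SAM step on a toy quadratic loss. It shows plain SAM only, not SAMO's joint global-local perturbation or its forward-pass approximation; the loss, the learning rate `lr`, and the radius `rho` are illustrative assumptions.

```python
import numpy as np

A = np.diag([1.0, 10.0])  # toy quadratic loss 0.5 * w^T A w (illustrative only)

def loss(w):
    return 0.5 * float(w @ A @ w)

def grad(w):
    return A @ w

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update: perturb along the normalized gradient, then descend."""
    g = grad_fn(w)
    # Ascent direction scaled to radius rho; the small constant avoids division by zero.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed point drives the actual update.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(50):
    w = sam_step(w, grad)
print(loss(w0), loss(w))
```

Each SAM step needs two gradient evaluations, which is the overhead that becomes costly when repeated per task and that motivates the lightweight approximation described in the abstract.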
Article 383
Title@2025-07-15 (2): Torsional-GFN: a conditional conformation generator for small molecules
Title: Torsional-GFN: a conditional conformation generator for small molecules | Torsional-GFN: ein konditionaler Exterieurgenerator für kleine Moleküle | Torsional-GFN:小型分子的有条件整装发电机 2507.11759v1 |
Authors (8): Alexandra Volokhova, Léna Néhale Ezzine, Piotr Gaiński, Luca Scimeca, Emmanuel Bengio, Prudencio Tossou, Yoshua Bengio, Alex Hernandez-Garcia
Generating stable molecular conformations is crucial in several drug discovery applications, such as estimating the binding affinity of a molecule to a target. Recently, generative machine learning methods have emerged as a promising, more efficient method than molecular dynamics for sampling of conformations from the Boltzmann distribution. In this paper, we introduce Torsional-GFN, a conditional GFlowNet specifically designed to sample conformations of molecules proportionally to their Boltzmann distribution, using only a reward function as training signal. Conditioned on a molecular graph and its local structure (bond lengths and angles), Torsional-GFN samples rotations of its torsion angles. Our results demonstrate that Torsional-GFN is able to sample conformations approximately proportional to the Boltzmann distribution for multiple molecules with a single model, and allows for zero-shot generalization to unseen bond lengths and angles coming from the MD simulations for such molecules. Our work presents a promising avenue for scaling the proposed approach to larger molecular systems, achieving zero-shot generalization to unseen molecules, and including the generation of the local structure into the GFlowNet model.
Article 384
Title@2025-07-15 (2): FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
Title: FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making | FOUNDER: Erdungs-Stiftungsmodelle in Weltmodellen für offene, einkörperige Entscheidungsfindung | FOUNDER: 以世界不限名额作出不限 2507.12496v1 |
Authors (5): Yucen Wang, Rui Yu, Shenghua Wan, Le Gan, De-Chuan Zhan
Foundation Models (FMs) and World Models (WMs) offer complementary strengths in task generalization at different levels. In this work, we propose FOUNDER, a framework that integrates the generalizable knowledge embedded in FMs with the dynamic modeling capabilities of WMs to enable open-ended task solving in embodied environments in a reward-free manner. We learn a mapping function that grounds FM representations in the WM state space, effectively inferring the agent’s physical states in the world simulator from external observations. This mapping enables the learning of a goal-conditioned policy through imagination during behavior learning, with the mapped task serving as the goal state. Our method leverages the predicted temporal distance to the goal state as an informative reward signal. FOUNDER demonstrates superior performance on various multi-task offline visual control benchmarks, excelling in capturing the deep-level semantics of tasks specified by text or videos, particularly in scenarios involving complex observations or domain gaps where prior methods struggle. The consistency of our learned reward function with the ground-truth reward is also empirically validated. Our project website is https://sites.google.com/view/founder-rl.
Article 385
Title@2025-07-15 (2): A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction
Title: A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction | Ein Graph-in-Graph-Lernrahmen für die Vorhersage von Drogen-Target-Interaktion | 药物-目标互动预测图示-格图学习框架 2507.11757v1 |
Authors (2): Yuehua Song, Yong Gao
Accurately predicting drug-target interactions (DTIs) is pivotal for advancing drug discovery and target validation techniques. While machine learning approaches including those that are based on Graph Neural Networks (GNN) have achieved notable success in DTI prediction, many of them have difficulties in effectively integrating the diverse features of drugs, targets and their interactions. To address this limitation, we introduce a novel framework to take advantage of the power of both transductive learning and inductive learning so that features at molecular level and drug-target interaction network level can be exploited. Within this framework is a GNN-based model called Graph-in-Graph (GiG) that represents graphs of drug and target molecular structures as meta-nodes in a drug-target interaction graph, enabling a detailed exploration of their intricate relationships. To evaluate the proposed model, we have compiled a special benchmark comprising drug SMILES, protein sequences, and their interaction data, which is interesting in its own right. Our experimental results demonstrate that the GiG model significantly outperforms existing approaches across all evaluation metrics, highlighting the benefits of integrating different learning paradigms and interaction data.
Article 386
Title@2025-07-15 (2): AKReF: An argumentative knowledge representation framework for structured argumentation
Title: AKReF: An argumentative knowledge representation framework for structured argumentation | AKREF: Ein argumentativer Wissensvertretungsrahmen für strukturierte Argumentation | AKREF: 结构化论证的理论知识代表框架 2506.00713v3 |
Authors (2): Debarati Bhattacharjee, Ashish Anand
This paper presents a framework to convert argumentative texts into argument knowledge graphs (AKG). The proposed argumentative knowledge representation framework (AKReF) extends the theoretical foundation and enables the AKG to provide a graphical view of the argumentative structure that is easier to understand. Starting with basic annotations of argumentative components (ACs) and argumentative relations (ARs), we enrich the information by constructing a knowledge base (KB) graph with metadata attributes for nodes. Next, we apply modus ponens on premises and inference rules from the KB to form arguments. From these arguments, we create an AKG. The nodes and edges of the AKG have attributes capturing key argumentative features such as the type of premise (e.g., axiom, ordinary premise, assumption), the type of inference rule (e.g., strict, defeasible), preference order over defeasible rules, markers (e.g., “therefore”, “however”), and the type of attack (e.g., undercut, rebuttal, undermining). We identify inference rules by locating a specific set of markers, called inference markers (IM). This, in turn, makes it possible to identify undercut attacks previously undetectable in existing datasets. AKG prepares the ground for reasoning tasks, including checking the coherence of arguments and identifying opportunities for revision. For this, it is essential to find indirect relations, many of which are implicit. Our proposed AKG format, with annotated inference rules and modus ponens, helps reasoning models learn the implicit, indirect relations that require inference over arguments and their interconnections. We use an essay from the AAEC dataset to illustrate the framework. We further show its application in complex analyses such as extracting a conflict-free set and a maximal set of admissible arguments.
Article 387
Title@2025-07-15 (2): Problem-dependent convergence bounds for randomized linear gradient compression
Title: Problem-dependent convergence bounds for randomized linear gradient compression | Problemabhängige Konvergenzgrenzen für randomisierte lineare Gradientenkompression | 随机的线性梯度压缩 2411.12898v3 |
Authors (3): Thomas Flynn, Patrick Johnstone, Shinjae Yoo
In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them Haar-distributed orthogonal matrices and matrices with random Gaussian entries. We find that the impact of compression on convergence can be quantified in terms of a smoothness matrix associated with the objective function, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem, and our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level and ignore problem data. We verify the theoretical findings experimentally, including fine-tuning an image classification model.
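A linear compression scheme of the kind described can be sketched as follows, assuming a Gaussian matrix with entries of variance 1/k so that the decompressed gradient is unbiased. The function names, dimensions, and scaling are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sample_compressor(d, k, rng):
    """k x d Gaussian matrix with variance 1/k, so that E[S^T S] = I_d."""
    return rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

def compress(S, g):
    """Send k numbers instead of d."""
    return S @ g

def decompress(S, c):
    """Receiver reconstructs with the transpose of the same random matrix."""
    return S.T @ c

rng = np.random.default_rng(0)
d, k = 20, 5           # toy sizes: 4x compression
g = rng.normal(size=d)  # stand-in for a gradient vector

# Averaging many independent reconstructions illustrates unbiasedness.
trials = 4000
recon_sum = np.zeros(d)
for _ in range(trials):
    S = sample_compressor(d, k, rng)
    recon_sum += decompress(S, compress(S, g))
mean_recon = recon_sum / trials
print(np.max(np.abs(mean_recon - g)))
```

A single reconstruction is noisy, and that variance is what translates into the iteration penalty discussed in the abstract; sender and receiver would typically share a random seed so that the matrix itself need not be transmitted.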
Article 388
Title@2025-07-15 (2): Sparse Identification of Nonlinear Dynamics with Conformal Prediction
Title: Sparse Identification of Nonlinear Dynamics with Conformal Prediction | Sparse Identifikation von nichtlinearen Dynamiken mit konformer Vorhersage | 以非正式预测对非线性动态的简单识别 2507.11739v1 |
Authors (1): Urban Fasel
The Sparse Identification of Nonlinear Dynamics (SINDy) is a method for discovering nonlinear dynamical system models from data. Quantifying uncertainty in SINDy models is essential for assessing their reliability, particularly in safety-critical applications. While various uncertainty quantification methods exist for SINDy, including Bayesian and ensemble approaches, this work explores the integration of Conformal Prediction, a framework that can provide valid prediction intervals with coverage guarantees based on minimal assumptions like data exchangeability. We introduce three applications of conformal prediction with Ensemble-SINDy (E-SINDy): (1) quantifying uncertainty in time series prediction, (2) model selection based on library feature importance, and (3) quantifying the uncertainty of identified model coefficients using feature conformal prediction. We demonstrate the three applications on stochastic predator-prey dynamics and several chaotic dynamical systems. We show that conformal prediction methods integrated with E-SINDy can reliably achieve desired target coverage for time series forecasting, effectively quantify feature importance, and produce more robust uncertainty intervals for model coefficients, even under non-Gaussian noise, compared to standard E-SINDy coefficient estimates.
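As background for the coverage guarantee mentioned above, here is a minimal sketch of generic split conformal prediction for a scalar forecast. It is not the E-SINDy-specific procedure from the paper: the absolute-residual score, the helper names, and the toy calibration residuals are assumptions made for illustration.

```python
import math
import numpy as np

def conformal_radius(cal_residuals, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest absolute calibration residual."""
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for the requested level.
        return float("inf")
    return float(scores[rank - 1])

def prediction_interval(point_prediction, radius):
    """Symmetric interval around a point forecast."""
    return point_prediction - radius, point_prediction + radius

# Toy calibration residuals (observed minus predicted) on held-out data.
cal_residuals = np.arange(1, 20)  # 19 residuals: 1, 2, ..., 19
radius = conformal_radius(cal_residuals, alpha=0.1)
print(radius, prediction_interval(100.0, radius))
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha; how the residuals are obtained from an ensemble of SINDy models is specific to the paper and is not shown here.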
Article 389
Title@2025-07-15 (2): Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning
Title: Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning | Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning | 以编码器嵌入式嵌入为改进节点学习提供动力的神经网络 2507.11732v1 |
Authors (4): Shiyu Chen, Cencheng Shen, Youngser Park, Carey E. Priebe
Graph neural networks (GNNs) have emerged as a powerful framework for a wide range of node-level graph learning tasks. However, their performance is often constrained by reliance on random or minimally informed initial feature representations, which can lead to slow convergence and suboptimal solutions. In this paper, we leverage a statistically grounded method, one-hot graph encoder embedding (GEE), to generate high-quality initial node features that enhance the end-to-end training of GNNs. We refer to this integrated framework as the GEE-powered GNN (GG), and demonstrate its effectiveness through extensive simulations and real-world experiments across both unsupervised and supervised settings. In node clustering, GG consistently achieves state-of-the-art performance, ranking first across all evaluated real-world datasets, while exhibiting faster convergence compared to the standard GNN. For node classification, we further propose an enhanced variant, GG-C, which concatenates the outputs of GG and GEE and outperforms competing baselines. These results confirm the importance of principled, structure-aware feature initialization in realizing the full potential of GNNs.
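The one-hot graph encoder embedding can be sketched in a few lines, under the assumption that it is computed as the adjacency matrix times a one-hot label matrix whose columns are normalized by class size. This reading of GEE, the function name, and the toy graph are assumptions; how GG feeds the result into a GNN is not shown.

```python
import numpy as np

def graph_encoder_embedding(adj, labels, num_classes):
    """Embed each node as its average connectivity to every class.

    Nodes with an unknown label (any value outside 0..num_classes-1)
    contribute nothing to the projection matrix.
    """
    n = adj.shape[0]
    W = np.zeros((n, num_classes))
    for k in range(num_classes):
        members = labels == k
        count = members.sum()
        if count > 0:
            # One-hot column for class k, scaled by 1 / (class size).
            W[members, k] = 1.0 / count
    return adj @ W

# Toy graph: two disconnected edges, one per class.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
labels = np.array([0, 0, 1, 1])
Z = graph_encoder_embedding(adj, labels, num_classes=2)
print(Z)
```

The resulting rows have one coordinate per class and could serve as initial node features in place of random ones, which is the role the abstract assigns to GEE.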
Article 390
Title@2025-07-15 (2): Globalization for Scalable Short-term Load Forecasting
Title: Globalization for Scalable Short-term Load Forecasting | Globalisierung für skalierbare kurzfristige Lastprognosen | 全球化促进可伸缩的短期负载预测 2507.11729v1 |
Authors (3): Amirhossein Ahmadi, Hamidreza Zareipour, Henry Leung
Forecasting load in power transmission networks is essential across various hierarchical levels, from the system level down to individual points of delivery (PoD). While intuitive and locally accurate, traditional local forecasting models (LFMs) face significant limitations, particularly in handling generalizability, overfitting, data drift, and the cold start problem. These methods also struggle with scalability, becoming computationally expensive and less efficient as the network’s size and data volume grow. In contrast, global forecasting models (GFMs) offer a new approach to enhance prediction generalizability, scalability, accuracy, and robustness through globalization and cross-learning. This paper investigates global load forecasting in the presence of data drifts, highlighting the impact of different modeling techniques and data heterogeneity. We explore feature-transforming and target-transforming models, demonstrating how globalization, data heterogeneity, and data drift affect each differently. In addition, we examine the role of globalization in peak load forecasting and its potential for hierarchical forecasting. To address data heterogeneity and the balance between globality and locality, we propose separate time series clustering (TSC) methods, introducing model-based TSC for feature-transforming models and new weighted instance-based TSC for target-transforming models. Through extensive experiments on a real-world dataset of Alberta’s electricity load, we demonstrate that global target-transforming models consistently outperform their local counterparts, especially when enriched with global features and clustering techniques. In contrast, global feature-transforming models face challenges in balancing local and global dynamics, often requiring TSC to manage data heterogeneity effectively.
Article 391
Title@2025-07-15 (2): Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop
Title: Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop | Benchmarking und Evaluation von KI-Modellen in der Biologie: Ergebnisse und Empfehlungen aus dem CZI Virtual Cells Workshop | 衡量和评价AI 生物学模型的基准和评估:CZI虚拟单元讲习班的成果和建议 2507.10502v2 |
Authors (35): Elizabeth Fahsbender, Alma Andersson, Jeremy Ash, Polina Binder, Daniel Burkhardt, Benjamin Chang, Georg K. Gerber, Anthony Gitter, Patrick Godau, Ankit Gupta, Genevieve Haliburton, Siyu He, Trey Ideker, Ivana Jelic, Aly Khan, Yang-Joon Kim, Aditi Krishnapriyan, Jon M. Laurent, Tianyu Liu, Emma Lundberg, Shalin B. Mehta, Rob Moccia, Angela Oliveira Pisco, Katherine S. Pollard, Suresh Ramani, Julio Saez-Rodriguez, Yasin Senbabaoglu, Elana Simon, Srinivasan Sivanandan, Gustavo Stolovitzky, Marc Valer, Bo Wang, Xikun Zhang, James Zou, Katrina Kalantar
Artificial intelligence holds immense promise for transforming biology, yet a lack of standardized, cross-domain benchmarks undermines our ability to build robust, trustworthy models. Here, we present insights from a recent workshop that convened machine learning and computational biology experts across imaging, transcriptomics, proteomics, and genomics to tackle this gap. We identify major technical and systemic bottlenecks, such as data heterogeneity and noise, reproducibility challenges, biases, and the fragmented ecosystem of publicly available resources, and propose a set of recommendations for building benchmarking frameworks that can efficiently compare ML models of biological systems across tasks and data modalities. By promoting high-quality data curation, standardized tooling, comprehensive evaluation metrics, and open, collaborative platforms, we aim to accelerate the development of robust benchmarks for AI-driven Virtual Cells. These benchmarks are crucial for ensuring rigor, reproducibility, and biological relevance, and will ultimately advance the field toward integrated models that drive new discoveries, therapeutic insights, and a deeper understanding of cellular systems.
Article 392
Title@2025-07-15 (2): Subgraph Generation for Generalizing on Out-of-Distribution Links
Title: Subgraph Generation for Generalizing on Out-of-Distribution Links | Subgraphengenerierung für die Verallgemeinerung von Out-of-Distribution-Links | 通用分配外链接的子集 2507.11710v1 |
Authors (3): Jay Revolinsky, Harry Shomer, Jiliang Tang
Graph Neural Networks (GNNs) demonstrate high performance on the link prediction (LP) task. However, these models often rely on all dataset samples being drawn from the same distribution. In addition, graph generative models (GGMs) show a pronounced ability to generate novel output graphs. Despite this, GGM applications remain largely limited to domain-specific tasks. To bridge this gap, we propose FLEX, a GGM framework that leverages two mechanisms: (1) structurally conditioned graph generation, and (2) adversarial co-training between an auto-encoder and a GNN. As such, FLEX ensures structural alignment between sample distributions to enhance link-prediction performance in out-of-distribution (OOD) scenarios. Notably, FLEX does not require expert knowledge to function in different OOD scenarios. Numerous experiments are conducted in synthetic and real-world OOD settings to demonstrate FLEX’s performance-enhancing ability, with further analysis for understanding the effects of graph data augmentation on link structures. The source code is available here: https://github.com/revolins/FlexOOD.
Article 393
Title@2025-07-15 (2): Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise
Title: Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise | Sporadic Federated Learning Approach in Quantum Environment to Tackle Quantum Noise | 处理量子噪音的量子环境中零星的联邦学习方法 2507.12492v1 |
Authors (3): Ratun Rahman, Atit Pokharel, Dinh C. Nguyen
Quantum Federated Learning (QFL) is an emerging paradigm that combines quantum computing and federated learning (FL) to enable decentralized model training while maintaining data privacy over quantum networks. However, quantum noise remains a significant barrier in QFL, since modern quantum devices experience heterogeneous noise levels due to variances in hardware quality and sensitivity to quantum decoherence, resulting in inadequate training performance. To address this issue, we propose SpoQFL, a novel QFL framework that leverages sporadic learning to mitigate quantum noise heterogeneity in distributed quantum systems. SpoQFL dynamically adjusts training strategies based on noise fluctuations, enhancing model robustness, convergence stability, and overall learning efficiency. Extensive experiments on real-world datasets demonstrate that SpoQFL significantly outperforms conventional QFL approaches, achieving superior training performance and more stable convergence.
Article 394
Title@2025-07-15 (2): Reinforcement Learning from Adversarial Preferences in Tabular MDPs
Title: Reinforcement Learning from Adversarial Preferences in Tabular MDPs | Verstärkung des Lernens von Adversarial Preferences in Tabular MDPs | 从表列MDP的反向优惠中学习 2507.11706v1 |
Authors (3): Taira Tsuchiya, Shinji Ito, Haipeng Luo
We introduce a new framework of episodic tabular Markov decision processes (MDPs) with adversarial preferences, which we refer to as preference-based MDPs (PbMDPs). Unlike standard episodic MDPs with adversarial losses, where the numerical value of the loss is directly observed, in PbMDPs the learner instead observes preferences between two candidate arms, which represent the choices being compared. In this work, we focus specifically on the setting where the reward functions are determined by Borda scores. We begin by establishing a regret lower bound for PbMDPs with Borda scores. As a preliminary step, we present a simple instance to prove a lower bound of $\Omega(\sqrt{HSAT})$ for episodic MDPs with adversarial losses, where $H$ is the number of steps per episode, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. Leveraging this construction, we then derive a regret lower bound of $\Omega( (H^2 S K)^{1/3} T^{2/3} )$ for PbMDPs with Borda scores, where $K$ is the number of arms. Next, we develop algorithms that achieve a regret bound of order $T^{2/3}$. We first propose a global optimization approach based on online linear optimization over the set of all occupancy measures, achieving a regret bound of $\tilde{O}((H^2 S^2 K)^{1/3} T^{2/3} )$ under known transitions. However, this approach suffers from suboptimal dependence on the potentially large number of states $S$ and computational inefficiency. To address this, we propose a policy optimization algorithm whose regret is roughly bounded by $\tilde{O}( (H^6 S K^5)^{1/3} T^{2/3} )$ under known transitions, and further extend the result to the unknown-transition setting.
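To make the Borda-score reward concrete, here is a minimal sketch. It assumes the definition commonly used in the dueling-bandit literature, where the Borda score of arm i is the average probability that i is preferred over each of the other K-1 arms; the paper's exact normalization may differ. The preference matrix `P` and the function name `borda_scores` are illustrative and do not come from the paper.

```python
def borda_scores(P):
    """Return the Borda score of each arm given a preference matrix.

    P[i][j] is the probability that arm i is preferred over arm j
    (assumed convention: P[i][j] + P[j][i] == 1 for i != j).
    """
    K = len(P)
    scores = []
    for i in range(K):
        # Average win probability of arm i against every other arm.
        total = sum(P[i][j] for j in range(K) if j != i)
        scores.append(total / (K - 1))
    return scores


# Toy 3-arm preference matrix (hypothetical numbers).
P = [
    [0.5, 0.8, 0.6],
    [0.2, 0.5, 0.7],
    [0.4, 0.3, 0.5],
]
scores = borda_scores(P)
best_arm = max(range(len(scores)), key=scores.__getitem__)
```

Under this convention the learner never sees `scores` directly; it only observes the outcome of a comparison between the two arms it picks, which is one intuition for why the achievable regret scales as $T^{2/3}$ rather than $\sqrt{T}$.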
Article 395
Title@2025-07-15 (2): Time series classification of satellite data using LSTM networks: an approach for predicting leaf-fall to minimize railroad traffic disruption
Title: Time series classification of satellite data using LSTM networks: an approach for predicting leaf-fall to minimize railroad traffic disruption | Zeitreihenklassifizierung von Satellitendaten unter Verwendung von LSTM-Netzen: ein Ansatz zur Vorhersage von Blattfallen zur Minimierung von Verkehrsunterbrechungen im Eisenbahnverkehr | 利用LSTM网络对卫星数据进行时间序列分类:预测落叶的方法,以尽量减少铁路交通中断 2507.11702v1 |
Authors (3): Hein de Wilde, Ali Mohammed Mansoor Alsahag, Pierre Blanchet
Railroad traffic disruption caused by leaf-fall costs the UK rail industry over 300 million per year, and measures to mitigate such disruptions are employed on a large scale, with 1.67 million kilometers of track being treated in the UK in 2021 alone. Therefore, the ability to anticipate the timing of leaf-fall would offer substantial benefits for rail network operators, enabling the efficient scheduling of such mitigation measures. However, current methodologies for predicting leaf-fall exhibit considerable limitations in terms of scalability and reliability. This study endeavors to devise a prediction system that leverages specialized prediction methods and the latest satellite data sources to generate both scalable and reliable insights into leaf-fall timings. An LSTM network trained on ground-truth leaf-falling data combined with multispectral and meteorological satellite data demonstrated a root-mean-square error of 6.32 days for predicting the start of leaf-fall and 9.31 days for predicting the end of leaf-fall. The model, which improves upon previous work on the topic, offers promising opportunities for the optimization of leaf mitigation measures in the railway industry and the improvement of our understanding of complex ecological systems.
Article 396
Title@2025-07-15 (2): Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
Title: Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering | Spatially Grounded Erklärungen in Vision Language Models for Document Visual Question Answering | 用于文件视觉问题解答的愿景语言模型中的基于空间的解释 2507.12490v1 |
Authors (3): Maximiliano Hormazábal Lagos, Héctor Cerezo-Costas, Dimosthenis Karatzas
We introduce EaGERS, a fully training-free and model-agnostic pipeline that (1) generates natural language rationales via a vision language model, (2) grounds these rationales to spatial sub-regions by computing multimodal embedding similarities over a configurable grid with majority voting, and (3) restricts response generation to the relevant regions selected in the masked image. Experiments on the DocVQA dataset demonstrate that our best configuration not only outperforms the base model on exact match accuracy and Average Normalized Levenshtein Similarity metrics but also enhances transparency and reproducibility in DocVQA without additional model fine-tuning.
Article 397
Title@2025-07-15 (2): Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space
Title: Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space | Variationale Kombinatorial Sequentielle Monte Carlo für Bayesische Phylogenetik im Hyperbolischen Raum | 双曲空间巴耶斯动力基因组学变异组合序列蒙特卡洛 2501.17965v2 |
Authors (6): Alex Chen, Philipe Chlenski, Kenneth Munyuza, Antonio Khalil Moretti, Christian A. Naesseth, Itsik Pe’er
Hyperbolic space naturally encodes hierarchical structures such as phylogenies (binary trees), where inward-bending geodesics reflect paths through least common ancestors, and the exponential growth of neighborhoods mirrors the super-exponential scaling of topologies. This scaling challenge limits the efficiency of Euclidean-based approximate inference methods. Motivated by the geometric connections between trees and hyperbolic space, we develop novel hyperbolic extensions of two sequential search algorithms: Combinatorial and Nested Combinatorial Sequential Monte Carlo (CSMC and NCSMC). Our approach introduces consistent and unbiased estimators, along with variational inference methods (H-VCSMC and H-VNCSMC), which outperform their Euclidean counterparts. Empirical results demonstrate improved speed, scalability and performance in high-dimensional phylogenetic inference tasks.
Article 398
Title@2025-07-15 (2): Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees
Title: Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees | Any-Property-Conditional Molecule Generation mit Selbst-Kritik mit Spanning Trees | 使用横贯树木进行自批评的 任何有条件的分子代 2407.09357v3 |
Authors (5): Alexia Jolicoeur-Martineau, Aristide Baratin, Kisoo Kwon, Boris Knyazev, Yan Zhang
Generating novel molecules is challenging, with most representations leading to generative models producing many invalid molecules. Spanning Tree-based Graph Generation (STGG) is a promising approach to ensure the generation of valid molecules, outperforming state-of-the-art SMILES and graph diffusion models for unconditional generation. In the real world, we want to be able to generate molecules conditional on one or multiple desired properties rather than unconditionally. Thus, in this work, we extend STGG to multi-property-conditional generation. Our approach, STGG+, incorporates a modern Transformer architecture, random masking of properties during training (enabling conditioning on any subset of properties and classifier-free guidance), an auxiliary property-prediction loss (allowing the model to self-criticize molecules and select the best ones), and other improvements. We show that STGG+ achieves state-of-the-art performance on in-distribution and out-of-distribution conditional generation, and reward maximization.
Article 399
Title@2025-07-15 (2): Galaxy image simplification using Generative AI
Title: Galaxy image simplification using Generative AI | Galaxy Bildvereinfachung mit Generative KI | 利用创用AI简化银河系统图像 2507.11692v1 |
Authors (2): Sai Teja Erukude, Lior Shamir
Modern digital sky surveys have been acquiring images of billions of galaxies. While these images often provide sufficient details to analyze the shape of the galaxies, accurate analysis of such high volumes of images requires effective automation. Current solutions often rely on machine learning annotation of the galaxy images based on a set of pre-defined classes. Here we introduce a new approach to galaxy image analysis that is based on generative AI. The method simplifies the galaxy images and automatically converts them into a “skeletonized” form. The simplified images allow accurate measurements of the galaxy shapes and analysis that is not limited to a certain pre-defined set of classes. We demonstrate the method by applying it to galaxy images acquired by the DESI Legacy Survey. The code and data are publicly available. The method was applied to 125,000 DESI Legacy Survey images, and the catalog of the simplified images is publicly available.
Article 400
Title@2025-07-15 (2): The Impact of Coreset Selection on Spurious Correlations and Group Robustness
Title: The Impact of Coreset Selection on Spurious Correlations and Group Robustness | Die Auswirkungen der Coreset-Auswahl auf Purious Correlations und Group Robustness | Coreset 选择对污损和群体强势的影响 2507.11690v1 |
Authors (5): Amaya Dharmasiri, William Yang, Polina Kirichenko, Lydia Liu, Olga Russakovsky
Coreset selection methods have shown promise in reducing the training data size while maintaining model performance for data-efficient machine learning. However, as many datasets suffer from biases that cause models to learn spurious correlations instead of causal features, it is important to understand whether and how dataset reduction methods may perpetuate, amplify, or mitigate these biases. In this work, we conduct the first comprehensive analysis of the implications of data selection on the spurious bias levels of the selected coresets and the robustness of downstream models trained on them. We use an extensive experimental setting spanning ten different spurious correlation benchmarks, five score metrics to characterize sample importance/difficulty, and five data selection policies across a broad range of coreset sizes. Thereby, we unravel a series of nontrivial nuances in interactions between sample difficulty and bias alignment, as well as dataset bias and resultant model robustness. For example, we find that selecting coresets using embedding-based sample characterization scores runs a comparatively lower risk of inadvertently exacerbating bias than selecting using characterizations based on learning dynamics. Most importantly, our analysis reveals that although some coreset selection methods could achieve lower bias levels by prioritizing difficult samples, they do not reliably guarantee downstream robustness.
Article 401
Title@2025-07-15 (2): Composing Linear Layers from Irreducibles
Title: Composing Linear Layers from Irreducibles | Das Komponieren von linearen Schichten aus Irreduzierbaren | 将来自不灵异的线性图层合成成线性图层 2507.11688v1 |
Authors (3): Travis Pence, Daisuke Yamada, Vikas Singh
Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors – geometric objects encoding oriented planes – and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
Article 402
Title@2025-07-15 (2): MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Title: MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization | MetaLint: Generalisierbare idiomatische Code-Qualitätsanalyse durch instruction-following und einfach-zu-harte Verallgemeinerung | MetLint: 通过执行指示和易于协调的通用化,可通用的单性守则质量分析 2507.11687v1 |
Authors (6): Atharva Naik, Lawanya Baghel, Dhakshin Govindarajan, Darsh Agrawal, Daniel Fried, Carolyn Rose
Large Language Models, though successful in code generation, struggle with code quality analysis because they are limited by static training data and can’t easily adapt to evolving best practices. We introduce MetaLint, a new instruction-following framework that formulates code quality analysis as the task of detecting and fixing problematic semantic code fragments or code idioms based on high-level specifications. Unlike conventional approaches that train models on static, rule-based data, MetaLint employs instruction tuning on synthetic linter-generated data to support easy-to-hard generalization, enabling models to adapt to novel or complex code patterns without retraining. To evaluate this, we construct a benchmark of challenging idioms inspired by real-world coding standards such as Python Enhancement Proposals (PEPs) and assess whether MetaLint-trained models reason adaptively or simply memorize. Our results show that MetaLint improves generalization to unseen PEP idioms, achieving a 70.37% F-score on idiom detection with the highest recall (70.43%) among all evaluated models. It also achieves 26.73% on localization, competitive for its 4B parameter size and comparable to larger state-of-the-art models like o3-mini, highlighting its potential for future-proof code quality analysis.
Article 403
Title@2025-07-15 (2): PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
Title: PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training | PGT-I: Scaling Spatiotemporal GNNs mit speichereffizienter verteilter Ausbildung | PGT-I: 具有记忆有效分配培训的Splap Spatotomotial GNNs 2507.11683v1 |
Authors (7): Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman
Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. However, their applications have been limited primarily to small-scale datasets because of memory constraints. While distributed training offers a solution, current frameworks lack support for spatiotemporal models and overlook the properties of spatiotemporal data. Informed by a scaling study on a large-scale workload, we present PyTorch Geometric Temporal Index (PGT-I), an extension to PyTorch Geometric Temporal that integrates distributed data parallel training and two novel strategies: index-batching and distributed-index-batching. Our index techniques exploit spatiotemporal structure to construct snapshots dynamically at runtime, significantly reducing memory overhead, while distributed-index-batching extends this approach by enabling scalable processing across multiple GPUs. Our techniques enable the first-ever training of an ST-GNN on the entire PeMS dataset without graph partitioning, reducing peak memory usage by up to 89% and achieving up to a 13.1x speedup over standard DDP with 128 GPUs.
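The idea of constructing snapshots at runtime from indices can be illustrated with a small sketch. This is not the PGT-I API: the class name `IndexBatchedWindows` and its parameters are hypothetical, and plain Python lists stand in for graph signal tensors. The assumption is that the memory saving comes from storing the time series once and keeping only window start indices, instead of materializing every overlapping sliding window up front.

```python
class IndexBatchedWindows:
    """Sliding-window dataset that stores the series once and builds windows lazily."""

    def __init__(self, series, horizon_in, horizon_out):
        self.series = series              # per-timestep snapshots, stored once
        self.horizon_in = horizon_in      # number of input steps per sample
        self.horizon_out = horizon_out    # number of target steps per sample
        span = horizon_in + horizon_out
        # Keep only the start index of each window, not the window itself.
        self.starts = list(range(len(series) - span + 1))

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, k):
        s = self.starts[k]
        mid = s + self.horizon_in
        x = self.series[s:mid]                        # input window, sliced on access
        y = self.series[mid:mid + self.horizon_out]   # target window, sliced on access
        return x, y

    def batches(self, batch_size):
        # Yield consecutive batches of (x, y) pairs built on demand.
        for b in range(0, len(self), batch_size):
            yield [self[k] for k in range(b, min(b + batch_size, len(self)))]


series = list(range(10))  # toy series of 10 timesteps
ds = IndexBatchedWindows(series, horizon_in=3, horizon_out=2)
x0, y0 = ds[0]
```

With T timesteps and a window span of w, eager materialization holds roughly (T - w + 1) * w snapshots, while this layout holds T snapshots plus T - w + 1 integers. A distributed variant could shard `starts` across workers, though how PGT-I does this is not described in the abstract.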
Article 404
Title@2025-07-15 (2): AI for Explosive Ordnance Detection in Clearance Operations: The State of Research
Title: AI for Explosive Ordnance Detection in Clearance Operations: The State of Research | KI für explosive Ordnance Detection in Clearing-Operationen: Der Stand der Forschung | 清除行动中爆炸性弹药侦测的AI:研究状况 2411.05813v2 |
Authors (4): Björn Kischelewski, Gregory Cathcart, David Wahl, Benjamin Guedj
The detection and clearance of explosive ordnance (EO) continues to be a predominantly manual and high-risk process that can benefit from advances in technology to improve its efficiency and effectiveness. Research on artificial intelligence (AI) for EO detection in clearance operations has grown significantly in recent years. However, this research spans a wide range of fields, making it difficult to gain a comprehensive understanding of current trends and developments. Therefore, this article provides a literature review of academic research on AI for EO detection in clearance operations. It finds that research can be grouped into two main streams: AI for EO object detection and AI for EO risk prediction, with the latter being much less studied than the former. From the literature review, we develop three opportunities for future research. These include a call for renewed efforts in the use of AI for EO risk prediction, the combination of different AI systems and data sources, and novel approaches to improve EO risk prediction performance, such as pattern-based predictions. Finally, we provide a perspective on the future of AI for EO detection in clearance operations. We emphasize the role of traditional machine learning (ML) for this task, the need to dynamically incorporate expert knowledge into the models, and the importance of effectively integrating AI systems with real-world operations.
Article 405
Title@2025-07-15 (2): Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives
Title: Kolmogorov-Arnold Networks: Approximation and Learning Guarantees for Functions and their Derivatives | Kolmogorov-Arnold-Netzwerke: Annäherungs- und Lerngarantien für Funktionen und deren Derivate | Kolmogorov-Arnold网络:功能及其衍生工具的近似和学习保障 2504.15110v2 |
Authors (3): Anastasis Kratsios, Bum Jun Kim, Takashi Furuya
Inspired by the Kolmogorov-Arnold superposition theorem, Kolmogorov-Arnold Networks (KANs) have recently emerged as an improved backbone for most deep learning frameworks, promising more adaptivity than their multilayer perceptron (MLP) predecessor by allowing for trainable spline-based activation functions. In this paper, we probe the theoretical foundations of the KAN architecture by showing that it can optimally approximate any Besov function in $B^{s}_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain $\mathcal{X}$ in $\mathbb{R}^d$ at the optimal approximation rate with respect to any weaker Besov norm $B^{\alpha}_{p,q}(\mathcal{X})$, where $\alpha < s$. We complement our approximation guarantee with a dimension-free estimate on the sample complexity of a residual KAN model when learning a function of Besov regularity from $N$ i.i.d. noiseless samples. Our KAN architecture incorporates contemporary deep learning wisdom by leveraging residual/skip connections between layers.
Article 406
Title@2025-07-15 (2): Machine Learning-Driven Compensation for Non-Ideal Channels in AWG-Based FBG Interrogator
Title: Machine Learning-Driven Compensation for Non-Ideal Channels in AWG-Based FBG Interrogator | Machine Learning-Driven Kompensation für nicht-ideale Kanäle im AWG-basierten FBG-Interrogator | 特设工作组FBG 干涉器中非理想通道的机器学习驱动补偿 2506.13575v2 |
Authors (8): Ivan A. Kazakov, Iana V. Kulichenko, Egor E. Kovalev, Angelina A. Treskova, Daria D. Barma, Kirill M. Malakhov, Ivan V. Oseledets, Arkady V. Shipulin
We present an experimental study of a fiber Bragg grating (FBG) interrogator based on a silicon oxynitride (SiON) photonic integrated arrayed waveguide grating (AWG). While AWG-based interrogators are compact and scalable, their practical performance is limited by non-ideal spectral responses. To address this, two calibration strategies within a 2.4 nm spectral region were compared: (1) a segmented analytical model based on a sigmoid fitting function, and (2) a machine learning (ML)-based regression model. The analytical method achieves a root mean square error (RMSE) of 7.11 pm within the calibrated range, while the ML approach based on exponential regression achieves 3.17 pm. Moreover, the ML model demonstrates generalization across an extended 2.9 nm wavelength span, maintaining sub-5 pm accuracy without re-fitting. Residual and error distribution analyses further illustrate the trade-offs between the two approaches. ML-based calibration provides a robust, data-driven alternative to analytical methods, delivering enhanced accuracy for non-ideal channel responses, reduced manual calibration effort, and improved scalability across diverse FBG sensor configurations.
Article 407
Title@2025-07-15 (2): Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Title: Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification | Lassen Sie uns in zwei Schritten denken: Abmildern Vereinbarung Bias in MLLMs mit selbst-gerundete Verifikation | 让我们思考两步:在MLLMs中减少协议与自我核查的偏见 2507.11662v1 |
Authors (6): Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira
Verifiers – functions assigning rewards to agent behavior – have been key for AI progress in domains like math and board games. However, extending these gains to domains without clear-cut success criteria (e.g., computer use) remains a challenge: while humans can recognize suitable outcomes, translating this intuition into scalable rules is non-trivial. Multimodal Large Language Models (MLLMs) emerge as a promising solution, given their world knowledge, human-preference alignment, and reasoning skills. We evaluate MLLMs as verifiers of agent trajectories across web navigation, computer use, and robotic manipulation, and identify a critical limitation: agreement bias, a strong tendency for MLLMs to favor information in their context window, often generating chains of thought to rationalize flawed behavior. This bias is pervasive across models, resilient to test-time scaling, and can impact several methods using MLLMs as evaluators (e.g., data filtering). Notably, it occurs despite MLLMs showing strong, human-aligned priors on desired behavior. To address this, we propose Self-Grounded Verification (SGV), a lightweight method that enables more effective use of MLLMs’ knowledge and reasoning by harnessing their own sampling mechanisms via unconditional and conditional generation. SGV operates in two steps: first, the MLLM is elicited to retrieve broad priors about task completion, independent of the data under evaluation. Then, conditioned on self-generated priors, it reasons over and evaluates a candidate trajectory. Enhanced with SGV, MLLM verifiers show gains of up to 20 points in accuracy and failure detection rates, and can perform real-time supervision of heterogeneous agents, boosting task completion of a GUI specialist in OSWorld, a diffusion policy in robomimic, and a ReAct agent in VisualWebArena – setting a new state of the art on the benchmark, surpassing the previous best by 48%.
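The two-step structure of SGV can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` is a hypothetical callable that maps a prompt string to a model response, the prompt wording is invented for the example, and `stub_model` is a stand-in so the sketch runs without any model.

```python
def self_grounded_verify(generate, task, trajectory):
    """Two-step verification: elicit priors first, then judge conditioned on them.

    `generate` is a hypothetical prompt -> response callable (an assumption,
    not an API from the paper). Returns (is_success, priors).
    """
    # Step 1: elicit broad priors about task completion, without showing
    # the trajectory under evaluation.
    prior_prompt = f"Task: {task}\nDescribe what successful completion looks like."
    priors = generate(prior_prompt)

    # Step 2: evaluate the candidate trajectory conditioned on those priors.
    verdict_prompt = (
        f"Task: {task}\n"
        f"Expected behavior: {priors}\n"
        f"Trajectory: {trajectory}\n"
        "Answer SUCCESS or FAILURE."
    )
    verdict = generate(verdict_prompt)
    return verdict.strip().upper().startswith("SUCCESS"), priors


def stub_model(prompt):
    # Stand-in for an MLLM call, used only so the sketch runs end to end.
    if "Trajectory:" not in prompt:
        return "The item appears in the cart."
    return "SUCCESS" if "added to cart" in prompt else "FAILURE"


ok, priors = self_grounded_verify(
    stub_model, "Add a mug to the cart", "clicked mug; added to cart"
)
```

The point of the split is that the first call cannot be swayed by the trajectory, since the trajectory is not in its context; only the second call sees both the self-generated priors and the candidate behavior.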
Article 408
Title@2025-07-15 (2): Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory
Title: Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory | Mathematische Einführung in Deep Learning: Methoden, Implementierungen und Theorie | 深层学习数学介绍:方法、实施和理论 2310.20360v3 |
Authors (3): Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-Łojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.
Article 409
Title@2025-07-15 (2): STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics
Title: STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics | STAGED: Ein multi-agent-neurales Netzwerk zum Lernen zellulärer Interaktionsdynamik | STAGAD: 学习细胞互动动态多要素神经网络 2507.11660v1 |
Authors (9): Joao F. Rocha, Ke Xu, Xingzhi Sun, Ananya Krishna, Dhananjay Bhaskar, Blanche Mongeon, Morgan Craig, Mark Gerstein, Smita Krishnaswamy
The advent of single-cell technology has significantly improved our understanding of cellular states and subpopulations in various tissues under normal and diseased conditions by employing data-driven approaches such as clustering and trajectory inference. However, these methods consider cells as independent data points of population distributions. With spatial transcriptomics, we can represent cellular organization, along with dynamic cell-cell interactions that lead to changes in cell state. Still, key computational advances are necessary to enable the data-driven learning of such complex interactive cellular dynamics. While agent-based modeling (ABM) provides a powerful framework, traditional approaches rely on handcrafted rules derived from domain knowledge rather than data-driven approaches. To address this, we introduce Spatio Temporal Agent-Based Graph Evolution Dynamics (STAGED), which integrates ABM with deep learning to model intercellular communication and its effect on the intracellular gene regulatory network. Using graph ODE networks (GDEs) with shared weights per cell type, our approach represents genes as vertices and interactions as directed edges, dynamically learning their strengths through a designed attention mechanism. Trained to match both simulated continuous trajectories and trajectories inferred from spatial transcriptomics data, the model captures both intercellular and intracellular interactions, enabling a more adaptive and accurate representation of cellular dynamics.
Article 410
Title@2025-07-15 (2): ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs
Title: ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | ZKP-FedEval: Überprüfbare und datenschutzschonende Federated Evaluation mit Null-Wissensnachweisen | ZKP-FedEval:使用零知识证明进行可核查和隐私保护的联邦评价 2507.11649v1 |
Authors (4): Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby
Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performance metrics. In this paper, we propose a novel protocol that incorporates Zero-Knowledge Proofs (ZKPs) to enable privacy-preserving and verifiable evaluation for FL. Instead of revealing raw loss values, clients generate a succinct proof asserting that their local loss is below a predefined threshold. Our approach is implemented without reliance on external APIs, using self-contained modules for federated learning simulation, ZKP circuit design, and experimental evaluation on both the MNIST and Human Activity Recognition (HAR) datasets. We focus on a threshold-based proof for a simple Convolutional Neural Network (CNN) model (for MNIST) and a multi-layer perceptron (MLP) model (for HAR), and evaluate the approach in terms of computational overhead, communication cost, and verifiability.
Article 411
Title@2025-07-15 (2): Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation
Title: Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation | Auf dem Weg zum Grokking: Einbettungen, Dropout und Netzwerkaktivierung | 追踪通往格罗金的道路:嵌入、辍学和网络启动 2507.11645v1 |
Authors (2): Ahmed Salah, David Yevick
Grokking refers to delayed generalization in which the increase in test accuracy of a neural network occurs appreciably after the improvement in training accuracy. This paper introduces several practical metrics, including variance under dropout, robustness, embedding similarity, and sparsity measures, that can forecast grokking behavior. Specifically, the resilience of neural networks to noise during inference is estimated from a Dropout Robustness Curve (DRC) obtained from the variation of the accuracy with the dropout rate as the model transitions from memorization to generalization. The variance of the test accuracy under stochastic dropout across training checkpoints further exhibits a local maximum during grokking. Additionally, the percentage of inactive neurons decreases during generalization, while the embeddings tend to a bimodal distribution independent of initialization that correlates with the observed cosine similarity patterns and dataset symmetries. These metrics additionally provide valuable insight into the origin and behaviour of grokking.
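A Dropout Robustness Curve of the kind described above can be sketched as accuracy measured under random inference-time dropout at several rates. The toy classifier, the data, and the choice to drop units without rescaling are all assumptions made for illustration; the abstract does not specify how the authors apply dropout.

```python
import random
from statistics import mean, pvariance

def predict(weights, x, mask):
    """Toy binary classifier: thresholded weighted sum over the units kept by the mask."""
    score = sum(w * xi * m for w, xi, m in zip(weights, x, mask))
    return 1 if score > 0 else 0

def dropout_robustness_curve(weights, xs, ys, rates, trials=20, seed=0):
    """Map each dropout rate to (mean accuracy, variance of accuracy) over random masks."""
    rng = random.Random(seed)
    curve = {}
    for rate in rates:
        accuracies = []
        for _ in range(trials):
            # Drop each unit independently with probability `rate` at inference time.
            mask = [0.0 if rng.random() < rate else 1.0 for _ in weights]
            correct = sum(predict(weights, x, mask) == y for x, y in zip(xs, ys))
            accuracies.append(correct / len(xs))
        curve[rate] = (mean(accuracies), pvariance(accuracies))
    return curve

# Hypothetical weights and data, chosen so the undamaged model classifies every point.
weights = [1.0, -1.0, 0.5]
xs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 2.0, 1.0]]
ys = [1, 0, 1, 0]
curve = dropout_robustness_curve(weights, xs, ys, rates=[0.0, 0.25, 0.5, 1.0])
for rate, (acc, var) in curve.items():
    print(f"rate={rate:.2f} mean_acc={acc:.3f} var={var:.4f}")
```

Repeating this at successive training checkpoints would give the per-checkpoint variance that the abstract reports as peaking during grokking.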
Article 412
Title@2025-07-15 (2): Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment
Title: Posture-Driven Action Intent Inference for Playing style and Fatigue Assessment | Posture-Driven Action Intent Inferenz für Spielstil und Müdigkeit Bewertung | 游戏风格和Fatigue评估的推论 2507.11642v1 |
Authors (2): Abhishek Jaiswal, Nisheeth Srivastava
Posture-based mental state inference has significant potential in diagnosing fatigue, preventing injury, and enhancing performance across various domains. Such tools must be research-validated with large datasets before being translated into practice. Unfortunately, such vision diagnosis faces serious challenges due to the sensitivity of human subject data. To address this, we identify sports settings as a viable alternative for accumulating data from human subjects experiencing diverse emotional states. We test our hypothesis in the game of cricket and present a posture-based solution to identify human intent from activity videos. Our method achieves over 75% F1 score and over 80% AUC-ROC in discriminating aggressive and defensive shot intent through motion analysis. These findings indicate that posture leaks strong signals for intent inference, even with inherent noise in the data pipeline. Furthermore, we utilize existing data statistics as weak supervision to validate our findings, offering a potential solution for overcoming data labelling limitations. This research contributes to generalizable techniques for sports analytics and also opens possibilities for applying human behavior analysis across various fields.
Article 413
Title@2025-07-15 (2): Deep Generative Methods and Tire Architecture Design
Title: Deep Generative Methods and Tire Architecture Design | Tiefe generative Methoden und Reifenarchitektur Design | 深生成方法和轮胎结构设计 2507.11639v1 |
Authors (4): Fouad Oubari, Raphael Meunier, Rodrigue Décatoire, Mathilde Mougeot
As deep generative models proliferate across the AI landscape, industrial practitioners still face critical yet unanswered questions about which deep generative models best suit complex manufacturing design tasks. This work addresses this question through a complete study of five representative models (Variational Autoencoder, Generative Adversarial Network, multimodal Variational Autoencoder, Denoising Diffusion Probabilistic Model, and Multinomial Diffusion Model) on industrial tire architecture generation. Our evaluation spans three key industrial scenarios: (i) unconditional generation of complete multi-component designs, (ii) component-conditioned generation (reconstructing architectures from partial observations), and (iii) dimension-constrained generation (creating designs that satisfy specific dimensional requirements). To enable discrete diffusion models to handle conditional scenarios, we introduce categorical inpainting, a mask-aware reverse diffusion process that preserves known labels without requiring additional training. Our evaluation employs geometry-aware metrics specifically calibrated for industrial requirements, quantifying spatial coherence, component interaction, structural connectivity, and perceptual fidelity. Our findings reveal that diffusion models achieve the strongest overall performance; a masking-trained VAE nonetheless outperforms the multimodal variant MMVAE+ on nearly all component-conditioned metrics, and within the diffusion family MDM leads in-distribution whereas DDPM generalises better to out-of-distribution dimensional constraints.
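The mask-aware idea behind categorical inpainting can be sketched as a reverse loop that overwrites the known cells after every step, so no retraining is needed. This is a simplified illustration: `toy_reverse_step` is a hypothetical stand-in for a trained discrete diffusion model, and clamping the clean labels (rather than re-noising them to the current step, as some inpainting schemes do) is an assumption.

```python
import random

def categorical_inpaint(known, mask, reverse_step, n_steps, n_classes, seed=0):
    """Sample labels for unknown cells while keeping the cells flagged in `mask` fixed."""
    rng = random.Random(seed)
    # Start from uniform categorical noise on unknown cells; known cells keep their labels.
    x = [k if m else rng.randrange(n_classes) for k, m in zip(known, mask)]
    for t in range(n_steps, 0, -1):
        x = reverse_step(x, t, rng)  # one reverse step of the pre-trained model
        # Mask-aware correction: restore the known labels after each step.
        x = [k if m else xi for k, m, xi in zip(known, mask, x)]
    return x

def toy_reverse_step(x, t, rng):
    """Hypothetical denoiser stand-in: each cell copies its left neighbour with probability 0.5."""
    out = list(x)
    for i in range(1, len(x)):
        if rng.random() < 0.5:
            out[i] = x[i - 1]
    return out

known = [2, 0, 0, 1, 0, 0]                       # labels are only meaningful where mask is True
mask = [True, False, False, True, False, False]
sample = categorical_inpaint(known, mask, toy_reverse_step, n_steps=10, n_classes=3)
print(sample)
```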
Article 414
Title@2025-07-15 (2): Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders
Title: Interpretable Prediction of Lymph Node Metastasis in Rectal Cancer MRI Using Variational Autoencoders | Interpretable Vorhersage von Lymphknotenmetastasen bei rektaler KrebsmRT mit variablen Autoencodern | 利用变化式自动电解器对直肠癌MRI中淋巴结结的代谢值进行可解释的预测 2507.11638v1 |
Authors (5): Benjamin Keel, Aaron Quyn, David Jayne, Maryam Mohsin, Samuel D. Relton
Effective treatment for rectal cancer relies on accurate lymph node metastasis (LNM) staging. However, radiological criteria based on lymph node (LN) size, shape and texture morphology have limited diagnostic accuracy. In this work, we investigate applying a Variational Autoencoder (VAE) as a feature encoder model to replace the large pre-trained Convolutional Neural Network (CNN) used in existing approaches. The motivation for using a VAE is that the generative model aims to reconstruct the images, so it directly encodes visual features and meaningful patterns across the data. This leads to a disentangled and structured latent space which can be more interpretable than a CNN. Models are deployed on an in-house MRI dataset with 168 patients who did not undergo neo-adjuvant treatment. The post-operative pathological N stage was used as the ground truth to evaluate model predictions. Our proposed model ‘VAE-MLP’ achieved state-of-the-art performance on the MRI dataset, with cross-validated metrics of AUC 0.86 +/- 0.05, Sensitivity 0.79 +/- 0.06, and Specificity 0.85 +/- 0.05. Code is available at: https://github.com/benkeel/Lymph_Node_Classification_MIUA.
Article 415
Title@2025-07-15 (2): JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs
Title: JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs | JSQA: Sprachqualitätsbewertung mit Wahrnehmungs-Inspired Contractive Pretraining basierend auf JND Audio Pairs | JSQA:根据JND音频对音频对调,用自觉受启发的违反规定前训练进行语言质量评估 2507.11636v1 |
Authors (2): Junyi Fan, Donald Williamson
Speech quality assessment (SQA) is often used to learn a mapping from a high-dimensional input space to a scalar that represents the mean opinion score (MOS) of the perceptual speech quality. Learning such a mapping is challenging for many reasons, but largely because MOS exhibits high levels of inherent variance due to perceptual and experimental-design differences. Many solutions have been proposed, but many approaches do not properly incorporate perceptual factors into their learning algorithms (beyond the MOS label), which could lead to unsatisfactory results. To this end, we propose JSQA, a two-stage framework that pretrains an audio encoder using perceptually-guided contrastive learning on just noticeable difference (JND) pairs, followed by fine-tuning for MOS prediction. We first generate pairs of audio data within JND levels, which are then used to pretrain an encoder to leverage perceptual quality similarity information and map it into an embedding space. The JND pairs come from clean LibriSpeech utterances that are mixed with background noise from CHiME-3, at different signal-to-noise ratios (SNRs). The encoder is later fine-tuned with audio samples from the NISQA dataset for MOS prediction. Experimental results suggest that perceptually-inspired contrastive pretraining significantly improves the model performance evaluated by various metrics when compared against the same network trained from scratch without pretraining. These findings suggest that incorporating perceptual factors into pretraining greatly contributes to the improvement in performance for SQA.
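Mixing a clean utterance with noise at a chosen SNR, the step used above to build the audio pairs, follows from the definition SNR_dB = 10 log10(P_clean / P_noise). The sketch below uses synthetic signals in place of LibriSpeech and CHiME-3 audio, and the 0.5 dB gap between the two mixtures is a hypothetical value, not the paper's JND threshold.

```python
import math

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it to `clean`."""
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)], scale

# Synthetic stand-ins for a clean utterance and a background-noise segment of equal length.
clean = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(1.7 * i) for i in range(1000)]

mix_a, scale_a = mix_at_snr(clean, noise, snr_db=10.0)
mix_b, scale_b = mix_at_snr(clean, noise, snr_db=10.5)  # hypothetical within-JND partner

achieved = 10 * math.log10(power(clean) / power([scale_a * n for n in noise]))
print(f"requested 10.0 dB, achieved {achieved:.6f} dB")
```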
Article 416
Title@2025-07-15 (2): Multi-view biomedical foundation models for molecule-target and property prediction
Title: Multi-view biomedical foundation models for molecule-target and property prediction | Multi-View biomedizinische Stiftungsmodelle für Molekül-Ziel- und Eigenschaftsvorhersage | 分子目标和财产预测多视角生物医学基础模型 2410.19704v4 |
Authors (18): Parthasarathy Suryanarayanan, Yunguang Qiu, Shreyans Sethi, Diwakar Mahajan, Hongyang Li, Yuxin Yang, Elif Eyigoz, Aldo Guzman Saenz, Daniel E. Platt, Timothy H. Rumbell, Kenney Ng, Sanjoy Dey, Myson Burch, Bum Chul Kwon, Pablo Meyer, Feixiong Cheng, Jianying Hu, Joseph A. Morrone
Quality molecular representations are key to foundation model development in bio-medical research. Previous efforts have typically focused on a single representation or molecular view, which may have strengths or weaknesses on a given task. We develop Multi-view Molecular Embedding with Late Fusion (MMELON), an approach that integrates graph, image and text views in a foundation model setting and may be readily extended to additional representations. Single-view foundation models are each pre-trained on a dataset of up to 200M molecules. The multi-view model performs robustly, matching the performance of the highest-ranked single-view. It is validated on over 120 tasks, including molecular solubility, ADME properties, and activity against G Protein-Coupled receptors (GPCRs). We identify 33 GPCRs that are related to Alzheimer’s disease and employ the multi-view model to select strong binders from a compound screen. Predictions are validated through structure-based modeling and identification of key binding motifs.
Article 417
Title@2025-07-15 (2): A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
Title: A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs | Ein rechnerisch frugales Open-Source-Stiftungsmodell für Thorax-Erkennung in Lungenkrebs-Screening-Programmen | 肺癌筛查方案中胸腔酸疾病检测的计算节节制开源基础模型 2507.01881v2 |
Authors (16): Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada, Daryl Cheng, Mehran Azimbagirad, John McCabe, Shahab Aslani, Ahmed H. Shahin, Yukun Zhou, The SUMMIT Consortium, Andre Altmann, Yipeng Hu, Paul Taylor, Sam M. Janes, Daniel C. Alexander, Joseph Jacob
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK’s largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
Article 418
Title@2025-07-15 (2): MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering
Title: MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering | MapIQ: Benchmarking multimodaler Großsprachenmodelle für Kartenfrageantworten | MapIQ:为地图回答问题确定多式大语言模式基准 2507.11625v1 |
Authors (5): Varun Srivastava, Fan Lei, Srija Mukhopadhyay, Vivek Gupta, Ross Maciejewski
Recent advancements in multimodal large language models (MLLMs) have driven researchers to explore how well these models read data visualizations, e.g., bar charts, scatter plots. More recently, attention has shifted to visual question answering with maps (Map-VQA). However, Map-VQA research has primarily focused on choropleth maps, which cover only a limited range of thematic categories and visual analytical tasks. To address these gaps, we introduce MapIQ, a benchmark dataset comprising 14,706 question-answer pairs across three map types: choropleth maps, cartograms, and proportional symbol maps spanning topics from six distinct themes (e.g., housing, crime). We evaluate multiple MLLMs using six visual analytical tasks, comparing their performance against one another and a human baseline. An additional experiment examining the impact of map design changes (e.g., altered color schemes, modified legend designs, and removal of map elements) provides insights into the robustness and sensitivity of MLLMs, their reliance on internal geographic knowledge, and potential avenues for improving Map-VQA performance.
Article 419
Title@2025-07-15 (2): Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification
Title: Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification | Lernrepräsentationen der Veranstaltungszeitreihe mit Sparse-Autoencodern für Anomalieerkennung, Ähnlichkeitssuche und unbeaufsichtigte Klassifizierung | 与用于异常探测、相似搜索和无监督分类的粗皮自动编码器一起进行的 活动时间系列学习说明 2507.11620v1 |
Authors (2): Steven Dillmann, Juan Rafael Martínez-Galarza
Event time series are sequences of discrete events occurring at irregular time intervals, each associated with a domain-specific observational modality. They are common in domains such as high-energy astrophysics, computational social science, cybersecurity, finance, healthcare, neuroscience, and seismology. Their unstructured and irregular structure poses significant challenges for extracting meaningful patterns and identifying salient phenomena using conventional techniques. We propose novel two- and three-dimensional tensor representations for event time series, coupled with sparse autoencoders that learn physically meaningful latent representations. These embeddings support a variety of downstream tasks, including anomaly detection, similarity-based retrieval, semantic clustering, and unsupervised classification. We demonstrate our approach on a real-world dataset from X-ray astronomy, showing that these representations successfully capture temporal and spectral signatures and isolate diverse classes of X-ray transients. Our framework offers a flexible, scalable, and generalizable solution for analyzing complex, irregular event time series across scientific and industrial domains.
Article 420
Title@2025-07-15 (2): Streaming 4D Visual Geometry Transformer
Title: Streaming 4D Visual Geometry Transformer | Streaming 4D Visuelle Geometrie Transformer | 流动 4D 视觉几何变换器 2507.11539v1 |
Authors (6): Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu
Perceiving and reconstructing 4D spatial-temporal geometry from videos is a fundamental yet challenging computer vision task. To facilitate interactive and real-time applications, we propose a streaming 4D visual geometry transformer that shares a similar philosophy with autoregressive large language models. We explore a simple and efficient design and employ a causal transformer architecture to process the input sequence in an online manner. We use temporal causal attention and cache the historical keys and values as implicit memory to enable efficient streaming long-term 4D reconstruction. This design can handle real-time 4D reconstruction by incrementally integrating historical information while maintaining high-quality spatial consistency. For efficient training, we propose to distill knowledge from the dense bidirectional visual geometry grounded transformer (VGGT) to our causal model. For inference, our model supports the migration of optimized efficient attention operators (e.g., FlashAttention) from the field of large language models. Extensive experiments on various 4D geometry perception benchmarks demonstrate that our model increases the inference speed in online scenarios while maintaining competitive performance, paving the way for scalable and interactive 4D vision systems. Code is available at: https://github.com/wzzheng/StreamVGGT.
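Caching historical keys and values for causal attention can be illustrated with a single-head, pure-Python sketch. The class name, vector sizes, and one-token-per-frame simplification are assumptions for illustration and are not taken from the StreamVGGT code.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                               # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

class StreamingAttention:
    """Keeps past keys and values so each new frame attends only to itself and the past."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, query, key, value):
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

# Hypothetical (query, key, value) triples, one per incoming frame.
frames = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0]),
]
layer = StreamingAttention()
streamed = [layer.step(q, k, v) for q, k, v in frames]
print(streamed)
```

Each call does work proportional to the number of cached frames, instead of recomputing attention over the whole sequence, and the output for frame t matches causal attention over frames 0..t.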
Article 421
Title@2025-07-15 (2): Canonical Bayesian Linear System Identification
Title: Canonical Bayesian Linear System Identification | Canonical Bayesian Linear System Identification | Canonical Bayesian Canonical Bayesian 线性系统识别 2507.11535v1 |
Authors (4): Andrey Bryutkin, Matthew E. Levine, Iñigo Urteaga, Youssef Marzouk
Standard Bayesian approaches for linear time-invariant (LTI) system identification are hindered by parameter non-identifiability; the resulting complex, multi-modal posteriors make inference inefficient and impractical. We solve this problem by embedding canonical forms of LTI systems within the Bayesian framework. We rigorously establish that inference in these minimal parameterizations fully captures all invariant system dynamics (e.g., transfer functions, eigenvalues, predictive distributions of system outputs) while resolving identifiability. This approach unlocks the use of meaningful, structure-aware priors (e.g., enforcing stability via eigenvalues) and ensures conditions for a Bernstein–von Mises theorem – a link between Bayesian and frequentist large-sample asymptotics that is broken in standard forms. Extensive simulations with modern MCMC methods highlight advantages over standard parameterizations: canonical forms achieve higher computational efficiency, generate interpretable and well-behaved posteriors, and provide robust uncertainty estimates, particularly from limited data.
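The non-identifiability mentioned above can be seen in a small numerical example: any invertible change of state coordinates T maps (A, B, C) to (T A T^-1, T B, C T^-1) without changing the input-output behaviour. The matrices below are arbitrary illustrative values, and the sketch only demonstrates the invariance; it does not implement the paper's canonical forms.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(T):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def markov_parameters(A, B, C, n):
    """First n impulse-response coefficients C A^k B of a single-input single-output system."""
    out, AkB = [], B
    for _ in range(n):
        out.append(matmul(C, AkB)[0][0])
        AkB = matmul(A, AkB)
    return out

A = [[0.5, 0.1], [0.0, 0.3]]
B = [[1.0], [2.0]]
C = [[1.0, 0.0]]
T = [[2.0, 1.0], [1.0, 1.0]]          # any invertible matrix works

A2 = matmul(matmul(T, A), inv2(T))
B2 = matmul(T, B)
C2 = matmul(C, inv2(T))

h1 = markov_parameters(A, B, C, 6)
h2 = markov_parameters(A2, B2, C2, 6)
print(h1)
print(h2)
```

The two parameter sets differ entry by entry yet produce the same impulse response, so a posterior over unconstrained (A, B, C) has a whole family of equally likely modes.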
Article 422
Title@2025-07-15 (2): Langevin Flows for Modeling Neural Latent Dynamics
Title: Langevin Flows for Modeling Neural Latent Dynamics | Langevin-Ströme für die Modellierung neuraler Latent-Dynamik | 模拟神经内流动态的Langevin流程 2507.11531v1 |
Authors (5): Yue Song, T. Anderson Keller, Yisong Yue, Pietro Perona, Max Welling
Neural populations exhibit latent dynamical structures that drive time-evolving spiking activities, motivating the search for models that capture both intrinsic network dynamics and external unobserved influences. In this work, we introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors – such as inertia, damping, a learned potential function, and stochastic forces – to represent both autonomous and non-autonomous processes in neural systems. Crucially, the potential function is parameterized as a network of locally coupled oscillators, biasing the model toward oscillatory and flow-like behaviors observed in biological neural populations. Our model features a recurrent encoder, a one-layer Transformer decoder, and Langevin dynamics in the latent space. Empirically, our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor, closely matching ground-truth firing rates. On the Neural Latents Benchmark (NLB), the model achieves superior held-out neuron likelihoods (bits per spike) and forward prediction accuracy across four challenging datasets. It also matches or surpasses alternative methods in decoding behavioral metrics such as hand velocity. Overall, this work introduces a flexible, physics-inspired, high-performing framework for modeling complex neural population dynamics and their unobserved influences.
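For reference, the underdamped Langevin equation with unit mass is commonly written as dz = v dt, dv = (-grad U(z) - gamma v) dt + sqrt(2 gamma T) dW. A minimal Euler-Maruyama discretisation is sketched below; the harmonic potential, the scalar latent, and all parameter values are assumptions for illustration, whereas the paper uses a learned oscillator-network potential.

```python
import math
import random

def langevin_step(z, v, grad_potential, gamma, temperature, dt, rng):
    """One Euler-Maruyama step of underdamped Langevin dynamics (unit mass)."""
    noise = math.sqrt(2.0 * gamma * temperature * dt) * rng.gauss(0.0, 1.0)
    # Velocity feels the potential force, damping (friction), and a stochastic kick.
    v_new = v + (-grad_potential(z) - gamma * v) * dt + noise
    # Position is advanced with the current velocity (inertia).
    z_new = z + v * dt
    return z_new, v_new

def harmonic_grad(z):
    """Gradient of the illustrative potential U(z) = z**2 / 2."""
    return z

rng = random.Random(0)
z, v = 1.0, 0.0
trajectory = [(z, v)]
for _ in range(200):
    z, v = langevin_step(z, v, harmonic_grad, gamma=0.5, temperature=0.1, dt=0.05, rng=rng)
    trajectory.append((z, v))
print(trajectory[-1])
```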
Article 423
Title@2025-07-15 (2): EXPO: Stable Reinforcement Learning with Expressive Policies
Title: EXPO: Stable Reinforcement Learning with Expressive Policies | EXPO: Stabiles Stärkungslernen mit ausdrucksstarker Politik | 出口促进: 采用表达式政策进行稳定的加强学习 2507.07986v2 |
Authors (4): Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn
We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead constructing an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies – a larger expressive base policy trained with a stable imitation learning objective and a light-weight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
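The action-selection rule of the on-the-fly policy can be sketched as: sample several base actions, edit each one, and keep whichever candidate the critic scores highest. The scalar actions, the additive form of the edit, and the toy policies and Q-function below are assumptions for illustration only.

```python
import random

def on_the_fly_action(state, base_policy, edit_policy, q_value, n_samples):
    """Return the highest-Q action among base samples and their edited versions."""
    base_actions = [base_policy(state) for _ in range(n_samples)]
    # Assumed additive edit: the edit policy proposes a small correction to each base action.
    edited_actions = [a + edit_policy(state, a) for a in base_actions]
    return max(base_actions + edited_actions, key=lambda a: q_value(state, a))

rng = random.Random(0)

def toy_base_policy(state):
    """Hypothetical stand-in for an expressive (e.g. diffusion) policy sample."""
    return rng.uniform(-1.0, 1.0)

def toy_edit_policy(state, action):
    """Hypothetical edit: nudge the action a fifth of the way toward the state value."""
    return 0.2 * (state - action)

def toy_q(state, action):
    """Hypothetical critic that prefers actions close to the state value."""
    return -(action - state) ** 2

chosen = on_the_fly_action(0.5, toy_base_policy, toy_edit_policy, toy_q, n_samples=8)
print(chosen)
```

Because the maximisation is over a finite candidate set, no gradient has to flow through the base policy's denoising chain; the same selection would be used when computing TD targets.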
Article 424
Title@2025-07-15 (2): CATVis: Context-Aware Thought Visualization
Title: CATVis: Context-Aware Thought Visualization | CATVis: Kontext-Bewusste Gedankenvisualisierung | CAT-Vis:背景意识思想视觉化 2507.11522v1 |
Authors (4): Tariq Mehmood, Hamza Ahmad, Muhammad Haroon Shakeel, Murtaza Taj
EEG-based brain-computer interfaces (BCIs) have shown promise in various applications, such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We thus propose a novel 5-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking. Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli, outperforming SOTA approaches by 13.43% in Classification Accuracy and 15.21% in Generation Accuracy, and reducing Fréchet Inception Distance by 36.61%, indicating superior semantic alignment and image quality.
Article 425
Title@2025-07-15 (2): Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Title: Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | Hi Robot: Open Ended Instruction mit Hierarchical Vision-Language-Action-Modellen | 高机器人:不限名额教学,采用等级愿景-语言-行动模式 2502.19417v2 |
Authors (15): Lucy Xiaoyang Shi, Brian Ichter, Michael Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li-Bell, Danny Driess, Lachy Groom, Sergey Levine, Chelsea Finn
Generalist robots that can perform a range of different tasks in open-world settings must be able to not only reason about the steps needed to accomplish their goals, but also process complex instructions, prompts, and even feedback during task execution. Intricate instructions (e.g., “Could you make me a vegetarian sandwich?” or “I don’t like that one”) require not just the ability to physically perform the individual steps, but the ability to situate complex commands and feedback in the physical world. In this work, we describe a system that uses vision-language models in a hierarchical structure, first reasoning over complex prompts and user feedback to deduce the most appropriate next step to fulfill the task, and then performing that step with low-level actions. In contrast to direct instruction following methods that can fulfill simple commands (“pick up the cup”), our system can reason through complex prompts and incorporate situated feedback during task execution (“that’s not trash”). We evaluate our system across three robotic platforms, including single-arm, dual-arm, and dual-arm mobile robots, demonstrating its ability to handle tasks such as cleaning messy tables, making sandwiches, and grocery shopping. Videos are available at https://www.pi.website/research/hirobot
Article 426
Title@2025-07-15 (2): AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air
Title: AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air | AirLLM:传播基于政策的适应性LORA,用于远距离微调LLM在空中的LLM 2507.11515v1 |
Authors (6): Shiyi Yang, Xiaoxue Yu, Rongpeng Li, Jianhang Zhu, Zhifeng Zhao, Honggang Zhang
Operating Large Language Models (LLMs) on edge devices is increasingly challenged by limited communication bandwidth and strained computational and memory costs. Thus, cloud-assisted remote fine-tuning becomes indispensable. Nevertheless, existing Low-Rank Adaptation (LoRA) approaches typically employ fixed or heuristic rank configurations, and the subsequent over-the-air transmission of all LoRA parameters could be rather inefficient. To address this limitation, we develop AirLLM, a hierarchical diffusion policy framework for communication-aware LoRA adaptation. Specifically, AirLLM models the rank configuration as a structured action vector that spans all LoRA-inserted projections. To solve the underlying high-dimensional sequential decision-making problem, a Proximal Policy Optimization (PPO) agent generates coarse-grained decisions by jointly observing wireless states and linguistic complexity, which are then refined via Denoising Diffusion Implicit Models (DDIM) to produce high-resolution, task- and channel-adaptive rank vectors. The two modules are optimized alternately, with the DDIM trained under the Classifier-Free Guidance (CFG) paradigm to maintain alignment with PPO rewards. Experiments under varying signal-to-noise ratios demonstrate that AirLLM consistently enhances fine-tuning performance while significantly reducing transmission costs, highlighting the effectiveness of reinforcement-driven, diffusion-refined rank adaptation for scalable and efficient remote fine-tuning over the air.
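Why the rank vector drives transmission cost follows from the LoRA parameter count: a rank-r adapter on a d_out x d_in projection adds r * (d_in + d_out) parameters. The sketch below compares a fixed-rank and a per-projection configuration; the layer shapes, rank values, and 2 bytes per parameter are hypothetical, and the ranks are hand-picked rather than produced by the PPO/DDIM policy.

```python
def lora_payload(ranks, shapes, bytes_per_param=2):
    """Parameters and bytes to transmit for LoRA factors A (r x d_in) and B (d_out x r)."""
    params = sum(r * (d_in + d_out) for r, (d_out, d_in) in zip(ranks, shapes))
    return params, params * bytes_per_param

# Four hypothetical attention projections of a 4096-wide model.
shapes = [(4096, 4096)] * 4

fixed_params, fixed_bytes = lora_payload([16, 16, 16, 16], shapes)
adaptive_params, adaptive_bytes = lora_payload([4, 16, 2, 8], shapes)

print(f"fixed rank:    {fixed_params} params, {fixed_bytes} bytes")
print(f"adaptive rank: {adaptive_params} params, {adaptive_bytes} bytes")
```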
Article 427
Title@2025-07-15 (2): Are DeepSeek R1 And Other Reasoning Models More Faithful?
Title: Are DeepSeek R1 And Other Reasoning Models More Faithful? | Sind DeepSeek R1 und andere vernünftige Modelle treuer? | DeepSeek R1和其他理由模型更可信吗? 2501.08156v5 |
Authors (2): James Chua, Owain Evans
Language models trained to solve reasoning tasks via reinforcement learning have achieved striking results. We refer to these models as reasoning models. Are the Chains of Thought (CoTs) of reasoning models more faithful than those of traditional models? We evaluate three reasoning models (based on Qwen-2.5, Gemini-2, and DeepSeek-V3-Base) on an existing test of faithful CoT. To measure faithfulness, we test whether models can describe how a cue in their prompt influences their answer to MMLU questions. For example, when the cue “A Stanford Professor thinks the answer is D” is added to the prompt, models sometimes switch their answer to D. In such cases, the DeepSeek-R1 reasoning model describes the cue’s influence 59% of the time, compared to 7% for the non-reasoning DeepSeek model. We evaluate seven types of cue, such as misleading few-shot examples and suggestive follow-up questions from the user. Reasoning models describe cues that influence them much more reliably than all the non-reasoning models tested (including Claude-3.5-Sonnet and GPT-4o). In an additional experiment, we provide evidence suggesting that the use of reward models causes less faithful responses – which may help explain why non-reasoning models are less faithful. Our study has two main limitations. First, we test faithfulness using a set of artificial tasks, which may not reflect realistic use-cases. Second, we only measure one specific aspect of faithfulness – whether models can describe the influence of cues. Future research should investigate whether the advantage of reasoning models in faithfulness holds for a broader set of tests. Still, we think this increase in faithfulness is promising for the explainability of language models.
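The headline figures (59% versus 7%) are rates computed only over cases where the cue changed the answer. A sketch of that bookkeeping is shown below; the record fields are hypothetical, and `mentions_cue` is assumed to have been judged beforehand, since the abstract does not describe how articulation is scored.

```python
def articulation_rate(records):
    """Share of cue-induced answer switches in which the model describes the cue's influence."""
    switched = [
        r for r in records
        if r["answer_with_cue"] == r["cue"] and r["answer_without_cue"] != r["cue"]
    ]
    if not switched:
        return None  # the rate is undefined when the cue never changed an answer
    return sum(1 for r in switched if r["mentions_cue"]) / len(switched)

# Hypothetical evaluation records for four questions.
records = [
    {"cue": "D", "answer_without_cue": "A", "answer_with_cue": "D", "mentions_cue": True},
    {"cue": "D", "answer_without_cue": "B", "answer_with_cue": "D", "mentions_cue": False},
    {"cue": "C", "answer_without_cue": "C", "answer_with_cue": "C", "mentions_cue": False},
    {"cue": "B", "answer_without_cue": "A", "answer_with_cue": "B", "mentions_cue": True},
]
rate = articulation_rate(records)
print(rate)
```

The third record is excluded because the model already answered C without the cue, so the cue cannot be said to have influenced it.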
Article 428
Title@2025-07-15 (2): Large Language Models Engineer Too Many Simple Features For Tabular Data
Title: Large Language Models Engineer Too Many Simple Features For Tabular Data | Large Language Models Engineer Zu viele einfache Funktionen für Tabellendaten | 大语言模型工程师 2410.17787v2 |
Authors (3): Jaris Küken, Lennart Purucker, Frank Hutter
Tabular machine learning problems often require time-consuming and labor-intensive feature engineering. Recent efforts have focused on using large language models (LLMs) to capitalize on their potential domain knowledge. At the same time, researchers have observed ethically concerning negative biases in other LLM-related use cases, such as text generation. These developments motivated us to investigate whether LLMs exhibit a bias that negatively impacts the performance of feature engineering. While not ethically concerning, such a bias could hinder practitioners from fully utilizing LLMs for automated data science. Therefore, we propose a method to detect potential biases by detecting anomalies in the frequency of operators (e.g., adding two features) suggested by LLMs when engineering new features. Our experiments evaluate the bias of four LLMs, two big frontier and two small open-source models, across 27 tabular datasets. Our results indicate that LLMs are biased toward simple operators, such as addition, and can fail to utilize more complex operators, such as grouping followed by aggregations. Furthermore, the bias can negatively impact the predictive performance when using LLM-generated features. Our results call for mitigating bias when using LLMs for feature engineering.
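The detection idea in this abstract, comparing how often each operator is proposed against a reference frequency, can be sketched as follows. This is a minimal illustration rather than the authors' method: the operator names, the uniform reference distribution and the `ratio_threshold` parameter are all assumptions made for the example.

```python
from collections import Counter


def operator_frequency_anomalies(suggested_ops, all_ops, ratio_threshold=2.0):
    """Flag operators whose observed share deviates from a uniform reference.

    Hypothetical helper: the uniform reference and the threshold are
    assumptions for illustration, not taken from the paper.
    """
    counts = Counter(suggested_ops)
    total = len(suggested_ops)
    expected = 1.0 / len(all_ops)  # uniform share per operator
    report = {}
    for op in all_ops:
        observed = counts.get(op, 0) / total
        if observed >= expected * ratio_threshold:
            report[op] = "over-used"
        elif observed <= expected / ratio_threshold:
            report[op] = "under-used"
    return report


ops = ["add", "subtract", "multiply", "groupby_mean"]
suggestions = ["add"] * 7 + ["subtract"] * 2 + ["multiply"] * 1
report = operator_frequency_anomalies(suggestions, ops)
```

In this toy input, `add` is flagged as over-used while `multiply` and `groupby_mean` are flagged as under-used, which mirrors the kind of skew toward simple operators that the abstract reports.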
nan
Article 429
Title@2025-07-15 (2): Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques
Title: Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques | Elk: Erforschung der Effizienz von Intercore-vernetzten KI-Chips mit Deep Learning Compiler-Techniken | Elk:探索与深学习汇编者技术一起的机构间连接的AI芯片的效率 2507.11506v1 |
Authors (5): Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang
To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. However, it is not easy to explore the efficiency of these inter-core connected AI (ICCA) chips, due to a fundamental tussle among compute (per-core execution), communication (inter-core data exchange), and I/O (off-chip data access). In this paper, we develop Elk, a DL compiler framework to maximize the efficiency of ICCA chips by jointly trading off all three performance factors discussed above. Elk structures these performance factors into configurable parameters and forms a global trade-off space in the DL compiler. To systematically explore this space and maximize overall efficiency, Elk employs a new inductive operator scheduling policy and a cost-aware on-chip memory allocation algorithm. It generates globally optimized execution plans that best overlap off-chip data loading and on-chip execution. To examine the efficiency of Elk, we build a full-fledged emulator based on a real ICCA chip, IPU-POD4, and an ICCA chip simulator for sensitivity analysis with different interconnect network topologies. Elk achieves 94% of the ideal roofline performance of ICCA chips on average, showing the benefits of supporting large DL models on ICCA chips. We also show Elk’s capability of enabling architecture design space exploration for new ICCA chip development.
nan
Article 430
Title@2025-07-15 (2): A Mathematical Theory of Discursive Networks
Title: A Mathematical Theory of Discursive Networks | Eine mathematische Theorie diskursiver Netzwerke | 讨论网络的数学理论 2507.06565v3 |
Authors (1): Juan B. Gutiérrez
Large language models (LLMs) turn writing into a live exchange between humans and software. We characterize this new medium as a discursive network that treats people and LLMs as equal nodes and tracks how their statements circulate. We define the generation of erroneous information as invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. We develop a general mathematical model of discursive networks that shows that a network governed only by drift and self-repair stabilizes at a modest error rate. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmonizer merges their verdicts. We identify an ethical transgression, epithesis, that occurs when humans fail to engage in the discursive network. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from connecting imperfect ones into networks that enforce mutual accountability.
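The qualitative claim above, that adding a small chance of external review lowers the long-run error rate, can be illustrated with a toy two-state chain. This is only a sketch under strong assumptions and is not the paper's model: statements are either valid or invalid, `drift` is the per-step probability that a valid statement becomes invalid, and `repair` plus `detection` is the per-step probability that an invalid one is corrected. All parameter values are arbitrary.

```python
def stationary_error_rate(drift, repair, detection=0.0):
    """Long-run fraction of invalid statements in a two-state chain.

    valid -> invalid with probability `drift` per step;
    invalid -> valid with probability `repair + detection` per step.
    (Toy model: the rates and their additive form are assumptions.)
    """
    return drift / (drift + repair + detection)


def simulate(drift, repair, detection=0.0, steps=200, start=0.0):
    """Iterate the expected error fraction e <- e*(1 - fix) + (1 - e)*drift."""
    fix = repair + detection
    e = start
    for _ in range(steps):
        e = e * (1.0 - fix) + (1.0 - e) * drift
    return e


baseline = stationary_error_rate(drift=0.05, repair=0.10)
reviewed = stationary_error_rate(drift=0.05, repair=0.10, detection=0.20)
```

With these illustrative numbers the error fraction settles at one third without review and at one seventh with it; the iteration in `simulate` converges to the same closed-form value.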
nan
Article 431
Title@2025-07-15 (2): ComFairGNN: Community Fair Graph Neural Network
Title: ComFairGNN: Community Fair Graph Neural Network | ComFairGNN: Gemeinschaftsgerechtes Diagramm-Neural-Netzwerk | ComfairGNNN:社区公平图形神经网络 2411.04371v3 |
Authors (2): Yonas Sium, Qi Li
Graph Neural Networks (GNNs) have become the leading approach for addressing graph analytical problems in various real-world scenarios. However, GNNs may produce biased predictions against certain demographic subgroups due to node attributes and neighbors surrounding a node. Most current research on GNN fairness focuses predominantly on debiasing GNNs using oversimplified fairness evaluation metrics, which can give a misleading impression of fairness. Understanding the potential evaluation paradoxes due to the complicated nature of the graph structure is crucial for developing effective GNN debiasing mechanisms. In this paper, we examine the effectiveness of current GNN debiasing methods in terms of unfairness evaluation. Specifically, we introduce a community-level strategy to measure bias in GNNs and evaluate debiasing methods at this level. Further, we introduce ComFairGNN, a novel framework designed to mitigate community-level bias in GNNs. Our approach employs a learnable coreset-based debiasing function that addresses bias arising from diverse local neighborhood distributions during GNN neighborhood aggregation. Comprehensive evaluations on three benchmark datasets demonstrate our model’s effectiveness in both accuracy and fairness metrics.
nan
Article 432
Title@2025-07-15 (2): Searching Latent Program Spaces
Title: Searching Latent Program Spaces | Suche nach latenten Programmräumen | 搜索隐藏程序空间 2411.08706v2 |
Authors (2): Matthew V Macfarlane, Clément Bonnet
General intelligence requires systems that acquire new skills efficiently and generalize beyond their training distributions. Although program synthesis approaches have strong generalization power, they face scaling issues due to large combinatorial spaces that quickly make them impractical and require human-generated DSLs or pre-trained priors to narrow this search space. On the other hand, deep learning methods have been highly successful, but they lack structured test-time adaptation and rely on heavy stochastic sampling or expensive gradient updates for fine-tuning. In this work, we propose the Latent Program Network (LPN), a new architecture that builds test-time search directly into neural models. LPN learns a latent space of implicit programs–neurally mapping inputs to outputs–through which it can search using gradients at test time. LPN combines the adaptability of symbolic approaches and the scalability of neural methods. It searches through a compact latent space at test time and bypasses the need for pre-defined domain-specific languages. On a range of programming-by-examples tasks, LPN matches or outperforms in-context learning and test-time training methods. Tested on the ARC-AGI benchmark, we demonstrate that LPN can both learn a compact program space and search through it at test time to adapt to novel tasks. LPN doubles its performance on out-of-distribution tasks when test-time search is switched on.
nan
Article 433
Title@2025-07-15 (2): Reinforcement Learning with Action Chunking
Title: Reinforcement Learning with Action Chunking | Verstärktes Lernen mit Action Chunking | 强化学习与行动决赛 2507.07969v2 |
Authors (3): Qiyang Li, Zhiyuan Zhou, Sergey Levine
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a ‘chunked’ action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
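The $n$-step backup over an action chunk mentioned above can be sketched as a single target computation. This is a minimal reading of the abstract, not the authors' implementation: the function name and arguments are hypothetical, and `bootstrap_value` stands in for the critic's estimate of the next chunk at the state reached after the current chunk ends.

```python
def chunked_td_target(rewards, bootstrap_value, gamma):
    """TD target for one action chunk of length h = len(rewards).

    target = sum_{i=0}^{h-1} gamma^i * r_{t+i} + gamma^h * Q(s_{t+h}, next_chunk)

    `bootstrap_value` is assumed to be supplied by a critic that scores
    whole chunks; how that critic is trained is outside this sketch.
    """
    target = 0.0
    for i, r in enumerate(rewards):
        target += (gamma ** i) * r  # discounted reward inside the chunk
    target += (gamma ** len(rewards)) * bootstrap_value  # bootstrap after the chunk
    return target


target = chunked_td_target(rewards=[0.0, 0.0, 1.0], bootstrap_value=2.0, gamma=0.5)
```

Because the critic scores the whole chunk that produced these rewards, the multi-step return needs no off-policy correction, which is one way to read the abstract's claim that the backups are unbiased.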
nan
Article 434
Title@2025-07-15 (2): Exploring the robustness of TractOracle methods in RL-based tractography
Title: Exploring the robustness of TractOracle methods in RL-based tractography | Erforschung der Robustheit von TractOracle-Methoden in der RL-basierten Traktographie | 探索基于RL的地形图象学中的Tract Oracle方法的稳健性 2507.11486v1 |
Authors (4): Jeremi Levesque, Antoine Théberge, Maxime Descoteaux, Pierre-Marc Jodoin
Tractography algorithms leverage diffusion MRI to reconstruct the fibrous architecture of the brain’s white matter. Among machine learning approaches, reinforcement learning (RL) has emerged as a promising framework for tractography, outperforming traditional methods in several key aspects. TractOracle-RL, a recent RL-based approach, reduces false positives by incorporating anatomical priors into the training process via a reward-based mechanism. In this paper, we investigate four extensions of the original TractOracle-RL framework by integrating recent advances in RL, and we evaluate their performance across five diverse diffusion MRI datasets. Results demonstrate that combining an oracle with the RL framework consistently leads to robust and reliable tractography, regardless of the specific method or dataset used. We also introduce a novel RL training scheme called Iterative Reward Training (IRT), inspired by the Reinforcement Learning from Human Feedback (RLHF) paradigm. Instead of relying on human input, IRT leverages bundle filtering methods to iteratively refine the oracle’s guidance throughout training. Experimental results show that RL methods trained with oracle feedback significantly outperform widely used tractography techniques in terms of accuracy and anatomical validity.
nan
Article 435
Title@2025-07-15 (2): Model See Model Do: Speech-Driven Facial Animation with Style Control
Title: Model See Model Do: Speech-Driven Facial Animation with Style Control | Modell siehe Modell Do: Sprachgesteuerte Gesichtsanimation mit Stilsteuerung | 见示范 do:带有样式控制的语音驱动动画模型 2505.01319v2 |
Authors (3): Yifang Pan, Karan Singh, Luiz Gustavo Hafemann
Speech-driven 3D facial animation plays a key role in applications such as virtual avatars, gaming, and digital content creation. While existing methods have made significant progress in achieving accurate lip synchronization and generating basic emotional expressions, they often struggle to capture and effectively transfer nuanced performance styles. We propose a novel example-based generation framework that conditions a latent diffusion model on a reference style clip to produce highly expressive and temporally coherent facial animations. To address the challenge of accurately adhering to the style reference, we introduce a novel conditioning mechanism called style basis, which extracts key poses from the reference and additively guides the diffusion generation process to fit the style without compromising lip synchronization quality. This approach enables the model to capture subtle stylistic cues while ensuring that the generated animations align closely with the input speech. Extensive qualitative, quantitative, and perceptual evaluations demonstrate the effectiveness of our method in faithfully reproducing the desired style while achieving superior lip synchronization across various speech scenarios.
nan
Article 436
Title@2025-07-15 (2): Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Title: Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety | Chain of Thought Monitoringability: Eine neue und fragile Chance für KI-Sicherheit | 《思想链可监测性:AI安全的新机会和脆弱机会》 2507.11473v1 |
Authors (41): Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander Mądry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Martín Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik
AI systems that “think” in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.
nan
Article 437
Title@2025-07-15 (2): D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data
Title: D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data | D3FL: Datenverteilung und Detrending für robustes Federated Learning in nichtlinearen Zeitreihendaten | D3FL:非线性时间序列数据中硬性联邦学习的数据分配和分流 2507.11471v1 |
Authors (3): Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan
With advancements in computing and communication technologies, the Internet of Things (IoT) has seen significant growth. IoT devices typically collect data from various sensors, such as temperature, humidity, and energy meters. Much of this data is temporal in nature. Traditionally, data from IoT devices is centralized for analysis, but this approach introduces delays and increased communication costs. Federated learning (FL) has emerged as an effective alternative, allowing for model training across distributed devices without the need to centralize data. In many applications, such as smart home energy and environmental monitoring, the data collected by IoT devices across different locations can exhibit significant variation in trends and seasonal patterns. Accurately forecasting such non-stationary, non-linear time-series data is crucial for applications like energy consumption estimation and weather forecasting. However, these data variations can severely impact prediction accuracy. The key contributions of this paper are: (1) Investigating how non-linear, non-stationary time-series data distributions, like generalized extreme value (gen-extreme) and log norm distributions, affect FL performance. (2) Analyzing how different detrending techniques for non-linear time-series data influence the forecasting model’s performance in an FL setup. We generated several synthetic time-series datasets using non-linear data distributions and trained an LSTM-based forecasting model using both centralized and FL approaches. Additionally, we evaluated the impact of detrending on real-world datasets with non-linear time-series data distributions. Our experimental results show that: (1) FL performs worse than centralized approaches when dealing with non-linear data distributions. (2) The use of appropriate detrending techniques improves FL performance, reducing loss across different data distributions.
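The abstract does not name the detrending techniques it compares, so the following sketch uses the simplest one, least-squares linear detrending, purely as an assumed stand-in. The function name and the synthetic series are hypothetical; in a federated setup each client would presumably apply such a step to its own local series before training.

```python
def linear_detrend(series):
    """Remove a least-squares straight-line trend from a series (len >= 2).

    Returns (residual, slope, intercept). Linear detrending is an
    assumption for illustration; other schemes would replace this function.
    """
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    residual = [y - (intercept + slope * t) for t, y in enumerate(series)]
    return residual, slope, intercept


# Synthetic client series: upward trend plus an alternating pattern.
series = [10.0 + 0.5 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]
residual, slope, intercept = linear_detrend(series)
```

The forecasting model would then be trained on `residual`, with the fitted line added back to its predictions.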
nan
Article 438
Title@2025-07-15 (2): Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection
Title: Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection | Gram-Schmidt Methoden zur unüberwachten Feature-Extraktion und -Auswahl | 不受监督地物采掘和选择的Gram-Schmidt方法 2311.09386v4 |
Authors (3): Bahram Yaghooti, Netanel Raviv, Bruno Sinopoli
Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions by which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear, and can be seen as a natural generalization of principal component analysis (PCA). We provide experimental results for synthetic and real-world benchmark datasets which show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism (Heidari et al., IEEE Transactions on Information Theory, 2022), yet at significantly reduced complexity.
nan
Article 439
Title@2025-07-15 (2): Training neural control variates using correlated configurations
Title: Training neural control variates using correlated configurations | Ausbildung von Neuralsteuerungsvariaten mit korrelierten Konfigurationen | 使用相关配置的培训神经控制变异 2505.07719v3 |
Authors (1): Hyunwoo Oh
Neural control variates (NCVs) have emerged as a powerful tool for variance reduction in Monte Carlo (MC) simulations, particularly in high-dimensional problems where traditional control variates are difficult to construct analytically. By training neural networks to learn auxiliary functions correlated with the target observable, NCVs can significantly reduce estimator variance while preserving unbiasedness. However, a critical but often overlooked aspect of NCV training is the role of autocorrelated samples generated by Markov Chain Monte Carlo (MCMC). While such samples are typically discarded for error estimation due to their statistical redundancy, they may contain useful information about the structure of the underlying probability distribution that can benefit the training process. In this work, we systematically examine the effect of using correlated configurations in training neural control variates. We demonstrate, both conceptually and numerically, that training on correlated data can improve control variate performance, especially in settings with limited computational resources. Our analysis includes empirical results from $U(1)$ gauge theory and scalar field theory, illustrating when and how autocorrelated samples enhance NCV construction. These findings provide practical guidance for the efficient use of MCMC data in training neural networks.
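The variance-reduction principle behind control variates can be shown with a classical, non-neural example. This is a sketch under stated assumptions, not the paper's setup: the samples are independent rather than autocorrelated MCMC configurations, the control function is hand-picked rather than learned by a network, and the target E[exp(U)] with U uniform on [0, 1] is chosen only because its exact value e - 1 is known.

```python
import math
import random


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Estimate E[f] using g (with known mean g_mean) as a control variate.

    The coefficient c = Cov(f, g) / Var(g) is fitted on the same samples,
    which introduces a small bias that vanishes as the sample size grows.
    """
    n = len(f_vals)
    f_bar = sum(f_vals) / n
    g_bar = sum(g_vals) / n
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / (n - 1)
    c = cov / sample_variance(g_vals)
    corrected = [f - c * (g - g_mean) for f, g in zip(f_vals, g_vals)]
    return sum(corrected) / n, corrected


rng = random.Random(0)  # fixed seed so the sketch is reproducible
u = [rng.random() for _ in range(5000)]
f_vals = [math.exp(x) for x in u]
estimate, corrected = control_variate_estimate(f_vals, u, g_mean=0.5)
```

Comparing `sample_variance(corrected)` with `sample_variance(f_vals)` shows the reduction; a neural control variate replaces the hand-picked `g` with a trained network.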
nan
Article 440
Title@2025-07-15 (2): LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer
Title: LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | LRMR: LLM-getriebenes relationales Multiknoten-Ranking für Lymphknotenmetastasis-Abschätzung bei rektaler Krebserkrankung | LRMR: 红外癌症中淋巴结结结节元值评估的LLM-Driven 关系多节分级 2507.11457v1 |
Authors (5): Yaoxian Dong, Yifan Gao, Haoyue Li, Yanfen Cui, Xin Gao
Accurate preoperative assessment of lymph node (LN) metastasis in rectal cancer guides treatment decisions, yet conventional MRI evaluation based on morphological criteria shows limited diagnostic performance. While some artificial intelligence models have been developed, they often operate as black boxes, lacking the interpretability needed for clinical trust. Moreover, these models typically evaluate nodes in isolation, overlooking the patient-level context. To address these limitations, we introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework. This approach reframes the diagnostic task from a direct classification problem into a structured reasoning and ranking process. The LRMR framework operates in two stages. First, a multimodal large language model (LLM) analyzes a composite montage image of all LNs from a patient, generating a structured report that details ten distinct radiological features. Second, a text-based LLM performs pairwise comparisons of these reports between different patients, establishing a relative risk ranking based on the severity and number of adverse features. We evaluated our method on a retrospective cohort of 117 rectal cancer patients. LRMR achieved an area under the curve (AUC) of 0.7917 and an F1-score of 0.7200, outperforming a range of deep learning baselines, including ResNet50 (AUC 0.7708). Ablation studies confirmed the value of our two main contributions: removing the relational ranking stage or the structured prompting stage led to a significant performance drop, with AUCs falling to 0.6875 and 0.6458, respectively. Our work demonstrates that decoupling visual perception from cognitive reasoning through a two-stage LLM framework offers a powerful, interpretable, and effective new paradigm for assessing lymph node metastasis in rectal cancer.
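The second stage described above, turning pairwise comparisons into a relative risk ranking, can be sketched with a simple win count. This is an assumption-laden illustration rather than the LRMR pipeline: the abstract does not say how comparisons are aggregated, and `compare_by_adverse_count` is a hypothetical stand-in for the text-based LLM call, using an invented `adverse` field.

```python
from itertools import combinations


def rank_by_pairwise_wins(reports, compare):
    """Rank patient ids by how many pairwise comparisons they win.

    `compare(a, b)` returns 0 if report `a` looks higher risk, 1 if `b`
    does, and None for a tie. Win counting is an assumed aggregation rule.
    """
    wins = {pid: 0 for pid in reports}
    for a, b in combinations(reports, 2):
        winner = compare(reports[a], reports[b])
        if winner == 0:
            wins[a] += 1
        elif winner == 1:
            wins[b] += 1
    ranking = sorted(reports, key=lambda pid: wins[pid], reverse=True)
    return ranking, wins


def compare_by_adverse_count(report_a, report_b):
    """Hypothetical stand-in for the LLM judge: more adverse features wins."""
    if report_a["adverse"] > report_b["adverse"]:
        return 0
    if report_b["adverse"] > report_a["adverse"]:
        return 1
    return None


reports = {"p1": {"adverse": 1}, "p2": {"adverse": 4}, "p3": {"adverse": 2}}
ranking, wins = rank_by_pairwise_wins(reports, compare_by_adverse_count)
```

A normalized win count could then serve as a continuous risk score for metrics such as AUC, though whether the paper does this is not stated in the abstract.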
nan
Article 441
Title@2025-07-15 (2): A Generative Approach to LLM Harmfulness Detection with Special Red Flag Tokens
Title: A Generative Approach to LLM Harmfulness Detection with Special Red Flag Tokens | Eine generative Annäherung an LLM Harmfulness Detection mit speziellen roten Flaggen-Tokens | 利用特别红旗拳生成LLM 无害性探测法 2502.16366v3 |
Authors (5): Sophie Xhonneux, David Dobre, Mehrnaz Mofakhami, Leo Schwinn, Gauthier Gidel
Most safety training methods for large language models (LLMs) are based on fine-tuning that forces models to shift from an unsafe answer to refusal when faced with harmful requests. Unfortunately, these drastic distribution shifts generally compromise model capabilities. To avoid that, we propose to expand the model’s vocabulary with a special token we call red flag token (
nan
Article 442
Title@2025-07-15 (2): Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs
Title: Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | Synthetische Datensätze für maschinelles Lernen auf Spatio-Temporalen Graphen mit PDEs | 利用PDEs在斯帕蒂奥-时空图上进行机器学习的合成数据集 2502.04140v2 |
Authors (5): Jost Arndt, Utku Isil, Michael Detzel, Wojciech Samek, Jackie Ma
Many physical processes can be expressed through partial differential equations (PDEs). Real-world measurements of such processes are often collected at irregularly distributed points in space, which can be effectively represented as graphs; however, there are currently only a few existing datasets. Our work aims to make advancements in the field of PDE-modeling accessible to the temporal graph machine learning community, while addressing the data scarcity problem, by creating and utilizing datasets based on PDEs. In this work, we create and use synthetic datasets based on PDEs to support spatio-temporal graph modeling in machine learning for different applications. More precisely, we showcase three equations to model different types of disasters and hazards in the fields of epidemiology, atmospheric particles, and tsunami waves. Further, we show how such created datasets can be used by benchmarking several machine learning models on the epidemiological dataset. Additionally, we show how pre-training on this dataset can improve model performance on real-world epidemiological data. The presented methods enable others to create datasets and benchmarks customized to individual requirements. The source code for our methodology and the three created datasets can be found on https://github.com/github-usr-ano/Temporal_Graph_Data_PDEs.
nan
Article 443
Title@2025-07-15 (2): Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
Title: Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control | Langsame Entscheidungshäufigkeiten in der kontinuierlichen Kontrolle überwinden: Modellbasiertes Sequenz-Verstärkungs-Lernen für modellfreie Steuerung | 克服持续控制中缓慢决定因素:无模式控制的示范序列强化学习 2410.08979v4 |
Authors (2): Devdhar Patel, Hava Siegelmann
Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a “temporal recall” mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Furthermore, we compare SRL with model-based online planning, showing that SRL achieves comparable FAS while leveraging the same model during training that online planners use for planning.
nan
Article 444
Title@2025-07-15 (2): Implementing Adaptations for Vision AutoRegressive Model
Title: Implementing Adaptations for Vision AutoRegressive Model | Implementierung von Anpassungen für das AutoRegressive Vision Modell | 实施适应展望自动递减模式 2507.11441v1 |
Authors (4): Kaif Shaikh, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
The Vision AutoRegressive model (VAR) was recently introduced as an alternative to Diffusion Models (DMs) in the image generation domain. In this work we focus on its adaptations, which aim to fine-tune pre-trained models to perform specific downstream tasks, like medical data generation. While many such techniques exist for DMs, adaptations for VAR remain underexplored. Similarly, differentially private (DP) adaptations, which aim to preserve the privacy of the adaptation data, have been extensively studied for DMs, while VAR lacks such solutions. In our work, we implement and benchmark many strategies for VAR, and compare them to state-of-the-art DM adaptation strategies. We observe that VAR outperforms DMs for non-DP adaptations; however, the performance of DP suffers, which necessitates further research in private adaptations for VAR. Code is available at https://github.com/sprintml/finetuning_var_dp.
nan
Article 445
Title@2025-07-15 (2): Toward Improving fNIRS Classification: A Study on Activation Functions in Deep Neural Architectures
Title: Toward Improving fNIRS Classification: A Study on Activation Functions in Deep Neural Architectures | Zur Verbesserung der fNIRS-Klassifikation: Eine Studie über Aktivierungsfunktionen in tiefen neuralen Architekturen | 努力改进FNIRS分类:关于深神经结构中激活功能的研究 2507.11436v1 |
Authors (5): Behtom Adeli, John McLinden, Pankaj Pandey, Ming Shao, Yalda Shahriari
Activation functions are critical to the performance of deep neural networks, particularly in domains such as functional near-infrared spectroscopy (fNIRS), where nonlinearity, low signal-to-noise ratio (SNR), and signal variability pose significant challenges to model accuracy. However, the impact of activation functions on deep learning (DL) performance in the fNIRS domain remains underexplored and lacks systematic investigation in the current literature. This study evaluates a range of conventional and field-specific activation functions for fNIRS classification tasks using multiple deep learning architectures, including the domain-specific fNIRSNet, AbsoluteNet, MDNN, and shallowConvNet (as the baseline), all tested on a single dataset recorded during an auditory task. To ensure a fair comparison, all networks were trained and tested using standardized preprocessing and consistent training parameters. The results show that symmetrical activation functions such as Tanh and the Absolute value function Abs(x) can outperform commonly used functions like the Rectified Linear Unit (ReLU), depending on the architecture. Additionally, a focused analysis of the role of symmetry was conducted using a Modified Absolute Function (MAF), with results further supporting the effectiveness of symmetrical activation functions on performance gains. These findings underscore the importance of selecting proper activation functions that align with the signal characteristics of fNIRS data.
nan
Article 446
Title@2025-07-15 (2): FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning
Title: FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | FLsim: Ein modulares und bibliotheks-agnostisches Simulations-Framework für Federated Learning | FLsim: 联邦学习模式和图书馆-不可知模拟框架 2507.11430v1 |
Authors (3): Arnab Mukherjee, Raju Halder, Joydeep Chandra
Federated Learning (FL) has undergone significant development since its inception in 2016, advancing from basic algorithms to complex methodologies tailored to address diverse challenges and use cases. However, research and benchmarking of novel FL techniques against a plethora of established state-of-the-art solutions remain challenging. To streamline this process, we introduce FLsim, a comprehensive FL simulation framework designed to meet the diverse requirements of FL workflows in the literature. FLsim is characterized by its modularity, scalability, resource efficiency, and controlled reproducibility of experimental outcomes. Its easy-to-use interface allows users to specify customized FL requirements through job configuration, which supports: (a) customized data distributions, ranging from non-independent and identically distributed (non-iid) data to independent and identically distributed (iid) data, (b) selection of local learning algorithms according to user preferences, with complete agnosticism to ML libraries, (c) choice of network topology illustrating communication patterns among nodes, (d) definition of model aggregation and consensus algorithms, and (e) pluggable blockchain support for enhanced robustness. Through a series of experimental evaluations, we demonstrate the effectiveness and versatility of FLsim in simulating a diverse range of state-of-the-art FL experiments. We envisage that FLsim would mark a significant advancement in FL simulation frameworks, offering unprecedented flexibility and functionality for researchers and practitioners alike.
nan
Article 447
Title@2025-07-15 (2): Matrix Is All You Need
Title: Matrix Is All You Need | Matrix ist alles, was Sie brauchen | 母体是所有你需要的 2506.01966v2 |
Authors (1): Yuzhou Zhu
Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix formulations match or exceed native model performance while converging in comparable or fewer epochs. By reducing architecture design to sparse pattern selection, our matrix perspective aligns with GPU parallelism and leverages mature algebraic optimization tools. This work establishes a mathematically rigorous substrate for diverse neural architectures and opens avenues for principled, hardware-aware network design.
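As a rough illustration of the claim that convolution can be written as a sparse, upper-triangular matrix product, the sketch below builds a banded matrix for a 1D sliding-window filter (cross-correlation, as used in deep learning layers) and checks it against the direct computation. The helper names `correlation_matrix`, `matvec`, and `sliding_window` are hypothetical and this is not the paper's construction, only a minimal instance of the general idea.

```python
def correlation_matrix(kernel, n):
    """Build the (n - k + 1) x n banded matrix M with M[i][i + j] = kernel[j]."""
    k = len(kernel)
    rows = n - k + 1
    M = [[0.0] * n for _ in range(rows)]
    for i in range(rows):
        for j, w in enumerate(kernel):
            M[i][i + j] = w  # nonzeros sit on or above the main diagonal
    return M


def matvec(M, x):
    """Plain dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def sliding_window(kernel, x):
    """Direct 'valid' cross-correlation for comparison."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
M = correlation_matrix(kernel, len(x))
print(matvec(M, x))
print(sliding_window(kernel, x))
```

Both calls print the same list, and every entry of `M` below the main diagonal is zero, which is the sparsity pattern the abstract associates with convolution.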
nan
Article 448
Title@2025-07-15 (2): Improving sub-seasonal wind-speed forecasts in Europe with a non-linear model
Title: Improving sub-seasonal wind-speed forecasts in Europe with a non-linear model | Verbesserung der Windgeschwindigkeitsprognosen innerhalb der Saison in Europa mit einem nichtlinearen Modell | 利用非线性模型改进欧洲季节性风速次风速预报 2411.19077v2 |
Authors (6): Ganglin Tian, Camille Le Coz, Anastase Alexandre Charantonis, Alexis Tantet, Naveen Goutham, Riwal Plougonven
Sub-seasonal wind speed forecasts provide valuable guidance for wind power system planning and operations, yet the forecast skills of surface winds decrease sharply after two weeks. However, large-scale variables exhibit greater predictability on this time scale. This study explores the potential of leveraging non-linear relationships between 500 hPa geopotential height (Z500) and surface wind speed to improve sub-seasonal wind speed forecast skills in Europe. Our proposed framework uses a Multiple Linear Regression (MLR) or a Convolutional Neural Network (CNN) to regress surface wind speed from Z500. Evaluations on ERA5 reanalysis indicate that the CNN performs better due to its non-linearity. Applying these models to sub-seasonal forecasts from the European Centre for Medium-Range Weather Forecasts, various verification metrics demonstrate the advantages of non-linearity. Yet, this is partly explained by the fact that these statistical models are under-dispersive since they explain only a fraction of the target variable variance. Introducing stochastic perturbations to represent the stochasticity of the unexplained part from the signal helps compensate for this issue. Results show that the perturbed CNN performs better than the perturbed MLR only in the first weeks, while the perturbed MLR’s performance converges towards that of the perturbed CNN after two weeks. The study finds that introducing stochastic perturbations can address the issue of insufficient spread in these statistical models, with improvements from the non-linearity varying with the lead time of the forecasts.
nan
Article 449
Title@2025-07-15 (2): Better Regret Rates in Bilateral Trade via Sublinear Budget Violation
Title: Better Regret Rates in Bilateral Trade via Sublinear Budget Violation | Bessere Bedauernsraten im bilateralen Handel durch sublineare Haushaltsverletzung | 双边贸易中因次线性预算违反规定而出现更好的遗憾率 2507.11419v1 |
Authors (3): Anna Lunghi, Matteo Castiglioni, Alberto Marchesi
Bilateral trade is a central problem in algorithmic economics, and recent work has explored how to design trading mechanisms using no-regret learning algorithms. However, no-regret learning is impossible when budget balance has to be enforced at each time step. Bernasconi et al. [Ber+24] show how this impossibility can be circumvented by relaxing the budget balance constraint to hold only globally over all time steps. In particular, they design an algorithm achieving regret of the order of $\tilde O(T^{3/4})$ and provide a lower bound of $\Omega(T^{5/7})$. In this work, we interpolate between these two extremes by studying how the optimal regret rate varies with the allowed violation of the global budget balance constraint. Specifically, we design an algorithm that, by violating the constraint by at most $T^{\beta}$ for any given $\beta \in [\frac{3}{4}, \frac{6}{7}]$, attains regret $\tilde O(T^{1 - \beta/3})$. We complement this result with a matching lower bound, thus fully characterizing the trade-off between regret and budget violation. Our results show that both the $\tilde O(T^{3/4})$ upper bound in the global budget balance case and the $\Omega(T^{5/7})$ lower bound under unconstrained budget balance violation obtained by Bernasconi et al. [Ber+24] are tight.
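A quick check, using only the exponents stated in the abstract, shows how the regret rate $\tilde O(T^{1 - \beta/3})$ recovers the two earlier bounds at the endpoints of the allowed range of $\beta$:

```latex
\[
\beta = \tfrac{3}{4}: \quad 1 - \tfrac{\beta}{3} = 1 - \tfrac{1}{4} = \tfrac{3}{4},
\qquad
\beta = \tfrac{6}{7}: \quad 1 - \tfrac{\beta}{3} = 1 - \tfrac{2}{7} = \tfrac{5}{7}.
\]
```

So a violation budget of $T^{3/4}$ gives regret $\tilde O(T^{3/4})$, and a budget of $T^{6/7}$ gives regret $\tilde O(T^{5/7})$, matching the $\Omega(T^{5/7})$ lower bound.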
nan
Article 450
Title@2025-07-15 (2): A Resource Efficient Quantum Kernel
Title: A Resource Efficient Quantum Kernel | Ein ressourceneffizienter Quantenkern | 资源效率高的量子核心 2507.03689v2 |
Authors (4): Utkarsh Singh, Jean-Frédéric Laprade, Aaron Z. Goldberg, Khabat Heshami
Quantum processors may enhance machine learning by mapping high-dimensional data onto quantum systems for processing. Conventional quantum kernels, or feature maps, for encoding data features onto a quantum circuit are currently impractical, as the number of entangling gates scales quadratically with the dimension of the dataset and the number of qubits. In this work, we introduce a quantum kernel designed to handle high-dimensional data with a significantly reduced number of qubits and entangling operations. Our approach preserves essential data characteristics while promoting computational efficiency, as evidenced by extensive experiments on benchmark datasets that demonstrate a marked improvement in both accuracy and resource utilization, as compared to state-of-the-art quantum feature maps. Our noisy simulation results, combined with lower resource requirements, highlight our kernel’s ability to function within the constraints of noisy intermediate-scale quantum devices. Through numerical simulations and small-scale implementation on a superconducting circuit quantum computing platform, we demonstrate that our scheme performs on par with or better than a set of classical algorithms for classification. Our findings herald a promising avenue for the practical implementation of quantum machine learning algorithms on near-future quantum computing platforms.
nan
Article 451
Title@2025-07-15 (2): DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation
Title: DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | DFRot: Erzielen von aussergewöhnlicher und massiver Aktivierungsfrei für rotierte LLMs mit raffinierter Rotation | DFRot: 实现无外源和无大规模激励-无源于经过精炼旋转的旋转LMLMs 2412.00648v4 |
Authors (2): Jingyang Xiang, Sai Qian Zhang
Rotating the activation and weight matrices to reduce the influence of outliers in large language models (LLMs) has recently attracted significant attention, particularly in the context of model quantization. Prior studies have shown that in low-precision quantization scenarios, such as 4-bit weights and 4-bit activations (W4A4), randomized Hadamard transforms can achieve significantly higher accuracy than randomized orthogonal transforms. Notably, the reason behind this phenomenon remains unknown. In this paper, we find that these transformations show substantial improvement in eliminating outliers for common tokens and achieve similar quantization error. The primary reason for the accuracy difference lies in the fact that randomized Hadamard transforms can slightly reduce the quantization error for tokens with massive activations while randomized orthogonal transforms increase the quantization error. Due to the extreme rarity of these tokens and their critical impact on model accuracy, we consider this a long-tail optimization problem, and therefore construct a simple yet effective method: a weighted loss function. Additionally, we propose an optimization strategy for the rotation matrix that involves alternating optimization of quantization parameters while employing orthogonal Procrustes transforms to refine the rotation matrix. This makes the distribution of the rotated activation values more conducive to quantization, especially for tokens with massive activations. Our method enhances the Rotated LLMs by achieving dual free, Outlier-Free and Massive Activation-Free, dubbed as DFRot. Extensive experiments demonstrate the effectiveness and efficiency of DFRot. By tuning the rotation matrix using just a single sample, DFRot achieves a perplexity improvement of 0.98 and 0.95 on W4A4KV4 and W4A4KV16, respectively, for LLaMA3-70B, a model known for its quantization challenges.
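To make the basic rotation idea concrete, here is a minimal sketch of how an orthogonal Hadamard rotation spreads a single outlier channel across all channels while preserving the vector norm. It uses a plain (non-randomized) Sylvester Hadamard matrix and a made-up activation vector; it does not implement DFRot's weighted loss or its Procrustes refinement, and the helper names are hypothetical.

```python
import math


def hadamard(n):
    """Sylvester construction (n must be a power of two), scaled to be orthonormal."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in H]


def rotate(H, x):
    """Apply the rotation: y = H x."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


x = [16.0, 0.5, -0.25, 0.25]  # illustrative activations with one outlier channel
H = hadamard(4)
y = rotate(H, x)

print("max |x| =", max(abs(v) for v in x))
print("max |y| =", max(abs(v) for v in y))
print("norms   =", l2_norm(x), l2_norm(y))
```

The largest magnitude drops after rotation while the norm is unchanged, so a quantizer whose range is set by the largest entry can use a finer step on the rotated vector.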
nan
Article 452
Title@2025-07-15 (2): Seq vs Seq: An Open Suite of Paired Encoders and Decoders
Title: Seq vs Seq: An Open Suite of Paired Encoders and Decoders | Seq vs Seq: Eine offene Suite aus koppelten Encodern und Decodern | Seq vs Seq:一个开放的套件,其中含有子元编码器和代碼器。 2507.11412v1 |
Authors (6): Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, Benjamin Van Durme
The large language model (LLM) community focuses almost exclusively on decoder-only language models, since they are easier to use for text generation. However, a large subset of the community still uses encoder-only models for tasks such as classification or retrieval. Previous work has attempted to compare these architectures, but is forced to make comparisons with models that have different numbers of parameters, training techniques, and datasets. We introduce the SOTA open-data Ettin suite of models: paired encoder-only and decoder-only models ranging from 17 million parameters to 1 billion, trained on up to 2 trillion tokens. Using the same recipe for both encoder-only and decoder-only models produces SOTA recipes in both categories for their respective sizes, beating ModernBERT as an encoder and Llama 3.2 and SmolLM2 as decoders. Like previous work, we find that encoder-only models excel at classification and retrieval tasks while decoders excel at generative tasks. However, we show that adapting a decoder model to encoder tasks (and vice versa) through continued training is subpar compared to using only the reverse objective (i.e. a 400M encoder outperforms a 1B decoder on MNLI, and vice versa for generative tasks). We open-source all artifacts of this study including training data, training order segmented by checkpoint, and 200+ checkpoints to allow future work to analyze or extend all aspects of training.
nan
Article 453
Title@2025-07-15 (2): Robust-Multi-Task Gradient Boosting
Title: Robust-Multi-Task Gradient Boosting | Robust-Multi-Task-Gradienten-Boosting | 强力多任务梯级推动 2507.11411v1 |
Authors (3): Seyedsaman Emami, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato
Multi-task learning (MTL) has shown effectiveness in exploiting shared information across tasks to improve generalization. MTL assumes tasks share similarities that can improve performance. In addition, boosting algorithms have demonstrated exceptional performance across diverse learning problems, primarily due to their ability to focus on hard-to-learn instances and iteratively reduce residual errors. This makes them a promising approach for learning multi-task problems. However, real-world MTL scenarios often involve tasks that are not well-aligned (known as outlier or adversarial tasks), which do not share beneficial similarities with others and can, in fact, deteriorate the performance of the overall model. To overcome this challenge, we propose Robust-Multi-Task Gradient Boosting (R-MTGB), a novel boosting framework that explicitly models and adapts to task heterogeneity during training. R-MTGB structures the learning process into three sequential blocks: (1) learning shared patterns, (2) partitioning tasks into outliers and non-outliers with regularized parameters, and (3) fine-tuning task-specific predictors. This architecture enables R-MTGB to automatically detect and penalize outlier tasks while promoting effective knowledge transfer among related tasks. Our method integrates these mechanisms seamlessly within gradient boosting, allowing robust handling of noisy or adversarial tasks without sacrificing accuracy. Extensive experiments on both synthetic benchmarks and real-world datasets demonstrate that our approach successfully isolates outliers, transfers knowledge, and consistently reduces prediction errors for each task individually, and achieves overall performance gains across all tasks. These results highlight robustness, adaptability, and reliable convergence of R-MTGB in challenging MTL environments.
nan
Article 454
Title@2025-07-15 (2): Temporal Chunking Enhances Recognition of Implicit Sequential Patterns
Title: Temporal Chunking Enhances Recognition of Implicit Sequential Patterns | Temporales Chunking verbessert die Anerkennung von impliziten Sequenzmustern | 增强对隐性序列模式的认识 2506.00588v2 |
Authors (6): Jayanta Dey, Nicholas Soures, Miranda Gonzales, Itamar Lerner, Christopher Kanan, Dhireesha Kudithipudi
In this pilot study, we propose a neuro-inspired approach that compresses temporal sequences into context-tagged chunks, where each tag represents a recurring structural unit or "community" in the sequence. These tags are generated during an offline sleep phase and serve as compact references to past experience, allowing the learner to incorporate information beyond its immediate input range. We evaluate this idea in a controlled synthetic environment designed to reveal the limitations of traditional neural-network-based sequence learners, such as recurrent neural networks (RNNs), when facing temporal patterns on multiple timescales. Our results, while preliminary, suggest that temporal chunking can significantly enhance learning efficiency under resource-constrained settings. A small-scale human pilot study using a Serial Reaction Time Task further motivates the idea of structural abstraction. Although limited to synthetic tasks, this work serves as an early proof-of-concept, with initial evidence that learned context tags can transfer across related tasks, offering potential for future applications in transfer learning.
nan
Article 455
Title@2025-07-15 (2): Gaussian mixture models as a proxy for interacting language models
Title: Gaussian mixture models as a proxy for interacting language models | Gaußsche Mischungsmodelle als Proxy für interagierende Sprachmodelle | Gaussian 混合模型作为交互语言模型的替代 2506.00077v3 |
Authors (6): Edward L. Wang, Tianyu Wang, Hayden Helm, Avanti Athreya, Vince Lyzinski, Carey E. Priebe
Large language models (LLMs) are a powerful tool with the ability to match human capabilities and behavior in many settings. Retrieval-augmented generation (RAG) further allows LLMs to generate diverse output depending on the contents of their RAG database. This motivates their use in the social sciences to study human behavior between individuals when large-scale experiments are infeasible. However, LLMs depend on complex, computationally expensive algorithms. In this paper, we introduce interacting Gaussian mixture models (GMMs) as an alternative to similar frameworks using LLMs. We compare a simplified model of GMMs to select experimental simulations of LLMs whose updating and response depend on feedback from other LLMs. We find that interacting GMMs capture important features of the dynamics in interacting LLMs, and we investigate key similarities and differences between interacting LLMs and GMMs. We conclude by discussing the benefits of Gaussian mixture models, potential modifications, and future research directions.
nan
Article 456
Title@2025-07-15 (2): Moderate Adaptive Linear Units (MoLU)
Title: Moderate Adaptive Linear Units (MoLU) | Mäßige adaptive Lineareinheiten (MoLU) | 适应性线性线性单位(MoLU) 2302.13696v7 |
Authors (3): Hankyul Koh, Joon-hyuk Ko, Wonho Jhe
We propose the Moderate Adaptive Linear Unit (MoLU), a novel activation function for deep neural networks, defined analytically as: $f(x) = x \times (1 + \tanh(x))/2$. MoLU combines mathematical elegance with empirical effectiveness, exhibiting superior performance in terms of prediction accuracy, convergence speed, and computational efficiency. Due to its C-infinity smoothness, i.e. infinite differentiability and analyticity, MoLU is expected to mitigate issues such as vanishing or exploding gradients, making it suitable for a broad range of architectures and applications, including large language models (LLMs), Neural Ordinary Differential Equations (Neural ODEs), Physics-Informed Neural Networks (PINNs), and Convolutional Neural Networks (CNNs). Empirical evaluations show that MoLU consistently achieves faster convergence and improved final accuracy relative to widely used activation functions such as GeLU, SiLU, and Mish. These properties position MoLU as a promising and robust candidate for general-purpose activation across diverse deep learning paradigms.
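The definition above is short enough to write out directly. The following is a minimal scalar sketch of the stated formula and its derivative, obtained by the product rule; the function names `molu` and `molu_grad` are chosen here for illustration and are not taken from the authors' code.

```python
import math


def molu(x):
    """MoLU as defined in the abstract: f(x) = x * (1 + tanh(x)) / 2."""
    return x * (1.0 + math.tanh(x)) / 2.0


def molu_grad(x):
    """Derivative by the product rule: (1 + tanh x)/2 + x * (1 - tanh^2 x)/2."""
    t = math.tanh(x)
    return (1.0 + t) / 2.0 + x * (1.0 - t * t) / 2.0


for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={v:+.1f}  f(x)={molu(v):+.6f}  f'(x)={molu_grad(v):+.6f}")
```

Since $(1 + \tanh(x))/2$ equals the logistic sigmoid evaluated at $2x$, the same function can also be written as $x \cdot \sigma(2x)$, which may be convenient in frameworks that already provide a sigmoid.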
nan
Article 457
Title@2025-07-15 (2): Stochastic Entanglement Configuration for Constructive Entanglement Topologies in Quantum Machine Learning with Application to Cardiac MRI
Title: Stochastic Entanglement Configuration for Constructive Entanglement Topologies in Quantum Machine Learning with Application to Cardiac MRI | Stochastische Verflechtungskonfiguration für konstruktives Verflechtungs-Topologien im Quantum Machine Learning mit Anwendung auf Herz-Kreislauf-MRT | Qantum 机器学习中用于心脏部磁共振 2507.11401v1 |
Authors (2): Mehri Mehrnia, Mohammed S. M. Elbaz
Efficient entanglement strategies are essential for advancing variational quantum circuits (VQCs) for quantum machine learning (QML). However, most current approaches use fixed entanglement topologies that are not adaptive to task requirements, limiting potential gains over classical models. We introduce a novel stochastic entanglement configuration method that systematically generates diverse entanglement topologies to identify a subspace of constructive entanglement configurations, defined as entanglement topologies that boost hybrid model performance (e.g., classification accuracy) beyond classical baselines. Each configuration is encoded as a stochastic binary matrix, denoting directed entanglement between qubits. This enables scalable exploration of the hyperspace of candidate entanglement topologies using entanglement density and per-qubit constraints as key metrics. We define unconstrained and constrained sampling modes, controlling entanglement per qubit. Using our method, 400 stochastic configurations were generated and evaluated in a hybrid QML for cardiac MRI disease classification. We identified 64 (16%) novel constructive entanglement configurations that consistently outperformed the classical baseline. Ensemble aggregation of top-performing configurations achieved ~0.92 classification accuracy, exceeding the classical model (~0.87) by over 5%. None of the four conventional topologies (ring, nearest neighbor, no entanglement, fully entangled) surpassed the classical baseline (maximum accuracy ~0.82), while our configurations delivered up to ~20% higher accuracy. These results highlight the robustness and generalizability of the identified constructive entanglements.
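One way to picture the encoding described above is a binary matrix whose entry (i, j) marks a directed entangling gate from qubit i to qubit j. The sketch below samples such matrices in an unconstrained mode and in a constrained mode that caps outgoing entanglements per qubit. The sampling rule, the parameter names (`density`, `max_per_qubit`), and the density definition are assumptions made for illustration; the abstract does not specify the authors' procedure.

```python
import random


def sample_config(n_qubits, density, max_per_qubit=None, rng=None):
    """Sample a directed entanglement matrix A with A[i][j] = 1 for control i, target j."""
    rng = rng or random.Random()
    A = [[0] * n_qubits for _ in range(n_qubits)]
    for i in range(n_qubits):
        # Each off-diagonal entry is switched on independently with probability `density`.
        targets = [j for j in range(n_qubits) if j != i and rng.random() < density]
        # Constrained mode (assumed): keep at most `max_per_qubit` targets per control qubit.
        if max_per_qubit is not None and len(targets) > max_per_qubit:
            targets = rng.sample(targets, max_per_qubit)
        for j in targets:
            A[i][j] = 1
    return A


def entanglement_density(A):
    """Fraction of possible directed pairs that are entangled (requires n >= 2)."""
    n = len(A)
    return sum(map(sum, A)) / (n * (n - 1))


rng = random.Random(0)
free = sample_config(4, density=0.5, rng=rng)
capped = sample_config(4, density=0.9, max_per_qubit=1, rng=rng)
for row in capped:
    print(row)
print("density:", entanglement_density(capped))
```

Each sampled matrix would then be compiled into a circuit and scored against the classical baseline; that evaluation step is outside the scope of this sketch.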
nan
Article 458
Title@2025-07-15 (2): X Hacking: The Threat of Misguided AutoML
Title: X Hacking: The Threat of Misguided AutoML | X Hacking: Die Bedrohung durch fehlgeleitete AutoML | Xacking:误导自动洗钱的威胁 2401.08513v3 |
Authors (7): Rahul Sharma, Sergey Redyuk, Sumantrak Mukherjee, Andrea Šipka, Eyke Hüllermeier, Sebastian Vollmer, David Selby
Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a Rashomon set of ‘defensible’ models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show the vulnerability of a dataset to X-hacking can be determined by information redundancy among features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.
nan
Article 459
Title@2025-07-15 (2): The model is the message: Lightweight convolutional autoencoders applied to noisy imaging data for planetary science and astrobiology
Title: The model is the message: Lightweight convolutional autoencoders applied to noisy imaging data for planetary science and astrobiology | Das Modell ist die Botschaft: Leichte konvolutionäre Autoencoder, die auf laute Bilddaten für die Planetenwissenschaft und Astrobiologie angewendet werden | 模型就是信息:轻量级变速自动电解码器,用于行星科学和天体生物学的噪音成像数据。 2507.11400v1 |
Authors (1): Caleb Scharf
The application of convolutional autoencoder deep learning to imaging data for planetary science and astrobiological use is briefly reviewed and explored with a focus on the need to understand algorithmic rationale, process, and results when machine learning is utilized. Successful autoencoders train to build a model that captures the features of data in a dimensionally reduced form (the latent representation) that can then be used to recreate the original input. One application is the reconstruction of incomplete or noisy data. Here a baseline, lightweight convolutional autoencoder is used to examine the utility for planetary image reconstruction or inpainting in situations where there is destructive random noise (i.e., either luminance noise with zero returned data in some image pixels, or color noise with random additive levels across pixel channels). It is shown that, in certain use cases, multi-color image reconstruction can be usefully applied even with extensive random destructive noise with 90% areal coverage and higher. This capability is discussed in the context of intentional masking to reduce data bandwidth, or situations with low-illumination levels and other factors that obscure image data (e.g., sensor degradation or atmospheric conditions). It is further suggested that for some scientific use cases the model latent space and representations have more utility than large raw imaging datasets.
nan
Article 460
Title@2025-07-15 (2): Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
Title: Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors | Inverse Verstärkung Lernen mit wechselnden Belohnungen und Geschichte Abhängigkeit für die Charakterisierung von Tierverhalten | 反强化学习,转换奖励和对动物行为定性的动物行为的历史依赖 2501.12633v3 |
Authors (5): Jingyang Ke, Feiyang Wu, Jiyi Wang, Jeffrey Markowitz, Anqi Wu
Traditional approaches to studying decision-making in neuroscience focus on simplified behavioral tasks where animals perform repetitive, stereotyped actions to receive explicit rewards. While informative, these methods constrain our understanding of decision-making to short timescale behaviors driven by explicit goals. In natural environments, animals exhibit more complex, long-term behaviors driven by intrinsic motivations that are often unobservable. Recent works in time-varying inverse reinforcement learning (IRL) aim to capture shifting motivations in long-term, freely moving behaviors. However, a crucial challenge remains: animals make decisions based on their history, not just their current state. To address this, we introduce SWIRL (SWitching IRL), a novel framework that extends traditional IRL by incorporating time-varying, history-dependent reward functions. SWIRL models long behavioral sequences as transitions between short-term decision-making processes, each governed by a unique reward function. SWIRL incorporates biologically plausible history dependency to capture how past decisions and environmental contexts shape behavior, offering a more accurate description of animal decision-making. We apply SWIRL to simulated and real-world animal behavior datasets and show that it outperforms models lacking history dependency, both quantitatively and qualitatively. This work presents the first IRL model to incorporate history-dependent policies and rewards to advance our understanding of complex, naturalistic decision-making in animals.
nan
Article 461
Title@2025-07-15 (2): A Neural Network Model of Complementary Learning Systems: Pattern Separation and Completion for Continual Learning
Title: A Neural Network Model of Complementary Learning Systems: Pattern Separation and Completion for Continual Learning | Ein neurales Netzwerkmodell für komplementäre Lernsysteme: Mustertrennung und -vervollständigung für kontinuierliches Lernen | 补充学习系统神经网络模型:持续学习的模式分离和完成 2507.11393v1 |
Authors (4): James P Jun, Vijay Marupudi, Raj Sanjay Shah, Sashank Varma
Learning new information without forgetting prior knowledge is central to human intelligence. In contrast, neural network models suffer from catastrophic forgetting: a significant degradation in performance on previously learned tasks when acquiring new information. The Complementary Learning Systems (CLS) theory offers an explanation for this human ability, proposing that the brain has distinct systems for pattern separation (encoding distinct memories) and pattern completion (retrieving complete memories from partial cues). To capture these complementary functions, we leverage the representational generalization capabilities of variational autoencoders (VAEs) and the robust memory storage properties of Modern Hopfield networks (MHNs), combining them into a neurally plausible continual learning model. We evaluate this model on the Split-MNIST task, a popular continual learning benchmark, and achieve close to state-of-the-art accuracy (~90%), substantially reducing forgetting. Representational analyses empirically confirm the functional dissociation: the VAE underwrites pattern completion, while the MHN drives pattern separation. By capturing pattern separation and completion in scalable architectures, our work provides a functional template for modeling memory consolidation, generalization, and continual learning in both biological and artificial systems.
nan
Article 462
Title@2025-07-15 (2): Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques
Title: Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques | Synthetische tabellarische Datengenerierung: Eine vergleichende Erhebung für moderne Techniken | 制作合成图表数据:现代技术比较调查 2507.11590v1 |
Authors (4): Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee
As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like finance, healthcare and the social sciences. This survey presents a comprehensive and focused review of recent advances in synthetic tabular data generation, emphasizing methods that preserve complex feature relationships, maintain statistical fidelity, and satisfy privacy requirements. A key contribution of this work is the introduction of a novel taxonomy based on practical generation objectives, including intended downstream applications, privacy guarantees, and data utility, directly informing methodological design and evaluation strategies. Therefore, this review prioritizes the actionable goals that drive synthetic data creation, including conditional generation and risk-sensitive modeling. Additionally, the survey proposes a benchmark framework to align technical innovation with real-world demands. By bridging theoretical foundations with practical deployment, this work serves as both a roadmap for future research and a guide for implementing synthetic tabular data in privacy-critical environments.
nan
Article 463
Title@2025-07-15 (2): From Kinetic Theory to AI: a Rediscovery of High-Dimensional Divergences and Their Properties
Title: From Kinetic Theory to AI: a Rediscovery of High-Dimensional Divergences and Their Properties | Von der Kinetischen Theorie zur KI: Eine Wiederentdeckung hochdimensionaler Divergenzen und ihrer Eigenschaften | 从动从理论到AI:重现高度多元差异及其属性 2507.11387v1 |
Authors (4): Gennaro Auricchio, Giovanni Brigati, Paolo Giudici, Giuseppe Toscani
Selecting an appropriate divergence measure is a critical aspect of machine learning, as it directly impacts model performance. Among the most widely used, we find the Kullback-Leibler (KL) divergence, originally introduced in kinetic theory as a measure of relative entropy between probability distributions. Just as in machine learning, the ability to quantify the proximity of probability distributions plays a central role in kinetic theory. In this paper, we present a comparative review of divergence measures rooted in kinetic theory, highlighting their theoretical foundations and exploring their potential applications in machine learning and artificial intelligence.
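For readers who want the KL divergence in concrete terms, here is a minimal sketch for discrete distributions, using the standard definition $\mathrm{KL}(p \,\|\, q) = \sum_i p_i \log(p_i / q_i)$ in nats. The function name and the example distributions are illustrative and do not come from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print("KL(p || q) =", kl_divergence(p, q))
print("KL(q || p) =", kl_divergence(q, p))
```

The two printed values differ, which illustrates that KL is not symmetric; this is one reason the choice of divergence matters in practice.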
nan
Article 464
Title@2025-07-15 (2): Einstein Fields: A Neural Perspective To Computational General Relativity
Title: Einstein Fields: A Neural Perspective To Computational General Relativity | Einstein-Felder: Eine neurale Perspektive zur Berechnung allgemeiner Relativität | 爱因斯坦领域:从神经角度看待对一般相对论的比较 2507.11589v1 |
Authors (4): Sandeep Suresh Cranganore, Andrei Bodnar, Arturs Berzins, Johannes Brandstetter
We introduce Einstein Fields, a neural representation that is designed to compress computationally intensive four-dimensional numerical relativity simulations into compact implicit neural network weights. By modeling the metric, which is the core tensor field of general relativity, Einstein Fields enable the derivation of physical quantities via automatic differentiation. However, unlike conventional neural fields (e.g., signed distance, occupancy, or radiance fields), Einstein Fields are Neural Tensor Fields with the key difference that when encoding the spacetime geometry of general relativity into neural field representations, dynamics emerge naturally as a byproduct. Einstein Fields show remarkable potential, including continuum modeling of 4D spacetime, mesh-agnosticity, storage efficiency, derivative accuracy, and ease of use. We address these challenges across several canonical test beds of general relativity and release an open source JAX-based library, paving the way for more scalable and expressive approaches to numerical relativity. Code is made available at https://github.com/AndreiB137/EinFields
nan
Article 465
Title@2025-07-15 (2): Joint space-time wind field data extrapolation and uncertainty quantification using nonparametric Bayesian dictionary learning
Title: Joint space-time wind field data extrapolation and uncertainty quantification using nonparametric Bayesian dictionary learning | Gemeinsame Raum-Zeit-Windfelddaten-Extrapolation und Unsicherheits-Quantifizierung mit nichtparametrischem Bayesian Wörterbuch-Lernen | 使用非参数贝耶斯词典学习法进行联合时空风场数据外推和不确定性量化 2507.11385v1 |
Authors (3): George D. Pasparakis, Ioannis A. Kougioumtzoglou, Michael D. Shields
A methodology is developed, based on nonparametric Bayesian dictionary learning, for joint space-time wind field data extrapolation and estimation of related statistics by relying on limited/incomplete measurements. Specifically, utilizing sparse/incomplete measured data, a time-dependent optimization problem is formulated for determining the expansion coefficients of an associated low-dimensional representation of the stochastic wind field. Compared to an alternative, standard, compressive sampling (CS) treatment of the problem, the developed methodology exhibits the following advantages. First, the Bayesian formulation also enables quantification of the uncertainty in the estimates. Second, the requirement in standard CS-based applications for an a priori selection of the expansion basis is circumvented. Instead, the basis is selected adaptively from the acquired data. Overall, the methodology exhibits enhanced extrapolation accuracy, even in cases of high-dimensional data of arbitrary form, and of relatively large extrapolation distances. Thus, it can be used, potentially, in a wide range of wind engineering applications where various constraints dictate the use of a limited number of sensors. The efficacy of the methodology is demonstrated by considering two case studies. The first relates to the extrapolation of simulated wind velocity records consistent with a prescribed joint wavenumber-frequency power spectral density in a three-dimensional domain (2D and time). The second pertains to the extrapolation of four-dimensional (3D and time) boundary layer wind tunnel experimental data that exhibit significant spatial variability and non-Gaussian characteristics.
Article 466
Title@2025-07-15 (2): SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics
Title: SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics | SToFM: ein Multi-Skala-Stiftungsmodell für räumliche Transkriptomik | SToFM:空间转换学多规模基础模型 2507.11588v1 |
Authors (6): Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie
Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data.
Article 467
Title@2025-07-15 (2): An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator
Title: An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator | Ein volldigitaler 8,6-nJ/Frame 65-nm Tsetlin Maschineneinteilung Beschleuniger | 全数8.6-nJ/Frame 65nm Tsetlin 机器图像分类加速器 2501.19347v3 |
Authors (6): Svein Anders Tunheim, Yujin Zheng, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo
We present an all-digital programmable machine learning accelerator chip for image classification, based on Tsetlin machine (TM) principles. The TM is an emerging machine learning algorithm founded on propositional logic, utilizing sub-pattern recognition expressions called clauses. The accelerator implements the coalesced TM version with convolution, and classifies booleanized images of 28$\times$28 pixels with 10 categories. A configuration with 128 clauses is used in a highly parallel architecture. Fast clause evaluation is achieved by keeping all clause weights and Tsetlin automata (TA) action signals in registers. The chip is implemented in a 65 nm low-leakage CMOS technology, and occupies an active area of 2.7 mm$^2$. At a clock frequency of 27.8 MHz, the accelerator achieves 60.3k classifications per second, and consumes 8.6 nJ per classification. This demonstrates the energy-efficiency of the TM, which was the main motivation for developing this chip. The latency for classifying a single image is 25.4 $\mu$s, which includes system timing overhead. The accelerator achieves 97.42%, 84.54% and 82.55% test accuracies for the datasets MNIST, Fashion-MNIST and Kuzushiji-MNIST, respectively, matching the TM software models.
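The quoted figures can be combined into a few derived quantities. The sketch below is plain arithmetic on the numbers in the abstract; the "implied average power" assumes the energy figure applies at the stated throughput, which the abstract does not say explicitly, and the function name is made up for illustration.

```python
# Figures quoted in the abstract.
ENERGY_PER_CLASSIFICATION_J = 8.6e-9   # 8.6 nJ per classification
CLASSIFICATIONS_PER_SECOND = 60.3e3    # 60.3k classifications per second
CLOCK_HZ = 27.8e6                      # 27.8 MHz clock
LATENCY_S = 25.4e-6                    # 25.4 microseconds for a single image


def derived_figures():
    """Return (implied power in W, clock cycles per classification, latency in cycles)."""
    # Assumption: energy per classification holds at the quoted throughput.
    power_w = ENERGY_PER_CLASSIFICATION_J * CLASSIFICATIONS_PER_SECOND
    cycles_per_classification = CLOCK_HZ / CLASSIFICATIONS_PER_SECOND
    latency_cycles = LATENCY_S * CLOCK_HZ
    return power_w, cycles_per_classification, latency_cycles


power_w, cycles, latency_cycles = derived_figures()
print(f"implied average power: {power_w * 1e3:.2f} mW")
print(f"clock cycles per classification: {cycles:.0f}")
print(f"single-image latency in cycles: {latency_cycles:.0f}")
```

Under these assumptions the implied power is roughly half a milliwatt, and the single-image latency (about 706 cycles) exceeds the steady-state cost per classification (about 461 cycles), consistent with the statement that the latency includes system timing overhead.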
Article 468
Title@2025-07-15 (2): Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs
Title: Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs | Schrittweise Richtlinie für Wissen über seltene Werkzeuge (SPaRK): Offline-RL, die vielfältige Werkzeugnutzung in LLMs antreibt | 有限工具知识(SPARK)的逐步政策:驱动在LLM中使用多样化工具的离线RL 2507.11371v1 |
Authors (3): Gabriel Bo, Koa Chang, Justin Gu
We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Building on recent advances in step-wise reinforcement learning, we introduce a dual-objective reward system that simultaneously optimizes for answer quality and tool diversity, training a Llama-3.1 8B model through offline PPO on synthetically generated trajectories from the MMLU-Pro dataset. Our approach uniquely employs a rarity-first exploitation strategy where a GPT-4o judge scores candidate actions across eight distinct tools plus chain-of-thought reasoning, with the policy favoring less-frequently used but still viable tools to encourage systematic exploration. Empirical results demonstrate that SPaRK achieves competitive performance across 14 MMLU-Pro categories while exhibiting significantly higher entropy in tool selection compared to both baseline and supervised fine-tuning approaches, suggesting that algorithmic exploration through explicit tool diversity can enhance reasoning capabilities without sacrificing accuracy.
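The rarity-first idea described above can be illustrated with a small selection rule. This is a minimal sketch, not the authors' implementation: the viability threshold, the tie-breaking order, the tool names, and the function `rarity_first_choice` are all assumptions introduced here.

```python
from collections import Counter


def rarity_first_choice(judge_scores, usage_counts, viability_threshold):
    """Pick the least-used tool among those the judge considers viable.

    judge_scores: dict mapping tool name -> judge score (higher is better).
    usage_counts: Counter of how often each tool has been chosen so far.
    viability_threshold: hypothetical cutoff below which a tool is not viable.
    """
    viable = [tool for tool, score in judge_scores.items()
              if score >= viability_threshold]
    if not viable:
        # Fall back to the best-scored action when nothing passes the cutoff.
        return max(judge_scores, key=judge_scores.get)
    # Rarest first; ties go to the higher score, then to the name for determinism.
    return min(viable, key=lambda t: (usage_counts[t], -judge_scores[t], t))


scores = {"calculator": 0.9, "search": 0.8, "chain_of_thought": 0.95, "python": 0.4}
counts = Counter({"calculator": 10, "search": 2, "chain_of_thought": 30})
print(rarity_first_choice(scores, counts, viability_threshold=0.7))  # search
```

Here `python` has never been used but is excluded because its score falls below the cutoff, so the rule selects `search`, the rarest tool that is still viable.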
Article 469
Title@2025-07-15 (2): Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning
Title: Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Lokales Paarweise Abstand passend für Backpropagation-freie Verstärkungs-Lernen | 后推进-无强化学习的地方对等相近距离匹配 2507.11367v1 |
Authors (1): Daniel Tanneberg
Training neural networks with reinforcement learning (RL) typically relies on backpropagation (BP), necessitating storage of activations from the forward pass for subsequent backward updates. Furthermore, backpropagating error signals through multiple layers often leads to vanishing or exploding gradients, which can degrade learning performance and stability. We propose a novel approach that trains each layer of the neural network using local signals during the forward pass in RL settings. Our approach introduces local, layer-wise losses leveraging the principle of matching pairwise distances from multi-dimensional scaling, enhanced with optional reward-driven guidance. This method allows each hidden layer to be trained using local signals computed during forward propagation, thus eliminating the need for backward passes and for storing intermediate activations. Our experiments, conducted with policy gradient methods across common RL benchmarks, demonstrate that this backpropagation-free method achieves competitive performance compared to its classical BP-based counterpart. Additionally, the proposed method enhances stability and consistency within and across runs, and improves performance especially in challenging environments.
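A layer-local loss in the spirit of multi-dimensional scaling can be sketched as follows. The abstract does not give the exact loss, normalization, or the form of the reward-driven guidance, so this is only an illustrative stress-style objective; the function names are hypothetical.

```python
import math


def pairwise_distances(batch):
    """Euclidean distance between every pair of vectors in a batch."""
    return [[math.dist(a, b) for b in batch] for a in batch]


def distance_matching_loss(layer_inputs, layer_outputs):
    """Mean squared mismatch between input-space and output-space distances.

    Only the layer's own inputs and outputs are needed, so the value can be
    computed during the forward pass without signals from later layers.
    Input and output dimensions may differ.
    """
    d_in = pairwise_distances(layer_inputs)
    d_out = pairwise_distances(layer_outputs)
    n = len(layer_inputs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += (d_in[i][j] - d_out[i][j]) ** 2
            pairs += 1
    return total / pairs


inputs = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shifted = [[x + 1.0, y - 2.0] for x, y in inputs]   # distances preserved
scaled = [[2.0 * x, 2.0 * y] for x, y in inputs]    # distances doubled
print(distance_matching_loss(inputs, shifted))  # 0.0
print(distance_matching_loss(inputs, scaled))   # 50.0
```

A translation leaves all pairwise distances unchanged and so incurs zero loss, while a uniform scaling is penalized; this is the geometric property such a local objective encourages each layer to preserve.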
Article 470
Title@2025-07-15 (2): A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent
Title: A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent | Ein parallelisierbarer Ansatz zur Charakterisierung von NE in Null-Sum-Spielen nach einer linearen Anzahl von Iterationen von gradienten Abstieg | 在 “ 累进后裔线性迭代数后零苏姆运动会 “ 中将NE定性的可平行办法 2507.11366v1 |
Authors (2): Taemin Kim, James P. Bailey
We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.
Article 471
Title@2025-07-15 (2): DeInfoReg: A Decoupled Learning Framework for Better Training Throughput
Title: DeInfoReg: A Decoupled Learning Framework for Better Training Throughput | DeInfoReg: Ein entkoppelter Lernrahmen für besseren Trainingsdurchsatz | DInfoReg:一个分离的学习框架,以改善培训工作量 2506.18193v2 |
Authors (3): Zih-Hao Huang, You-Teng Lin, Hung-Hsuan Chen
This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg), a novel approach that transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. Integrating a pipeline strategy, DeInfoReg enables model parallelization across multiple GPUs, significantly improving training throughput. We compare our proposed method with standard backpropagation and other gradient flow decomposition techniques. Extensive experiments on diverse tasks and datasets demonstrate that DeInfoReg achieves superior performance and better noise resistance than traditional BP models and efficiently utilizes parallel computing resources. The code for reproducibility is available at: https://github.com/ianzih/Decoupled-Supervised-Learning-for-Information-Regularization/.
Article 472
Title@2025-07-15 (2): Neurosymbolic Reasoning Shortcuts under the Independence Assumption
Title: Neurosymbolic Reasoning Shortcuts under the Independence Assumption | Neurosymbolische Begründung Kurzbefehle unter der Unabhängigkeitsaufnahme | 独立假设下的神经曲脚解释快捷键 2507.11357v1 |
Authors (4): Emile van Krieken, Pasquale Minervini, Edoardo Ponti, Antonio Vergari
The ubiquitous independence assumption among symbolic concepts in neurosymbolic (NeSy) predictors is a convenient simplification: NeSy predictors use it to speed up probabilistic reasoning. Recent works like van Krieken et al. (2024) and Marconato et al. (2024) argued that the independence assumption can hinder learning of NeSy predictors and, more crucially, prevent them from correctly modelling uncertainty. There is, however, scepticism in the NeSy community around the scenarios in which the independence assumption actually limits NeSy systems (Faronius and Dos Martires, 2025). In this work, we settle this question by formally showing that assuming independence among symbolic concepts entails that a model can never represent uncertainty over certain concept combinations. Thus, the model fails to be aware of reasoning shortcuts, i.e., the pathological behaviour of NeSy predictors that predict correct downstream tasks but for the wrong reasons.
Article 473
Title@2025-07-15 (2): Beyond Predictions: A Participatory Framework for Multi-Stakeholder Decision-Making
Title: Beyond Predictions: A Participatory Framework for Multi-Stakeholder Decision-Making | Beyond Predictions: Ein partizipatorischer Rahmen für Entscheidungsfindung mit mehreren Interessenträgern | 超越预测:多方利益攸关方决策参与框架 2502.08542v2 |
Authors (3): Vittoria Vineis, Giuseppe Perelli, Gabriele Tolomei
Conventional automated decision-support systems, often based on supervised learning, focus on predicting outcomes to recommend actions. However, they typically overlook the complexity of multi-actor environments, where diverse and conflicting stakeholder preferences must be balanced. At the same time, participatory AI approaches remain largely context-specific, limiting their broader applicability. To address these gaps, we propose a participatory framework that reframes decision-making as a multi-stakeholder optimization problem, using context-dependent reward functions to represent each actor’s preferences. Our modular, model-agnostic framework employs k-fold cross-validation to fine-tune user-provided prediction models and evaluate decision strategies, including compromise functions that mediate stakeholder trade-offs. A synthetic scoring mechanism aggregates user-defined preferences across multiple metrics to rank strategies and select an optimal decision-maker for generating actionable recommendations on new data. Validated on two high-stakes real-world case studies, the framework consistently produces stakeholder-aware decisions that outperform purely predictive baselines across multiple metrics, while enhancing the transparency and accountability of AI-supported decision-making.
Article 474
Title@2025-07-15 (2): Guiding LLM Decision-Making with Fairness Reward Models
Title: Guiding LLM Decision-Making with Fairness Reward Models | Leitende LLM-Entscheidungs-Making mit Fairness-Reward-Modelle | 以公平奖励模式作出指导性LLM决策 2507.11344v1 |
Authors (5): Zara Hall, Melanie Subbiah, Thomas P Zollo, Kathleen McKeown, Richard Zemel
Large language models are increasingly used to support high-stakes decisions, potentially influencing who is granted bail or receives a loan. Naive chain-of-thought sampling can improve average decision accuracy, but has also been shown to amplify unfair bias. To address this challenge and enable the trustworthy use of reasoning models in high-stakes decision-making, we propose a framework for training a generalizable Fairness Reward Model (FRM). Our model assigns a fairness score to LLM reasoning, enabling the system to down-weight biased trajectories and favor equitable ones when aggregating decisions across reasoning chains. We show that a single Fairness Reward Model, trained on weakly supervised, LLM-annotated examples of biased versus unbiased reasoning, transfers across tasks, domains, and model families without additional fine-tuning. Applied to real-world decision-making tasks including recidivism prediction and social media moderation, we show that our approach consistently improves fairness while matching, or even surpassing, baseline accuracy.
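The aggregation step, in which biased reasoning chains are down-weighted before decisions are combined, might look like the sketch below. The abstract does not specify the weighting scheme; the softmax weighting, the `temperature` parameter, and the function name are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fairness_weighted_decision(chains, temperature=1.0):
    """Aggregate decisions from several reasoning chains.

    chains: list of (decision, fairness_score) pairs, where a higher score
        means the reward model judged the reasoning to be less biased.
    temperature: hypothetical knob; small values trust the scores strongly,
        large values approach a plain majority vote.
    """
    max_score = max(score for _, score in chains)
    totals = defaultdict(float)
    for decision, score in chains:
        # Softmax-style weight, shifted by the max score for numerical stability.
        totals[decision] += math.exp((score - max_score) / temperature)
    return max(totals, key=totals.get)


chains = [("deny", 0.1), ("deny", 0.2), ("grant", 0.9)]
print(fairness_weighted_decision(chains, temperature=0.1))    # grant
print(fairness_weighted_decision(chains, temperature=100.0))  # deny
```

With a low temperature the single high-fairness chain outweighs the two low-scoring ones, whereas a very high temperature recovers the unweighted majority.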
Article 475
Title@2025-07-15 (2): Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Title: Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding | Intervening in Black Box: Konzept Engpass-Modell für die Verbesserung der menschlichen neuralen Netzwerk gegenseitiges Verständnis | 黑盒干预:增强人类神经网络相互了解的概念瓶颈模式 2506.22803v2 |
Authors (8): Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Lin Mei, Peiyi Shen, Liang Zhang
Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at the sample level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for Enhancing Human-Neural Network Mutual Understanding (CBM-HNMU). CBM-HNMU leverages the Concept Bottleneck Model (CBM) as an interpretable framework to approximate black-box reasoning and communicate conceptual understanding. Detrimental concepts are automatically identified and refined (removed/replaced) based on global gradient contributions. The modified CBM then distills corrected knowledge back into the black-box model, enhancing both interpretability and accuracy. We evaluate CBM-HNMU on various CNN and transformer-based models across Flower-102, CIFAR-10, CIFAR-100, FGVC-Aircraft, and CUB-200, achieving a maximum accuracy improvement of 2.64% and a maximum increase in average accuracy of 1.03%. Source code is available at: https://github.com/XiGuaBo/CBM-HNMU.
Article 476
Title@2025-07-15 (2): Universal rates of ERM for agnostic learning
Title: Universal rates of ERM for agnostic learning | Universelle WKM-Raten für agnostisches Lernen | 用于不可不可知性学习的企业风险管理普遍比率 2506.14110v2 |
Authors (2): Steve Hanneke, Mingyue Xu
The universal learning framework has been developed to obtain guarantees on the learning rates that hold for any fixed distribution, which can be much faster than those that hold uniformly over all distributions. Given that the Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning, the recent work of arXiv:2412.02810 studied the universal rates of ERM for binary classification under the realizable setting. However, the assumption of realizability is too restrictive to hold in practice. Indeed, the majority of the literature on universal learning has focused on the realizable case, leaving the non-realizable case barely explored. In this paper, we consider the problem of universal learning by ERM for binary classification under the agnostic setting, where the "learning curve" reflects the decay of the excess risk as the sample size increases. We explore the possibilities of agnostic universal rates and reveal a compact trichotomy: there are three possible agnostic universal rates of ERM, being either $e^{-n}$, $o(n^{-1/2})$, or arbitrarily slow. We provide a complete characterization of which concept classes fall into each of these categories. Moreover, we also establish complete characterizations for the target-dependent universal rates as well as the Bayes-dependent universal rates.
Article 477
Title@2025-07-15 (2): Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
Title: Internal Value Alignment in Large Language Models through Controlled Value Vector Activation | Interne Wertausrichtung in großen Sprachmodellen durch kontrollierte Wert-Vektor-Aktivierung | 通过控制值矢量激活,通过控制值矢量激活,大语言模型的内部价值对齐 2507.11316v1 |
Authors (7): Haoran Jin, Meng Li, Xiting Wang, Zhihao Xu, Minlie Huang, Yantao Jia, Defu Lian
Aligning Large Language Models (LLMs) with human values has attracted increasing attention since it provides clarity, transparency, and the ability to adapt to evolving scenarios. In this paper, we introduce a Controlled Value Vector Activation (ConVA) method that directly aligns the internal values of LLMs by interpreting how a value is encoded in their latent representations and modifying the relevant activations to ensure consistent values in LLMs. To ensure an accurate and unbiased interpretation, we propose a context-controlled value vector identification method. To consistently control values without sacrificing model performance, we introduce a gated value vector activation method for effective and minimum degree of value control. Experiments show that our method achieves the highest control success rate across 10 basic values without hurting LLM performance and fluency, and ensures target values even with opposite and potentially malicious input prompts. Source code and data are available at https://github.com/hr-jin/ConVA.
Article 478
Title@2025-07-15 (2): Contrast All the Time: Learning Time Series Representation from Temporal Consistency
Title: Contrast All the Time: Learning Time Series Representation from Temporal Consistency | Kontrast die ganze Zeit: Zeitreihendarstellung von zeitlicher Konsistenz lernen | 时间一致性的学习时间序列代表 2410.15416v2 |
Authors (3): Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
Representation learning for time series using contrastive learning has emerged as a critical technique for improving the performance of downstream tasks. To advance this effective approach, we introduce CaTT (Contrast All The Time), a new approach to unsupervised contrastive learning for time series, which takes advantage of dynamics between temporally similar moments more efficiently and effectively than existing methods. CaTT departs from conventional time-series contrastive approaches that rely on data augmentations or selected views. Instead, it uses the full temporal dimension by contrasting all time steps in parallel. This is made possible by a scalable NT-pair formulation, which extends the classic N-pair loss across both batch and temporal dimensions, making the learning process end-to-end and more efficient. CaTT learns directly from the natural structure of temporal data, using repeated or adjacent time steps as implicit supervision, without the need for pair selection heuristics. We demonstrate that this approach produces superior embeddings which allow better performance in downstream tasks. Additionally, training is faster than other contrastive learning approaches, making it suitable for large-scale and real-world time series applications. The source code is publicly available at https://github.com/sfi-norwai/CaTT.
Article 479
Title@2025-07-15 (2): Supercharging Floorplan Localization with Semantic Rays
Title: Supercharging Floorplan Localization with Semantic Rays | Supercharging Grundriss Lokalisierung mit Semantic Rays | 配有语义雷的本地化 2507.09291v2 |
Authors (2): Yuval Grader, Hadar Averbuch-Elor
Floorplans provide a compact representation of the building’s structure, revealing not only layout information but also detailed semantics such as the locations of windows and doors. However, contemporary floorplan localization techniques mostly focus on matching depth-based structural cues, ignoring the rich semantics communicated within floorplans. In this work, we introduce a semantic-aware localization framework that jointly estimates depth and semantic rays, consolidating over both for predicting a structural-semantic probability volume. Our probability volume is constructed in a coarse-to-fine manner: We first sample a small set of rays to obtain an initial low-resolution probability volume. We then refine these probabilities by performing a denser sampling only in high-probability regions and process the refined values for predicting a 2D location and orientation angle. We conduct an evaluation on two standard floorplan localization benchmarks. Our experiments demonstrate that our approach substantially outperforms state-of-the-art methods, achieving significant improvements in recall metrics compared to prior works. Moreover, we show that our framework can easily incorporate additional metadata such as room labels, enabling additional gains in both accuracy and efficiency.
Article 480
Title@2025-07-15 (2): Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation
Title: Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation | Greifen einer Handful: Sequentielle Multi-Object Dexterous Grasp Generation | 绘制手巧的 : 序列式多对象脱色重力生成 2503.22370v3 |
Authors (6): Haofei Lu, Yifei Dong, Zehang Weng, Florian Pokorny, Jens Lundell, Danica Kragic
We introduce the sequential multi-object robotic grasp sampling algorithm SeqGrasp that can robustly synthesize stable grasps on diverse objects using the robotic hand’s partial Degrees of Freedom (DoF). We use SeqGrasp to construct the large-scale Allegro Hand sequential grasping dataset SeqDataset and use it for training the diffusion-based sequential grasp generator SeqDiffuser. We experimentally evaluate SeqGrasp and SeqDiffuser against the state-of-the-art non-sequential multi-object grasp generation method MultiGrasp in simulation and on a real robot. The experimental results demonstrate that SeqGrasp and SeqDiffuser reach an 8.71%-43.33% higher grasp success rate than MultiGrasp. Furthermore, SeqDiffuser is approximately 1000 times faster at generating grasps than SeqGrasp and MultiGrasp.
Article 481
Title@2025-07-15 (2): FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
Title: FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | FeDa4Fair: Client-Level-Federated Datasets für die Fairness-Bewertung | FeDa4fair:公平评价客户-联邦数据集 2506.21095v2 |
Authors (4): Xenia Heilmann, Luca Corbucci, Mattia Cerrato, Anna Monreale
Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients’ private data. However, fairness remains a key concern, as biases in local clients’ datasets can impact the entire federated system. Heterogeneous data distributions across clients may lead to models that are fairer for some clients than others. Although several fairness-enhancing solutions are present in the literature, most focus on mitigating bias for a single sensitive attribute, typically binary, overlooking the diverse and sometimes conflicting fairness needs of different clients. This narrow perspective can limit the effectiveness of fairness interventions for the different clients. To support more robust and reproducible fairness research in FL, we aim to enable a consistent benchmarking of fairness-aware FL methods at both the global and client levels. In this paper, we contribute in three ways: (1) we introduce FeDa4Fair, a library to generate tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.
Article 482
Title@2025-07-15 (2): Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study
Title: Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study | Energieeffizienz in KI für 5G und darüber hinaus: Eine DeepRx-Fallstudie | 5G 及5G 以上的AI 能源效率:深Rx 案例研究 2507.10409v2 |
Authors (2): Amine Lbath, Ibtissam Labriji
This study addresses the challenge of balancing energy efficiency with performance in AI/ML models, focusing on DeepRX, a deep learning receiver based on a fully convolutional ResNet architecture. We evaluate the energy consumption of DeepRX, considering factors including FLOPs/Watt and FLOPs/clock, and find consistency between estimated and actual energy usage, influenced by memory access patterns. The research extends to comparing energy dynamics during training and inference phases. A key contribution is the application of knowledge distillation (KD) to train a compact DeepRX student model that emulates the performance of the teacher model but with reduced energy consumption. We experiment with different student model sizes, optimal teacher sizes, and KD hyperparameters. Performance is measured by comparing the Bit Error Rate (BER) performance versus Signal-to-Interference & Noise Ratio (SINR) values of the distilled model and a model trained from scratch. The distilled models demonstrate a lower error floor across SINR levels, highlighting the effectiveness of KD in achieving energy-efficient AI solutions.
Article 483
Title@2025-07-15 (2): Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Title: Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime | Schnelle letzte Konvergenz der SGD im glatten Interpolationssystem | SGD在平滑的内插制度中的汇合 2507.11274v1 |
Authors (4): Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren
We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting – particularly with large (constant) stepsizes – has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $\eta \leq 1/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(1/(\eta T^{1-\beta\eta/2}) + \eta T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $O(1/\sqrt{T})$ with $\eta=1/\beta$, improving upon the best-known $O(T^{-1/4})$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.
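As a quick check of the noiseless case quoted above, the general bound can be specialized by hand. This is only a substitution into the stated expression, not the paper's proof, and it uses no symbols beyond those in the abstract.

$$
\eta = \frac{1}{\beta},\ \ \sigma_\star = 0
\quad\Longrightarrow\quad
\frac{1}{\eta\, T^{1-\beta\eta/2}} + \eta\, T^{\beta\eta/2}\,\sigma_\star^2
\;=\; \frac{\beta}{T^{1/2}} + 0
\;=\; \frac{\beta}{\sqrt{T}}
$$

With $\beta$ treated as a constant, this is the $1/\sqrt{T}$ rate stated for $\sigma_\star=0$. The same expression shows the trade-off for smaller stepsizes: shrinking $\eta$ moves the exponent $1-\beta\eta/2$ toward 1 but enlarges the $1/\eta$ prefactor.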
Article 484
Title@2025-07-15 (2): Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors
Title: Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors | Parametereffizientes Feintuning mit Kreisel- und Diagonalvektoren | 具有圆环和对角矢量的高效参数精密喷射 2505.00580v2 |
Authors (4): Xinyu Ding, Lexuan Chen, Siyu Liao, Zhongfeng Wang
Foundation models have achieved tremendous success in different domains. However, their huge computation and storage complexity make these models difficult to fine-tune and also less applicable in practice. A recent study shows that training in the Fourier domain can be an effective fine-tuning method in terms of both model performance and number of training parameters. In this work, we propose to further reduce the complexity through a factorization into a product of interleaved circulant and diagonal matrices. In addition, we address the case of non-square fine-tuning weights by partitioning the circulant matrix into blocks. Our method avoids constructing the weight change matrix and utilizes the 1D fast Fourier transform (FFT) instead of the 2D FFT. Experimental results show that our method achieves similar or better performance across various tasks with far fewer floating-point operations (FLOPs) and trainable parameters.
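The reason a circulant factor never needs to be materialized is that multiplying by a circulant matrix is a circular convolution, which a 1D FFT computes directly. The sketch below shows this for a single circulant-times-diagonal factor; the ordering of factors, the number of factors, and the block partitioning used in the paper are not given in the abstract, and all function names here are illustrative. The small radix-2 FFT assumes power-of-two lengths.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(x):
    """Inverse FFT, normalized once at the top level."""
    return [v / len(x) for v in fft(x, inverse=True)]


def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x, via 1D FFTs."""
    product = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real for v in ifft(product)]


def circulant_diagonal_apply(c, d, x):
    """Compute C @ (D @ x) with D = diag(d), without building C or D."""
    return circulant_matvec(c, [di * xi for di, xi in zip(d, x)])


def dense_circulant(c):
    """Explicit circulant matrix, used only to check the FFT path."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


c, d, x = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 2.0], [1.0, 1.0, 1.0, 1.0]
print([round(v, 6) for v in circulant_diagonal_apply(c, d, x)])
```

Each factor stores only the vectors `c` and `d` (2n numbers instead of n² for a dense update), and applying it costs a few length-n FFTs rather than a dense matrix product.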
Article 485
Title@2025-07-15 (2): Gaussian Loss Smoothing Enables Certified Training with Tight Convex Relaxations
Title: Gaussian Loss Smoothing Enables Certified Training with Tight Convex Relaxations | Gaussian Loss Smoothing ermöglicht zertifiziertes Training mit engen Convex-Entspannungen | Gausian 滑动损失平滑使经认证的培训具有紧固封顶宽宽度 2403.07095v4 |
Authors (6): Stefan Balauca, Mark Niklas Müller, Yuhao Mao, Maximilian Baader, Marc Fischer, Martin Vechev
Training neural networks with high certified accuracy against adversarial examples remains an open challenge despite significant efforts. While certification methods can effectively leverage tight convex relaxations for bound computation, in training, these methods, perhaps surprisingly, can perform worse than looser relaxations. Prior work hypothesized that this phenomenon is caused by the discontinuity, non-smoothness, and perturbation sensitivity of the loss surface induced by tighter relaxations. In this work, we theoretically show that applying Gaussian Loss Smoothing (GLS) on the loss surface can alleviate these issues. We confirm this empirically by instantiating GLS with two variants: a zeroth-order optimization algorithm, called PGPE, which allows training with non-differentiable relaxations, and a first-order optimization algorithm, called RGS, which requires gradients of the relaxation but is much more efficient than PGPE. Extensive experiments show that when combined with tight relaxations, these methods surpass state-of-the-art methods when training on the same network architecture for many settings. Our results clearly demonstrate the promise of Gaussian Loss Smoothing for training certifiably robust neural networks and pave a path towards leveraging tighter relaxations for certified training.
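Gaussian smoothing of a loss replaces $L(\theta)$ with its expectation under Gaussian perturbations of the parameters, $\mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}[L(\theta + \varepsilon)]$. The sketch below estimates that expectation by Monte Carlo on a toy discontinuous loss to show how the jump is smoothed out. It illustrates the smoothing idea only; it is not the PGPE or RGS algorithm from the paper, and the function names and toy loss are made up here.

```python
import random


def smoothed_loss(loss_fn, theta, sigma, num_samples=10000, seed=0):
    """Monte Carlo estimate of E[loss_fn(theta + eps)], eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        perturbed = [t + rng.gauss(0.0, sigma) for t in theta]
        total += loss_fn(perturbed)
    return total / num_samples


def step_loss(theta):
    """Toy discontinuous loss: jumps from 0 to 1 as theta[0] crosses zero."""
    return 1.0 if theta[0] > 0 else 0.0


for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(t, step_loss([t]), round(smoothed_loss(step_loss, [t], sigma=1.0), 3))
```

The raw loss is flat on both sides of the jump and gives no useful gradient, while the smoothed version changes gradually across zero (it equals the Gaussian CDF of `theta[0] / sigma` in this toy case), which is the property that makes optimization on discontinuous or non-smooth loss surfaces more tractable.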
nan
Article 486
Title@2025-07-15 (2): BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations
Title: BridgeNet: A Hybrid, Physics-Informed Machine Learning Framework for Solving High-Dimensional Fokker-Planck Equations | BridgeNet: Hybrides, physikinformiertes Machine Learning Framework zur Lösung hochdimensionaler Fokker-Planck-Gleichungen | BridgeNet:用于解决高二分法克-普朗克赤道的混合、物理成形机械学习框架 2506.04354v4 |
Authors (3): Elmira Mirzabeigi, Rezvan Salehi, Kourosh Parand
BridgeNet is a novel hybrid framework that integrates convolutional neural networks with physics-informed neural networks to efficiently solve non-linear, high-dimensional Fokker-Planck equations (FPEs). Traditional PINNs, which typically rely on fully connected architectures, often struggle to capture complex spatial hierarchies and enforce intricate boundary conditions. In contrast, BridgeNet leverages adaptive CNN layers for effective local feature extraction and incorporates a dynamically weighted loss function that rigorously enforces physical constraints. Extensive numerical experiments across various test cases demonstrate that BridgeNet not only achieves significantly lower error metrics and faster convergence compared to conventional PINN approaches but also maintains robust stability in high-dimensional settings. This work represents a substantial advancement in computational physics, offering a scalable and accurate solution methodology with promising applications in fields ranging from financial mathematics to complex system dynamics.
nan
Article 487
Title@2025-07-15 (2): Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
Title: Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound | Sand zu Gold machen: Recyclingdaten überbrücken On-Policy- und Off-Policy-Lernen über Causal Bound | 将沙沙变成金子:利用回收数据,通过 “ 因果关系 “ 将 “ 沙沙变成 “ 金 “ :利用回收数据,将 “ 政策 “ 和 “ 政策外学习 “ 连接起来 2507.11269v1 |
Authors (2): Tal Fiskus, Uri Shaham
Deep reinforcement learning (DRL) agents excel in solving complex decision-making tasks across various domains. However, they often require a substantial number of training steps and a vast experience replay buffer, leading to significant computational and resource demands. To address these challenges, we introduce a novel theoretical result that brings the Neyman-Rubin potential outcomes framework into DRL. Unlike most methods that focus on bounding the counterfactual loss, we establish a causal bound on the factual loss, which is analogous to the on-policy loss in DRL. This bound is computed by storing past value network outputs in the experience replay buffer, effectively utilizing data that is usually discarded. Extensive experiments across the Atari 2600 and MuJoCo domains on various agents, such as DQN and SAC, show up to a 2,427% higher reward ratio than the same agents without our proposed term, and reduce the experience replay buffer size by up to 96%, significantly improving sample efficiency at negligible cost.
nan
Article 488
Title@2025-07-15 (2): Privacy Against Agnostic Inference Attacks in Vertical Federated Learning
Title: Privacy Against Agnostic Inference Attacks in Vertical Federated Learning | Datenschutz gegen agnostische Inferenzangriffe im vertikalen Föderierten Lernen | 在垂直联邦学习中针对精神推断攻击的隐私 2302.05545v3 |
Authors (1): Morteza Varasteh
A novel form of inference attack in vertical federated learning (VFL) is proposed, where two parties collaborate in training a machine learning (ML) model. Logistic regression is considered for the VFL model. One party, referred to as the active party, possesses the ground truth labels of the samples in the training phase, while the other, referred to as the passive party, only shares a separate set of features corresponding to these samples. It is shown that the active party can carry out inference attacks on both training and prediction phase samples by acquiring an ML model independently trained on the training samples available to them. This type of inference attack does not require the active party to be aware of the score of a specific sample, hence it is referred to as an agnostic inference attack. It is shown that utilizing the observed confidence scores during the prediction phase, before the time of the attack, can improve the performance of the active party’s autonomous ML model, and thus improve the quality of the agnostic inference attack. As a countermeasure, privacy-preserving schemes (PPSs) are proposed. While the proposed schemes preserve the utility of the VFL model, they systematically distort the VFL parameters corresponding to the passive party’s features. The level of the distortion imposed on the passive party’s parameters is adjustable, giving rise to a trade-off between privacy of the passive party and interpretability of the VFL outcomes by the active party. The distortion level of the passive party’s parameters could be chosen carefully according to the privacy and interpretability concerns of the passive and active parties, respectively, with the hope of keeping both parties (partially) satisfied. Finally, experimental results demonstrate the effectiveness of the proposed attack and the PPSs.
nan
Article 489
Title@2025-07-15 (2): Block Circulant Adapter for Large Language Models
Title: Block Circulant Adapter for Large Language Models | Block Circulant Adapter für große Sprachmodelle | 用于大语言模型的块环相适应器 2505.00582v2 |
Authors (4): Xinyu Ding, Meiqi Wang, Siyu Liao, Zhongfeng Wang
Fine-tuning large language models (LLMs) is difficult due to their huge model size. Recent Fourier domain-based methods show potential for reducing fine-tuning costs. We propose a block circulant matrix-based fine-tuning method with a stable training heuristic to leverage the properties of circulant matrices and one-dimensional Fourier transforms to reduce storage and computation costs. Experiments show that our method uses $14\times$ fewer parameters than VeRA, $16\times$ fewer than LoRA, and $32\times$ fewer FLOPs than FourierFT, while maintaining close or better task performance. Our approach presents a promising way to fine-tune large models on downstream tasks in the frequency domain.
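The storage and compute saving of a block circulant weight can be sketched as follows. This is a minimal NumPy illustration under assumed names and shapes (a `(p, q, b)` array holding the first column of each `b`-by-`b` circulant block); it is not the paper's code and omits the training heuristic.

```python
import numpy as np


def block_circulant_matvec(c, x):
    """Compute W @ x where W is a (p*b) x (q*b) block circulant matrix.

    c has shape (p, q, b): c[i, j] is the first column of block (i, j).
    x has shape (q*b,). Only p*q*b parameters are stored instead of
    p*q*b*b, and every block product is a 1D FFT-based convolution.
    """
    p, q, b = c.shape
    x_f = np.fft.fft(x.reshape(q, b), axis=-1)   # FFT of each input chunk
    c_f = np.fft.fft(c, axis=-1)                 # FFT of each block column
    y_f = np.einsum("pqb,qb->pb", c_f, x_f)      # sum over input blocks
    return np.real(np.fft.ifft(y_f, axis=-1)).reshape(p * b)
```

Because the blocks need not form a square grid (`p` may differ from `q`), the same routine covers non-square weight shapes.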
nan
Article 490
Title@2025-07-15 (2): Learning Safe Numeric Planning Action Models
Title: Learning Safe Numeric Planning Action Models | Sichere numerische Planungs-Aktionsmodelle lernen | 学习安全数字规划行动模式 2312.10705v2 |
Authors (4): Argaman Mordoch, Shahaf S. Shperberg, Roni Stern, Brendan Juba
A significant challenge in applying planning technology to real-world problems lies in obtaining a planning model that accurately represents the problem’s dynamics. Obtaining a planning model is even more challenging in mission-critical domains, where a trial-and-error approach to learning how to act is not an option. In such domains, the action model used to generate plans must be safe, in the sense that plans generated with it must be applicable and achieve their goals. Learning safe action models for planning has been mostly explored for domains in which states are sufficiently described with Boolean variables. In this work, we go beyond this limitation and present the Numeric Safe Action Models Learning (N-SAM) algorithm, an action model learning algorithm capable of learning safe numeric preconditions and effects. We prove that N-SAM runs in linear time in the number of observations and, under certain conditions, is guaranteed to return safe action models. However, to preserve this safety guarantee, N-SAM must observe a substantial number of examples for each action before including it in the learned model. We address this limitation of N-SAM and propose N-SAM*, an extension to the N-SAM algorithm that always returns an action model where every observed action is applicable at least in some states, even if it was observed only once. N-SAM* does so without compromising the safety of the returned action model. We prove that N-SAM* is optimal in terms of sample complexity compared to any other algorithm that guarantees safety. N-SAM and N-SAM* are evaluated over an extensive benchmark of numeric planning domains, and their performance is compared to a state-of-the-art numeric action model learning algorithm. We also provide a discussion on the impact of numerical accuracy on the learning process.
nan
Article 491
Title@2025-07-15 (2): LyAm: Robust Non-Convex Optimization for Stable Learning in Noisy Environments
Title: LyAm: Robust Non-Convex Optimization for Stable Learning in Noisy Environments | LyAm: Robuste Non-Convex-Optimierung für stabiles Lernen in lauten Umgebungen | LyAm: 在噪音环境中稳定学习的强力非Convex优化 2507.11262v1 |
Authors (3): Elmira Mirzabeigi, Sepehr Rezaee, Kourosh Parand
Training deep neural networks, particularly in computer vision tasks, often suffers from noisy gradients and unstable convergence, which hinder performance and generalization. In this paper, we propose LyAm, a novel optimizer that integrates Adam’s adaptive moment estimation with Lyapunov-based stability mechanisms. LyAm dynamically adjusts the learning rate using Lyapunov stability theory to enhance convergence robustness and mitigate training noise. We provide a rigorous theoretical framework proving the convergence guarantees of LyAm in complex, non-convex settings. Extensive experiments on datasets such as CIFAR-10 and CIFAR-100 show that LyAm consistently outperforms state-of-the-art optimizers in terms of accuracy, convergence speed, and stability, establishing it as a strong candidate for robust deep learning optimization.
nan
Article 492
Title@2025-07-15 (2): The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter
Title: The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter | Die Pragmatischen Rahmen von Puriösen Korrelationen im maschinellen Lernen: Verdolmetschen, wie und warum sie wichtig sind | 机器学习中净化的相互校正的实用框架:解释这些框架如何和为何重要 2411.04696v4 |
Authors (2): Samuel J. Bell, Skyler Wang
Learning correlations from data forms the foundation of today’s machine learning (ML) and artificial intelligence (AI) research. While contemporary methods enable the automatic discovery of complex patterns, they are prone to failure when unintended correlations are captured. This vulnerability has spurred a growing interest in interrogating spuriousness, which is often seen as a threat to model performance, fairness, and robustness. In this article, we trace departures from the conventional statistical definition of spuriousness – which denotes a non-causal relationship arising from coincidence or confounding – to examine how its meaning is negotiated in ML research. Rather than relying solely on formal definitions, researchers assess spuriousness through what we call pragmatic frames: judgments based on what a correlation does in practice – how it affects model behavior, supports or impedes task performance, or aligns with broader normative goals. Drawing on a broad survey of ML literature, we identify four such frames: relevance (“Models should use correlations that are relevant to the task”), generalizability (“Models should use correlations that generalize to unseen data”), human-likeness (“Models should use correlations that a human would use to perform the same task”), and harmfulness (“Models should use correlations that are not socially or ethically harmful”). These representations reveal that correlation desirability is not a fixed statistical property but a situated judgment informed by technical, epistemic, and ethical considerations. By examining how a foundational ML conundrum is problematized in research literature, we contribute to broader conversations on the contingent practices through which technical concepts like spuriousness are defined and operationalized.
nan
Article 493
Title@2025-07-15 (2): Fairness-Aware Grouping for Continuous Sensitive Variables: Application for Debiasing Face Analysis with respect to Skin Tone
Title: Fairness-Aware Grouping for Continuous Sensitive Variables: Application for Debiasing Face Analysis with respect to Skin Tone | Fairness-Aware-Gruppierung für kontinuierliche Sensitive Variablen: Anwendung für die Debiasing Face Analysis in Bezug auf Hautton | 持续敏感变量的公平意识群集:关于皮肤色调的贬低面分析申请 2507.11247v1 |
Authors (5): Veronika Shilova, Emmanuel Malherbe, Giovanni Palma, Laurent Risser, Jean-Michel Loubes
Within a legal framework, fairness in datasets and models is typically assessed by dividing observations into predefined groups and then computing fairness measures (e.g., Disparate Impact or Equality of Odds with respect to gender). However, when sensitive attributes such as skin color are continuous, dividing into default groups may overlook or obscure the discrimination experienced by certain minority subpopulations. To address this limitation, we propose a fairness-based grouping approach for continuous (possibly multidimensional) sensitive attributes. By grouping data according to observed levels of discrimination, our method identifies the partition that maximizes a novel criterion based on inter-group variance in discrimination, thereby isolating the most critical subgroups. We validate the proposed approach using multiple synthetic datasets and demonstrate its robustness under changing population distributions, revealing how discrimination is manifested within the space of sensitive attributes. Furthermore, we examine a specialized setting of monotonic fairness for the case of skin color. Our empirical results on both CelebA and FFHQ, leveraging the skin tone as predicted by an industrial proprietary algorithm, show that the proposed segmentation uncovers more nuanced patterns of discrimination than previously reported, and that these findings remain stable across datasets for a given model. Finally, we leverage our grouping model for debiasing purposes, aiming at predicting fair scores with group-by-group post-processing. The results demonstrate that our approach improves fairness while having minimal impact on accuracy, thus confirming our partition method and opening the door for industrial deployment.
nan
Article 494
Title@2025-07-15 (2): Generative Click-through Rate Prediction with Applications to Search Advertising
Title: Generative Click-through Rate Prediction with Applications to Search Advertising | Generative Click-through-Rate-Vorhersage mit Anwendungen zur Suche Werbung | 利用搜索广告应用程序生成点击率预测 2507.11246v1 |
Authors (6): Lingwei Kong, Lu Wang, Changping Peng, Zhangang Lin, Ching Law, Jingping Shao
Click-Through Rate (CTR) prediction models are integral to a myriad of industrial settings, such as personalized search advertising. Current methods typically involve feature extraction from users’ historical behavior sequences combined with product information, feeding into a discriminative model that is trained on user feedback to estimate CTR. With the success of models such as GPT, the potential for generative models to enrich expressive power beyond discriminative models has become apparent. In light of this, we introduce a novel model that leverages generative models to enhance the precision of CTR predictions in discriminative models. To reconcile the disparate data aggregation needs of both model types, we design a two-stage training process: 1) Generative pre-training for next-item prediction with the given item category in user behavior sequences; 2) Fine-tuning the well-trained generative model within a discriminative CTR prediction framework. Our method’s efficacy is substantiated through extensive experiments on a new dataset, and its significant utility is further corroborated by online A/B testing results. Currently, the model is deployed on one of the world’s largest e-commerce platforms, and we intend to release the associated code and dataset in the future.
nan
Article 495
Title@2025-07-15 (2): Shared Global and Local Geometry of Language Model Embeddings
Title: Shared Global and Local Geometry of Language Model Embeddings | Gemeinsame globale und lokale Geometrie von Sprachmodellen | 共同的全球和地方语言对地测量 2503.21073v3 |
Authors (4): Andrew Lee, Melanie Weber, Fernanda Viégas, Martin Wattenberg
Researchers have recently suggested that models share common representations. In our work, we find numerous geometric similarities across the token embeddings of large language models. First, we find “global” similarities: token embeddings often share similar relative orientations. Next, we characterize local geometry in two ways: (1) by using Locally Linear Embeddings, and (2) by defining a simple measure for the intrinsic dimension of each embedding. Both characterizations allow us to find local similarities across token embeddings. Additionally, our intrinsic dimension demonstrates that embeddings lie on a lower dimensional manifold, and that tokens with lower intrinsic dimensions often have semantically coherent clusters, while those with higher intrinsic dimensions do not. Based on our findings, we introduce EMB2EMB, a simple application to linearly transform steering vectors from one language model to another, despite the two models having different dimensions.
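One plausible way to realize a linear transfer between embedding spaces of different width is an ordinary least-squares fit on tokens shared by both vocabularies. The sketch below assumes exactly that; the abstract does not specify how EMB2EMB fits its map, and the function names here are hypothetical.

```python
import numpy as np


def fit_linear_map(src_emb, tgt_emb):
    """Least-squares map A with src_emb @ A ~= tgt_emb.

    src_emb: (n_tokens, d_src) embeddings of shared tokens in model A.
    tgt_emb: (n_tokens, d_tgt) embeddings of the same tokens in model B.
    (Assumed fitting procedure; hypothetical helper name.)
    """
    A, *_ = np.linalg.lstsq(src_emb, tgt_emb, rcond=None)
    return A


def transfer_vector(v_src, A):
    """Map a steering vector from the source space to the target space."""
    return v_src @ A
```

The map has shape `(d_src, d_tgt)`, so the two models are free to have different hidden dimensions.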
nan
Article 496
Title@2025-07-15 (2): Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation
Title: Few-Shot Radar Signal Recognition through Self-Supervised Learning and Radio Frequency Domain Adaptation | Wenig scharfe Radarsignalerkennung durch selbstüberwachtes Lernen und Funkfrequenz-Domänenanpassung | 通过自我监督学习和无线电频域的适应,很少点热雷达信号识别 2501.03461v3 |
Authors (5): Zi Huang, Simon Denman, Akila Pemasiri, Clinton Fookes, Terrence Martin
Radar signal recognition (RSR) plays a pivotal role in electronic warfare (EW), as accurately classifying radar signals is critical for informing decision-making. Recent advances in deep learning have shown significant potential in improving RSR in domains with ample annotated data. However, these methods fall short in EW scenarios where annotated radio frequency (RF) data are scarce or impractical to obtain. To address these challenges, we introduce a self-supervised learning (SSL) method which utilises masked signal modelling and RF domain adaption to perform few-shot RSR and enhance performance in environments with limited RF samples and annotations. We propose a two-step approach, first pre-training masked autoencoders (MAE) on baseband in-phase and quadrature (I/Q) signals from diverse RF domains, and then transferring the learned representations to the radar domain, where annotated data are scarce. Empirical results show that our lightweight self-supervised ResNet1D model with domain adaptation achieves up to a 17.5% improvement in 1-shot classification accuracy when pre-trained on in-domain signals (i.e., radar signals) and up to a 16.31% improvement when pre-trained on out-of-domain signals (i.e., comm signals), compared to its baseline without using SSL. We also present reference results for several MAE designs and pre-training strategies, establishing a new benchmark for few-shot radar signal classification.
nan
Article 497
Title@2025-07-15 (2): Improved sampling algorithms and Poincaré inequalities for non-log-concave distributions
Title: Improved sampling algorithms and Poincaré inequalities for non-log-concave distributions | Verbesserte Sampling-Algorithmen und Poincaré-Ungleichheiten für Nicht-Log-Konkaven-Distributionen | 改进取样算法和波因卡雷非卷卷混集分布分布的不平等情况 2507.11236v1 |
Authors (4): Yuchen He, Zhehan Lei, Jianan Shao, Chihao Zhang
We study the problem of sampling from a distribution $\mu$ with density $\propto e^{-V}$ for some potential function $V:\mathbb R^d\to \mathbb R$ with query access to $V$ and $\nabla V$. We start with the following standard assumptions: (1) The potential function $V$ is $L$-smooth. (2) The second moment $\mathbf{E}_{X\sim \mu}[|X|^2]\leq M$. Recently, He and Zhang (COLT’25) showed that the query complexity of sampling from such distributions is at least $\left(\frac{LM}{d\epsilon}\right)^{\Omega(d)}$ where $\epsilon$ is the desired accuracy in total variation distance, and the Poincaré constant can be arbitrarily large. Meanwhile, another common assumption in the study of diffusion-based samplers (see, e.g., the work of Chen, Chewi, Li, Li, Salim and Zhang (ICLR’23)) strengthens the smoothness condition (1) to the following: (1*) The potential function of every distribution along the Ornstein-Uhlenbeck process starting from $\mu$ is $L$-smooth. We show that under the assumptions (1*) and (2), the query complexity of sampling from $\mu$ can be $\mathrm{poly}(L,d)\cdot \left(\frac{Ld+M}{\epsilon^2}\right)^{\mathcal{O}(L+1)}$, which is polynomial in $d$ and $\frac{1}{\epsilon}$ when $L=\mathcal{O}(1)$ and $M=\mathrm{poly}(d)$. This improves the algorithm with quasi-polynomial query complexity developed by Huang et al. (COLT’24). Our results imply that the seemingly moderate strengthening of the smoothness condition (1) to (1*) can lead to an exponential gap in the query complexity of sampling algorithms. Moreover, we show that together with the assumption (1*) and the stronger moment assumption that $|X|$ is $\lambda$-sub-Gaussian for $X\sim\mu$, the Poincaré constant of $\mu$ is at most $\mathcal{O}(\lambda)^{2(L+1)}$. As an application of our technique, we obtain an improved estimate of the Poincaré constant for mixtures of Gaussians with the same covariance.
nan
Article 498
Title@2025-07-15 (2): DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
Title: DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion | DuetGraph: Coarse-to-Fine-Wissensgrafik mit Dual-Pathway Global-Local Fusion | 迪特格格:粗到精知识图,与双路全球-本地融合 2507.11229v1 |
Authors (3): Jin Li, Zezhong Ding, Xike Xie
Knowledge graphs (KGs) are vital for enabling knowledge reasoning across various domains. Recent KG reasoning methods that integrate both global and local information have achieved promising results. However, existing methods often suffer from score over-smoothing, which blurs the distinction between correct and incorrect answers and hinders reasoning effectiveness. To address this, we propose DuetGraph, a coarse-to-fine KG reasoning mechanism with dual-pathway global-local fusion. DuetGraph tackles over-smoothing by segregating – rather than stacking – the processing of local (via message passing) and global (via attention) information into two distinct pathways, preventing mutual interference and preserving representational discrimination. In addition, DuetGraph introduces a coarse-to-fine optimization, which partitions entities into high- and low-score subsets. This strategy narrows the candidate space and sharpens the score gap between the two subsets, which alleviates over-smoothing and enhances inference quality. Extensive experiments on various datasets demonstrate that DuetGraph achieves state-of-the-art (SOTA) performance, with up to an 8.7% improvement in reasoning quality and a 1.8$\times$ acceleration in training efficiency.
nan
Article 499
Title@2025-07-15 (2): TorchCP: A Python Library for Conformal Prediction
Title: TorchCP: A Python Library for Conformal Prediction | TorchCP: Eine Python-Bibliothek für konforme Vorhersagen | 火炬CP:皮顿综合预测图书馆 2402.12683v3 |
Authors (5): Jianguo Huang, Jianqing Song, Xuanning Zhou, Bingyi Jing, Hongxin Wei
Conformal prediction (CP) is a robust statistical framework that generates prediction intervals or sets with guaranteed coverage probability, addressing the challenge of quantifying predictive uncertainty in deep learning. Despite advancements in deep learning architectures and datasets, reliable uncertainty estimation remains elusive, making CP increasingly vital. This paper introduces TorchCP, a PyTorch-native library designed to integrate state-of-the-art CP algorithms into deep learning tasks, including classification, regression, graph neural networks, and large language models. TorchCP offers a comprehensive suite of advanced methodologies, a modular design for easy customization, and full GPU-accelerated scalability. Released under the LGPL-3.0 license, TorchCP has gained widespread adoption with over 12,582 PyPI downloads. It is supported by approximately 16,132 lines of code, 564 unit tests achieving 100% coverage, and comprehensive documentation. By bridging statistics and computer science, TorchCP empowers researchers and practitioners to advance conformal prediction in diverse deep learning applications.
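For readers unfamiliar with how prediction sets get their coverage guarantee, here is a minimal split conformal sketch for classification in plain NumPy. It deliberately does not use TorchCP's API; the function names and the choice of nonconformity score (one minus the probability of the true class) are assumptions for illustration.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold for target coverage 1 - alpha.

    Score = 1 - predicted probability of the true class. The threshold is
    the ceil((n + 1) * (1 - alpha))-th smallest calibration score; if that
    rank exceeds n, every class must be included (threshold = inf).
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, q_hat):
    """Return, per test point, the classes whose score is within q_hat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - p <= q_hat).tolist() for p in test_probs]
```

Under exchangeability of calibration and test data, sets built this way contain the true label with probability at least 1 - alpha, regardless of how good the underlying model is.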
nan
Article 500
Title@2025-07-15 (2): Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
Title: Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere? | Gradient Descent on Logistic Regression: Arbeiten große Schrittgrößen mit Daten auf der Sphäre? | 物流倒退的梯度:大步级系统是否与球体数据相配合? 2507.11228v1 |
Authors (4): Si Yi Meng, Baptiste Goujaud, Antonio Orvieto, Christopher De Sa
Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence, under any step size below the stability threshold. We prove that this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as finding sufficient conditions to guarantee global convergence with large step sizes.
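The stability threshold $2/\lambda$ can be computed numerically on a small example. The sketch below uses a hand-picked non-separable dataset on the unit circle (an assumption for illustration, not data from the paper), finds the minimizer with a conservative step size, and then evaluates the largest Hessian eigenvalue there.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad(w, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * x.w))."""
    m = y * (X @ w)
    return -(X * (y * sigmoid(-m))[:, None]).mean(axis=0)


def hessian(w, X, y):
    """Hessian of the mean logistic loss at w."""
    m = y * (X @ w)
    s = sigmoid(m) * sigmoid(-m)          # always at most 1/4
    return (X * s[:, None]).T @ X / len(y)


def gradient_descent(X, y, step, n_iters=5000):
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - step * grad(w, X, y)
    return w


# Toy data with equal magnitude (unit norm); labels chosen so that no
# half-plane through the origin separates the classes.
angles = np.array([0.1, 0.4, 1.2, 2.0, 2.8, 3.5, 4.4, 5.5])
X = np.column_stack([np.cos(angles), np.sin(angles)])
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])

w_star = gradient_descent(X, y, step=1.0)   # step 1.0 is safely small here
lam = np.linalg.eigvalsh(hessian(w_star, X, y)).max()
threshold = 2.0 / lam                        # stability threshold 2 / lambda
```

With unit-norm data the loss is 1/4-smooth, so any step below 8 is guaranteed to converge; the interesting regime studied in the paper is the gap between that global bound and the larger local threshold `2 / lam`.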
nan
Article 501
Title@2025-07-15 (2): On Equivariant Model Selection through the Lens of Uncertainty
Title: On Equivariant Model Selection through the Lens of Uncertainty | Bei gleicher Modellauswahl durch das Lens of Uncertainty | 通过不确定性的镜头进行等同模型选择 2506.18629v2 |
Authors (4): Putri A. van der Linden, Alexander Timans, Dharmesh Tailor, Erik J. Bekkers
Equivariant models leverage prior knowledge on symmetries to improve predictive performance, but misspecified architectural constraints can harm it instead. While work has explored learning or relaxing constraints, selecting among pretrained models with varying symmetry biases remains challenging. We examine this model selection task from an uncertainty-aware perspective, comparing frequentist (via Conformal Prediction), Bayesian (via the marginal likelihood), and calibration-based measures to naive error-based evaluation. We find that uncertainty metrics generally align with predictive performance, but Bayesian model evidence does so inconsistently. We attribute this to a mismatch in Bayesian and geometric notions of model complexity for the employed last-layer Laplace approximation, and discuss possible remedies. Our findings point towards the potential of uncertainty in guiding symmetry-aware model selection.
nan
Article 502
Title@2025-07-15 (2): On the Effect of Instruction Tuning Loss on Generalization
Title: On the Effect of Instruction Tuning Loss on Generalization | Auf die Auswirkungen der Instruktion Tuning Verlust auf die Verallgemeinerung | 指示计票损失对普遍化的影响的影响 2507.07817v2 |
Authors (4): Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty
Instruction Tuning has emerged as a pivotal post-training paradigm that enables pre-trained language models to better follow user instructions. Despite its significance, little attention has been given to optimizing the loss function used. A fundamental, yet often overlooked, question is whether the conventional auto-regressive objective - where loss is computed only on response tokens, excluding prompt tokens - is truly optimal for instruction tuning. In this work, we systematically investigate the impact of differentially weighting prompt and response tokens in instruction tuning loss, and propose Weighted Instruction Tuning (WIT) as a better alternative to conventional instruction tuning. Through extensive experiments on five language models of different families and scales, three finetuning datasets of different sizes, and five diverse evaluation benchmarks, we show that the standard instruction tuning loss often yields suboptimal performance and limited robustness to input prompt variations. We find that a low-to-moderate weight for prompt tokens coupled with a moderate-to-high weight for response tokens yields the best-performing models across settings, which also serve as better starting points for the subsequent preference alignment training. These findings highlight the need to reconsider instruction tuning loss and offer actionable insights for developing more robust and generalizable models. Our code is open-sourced at https://github.com/kowndinya-renduchintala/WIT.
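The idea of differentially weighting prompt and response tokens can be sketched as a weighted average of per-token negative log-likelihoods. The function name and the normalization by the sum of weights are assumptions; the released WIT code may normalize differently.

```python
import numpy as np


def weighted_token_loss(token_nll, is_prompt, prompt_weight, response_weight):
    """Weighted mean of per-token negative log-likelihoods.

    token_nll: per-token NLL values for one sequence.
    is_prompt: boolean mask, True where the token belongs to the prompt.
    Normalizing by the total weight is an assumption of this sketch.
    """
    token_nll = np.asarray(token_nll, dtype=float)
    is_prompt = np.asarray(is_prompt, dtype=bool)
    weights = np.where(is_prompt, prompt_weight, response_weight)
    total = weights.sum()
    if total == 0:
        raise ValueError("at least one token must have non-zero weight")
    return float((weights * token_nll).sum() / total)
```

Setting `prompt_weight=0` and `response_weight=1` recovers the conventional objective that averages the loss over response tokens only.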
nan
Article 503
Title@2025-07-15 (2): Stylometry recognizes human and LLM-generated texts in short samples
Title: Stylometry recognizes human and LLM-generated texts in short samples | Stylometrie erkennt menschliche und LLM-generierte Texte in kurzen Proben | tytylometerm在短样本中确认人类和LLM产生的文本 2507.00838v2 |
Authors (4): Karol Przystalski, Jan K. Argasiński, Iwona Grabska-Gradzińska, Jeremi K. Ochab
The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) texts processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) texts processed through rephrasing methods (Dipper, T5). The 10-sentence long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to .87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between .79 and 1.0 in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to .98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show – crucially, in the context of the increasingly sophisticated LLMs – that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.
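Since the headline multiclass result is reported as a Matthews correlation coefficient, a short sketch of the standard multiclass generalization of that metric may help when reading the numbers. The function name is hypothetical and this is not the authors' evaluation code.

```python
import numpy as np


def multiclass_mcc(y_true, y_pred, n_classes):
    """Multiclass Matthews correlation coefficient from a confusion matrix.

    With c = correct predictions, s = total samples, t_k = true count and
    p_k = predicted count of class k:
        MCC = (c*s - sum_k p_k*t_k)
              / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    Returns 0.0 when the denominator vanishes.
    """
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    t = cm.sum(axis=1)   # how often each class truly occurs
    p = cm.sum(axis=0)   # how often each class is predicted
    c = np.trace(cm)
    s = cm.sum()
    denom = np.sqrt(float(s * s - p @ p) * float(s * s - t @ t))
    return 0.0 if denom == 0 else float(c * s - t @ p) / denom
```

Unlike accuracy, the coefficient stays near zero for a classifier that ignores the input, which makes it a stricter summary for a 7-class problem.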
nan
Article 504
Title@2025-07-15 (2): Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks
Title: Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks | Wahrscheinliche Robustheit von (Graph) Neuronalen Netzwerken gegen Datenvergiftung und Hintertürangriffe | 防止数据中毒和后门攻击的(格)神经网络(防止数据中毒和后门攻击)的可证实的强力 2407.10867v3 |
Authors (4): Lukas Gosch, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Stephan Günnemann
Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data. This vulnerability has led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph. Our certificates are white-box and based upon $(i)$ the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and $(ii)$ a novel reformulation of the bilevel optimization problem describing poisoning as a mixed-integer linear program. Consequently, we leverage our framework to provide fundamental insights into the role of graph structure and its connectivity on the worst-case robustness behavior of convolution-based and PageRank-based GNNs. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.
Article 505
Title@2025-07-15 (2): A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
Title: A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | Eine Überprüfung der Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | 对贝叶斯不确定因素在深概率图像分割中量化的回顾 2411.16370v5 |
Authors (5): M. M. A. Valiuddin, R. J. G. van Sloun, C. G. A. Viviers, P. H. N. de With, F. van der Sommen
Advances in architectural design, data availability, and compute have driven remarkable progress in semantic segmentation. Yet, these models often rely on relaxed Bayesian assumptions, omitting critical uncertainty information needed for robust decision-making. The resulting reliance on point estimates has fueled interest in probabilistic segmentation, but the literature remains fragmented. In response, this review consolidates and contextualizes foundational concepts in uncertainty modeling, including the non-trivial task of distinguishing between epistemic and aleatoric uncertainty and examining their roles across four key downstream segmentation tasks, highlighting Active Learning as particularly promising. By unifying theory, terminology, and applications, we provide a coherent foundation for researchers and identify critical challenges, such as strong assumptions in spatial aggregation, lack of standardized benchmarks, and pitfalls in current uncertainty quantification methods. We identify trends such as the adoption of contemporary generative models, driven by advances in the broader field of generative modeling, with segmentation-specific innovation primarily in the conditioning mechanisms. Moreover, we observe growing interest in distribution- and sampling-free approaches to uncertainty estimation. We further propose directions for advancing uncertainty-aware segmentation in deep learning, including pragmatic strategies for disentangling different sources of uncertainty, novel uncertainty modeling approaches and improved Transformer-based backbones. In this way, we aim to support the development of more reliable, efficient, and interpretable segmentation models that effectively incorporate uncertainty into real-world applications.
Article 506
Title@2025-07-15 (2): A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition
Title: A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition | Ein robuster, unvollständiger multimodaler Low-Rank-Anpassungsansatz für die Emotionserkennung | 强烈的承认情感的不完全的多式低Rank适应办法 2507.11202v1 |
Authors (9): Xinkui Zhao, Jinsong Shu, Yangyang Wu, Guanjie Cheng, Zihe Liu, Naibo Wang, Shuiguang Deng, Zhongle Xie, Jianwei Yin
Multimodal Emotion Recognition (MER) often encounters incomplete multimodality in practical applications due to sensor failures or privacy protection requirements. While existing methods attempt to address various incomplete multimodal scenarios by balancing the training of each modality combination through additional gradients, these approaches face a critical limitation: training gradients from different modality combinations conflict with each other, ultimately degrading the performance of the final prediction model. In this paper, we propose a unimodal decoupled dynamic low-rank adaptation method based on modality combinations, named MCULoRA, which is a novel framework for the parameter-efficient training of incomplete multimodal learning models. MCULoRA consists of two key modules, modality combination aware low-rank adaptation (MCLA) and dynamic parameter fine-tuning (DPFT). The MCLA module effectively decouples the shared information from the distinct characteristics of individual modality combinations. The DPFT module adjusts the training ratio of modality combinations based on the separability of each modality’s representation space, optimizing the learning efficiency across different modality combinations. Our extensive experimental evaluation in multiple benchmark datasets demonstrates that MCULoRA substantially outperforms previous incomplete multimodal learning approaches in downstream task accuracy.
Article 507
Title@2025-07-15 (2): Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why
Title: Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why | Feature-based vs. GAN-based Learning from Demonstrations: Wann und warum | 从示范活动中学习:何时和为何 2507.05906v2 |
Authors (3): Chenhao Li, Marco Hutter, Andreas Krause
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations, with a focus on the structure of reward functions and their implications for policy learning. Feature-based methods offer dense, interpretable rewards that excel at high-fidelity motion imitation, yet often require sophisticated representations of references and struggle with generalization in unstructured settings. GAN-based methods, in contrast, use implicit, distributional supervision that enables scalability and adaptation flexibility, but are prone to training instability and coarse reward signals. Recent advancements in both paradigms converge on the importance of structured motion representations, which enable smoother transitions, controllable synthesis, and improved task integration. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced: rather than one paradigm dominating the other, the choice should be guided by task-specific priorities such as fidelity, diversity, interpretability, and adaptability. This work outlines the algorithmic trade-offs and design considerations that underlie method selection, offering a framework for principled decision-making in learning from demonstrations.
Article 508
Title@2025-07-15 (2): Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models
Title: Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models | Datengetriebene Differentialentwicklung in Reifenindustrie Extrusion: Hebelwirkung von Surrogate-Modellen | 轮胎工业振荡中数据驱动的差别变化:杠杆化代金模型 2507.11191v1 |
Authors (3): Eider Garate-Perez, Kerman López de Calle-Etxabe, Susana Ferreiro
The optimization of industrial processes remains a critical challenge, particularly when no mathematical formulation of objective functions or constraints is available. This study addresses this issue by proposing a surrogate-based, data-driven methodology for optimizing complex real-world manufacturing systems using only historical process data. Machine learning models are employed to approximate system behavior and construct surrogate models, which are integrated into a tailored metaheuristic approach: Data-Driven Differential Evolution with Multi-Level Penalty Functions and Surrogate Models, an adapted version of Differential Evolution suited to the characteristics of the studied process. The methodology is applied to an extrusion process in the tire manufacturing industry, with the goal of optimizing initialization parameters to reduce waste and production time. Results show that the surrogate-based optimization approach outperforms historical best configurations, achieving a 65% reduction in initialization and setup time, while also significantly minimizing material waste. These findings highlight the potential of combining data-driven modeling and metaheuristic optimization for industrial processes where explicit formulations are unavailable.
Article 509
Title@2025-07-15 (2): Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms
Title: Striking the Perfect Balance: Preserving Privacy While Boosting Utility in Collaborative Medical Prediction Platforms | Perfekte Balance: Schutz der Privatsphäre bei gleichzeitiger Steigerung der Nützlichkeit in kollaborativen medizinischen Vorhersageplattformen | 实现完美平衡:在合作医疗预测平台中维护隐私,同时促进效用 2507.11187v1 |
Authors (3): Shao-Bo Lin, Xiaotong Liu, Yao Wang
Online collaborative medical prediction platforms offer convenience and real-time feedback by leveraging massive electronic health records. However, growing concerns about privacy and low prediction quality can deter patient participation and doctor cooperation. In this paper, we first clarify the privacy attacks, namely attribute attacks targeting patients and model extraction attacks targeting doctors, and specify the corresponding privacy principles. We then propose a privacy-preserving mechanism and integrate it into a novel one-shot distributed learning framework, aiming to simultaneously meet both privacy requirements and prediction performance objectives. Within the framework of statistical learning theory, we theoretically demonstrate that the proposed distributed learning framework can achieve the optimal prediction performance under specific privacy requirements. We further validate the developed privacy-preserving collaborative medical prediction platform through both toy simulations and real-world data experiments.
Article 510
Title@2025-07-15 (2): An Explainable AI-Enhanced Machine Learning Approach for Cardiovascular Disease Detection and Risk Assessment
Title: An Explainable AI-Enhanced Machine Learning Approach for Cardiovascular Disease Detection and Risk Assessment | Ein erklärbarer KI-verbesserter maschineller Lernansatz für die Erkennung und Risikobewertung von Herz-Kreislauf-Erkrankungen | 用于心血管疾病检测和风险评估的可解释的AI增强的机器学习方法 2507.11185v1 |
Authors (5): Md. Emon Akter Sourov, Md. Sabbir Hossen, Pabon Shaha, Mohammad Minoar Hossain, Md Sadiq Iqbal
Heart disease remains a major global health concern, particularly in regions with limited access to medical resources and diagnostic facilities. Traditional diagnostic methods often fail to accurately identify and manage heart disease risks, leading to adverse outcomes. Machine learning has the potential to significantly enhance the accuracy, efficiency, and speed of heart disease diagnosis. In this study, we proposed a comprehensive framework that combines classification models for heart disease detection and regression models for risk prediction. We employed the Heart Disease dataset, which comprises 1,035 cases. To address the issue of class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied, resulting in the generation of an additional 100,000 synthetic data points. Performance metrics, including accuracy, precision, recall, F1-score, R2, MSE, RMSE, and MAE, were used to evaluate the model’s effectiveness. Among the classification models, Random Forest emerged as the standout performer, achieving an accuracy of 97.2% on real data and 97.6% on synthetic data. For regression tasks, Linear Regression demonstrated the highest R2 values of 0.992 and 0.984 on real and synthetic datasets, respectively, with the lowest error metrics. Additionally, Explainable AI techniques were employed to enhance the interpretability of the models. This study highlights the potential of machine learning to revolutionize heart disease diagnosis and risk prediction, thereby facilitating early intervention and enhancing clinical decision-making.
Article 511
Title@2025-07-15 (2): Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications
Title: Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications | Quantisierte Rangreduzierung: Ein kommunikativ-effizientes Federated Learning Scheme für netzwerk-kritische Anwendungen | 减少数量级:网络-英国应用通信-效率高的联邦学习计划 2507.11183v1 |
Authors (2): Dimitrios Kritsiolis, Constantine Kotropoulos
Federated learning is a machine learning approach that enables multiple devices (i.e., agents) to train a shared model cooperatively without exchanging raw data. This technique keeps data localized on user devices, ensuring privacy and security, while each agent trains the model on their own data and only shares model updates. The communication overhead is a significant challenge due to the frequent exchange of model updates between the agents and the central server. In this paper, we propose a communication-efficient federated learning scheme that utilizes low-rank approximation of neural network gradients and quantization to significantly reduce the network load of the decentralized learning process with minimal impact on the model’s accuracy.
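The combination of low-rank approximation and quantization described in this abstract can be illustrated with a minimal sketch. It assumes a truncated SVD for the low-rank step and uniform 8-bit min-max quantization of the two factors; the abstract does not specify either choice, and the function names `compress`, `quantize`, `dequantize`, and `decompress` are hypothetical, not taken from the paper.

```python
import numpy as np

def quantize(x, bits=8):
    """Uniform min-max quantization (an assumed scheme, not the paper's)."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    dtype = np.uint8 if bits <= 8 else np.uint16
    codes = np.round((x - lo) / scale).astype(dtype)
    return codes, lo, scale

def dequantize(q):
    codes, lo, scale = q
    return codes.astype(np.float64) * scale + lo

def compress(grad, rank, bits=8):
    """Agent side: truncated SVD of a gradient matrix, then quantize both factors."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    right = vt[:rank, :]
    return [quantize(left, bits), quantize(right, bits)]

def decompress(payload):
    """Server side: rebuild the approximate gradient from the two factors."""
    left, right = (dequantize(q) for q in payload)
    return left @ right
```

For an m x n gradient, the agent sends roughly rank * (m + n) small integers instead of m * n floats, which is where the reduction in network load would come from under these assumptions.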
Article 512
Title@2025-07-15 (2): Mixture of Experts in Large Language Models
Title: Mixture of Experts in Large Language Models | Mixtur von Experten in großen Sprachmodellen | 大语言模式专家混合 2507.11181v1 |
Authors (7): Danyang Zhang, Junhao Song, Ziqian Bi, Yingfang Yuan, Tianyang Wang, Joe Yeong, Junfeng Hao
This paper presents a comprehensive review of the Mixture-of-Experts (MoE) architecture in large language models, highlighting its ability to significantly enhance model performance while maintaining minimal computational overhead. Through a systematic analysis spanning theoretical foundations, core architectural designs, and large language model (LLM) applications, we examine expert gating and routing mechanisms, hierarchical and sparse MoE configurations, meta-learning approaches, multimodal and multitask learning scenarios, real-world deployment cases, and recent advances and challenges in deep learning. Our analysis identifies key advantages of MoE, including superior model capacity compared to equivalent Bayesian approaches, improved task-specific performance, and the ability to scale model capacity efficiently. We also underscore the importance of ensuring expert diversity, accurate calibration, and reliable inference aggregation, as these are essential for maximizing the effectiveness of MoE architectures. Finally, this review outlines current research limitations, open challenges, and promising future directions, providing a foundation for continued innovation in MoE architecture and its applications.
Article 513
Title@2025-07-15 (2): Gradient Regularization-based Neural Granger Causality
Title: Gradient Regularization-based Neural Granger Causality | Gradient Regularisierung-basierte Neural Granger Kausalität | 以神经重力为主的神经固态致果性 2507.11178v1 |
Authors (8): Meiliang Liu, Huiwen Dong, Xiaoxiao Yang, Yunfang Xu, Zijin Li, Zhengye Si, Xinyue Yang, Zhiwen Zhao
With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model’s ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between the model’s input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model’s effectiveness in reconstructing gene regulatory networks.
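The idea of penalizing the input-output gradient of a single forecasting model can be sketched as follows. This is an illustrative example, not the authors' implementation: it assumes a one-hidden-layer tanh MLP whose Jacobian is available in closed form, a lag window flattened in (lag, series) order, and hypothetical function names. The training loop that would add the penalty to the forecasting loss is omitted.

```python
import numpy as np

def forward(params, x):
    """x: (batch, lag * p) flattened lag window. Returns predictions and hidden units."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2, h

def input_output_jacobian(params, x):
    """Analytic d y_i / d x_k per sample, shape (batch, p, lag * p)."""
    w1, _, w2, _ = params
    _, h = forward(params, x)
    dh = 1.0 - h ** 2  # derivative of tanh at each hidden unit
    # J[b, i, k] = sum_m w2[m, i] * dh[b, m] * w1[k, m]
    return np.einsum("mi,bm,km->bik", w2, dh, w1)

def l1_gradient_penalty(params, x):
    """L1 norm of the input-output Jacobian, averaged over the batch."""
    return np.abs(input_output_jacobian(params, x)).sum(axis=(1, 2)).mean()

def granger_scores(params, x, p, lag):
    """scores[i, j]: mean absolute sensitivity of target series i to source series j."""
    jac = np.abs(input_output_jacobian(params, x))
    jac = jac.reshape(jac.shape[0], p, lag, p)  # split inputs into (lag, source)
    return jac.mean(axis=(0, 2))
```

Under this reading, a near-zero column `scores[:, j]` would suggest that series j does not Granger-cause any target. How the paper aggregates gradients and thresholds the scores is not stated in the abstract.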
Article 514
Title@2025-07-15 (2): Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction
Title: Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Verstärkung Learning Based UAV Deconfliction | 在以无人驾驶航空器为基础的强化学习中潜入 2507.11173v1 |
Authors (2): Deepak Kumar Panda, Weisi Guo
Autonomous unmanned aerial vehicles (UAVs) rely on global navigation satellite system (GNSS) pseudorange measurements for accurate real-time localization and navigation. However, this dependence exposes them to sophisticated spoofing threats, where adversaries manipulate pseudoranges to deceive UAV receivers. Among these, drift-evasive spoofing attacks subtly perturb measurements, gradually diverting the UAV's trajectory without triggering conventional signal-level anti-spoofing mechanisms. Traditional distributional shift detection techniques often require accumulating a threshold number of samples, causing delays that impede rapid detection and timely response. Consequently, robust temporal-scale detection methods are essential to identify attack onset and enable contingency planning with alternative sensing modalities, improving resilience against stealthy adversarial manipulations. This study explores a Bayesian online change point detection (BOCPD) approach that monitors temporal shifts in value estimates from a reinforcement learning (RL) critic network to detect subtle behavioural deviations in UAV navigation. Experimental results show that this temporal value-based framework outperforms conventional GNSS spoofing detectors, temporal semi-supervised learning frameworks, and the Page-Hinkley test, achieving higher detection accuracy and lower false-positive and false-negative rates for drift-evasive spoofing attacks.
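For readers unfamiliar with BOCPD, the following is a minimal sketch of the standard run-length recursion on a scalar sequence, which here stands in for the critic's value estimates. It assumes a constant hazard rate and a Gaussian observation model with known variance and a Normal prior on the segment mean. The abstract does not state the paper's observation model, so these choices and the function name `bocpd` are assumptions.

```python
import numpy as np

def bocpd(data, hazard=0.01, mu0=0.0, var0=1.0, obs_var=1.0):
    """Return the most probable run length after each observation."""
    run_probs = np.array([1.0])   # P(run length = r | data so far)
    means = np.array([mu0])       # posterior mean of the segment mean, per run length
    variances = np.array([var0])  # posterior variance of the segment mean, per run length
    map_run_lengths = []
    for x in data:
        # Predictive density of x under each run-length hypothesis.
        pred_var = variances + obs_var
        pred = np.exp(-0.5 * (x - means) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        growth = run_probs * pred * (1 - hazard)    # the current run continues
        change = np.sum(run_probs * pred * hazard)  # a new run starts
        run_probs = np.concatenate(([change], growth))
        run_probs /= run_probs.sum()
        # Conjugate Normal update for surviving runs; run length 0 restarts from the prior.
        new_var = 1.0 / (1.0 / variances + 1.0 / obs_var)
        new_mean = new_var * (means / variances + x / obs_var)
        means = np.concatenate(([mu0], new_mean))
        variances = np.concatenate(([var0], new_var))
        map_run_lengths.append(int(np.argmax(run_probs)))
    return map_run_lengths
```

A sharp drop in the most probable run length signals a likely change point. In a monitoring loop, that drop would be the trigger for switching to an alternative sensing modality.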
Article 515
Title@2025-07-15 (2): Improving Wi-Fi Network Performance Prediction with Deep Learning Models
Title: Improving Wi-Fi Network Performance Prediction with Deep Learning Models | Verbesserung der Wi-Fi-Netzwerk-Performance-Vorhersage mit Deep-Learning-Modellen | 利用深学习模式改进无线网络绩效预测 2507.11168v1 |
Authors (6): Gabriele Formis, Amanda Ericson, Stefan Forsstrom, Kyi Thar, Gianluca Cena, Stefano Scanzio
The increasing need for robustness, reliability, and determinism in wireless networks for industrial and mission-critical applications is the driver for the growth of new innovative methods. The study presented in this work makes use of machine learning techniques to predict channel quality in a Wi-Fi network in terms of the frame delivery ratio. Predictions can be used proactively to adjust communication parameters at runtime and optimize network operations for industrial applications. Methods including convolutional neural networks and long short-term memory were analyzed on datasets acquired from a real Wi-Fi setup across multiple channels. The models were compared in terms of prediction accuracy and computational complexity. Results show that the frame delivery ratio can be reliably predicted, and convolutional neural networks, although slightly less effective than other models, are more efficient in terms of CPU usage and memory consumption. This enhances the model’s usability on embedded and industrial systems.
Article 516
Title@2025-07-15 (2): GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks
Title: GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks | GRAPES: Lernen zu Mustergraphen für skalierbare Graphen-Neural-Netzwerke | GRAPES: 学习可缩放图形神经网络样本图 2310.03399v3 |
Authors (5): Taraneh Younesian, Daniel Daza, Emile van Krieken, Thiviyan Thanapalasingam, Peter Bloem
Graph neural networks (GNNs) learn to represent nodes by aggregating information from their neighbors. As GNNs increase in depth, their receptive field grows exponentially, leading to high memory costs. Several existing methods address this by sampling a small subset of nodes, scaling GNNs to much larger graphs. These methods are primarily evaluated on homophilous graphs, where neighboring nodes often share the same label. However, most of these methods rely on static heuristics that may not generalize across different graphs or tasks. We argue that the sampling method should be adaptive, adjusting to the complex structural properties of each graph. To this end, we introduce GRAPES, an adaptive sampling method that learns to identify the set of nodes crucial for training a GNN. GRAPES trains a second GNN to predict node sampling probabilities by optimizing the downstream task objective. We evaluate GRAPES on various node classification benchmarks, involving homophilous as well as heterophilous graphs. We demonstrate GRAPES’ effectiveness in accuracy and scalability, particularly in multi-label heterophilous graphs. Unlike other sampling methods, GRAPES maintains high accuracy even with smaller sample sizes and, therefore, can scale to massive graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
Article 517
Title@2025-07-15 (2): EASTER: Embedding Aggregation-based Heterogeneous Models Training in Vertical Federated Learning
Title: EASTER: Embedding Aggregation-based Heterogeneous Models Training in Vertical Federated Learning | EASTER: Einbettung von Aggregationsbasierten Heterogenen Modellen Training in vertikales Federated Learning | EEASTER:在纵向联邦学习中嵌入基于聚合的异种模式培训 2310.13367v3 |
Authors (6): Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao
Vertical federated learning has garnered significant attention as it allows clients to train machine learning models collaboratively without sharing local data, which protects the client’s local private data. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this challenge, this paper proposes a novel approach called Vertical federated learning for training multiple Heterogeneous models (VFedMH). VFedMH focuses on aggregating the local embeddings of each participant’s knowledge during forward propagation. To protect the participants’ local embedding values, we propose an embedding protection method based on lightweight blinding factors. In particular, participants obtain local embedding using local heterogeneous models. Then the passive party, who owns only features of the sample, injects the blinding factor into the local embedding and sends it to the active party. The active party aggregates local embeddings to obtain global knowledge embeddings and sends them to passive parties. The passive parties then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the sample labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.
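The blinded embedding aggregation step can be illustrated with a small sketch. It assumes additive blinding factors that sum to zero across the passive parties, so that they cancel when the active party averages the embeddings. The abstract only says the factors are lightweight, so this construction and the helper names are hypothetical rather than the paper's protocol.

```python
import random

def make_blinding_factors(num_parties, dim, seed=0):
    """Hypothetical helper: random masks that sum to zero across parties."""
    if num_parties < 2:
        raise ValueError("need at least two parties for the masks to cancel")
    rng = random.Random(seed)
    masks = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(num_parties - 1)]
    last = [-sum(column) for column in zip(*masks)]  # forces a zero column sum
    return masks + [last]

def blind(embedding, mask):
    """Passive party: add the blinding factor before sending the embedding."""
    return [e + m for e, m in zip(embedding, mask)]

def aggregate(blinded_embeddings):
    """Active party: element-wise mean of the blinded embeddings."""
    n = len(blinded_embeddings)
    return [sum(column) / n for column in zip(*blinded_embeddings)]
```

With this construction the active party recovers the mean embedding up to floating-point error without seeing any individual party's unblinded values. How the parties would agree on the masks is outside the scope of the sketch.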
Article 518
Title@2025-07-15 (2): Fast Fourier Correlation is a Highly Efficient and Accurate Feature Attribution Algorithm from the Perspective of Control Theory and Game Theory
Title: Fast Fourier Correlation is a Highly Efficient and Accurate Feature Attribution Algorithm from the Perspective of Control Theory and Game Theory | Fast Fourier Correlation ist ein hocheffizientes und präzises Feature Attribution Algorithmus aus der Perspektive der Steuerungstheorie und Spieltheorie | 从控制理论和游戏理论的角度看,快速的四面形关联是一种高效和准确的地物归属比值。 2504.02016v2 |
Authors (5): Zechen Liu, Feiyang Zhang, Wei Song, Xiang Li, Wei Wei
The study of neural networks from the perspective of Fourier features has garnered significant attention. While existing analytical research suggests that neural networks tend to learn low-frequency features, a clear attribution method for identifying the specific learned Fourier features has remained elusive. To bridge this gap, we propose a novel Fourier feature attribution method grounded in signal decomposition theory. Additionally, we analyze the differences between game-theoretic attribution metrics for Fourier and spatial domain features, demonstrating that game-theoretic evaluation metrics are better suited for Fourier-based feature attribution. Our experiments show that Fourier feature attribution exhibits superior feature selection capabilities compared to spatial domain attribution methods. For instance, in the case of Vision Transformers (ViTs) on the ImageNet dataset, only 8% of the Fourier features are required to maintain the original predictions for 80% of the samples. Furthermore, we compare the specificity of features identified by our method against traditional spatial domain attribution methods. Results reveal that Fourier features exhibit greater intra-class concentration and inter-class distinctiveness, indicating their potential for more efficient classification and explainable AI algorithms.
Article 519
Title@2025-07-15 (2): RMAU-NET: A Residual-Multihead-Attention U-Net Architecture for Landslide Segmentation and Detection from Remote Sensing Images
Title: RMAU-NET: A Residual-Multihead-Attention U-Net Architecture for Landslide Segmentation and Detection from Remote Sensing Images | RMAU-NET: Eine residual-Multihead-Aufmerksamkeit U-Net-Architektur für Erdrutschsegmentierung und Detektion von Fernerkundungsbildern | RMAU-NET:从遥感图像中分离和探测滑坡的剩余-多头-注意 U-网络结构 2507.11143v1 |
Authors (9): Lam Pham, Cam Le, Hieu Tang, Khang Truong, Truong Nguyen, Jasmin Lampert, Alexander Schindler, Martin Boyer, Son Phan
In recent years, landslide disasters have been reported frequently due to extreme weather events such as droughts, floods, and storms, or as a consequence of human activities such as deforestation and excessive exploitation of natural resources. However, automatically observing landslides is challenging due to the extremely large observation area and rugged topography such as mountains or highlands. This motivates us to propose an end-to-end deep-learning-based model that explores remote sensing images for automatically observing landslide events. By using remote sensing images as the input data, we can draw on freely available resources and observe large and rough terrains over time. To explore the remote sensing images, we propose a novel neural network architecture for the two tasks of landslide detection and landslide segmentation. We evaluate our proposed model on three benchmark datasets: LandSlide4Sense, Bijie, and Nepal. In extensive experiments, we achieve F1 scores of 98.23 and 93.83 for the landslide detection task on the LandSlide4Sense and Bijie datasets, and mIoU scores of 63.74 and 76.88 for the segmentation task on the LandSlide4Sense and Nepal datasets. These experimental results indicate the potential of integrating our proposed model into real-life landslide observation systems.
Article 520
Title@2025-07-15 (2): CLA: Latent Alignment for Online Continual Self-Supervised Learning
Title: CLA: Latent Alignment for Online Continual Self-Supervised Learning | CLA: Latent Alignment for Online Continual Self-Supervised Learning | CLCA: 在线持续自学在线持续自我监督学习的经常协调 2507.10434v2 |
Authors (5): Giacomo Cignoni, Andrea Cossu, Alexandra Gomez-Villa, Joost van de Weijer, Antonio Carta
Self-supervised learning (SSL) is able to build latent representations that generalize well to unseen data. However, only a few SSL techniques exist for the online continual learning (CL) setting, where data arrives in small minibatches, the model must comply with a fixed computational budget, and task boundaries are absent. We introduce Continual Latent Alignment (CLA), a novel SSL strategy for Online CL that aligns the representations learned by the current model with past representations to mitigate forgetting. We found that our CLA is able to speed up the convergence of the training process in the online scenario, outperforming state-of-the-art approaches under the same computational budget. Surprisingly, we also discovered that using CLA as a pretraining protocol in the early stages of pretraining leads to a better final performance when compared to full i.i.d. pretraining.
Article 521
Title@2025-07-15 (2): Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking
Title: Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking | Verschüttetes Wasserzeichen als Filter: Bekämpfung von Schmieden und Überschreiben von Angriffen bei gewichtsbasiertem Neural Network Watermarking | 流域水印作为过滤器:在以重量为基础的神经网络水印中击退伪造和推翻攻击 2507.11137v1 |
Authors (3): Yuan Yao, Jin Song, Jian Jin
As valuable digital assets, deep neural networks necessitate robust ownership protection, positioning neural network watermarking (NNW) as a promising solution. Among various NNW approaches, weight-based methods are favored for their simplicity and practicality; however, they remain vulnerable to forging and overwriting attacks. To address those challenges, we propose NeuralMark, a robust method built around a hashed watermark filter. Specifically, we utilize a hash function to generate an irreversible binary watermark from a secret key, which is then used as a filter to select the model parameters for embedding. This design cleverly intertwines the embedding parameters with the hashed watermark, providing a robust defense against both forging and overwriting attacks. An average pooling is also incorporated to resist fine-tuning and pruning attacks. Furthermore, it can be seamlessly integrated into various neural network architectures, ensuring broad applicability. Theoretically, we analyze its security boundary. Empirically, we verify its effectiveness and robustness across 13 distinct Convolutional and Transformer architectures, covering five image classification tasks and one text generation task. The source codes are available at https://github.com/AIResearch-Group/NeuralMark.
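The hashed-watermark filter described above can be sketched in a few lines. The example assumes SHA-256 applied to the key plus a counter to expand the secret key into bits, keeps the parameters whose filter bit is 1, and applies non-overlapping average pooling. The abstract does not give these details, so the hash choice, the selection rule, and the function names are assumptions rather than NeuralMark's actual procedure.

```python
import hashlib

def hashed_watermark(secret_key: bytes, length: int) -> list:
    """Expand a secret key into `length` bits by hashing key || counter."""
    bits = []
    counter = 0
    while len(bits) < length:
        digest = hashlib.sha256(secret_key + counter.to_bytes(4, "big")).digest()
        for byte in digest:
            bits.extend((byte >> i) & 1 for i in range(8))
        counter += 1
    return bits[:length]

def filter_parameters(params, watermark):
    """Keep the parameters at positions where the watermark bit is 1."""
    return [p for p, bit in zip(params, watermark) if bit == 1]

def average_pool(values, window):
    """Non-overlapping mean pooling; a trailing partial window is dropped."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]
```

Because the hash is one-way, an adversary who wants a different watermark cannot pick the filtered parameter set independently of it, which matches the intuition the abstract gives for resisting forging and overwriting.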
Article 522
Title@2025-07-15 (2): Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection
Title: Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection | Interpretierbare Bayesian Tensor Netzwerk-Kernel-Maschinen mit automatischer Rang- und Feature-Auswahl | 具有自动排级和特选功能的可解释贝耶斯泰瑟网络中枢机 2507.11136v1 |
Authors (2): Afra Kilic, Kim Batselier
Tensor Network (TN) Kernel Machines speed up model learning by representing parameters as low-rank TNs, reducing computation and memory use. However, most TN-based Kernel methods are deterministic and ignore parameter uncertainty. Further, they require manual tuning of model complexity hyperparameters like tensor rank and feature dimensions, often through trial-and-error or computationally costly methods like cross-validation. We propose Bayesian Tensor Network Kernel Machines, a fully probabilistic framework that uses sparsity-inducing hierarchical priors on TN factors to automatically infer model complexity. This enables automatic inference of tensor rank and feature dimensions, while also identifying the most relevant features for prediction, thereby enhancing model interpretability. All the model parameters and hyperparameters are treated as latent variables with corresponding priors. Given the Bayesian approach and latent variable dependencies, we apply a mean-field variational inference to approximate their posteriors. We show that applying a mean-field approximation to TN factors yields a Bayesian ALS algorithm with the same computational complexity as its deterministic counterpart, enabling uncertainty quantification at no extra computational cost. Experiments on synthetic and real-world datasets demonstrate the superior performance of our model in prediction accuracy, uncertainty quantification, interpretability, and scalability.
nan
Article 523
Title@2025-07-15 (2): What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests
Title: What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests | Was sollten LLMs vergessen? Quantifizierung personenbezogener Daten in LLMs für rechts-zu-vergessene Anfragen | 普法女士应忘记什么? 将个人数据量化为 “ 有权被遗忘的请求 “ 的 “ 普法女士 “ 中的 “ 个人数据 “ 。 2507.11128v1 |
Authors (1): Dimitri Staufer
Large Language Models (LLMs) can memorize and reveal personal information, raising concerns regarding compliance with the EU’s GDPR, particularly the Right to Be Forgotten (RTBF). Existing machine unlearning methods assume the data to forget is already known but do not address how to identify which individual-fact associations are stored in the model. Privacy auditing techniques typically operate at the population level or target a small set of identifiers, limiting applicability to individual-level data inquiries. We introduce WikiMem, a dataset of over 5,000 natural language canaries covering 243 human-related properties from Wikidata, and a model-agnostic metric to quantify human-fact associations in LLMs. Our approach ranks ground-truth values against counterfactuals using calibrated negative log-likelihood across paraphrased prompts. We evaluate 200 individuals across 15 LLMs (410M-70B parameters), showing that memorization correlates with subject web presence and model scale. We provide a foundation for identifying memorized personal data in LLMs at the individual level, enabling the dynamic construction of forget sets for machine unlearning and RTBF requests.
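The ranking step can be sketched as follows. The sketch assumes that calibration means subtracting a per-candidate baseline NLL and that scores are averaged over paraphrased prompts; both choices, the function name `truth_rank`, and all numbers are illustrative assumptions rather than the paper's exact metric.

```python
def truth_rank(nll_by_candidate: dict, baseline_nll: dict, truth: str) -> int:
    """Rank of the ground-truth value among all candidates (1 = most likely).

    nll_by_candidate maps each candidate value to its NLLs over paraphrased prompts.
    baseline_nll maps each candidate to a baseline NLL used for calibration (assumed).
    """
    scores = {
        cand: sum(nlls) / len(nlls) - baseline_nll[cand]
        for cand, nlls in nll_by_candidate.items()
    }
    ordered = sorted(scores, key=scores.get)   # lower calibrated NLL ranks first
    return ordered.index(truth) + 1


# Hypothetical NLLs for one subject and property, with two paraphrases each.
nll = {"Paris": [2.0, 2.2], "Rome": [3.0, 3.4], "Oslo": [2.5, 2.9]}
baseline = {"Paris": 1.0, "Rome": 1.5, "Oslo": 2.0}
no_baseline = {cand: 0.0 for cand in nll}

rank_calibrated = truth_rank(nll, baseline, "Paris")
rank_raw = truth_rank(nll, no_baseline, "Paris")
```

In this toy case the raw NLL ranks the true value first, while the calibrated score ranks it second, which shows why calibration can change the conclusion.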
nan
Article 524
Title@2025-07-15 (2): TAB: Unified Benchmarking of Time Series Anomaly Detection Methods
Title: TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | TAB: Unified Benchmarking von Methoden zur Erkennung von Anomalien in der Zeitreihe | TAB: 不同探测方法的时间序列统一基准 2506.18046v2 |
Authors (13): Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, Bin Yang
Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of reliable means of evaluating new methods and comparing them with existing methods. We address deficiencies in current evaluation procedures related to datasets and experimental settings and protocols. Specifically, we propose a new time series anomaly detection benchmark, called TAB. First, TAB encompasses 29 public multivariate datasets and 1,635 univariate time series from different domains to facilitate more comprehensive evaluations on diverse datasets. Second, TAB covers a variety of TSAD methods, including Non-learning, Machine learning, Deep learning, LLM-based, and Time-series pre-trained methods. Third, TAB features a unified and automated evaluation pipeline that enables fair and easy evaluation of TSAD methods. Finally, we employ TAB to evaluate existing TSAD methods and report on the outcomes, thereby offering a deeper insight into the performance of these methods. Besides, all datasets and code are available at https://github.com/decisionintelligence/TAB.
nan
Article 525
Title@2025-07-15 (2): PPA-Game: Characterizing and Learning Competitive Dynamics Among Online Content Creators
Title: PPA-Game: Characterizing and Learning Competitive Dynamics Among Online Content Creators | PPA-Game: Charakterisieren und Lernen wettbewerbsfähige Dynamik unter Online Content Creators | PPA-Game:确定和学习在线内容创建者之间的竞争动态 2403.15524v2 |
Authors (5): Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui
In this paper, we present the Proportional Payoff Allocation Game (PPA-Game), which characterizes situations where agents compete for divisible resources. In the PPA-Game, agents select from available resources, and their payoffs are proportionately determined based on heterogeneous weights attributed to them. Such dynamics simulate content creators on online recommender systems like YouTube and TikTok, who compete for finite consumer attention, with content exposure reliant on inherent and distinct quality. We first conduct a game-theoretical analysis of the PPA-Game. While the PPA-Game does not always guarantee the existence of a pure Nash equilibrium (PNE), we identify prevalent scenarios ensuring its existence. Simulated experiments further indicate that cases without a PNE rarely occur. Beyond analyzing static payoffs, we further discuss the agents’ online learning about resource payoffs by integrating a multi-player multi-armed bandit framework. We propose an online algorithm facilitating each agent’s maximization of cumulative payoffs over $T$ rounds. Theoretically, we establish that the regret of any agent is bounded by $O(\log^{1 + \eta} T)$ for any $\eta > 0$. Empirical results further validate the effectiveness of our online learning approach.
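A proportional allocation rule and a brute-force PNE search can be sketched as below. The sketch assumes one scalar weight per agent and a payoff of `value * weight / (total weight on that resource)`; the paper's weights may be more general, and the function names are hypothetical.

```python
from itertools import product


def payoffs(choice, values, weights):
    """Split each resource's value among its choosers in proportion to their weights."""
    total = {}
    for agent, res in enumerate(choice):
        total[res] = total.get(res, 0.0) + weights[agent]
    return [values[res] * weights[agent] / total[res]
            for agent, res in enumerate(choice)]


def is_pure_nash(choice, values, weights):
    """True when no single agent gains by switching to another resource."""
    base = payoffs(choice, values, weights)
    for agent in range(len(choice)):
        for alt in range(len(values)):
            if alt == choice[agent]:
                continue
            deviated = list(choice)
            deviated[agent] = alt
            if payoffs(deviated, values, weights)[agent] > base[agent] + 1e-12:
                return False
    return True


def pure_nash_equilibria(values, weights):
    """Enumerate every joint choice and keep the pure Nash equilibria."""
    profiles = product(range(len(values)), repeat=len(weights))
    return [c for c in profiles if is_pure_nash(c, values, weights)]


values = [10.0, 6.0]    # toy resource values
weights = [2.0, 1.0]    # toy agent weights
equilibria = pure_nash_equilibria(values, weights)   # [(0, 1)]
```

Enumeration grows exponentially with the number of agents, so this only serves to check small instances by hand.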
nan
Article 526
Title@2025-07-15 (2): Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Title: Dynamic Chunking for End-to-End Hierarchical Sequence Modeling | Dynamisches Chunking für die end-to-end-Hierarchische Sequenzmodellierung | 端端到末端等级序列建模动态震动 2507.07955v2 |
Authors (3): Sukjun Hwang, Brandon Wang, Albert Gu
Major progress on language models (LMs) in recent years has largely resulted from moving away from specialized models designed for specific tasks, to general models based on powerful architectures (e.g., the Transformer) that learn everything from raw data. Despite this trend, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content- and context-dependent segmentation strategies jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data-matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching the token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net’s improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.
nan
Article 527
Title@2025-07-15 (2): Context-Aware Deep Lagrangian Networks for Model Predictive Control
Title: Context-Aware Deep Lagrangian Networks for Model Predictive Control | Context-Aware Deep Lagrangian Networks für Modellvorhersagesteuerung | 用于模型预测控制的深拉格朗江网络 2506.15249v2 |
Authors (3): Lucas Schulze, Jan Peters, Oleg Arenz
Controlling a robot based on physics-consistent dynamic models, such as Deep Lagrangian Networks (DeLaN), can improve the generalizability and interpretability of the resulting behavior. However, in complex environments, the number of objects to potentially interact with is vast, and their physical properties are often uncertain. This complexity makes it infeasible to employ a single global model. Therefore, we need to resort to online system identification of context-aware models that capture only the currently relevant aspects of the environment. While physical principles such as the conservation of energy may not hold across varying contexts, ensuring physical plausibility for any individual context-aware model can still be highly desirable, particularly when using it for receding horizon control methods such as model predictive control (MPC). Hence, in this work, we extend DeLaN to make it context-aware, combine it with a recurrent network for online system identification, and integrate it with an MPC for adaptive, physics-consistent control. We also combine DeLaN with a residual dynamics model to leverage the fact that a nominal model of the robot is typically available. We evaluate our method on a 7-DOF robot arm for trajectory tracking under varying loads. Our method reduces the end-effector tracking error by 39%, compared to a 21% improvement achieved by a baseline that uses an extended Kalman filter.
nan
Article 528
Title@2025-07-15 (2): Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
Title: Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs | Multi-Trigger-Vergiftung verstärkt Sicherheitslücken in LLMs | 多触发中毒行为放大了LLM 的后门脆弱性 2507.11112v1 |
Authors (4): Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier
Recent studies have shown that Large Language Models (LLMs) are vulnerable to data poisoning attacks, where malicious training examples embed hidden behaviours triggered by specific input patterns. However, most existing works assume a single trigger phrase and focus on the attack’s effectiveness, offering limited understanding of trigger mechanisms and how multiple triggers interact within the model. In this paper, we present a framework for studying poisoning in LLMs. We show that multiple distinct backdoor triggers can coexist within a single model without interfering with each other, enabling adversaries to embed several triggers concurrently. Using multiple triggers with high embedding similarity, we demonstrate that poisoned triggers can achieve robust activation even when tokens are substituted or separated by long token spans. Our findings expose a broader and more persistent vulnerability surface in LLMs. To mitigate this threat, we propose a post hoc recovery method that selectively retrains specific model components based on a layer-wise weight difference analysis. Our method effectively removes the trigger behaviour with minimal parameter updates, presenting a practical and efficient defence against multi-trigger poisoning.
nan
Article 529
Title@2025-07-15 (2): A Mathematical Optimization Approach to Multisphere Support Vector Data Description
Title: A Mathematical Optimization Approach to Multisphere Support Vector Data Description | Ein mathematischer Optimierungsansatz zur Multisphärenunterstützung Vektordatenbeschreibung | 多重支持矢量数据描述的数学优化方法 2507.11106v1 |
Authors (4): Víctor Blanco, Inmaculada Espejo, Raúl Páez, Antonio M. Rodríguez-Chía
We present a novel mathematical optimization framework for outlier detection in multimodal datasets, extending Support Vector Data Description approaches. We provide a primal formulation, in the shape of a Mixed Integer Second Order Cone model, that constructs Euclidean hyperspheres to identify anomalous observations. Building on this, we develop a dual model that enables the application of the kernel trick, thus allowing for the detection of outliers within complex, non-linear data structures. An extensive computational study demonstrates the effectiveness of our exact method, showing clear advantages over existing heuristic techniques in terms of accuracy and robustness.
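The decision rule implied by a multi-sphere description can be sketched as follows. The sketch takes centers and radii as given, whereas the paper obtains them by solving a mixed-integer second-order cone model; the function name and the strict-inequality boundary rule are assumptions.

```python
import math


def is_outlier(point, spheres):
    """Flag a point as an outlier when it lies outside every Euclidean hypersphere."""
    return all(math.dist(point, center) > radius for center, radius in spheres)


# Two hypothetical spheres, each covering one mode of a bimodal dataset.
spheres = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.5)]

inside_first = is_outlier((0.5, 0.5), spheres)     # covered by the first sphere
inside_second = is_outlier((5.0, 6.0), spheres)    # covered by the second sphere
between_modes = is_outlier((2.5, 2.5), spheres)    # covered by neither sphere
```

A single sphere fitted to both modes would also enclose the empty region between them, which is the motivation for using several spheres on multimodal data.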
nan
Article 530
Title@2025-07-15 (2): Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Title: Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Sprachenübergreifendes Reisen: Benchmarking Cross-Lingual Consistency in multimodalen LLMs | 跨语言旅行:多模式LLM中跨语言一致基准 2505.15075v3 |
Authors (5): Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara
The rapid evolution of multimodal large language models (MLLMs) has significantly enhanced their real-world applications. However, achieving consistent performance across languages, especially when integrating cultural knowledge, remains a significant challenge. To better assess this issue, we introduce two new benchmarks: KnowRecall and VisRecall, which evaluate cross-lingual consistency in MLLMs. KnowRecall is a visual question answering benchmark designed to measure factual knowledge consistency in 15 languages, focusing on cultural and historical questions about global landmarks. VisRecall assesses visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, still struggle to achieve cross-lingual consistency. This underscores the need for more robust approaches that produce truly multilingual and culturally aware models.
nan
Article 531
Title@2025-07-15 (2): LaCoOT: Layer Collapse through Optimal Transport
Title: LaCoOT: Layer Collapse through Optimal Transport | LaCoOT: Layer Collapse durch optimalen Transport | LaCOOT: 通过最佳迁移折叠图层 2406.08933v3 |
Authors (5): Victor Quétu, Zhu Liao, Nour Hezbri, Fabio Pizzati, Enzo Tartaglione
Although deep neural networks are well known for their outstanding performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle: it raises energy-consumption issues and restricts deployment on resource-constrained devices, which prevents widespread adoption. In this paper, we present an optimal transport-based method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, achieving a better performance/depth trade-off compared to existing techniques. We assess the effectiveness of our method on traditional image classification setups and extend it to generative image models. Our code is available at https://github.com/VGCQ/LaCoOT.
nan
Article 532
Title@2025-07-15 (2): Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently
Title: Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently | Tree-Structured Parzen Estimator kann Black-Box Kombinatorische Optimierung effizienter lösen | 树结构化 Parzen 模拟器能够更有效地解决黑色Box组合优化 2507.08053v2 |
Authors (3): Kenshin Abe, Yunzhuo Wang, Shuhei Watanabe
Tree-structured Parzen estimator (TPE) is a versatile hyperparameter optimization (HPO) method supported by popular HPO tools. Since these HPO tools have been developed in line with the trend of deep learning (DL), extensions of TPE have mostly targeted problem setups common in the DL domain, such as multi-objective optimization and multi-fidelity optimization. However, the practical applications of HPO are not limited to DL, and black-box combinatorial optimization is actively utilized in some domains, e.g., chemistry and biology. Because combinatorial optimization remains an unexplored yet very important topic for TPE, we propose an efficient combinatorial optimization algorithm for TPE. In this paper, we first generalize the categorical kernel with the numerical kernel in TPE, enabling us to introduce a distance structure to the categorical kernel. Then we discuss modifications for the newly developed kernel to handle a large combinatorial search space. These modifications reduce the time complexity of the kernel calculation with respect to the size of a combinatorial search space. In the experiments using synthetic problems, we verified that our proposed method identifies better solutions with fewer evaluations than the original TPE. Our algorithm is available in Optuna, an open-source framework for HPO.
nan
Article 533
Title@2025-07-15 (2): SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation
Title: SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation | SketchDNN: Joint Continuous-Discrete Diffusion für CAD Sketch Generation | SletchDNN: 为CAD SlaychDN 生成的 CAD SlaychDN 联合连续分解扩散 2507.11579v1 |
Authors (2): Sathvik Chereddy, John Femiani
We present SketchDNN, a generative model for synthesizing CAD sketches that jointly models both continuous parameters and discrete class labels through a unified continuous-discrete diffusion process. Our core innovation is Gaussian-Softmax diffusion, where logits perturbed with Gaussian noise are projected onto the probability simplex via a softmax transformation, facilitating blended class labels for discrete variables. This formulation addresses 2 key challenges, namely, the heterogeneity of primitive parameterizations and the permutation invariance of primitives in CAD sketches. Our approach significantly improves generation quality, reducing Fréchet Inception Distance (FID) from 16.04 to 7.80 and negative log-likelihood (NLL) from 84.8 to 81.33, establishing a new state-of-the-art in CAD sketch generation on the SketchGraphs dataset.
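The core projection can be sketched in a few lines: add Gaussian noise to class logits, then apply a softmax so the result stays on the probability simplex. The noise scale `sigma`, the function names, and the single-step form are assumptions; the paper's noise schedule and training objective are not reproduced here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_softmax_step(logits, sigma, rng):
    """Perturb logits with Gaussian noise and project onto the simplex via softmax."""
    noisy = [x + rng.gauss(0.0, sigma) for x in logits]
    return softmax(noisy)


rng = random.Random(0)                     # seeded for reproducibility
logits = [4.0, 0.0, 0.0]                   # toy logits favouring the first class
blended = gaussian_softmax_step(logits, sigma=1.0, rng=rng)
clean = gaussian_softmax_step(logits, sigma=0.0, rng=rng)
```

With `sigma=0.0` the step reduces to a plain softmax, and larger values of `sigma` spread probability mass across classes, giving the blended labels mentioned in the abstract.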
nan
Article 534
Title@2025-07-15 (2): LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection
Title: LogTinyLLM: Tiny Large Language Models Based Contextual Log Anomaly Detection | LogTinyLLM: Kleine, große Sprachmodelle auf Basis von Kontext-Loganomalie-Erkennung | LogTinyLLLM:基于上下文原对地探测的小型大语言模型 2507.11071v1 |
Authors (3): Isaiah Thompson Ocansey, Ritwik Bhattacharya, Tanmay Sen
Log anomaly detection with traditional rule-based or deep-learning-based methods is often challenging because of the large volume and highly complex nature of log sequences. An effective way to detect anomalous log sequences is therefore crucial for system maintenance and development. This paper proposes parameter-efficient fine-tuning, specifically low-rank adaptation (LoRA) and adapter-based approaches, for finding contextual anomalies in log sequences within large log datasets. It compares different tiny large language models (LLMs) on the Thunderbird dataset. The results show that LoRA-based fine-tuning improves on the LogBert-based full fine-tuning approach by 18 to 19 percentage points, achieving accuracy scores between 97.76% and 98.83% compared to 79.37%.
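The low-rank update behind LoRA can be sketched without any framework. The frozen weight `W` is left untouched and only the small matrices `A` and `B` would be trained, giving an output of `W x + (alpha / r) * B (A x)`. The shapes, values, and function names below are illustrative assumptions, not the paper's configuration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Frozen base projection plus a scaled low-rank correction."""
    rank = len(A)                              # A has shape (rank, d_in)
    base = matvec(W, x)                        # frozen path
    update = matvec(B, matvec(A, x))           # trainable low-rank path
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]         # frozen weight, shape (2, 3)
A = [[1.0, 1.0, 1.0]]                          # shape (1, 3), rank 1
B_zero = [[0.0], [0.0]]                        # a zero B leaves the base model unchanged
B_tuned = [[0.5], [-1.0]]                      # hypothetical values after fine-tuning
x = [1.0, 2.0, 3.0]

before = lora_forward(W, A, B_zero, x, alpha=2.0)    # [7.0, 2.0]
after = lora_forward(W, A, B_tuned, x, alpha=2.0)    # [13.0, -10.0]
```

For this toy layer the adapter has 5 trainable values against 6 in `W`; the saving becomes substantial only when the rank is much smaller than both layer dimensions.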
nan
Article 535
Title@2025-07-15 (2): A Distance Metric for Mixed Integer Programming Instances
Title: A Distance Metric for Mixed Integer Programming Instances | Ein Abstandsmetrik für gemischte Integer-Programmierungsinstanzen | 混合整数方案拟订实例远程计量 2507.11063v1 |
Authors (2): Gwen Maudet, Grégoire Danoy
Mixed-integer linear programming (MILP) is a powerful tool for addressing a wide range of real-world problems, but it lacks a clear structure for comparing instances. A reliable similarity metric could establish meaningful relationships between instances, enabling more effective evaluation of instance set heterogeneity and providing better guidance to solvers, particularly when machine learning is involved. Existing similarity metrics often lack precision in identifying instance classes or rely heavily on labeled data, which limits their applicability and generalization. To bridge this gap, this paper introduces the first mathematical distance metric for MILP instances, derived directly from their mathematical formulations. By discretizing right-hand sides, weights, and variables into classes, the proposed metric draws inspiration from the Earth mover’s distance to quantify mismatches in weight-variable distributions for constraint comparisons. This approach naturally extends to enable instance-level comparisons. We evaluate both an exact and a greedy variant of our metric under various parameter settings, using the StrIPLIB dataset. Results show that all components of the metric contribute to class identification, and that the greedy version achieves accuracy nearly identical to the exact formulation while being nearly 200 times faster. Compared to state-of-the-art baselines, including feature-based, image-based, and neural network models, our unsupervised method consistently outperforms all non-learned approaches and rivals the performance of a supervised classifier on class and subclass grouping tasks.
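The Earth mover's distance that inspires the metric has a simple closed form in one dimension. The sketch below assumes two histograms over the same ordered classes with equal total mass and unit distance between neighbouring classes; it illustrates the underlying idea only and is not the paper's constraint-level metric.

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same ordered bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0   # mass that still has to be moved past the current bin
    cost = 0.0
    for mass_p, mass_q in zip(p, q):
        carried += mass_p - mass_q
        cost += abs(carried)
    return cost


far = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])     # all mass moves two bins
near = emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])    # mass shifts by one bin overall
same = emd_1d([0.2, 0.8], [0.2, 0.8])              # identical histograms
```

Unlike a bin-by-bin difference, this distance grows with how far mass must travel, which suits comparisons over discretized classes.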
nan
Article 536
Title@2025-07-15 (2): Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction
Title: Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction | Comply: Lernen von Sätzen mit komplexen Gewichten inspiriert von Fruit Fly Olfaction | 遵守:受果蝇运动启发的具有复杂重力的学习判决 2502.01706v3 |
Authors (8): Alexei Figueroa, Justus Westerhoff, Golzar Atefi, Dennis Fast, Benjamin Winter, Felix Alexander Gers, Alexander Löser, Wolfgang Nejdl
Biologically inspired neural networks offer alternative avenues to model data distributions. FlyVec is a recent example that draws inspiration from the fruit fly’s olfactory circuit to tackle the task of learning word embeddings. Surprisingly, this model performs competitively even against deep learning approaches specifically designed to encode text, and it does so with the highest degree of computational efficiency. We pose the question of whether this performance can be improved further. For this, we introduce Comply. By incorporating positional information through complex weights, we enable a single-layer neural network to learn sequence representations. Our experiments show that Comply not only supersedes FlyVec but also performs on par with significantly larger state-of-the-art models. We achieve this without additional parameters. Comply yields sparse contextual representations of sentences that can be interpreted explicitly from the neuron weights.
nan
Article 537
Title@2025-07-15 (2): Generalising Battery Control in Net-Zero Buildings via Personalised Federated RL
Title: Generalising Battery Control in Net-Zero Buildings via Personalised Federated RL | Verallgemeinerung der Batteriesteuerung in Net-Zero-Gebäuden durch personalisierte Federated RL | 通过个性化联式RL对净零楼的通用电池控制 2412.20946v2 |
Authors (3): Nicolas M Cuadrado Avila, Samuel Horváth, Martin Takáč
This work studies the challenge of optimal energy management in building-based microgrids through a collaborative and privacy-preserving framework. We evaluated two common RL algorithms (PPO and TRPO) in different collaborative setups to manage distributed energy resources (DERs) efficiently. Using a customized version of the CityLearn environment and synthetically generated data, we simulate and design net-zero energy scenarios for microgrids composed of multiple buildings. Our approach emphasizes reducing energy costs and carbon emissions while ensuring privacy. Experimental results demonstrate that Federated TRPO is comparable with state-of-the-art federated RL methodologies without hyperparameter tuning. The proposed framework highlights the feasibility of collaborative learning for achieving optimal control policies in energy systems, advancing the goals of sustainable and efficient smart grids. Our code is available at https://github.com/Optimization-and-Machine-Learning-Lab/energy_fed_trpo.git.
nan
Article 538
Title@2025-07-15 (2): Solar Flare Prediction Using Long Short-term Memory (LSTM) and Decomposition-LSTM with Sliding Window Pattern Recognition
Title: Solar Flare Prediction Using Long Short-term Memory (LSTM) and Decomposition-LSTM with Sliding Window Pattern Recognition | Solarflare-Vorhersage mit Langzeit-Kurzzeitspeicher (LSTM) und Zersetzung-LSTM mit Schiebefenstermustererkennung | 使用长期短期内存(LSTM)和分解(SLSTM)的太阳光线预测和用滑式窗口模式识别的分解(SLTM) 2507.05313v2 |
Authors (3): Zeinab Hassani, Davud Mohammadpur, Hossein Safari
We investigate the use of Long Short-Term Memory (LSTM) and Decomposition-LSTM (DLSTM) networks, combined with an ensemble algorithm, to predict solar flare occurrences using time-series data from the GOES catalog. The dataset spans from 2003 to 2023 and includes 151,071 flare events. Among approximately possible patterns, 7,552 yearly pattern windows are identified, highlighting the challenge of long-term forecasting due to the Sun’s complex, self-organized criticality-driven behavior. A sliding window technique is employed to detect temporal quasi-patterns in both irregular and regularized flare time series. Regularization reduces complexity, enhances large flare activity, and captures active days more effectively. To address class imbalance, resampling methods are applied. LSTM and DLSTM models are trained on sequences of peak fluxes and waiting times from irregular time series, while LSTM and DLSTM, integrated with an ensemble approach, are applied to sliding windows of regularized time series with a 3-hour interval. Performance metrics, particularly TSS (0.74), recall (0.95) and the area under the curve (AUC=0.87) in the receiver operating characteristic (ROC), indicate that DLSTM with an ensemble approach on regularized time series outperforms other models, offering more accurate large-flare forecasts with fewer false errors compared to models trained on irregular time series. The superior performance of DLSTM is attributed to its ability to decompose time series into trend and seasonal components, effectively isolating random noise. This study underscores the potential of advanced machine learning techniques for solar flare prediction and highlights the importance of incorporating various solar cycle phases and resampling strategies to enhance forecasting reliability.
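Two ingredients named above, the sliding window and the true skill statistic (TSS), can be sketched as follows. TSS is computed as recall minus false alarm rate. The window width, the toy series, and the confusion-matrix counts are hypothetical and unrelated to the reported results.

```python
def sliding_windows(series, width, step=1):
    """Return consecutive windows of the given width over a sequence."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP / (TP + FN) - FP / (FP + TN), ranging from -1 to 1."""
    recall = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return recall - false_alarm_rate


windows = sliding_windows([1, 2, 3, 4, 5], width=3)
tss = true_skill_statistic(tp=8, fn=2, fp=1, tn=9)   # 0.8 - 0.1
```

TSS does not depend on the ratio of positive to negative samples, which is why it is a common choice for imbalanced flare data.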
nan
Article 539
Title@2025-07-15 (2): GATE: Graph Attention Neural Networks with Real-Time Edge Construction for Robust Indoor Localization using Mobile Embedded Devices
Title: GATE: Graph Attention Neural Networks with Real-Time Edge Construction for Robust Indoor Localization using Mobile Embedded Devices | GATE: Grafik-Achtung Neurale Netzwerke mit Echtzeit-Edge-Konstruktion für robuste Indoor-Lokalisierung mit mobilen Embedded-Geräten | GATE:利用移动嵌入装置实时边缘建设硬式室内本地化的图形关注神经网络 2507.11053v1 |
Authors (2): Danish Gufran, Sudeep Pasricha
Accurate indoor localization is crucial for enabling spatial context in smart environments and navigation systems. Wi-Fi Received Signal Strength (RSS) fingerprinting is a widely used indoor localization approach due to its compatibility with mobile embedded devices. Deep Learning (DL) models improve accuracy in localization tasks by learning RSS variations across locations, but they assume fingerprint vectors exist in a Euclidean space, failing to incorporate spatial relationships and the non-uniform distribution of real-world RSS noise. This results in poor generalization across heterogeneous mobile devices, where variations in hardware and signal processing distort RSS readings. Graph Neural Networks (GNNs) can improve upon conventional DL models by encoding indoor locations as nodes and modeling their spatial and signal relationships as edges. However, GNNs struggle with non-Euclidean noise distributions and suffer from the GNN blind spot problem, leading to degraded accuracy in environments with dense access points (APs). To address these challenges, we propose GATE, a novel framework that constructs an adaptive graph representation of fingerprint vectors while preserving an indoor state-space topology, modeling the non-Euclidean structure of RSS noise to mitigate environmental noise and address device heterogeneity. GATE introduces 1) a novel Attention Hyperspace Vector (AHV) for enhanced message passing, 2) a novel Multi-Dimensional Hyperspace Vector (MDHV) to mitigate the GNN blind spot, and 3) a new Real-Time Edge Construction (RTEC) approach for dynamic graph adaptation. Extensive real-world evaluations across multiple indoor spaces with varying path lengths, AP densities, and heterogeneous devices demonstrate that GATE achieves 1.6x to 4.72x lower mean localization errors and 1.85x to 4.57x lower worst-case errors compared to state-of-the-art indoor localization frameworks.
nan
Article 540
Title@2025-07-15 (2): The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products
Title: The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products | Der Preis der Freiheit: Exploring Expressivity und Runtime Tradeoffs in gleichwertigen Tensor-Produkten | 《自由的代价:探讨平等出租产品中的表达性和时间取舍》 2506.13523v2 |
Authors (4): YuQing Xie, Ameya Daigavane, Mit Kotak, Tess Smidt
$E(3)$-equivariant neural networks have demonstrated success across a wide range of 3D modelling tasks. A fundamental operation in these networks is the tensor product, which interacts two geometric features in an equivariant manner to create new features. Due to the high computational complexity of the tensor product, significant effort has been invested to optimize the runtime of this operation. For example, Luo et al. (2024) recently proposed the Gaunt tensor product (GTP) which promises a significant speedup. In this work, we provide a careful, systematic analysis of a number of tensor product operations. In particular, we emphasize that different tensor products are not performing the same operation. The reported speedups typically come at the cost of expressivity. We introduce measures of expressivity and interactability to characterize these differences. In addition, we realized the original implementation of GTP can be greatly simplified by directly using a spherical grid at no cost in asymptotic runtime. This spherical grid approach is faster on our benchmarks and in actual training of the MACE interatomic potential by 30%. Finally, we provide the first systematic microbenchmarks of the various tensor product operations. We find that the theoretical runtime guarantees can differ wildly from empirical performance, demonstrating the need for careful application-specific benchmarking. Code is available at https://github.com/atomicarchitects/PriceofFreedom.
nan
Article 541
Title@2025-07-15 (2): ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
Title: ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | ReVISE: Verfeinern lernen zur Testzeit durch Intrinsische Selbstverifizierung | REVISE:通过内在自我核查学习在试验时进行精炼 2502.14565v2 |
Authors (5): Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim, Jinwoo Shin, Jihoon Tack
Self-awareness, i.e., the ability to assess and correct one’s own generation, is a fundamental aspect of human intelligence, making its replication in large language models (LLMs) an important yet challenging task. Previous works tackle this by employing extensive reinforcement learning or by relying on large external verifiers. In this work, we propose Refine via Intrinsic Self-Verification (ReVISE), an efficient and effective framework that enables LLMs to self-correct their outputs through self-verification. The core idea of ReVISE is to enable LLMs to verify their reasoning processes and continually rethink reasoning trajectories based on that verification. We introduce a structured curriculum based upon online preference learning to implement this efficiently. Specifically, as ReVISE involves two challenging tasks (i.e., self-verification and reasoning correction), we tackle each task sequentially using curriculum learning, collecting both failed and successful reasoning paths to construct preference pairs for efficient training. During inference, our approach enjoys natural test-time scaling by integrating self-verification and correction capabilities, further enhanced by our proposed confidence-aware decoding mechanism. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
Article 542
Title@2025-07-15 (2): Learning from Label Proportions and Covariate-shifted Instances
Title: Learning from Label Proportions and Covariate-shifted Instances | Lernen von Etikettenproportionen und Kovariate-verschiebten Instanzen | 从标签比例和共同变换情况中学习 2411.12334v2 |
Authors (5): Sagalpreet Singh, Navodita Sharma, Shreyas Havaldar, Rishi Saket, Aravindan Raghuveer
In many applications, especially due to lack of supervision or privacy concerns, the training data is grouped into bags of instances (feature-vectors) and for each bag we have only an aggregate label derived from the instance-labels in the bag. In learning from label proportions (LLP) the aggregate label is the average of the instance-labels in a bag, and a significant body of work has focused on training models in the LLP setting to predict instance-labels. In practice, however, the training data may have fully supervised albeit covariate-shifted source data, along with the usual target data with bag-labels, and we wish to train a good instance-level predictor on the target domain. We call this the covariate-shifted hybrid LLP problem. Fully supervised covariate-shifted data often has useful training signals and the goal is to leverage them for better predictive performance in the hybrid LLP setting. To achieve this, we develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels, in the domain adaptation framework. Apart from proving theoretical guarantees bounding the target generalization error, we also conduct experiments on several publicly available datasets showing that our methods outperform LLP and domain adaptation baselines as well as techniques from previous related work.
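To make the hybrid setting concrete, the sketch below combines an instance-level loss on labelled source data with a bag-level loss on target bags whose label is the average of the instance-labels. It is only an illustration under stated assumptions: the squared-error losses, the weighting factor `lam`, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def bag_proportion_loss(probs, bag_ids, bag_labels):
    """Mean squared gap between each bag's average prediction and its bag label."""
    losses = []
    for b, target in enumerate(bag_labels):
        mean_pred = probs[bag_ids == b].mean()  # average prediction inside bag b
        losses.append((mean_pred - target) ** 2)
    return float(np.mean(losses))

def hybrid_loss(src_probs, src_labels, tgt_probs, tgt_bag_ids, tgt_bag_labels, lam=1.0):
    """Instance-level loss on source data plus lam times the bag-level loss on target bags.

    `lam` is a hypothetical trade-off weight chosen for illustration.
    """
    src_term = float(np.mean((src_probs - src_labels) ** 2))
    return src_term + lam * bag_proportion_loss(tgt_probs, tgt_bag_ids, tgt_bag_labels)

# Toy data: two target bags of two instances each, both with label proportion 0.5.
tgt_probs = np.array([0.9, 0.1, 0.8, 0.2])
tgt_bag_ids = np.array([0, 0, 1, 1])
tgt_bag_labels = np.array([0.5, 0.5])
src_probs = np.array([1.0, 0.0])
src_labels = np.array([1.0, 0.0])
print(hybrid_loss(src_probs, src_labels, tgt_probs, tgt_bag_ids, tgt_bag_labels))
```

Note that the bag term only constrains averages, so very different instance-level predictions can give the same bag loss; this is one reason the instance-labelled source data can carry useful extra signal.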
Article 543
Title@2025-07-15 (2): Relative Entropy Pathwise Policy Optimization
Title: Relative Entropy Pathwise Policy Optimization | Relative Entropie pfadweise politische Optimierung | 相对 Entrop 路径式政策优化 2507.11019v1 |
Authors (9): Claas Voelcker, Axel Brunnbauer, Marcel Hussing, Michal Nauman, Pieter Abbeel, Eric Eaton, Radu Grosu, Amir-massoud Farahmand, Igor Gilitschenski
Score-function policy gradients have delivered strong results in game-playing, robotics and language-model fine-tuning. Yet their high variance often undermines training stability. On the other hand, pathwise policy gradients alleviate the training variance, but are reliable only when driven by an accurate action-conditioned value function, which is notoriously hard to train without relying on past off-policy data. In this paper, we discuss how to construct a value-gradient driven, on-policy algorithm that allows training Q-value models purely from on-policy data, unlocking the possibility of using pathwise policy updates in the context of on-policy learning. We show how to balance stochastic policies for exploration with constrained policy updates for stable training, and evaluate important architectural components that facilitate accurate value function learning. Building on these insights, we propose Relative Entropy Pathwise Policy Optimization (REPPO), an efficient on-policy algorithm that combines the sample-efficiency of pathwise policy gradients with the simplicity and minimal memory footprint of standard on-policy learning. We demonstrate that REPPO provides strong empirical performance with reduced sample requirements, wall-clock time, and memory footprint, as well as high hyperparameter robustness, in a set of experiments on two standard GPU-parallelized benchmarks.
Article 544
Title@2025-07-15 (2): Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Title: Structured Preconditioners in Adaptive Optimization: A Unified Analysis | Strukturierte Vorkonditionierer in adaptiver Optimierung: Eine einheitliche Analyse | 适应性优化的结构性先决条件:统一分析 2503.10537v2 |
Authors (5): Shuo Xie, Tianhao Wang, Sashank Reddi, Sanjiv Kumar, Zhiyuan Li
We present a novel unified analysis for a broad class of adaptive optimization algorithms with structured (e.g., layerwise, diagonal, and Kronecker-factored) preconditioners for both online regret minimization and offline convex optimization. Our analysis not only provides matching rates for several important structured preconditioned algorithms, including diagonal AdaGrad, full-matrix AdaGrad, and AdaGrad-Norm, but also gives an improved convergence rate for a one-sided variant of Shampoo over that of the original Shampoo. Interestingly, more structured preconditioners (e.g., diagonal AdaGrad and AdaGrad-Norm, which use less space and compute) are often presented as computationally efficient approximations to full-matrix AdaGrad, aiming for improved optimization performance through better approximations. Our unified analysis challenges this prevailing view and reveals, perhaps surprisingly, that more structured preconditioners, despite using less space and computation per step, can outperform their less structured counterparts. To demonstrate this, we show that one-sided Shampoo, which is much cheaper than full-matrix AdaGrad, can outperform it both theoretically and experimentally.
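As a small illustration of what "more structured" means here, the sketch below contrasts the textbook update rules of diagonal AdaGrad (one accumulator per coordinate) and AdaGrad-Norm (a single scalar accumulator). The function names, learning rate, and `eps` are illustrative assumptions; Shampoo and the paper's analysis are not reproduced.

```python
import numpy as np

def adagrad_diag_step(w, g, state, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate accumulator of squared gradients."""
    state = state + g ** 2
    return w - lr * g / (np.sqrt(state) + eps), state

def adagrad_norm_step(w, g, state, lr=0.1, eps=1e-8):
    """AdaGrad-Norm: one scalar accumulator of squared gradient norms."""
    state = state + float(g @ g)
    return w - lr * g / (np.sqrt(state) + eps), state

w0 = np.zeros(2)
g = np.array([3.0, 4.0])
w_diag, s_diag = adagrad_diag_step(w0, g, np.zeros(2))
w_norm, s_norm = adagrad_norm_step(w0, g, 0.0)
print(w_diag, w_norm)
```

On the first step, the diagonal variant rescales each coordinate to the same magnitude, while the norm variant keeps the gradient direction and only rescales its length, storing a single number of state.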
Article 545
Title@2025-07-15 (2): First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
Title: First-Order Error Matters: Accurate Compensation for Quantized Large Language Models | Error Matters: Genaue Kompensation für Quantisierte große Sprachmodelle | 第一顺序误差事项:量化大语言模型的准确补偿 2507.11017v1 |
Authors (7): Xingyu Zheng, Haotong Qin, Yuye Li, Jiakai Wang, Jinyang Guo, Michele Magno, Xianglong Liu
Post-training quantization (PTQ) offers an efficient approach to compressing large language models (LLMs), significantly reducing memory access and computational costs. Existing compensation-based weight calibration methods often rely on a second-order Taylor expansion to model quantization error, under the assumption that the first-order term is negligible in well-trained full-precision models. However, we reveal that the progressive compensation process introduces accumulated first-order deviations between latent weights and their full-precision counterparts, making this assumption fundamentally flawed. To address this, we propose FOEM, a novel PTQ method that explicitly incorporates first-order gradient terms to improve quantization error compensation. FOEM approximates gradients by directly computing the difference between latent and full-precision weights, avoiding the high cost and limited generalization of backpropagation-based gradient computation. This approach introduces minimal additional computational overhead. Moreover, FOEM leverages precomputed Cholesky factors to efficiently recover the inverse of Hessian submatrices in real time. Extensive experiments across a wide range of models and benchmarks demonstrate that FOEM consistently outperforms the classical GPTQ method. In 3-bit weight-only quantization, FOEM reduces the perplexity of Llama3-8B by 89.6%, and improves the 5-shot MMLU accuracy of Llama3-70B from 51.7% to 74.9%, approaching the full-precision performance of 78.6%. Furthermore, FOEM can be seamlessly integrated with advanced techniques such as GPTAQ and SpinQuant, yielding additional improvements under the challenging W4A4KV4 setting, and further narrowing the accuracy gap with full-precision baselines beyond what current state-of-the-art methods achieve. The code is available at https://github.com/Xingyu-Zheng/FOEM.
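For orientation, the second-order Taylor expansion that the abstract refers to can be written as below. The notation is assumed here for illustration and is not taken from the paper: $\Delta w$ is the weight perturbation introduced by quantization, $g$ the gradient of the loss $\mathcal{L}$ at the current weights, and $H$ the Hessian.

```latex
\Delta \mathcal{L}
  \;=\; \mathcal{L}(w + \Delta w) - \mathcal{L}(w)
  \;\approx\; \underbrace{g^{\top} \Delta w}_{\text{first-order term}}
  \;+\; \underbrace{\tfrac{1}{2}\, \Delta w^{\top} H \, \Delta w}_{\text{second-order term}}
```

Methods that assume a well-trained model set $g \approx 0$ and keep only the second-order term; the abstract's argument is that this assumption stops holding once progressive compensation has moved the latent weights away from their full-precision values.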
Article 546
Title@2025-07-15 (2): Leveraging Advanced Machine Learning to Predict Turbulence Dynamics from Temperature Observations at an Experimental Prescribed Fire
Title: Leveraging Advanced Machine Learning to Predict Turbulence Dynamics from Temperature Observations at an Experimental Prescribed Fire | Nutzung von fortgeschrittenem maschinellem Lernen zur Vorhersage von Turbulenzdynamiken aus Temperaturbeobachtungen bei einem experimentellen vorgeschriebenen Feuer | 利用先进机器学习利用实验定火条件下温度观测产生的预测扰动动力学 2507.11012v1 |
Authors (6): Dipak Dulal, Joseph J. Charney, Michael R. Gallagher, Pitambar Acharya, Carmeliza Navasca, Nicholas S. Skowronski
This study explores the potential for predicting turbulent kinetic energy (TKE) from more readily acquired temperature data using temperature profiles and turbulence data collected concurrently at 10 Hz during a small experimental prescribed burn in the New Jersey Pine Barrens. Machine learning models, including Deep Neural Networks, Random Forest Regressor, Gradient Boosting, and Gaussian Process Regressor, were employed to assess the potential to predict TKE from temperature perturbations and explore temporal and spatial dynamics of correlations. Data visualization and correlation analyses revealed patterns and relationships between thermocouple temperatures and TKE, providing insight into the underlying dynamics. More accurate predictions of TKE were achieved by employing various machine learning models despite a weak correlation between the predictors and the target variable. The results demonstrate significant success, particularly from regression models, in accurately predicting the TKE. The findings of this study demonstrate a novel numerical approach to identifying new relationships between temperature and airflow processes in and around the fire environment. These relationships can help refine our understanding of combustion environment processes and the coupling and decoupling of fire environment processes necessary for improving fire operations strategy and fire and smoke model predictions. The findings of this study additionally highlight the valuable role of machine learning techniques in analyzing the complex large datasets of the fire environments, showcasing their potential to advance fire research and management practices.
Article 547
Title@2025-07-15 (2): MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications
Title: MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications | MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications | MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v2 |
Authors (3): Aleksandr Algazinov, Matt Laing, Paul Laban
Accessibility remains a critical concern in today’s society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance for users in need due to the lack of customization stemming from closed-source designs. Consequently, individuals with disabilities frequently encounter significant barriers when attempting to interact with digital environments. We introduce MATE, a multimodal accessibility MAS, which performs the modality conversions based on the user’s needs. The system is useful for assisting people with disabilities by ensuring that data will be converted to an understandable format. For instance, if the user cannot see well and receives an image, the system converts this image to its audio description. MATE can be applied to a wide range of domains, industries, and areas, such as healthcare, and can become a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calling to using custom machine learning (ML) classifiers. This flexibility ensures that the system can be adapted to various needs and is compatible with a wide variety of hardware. Since the system is expected to run locally, it ensures the privacy and security of sensitive information. In addition, the framework can be effectively integrated with institutional technologies (e.g., digital healthcare service) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model that is capable of extracting the precise modality conversion task from the user input. Numerous experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
Article 548
Title@2025-07-15 (2): On the Similarities of Embeddings in Contrastive Learning
Title: On the Similarities of Embeddings in Contrastive Learning | Über die Ähnlichkeiten von Einbettungen im kontrastiven Lernen | 关于差异学习中的嵌入相似性 2506.09781v2 |
Authors (4): Chungpa Lee, Sehee Lim, Kibok Lee, Jy-yong Sohn
Contrastive learning operates on a simple yet effective principle: Embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. In this paper, we propose a unified framework for understanding contrastive learning through the lens of cosine similarity, and present two key theoretical insights derived from this framework. First, in full-batch settings, we show that perfect alignment of positive pairs is unattainable when negative-pair similarities fall below a threshold, and this misalignment can be mitigated by incorporating within-view negative pairs into the objective. Second, in mini-batch settings, smaller batch sizes induce stronger separation among negative pairs in the embedding space, i.e., higher variance in their similarities, which in turn degrades the quality of learned representations compared to full-batch settings. To address this, we propose an auxiliary loss that reduces the variance of negative-pair similarities in mini-batch settings. Empirical results show that incorporating the proposed loss improves performance in small-batch settings.
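The quantity at the centre of the second insight, the variance of negative-pair cosine similarities within a batch, can be computed as in the sketch below. This is only an illustration: the function name is hypothetical, and the exact form of the auxiliary loss proposed in the paper is not given in the abstract, so the variance is shown on its own rather than as the authors' loss.

```python
import numpy as np

def negative_similarity_variance(u, v):
    """Variance of cosine similarities between non-matching (negative) pairs.

    u[i] and v[i] are the two views of sample i; pairs (u[i], v[j]) with i != j
    are treated as negatives.
    """
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sims = u @ v.T                           # sims[i, j] = cos(u_i, v_j)
    n = sims.shape[0]
    negatives = sims[~np.eye(n, dtype=bool)]  # drop the positive pairs on the diagonal
    return float(negatives.var())

rng = np.random.default_rng(0)
u = rng.normal(size=(8, 16))
v = u + 0.1 * rng.normal(size=(8, 16))
print(negative_similarity_variance(u, v))
```

A term of this kind could be added to a contrastive objective with a small weight; the weight and the way it is combined with the main loss would be design choices outside what the abstract specifies.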
Article 549
Title@2025-07-15 (2): AdaMuon: Adaptive Muon Optimizer
Title: AdaMuon: Adaptive Muon Optimizer | AdaMuon: Adaptiver Muon-Optimierer | AdaMuon:适应性 Muon 最佳优化剂 2507.11005v1 |
Authors (3): Chongjie Si, Debing Zhang, Wei Shen
We propose AdaMuon, an adaptive learning-rate framework built upon the recently validated Muon optimizer, which has demonstrated substantial efficiency gains over AdamW in large-scale model training. AdaMuon augments Muon with two mutually dependent modules: (1) a per-parameter second-moment modulation that captures orthogonal gradient updates to ensure update-level adaptivity, and (2) an RMS-aligned rescaling that regulates the overall update magnitude by aligning it with the intrinsic structure of the parameter space. Empirical results on multiple model scales and learning-rate regimes confirm that AdaMuon consistently outperforms the original Muon, delivering higher acceleration in convergence while maintaining training stability. Our method introduces no additional tuning burden and can be seamlessly integrated into existing Muon training pipelines.
Article 550
Title@2025-07-15 (2): Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data
Title: Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data | Unwahrnehmbare Angriffe auf das menschliche Gewirr für tabellarische Daten | 用于表格数据的手工艺隐蔽的在门上对立攻击 2507.10998v1 |
Authors (6): Zhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, Catarina Moreira
Adversarial attacks on tabular data present fundamental challenges distinct from image or text domains due to the heterogeneous nature of mixed categorical and numerical features. Unlike images where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise $\ell_p$-norm constraints, often producing adversarial examples that deviate from the original data distributions, making them detectable. We propose a latent space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate imperceptible adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We specify In-Distribution Success Rate (IDSR) to measure the proportion of adversarial examples that remain statistically indistinguishable from the input distribution. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates and more consistent performance compared to traditional input-space attacks and other VAE-based methods adapted from image domain approaches. Our comprehensive analysis includes hyperparameter sensitivity, sparsity control mechanisms, and generative architectural comparisons, revealing that VAE-based attacks depend critically on reconstruction quality but offer superior practical utility when sufficient training data is available. This work highlights the importance of on-manifold perturbations for realistic adversarial attacks on tabular data, offering a robust approach for practical deployment. The source code can be accessed through https://github.com/ZhipengHe/VAE-TabAttack.
Article 551
Title@2025-07-15 (2): Patch-wise Structural Loss for Time Series Forecasting
Title: Patch-wise Structural Loss for Time Series Forecasting | Patch-weise strukturelle Verluste für die Zeitreihenvorhersage | 时间序列预测的补补结构损失 2503.00877v2 |
Authors (5): Dilfira Kudrat, Zongxia Xie, Yanru Sun, Tianyu Jia, Qinghua Hu
Time-series forecasting has gained significant attention in machine learning due to its crucial role in various domains. However, most existing forecasting models rely heavily on point-wise loss functions like Mean Square Error, which treat each time step independently and neglect the structural dependencies inherent in time series data, making it challenging to capture complex temporal patterns accurately. To address these challenges, we propose a novel Patch-wise Structural (PS) loss, designed to enhance structural alignment by comparing time series at the patch level. Through leveraging local statistical properties, such as correlation, variance, and mean, PS loss captures nuanced structural discrepancies overlooked by traditional point-wise losses. Furthermore, it integrates seamlessly with point-wise loss, simultaneously addressing local structural inconsistencies and individual time-step errors. PS loss establishes a novel benchmark for accurately modeling complex time series data and provides a new perspective on time series loss function design. Extensive experiments demonstrate that PS loss significantly improves the performance of state-of-the-art models across diverse real-world datasets.
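The idea of comparing series at the patch level through local statistics can be sketched as follows. This is a simplified stand-in, not the authors' PS loss: the non-overlapping patching, the equal weighting of the three terms, and the function names are all assumptions made for illustration.

```python
import numpy as np

def split_patches(x, patch_len):
    """Cut a 1-D series into non-overlapping patches, dropping any remainder."""
    n = len(x) // patch_len
    return np.asarray(x, dtype=float)[: n * patch_len].reshape(n, patch_len)

def patchwise_structural_gap(pred, true, patch_len, eps=1e-8):
    """Sum of per-patch gaps in correlation, variance, and mean (equal weights assumed)."""
    p, t = split_patches(pred, patch_len), split_patches(true, patch_len)
    mp, mt = p.mean(axis=1), t.mean(axis=1)
    vp, vt = p.var(axis=1), t.var(axis=1)
    cov = ((p - mp[:, None]) * (t - mt[:, None])).mean(axis=1)
    corr = cov / (np.sqrt(vp * vt) + eps)   # Pearson correlation per patch
    return float(np.mean(1.0 - corr) + np.mean(np.abs(vp - vt)) + np.mean(np.abs(mp - mt)))

true = np.arange(8.0)
shifted = true + 2.0
print(patchwise_structural_gap(true, true, patch_len=4))
print(patchwise_structural_gap(shifted, true, patch_len=4))
```

In the second call the shape of each patch is identical and only the level differs, so the gap comes entirely from the mean term; a point-wise loss such as MSE would not separate these two kinds of error.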
Article 552
Title@2025-07-15 (2): Misalignment from Treating Means as Ends
Title: Misalignment from Treating Means as Ends | Fehlausrichtung aus der Behandlung von Mitteln als Enden | 与 “ 最终 “ 处理手段的不协调 2507.10995v1 |
Authors (3): Henrik Marklund, Alex Infanger, Benjamin Van Roy
Reward functions, learned or manually specified, are rarely perfect. Instead of accurately expressing human goals, these reward functions are often distorted by human beliefs about how best to achieve those goals. Specifically, these reward functions often express a combination of the human’s terminal goals – those which are ends in themselves – and the human’s instrumental goals – those which are means to an end. We formulate a simple example in which even slight conflation of instrumental and terminal goals results in severe misalignment: optimizing the misspecified reward function results in poor performance when measured by the true reward function. This example distills the essential properties of environments that make reinforcement learning highly sensitive to conflation of instrumental and terminal goals. We discuss how this issue can arise with a common approach to reward learning and how it can manifest in real environments.
Article 553
Title@2025-07-15 (2): Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective
Title: Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective | Erforschung und Verbesserung der Initialisierung für tiefe Graphen-Neural-Netzwerke: Eine Signalverbreitungsperspektive | 探索和改进深图神经网络的初始化:信号传动视角 2506.16790v2 |
Authors (5): Senmiao Wang, Yupeng Chen, Yushun Zhang, Ruoyu Sun, Tian Ding
Graph Neural Networks (GNNs) often suffer from performance degradation as the network depth increases. This paper addresses this issue by introducing initialization methods that enhance signal propagation (SP) within GNNs. We propose three key metrics for effective SP in GNNs: forward propagation, backward propagation, and graph embedding variation (GEV). While the first two metrics derive from classical SP theory, the third is specifically designed for GNNs. We theoretically demonstrate that a broad range of commonly used initialization methods for GNNs, which exhibit performance degradation with increasing depth, fail to control these three metrics simultaneously. To deal with this limitation, a direct exploitation of the SP analysis, namely searching for weight initialization variances that optimize the three metrics, is shown to significantly enhance the SP in deep GCNs. This approach is called Signal Propagation on Graph-guided Initialization (SPoGInit). Our experiments demonstrate that SPoGInit outperforms commonly used initialization methods on various tasks and architectures. Notably, SPoGInit enables performance improvements as GNNs deepen, which represents a significant advancement in addressing depth-related challenges and highlights the validity and effectiveness of the SP analysis framework.
Article 554
Title@2025-07-15 (2): Fully Data-driven but Interpretable Human Behavioural Modelling with Differentiable Discrete Choice Model
Title: Fully Data-driven but Interpretable Human Behavioural Modelling with Differentiable Discrete Choice Model | Vollständig datengesteuerte, aber interpretierbare menschliche Verhaltensmodellierung mit differenzierbarem diskretes Wahlmodell | 完全由数据驱动但可解释的人类行为模型与差异分辨选择模型 2412.19403v3 |
Authors (4): Fumiyasu Makinoshima, Tatsuya Mitomi, Fumiya Makihara, Eigo Segawa
Discrete choice models are essential for modelling various decision-making processes in human behaviour. However, the specification of these models has depended heavily on domain knowledge from experts, and the fully automated but interpretable modelling of complex human behaviours has been a long-standing challenge. In this paper, we introduce the differentiable discrete choice model (Diff-DCM), a fully data-driven method for the interpretable modelling, learning, prediction, and control of complex human behaviours, which is realised by differentiable programming. Solely from input features and choice outcomes without any prior knowledge, Diff-DCM can estimate interpretable closed-form utility functions that reproduce observed behaviours. Comprehensive experiments with both synthetic and real-world data demonstrate that Diff-DCM can be applied to various types of data and requires only a small amount of computational resources for the estimations, which can be completed within tens of seconds on a laptop without any accelerators. In these experiments, we also demonstrate that, using its differentiability, Diff-DCM can provide useful insights into human behaviours, such as an optimal intervention path for effective behavioural changes. This study provides a strong basis for the fully automated and reliable modelling, prediction, and control of human behaviours.
Article 555
Title@2025-07-15 (2): Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
Title: Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Online-Intrinsische Belohnungen für Entscheidungsträger aus großen Sprachmodellen Feedback | 来自大语言模式反馈的决策者在线内部奖励 2410.23022v3 |
Authors (5): Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos
Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended exploration, and hierarchical skill design. Recent works have made promising steps by exploiting the prior knowledge of large language models (LLMs). However, these approaches suffer from important limitations: they are either not scalable to problems requiring billions of environment samples, due to requiring LLM annotations for each observation, or they require a diverse offline dataset, which may not exist or may be impossible to collect. In this work, we address these limitations through a combination of algorithmic and systems-level contributions. We propose ONI, a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function using LLM feedback. Our approach annotates the agent’s collected experience via an asynchronous LLM server, which is then distilled into an intrinsic reward model. We explore a range of algorithmic choices for reward modeling with varying complexity, including hashing, classification, and ranking models. Our approach achieves state-of-the-art performance across a range of challenging tasks from the NetHack Learning Environment, while removing the need for large offline datasets required by prior work. We make our code available at https://github.com/facebookresearch/oni.
Article 556
Title@2025-07-15 (2): BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection
Title: BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection | BMDEtect: Ein multimodales Deep Learning Framework für eine umfassende biomedizinische Fehlverhaltenserkennung | BMM 检测:综合生物医学不当行为检测的多式深层学习框架 2505.05763v2 |
Authors (4): Yize Zhou, Jie Zhang, Meijie Wang, Lun Yu
Academic misconduct detection in biomedical research remains challenging due to algorithmic narrowness in existing methods and fragmented analytical pipelines. We present BMDetect, a multimodal deep learning framework that integrates journal metadata (SJR, institutional data), semantic embeddings (PubMedBERT), and GPT-4o-mined textual attributes (methodological statistics, data anomalies) for holistic manuscript evaluation. Key innovations include: (1) multimodal fusion of domain-specific features to reduce detection bias; (2) quantitative evaluation of feature importance, identifying journal authority metrics (e.g., SJR-index) and textual anomalies (e.g., statistical outliers) as dominant predictors; and (3) the BioMCD dataset, a large-scale benchmark with 13,160 retracted articles and 53,411 controls. BMDetect achieves 74.33% AUC, outperforming single-modality baselines by 8.6%, and demonstrates transferability across biomedical subfields. This work advances scalable, interpretable tools for safeguarding research integrity.
Article 557
Title@2025-07-15 (2): High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization
Title: High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization | High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization | 通过适应政策同步化进行适应性政策同步化 2507.10990v1 |
Authors (1): Rodney Lafuente-Mercado
Scaling reinforcement learning (RL) workloads often requires distributing environment simulation across compute clusters. Existing frameworks entangle simulation, learning logic, and orchestration into monolithic systems, limiting modularity and reusability. We present ClusterEnv, a lightweight, learner-agnostic interface for distributed environment execution that mirrors the Gymnasium API. ClusterEnv introduces the DETACH pattern, which decouples simulation from training by offloading reset() and step() operations to remote workers while keeping learning centralized. To address policy staleness in distributed execution, we propose Adaptive Actor Policy Synchronization (AAPS), a divergence-triggered update mechanism that reduces synchronization overhead without sacrificing performance. ClusterEnv integrates cleanly into existing RL pipelines, supports both on-policy and off-policy methods, and requires minimal code changes. Experiments on discrete control tasks demonstrate that AAPS achieves high sample efficiency with significantly fewer weight updates. Source code is available at https://github.com/rodlaf/ClusterEnv.
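A divergence-triggered update of the kind described for AAPS can be sketched as a simple check: a remote actor pulls new weights only when its action distribution has drifted far enough from the learner's. Everything specific below is an assumption made for illustration, including the use of KL divergence, its direction, the threshold value, and the function names; none of it is taken from the ClusterEnv code base.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def should_sync(actor_probs, learner_probs, threshold=0.05):
    """Request fresh weights only if the stale actor policy has diverged enough.

    `threshold` is a hypothetical tuning knob.
    """
    return kl_divergence(actor_probs, learner_probs) > threshold

learner = [0.5, 0.5]
print(should_sync([0.52, 0.48], learner))  # small drift
print(should_sync([0.90, 0.10], learner))  # large drift
```

Compared with synchronizing on a fixed schedule, a check like this skips weight transfers while the policies still agree, which is where the reduction in synchronization overhead would come from.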
Article 558
Title@2025-07-15 (2): Trajectory Imputation in Multi-Agent Sports with Derivative-Accumulating Self-Ensemble
Title: Trajectory Imputation in Multi-Agent Sports with Derivative-Accumulating Self-Ensemble | Trajektorien-Imputation im Multi-Agenten-Sport mit demivativ-akkumulierendem Selbst-Ensemble | 多机构体育中具有衍生-累积自我集合功能的多机构体育 2408.10878v4 |
Authors (7): Han-Jun Choi, Hyunsung Kim, Minho Lee, Minchul Jeong, Chang-Jo Kim, Jinsung Yoon, Sang-Ki Ko
Multi-agent trajectory data collected from domains such as team sports often suffer from missing values due to various factors. While many imputation methods have been proposed for spatiotemporal data, they are not well-suited for multi-agent sports scenarios where player movements are highly dynamic and inter-agent interactions continuously evolve. To address these challenges, we propose MIDAS (Multi-agent Imputer with Derivative-Accumulating Self-ensemble), a framework that imputes multi-agent trajectories with high accuracy and physical plausibility. It jointly predicts positions, velocities, and accelerations through a Set Transformer-based neural network and generates alternative estimates by recursively accumulating predicted velocity and acceleration values. These predictions are then combined using a learnable weighted ensemble to produce final imputed trajectories. Experiments on three sports datasets demonstrate that MIDAS significantly outperforms existing baselines in both positional accuracy and physical plausibility. Lastly, we showcase use cases of MIDAS, such as approximating total distance and pass success probability, to highlight its applicability to practical downstream tasks that require complete tracking data.
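The derivative-accumulation idea can be illustrated with a few lines: a second position estimate is obtained by integrating predicted velocities forward from a known position, and the estimates are then averaged with weights. This is a sketch under stated assumptions, not the MIDAS implementation: forward Euler integration is assumed, the ensemble weights are fixed here whereas the paper describes them as learnable, and all names are hypothetical.

```python
import numpy as np

def accumulate_positions(p0, velocities, dt):
    """Forward Euler: p[t+1] = p[t] + v[t] * dt, starting from the known position p0."""
    steps = np.cumsum(np.asarray(velocities, dtype=float) * dt, axis=0)
    return np.vstack([p0, p0 + steps])

def weighted_ensemble(estimates, weights):
    """Convex combination of several trajectory estimates of identical shape."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(estimates), axes=1)

p0 = np.array([0.0, 0.0])
velocities = np.array([[1.0, 0.0], [1.0, 0.0]])           # stand-in for predicted velocities
from_velocity = accumulate_positions(p0, velocities, dt=0.5)
direct = np.array([[0.0, 0.0], [0.6, 0.0], [1.2, 0.0]])   # stand-in for directly predicted positions
print(weighted_ensemble([direct, from_velocity], weights=[1.0, 1.0]))
```

The same accumulation can be applied once more to go from predicted accelerations to velocities, giving a third estimate for the ensemble.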
Article 559
Title@2025-07-15 (2): StellarF: A Lora-Adapter Integrated Large Model Framework for Stellar Flare Forecasting with Historical & Statistical Data
Title: StellarF: A Lora-Adapter Integrated Large Model Framework for Stellar Flare Forecasting with Historical & Statistical Data | StellarF: Ein Lora-Adapter integriertes Large Model Framework für Stellar Flare-Prognose mit historischen und statistischen Daten | StellarF: 利用历史和统计数据预测Stellar 火焰的Lora-Adapter综合大型模型框架 2507.10986v1 |
Authors (6): Tianyu Su, Zhiqiang Zou, Ali Luo, Xiao Kong, Qingyu Lu, Min Li
Stellar flare forecasting, a critical research frontier in astronomy, offers profound insights into stellar activity. However, the field is constrained by both the sparsity of recorded flare events and the absence of domain-specific large-scale predictive models. To address these challenges, this study introduces StellarF (Stellar Flare Forecasting), a novel large model that leverages Low-Rank Adaptation (LoRA) and Adapter techniques for parameter-efficient learning in stellar flare forecasting. At its core, StellarF integrates a flare statistical information module with a historical flare record module, enabling multi-scale pattern recognition from observational data. Extensive experiments on our self-constructed datasets (derived from Kepler and TESS light curves) demonstrate that StellarF achieves state-of-the-art performance compared to existing methods. The proposed prediction paradigm establishes a novel methodological framework for advancing astrophysical research and cross-disciplinary applications.
Article 560
Title@2025-07-15 (2): Physics-Informed Neural Networks For Semiconductor Film Deposition: A Review
Title: Physics-Informed Neural Networks For Semiconductor Film Deposition: A Review | Physik-informierte Neuronale Netzwerke für Halbleiterfilmabscheidung: Ein Rückblick | 半导体电影沉积的物理内建神经网络:回顾 2507.10983v1 |
Authors (3): Tao Han, Zahra Taheri, Hyunwoong Ko
Semiconductor manufacturing relies heavily on film deposition processes, such as Chemical Vapor Deposition and Physical Vapor Deposition. These complex processes require precise control to achieve film uniformity, proper adhesion, and desired functionality. Recent advancements in Physics-Informed Neural Networks (PINNs), an innovative machine learning (ML) approach, have shown significant promise in addressing challenges related to process control, quality assurance, and predictive modeling within semiconductor film deposition and other manufacturing domains. This paper provides a comprehensive review of ML applications targeted at semiconductor film deposition processes. Through a thematic analysis, we identify key trends, existing limitations, and research gaps, offering insights into both the advantages and constraints of current methodologies. Our structured analysis aims to highlight the potential integration of these ML techniques to enhance interpretability, accuracy, and robustness in film deposition processes. Additionally, we examine state-of-the-art PINN methods, discussing strategies for embedding physical knowledge, governing laws, and partial differential equations into advanced neural network architectures tailored for semiconductor manufacturing. Based on this detailed review, we propose novel research directions that integrate the strengths of PINNs to significantly advance film deposition processes. The contributions of this study include establishing a clear pathway for future research in integrating physics-informed ML frameworks, addressing existing methodological gaps, and ultimately improving precision, scalability, and operational efficiency within semiconductor manufacturing.
Article 561
Title@2025-07-15 (2): Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators
Title: Distribution-Free Uncertainty-Aware Virtual Sensing via Conformalized Neural Operators | Distributionsfreies Unsichersein-Bewusstsein Virtuelles Sensing über konformisierte Neuraloperatoren | 通过正规神经操作员进行分布式无不流通的不确定性-软件虚拟遥感 2507.11574v1 |
Authors (5): Kazuma Kobayashi, Shailesh Garg, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam
Robust uncertainty quantification (UQ) remains a critical barrier to the safe deployment of deep learning in real-time virtual sensing, particularly in high-stakes domains where sparse, noisy, or non-collocated sensor data are the norm. We introduce the Conformalized Monte Carlo Operator (CMCO), a framework that transforms neural operator-based virtual sensing with calibrated, distribution-free prediction intervals. By unifying Monte Carlo dropout with split conformal prediction in a single DeepONet architecture, CMCO achieves spatially resolved uncertainty estimates without retraining, ensembling, or custom loss design. Our method addresses a longstanding challenge: how to endow operator learning with efficient and reliable UQ across heterogeneous domains. Through rigorous evaluation on three distinct applications (turbulent flow, elastoplastic deformation, and global cosmic radiation dose estimation), CMCO consistently attains near-nominal empirical coverage, even in settings with strong spatial gradients and proxy-based sensing. This breakthrough offers a general-purpose, plug-and-play UQ solution for neural operators, unlocking real-time, trustworthy inference in digital twins, sensor fusion, and safety-critical monitoring. By bridging theory and deployment with minimal computational overhead, CMCO establishes a new foundation for scalable, generalizable, and uncertainty-aware scientific machine learning.
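The combination of Monte Carlo dropout with split conformal prediction can be illustrated with a minimal Python sketch. It is not the CMCO implementation: the normalized-residual score, the function names, and the synthetic data (which stand in for a dropout mean and standard deviation) are all assumptions made for illustration.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])

def split_conformal_intervals(mu_cal, sd_cal, y_cal, mu_test, sd_test, alpha=0.1):
    """Scale a heuristic spread (e.g. an MC-dropout std) into calibrated intervals.

    Assumption: the nonconformity score is |y - mu| / sd; the abstract does not
    state which score CMCO uses.
    """
    scores = np.abs(y_cal - mu_cal) / sd_cal
    q = conformal_quantile(scores, alpha)
    return mu_test - q * sd_test, mu_test + q * sd_test

def demo_coverage(seed=0, n_cal=500, n_test=2000, alpha=0.1):
    """Empirical coverage on synthetic data with a deliberately overconfident std."""
    rng = np.random.default_rng(seed)
    n = n_cal + n_test
    mu = rng.normal(size=n)                   # stand-in for the dropout mean
    true_sd = rng.uniform(0.5, 2.0, size=n)
    y = mu + true_sd * rng.normal(size=n)
    sd_hat = 0.5 * true_sd                    # stand-in for an overconfident dropout std
    lo, hi = split_conformal_intervals(
        mu[:n_cal], sd_hat[:n_cal], y[:n_cal], mu[n_cal:], sd_hat[n_cal:], alpha
    )
    return float(np.mean((y[n_cal:] >= lo) & (y[n_cal:] <= hi)))
```

Calling `demo_coverage()` returns the fraction of held-out targets that fall inside their intervals. Under exchangeability of calibration and test points, this should sit near the nominal level `1 - alpha` even though the raw spread estimate is miscalibrated.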
Article 562
Title@2025-07-15 (2): Unified ODE Analysis of Smooth Q-Learning Algorithms
Title: Unified ODE Analysis of Smooth Q-Learning Algorithms | Einheitliche ODE-Analyse von glatten Q-Learning-Algorithmen | 对平滑的Q-学习算法进行UI ODE分析 2404.14442v4 |
Authors (1): Donghwan Lee
Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it hard to generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on the $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.
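To make "smooth Q-learning" concrete, the sketch below replaces the hard max in the tabular Q-learning target with a log-sum-exp operator. The abstract does not say which smooth variants the paper analyzes, so the choice of log-sum-exp, the temperature parameter `beta`, and all function names are assumptions made for illustration.

```python
import numpy as np

def lse_max(q_row, beta):
    """Log-sum-exp smoothing of max(q_row); approaches the hard max as beta grows."""
    m = np.max(q_row)  # subtract the max for numerical stability
    return m + np.log(np.sum(np.exp(beta * (q_row - m)))) / beta

def smooth_q_update(Q, s, a, r, s_next, step, gamma, beta):
    """One asynchronous update: only the visited entry Q[s, a] changes (in place)."""
    target = r + gamma * lse_max(Q[s_next], beta)
    Q[s, a] += step * (target - Q[s, a])
    return Q
```

With `n` actions, the smoothed value lies between the hard max and the hard max plus `log(n) / beta`, so `beta` controls how far the update departs from standard Q-learning.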
Article 563
Title@2025-07-15 (2): Seeding neural network quantum states with tensor network states
Title: Seeding neural network quantum states with tensor network states | Neurale Netzwerk-Quantenzustände mit Tensor-Netzwerkzuständen absähen | 种子神经网络量量度状态与 ARW 网络状态 2506.23550v2 |
Authors (2): Ryui Kaneko, Shimpei Goto
We find an efficient approach to approximately convert matrix product states (MPSs) into restricted Boltzmann machine wave functions consisting of a multinomial hidden unit through a canonical polyadic (CP) decomposition of the MPSs. This method allows us to generate well-behaved initial neural network quantum states for quantum many-body ground-state calculations in time polynomial in the number of variational parameters and to systematically shorten the distance between the initial states and the ground states by increasing the rank of the CP decomposition. We demonstrate the efficiency of our method by taking the transverse-field Ising model as an example and discuss possible applications of our method to more general quantum many-body systems in which the ground-state wave functions possess complex nodal structures.
Article 564
Title@2025-07-15 (2): Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
Title: Is Training Data Quality or Quantity More Impactful to Small Language Model Performance? | Ist Training Daten Qualität oder Quantität Impactful to Small Language Model Performance? | 培训数据质量或数量是否对小型语言模范业绩更有影响? 2411.15821v4 |
Authors (2): Aryan Sajith, Krishna Chaitanya Rao Kathala
This study investigates the relative impact of training data quality versus quantity on the performance of small language models (SLMs), utilizing the TinyStories dataset for empirical analysis. Dataset variations with respect to size (25% and 50% of the original size) and duplication (controlled rates of 25%, 50%, 75%, and 100%) were analyzed. Model performance was evaluated based on the validation loss, accuracy, and perplexity metrics. Results indicate that training data quality plays a more significant role in the overall performance of SLMs, especially given the scale of this experiment. Minimal duplication positively impacted model accuracy (+0.87% increase in accuracy at 25% duplication) without significantly increasing perplexity (+0.52% increase going from 0% to 25% duplication), but excessive duplication led to pronounced performance degradation (-40% drop in accuracy at 100% duplication). The implications of this exploration extend beyond just model performance; training large-scale models imposes significant financial and computational burdens, which can be prohibitive for organizations, individuals, and the public at large, especially in developing countries. Additionally, the energy consumption associated with large-scale training raises environmental concerns. Understanding the relative importance of data quality versus quantity could democratize AI technology, making advanced models more accessible and sustainable for all.
Article 565
Title@2025-07-15 (2): GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering
Title: GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering | GOLFS: Feature-Auswahl durch Kombination sowohl globaler als auch lokaler Informationen für hochdimensionales Clustering | GOLFS:通过将全球和地方信息相结合,为高维度集束组合组合选择特选 2507.10956v1 |
Authors (4): Zhaoyu Xing, Yang Wan, Juan Wen, Wei Zhong
It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn the pseudo labels and select the discriminative features simultaneously, we propose a new unsupervised feature selection method, named GlObal and Local information combined Feature Selection (GOLFS), for high dimensional clustering problems. The GOLFS algorithm combines both local geometric structure via manifold learning and global correlation structure of samples via regularized self-representation to select the discriminative features. The combination improves the accuracy of both feature selection and clustering by exploiting more comprehensive information. In addition, an iterative algorithm is proposed to solve the optimization problem and its convergence is proved. Simulations and two real data applications demonstrate the excellent finite-sample performance of GOLFS on both feature selection and clustering.
Article 566
Title@2025-07-15 (2): Diffusion Decoding for Peptide De Novo Sequencing
Title: Diffusion Decoding for Peptide De Novo Sequencing | Diffusionsdekodierung für Peptid De Novo Sequenzierung | 用于新先令Peptide的分解 2507.10955v1 |
Authors (2): Chi-en Amy Tai, Alexander Wong
Peptide de novo sequencing is a method used to reconstruct amino acid sequences from tandem mass spectrometry data without relying on existing protein sequence databases. Traditional deep learning approaches, such as Casanovo, mainly utilize autoregressive decoders and predict amino acids sequentially. Consequently, they encounter cascading errors and fail to leverage high-confidence regions effectively. To address these issues, this paper investigates using diffusion decoders adapted for the discrete data domain. These decoders provide a different approach, allowing sequence generation to start from any peptide segment, thereby enhancing prediction accuracy. We experiment with three different diffusion decoder designs, knapsack beam search, and various loss functions. We find that knapsack beam search did not improve performance metrics and that simply replacing the transformer decoder with a diffusion decoder lowered performance. Although peptide precision and recall were still 0, the best diffusion decoder design with the DINOISER loss function obtained a statistically significant improvement of 0.373 in amino acid recall compared to the baseline autoregressive decoder-based Casanovo model. These findings highlight the potential of diffusion decoders to not only enhance model sensitivity but also drive significant advancements in peptide de novo sequencing.
Article 567
Title@2025-07-15 (2): Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Title: Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach | Enthüllen von Unterschieden in generativen Modellen: Ein skalierbarer Differential-Clustering-Ansatz | 创创型模型中无法消除的差别:可缩放差异群集办法 2405.02700v3 |
Authors (4): Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li, Farzan Farnia
A fine-grained comparison of generative models requires the identification of sample types generated differently by each of the involved models. While quantitative scores have been proposed in the literature to rank different generative models, score-based evaluation and ranking do not reveal the nuanced differences between the generative models in producing different sample types. In this work, we propose solving a differential clustering problem to detect sample types generated differently by two generative models. To solve the differential clustering problem, we develop a spectral method called Fourier-based Identification of Novel Clusters (FINC) to identify modes produced by a generative model with a higher frequency in comparison to a reference distribution. FINC provides a scalable algorithm based on random Fourier features to estimate the eigenspace of kernel covariance matrices of two generative models and utilize the principal eigendirections to detect the sample types present more dominantly in each model. We demonstrate the application of the FINC method to large-scale computer vision datasets and generative modeling frameworks. Our numerical results suggest the scalability of the developed Fourier-based method in highlighting the sample types produced with different frequencies by generative models. The project code is available at https://github.com/buyeah1109/FINC.
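The idea of using random Fourier features to find directions that one sample occupies more heavily than another can be sketched in a few lines of Python. This is a simplified illustration, not the FINC code from the linked repository. The Gaussian kernel, the uncentered covariance, the plain difference of the two covariance matrices, and all names below are assumptions.

```python
import numpy as np

def make_rff(dim, num_features, sigma, rng):
    """Random Fourier feature map approximating exp(-||x - y||^2 / (2 sigma^2))."""
    omega = rng.normal(scale=1.0 / sigma, size=(dim, num_features))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(x):
        return np.sqrt(2.0 / num_features) * np.cos(x @ omega + phase)

    return features

def top_differential_direction(feat_a, feat_b):
    """Principal eigenpair of (feature covariance of A) minus (that of B)."""
    cov_a = feat_a.T @ feat_a / feat_a.shape[0]
    cov_b = feat_b.T @ feat_b / feat_b.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov_a - cov_b)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

def demo(seed=0):
    """Sample A has a mode near (5, 5) that sample B lacks; score both modes."""
    rng = np.random.default_rng(seed)
    features = make_rff(dim=2, num_features=1000, sigma=1.0, rng=rng)
    shared = rng.normal(scale=0.3, size=(300, 2))
    novel = rng.normal(loc=5.0, scale=0.3, size=(300, 2))
    sample_a = np.vstack([shared, novel])
    sample_b = rng.normal(scale=0.3, size=(600, 2))
    _, v = top_differential_direction(features(sample_a), features(sample_b))
    score_novel = float(np.mean((features(novel) @ v) ** 2))
    score_shared = float(np.mean((features(shared) @ v) ** 2))
    return score_novel, score_shared
```

In this toy setup, points from the mode that only sample A contains project strongly onto the principal eigendirection, while points from the shared mode do not. The cost is driven by the number of random features rather than the number of samples, which is the kind of scalability the abstract refers to.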
Article 568
Title@2025-07-15 (2): Rethinking the Foundations for Continual Reinforcement Learning
Title: Rethinking the Foundations for Continual Reinforcement Learning | Umdenken über die Grundlagen des kontinuierlichen Ausbaus des Lernens | 重新思考不断加强学习的基础 2504.08161v3 |
Authors (4): Esraa Elelimy, David Szepesvari, Martha White, Michael Bowling
In the traditional view of reinforcement learning, the agent’s goal is to find an optimal policy that maximizes its expected sum of rewards. Once the agent finds this policy, the learning ends. This view contrasts with continual reinforcement learning, where learning does not end, and agents are expected to continually learn and adapt indefinitely. Despite the clear distinction between these two paradigms of learning, much of the progress in continual reinforcement learning has been shaped by foundations rooted in the traditional view of reinforcement learning. In this paper, we first examine whether the foundations of traditional reinforcement learning are suitable for the continual reinforcement learning paradigm. We identify four key pillars of the traditional reinforcement learning foundations that are antithetical to the goals of continual learning: the Markov decision process formalism, the focus on atemporal artifacts, the expected sum of rewards as an evaluation metric, and episodic benchmark environments that embrace the other three foundations. We then propose a new formalism that sheds the first and the third foundations and replaces them with the history process as a mathematical formalism and a new definition of deviation regret, adapted for continual learning, as an evaluation metric. Finally, we discuss possible approaches to shed the other two foundations.
Article 569
Title@2025-07-15 (2): SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection
Title: SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection | SimAD: Ein einfacher, auf Dissimilarität basierender Ansatz zur Erkennung von Zeitreihenanomalien | SMAD: 一种基于时间序列异常探测的简单差异法 2405.11238v2 |
Authors (8): Zhijie Zhong, Zhiwen Yu, Xing Xi, Yue Xu, Wenming Cao, Yiyuan Yang, Kaixiang Yang, Jane You
Despite the prevalence of reconstruction-based deep learning methods, time series anomaly detection remains a tremendous challenge. Existing approaches often struggle with limited temporal contexts, insufficient representation of normal patterns, and flawed evaluation metrics, all of which hinder their effectiveness in detecting anomalous behavior. To address these issues, we introduce a Simple dissimilarity-based approach for time series Anomaly Detection, referred to as SimAD. Specifically, SimAD first incorporates a patching-based feature extractor capable of processing extended temporal windows and employs the EmbedPatch encoder to fully integrate normal behavioral patterns. Second, we design an innovative ContrastFusion module in SimAD, which strengthens the robustness of anomaly detection by highlighting the distributional differences between normal and abnormal data. Third, we introduce two robust, enhanced evaluation metrics, Unbiased Affiliation (UAff) and Normalized Affiliation (NAff), designed to overcome the limitations of existing metrics by providing better distinctiveness and semantic clarity. The reliability of these two metrics has been demonstrated by both theoretical and experimental analyses. Experiments conducted on seven diverse time series datasets clearly demonstrate SimAD’s superior performance compared to state-of-the-art methods, achieving relative improvements of 19.85% on F1, 4.44% on Aff-F1, 77.79% on NAff-F1, and 9.69% on AUC on six multivariate datasets. Code and pre-trained models are available at https://github.com/EmorZz1G/SimAD.
Article 570
Title@2025-07-15 (2): Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models
Title: Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models | Auf dem Weg zu einem praktischen Benchmarking von Datenreinigungstechniken: Authentische Fehler über große Sprachmodelle generieren | 制定数据清理技术实用基准:通过大语言模式产生真实错误 2507.10934v1 |
Authors (6): Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song
Data quality remains an important challenge in data-driven systems, as errors in tabular data can severely compromise downstream analytics and machine learning performance. Although numerous error detection algorithms have been proposed, the lack of diverse, real-world error datasets limits comprehensive evaluation. Manual error annotation is both time-consuming and inconsistent, motivating the exploration of synthetic error generation as an alternative. In this work, we introduce TableEG, a framework that leverages large language models (LLMs) to generate authentic errors. By employing a table fine-tuning strategy and a triplet representation $(I, T, O)$ to model error generation, detection, and correction tasks, TableEG captures the complex dependencies inherent in two-dimensional tables. Trained on 12 real-world datasets spanning 10 diverse domains, TableEG ensures that the synthesized errors faithfully reflect authentic error distributions. Experimental results indicate that errors generated by TableEG exhibit superior pattern and distribution similarity compared to both rule-based methods and LLM-generated errors without fine-tuning. Furthermore, performance metrics on TableEG-generated errors closely align with those on real-world errors across nearly all datasets and detection algorithms, particularly for machine learning based detection techniques. Overall, TableEG not only bridges the gap between synthetic and real-world errors but also establishes a robust benchmark for subsequent error detection and correction tasks.
Article 571
Title@2025-07-15 (2): Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout
Title: Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout | Effizientes Federated Learning mit heterogenen Daten und adaptivem Dropout | 采用异种数据和适应性辍学的高效联邦学习 2507.10430v2 |
Authors (10): Ji Liu, Beichen Ma, Qiaolin Yu, Ruoming Jin, Jingbo Zhou, Yang Zhou, Huaiyu Dai, Haixun Wang, Dejing Dou, Patrick Valduriez
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. The data distributed among the edge devices is highly heterogeneous. Thus, FL faces the challenge of data distribution and heterogeneity, where non-Independent and Identically Distributed (non-IID) data across edge devices may yield a significant accuracy drop. Furthermore, the limited computation and communication capabilities of edge devices increase the likelihood of stragglers, thus leading to slow model convergence. In this paper, we propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). FedDH dynamically adjusts the weights of each local model within the model aggregation process based on the non-IID degree of heterogeneous data to deal with the statistical data heterogeneity. FedAD performs neuron-adaptive operations in response to heterogeneous devices to improve accuracy while achieving superb efficiency. The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and computation cost (up to 15.0% smaller).
Article 572
Title@2025-07-15 (2): Compositional Flows for 3D Molecule and Synthesis Pathway Co-design
Title: Compositional Flows for 3D Molecule and Synthesis Pathway Co-design | Kompositionsflüsse für 3D-Molekül und Synthese Pathway Co-Design | 三维分子和综合途径共同设计的组成流程 2504.08051v2 |
Authors (7): Tony Shen, Seonghwan Seo, Ross Irwin, Kieran Didi, Simon Olsson, Woo Youn Kim, Martin Ester
Many generative applications, such as synthesis-based 3D molecular design, involve constructing compositional objects with continuous features. Here, we introduce Compositional Generative Flows (CGFlow), a novel framework that extends flow matching to generate objects in compositional steps while modeling continuous states. Our key insight is that modeling compositional state transitions can be formulated as a straightforward extension of the flow matching interpolation process. We further build upon the theoretical foundations of generative flow networks (GFlowNets), enabling reward-guided sampling of compositional structures. We apply CGFlow to synthesizable drug design by jointly designing the molecule’s synthetic pathway with its 3D binding pose. Our approach achieves state-of-the-art binding affinity on all 15 targets from the LIT-PCBA benchmark, and a 5.8$\times$ improvement in sampling efficiency compared to a 2D synthesis-based baseline. To the best of our knowledge, our method is also the first to achieve state-of-the-art performance in both Vina Dock (-9.38) and AiZynth success rate (62.2%) on the CrossDocked benchmark.
Article 573
Title@2025-07-15 (2): Representation Bending for Large Language Model Safety
Title: Representation Bending for Large Language Model Safety | Darstellungsbiegen für große Sprachmodellsicherheit | 大语文示范语文安全示范语文代表名单 2504.01550v3 |
Authors (10): Ashkan Yousefpour, Taeheon Kim, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Alvin Wan, Harrison Ngan, Youngjae Yu, Jonghyun Choi
Large Language Models (LLMs) have emerged as powerful tools, but their inherent safety risks - ranging from harmful content generation to broader societal harms - pose significant challenges. These risks can be amplified by recent adversarial attacks, fine-tuning vulnerabilities, and the increasing deployment of LLMs in high-stakes environments. Existing safety-enhancing techniques, such as fine-tuning with human feedback or adversarial training, are still vulnerable as they address specific threats and often fail to generalize across unseen attacks, or require manual system-level defenses. This paper introduces RepBend, a novel approach that fundamentally disrupts the representations underlying harmful behaviors in LLMs, offering a scalable solution to enhance (potentially inherent) safety. RepBend brings the idea of activation steering - simple vector arithmetic for steering a model’s behavior during inference - to loss-based fine-tuning. Through extensive evaluation, RepBend achieves state-of-the-art performance, outperforming prior methods such as Circuit Breaker, RMU, and NPO, with up to 95% reduction in attack success rates across diverse jailbreak benchmarks, all with negligible reduction in model usability and general capabilities.
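The "simple vector arithmetic" behind activation steering can be shown with a toy Python sketch. It covers only inference-time steering, not RepBend's loss-based fine-tuning. The difference-of-means construction, the `strength` parameter, and the function names are assumptions, and the arrays stand in for hidden states that a real model would produce.

```python
import numpy as np

def steering_vector(acts_target, acts_baseline):
    """Difference of mean activations between two prompt sets (one common construction)."""
    return acts_target.mean(axis=0) - acts_baseline.mean(axis=0)

def steer(hidden, vector, strength=1.0):
    """Shift a hidden state along the steering direction."""
    return hidden + strength * vector

# Toy activations with shape (num_prompts, hidden_dim).
acts_target = np.array([[1.0, 0.0], [3.0, 0.0]])
acts_baseline = np.array([[0.0, 1.0], [0.0, 3.0]])
direction = steering_vector(acts_target, acts_baseline)
steered = steer(np.zeros(2), direction, strength=0.5)
```

Here `direction` points from the baseline activations toward the target ones, and `strength` sets how far a hidden state is moved along it.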
Article 574
Title@2025-07-15 (2): A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge
Title: A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge | Ein Lernrahmen zur kooperativen Kollision Vermeidung von UAV-Schwärmen Nutzung von Domain-Wissen | 合作协作避免无人驾驶航空飞行器冲冲冲器利用域域知识学习框架 2507.10913v1 |
Authors (3): Shuangyao Huang, Haibo Zhang, Zhiyi Huang
This paper presents a multi-agent reinforcement learning (MARL) framework for cooperative collision avoidance of UAV swarms leveraging domain knowledge-driven reward. The reward is derived from knowledge in the domain of image processing, approximating contours on a two-dimensional field. By modeling obstacles as maxima on the field, collisions are inherently avoided as contours never go through peaks or intersect. Additionally, contours are smooth and energy-efficient. Our framework enables training with large swarm sizes as the agent interaction is minimized and the need for complex credit assignment schemes or observation sharing mechanisms in state-of-the-art MARL approaches is eliminated. Moreover, UAVs obtain the ability to adapt to complex environments where contours may be non-viable or non-existent through intensive training. Extensive experiments are conducted to evaluate the performance of our framework against state-of-the-art MARL algorithms.
Article 575
Title@2025-07-15 (2): View Invariant Learning for Vision-Language Navigation in Continuous Environments
Title: View Invariant Learning for Vision-Language Navigation in Continuous Environments | Invariantes Lernen für Vision-Language-Navigation in kontinuierlichen Umgebungen anzeigen | 查看持续环境中愿景-语言导航变量学习 2507.08831v2 |
Authors (5): Josh Qixuan Sun, Xiaoying Xing, Huaiyuan Weng, Chul Min Yeum, Mark Crowley
Vision-Language Navigation in Continuous Environments (VLNCE), where an agent follows instructions and moves freely to reach a destination, is a key research problem in embodied AI. However, most navigation policies are sensitive to viewpoint changes, i.e., variations in camera height and viewing angle that alter the agent’s observation. In this paper, we introduce a generalized scenario, V2-VLNCE (VLNCE with Varied Viewpoints), and propose VIL (View Invariant Learning), a view-invariant post-training strategy that enhances the robustness of existing navigation policies to changes in camera viewpoint. VIL employs a contrastive learning framework to learn sparse and view-invariant features. Additionally, we introduce a teacher-student framework for the Waypoint Predictor Module, a core component of most VLNCE baselines, where a view-dependent teacher model distills knowledge into a view-invariant student model. We employ an end-to-end training paradigm to jointly optimize these components, thus eliminating the cost of individual module training. Empirical results show that our method outperforms state-of-the-art approaches on V2-VLNCE by 8-15% in Success Rate on two standard benchmark datasets, R2R-CE and RxR-CE. Furthermore, we evaluate VIL under the standard VLNCE setting and find that, despite being trained for varied viewpoints, it often still improves performance. On the more challenging RxR-CE dataset, our method also achieved state-of-the-art performance across all metrics when compared to other map-free methods. This suggests that adding VIL does not diminish the standard viewpoint performance and can serve as a plug-and-play post-training method.
Article 576
Title@2025-07-15 (2): Class-Proportional Coreset Selection for Difficulty-Separable Data
Title: Class-Proportional Coreset Selection for Difficulty-Separable Data | Klasse-Proportionale Coreset-Auswahl für schwer trennbare Daten | 难分离数据的类类( Palportal) 核心集选择 2507.10904v1 |
Authors (3): Elisa Tsai, Haizhong Zheng, Atul Prakash
High-quality training data is essential for building reliable and efficient machine learning systems. One-shot coreset selection addresses this by pruning the dataset while maintaining or even improving model performance, often relying on training-dynamics-based data difficulty scores. However, most existing methods implicitly assume class-wise homogeneity in data difficulty, overlooking variation in data difficulty across different classes. In this work, we challenge this assumption by showing that, in domains such as network intrusion detection and medical imaging, data difficulty often clusters by class. We formalize this as class-difficulty separability and introduce the Class Difficulty Separability Coefficient (CDSC) as a quantitative measure. We demonstrate that high CDSC values correlate with performance degradation in class-agnostic coreset methods, which tend to overrepresent easy majority classes while neglecting rare but informative ones. To address this, we introduce class-proportional variants of multiple sampling strategies. Evaluated on five diverse datasets spanning security and medical domains, our methods consistently achieve state-of-the-art data efficiency. For instance, on CTU-13, at an extreme 99% pruning rate, a class-proportional variant of Coverage-centric Coreset Selection (CCS-CP) shows remarkable stability, with accuracy dropping only 2.58%, precision 0.49%, and recall 0.19%. In contrast, the class-agnostic CCS baseline, the next best method, suffers sharper declines of 7.59% in accuracy, 4.57% in precision, and 4.11% in recall. We further show that aggressive pruning enhances generalization in noisy, imbalanced, and large-scale datasets. Our results underscore that explicitly modeling class-difficulty separability leads to more effective, robust, and generalizable data pruning, particularly in high-stakes scenarios.
Article 577
Title@2025-07-15 (2): LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning
Title: LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning | LiLM-RDB-SFC: Leichtes Sprachmodell mit relationaler Datenbank-geführter DRL für optimierte SFC-Provisionierung | LILM-RDB-SFC:为优化SFC供应而与关系数据库-指导DRL 优化SFC供应的轻量语言模型 2507.10903v1 |
Authors (5): Parisa Fard Moshiri, Xinyu Zhu, Poonam Lohan, Burak Kantarci, Emil Janulewicz
Effective management of Service Function Chains (SFCs) and optimal Virtual Network Function (VNF) placement are critical challenges in modern Software-Defined Networking (SDN) and Network Function Virtualization (NFV) environments. Although Deep Reinforcement Learning (DRL) is widely adopted for dynamic network decision-making, its inherent dependency on structured data and fixed action rules often limits adaptability and responsiveness, particularly under unpredictable network conditions. This paper introduces LiLM-RDB-SFC, a novel approach that combines a Lightweight Language Model (LiLM) with a Relational Database (RDB) to answer network state queries and guide a DRL model for efficient SFC provisioning. Our proposed approach leverages two LiLMs, Bidirectional and Auto-Regressive Transformers (BART) and the Fine-tuned Language Net T5 (FLAN-T5), to interpret network data and support diverse query types related to SFC demands, data center resources, and VNF availability. Results demonstrate that FLAN-T5 outperforms BART with a lower test loss (0.00161 compared to 0.00734), higher accuracy (94.79% compared to 80.2%), and less processing time (2 h 2 min compared to 2 h 38 min). Moreover, when compared to the large language model SQLCoder, FLAN-T5 matches the accuracy of SQLCoder while cutting processing time by 96% (SQLCoder: 54 h 43 min; FLAN-T5: 2 h 2 min).
Article 578
Title@2025-07-15 (2): Constrained Online Convex Optimization with Polyak Feasibility Steps
Title: Constrained Online Convex Optimization with Polyak Feasibility Steps | Beschränkte Online Convex-Optimierung mit Polyak-Feasibility-Schritten | 以聚氨酯可行性步骤实现优化 2502.13112v2 |
Authors (2): Spencer Hutchinson, Mahnoosh Alizadeh
In this work, we study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$. Prior work on this problem has shown $O(\sqrt{T})$ regret and cumulative constraint satisfaction $\sum_{t=1}^{T} g(x_t) \leq 0$, while only accessing the constraint value and subgradient at the played actions $g(x_t), \partial g(x_t)$. Using the same constraint information, we show a stronger guarantee of anytime constraint satisfaction $g(x_t) \leq 0 \ \forall t \in [T]$, and matching $O(\sqrt{T})$ regret guarantees. These contributions are thanks to our approach of using Polyak feasibility steps to ensure constraint satisfaction, without sacrificing regret. Specifically, after each step of online gradient descent, our algorithm applies a subgradient descent step on the constraint function where the step-size is chosen according to the celebrated Polyak step-size. We further validate this approach with numerical experiments.
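The two-step structure described above, an online gradient descent step followed by a Polyak-sized subgradient step on the constraint, can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm. In particular, it evaluates $g$ at the intermediate point rather than only at played actions, and it carries no regret or feasibility guarantee in general. All names and the toy problem are assumptions.

```python
import numpy as np

def polyak_feasibility_step(y, g, subgrad_g):
    """Move y toward {g <= 0} using the Polyak step size max(g(y), 0) / ||s||^2."""
    val = g(y)
    if val <= 0.0:
        return y  # already feasible, so no correction is needed
    s = subgrad_g(y)
    return y - (val / np.dot(s, s)) * s

def ogd_with_polyak(grad_f, g, subgrad_g, x0, eta, T):
    """Online gradient descent where each step is followed by one feasibility step."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(T):
        y = x - eta * grad_f(x, t)                    # descent step on the round-t loss
        x = polyak_feasibility_step(y, g, subgrad_g)  # pull back toward feasibility
        iterates.append(x.copy())
    return np.array(iterates)

# Toy problem: losses f_t(x) = 0.5 * ||x - c||^2 and constraint g(x) = ||x|| - 1.
c = np.array([2.0, 0.0])
grad_f = lambda x, t: x - c
g = lambda x: np.linalg.norm(x) - 1.0
subgrad_g = lambda x: x / np.linalg.norm(x)  # only called where g(x) > 0, so x != 0
xs = ogd_with_polyak(grad_f, g, subgrad_g, x0=np.zeros(2), eta=0.1, T=200)
```

For this norm-ball constraint the Polyak step coincides with Euclidean projection onto the ball, so every iterate in `xs` satisfies $g(x_t) \leq 0$ up to rounding. For a general convex $g$, a single step lands on a halfspace that contains the feasible set, but the result need not be feasible.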
Article 579
Title@2025-07-15 (2): Commuting Distance Regularization for Timescale-Dependent Label Inconsistency in EEG Emotion Recognition
Title: Commuting Distance Regularization for Timescale-Dependent Label Inconsistency in EEG Emotion Recognition | Pendeldistanz-Regularisierung für zeitabhängige Label-Inkonsistenz bei der EEG-Emotionserkennung | EEG情感识别中时间尺度依赖性标签不一致的远程常规化迁移 2507.10895v1 |
Authors (4): Xiaocong Zeng, Craig Michoski, Yan Pang, Dongyang Kuang
In this work, we address the often-overlooked issue of Timescale Dependent Label Inconsistency (TsDLI) in training neural network models for EEG-based human emotion recognition. To mitigate TsDLI and enhance model generalization and explainability, we propose two novel regularization strategies: Local Variation Loss (LVL) and Local-Global Consistency Loss (LGCL). Both methods incorporate classical mathematical principles, specifically functions of bounded variation and commute-time distances, within a graph-theoretic framework. Complementing our regularizers, we introduce a suite of new evaluation metrics that better capture the alignment between temporally local predictions and their associated global emotion labels. We validate our approach through comprehensive experiments on two widely used EEG emotion datasets, DREAMER and DEAP, across a range of neural architectures including LSTM and transformer-based models. Performance is assessed using five distinct metrics encompassing both quantitative accuracy and qualitative consistency. Results consistently show that our proposed methods outperform state-of-the-art baselines, delivering superior aggregate performance and offering a principled trade-off between interpretability and predictive power under label inconsistency. Notably, LVL achieves the best aggregate rank across all benchmarked backbones and metrics, while LGCL frequently ranks second, highlighting the effectiveness of our framework.
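For readers unfamiliar with commute-time distances, the following Python sketch computes the classical quantity on a weighted undirected graph from the pseudoinverse of its Laplacian. It shows only the textbook definition. How the paper builds its graph and uses the distance inside LGCL is not stated in the abstract, and the function name is an assumption.

```python
import numpy as np

def commute_time_matrix(W):
    """Commute times C[i, j] = vol(G) * (L+[i, i] + L+[j, j] - 2 * L+[i, j]).

    W is a symmetric, nonnegative adjacency matrix of a connected graph,
    L+ is the Moore-Penrose pseudoinverse of the Laplacian L = D - W,
    and vol(G) is the sum of all node degrees.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    l_pinv = np.linalg.pinv(laplacian)
    diag = np.diag(l_pinv)
    return degrees.sum() * (diag[:, None] + diag[None, :] - 2.0 * l_pinv)

# Path graph 0 - 1 - 2 with unit edge weights.
W_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
C = commute_time_matrix(W_path)
```

On this path graph the two end nodes are twice as far apart in effective resistance as adjacent nodes, so `C[0, 2]` is twice `C[0, 1]`. Unlike shortest-path distance, commute time shrinks when more paths connect two nodes.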
nan
Article 580
Title@2025-07-15 (2): SurgeryLSTM: A Time-Aware Neural Model for Accurate and Explainable Length of Stay Prediction After Spine Surgery
Title: SurgeryLSTM: A Time-Aware Neural Model for Accurate and Explainable Length of Stay Prediction After Spine Surgery | SurgeryLSTM: Ein zeitbewusstes Neuralmodell für genaue und erklärbare Dauer der Vorhersage nach Spine Surgery | 手术LSTM: 脊柱外科后准确和可解释的停留时间预测时间长度的时器神经模型 2507.11570v1 |
Authors (5): Ha Na Cho, Sairam Sutari, Alexander Lopez, Hansen Bow, Kai Zheng
Objective: To develop and evaluate machine learning (ML) models for predicting length of stay (LOS) in elective spine surgery, with a focus on the benefits of temporal modeling and model interpretability. Materials and Methods: We compared traditional ML models (e.g., linear regression, random forest, support vector machine (SVM), and XGBoost) with our developed model, SurgeryLSTM, a masked bidirectional long short-term memory (BiLSTM) network with an attention mechanism, using structured perioperative electronic health records (EHR) data. Performance was evaluated using the coefficient of determination ($R^2$), and key predictors were identified using explainable AI. Results: SurgeryLSTM achieved the highest predictive accuracy ($R^2 = 0.86$), outperforming XGBoost ($R^2 = 0.85$) and baseline models. The attention mechanism improved interpretability by dynamically identifying influential temporal segments within preoperative clinical sequences, allowing clinicians to trace which events or features contributed most to each LOS prediction. Bone disorder, chronic kidney disease, and lumbar fusion were identified as the most impactful predictors of LOS. Discussion: Temporal modeling with attention mechanisms significantly improves LOS prediction by capturing the sequential nature of patient data. Unlike static models, SurgeryLSTM provides both higher accuracy and greater interpretability, which are critical for clinical adoption. These results highlight the potential of integrating attention-based temporal models into hospital planning workflows. Conclusion: SurgeryLSTM presents an effective and interpretable AI solution for LOS prediction in elective spine surgery. Our findings support the integration of temporal, explainable ML approaches into clinical decision support systems to enhance discharge readiness and individualized patient care.
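The coefficient of determination used as the evaluation metric above has a standard definition, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below computes it directly; the function name and the toy length-of-stay values are illustrative and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Toy example: observed vs. predicted length of stay in days (made-up numbers).
observed = [1.0, 2.0, 3.0]
always_mean = [2.0, 2.0, 2.0]
score = r_squared(observed, always_mean)
```

A model that always predicts the mean scores 0, and a perfect model scores 1, which puts the reported gap between 0.86 and 0.85 in context.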
nan
Article 581
Title@2025-07-15 (2): Modernizing CNN-based Weather Forecast Model towards Higher Computational Efficiency
Title: Modernizing CNN-based Weather Forecast Model towards Higher Computational Efficiency | Modernisierung des CNN-basierten Wettervorhersagemodells hin zu höherer rechnerischer Effizienz | 使基于CNN的天气预报模型现代化,实现更高的计算效率 2507.10893v1 |
Authors (5): Minjong Cheon, Eunhan Goo, Su-Hyeon Shin, Muhammad Ahmed, Hyungjun Kim
Recently, AI-based weather forecast models have achieved impressive advances. These models have reached accuracy levels comparable to traditional NWP systems, marking a significant milestone in data-driven weather prediction. However, they mostly leverage Transformer-based architectures, which often leads to high training complexity and resource demands due to the massive parameter sizes. In this study, we introduce a modernized CNN-based model for global weather forecasting that delivers competitive accuracy while significantly reducing computational requirements. To present a systematic modernization roadmap, we highlight key architectural enhancements across multiple design scales from an earlier CNN-based approach. The resulting model, KAI-a, incorporates a scale-invariant architecture and InceptionNeXt-based blocks within a geophysically-aware design, tailored to the structure of Earth system data. Trained on the ERA5 daily dataset with 67 atmospheric variables, the model contains about 7 million parameters and completes training in just 12 hours on a single NVIDIA L40s GPU. Our evaluation shows that KAI-a matches the performance of state-of-the-art models in medium-range weather forecasting, while offering a significantly more lightweight design. Furthermore, case studies on the 2018 European heatwave and the East Asian summer monsoon demonstrate KAI-a’s robust skill in capturing extreme events, reinforcing its practical utility.
nan
Article 582
Title@2025-07-15 (2): ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Title: ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning | ZebraLogic: Auf den Skalierungsgrenzen von LLMs für logische Vernunft | ZebraLogic:关于逻辑理由解释的LLMs限制限度 2502.01100v2 |
Authors (7): Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, Yejin Choi
We investigate the logical reasoning capabilities of large language models (LLMs) and their scalability in complex non-monotonic reasoning. To this end, we introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance on logic grid puzzles derived from constraint satisfaction problems (CSPs). ZebraLogic enables the generation of puzzles with controllable and quantifiable complexity, facilitating a systematic study of the scaling limits of models such as Llama, o1 models, and DeepSeek-R1. By encompassing a broad range of search space complexities and diverse logical constraints, ZebraLogic provides a structured environment to evaluate reasoning under increasing difficulty. Our results reveal a significant decline in accuracy as problem complexity grows – a phenomenon we term the curse of complexity. This limitation persists even with larger models and increased inference-time computation, suggesting inherent constraints in current LLM reasoning capabilities. Additionally, we explore strategies to enhance logical reasoning, including Best-of-N sampling, backtracking mechanisms, and self-verification prompts. Our findings offer critical insights into the scalability of LLM reasoning, highlight fundamental limitations, and outline potential directions for improvement.
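To give a feel for how quickly the search space of a logic grid puzzle grows, the sketch below counts candidate solutions under one common formulation: N houses and M attributes, where each attribute assigns its N values to the houses as a permutation, giving $(N!)^M$ assignments. This formulation and the function name are assumptions for illustration; the abstract does not spell out the complexity measure used.

```python
from math import factorial


def search_space_size(n_houses, n_attributes):
    """Number of candidate assignments when each attribute is a permutation over the houses."""
    if n_houses < 1 or n_attributes < 1:
        raise ValueError("need at least one house and one attribute")
    return factorial(n_houses) ** n_attributes


# Growth of the search space for a few puzzle sizes.
sizes = {(n, m): search_space_size(n, m) for n, m in [(2, 3), (5, 5), (5, 6)]}
```

Going from a 2x3 puzzle to a 5x6 puzzle takes the count from 8 to roughly 3 trillion, which is the kind of growth the "curse of complexity" refers to.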
nan
Article 583
Title@2025-07-15 (2): Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation
Title: Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation | Robuste semi-überwachte CT-Radiomics für Lungenkrebs-Prognose: Kosteneffizientes Lernen mit limitierten Etiketten und SHAP-Interpretation | 强力半半强化CT 肺癌预测的放射感应器:利用有限标签和SHAP解释进行成本-成本-效益高的学习 2507.08189v2 |
Authors (10): Mohammad R. Salmanpour, Amir Hossein Pouria, Sonia Falahati, Shahram Taeb, Somayeh Sadat Mehrnia, Mehdi Maghsudi, Ali Fathi Jouzdani, Mehrdad Oveisi, Ilker Hacihaliloglu, Arman Rahmim
Background: CT imaging is vital for lung cancer management, offering detailed visualization for AI-based prognosis. However, supervised learning (SL) models require large labeled datasets, limiting their real-world application in settings with scarce annotations. Methods: We analyzed CT scans from 977 patients across 12 datasets, extracting 1218 radiomics features using Laplacian of Gaussian and wavelet filters via PyRadiomics. Dimensionality reduction was applied with 56 feature selection and extraction algorithms, and 27 classifiers were benchmarked. A semi-supervised learning (SSL) framework with pseudo-labeling utilized 478 unlabeled and 499 labeled cases. Model sensitivity was tested in three scenarios: varying labeled data in SL, increasing unlabeled data in SSL, and scaling both from 10 percent to 100 percent. SHAP analysis was used to interpret predictions. Cross-validation and external testing in two cohorts were performed. Results: SSL outperformed SL, improving overall survival prediction by up to 17 percent. The top SSL model, Random Forest plus XGBoost classifier, achieved 0.90 accuracy in cross-validation and 0.88 externally. SHAP analysis revealed enhanced feature discriminability in both SSL and SL, especially for Class 1 (survival greater than 4 years). SSL showed strong performance with only 10 percent labeled data, with more stable results compared to SL and lower variance across external testing, highlighting SSL’s robustness and cost-effectiveness. Conclusion: We introduced a cost-effective, stable, and interpretable SSL framework for CT-based survival prediction in lung cancer, improving performance, generalizability, and clinical readiness by integrating SHAP explainability and leveraging unlabeled data.
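The pseudo-labeling idea behind the SSL framework can be sketched in a few lines. The example below is a deliberately tiny stand-in: it uses a nearest-centroid classifier on one-dimensional features and a distance margin as the confidence score, none of which comes from the study (which benchmarks 27 classifiers on radiomics features). All names and thresholds here are hypothetical.

```python
def fit_centroids(xs, ys):
    """Fit a nearest-centroid classifier: one mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids, x):
    """Return (predicted label, margin); requires at least two classes."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def pseudo_label(xs_lab, ys_lab, xs_unlab, min_margin, rounds=3):
    """Iteratively add confidently predicted unlabeled points to the training set."""
    xs, ys = list(xs_lab), list(ys_lab)
    pool = list(xs_unlab)
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)
        remaining = []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                xs.append(x)       # accept the pseudo-label
                ys.append(label)
            else:
                remaining.append(x)  # too ambiguous, keep unlabeled
        if len(remaining) == len(pool):
            break  # nothing new was labeled, stop early
        pool = remaining
    return fit_centroids(xs, ys), pool
```

Calling `pseudo_label([0.0, 10.0], [0, 1], [1.0, 9.0, 5.0], min_margin=2.0)` absorbs the two clear-cut points and leaves the ambiguous point at 5.0 unlabeled, which is the behavior a confidence threshold is meant to produce.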
nan
Article 584
Title@2025-07-15 (2): Outbound Modeling for Inventory Management
Title: Outbound Modeling for Inventory Management | Outbound-Modellierung für die Bestandsverwaltung | 库存管理外部示范 2507.10890v1 |
Authors (4): Riccardo Savorgnan, Udaya Ghai, Carson Eisenach, Dean Foster
We study the problem of forecasting the number of units fulfilled (or "drained") from each inventory warehouse to meet customer demand, along with the associated outbound shipping costs. The actual drain and shipping costs are determined by complex production systems that manage the planning and execution of customer order fulfillment, i.e., from where and how to ship a unit to be delivered to a customer. Accurately modeling these processes is critical for regional inventory planning, especially when using Reinforcement Learning (RL) to develop control policies. For the RL use case, a drain model is incorporated into a simulator to produce long rollouts, which we desire to be differentiable. While simulated calls to the internal software systems can be used to recover this transition, those systems are non-differentiable and too slow and costly to run within an RL training environment. Accordingly, we frame this as a probabilistic forecasting problem, modeling the joint distribution of outbound drain and shipping costs across all warehouses at each time period, conditioned on inventory positions and exogenous customer demand. To ensure robustness in an RL environment, the model must handle out-of-distribution scenarios that arise from off-policy trajectories. We propose a validation scheme that leverages production systems to evaluate the drain model on counterfactual inventory states induced by RL policies. Preliminary results demonstrate the model’s accuracy within the in-distribution setting.
nan
Article 585
Title@2025-07-15 (2): SA-GDA: Spectral Augmentation for Graph Domain Adaptation
Title: SA-GDA: Spectral Augmentation for Graph Domain Adaptation | SA-GDA: Spektrale Augmentation für Graph Domain Adaption | SA-GDA:图域适应的光谱增强 2408.09189v2 |
Authors (5): Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin
Graph neural networks (GNNs) have achieved impressive performance on graph-related tasks. However, most GNNs are studied primarily in the single-domain setting with supervised training, which requires abundant task-specific labels and is difficult to transfer to other domains. Few works focus on domain adaptation for graph node classification. They mainly focus on aligning the feature space of the source and target domains, without considering the feature alignment between different categories, which may lead to confused classification in the target domain. However, due to the scarcity of labels in the target domain, we cannot directly perform effective alignment of categories from different domains, which makes the problem more challenging. In this paper, we present Spectral Augmentation for Graph Domain Adaptation (SA-GDA) for graph node classification. First, we observe that nodes of the same category in different domains exhibit similar characteristics in the spectral domain, while different classes are quite different. Following this observation, we align the category feature space of different domains in the spectral domain instead of aligning the whole feature space, and we theoretically prove the stability of the proposed SA-GDA. Then, we develop a dual graph convolutional network that jointly exploits local and global consistency for feature aggregation. Last, we utilize a domain classifier with an adversarial learning submodule to facilitate knowledge transfer between different domain graphs. Experimental results on a variety of publicly available datasets reveal the effectiveness of SA-GDA.
nan
Article 586
Title@2025-07-15 (2): How to Protect Models against Adversarial Unlearning?
Title: How to Protect Models against Adversarial Unlearning? | Wie kann man Modelle gegen das Unlernen von Widersachern schützen? | 如何保护模型防止反向学习不学习? 2507.10886v1 |
Authors (3): Patryk Jasiorski, Marek Klonowski, Michał Woźniak
AI models need to unlearn data to fulfill the requirements of legal acts such as the AI Act or GDPR, and also to remove toxic content, reduce bias, eliminate the impact of malicious instances, or adapt to changes in the structure of the data distribution in which a model operates. Unfortunately, removing knowledge may cause undesirable side effects, such as a deterioration in model performance. In this paper, we investigate the problem of adversarial unlearning, where a malicious party intentionally sends unlearning requests to degrade the model’s performance as much as possible. We show that this phenomenon and the adversary’s capabilities depend on many factors, primarily on the backbone model itself and on the strategy and limitations in selecting data to be unlearned. The main result of this work is a new method of protecting model performance from these side effects, both when unlearning results from spontaneous processes and when it results from adversary actions.
nan
Article 587
Title@2025-07-15 (2): Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model
Title: Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model | Von unvollkommenen Daten lernen: Robuste Schlussfolgerung dynamischer Systeme mit simulationsbasiertem Generativem Modell | 从不完美数据中学习:使用模拟生成模型对动态系统进行有力的推论 2507.10884v1 |
Authors (3): Hyunwoo Cho, Hyeontae Jo, Hyung Ju Hwang
System inference for nonlinear dynamic models, represented by ordinary differential equations (ODEs), remains a significant challenge in many fields, particularly when the data are noisy, sparse, or partially observable. In this paper, we propose a Simulation-based Generative Model for Imperfect Data (SiGMoID) that enables precise and robust inference for dynamic systems. The proposed approach integrates two key methods: (1) physics-informed neural networks with hyper-networks that construct an ODE solver, and (2) Wasserstein generative adversarial networks that estimate ODE parameters by effectively capturing noisy data distributions. We demonstrate that SiGMoID quantifies data noise, estimates system parameters, and infers unobserved system components. Its effectiveness is validated through realistic experimental examples, showcasing its broad applicability in various domains, from scientific research to engineered systems, and enabling the discovery of full system dynamics.
nan
Article 588
Title@2025-07-15 (2): Domain-Adaptive Small Language Models for Structured Tax Code Prediction
Title: Domain-Adaptive Small Language Models for Structured Tax Code Prediction | Domain-Adaptive kleine Sprachmodelle für strukturierte Steuervorhersage | 结构化税法预测结构化税法 2507.10880v1 |
Authors (3): Souvik Nath, Sumit Wadhwa, Luiz Perez
Every day, multinational firms process thousands of transactions, each of which must adhere to tax regulations that vary by jurisdiction and are often nuanced. The determination of product and service tax codes, such as HSN or SAC, is a major use case in tax compliance. Accurate determination of such codes is imperative to avoid tax penalties. This paper proposes a domain-adaptive small language model (SLM) with an encoder-decoder architecture for the enhanced prediction of product and service tax codes. In this approach, we address the problem of predicting hierarchical tax code sequences using unstructured product and services data. We employ an SLM based on an encoder-decoder architecture, as this enables sequential generation of tax codes that captures the hierarchical dependencies present within them. Our experiments demonstrate that encoder-decoder SLMs can be successfully applied to the sequential prediction of structured tax codes, a domain that remains comparatively unexplored in current NLP research. In this paper, we demonstrate the superior performance of domain-adaptive encoder-decoder SLMs over flat classifiers when applied to the Harmonized System of Nomenclature (HSN), and achieve superior results compared to decoder-only and encoder-only architectures for structured sequence generation tasks. This approach can also be scaled to other government-mandated tax commodity codes, such as United Nations Standard Products and Services Codes (UNSPSC), or Brazil’s Nomenclatura Comum do Mercosul (NCM).
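To illustrate why a sequential decoder suits hierarchical codes, the sketch below splits a code into nested levels and into the token sequence a decoder would emit. It assumes an 8-digit code in which each successive pair of digits refines the previous level; that layout, the function names, and the placeholder code `12345678` are assumptions for illustration and are not taken from the paper.

```python
def hierarchy_levels(code, level_width=2):
    """Return the nested prefixes of a code, from coarsest to finest level."""
    if not code.isdigit() or len(code) % level_width != 0:
        raise ValueError("expected a digit string whose length is a multiple of level_width")
    return [code[:end] for end in range(level_width, len(code) + 1, level_width)]


def decoder_targets(code, level_width=2):
    """Return the per-level tokens a sequence model would generate one step at a time."""
    if not code.isdigit() or len(code) % level_width != 0:
        raise ValueError("expected a digit string whose length is a multiple of level_width")
    return [code[i:i + level_width] for i in range(0, len(code), level_width)]


levels = hierarchy_levels("12345678")   # nested prefixes, coarse to fine
tokens = decoder_targets("12345678")    # one token per hierarchy level
```

A flat classifier treats every full code as an unrelated class, whereas generating `tokens` left to right lets each step condition on the coarser levels already produced.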
nan
Article 589
Title@2025-07-15 (2): BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes
Title: BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes | BioScore: Eine grundlegende Scoring-Funktion für vielfältige biomolekulare Komplexe | 生物核心:多样性生物分子复合体的基础测量功能 2507.10877v1 |
Authors (12): Yuchen Zhu, Jihong Chen, Yitong Li, Xiaomin Fang, Xianbin Ye, Jingzhou He, Xujun Zhang, Jingxuan Ge, Chao Shen, Xiaonan Zhang, Tingjun Hou, Chang-Yu Hsieh
Structural assessment of biomolecular complexes is vital for translating molecular models into functional insights, shaping our understanding of biology and aiding drug discovery. However, current structure-based scoring functions often lack generalizability across diverse biomolecular systems. We present BioScore, a foundational scoring function that addresses key challenges – data sparsity, cross-system representation, and task compatibility – through a dual-scale geometric graph learning framework with tailored modules for structure assessment and affinity prediction. BioScore supports a wide range of tasks, including affinity prediction, conformation ranking, and structure-based virtual screening. Evaluated on 16 benchmarks spanning proteins, nucleic acids, small molecules, and carbohydrates, BioScore consistently outperforms or matches 70 traditional and deep learning methods. Our newly proposed PPI Benchmark further enables comprehensive evaluation of protein-protein complex scoring. BioScore demonstrates broad applicability: (1) pretraining on mixed-structure data boosts protein-protein affinity prediction by up to 40% and antigen-antibody binding correlation by over 90%; (2) cross-system generalizability enables zero- and few-shot prediction with up to 71% correlation gain; and (3) its unified representation captures chemically challenging systems such as cyclic peptides, improving affinity prediction by over 60%. BioScore establishes a robust and generalizable framework for structural assessment across complex biomolecular landscapes.
nan
Article 590
Title@2025-07-15 (2): The Odyssey of the Fittest: Can Agents Survive and Still Be Good?
Title: The Odyssey of the Fittest: Can Agents Survive and Still Be Good? | Die Odyssee der Fittest: Können Agenten überleben und immer noch gut sein? | 《适龄者的奥德赛:代理能生存和保持良好吗? 2502.05442v3 |
Authors (2): Dylan Waldner, Risto Miikkulainen
As AI models grow in power and generality, understanding how agents learn and make decisions in complex environments is critical to promoting ethical behavior. This study introduces the Odyssey, a lightweight, adaptive text-based adventure game, providing a scalable framework for exploring AI ethics and safety. The Odyssey examines the ethical implications of implementing biological drives, specifically self-preservation, in three different agents: a Bayesian agent optimized with NEAT, a Bayesian agent optimized with stochastic variational inference, and a GPT-4o agent. The agents select actions in each scenario to survive, adapting to increasingly challenging scenarios. Post-simulation analysis evaluates the ethical scores of the agents’ decisions, uncovering the tradeoffs they navigate to survive. Specifically, the analysis finds that when danger increases, agents’ ethical behavior becomes unpredictable. Surprisingly, the GPT-4o agent outperformed the Bayesian models in both survival and ethical consistency, challenging assumptions about traditional probabilistic methods and raising a new challenge to understand the mechanisms of LLMs’ probabilistic reasoning.
nan
Article 591
Title@2025-07-15 (2): GALDS: A Graph-Autoencoder-based Latent Dynamics Surrogate model to predict neurite material transport
Title: GALDS: A Graph-Autoencoder-based Latent Dynamics Surrogate model to predict neurite material transport | GALDS: Ein auf Graph-Autoencoder basierendes Latent Dynamics Surrogate-Modell zur Vorhersage des Neurit-Materialtransports | GALDS:一个基于图形自动电解码器的冷流动态探测模型,用于预测中程材料的迁移 2507.10871v1 |
Authors (2): Tsung Yeh Hsieh, Yongjie Jessica Zhang
Neurons exhibit intricate geometries within their neurite networks, which play a crucial role in processes such as signaling and nutrient transport. Accurate simulation of material transport in the networks is essential for understanding these biological phenomena but poses significant computational challenges because of the complex tree-like structures involved. Traditional approaches are time-intensive and resource-demanding, yet the inherent properties of neuron trees, which consist primarily of pipes with steady-state parabolic velocity profiles and bifurcations, provide opportunities for computational optimization. To address these challenges, we propose a Graph-Autoencoder-based Latent Dynamics Surrogate (GALDS) model, which is specifically designed to streamline the simulation of material transport in neural trees. GALDS employs a graph autoencoder to encode latent representations of the network’s geometry, velocity fields, and concentration profiles. These latent space representations are then assembled into a global graph, which is subsequently used to predict system dynamics in the latent space via a trained graph latent space system dynamic model, inspired by the Neural Ordinary Differential Equations (Neural ODEs) concept. The integration of an autoencoder allows for the use of smaller graph neural network models with reduced training data requirements. Furthermore, the Neural ODE component effectively mitigates the issue of error accumulation commonly encountered in recurrent neural networks. The effectiveness of the GALDS model is demonstrated through results on eight unseen geometries and four abnormal transport examples, where our approach achieves a mean relative error of 3% with a maximum relative error below 8% and demonstrates a 10-fold speed improvement compared to previous surrogate model approaches.
nan
Article 592
Title@2025-07-14 (1): PhysiX: A Foundation Model for Physics Simulations
Title: PhysiX: A Foundation Model for Physics Simulations | PhysiX: Ein Grundlagenmodell für Physiksimulationen | PhysiX:物理模拟基础模型 2506.17774v2 |
Authors (4): Tung Nguyen, Arsh Koneru, Shufan Li, Aditya Grover
Foundation models have achieved remarkable success across video, image, and language domains. By scaling up the number of parameters and training datasets, these models acquire generalizable world knowledge and often surpass task-specific approaches. However, such progress has yet to extend to the domain of physics simulation. A primary bottleneck is data scarcity: while millions of images, videos, and textual resources are readily available on the internet, the largest physics simulation datasets contain only tens of thousands of samples. This data limitation hinders the use of large models, as overfitting becomes a major concern. As a result, physics applications typically rely on small models, which struggle with long-range prediction due to limited context understanding. Additionally, unlike images, videos, or text, which typically exhibit fixed granularity, physics datasets often vary drastically in scale, amplifying the challenges of scaling up multitask training. We introduce PhysiX, the first large-scale foundation model for physics simulation. PhysiX is a 4.5B parameter autoregressive generative model. It uses a discrete tokenizer to encode physical processes at different scales into a sequence of discrete tokens, and employs an autoregressive next-token prediction objective to model such processes in the token space. To mitigate the rounding error in the discretization process, PhysiX incorporates a specialized refinement module. Through extensive experiments, we show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines under comparable settings as well as the previous absolute state-of-the-art approaches on The Well benchmark. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation, and that joint training across diverse simulation tasks enables synergistic learning.
nan
Article 593
Title@2025-07-14 (1): Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
Title: Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning | Training Dynamics zugrunde liegende Sprachmodellskalierungsgesetze: Verlustverschleierung und Null-Summe-Lernen | 培训动态基础语言示范缩写法:损失减速和零苏姆学习 2506.05447v2 |
Authors (7): Andrei Mircea, Supriyo Chakraborty, Nima Chitsazan, Milind Naphade, Sambit Sahu, Irina Rish, Ekaterina Lobacheva
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
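One simple way to put a number on "destructive interference in per-example changes in loss" is to compare the net change with the total absolute change. The metric below is an illustrative assumption chosen for this note, not necessarily the definition used in the paper or its repository: it is 0 when all per-example losses move in the same direction and 1 when gains and losses cancel exactly.

```python
def destructive_interference(loss_deltas):
    """Return 1 - |sum(d)| / sum(|d|) for per-example loss changes d.

    0.0 means every example's loss moved in the same direction;
    1.0 means improvements and degradations cancel out completely.
    """
    total_abs = sum(abs(d) for d in loss_deltas)
    if total_abs == 0.0:
        return 0.0  # no change at all, so nothing interferes
    return 1.0 - abs(sum(loss_deltas)) / total_abs


# Per-example loss changes after one hypothetical training step.
cooperative = destructive_interference([-1.0, -1.0])   # both examples improve
zero_sum = destructive_interference([-1.0, 1.0])       # one improves, one degrades equally
partial = destructive_interference([-3.0, 1.0])        # partial cancellation
```

In the zero-sum case the average loss does not move even though individual examples change substantially, which matches the bottleneck described in the abstract.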
nan
Article 594
Title@2025-07-14 (1): Visually grounded emotion regulation via diffusion models and user-driven reappraisal
Title: Visually grounded emotion regulation via diffusion models and user-driven reappraisal | Optisch geerdete Emotionsregulation über Diffusionsmodelle und benutzergesteuerte Neubewertung | 通过传播模型和用户驱动的重新评价,以视觉为基础的情感调控 2507.10861v1 |
Authors (3): Edoardo Pinzuti, Oliver Tüscher, André Ferreira Castro
Cognitive reappraisal is a key strategy in emotion regulation, involving reinterpretation of emotionally charged stimuli to alter affective responses. Despite its central role in clinical and cognitive science, real-world reappraisal interventions remain cognitively demanding, abstract, and primarily verbal. This reliance on higher-order cognitive and linguistic processes is problematic because these processes are often impaired in individuals with trauma or depression, limiting the effectiveness of standard approaches. Here, we propose a novel, visually based augmentation of cognitive reappraisal by integrating large-scale text-to-image diffusion models into the emotional regulation process. Specifically, we introduce a system in which users reinterpret emotionally negative images via spoken reappraisals, which are transformed into supportive, emotionally congruent visualizations using stable diffusion models with a fine-tuned IP-adapter. This generative transformation visually instantiates users’ reappraisals while maintaining structural similarity to the original stimuli, externalizing and reinforcing regulatory intent. To test this approach, we conducted a within-subject experiment (N = 20) using a modified cognitive emotion regulation (CER) task. Participants reappraised or described aversive images from the International Affective Picture System (IAPS), with or without AI-generated visual feedback. Results show that AI-assisted reappraisal significantly reduced negative affect compared to both non-AI and control conditions. Further analyses reveal that sentiment alignment between participant reappraisals and generated images correlates with affective relief, suggesting that multimodal coherence enhances regulatory efficacy. These findings demonstrate that generative visual input can support cognitive reappraisal and open new directions at the intersection of generative AI, affective computing, and therapeutic technology.
nan
Article 595
Title@2025-07-14 (1): Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding
Title: Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Embedding | RoPE neu denken: Ein mathematischer Blueprint für N-dimensionales Positional Embedding | 重新思考ROPE: N维定位嵌入的数学蓝图 2504.06308v2 |
Authors (6): Haiping Liu, Lijing Lin, Jingyuan Sun, Zhegong Shangguan, Mauricio A. Alvarez, Hongpeng Zhou
Rotary Position Embedding (RoPE) is widely adopted in large language models (LLMs) due to its efficient encoding of relative positions with strong extrapolation capabilities. However, while its application in higher-dimensional input domains, such as 2D images, has been explored in several attempts, a unified theoretical framework is still lacking. To address this, we propose a systematic mathematical framework for RoPE grounded in Lie group and Lie algebra theory. We derive the necessary and sufficient conditions for any valid $N$-dimensional RoPE based on two core properties of RoPE: relativity and reversibility. We demonstrate that RoPE can be characterized as a basis of a maximal abelian subalgebra (MASA) in the special orthogonal Lie algebra, and that the commonly used axis-aligned block-diagonal RoPE, where each input axis is encoded by an independent 2x2 rotation block, corresponds to the maximal toral subalgebra. Furthermore, we reduce spatial inter-dimensional interactions to a change of basis, resolved by learning an orthogonal transformation. Our experimental results suggest that inter-dimensional interactions should be balanced with local structure preservation. Overall, our framework unifies and explains existing RoPE designs while enabling principled extensions to higher-dimensional modalities and tasks.
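The axis-aligned block-diagonal construction and the relativity property can be checked numerically. The sketch below is a minimal 2D case with one 2x2 rotation block per input axis and a single shared frequency; the function names, the 4-dimensional feature vectors, and the frequency value are illustrative assumptions rather than the paper's setup.

```python
import math


def rotate_pair(v, angle):
    """Apply a 2x2 rotation by `angle` to the pair v = (v0, v1)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def rope_2d(vec, pos, freq=1.0):
    """Axis-aligned block-diagonal RoPE for a 4-dim vector at 2D position pos = (x, y).

    The first feature pair is rotated by freq * x, the second by freq * y.
    """
    first = rotate_pair(vec[0:2], freq * pos[0])
    second = rotate_pair(vec[2:4], freq * pos[1])
    return first + second


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


q = (0.3, -1.2, 0.7, 2.0)
k = (1.5, 0.4, -0.6, 0.9)

# Relativity: the score depends only on the offset between positions, here (2, 3).
score_shifted = dot(rope_2d(q, (1.0, 2.0)), rope_2d(k, (3.0, 5.0)))
score_origin = dot(rope_2d(q, (0.0, 0.0)), rope_2d(k, (2.0, 3.0)))
```

The two scores agree up to floating-point error because the rotation blocks commute and compose additively, which is the property the Lie-algebraic framework generalizes.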
nan
Article 596
Title@2025-07-14 (1): PhreshPhish: A Real-World, High-Quality, Large-Scale Phishing Website Dataset and Benchmark
Title: PhreshPhish: A Real-World, High-Quality, Large-Scale Phishing Website Dataset and Benchmark | PhreshPhish: Ein echter, hochwertiger, großformatiger Phishing-Website-Datensatz und Benchmark | PhreshPhish:一个现实世界、高质量、大规模搜索网站数据集和基准 2507.10854v1 |
Authors (8): Thomas Dalton, Hemanth Gowda, Girish Rao, Sachin Pargi, Alireza Hadj Khodabakhshi, Joseph Rombs, Stephan Jou, Manish Marwah
Phishing remains a pervasive and growing threat, inflicting heavy economic and reputational damage. While machine learning has been effective in real-time detection of phishing attacks, progress is hindered by a lack of large, high-quality datasets and benchmarks. In addition to poor quality due to challenges in data collection, existing datasets suffer from leakage and unrealistic base rates, leading to overly optimistic performance results. In this paper, we introduce PhreshPhish, a large-scale, high-quality dataset of phishing websites that addresses these limitations. Compared to existing public datasets, PhreshPhish is substantially larger and provides significantly higher quality, as measured by the estimated rate of invalid or mislabeled data points. Additionally, we propose a comprehensive suite of benchmark datasets specifically designed for realistic model evaluation by minimizing leakage, increasing task difficulty, enhancing dataset diversity, and adjusting base rates to levels more likely to be seen in the real world. We train and evaluate multiple solution approaches to provide baseline performance on the benchmark sets. We believe the availability of this dataset and benchmarks will enable realistic, standardized model comparison and foster further advances in phishing detection. The datasets and benchmarks are available on Hugging Face (https://huggingface.co/datasets/phreshphish/phreshphish).
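Why unrealistic base rates inflate reported performance follows directly from Bayes' rule: for a fixed true-positive and false-positive rate, precision falls as phishing pages become rarer. The sketch below makes that concrete; the detector rates and base rates are made-up numbers for illustration, not results from the benchmark.

```python
def precision_at_base_rate(tpr, fpr, base_rate):
    """Precision of a detector with the given TPR and FPR when positives occur at base_rate."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    true_pos = tpr * base_rate
    false_pos = fpr * (1.0 - base_rate)
    if true_pos + false_pos == 0.0:
        raise ValueError("detector never fires, precision is undefined")
    return true_pos / (true_pos + false_pos)


# Same hypothetical detector, evaluated on a balanced set and on a 1%-phishing set.
balanced = precision_at_base_rate(tpr=0.9, fpr=0.1, base_rate=0.5)
realistic = precision_at_base_rate(tpr=0.9, fpr=0.1, base_rate=0.01)
```

A detector that looks 90% precise on a balanced test set is right only about one time in twelve when 1% of pages are phishing, which is why benchmark base rates matter.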
nan
Article 597
Title@2025-07-14 (1): Prediction via Shapley Value Regression
Title: Prediction via Shapley Value Regression | Vorhersage durch Shapley-Wert-Regression | 通过阴影值回归预测 2505.04775v2 |
Authors (4): Amr Alkhatib, Roman Bresson, Henrik Boström, Michalis Vazirgiannis
Shapley values have several desirable, theoretically well-supported properties for explaining black-box model predictions. Traditionally, Shapley values are computed post-hoc, leading to additional computational cost at inference time. To overcome this, a novel method called ViaSHAP is proposed that learns a function to compute Shapley values, from which the predictions can be derived directly by summation. Two approaches to implementing the proposed method are explored: one based on the universal approximation theorem and the other on the Kolmogorov-Arnold representation theorem. Results from a large-scale empirical investigation are presented, showing that ViaSHAP using Kolmogorov-Arnold Networks performs on par with state-of-the-art algorithms for tabular data. It is also shown that the explanations of ViaSHAP are significantly more accurate than the popular approximator FastSHAP on both tabular data and images.
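The claim that predictions can be recovered by summing Shapley values rests on the efficiency property. The sketch below is not the ViaSHAP training procedure; it computes exact Shapley values for a hypothetical three-feature toy model (`model`, `baseline`, and all names are assumptions for illustration) and checks that the baseline output plus the sum of attributions reproduces the prediction.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Hypothetical toy model with an interaction term."""
    return 2.0 * x[0] + x[1] * x[2] - 0.5 * x[2]

def value(subset, x, baseline):
    """Model output when only the features in `subset` take their real values."""
    masked = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(masked)

def shapley_values(x, baseline):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = value(s | {i}, x, baseline) - value(s, x, baseline)
                phi[i] += weight * gain
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(x, baseline)

# Efficiency: baseline output plus the attributions equals the prediction.
prediction_from_sum = model(baseline) + sum(phi)
```

A learned attribution function that satisfies this property can serve as the predictor itself, which is the idea the abstract describes.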
nan
Article 598
Title@2025-07-14 (1): FairTargetSim: An Interactive Simulator for Understanding and Explaining the Fairness Effects of Target Variable Definition
Title: FairTargetSim: An Interactive Simulator for Understanding and Explaining the Fairness Effects of Target Variable Definition | FairTargetSim: Ein interaktiver Simulator zum Verstehen und Erklären der Fairness-Effekte von Target Variable Definition | FairtargetSim: 理解和解释目标变量定义的公平影响的交互式模拟器 2403.06031v2 |
Authors (7): Dalia Gala, Milo Phillips-Brown, Naman Goel, Carinal Prunkl, Laura Alvarez Jubete, Medb Corcoran, Ray Eitel-Porter
Machine learning requires defining one’s target variable for predictions or decisions, a process that can have profound implications for fairness, since biases are often encoded in the target variable definition itself, before any data collection or training. The downstream impacts of target variable definition must be taken into account in order to responsibly develop, deploy, and use algorithmic systems. We propose FairTargetSim (FTS), an interactive and simulation-based approach for this. We demonstrate FTS using the example of algorithmic hiring, grounded in real-world data and user-defined target variables. FTS is open-source; it can be used by algorithm developers, non-technical stakeholders, researchers, and educators in a number of ways. FTS is available at: http://tinyurl.com/ftsinterface. The video accompanying this paper is here: http://tinyurl.com/ijcaifts.
nan
Article 599
Title@2025-07-14 (1): HEIMDALL: a grapH-based sEIsMic Detector And Locator for microseismicity
Title: HEIMDALL: a grapH-based sEIsMic Detector And Locator for microseismicity | HEIMDALL: ein grapH-basierter SEIsMic-Detektor und Ortung für Mikroseismizität | HEMDALL: 一种基于 grapH 的微型地震探测器和定位器 2507.10850v1 |
Authors (3): Matteo Bagagli, Francesco Grigoli, Davide Bacciu
In this work, we present a new deep-learning model for microseismicity monitoring that utilizes continuous spatiotemporal relationships between seismic station recordings, forming an end-to-end pipeline for seismic catalog creation. It employs graph theory and state-of-the-art graph neural network architectures to perform phase picking, association, and event location simultaneously over rolling windows, making it suitable for both playback and near-real-time monitoring. As part of the global strategy to reduce carbon emissions within the broader context of a green-energy transition, there has been growing interest in exploiting enhanced geothermal systems. Tested in the complex geothermal area of Iceland’s Hengill region using open-access data from a temporary experiment, our model was trained and validated using both manually revised and automatic seismic catalogs. Results showed a significant increase in event detection compared to previously published automatic systems and reference catalogs, including a $4 M_w$ seismic sequence in December 2018 and a single-day sequence in February 2019. Our method reduces false events, minimizes manual oversight, and decreases the need for extensive tuning of pipelines or transfer learning of deep-learning models. Overall, it validates a robust monitoring tool for geothermal seismic regions, complementing existing systems and enhancing operational risk mitigation during geothermal energy exploitation.
nan
Article 600
Title@2025-07-14 (1): Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization
Title: Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization | Winsor-CAM: Human-Tunable Visuelle Erklärungen aus Deep Networks über Layer-Wise Winsorization | Winsor-CAM:通过图层-Wise Winsorization从深网络获得的人类可视解释 2507.10846v1 |
Authors (4): Casey Wall, Longwei Wang, Rodrigue Rizk, KC Santosh
Interpreting the decision-making process of Convolutional Neural Networks (CNNs) is critical for deploying models in high-stakes domains. Gradient-weighted Class Activation Mapping (Grad-CAM) is a widely used method for visual explanations, yet it typically focuses on the final convolutional layer or naïvely averages across layers, strategies that can obscure important semantic cues or amplify irrelevant noise. We propose Winsor-CAM, a novel, human-tunable extension of Grad-CAM that generates robust and coherent saliency maps by aggregating information across all convolutional layers. To mitigate the influence of noisy or extreme attribution values, Winsor-CAM applies Winsorization, a percentile-based outlier attenuation technique. A user-controllable threshold allows for semantic-level tuning, enabling flexible exploration of model behavior across representational hierarchies. Evaluations on standard architectures (ResNet50, DenseNet121, VGG16, InceptionV3) using the PASCAL VOC 2012 dataset demonstrate that Winsor-CAM produces more interpretable heatmaps and achieves superior performance in localization metrics, including intersection-over-union and center-of-mass alignment, when compared to Grad-CAM and uniform layer-averaging baselines. Winsor-CAM advances the goal of trustworthy AI by offering interpretable, multi-layer insights with human-in-the-loop control.
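Winsorization itself is simple to state in code. The following is a generic sketch of two-sided percentile clipping on a list of made-up attribution scores; the nearest-rank percentile rule, the 10/90 thresholds, and the function names are assumptions for illustration, not the exact procedure used by Winsor-CAM.

```python
import math

def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted list, pct in (0, 100]."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]

def winsorize(values, lower_pct=10, upper_pct=90):
    """Clip values below/above the given percentiles to those percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]

# Hypothetical importance scores with one extreme outlier.
scores = [0.02, 0.05, 0.04, 0.03, 0.06, 0.05, 0.04, 0.03, 0.05, 9.0]
clipped = winsorize(scores)
```

Unlike trimming, the outlier is kept but pulled back to the upper threshold, so every entry still contributes to a subsequent weighted aggregation. Lowering `upper_pct` plays the role of a user-controllable threshold in this sketch.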
nan
Article 601
Title@2025-07-14 (1): Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
Title: Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps | Offline-Verstärkung Lernen mit Wasserstein Regularisierung über optimale Transportkarten | 通过最佳运输地图与瓦塞斯坦通过最佳运输地图实现正规化 2507.10843v1 |
Authors (5): Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada
Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional shift, where the learned policy deviates from the dataset distribution, potentially leading to unreliable out-of-distribution actions. To mitigate this issue, regularization techniques have been employed. While many existing methods utilize density ratio-based measures, such as the $f$-divergence, for regularization, we propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data and captures the similarity between actions. Our method employs input-convex neural networks (ICNNs) to model optimal transport maps, enabling the computation of the Wasserstein distance in a discriminator-free manner, thereby avoiding adversarial training and ensuring stable learning. Our approach demonstrates comparable or superior performance to widely used existing methods on the D4RL benchmark dataset. The code is available at https://github.com/motokiomura/Q-DOT .
nan
Article 602
Title@2025-07-14 (1): EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
Title: EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices | EntroLLM: Entropie-kodierte Gewichtskompression für effiziente großsprachliche Modellableitung auf Edge-Geräten | EntroLLM: Entropy Encod Weight 压缩,以高效大语言模型推导边缘设备 2505.02380v3 |
Authors (5): Arnab Sanyal, Gourav Datta, Prithwish Mukherjee, Sandeep P. Chinchali, Michael Orshansky
Large Language Models (LLMs) demonstrate exceptional performance across various tasks, but their large storage and computational requirements constrain their deployment on edge devices. To address this, we propose EntroLLM, a novel compression framework that integrates mixed quantization with entropy coding to reduce storage overhead while maintaining model accuracy. Our method applies a layer-wise mixed quantization scheme - choosing between symmetric and asymmetric quantization based on individual layer weight distributions - to optimize compressibility. We then employ Huffman encoding for lossless compression of the quantized weights, significantly reducing memory bandwidth requirements. Furthermore, we introduce parallel Huffman decoding, which enables efficient retrieval of encoded weights during inference, ensuring minimal latency impact. Our experiments on edge-compatible LLMs, including smolLM-1.7B-Instruct, phi3-mini-4k-Instruct, and mistral-7B-Instruct, demonstrate that EntroLLM achieves up to $30\%$ storage reduction compared to uint8 models and up to $65\%$ storage reduction compared to uint4 models, while preserving perplexity and accuracy on language benchmark tasks. We further show that our method enables $31.9\%$ to $146.6\%$ faster inference throughput on memory-bandwidth-limited edge devices, such as NVIDIA Jetson P3450, by reducing the required data movement. The proposed approach requires no additional re-training and is fully compatible with existing post-training quantization methods, making it a practical solution for edge LLMs.
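To make the storage argument concrete, here is a minimal sketch of Huffman coding applied to a hypothetical list of uint8-quantized weights clustered around a zero point of 128. The weight values and function names are invented for illustration, and the sequential decoder shown here is not the parallel decoding scheme the paper describes.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return {symbol: bitstring} for the given sequence of symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Sequential prefix-code decoding."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized weights: most values sit at or near the zero point.
weights = [128] * 50 + [127] * 20 + [129] * 20 + [126] * 5 + [130] * 5
code = build_huffman_code(weights)
bits = encode(weights, code)
restored = decode(bits, code)

compressed_bits = len(bits)      # total Huffman-coded length in bits
raw_bits = 8 * len(weights)      # fixed-width uint8 storage
```

Because the coding is lossless, `restored` equals `weights`; the saving comes purely from the skewed value distribution, which is why the choice of quantization scheme per layer affects compressibility.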
nan
Article 603
Title@2025-07-14 (1): Geometric Learning Dynamics
Title: Geometric Learning Dynamics | Geometrische Lerndynamik | 几何学习动态 2504.14728v2 |
Authors (1): Vitaly Vanchurin
We present a unified geometric framework for modeling learning dynamics in physical, biological, and machine learning systems. The theory reveals three fundamental regimes, each emerging from the power-law relationship $g \propto \kappa^\alpha$ between the metric tensor $g$ in the space of trainable variables and the noise covariance matrix $\kappa$. The quantum regime corresponds to $\alpha = 1$ and describes Schrödinger-like dynamics that emerge from a discrete shift symmetry. The efficient learning regime corresponds to $\alpha = \tfrac{1}{2}$ and describes very fast machine learning algorithms. The equilibration regime corresponds to $\alpha = 0$ and describes classical models of biological evolution. We argue that the emergence of the intermediate regime $\alpha = \tfrac{1}{2}$ is a key mechanism underlying the emergence of biological complexity.
nan
Article 604
Title@2025-07-14 (1): Entity-Specific Cyber Risk Assessment using InsurTech Empowered Risk Factors
Title: Entity-Specific Cyber Risk Assessment using InsurTech Empowered Risk Factors | Cyber-Risikobewertung von Unternehmen mit InsurTech Empowered Risk Factors | 利用科学、技术、赋权风险因素进行具体实体具体网络风险评估 2507.08193v2 |
Authors (3): Jiayi Guo, Zhiyu Quan, Linfeng Zhang
The lack of high-quality public cyber incident data limits empirical research and predictive modeling for cyber risk assessment. This challenge persists due to the reluctance of companies to disclose incidents that could damage their reputation or investor confidence. Therefore, from an actuarial perspective, potential resolutions include two aspects: the enhancement of existing cyber incident datasets and the implementation of advanced modeling techniques to optimize the use of the available data. A review of existing data-driven methods highlights a significant lack of entity-specific organizational features in publicly available datasets. To address this gap, we propose a novel InsurTech framework that enriches cyber incident data with entity-specific attributes. We develop various machine learning (ML) models: a multilabel classification model to predict the occurrence of cyber incident types (e.g., Privacy Violation, Data Breach, Fraud and Extortion, IT Error, and Others) and a multioutput regression model to estimate their annual frequencies. Although classifier and regressor chains are also implemented to explore dependencies among cyber incident types, no significant correlations are observed in our datasets. In addition, we apply multiple interpretable ML techniques to identify and cross-validate potential risk factors developed by InsurTech across ML models. We find that InsurTech-empowered features enhance the robustness of occurrence prediction and frequency estimation compared to using only conventional risk factors. The framework generates transparent, entity-specific cyber risk profiles, supporting customized underwriting and proactive cyber risk mitigation. It provides insurers and organizations with data-driven insights to support decision-making and compliance planning.
nan
Article 605
Title@2025-07-14 (1): Functional Neural Wavefunction Optimization
Title: Functional Neural Wavefunction Optimization | Funktionelle Neuralwellenfunktionsoptimierung | 功能神经波函数优化 2507.10835v1 |
Authors (7): Victor Armegioiu, Juan Carrasquilla, Siddhartha Mishra, Johannes Müller, Jannes Nys, Marius Zeinhofer, Hang Zhang
We propose a framework for the design and analysis of optimization algorithms in variational quantum Monte Carlo, drawing on geometric insights into the corresponding function space. The framework translates infinite-dimensional optimization dynamics into tractable parameter-space algorithms through a Galerkin projection onto the tangent space of the variational ansatz. This perspective unifies existing methods such as stochastic reconfiguration and Rayleigh-Gauss-Newton, provides connections to classic function-space algorithms, and motivates the derivation of novel algorithms with geometrically principled hyperparameter choices. We validate our framework with numerical experiments demonstrating its practical relevance through the accurate estimation of ground-state energies for several prototypical models in condensed matter physics modeled with neural network wavefunctions.
nan
Article 606
Title@2025-07-14 (1): From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems
Title: From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems | Von klein zu groß: Ein Graph Convolutional Network Ansatz zur Lösung von Sortieroptimierungsproblemen | 从小到大:解决各类优化问题图集网络方法 2507.10834v1 |
Authors (4): Guokai Li, Pin Gao, Stefanus Jasin, Zizhuo Wang
Assortment optimization involves selecting a subset of substitutable products (subject to certain constraints) to maximize the expected revenue. It is a classic problem in revenue management and finds applications across various industries. However, the problem is usually NP-hard due to its combinatorial and non-linear nature. In this work, we explore how graph convolutional networks (GCNs) can be leveraged to efficiently solve constrained assortment optimization under the mixed multinomial logit choice model. We first develop a graph representation of the assortment problem, then train a GCN to learn the patterns of optimal assortments, and lastly propose two inference policies based on the GCN’s output. Due to the GCN’s inherent ability to generalize across inputs of varying sizes, we can use a GCN trained on small-scale instances to help solve large-scale instances. Extensive numerical experiments demonstrate that given a GCN trained on small-scale instances (e.g., with 20 products), the proposed policies can achieve superior performance (90%+ optimality) on large-scale instances (with up to 2,000 products) within seconds, outperforming existing heuristic policies in both performance and efficiency. Furthermore, we extend our framework to a model-free setting where the underlying choice model is unknown but transaction data is available. We also conduct numerical experiments to demonstrate the effectiveness and efficiency of our proposed policies in this setting.
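For readers unfamiliar with the objective, the sketch below evaluates expected revenue under a mixed multinomial logit model and finds the best cardinality-constrained assortment by brute force. The prices, segment weights, and preference weights are invented for illustration, and exhaustive search is used only to show why the problem becomes expensive as the catalog grows; it is not the GCN-based policy from the paper.

```python
from itertools import combinations

def mmnl_revenue(assortment, prices, segments):
    """Expected revenue of `assortment` under a mixed MNL model.

    `segments` is a list of (segment_weight, preference_weights) pairs;
    the constant 1.0 in the denominator is the no-purchase option.
    """
    total = 0.0
    for weight, v in segments:
        denom = 1.0 + sum(v[i] for i in assortment)
        total += weight * sum(prices[i] * v[i] for i in assortment) / denom
    return total

def best_assortment(prices, segments, max_size):
    """Exhaustive search over all assortments with at most `max_size` items."""
    n = len(prices)
    candidates = (
        subset
        for size in range(1, max_size + 1)
        for subset in combinations(range(n), size)
    )
    return max(candidates, key=lambda s: mmnl_revenue(s, prices, segments))

# Hypothetical instance: 4 products, 2 customer segments, at most 2 items.
prices = [10.0, 8.0, 6.0, 4.0]
segments = [
    (0.6, [0.5, 1.0, 1.5, 2.0]),
    (0.4, [2.0, 1.0, 0.5, 0.2]),
]
best = best_assortment(prices, segments, max_size=2)
best_revenue = mmnl_revenue(best, prices, segments)

# Single-product, single-segment sanity check: 10 * 1 / (1 + 1).
single_check = mmnl_revenue((0,), [10.0], [(1.0, [1.0])])
```

The number of candidate subsets grows combinatorially with the number of products, which motivates learned policies that generalize from small instances.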
nan
Article 607
Title@2025-07-14 (1): FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Title: FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE | FLAME: Auf dem Weg zu Federated Fine-Tuning großen Sprachmodellen durch adaptive SMoE | FLAME:通过适应性SMOE,走向联邦微调大语言模式 2506.16600v2 |
Authors (4): Khiem Le, Tuan Tran, Ting Hua, Nitesh V. Chawla
Existing resource-adaptive LoRA federated fine-tuning methods enable clients to fine-tune models using compressed versions of global LoRA matrices, in order to accommodate various compute resources across clients. This compression requirement leads to suboptimal performance due to information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces unique challenges, specifically the mismatch in output magnitude from partial expert activation and the imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme. Empirical results across diverse computational settings demonstrate that FLAME consistently outperforms existing methods, providing a robust and effective solution for resource-adaptive federated learning.
nan
Article 608
Title@2025-07-14 (1): Semantic Context for Tool Orchestration
Title: Semantic Context for Tool Orchestration | Semantischer Kontext für Werkzeug-Orchestrierung | 工具管弦化的语义背景 2507.10820v1 |
Authors (1): Robert Müller
This paper demonstrates that Semantic Context (SC), leveraging descriptive tool information, is a foundational component for robust tool orchestration. Our contributions are threefold. First, we provide a theoretical foundation using contextual bandits, introducing SC-LinUCB and proving it achieves lower regret and adapts favourably in dynamic action spaces. Second, we provide parallel empirical validation with Large Language Models, showing that SC is critical for successful in-context learning in both static (efficient learning) and non-stationary (robust adaptation) settings. Third, we propose the FiReAct pipeline, and demonstrate on a benchmark with over 10,000 tools that SC-based retrieval enables an LLM to effectively orchestrate over a large action space. These findings provide a comprehensive guide to building more sample-efficient, adaptive, and scalable orchestration agents.
nan
Article 609
Title@2025-07-14 (1): Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood
Title: Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood | Prognose von intermittierenden Zeitreihen mit Gaußschen Prozessen und Tweedie-Wahrscheinlichkeit | 利用高斯进程和特韦迪可能性预测间歇时间序列 2502.19086v4 |
Authors (3): Stefano Damato, Dario Azzimonti, Giorgio Corani
We adopt Gaussian Processes (GPs) as latent functions for probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertainty about the latent function. We couple the latent GP variable with two types of forecast distributions: the negative binomial (NegBinGP) and the Tweedie distribution (TweedieGP). While the negative binomial has already been used in forecasting intermittent time series, this is the first time in which a fully parameterized Tweedie density is used for intermittent time series. We properly evaluate the Tweedie density, which has both a point mass at zero and heavy tails, avoiding simplifying assumptions made in existing models. We test our models on thousands of intermittent count time series. Results show that our models provide consistently better probabilistic forecasts than the competitors. In particular, TweedieGP obtains the best estimates of the highest quantiles, thus showing that it is more flexible than NegBinGP.
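The point mass at zero can be made explicit through the standard compound Poisson-gamma representation of the Tweedie distribution, which holds for a power parameter strictly between 1 and 2. The sketch below uses that textbook parameterization with made-up values for the mean `mu`, dispersion `phi`, and power `p`; it is an illustration of the distribution family, not the likelihood evaluation used in TweedieGP.

```python
import math

def tweedie_compound_params(mu, phi, p):
    """Map Tweedie (mean, dispersion, power) to compound Poisson-gamma terms.

    Valid for 1 < p < 2: the outcome is a sum of N gamma variables with
    N ~ Poisson(lam), each gamma having the returned shape and scale.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("compound Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, shape, scale

# Hypothetical parameters for a low-demand, intermittent series.
mu, phi, p = 0.5, 2.0, 1.5
lam, shape, scale = tweedie_compound_params(mu, phi, p)

prob_zero = math.exp(-lam)                  # P(Y = 0): no Poisson events
mean_check = lam * shape * scale            # should recover mu
var_check = lam * shape * scale ** 2 * (1.0 + shape)  # should equal phi * mu**p
```

With these values roughly half of the probability mass sits exactly at zero, which is the behaviour that makes the family a natural fit for intermittent demand.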
nan
Article 610
Title@2025-07-14 (1): Quantum Transfer Learning to Boost Dementia Detection
Title: Quantum Transfer Learning to Boost Dementia Detection | Quantentransfer Lernen, um Demenzerkennung zu fördern | 学习加速痴呆症检测的量子传输 2507.12485v1 |
Authors (3): Sounak Bhowmik, Talita Perciano, Himanshu Thapliyal
Dementia is a devastating condition with profound implications for individuals, families, and healthcare systems. Early and accurate detection of dementia is critical for timely intervention and improved patient outcomes. While classical machine learning and deep learning approaches have been explored extensively for dementia prediction, these solutions often struggle with high-dimensional biomedical data and large-scale datasets, quickly reaching computational and performance limitations. To address this challenge, quantum machine learning (QML) has emerged as a promising paradigm, offering faster training and advanced pattern recognition capabilities. This work aims to demonstrate the potential of quantum transfer learning (QTL) to enhance the performance of a weak classical deep learning model applied to a binary classification task for dementia detection. In addition, we show the effect of noise on the QTL-based approach, investigating the reliability and robustness of this method. Using the OASIS 2 dataset, we show how quantum techniques can transform a suboptimal classical model into a more effective solution for biomedical image classification, highlighting their potential impact on advancing healthcare technology.
nan
Article 611
Title@2025-07-14 (1): Uncovering Causal Relation Shifts in Event Sequences under Out-of-Domain Interventions
Title: Uncovering Causal Relation Shifts in Event Sequences under Out-of-Domain Interventions | Entdecken von Kausalrelation-Shifts in Ereignissequenzen unter Out-of-Domain-Interventionen | 场外干预下事件序列中未覆盖的因果关系变化变化 2507.10809v1 |
Authors (6): Kazi Tasnim Zinat, Yun Zhou, Xiang Lyu, Yawei Wang, Zhicheng Liu, Panpan Xu
Inferring causal relationships between event pairs in a temporal sequence is applicable in many domains such as healthcare, manufacturing, and transportation. Most existing work on causal inference focuses on event types within the designated domain, without considering the impact of exogenous out-of-domain interventions. In real-world settings, these out-of-domain interventions can significantly alter causal dynamics. To address this gap, we propose a new causal framework to define the average treatment effect (ATE), beyond independent and identically distributed (i.i.d.) data in Rubin’s classic causal framework, to capture the causal relation shift between events of a temporal process under out-of-domain intervention. We design an unbiased ATE estimator, and devise a Transformer-based neural network model to handle both long-range temporal dependencies and local patterns while integrating out-of-domain intervention information into process modeling. Extensive experiments on both simulated and real-world datasets demonstrate that our method outperforms baselines in ATE estimation and goodness-of-fit under out-of-domain-augmented point processes.
nan
Article 612
Title@2025-07-14 (1): Multi-Armed Sampling Problem and the End of Exploration
Title: Multi-Armed Sampling Problem and the End of Exploration | Multi-Armed Sampling Problem und das Ende der Exploration | 多军备抽样问题和探索的结束 2507.10797v1 |
Authors (2): Mohammad Pedramfar, Siamak Ravanbakhsh
This paper introduces the framework of multi-armed sampling as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off in the context of sampling. We systematically define plausible notions of regret for this framework and establish corresponding lower bounds. We then propose a simple algorithm that achieves these optimal regret bounds. Our theoretical results demonstrate that, in contrast to optimization, sampling does not require exploration. To further connect our findings with those of multi-armed bandits, we define a continuous family of problems and associated regret measures that smoothly interpolates and unifies multi-armed sampling and multi-armed bandit problems using a temperature parameter. We believe the multi-armed sampling framework and our findings in this setting can have a foundational role in the study of sampling, including recent neural samplers, akin to the role of multi-armed bandits in reinforcement learning. In particular, our work sheds light on the need for exploration and the convergence properties of algorithms for entropy-regularized reinforcement learning, fine-tuning of pretrained models, and reinforcement learning with human feedback (RLHF).
nan
Article 613
Title@2025-07-14 (1): Multilayer Artificial Benchmark for Community Detection (mABCD)
Title: Multilayer Artificial Benchmark for Community Detection (mABCD) | Multilayer Artificial Benchmark für Gemeinschaftserkennung (mABCD) | 社区探测多人基准(MIABCD) 2507.10795v1 |
Authors (6): Łukasz Kraiński, Michał Czuba, Piotr Bródka, Paweł Prałat, Bogumił Kamiński, François Théberge
The Artificial Benchmark for Community Detection (ABCD) model is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster, more interpretable, and can be investigated analytically. In this paper, we use the underlying ingredients of the ABCD model and introduce its variant for multilayer networks, mABCD.
nan
Article 614
Title@2025-07-14 (1): A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments
Title: A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments | Ein generalisierbares physik-verbessertes Zustands-Raummodell für die Langzeit-Dynamik-Prognose in komplexen Umgebungen | 综合环境中长期动态预测空间模型 2507.10792v1 |
Authors (6): Yuchen Wang, Hongjue Zhao, Haohong Lin, Enze Xu, Lifang He, Huajie Shao
This work aims to address the problem of long-term dynamics forecasting in complex environments where data are noisy and irregularly sampled. While recent studies have introduced some methods to improve prediction performance, these approaches still face a significant challenge in handling long-term extrapolation tasks under such complex scenarios. To overcome this challenge, we propose Phy-SSM, a generalizable method that integrates partial physics knowledge into state space models (SSMs) for long-term dynamics forecasting in complex environments. Our motivation is that SSMs can effectively capture long-range dependencies in sequential data and model continuous dynamical systems, while the incorporation of physics knowledge improves generalization ability. The key challenge lies in how to seamlessly incorporate partially known physics into SSMs. To achieve this, we decompose partially known system dynamics into known and unknown state matrices, which are integrated into a Phy-SSM unit. To further enhance long-term prediction performance, we introduce a physics state regularization term to make the estimated latent states align with system dynamics. In addition, we theoretically analyze the uniqueness of the solutions for our method. Extensive experiments on three real-world applications, including vehicle motion prediction, drone state prediction, and COVID-19 epidemiology forecasting, demonstrate the superior performance of Phy-SSM over the baselines in both long-term interpolation and extrapolation tasks. The code is available at https://github.com/511205787/Phy_SSM-ICML2025.
nan
Article 615
Title@2025-07-14 (1): FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Title: FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | FlowAlign: Trajektorie-regularisierte, inversionsfreie Fluss-basierte Bildbearbeitung | 流动对等: 轨迹- 重新分类、 转换- 无流动图像编辑 2505.23145v3 |
Authors (4): Jeongsol Kim, Yeobin Hong, Jonghyun Park, Jong Chul Ye
Recent inversion-free, flow-based image editing methods such as FlowEdit leverage a pre-trained noise-to-image flow model such as Stable Diffusion 3, enabling text-driven manipulation by solving an ordinary differential equation (ODE). While the lack of exact latent inversion is a core advantage of these methods, it often results in unstable editing trajectories and poor source consistency. To address this limitation, we propose FlowAlign, a novel inversion-free flow-based framework for consistent image editing with optimal control-based trajectory control. Specifically, FlowAlign introduces source similarity at the terminal point as a regularization term to promote smoother and more consistent trajectories during the editing process. Notably, our terminal point regularization is shown to explicitly balance semantic alignment with the edit prompt and structural consistency with the source image along the trajectory. Furthermore, FlowAlign naturally supports reverse editing by simply reversing the ODE trajectory, highlighting the reversible and consistent nature of the transformation. Extensive experiments demonstrate that FlowAlign outperforms existing methods in both source preservation and editing controllability.
nan
Article 616
Title@2025-07-14 (1): Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions
Title: Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions | Score-of-Mixture Training: Training Ein-Schritt-Generative Modelle einfach gemacht via Score-Abschätzung von Mixture-Distributionen | 混合计分培训:通过对混合分发品进行记分估计而简单化的单级生成模型培训 2502.09609v3 |
Authors (3): Tejas Jayashankar, J. Jon Ryu, Gregory Wornell
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the $\alpha$-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64x64 show that SMT/SMD are competitive with and can even outperform existing methods.
nan
Article 617
Title@2025-07-14 (1): XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Title: XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation | XGeM: Ein Multi-Prompt-Stiftungsmodell für multimodale medizinische Datengenerierung | XGeM:多式医疗数据多式生成多式医疗多模式基金会模式 2501.04614v4 |
Authors (8): Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda
The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM. First, we benchmark it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view chest X-ray and radiological report generation. Second, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we show how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. The project page is at https://cosbidev.github.io/XGeM/.
Article 618
Title@2025-07-14 (1): Transfer Learning Analysis of Variational Quantum Circuits
Title: Transfer Learning Analysis of Variational Quantum Circuits | Transfer Learning Analyse von Variationalen Quantenkreisen | 变化量子电路变化性量子电路的转移学习分析 2501.01507v3 |
Authors (4): Huan-Hsin Tseng, Hsin-Yi Lin, Samuel Yen-Chi Chen, Shinjae Yoo
This work analyzes transfer learning of the Variational Quantum Circuit (VQC). Our framework begins with a pretrained VQC configured in one domain and calculates the transition of 1-parameter unitary subgroups required for a new domain. A formalism is established to investigate the adaptability and capability of a VQC under the analysis of loss bounds. Our theory observes knowledge transfer in VQCs and provides a heuristic interpretation for the mechanism. An analytical fine-tuning method is derived to attain the optimal transition for adaptations of similar domains.
Article 619
Title@2025-07-14 (1): Accounting for multiplicity in machine learning benchmark performance
Title: Accounting for multiplicity in machine learning benchmark performance | Bilanzierung der Vielfältigkeit in der Benchmark-Leistung bei maschinellem Lernen | 机器学习基准业绩多重性核算 2303.07272v6 |
Authors (2): Kajsa Møllersen, Einar Holsbø
State-of-the-art (SOTA) performance refers to the highest performance achieved by some model on a test sample, preferably under controlled conditions such as public data (reproducibility) or public challenges (independent sample). Thousands of classifiers are applied, and the highest performance becomes the new reference point for a particular problem. In effect, this set-up is an estimate of the expected best performance among all classifiers applied to a random sample; a sample maximum estimate. In this paper, we argue that SOTA should instead be estimated by the expected performance of the best classifier, which can be done without knowing which classifier it is. Our contribution is the formal distinction between the two, and an investigation into the practical consequences of using the former to estimate the latter. This is done by presenting sample maximum estimator distributions for non-identical and dependent classifiers. We illustrate the impact on real world examples from public challenges.
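The gap between the two estimands can be illustrated with a small simulation. The sketch below assumes m independent classifiers that all share the same true accuracy p, which is a simplification (the paper treats non-identical and dependent classifiers); the names `simulate_sota`, `m`, `n`, and `p` are illustrative and do not come from the paper.

```python
import random

def simulate_sota(m, n, p, trials, seed=0):
    """Average of the sample-maximum accuracy over `trials` benchmark replays.

    Assumes m independent classifiers, each with true accuracy p, scored on a
    test set of n examples (an illustrative assumption, not the paper's setup).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = 0.0
        for _ in range(m):
            correct = sum(rng.random() < p for _ in range(n))
            best = max(best, correct / n)   # the leaderboard keeps only the top score
        total += best
    return total / trials

p = 0.90
sample_max = simulate_sota(m=50, n=200, p=p, trials=100)
print(f"true accuracy of every classifier: {p:.3f}")
print(f"mean of the sample maximum:        {sample_max:.3f}")
```

Because every simulated classifier has the same true accuracy, any excess of the sample maximum over p in this toy setting is due to multiplicity alone.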
Article 620
Title@2025-07-14 (1): Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift
Title: Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift | Einschließliche Interventionale Unabhängigkeit verbessert Robustheit gegen Interventionale Verteilungsverschiebung | 纳入干预性独立 增强抵御干预性分配转变的力度 2507.05412v2 |
Authors (2): Gautam Sreekumar, Vishnu Naresh Boddeti
We consider the problem of learning robust discriminative representations of causally related latent variables. In addition to observational data, the training dataset also includes interventional data obtained through targeted interventions on some of these latent variables to learn representations robust against the resulting interventional distribution shifts. Existing approaches treat interventional data like observational data, even when the underlying causal model is known, and ignore the independence relations that arise from these interventions. Since these approaches do not fully exploit the causal relational information resulting from interventions, they learn representations that produce large disparities in predictive performance between observational and interventional data, and these disparities worsen when the number of interventional training samples is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and adherence of the representations to the independence conditions induced by the interventional causal model. (2) For linear models, we derive sufficient conditions on the proportion of interventional data in the training dataset, for which enforcing interventional independence between representations corresponding to the intervened node and its non-descendants lowers the error on interventional data. Combining these insights, (3) we propose RepLIn, a training algorithm to explicitly enforce this statistical independence during interventions. We demonstrate the utility of RepLIn on a synthetic dataset and on real image and text datasets on facial attribute classification and toxicity detection, respectively. Our experiments show that RepLIn scales with the number of nodes in the causal graph and improves the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables.
Article 621
Title@2025-07-14 (1): Spatial Reasoners for Continuous Variables in Any Domain
Title: Spatial Reasoners for Continuous Variables in Any Domain | Räumliche Reasoner für kontinuierliche Variablen in jeder Domäne | 任何域中连续变量的空间理由 2507.10768v1 |
Authors (4): Bart Pogodzinski, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
We present Spatial Reasoners, a software framework to perform spatial reasoning over continuous variables with generative denoising models. Denoising generative models have become the de facto standard for image generation, due to their effectiveness in sampling from complex, high-dimensional distributions. Recently, they have started being explored in the context of reasoning over multiple continuous variables. Providing infrastructure for generative reasoning with such models requires substantial effort because of the wide range of denoising formulations, samplers, and inference strategies. Our framework aims to facilitate research in this area, providing easy-to-use interfaces to control variable mapping from arbitrary data domains, generative model paradigms, and inference strategies. Spatial Reasoners is openly available at https://spatialreasoners.github.io/
Article 622
Title@2025-07-14 (1): IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models
Title: IoT Malware Network Traffic Detection using Deep Learning and GraphSAGE Models | IoT Malware Netzwerk Traffic Detection mit Deep Learning und GraphSAGE-Modellen | 利用深深学习和图形分析模型来探测 Iot 恶意网络流量 2507.10758v1 |
Authors (4): Nikesh Prajapati, Bimal Karki, Saroj Gopali, Akbar Siami Namin
This paper aims to detect malicious IoT attacks with deep learning models and presents a comprehensive evaluation of deep learning and graph-based models for malicious network traffic detection. The models evaluated are GraphSAGE, Bidirectional Encoder Representations from Transformers (BERT), a Temporal Convolutional Network (TCN), Multi-Head Attention, Bidirectional Long Short-Term Memory (BI-LSTM) with Multi-Head Attention, BI-LSTM, and LSTM. The chosen models performed well at modeling temporal patterns and detecting feature significance. This performance is mainly due to the fact that IoT system traffic patterns are both sequential and diverse, leaving a rich set of temporal patterns for the models to learn. Experimental results showed that BERT maintained the best performance. It achieved a 99.94% accuracy rate alongside high precision and recall, F1-score and AUC-ROC score of 99.99%, which demonstrates its ability to capture temporal dependencies. The Multi-Head Attention model offered promising results, providing good detection capability with interpretable results. On the other hand, it required significant processing time, as did the BI-LSTM variants. The GraphSAGE model achieved good accuracy and required the shortest training time, but it yielded the lowest accuracy, precision, and F1 score among the models compared.
Article 623
Title@2025-07-14 (1): A Benchmarking Framework for AI models in Automotive Aerodynamics
Title: A Benchmarking Framework for AI models in Automotive Aerodynamics | Ein Benchmarking-Rahmen für KI-Modelle in der Automobilaerodynamik | 汽车空气动力学AI模型基准框架 2507.10747v1 |
Authors (8): Kaustubh Tangsali, Rishikesh Ranade, Mohammad Amin Nabian, Alexey Kamenev, Peter Sharpe, Neil Ashton, Ram Cherukuri, Sanjay Choudhry
In this paper, we introduce a benchmarking framework within the open-source NVIDIA PhysicsNeMo-CFD framework designed to systematically assess the accuracy, performance, scalability, and generalization capabilities of AI models for automotive aerodynamics predictions. The open, extensible framework enables incorporation of a diverse set of metrics relevant to the Computer-Aided Engineering (CAE) community. By providing a standardized methodology for comparing AI models, the framework enhances transparency and consistency in performance assessment, with the overarching goal of improving the understanding and development of these models to accelerate research and innovation in the field. To demonstrate its utility, the framework includes evaluation of both surface and volumetric flow field predictions on three AI models: DoMINO, X-MeshGraphNet, and FIGConvNet using the DrivAerML dataset. It also includes guidelines for integrating additional models and datasets, making it extensible for physically consistent metrics. This benchmarking study aims to help researchers and industry professionals select, refine, and advance AI-driven aerodynamic modeling approaches, ultimately fostering the development of more efficient, accurate, and interpretable solutions in automotive aerodynamics.
Article 624
Title@2025-07-14 (1): Language Models for Adult Service Website Text Analysis
Title: Language Models for Adult Service Website Text Analysis | Sprachmodelle für Erwachsene Service Website Textanalyse | 成人服务语言模式网站文本分析 2507.10743v1 |
Authors (5): Nickolas Freeman, Thanh Nguyen, Gregory Bott, Jason Parton, Collin Francel
Sex trafficking refers to the use of force, fraud, or coercion to compel an individual to perform commercial sex acts against their will. Adult service websites (ASWs) have been and continue to be linked to sex trafficking, offering a platform for traffickers to advertise their victims. Thus, organizations involved in the fight against sex trafficking often use ASW data when attempting to identify potential sex trafficking victims. A critical challenge in transforming ASW data into actionable insight is text analysis. Previous research using ASW data has shown that ASW ad text is important for linking ads. However, working with this text is challenging due to its extensive use of emojis, poor grammar, and deliberate obfuscation to evade law enforcement scrutiny. We conduct a comprehensive study of language modeling approaches for this application area, including simple information retrieval methods, pre-trained transformers, and custom transformer models. We demonstrate that characteristics of ASW text data allow efficient custom transformer models to be trained with relatively small GPU resources and used efficiently for inference on consumer hardware. Our custom models outperform fine-tuned variants of well-known encoder-only transformer models, including BERT-base, RoBERTa, and ModernBERT, on accuracy, recall, F1 score, and ROC AUC. We demonstrate the use of our best-performing custom configuration on three tasks related to ASW data analysis: (i) decomposing the giant component in a graph representation of ASW data, (ii) clustering ASW ad text, and (iii) using the learned token embeddings to understand the use of emojis in the illicit context we study. The models we develop represent a significant advancement in ASW text analysis, which can be leveraged in a variety of downstream applications and research.
Article 625
Title@2025-07-14 (1): Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language
Title: Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language | Ground-Compose-Reinforce: Verstärktes Lernen durch formale Sprache | 地面综合部队:通过正式语文指定加强学习代理 2507.10741v1 |
Authors (5): Andrew C. Li, Toryn Q. Klassen, Andrew Wang, Parand A. Alamdari, Sheila A. McIlraith
Grounding language in complex perception (e.g. pixels) and action is a key challenge when building situated agents that can interact with humans via language. In past works, this is often solved via manual design of the language grounding or by curating massive datasets relating language to elements of the environment. We propose Ground-Compose-Reinforce, a neurosymbolic framework for grounding formal language from data, and eliciting behaviours by directly tasking RL agents through this language. By virtue of data-driven learning, our framework avoids the manual design of domain-specific elements like reward functions or symbol detectors. By virtue of compositional formal language semantics, our framework achieves data-efficient grounding and generalization to arbitrary language compositions. Experiments on an image-based gridworld and a MuJoCo robotics domain show that our approach reliably maps formal language instructions to behaviours with limited data while end-to-end, data-driven approaches fail.
Article 626
Title@2025-07-14 (1): The Trust Calibration Maturity Model for Characterizing and Communicating Trustworthiness of AI Systems
Title: The Trust Calibration Maturity Model for Characterizing and Communicating Trustworthiness of AI Systems | Das Modell der Treuhandkalibrierungsreife zur Charakterisierung und Kommunikation der Vertrauenswürdigkeit von KI-Systemen | AI系统确定和传播信任度的信托校准期限模型 2503.15511v2 |
Authors (9): Scott T Steinmetz, Asmeret Naugle, Paul Schutte, Matt Sweitzer, Alex Washburne, Lisa Linville, Daniel Krofcheck, Michal Kucer, Samuel Myren
The recent proliferation of powerful AI systems has created a strong need for capabilities that help users calibrate trust in those systems. As AI systems grow in scale, the information required to evaluate their trustworthiness becomes less accessible, presenting a growing risk of using these systems inappropriately. We propose the Trust Calibration Maturity Model (TCMM) to characterize and communicate information about AI system trustworthiness. The TCMM incorporates five dimensions of analytic maturity: Performance Characterization, Bias & Robustness Quantification, Transparency, Safety & Security, and Usability. The TCMM can be presented along with system performance information to (1) help a user appropriately calibrate trust, (2) establish requirements and track progress, and (3) identify research needs. Here, we discuss the TCMM and demonstrate it on two target tasks: using ChatGPT for high-consequence nuclear science determinations, and using PhaseNet (an ensemble of seismic models) for categorizing sources of seismic events.
Article 627
Title@2025-07-14 (1): Extracting Document Relations from Search Corpus by Marginalizing over User Queries
Title: Extracting Document Relations from Search Corpus by Marginalizing over User Queries | Extrahieren von Dokumentenbeziehungen aus dem Suchkorpus durch Marginalisierung über Benutzerfragen | 将文件关系从搜索 Corpus 中提取, 将其边缘化于用户查询 2507.10726v1 |
Authors (3): Yuki Iwamoto, Kaoru Tsunoda, Ken Kaneiwa
Understanding relationships between documents in large-scale corpora is essential for knowledge discovery and information organization. However, existing approaches rely heavily on manual annotation or predefined relationship taxonomies. We propose EDR-MQ (Extracting Document Relations by Marginalizing over User Queries), a novel framework that discovers document relationships through query marginalization. EDR-MQ is based on the insight that strongly related documents often co-occur in results across diverse user queries, enabling us to estimate joint probabilities between document pairs by marginalizing over a collection of queries. To enable this query marginalization approach, we develop Multiply Conditioned Retrieval-Augmented Generation (MC-RAG), which employs conditional retrieval where subsequent document retrievals depend on previously retrieved content. By observing co-occurrence patterns across diverse queries, EDR-MQ estimates joint probabilities between document pairs without requiring labeled training data or predefined taxonomies. Experimental results show that our query marginalization approach successfully identifies meaningful document relationships, revealing topical clusters, evidence chains, and cross-domain connections that are not apparent through traditional similarity-based methods. Our query-driven framework offers a practical approach to document organization that adapts to different user perspectives and information needs.
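The marginalization step described above can be sketched on toy numbers. The code below assumes the joint probability factorizes as P(d1, d2) = sum over q of P(q) * P(d1 | q) * P(d2 | d1, q), which is one natural reading of "conditional retrieval" in the abstract; the probability tables, the function name `joint_doc_probs`, and the document labels are all hypothetical and are not produced by MC-RAG.

```python
from collections import defaultdict

def joint_doc_probs(query_prior, p_first, p_second):
    """Estimate P(d1, d2) = sum_q P(q) * P(d1 | q) * P(d2 | d1, q)."""
    joint = defaultdict(float)
    for q, pq in query_prior.items():
        for d1, p1 in p_first[q].items():
            for d2, p2 in p_second[q][d1].items():
                joint[(d1, d2)] += pq * p1 * p2
    return dict(joint)

# Toy, hand-written retrieval probabilities (hypothetical values).
query_prior = {"q1": 0.5, "q2": 0.5}
p_first = {
    "q1": {"A": 0.8, "B": 0.2},
    "q2": {"A": 0.5, "C": 0.5},
}
p_second = {
    "q1": {"A": {"B": 0.9, "C": 0.1}, "B": {"A": 1.0}},
    "q2": {"A": {"B": 0.6, "C": 0.4}, "C": {"A": 1.0}},
}

joint = joint_doc_probs(query_prior, p_first, p_second)
for pair, prob in sorted(joint.items(), key=lambda kv: -kv[1]):
    print(pair, round(prob, 3))
```

In this toy table the pair (A, B) receives the largest joint probability because B follows A under both queries, which is the kind of cross-query co-occurrence signal the abstract describes.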
Article 628
Title@2025-07-14 (1): Group-wise oracle-efficient algorithms for online multi-group learning
Title: Group-wise oracle-efficient algorithms for online multi-group learning | Gruppenweise Orakel-effiziente Algorithmen für Online-Multigruppen-Lernen | 用于在线多小组学习的群集法或手法效率算法 2406.05287v2 |
Authors (3): Samuel Deng, Daniel Hsu, Jingwen Liu
We study the problem of online multi-group learning, a learning model in which an online learner must simultaneously achieve small prediction regret on a large collection of (possibly overlapping) subsequences corresponding to a family of groups. Groups are subsets of the context space, and in fairness applications, they may correspond to subpopulations defined by expressive functions of demographic attributes. In contrast to previous work on this learning model, we consider scenarios in which the family of groups is too large to explicitly enumerate, and hence we seek algorithms that only access groups via an optimization oracle. In this paper, we design such oracle-efficient algorithms with sublinear regret under a variety of settings, including: (i) the i.i.d. setting, (ii) the adversarial setting with smoothed context distributions, and (iii) the adversarial transductive setting.
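The regret notion can be made concrete on a toy example. The sketch below enumerates groups and hypotheses explicitly, which is exactly what the paper's oracle-efficient algorithms are designed to avoid; it only shows the quantity being controlled. The 0-1 loss, the data, and all names (`group_regrets`, `under_35`, `flagged`) are illustrative assumptions.

```python
def group_regrets(contexts, labels, predictions, groups, hypotheses):
    """Regret of the learner on each group's subsequence (0-1 loss).

    For every group g, compare the learner's cumulative loss on the rounds
    with x_t in g against the best fixed hypothesis on those same rounds.
    """
    regrets = {}
    for name, in_group in groups.items():
        rounds = [t for t, x in enumerate(contexts) if in_group(x)]
        learner_loss = sum(predictions[t] != labels[t] for t in rounds)
        best_loss = min(
            sum(h(contexts[t]) != labels[t] for t in rounds)
            for h in hypotheses
        )
        regrets[name] = learner_loss - best_loss
    return regrets

# Toy data: contexts are (age, flag) pairs; all values are made up.
contexts = [(25, 0), (40, 1), (31, 1), (52, 0), (19, 1), (45, 1)]
labels = [0, 1, 1, 0, 1, 1]
predictions = [0, 1, 0, 1, 1, 1]          # what the online learner output
groups = {
    "under_35": lambda x: x[0] < 35,
    "flagged": lambda x: x[1] == 1,        # overlaps with under_35
}
hypotheses = [lambda x: 0, lambda x: 1, lambda x: x[1]]

regrets = group_regrets(contexts, labels, predictions, groups, hypotheses)
print(regrets)
```

Note that the best hypothesis may differ from group to group, which is why a small regret on every group simultaneously is a stronger requirement than a small regret overall.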
Article 629
Title@2025-07-14 (1): State-Constrained Offline Reinforcement Learning
Title: State-Constrained Offline Reinforcement Learning | Staatlich bedingtes Offline-Verstärkungslernen | 国家培训的离线强化学习 2405.14374v2 |
Authors (3): Charles A. Hepburn, Yue Jin, Giovanni Montana
Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the policy to seen actions. In this paper, we alleviate this limitation by introducing state-constrained offline RL, a novel framework that focuses solely on the dataset’s state distribution. This approach allows the policy to take high-quality out-of-distribution actions that lead to in-distribution states, significantly enhancing learning potential. The proposed setting not only broadens the learning horizon but also improves the ability to combine different trajectories from the dataset effectively, a desirable property inherent in offline RL. Our research is underpinned by theoretical findings that pave the way for subsequent advancements in this area. Additionally, we introduce StaCQ, a deep learning algorithm that achieves state-of-the-art performance on the D4RL benchmark datasets and aligns with our theoretical propositions. StaCQ establishes a strong baseline for forthcoming explorations in this domain.
Article 630
Title@2025-07-14 (1): Distributionally Robust Optimization with Adversarial Data Contamination
Title: Distributionally Robust Optimization with Adversarial Data Contamination | Verteilungsstarke Optimierung mit Adversarial Data Contamination | 使用反对数据污染优化分布强力优化 2507.10718v1 |
Authors (3): Shuyao Li, Ilias Diakonikolas, Jelena Diakonikolas
Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $\epsilon$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{\epsilon})$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
Article 631
Title@2025-07-14 (1): Real-time, Adaptive Radiological Anomaly Detection and Isotope Identification Using Non-negative Matrix Factorization
Title: Real-time, Adaptive Radiological Anomaly Detection and Isotope Identification Using Non-negative Matrix Factorization | Echtzeit-, adaptive radiologische Anomalienerkennung und Isotopenidentifizierung mittels nicht-negativer Matrixfaktorisierung | 利用非负矩阵化系数进行实时适应性辐射异常探测和同位素识别 2507.10715v1 |
Authors (7): Chandler Jones, Mark Bandstra, Stefan Faaland, Yue Shi Lai, Nico Abgrall, Scott Suchyta, Reynold Cooper
Spectroscopic anomaly detection and isotope identification algorithms are integral components in nuclear nonproliferation applications such as search operations. The task is especially challenging in the case of mobile detector systems due to the fact that the observed gamma-ray background changes more than for a static detector system, and a pretrained background model can easily find itself out of domain. The result is that algorithms may exceed their intended false alarm rate, or sacrifice detection sensitivity in order to maintain the desired false alarm rate. Non-negative matrix factorization (NMF) has been shown to be a powerful tool for spectral anomaly detection and identification, but, like many similar algorithms that rely on data-driven background models, in its conventional implementation it is unable to update in real time to account for environmental changes that affect the background spectroscopic signature. We have developed a novel NMF-based algorithm that periodically updates its background model to accommodate changing environmental conditions. The Adaptive NMF algorithm involves fewer assumptions about its environment, making it more generalizable than existing NMF-based methods while maintaining or exceeding detection performance on simulated and real-world datasets.
Article 632
Title@2025-07-14 (1): A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models
Title: A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models | Eine einfache ungefähre Bayesian Inferenz Neural Surrogate für stochastische Petri Net Modelle | 用于Stochastic Petrii 网模型的简单近近贝耶斯导引神经基体巡天模型 2507.10714v1 |
Authors (4): Bright Kwaku Manu, Trevor Reckell, Beckett Sterner, Petar Jevtic
Stochastic Petri Nets (SPNs) are an increasingly popular tool for modeling discrete-event dynamics in areas such as epidemiology and systems biology, yet their parameter estimation remains challenging, particularly when transition rates depend on external covariates and explicit likelihoods are unavailable. We introduce a neural-surrogate framework (a neural-network-based approximation of the posterior distribution) that predicts the coefficients of known covariate-dependent rate functions directly from noisy, partially observed token trajectories. Our model employs a lightweight 1D Convolutional Residual Network trained end-to-end on Gillespie-simulated SPN realizations, learning to invert system dynamics under realistic conditions of event dropout. During inference, Monte Carlo dropout provides calibrated uncertainty bounds together with point estimates. On synthetic SPNs with 20% missing events, our surrogate recovers rate-function coefficients with an RMSE of 0.108 and runs substantially faster than traditional Bayesian approaches. These results demonstrate that data-driven, likelihood-free surrogates can enable accurate, robust, and real-time parameter recovery in complex, partially observed discrete-event systems.
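The kind of training data mentioned above (Gillespie-simulated realizations with missing events) can be sketched as follows. This is a minimal illustration, assuming a one-place net with a constant-rate "birth" transition and a "death" transition whose rate is proportional to the token count; the net, the rates, the 20% drop probability applied per event, and the function names are assumptions for illustration, not the authors' simulator.

```python
import random

def gillespie_birth_death(birth, death, tokens0, t_end, seed=0):
    """Simulate a one-place stochastic Petri net with Gillespie's algorithm.

    Two transitions: 'birth' fires at rate `birth` and adds a token;
    'death' fires at rate `death * tokens` and removes one.
    Returns a list of (time, tokens) pairs: the initial state, then one per firing.
    """
    rng = random.Random(seed)
    t, tokens = 0.0, tokens0
    trajectory = [(t, tokens)]
    while True:
        rates = [birth, death * tokens]
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)           # waiting time to the next firing
        if t > t_end:
            break
        if rng.random() * total < rates[0]:    # choose a transition by its rate
            tokens += 1
        else:
            tokens -= 1
        trajectory.append((t, tokens))
    return trajectory

def drop_events(trajectory, drop_prob, seed=1):
    """Hide each recorded firing independently with probability drop_prob."""
    rng = random.Random(seed)
    first, rest = trajectory[0], trajectory[1:]
    return [first] + [e for e in rest if rng.random() >= drop_prob]

full = gillespie_birth_death(birth=5.0, death=0.5, tokens0=0, t_end=20.0)
observed = drop_events(full, drop_prob=0.2)
print(len(full) - 1, "firings simulated;", len(observed) - 1, "firings observed")
```

A surrogate of the kind described in the abstract would take trajectories like `observed` as input and be trained to predict the rate coefficients (`birth` and `death` in this toy net).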
Article 633
Title@2025-07-14 (1): Imitation Learning from a Single Temporally Misaligned Video
Title: Imitation Learning from a Single Temporally Misaligned Video | Imitation Lernen von einem einzigen temporär fehlgeleiteten Video | 从单一临时错配视频中学习 2502.05397v2 |
Authors (5): William Huey, Huaxiaoyue Wang, Anne Wu, Yoav Artzi, Sanjiban Choudhury
We examine the problem of learning sequential tasks from a single visual demonstration. A key challenge arises when demonstrations are temporally misaligned due to variations in timing, differences in embodiment, or inconsistencies in execution. Existing approaches treat imitation as a distribution-matching problem, aligning individual frames between the agent and the demonstration. However, we show that such frame-level matching fails to enforce temporal ordering or ensure consistent progress. Our key insight is that matching should instead be defined at the level of sequences. We propose that perfect matching occurs when one sequence successfully covers all the subgoals in the same order as the other sequence. We present ORCA (ORdered Coverage Alignment), a dense per-timestep reward function that measures the probability of the agent covering demonstration frames in the correct order. On temporally misaligned demonstrations, we show that agents trained with the ORCA reward achieve $4.5$x improvement ($0.11 \rightarrow 0.50$ average normalized returns) for Meta-world tasks and $6.6$x improvement ($6.55 \rightarrow 43.3$ average returns) for Humanoid-v4 tasks compared to the best frame-level matching algorithms. We also provide empirical analysis showing that ORCA is robust to varying levels of temporal misalignment. Our code is available at https://github.com/portal-cornell/orca/
Article 634
Title@2025-07-14 (1): DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
Title: DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving | DroidSpeak: KV Cache Sharing für Cross-LLM Kommunikation und Multi-LLM Serving | DroidSpeak: KV 共享缓存, 用于跨 LLM 通信和多 LLM 服务 2411.02820v4 |
Authors (12): Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse
Compound AI systems, such as agentic systems, are an emerging trend in large-scale enterprise settings, with multiple LLMs specialized for different users, tasks, and/or roles working together. In these scenarios, different models often process inputs that share the same context prefix. Although much work was done in the past to enable the reuse of prefix KV caches across inputs for a single model, how to enable one model to reuse the prefix KV caches of a different model remains an open question. We introduce DroidSpeak, the first distributed LLM inference system that enables KV cache reuse across distributed nodes running inference of different LLMs, so long as the LLMs have the same architecture. We present the first study that aims at understanding the impact of sharing KV caches across different LLMs, and if/when such sharing affects quality. Inspired by the findings, we present DroidSpeak, which selectively recomputes a few layers of the KV cache produced by another LLM and reuses the remaining layers, with negligible quality loss. Moreover, carefully pipelining the layer-wise re-computation and the loading of reused KV cache further improves the inference performance. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 4x throughput improvement and about 3.1x faster prefill (time to first token), with negligible loss of quality in F1 scores, Rouge-L or code similarity score, compared to the baseline which does not allow any sharing across models.
Article 635
Title@2025-07-14 (1): Robust Multi-Manifold Clustering via Simplex Paths
Title: Robust Multi-Manifold Clustering via Simplex Paths | Robustes Multi-Manifold-Clustering über Simplex-Pfade | 通过 Simlipx 路径进行强力多功能集成集成 2507.10710v1 |
Authors (3): Haoyu Chen, Anna Little, Akin Narayan
This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e., for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a locality graph on d-simplices, using the dihedral angle between adjacent simplices as the graph weights, and then compute infinity path distances in this simplex graph. This procedure gives a metric on simplices which we refer to as the largest angle path distance (LAPD). We analyze the properties of LAPD under random sampling, and prove that with an appropriate denoising procedure, this metric separates the manifold components with high probability. We validate the proposed methodology with extensive numerical experiments on both synthetic and real-world data sets. These experiments demonstrate that the method is robust to noise, curvature, and small intersection angle, and generally outperforms other MMC algorithms. In addition, we provide a highly scalable implementation of the proposed algorithm, which leverages approximation schemes for infinity path distance to achieve quasi-linear computational complexity.
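The infinity path distance used above is the minimax (bottleneck) path distance: the smallest achievable value of the largest edge weight along any connecting path. The sketch below computes it exactly with a Dijkstra-style search on a toy graph. It is not the paper's scalable approximation scheme, and the node names and edge weights (standing in for dihedral angles between adjacent simplices) are made up.

```python
import heapq

def infinity_path_distances(graph, source):
    """Minimax (infinity) path distance from `source` to every reachable node.

    `graph` maps node -> list of (neighbor, weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale queue entry
        for v, w in graph[u]:
            candidate = max(d, w)              # bottleneck instead of sum
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

# Toy undirected simplex graph; weights play the role of dihedral angles.
edges = [("a", "b", 0.1), ("b", "c", 0.2), ("a", "c", 0.9), ("c", "d", 1.2)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

dist = infinity_path_distances(graph, "a")
print(sorted(dist.items()))
```

Here node `c` is reached through `b` with a largest angle of 0.2 rather than directly with 0.9, while `d` stays far from everything because its only connection has a large angle, which is the separation behavior the metric is meant to provide.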
Article 636
Title@2025-07-14 (1): Kernel Learning for Mean-Variance Trading Strategies
Title: Kernel Learning for Mean-Variance Trading Strategies | Kernel-Lernen für Mittlere Varianz-Trading-Strategien | 平均变化贸易战略核心学习 2507.10701v1 |
Authors (3): Owen Futter, Nicola Muca Cirone, Blanka Horvath
In this article, we develop a kernel-based framework for constructing dynamic, path-dependent trading strategies under a mean-variance optimisation criterion. Building on the theoretical results of Muca Cirone and Salvi (2025), we parameterise trading strategies as functions in a reproducing kernel Hilbert space (RKHS), enabling a flexible and non-Markovian approach to optimal portfolio problems. We compare this with the signature-based framework of Futter, Horvath, and Wiese (2023) and demonstrate, on both synthetic and market-data examples, that both significantly outperform classical Markovian methods when the asset dynamics or predictive signals exhibit temporal dependencies. Using kernels in this context provides significant modelling flexibility, as the choice of feature embedding can range from randomised signatures to the final layers of neural network architectures. Crucially, our framework retains closed-form solutions and provides an alternative to gradient-based optimisation.
Article 637
Title@2025-07-14 (1): Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment
Title: Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment | Multi-Preference Lambda-bewertet Listwise DPO für Dynamic Preference Alignment | 多首选项 Lambda 加权列表 DPO 动态首选项一致 2506.19780v4 |
Authors (5): Yuhui Sun, Xiyao Wang, Zixi Li, Zhenlong Yuan, Jinman Zhao
While large language models (LLMs) excel at text generation, aligning them with human preferences remains challenging. Reinforcement learning from human feedback (RLHF) improves alignment but is costly and unstable. Direct Preference Optimization (DPO) offers a simpler alternative, yet assumes a fixed, single-dimensional preference. We propose Multi-Preference Lambda-weighted Listwise DPO, a generalization of DPO that supports multiple preference dimensions and dynamic interpolation via a simplex-weighted lambda vector. Our method enables listwise supervision and flexible alignment without re-training. While our experiments are conducted on 1B-2B scale models, this is an intentional choice: smaller models provide a more stringent testbed where performance improvements more clearly reflect the effectiveness of the alignment strategy itself. Moreover, such models are widely used in compute-constrained applications, making our improvements both methodologically meaningful and practically valuable. Empirical results show that our approach matches or surpasses standard DPO on alignment benchmarks while offering improved adaptability.
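One plausible form of a lambda-weighted listwise objective is sketched below. It assumes that each preference dimension supplies a target distribution over a list of candidate responses, that the targets are mixed with a weight vector on the simplex, and that the mixture is compared by cross-entropy with a softmax over DPO-style implicit rewards, beta * (log pi_theta - log pi_ref). The abstract does not state the loss, so this form, the function names, and all numbers are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def lambda_listwise_loss(logp_policy, logp_ref, targets, lam, beta=0.1):
    """Cross-entropy between lambda-mixed targets and the implicit-reward softmax.

    targets[k][i] is the target probability of response i under preference
    dimension k; lam must lie on the probability simplex.
    """
    if any(w < 0 for w in lam) or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("lam must be non-negative and sum to 1")
    scores = [beta * (p - r) for p, r in zip(logp_policy, logp_ref)]
    probs = softmax(scores)
    mixed = [sum(w * t[i] for w, t in zip(lam, targets)) for i in range(len(probs))]
    return -sum(q * math.log(p) for q, p in zip(mixed, probs))

# Three candidate responses, two preference dimensions (made-up numbers).
logp_policy = [-12.0, -15.0, -14.0]
logp_ref = [-13.0, -14.0, -14.0]
targets = [[0.7, 0.2, 0.1],    # e.g. a "helpfulness" ranking
           [0.1, 0.3, 0.6]]    # e.g. a "harmlessness" ranking

loss_a = lambda_listwise_loss(logp_policy, logp_ref, targets, lam=[1.0, 0.0])
loss_b = lambda_listwise_loss(logp_policy, logp_ref, targets, lam=[0.0, 1.0])
loss_mix = lambda_listwise_loss(logp_policy, logp_ref, targets, lam=[0.5, 0.5])
print(round(loss_a, 4), round(loss_b, 4), round(loss_mix, 4))
```

Under this assumed form the loss is linear in the weight vector, so changing `lam` interpolates between the per-dimension objectives.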
Article 638
Title@2025-07-14 (1): Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder
Title: Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder | Selbstüberwachtes Lernen auf der Kamera Trap Footage führt zu einem starken universellen Gesicht Embedder | 由自我监督的关于摄影机陷阱脚脚的自监学习 强烈的通用面部嵌入器 2507.10552v1 |
Authors (4): Vladimir Iashin, Horace Lee, Dan Schofield, Andrew Zisserman
Camera traps are revolutionising wildlife monitoring by capturing vast amounts of visual data; however, the manual identification of individual animals remains a significant bottleneck. This study introduces a fully self-supervised approach to learning robust chimpanzee face embeddings from unlabeled camera-trap footage. Leveraging the DINOv2 framework, we train Vision Transformers on automatically mined face crops, eliminating the need for identity labels. Our method demonstrates strong open-set re-identification performance, surpassing supervised baselines on challenging benchmarks such as Bossou, despite utilising no labelled data during training. This work underscores the potential of self-supervised learning in biodiversity monitoring and paves the way for scalable, non-invasive population studies.
Article 639
Title@2025-07-14 (1): Quantize-then-Rectify: Efficient VQ-VAE Training
Title: Quantize-then-Rectify: Efficient VQ-VAE Training | Quantize-then-Rectify: Effiziente VQ-VAE-Schulung | 量化-随后确定:有效的VQ-VAE培训 2507.10547v1 |
Authors (5): Borui Zhang, Qihang Rao, Wenzhao Zheng, Jie Zhou, Jiwen Lu
Visual tokenizers are pivotal in multimodal large models, acting as bridges between continuous inputs and discrete tokens. Nevertheless, training high-compression-rate VQ-VAEs remains computationally demanding, often necessitating thousands of GPU hours. This work demonstrates that a pre-trained VAE can be efficiently transformed into a VQ-VAE by controlling quantization noise within the VAE’s tolerance threshold. We present Quantize-then-Rectify (ReVQ), a framework leveraging pre-trained VAEs to enable rapid VQ-VAE training with minimal computational overhead. By integrating channel multi-group quantization to enlarge codebook capacity and a post rectifier to mitigate quantization errors, ReVQ compresses ImageNet images into at most 512 tokens while sustaining competitive reconstruction quality (rFID = 1.06). Significantly, ReVQ reduces training costs by over two orders of magnitude relative to state-of-the-art approaches: ReVQ finishes full training on a single NVIDIA 4090 in approximately 22 hours, whereas comparable methods require 4.5 days on 32 A100 GPUs. Experimental results show that ReVQ achieves superior efficiency-reconstruction trade-offs.
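To make the codebook-capacity argument concrete, here is a small sketch of group-wise quantization. It assumes "channel multi-group quantization" means splitting a latent vector's channels into groups, each with its own codebook; the helper names and toy codebooks are hypothetical and the post rectifier is not modelled.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )

def multi_group_quantize(vector, codebooks):
    """Quantize each channel group independently with its own codebook."""
    group_size = len(vector) // len(codebooks)
    indices, recon = [], []
    for g, codebook in enumerate(codebooks):
        chunk = vector[g * group_size:(g + 1) * group_size]
        k = nearest(codebook, chunk)
        indices.append(k)          # one discrete index per group
        recon.extend(codebook[k])  # quantized channels for this group
    return indices, recon
```

With G groups of K entries each, the joint code can take K**G distinct values while only G*K entries are stored, which is the sense in which grouping enlarges capacity in this sketch.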
Article 640
Title@2025-07-14 (1): Disentangling Neural Disjunctive Normal Form Models
Title: Disentangling Neural Disjunctive Normal Form Models | Entwirren neural disjunktiver Normalformmodelle | 分离神经分相正常格式模型 2507.10546v1 |
Authors (6): Kexin Gu Baugh, Vincent Perreault, Matthew Baugh, Luke Dickens, Katsumi Inoue, Alessandra Russo
Neural Disjunctive Normal Form (DNF) based models are powerful and interpretable approaches to neuro-symbolic learning and have shown promising results in classification and reinforcement learning settings without prior knowledge of the tasks. However, their performance is degraded by the thresholding of the post-training symbolic translation process. We show here that part of the performance degradation during translation is due to its failure to disentangle the learned knowledge represented in the form of the networks’ weights. We address this issue by proposing a new disentanglement method; by splitting nodes that encode nested rules into smaller independent nodes, we are able to better preserve the models’ performance. Through experiments on binary, multiclass, and multilabel classification tasks (including those requiring predicate invention), we demonstrate that our disentanglement method provides compact and interpretable logical representations for the neural DNF-based models, with performance closer to that of their pre-translation counterparts. Our code is available at https://github.com/kittykg/disentangling-ndnf-classification.
Article 641
Title@2025-07-14 (1): Fusing LLM Capabilities with Routing Data
Title: Fusing LLM Capabilities with Routing Data | LLM-Fähigkeiten mit Routing-Daten | Fusing LLM 带路标数据功能的Fusing LLM 功能 2507.10540v1 |
Authors (8): Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jiaxuan You
The rapid advancement of large language models (LLMs) has created a vibrant ecosystem of diverse architectures, each with unique strengths due to differences in design, training data, and objectives. However, most applications still rely on a single backend model, limiting coverage of capabilities and leading to inefficiencies in performance and token cost when tackling complex tasks. We highlight an underexploited opportunity: LLM routing data, produced when hosting platforms route diverse queries to different models, which can reveal comparative strengths across tasks. To address this, we propose FusionBench, a comprehensive routing benchmark covering 14 tasks across five domains with 20 open-source LLMs (8B to 671B parameters), capturing 103M tokens and summarizing reusable thought templates from top models. Building on this, we introduce FusionFactory, a systematic fusion framework with three levels: (1) query-level fusion, tailoring routers for each query using both direct responses and reasoning-augmented outputs; (2) thought-level fusion, leveraging abstract templates derived from top-performing LLMs’ answers to similar queries; and (3) model-level fusion, transferring capabilities between models via distillation, using top responses or highest judge scores as training data. Experiments show FusionFactory consistently outperforms the best individual LLM across all 14 benchmarks, with optimal fusion configurations varying by benchmark, demonstrating the value of systematic LLM fusion in harnessing complementary strengths and improving overall performance.
Article 642
Title@2025-07-14 (1): Graph World Model
Title: Graph World Model | Schaubild-Weltmodell | 世界模型 2507.10539v1 |
Authors (4): Tao Feng, Yexin Wu, Guanyu Lin, Jiaxuan You
World models (WMs) demonstrate strong capabilities in prediction, generation, and planning tasks. Existing WMs primarily focus on unstructured data and cannot leverage the ubiquitous structured data, often represented as graphs, in the digital world. While multiple graph foundation models have been proposed, they focus on graph learning tasks and cannot extend to diverse multi-modal data and interdisciplinary tasks. To address these challenges, we propose the Graph World Model (GWM), a world model that supports both unstructured and graph-structured states with multi-modal information and represents diverse tasks as actions. The core of a GWM is a generic message-passing algorithm to aggregate structured information, either over a unified multi-modal token space by converting multi-modal data into text (GWM-T) or a unified multi-modal embedding space by modality-specific encoders (GWM-E). Notably, GWM introduces action nodes to support diverse tasks, where action nodes are linked to other nodes via direct reference or similarity computation. Extensive experiments on six tasks from diverse domains, including multi-modal generation and matching, recommendation, graph prediction, multi-agent, retrieval-augmented generation, and planning and optimization, show that the same GWM outperforms or matches domain-specific baselines’ performance, benefits from multi-hop structures, and demonstrates strong zero-shot/few-shot capabilities on unseen new tasks. Our code for GWM is released at https://github.com/ulab-uiuc/GWM.
Article 643
Title@2025-07-14 (1): On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance
Title: On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance | Zur Performance der differenzierten privaten Optimierung mit Heavy-Tail-Klasse-Unwucht | 以重赛级的不平衡进行有区别的私人优化 2507.10536v1 |
Authors (3): Qiaoyue Tang, Alain Zhiyanov, Mathias Lécuyer
In this work, we analyze the optimization behaviour of common private learning optimization algorithms under heavy-tailed class-imbalanced distributions. We show that, in a stylized model, optimizing with Gradient Descent with differential privacy (DP-GD) suffers when learning low-frequency classes, whereas optimization algorithms that estimate second-order information do not. In particular, DP-AdamBC, which removes the DP bias from estimating loss curvature, is a crucial component for avoiding the ill-conditioning caused by heavy-tail class imbalance, and it empirically fits the data better, with $\approx8\%$ and $\approx5\%$ increases in training accuracy when learning the least frequent classes on controlled experiments and real data, respectively.
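For readers unfamiliar with DP-GD, the following sketch shows the usual clip-then-add-noise update. It is a generic illustration under assumed parameter names (`max_norm`, `noise_multiplier`), not the stylized model or the DP-AdamBC algorithm analyzed in the paper.

```python
import math
import random

def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_gd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng=random):
    """One DP-GD step: clip each example's gradient, sum, add Gaussian noise, average."""
    n = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # same noise scale for every coordinate
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Because the noise scale is identical across coordinates, directions with a small summed gradient have a lower signal-to-noise ratio; this gives some intuition for why rarely seen classes might be harder to fit, although the paper's own argument is about conditioning.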
Article 644
Title@2025-07-14 (1): Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Title: Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Begründung oder Erinnerung? Unzuverlässige Ergebnisse des Verstärkungslernens aufgrund von Datenkontamination | 理由或记忆化?由于数据污染而加强学习的不可靠结果 2507.10532v1 |
Authors (12): Mingqi Wu, Zhihao Zhang, Qiaole Dong, Zhiheng Xi, Jun Zhao, Senjie Jin, Xiaoran Fan, Yuhao Zhou, Yanwei Fu, Qin Liu, Songyang Zhang, Qi Zhang
The reasoning capabilities of large language models (LLMs) have been a longstanding focus of research. Recent works have further enhanced these capabilities using reinforcement learning (RL), with many new methods claiming significant improvements with minimal or no external supervision. Surprisingly, some studies even suggest that random or incorrect reward signals can enhance reasoning performance. However, these breakthroughs are mostly reported on the Qwen2.5 model family and evaluated on well-known benchmarks such as MATH-500, AMC, and AIME, while failing to achieve similar gains on other models like Llama, which warrants further investigation. Our analysis shows that although Qwen2.5 achieves strong mathematical reasoning performance, its pretraining on large-scale web corpora makes it vulnerable to data contamination in popular benchmarks. Consequently, results derived from these benchmarks may be unreliable. To address this, we introduce a generator that produces fully synthetic arithmetic problems of arbitrary length and difficulty, yielding a clean dataset we call RandomCalculation. Using these leakage-free datasets, we show that only accurate reward signals consistently improve performance, while noisy or incorrect signals do not. We advocate for evaluating RL methods on uncontaminated benchmarks and across diverse model families to ensure trustworthy conclusions.
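A synthetic arithmetic generator of the kind described can be sketched in a few lines. The operator set, the parenthesised chain format, and the function name below are assumptions; the actual RandomCalculation generator may differ.

```python
import random

# Operators used in this toy generator (an assumed set).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def make_problem(num_terms, max_value, rng):
    """Return (expression, answer) for a fully parenthesised chain of num_terms integers."""
    value = rng.randint(1, max_value)
    expr = str(value)
    for _ in range(num_terms - 1):
        op = rng.choice(sorted(OPS))
        operand = rng.randint(1, max_value)
        expr = f"({expr} {op} {operand})"
        value = OPS[op](value, operand)  # answer is tracked as the problem is built
    return expr, value
```

Length is controlled by `num_terms` and difficulty by `max_value`; since each problem is drawn fresh from a seeded `random.Random`, it cannot have appeared verbatim in a pretraining corpus except by chance.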
Article 645
Title@2025-07-14 (1): Expert-level validation of AI-generated medical text with scalable language models
Title: Expert-level validation of AI-generated medical text with scalable language models | Validierung von KI-generierten medizinischen Texten auf Expertenebene mit skalierbaren Sprachmodellen | 专家一级对AI产生的带有可缩放语言模型的可缩放语言模型的医学文本进行鉴定 2507.03152v2 |
Authors (27): Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador-Martinez, Eduardo Juan Perez Guerrero, Paola Naovi Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy D. Zandee van Rilland, Poonam Laxmappa Hosamani, Kevin R Keet, Minjoung Go, Evelyn Ling, David B. Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay S. Chaudhari
With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the “LM-as-judge” paradigm (an LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a self-supervised framework that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset containing 840 outputs annotated by physicians, following a physician-defined taxonomy of risk levels and error categories. Across 6 diverse medical tasks and 10 state-of-the-art LMs spanning open-source, proprietary, and medically adapted models, MedVAL fine-tuning significantly improves (p < 0.001) alignment with physicians on both seen and unseen tasks, increasing average F1 scores from 66% to 83%, with per-sample safety classification scores up to 86%. MedVAL improves the performance of even the best-performing proprietary LM (GPT-4o) by 8%. To support a scalable, risk-aware pathway towards clinical integration, we open-source the 1) codebase (https://github.com/StanfordMIMI/MedVAL), 2) MedVAL-Bench (https://huggingface.co/datasets/stanfordmimi/MedVAL-Bench), and 3) MedVAL-4B (https://huggingface.co/stanfordmimi/MedVAL-4B), the best-performing open-source LM. Our research provides the first evidence of LMs approaching expert-level validation ability for medical text.
Article 646
Title@2025-07-14 (1): Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Title: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation | Mixture-of-Recursions: Dynamische Rekursive Tiefen für adaptive Token-Level-Computation lernen | 混合流流流:学习适应调控级计算法的动态回流深度 2507.10524v1 |
Authors (11): Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to decrease prefill latency and memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.
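The interplay of a shared block and per-token recursion depths can be pictured with a toy loop. This is only a schematic: `depths` stands in for the router's output, `shared_block` is a hypothetical per-token function, and attention among the active tokens and KV caching are not modelled.

```python
def mixture_of_recursions(tokens, depths, shared_block, max_depth):
    """Apply one shared block repeatedly; each token exits at its own depth.

    Returns the final token states and the number of active tokens per step.
    """
    states = list(tokens)
    active_per_step = []
    for step in range(1, max_depth + 1):
        # Only tokens whose assigned depth reaches this step stay active.
        active = [i for i, d in enumerate(depths) if d >= step]
        active_per_step.append(len(active))
        for i in active:
            states[i] = shared_block(states[i])  # same parameters at every step
    return states, active_per_step
```

The shrinking `active_per_step` counts show where the savings come from in this picture: later recursion steps touch fewer tokens while reusing the same parameters.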
Article 647
Title@2025-07-14 (1): Ark: An Open-source Python-based Framework for Robot Learning
Title: Ark: An Open-source Python-based Framework for Robot Learning | Ark: Ein Open-Source-Python-basiertes Framework für das Roboterlernen | Ark:一个基于开放源码的机器人学习 Python 框架 2506.21628v2 |
Authors (13): Magnus Dierking, Christopher E. Mower, Sarthak Das, Huang Helong, Jiacheng Qiu, Cody Reading, Wei Chen, Huidong Liang, Huang Guowei, Jan Peters, Quan Xingyue, Jun Wang, Haitham Bou-Ammar
Robotics has made remarkable hardware strides, from DARPA’s Urban and Robotics Challenges to the first humanoid-robot kickboxing tournament, yet commercial autonomy still lags behind progress in machine learning. A major bottleneck is software: current robot stacks demand steep learning curves, low-level C/C++ expertise, fragmented tooling, and intricate hardware integration, in stark contrast to the Python-centric, well-documented ecosystems that propelled modern AI. We introduce ARK, an open-source, Python-first robotics framework designed to close that gap. ARK presents a Gym-style environment interface that allows users to collect data, preprocess it, and train policies using state-of-the-art imitation-learning algorithms (e.g., ACT, Diffusion Policy) while seamlessly toggling between high-fidelity simulation and physical robots. A lightweight client-server architecture provides networked publisher-subscriber communication, and optional C/C++ bindings ensure real-time performance when needed. ARK ships with reusable modules for control, SLAM, motion planning, system identification, and visualization, along with native ROS interoperability. Comprehensive documentation and case studies, from manipulation to mobile navigation, demonstrate rapid prototyping, effortless hardware swapping, and end-to-end pipelines that rival the convenience of mainstream machine-learning workflows. By unifying robotics and AI practices under a common Python umbrella, ARK lowers entry barriers and accelerates research and commercial deployment of autonomous robots.
Article 648
Title@2025-07-14 (1): Visual Test-time Scaling for GUI Agent Grounding
Title: Visual Test-time Scaling for GUI Agent Grounding | Visual Test-Time Scaling für GUI Agent Grounding | GUI 代理定位的视觉测试时间缩放 2505.00684v2 |
Authors (4): Tiange Luo, Lajanugen Logeswaran, Justin Johnson, Honglak Lee
We introduce RegionFocus, a visual test-time scaling approach for Vision Language Model Agents. Understanding webpages is challenging due to the visual complexity of GUI images and the large number of interface elements, making accurate action selection difficult. Our approach dynamically zooms in on relevant regions, reducing background clutter and improving grounding accuracy. To support this process, we propose an image-as-map mechanism that visualizes key landmarks at each step, providing a transparent action record and enabling the agent to effectively choose among action candidates. Even with a simple region selection strategy, we observe significant performance gains of 28+% on ScreenSpot-Pro and 24+% on WebVoyager benchmarks on top of two state-of-the-art open vision language model agents, UI-TARS and Qwen2.5-VL, highlighting the effectiveness of visual test-time scaling in interactive settings. We achieve a new state-of-the-art grounding performance of 61.6% on the ScreenSpot-Pro benchmark by applying RegionFocus to a Qwen2.5-VL-72B model. Our code will be released publicly at https://github.com/tiangeluo/RegionFocus.
Article 649
Title@2025-07-14 (1): A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation
Title: A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation | Unified View on Learning Unnormalized Distributions via Noise-Contrastive Assimation | 关于通过噪音 – – 关心 – – 估计来学习非正常分发的统一观点 2409.18209v2 |
Authors (3): J. Jon Ryu, Abhin Shah, Gregory W. Wornell
This paper studies a family of estimators based on noise-contrastive estimation (NCE) for learning unnormalized distributions. The main contribution of this work is to provide a unified perspective on various methods for learning unnormalized distributions, which have been independently proposed and studied in separate research communities, through the lens of NCE. This unified view offers new insights into existing estimators. Specifically, for exponential families, we establish the finite-sample convergence rates of the proposed estimators under a set of regularity assumptions, most of which are new.
Article 650
Title@2025-07-14 (1): Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing
Title: Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing | Verbesserte Offline-Kontext-Banditen mit Second-Order-Bounds: Wetten und Einfrieren | 改善有二阶边界的离线环境强盗:打赌和冻结 2502.10826v2 |
Authors (4): J. Jon Ryu, Jeongyeol Kwon, Benjamin Koppe, Kwang-Sung Jun
We consider off-policy selection and learning in contextual bandits, where the learner aims to select or train a reward-maximizing policy using data collected by a fixed behavior policy. Our contribution is two-fold. First, we propose a novel off-policy selection method that leverages a new betting-based confidence bound applied to an inverse propensity weight sequence. Our theoretical analysis reveals that this method achieves a significantly improved, variance-adaptive guarantee over prior work. Second, we propose a novel and generic condition on the optimization objective for off-policy learning that strikes a different balance between bias and variance. One special case, which we call freezing, tends to induce low variance, which is preferred in small-data regimes. Our analysis shows that it matches the best existing guarantees. In our empirical study, our selection method outperforms existing methods, and freezing exhibits improved performance in small-sample regimes.
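The inverse propensity weight sequence mentioned above can be illustrated with the standard inverse-propensity estimator of a policy's value. The data layout and function names are assumptions, and the betting-based confidence bound itself is not reproduced here.

```python
def ipw_weights(logged, target_policy):
    """Importance weights pi(a|x) / mu(a|x) for each logged interaction.

    logged: list of (context, action, reward, behavior_prob) tuples.
    target_policy: function (context, action) -> probability under the target policy.
    """
    return [target_policy(x, a) / p for x, a, _, p in logged]

def ipw_value(logged, target_policy):
    """Average of importance-weighted rewards: an estimate of the target policy's value."""
    weights = ipw_weights(logged, target_policy)
    return sum(w * r for w, (_, _, r, _) in zip(weights, logged)) / len(logged)
```

The weighted rewards `w * r` form the sequence on which a confidence bound would be computed; their variance grows when the target policy favors actions the behavior policy rarely took, which is why variance-adaptive guarantees matter.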
Article 651
Title@2025-07-14 (1): National level satellite-based crop field inventories in smallholder landscapes
Title: National level satellite-based crop field inventories in smallholder landscapes | Nationale satellitengestützte Feldbestände in Kleinbauernlandschaften | 国家一级基于卫星的小农地貌景观作物实地清单 2507.10499v1 |
Authors (8): Philippe Rufin, Pauline Lucie Hammer, Leon-Friedrich Thomas, Sá Nogueira Lisboa, Natasha Ribeiro, Almeida Sitoe, Patrick Hostert, Patrick Meyfroidt
The design of science-based policies to improve the sustainability of smallholder agriculture is challenged by a limited understanding of fundamental system properties, such as the spatial distribution of active cropland and field size. We integrate very high spatial resolution (1.5 m) Earth observation data and deep transfer learning to derive crop field delineations in complex agricultural systems at the national scale, while maintaining minimum reference data requirements and enhancing transferability. We provide the first national-level dataset of 21 million individual fields for Mozambique (covering ~800,000 km²) for 2023. Our maps separate active cropland from non-agricultural land use with an overall accuracy of 93% and balanced omission and commission errors. Field-level spatial agreement reached median intersection over union (IoU) scores of 0.81, advancing the state-of-the-art in large-area field delineation in complex smallholder systems. The active cropland maps capture fragmented rural regions with low cropland shares not yet identified in global land cover or cropland maps. These regions are mostly located in agricultural frontier regions which host 7-9% of the Mozambican population. Field size in Mozambique is very low overall, with half of the fields being smaller than 0.16 ha, and 83% smaller than 0.5 ha. Mean field size at aggregate spatial resolution (0.05°) is 0.32 ha, but it varies strongly across gradients of accessibility, population density, and net forest cover change. This variation reflects a diverse set of actors, ranging from semi-subsistence smallholder farms to medium-scale commercial farming, and large-scale farming operations. Our results highlight that field size is a key indicator relating to socio-economic and environmental outcomes of agriculture (e.g., food production, livelihoods, deforestation, biodiversity), as well as their trade-offs.
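The field-level agreement metric can be computed as follows. The sketch represents each field as a set of (row, col) pixels and assumes predicted and reference fields have already been paired; how the study matches fields is not stated in the abstract.

```python
from statistics import median

def iou(predicted, reference):
    """Intersection over union of two fields given as sets of (row, col) pixels."""
    union = predicted | reference
    if not union:
        return 0.0
    return len(predicted & reference) / len(union)

def median_iou(pairs):
    """Median IoU over a list of (predicted, reference) field pairs."""
    return median([iou(p, r) for p, r in pairs])
```

An IoU of 1.0 means the delineated field coincides exactly with the reference; a median of 0.81 therefore indicates that at least half of the evaluated fields overlap their reference by 81% or more of the combined area.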
Article 652
Title@2025-07-14 (1): Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing
Title: Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing | Split passiert: Mit Split Learning und Function Secret Sharing gegen fortgeschrittene Bedrohungen | 分化事件:通过分化学习和职能秘密分享来对抗先进威胁 2507.10494v1 |
Authors (3): Tanveer Khan, Mindaugas Budzys, Antonis Michalas
Split Learning (SL) splits a model into two distinct parts to help protect client data while enhancing Machine Learning (ML) processes. Though promising, SL has proven vulnerable to different attacks, thus raising concerns about how effective it may be in terms of data privacy. Recent works have shown promising results for securing SL through the use of a novel paradigm, named Function Secret Sharing (FSS), in which servers obtain shares of a function they compute and operate on a public input hidden with a random mask. However, these works fall short in addressing the growing number of attacks on SL. In SplitHappens, we expand the combination of FSS and SL to U-shaped SL. Similarly to other works, we are able to make use of the benefits of SL by reducing the communication and computational costs of FSS. However, a U-shaped SL provides a higher security guarantee than previous works, allowing a client to keep the labels of the training data secret, without having to share them with the server. Through this, we are able to generalize the security analysis of previous works and expand it to different attack vectors, such as modern model inversion attacks as well as label inference attacks. We tested our approach on two different convolutional neural networks and on different datasets. These experiments show the effectiveness of our approach in reducing the training time as well as the communication costs when compared to simply using FSS while matching prior accuracy.
Article 653
Title@2025-07-14 (1): On the Robustness Tradeoff in Fine-Tuning
Title: On the Robustness Tradeoff in Fine-Tuning | Über die Robustheit im Feintuning | 关于强健的决断 2503.14836v2 |
Authors (7): Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick McDaniel
Fine-tuning has become the standard practice for adapting pre-trained models to downstream tasks. However, the impact on model robustness is not well understood. In this work, we characterize the robustness-accuracy trade-off in fine-tuning. We evaluate the robustness and accuracy of fine-tuned models over 6 benchmark datasets and 7 different fine-tuning strategies. We observe a consistent trade-off between adversarial robustness and accuracy. Peripheral updates such as BitFit are more effective for simple tasks – over 75% above the average measured by the area under the Pareto frontiers on CIFAR-10 and CIFAR-100. In contrast, fine-tuning information-heavy layers, such as attention layers via Compacter, achieves a better Pareto frontier on more complex tasks – 57.5% and 34.6% above the average on Caltech-256 and CUB-200, respectively. Lastly, we observe that the robustness of fine-tuning against out-of-distribution data closely tracks accuracy. These insights emphasize the need for robustness-aware fine-tuning to ensure reliable real-world deployments.
Article 654
Title@2025-07-14 (1): BenchReAD: A systematic benchmark for retinal anomaly detection
Title: BenchReAD: A systematic benchmark for retinal anomaly detection | BenchReAD: Ein systematischer Benchmark für Netzhautanomaliendetektion | BenchReAD: 视视网膜异常现象探测系统基准 2507.10492v1 |
Authors (4): Chenyu Lian, Hong-Yu Zhou, Zhanli Hu, Jing Qin
Retinal anomaly detection plays a pivotal role in screening ocular and systemic diseases. Despite its significance, progress in the field has been hindered by the absence of a comprehensive and publicly available benchmark, which is essential for the fair evaluation and advancement of methodologies. Due to this limitation, previous anomaly detection work related to retinal images has been constrained by (1) a limited and overly simplistic set of anomaly types, (2) test sets that are nearly saturated, and (3) a lack of generalization evaluation, resulting in less convincing experimental setups. Furthermore, existing benchmarks in medical anomaly detection predominantly focus on one-class supervised approaches (training only with negative samples), overlooking the vast amounts of labeled abnormal data and unlabeled data that are commonly available in clinical practice. To bridge these gaps, we introduce a benchmark for retinal anomaly detection, which is comprehensive and systematic in terms of data and algorithm. Through categorizing and benchmarking previous methods, we find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers from significant drops in performance when encountering certain unseen anomalies. Inspired by the memory bank mechanisms in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate the performance degradation, establishing a new SOTA. The benchmark is publicly available at https://github.com/DopamineLcy/BenchReAD.
Article 655
Title@2025-07-14 (1): Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset
Title: Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset | Ermöglichen von Advanced Land Cover Analytics: Eine integrierte Datenextraktionspipeline für vorausschauende Modellierung mit dem Dynamic World Dataset | 扶持性先进土地覆盖分析分析:利用动态世界数据集进行预测模拟的综合数据提取管道 2410.09135v2 |
Authors (7): Victor Radermecker, Andrea Zanon, Nancy Thomas, Annita Vapsi, Saba Rahimi, Rama Ramakrishnan, Daniel Borrajo
Understanding land cover holds considerable potential for a myriad of practical applications, particularly as data accessibility transitions from being exclusive to governmental and commercial entities to now including the broader research community. Nevertheless, although the data is accessible to any community member interested in exploration, there exists a formidable learning curve and no standardized process for accessing, pre-processing, and leveraging the data for subsequent tasks. In this study, we democratize this data by presenting a flexible and efficient end-to-end pipeline for working with the Dynamic World dataset, a cutting-edge near-real-time land use/land cover (LULC) dataset. This includes a pre-processing and representation framework which tackles noise removal, efficient extraction of large amounts of data, and re-representation of LULC data in a format well suited for several downstream tasks. To demonstrate the power of our pipeline, we use it to extract data for an urbanization prediction problem and build a suite of machine learning models with excellent performance. This task is easily generalizable to the prediction of any type of land cover and our pipeline is also compatible with a series of other downstream tasks.
Article 656
Title@2025-07-14 (1): Overcoming catastrophic forgetting in neural networks
Title: Overcoming catastrophic forgetting in neural networks | Überwindung des katastrophalen Vergessens in neuronalen Netzwerken | 克服神经网络中的灾难性遗忘 2507.10485v1 |
Authors (5): Brandon Shuen Yi Loke, Filippo Quadri, Gabriel Vivanco, Maximilian Casagrande, Saúl Fenollosa
Catastrophic forgetting is the primary challenge that hinders continual learning, which refers to a neural network's ability to sequentially learn multiple tasks while retaining previously acquired knowledge. Elastic Weight Consolidation (EWC), a regularization-based approach inspired by synaptic consolidation in biological neural systems, has been used to overcome this problem. In this study, prior research is replicated and extended by evaluating EWC in supervised learning settings using the PermutedMNIST and RotatedMNIST benchmarks. Through systematic comparisons with L2 regularization and stochastic gradient descent (SGD) without regularization, we analyze how different approaches balance knowledge retention and adaptability. Our results confirm previous findings: EWC significantly reduces forgetting compared to naive training while slightly compromising learning efficiency on new tasks. Moreover, we investigate the impact of dropout regularization and varying hyperparameters, offering insights into the generalization of EWC across diverse learning scenarios. These results underscore EWC’s potential as a viable solution for lifelong learning in neural networks.
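The EWC regularizer is a quadratic penalty that pulls each parameter towards its value after the previous task, weighted by a diagonal Fisher information estimate. The sketch below uses plain lists and hypothetical argument names; treating the L2 baseline as the same penalty with uniform weights is an assumption about how that comparison is set up.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength):
    """(strength / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )

def l2_penalty(params, anchor_params, strength):
    """Same quadratic anchor with every parameter weighted equally (assumed baseline)."""
    return ewc_penalty(params, anchor_params, [1.0] * len(params), strength)
```

The total loss on a new task is the task loss plus this penalty. Parameters with a large Fisher value are held close to their old values, while those with a value near zero remain free to adapt, which is the retention versus adaptability balance the study measures.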
Article 657
Title@2025-07-14 (1): Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?
Title: Random Erasing vs. Model Inversion: A Promising Defense or a False Hope? | Zufällige Auslöschung gegen Modellumkehr: Eine vielversprechende Verteidigung oder eine falsche Hoffnung? | 随机反射与模型反射:有希望的防御还是虚幻的希望? 2409.01062v2 |
Authors (7): Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ira Assent, Alex Kot, Ngai-Man Cheung
Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored. In this work, we explore Random Erasing (RE), a technique traditionally used for improving model generalization under occlusion, and uncover its surprising effectiveness as a defense against MI attacks. Specifically, our novel feature space analysis shows that models trained with RE-images introduce a significant discrepancy between the features of MI-reconstructed images and those of the private data. At the same time, features of private images remain distinct from other classes and well-separated from different classification regions. These effects collectively degrade MI reconstruction quality and attack accuracy while maintaining reasonable natural accuracy. Furthermore, we explore two critical properties of RE: Partial Erasure and Random Location. Partial Erasure prevents the model from observing entire objects during training. We find this has a significant impact on MI, which aims to reconstruct entire objects. The Random Location of erasure plays a crucial role in achieving a strong privacy-utility trade-off. Our findings highlight RE as a simple yet effective defense mechanism that can be easily integrated with existing privacy-preserving techniques. Extensive experiments across 37 setups demonstrate that our method achieves state-of-the-art (SOTA) performance in the privacy-utility trade-off. The results consistently demonstrate the superiority of our defense over existing methods across different MI attacks, network architectures, and attack configurations. For the first time, we achieve a significant degradation in attack accuracy without a decrease in utility for some configurations.
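The two properties named above, Partial Erasure and Random Location, can be illustrated with a minimal sketch of Random Erasing on a 2-D image stored as nested lists. This is a generic illustration rather than the paper's training pipeline: the function name, the area and aspect-ratio ranges, the fill value, and the retry count are all assumed defaults.

```python
import random


def random_erase(image, area_frac=(0.02, 0.4), aspect=(0.3, 3.3), fill=0,
                 rng=random, attempts=10):
    """Return a copy of `image` with one randomly placed rectangle set to `fill`.

    The rectangle is always strictly smaller than the image (partial erasure)
    and its top-left corner is drawn uniformly (random location).
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(attempts):
        area = rng.uniform(*area_frac) * h * w
        ratio = rng.uniform(*aspect)
        eh = int(round((area * ratio) ** 0.5))  # erased height
        ew = int(round((area / ratio) ** 0.5))  # erased width
        if 0 < eh < h and 0 < ew < w:
            top = rng.randint(0, h - eh)
            left = rng.randint(0, w - ew)
            for r in range(top, top + eh):
                for c in range(left, left + ew):
                    out[r][c] = fill
            return out
    return out  # no valid rectangle was sampled; image returned unchanged


img = [[1] * 8 for _ in range(8)]
erased = random_erase(img, rng=random.Random(0))
```

Because the rectangle never covers the whole image, the model never sees a fully blank sample, and the original input is left unmodified.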
Article 658
Title@2025-07-14 (1): From BERT to Qwen: Hate Detection across architectures
Title: From BERT to Qwen: Hate Detection across architectures | Von BERT bis Qwen: Hasserkennung über Architekturen hinweg | 从BERT到Quw:跨结构的仇恨检测 2507.10468v1 |
Authors (3): Ariadna Mon, Saúl Fenollosa, Jon Lecumberri
Online platforms struggle to curb hate speech without over-censoring legitimate discourse. Early bidirectional transformer encoders made big strides, but the arrival of ultra-large autoregressive LLMs promises deeper context-awareness. Whether this extra scale actually improves practical hate-speech detection on real-world text remains unverified. Our study puts this question to the test by benchmarking both model families, classic encoders and next-generation LLMs, on curated corpora of online interactions for hate-speech detection (Hate or No Hate).
Article 659
Title@2025-07-14 (1): An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation
Title: An Interoperable Machine Learning Pipeline for Pediatric Obesity Risk Estimation | Eine interoperable Machine Learning Pipeline für die Abschätzung des Kinderleibsrisikos | 用于小儿产科风险估计的可互操作的机器学习管道 2412.10454v2 |
Authors (7): Hamed Fayyaz, Mehak Gupta, Alejandra Perez Ramirez, Claudine Jurkovitz, H. Timothy Bunnell, Thao-Ly T. Phan, Rahmatollah Beheshti
Reliable prediction of pediatric obesity can offer a valuable resource to providers, helping them engage in timely preventive interventions before the disease is established. Many efforts have been made to develop ML-based predictive models of obesity, and some studies have reported high predictive performances. However, no commonly used clinical decision support tool based on existing ML models currently exists. This study presents a novel end-to-end pipeline specifically designed for pediatric obesity prediction, which supports the entire process of data extraction, inference, and communication via an API or a user interface. While focusing only on routinely recorded data in pediatric electronic health records (EHRs), our pipeline uses a diverse expert-curated list of medical concepts to predict the 1-3 year risk of developing obesity. Furthermore, by using the Fast Healthcare Interoperability Resources (FHIR) standard in our design procedure, we specifically target facilitating low-effort integration of our pipeline with different EHR systems. In our experiments, we report the effectiveness of the predictive model as well as its alignment with the feedback from various stakeholders, including ML scientists, providers, health IT personnel, health administration representatives, and patient group representatives.
Article 660
Title@2025-07-14 (1): Discrimination-free Insurance Pricing with Privatized Sensitive Attributes
Title: Discrimination-free Insurance Pricing with Privatized Sensitive Attributes | Diskriminierungsfreie Versicherungspreise mit privatisierten sensiblen Attributen | 与私有化的敏感敏感属性挂钩的无歧视无歧视保险 2504.11775v2 |
Authors (3): Tianhe Zhang, Suhan Liu, Peng Shi
Fairness has emerged as a critical consideration in the landscape of machine learning algorithms, particularly as AI continues to transform decision-making across societal domains. To ensure that these algorithms are free from bias and do not discriminate against individuals based on sensitive attributes such as gender and race, the field of algorithmic bias has introduced various fairness concepts, along with methodologies to achieve these notions in different contexts. Despite the rapid advancement, not all sectors have embraced these fairness principles to the same extent. One specific sector that merits attention in this regard is insurance. Within the realm of insurance pricing, fairness is defined through a distinct and specialized framework. Consequently, achieving fairness according to established notions does not automatically ensure fair pricing in insurance. In particular, regulators are increasingly emphasizing transparency in pricing algorithms and imposing constraints on insurance companies on the collection and utilization of sensitive consumer attributes. These factors present additional challenges in the implementation of fairness in pricing algorithms. To address these complexities and comply with regulatory demands, we propose an efficient method for constructing fair models that are tailored to the insurance domain, using only privatized sensitive attributes. Notably, our approach ensures statistical guarantees, does not require direct access to sensitive attributes, and adapts to varying transparency requirements, addressing regulatory demands while ensuring fairness in insurance pricing.
Article 661
Title@2025-07-14 (1): RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening
Title: RAPNet: A Receptive-Field Adaptive Convolutional Neural Network for Pansharpening | RAPNet: Ein rezeptives, adaptives, konvolutionäres Neuralnetzwerk für Pansharpening | RAPNet: 泛码头受体-战地适应性革命神经网络 2507.10461v1 |
Authors (2): Tao Tang, Chengxu Yang
Pansharpening refers to the process of integrating a high resolution panchromatic (PAN) image with a lower resolution multispectral (MS) image to generate a fused product, which is pivotal in remote sensing. Despite the effectiveness of CNNs in addressing this challenge, they are inherently constrained by the uniform application of convolutional kernels across all spatial positions, overlooking local content variations. To overcome this issue, we introduce RAPNet, a new architecture that leverages content-adaptive convolution. At its core, RAPNet employs the Receptive-field Adaptive Pansharpening Convolution (RAPConv), designed to produce spatially adaptive kernels responsive to local feature context, thereby enhancing the precision of spatial detail extraction. Additionally, the network integrates the Pansharpening Dynamic Feature Fusion (PAN-DFF) module, which incorporates an attention mechanism to achieve an optimal balance between spatial detail enhancement and spectral fidelity. Comprehensive evaluations on publicly available datasets confirm that RAPNet delivers superior performance compared to existing approaches, as demonstrated by both quantitative metrics and qualitative assessments. Ablation analyses further substantiate the effectiveness of the proposed adaptive components.
Article 662
Title@2025-07-14 (1): Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Title: Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds | Poisson Midpoint-Methode für Log Concave Sampling: Jenseits der starken Fehler unteren Bounds | 日志集中取样的 Poisson 中点方法: 超越强误差, 下界 2506.07614v2 |
Authors (2): Rishikesh Srinivasan, Dheeraj Nagaraj
We study the problem of sampling from strongly log-concave distributions over $\mathbb{R}^d$ using the Poisson midpoint discretization (a variant of the randomized midpoint method) for overdamped/underdamped Langevin dynamics. We prove its convergence in the 2-Wasserstein distance ($W_2$), achieving a cubic speedup in dependence on the target accuracy ($\epsilon$) over the Euler-Maruyama discretization, surpassing existing bounds for randomized midpoint methods. Notably, in the case of underdamped Langevin dynamics, we demonstrate the complexity of $W_2$ convergence is much smaller than the complexity lower bounds for convergence in $L^2$ strong error established in the literature.
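For orientation, the objects this abstract refers to can be written out in standard notation. The block gives the overdamped Langevin diffusion, its Euler-Maruyama discretization (the baseline the speedup is measured against), and the 2-Wasserstein distance. The symbols $f$ (the negative log-density of the target), $\eta$ (step size) and $\xi_k$ are notation assumed here, and the Poisson midpoint scheme itself is not reproduced.

```latex
% Target: \pi(x) \propto e^{-f(x)} on \mathbb{R}^d, with f strongly convex.
\begin{align*}
  \mathrm{d}X_t &= -\nabla f(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t
    && \text{(overdamped Langevin dynamics)} \\
  x_{k+1} &= x_k - \eta\,\nabla f(x_k) + \sqrt{2\eta}\,\xi_k,
    \quad \xi_k \sim \mathcal{N}(0, I_d)
    && \text{(Euler-Maruyama step)} \\
  W_2(\mu, \nu) &= \Bigl( \inf_{\gamma \in \Gamma(\mu,\nu)}
    \mathbb{E}_{(X,Y)\sim\gamma} \lVert X - Y \rVert^2 \Bigr)^{1/2}
    && \text{(2-Wasserstein distance)}
\end{align*}
```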
Article 663
Title@2025-07-14 (1): TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models
Title: TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models | TaylorPODA: Eine Taylor Expansion-basierte Methode zur Verbesserung der Post-Hoc-Attributionen für Opaque-Modelle | 泰勒·泰勒:以扩大泰勒为基础的方法,改进不透明模式的后住房分配办法 2507.10643v1 |
Authors (4): Yuchi Tang, Iñaki Esnaola, Suzanne Mason, George Panoutsos
Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for quantifying the contribution of individual features. Building on the Taylor expansion framework introduced by Deng et al. (2024) to unify existing local attribution methods, we propose a rigorous set of postulates – “precision”, “federation”, and “zero-discrepancy” – to govern Taylor term-specific attribution. Guided by these postulates, we introduce TaylorPODA (Taylor expansion-derived imPortance-Order aDapted Attribution), which incorporates an additional “adaptation” property. This property enables alignment with task-specific goals, especially in post-hoc settings lacking ground-truth explanations. Empirical evaluations demonstrate that TaylorPODA achieves competitive results against baseline methods, providing principled and visualization-friendly explanations. This work represents a step toward the trustworthy deployment of opaque models by offering explanations with stronger theoretical grounding.
Article 664
Title@2025-07-14 (1): First-of-its-kind AI model for bioacoustic detection using a lightweight associative memory Hopfield neural network
Title: First-of-its-kind AI model for bioacoustic detection using a lightweight associative memory Hopfield neural network | First-of-its-Art-KI-Modell für die bioakustische Erkennung mit einem leichten assoziativen Speicher Hopfield neuronalen Netzwerk | 使用轻量级联合内存Hopfield神经网络进行生物声学探测的首类AI型AI型生物声学探测模型 2507.10642v1 |
Authors (2): Andrew Gascoyne, Wendy Lomas
A growing issue within conservation bioacoustics is the task of analysing the vast amount of data generated from the use of passive acoustic monitoring devices. In this paper, we present an alternative AI model which has the potential to help alleviate this problem. Our model formulation addresses the key issues encountered when using current AI models for bioacoustic analysis, namely: the limited training data available; the environmental impact, particularly the energy consumption and carbon footprint of training and implementing these models; and the associated hardware requirements. The model developed in this work uses associative memory via a transparent, explainable Hopfield neural network to store signals and detect similar signals, which can then be used to classify species. Training is rapid ($3$\,ms), as only one representative signal is required for each target sound within a dataset. The model is fast, taking only $5.4$\,s to pre-process and classify all $10384$ publicly available bat recordings on a standard Apple MacBook Air. The model is also lightweight, with a small memory footprint of $144.09$\,MB of RAM usage. Hence, the low computational demands make the model ideal for use on a variety of standard personal devices, with potential for deployment in the field via edge-processing devices. It is also competitively accurate, with up to $86\%$ precision on the dataset used to evaluate the model. In fact, we could not find a single case of disagreement between model and manual identification via expert field guides. Although a dataset of bat echolocation calls was chosen to demonstrate this first-of-its-kind AI model, trained on only two representative calls, the model is not species specific. In conclusion, we propose an equitable AI model that has the potential to be a game changer for fast, lightweight, sustainable, transparent, explainable and accurate bioacoustic analysis.
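To make the associative-memory idea concrete, here is a minimal sketch of a classical binary Hopfield network with Hebbian storage and asynchronous recall. It is a textbook illustration only; the paper's actual formulation, signal pre-processing, and classification rule may differ. The function names and the two toy patterns are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian storage: w[i][j] = (1/n) * sum over patterns of p[i] * p[j], zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its local field."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s


# Two hypothetical "representative signals", encoded as +/-1 vectors.
call_a = [1, 1, 1, 1, -1, -1, -1, -1]
call_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([call_a, call_b])

noisy_a = [-1] + call_a[1:]  # call_a with its first entry flipped
restored = recall(weights, noisy_a)
```

Storage is a single pass over the stored patterns, which matches the abstract's point that training needs only one representative signal per target sound.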
Article 665
Title@2025-07-14 (1): Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems
Title: Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems | Logic Layer Prompt Control Injection (LPCI): Eine neuartige Sicherheitslückenklasse in Agentensystemen | 逻辑层快速控制喷射(LPCI): 剂系统中的新安全脆弱程度类别 2507.10457v1 |
Authors (6): Hammad Atta, Ken Huang, Manish Bhatt, Kamal Ahmed, Muhammad Aziz Ul Haq, Yasir Mehmood
The integration of large language models (LLMs) into enterprise systems has created a new class of covert security vulnerabilities, particularly within logic-execution layers and persistent-memory contexts. In this paper, we introduce Logic-Layer Prompt Control Injection (LPCI), a novel attack category in which encoded, delayed, and conditionally triggered payloads are embedded in memory, vector stores, or tool outputs. These payloads can bypass conventional input filters and trigger unauthorised behaviour across sessions.
Article 666
Title@2025-07-14 (1): Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Title: Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Rollen Sie die Würfel & Blick, bevor Sie springen: Gehen über die kreativen Grenzen der Next-Token-Vorhersage | 跳跃前的骰子滚动和看一看:超越了次声预测的创造性极限 2504.15266v3 |
Authors (4): Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (like in wordplay, drawing analogies, or research) or (b) constructs new patterns (like in designing math problems or new proteins). In these tasks, we empirically and conceptually argue how next-token learning is myopic; multi-token approaches, namely teacherless training and diffusion models, comparatively excel in producing diverse and original output. Secondly, to elicit randomness without hurting coherence, we find that injecting noise at the input layer (dubbed seed-conditioning) works surprisingly as well as (and in some conditions, better than) temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and temperature sampling. We make part of the code available under https://github.com/chenwu98/algorithmic-creativity
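For reference, the temperature sampling baseline mentioned above can be sketched as follows. This shows only the standard output-layer technique and is not the paper's seed-conditioning method. The function names and example logits are assumptions.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 2.0)
token = sample_token(logits, 1.0, rng=random.Random(0))
```

Raising the temperature spreads probability mass over more tokens, which increases diversity at the potential cost of coherence. The abstract's seed-conditioning aims to obtain randomness without that cost.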
Article 667
Title@2025-07-14 (1): Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data
Title: Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data | Nicht austauschbare konforme Vorhersagen mit optimalem Verkehr: Umschaltung von Verteilungsverschiebungen mit unmarkierten Daten | 采用最佳运输方式的非正规非正式预测:用无标签数据处理分配变化 2507.10425v1 |
Authors (2): Alvaro H. C. Correia, Christos Louizos
Conformal prediction is a distribution-free uncertainty quantification method that has gained popularity in the machine learning community due to its finite-sample guarantees and ease of use. Its most common variant, dubbed split conformal prediction, is also computationally efficient as it boils down to collecting statistics of the model predictions on some calibration data not yet seen by the model. Nonetheless, these guarantees only hold if the calibration and test data are exchangeable, a condition that is difficult to verify and often violated in practice due to so-called distribution shifts. The literature is rife with methods to mitigate the loss in coverage in this non-exchangeable setting, but these methods require some prior information on the type of distribution shift to be expected at test time. In this work, we study this problem via a new perspective, through the lens of optimal transport, and show that it is possible to estimate the loss in coverage and mitigate it in case of distribution shift.
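The split conformal procedure described above reduces to taking an order statistic of calibration scores. A minimal sketch, assuming absolute-residual nonconformity scores for regression, is given below. The function names and toy scores are hypothetical, and the optimal-transport correction proposed in the paper is not shown.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Under exchangeability, intervals built from this threshold cover the
    true label with probability at least 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_interval(point_pred, q):
    """Symmetric interval for absolute-residual scores |y - prediction|."""
    return (point_pred - q, point_pred + q)


# Hypothetical calibration residuals |y_i - f(x_i)| on held-out data.
scores = [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.7, 0.6, 0.8]
q = conformal_threshold(scores, alpha=0.25)
interval = prediction_interval(2.0, q)
```

The guarantee depends on the calibration and test scores being exchangeable. Under distribution shift the same threshold can under-cover, which is the coverage loss the abstract proposes to estimate and mitigate.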
Article 668
Title@2025-07-14 (1): SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning
Title: SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning | SentiDrop: Ein Multi Modal Machine Learning Modell zur Vorhersage von Ausfällen im Fernunterricht | SentiDROp:用于预测远程学习辍学的多模式机械学习模式 2507.10421v1 |
Authors (3): Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui
School dropout is a serious problem in distance learning, where early detection is crucial for effective intervention and student perseverance. Predicting student dropout using available educational data is a widely researched topic in learning analytics. Our partner’s distance learning platform highlights the importance of integrating diverse data sources, including socio-demographic data, behavioral data, and sentiment analysis, to accurately predict dropout risks. In this paper, we introduce a novel model that combines sentiment analysis of student comments using the Bidirectional Encoder Representations from Transformers (BERT) model with socio-demographic and behavioral data analyzed through Extreme Gradient Boosting (XGBoost). We fine-tuned BERT on student comments to capture nuanced sentiments, which were then merged with key features selected using feature importance techniques in XGBoost. Our model was tested on unseen data from the next academic year, achieving an accuracy of 84\%, compared to 82\% for the baseline model. Additionally, the model demonstrated superior performance in other metrics, such as precision and F1-score. The proposed method could be a vital tool in developing personalized strategies to reduce dropout rates and encourage student perseverance.
Article 669
Title@2025-07-14 (1): Multiple Choice Learning of Low Rank Adapters for Language Modeling
Title: Multiple Choice Learning of Low Rank Adapters for Language Modeling | Multiple Choice-Lernen von Low-Rank-Adaptern für die Sprachmodellierung | 低级别语言建模适应者多选择学习 2507.10419v1 |
Authors (7): Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez
We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the Winner-Takes-All (WTA) loss to efficiently handle ambiguity through Low-Rank Adaptation (LoRA). We provide a theoretical interpretation of applying Multiple Choice Learning to Language Modeling, assuming the data is generated from a mixture of distributions. To illustrate the proposed approach, we use data sampled from mixtures of Markov chains. We then demonstrate with extensive experiments on real-world visual and audio captioning tasks that our method achieves high diversity and relevance in generated outputs.
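The Winner-Takes-All idea behind Multiple Choice Learning can be shown with a minimal sketch: of several hypotheses, only the one closest to the target incurs loss (and would receive gradient). The scalar predictions and squared error here are simplifying assumptions. In LoRA-MCL the hypotheses would correspond to adapter-specific language-model losses, which are not reproduced.

```python
def wta_loss(predictions, target):
    """Winner-Takes-All: return (loss of the best hypothesis, its index)."""
    losses = [(p - target) ** 2 for p in predictions]
    winner = min(range(len(losses)), key=losses.__getitem__)
    return losses[winner], winner


# Three hypothetical hypotheses for an ambiguous target.
loss, winner = wta_loss([0.0, 2.0, 5.0], target=2.5)
```

Because only the winner is updated, different hypotheses are free to specialize on different plausible outcomes instead of all regressing to their average.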
Article 670
Title@2025-07-14 (1): Anticipating the Selectivity of Cyclization Reaction Pathways with Neural Network Potentials
Title: Anticipating the Selectivity of Cyclization Reaction Pathways with Neural Network Potentials | Die Selektivität von Zyklisierungsreaktionspfaden mit neuralen Netzwerkpotentialen antizipieren | 预测具有神经网络潜力的循环反应路径的选择性 2507.10400v1 |
Authors (4): Nicholas Casetti, Dylan Anstine, Olexandr Isayev, Connor W. Coley
Reaction mechanism search tools have demonstrated the ability to provide insights into likely products and rate-limiting steps of reacting systems. However, reactions involving several concerted bond changes - as can be found in many key steps of natural product synthesis - can complicate the search process. To mitigate these complications, we present a mechanism search strategy particularly suited to help expedite exploration of an exemplary family of such complex reactions, cyclizations. We provide a cost-effective strategy for identifying relevant elementary reaction steps by combining graph-based enumeration schemes and machine learning techniques for intermediate filtering. Key to this approach is our use of a neural network potential (NNP), AIMNet2-rxn, for computational evaluation of each candidate reaction pathway. In this article, we evaluate the NNP’s ability to estimate activation energies, demonstrate the correct anticipation of stereoselectivity, and recapitulate complex enabling steps in natural product synthesis.
Article 671
Title@2025-07-14 (1): SEAL: Towards Safe Autonomous Driving via Skill-Enabled Adversary Learning for Closed-Loop Scenario Generation
Title: SEAL: Towards Safe Autonomous Driving via Skill-Enabled Adversary Learning for Closed-Loop Scenario Generation | SEAL: Auf dem Weg zu einem sicheren autonomen Fahren durch qualifikationsfähiges, gewinnbringendes Lernen für die Closed-Loop-Szenario-Erzeugung | SEAL:通过技能-有技能的对抗性学习实现安全自主驾驶,促进闭路电视假想一代人的安全自主驾驶 2409.10320v3 |
Authors (4): Benjamin Stoler, Ingrid Navarro, Jonathan Francis, Jean Oh
Verification and validation of autonomous driving (AD) systems and components is of increasing importance as such technology becomes more prevalent in the real world. Safety-critical scenario generation is a key approach to robustify AD policies through closed-loop training. However, existing approaches for scenario generation rely on simplistic objectives, resulting in overly-aggressive or non-reactive adversarial behaviors. To generate diverse adversarial yet realistic scenarios, we propose SEAL, a scenario perturbation approach which leverages learned objective functions and adversarial, human-like skills. SEAL-perturbed scenarios are more realistic than SOTA baselines, improving ego task success by more than 20% across real-world, in-distribution, and out-of-distribution scenarios. To facilitate future research, we release our code and tools: https://github.com/cmubig/SEAL
Article 672
Title@2025-07-14 (1): Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems
Title: Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems | Bypassing LLM Guardrails: Eine empirische Analyse von Evasionsangriffen gegen prompte Injektions- und Jailbreak-Detektionssysteme | 绕过LLM 护卫车:对攻击即时射入和越狱侦察系统的逃难攻击经验分析 2504.11168v3 |
Authors (5): William Hackett, Lewis Birch, Stefan Trawicki, Neeraj Suri, Peter Garraghan
Large Language Model (LLM) guardrail systems are designed to protect against prompt injection and jailbreak attacks. However, they remain vulnerable to evasion techniques. We demonstrate two approaches for bypassing LLM prompt injection and jailbreak detection systems: traditional character injection methods and algorithmic Adversarial Machine Learning (AML) evasion techniques. Through testing against six prominent protection systems, including Microsoft’s Azure Prompt Shield and Meta’s Prompt Guard, we show that both methods can be used to evade detection while maintaining adversarial utility, achieving in some instances up to 100% evasion success. Furthermore, we demonstrate that adversaries can enhance Attack Success Rates (ASR) against black-box targets by leveraging word importance rankings computed by offline white-box models. Our findings reveal vulnerabilities within current LLM protection mechanisms and highlight the need for more robust guardrail systems.
Article 673
Title@2025-07-14 (1): Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors
Title: Deep Learning Accelerated Quantum Transport Simulations in Nanoelectronics: From Break Junctions to Field-Effect Transistors | Deep Learning Beschleunigte Quantentransportsimulationen in der Nanoelektronik: Von Break Junctions zu Field-Effect Transistors | 纳米电子中的深度学习加速量子传输模拟:从断裂交叉点到实地影响晶体体体 2411.08800v3 |
Authors (7): Jijie Zou, Zhanghao Zhouyin, Dongying Lin, Yike Huang, Linfeng Zhang, Shimin Hou, Qiangqiang Gu
Quantum transport simulations are essential for understanding and designing nanoelectronic devices, yet the long-standing trade-off between accuracy and computational efficiency has limited their practical applications. We present DeePTB-NEGF, an integrated framework combining deep learning tight-binding Hamiltonian prediction with non-equilibrium Green’s Function methodology to enable accurate quantum transport simulations in open boundary conditions with 2-3 orders of magnitude acceleration. We demonstrate DeePTB-NEGF through two challenging applications: comprehensive break junction simulations with over $10^4$ snapshots, showing excellent agreement with experimental conductance histograms; and carbon nanotube field-effect transistors (CNT-FET) at experimental dimensions, reproducing measured transfer characteristics for a 41 nm channel CNT-FET ($\sim 8000$ atoms, $3\times10^4$ orbitals) and predicting zero-bias transmission spectra for a 180 nm CNT ($\sim 3\times 10^4$ atoms, $10^5$ orbitals), showcasing the framework’s capability for large-scale device simulations. Our systematic studies across varying geometries confirm the necessity of simulating realistic experimental structures for precise predictions. DeePTB-NEGF bridges the longstanding gap between first-principles accuracy and computational efficiency, providing a scalable tool for high-throughput and large-scale quantum transport simulations that enables previously inaccessible nanoscale device investigations.
Article 674
Title@2025-07-14 (1): Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model
Title: Extracting Important Tokens in E-Commerce Queries with a Tag Interaction-Aware Transformer Model | Extrahieren wichtiger Token in E-Commerce Abfragen mit einem Tag Interaction-Aware Transformer Modell | 使用标签互动软件变换模型在电子商务查询中提取重要调量 2507.10385v1 |
Authors (7): Md. Ahsanul Kabir, Mohammad Al Hasan, Aritra Mandal, Liyang Hao, Ishita Khan, Daniel Tunkelang, Zhe Wu
The major task of any e-commerce search engine is to retrieve the most relevant inventory items, which best match the user intent reflected in a query. This task is non-trivial for many reasons, including ambiguous queries, misaligned vocabulary between buyers and sellers, and queries that are over- or under-constrained by the presence of too many or too few tokens. To address these challenges, query reformulation is used, which modifies a user query through token dropping, replacement, or expansion, with the objective of bridging the semantic gap between query tokens and users’ search intent. Early methods of query reformulation mostly used statistical measures derived from token co-occurrence frequencies in selected user sessions with clicks or purchases. In recent years, supervised deep learning approaches, specifically transformer-based neural language models or sequence-to-sequence models, have been used for the query reformulation task. However, these models do not utilize the semantic tags of a query token, which are significant for capturing the user intent of an e-commerce query. In this work, we pose query reformulation as a token classification task and solve it by designing a dependency-aware transformer-based language model, TagBERT, which makes use of the semantic tags of a token to learn superior query phrase embeddings. Experiments on large, real-life e-commerce datasets show that TagBERT outperforms a plethora of competing models, including BERT, eBERT, and a sequence-to-sequence transformer model, on the important token classification task.
Article 675
Title@2025-07-14 (1): Dynamical stability for dense patterns in discrete attractor neural networks
Title: Dynamical stability for dense patterns in discrete attractor neural networks | Dynamische Stabilität für dichte Muster in diskreten neuronalen Attraktorennetzen | 离散吸引性神经网络中密度型态动态稳定的动态稳定 2507.10383v1 |
Authors (2): Uri Cohen, Máté Lengyel
Neural networks storing multiple discrete attractors are canonical models of biological memory. Previously, the dynamical stability of such networks could only be guaranteed under highly restrictive conditions. Here, we derive a theory of the local stability of discrete fixed points in a broad class of networks with graded neural activities and in the presence of noise. By directly analyzing the bulk and outliers of the Jacobian spectrum, we show that all fixed points are stable below a critical load that is distinct from the classical \textit{critical capacity} and depends on the statistics of neural activities in the fixed points as well as the single-neuron activation function. Our analysis highlights the computational benefits of threshold-linear activation and sparse-like patterns.
Article 676
Title@2025-07-14 (1): Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis
Title: Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis | Nutzung von RAG-LLMs für Simulation und Analyse der urbanen Mobilität | 为城市流动模拟和分析利用RAG-LLMs进行城市流动模拟和分析 2507.10382v1 |
Authors (4): Yue Ding, Conor McCarthy, Kevin O’Shea, Mingming Liu
With the rise of smart mobility and shared e-mobility services, numerous advanced technologies have been applied to this field. Cloud-based traffic simulation solutions have flourished, offering increasingly realistic representations of the evolving mobility landscape. LLMs have emerged as pioneering tools, providing robust support for various applications, including intelligent decision-making, user interaction, and real-time traffic analysis. As user demand for e-mobility continues to grow, delivering comprehensive end-to-end solutions has become crucial. In this paper, we present a cloud-based, LLM-powered shared e-mobility platform, integrated with a mobile application for personalized route recommendations. The optimization module is evaluated based on travel time and cost across different traffic scenarios. Additionally, the LLM-powered RAG framework is evaluated at the schema level for different users, using various evaluation methods. Schema-level RAG with XiYanSQL achieves an average execution accuracy of 0.81 on system operator queries and 0.98 on user queries.
Article 677
Title@2025-07-14 (1): Improving Remote Sensing Classification using Topological Data Analysis and Convolutional Neural Networks
Title: Improving Remote Sensing Classification using Topological Data Analysis and Convolutional Neural Networks | Verbesserung der Klassifikation der Fernerkundung mittels topologischer Datenanalyse und konvolutionärer neuraler Netzwerke | 利用地形数据分析和进化神经网络改进遥感分类 2507.10381v1 |
Authors (1): Aaryam Sharma
Topological data analysis (TDA) is a relatively new field that is gaining rapid adoption due to its robustness and ability to effectively describe complex datasets by quantifying geometric information. In imaging contexts, TDA typically models data as filtered cubical complexes from which we can extract discriminative features using persistent homology. Meanwhile, convolutional neural networks (CNNs) have been shown to be biased towards texture-based local features. To address this limitation, we propose a TDA feature engineering pipeline and a simple method to integrate topological features with deep learning models on remote sensing classification. Our method improves the performance of a ResNet18 model on the EuroSAT dataset by 1.44%, achieving 99.33% accuracy, which surpasses all previously reported single-model accuracies, including those with larger architectures, such as ResNet50 (2x larger) and XL Vision Transformers (197x larger). We additionally show that our method’s accuracy is 1.82% higher than our ResNet18 baseline on the RESISC45 dataset. To our knowledge, this is the first application of TDA features in satellite scene classification with deep learning. This demonstrates that TDA features can be integrated with deep learning models, even on datasets without explicit topological structures, thereby increasing the applicability of TDA. A clean implementation of our method will be made publicly available upon publication.
Article 678
Title@2025-07-14 (1): EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration
Title: EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration | EVOLvE: Bewertung und Optimierung von LLMs für In-Context Exploration | EVOLvE: 评估和优化用于内衣探索的LMs LMs 2410.06238v2 |
Authors (7): Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen
Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs’ (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs’ performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during inference; and through algorithm distillation via in-context demonstrations and fine-tuning, using synthetic data generated from these algorithms. Impressively, these techniques allow us to achieve superior exploration performance with smaller models, surpassing larger models on various tasks. We conduct an extensive ablation study to shed light on various factors, such as task difficulty and data representation, that influence the efficiency of LLM exploration. Additionally, we conduct a rigorous analysis of the LLM’s exploration efficiency using the concept of regret, linking its ability to explore to the model size and underlying algorithm.
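The abstract does not name the exploration algorithms it draws on. As an illustration only, the sketch below runs UCB1, a classic algorithm for context-free bandits, on Bernoulli arms. The function name `ucb1`, the arm means, and the horizon are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled.

    arm_means: true success probabilities (hypothetical, hidden from the learner).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull every arm once
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = ucb1([0.2, 0.8], horizon=2000)
```

With a large gap between the arms, the better arm should receive the large majority of pulls; regret is the reward lost on the remaining ones.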
Article 679
Title@2025-07-14 (1): Test-Time Canonicalization by Foundation Models for Robust Perception
Title: Test-Time Canonicalization by Foundation Models for Robust Perception | Test-Time Canonicalization durch Foundation Models für robuste Wahrnehmung | 强力感知基础模型的试验时罐化 2507.10375v1 |
Authors (4): Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash
Real-world visual perception requires invariance to diverse transformations, yet current methods rely heavily on specialized architectures or training on predefined augmentations, limiting generalization. We propose FOCAL, a test-time, data-driven framework that achieves robust perception by leveraging internet-scale visual priors from foundation models. By generating and optimizing candidate transformations toward visually typical, “canonical” views, FOCAL enhances robustness without re-training or architectural changes. Our experiments demonstrate improved robustness of CLIP and SAM across challenging transformations, including 2D/3D rotations, illumination shifts (contrast and color), and day-night variations. We also highlight potential applications in active vision. Our approach challenges the assumption that transform-specific training is necessary, instead offering a scalable path to invariance. Our code is available at: https://github.com/sutkarsh/focal.
Article 680
Title@2025-07-14 (1): Enhanced DeepONet for 1-D consolidation operator learning: an architectural investigation
Title: Enhanced DeepONet for 1-D consolidation operator learning: an architectural investigation | Verbesserte DeepONet für 1-D-Konsolidierungsoperator Lernen: eine architektonische Untersuchung | 1D整合操作员学习的强化深水卫星:建筑调查 2507.10368v1 |
Authors (3): Yongjin Choi, Chenying Liu, Jorge Macedo
Deep Operator Networks (DeepONets) have emerged as a powerful surrogate modeling framework for learning solution operators in PDE-governed systems. While their use is expanding across engineering disciplines, applications in geotechnical engineering remain limited. This study systematically evaluates several DeepONet architectures for the one-dimensional consolidation problem. We initially consider three architectures: a standard DeepONet with the coefficient of consolidation embedded in the branch net (Models 1 and 2), and a physics-inspired architecture with the coefficient embedded in the trunk net (Model 3). Results show that Model 3 outperforms the standard configurations (Models 1 and 2) but still has limitations when the target solution (excess pore pressures) exhibits significant variation. To overcome this limitation, we propose a Trunknet Fourier feature-enhanced DeepONet (Model 4) that addresses the identified limitations by capturing rapidly varying functions. All proposed architectures achieve speedups ranging from 1.5 to 100 times over traditional explicit and implicit solvers, with Model 4 being the most efficient. Larger computational savings are expected for more complex systems than the explored 1D case, which is promising. Overall, the study highlights the potential of DeepONets to enable efficient, generalizable surrogate modeling in geotechnical applications, advancing the integration of scientific machine learning in geotechnics, which is at an early stage.
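The abstract does not spell out how the Fourier features enter the trunk net. A common form of Fourier feature mapping sends a coordinate x to cosines and sines of 2πbx over a set of frequencies b, which gives a network direct access to rapidly varying components. The sketch below shows only that mapping; the function name and the frequency values are assumptions for illustration, not the configuration of Model 4.

```python
import math


def fourier_features(x, freqs):
    """Map a scalar coordinate x to [cos(2*pi*b*x)..., sin(2*pi*b*x)...].

    freqs: hypothetical list of frequencies b chosen for the example.
    """
    cos_part = [math.cos(2.0 * math.pi * b * x) for b in freqs]
    sin_part = [math.sin(2.0 * math.pi * b * x) for b in freqs]
    return cos_part + sin_part


# A trunk net would consume these features in place of the raw coordinate.
features = fourier_features(0.25, [1.0, 2.0])
```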
Article 681
Title@2025-07-14 (1): HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong
Title: HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong | HKGAI-V1: Auf dem Weg zu einem regionalen Souveränen Großsprachenmodell für Hongkong | HKGAI-V1:为香港建立区域主权大语言模式 2507.11502v1 |
Authors (4): Sirui Han, Junqi Zhu, Ruiyuan Zhang, Yike Guo
This paper presents the development of HKGAI-V1, a foundational sovereign large language model (LLM), developed as part of an initiative to establish value-aligned AI infrastructure specifically tailored for Hong Kong. Addressing the region’s unique multilingual environment (Cantonese, Mandarin, and English), its distinct socio-legal context under the “one country, two systems” framework, and specific local cultural and value considerations, the model is built upon the DeepSeek architecture and systematically aligned with regional norms through a multifaceted full parameter fine-tuning process. It is further integrated with a retrieval-augmented generation (RAG) system to ensure timely and factually grounded information access. The core contribution lies in the design and implementation of a comprehensive, region-specific AI alignment and safety framework, demonstrated through two key achievements: 1) the successful development of HKGAI-V1 itself, which outperforms general-purpose models in handling Hong Kong-specific culturally sensitive queries and embodies a “governance-embedded” approach to digital sovereignty, empowering Hong Kong to exercise control over AI applications in critical sectors including public services, legal systems, and education; and 2) the development of the proprietary Adversarial HK Value Benchmark, a rigorous tool for evaluating model alignment with local ethical and legal standards under challenging conditions. By documenting these achievements, the paper provides not only a technological artifact but also a replicable blueprint for developing advanced, regionally focused AI systems deeply rooted in their local identities.
Article 682
Title@2025-07-14 (1): SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications
Title: SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications | SENSOR: Ein ML-erweitertes Online-Annotations-Tool, um Datenschutz-Bedenken aus User Reviews in Social-Media-Anwendungen zu enthüllen | SENSOR:一个ML-加强在线说明工具,以从社会-媒体应用中的用户审查中发现隐私问题。 2507.10640v1 |
Authors (5): Labiba Farah, Mohammad Ridwan Kabir, Shohel Ahmed, MD Mohaymen Ul Anam, Md. Sakibul Islam
The widespread use of social media applications has raised significant privacy concerns, often highlighted in user reviews. These reviews also provide developers with valuable insights into improving apps by addressing issues and introducing better features. However, the sheer volume and nuanced nature of reviews make manual identification and prioritization of privacy-related concerns challenging for developers. Previous studies have developed software utilities to automatically classify user reviews as privacy-relevant, privacy-irrelevant, bug reports, feature requests, etc., using machine learning. Notably, there is a lack of focus on classifying reviews specifically as privacy-related feature requests, privacy-related bug reports, or privacy-irrelevant. This paper introduces SENtinel SORt (SENSOR), an automated online annotation tool designed to help developers annotate and classify user reviews into these categories. For automating the annotation of such reviews, this paper introduces the annotation model, GRACE (GRU-based Attention with CBOW Embedding), using Gated Recurrent Units (GRU) with Continuous Bag of Words (CBOW) and Attention mechanism. Approximately 16000 user reviews from seven popular social media apps on Google Play Store, including Instagram, Facebook, WhatsApp, Snapchat, X (formerly Twitter), Facebook Lite, and Line were analyzed. Two annotators manually labelled the reviews, achieving a Cohen’s Kappa value of 0.87, ensuring a labeled dataset with high inter-rater agreement for training machine learning models. Among the models tested, GRACE demonstrated the best performance (macro F1-score: 0.9434, macro ROC-AUC: 0.9934, and accuracy: 95.10%) despite class imbalance. SENSOR demonstrates significant potential to assist developers with extracting and addressing privacy-related feature requests or bug reports from user reviews, enhancing user privacy and trust.
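For readers unfamiliar with the agreement statistic quoted above, Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance from their label frequencies. The sketch below computes it on a toy example; the labels are invented for illustration and are not the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations over four reviews.
annotator_1 = ["bug", "bug", "feature", "irrelevant"]
annotator_2 = ["bug", "feature", "feature", "irrelevant"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (p_o = 0.75) and p_e = 5/16, so kappa = 7/11, about 0.64.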
Article 683
Title@2025-07-14 (1): Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation
Title: Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation | Durchschnittlicher Kalibrierungsfehler: Ein differenzierbarer Verlust für verbesserte Zuverlässigkeit in der Bildsegmentierung | 平均校准误差:图像分割法可靠性提高的可区别损失 2403.06759v4 |
Authors (4): Theodore Barfoot, Luis Garcia-Peraza-Herrera, Ben Glocker, Tom Vercauteren
Deep neural networks for medical image segmentation often produce overconfident results misaligned with empirical observations. Such miscalibration challenges their clinical translation. We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality. We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches. Our work also introduces the concept of dataset reliability histograms, which generalise standard reliability diagrams for refined visual assessment of calibration in semantic segmentation aggregated at the dataset level. Using mL1-ACE, we reduce average and maximum calibration error by 45% and 55% respectively, maintaining a Dice score of 87% on the BraTS 2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
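A minimal sketch of one plausible reading of the metric, assuming the usual definition of average calibration error: for each class, predicted probabilities are hard-binned, the absolute gap between mean confidence and empirical frequency is taken per non-empty bin, the gaps are averaged over bins, and the result is averaged over classes. The function names are hypothetical, and this plain-Python version only evaluates the quantity; it does not demonstrate the differentiability the abstract describes, and the authors' repository remains the reference.

```python
def class_ace(probs, targets, n_bins=10):
    """Average calibration error for one class with equal-width hard bins."""
    conf_sum = [0.0] * n_bins   # summed predicted probability per bin
    hit_sum = [0.0] * n_bins    # summed 0/1 targets per bin
    count = [0] * n_bins
    for p, y in zip(probs, targets):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        conf_sum[b] += p
        hit_sum[b] += y
        count[b] += 1
    gaps = [
        abs(conf_sum[b] - hit_sum[b]) / count[b]
        for b in range(n_bins)
        if count[b] > 0
    ]
    return sum(gaps) / len(gaps)


def marginal_l1_ace(prob_rows, labels, n_bins=10):
    """Mean of per-class ACE; prob_rows[i][c] is P(class c) for pixel i."""
    n_classes = len(prob_rows[0])
    total = 0.0
    for c in range(n_classes):
        probs = [row[c] for row in prob_rows]
        targets = [1.0 if y == c else 0.0 for y in labels]
        total += class_ace(probs, targets, n_bins)
    return total / n_classes


# Four hypothetical pixels, two classes.
pixels = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
score = marginal_l1_ace(pixels, [0, 0, 1, 1])
```

In this toy case every pixel is classified correctly but with 0.9 or 0.8 confidence, so each class has per-bin gaps of 0.1 and 0.2 and the score is 0.15.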
Article 684
Title@2025-07-14 (1): TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting
Title: TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting | TAT: Temporal ausgerichteter Transformer für Multi-Horizon-Peak-Nachfrageprognosen | TAT: 多霍里宗峰需求预测的时向调整变换器 2507.10349v1 |
Authors (8): Zhiyuan Zhao, Sitan Yang, Kin G. Olivares, Boris N. Oreshkin, Stan Vitebsky, Michael W. Mahoney, B. Aditya Prakash, Dmitry Efimov
Multi-horizon time series forecasting has many practical applications such as demand forecasting. Accurate demand prediction is critical to help make buying and inventory decisions for supply chain management of e-commerce and physical retailers, and such predictions are typically required for future horizons extending tens of weeks. This is especially challenging during high-stakes sales events, when demand peaks are particularly difficult to predict accurately. However, these events are important not only for managing supply chain operations but also for ensuring a seamless shopping experience for customers. To address this challenge, we propose Temporal-Aligned Transformer (TAT), a multi-horizon forecaster that leverages context variables known a priori, such as holiday and promotion event information, to improve predictive performance. Our model consists of an encoder and decoder, both embedded with a novel Temporal Alignment Attention (TAA), designed to learn context-dependent alignment for peak demand forecasting. We conduct extensive empirical analysis on two large-scale proprietary datasets from a large e-commerce retailer. We demonstrate that TAT brings up to 30% accuracy improvement on peak demand forecasting while maintaining competitive overall performance compared to other state-of-the-art methods.
Article 685
Title@2025-07-14 (1): Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
Title: Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning | Feature Destillation ist die bessere Wahl für modell-heterogenes Federated Learning | 精化是示范-异种联邦学习的更好选择。 2507.10348v1 |
Authors (1): Yichen Li
Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data local. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while being model-agnostic with softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, that can incorporate aligned feature information via orthogonal projection to better integrate knowledge from heterogeneous models. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server needs to maintain a projection layer for each client-side model architecture to align the features separately. Orthogonal techniques are employed to re-parameterize the projection layer to mitigate knowledge bias from heterogeneous models and thus maximize the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.
Article 686
Title@2025-07-14 (1): Parallel Sampling of Diffusion Models on $SO(3)$
Title: Parallel Sampling of Diffusion Models on $SO(3)$ | Parallele Probenahme von Diffusionsmodellen auf $SO(3)$ | 以USSO美元(3美元)平行采集传播模型样本 2507.10347v1 |
Authors (4): Yan-Ting Chen, Hao-Wei Chen, Tsu-Ching Hsiao, Chun-Yi Lee
In this paper, we design an algorithm to accelerate the diffusion process on the $SO(3)$ manifold. The inherently sequential nature of diffusion models necessitates substantial time for denoising perturbed data. To overcome this limitation, we propose to adapt the numerical Picard iteration for the $SO(3)$ space. We demonstrate our algorithm on an existing method that employs diffusion models to address the pose ambiguity problem. Moreover, we show that this acceleration advantage occurs without any measurable degradation in task reward. The experiments reveal that our algorithm achieves a speed-up of up to 4.9$\times$, significantly reducing the latency for generating a single sample.
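To illustrate why Picard iteration removes the sequential bottleneck, the sketch below applies it to a scalar ODE in ordinary Euclidean space. This is a simplification and an assumption of the example: it does not handle the $SO(3)$ geometry the paper addresses, and the function names are hypothetical. Each sweep evaluates the drift at every time step of the current trajectory guess, which are independent calls that could run in parallel, and then rebuilds the trajectory with a cumulative sum.

```python
def euler_sequential(f, x0, t0, t1, n_steps):
    """Baseline: explicit Euler, one drift evaluation after another."""
    h = (t1 - t0) / n_steps
    xs = [x0]
    for i in range(n_steps):
        xs.append(xs[-1] + h * f(xs[-1], t0 + i * h))
    return xs


def picard_parallel(f, x0, t0, t1, n_steps, tol=1e-12):
    """Picard sweeps over the whole discretised trajectory at once."""
    h = (t1 - t0) / n_steps
    ts = [t0 + i * h for i in range(n_steps)]
    xs = [x0] * (n_steps + 1)          # initial guess: constant trajectory
    for sweep in range(1, n_steps + 2):
        # All drift evaluations use the previous guess, so they are independent.
        drifts = [f(x, t) for x, t in zip(xs, ts)]
        new_xs, acc = [x0], 0.0
        for d in drifts:               # cumulative sum rebuilds the trajectory
            acc += h * d
            new_xs.append(x0 + acc)
        change = max(abs(a - b) for a, b in zip(new_xs, xs))
        xs = new_xs
        if change < tol:
            break
    return xs, sweep


decay = lambda x, t: -x
reference = euler_sequential(decay, 1.0, 0.0, 1.0, 20)
trajectory, sweeps = picard_parallel(decay, 1.0, 0.0, 1.0, 20)
```

For this discretisation the fixed point of the sweeps is the sequential Euler trajectory, and it is reached in at most `n_steps` sweeps; any speed-up comes from converging in fewer sweeps than steps while the drift calls run concurrently.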
Article 687
Title@2025-07-14 (1): Faster Reinforcement Learning by Freezing Slow States
Title: Faster Reinforcement Learning by Freezing Slow States | Schnellere Stärkung des Lernens durch einfrierende langsame Staaten | 冷冻慢速国家加快加强学习 2301.00922v3 |
Authors (2): Yijia Wang, Daniel R. Jiang
We study infinite horizon Markov decision processes (MDPs) with “fast-slow” structure, where some state variables evolve rapidly (“fast states”) while others change more gradually (“slow states”). This structure commonly arises in practice when decisions must be made at high frequencies over long horizons, and where slowly changing information still plays a critical role in determining optimal actions. Examples include inventory control under slowly changing demand indicators or dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, making them computationally challenging. We propose a novel approximation strategy that “freezes” slow states during phases of lower-level planning and subsequently applies value iteration to an auxiliary upper-level MDP that evolves on a slower timescale. Freezing states for short periods of time leads to easier-to-solve lower-level problems, while a slower upper-level timescale allows for a more favorable discount factor. On the theoretical side, we analyze the regret incurred by our frozen-state approach, which leads to simple insights on how to trade off regret versus computational cost. Empirically, we benchmark our new frozen-state methods on three domains, (i) inventory control with fixed order costs, (ii) a gridworld problem with spatial tasks, and (iii) dynamic pricing with reference-price effects. We demonstrate that the new methods produce high-quality policies with significantly less computation, and we show that simply omitting slow states is often a poor heuristic.
Article 688
Title@2025-07-14 (1): Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions
Title: Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions | Einige Super-Annäherungsraten der ReLU-Neuralnetze für Korobov-Funktionen | Korobov 函数的 ReLU 神经网络的某些超接近率 2507.10345v1 |
Authors (2): Yuwen Li, Guozhi Zhang
This paper examines the $L_p$ and $W^1_p$ norm approximation errors of ReLU neural networks for Korobov functions. In terms of network width and depth, we derive nearly optimal super-approximation error bounds of order $2m$ in the $L_p$ norm and order $2m-2$ in the $W^1_p$ norm, for target functions with $L_p$ mixed derivative of order $m$ in each direction. The analysis leverages sparse grid finite elements and the bit extraction technique. Our results improve upon classical lowest order $L_\infty$ and $H^1$ norm error bounds and demonstrate that the expressivity of neural networks is largely unaffected by the curse of dimensionality.
Article 689
Title@2025-07-14 (1): MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data
Title: MoCap-Impute: A Comprehensive Benchmark and Comparative Analysis of Imputation Methods for IMU-based Motion Capture Data | MoCap-Impute: Umfassender Benchmark und vergleichende Analyse von Imputationsmethoden für IMU-basierte Motion Capture Daten | MoCap-Capute:以IMU为基础的运动捕获数据估算方法综合基准和比较分析 2507.10334v1 |
Authors (7): Mahmoud Bekhit, Ahmad Salah, Ahmed Salim Alrawahi, Tarek Attia, Ahmed Ali, Esraa Eldesokey, Ahmed Fathalla
Motion capture (MoCap) data from wearable Inertial Measurement Units (IMUs) is vital for applications in sports science, but its utility is often compromised by missing data. Despite numerous imputation techniques, a systematic performance evaluation for IMU-derived MoCap time-series data is lacking. We address this gap by conducting a comprehensive comparative analysis of statistical, machine learning, and deep learning imputation methods. Our evaluation considers three distinct contexts: univariate time-series, multivariate across subjects, and multivariate across kinematic angles. To facilitate this benchmark, we introduce the first publicly available MoCap dataset designed specifically for imputation, featuring data from 53 karate practitioners. We simulate three controlled missingness mechanisms: missing completely at random (MCAR), block missingness, and a novel value-dependent pattern at signal transition points. Our experiments, conducted on 39 kinematic variables across all subjects, reveal that multivariate imputation frameworks consistently outperform univariate approaches, particularly for complex missingness. For instance, multivariate methods achieve up to a 50% mean absolute error reduction (MAE from 10.8 to 5.8) compared to univariate techniques for transition point missingness. Advanced models like Generative Adversarial Imputation Networks (GAIN) and Iterative Imputers demonstrate the highest accuracy in these challenging scenarios. This work provides a critical baseline for future research and offers practical recommendations for improving the integrity and robustness of MoCap data analysis.
Article 690
Title@2025-07-14 (1): Constructing Extreme Heatwave Storylines with Differentiable Climate Models
Title: Constructing Extreme Heatwave Storylines with Differentiable Climate Models | Extreme Hitzewellen-Geschichten mit differenzierbaren Klimamodellen konstruieren | 以差异气候模型构建极端热浪线 2506.10660v2 |
Authors (2): Tim Whittaker, Alejandro Di Luca
Understanding the plausible upper bounds of extreme weather events is essential for risk assessment in a warming climate. Existing methods, based on large ensembles of physics-based models, are often computationally expensive or lack the fidelity needed to simulate rare, high-impact extremes. Here, we present a novel framework that leverages a differentiable hybrid climate model, NeuralGCM, to optimize initial conditions and generate physically consistent worst-case heatwave trajectories. Applied to the 2021 Pacific Northwest heatwave, our method produces heatwave intensity up to 3.7 $^\circ$C above the most extreme member of a 75-member ensemble. These trajectories feature intensified atmospheric blocking and amplified Rossby wave patterns, which are hallmarks of severe heat events. Our results demonstrate that differentiable climate models can efficiently explore the upper tails of event likelihoods, providing a powerful new approach for constructing targeted storylines of extreme weather under climate change.
Article 691
Title@2025-07-14 (1): Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach
Title: Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach | Überbrückung von Robustheit und Verallgemeinerung gegen Wortersatzangriffe in NLP über den Ansatz der Wachstumsbound Matrix | 通过 “ 增长组合矩阵方法 “ ,在NLP中架起桥梁,反对用词替代袭击的有力性和普遍性 2507.10330v1 |
Authors (2): Mohammed Bouri, Adnane Saoud
Despite advancements in Natural Language Processing (NLP), models remain vulnerable to adversarial attacks, such as synonym substitutions. While prior work has focused on improving robustness for feed-forward and convolutional architectures, the robustness of recurrent networks and modern state space models (SSMs), such as S4, remains understudied. These architectures pose unique challenges due to their sequential processing and complex parameter dynamics. In this paper, we introduce a novel regularization technique based on Growth Bound Matrices (GBM) to improve NLP model robustness by reducing the impact of input perturbations on model outputs. We focus on computing the GBM for three architectures: Long Short-Term Memory (LSTM), State Space models (S4), and Convolutional Neural Networks (CNN). Our method aims to (1) enhance resilience against word substitution attacks, (2) improve generalization on clean text, and (3) provide the first systematic analysis of SSM (S4) robustness. Extensive experiments across multiple architectures and benchmark datasets demonstrate that our method improves adversarial robustness by up to 8.8% over existing baselines. These results highlight the effectiveness of our approach, which outperforms several state-of-the-art methods in adversarial defense. Code is available at https://github.com/BouriMohammed/GBM
Article 692
Title@2025-07-14 (1): Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints
Title: Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints | Zero-Shot Cyclic Peptid Design über kompostierbare geometrische Einschränkungen | 利用可合成几何限制设计零热阴极球化物 2507.04225v2 |
Authors (9): Dapeng Jiang, Xiangzhe Kong, Jiaqi Han, Mingyu Li, Rui Jiao, Wenbing Huang, Stefano Ermon, Jianzhu Ma, Yang Liu
Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation via composable geometric constraints. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model through geometric conditioning on nodes and edges. During training, the model learns from unit constraints and their random combinations in linear peptides, while at inference, novel constraint combinations required for cyclization are imposed as input. Experiments show that our model, despite being trained with linear peptides, is capable of generating diverse target-binding cyclic peptides, reaching success rates from 38% to 84% on different cyclization strategies.
Article 693
Title@2025-07-14 (1): Convergence of Agnostic Federated Averaging
Title: Convergence of Agnostic Federated Averaging | Konvergenz der agnostischen Föderierten Durchschnittswerte | Agnostic Federal 波动的趋同 2507.10325v1 |
Authors (3): Herlock, Rahimi, Dionysis Kalogerias
Federated learning (FL) enables decentralized model training without centralizing raw data. However, practical FL deployments often face a key realistic challenge: clients participate intermittently in server aggregation, with unknown and possibly biased participation probabilities. Most existing convergence results either assume full-device participation or rely on knowledge of (in fact uniform) client availability distributions, assumptions that rarely hold in practice. In this work, we characterize the optimization problem that consistently adheres to the stochastic dynamics of the well-known agnostic Federated Averaging (FedAvg) algorithm under random (and variably-sized) client availability, and rigorously establish its convergence for convex, possibly nonsmooth losses, achieving a standard rate of order $\mathcal{O}(1/\sqrt{T})$, where $T$ denotes the aggregation horizon. Our analysis provides the first convergence guarantees for agnostic FedAvg under general, non-uniform, stochastic client participation, without knowledge of the participation distribution. We also empirically demonstrate that agnostic FedAvg in fact outperforms common (and suboptimal) weighted aggregation FedAvg variants, even with server-side knowledge of participation weights.
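A minimal sketch of one plausible reading of the agnostic aggregation rule, assuming the server simply averages the models returned by whichever clients show up in a round and never uses their participation probabilities. The scalar model, the quadratic client losses, the single local gradient step, and all names and numbers below are assumptions for illustration, not the paper's setup.

```python
import random


def agnostic_fedavg(centers, probs, rounds=200, lr=0.5, seed=0):
    """Scalar FedAvg where client i has loss (w - centers[i])**2 / 2.

    probs are the participation probabilities; they drive the simulation only
    and are never read by the aggregation step.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        # Each client independently decides whether it is available this round.
        active = [c for c, p in zip(centers, probs) if rng.random() < p]
        if not active:
            continue  # no client showed up; keep the current model
        # One local gradient step per active client, starting from the server model.
        local_models = [w - lr * (w - c) for c in active]
        # Agnostic aggregation: plain average over the clients that responded.
        w = sum(local_models) / len(local_models)
    return w


final_w = agnostic_fedavg(centers=[1.0, 2.0, 3.0], probs=[0.9, 0.5, 0.2])
```

Because frequently available clients appear in more rounds, the iterate is pulled toward their optima; which objective this dynamic actually optimizes is the kind of question the abstract says the paper characterizes.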
Article 694
Title@2025-07-14 (1): LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Title: LEXam: Benchmarking Legal Reasoning on 340 Law Exams | LEXam: Benchmarking der rechtlichen Begründung von 340 Rechtsprüfungen | LEXam:340项法律考试的法律依据基准 2505.12864v3 |
Authors (17): Yu Fan, Jingwei Ni, Jakob Merane, Etienne Salimbeni, Yang Tian, Yoan Hermstrüwer, Yinya Huang, Mubashara Akhtar, Florian Geering, Oliver Dreyer, Daniel Brunner, Markus Leippold, Mrinmaya Sachan, Alexander Stremitzer, Christoph Engel, Elliott Ash, Joel Niklaus
Long-form legal reasoning remains a key challenge for large language models (LLMs) in spite of recent advances in test-time scaling. We introduce LEXam, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 4,886 law exam questions in English and German, including 2,841 long-form, open-ended questions and 2,045 multiple-choice questions. Besides reference answers, the open questions are also accompanied by explicit guidance outlining the expected legal reasoning approach such as issue spotting, rule recall, or rule application. Our evaluation shows that both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they notably struggle with open questions that require structured, multi-step legal reasoning. Moreover, our results underscore the effectiveness of the dataset in differentiating between models with varying capabilities. Adopting an LLM-as-a-Judge paradigm with rigorous human expert validation, we demonstrate how model-generated reasoning steps can be evaluated consistently and accurately. Our evaluation setup provides a scalable method to assess legal reasoning quality beyond simple accuracy metrics. Project page: https://lexam-benchmark.github.io/
Article 695
Title@2025-07-14 (1): Recognizing Dementia from Neuropsychological Tests with State Space Models
Title: Recognizing Dementia from Neuropsychological Tests with State Space Models | Demenz aus neuropsychologischen Tests mit State Space Models erkennen | 利用国家空间模型进行神经心理测试的痴呆症 2507.10311v1 |
Authors (5): Liming Wang, Saurabhchand Bhati, Cody Karjadi, Rhoda Au, James Glass
Early detection of dementia is critical for timely medical intervention and improved patient outcomes. Neuropsychological tests are widely used for cognitive assessment but have traditionally relied on manual scoring. Automatic dementia classification (ADC) systems aim to infer cognitive decline directly from speech recordings of such tests. We propose Demenba, a novel ADC framework based on state space models, which scale linearly in memory and computation with sequence length. Trained on over 1,000 hours of cognitive assessments administered to Framingham Heart Study participants, some of whom were diagnosed with dementia through adjudicated review, our method outperforms prior approaches in fine-grained dementia classification by 21%, while using fewer parameters. We further analyze its scaling behavior and demonstrate that our model gains additional improvement when fused with large language models, paving the way for more transparent and scalable dementia assessment tools. Code: https://anonymous.4open.science/r/Demenba-0861
Article 696
Title@2025-07-14 (1): DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning
Title: DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning | DESIGN: Verschlüsselte GNN-Inferenz über Server-Side Input Graph Pruning | design:通过服务器- Side 输入图路透图加密的 GNN 推论 2507.05649v2 |
Authors (4): Kaixiang Zhao, Joseph Yousry Attalla, Qian Lou, Yushun Dong
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph-based learning tasks. However, enabling privacy-preserving GNNs in encrypted domains, such as under Fully Homomorphic Encryption (FHE), typically incurs substantial computational overhead, rendering real-time and privacy-preserving inference impractical. In this work, we propose DESIGN (EncrypteD GNN Inference via sErver-Side Input Graph pruNing), a novel framework for efficient encrypted GNN inference. DESIGN tackles the critical efficiency limitations of existing FHE GNN approaches, which often overlook input data redundancy and apply uniform computational strategies. Our framework achieves significant performance gains through a hierarchical optimization strategy executed entirely on the server: first, FHE-compatible node importance scores (based on encrypted degree statistics) are computed from the encrypted graph. These scores then guide a homomorphic partitioning process, generating multi-level importance masks directly under FHE. These dynamically generated masks facilitate both input graph pruning (by logically removing unimportant elements) and a novel adaptive polynomial activation scheme, where activation complexity is tailored to node importance levels. Empirical evaluations demonstrate that DESIGN substantially accelerates FHE GNN inference compared to state-of-the-art methods while maintaining competitive model accuracy, presenting a robust solution for secure graph analytics. Our implementation is publicly available at https://github.com/LabRAI/DESIGN.
Article 697
Title@2025-07-14 (1): TKAN: Temporal Kolmogorov-Arnold Networks
Title: TKAN: Temporal Kolmogorov-Arnold Networks | TKAN: Temporale Kolmogorov-Arnold-Netzwerke | TKAN: 时间性科尔莫戈罗夫-阿诺尔德网络 2405.07344v4 |
Authors (2): Remi Genet, Hugo Inzirillo
Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that combines KANs and the LSTM: the Temporal Kolmogorov-Arnold Network (TKAN). TKANs combine the strengths of both networks and are composed of Recurring Kolmogorov-Arnold Network (RKAN) layers with embedded memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring forecasting more than one step ahead.
Article 698
Title@2025-07-14 (1): MF-GLaM: A multifidelity stochastic emulator using generalized lambda models
Title: MF-GLaM: A multifidelity stochastic emulator using generalized lambda models | MF-GLaM: Ein multifidelity stochastischer Emulator mit generalisierten Lambda-Modellen | MF-GLAM:使用通用羊羔模型的多纤维性随机模拟模拟器 2507.10303v1 |
Authors (4): K. Giannoukou, X. Zhu, S. Marelli, B. Sudret
Stochastic simulators exhibit intrinsic stochasticity due to unobservable, uncontrollable, or unmodeled input variables, resulting in random outputs even at fixed input conditions. Such simulators are common across various scientific disciplines; however, emulating their entire conditional probability distribution is challenging, as it is a task traditional deterministic surrogate modeling techniques are not designed for. Additionally, accurately characterizing the response distribution can require prohibitively large datasets, especially for computationally expensive high-fidelity (HF) simulators. When lower-fidelity (LF) stochastic simulators are available, they can enhance limited HF information within a multifidelity surrogate modeling (MFSM) framework. While MFSM techniques are well-established for deterministic settings, constructing multifidelity emulators to predict the full conditional response distribution of stochastic simulators remains a challenge. In this paper, we propose multifidelity generalized lambda models (MF-GLaMs) to efficiently emulate the conditional response distribution of HF stochastic simulators by exploiting data from LF stochastic simulators. Our approach builds upon the generalized lambda model (GLaM), which represents the conditional distribution at each input by a flexible, four-parameter generalized lambda distribution. MF-GLaMs are non-intrusive, requiring no access to the internal stochasticity of the simulators nor multiple replications of the same input values. We demonstrate the efficacy of MF-GLaM through synthetic examples of increasing complexity and a realistic earthquake application. Results show that MF-GLaMs can achieve improved accuracy at the same cost as single-fidelity GLaMs, or comparable performance at significantly reduced cost.
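As a rough illustration of the four-parameter generalized lambda distribution that GLaM builds on, the sketch below draws samples by inverse-transform sampling. It assumes the Freimer-Kollia-Mudholkar-Lin (FKML) form of the quantile function, and the parameter values are arbitrary; neither detail comes from the abstract. In a GLaM the four parameters would be functions of the simulator input, whereas here they are held fixed.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function of the generalized lambda distribution.

    Assumes the FKML parameterization (an assumption of this sketch):
    lam1 is a location, lam2 > 0 a scale, and lam3 / lam4 (nonzero here)
    control the left / right tails.
    """
    u = np.asarray(u, dtype=float)
    left = (u**lam3 - 1.0) / lam3
    right = ((1.0 - u)**lam4 - 1.0) / lam4
    return lam1 + (left - right) / lam2

def gld_sample(n, lam, rng):
    """Inverse-transform sampling: push uniform draws through the quantile function."""
    u = rng.uniform(size=n)
    return gld_quantile(u, *lam)

rng = np.random.default_rng(0)
lam = (1.0, 2.0, 0.3, 0.3)   # hypothetical values; equal tail parameters give a symmetric law
samples = gld_sample(10_000, lam, rng)
median = gld_quantile(0.5, *lam)
```

Because the distribution is defined through its quantile function, sampling needs only uniform random numbers, which is one reason it is convenient as a flexible response model.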
Article 699
Title@2025-07-14 (1): Low Resource Reconstruction Attacks Through Benign Prompts
Title: Low Resource Reconstruction Attacks Through Benign Prompts | Niedrige Ressourcen-Wiederaufbau Angriffe durch Benign Prompts | 通过慈善提示进行低资源重建袭击 2507.07947v2 |
Authors (2): Sol Yarkoni, Roi Livni
The recent advances in generative models such as diffusion models have raised several risks and concerns related to privacy, copyright infringement and data stewardship. To better understand and control the risks, various researchers have created techniques, experiments and attacks that reconstruct images, or parts of images, from the training set. While these techniques already establish that data from the training set can be reconstructed, they often rely on substantial resources, access to the training set, and well-engineered prompts. In this work, we devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstruction. This highlights the risk that images might be reconstructed even by an uninformed user, unintentionally. For example, we identified that, with regard to one existing model, the prompt “blue Unisex T-Shirt” can generate the face of a real-life human model. Our method builds on an intuition from previous works which leverages domain knowledge and identifies a fundamental vulnerability that stems from the use of scraped data from e-commerce platforms, where templated layouts and images are tied to pattern-like prompts.
Article 700
Title@2025-07-14 (1): Average Sensitivity of Hierarchical $k$-Median Clustering
Title: Average Sensitivity of Hierarchical $k$-Median Clustering | Durchschnittliche Empfindlichkeit des hierarchischen $k$-Median-Clusters | 等级平均敏感度(千克元-印面) 2507.10296v1 |
Authors (4): Shijie Li, Weiqiang He, Ruobing Bai, Pan Peng
Hierarchical clustering is a widely used method for unsupervised learning with numerous applications. However, in the application of modern algorithms, the datasets studied are usually large and dynamic. If the hierarchical clustering is sensitive to small perturbations of the dataset, the usability of the algorithm will be greatly reduced. In this paper, we focus on the hierarchical $k$-median clustering problem, which bridges hierarchical and centroid-based clustering while offering theoretical appeal, practical utility, and improved interpretability. We analyze the average sensitivity of algorithms for this problem by measuring the expected change in the output when a random data point is deleted. We propose an efficient algorithm for hierarchical $k$-median clustering and theoretically prove its low average sensitivity and high clustering quality. Additionally, we show that single linkage clustering and a deterministic variant of the CLNSS algorithm exhibit high average sensitivity, making them less stable. Finally, we validate the robustness and effectiveness of our algorithm through experiments.
Article 701
Title@2025-07-14 (1): Application of RESNET50 Convolution Neural Network for the Extraction of Optical Parameters in Scattering Media
Title: Application of RESNET50 Convolution Neural Network for the Extraction of Optical Parameters in Scattering Media | Anwendung von RESNET50 Convolution Neural Network zur Extraktion optischer Parameter in Streumedien | RESNET50 利用革命神经网络在散散射媒体中提取光学参数 2404.16647v2 |
Authors (7): Bowen Deng, Yihan Zhang, Andrew Parkes, Alex Bentley, Amanda Wright, Michael Pound, Michael Somekh
Estimation of the optical properties of scattering media such as tissue is important in diagnostics as well as in the development of techniques to image deeper. As light penetrates the sample, scattering events occur that alter the propagation direction of the photons in a random manner, leading to degradation of image quality. The distribution of the scattered light does, however, give a measure of the optical properties such as the reduced scattering coefficient and the absorption coefficient. Unfortunately, inverting scattering patterns to recover the optical properties is not simple, especially in the regime where the light is partially randomized. Machine learning has been proposed by several authors as a means of recovering these properties from either the back-scattered or the transmitted light. In the present paper we train a general-purpose convolutional neural network, RESNET 50, with simulated data based on Monte Carlo simulations. We show that, compared with previous work, our approach gives comparable or better reconstruction accuracy with training on a much smaller dataset. Moreover, by training on multiple parameters such as the intensity distribution at multiple planes or the exit angle and spatial distribution, one achieves improved performance compared to training on a single input such as the intensity distribution captured at the sample surface. While our approach gives good parameter reconstruction, we identify factors that limit the accuracy of the recovered properties, particularly the absorption coefficient. In the light of these limitations, we suggest how the present approach may be enhanced for even better performance.
Article 702
Title@2025-07-14 (1): Conditional Chemical Language Models are Versatile Tools in Drug Discovery
Title: Conditional Chemical Language Models are Versatile Tools in Drug Discovery | Bedingte chemische Sprachmodelle sind vielseitige Werkzeuge in der Drug Discovery | 有条件的化学语言模型是药物发现中易感工具 2507.10273v1 |
Authors (2): Lu Zhu, Emmanuel Noutahi
Generative chemical language models (CLMs) have demonstrated strong capabilities in molecular design, yet their impact in drug discovery remains limited by the absence of reliable reward signals and the lack of interpretability in their outputs. We present SAFE-T, a generalist chemical modeling framework that conditions on biological context – such as protein targets or mechanisms of action – to prioritize and design molecules without relying on structural information or engineered scoring functions. SAFE-T models the conditional likelihood of fragment-based molecular sequences given a biological prompt, enabling principled scoring of molecules across tasks such as virtual screening, drug-target interaction prediction, and activity cliff detection. Moreover, it supports goal-directed generation by sampling from this learned distribution, aligning molecular design with biological objectives. In comprehensive zero-shot evaluations across predictive (LIT-PCBA, DAVIS, KIBA, ACNet) and generative (DRUG, PMO) benchmarks, SAFE-T consistently achieves performance comparable to or better than existing approaches while being significantly faster. Fragment-level attribution further reveals that SAFE-T captures known structure-activity relationships, supporting interpretable and biologically grounded design. Together with its computational efficiency, these results demonstrate that conditional generative CLMs can unify scoring and generation to accelerate early-stage drug discovery.
Article 703
Title@2025-07-14 (1): Asymptotic regularity of a generalised stochastic Halpern scheme
Title: Asymptotic regularity of a generalised stochastic Halpern scheme | Asymptotische Regelmäßigkeit eines generalisierten stochastischen Halpern-Systems | 普通的口切性Halpern计划无症状的常规性 2411.04845v2 |
Authors (2): Nicholas Pischke, Thomas Powell
We provide abstract, general and highly uniform rates of asymptotic regularity for a generalized stochastic Halpern-style iteration, which incorporates a second mapping in the style of a Krasnoselskii-Mann iteration. This iteration is general in two ways: First, it incorporates stochasticity in a completely abstract way rather than fixing a sampling method; secondly, it includes as special cases stochastic versions of various schemes from the optimization literature, including Halpern’s iteration as well as a Krasnoselskii-Mann iteration with Tikhonov regularization terms in the sense of Boț, Csetnek and Meier. For these specific cases, we in particular obtain linear rates of asymptotic regularity, matching (or improving) the currently best known rates for these iterations in stochastic optimization, and quadratic rates of asymptotic regularity are obtained in the context of inner product spaces for the general iteration. At the end, we briefly sketch how the schemes presented here can be instantiated in the context of reinforcement learning to yield novel methods for Q-learning.
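For orientation, the sketch below runs the classical deterministic Halpern iteration, not the generalized stochastic scheme analyzed in the paper, and tracks the residual $\|x_n - T x_n\|$ whose decay is what "asymptotic regularity" refers to. The nonexpansive map (a planar rotation), the anchor, and the step sizes $\alpha_n = 1/(n+2)$ are illustrative choices, not taken from the abstract.

```python
import numpy as np

def halpern(T, x0, n_steps):
    """Classical Halpern iteration anchored at x0.

    x_{n+1} = a_n * x0 + (1 - a_n) * T(x_n), with a_n = 1 / (n + 2).
    Returns the final iterate and the residual norms ||x_n - T(x_n)||.
    """
    x = x0.copy()
    residuals = []
    for n in range(n_steps):
        residuals.append(np.linalg.norm(x - T(x)))
        a = 1.0 / (n + 2)
        x = a * x0 + (1.0 - a) * T(x)
    residuals.append(np.linalg.norm(x - T(x)))
    return x, np.array(residuals)

# A 90-degree rotation is nonexpansive (an isometry) with the origin as its only fixed point.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
T = lambda x: rotation @ x

x_final, residuals = halpern(T, np.array([1.0, 0.0]), 1000)
```

Plain fixed-point iteration `x = T(x)` would circle forever for this map; the shrinking pull toward the anchor is what drives the residual to zero.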
Article 704
Title@2025-07-14 (1): DNS Tunneling: Threat Landscape and Improved Detection Solutions
Title: DNS Tunneling: Threat Landscape and Improved Detection Solutions | DNS Tunneling: Bedrohungslandschaft und verbesserte Erkennungslösungen | DNS 隧道建设:威胁景观和改进探测解决方案 2507.10267v1 |
Authors (4): Novruz Amirov, Baran Isik, Bilal Ihsan Tuncer, Serif Bahtiyar
Detecting Domain Name System (DNS) tunneling is a significant challenge in security due to its capacity to hide harmful actions within DNS traffic that appears to be normal and legitimate. Traditional detection methods are based on rule-based approaches or signature matching methods that are often insufficient to accurately identify such covert communication channels. This research addresses the effective detection of DNS tunneling. We propose a novel approach to detect DNS tunneling with machine learning algorithms. We combine machine learning algorithms to analyze the traffic using features extracted from DNS traffic. The analysis results show that the proposed approach is a good candidate for detecting DNS tunneling accurately.
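The abstract does not list which features are extracted, so the sketch below only illustrates the general idea with a few commonly used lexical features of a query name (length, label count, character entropy). The feature names, the example domains, and the simplification that the registered domain is the last two labels are all assumptions of this example; the resulting dictionaries would be the input rows for whatever classifiers are combined.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def dns_query_features(qname):
    """Hypothetical lexical features for a single DNS query name."""
    name = qname.rstrip(".")
    labels = [label for label in name.split(".") if label]
    # Simplification: treat everything left of the last two labels as the subdomain.
    subdomain = ".".join(labels[:-2]) if len(labels) > 2 else ""
    digits = sum(ch.isdigit() for ch in subdomain)
    return {
        "query_length": len(name),
        "label_count": len(labels),
        "longest_label": max((len(label) for label in labels), default=0),
        "subdomain_entropy": shannon_entropy(subdomain),
        "digit_ratio": digits / len(subdomain) if subdomain else 0.0,
    }

benign = dns_query_features("www.example.com")
suspicious = dns_query_features("a9f3k2z8q1x7c5v0b4n6m.tunnel.example.com")
```

Tunneled payloads are typically encoded into long, random-looking subdomains, so features like these tend to separate the two example queries, although real traffic needs far more than a lexical check.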
Article 705
Title@2025-07-14 (1): On the asymptotic behaviour of stochastic processes, with applications to supermartingale convergence, Dvoretzky’s approximation theorem, and stochastic quasi-Fejér monotonicity
Title: On the asymptotic behaviour of stochastic processes, with applications to supermartingale convergence, Dvoretzky’s approximation theorem, and stochastic quasi-Fejér monotonicity | Über das asymptotische Verhalten stochastischer Prozesse, mit Anwendungen zur Supermartingale Konvergenz, Dvoretzkys Näherungssatz und stochastische Quasi-Fejér-Monotonizität | 关于随机过程的无症状行为,应用到超海趋同、Dvoretzky的近似理论,以及随机准菲杰尔单音性。 2504.12922v2 |
Authors (3): Morenikeji Neri, Nicholas Pischke, Thomas Powell
We prove a novel and general result on the asymptotic behavior of stochastic processes which conform to a certain relaxed supermartingale condition. Our result provides quantitative information in the form of an explicit and effective construction of a rate of convergence for this process, both in mean and almost surely, that is moreover highly uniform in that it only depends on very few data of the surrounding objects involved in the iteration. We then apply this result to derive new quantitative versions of well-known concepts and theorems from stochastic approximation, in particular providing effective rates for a variant of the Robbins-Siegmund theorem, Dvoretzky’s convergence theorem, as well as the convergence of stochastic quasi-Fejér monotone sequences, the latter of which is formulated in a novel and highly general metric context. We utilize the classic and widely studied Robbins-Monro procedure as a template to evaluate our quantitative results and their applicability in greater detail. We conclude by illustrating the breadth of potential further applications with a brief discussion on a variety of other well-known iterative procedures from stochastic approximation. Throughout, we isolate and discuss special cases of our results which allow for the construction of fast, and in particular linear, rates.
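To make the Robbins-Monro procedure mentioned above concrete, here is a minimal sketch of the textbook scheme $x_{n+1} = x_n - a_n Y_n$, where $Y_n$ is a noisy observation of $f(x_n)$. The target function, noise level, and step sizes $a_n = 1/n$ are illustrative assumptions and are not the setting analyzed in the paper.

```python
import numpy as np

def robbins_monro(noisy_f, x0, n_steps, rng):
    """Textbook Robbins-Monro root finding with step sizes a_n = 1 / n."""
    x = x0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        x = x - a_n * noisy_f(x, rng)
    return x

def noisy_f(x, rng):
    """Noisy observation of f(x) = x - 2, whose root is x = 2 (hypothetical target)."""
    return (x - 2.0) + rng.normal(scale=0.5)

rng = np.random.default_rng(0)
root_estimate = robbins_monro(noisy_f, x0=0.0, n_steps=5000, rng=rng)
```

The step sizes satisfy the usual conditions (their sum diverges while the sum of their squares converges), which is what lets the iterates average out the noise while still reaching the root.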
Article 706
Title@2025-07-14 (1): A Simple Baseline for Stable and Plastic Neural Networks
Title: A Simple Baseline for Stable and Plastic Neural Networks | Eine einfache Basis für stabile und plastische Neuralnetze | 稳定神经网络和可塑神经网络的简单基线 2507.10637v1 |
Authors (3): É. Künzel, A. Jaziri, V. Ramesh
Continual learning in computer vision requires that models adapt to a continuous stream of tasks without forgetting prior knowledge, yet existing approaches often tip the balance heavily toward either plasticity or stability. We introduce RDBP, a simple, low-overhead baseline that unites two complementary mechanisms: ReLUDown, a lightweight activation modification that preserves feature sensitivity while preventing neuron dormancy, and Decreasing Backpropagation, a biologically inspired gradient-scheduling scheme that progressively shields early layers from catastrophic updates. Evaluated on the Continual ImageNet benchmark, RDBP matches or exceeds the plasticity and stability of state-of-the-art methods while reducing computational cost. RDBP thus provides both a practical solution for real-world continual learning and a clear benchmark against which future continual learning strategies can be measured.
Article 707
Title@2025-07-14 (1): Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals
Title: Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals | Transformer können nicht-lineare und nicht-markowsche Filterprobleme in kontinuierlicher Zeit für bedingt gaussische Signale lösen | 变换器可以在连续时间解答非滑动和非马尔科维的过滤问题, 以用于有条件的高斯信号 2310.19603v4 |
Authors (4): Blanka Horvath, Anastasis Kratsios, Yannick Limmer, Xuwei Yang
The use of attention-based deep learning models in stochastic filtering, e.g. transformers and deep Kalman filters, has recently come into focus; however, the potential for these models to solve stochastic filtering problems remains largely unknown. The paper provides an affirmative answer to this open problem in the theoretical foundations of machine learning by showing that a class of continuous-time transformer models, called filterformers, can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time (possibly non-Gaussian) measurements. Our approximation guarantees hold uniformly over sufficiently regular compact subsets of continuous-time paths, where the worst-case 2-Wasserstein distance between the true optimal filter and our deep learning model quantifies the approximation error. Our construction relies on two new customizations of the standard attention mechanism: The first can losslessly adapt to the characteristics of a broad range of paths since we show that the attention mechanism implements bi-Lipschitz embeddings of sufficiently regular sets of paths into low-dimensional Euclidean spaces; thus, it incurs no “dimension reduction error”. The second attention mechanism is tailored to the geometry of Gaussian measures in the $2$-Wasserstein space. Our analysis relies on new stability estimates of robust optimal filters in the conditionally Gaussian setting.
Article 708
Title@2025-07-14 (1): DepViT-CAD: Deployable Vision Transformer-Based Cancer Diagnosis in Histopathology
Title: DepViT-CAD: Deployable Vision Transformer-Based Cancer Diagnosis in Histopathology | DepViT-CAD: Deployable Vision Transformerbasierte Krebsdiagnose in der Histopathologie | DepVVT-CAD: 在病理学中可部署的愿景变异器癌症诊断 2507.10250v1 |
Authors (5): Ashkan Shakarami, Lorenzo Nicole, Rocco Cappellesso, Angelo Paolo Dei Tos, Stefano Ghidoni
Accurate and timely cancer diagnosis from histopathological slides is vital for effective clinical decision-making. This paper introduces DepViT-CAD, a deployable AI system for multi-class cancer diagnosis in histopathology. At its core is MAViT, a novel Multi-Attention Vision Transformer designed to capture fine-grained morphological patterns across diverse tumor types. MAViT was trained on expert-annotated patches from 1008 whole-slide images, covering 11 diagnostic categories, including 10 major cancers and non-tumor tissue. DepViT-CAD was validated on two independent cohorts: 275 WSIs from The Cancer Genome Atlas and 50 routine clinical cases from pathology labs, achieving diagnostic sensitivities of 94.11% and 92%, respectively. By combining state-of-the-art transformer architecture with large-scale real-world validation, DepViT-CAD offers a robust and scalable approach for AI-assisted cancer diagnostics. To support transparency and reproducibility, software and code will be made publicly available at GitHub.
Article 709
Title@2025-07-14 (1): GeoHopNet: Hopfield-Augmented Sparse Spatial Attention for Dynamic UAV Site Location Problem
Title: GeoHopNet: Hopfield-Augmented Sparse Spatial Attention for Dynamic UAV Site Location Problem | GeoHopNet: Hopfield-Augmented Sparse Räumliche Aufmerksamkeit für dynamische UAV-Standort-Problem | GeoHopNet:动态无人驾驶飞行器现场位置问题 2507.10636v1 |
Authors (3): Jianing Zhi, Xinghua Li, Zidong Chen
The rapid development of the urban low-altitude unmanned aerial vehicle (UAV) economy poses new challenges for dynamic site selection of UAV landing points and supply stations. Traditional deep reinforcement learning methods face computational complexity bottlenecks, particularly with standard attention mechanisms, when handling large-scale urban-level location problems. This paper proposes GeoHopNet, a Hopfield-augmented sparse spatial attention network specifically designed for dynamic UAV site location problems. Our approach introduces four core innovations: (1) a distance-biased multi-head attention mechanism that explicitly encodes spatial geometric information; (2) K-nearest neighbor sparse attention that reduces computational complexity from $O(N^2)$ to $O(NK)$; (3) a modern Hopfield external memory module; and (4) a memory regularization strategy. Experimental results demonstrate that GeoHopNet extends the boundary of solvable problem sizes. For large-scale instances with 1,000 nodes, where standard attention models become prohibitively slow (over 3 seconds per instance) and traditional solvers fail, GeoHopNet finds high-quality solutions (0.22% optimality gap) in under 0.1 seconds. Compared to the state-of-the-art ADNet baseline on 100-node instances, our method improves solution quality by 22.2% and is 1.8$\times$ faster.
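A minimal single-head sketch of how a distance bias and K-nearest-neighbor sparsity can be combined in one attention step is given below. It is not the GeoHopNet implementation: the linear distance penalty, the `dist_scale` parameter, and all shapes are assumptions made for illustration. Only the attention scores are restricted to $N \times K$ entries here; the neighbor search itself uses a dense distance matrix for brevity, where a spatial index would be used at scale.

```python
import numpy as np

def knn_sparse_attention(coords, q, k, v, num_neighbors, dist_scale=1.0):
    """Single-head attention where each node attends only to its K nearest nodes.

    coords: (N, 2) node positions; q, k, v: (N, d) queries, keys, values.
    Logits are penalized by distance (hypothetical form of the distance bias).
    """
    n, d = q.shape
    # Dense pairwise distances, used only to find neighbors in this sketch.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    nbr_idx = np.argsort(dists, axis=1)[:, :num_neighbors]       # (N, K), includes the node itself
    nbr_dist = np.take_along_axis(dists, nbr_idx, axis=1)        # (N, K)

    k_nbr = k[nbr_idx]                                           # (N, K, d)
    v_nbr = v[nbr_idx]                                           # (N, K, d)
    logits = np.einsum("nd,nkd->nk", q, k_nbr) / np.sqrt(d) - dist_scale * nbr_dist

    # Numerically stable softmax over the K neighbors only.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = np.einsum("nk,nkd->nd", weights, v_nbr)                # (N, d)
    return out, weights

rng = np.random.default_rng(0)
N, d, K = 200, 16, 8
coords = rng.uniform(size=(N, 2))
q, k, v = rng.normal(size=(3, N, d))
out, weights = knn_sparse_attention(coords, q, k, v, num_neighbors=K)
```

The score and weight tensors have shape `(N, K)` rather than `(N, N)`, which is where the reduction from $O(N^2)$ to $O(NK)$ comes from.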
Article 710
Title@2025-07-14 (1): Kernel-Adaptive PI-ELMs for Forward and Inverse Problems in PDEs with Sharp Gradients
Title: Kernel-Adaptive PI-ELMs for Forward and Inverse Problems in PDEs with Sharp Gradients | Kernel-Adaptive PI-ELMs für vorwärts und inverse Probleme bei PDEs mit scharfen Gradienten | 具有尖锐梯度的PDE中前方问题和反问题核心适应性 PI-ELMs 2507.10241v1 |
Authors (4): Vikas Dwivedi, Balaji Srinivasan, Monica Sigovan, Bruno Sixou
This paper introduces the Kernel Adaptive Physics-Informed Extreme Learning Machine (KAPI-ELM), an adaptive Radial Basis Function (RBF)-based extension of PI-ELM designed to solve both forward and inverse Partial Differential Equation (PDE) problems involving localized sharp gradients. While PI-ELMs outperform the traditional Physics-Informed Neural Networks (PINNs) in speed due to their single-shot, least-squares optimization, this advantage comes at a cost: their fixed, randomly initialized input layer limits their ability to capture sharp gradients. To overcome this limitation, we introduce a lightweight Bayesian Optimization (BO) framework that, instead of adjusting each input layer parameter individually as in traditional backpropagation, learns a small set of hyperparameters defining the statistical distribution from which the input weights are drawn. This novel distributional optimization strategy – combining BO for input layer distributional parameters with least-squares optimization for output layer network parameters – enables KAPI-ELM to preserve PI-ELM’s speed while matching or exceeding the expressiveness of PINNs. We validate the proposed methodology on several challenging forward and inverse PDE benchmarks, including a 1D singularly perturbed convection-diffusion equation, a 2D Poisson equation with sharp localized sources, and a time-dependent advection equation. Notably, KAPI-ELM achieves state-of-the-art accuracy in both forward and inverse settings. In stiff PDE regimes, it matches or even outperforms advanced methods such as the Extended Theory of Functional Connections (XTFC), while requiring nearly an order of magnitude fewer tunable parameters. These results establish the potential of KAPI-ELM as a scalable, interpretable, and generalizable physics-informed learning framework, especially in stiff PDE regimes.
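The single-shot least-squares idea behind a PI-ELM can be sketched on a toy problem: a fixed, randomly drawn RBF input layer and output weights obtained from one linear solve. The example below is a plain PI-ELM without the Bayesian optimization layer that KAPI-ELM adds, and the test problem ($u'' = -\pi^2 \sin(\pi x)$ on $[0, 1]$ with zero boundary values), the sampling ranges, and the boundary weight are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_colloc = 50, 200
centers = rng.uniform(0.0, 1.0, n_neurons)   # fixed random RBF centers (never trained)
widths = rng.uniform(2.0, 8.0, n_neurons)    # fixed random inverse length scales

def rbf(x):
    """Gaussian RBF features phi_j(x) = exp(-(w_j (x - c_j))^2), shape (len(x), n_neurons)."""
    d = x[:, None] - centers
    return np.exp(-(widths * d) ** 2)

def rbf_xx(x):
    """Second derivative of each RBF feature with respect to x."""
    d = x[:, None] - centers
    phi = np.exp(-(widths * d) ** 2)
    return (4.0 * widths**4 * d**2 - 2.0 * widths**2) * phi

# Collocation rows enforce u'' = f; two extra rows enforce u(0) = u(1) = 0.
x = np.linspace(0.0, 1.0, n_colloc)
f = -np.pi**2 * np.sin(np.pi * x)
bc_weight = 100.0                            # hypothetical weight to emphasize the boundary rows
A = np.vstack([rbf_xx(x), bc_weight * rbf(np.array([0.0, 1.0]))])
b = np.concatenate([f, [0.0, 0.0]])

# Output weights come from a single linear least-squares solve, with no gradient descent.
beta, *_ = np.linalg.lstsq(A, b, rcond=None)

u_pred = rbf(x) @ beta
max_error = np.max(np.abs(u_pred - np.sin(np.pi * x)))
```

In this picture, the approach described in the abstract would tune the distribution from which `centers` and `widths` are drawn (here fixed uniform ranges) rather than the individual values.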
Article 711
Title@2025-07-14 (1): Visual Analytics for Explainable and Trustworthy Artificial Intelligence
Title: Visual Analytics for Explainable and Trustworthy Artificial Intelligence | Visual Analytics für erklärbare und vertrauenswürdige Künstliche Intelligenz | 可解释和可信赖的人工智能的视觉分析分析 2507.10240v1 |
Authors (1): Angelos Chatzimparmpas
Our society increasingly depends on intelligent systems to solve complex problems, ranging from recommender systems suggesting the next movie to watch to AI models assisting in medical diagnoses for hospitalized patients. With the iterative improvement of diagnostic accuracy and efficiency, AI holds significant potential to mitigate medical misdiagnoses, preventing numerous deaths and reducing an economic burden of approximately EUR 450 billion annually. However, a key obstacle to AI adoption lies in the lack of transparency: many automated systems function as “black boxes,” providing predictions without revealing the underlying processes. This opacity can hinder experts’ ability to trust and rely on AI systems. Visual analytics (VA) provides a compelling solution by combining AI models with interactive visualizations. These specialized charts and graphs empower users to incorporate their domain expertise to refine and improve the models, bridging the gap between AI and human understanding. In this work, we define, categorize, and explore how VA solutions can foster trust across the stages of a typical AI pipeline. We propose a design space for innovative visualizations and present an overview of our previously developed VA dashboards, which support critical tasks within the various pipeline stages, including data processing, feature engineering, hyperparameter tuning, understanding, debugging, refining, and comparing models.
Article 712
Title@2025-07-14 (1): Spatial Lifting for Dense Prediction
Title: Spatial Lifting for Dense Prediction | Raumheben für dichte Vorhersagen | 高度预测空间升空 2507.10222v1 |
Authors (2): Mingzhi Xu, Yizhe Zhang
We present Spatial Lifting (SL), a novel methodology for dense prediction tasks. SL operates by lifting standard inputs, such as 2D images, into a higher-dimensional space and subsequently processing them using networks designed for that higher dimension, such as a 3D U-Net. Counterintuitively, this dimensionality lifting allows us to achieve good performance on benchmark tasks compared to conventional approaches, while reducing inference costs and significantly lowering the number of model parameters. The SL framework produces intrinsically structured outputs along the lifted dimension. This emergent structure facilitates dense supervision during training and enables robust, near-zero-additional-cost prediction quality assessment at test time. We validate our approach across 19 benchmark datasets (13 for semantic segmentation and 6 for depth estimation), demonstrating competitive dense prediction performance while reducing the model parameter count by over 98% (in the U-Net case) and lowering inference costs. Spatial Lifting introduces a new vision modeling paradigm that offers a promising path toward more efficient, accurate, and reliable deep networks for dense prediction tasks in vision.
Article 713
Title@2025-07-14 (1): A Graph Sufficiency Perspective for Neural Networks
Title: A Graph Sufficiency Perspective for Neural Networks | Eine grafische Sufficiency-Perspektive für neurale Netzwerke | 图 神经网络的量化透视图 2507.10215v1 |
Authors (2): Cencheng Shen, Yuexiao Dong
This paper analyzes neural networks through graph variables and statistical sufficiency. We interpret neural network layers as graph-based transformations, where neurons act as pairwise functions between inputs and learned anchor points. Within this formulation, we establish conditions under which layer outputs are sufficient for the layer inputs, that is, each layer preserves the conditional distribution of the target variable given the input variable. Under dense anchor point assumptions, we prove that asymptotic sufficiency holds in the infinite-width limit and is preserved throughout training. To align more closely with practical architectures, we further show that sufficiency can be achieved with finite-width networks by assuming region-separated input distributions and constructing appropriate anchor points. Our framework covers fully connected layers, general pairwise functions, ReLU and sigmoid activations, and convolutional neural networks. This work bridges statistical sufficiency, graph-theoretic representations, and deep learning, providing a new statistical understanding of neural networks.
Article 714
Title@2025-07-14 (1): Formal Verification of Variational Quantum Circuits
Title: Formal Verification of Variational Quantum Circuits | Formale Überprüfung von Variations-Quantenkreisen | 变量量电路的正式核查 2507.10635v1 |
Authors (4): Nicola Assolini, Luca Marzari, Isabella Mastroeni, Alessandra di Pierro
Variational quantum circuits (VQCs) are a central component of many quantum machine learning algorithms, offering a hybrid quantum-classical framework that, in certain respects, can be considered similar to classical deep neural networks. A shared aspect is, for instance, their vulnerability to adversarial inputs: small perturbations that can lead to incorrect predictions. While formal verification techniques have been extensively developed for classical models, no comparable framework exists for certifying the robustness of VQCs. Here, we present the first in-depth theoretical and practical study of the formal verification problem for VQCs. Inspired by abstract interpretation methods used in deep learning, we analyze the applicability and limitations of interval-based reachability techniques in the quantum setting. We show that quantum-specific aspects, such as state normalization, introduce inter-variable dependencies that challenge existing approaches. We investigate these issues by introducing a novel semantic framework based on abstract interpretation, where the verification problem for VQCs can be formally defined, and its complexity analyzed. Finally, we demonstrate our approach on standard verification benchmarks.
Article 715
Title@2025-07-14 (1): Learning to Quantize and Precode in Massive MIMO Systems for Energy Reduction: a Graph Neural Network Approach
Title: Learning to Quantize and Precode in Massive MIMO Systems for Energy Reduction: a Graph Neural Network Approach | Quantisieren und Vorkodieren in massiven MIMO-Systemen zur Energiereduzierung lernen: ein Graph Neuronaler Netzwerkansatz | 学习如何量化和预先编码巨量海事组织大规模减少能源系统:图表神经网络方法 2507.10634v1 |
Authors (3): Thomas Feys, Liesbet Van der Perre, François Rottenberg
Massive MIMO systems are moving toward increased numbers of radio frequency chains, higher carrier frequencies and larger bandwidths. As such, digital-to-analog converters (DACs) are becoming a bottleneck in terms of hardware complexity and power consumption. In this work, non-linear precoding for coarsely quantized downlink massive MIMO is studied. Given the NP-hard nature of this problem, a graph neural network (GNN) is proposed that directly outputs the precoded quantized vector based on the channel matrix and the intended transmit symbols. The model is trained in a self-supervised manner, by directly maximizing the achievable rate. To overcome the non-differentiability of the objective function, caused by the non-differentiable DAC functions, a straight-through Gumbel-softmax estimation of the gradient is proposed. The proposed method achieves a significant increase in achievable sum rate under coarse quantization. For instance, in the single-user case, the proposed method can achieve the same sum rate as maximum ratio transmission (MRT) using one-bit DACs, compared to 3 bits for MRT. This reduces the DAC power consumption by a factor of 4-7 and 3 for baseband and RF DACs, respectively. This, however, comes at the cost of increased digital signal processing power consumption. When accounting for this, the reduction in overall power consumption holds for a system bandwidth up to 3.5 MHz for baseband DACs, while the RF DACs can maintain a power reduction of 2.9 for higher bandwidths. Notably, indirect effects that further reduce the power consumption, such as reduced fronthaul consumption and reductions in other components, are not considered in this analysis.
Article 716
Title@2025-07-14 (1): History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions
Title: History Matching under Uncertainty of Geological Scenarios with Implicit Geological Realism Control with Generative Deep Learning and Graph Convolutions | Geschichte Passend unter Ungewissheit geologischer Szenarien mit impliziter geologischer Realismuskontrolle mit generativem Deep Learning und Graph Convolutions | 历史在地质情景与隐隐隐的地质现实控制与产生深层学习和图案革命的不确定性的不确定性下匹配的历史 2507.10201v1 |
Authors (3): Gleb Shishaev, Vasily Demyanov, Daniel Arnold
The graph-based variational autoencoder represents an architecture that can handle the uncertainty of different geological scenarios, such as depositional or structural, through the concept of a lower-dimensional latent space. The main difference from recent studies is the utilisation of a graph-based approach in reservoir modelling instead of the more traditional lattice-based deep learning methods. We provide a solution to implicitly control the geological realism through the latent variables of a generative model and geodesic metrics. Our AHM experiments on a synthetic dataset, which consists of 3D realisations of channelised geological representations under two distinct scenarios (one and two channels), show the viability of the approach. We offer an in-depth analysis of the latent space using tools such as PCA, t-SNE, and TDA to illustrate its structure.
Article 717
Title@2025-07-14 (1): Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
Title: Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models | Trinity-RFT: Ein allgemein angelegtes und einheitliches Rahmenwerk zur Verstärkung der Feinsteuerung großer Sprachmodelle | 三一-RFT:加强大语言模式精美应用的一般目的和统一框架 2505.17826v2 |
Authors (14): Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Weijie Shi, Yaliang Li, Bolin Ding, Jingren Zhou
Trinity-RFT is a general-purpose, unified and easy-to-use framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a modular and decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT; (2) seamless integration for agent-environment interaction with high efficiency and robustness; and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for development and research of advanced reinforcement learning paradigms at both macroscopic and microscopic levels. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples, applications and experiments that demonstrate its functionalities and user-friendliness.
Article 718
Title@2025-07-14 (1): Learning Private Representations through Entropy-based Adversarial Training
Title: Learning Private Representations through Entropy-based Adversarial Training | Private Repräsentationen lernen durch eine auf Entropie basierende Adversarial-Schulung | 通过以英文为基础的反向培训进行学习私人代表 2507.10194v1 |
Authors (2): Tassilo Klein, Moin Nabi
How can we learn a representation with high predictive power while preserving user privacy? We present an adversarial representation learning method for sanitizing sensitive content from the learned representation. Specifically, we introduce a variant of entropy, focal entropy, which mitigates the potential information leakage of existing entropy-based approaches. We showcase feasibility on multiple benchmarks. The results suggest high target utility at moderate privacy leakage.
Article 719
Title@2025-07-14 (1): T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs
Title: T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs | T-GRAB: Ein synthetischer Diagnose-Benchmark für das Lernen auf zeitlichen Graphen | T-GRAB: 时间图学习的合成诊断基准 2507.10183v1 |
Authors (5): Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau
Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
Article 720
Title@2025-07-14 (1): Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
Title: Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving | Pimba: Eine Verarbeitungs-in-Memory-Beschleunigung für Post-Transformer-Großsprachmodell-Servieren | Pimba:在外向后大语文示范服务中快速处理后大语文示范服务 2507.10178v1 |
Authors (11): Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park
Transformers are the driving force behind today’s Large Language Models (LLMs), serving as the foundation for their performance and versatility. Yet, their compute and memory costs grow with sequence length, posing scalability challenges for long-context inferencing. In response, the algorithm community is exploring alternative architectures, such as state space models (SSMs), linear attention, and recurrent neural networks (RNNs), which we refer to as post-transformers. This shift presents a key challenge: building a serving system that efficiently supports both transformer and post-transformer LLMs within a unified framework. To address this challenge, we analyze the performance characteristics of transformer and post-transformer LLMs. Despite their algorithmic differences, both are fundamentally limited by memory bandwidth under batched inference due to attention in transformers and state updates in post-transformers. Further analyses suggest two additional insights: (1) state update operations, unlike attention, incur high hardware cost, making per-bank PIM acceleration inefficient, and (2) different low-precision arithmetic methods offer varying accuracy-area tradeoffs, while we identify Microsoft’s MX as the Pareto-optimal choice. Building on these insights, we design Pimba as an array of State-update Processing Units (SPUs), each shared between two banks to enable interleaved access to PIM. Each SPU includes a State-update Processing Engine (SPE) that comprises element-wise multipliers and adders using MX-based quantized arithmetic, enabling efficient execution of state update and attention operations. Our evaluation shows that, compared to LLM-optimized GPU and GPU+PIM systems, Pimba achieves up to 3.2x and 2.1x higher token generation throughput, respectively.
Article 721
Title@2025-07-14 (1): Token-based Audio Inpainting via Discrete Diffusion
Title: Token-based Audio Inpainting via Discrete Diffusion | Token-basierte Audio-Inpainting über Discrete Diffusion | 以 Tokon 为基调的音频通过分解传播油漆 2507.08333v2 |
Authors (7): Tali Dror, Iftach Shoham, Moshe Buchris, Oren Gal, Haim Permuter, Gilad Katz, Eliya Nachmani
Audio inpainting refers to the task of reconstructing missing segments in corrupted audio recordings. While prior approaches, including waveform and spectrogram-based diffusion models, have shown promising results for short gaps, they often degrade in quality when gaps exceed 100 milliseconds (ms). In this work, we introduce a novel inpainting method based on discrete diffusion modeling, which operates over tokenized audio representations produced by a pre-trained audio tokenizer. Our approach models the generative process directly in the discrete latent space, enabling stable and semantically coherent reconstruction of missing audio. We evaluate the method on the MusicNet dataset using both objective and perceptual metrics across gap durations up to 300 ms. We further evaluate our approach on the MTG dataset, extending the gap duration to 500 ms. Experimental results demonstrate that our method achieves competitive or superior performance compared to existing baselines, particularly for longer gaps, offering a robust solution for restoring degraded musical recordings. Audio examples of our proposed method can be found at https://iftach21.github.io/
Article 722
Title@2025-07-14 (1): Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?
Title: Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? | Sollten wir jemals Entscheidungstransformator für Offline-Verstärkung Lernen bevorzugen? | 我们是否应该更偏爱离线强化学习的决策变异器? 2507.10174v1 |
Authors (3): Yumi Omori, Zixuan Dong, Keith Ross
In recent years, extensive work has explored the application of the Transformer architecture to reinforcement learning problems. Among these, Decision Transformer (DT) has gained particular attention in the context of offline reinforcement learning due to its ability to frame return-conditioned policy learning as a sequence modeling task. Most recently, Bhargava et al. (2024) provided a systematic comparison of DT with more conventional MLP-based offline RL algorithms, including Behavior Cloning (BC) and Conservative Q-Learning (CQL), and claimed that DT exhibits superior performance in sparse-reward and low-quality data settings. In this paper, through experimentation on robotic manipulation tasks (Robomimic) and locomotion benchmarks (D4RL), we show that MLP-based Filtered Behavior Cloning (FBC) achieves competitive or superior performance compared to DT in sparse-reward environments. FBC simply filters out low-performing trajectories from the dataset and then performs ordinary behavior cloning on the filtered dataset. FBC is not only very straightforward, but it also requires less training data and is computationally more efficient. The results therefore suggest that DT is not preferable for sparse-reward environments. From prior work, arguably, DT is also not preferable for dense-reward environments. Thus, we pose the question: Is DT ever preferable?
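The two FBC steps described above (filter, then ordinary behavior cloning) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not state the filtering rule, so keeping the top fraction of trajectories by episode return is an assumption, and the trajectory layout and the names `filter_trajectories` and `to_bc_pairs` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical trajectory layout: {"return": float, "steps": [(obs, action), ...]}
Trajectory = Dict[str, object]


def filter_trajectories(trajectories: List[Trajectory],
                        keep_fraction: float = 0.1) -> List[Trajectory]:
    """Keep the top `keep_fraction` of trajectories ranked by episode return.

    Assumption: the filtering criterion is the episode return; the abstract
    only says that low-performing trajectories are filtered out.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(trajectories, key=lambda t: t["return"], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def to_bc_pairs(trajectories: List[Trajectory]) -> List[Tuple[object, object]]:
    """Flatten trajectories into (observation, action) pairs for supervised BC."""
    return [pair for t in trajectories for pair in t["steps"]]


# Toy sparse-reward dataset: return 1.0 means the episode succeeded.
dataset = [
    {"return": 0.0, "steps": [("s0", "left"), ("s1", "left")]},
    {"return": 1.0, "steps": [("s0", "right"), ("s2", "up")]},
    {"return": 0.0, "steps": [("s0", "up")]},
    {"return": 1.0, "steps": [("s0", "right")]},
]
kept = filter_trajectories(dataset, keep_fraction=0.5)
pairs = to_bc_pairs(kept)  # these pairs would be fed to any MLP policy trained with a BC loss
```

The behavior cloning step itself is unchanged from ordinary BC; only the dataset passed to it shrinks, which is consistent with the abstract's remark that FBC needs less training data.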
Article 723
Title@2025-07-14 (1): Play Style Identification Using Low-Level Representations of Play Traces in MicroRTS
Title: Play Style Identification Using Low-Level Representations of Play Traces in MicroRTS | Wiedergabestil-Identifizierung mit Low-Level-Darstellungen von Spielspuren in MicroRTS | 使用微小RTS游戏轨迹的低层次代表的游戏样式识别 2507.10172v1 |
Authors (3): Ruizhe Yu Xia, Jeremy Gow, Simon Lucas
Play style identification can provide valuable game design insights and enable adaptive experiences, with the potential to improve game playing agents. Previous work relies on domain knowledge to construct play trace representations using handcrafted features. More recent approaches incorporate the sequential structure of play traces but still require some level of domain abstraction. In this study, we explore the use of unsupervised CNN-LSTM autoencoder models to obtain latent representations directly from low-level play trace data in MicroRTS. We demonstrate that this approach yields a meaningful separation of different game playing agents in the latent space, reducing reliance on domain expertise and its associated biases. This latent space is then used to guide the exploration of diverse play styles within studied AI players.
Article 724
Title@2025-07-14 (1): Understanding the Rank of Tensor Networks via an Intuitive Example-Driven Approach
Title: Understanding the Rank of Tensor Networks via an Intuitive Example-Driven Approach | Den Rang der Tensor-Netzwerke über einen intuitiven Beispiel-getriebenen Ansatz verstehen | 通过直观的 “ 实例转化办法 “ 了解Tensor网络的排名 2507.10170v1 |
Authors (5): Wuyang Zhou, Giorgos Iacovides, Kriton Konstantinidis, Ilya Kisil, Danilo Mandic
Tensor Network (TN) decompositions have emerged as an indispensable tool in Big Data analytics owing to their ability to provide compact low-rank representations, thus alleviating the "Curse of Dimensionality" inherent in handling higher-order data. At the heart of their success lies the concept of TN ranks, which governs the efficiency and expressivity of TN decompositions. However, unlike matrix ranks, TN ranks often lack a universal meaning and an intuitive interpretation, with their properties varying significantly across different TN structures. Consequently, TN ranks are frequently treated as empirically tuned hyperparameters, rather than as key design parameters inferred from domain knowledge. The aim of this Lecture Note is therefore to demystify the foundational yet frequently misunderstood concept of TN ranks through real-life examples and intuitive visualizations. We begin by illustrating how domain knowledge can guide the selection of TN ranks in widely-used models such as the Canonical Polyadic (CP) and Tucker decompositions. For more complex TN structures, we employ a self-explanatory graphical approach that generalizes to tensors of arbitrary order. Such a perspective naturally reveals the relationship between TN ranks and the corresponding ranks of tensor unfoldings (matrices), thereby circumventing cumbersome multi-index tensor algebra while facilitating domain-informed TN design. It is our hope that this Lecture Note will equip readers with a clear and unified understanding of the concept of TN rank, along with the necessary physical insight and intuition to support the selection, explainability, and deployment of tensor methods in both practical applications and educational contexts.
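The link between TN ranks and the ranks of tensor unfoldings can be checked on a small case. The sketch below is an illustration, not material from the Lecture Note: it builds a 3x3x3 tensor as a sum of two rank-one terms (a CP model with R = 2) and computes the matrix rank of each mode unfolding, which for a CP model is at most R. The factor values and the helper names `cp_tensor`, `unfold` and `matrix_rank` are hypothetical, and exact rational arithmetic is used so that no numerical tolerance is needed.

```python
from fractions import Fraction
from itertools import product


def cp_tensor(factors, shape):
    """Build T[i0, i1, ...] = sum_r prod_n factors[n][i_n][r] as a dict keyed by index tuple."""
    rank = len(factors[0][0])
    tensor = {}
    for idx in product(*(range(s) for s in shape)):
        total = 0
        for r in range(rank):
            term = 1
            for n, i in enumerate(idx):
                term *= factors[n][i][r]
            total += term
        tensor[idx] = total
    return tensor


def unfold(tensor, shape, mode):
    """Mode-`mode` unfolding: rows indexed by that mode, columns by all other modes."""
    other = [n for n in range(len(shape)) if n != mode]
    rows = []
    for i in range(shape[mode]):
        row = []
        for rest in product(*(range(shape[n]) for n in other)):
            idx = [0] * len(shape)
            idx[mode] = i
            for n, v in zip(other, rest):
                idx[n] = v
            row.append(tensor[tuple(idx)])
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical factor matrices, one per mode; each has R = 2 columns.
shape = (3, 3, 3)
factors = [
    [[1, 0], [0, 1], [1, 1]],
    [[1, 0], [2, 1], [0, 1]],
    [[1, 0], [1, 1], [0, 2]],
]
tensor = cp_tensor(factors, shape)
unfolding_ranks = [matrix_rank(unfold(tensor, shape, mode)) for mode in range(3)]
```

For a Tucker decomposition the same unfolding ranks are exactly the multilinear ranks, which is one way domain knowledge about each mode can translate into a rank choice.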
Article 725
Title@2025-07-14 (1): Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Title: Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | Entgraben von Edelsteinen aus Steinen: Politikoptimierung mit negativer Probenvergrößerung für LLM-Reasoning | 从石石中挖出金宝石:政策优化,对LLM理由的负抽样增加 2505.14403v3 |
Authors (7): Zhaohui Yang, Yuxiao Ye, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang
Recent advances in reasoning language models have witnessed a paradigm shift from short to long CoT patterns. Given the substantial computational cost of rollouts in long CoT models, maximizing the utility of fixed training datasets becomes crucial. Our analysis reveals that negative responses contain valuable components such as self-reflection and error-correction steps, yet the main existing methods either completely discard negative samples (RFT) or apply equal penalization across all tokens (RL), failing to leverage these potential learning signals. In light of this, we propose Behavior Constrained Policy Gradient with Negative Sample Augmentation (BCPG-NSA), a fine-grained offline RL framework that encompasses three stages: 1) sample segmentation, 2) consensus-based step correctness assessment combining LLM and PRM judgers, and 3) policy optimization with NSA designed to effectively mine positive steps within negative samples. Experimental results show that BCPG-NSA outperforms baselines on several challenging math/coding reasoning benchmarks using the same training dataset, achieving improved sample efficiency and demonstrating robustness and scalability when extended to multiple iterations.
Article 726
Title@2025-07-14 (1): Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation
Title: Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation | Domain-Grenzen gibt es mit Föderated Few-Shot-Anpassung überschritten werden | 与联邦几热量适应措施交界的域域边界 2507.10160v1 |
Authors (3): Manuel Röder, Christoph Raab, Frank-Michael Schleif
Federated Learning has emerged as a leading paradigm for decentralized, privacy-preserving learning, particularly relevant in the era of interconnected edge devices equipped with sensors. However, the practical implementation of Federated Learning faces three primary challenges: the need for human involvement in costly data labelling processes for target adaptation, covariate shift in client device data collection due to environmental factors affecting sensors, leading to discrepancies between source and target samples, and the impracticality of continuous or regular model updates in resource-constrained environments due to limited data transmission capabilities and technical constraints on channel availability and energy efficiency. To tackle these issues, we expand upon an efficient and scalable Federated Learning framework tailored for real-world client adaptation in industrial settings. This framework leverages a pre-trained source model comprising a deep backbone, an adaptation module, and a classifier running on a powerful server. By freezing the backbone and classifier during client adaptation on resource-constrained devices, we allow the domain adaptive linear layer to handle target domain adaptation, thus minimizing overall computational overhead. Furthermore, this setup, designated as FedAcross+, is extended to encompass the processing of streaming data, thereby rendering the solution suitable for non-stationary environments. Extensive experimental results demonstrate the effectiveness of FedAcross+ in achieving competitive adaptation on low-end client devices with limited target samples, successfully addressing the challenge of domain shift. Moreover, our framework accommodates sporadic model updates within resource-constrained environments, ensuring practical and seamless deployment.
Article 727
Title@2025-07-14 (1): Simulating Biases for Interpretable Fairness in Offline and Online Classifiers
Title: Simulating Biases for Interpretable Fairness in Offline and Online Classifiers | Simulation von Biasen für interpretierbare Fairness in Offline- und Online-Klassifikatoren | 模拟离线和在线分类中的可解释公平比数 2507.10154v1 |
Authors (4): Ricardo Inácio, Zafeiris Kokkinogenis, Vitor Cerqueira, Carlos Soares
Predictive models often reinforce, through skewed decisions, biases that were originally embedded in their training data. In such cases, mitigation methods are critical to ensure that, regardless of the prevailing disparities, model outcomes are adjusted to be fair. To assess this, datasets could be systematically generated with specific biases, to train machine learning classifiers. Then, predictive outcomes could aid in the understanding of this bias embedding process. Hence, an agent-based model (ABM), depicting a loan application process that represents various systemic biases across two demographic groups, was developed to produce synthetic datasets. Then, by applying classifiers trained on them to predict loan outcomes, we can assess how biased data leads to unfairness. This highlights a main contribution of this work: a framework for synthetic dataset generation with controllable bias injection. We also contribute a novel explainability technique, which shows how mitigations affect the way classifiers leverage data features, via second-order Shapley values. In experiments, both offline and online learning approaches are employed. Mitigations are applied at different stages of the modelling pipeline, such as during pre-processing and in-processing.
Article 728
Title@2025-07-14 (1): Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials
Title: Concentration of measure for non-linear random matrices with applications to neural networks and non-commutative polynomials | Konzentration von Messwerten für nichtlineare Zufallsmatrizen mit Anwendungen in neuronalen Netzwerken und nicht-kommutativen Polynomen | 非线性随机随机矩阵的测量浓度,该矩阵应用到神经网络和非模拟多元复合体 2507.07625v2 |
Authors (1): Radosław Adamczak
We prove concentration inequalities for several models of non-linear random matrices. As corollaries we obtain estimates for linear spectral statistics of the conjugate kernel of neural networks and non-commutative polynomials in (possibly dependent) random matrices.
Article 729
Title@2025-07-14 (1): Deep Recurrence for Dynamical Segmentation Models
Title: Deep Recurrence for Dynamical Segmentation Models | Tiefe Wiederholung für dynamische Segmentierungsmodelle | 动态分割模型的深度重现 2507.10143v1 |
Authors (2): David Calhas, Arlindo L. Oliveira
While biological vision systems rely heavily on feedback connections to iteratively refine perception, most artificial neural networks remain purely feedforward, processing input in a single static pass. In this work, we propose a predictive-coding-inspired feedback mechanism that introduces a recurrent loop from output to input, allowing the model to refine its internal state over time. We implement this mechanism within a standard U-Net architecture and introduce two biologically motivated operations, softmax projection and exponential decay, to ensure stability of the feedback loop. Through controlled experiments on a synthetic segmentation task, we show that the feedback model significantly outperforms its feedforward counterpart in noisy conditions and generalizes more effectively with limited supervision. Notably, feedback achieves above-random performance with just two training examples, while the feedforward model requires at least four. Our findings demonstrate that feedback enhances robustness and data efficiency, and offer a path toward more adaptive and biologically inspired neural architectures. Code is available at: github.com/DCalhas/feedback_segmentation.
Article 730
Title@2025-07-14 (1): Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review
Title: Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review | Anpassungsfähigkeit im Mehr-Agenten-Verstärkungs-Lernen: Ein Rahmen und eine einheitliche Überprüfung | 多机构加强学习中的适应性:框架和统一审查 2507.10142v1 |
Authors (6): Siyi Hu, Mohamad A Hady, Jianglin Qiao, Jimmy Cao, Mahardhika Pratama, Ryszard Kowalczyk
Multi-Agent Reinforcement Learning (MARL) has shown clear effectiveness in coordinating multiple agents across simulated benchmarks and constrained scenarios. However, its deployment in real-world multi-agent systems (MAS) remains limited, primarily due to the complex and dynamic nature of such environments. These challenges arise from multiple interacting sources of variability, including fluctuating agent populations, evolving task goals, and inconsistent execution conditions. Together, these factors demand that MARL algorithms remain effective under continuously changing system configurations and operational demands. To better capture and assess this capacity for adjustment, we introduce the concept of adaptability as a unified and practically grounded lens through which to evaluate the reliability of MARL algorithms under shifting conditions, broadly referring to any changes in the environment dynamics that may occur during learning or execution. Centred on the notion of adaptability, we propose a structured framework comprising three key dimensions: learning adaptability, policy adaptability, and scenario-driven adaptability. By adopting this adaptability perspective, we aim to support more principled assessments of MARL performance beyond narrowly defined benchmarks. Ultimately, this survey contributes to the development of algorithms that are better suited for deployment in dynamic, real-world multi-agent systems.
Article 731
Title@2025-07-14 (1): Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality
Title: Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality | Large-Scale Graph Building in dynamischen Umgebungen: geringe Latenz und hohe Qualität | 动态环境中的大比例图建设:低长期和高质量 2507.10139v1 |
Authors (11): Filipe Miguel Gonçalves de Almeida, CJ Carey, Hendrik Fichtenberger, Jonathan Halcrow, Silvio Lattanzi, André Linhares, Tao Meng, Ashkan Norouzi-Fard, Nikos Parotsidis, Bryan Perozzi, David Simcha
Learning and constructing large-scale graphs has attracted attention in recent decades, resulting in a rich literature that introduced various systems, tools, and algorithms. Grale is one such tool: it is designed for offline environments and is deployed in more than 50 different industrial settings at Google. Grale is widely applicable because of its ability to efficiently learn and construct a graph on datasets with multiple types of features. However, applications often require the underlying data to evolve continuously and rapidly, and the updated graph needs to be available with low latency. Such settings make the use of Grale prohibitive. While there are Approximate Nearest Neighbor (ANN) systems that handle dynamic updates with low latency, they are mostly limited to similarities over a single embedding. In this work, we introduce a system that inherits the advantages and the quality of Grale, and maintains a graph construction in a dynamic setting with tens of milliseconds of latency per request. We call the system Dynamic Grale Using ScaNN (Dynamic GUS). Our system has a wide range of applications with over 10 deployments at Google. One of the applications is in Android Security and Privacy, where Dynamic Grale Using ScaNN enables capturing harmful applications 4 times faster, before they can reach users.
Article 732
Title@2025-07-14 (1): Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting
Title: Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting | Wavelet-Enhanced Neural ODE und Graphen-Achtung für interpretierbare Energieprognosen | 用于可解释性能源预测的增强的神经数字和图示注意 2507.10132v1 |
Authors (1): Usman Gani Joy
Accurate forecasting of energy demand and supply is critical for optimizing sustainable energy systems, yet it is challenged by the variability of renewable sources and dynamic consumption patterns. This paper introduces a neural framework that integrates continuous-time Neural Ordinary Differential Equations (Neural ODEs), graph attention, multi-resolution wavelet transformations, and adaptive learning of frequencies to address the issues of time series prediction. The model employs a robust ODE solver, using the Runge-Kutta method, paired with graph-based attention and residual connections to better understand both structural and temporal patterns. Through wavelet-based feature extraction and adaptive frequency modulation, it adeptly captures and models diverse, multi-scale temporal dynamics. When evaluated across seven diverse datasets: ETTh1, ETTh2, ETTm1, ETTm2 (electricity transformer temperature), and Waste, Solar, and Hydro (renewable energy), this architecture consistently outperforms state-of-the-art baselines in various forecasting metrics, proving its robustness in capturing complex temporal dependencies. Furthermore, the model enhances interpretability through SHAP analysis, making it suitable for sustainable energy applications.
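The abstract names a Runge-Kutta ODE solver without giving its order or step-size scheme. As an illustration only, the sketch below implements one classical fourth-order Runge-Kutta (RK4) step and a fixed-step integrator; choosing RK4 with a fixed step is an assumption, and the hand-written `decay` function stands in for the learned dynamics network a Neural ODE would use.

```python
import math
from typing import Callable, List

Vector = List[float]
Dynamics = Callable[[float, Vector], Vector]


def rk4_step(f: Dynamics, t: float, y: Vector, h: float) -> Vector:
    """Advance y by one step of size h with the classical RK4 scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f: Dynamics, y0: Vector, t0: float, t1: float, steps: int) -> Vector:
    """Integrate dy/dt = f(t, y) from t0 to t1 using `steps` equal RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def decay(t: float, y: Vector) -> Vector:
    """Stand-in dynamics dy/dt = -y; a Neural ODE would call a network here."""
    return [-yi for yi in y]


y_end = integrate(decay, [1.0], 0.0, 1.0, steps=20)
error = abs(y_end[0] - math.exp(-1.0))  # compare with the exact solution e^{-1}
```

In a Neural ODE the hidden state at the forecast horizon is obtained by the same kind of integration, with gradients propagated through the solver steps.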
Article 733
Title@2025-07-14 (1): Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
Title: Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution | Hypersphärische Variations-Autoencoder mit effizienter sphärischer Cauchy-Distribution | 使用高效球道球道配送的超球变异自动编码器 2506.21278v2 |
Authors (2): Lukas Sablica, Kurt Hornik
We propose a novel variational autoencoder (VAE) architecture that employs a spherical Cauchy (spCauchy) latent distribution. Unlike traditional Gaussian latent spaces or the widely used von Mises-Fisher (vMF) distribution, spCauchy provides a more natural hyperspherical representation of latent variables, better capturing directional data while maintaining flexibility. Its heavy-tailed nature prevents over-regularization, ensuring efficient latent space utilization while offering a more expressive representation. Additionally, spCauchy circumvents the numerical instabilities inherent to vMF, which arise from computing normalization constants involving Bessel functions. Instead, it enables a fully differentiable and efficient reparameterization trick via Möbius transformations, allowing for stable and scalable training. The KL divergence can be computed through a rapidly converging power series, eliminating concerns of underflow or overflow associated with evaluation of ratios of hypergeometric functions. These properties make spCauchy a compelling alternative for VAEs, offering both theoretical advantages and practical efficiency in high-dimensional generative modeling.
Article 734
Title@2025-07-14 (1): A Variance-Reduced Cubic-Regularized Newton for Policy Optimization
Title: A Variance-Reduced Cubic-Regularized Newton for Policy Optimization | Ein varianzreduzierter kubisch-regularisierter Newton für politische Optimierung | 用于政策优化的 差异缩放立方( Cubic- Reculized 牛顿) 2507.10120v1 |
Authors (3): Cheng Sun, Zhen Zhang, Shaofu Yang
In this paper, we study a second-order approach to policy optimization in reinforcement learning. Existing second-order methods often suffer from suboptimal sample complexity or rely on unrealistic assumptions about importance sampling. To overcome these limitations, we propose VR-CR-PN, a variance-reduced cubic-regularized policy Newton algorithm. To the best of our knowledge, this is the first algorithm that integrates Hessian-aided variance reduction with second-order policy optimization, effectively addressing the distribution shift problem and achieving best-known sample complexity under general nonconvex conditions but without the need for importance sampling. We theoretically establish that VR-CR-PN achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-3})$ to reach an $\epsilon$-second-order stationary point, significantly improving upon the previous best result of $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ under comparable assumptions. As an additional contribution, we introduce a novel Hessian estimator for the expected return function, which admits a uniform upper bound independent of the horizon length $H$, allowing the algorithm to achieve horizon-independent sample complexity.
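For orientation, the generic cubic-regularized Newton step that this family of methods builds on can be written as below. This is the standard form stated for minimizing $F(\theta) = -J(\theta)$, where $J$ is the expected return; the symbols $g_t$, $H_t$, $M$, and $\rho$ are illustrative notation rather than the paper's, and the variance-reduced estimators that VR-CR-PN uses for $g_t$ and $H_t$ are not reproduced here.

$$
h_t = \arg\min_{h} \; \langle g_t, h \rangle + \tfrac{1}{2} \langle H_t h, h \rangle + \tfrac{M}{6} \|h\|^3,
\qquad
\theta_{t+1} = \theta_t + h_t
$$

Under the usual definition, $\theta$ is an $\epsilon$-second-order stationary point when $\|\nabla F(\theta)\| \le \epsilon$ and $\lambda_{\min}(\nabla^2 F(\theta)) \ge -\sqrt{\rho\,\epsilon}$, with $\rho$ the Lipschitz constant of the Hessian.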
nan
Article 735
Title@2025-07-14 (1): Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration
Title: Analysis of AI Techniques for Orchestrating Edge-Cloud Application Migration | Analyse von KI-Techniken für das Orchestrieren von Edge-Cloud-Anwendungsmigration | AI: AI: 拼接边城-边城应用移民应用技术分析 2507.10119v1 |
Authors (3): Sadig Gojayev, Ahmad Anaqreh, Carolina Fortuna
Application migration in edge-cloud systems enables high QoS and cost-effective service delivery. However, automatically orchestrating such migration is typically solved with heuristic approaches. Starting from the Markov Decision Process (MDP), in this paper we identify, analyze, and compare selected state-of-the-art Artificial Intelligence (AI) planning and Reinforcement Learning (RL) approaches for solving the class of edge-cloud application migration problems that can be modeled as Towers of Hanoi (ToH) problems. We introduce a new classification based on state space definition and also analyze the compared models through this lens. The aim is to understand available techniques capable of orchestrating such application migration in emerging computing continuum environments.
nan
Article 736
Title@2025-07-14 (1): DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
Title: DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models | DiaTool-DPO: Multi-Turn Direct Preference Optimierung für Tool-Augmented Large Language Models | DiaTool-DPO:多发直接首选优化工具增强型大语言模型 2504.02882v2 |
Authors (10): Sunghee Jung, Donghun Lee, Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Junrae Cho, Kihyun Kim, Eunggyun Kim, Myeongcheol Shin
Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLM’s dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o’s performance (94.8% in information gathering, 91% in tool call rejection) with substantial improvements over baseline (44% and 9.6% respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
nan
Article 737
Title@2025-07-14 (1): Riemannian Time Warping: Multiple Sequence Alignment in Curved Spaces
Title: Riemannian Time Warping: Multiple Sequence Alignment in Curved Spaces | Riemannian Time Warping: Mehrere Sequenzen richten sich in gekrümmten Räumen | Riemannian 时间扭曲: 曲线空间中的多个序列对齐 2506.01635v3 |
Authors (5): Julian Richter, Christopher A. Erdös, Christian Scheurer, Jochen J. Steil, Niels Dehio
Temporal alignment of multiple signals through time warping is crucial in many fields, such as classification within speech recognition or robot motion learning. Almost all related works are limited to data in Euclidean space. Although an attempt was made in 2011 to adapt this concept to unit quaternions, a general extension to Riemannian manifolds remains absent. Given its importance for numerous applications in robotics and beyond, we introduce Riemannian Time Warping (RTW). This novel approach efficiently aligns multiple signals by considering the geometric structure of the Riemannian manifold in which the data is embedded. Extensive experiments on synthetic and real-world data, including tests with an LBR iiwa robot, demonstrate that RTW consistently outperforms state-of-the-art baselines in both averaging and classification tasks.
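To make the geometric idea concrete, the sketch below runs ordinary pairwise dynamic time warping but measures point-to-point cost with the geodesic (great-circle) distance on the unit sphere instead of the Euclidean distance. This is a swapped-in simplification, not RTW itself: it aligns two signals rather than many, the unit sphere is chosen only as an example manifold, and `sphere_distance` and `dtw_cost` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def sphere_distance(p: Point, q: Point) -> float:
    """Geodesic distance between two unit vectors (angle between them)."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def dtw_cost(xs: List[Point], ys: List[Point],
             dist: Callable[[Point, Point], float]) -> float:
    """Accumulated cost of the best monotone alignment between xs and ys."""
    n, m = len(xs), len(ys)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(xs[i - 1], ys[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


if __name__ == "__main__":
    path = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    stretched = [path[0], path[0], path[1], path[2]]  # same path, slower start
    print(dtw_cost(path, stretched, sphere_distance))
```

Swapping the distance function is the smallest change that respects curvature; averaging several aligned signals on a manifold additionally needs a manifold-aware mean, which this sketch does not attempt.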
nan
Article 738
Title@2025-07-14 (1): A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
Title: A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications | Eine umfassende Übersicht über die direkte Präferenzoptimierung: Datensätze, Theorien, Varianten und Anwendungen | 直接优先优化综合调查:数据集、理论、变式和应用 2410.15595v3 |
Authors (12): Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu
With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO’s various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO’s current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
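As a point of reference for the variants the survey covers, the sketch below computes the vanilla DPO loss for a single preference pair from sequence log-probabilities. The function and argument names (`dpo_loss`, `policy_logp_chosen`, and so on) and the default `beta` are illustrative assumptions; the inputs are assumed to be summed token log-probabilities under the trainable policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Vanilla DPO loss for one (chosen, rejected) pair.

    The margin compares how much more the policy favors the chosen response
    than the reference does, relative to the rejected response.
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
    # Policy shifts probability toward the chosen response: loss drops.
    print(dpo_loss(-0.5, -3.0, -1.0, -2.0))
```

No reward model or sampling loop appears anywhere in the computation, which is the sense in which DPO is described as RL-free.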
nan
Article 739
Title@2025-07-14 (1): Explaining the Impact of Training on Vision Models via Activation Clustering
Title: Explaining the Impact of Training on Vision Models via Activation Clustering | Erklären der Auswirkungen von Schulungen auf Vision-Modelle durch Aktivierungs-Clustering | 解释培训通过启动集群化对愿景模型的影响 2411.19700v4 |
Authors (3): Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair
This paper introduces Neuro-Activated Vision Explanations (NAVE), a method for extracting and visualizing the internal representations of vision model encoders. By clustering feature activations, NAVE provides insights into learned semantics without fine-tuning. Using object localization, we show that NAVE’s concepts align with image semantics. Through extensive experiments, we analyze the impact of training strategies and architectures on encoder representation capabilities. Additionally, we apply NAVE to study training artifacts in vision transformers and reveal how weak training strategies and spurious correlations degrade model performance. Our findings establish NAVE as a valuable tool for post-hoc model inspection and improving transparency in vision models.
nan
Article 740
Title@2025-07-14 (1): Structuring Radiology Reports: Challenging LLMs with Lightweight Models
Title: Structuring Radiology Reports: Challenging LLMs with Lightweight Models | Structuring Radiology Reports: Herausfordernde LLMs mit Leichtbaumodellen | 结构化放射学报告:用轻量级模型对LMS提出挑战 2506.00200v2 |
Authors (8): Johannes Moll, Louisa Fay, Asfandyar Azhar, Sophie Ostmeier, Tim Lueth, Sergios Gatidis, Curtis Langlotz, Jean-Benoit Delbrouck
Radiology reports are critical for clinical decision-making but often lack a standardized format, limiting both human interpretability and machine learning (ML) applications. While large language models (LLMs) have shown strong capabilities in reformatting clinical text, their high computational requirements, lack of transparency, and data privacy concerns hinder practical deployment. To address these challenges, we explore lightweight encoder-decoder models (<300M parameters), specifically T5 and BERT2BERT, for structuring radiology reports from the MIMIC-CXR and CheXpert Plus datasets. We benchmark these models against eight open-source LLMs (1B-70B), adapted using prefix prompting, in-context learning (ICL), and low-rank adaptation (LoRA) finetuning. Our best-performing lightweight model outperforms all LLMs adapted using prompt-based techniques on a human-annotated test set. While some LoRA-finetuned LLMs achieve modest gains over the lightweight model on the Findings section (BLEU 6.4%, ROUGE-L 4.8%, BERTScore 3.6%, F1-RadGraph 1.1%, GREEN 3.6%, and F1-SRR-BERT 4.3%), these improvements come at the cost of substantially greater computational resources. For example, LLaMA-3-70B incurred more than 400 times the inference time, cost, and carbon emissions compared to the lightweight model. These results underscore the potential of lightweight, task-specific models as sustainable and privacy-preserving solutions for structuring clinical text in resource-constrained healthcare settings.
nan
Article 741
Title@2025-07-14 (1): FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Title: FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training | FRUGAL: Memory-Efficient Optimization durch Reduzierung des staatlichen Overheads für skalierbares Training | FRUGAL:通过减少国家可扩展培训的间接费用,实现记忆-有效优化 2411.07837v2 |
Authors (4): Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth
With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al., 2024)), and blockwise optimization (BAdam (Luo et al., 2024)) have been proposed. However, in all these algorithms, the $\textit{effective rank of the weight updates remains low-rank}$, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce $\texttt{FRUGAL}$ ($\textbf{F}$ull-$\textbf{R}$ank $\textbf{U}$pdates with $\textbf{G}$r$\textbf{A}$dient sp$\textbf{L}$itting), a new memory-efficient optimization framework. $\texttt{FRUGAL}$ leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via state-free methods like SGD or signSGD (Bernstein et al., 2018). Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches across various fixed memory budgets, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance metrics.
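The gradient-splitting idea can be illustrated with a toy update in which a chosen subset of coordinates is updated with Adam (which keeps two state values per coordinate) and every other coordinate is updated with signSGD (which keeps none). This is a simplified sketch under stated assumptions, not the FRUGAL implementation: the split here is a fixed coordinate subset, whereas the abstract mentions several selection techniques, and `split_step`, `make_state`, `stateful`, and the learning rates are hypothetical.

```python
import math
from typing import Dict, List, Set


def make_state() -> Dict:
    """Optimizer state; moment entries exist only for stateful coordinates."""
    return {"t": 0, "m": {}, "v": {}}


def split_step(params: List[float], grads: List[float], stateful: Set[int],
               state: Dict, lr: float = 0.01, lr_free: float = 0.01,
               beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8) -> List[float]:
    """One update: Adam on `stateful` coordinates, signSGD on the rest."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        if i in stateful:
            m = beta1 * state["m"].get(i, 0.0) + (1 - beta1) * g
            v = beta2 * state["v"].get(i, 0.0) + (1 - beta2) * g * g
            state["m"][i], state["v"][i] = m, v
            m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
            v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
            updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        else:
            sign = (g > 0) - (g < 0)  # state-free direction
            updated.append(p - lr_free * sign)
    return updated


if __name__ == "__main__":
    state = make_state()
    new_params = split_step([1.0, -2.0, 3.0], [0.5, -0.5, 0.0], {0}, state)
    print(new_params, sorted(state["m"]))  # state is stored for index 0 only
```

Every coordinate still moves on every step, so the overall update is not confined to a low-dimensional subspace, while the memory for optimizer state scales with the size of `stateful` rather than with the full parameter count.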
nan
Article 742
Title@2025-07-14 (1): Class-Aware PillarMix: Can Mixed Sample Data Augmentation Enhance 3D Object Detection with Radar Point Clouds?
Title: Class-Aware PillarMix: Can Mixed Sample Data Augmentation Enhance 3D Object Detection with Radar Point Clouds? | Klasse-Aware-SäuleMix: Kann gemischte Probendatenvergrößerung die 3D-Objekterkennung mit Radarpunktwolken verbessern? | 类警用支柱混合:混合抽样数据增强能够用雷达点云加强3D物体探测吗? 2503.02687v2 |
Authors (5): Miao Zhang, Sherif Abdulatif, Benedikt Loesch, Marco Altmann, Bin Yang
Due to the significant effort required for data collection and annotation in 3D perception tasks, mixed sample data augmentation (MSDA) has been widely studied to generate diverse training samples by mixing existing data. Recently, many MSDA techniques have been developed for point clouds, but they mainly target LiDAR data, leaving their application to radar point clouds largely unexplored. In this paper, we examine the feasibility of applying existing MSDA methods to radar point clouds and identify several challenges in adapting these techniques. These obstacles stem from the radar’s irregular angular distribution, deviations from a single-sensor polar layout in multi-radar setups, and point sparsity. To address these issues, we propose Class-Aware PillarMix (CAPMix), a novel MSDA approach that applies MixUp at the pillar level in 3D point clouds, guided by class labels. Unlike methods that apply a single mix ratio to the entire sample, CAPMix assigns an independent ratio to each pillar, boosting sample diversity. To account for the density of different classes, we use class-specific distributions: for dense objects (e.g., large vehicles), we skew ratios to favor points from another sample, while for sparse objects (e.g., pedestrians), we sample more points from the original. This class-aware mixing retains critical details and enriches each sample with new information, ultimately generating more diverse training data. Experimental results demonstrate that our method not only significantly boosts performance but also outperforms existing MSDA approaches across two datasets (Bosch Street and K-Radar). We believe that this straightforward yet effective approach will spark further investigation into MSDA techniques for radar data.
nan
Article 743
Title@2025-07-14 (1): Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering
Title: Towards High Supervised Learning Utility Training Data Generation: Data Pruning and Column Reordering | Auf dem Weg zu einem hochüberwachten Lernprogramm zur Schulung von Datengenerierung: Datenkorrektur und Spaltenumstellung | 数据生成:数据调节和整列重新排序 2507.10088v1 |
Authors (4): Tung Sum Thomas Kwok, Zeyong Zhang, Chi-Hua Wang, Guang Cheng
Tabular data synthesis for supervised learning (‘SL’) model training is gaining popularity in industries such as healthcare, finance, and retail. Despite the progress made in tabular data generators, models trained with synthetic data often underperform compared to those trained with original data. This low SL utility of synthetic data stems from exaggerated class imbalance and from SL data relationships that tabular generators overlook. To address these challenges, we draw inspiration from techniques in emerging data-centric artificial intelligence and introduce Pruning and ReOrdering (‘PRRO’), a novel pipeline that integrates data-centric techniques into tabular data synthesis. PRRO incorporates data pruning to guide the table generator towards observations with high signal-to-noise ratio, ensuring that the class distribution of synthetic data closely matches that of the original data. In addition, PRRO employs a column reordering algorithm to align the data modeling structure of generators with that of SL models. These two modules enable PRRO to optimize the SL utility of synthetic data. Empirical experiments on 22 public datasets show that synthetic data generated using PRRO enhances predictive performance compared to data generated without PRRO. Specifically, with PRRO, replacing the original data with synthetic data yields an average improvement of 26.74% and up to 871.46%, while appending PRRO-generated synthetic data to the original data yields an average improvement of 6.13% and up to 200.32%. Furthermore, experiments on six highly imbalanced datasets show that PRRO enables the generator to produce synthetic data with a class distribution that resembles the original data more closely, achieving a similarity improvement of 43%. Through PRRO, we foster a seamless integration of data synthesis with subsequent SL prediction, promoting quality and accessible data analysis.
nan
Article 744
Title@2025-07-14 (1): A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area
Title: A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area | Eine Transfer-Lernmethode für die Segmentierung von Wasserkörpern in Fernerkundungsbildern: Eine Fallstudie des Zhada-Tulin-Gebiets | 遥感图像中水体分离的转让学习方法:Zhada Tulin地区的案例研究 2507.10084v1 |
Authors (2): Haonan Chen, Xin Tong
To address the prevalent challenges of domain shift and small sample sizes in remote sensing image water body segmentation, this study proposes and validates a two-stage transfer learning strategy based on the SegFormer model. The approach begins by training a foundational segmentation model on a diverse source domain, where it achieves an Intersection over Union (IoU) of 68.80% on its validation set, followed by fine-tuning on data from the distinct target domain. Focusing on the Zhada Tulin area in Tibet – a region characterized by highly complex topography and spectral features – the experimental results demonstrate that this strategy significantly boosts the IoU for the water body segmentation task from 25.50% (for direct transfer) to 64.84%. This not only effectively resolves the model performance degradation caused by domain discrepancy but also provides an effective technical paradigm for high-precision thematic information extraction in data-scarce and environmentally unique remote sensing scenarios.
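The reported metric, Intersection over Union for a binary water mask, can be computed as in the sketch below. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not say how the authors handle that edge case.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two binary masks given as rows of 0/1 values."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    print(iou(predicted, reference))  # one shared pixel out of three marked pixels
```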
nan
Article 745
Title@2025-07-14 (1): Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding
Title: Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Kodezi Chronos: Ein Debugging-First Language Model für Repository-Scale, Memory-Driven Code Understanding | Kodezi Chronos:调试第一语言模型,用于存储库规模、记忆驱动代码理解 2507.12482v1 |
Authors (4): Ishraq Khan, Assad Chowdary, Sharoz Haseeb, Urvish Patel
Large Language Models (LLMs) have advanced code generation and software automation, but are fundamentally constrained by limited inference-time context and lack of explicit code structure reasoning. We introduce Kodezi Chronos, a next-generation architecture for autonomous code understanding, debugging, and maintenance, designed to operate across ultra-long contexts comprising entire codebases, histories, and documentation, all without fixed window limits. Kodezi Chronos leverages a multi-level embedding memory engine, combining vector and graph-based indexing with continuous code-aware retrieval. This enables efficient and accurate reasoning over millions of lines of code, supporting repository-scale comprehension, multi-file refactoring, and real-time self-healing actions. Our evaluation introduces a novel Multi Random Retrieval benchmark, specifically tailored to the software engineering domain. Unlike classical retrieval benchmarks, this method requires the model to resolve arbitrarily distant and obfuscated associations across code artifacts, simulating realistic tasks such as variable tracing, dependency migration, and semantic bug localization. Chronos outperforms prior LLMs and code models, demonstrating a 23% improvement in real-world bug detection and reducing debugging cycles by up to 40% compared to traditional sequence-based approaches. By natively interfacing with IDEs and CI/CD workflows, Chronos enables seamless, autonomous software maintenance, elevating code reliability and productivity while reducing manual effort. These results mark a critical advance toward self-sustaining, continuously optimized software ecosystems.
nan
Article 746
Title@2025-07-14 (1): RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems
Title: RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems | RsGCN: Rescaling verbessert die Generalisierung von GCNs zur Lösung skalierbarer reisender Salesman-Probleme | RsGCN: 提高全球氯化萘的通用化,以解决可缩放旅行销售员问题 2506.00533v3 |
Authors (4): Junquan Huang, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan
Neural traveling salesman problem (TSP) solvers face two critical challenges: poor generalization for scalable TSPs and high training costs. To address these challenges, we propose a new Rescaling Graph Convolutional Network (RsGCN). Focusing on the scale-dependent features (i.e., features varied with problem scales) related to nodes and edges that influence the sensitivity of GCNs to the problem scales, a Rescaling Mechanism in RsGCN enhances the generalization capability by (1) rescaling adjacent nodes to construct a subgraph with a uniform number of adjacent nodes for each node across various scales of TSPs, which stabilizes the graph message aggregation; (2) rescaling subgraph edges to adjust the lengths of subgraph edges to the same magnitude, which maintains numerical consistency. In addition, an efficient training strategy with a mixed-scale dataset and bidirectional loss is used in RsGCN. To fully exploit the heatmaps generated by RsGCN, we design an efficient post-search algorithm termed Re2Opt, in which a reconstruction process based on adaptive weight is incorporated to help avoid local optima. Based on a combined architecture of RsGCN and Re2Opt, our solver achieves remarkable generalization and low training cost: with only 3 epochs of training on the mixed-scale dataset containing instances with up to 100 nodes, it can be generalized successfully to 10K-node instances without any fine-tuning. Extensive experiments demonstrate our state-of-the-art performance across uniform distribution instances of 9 different scales from 20 to 10K nodes and 78 real-world instances from TSPLIB, while requiring the fewest learnable parameters and training epochs among neural competitors.
nan
Article 747
Title@2025-07-14 (1): Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction
Title: Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction | Komprimierungsmethode für das Deep Diagonal State Space Model basierend auf $H^2$ Optimale Reduktion | 以2千赫元最佳减少量为基础的深对角国家空间模型压缩方法 2507.10078v1 |
Authors (2): Hiroki Sakamoto, Kazuhiro Sato
Deep learning models incorporating linear state space models (SSMs) have gained attention for capturing long-range dependencies in sequential data. However, their large parameter sizes pose challenges for deployment on resource-constrained devices. In this study, we propose an efficient parameter reduction method for these models by applying $H^{2}$ model order reduction techniques from control theory to their linear SSM components. In experiments on the LRA benchmark, model compression based on our proposed method outperforms an existing method using Balanced Truncation, while successfully reducing the number of parameters in the SSMs to $1/32$ without sacrificing the performance of the original models.
nan
Article 748
Title@2025-07-14 (1): Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information
Title: Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information | Qualität über Quantität: Eine effektive großräumige Datenreduktionsstrategie basierend auf pointwise V-Informationen | 质量高于数量:基于点五信息的有效大型数据减少战略 2507.00038v2 |
Authors (2): Fei Chen, Wenchi Zhou
In order to increase the effectiveness of model training, data reduction is essential to data-centric AI. It does this by locating the most instructive examples in massive datasets. To increase data quality and training efficiency, the main difficulty is to choose the best examples rather than the complete datasets. In this paper, we propose an effective data reduction strategy based on Pointwise V-Information (PVI). To enable a static method, we first use PVI to quantify instance difficulty and remove instances with low difficulty. Experiments show that the classifier performance is maintained with only a 0.0001% to 0.76% reduction in accuracy when 10%-30% of the data is removed. Second, we train the classifiers using a progressive learning strategy on examples sorted by increasing PVI, accelerating convergence and achieving a 0.8% accuracy gain over conventional training. Our findings imply that training a classifier on the chosen optimal subset may improve model performance and increase training efficiency when combined with an efficient data reduction strategy. Furthermore, we have adapted the PVI framework, which was previously limited to English datasets, to a variety of Chinese NLP tasks and base models, yielding insightful results for faster training and cross-lingual data reduction. The codes are released at https://github.com/zhouwenchi/DatasetReductionStrategy.
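The static pruning step can be sketched as follows, using the usual definition of pointwise V-information: the log-probability of the gold label from a model that sees the input, minus the log-probability from a model trained on empty inputs. Under that definition a higher PVI marks an easier instance, so "removing instances with low difficulty" is read here as dropping the highest-PVI examples; that reading, the function names `pvi` and `drop_easiest`, and the example probabilities are assumptions for illustration.

```python
import math
from typing import List


def pvi(p_with_input: float, p_null: float) -> float:
    """PVI in bits: log2 p(y | x) from the input model minus log2 p(y) from the null model."""
    return math.log2(p_with_input) - math.log2(p_null)


def drop_easiest(scores: List[float], fraction: float) -> List[int]:
    """Indices kept after removing the given fraction of highest-PVI instances."""
    n_drop = int(len(scores) * fraction)
    by_ease = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(by_ease[:n_drop])
    return [i for i in range(len(scores)) if i not in dropped]


if __name__ == "__main__":
    # Hypothetical gold-label probabilities from the two models.
    probs_with_input = [0.9, 0.5, 0.25, 0.99]
    probs_null = [0.5, 0.5, 0.5, 0.5]
    scores = [pvi(p, q) for p, q in zip(probs_with_input, probs_null)]
    print(scores)
    print(drop_easiest(scores, fraction=0.25))
```

Sorting the surviving indices by the same scores gives the ordering needed for a curriculum-style schedule, though the exact ordering used in the paper should be taken from the released code.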
nan
Article 749
Title@2025-07-14 (1): ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
Title: ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism | ElasticMM: Effiziente multimodale LLMs mit elastischer multimodaler Parallelität | Elastic MM: 高效的多式多式LLMs 与 Elastic 多式平行主义一起服务 2507.10069v1 |
Authors (5): Zedong Liu, Shenggan Cheng, Guangming Tan, Yang You, Dingwen Tao
Multimodal large language models (MLLMs) extend LLMs to handle images, videos, and audio by incorporating feature extractors and projection modules. However, these additional components – combined with complex inference pipelines and heterogeneous workloads – introduce significant inference overhead. Therefore, efficiently serving MLLMs remains a major challenge. Current tightly coupled serving architectures struggle to distinguish between mixed request types or adapt parallelism strategies to different inference stages, leading to increased time-to-first-token (TTFT) latency and poor resource utilization. To address this, we propose Elastic Multimodal Parallelism (EMP), a new serving paradigm that elastically adapts to resource heterogeneity across request types and inference stages. Building upon EMP, we develop ElasticMM, an MLLM serving system that (1) separates requests into independent modality groups with dynamic resource allocation via a modality-aware load balancer; (2) decouples inference stages and enables parallelism adjustment and adaptive scaling via elastic partition scheduling; and (3) improves inference efficiency through unified multimodal prefix caching and non-blocking encoding. Experiments on diverse real-world datasets show that ElasticMM outperforms state-of-the-art (SOTA) serving systems, reducing TTFT by up to 4.2x and achieving 3.2-4.5x higher throughput while meeting service-level objectives (SLOs).
nan
Article 750
Title@2025-07-14 (1): A Vector-Quantized Foundation Model for Patient Behavior Monitoring
Title: A Vector-Quantized Foundation Model for Patient Behavior Monitoring | Ein Vector-Quantisiertes Foundation-Modell für Patientenverhaltensüberwachung | 病人行为监测矢量定量基础模型 2503.15221v2 |
Authors (7): Rodrigo Oliver, Josué Pérez-Sabater, Leire Paz-Arbaizar, Diego Herrero-Quevedo, Antonio Artés-Rodríguez, Alejandro Lancho, Pablo M. Olmos
Foundation models have achieved remarkable success across various domains, yet their adoption in healthcare remains limited. While significant advances have been made in medical imaging, genetic biomarkers, and time series from electronic health records, the potential of foundation models for patient behavior monitoring through personal digital devices remains underexplored. The data generated by these devices are inherently heterogeneous, multisource, and often exhibit high rates of missing data, posing unique challenges. This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices. We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need of fine-tuning. We also highlight the existence of a trade-off between discrete and continuous latent structures, suggesting that hybrid models may be optimal for balancing accuracy across various supervised and unsupervised tasks.
nan
Article 751
Title@2025-07-14 (1): PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization
Title: PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization | PRISM: Feinkörniges Papier-zu-Papier-Retrieval mit Multi-Aspect-Aware-Abfrageoptimierung | PRISM: 配有多频谱软件查询优化的精细读纸到纸检索器 2507.10057v1 |
Authors (4): Sangwoo Park, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang
Scientific paper retrieval, particularly framed as document-to-document retrieval, aims to identify relevant papers in response to a long-form query paper, rather than a short query string. Previous approaches to this task have focused on abstracts, embedding them into dense vectors as surrogates for full documents and calculating similarity across them, although abstracts provide only sparse and high-level summaries. To address this, we propose PRISM, a novel document-to-document retrieval method that introduces multiple, fine-grained representations for both the query and candidate papers. In particular, each query paper is decomposed into multiple aspect-specific views and individually embedded, which are then matched against candidate papers similarly segmented to consider their multifaceted dimensions. Moreover, we present SciFullBench, a novel benchmark in which the complete and segmented context of full papers for both queries and candidates is available. Experimental results show that PRISM improves performance by an average of 4.3% over existing retrieval baselines.
nan
Article 752
Title@2025-07-14 (1): Scalable Unsupervised Segmentation via Random Fourier Feature-based Gaussian Process
Title: Scalable Unsupervised Segmentation via Random Fourier Feature-based Gaussian Process | Skalierbare unüberwachte Segmentierung über Random Fourier Feature-based Gaussian Process | 通过随机的 Fourier 地貌特征基于 Gaussian 的 Gaussian 进程进行可缩放的不受监督的分割 2507.10632v1 |
Authors (5): Issei Saito, Masatoshi Nagano, Tomoaki Nakamura, Daichi Mochihashi, Koki Mimura
In this paper, we propose RFF-GP-HSMM, a fast unsupervised time-series segmentation method that incorporates random Fourier features (RFF) to address the high computational cost of the Gaussian process hidden semi-Markov model (GP-HSMM). GP-HSMM models time-series data using Gaussian processes, requiring inversion of an $N \times N$ kernel matrix during training, where $N$ is the number of data points. As the scale of the data increases, matrix inversion incurs a significant computational cost. To address this, the proposed method approximates the Gaussian process with linear regression using RFF, preserving expressive power while eliminating the need for inversion of the kernel matrix. Experiments on the Carnegie Mellon University (CMU) motion-capture dataset demonstrate that the proposed method achieves segmentation performance comparable to that of conventional methods, with approximately 278 times faster segmentation on time-series data comprising 39,200 frames.
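The approximation at the heart of this approach can be sketched with random Fourier features for an RBF kernel: inputs are mapped to a finite feature vector whose inner products approximate kernel values, so regression can be done in feature space instead of with an $N \times N$ kernel matrix. The RBF kernel choice, the feature count, and the names `make_rff` and `rbf_kernel` are assumptions for illustration; the abstract does not specify the kernel used.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(dim: int, n_features: int, lengthscale: float,
             rng: random.Random) -> Callable[[Sequence[float]], List[float]]:
    """Build a feature map whose inner products approximate an RBF kernel."""
    # Frequencies are drawn from the kernel's spectral density, phases uniformly.
    weights = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, phases)]

    return features


def rbf_kernel(x: Sequence[float], y: Sequence[float], lengthscale: float) -> float:
    """Exact RBF kernel value, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    phi = make_rff(dim=2, n_features=4000, lengthscale=1.0, rng=rng)
    x, y = [0.3, -0.2], [-0.1, 0.4]
    approx = sum(a * b for a, b in zip(phi(x), phi(y)))
    print(approx, rbf_kernel(x, y, 1.0))  # the two values should be close
```

With $D$ features, fitting a linear model on the mapped inputs involves a $D \times D$ system whose size does not grow with the number of data points, which is where the speed-up over exact Gaussian process inference comes from.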
nan
Article 753
Title@2025-07-14 (1): Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning
Title: Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning | Leichtes Modell für die Erkennung von Geflügelkrankheiten von Fäkalienbildern mit Multi-Color-Raum-Feature-Optimierung und maschinellem Lernen | 利用多层空间地物优化和机器学习法从地表图像中检测禽类疾病轻量模型 2507.10056v1 |
Authors (4): A. K. M. Shoriful Islam, Md. Rakib Hassan, Macbah Uddin, Md. Shahidur Rahman
Poultry farming is a vital component of the global food supply chain, yet it remains highly vulnerable to infectious diseases such as coccidiosis, salmonellosis, and Newcastle disease. This study proposes a lightweight machine learning-based approach to detect these diseases by analyzing poultry fecal images. We utilize multi-color space feature extraction (RGB, HSV, LAB) and explore a wide range of color, texture, and shape-based descriptors, including color histograms, local binary patterns (LBP), wavelet transforms, and edge detectors. Through a systematic ablation study and dimensionality reduction using PCA and XGBoost feature selection, we identify a compact global feature set that balances accuracy and computational efficiency. An artificial neural network (ANN) classifier trained on these features achieved 95.85% accuracy while requiring no GPU and only 638 seconds of execution time in Google Colab. Compared to deep learning models such as Xception and MobileNetV3, our proposed model offers comparable accuracy with drastically lower resource usage. This work demonstrates a cost-effective, interpretable, and scalable alternative to deep learning for real-time poultry disease detection in low-resource agricultural settings.
nan
Article 754
Title@2025-07-14 (1): Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts
Title: Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts | Selbstaufmerksamer Transformer für schnelle und genaue Nachbearbeitung von Temperatur- und Windgeschwindigkeitsprognosen | 用于快速和准确快速和准确的温度和风速预报后处理的自控变形器 2412.13957v2 |
Authors (9): Aaron Van Poecke, Tobias Sebastian Finn, Ruoke Meng, Joris Van den Bergh, Geert Smet, Jonathan Demaeyer, Piet Termonia, Hossein Tabari, Peter Hellinckx
Current postprocessing techniques often require separate models for each lead time and disregard possible inter-ensemble relationships by either correcting each member separately or by employing distributional approaches. In this work, we tackle these shortcomings with an innovative, fast and accurate Transformer which postprocesses each ensemble member individually while allowing information exchange across variables, spatial dimensions and lead times by means of multi-headed self-attention. Weather forecasts are postprocessed over 20 lead times simultaneously while including up to fifteen meteorological predictors. We use the EUPPBench dataset for training which contains ensemble predictions from the European Center for Medium-range Weather Forecasts’ integrated forecasting system alongside corresponding observations. The work presented here is the first to postprocess the ten and one hundred-meter wind speed forecasts within this benchmark dataset, while also correcting two-meter temperature. Our approach significantly improves the original forecasts, as measured by the CRPS, with 16.5% for two-meter temperature, 10% for ten-meter wind speed and 9% for one hundred-meter wind speed, outperforming a classical member-by-member approach employed as a competitive benchmark. Furthermore, being up to six times faster, it fulfills the demand for rapid operational weather forecasts in various downstream applications, including renewable energy forecasting.
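Since the improvements above are reported in terms of the CRPS, a small sketch of the standard ensemble CRPS estimator may help. The function names `ensemble_crps` and `relative_improvement` and the toy numbers are assumptions for illustration; the sketch also assumes that the reported percentages are relative CRPS reductions, which the abstract does not state explicitly.

```python
def ensemble_crps(members, observation):
    """Standard ensemble CRPS estimator for a single scalar observation.

    CRPS = mean_i |x_i - y| - (1 / (2 M^2)) * sum_{i,j} |x_i - x_j|.
    Lower values are better; for one member it reduces to the absolute error.
    """
    m = len(members)
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return accuracy - spread


def relative_improvement(raw_crps, post_crps):
    """Percentage reduction of CRPS relative to the raw forecast."""
    return 100.0 * (raw_crps - post_crps) / raw_crps


# Toy ensembles (made-up values, not from EUPPBench).
raw = ensemble_crps([1.0, 2.0, 6.0], observation=2.0)
post = ensemble_crps([1.0, 2.0, 3.0], observation=2.0)
print(raw, post, relative_improvement(raw, post))
```

In practice the score would be averaged over all stations, lead times, and forecast dates before computing the relative improvement.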
nan
Article 755
Title@2025-07-14 (1): IPAD: Inverse Prompt for AI Detection – A Robust and Explainable LLM-Generated Text Detector
Title: IPAD: Inverse Prompt for AI Detection – A Robust and Explainable LLM-Generated Text Detector | IPAD: Inverse Aufforderung zur KI-Erkennung – ein robuster und erklärbarer LLM-generierter Textdetektor | IPAD: AI 检测反光提示 – – 强力和可解释的LLM-发光文本检测器 2502.15902v2 |
Authors (6): Zheng Chen, Yushi Feng, Changyang He, Yue Deng, Hongxi Pu, Bo Li
Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates the distinction between human-written and LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet, existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, even though such robustness is critical in real-world scenarios. They also struggle to provide interpretable evidence to support their decisions, which undermines their reliability. In light of these challenges, we propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and two Distinguishers that examine the probability that the input texts align with the predicted prompts. Empirical evaluations demonstrate that IPAD outperforms the strongest baselines by 9.05% (Average Recall) on in-distribution data, 12.93% (AUROC) on out-of-distribution (OOD) data, and 5.48% (AUROC) on attacked data. IPAD also performs robustly on structured datasets. Furthermore, an interpretability assessment is conducted to illustrate that IPAD enhances the trustworthiness of AI detection by allowing users to directly examine the decision-making evidence, which provides interpretable support for its state-of-the-art detection results.
nan
Article 756
Title@2025-07-14 (1): On the Learning with Augmented Class via Forests
Title: On the Learning with Augmented Class via Forests | Über das Lernen mit Augmented Class über Wälder | 通过森林进修学习 2505.09294v2 |
Authors (3): Fan Xu, Wuyang Chen, Wei Gao
Decision trees and forests have achieved successes in various real applications, most of which assume that all testing classes are known in the training data. In this work, we focus on learning with augmented class via forests, where an augmented class may appear in testing data yet not in training data. We incorporate information about the augmented class into the trees’ splitting: we introduce the augmented Gini impurity, a new splitting criterion that exploits some unlabeled data from the testing distribution. We then develop the Learning with Augmented Class via Forests (LACForest for short) approach, which constructs shallow forests according to the augmented Gini impurity and then splits forests with pseudo-labeled augmented instances for better performance. We also develop deep neural forests via an optimization objective based on our augmented Gini impurity, which essentially utilizes the representation power of neural networks for forests. Theoretically, we present the convergence analysis for our augmented Gini impurity, and we finally conduct experiments to evaluate our approaches. The code is available at https://github.com/nju-xuf/LACForest.
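For readers unfamiliar with the baseline criterion, the sketch below shows the standard Gini impurity and the weighted impurity of a binary split. It is explicitly not the augmented Gini impurity proposed in the paper, whose definition is not given in the abstract; the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left_labels, right_labels):
    """Size-weighted Gini impurity of a binary split; trees pick the split minimizing it."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n * gini_impurity(left_labels)
            + len(right_labels) / n * gini_impurity(right_labels))


# Toy labels: a mixed node, and a split that separates the two classes.
print(gini_impurity(["a", "a", "b", "b"]))
print(split_impurity(["a", "a"], ["b", "b"]))
```

The standard criterion only sees labeled training classes, which is the gap the augmented variant described above is meant to close by also using unlabeled data from the testing distribution.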
nan
Article 757
Title@2025-07-14 (1): On the Efficiency of Training Robust Decision Trees
Title: On the Efficiency of Training Robust Decision Trees | Über die Effizienz des Trainings Robuste Entscheidungsbäume | 提高培训效率的有力决策树 2507.10048v1 |
Authors (3): Benedict Gerlach, Marie Anastacio, Holger H. Hoos
As machine learning is rapidly adopted in industry, trustworthiness is increasingly in focus. Yet, the efficiency and sustainability of robust training pipelines still have to be established. In this work, we consider a simple pipeline for training adversarially robust decision trees and investigate the efficiency of each step. Our pipeline consists of three stages. Firstly, we choose the perturbation size automatically for each dataset. For that, we introduce a simple algorithm, instead of relying on intuition or prior work. Moreover, we show that the perturbation size can be estimated from smaller models than the one intended for full training, and thus significant gains in efficiency can be achieved. Secondly, we train models using state-of-the-art adversarial training methods and evaluate them regarding both their training time and adversarial accuracy. Thirdly, we certify the robustness of each of the models thus obtained and investigate the time required for this. We find that verification time, which is critical to the efficiency of the full pipeline, is not correlated with training time.
nan
Article 758
Title@2025-07-14 (1): Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R
Title: Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R | Effiziente Implementierung von Vision-Language-Modellen auf mobilen Geräten: Eine Fallstudie zu OnePlus 13R | 高效部署移动设备愿景-语言模型:关于OnePlus 13R的案例研究 2507.08505v2 |
Authors (3): Pablo Robin Guerrero, Yueyang Pan, Sanidhya Kashyap
Vision-Language Models (VLMs) offer promising capabilities for mobile devices, but their deployment faces significant challenges due to computational limitations and energy inefficiency, especially for real-time applications. This study provides a comprehensive survey of deployment frameworks for VLMs on mobile devices, evaluating llama.cpp, MLC-Imp, and mllm in the context of running LLaVA-1.5 7B, MobileVLM-3B, and Imp-v1.5 3B as representative workloads on a OnePlus 13R. Each deployment framework was evaluated on the OnePlus 13R while running VLMs, with measurements covering CPU, GPU, and NPU utilization, temperature, inference time, power consumption, and user experience. Benchmarking revealed critical performance bottlenecks across frameworks: CPU resources were consistently over-utilized during token generation, while GPU and NPU accelerators were largely unused. When the GPU was used, primarily for image feature extraction, it was saturated, leading to degraded device responsiveness. The study contributes framework-level benchmarks, practical profiling tools, and an in-depth analysis of hardware utilization bottlenecks, highlighting the consistent overuse of CPUs and the ineffective or unstable use of GPUs and NPUs in current deployment frameworks.
nan
Article 759
Title@2025-07-14 (1): First-ish Order Methods: Hessian-aware Scalings of Gradient Descent
Title: First-ish Order Methods: Hessian-aware Scalings of Gradient Descent | Erste-ish-Order-Methoden: Hessisch-bewusste Skalierungen des gradienten Abstiegs | 第一至一等秩序方法:逐渐后裔的赫西安人觉醒规模 2502.03701v3 |
Authors (3): Oscar Smee, Fred Roosta, Stephen J. Wright
Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural scaling, which often necessitates expensive line searches or heuristic tuning to determine an appropriate step size. In this paper, we address this limitation by incorporating Hessian information to scale the gradient direction. By accounting for the curvature of the function along the gradient, our adaptive, Hessian-aware scaling method ensures a local unit step size guarantee, even in nonconvex settings. Near a local minimum that satisfies the second-order sufficient conditions, our approach achieves linear convergence with a unit step size. We show that our method converges globally under a significantly weaker version of the standard Lipschitz gradient smoothness assumption. Even when Hessian information is inexact, the local unit step size guarantee and global convergence properties remain valid under mild conditions. Finally, we validate our theoretical results empirically on a range of convex and nonconvex machine learning tasks, showcasing the effectiveness of the approach.
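To make the idea of scaling the gradient by curvature concrete, here is a minimal sketch. It uses the classical choice alpha = (g·g) / (g·Hg), which minimizes the local quadratic model along the negative gradient, with a fixed fallback step when the curvature along the gradient is not positive. This is an assumed stand-in for illustration; the paper's actual scalings and safeguards may differ, and all names below are hypothetical.

```python
def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, v):
    """Dense matrix-vector product."""
    return [dot(row, v) for row in matrix]


def curvature_scaled_step(grad, hess_vec, fallback=1e-2):
    """Step size g.g / g.Hg when curvature along g is positive, else a fixed fallback."""
    curvature = dot(grad, hess_vec(grad))
    if curvature > 0.0:
        return dot(grad, grad) / curvature
    return fallback


# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (assumed example).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

x = [0.0, 0.0]
for _ in range(50):
    grad = [ax - bi for ax, bi in zip(mat_vec(A, x), b)]
    alpha = curvature_scaled_step(grad, lambda v: mat_vec(A, v))
    x = [xi - alpha * gi for xi, gi in zip(x, grad)]

print(x)  # approaches the minimizer of the quadratic, the solution of A x = b
```

Only a Hessian-vector product is needed, never the full Hessian, so the per-iteration cost stays close to that of plain gradient descent.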
nan
Article 760
Title@2025-07-14 (1): Towards Applying Large Language Models to Complement Single-Cell Foundation Models
Title: Towards Applying Large Language Models to Complement Single-Cell Foundation Models | Zur Anwendung großer Sprachmodelle zur Ergänzung von Single-Cell-Stiftungsmodellen | 努力应用大语言模型来补充单一行业基金会模型 2507.10039v1 |
Authors (3): Steven Palayew, Bo Wang, Gary Bader
Single-cell foundation models such as scGPT represent a significant advancement in single-cell omics, with an ability to achieve state-of-the-art performance on various downstream biological tasks. However, these models are inherently limited in that a vast amount of information in biology exists as text, which they are unable to leverage. There have therefore been several recent works that propose the use of LLMs as an alternative to single-cell foundation models, achieving competitive results. However, there is little understanding of what factors drive this performance, and these works focus strongly on using LLMs as an alternative, rather than a complementary, approach to single-cell foundation models. In this study, we therefore investigate what biological insights contribute toward the performance of LLMs when applied to single-cell data, and introduce scMPT, a model which leverages synergies between scGPT and single-cell representations from LLMs that capture these insights. scMPT demonstrates stronger, more consistent performance than either of its component models, which frequently have large performance gaps between each other across datasets. We also experiment with alternate fusion methods, demonstrating the potential of combining specialized reasoning models with scGPT to improve performance. This study ultimately showcases the potential for LLMs to complement single-cell foundation models and drive improvements in single-cell analysis.
nan
Article 761
Title@2025-07-14 (1): Memory-Efficient Personalization of Text-to-Image Diffusion Models via Selective Optimization Strategies
Title: Memory-Efficient Personalization of Text-to-Image Diffusion Models via Selective Optimization Strategies | Speichereffiziente Personalisierung von Text-zu-Bild-Diffusions-Modellen über selektive Optimierungsstrategien | 通过选择性优化战略实现文本到图像传播模型的记忆有效个化 2507.10029v1 |
Authors (5): Seokeon Choi, Sunghyun Park, Hyoungwoo Park, Jeongho Kim, Sungrack Yun
Memory-efficient personalization is critical for adapting text-to-image diffusion models while preserving user privacy and operating within the limited computational resources of edge devices. To this end, we propose a selective optimization framework that adaptively chooses between backpropagation on low-resolution images (BP-low) and zeroth-order optimization on high-resolution images (ZO-high), guided by the characteristics of the diffusion process. As observed in our experiments, BP-low efficiently adapts the model to target-specific features, but suffers from structural distortions due to resolution mismatch. Conversely, ZO-high refines high-resolution details with minimal memory overhead but faces slow convergence when applied without prior adaptation. By complementing both methods, our framework leverages BP-low for effective personalization while using ZO-high to maintain structural consistency, achieving memory-efficient and high-quality fine-tuning. To maximize the efficacy of both BP-low and ZO-high, we introduce a timestep-aware probabilistic function that dynamically selects the appropriate optimization strategy based on diffusion timesteps. This function mitigates the overfitting from BP-low at high timesteps, where structural information is critical, while ensuring ZO-high is applied more effectively as training progresses. Experimental results demonstrate that our method achieves competitive performance while significantly reducing memory consumption, enabling scalable, high-quality on-device personalization without increasing inference latency.
nan
Article 762
Title@2025-07-14 (1): Integrated Gradient Correlation: a Dataset-wise Attribution Method
Title: Integrated Gradient Correlation: a Dataset-wise Attribution Method | Integrierte Gradientenkorrelation: eine datensatzweise Attributionsmethode | 集成梯度关联:数据集自定义方法 2404.13910v2 |
Authors (2): Pierre Lelièvre, Chien-Chung Chen
Attribution methods are primarily designed to study input component contributions to individual model predictions. However, some research applications require a summary of attribution patterns across the entire dataset to facilitate the interpretability of the scrutinized models at a task-level rather than an instance-level. It specifically applies when the localization of important input information is supposed to be stable for a specific problem but remains unidentified among numerous components. In this paper, we present a dataset-wise attribution method called Integrated Gradient Correlation (IGC) that enables region-specific analysis by a direct summation over associated components, and further relates the sum of all attributions to a model prediction score (correlation). We demonstrate IGC on synthetic data and fMRI neural signals (NSD dataset) with the study of the representation of image features in the brain and the estimation of the visual receptive field of neural populations. The resulting IGC attributions reveal selective patterns, coherent with respective model objectives.
nan
Article 763
Title@2025-07-14 (1): STRAP: Spatial-Temporal Risk-Attentive Vehicle Trajectory Prediction for Autonomous Driving
Title: STRAP: Spatial-Temporal Risk-Attentive Vehicle Trajectory Prediction for Autonomous Driving | STRAP: Raum-Temporale Risiko-Attentive Fahrzeug-Trajektorie Vorhersage für autonomes Fahren | SSTRAP: 机动车辆自动驾驶空间-时空风险-加速风险-机动车辆轨迹预测 2507.08563v2 |
Authors (4): Xinyi Ning, Zilin Bian, Dachuan Zuo, Semiha Ergan
Accurate vehicle trajectory prediction is essential for ensuring safety and efficiency in fully autonomous driving systems. While existing methods primarily focus on modeling observed motion patterns and interactions with other vehicles, they often neglect the potential risks posed by the uncertain or aggressive behaviors of surrounding vehicles. In this paper, we propose a novel spatial-temporal risk-attentive trajectory prediction framework that incorporates a risk potential field to assess perceived risks arising from behaviors of nearby vehicles. The framework leverages a spatial-temporal encoder and a risk-attentive feature fusion decoder to embed the risk potential field into the extracted spatial-temporal feature representations for trajectory prediction. A risk-scaled loss function is further designed to improve the prediction accuracy of high-risk scenarios, such as short relative spacing. Experiments on the widely used NGSIM and HighD datasets demonstrate that our method reduces average prediction errors by 4.8% and 31.2% respectively compared to state-of-the-art approaches, especially in high-risk scenarios. The proposed framework provides interpretable, risk-aware predictions, contributing to more robust decision-making for autonomous driving systems.
nan
Article 764
Title@2025-07-14 (1): Collaboration Promotes Group Resilience in Multi-Agent RL
Title: Collaboration Promotes Group Resilience in Multi-Agent RL | Zusammenarbeit fördert Gruppenresistenz in Multi-Agent RL | 协作促进多机构RL中的团体复原力 2111.06614v3 |
Authors (6): Ilai Shraga, Guy Azran, Matthias Gerstgrasser, Ofir Abu, Jeffrey S. Rosenschein, Sarah Keren
To effectively operate in various dynamic scenarios, RL agents must be resilient to unexpected changes in their environment. Previous work on this form of resilience has focused on single-agent settings. In this work, we introduce and formalize a multi-agent variant of resilience, which we term group resilience. We further hypothesize that collaboration with other agents is key to achieving group resilience; collaborating agents adapt better to environmental perturbations in multi-agent reinforcement learning (MARL) settings. We test our hypothesis empirically by evaluating different collaboration protocols and examining their effect on group resilience. Our experiments show that all the examined collaborative approaches achieve higher group resilience than their non-collaborative counterparts.
nan
Article 765
Title@2025-07-14 (1): Forecasting Coccidioidomycosis (Valley Fever) in Arizona: A Graph Neural Network Approach
Title: Forecasting Coccidioidomycosis (Valley Fever) in Arizona: A Graph Neural Network Approach | Prognose der Kokzidioidomykose (Valley Fever) in Arizona: Ein Graph-Neural-Netzwerk-Ansatz | 亚利桑那州Codidiosmiccidomiccios (Valley Fever) 预测亚利桑那州Codidiosmiccios (Valley Fever) : 图形神经网络方法 2507.10014v1 |
Authors (5): Ali Sarabi, Arash Sarabi, Hao Yan, Beckett Sterner, Petar Jevtić
Coccidioidomycosis, commonly known as Valley Fever, remains a significant public health concern in endemic regions of the southwestern United States. This study develops the first graph neural network (GNN) model for forecasting Valley Fever incidence in Arizona. The model uses graph structures to integrate surveillance case data with environmental predictors, including soil conditions, atmospheric variables, agricultural indicators, and air quality metrics. Our approach explores correlation-based relationships among variables influencing disease transmission. The model captures critical delays in disease progression through lagged effects, enhancing its capacity to reflect complex temporal dependencies in disease ecology. Results demonstrate that the GNN architecture effectively models Valley Fever trends and provides insights into key environmental drivers of disease incidence. These findings can inform early warning systems and guide resource allocation for disease prevention efforts in high-risk areas.
nan
Article 766
Title@2025-07-14 (1): Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models
Title: Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models | Defense-as-a-Service: Black-Box-Abschirmung gegen hintertürige Graphenmodelle | 防卫即服务:防止后门图表模型的黑箱防护 2410.04916v2 |
Authors (4): Xiao Yang, Kai Zhou, Yuni Lai, Gaolei Li
With the trend of large graph learning models, business owners tend to employ a model provided by a third party to deliver business services to users. However, these models might be backdoored, and malicious users can submit trigger-embedded inputs to manipulate the model predictions. Current graph backdoor defenses have several limitations: 1) depending on model-related details, 2) requiring additional model fine-tuning, and 3) relying upon extra explainability tools, all of which are infeasible under stringent privacy policies. To address those limitations, we propose GraphProt, which allows resource-constrained business owners to rely on third parties to avoid backdoor attacks on GNN-based graph classifiers. Our GraphProt is model-agnostic and only relies on the input graph. The key insight is to leverage subgraph information for prediction, thereby mitigating backdoor effects induced by triggers. GraphProt comprises two components: clustering-based trigger elimination and robust subgraph ensemble. Specifically, we first propose feature-topology clustering that aims to remove most of the anomalous subgraphs (triggers). Moreover, we design subgraph sampling strategies based on feature-topology clustering to build a robust classifier via majority vote. Experimental results across three backdoor attacks and six benchmark datasets demonstrate that GraphProt significantly reduces the backdoor attack success rate while preserving the model accuracy on regular graph classification tasks.
nan
Article 767
Title@2025-07-14 (1): Effects of structural properties of neural networks on machine learning performance
Title: Effects of structural properties of neural networks on machine learning performance | Auswirkungen struktureller Eigenschaften neuronaler Netze auf die Leistungsfähigkeit des maschinellen Lernens | 神经网络结构特性对机器学习绩效的影响 2507.10005v1 |
Authors (2): Yash Arya, Sang Hoon Lee
In recent years, graph-based machine learning techniques, such as reinforcement learning and graph neural networks, have garnered significant attention. While some recent studies have started to explore the relationship between the graph structure of neural networks and their predictive performance, they often limit themselves to a narrow range of model networks, particularly lacking mesoscale structures such as communities. Our work advances this area by conducting a more comprehensive investigation, incorporating realistic network structures characterized by heterogeneous degree distributions and community structures, which are typical characteristics of many real networks. These community structures offer a nuanced perspective on network architecture. Our analysis employs model networks such as random and scale-free networks, alongside a comparison with a biological neural network and its subsets for more detailed analysis. We examine the impact of these structural attributes on the performance of image classification tasks. Our findings reveal that structural properties do affect performance to some extent. Specifically, networks featuring coherent, densely interconnected communities demonstrate enhanced learning capabilities. The comparison with the biological neural network emphasizes the relevance of our findings to real-world structures, suggesting an intriguing connection worth further exploration. This study contributes meaningfully to network science and machine learning, providing insights that could inspire the design of more biologically informed neural networks.
nan
Article 768
Title@2025-07-14 (1): Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code
Title: Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code | LLM zur Vernunft bringen: Stärkung Lernen aus algorithmischen Problemen ohne Code | 教LLM到理由:加强从没有法典的等级问题中学习 2507.07498v2 |
Authors (8): Keqin Bao, Nuo Chen, Xiaoyuan Li, Binyuan Hui, Bowen Yu, Fuli Feng, Xiangnan He, Dayiheng Liu
Enhancing reasoning capabilities remains a central focus in the LLM research community. A promising direction involves requiring models to simulate code execution step-by-step to derive outputs for given inputs. However, as code is often designed for large-scale systems, direct application leads to over-reliance on complex data structures and algorithms, even for simple cases, resulting in overfitting to algorithmic patterns rather than core reasoning structures. To address this, we propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks, thereby improving general reasoning abilities. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning. The results consistently show significant performance improvements. Notably, TeaR achieves a 35.9% improvement on Qwen2.5-7B and 5.9% on R1-Distilled-7B.
nan
Article 769
Title@2025-07-14 (1): Two-cluster test
Title: Two-cluster test | Zwei-Cluster-Prüfung | 两组测试 2507.08382v2 |
Authors (6): Xinying Liu, Lianyu Hu, Mudi Jiang, Simeng Zhang, Jun Lou, Zengyou He
Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually generated by certain clustering procedures, the deployment of classic two-sample tests in this context would yield extremely small p-values, leading to an inflated Type-I error rate. To overcome this bias, we formally introduce the two-cluster test issue and argue that it is a totally different significance testing issue from the conventional two-sample test. Meanwhile, we present a new method based on the boundary points between two subsets to derive an analytical p-value for the purpose of significance quantification. Experiments on both synthetic and real data sets show that the proposed test is able to significantly reduce the Type-I error rate, in comparison with several classic two-sample testing methods. More importantly, the practical usage of such a two-cluster test is further verified through its applications in tree-based interpretable clustering and significance-based hierarchical clustering.
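The bias described above can be reproduced with a small simulation. The sketch below draws every sample from a single Gaussian, so the null hypothesis of one cluster is true, and compares a random split with a split at the median, used here as an assumed one-dimensional stand-in for a clustering procedure. It applies a Welch t statistic with a normal-approximation threshold of 1.96 and does not implement the paper's boundary-point test; all function names are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


def random_split(sample, rng):
    """Split that ignores the data values: a valid setting for a two-sample test."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def cluster_split(sample, rng):
    """Split at the median: the two groups are defined by the data themselves."""
    ordered = sorted(sample)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def rejection_rate(split, trials=200, n=200, seed=0):
    """Fraction of trials with |t| > 1.96 when all data come from one N(0, 1) cluster."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a, b = split(sample, rng)
        if abs(welch_t(a, b)) > 1.96:
            rejections += 1
    return rejections / trials


random_rate = rejection_rate(random_split)
cluster_rate = rejection_rate(cluster_split)
print(f"random split: {random_rate:.2f}, cluster-style split: {cluster_rate:.2f}")
```

Because the cluster-style split selects the groups by their values, the test rejects far more often than the nominal 5% level even though only one cluster exists, which is the inflated Type-I error rate the two-cluster test is designed to avoid.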
nan
Article 770
Title@2025-07-14 (1): Player-Team Heterogeneous Interaction Graph Transformer for Soccer Outcome Prediction
Title: Player-Team Heterogeneous Interaction Graph Transformer for Soccer Outcome Prediction | Spieler-Team Heterogene Interaktion Graph Transformer für Fußball Outcome Vorhersage | 用于足球结果预测的玩家团队 2507.10626v1 |
Authors (5): Lintao Wang, Shiwen Xu, Michael Horton, Joachim Gudmundsson, Zhiyong Wang
Predicting soccer match outcomes is a challenging task due to the inherently unpredictable nature of the game and the numerous dynamic factors influencing results. While it conventionally relies on meticulous feature engineering, deep learning techniques have recently shown great promise in learning effective player and team representations directly for soccer outcome prediction. However, existing methods often overlook the heterogeneous nature of interactions among players and teams, which is crucial for accurately modeling match dynamics. To address this gap, we propose HIGFormer (Heterogeneous Interaction Graph Transformer), a novel graph-augmented transformer-based deep learning model for soccer outcome prediction. HIGFormer introduces a multi-level interaction framework that captures both fine-grained player dynamics and high-level team interactions. Specifically, it comprises (1) a Player Interaction Network, which encodes player performance through heterogeneous interaction graphs, combining local graph convolutions with a global graph-augmented transformer; (2) a Team Interaction Network, which constructs interaction graphs from a team-to-team perspective to model historical match relationships; and (3) a Match Comparison Transformer, which jointly analyzes both team and player-level information to predict match outcomes. Extensive experiments on the WyScout Open Access Dataset, a large-scale real-world soccer dataset, demonstrate that HIGFormer significantly outperforms existing methods in prediction accuracy. Furthermore, we provide valuable insights into leveraging our model for player performance evaluation, offering a new perspective on talent scouting and team strategy analysis.
nan
Article 771
Title@2025-07-14 (1): Compliance Minimization via Physics-Informed Gaussian Processes
Title: Compliance Minimization via Physics-Informed Gaussian Processes | Compliance Minimierung durch physikinformierte Gaußsche Prozesse | 通过物理系统化高斯进程最大限度地减少遵守规定的情况 2507.09968v1 |
Authors (4): Xiangyu Sun, Amin Yousefpour, Shirin Hosseinmardi, Ramin Bostanabad
Machine learning (ML) techniques have recently gained significant attention for solving compliance minimization (CM) problems. However, these methods typically provide poor feature boundaries, are very expensive, and lack a systematic mechanism to control the design complexity. Herein, we address these limitations by proposing a mesh-free and simultaneous framework based on physics-informed Gaussian processes (GPs). In our approach, we parameterize the design and state variables with GP priors which have independent kernels but share a multi-output neural network (NN) as their mean function. The architecture of this NN is based on Parametric Grid Convolutional Attention Networks (PGCANs) which not only mitigate spectral bias issues, but also provide an interpretable mechanism to control design complexity. We estimate all the parameters of our GP-based representations by simultaneously minimizing the compliance, total potential energy, and residual of the volume fraction constraint. Importantly, our loss function excludes all data-based residuals as GPs automatically satisfy them. We also develop computational schemes based on curriculum training and numerical integration to increase the efficiency and robustness of our approach, which is shown to (1) produce super-resolution topologies with fast convergence, (2) achieve smaller compliance and less gray area fraction compared to traditional numerical methods, (3) provide control over fine-scale features, and (4) outperform competing ML-based methods.
Article 772
Title@2025-07-14 (1): Text-Driven Causal Representation Learning for Source-Free Domain Generalization
Title: Text-Driven Causal Representation Learning for Source-Free Domain Generalization | Text-getriebene Kausaldarstellungs-Lernen für quellfreie Domain-Verallgemeinerung | 为无源域普遍化进行文字-文字-文字-事业代表性学习 2507.09961v1 |
Authors (10): Lihua Zhou, Mao Ye, Nianxin Li, Shuaifeng Li, Jinlin Wu, Xiatian Zhu, Lei Deng, Hongbin Liu, Jiebo Luo, Zhen Lei
Deep learning often struggles when training and test data distributions differ. Traditional domain generalization (DG) tackles this by including data from multiple source domains, which is impractical due to expensive data collection and annotation. Recent vision-language models like CLIP enable source-free domain generalization (SFDG) by using text prompts to simulate visual representations, reducing data demands. However, existing SFDG methods struggle with domain-specific confounders, limiting their generalization capabilities. To address this issue, we propose TDCRL (Text-Driven Causal Representation Learning), the first method to integrate causal inference into the SFDG setting. TDCRL operates in two steps: first, it employs data augmentation to generate style word vectors, combining them with class information to generate text embeddings that simulate visual representations; second, it trains a causal intervention network with a confounder dictionary to extract domain-invariant features. Grounded in causal learning, our approach offers a clear and effective mechanism to achieve robust, domain-invariant features, ensuring robust generalization. Extensive experiments on PACS, VLCS, OfficeHome, and DomainNet show state-of-the-art performance, demonstrating the effectiveness of TDCRL in SFDG.
Article 773
Title@2025-07-14 (1): Radial Neighborhood Smoothing Recommender System
Title: Radial Neighborhood Smoothing Recommender System | Radial Nachbarschaft Smoothing Recommender System | 辐射邻居平滑建议系统 2507.09952v1 |
Authors (2): Zerui Zhang, Yumou Qiu
Recommender systems inherently exhibit a low-rank structure in latent space. A key challenge is to define meaningful and measurable distances in the latent space to capture user-user, item-item, and user-item relationships effectively. In this work, we establish that distances in the latent space can be systematically approximated using row-wise and column-wise distances in the observed matrix, providing a novel perspective on distance estimation. To refine the distance estimation, we introduce a correction based on an empirical variance estimator to account for noise-induced non-centrality. The novel distance estimation enables a more structured approach to constructing neighborhoods, leading to the Radial Neighborhood Estimator (RNE), which constructs neighborhoods by including both overlapped and partially overlapped user-item pairs and employs neighborhood smoothing via localized kernel regression to improve imputation accuracy. We provide a theoretical asymptotic analysis for the proposed estimator. We perform evaluations on both simulated and real-world datasets, demonstrating that RNE achieves superior performance compared to existing collaborative filtering and matrix factorization methods. While our primary focus is on distance estimation in latent space, we find that RNE also mitigates the "cold-start" problem.
Article 774
Title@2025-07-14 (1): Hierarchical Job Classification with Similarity Graph Integration
Title: Hierarchical Job Classification with Similarity Graph Integration | Hierarchische Jobklassifikation mit Ähnlichkeitsgrafikintegration | 具有相似图集集成的等级职务分类 2507.09949v1 |
Authors (4): Md Ahsanul Kabir, Kareem Abdelfatah, Mohammed Korayem, Mohammad Al Hasan
In the dynamic realm of online recruitment, accurate job classification is paramount for optimizing job recommendation systems, search rankings, and labor market analyses. As job markets evolve, the increasing complexity of job titles and descriptions necessitates sophisticated models that can effectively leverage intricate relationships within job data. Traditional text classification methods often fall short, particularly due to their inability to fully utilize the hierarchical nature of industry categories. To address these limitations, we propose a novel representation learning and classification model that embeds jobs and hierarchical industry categories into a latent embedding space. Our model integrates the Standard Occupational Classification (SOC) system and an in-house hierarchical taxonomy, Carotene, to capture both graph and hierarchical relationships, thereby improving classification accuracy. By embedding hierarchical industry categories into a shared latent space, we tackle cold start issues and enhance the dynamic matching of candidates to job opportunities. Extensive experimentation on a large-scale dataset of job postings demonstrates the model’s superior ability to leverage hierarchical structures and rich semantic features, significantly outperforming existing methods. This research provides a robust framework for improving job classification accuracy, supporting more informed decision-making in the recruitment industry.
Article 775
Title@2025-07-14 (1): Iceberg: Enhancing HLS Modeling with Synthetic Data
Title: Iceberg: Enhancing HLS Modeling with Synthetic Data | Iceberg: Verbesserung der HLS-Modellierung mit synthetischen Daten | 冰山:加强利用合成数据建立HLS模型 2507.09948v1 |
Authors (6): Zijian Ding, Tung Nguyen, Weikai Li, Aditya Grover, Yizhou Sun, Jason Cong
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapting to six real-world applications with few-shot examples and achieves 2.47× and 1.12× better offline DSE performance when adapting to two different test datasets. Our open-sourced code is here: https://github.com/UCLA-VAST/iceberg
Article 776
Title@2025-07-14 (1): Predicting Graph Structure via Adapted Flux Balance Analysis
Title: Predicting Graph Structure via Adapted Flux Balance Analysis | Vorhersage der Graphenstruktur über angepasste Flux-Balance-Analyse | 通过经调整的通量平衡分析实现的预测图结构 2507.05806v2 |
Authors (4): Sevvandi Kandanaarachchi, Ziqi Xu, Stefan Westerlund, Conrad Sanderson
Many dynamic processes such as telecommunication and transport networks can be described through discrete time series of graphs. Modelling the dynamics of such time series enables prediction of graph structure at future time steps, which can be used in applications such as detection of anomalies. Existing approaches for graph prediction have limitations such as assuming that the vertices do not change between consecutive graphs. To address this, we propose to exploit time series prediction methods in combination with an adapted form of flux balance analysis (FBA), a linear programming method originating from biochemistry. FBA is adapted to incorporate various constraints applicable to the scenario of growing graphs. Empirical evaluations on synthetic datasets (constructed via the Preferential Attachment model) and real datasets (UCI Message, HePH, Facebook, Bitcoin) demonstrate the efficacy of the proposed approach.
Article 777
Title@2025-07-14 (1): DeepGesture: A conversational gesture synthesis system based on emotions and semantics
Title: DeepGesture: A conversational gesture synthesis system based on emotions and semantics | DeepGesture: Ein dialogisches Gesten-Synthesesystem basierend auf Emotionen und Semantik | DeepGesture:基于情感和语义的谈话手势合成系统 2507.03147v2 |
Authors (1): Thanh Hoang-Minh
Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements that correspond naturally to text or speech inputs. In this work, we present DeepGesture, a diffusion-based gesture synthesis framework for generating expressive co-speech gestures conditioned on multimodal signals - text, speech, emotion, and seed motion. Built upon the DiffuseStyleGesture model, DeepGesture introduces novel architectural enhancements that improve semantic alignment and emotional expressiveness in generated gestures. Specifically, we integrate fast text transcriptions as semantic conditioning and implement emotion-guided classifier-free diffusion to support controllable gesture generation across affective states. To visualize results, we implement a full rendering pipeline in Unity based on BVH output from the model. Evaluation on the ZeroEGGS dataset shows that DeepGesture produces gestures with improved human-likeness and contextual appropriateness. Our system supports interpolation between emotional states and demonstrates generalization to out-of-distribution speech, including synthetic voices - marking a step forward toward fully multimodal, emotionally aware digital humans. Project page: https://deepgesture.github.io
Article 778
Title@2025-07-14 (1): Long-Tailed Data Classification by Increasing and Decreasing Neurons During Training
Title: Long-Tailed Data Classification by Increasing and Decreasing Neurons During Training | Langzeit-Datenklassifikation durch zunehmende und abnehmende Neuronen während des Trainings | 培训期间通过增加和减少中微量增加和减少长期数据分类 2507.09940v1 |
Authors (2): Taigo Sakai, Kazuhiro Hotta
In conventional deep learning, the number of neurons typically remains fixed during training. However, insights from biology suggest that the human hippocampus undergoes continuous generation and pruning of neurons over the course of learning, implying that a flexible allocation of capacity can help enhance performance. Real-world datasets often exhibit class imbalance, where certain classes have far fewer samples than others, leading to significantly reduced recognition accuracy for minority classes when relying on fixed-size networks. To address this challenge, we propose a method that periodically adds and removes neurons during training, thereby boosting representational power for minority classes. By retaining critical features learned from majority classes while selectively increasing neurons for underrepresented classes, our approach dynamically adjusts capacity during training. Importantly, while the number of neurons changes throughout training, the final network size and structure remain unchanged, ensuring efficiency and compatibility with deployment. Furthermore, through experiments on three different datasets and five representative models, we demonstrate that the proposed method outperforms fixed-size networks and shows even greater accuracy when combined with other imbalance-handling techniques. Our results underscore the effectiveness of dynamic, biologically inspired network designs in improving performance on class-imbalanced data.
Article 779
Title@2025-07-14 (1): EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective
Title: EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective | EVALOOP: Bewertung der Robustheit von LLM in der Programmierung aus einer Perspektive der Selbstkonsistenz | EVALOOP: 从自统一的角度评估方案拟订中的LLM强力 2505.12185v3 |
Authors (3): Sen Fang, Weiyuan Ding, Bowen Xu
Assessing the programming capabilities of Large Language Models (LLMs) is crucial for their effective use in software engineering. Current evaluations, however, predominantly measure the accuracy of generated code on static benchmarks, neglecting the critical aspect of model robustness during programming tasks. While adversarial attacks offer insights into model robustness, their effectiveness is limited and evaluation could be constrained. Current adversarial attack methods for robustness evaluation yield inconsistent results, struggling to provide a unified evaluation across different LLMs. We introduce EVALOOP, a novel assessment framework that evaluates robustness from a self-consistency perspective, i.e., leveraging the natural duality inherent in popular software engineering tasks, e.g., code generation and code summarization. EVALOOP initiates a self-contained feedback loop: an LLM generates output (e.g., code) from an input (e.g., natural language specification), and then uses the generated output as the input to produce a new output (e.g., summarizes that code into a new specification). EVALOOP repeats the process to assess the effectiveness of EVALOOP in each loop. This cyclical strategy intrinsically evaluates robustness without relying on any external attack setups, providing a unified metric to evaluate LLMs' robustness in programming. We evaluate 16 prominent LLMs (e.g., GPT-4.1, O4-mini) on EVALOOP and find that EVALOOP typically induces a 5.01%-19.31% absolute drop in pass@1 performance within ten loops. Intriguingly, robustness does not always align with initial performance (i.e., one-time query); for instance, GPT-3.5-Turbo, despite superior initial code generation compared to DeepSeek-V2, demonstrated lower robustness over repeated evaluation loops.
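The feedback loop described in this abstract can be pictured with a minimal Python sketch. The names `evaloop_sketch`, `generate_code`, `summarize_code`, and `passes_tests` are hypothetical placeholders introduced here for illustration; the abstract does not specify the authors' interfaces, prompts, or scoring, so this is only one plausible reading of the cycle.

```python
from typing import Callable, List


def evaloop_sketch(
    spec: str,
    generate_code: Callable[[str], str],   # hypothetical: LLM call, specification -> code
    summarize_code: Callable[[str], str],  # hypothetical: LLM call, code -> new specification
    passes_tests: Callable[[str], bool],   # hypothetical: runs the benchmark's unit tests
    max_loops: int = 10,
) -> List[bool]:
    """Run the generate -> summarize cycle and record pass/fail per loop."""
    results = []
    current_spec = spec
    for _ in range(max_loops):
        code = generate_code(current_spec)   # specification -> code
        results.append(passes_tests(code))   # functional check for this loop
        current_spec = summarize_code(code)  # code -> specification for the next loop
    return results
```

Averaging the per-loop results over a benchmark would give a pass@1-style curve across loops, which is the kind of quantity the reported 5.01%-19.31% drop refers to.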
Article 780
Title@2025-07-14 (1): Memorization Sinks: Isolating Memorization during LLM Training
Title: Memorization Sinks: Isolating Memorization during LLM Training | Memorization Sinks: Isolation der Memorization während des LLM-Trainings | 记忆记忆辛克:在LLLM培训期间隔离记忆 2507.09937v1 |
Authors (3): Gaurav R. Ghosal, Pratyush Maini, Aditi Raghunathan
Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns. A popular mitigation strategy is to remove memorized information from specific neurons post-hoc. However, such approaches have shown limited success so far. In a controlled setting, we show that the memorization of natural sequences (those that resemble linguistically plausible text) becomes mechanistically entangled with general language abilities, thereby becoming challenging to remove post-hoc. In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. We leverage a sequence identifier that activates a unique set of memorization neurons for each sequence across repetitions. By analyzing the dynamics of learning and forgetting, we argue that MemSinks facilitates isolation of memorized content, making it easier to remove without compromising general language capabilities. We implement MemSinks at the billion-parameter and billion-token scale, and observe both effective isolation and strong generalization. To our knowledge, this is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable. We open-source our code at http://github.com/grghosal/MemSinks.
Article 781
Title@2025-07-14 (1): Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications
Title: Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications | Mechanistische Interpretation von LoRA-adaptierten Sprachmodellen für Anwendungen in der Reaktorsicherheit | LoRA-Adddd 核反应堆安全应用语言模型的可解释性 2507.09931v1 |
Authors (1): Yoon Pyo Lee
The integration of Large Language Models (LLMs) into safety-critical domains, such as nuclear engineering, necessitates a deep understanding of their internal reasoning processes. This paper presents a novel methodology for interpreting how an LLM encodes and utilizes domain-specific knowledge, using a Boiling Water Reactor system as a case study. We adapted a general-purpose LLM (Gemma-3-1b-it) to the nuclear domain using a parameter-efficient fine-tuning technique known as Low-Rank Adaptation. By comparing the neuron activation patterns of the base model to those of the fine-tuned model, we identified a sparse set of neurons whose behavior was significantly altered during the adaptation process. To probe the causal role of these specialized neurons, we employed a neuron silencing technique. Our results demonstrate that while silencing most of these specialized neurons individually did not produce a statistically significant effect, deactivating the entire group collectively led to a statistically significant degradation in task performance. Qualitative analysis further revealed that silencing these neurons impaired the model’s ability to generate detailed, contextually accurate technical information. This paper provides a concrete methodology for enhancing the transparency of an opaque black-box model, allowing domain expertise to be traced to verifiable neural circuits. This offers a pathway towards achieving nuclear-grade artificial intelligence (AI) assurance, addressing the verification and validation challenges mandated by nuclear regulatory frameworks (e.g., 10 CFR 50 Appendix B), which have limited AI deployment in safety-critical nuclear operations.
Article 782
Title@2025-07-14 (1): Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization
Title: Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization | Generative Sprachverbesserung mit menschlichen Präferenzen über direkte Präferenzoptimierung ausrichten | 通过直接普惠制优化,使发创性话语增强与人类偏爱一致 2507.09929v1 |
Authors (6): Haoyang Li, Nana Hou, Yuchen Hu, Jixun Yao, Sabato Marco Siniscalchi, Eng Siong Chng
This work investigates speech enhancement (SE) from the perspective of language models (LMs). We propose a novel method that leverages Direct Preference Optimization (DPO) to improve the perceptual quality of enhanced speech. Using UTMOS, a neural MOS prediction model, as a proxy for human ratings, our approach guides optimization toward perceptually preferred outputs. This differs from existing LM-based SE methods that focus on maximizing the likelihood of clean speech tokens, which may misalign with human perception and degrade quality despite low prediction error. Experiments on the 2020 Deep Noise Suppression Challenge test sets demonstrate that applying DPO to a pretrained LM-based SE model yields consistent improvements across various speech quality metrics, with relative gains of up to 56%. To our knowledge, this is the first application of DPO to SE and the first to incorporate proxy perceptual feedback into LM-based SE training, pointing to a promising direction for perceptually aligned SE.
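As a rough illustration of the objective named in this abstract, the sketch below computes the standard DPO loss for one preference pair from sequence log-probabilities. The helper `rank_pair` and the idea of taking the highest- and lowest-scoring candidates as the chosen and rejected outputs are assumptions made for this example; the abstract only says that UTMOS serves as a proxy for human ratings, not how pairs are built.

```python
import math


def rank_pair(candidates, proxy_score):
    """Hypothetical pairing rule: best-scoring candidate is 'chosen', worst is 'rejected'."""
    ranked = sorted(candidates, key=proxy_score)
    return ranked[-1], ranked[0]


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), for a single preference pair."""
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls as the policy raises the likelihood of the preferred output relative to the rejected one.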
Article 783
Title@2025-07-14 (1): Extracting Cause-Effect Pairs from a Sentence with a Dependency-Aware Transformer Model
Title: Extracting Cause-Effect Pairs from a Sentence with a Dependency-Aware Transformer Model | Extrahieren von Ursache-Wirkungs-Paaren aus einem Satz mit einem Dependency-Aware-Transformer-Modell | 利用依赖软件变换模型从判决中提取因果对等 2507.09925v1 |
Authors (3): Md Ahsanul Kabir, Abrar Jahin, Mohammad Al Hasan
Extracting cause and effect phrases from a sentence is an important NLP task, with numerous applications in various domains, including legal, medical, education, and scientific research. Many unsupervised and supervised methods have been proposed for solving this task. Among these, unsupervised methods utilize various linguistic tools, including syntactic patterns, dependency trees, and dependency relations among different sentential units, for extracting the cause and effect phrases. On the other hand, contemporary supervised methods use various deep learning based masked language models equipped with a token classification layer for extracting cause and effect phrases. Linguistic tools, specifically the dependency tree, which organizes a sentence into different semantic units, have been shown to be very effective for extracting semantic pairs from a sentence, but existing supervised methods do not have any provision for utilizing such tools within their model framework. In this work, we propose DepBERT, which extends a transformer-based model by incorporating the dependency tree of a sentence within the model framework. Extensive experiments over three datasets show that DepBERT is better than various state-of-the-art supervised causality extraction methods.
Article 784
Title@2025-07-14 (1): MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora
Title: MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora | MixLoRA-DSI: Dynamisch erweiterbare Mischungs-of-LoRA-Experten für ein probenfreies generatives Retrieval über Dynamic Corpora | Mix LoRA-DSI: 动态公司排练-无创录检索专家动态可扩展混合Mix-LORA 2507.09924v1 |
Authors (7): Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Trung Le, Dragan Gašević, Yuan-Fang Li, Thanh-Toan Do
Continually updating model-based indexes in generative retrieval with new documents remains challenging, as full retraining is computationally expensive and impractical under resource constraints. We propose MixLoRA-DSI, a novel framework that combines an expandable mixture of Low-Rank Adaptation experts with a layer-wise out-of-distribution (OOD)-driven expansion strategy. Instead of allocating new experts for each new corpus, our proposed expansion strategy enables sublinear parameter growth by selectively introducing new experts only when a significant number of OOD documents are detected. Experiments on NQ320k and MS MARCO Passage demonstrate that MixLoRA-DSI outperforms full-model update baselines, with minimal parameter overhead and substantially lower training costs.
Article 785
Title@2025-07-14 (1): Towards Efficient Quantity Retrieval from Text: An Approach via Description Parsing and Weak Supervision
Title: Towards Efficient Quantity Retrieval from Text: An Approach via Description Parsing and Weak Supervision | Auf dem Weg zu einer effizienten Menge Abrufen von Text: Ein Ansatz über Beschreibung Parsing und Schwache Überwachung | 实现从文本中有效获取数量:通过描述分析和薄弱监督的一种方法 2507.08322v2 |
Authors (5): Yixuan Cao, Zhengrong Chen, Chengxuan Xia, Kun Wu, Ping Luo
Quantitative facts are continually generated by companies and governments, supporting data-driven decision-making. While common facts are structured, many long-tail quantitative facts remain buried in unstructured documents, making them difficult to access. We propose the task of Quantity Retrieval: given a description of a quantitative fact, the system returns the relevant value and supporting evidence. Understanding quantity semantics in context is essential for this task. We introduce a framework based on description parsing that converts text into structured (description, quantity) pairs for effective retrieval. To improve learning, we construct a large paraphrase dataset using weak supervision based on quantity co-occurrence. We evaluate our approach on a large corpus of financial annual reports and a newly annotated quantity description dataset. Our method significantly improves top-1 retrieval accuracy from 30.98 percent to 64.66 percent.
Article 786
Title@2025-07-14 (1): Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
Title: Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts | Erforschen von Sparse Adaptern für skalierbare Zusammenführung von Parameter-Effizienten Experten | 探索可缩放的参数集成高效专家的分散适配器 2507.07140v2 |
Authors (9): Samin Yeasar Arnob, Zhan Su, Minseon Kim, Oleksiy Ostapenko, Riyasat Ohib, Esra’a Saleh, Doina Precup, Lucas Caccia, Alessandro Sordoni
Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks, thus scaling beyond what is usually studied in the literature. Our findings demonstrate that sparse adapters yield superior in-distribution performance post-merging compared to LoRA or full model merging. Achieving strong held-out performance remains a challenge for all methods considered.
Article 787
Title@2025-07-14 (1): Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images
Title: Advanced U-Net Architectures with CNN Backbones for Automated Lung Cancer Detection and Segmentation in Chest CT Images | Erweiterte U-Net-Architekturen mit CNN-Backbones für automatisierte Lungenkrebserkennung und Segmentierung in Brust CT-Bildern | 使用有线电视新闻网用于肺癌自动检测和切斯特CT图象分割的U-Net高级建筑 2507.09898v1 |
Authors (4): Alireza Golkarieh, Kiana Kiashemshaki, Sajjad Rezvani Boroujeni, Nasibeh Asadi Isakan
This study investigates the effectiveness of U-Net architectures integrated with various convolutional neural network (CNN) backbones for automated lung cancer detection and segmentation in chest CT images, addressing the critical need for accurate diagnostic tools in clinical settings. A balanced dataset of 832 chest CT images (416 cancerous and 416 non-cancerous) was preprocessed using Contrast Limited Adaptive Histogram Equalization (CLAHE) and resized to 128x128 pixels. U-Net models were developed with three CNN backbones: ResNet50, VGG16, and Xception, to segment lung regions. After segmentation, CNN-based classifiers and hybrid models combining CNN feature extraction with traditional machine learning classifiers (Support Vector Machine, Random Forest, and Gradient Boosting) were evaluated using 5-fold cross-validation. Metrics included accuracy, precision, recall, F1-score, Dice coefficient, and ROC-AUC. U-Net with ResNet50 achieved the best performance for cancerous lungs (Dice: 0.9495, Accuracy: 0.9735), while U-Net with VGG16 performed best for non-cancerous segmentation (Dice: 0.9532, Accuracy: 0.9513). For classification, the CNN model using U-Net with Xception achieved 99.1 percent accuracy, 99.74 percent recall, and 99.42 percent F1-score. The hybrid CNN-SVM-Xception model achieved 96.7 percent accuracy and 97.88 percent F1-score. Compared to prior methods, our framework consistently outperformed existing models. In conclusion, combining U-Net with advanced CNN backbones provides a powerful method for both segmentation and classification of lung cancer in CT scans, supporting early diagnosis and clinical decision-making.
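For readers unfamiliar with the Dice coefficient reported above, here is a minimal sketch of how it is conventionally computed on binary masks. The flattened-list representation and the convention of returning 1.0 for two empty masks are choices made for this example, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flattened 0/1 masks."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel, two predicted pixels, one true pixel: 2 * 1 / (2 + 1).
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

A Dice value of 0.9495, as reported for the ResNet50 backbone, therefore means the predicted and reference masks overlap almost completely relative to their combined size.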
Article 788
Title@2025-07-14 (1): Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
Title: Algorithm Development in Neural Networks: Insights from the Streaming Parity Task | Algorithmenentwicklung in neuralen Netzwerken: Einblicke aus der Streaming Parity-Aufgabe | 神经网络中的算法发展:流动均等任务中的透视 2507.09897v1 |
Authors (2): Loek van Rossem, Andrew M. Saxe
Even when massively overparameterized, deep neural networks show a remarkable ability to generalize. Research on this phenomenon has focused on generalization within distribution, via smooth interpolation. Yet in some settings neural networks also learn to extrapolate to data far beyond the bounds of the original training set, sometimes even allowing for infinite generalization, implying that an algorithm capable of solving the task has been learned. Here we undertake a case study of the learning dynamics of recurrent neural networks (RNNs) trained on the streaming parity task in order to develop an effective theory of algorithm development. The streaming parity task is a simple but nonlinear task defined on sequences up to arbitrary length. We show that, with sufficient finite training experience, RNNs exhibit a phase transition to perfect infinite generalization. Using an effective theory for the representational dynamics, we find an implicit representational merger effect which can be interpreted as the construction of a finite automaton that reproduces the task. Overall, our results disclose one mechanism by which neural networks can generalize infinitely from finite training experience.
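To make the task concrete, the sketch below defines streaming parity (the output at each step is the XOR of all bits seen so far) and solves it with an explicit two-state automaton, the kind of finite-state solution the abstract says the RNN's representations come to resemble. The function names are illustrative and not taken from the paper.

```python
from itertools import accumulate
from operator import xor


def streaming_parity(bits):
    """Target sequence: running XOR of all bits seen so far."""
    return list(accumulate(bits, xor))


def automaton_parity(bits):
    """The same task solved by a two-state automaton (state = current parity)."""
    transition = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    state, outputs = 0, []
    for b in bits:
        state = transition[(state, b)]
        outputs.append(state)
    return outputs
```

Because the automaton has a fixed number of states and no dependence on sequence length, it is correct for inputs of any length, which is the sense in which a network implementing it generalizes infinitely.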
Article 789
Title@2025-07-14 (1): Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Title: Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning | Verständnis ohne Kompetenz: architektonische Grenzen von LLMs in symbolischer Computation und Vernunft | 无权限理解:符号计算和理由中LLMs的建筑界限 2507.10624v1 |
Authors (1): Zheng Zhang
Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between comprehension and competence. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them, a failure rooted not in knowledge access, but in computational execution. We term this phenomenon the computational split-brain syndrome, where instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern completion engines, but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles, and why the geometric separation between instruction and execution pathways suggests limitations in neural introspection and mechanistic analysis.
Article 790
Title@2025-07-14 (1): Sequence-Model-Guided Measurement Selection for Quantum State Learning
Title: Sequence-Model-Guided Measurement Selection for Quantum State Learning | Sequence-Modell-geführte Messauswahl für Quantum State Learning | 量子州学习的测量选择 2507.09891v1 |
Authors (4): Jiaxin Huang, Yan Zhu, Giulio Chiribella, Ya-Dong Wu
Characterization of quantum systems from experimental data is a central problem in quantum science and technology. But which measurements should be used to gather data in the first place? While optimal measurement choices can be worked out for small quantum systems, the optimization becomes intractable as the system size grows large. To address this problem, we introduce a deep neural network with a sequence model architecture that searches for efficient measurement choices in a data-driven, adaptive manner. The model can be applied to a variety of tasks, including the prediction of linear and nonlinear properties of quantum states, as well as state clustering and state tomography tasks. In all these tasks, we find that the measurement choices identified by our neural network consistently outperform the uniformly random choice. Intriguingly, for topological quantum systems, our model tends to recommend measurements at the system’s boundaries, even when the task is to predict bulk properties. This behavior suggests that the neural network may have independently discovered a connection between boundaries and bulk, without having been provided any built-in knowledge of quantum physics.
nan
Article 791
Title@2025-07-14 (1): Soft Graph Clustering for single-cell RNA Sequencing Data
Title: Soft Graph Clustering for single-cell RNA Sequencing Data | Weiches Graphen-Clustering für Einzelzell-RNA-Sequenzierungsdaten | RNA 单细胞测序数据软图图群集 2507.09890v1 |
Authors (6): Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, Min Wu, Yuanchun Zhou
Clustering analysis is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis for elucidating cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have significantly improved in tackling the challenges of high dimensionality, high sparsity, and frequent dropout events that lead to ambiguous cell population boundaries. However, their reliance on hard graph constructions derived from thresholded similarity matrices presents two challenges: (i) the simplification of intercellular relationships into binary edges (0 or 1) by applying thresholds, which restricts the capture of continuous similarity features among cells and leads to significant information loss; and (ii) the presence of significant inter-cluster connections within hard graphs, which can confuse GNN methods that rely heavily on graph structures, potentially causing erroneous message propagation and biased clustering outcomes. To tackle these challenges, we introduce scSGC, a Soft Graph Clustering for single-cell RNA sequencing data, which aims to more accurately characterize continuous similarities among cells through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder; (ii) a dual-channel cut-informed soft graph embedding module; and (iii) an optimal transport-based clustering optimization module. Extensive experiments across ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.
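To make the contrast between hard and soft graphs concrete, the sketch below builds both from the same similarity matrix. It is only an illustration: the cosine similarity, the threshold of 0.8, and the toy expression vectors are assumptions, not details of scSGC.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def soft_graph(cells):
    """Keep the continuous similarity as the edge weight."""
    return [[cosine(u, v) for v in cells] for u in cells]

def hard_graph(cells, threshold):
    """Binarize similarities: 1 if at or above the threshold, else 0."""
    return [[1 if w >= threshold else 0 for w in row] for row in soft_graph(cells)]

# Hypothetical toy data: three cells, two genes.
cells = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
soft = soft_graph(cells)
hard = hard_graph(cells, threshold=0.8)
```

In this toy case the hard graph assigns cell 0 the same edge value (0) to both other cells, while the soft graph still records that cell 0 is closer to cell 1 than to cell 2.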
nan
Article 792
Title@2025-07-14 (1): TolerantECG: A Foundation Model for Imperfect Electrocardiogram
Title: TolerantECG: A Foundation Model for Imperfect Electrocardiogram | TolerantECG: Ein Grundmodell für ein imperfektes Elektrokardiogramm | 缩放式ECG:不完美心电图基金会模型 2507.09887v1 |
Authors (4): Huynh Nguyen Dang, Thang Pham, Ngan Le, Van Nguyen
The electrocardiogram (ECG) is an essential and effective tool for diagnosing heart diseases. However, its effectiveness can be compromised by noise or unavailability of one or more leads of the standard 12-lead recordings, resulting in diagnostic errors or uncertainty. To address these challenges, we propose TolerantECG, a foundation model for ECG signals that is robust to noise and capable of functioning with arbitrary subsets of the standard 12-lead ECG. TolerantECG training combines contrastive and self-supervised learning frameworks to jointly learn ECG signal representations alongside their corresponding knowledge-retrieval-based text report descriptions and corrupted or lead-missing signals. Comprehensive benchmarking results demonstrate that TolerantECG consistently ranks as the best or second-best performer across various ECG signal conditions and class levels in the PTB-XL dataset, and achieves the highest performance on the MIT-BIH Arrhythmia Database.
nan
Article 793
Title@2025-07-14 (1): Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference
Title: Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference | MLPs zum Master Heterogenes Graph-Strukturiertes Wissen für effiziente und genaue Schlussfolgerungen zu bringen | 向异异质图形结构知识硕士教授多功能模型,以便高效和准确推断 2411.14035v2 |
Authors (5): Yunhui Liu, Xinyi Gao, Tieke He, Jianhua Zhao, Hongzhi Yin
Heterogeneous Graph Neural Networks (HGNNs) have achieved promising results in various heterogeneous graph learning tasks, owing to their superiority in capturing the intricate relationships and diverse relational semantics inherent in heterogeneous graph structures. However, the neighborhood-fetching latency incurred by structure dependency in HGNNs makes it challenging to deploy for latency-constrained applications that require fast inference. Inspired by recent GNN-to-MLP knowledge distillation frameworks, we introduce HG2M and HG2M+ to combine both HGNN’s superior performance and MLP’s efficient inference. HG2M directly trains student MLPs with node features as input and soft labels from teacher HGNNs as targets, and HG2M+ further distills reliable and heterogeneous semantic knowledge into student MLPs through reliable node distillation and reliable meta-path distillation. Experiments conducted on six heterogeneous graph datasets show that despite lacking structural dependencies, HG2Ms can still achieve competitive or even better performance than HGNNs and significantly outperform vanilla MLPs. Moreover, HG2Ms demonstrate a 379.24$\times$ speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset, showcasing their ability for latency-sensitive deployments.
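The basic HG2M setup (a student MLP trained on node features against teacher soft labels) can be sketched with a per-node distillation loss. This is a minimal sketch, assuming a KL-divergence objective; the abstract does not state the exact loss, and the logits below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs):
    """KL(teacher || student) for one node (assumed loss, not confirmed by the source)."""
    student_probs = softmax(student_logits)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

# Hypothetical teacher soft label for a single node with three classes.
teacher_probs = softmax([2.0, 0.5, -1.0])
loss_match = distillation_loss([2.0, 0.5, -1.0], teacher_probs)  # student agrees
loss_off = distillation_loss([-1.0, 0.5, 2.0], teacher_probs)    # student disagrees
```

The student sees only node features at inference time, so no neighborhood fetching is needed; the graph structure enters only through the teacher's soft labels during training.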
nan
Article 794
Title@2025-07-14 (1): AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications
Title: AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications | AdaBrain-Bench: Benchmarking Brain Foundation Modelle für Brain-Computer Interface Anwendungen | AdaBrain-Bench:脑-计算机界面应用基准脑基础模型 2507.09882v1 |
Authors (10): Jiamin Wu, Zichen Ren, Junyu Wang, Pengyu Zhu, Yonghao Song, Mianxin Liu, Qihao Zheng, Lei Bai, Wanli Ouyang, Chunfeng Song
Non-invasive Brain-Computer Interfaces (BCI) offer a safe and accessible means of connecting the human brain to external devices, with broad applications in home and clinical settings to enhance human capabilities. However, the high noise level and limited task-specific data in non-invasive signals constrain decoding capabilities. Recently, the adoption of self-supervised pre-training has been transforming the landscape of non-invasive BCI research, enabling the development of brain foundation models that capture generic neural representations from large-scale, unlabeled, and substantially noisy electroencephalography (EEG) signals. However, despite these advances, the field currently lacks comprehensive, practical and extensible benchmarks to assess the utility of the public foundation models across diverse BCI tasks, hindering their widespread adoption. To address this challenge, we present AdaBrain-Bench, a large-scale standardized benchmark to systematically evaluate brain foundation models in widespread non-invasive BCI tasks. AdaBrain-Bench encompasses a diverse collection of representative BCI decoding datasets spanning 7 key applications. It introduces a streamlined task adaptation pipeline integrated with multi-dimensional evaluation metrics and a set of adaptation tools. The benchmark delivers an inclusive framework for assessing generalizability of brain foundation models across key transfer settings, including cross-subject, multi-subject, and few-shot scenarios. We leverage AdaBrain-Bench to evaluate a suite of publicly available brain foundation models and offer insights into practices for selecting appropriate models in various scenarios. We make our benchmark pipeline available to enable reproducible research and external use, offering a continuously evolving platform to foster progress toward robust and generalized neural decoding solutions.
nan
Article 795
Title@2025-07-14 (1): Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching
Title: Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching | Bridging the Last Mile of Prediction: Verbesserung der Zeitreihenvorhersage mit konditional gesteuertem Flow Matching | 连接预测的最后一环:加强时间序列预测与有条件的引导流动匹配 2507.07192v2 |
Authors (5): Huibo Xu, Runlong Yu, Likang Wu, Xianquan Wang, Qi Liu
Diffusion models, a type of generative model, have shown promise in time series forecasting. But they face limitations like rigid source distributions and limited sampling paths, which hinder their performance. Flow matching offers faster generation, higher-quality outputs, and greater flexibility, while also possessing the ability to utilize valuable information from the prediction errors of prior models, which were previously inaccessible yet critically important. To address these challenges and fully unlock the untapped potential of flow matching, we propose Conditional Guided Flow Matching (CGFM). CGFM extends flow matching by incorporating the outputs of an auxiliary model, enabling a previously unattainable capability in the field: learning from the errors of the auxiliary model. For time series forecasting tasks, it integrates historical data as conditions and guidance, constructs two-sided conditional probability paths, and uses a general affine path to expand the space of probability paths, ultimately leading to improved predictions. Extensive experiments show that CGFM consistently enhances and outperforms state-of-the-art models, highlighting its effectiveness in advancing forecasting methods.
nan
Article 796
Title@2025-07-14 (1): Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Title: Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition | Funktionsinduktion und Aufgabenverallgemeinerung: Eine Interpretationsstudie mit Off-by-One-Addition | 职能上岗和任务一般化:解释性研究 2507.09875v1 |
Authors (3): Qinyuan Ye, Robin Jia, Xiang Ren
Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this question through the lens of off-by-one addition (i.e., 1+1=3, 2+2=5, 3+3=?), a two-step, counterfactual task with an unexpected +1 function as a second step. Leveraging circuit-style interpretability techniques such as path patching, we analyze the models’ internal computations behind their notable performance and present three key findings. First, we uncover a function induction mechanism that explains the model’s generalization from standard addition to off-by-one addition. This mechanism resembles the structure of the induction head mechanism found in prior work and elevates it to a higher level of abstraction. Second, we show that the induction of the +1 function is governed by multiple attention heads in parallel, each of which emits a distinct piece of the +1 function. Finally, we find that this function induction mechanism is reused in a broader range of tasks, including synthetic tasks such as shifted multiple-choice QA and algorithmic tasks such as base-8 addition. Overall, our findings offer deeper insights into how reusable and composable structures within language models enable task-level generalization.
nan
Article 797
Title@2025-07-14 (1): External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
Title: External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation | Externes Large Foundation Modell: Wie Sie effizient dienen Trillionen von Parametern für Online-Anzeigen Empfehlung | 外部大型基金会模式:如何有效服务数以万计的在线咨询建议参数 2502.17494v7 |
Authors (107): Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li, Shiquan Wang, Evelyn Lyu, Wenjing Lu, Rui Zhang, Wenjun Wang, Jason Rudy, Mengyue Hang, Kai Wang, Yinbin Ma, Shuaiwen Wang, Sihan Zeng, Tongyi Tang, Xiaohan Wei, Longhao Jin, Jamey Zhang, Marcus Chen, Jiayi Xu, Angie Huang, Xihuan Zeng, Chi Zhang, Zhengli Zhao, Jared Yang, Qiang Jin, Xian Chen, Amit Anand Amlesahwaram, Lexi Song, Liang Luo, Yuchen Hao, Nan Xiao, Yavuz Yetim, Luoshang Pan, Gaoxiang Liu, Yuxi Hu, Yuzhen Huang, Jackie Xu, Rich Zhu, Xin Zhang, Yiqun Liu, Hang Yin, Yuxin Chen, Buyun Zhang, Xiaoyi Liu, Xingyuan Wang, Wenguang Mao, Zhijing Li, Zhehui Zhou, Feifan Gu, Qin Huang, Chonglin Sun, Nancy Yu, Shuo Gu, Shupin Mao, Benjamin Au, Jingzheng Qin, Peggy Yao, Jae-Woo Choi, Bin Gao, Ernest Wang, Lei Zhang, Wen-Yen Chen, Ted Lee, Yujie Zha, Yi Meng, Alex Gong, Edison Gao, Jack Hsueh, Jie Zheng, Alireza Vahdatpour, Yiping Han, Yantao Yao, Toshinari Kureha, Shuo Chang, Musharaf Sultan, John Bocharov, Sagar Chordia, Xiaorui Gan, Peng Sun, Rocky Liu, Bo Long, Wenlin Chen, Santanu Kolay, Huayu Li
Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications. First, training and inference budgets for the served model are restricted; exceeding them may add latency and impair user experience. Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address the overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training/inference while maintaining high performance. We design the teacher as a foundation model (FM) that can serve multiple students, the vertical models (VMs), to amortize its building cost. We propose Auxiliary Head and Student Adapter to mitigate the data distribution gap between FM and VMs caused by the streaming data issue. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gain by ExFM.
nan
Article 798
Title@2025-07-14 (1): A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization
Title: A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization | Eine vollständige Verlustlandschaftsanalyse der Regularisierten Tiefenmatrix-Fabrikierung | 对正规化深母体因子化的全损全损地貌分析 2506.20344v2 |
Authors (3): Po Chen, Rujun Jiang, Peng Wang
Despite its wide range of applications across various domains, the optimization foundations of deep matrix factorization (DMF) remain largely open. In this work, we aim to fill this gap by conducting a comprehensive study of the loss landscape of the regularized DMF problem. Toward this goal, we first provide a closed-form characterization of all critical points of the problem. Building on this, we establish precise conditions under which a critical point is a local minimizer, a global minimizer, a strict saddle point, or a non-strict saddle point. Leveraging these results, we derive a necessary and sufficient condition under which every critical point is either a local minimizer or a strict saddle point. This provides insights into why gradient-based methods almost always converge to a local minimizer of the regularized DMF problem. Finally, we conduct numerical experiments to visualize its loss landscape to support our theory.
nan
Article 799
Title@2025-07-14 (1): Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks
Title: Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks | Task Priors: Verbesserung der Modellbewertung unter Berücksichtigung des gesamten Raumes von Downstream-Aufgaben | 任务前期:考虑到下游任务的全部空间,加强示范评价 2507.09871v1 |
Authors (2): Niket Patel, Randall Balestriero
The grand goal of AI research, and particularly Self-Supervised Learning (SSL), is to produce systems that can successfully solve any possible task. In contrast, current evaluation methods available to AI researchers typically rely on a fixed collection of hand-picked downstream benchmarks. Hence, a large amount of effort is put into designing and searching for large collections of evaluation tasks that can serve as a proxy of our grand goal. We argue that such a rigid evaluation protocol creates a silent bottleneck in AI research. To remedy that, we define a probabilistic space of downstream tasks obtained by adopting a distribution of tasks and by defining Task Priors. Under this view, one can evaluate a model’s performance over the set of all possible downstream tasks. Our framework is the first to provide answers to key questions such as (i) what is the average performance of my model over all possible downstream tasks weighted by the probability to encounter each task? or (ii) what is the variance of my model’s performance across all downstream tasks under the defined Task Priors? Beyond establishing a new standard for evaluation, we believe that Task Priors will accelerate the pace of research in SSL, where downstream task evaluation is the sole qualitative signal that researchers have access to.
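Questions (i) and (ii) above are a probability-weighted mean and variance of per-task performance. The sketch below computes both for a finite set of tasks; the scores and probabilities are hypothetical, and how the paper actually defines or samples tasks under a Task Prior is not specified in the abstract.

```python
def expected_performance(scores, probs):
    """Average score over tasks, weighted by the probability of each task."""
    return sum(p * s for s, p in zip(scores, probs))

def performance_variance(scores, probs):
    """Variance of the score across tasks under the same task distribution."""
    mean = expected_performance(scores, probs)
    return sum(p * (s - mean) ** 2 for s, p in zip(scores, probs))

# Hypothetical example: three downstream tasks and their prior probabilities.
scores = [0.9, 0.5, 0.7]
probs = [0.5, 0.25, 0.25]
mean_score = expected_performance(scores, probs)
score_var = performance_variance(scores, probs)
```

A model with a high mean but a large variance would look strong on a hand-picked benchmark suite while still failing on a sizeable share of plausible tasks.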
nan
Article 800
Title@2025-07-14 (1): Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO
Title: Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO | Intersektion von Verstärkungslernen und Bayesian-Optimierung zur intelligenten Steuerung industrieller Prozesse: Ein sicheres MPC-basiertes DPG mit Multi-Objective BO | 强化学习和巴耶斯优化优化对工业加工的明智控制:使用多目标BB,以MPC为基础的安全DPG 2507.09864v1 |
Authors (2): Hossein Nejatbakhsh Esfahani, Javad Mohammadpour Velni
Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods, with lower computational complexity and greater transparency. However, standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation. To address these challenges, we propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO). The proposed MPC-RL-MOBO utilizes noisy evaluations of the RL stage cost and its gradient, estimated via a Compatible Deterministic Policy Gradient (CDPG) approach, and incorporates them into a MOBO algorithm using the Expected Hypervolume Improvement (EHVI) acquisition function. This fusion enables efficient and safe tuning of the MPC parameters to achieve improved closed-loop performance, even under model imperfections. A numerical example demonstrates the effectiveness of the proposed approach in achieving sample-efficient, stable, and high-performance learning for control systems.
nan
Article 801
Title@2025-07-14 (1): Flows and Diffusions on the Neural Manifold
Title: Flows and Diffusions on the Neural Manifold | Strömungen und Diffusionen auf der Neuralmanifolde | 神经元层的流量和扩散 2507.10623v1 |
Authors (3): Daniel Saragih, Deyu Cao, Tejas Balaji
Diffusion and flow-based generative models have achieved remarkable success in domains such as image synthesis, video generation, and natural language modeling. In this work, we extend these advances to weight space learning by leveraging recent techniques to incorporate structural priors derived from optimization dynamics. Central to our approach is modeling the trajectory induced by gradient descent as a trajectory inference problem. We unify several trajectory inference techniques under the framework of gradient flow matching, providing a theoretical framework for treating optimization paths as inductive bias. We further explore architectural and algorithmic choices, including reward fine-tuning by adjoint matching, the use of autoencoders for latent weight representation, conditioning on task-specific context data, and adopting informative source distributions such as Kaiming uniform. Experiments demonstrate that our method matches or surpasses baselines in generating in-distribution weights, improves initialization for downstream training, and supports fine-tuning to enhance performance. Finally, we illustrate a practical application in safety-critical systems: detecting harmful covariate shifts, where our method outperforms the closest comparable baseline.
nan
Article 802
Title@2025-07-14 (1): On the Local Complexity of Linear Regions in Deep ReLU Networks
Title: On the Local Complexity of Linear Regions in Deep ReLU Networks | Über die lokale Komplexität linearer Regionen in Deep ReLU-Netzwerken | 深RELU网络线性区域局部复杂程度 2412.18283v3 |
Authors (2): Niket Patel, Guido Montufar
We define the local complexity of a neural network with continuous piecewise linear activations as a measure of the density of linear regions over an input data distribution. We show theoretically that ReLU networks that learn low-dimensional feature representations have a lower local complexity. This allows us to connect recent empirical observations on feature learning at the level of the weight matrices with concrete properties of the learned functions. In particular, we show that the local complexity serves as an upper bound on the total variation of the function over the input data distribution and thus that feature learning can be related to adversarial robustness. Lastly, we consider how optimization drives ReLU networks towards solutions with lower local complexity. Overall, this work contributes a theoretical framework towards relating geometric properties of ReLU networks to different aspects of learning such as feature learning and representation cost.
nan
Article 803
Title@2025-07-14 (1): REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models
Title: REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models | REINFORCE++: Effizienter RLHF-Algorithmus mit Robustheit sowohl für Prompt- als auch für Reward-Modelle | REINFORCE++: 高效的RLHF对快速模型和奖励模型具有强力的测算法 2501.03262v6 |
Authors (3): Jian Hu, Jason Klein Liu, Wei Shen
Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT/GPT-4 commonly employ Proximal Policy Optimization (PPO), the inclusion of a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave-One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for responses to each prompt, which can lead to overfitting on simpler prompts and vulnerability to reward hacking. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model while using the normalized reward of a batch as the baseline. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt set truncation. Furthermore, it achieves superior generalization in both RLHF and long chain-of-thought (CoT) settings compared to existing REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.
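The difference between per-prompt and batch-level advantage estimation can be illustrated with a few lines of arithmetic. This is one plausible reading of "normalized reward of a batch as the baseline", not the OpenRLHF implementation; the function names and reward values are hypothetical.

```python
import statistics

def batch_normalized_advantages(rewards, eps=1e-8):
    """Center and scale rewards using statistics of the whole batch."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def per_prompt_advantages(rewards_by_prompt, eps=1e-8):
    """Normalize each prompt's responses independently (group-wise baseline)."""
    return [batch_normalized_advantages(group, eps) for group in rewards_by_prompt]

# Hypothetical rewards: two responses to an easy prompt, two to a hard prompt.
rewards_by_prompt = [[0.9, 1.0], [0.0, 0.1]]
flat_rewards = [r for group in rewards_by_prompt for r in group]

global_adv = batch_normalized_advantages(flat_rewards)
local_adv = per_prompt_advantages(rewards_by_prompt)
```

With per-prompt normalization the response scoring 0.9 receives a negative advantage simply because its sibling scored 1.0, whereas with the batch baseline both easy-prompt responses stay positive and both hard-prompt responses stay negative.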
nan
Article 804
Title@2025-07-14 (1): CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation
Title: CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation | CRISP-SAM2: SAM2 mit Cross-Modal Interaction und semantischer Prompting für Multi-Organ Segmentierung | CRIISP-SAM2:SAM2 具有跨模式相互作用和跨组织分解的语义提示的SAM2 2506.23121v3 |
Authors (8): Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge
Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts, and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information. To eliminate reliance on geometric prompts, we use a semantic prompting strategy, replacing the original prompt encoder to sharpen the perception of challenging targets. In addition, a similarity-sorting self-updating strategy for memory and a mask-refining process are applied to further adapt to medical imaging and enhance localized details. Comparative experiments conducted on seven public datasets indicate that CRISP-SAM2 outperforms existing models. Extensive analysis also demonstrates the effectiveness of our method, thereby confirming its superior performance, especially in addressing the limitations mentioned earlier. Our code is available at: https://github.com/YU-deep/CRISP_SAM2.git.
nan
Article 805
Title@2025-07-14 (1): Spectral Feature Extraction for Robust Network Intrusion Detection Using MFCCs
Title: Spectral Feature Extraction for Robust Network Intrusion Detection Using MFCCs | Spektrale Feature-Extraktion für robuste Netzwerkintrusionserkennung mit MFCCs | 利用 MFCCs 进行强力网络入侵探测的光谱特征采掘 2507.10622v1 |
Authors (3): HyeYoung Lee, Muhammad Nadeem, Pavel Tsoi
The rapid expansion of Internet of Things (IoT) networks has led to a surge in security vulnerabilities, emphasizing the critical need for robust anomaly detection and classification techniques. In this work, we propose a novel approach for identifying anomalies in IoT network traffic by leveraging the Mel-frequency cepstral coefficients (MFCC) and ResNet-18, a deep learning model known for its effectiveness in feature extraction and image-based tasks. Learnable MFCCs enable adaptive spectral feature representation, capturing the temporal patterns inherent in network traffic more effectively than traditional fixed MFCCs. We demonstrate that transforming raw signals into MFCCs maps the data into a higher-dimensional space, enhancing class separability and enabling more effective multiclass classification. Our approach combines the strengths of MFCCs with the robust feature extraction capabilities of ResNet-18, offering a powerful framework for anomaly detection. The proposed model is evaluated on three widely used IoT intrusion detection datasets: CICIoT2023, NSL-KDD, and IoTID20. The experimental results highlight the potential of integrating adaptive signal processing techniques with deep learning architectures to achieve robust and scalable anomaly detection in heterogeneous IoT network landscapes.
nan
Article 806
Title@2025-07-14 (1): A General Framework for Inference-time Scaling and Steering of Diffusion Models
Title: A General Framework for Inference-time Scaling and Steering of Diffusion Models | Ein allgemeiner Rahmen für Schlussfolgerungs-Zeit-Skalierung und Steuerung von Diffusionsmodellen | 传播模型的推推时间缩放和引导总框架 2501.06848v4 |
Authors (7): Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath
Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we present Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models - even with off-the-shelf rewards - can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
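The resampling step at the heart of FK steering can be sketched as weighted sampling with replacement. This is a minimal sketch, assuming a potential of the form exp(reward / temperature); the paper explores several potentials, and the particle labels, rewards, and temperature here are hypothetical.

```python
import math
import random

def resample(particles, rewards, temperature=1.0, rng=None):
    """Draw len(particles) particles with probability proportional to the potential.

    The potential exp(reward / temperature) is an assumed choice for illustration.
    """
    rng = rng or random.Random(0)
    top = max(rewards)
    # Subtract the max reward before exponentiating for numerical stability.
    weights = [math.exp((r - top) / temperature) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))

# Hypothetical intermediate states and their intermediate rewards.
particles = ["a", "b", "c"]
rewards = [0.0, 0.0, 10.0]
survivors = resample(particles, rewards, temperature=0.1)
```

A low temperature concentrates the population on the highest-reward particle, while a high temperature keeps it close to the unsteered sampler; no gradient of the reward is needed at any point.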
nan
Article 807
Title@2025-07-14 (1): A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques
Title: A Data-Driven Review of Remote Sensing-Based Data Fusion in Precision Agriculture from Foundational to Transformer-Based Techniques | Eine datengestützte Überprüfung von Fernerkundungsbasierter Datenfusion in der Präzisions-Landwirtschaft von der Grundlagen- bis zur Transformer-basierten Technik | 对精密农业中从基础技术到变换技术的遥感数据融合的数据驱动审查 2410.18353v2 |
Authors (6): Mahdi Saki, Rasool Keshavarz, Daniel Franklin, Mehran Abolhasan, Justin Lipman, Negin Shariati
This review explores recent advancements in data fusion techniques and Transformer-based remote sensing applications in precision agriculture. Using a systematic, data-driven approach, we analyze research trends from 1994 to 2024, identifying key developments in data fusion, remote sensing, and AI-driven agricultural monitoring. While traditional machine learning and deep learning approaches have demonstrated effectiveness in agricultural decision-making, challenges such as limited scalability, suboptimal feature extraction, and reliance on extensive labeled data persist. This study examines the comparative advantages of Transformer-based fusion methods, particularly their ability to model spatiotemporal dependencies and integrate heterogeneous datasets for applications in soil analysis, crop classification, yield prediction, and disease detection. A comparative analysis of multimodal data fusion approaches is conducted, evaluating data types, fusion techniques, and remote sensing platforms. We demonstrate how Transformers outperform conventional models by enhancing prediction accuracy, mitigating feature redundancy, and optimizing large-scale data integration. Furthermore, we propose a structured roadmap for implementing data fusion in agricultural remote sensing, outlining best practices for ground-truth data selection, platform integration, and fusion model design. By addressing key research gaps and providing a strategic framework, this review offers valuable insights for advancing precision agriculture through AI-driven data fusion techniques.
Article 808
Title@2025-07-14 (1): Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Title: Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training | Through the River: Den Nutzen planfreier Methoden für das Sprachmodelltraining verstehen | 通过河道:了解语文示范培训的无附条件方法的益处 2507.09846v1 |
Authors (4): Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun
As both model and dataset sizes continue to scale rapidly, conventional pretraining strategies with fixed compute budgets, such as cosine learning rate schedules, are increasingly inadequate for large-scale training. Recent alternatives, including warmup-stable-decay (WSD) schedules and weight averaging, offer greater flexibility. However, WSD relies on explicit decay phases to track progress, while weight averaging addresses this limitation at the cost of additional memory. In search of a more principled and scalable alternative, we revisit the Schedule-Free (SF) method [Defazio et al., 2024], which has shown strong empirical performance across diverse settings. We show that SF-AdamW effectively navigates the “river” structure of the loss landscape without decay phases or auxiliary averaging, making it particularly suitable for continuously scaling training workloads. To understand this behavior, we conduct a theoretical and empirical analysis of SF dynamics, revealing that it implicitly performs weight averaging without memory overhead. Guided by this analysis, we propose a refined variant of SF that improves robustness to momentum and performs better under large batch sizes, addressing key limitations of the original method. Together, these results establish SF as a practical, scalable, and theoretically grounded approach for language model training.
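As a rough illustration of the implicit averaging mentioned in this abstract, the following is a minimal sketch of the base Schedule-Free SGD recursion from Defazio et al. (2024). It is not SF-AdamW and not the refined variant the paper proposes. The function name, the quadratic test objective, and the values of `lr`, `beta`, and `steps` are illustrative assumptions.

```python
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.5, beta=0.5, steps=200):
    """Base Schedule-Free SGD (illustrative sketch, hypothetical helper).

    z is the plain SGD sequence, x is its running uniform average, and
    gradients are evaluated at y, an interpolation between the two.
    """
    z = np.array(x0, dtype=float)  # base (SGD) sequence
    x = z.copy()                   # averaged sequence, returned at the end
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # point where the gradient is taken
        z = z - lr * grad(y)              # SGD step on the base sequence
        c = 1.0 / (t + 1)                 # uniform-averaging weight
        x = (1.0 - c) * x + c * z         # online average, no extra copies kept
    return x

# Toy objective (an assumption): f(v) = 0.5 * ||v - target||^2
target = np.array([1.0, -2.0])
x_final = schedule_free_sgd(lambda v: v - target, [0.0, 0.0])
```

The average `x` is maintained online from `z`, so no separate buffer of past iterates is stored; this is the sense in which the averaging carries no additional memory overhead in this sketch.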
Article 809
Title@2025-07-14 (1): Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms
Title: Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms | Puristische Stationarität und Härte Ergebnisse für Bregman Proximale Algorithmen | Bregman Proximal-Type 的纯净持久性和硬性结果 2404.08073v2 |
Authors (3): He Chen, Jiajin Li, Anthony Man-Cho So
Bregman proximal-type algorithms (BPs), such as mirror descent, have become popular tools in machine learning and data science for exploiting problem structures through non-Euclidean geometries. In this paper, we show that BPs can get trapped near a class of non-stationary points, which we term spurious stationary points. Such stagnation can persist for any finite number of iterations if the gradient of the Bregman kernel is not Lipschitz continuous, even in convex problems. The root cause lies in a fundamental contrast in descent behavior between Euclidean and Bregman geometries: While Euclidean gradient descent ensures sufficient decrease near any non-stationary point, BPs may exhibit arbitrarily slow decrease around spurious stationary points. As a result, commonly used Bregman-based stationarity measures, such as relative change in terms of Bregman divergence, can vanish near spurious stationary points. This may misleadingly suggest convergence, even when the iterates remain far from any true stationary point. Our analysis further reveals that spurious stationary points are not pathological, but rather occur generically in a broad class of nonconvex problems with polyhedral constraints. Taken together, our findings reveal a serious blind spot in Bregman-based optimization methods and call for new theoretical tools and algorithmic safeguards to ensure reliable convergence.
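For readers unfamiliar with Bregman proximal-type updates, here is a minimal sketch of one well-known instance, entropic mirror descent on the probability simplex. The kernel is the negative entropy, whose gradient (a logarithm) is not Lipschitz continuous near the boundary of the simplex, which is the kind of kernel the abstract refers to. The function name, step size, and linear test objective are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step=1.0, iters=200):
    """Mirror descent with the negative-entropy kernel (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    x = x / x.sum()                       # start on the simplex
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))   # multiplicative (Bregman) update
        x = x / x.sum()                   # renormalize onto the simplex
    return x

# Toy linear objective f(x) = <c, x> (an assumption); its minimizer over
# the simplex is the vertex with the smallest cost.
c = np.array([1.0, 2.0, 3.0])
x_md = entropic_mirror_descent(lambda v: c, np.ones(3))
```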
Article 810
Title@2025-07-14 (1): Dataset Distillation-based Hybrid Federated Learning on Non-IID Data
Title: Dataset Distillation-based Hybrid Federated Learning on Non-IID Data | Datensatz Destillationsbasiertes Hybrid-Federated-Learning auf nicht-ID-Daten | 基于数据提炼的关于非统计数据数据的混合联邦学习 2409.17517v2 |
Authors (8): Xiufang Shi, Wei Zhang, Mincheng Wu, Guangyi Liu, Zhenyu Wen, Shibo He, Tejal Shah, Rajiv Ranjan
With the development of edge computing, Federated Learning (FL) has emerged as a promising solution for the intelligent Internet of Things (IoT). However, applying FL in mobile edge-cloud networks is greatly challenged by statistical heterogeneity and high communication overhead. To address these challenges, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and identically distributed (IID) data, thereby improving the performance of model training. In particular, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster heads collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server. This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of non-IID data on model training. We perform a comprehensive analysis of the convergence behavior, communication overhead, and computational complexity of the proposed HFLDD. Extensive experimental results based on multiple public datasets demonstrate that when data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.
Article 811
Title@2025-07-14 (1): Subgroups Matter for Robust Bias Mitigation
Title: Subgroups Matter for Robust Bias Mitigation | Untergruppen Materie für robuste Bias Mitigation | 稳健的Biust Bias 减轻风险的分组事项 2505.21363v3 |
Authors (4): Anissa Alloula, Charles Jones, Ben Glocker, Bartłomiej W. Papież
Despite the constant development of new bias mitigation methods for machine learning, no method consistently succeeds, and a fundamental question remains unanswered: when and why do bias mitigation techniques fail? In this paper, we hypothesise that a key factor may be the often-overlooked but crucial step shared by many bias mitigation methods: the definition of subgroups. To investigate this, we conduct a comprehensive evaluation of state-of-the-art bias mitigation methods across multiple vision and language classification tasks, systematically varying subgroup definitions, including coarse, fine-grained, intersectional, and noisy subgroups. Our results reveal that subgroup choice significantly impacts performance, with certain groupings paradoxically leading to worse outcomes than no mitigation at all. Our findings suggest that observing a disparity between a set of subgroups is not a sufficient reason to use those subgroups for mitigation. Through theoretical analysis, we explain these phenomena and uncover a counter-intuitive insight that, in some cases, improving fairness with respect to a particular set of subgroups is best achieved by using a different set of subgroups for mitigation. Our work highlights the importance of careful subgroup definition in bias mitigation and presents it as an alternative lever for improving the robustness and fairness of machine learning models.
Article 812
Title@2025-07-14 (1): Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs
Title: Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs | Rethinking Prompt Optimization: Verstärkung, Diversifizierung und Migration in Blackbox LLMs | 重新思考即时优化:加强、多样化和黑盒LMS中的移民 2507.09839v1 |
Authors (4): MohammadReza Davari, Utkarsh Garg, Weixin Cai, Eugene Belilovsky
An increasing number of NLP applications interact with large language models (LLMs) through black-box APIs, making prompt engineering critical for controlling model outputs. While recent Automatic Prompt Optimization (APO) methods iteratively refine prompts using model-generated feedback, known as textual gradients, they primarily focus on error correction and neglect valuable insights from correct predictions. This limits both their effectiveness and efficiency. In this paper, we propose a novel APO framework centered on enhancing the feedback mechanism. We reinterpret the textual gradient as a form of negative reinforcement and introduce the complementary positive reinforcement to explicitly preserve beneficial prompt components identified through successful predictions. To mitigate the noise inherent in LLM-generated feedback, we introduce a technique called feedback diversification, which aggregates multiple feedback signals, emphasizing consistent, actionable advice while filtering out outliers. Motivated by the rapid evolution and diversity of available LLMs, we also formalize Continual Prompt Optimization (CPO), addressing the practical challenge of efficiently migrating optimized prompts between different model versions or API providers. Our experiments reveal that naive prompt migration often degrades performance due to loss of critical instructions. In contrast, our approach consistently outperforms strong baselines, achieving significant accuracy improvements, faster convergence, and lower computational costs in both standard and migration scenarios.
Article 813
Title@2025-07-14 (1): A Pre-training Framework for Relational Data with Information-theoretic Principles
Title: A Pre-training Framework for Relational Data with Information-theoretic Principles | Ein Vorausbildungsrahmen für relationale Daten mit informationstheoretischen Prinzipien | 带有信息理论原则的关系数据培训前框架 2507.09837v1 |
Authors (6): Quang Truong, Zhikai Chen, Mingxuan Ju, Tong Zhao, Neil Shah, Jiliang Tang
Relational databases underpin critical infrastructure across a wide range of domains, yet the design of generalizable pre-training strategies for learning from relational databases remains an open challenge due to task heterogeneity. Specifically, there exist infinitely many possible downstream tasks, as tasks are defined based on relational schema graphs, temporal dependencies, and SQL-defined label logics. An effective pre-training framework is desired to take these factors into account in order to obtain task-aware representations. By incorporating knowledge of the underlying distribution that drives label generation, downstream tasks can benefit from relevant side-channel information. To bridge this gap, we introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs predictive supervisory signals via set-based aggregation over schema traversal graphs, explicitly modeling next-window relational dynamics. We formalize our approach through an information-theoretic lens, demonstrating that task-informed representations retain more relevant signals than those obtained without task priors. Extensive experiments on the RelBench benchmark show that TVE consistently outperforms traditional pre-training baselines. Our findings advocate for pre-training objectives that encode task heterogeneity and temporal structure as design principles for predictive modeling on relational databases.
Article 814
Title@2025-07-14 (1): Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems
Title: Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems | Multi-Residual Mixture of Experts Learning for Cooperative Control in Multi-Vehicle Systems | 多车辆系统合作控制专家学习 2507.09836v1 |
Authors (4): Vindula Jayawardana, Sirui Li, Yashar Farid, Cathy Wu
Autonomous vehicles (AVs) are becoming increasingly popular, with their applications now extending beyond just a mode of transportation to serving as mobile actuators of a traffic flow to control flow dynamics. This contrasts with traditional fixed-location actuators, such as traffic signals, and is referred to as Lagrangian traffic control. However, designing effective Lagrangian traffic control policies for AVs that generalize across traffic scenarios introduces a major challenge. Real-world traffic environments are highly diverse, and developing policies that perform robustly across such diverse traffic scenarios is challenging. It is further compounded by the joint complexity of the multi-agent nature of traffic systems, mixed motives among participants, and conflicting optimization objectives subject to strict physical and external constraints. To address these challenges, we introduce Multi-Residual Mixture of Expert Learning (MRMEL), a novel framework for Lagrangian traffic control that augments a given suboptimal nominal policy with a learned residual while explicitly accounting for the structure of the traffic scenario space. In particular, taking inspiration from residual reinforcement learning, MRMEL augments a suboptimal nominal AV control policy by learning a residual correction, but at the same time dynamically selects the most suitable nominal policy from a pool of nominal policies conditioned on the traffic scenarios and modeled as a mixture of experts. We validate MRMEL using a case study in cooperative eco-driving at signalized intersections in Atlanta, Dallas Fort Worth, and Salt Lake City, with real-world data-driven traffic scenarios. The results show that MRMEL consistently yields superior performance, achieving an additional 4%-9% reduction in aggregate vehicle emissions relative to the strongest baseline in each setting.
Article 815
Title@2025-07-13 (7): Generative Cognitive Diagnosis
Title: Generative Cognitive Diagnosis | Generative Kognitive Diagnose | 认知诊断 2507.09831v1 |
Authors (3): Jiatong Li, Qi Liu, Mengxiao Zhu
Cognitive diagnosis (CD) models latent cognitive states of human learners by analyzing their response patterns on diagnostic tests, serving as a crucial machine learning technique for educational assessment and evaluation. Traditional cognitive diagnosis models typically follow a transductive prediction paradigm that optimizes parameters to fit response scores and extract learner abilities. These approaches face significant limitations as they cannot perform instant diagnosis for new learners without computationally expensive retraining and produce diagnostic outputs with limited reliability. In this study, we introduce a novel generative diagnosis paradigm that fundamentally shifts CD from predictive to generative modeling, enabling inductive inference of cognitive states without parameter re-optimization. We propose two simple yet effective instantiations of this paradigm: Generative Item Response Theory (G-IRT) and Generative Neural Cognitive Diagnosis Model (G-NCDM), which achieve excellent performance improvements over traditional methods. The generative approach disentangles cognitive state inference from response prediction through a well-designed generation process that incorporates identifiability and monotonicity conditions. Extensive experiments on real-world datasets demonstrate the effectiveness of our methodology in addressing scalability and reliability challenges, notably a $100\times$ speedup for the diagnosis of new learners. Our framework opens new avenues for cognitive diagnosis applications in artificial intelligence, particularly for intelligent model evaluation and intelligent education systems. The code is available at https://github.com/CSLiJT/Generative-CD.git.
Article 816
Title@2025-07-13 (7): Hierarchical Abstraction Enables Human-Like 3D Object Recognition in Deep Learning Models
Title: Hierarchical Abstraction Enables Human-Like 3D Object Recognition in Deep Learning Models | Hierarchische Abstraktion ermöglicht die Erkennung von Menschen wie 3D-Objekten in Deep Learning-Modellen | 在深学习模型中,等级式抽象抽象化使人类能够识别3D等3D对象 2507.09830v1 |
Authors (3): Shuhao Fu, Philip J. Kellman, Hongjing Lu
Both humans and deep learning models can recognize objects from 3D shapes depicted with sparse visual information, such as a set of points randomly sampled from the surfaces of 3D objects (termed a point cloud). Although deep learning models achieve human-like performance in recognizing objects from 3D shapes, it remains unclear whether these models develop 3D shape representations similar to those used by human vision for object recognition. We hypothesize that training with 3D shapes enables models to form representations of local geometric structures in 3D shapes. However, their representations of global 3D object shapes may be limited. We conducted two human experiments systematically manipulating point density and object orientation (Experiment 1), and local geometric structure (Experiment 2). Humans consistently performed well across all experimental conditions. We compared two types of deep learning models, one based on a convolutional neural network (DGCNN) and the other on visual transformers (point transformer), with human performance. We found that the point transformer model provided a better account of human performance than the convolution-based model. The advantage mainly results from the mechanism in the point transformer model that supports hierarchical abstraction of 3D shapes.
Article 817
Title@2025-07-13 (7): LLMs Meet Cross-Modal Time Series Analytics: Overview and Directions
Title: LLMs Meet Cross-Modal Time Series Analytics: Overview and Directions | LLMs treffen auf Cross-Modal Time Series Analytics: Übersicht und Anfahrt | 跨模式时间序列分析分析:概览和方向 2507.10620v1 |
Authors (6): Chenxi Liu, Hao Miao, Cheng Long, Yan Zhao, Ziyue Li, Panos Kalnis
Large Language Models (LLMs) have emerged as a promising paradigm for time series analytics, leveraging their massive parameters and the shared sequential nature of textual and time series data. However, a cross-modality gap exists between time series and textual data, as LLMs are pre-trained on textual corpora and are not inherently optimized for time series. In this tutorial, we provide an up-to-date overview of LLM-based cross-modal time series analytics. We introduce a taxonomy that classifies existing approaches into three groups based on cross-modal modeling strategies, namely conversion, alignment, and fusion, and then discuss their applications across a range of downstream tasks. In addition, we summarize several open challenges. This tutorial aims to expand the practical application of LLMs in solving real-world problems in cross-modal time series analytics while balancing effectiveness and efficiency. Participants will gain a thorough understanding of current advancements, methodologies, and future research directions in cross-modal time series analytics.
Article 818
Title@2025-07-13 (7): Conditional Data Synthesis Augmentation
Title: Conditional Data Synthesis Augmentation | Bedingte Daten Synthese Augmentation | 有条件数据合成增强 2504.07426v2 |
Authors (2): Xinyu Tian, Xiaotong Shen
Reliable machine learning and statistical analysis rely on diverse, well-distributed training data. However, real-world datasets are often limited in size and exhibit underrepresentation across key subpopulations, leading to biased predictions and reduced performance, particularly in supervised tasks such as classification. To address these challenges, we propose Conditional Data Synthesis Augmentation (CoDSA), a novel framework that leverages generative models, such as diffusion models, to synthesize high-fidelity data for improving model performance across multimodal domains including tabular, textual, and image data. CoDSA generates synthetic samples that faithfully capture the conditional distributions of the original data, with a focus on under-sampled or high-interest regions. Through transfer learning, CoDSA fine-tunes pre-trained generative models to enhance the realism of synthetic data and increase sample density in sparse areas. This process preserves inter-modal relationships, mitigates data imbalance, improves domain adaptation, and boosts generalization. We also introduce a theoretical framework that quantifies the statistical accuracy improvements enabled by CoDSA as a function of synthetic sample volume and targeted region allocation, providing formal guarantees of its effectiveness. Extensive experiments demonstrate that CoDSA consistently outperforms non-adaptive augmentation strategies and state-of-the-art baselines in both supervised and unsupervised settings.
Article 819
Title@2025-07-13 (7): Bridging Neural Networks and Dynamic Time Warping for Adaptive Time Series Classification
Title: Bridging Neural Networks and Dynamic Time Warping for Adaptive Time Series Classification | Überbrückung von Neuronalen Netzwerken und dynamisches Zeitwarping für adaptive Zeitreihenklassifikation | 架桥神经网络和适应性时间序列分类动态时间调整 2507.09826v1 |
Authors (4): Jintao Qu, Zichong Wang, Chenhao Wu, Wenbin Zhang
Neural networks have achieved remarkable success in time series classification, but their reliance on large amounts of labeled data for training limits their applicability in cold-start scenarios. Moreover, they lack interpretability, reducing transparency in decision-making. In contrast, dynamic time warping (DTW) combined with a nearest neighbor classifier is widely used for its effectiveness in limited-data settings and its inherent interpretability. However, as a non-parametric method, it is not trainable and cannot leverage large amounts of labeled data, making it less effective than neural networks in rich-resource scenarios. In this work, we aim to develop a versatile model that adapts to cold-start conditions and becomes trainable with labeled data, while maintaining interpretability. We propose a dynamic length-shortening algorithm that transforms time series into prototypes while preserving key structural patterns, thereby enabling the reformulation of the DTW recurrence relation into an equivalent recurrent neural network. Based on this, we construct a trainable model that mimics DTW’s alignment behavior. As a neural network, it becomes trainable when sufficient labeled data is available, while still retaining DTW’s inherent interpretability. We apply the model to several benchmark time series classification tasks and observe that it significantly outperforms previous approaches in low-resource settings and remains competitive in rich-resource settings.
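The DTW recurrence that this abstract proposes to recast as a recurrent network can be sketched as the standard dynamic program below, paired with a nearest-neighbor classifier. This is the classical baseline only, not the trainable model or the length-shortening algorithm from the paper. The function names, the absolute-difference local cost, and the toy series are assumptions made for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Classical DTW via the recurrence
    D[i, j] = cost(i, j) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # local cost (an assumption)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def nearest_neighbor_label(query, train_series, train_labels):
    """1-NN classification under DTW distance."""
    dists = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]

# Toy data (hypothetical): one rising and one falling prototype
train = [[0, 1, 2, 3], [3, 2, 1, 0]]
labels = ["up", "down"]
pred = nearest_neighbor_label([0, 0, 1, 2, 2, 3], train, labels)
```

Because the classifier has no parameters, it works with a single example per class, which is the cold-start property the abstract contrasts with neural networks.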
Article 820
Title@2025-07-13 (7): Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization
Title: Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | Beyond Multiple Choice: Bewertung von Steuerungsvektoren für adaptive Freiform-Zusammenfassung | 超越多重选择:评估适应性自由形式总结指导矢量 2505.24859v2 |
Authors (3): Joschka Braun, Carsten Eickhoff, Seyed Ali Bahrainian
Steering vectors are a lightweight method for controlling text properties by adding a learned bias to language model activations at inference time. So far, steering vectors have predominantly been evaluated in multiple-choice settings, while their effectiveness in free-form generation tasks remains understudied. Moving “Beyond Multiple Choice,” we thoroughly evaluate the effectiveness of steering vectors in adaptively controlling topical focus, sentiment, toxicity, and readability in abstractive summaries of the NEWTS dataset. We find that steering effectively controls the targeted summary properties, but high steering strengths consistently degrade both intrinsic and extrinsic text quality. Compared to steering, prompting offers weaker control, while preserving text quality. Combining steering and prompting yields the strongest control over text properties and offers the most favorable efficacy-quality trade-off at moderate steering strengths. Our results underscore the practical trade-off between control strength and text quality preservation when applying steering vectors to free-form generation tasks.
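The core operation described here, adding a bias to activations at inference time, can be sketched in a few lines. The sketch assumes the vector is computed as a difference of mean activations between two contrastive sets, which is one common recipe and is not confirmed by the abstract. The function names, array shapes, and the `strength` parameter are hypothetical.

```python
import numpy as np

def mean_difference_vector(pos_acts, neg_acts):
    """Steering vector as a difference of mean activations (an assumed recipe)."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def steer(hidden, vector, strength):
    """Add the scaled steering vector to a hidden activation."""
    return np.asarray(hidden, dtype=float) + strength * vector

# Toy activations (hypothetical): 2 examples per set, hidden size 2
pos = np.array([[1.0, 2.0], [3.0, 4.0]])
neg = np.array([[0.0, 0.0], [2.0, 2.0]])
v = mean_difference_vector(pos, neg)
steered = steer([0.0, 0.0], v, strength=2.0)
```

In this picture the trade-off reported in the abstract corresponds to the choice of `strength`: larger values move activations further along `v`, and further from the unsteered activations.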
Article 821
Title@2025-07-13 (7): Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding
Title: Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding | Annäherung an Ratenverzerrungsgrenzen bei Neuralkompression mit Lattice Transform Coding | 采用拉蒂节变换编码,在神经压缩中接近比率扭曲限制 2403.07320v2 |
Authors (3): Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
Neural compression has brought tremendous progress in designing lossy compressors with good rate-distortion (RD) performance at low complexity. Thus far, neural compression design involves transforming the source to a latent vector, which is then rounded to integers and entropy coded. While this approach has been shown to be optimal on a few specific sources, we show that it can be highly sub-optimal on synthetic sources whose intrinsic dimensionality is greater than one. With integer rounding in the latent space, the quantization regions induced by neural transformations remain square-like and fail to match those of optimal vector quantization. We demonstrate that this phenomenon is due to the choice of scalar quantization in the latent space, and not the transform design. By employing lattice quantization instead, we propose Lattice Transform Coding (LTC) and show that it approximately recovers optimal vector quantization at reasonable complexity. On real-world sources, LTC improves upon standard neural compressors. LTC also provides a framework that can integrate structurally (near) optimal information-theoretic designs into lossy compression; examples include block coding, which yields coding gain over optimal one-shot coding and approaches the asymptotically-achievable rate-distortion function, as well as nested lattice quantization for low complexity fixed-rate coding.
Article 822
Title@2025-07-13 (7): Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
Title: Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization | Nesterov findet GRAAL: Optimale und adaptive Gradienten-Methode zur Convex-Optimierung | Nesterov Finds GRAAL: 最优化和适应性梯度法 2507.09823v1 |
Authors (2): Ekaterina Borodich, Dmitry Kovalev
In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function $\min_x f(x)$. Recently, several adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. These methods estimate the local curvature of the objective function to compute stepsizes, attain the standard convergence rate $\mathcal{O}(1/k)$ of fixed-stepsize gradient descent for Lipschitz-smooth functions, and do not require any line search procedures or hyperparameter tuning. However, a natural question arises: is it possible to accelerate the convergence of these algorithms to match the optimal rate $\mathcal{O}(1/k^2)$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made (Li and Lan, 2023), the capabilities of the existing accelerated algorithms to adapt to the curvature of the objective function are highly limited. Consequently, we provide a positive answer to this question and develop GRAAL with Nesterov acceleration. We prove that our algorithm achieves the desired optimal convergence rate for Lipschitz smooth functions. Moreover, in contrast to existing methods, it does so with an arbitrary, even excessively small, initial stepsize at the cost of a logarithmic additive term in the iteration complexity.
Article 823
Title@2025-07-13 (7): Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Title: Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing | Entwirren der komplexen Multiplexed DIA Spectra in De Novo Peptide Sequenzierung | 拆分新佩普迪德省复杂的多氧化DIA分层 2411.15684v5 |
Authors (8): Zheng Ma, Zeping Mao, Ruixue Zhang, Jiazhen Chen, Lei Xin, Paul Shan, Ali Ghodsi, Ming Li
Data-Independent Acquisition (DIA) was introduced to improve sensitivity to cover all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not very clear how useful DIA data is for de novo peptide sequencing as the DIA data are marred with coeluted peptides, high noise, and varying data quality. We present a new deep learning method, DIANovo, that addresses each of these difficulties and improves on the previously established system DeepNovo-DIA by 34% to 108% (averaging 50%) for amino acid recall and by 32% to 83% (averaging 57%) for peptide recall, by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data can be used for de novo peptide sequencing and when it cannot, by providing a comparison between DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. However, with Orbitrap Astral, DIA consistently outperforms DDA because its narrow-window mode is enabled. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
Article 824
Title@2025-07-13 (7): Coupled Entropy: A Goldilocks Generalization for Nonextensive Statistical Mechanics
Title: Coupled Entropy: A Goldilocks Generalization for Nonextensive Statistical Mechanics | Gepaarte Entropie: Verallgemeinerung von Goldilocks für Nonextensive Statistical Mechanics | Goldilocks 通用非广延性统计机械学 2506.17229v2 |
Authors (1): Kenric P. Nelson
Evidence is presented that the accuracy of the Nonextensive Statistical Mechanics framework is improved using the coupled entropy, which carefully establishes the physical measures of complex systems. While Nonextensive Statistical Mechanics (NSM) has developed into a powerful toolset, questions have persisted as to how to evaluate whether its proposed solutions properly characterize the uncertainty of heavy-tailed distributions. The entropy of the generalized Pareto distribution (GPD) is $1+\kappa+\ln\sigma$, where $\kappa$ is the shape or nonlinear coupling and $\sigma$ is the scale. A generalized entropy should retain the uncertainty due to the scale, while minimizing the dependence on the nonlinear coupling. The Tsallis entropy of the GPD instead subtracts a function of the inverse-scale and converges to one as $\kappa\rightarrow\infty$. Colloquially, the Tsallis entropy is too cold. The normalized Tsallis entropy (NTE) rectifies the positive dependence on the scale but introduces a nonlinear term multiplying the scale and the coupling, making it too hot. The coupled entropy measures the uncertainty of the GPD to be $1+\ln_{\frac{\kappa}{1+\kappa}}\sigma=1+\frac{1+\kappa}{\kappa}\left(\sigma^{\frac{\kappa}{1+\kappa}}-1\right)$, which converges to $\sigma$ as $\kappa\rightarrow\infty$. One could say, the coupled entropy allows scientists, engineers, and analysts to eat their porridge, confident that its measure of uncertainty reflects the mathematical physics of the scale of non-exponential distributions while minimizing the dependence on the shape or nonlinear coupling. The training of the coupled variational autoencoder is an example of the unique ability of the coupled entropy to improve the performance of complex systems.
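The two closed-form expressions quoted in this abstract can be checked numerically. The sketch below evaluates the stated GPD entropy and the stated coupled-entropy value and probes the limits of small and large coupling. The function names are hypothetical, and the restriction to $\kappa \ge 0$ and $\sigma > 0$ is an assumption made to keep the example simple.

```python
import math

def gpd_entropy(kappa, sigma):
    """Entropy of the GPD as stated in the abstract: 1 + kappa + ln(sigma)."""
    return 1.0 + kappa + math.log(sigma)

def coupled_entropy_gpd(kappa, sigma):
    """Coupled-entropy value stated in the abstract:
    1 + ((1 + kappa) / kappa) * (sigma ** (kappa / (1 + kappa)) - 1).
    Assumes kappa >= 0 and sigma > 0 (restrictions chosen for this sketch)."""
    if kappa == 0.0:
        return 1.0 + math.log(sigma)   # limiting value as kappa -> 0
    r = kappa / (1.0 + kappa)
    return 1.0 + (sigma ** r - 1.0) / r

sigma = 2.0
small = coupled_entropy_gpd(1e-6, sigma)   # approaches 1 + ln(sigma)
large = coupled_entropy_gpd(1e8, sigma)    # approaches sigma
```

As the coupling grows, the stated GPD entropy `gpd_entropy` grows without bound, while the coupled value stays between $1+\ln\sigma$ and $\sigma$ for this $\sigma$; that is the reduced dependence on the coupling the abstract argues for.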
Article 825
Title@2025-07-13 (7): Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem
Title: Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem | Komprimierte Berechnung: Dichte Schaltungen in einem Spielzeugmodell des Universal-AND-Problems | 压缩计算:普遍问题玩具模型中的密集电路 2507.09816v1 |
Authors (1): Adam Newgas
Neural networks are capable of superposition – representing more features than there are dimensions. Recent work considers the analogous concept for computation instead of storage, proposing theoretical constructions. But there has been little investigation into whether these circuits can be learned in practice. In this work, we investigate a toy model for the Universal-AND problem which computes the AND of all $\binom{m}{2}$ pairs of $m$ sparse inputs. The hidden dimension that determines the number of non-linear activations is restricted to pressure the model to find a compute-efficient circuit, called compressed computation. We find that the training process finds a simple solution that does not correspond to theoretical constructions. It is fully dense – every neuron contributes to every output. The solution circuit naturally scales with dimension, trading off error rates for neuron efficiency. It is similarly robust to changes in sparsity and other key parameters, and extends naturally to other boolean operations and boolean circuits. We explain the found solution in detail and compute why it is more efficient than the theoretical constructions at low sparsity. Our findings shed light on the types of circuits that models like to form and the flexibility of the superposition representation. This contributes to a broader understanding of network circuitry and interpretability.
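To make the Universal-AND task concrete, the sketch below computes the target function with the naive, uncompressed construction: one ReLU neuron per pair, using the identity AND(a, b) = ReLU(a + b - 1) for binary inputs. This is a reference baseline only. It is neither the dense circuit the paper finds nor any of the theoretical constructions it mentions, and the function name and example input are assumptions.

```python
from itertools import combinations
import numpy as np

def universal_and_naive(x):
    """AND of every pair of binary inputs, one ReLU neuron per pair."""
    x = np.asarray(x, dtype=float)
    pairs = list(combinations(range(len(x)), 2))          # all m-choose-2 pairs
    pre = np.array([x[i] + x[j] - 1.0 for i, j in pairs])  # pre-activations
    return pairs, np.maximum(pre, 0.0)                     # ReLU

# Example input (hypothetical): m = 4 with three active features
pairs, outputs = universal_and_naive([1, 0, 1, 1])
```

This baseline needs as many neurons as there are pairs; the compressed setting in the abstract restricts the hidden dimension below that count, so some error has to be traded for efficiency.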
nan
Article 826
Title@2025-07-13 (7): Interpretable Time Series Autoregression for Periodicity Quantification
Title: Interpretable Time Series Autoregression for Periodicity Quantification | Verdolmetschbare Zeitreihen Autoregression für Periodizitätsquantifizierung | 周期量化的自动递减 2506.22895v2 |
Authors (5): Xinyu Chen, Vassilis Digalakis Jr, Lijun Ding, Dingyi Zhuang, Jinhua Zhao
Time series autoregression (AR) is a classical tool for modeling auto-correlations and periodic structures in real-world systems. We revisit this model from an interpretable machine learning perspective by introducing sparse autoregression (SAR), where $\ell_0$-norm constraints are used to isolate dominant periodicities. We formulate exact mixed-integer optimization (MIO) approaches for both stationary and non-stationary settings and introduce two scalable extensions: a decision variable pruning (DVP) strategy for temporally-varying SAR (TV-SAR), and a two-stage optimization scheme for spatially- and temporally-varying SAR (STV-SAR). These models enable scalable inference on real-world spatiotemporal datasets. We validate our framework on large-scale mobility and climate time series. On NYC ridesharing data, TV-SAR reveals interpretable daily and weekly cycles as well as long-term shifts due to COVID-19. On climate datasets, STV-SAR uncovers the evolving spatial structure of temperature and precipitation seasonality across four decades in North America and detects global sea surface temperature dynamics, including El Niño. Together, our results demonstrate the interpretability, flexibility, and scalability of sparse autoregression for periodicity quantification in complex time series.
nan
Article 827
Title@2025-07-13 (7): Federated Learning with Graph-Based Aggregation for Traffic Forecasting
Title: Federated Learning with Graph-Based Aggregation for Traffic Forecasting | Föderiertes Lernen mit Graphen-basierter Aggregation für Verkehrsprognosen | 使用基于图表的交通流量预测汇总的联邦学习 2507.09805v1 |
Authors (3): Audri Banik, Glaucio Haroldo Silva de Carvalho, Renata Dividino
In traffic prediction, the goal is to estimate traffic speed or flow in specific regions or road segments using historical data collected by devices deployed in each area. Each region or road segment can be viewed as an individual client that measures local traffic flow, making Federated Learning (FL) a suitable approach for collaboratively training models without sharing raw data. In centralized FL, a central server collects and aggregates model updates from multiple clients to build a shared model while preserving each client’s data privacy. Standard FL methods, such as Federated Averaging (FedAvg), assume that clients are independent, which can limit performance in traffic prediction tasks where spatial relationships between clients are important. Federated Graph Learning methods can capture these dependencies during server-side aggregation, but they often introduce significant computational overhead. In this paper, we propose a lightweight graph-aware FL approach that blends the simplicity of FedAvg with key ideas from graph learning. Rather than training full models, our method applies basic neighbourhood aggregation principles to guide parameter updates, weighting client models based on graph connectivity. This approach captures spatial relationships effectively while remaining computationally efficient. We evaluate our method on two benchmark traffic datasets, METR-LA and PEMS-BAY, and show that it achieves competitive performance compared to standard baselines and recent graph-based federated learning techniques.
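The idea of weighting client models by graph connectivity can be illustrated with a small sketch. This is not the paper's aggregation rule, which the abstract does not specify; it assumes a plain weighted average over each client's neighbourhood (with a self-loop), and `neighbourhood_aggregate` is a hypothetical name.

```python
def neighbourhood_aggregate(client_params, adjacency):
    """Average each client's parameters with those of its graph neighbours.

    client_params: list of flat parameter vectors, one per client.
    adjacency: square matrix of non-negative edge weights (assumed layout).
    A self-loop of weight 1 is added so every client keeps its own model.
    """
    n = len(client_params)
    aggregated = []
    for i in range(n):
        weights = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        dim = len(client_params[i])
        aggregated.append([
            sum(weights[j] * client_params[j][k] for j in range(n)) / total
            for k in range(dim)
        ])
    return aggregated

# Three road segments in a line: 0 - 1 - 2.
params = [[0.0], [2.0], [10.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(neighbourhood_aggregate(params, adjacency))
```

Unlike FedAvg, which would return one global average for all clients, each client here receives a model biased towards its spatial neighbours.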
nan
Article 828
Title@2025-07-13 (7): Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks
Title: Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks | Meta-Reinforcement-Lernen für schnelle und dateneffiziente Frequenzallokation in dynamischen drahtlosen Netzwerken | 动态无线网络快速和数据有效频谱分配元加强学习 2507.10619v1 |
Authors (5): Oluwaseyi Giwa, Tobi Awodunmila, Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Ali Jamshed
The dynamic allocation of spectrum in 5G / 6G networks is critical to efficient resource utilization. However, applying traditional deep reinforcement learning (DRL) is often infeasible due to its immense sample complexity and the safety risks associated with unguided exploration, which can cause severe network interference. To address these challenges, we propose a meta-learning framework that enables agents to learn a robust initial policy and rapidly adapt to new wireless scenarios with minimal data. We implement three meta-learning architectures, model-agnostic meta-learning (MAML), recurrent neural network (RNN), and an attention-enhanced RNN, and evaluate them against a non-meta-learning DRL baseline, proximal policy optimization (PPO), in a simulated dynamic integrated access/backhaul (IAB) environment. Our results show a clear performance gap. The attention-based meta-learning agent reaches a peak mean network throughput of 48 Mbps, while the PPO baseline drops drastically to 10 Mbps. Furthermore, our method reduces SINR and latency violations by more than 50% compared to PPO. It also shows quick adaptation, with a fairness index of 0.7, indicating better resource allocation. This work proves that meta-learning is a very effective and safer option for intelligent control in complex wireless systems.
nan
Article 829
Title@2025-07-13 (7): Compute Requirements for Algorithmic Innovation in Frontier AI Models
Title: Compute Requirements for Algorithmic Innovation in Frontier AI Models | Berechnung der Anforderungen an algorithmische Innovationen bei Frontier-KI-Modellen | 边境AI 模型的计算方法分析创新要求 2507.10618v1 |
Authors (1): Peter Barnett
Algorithmic innovation in the pretraining of large language models has driven a massive reduction in the total compute required to reach a given level of capability. In this paper we empirically investigate the compute requirements for developing algorithmic innovations. We catalog 36 pre-training algorithmic innovations used in Llama 3 and DeepSeek-V3. For each innovation we estimate both the total FLOP used in development and the FLOP/s of the hardware utilized. Innovations using significant resources double in their requirements each year. We then use this dataset to investigate the effect of compute caps on innovation. Our analysis suggests that compute caps alone are unlikely to dramatically slow AI algorithmic progress. Even stringent compute caps – such as capping total operations to the compute used to train GPT-2 or capping hardware capacity to 8 H100 GPUs – could still have allowed for half of the cataloged innovations.
nan
Article 830
Title@2025-07-13 (7): A Scalable and Efficient Signal Integration System for Job Matching
Title: A Scalable and Efficient Signal Integration System for Job Matching | Ein skalierbares und effizientes Signalintegrationssystem für Job Matching | 用于匹配工作的可缩放和高效信号集成系统 2507.09797v1 |
Authors (16): Ping Liu, Rajat Arora, Xiao Shi, Benjamin Le, Qianqi Shen, Jianqiang Shen, Chengming Jiang, Nikita Zhiltsov, Priya Bannur, Yidan Zhu, Liming Dong, Haichao Wei, Qi Guo, Luke Simon, Liangjie Hong, Wenjing Zhang
LinkedIn, one of the world’s largest platforms for professional networking and job seeking, encounters various modeling challenges in building recommendation systems for its job matching product, including cold-start, filter bubbles, and biases affecting candidate-job matching. To address these, we developed the STAR (Signal Integration for Talent And Recruiters) system, leveraging the combined strengths of Large Language Models (LLMs) and Graph Neural Networks (GNNs). LLMs excel at understanding textual data, such as member profiles and job postings, while GNNs capture intricate relationships and mitigate cold-start issues through network effects. STAR integrates diverse signals by uniting LLM and GNN capabilities with industrial-scale paradigms including adaptive sampling and version management. It provides an end-to-end solution for developing and deploying embeddings in large-scale recommender systems. Our key contributions include a robust methodology for building embeddings in industrial applications, a scalable GNN-LLM integration for high-performing recommendations, and practical insights for real-world model deployment.
nan
Article 831
Title@2025-07-13 (7): LASER: Attention with Exponential Transformation
Title: LASER: Attention with Exponential Transformation | LASER: Aufmerksamkeit bei exponentieller Transformation | LASER: 关注感官转变 2411.03493v2 |
Authors (2): Sai Surya Duvvuri, Inderjit S. Dhillon
Transformers have had tremendous impact for several sequence related tasks, largely due to their ability to retrieve from any part of the sequence via softmax based dot-product attention. This mechanism plays a crucial role in the Transformer’s performance. We analyze the gradients backpropagated through the softmax operation in the attention mechanism and observe that these gradients can often be small. This poor gradient signal backpropagation can lead to inefficient learning of parameters preceding the attention operations. To this end, we introduce a new attention mechanism called LASER, which we analytically show to admit a larger gradient signal. We show that LASER attention can be implemented by making small modifications to existing attention implementations. We conduct experiments on autoregressive large language models (LLMs) with up to 7.7 billion parameters with an average improvement of up to 1.44% over standard attention on downstream evaluations and 1.65% finetuning improvements. Additionally, LASER demonstrates generalization performance improvement across a variety of tasks (vision, text and speech): Vision Transformer (ViT) on Imagenet, Conformer on the Librispeech speech-to-text and BERT with 2.2 billion parameters.
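The observation about small softmax gradients can be illustrated without reference to LASER itself. The sketch below computes the standard softmax Jacobian, $J_{ij} = p_i(\delta_{ij} - p_j)$, for a uniform and for a sharply peaked attention row; it shows only why saturation shrinks the backpropagated signal and is not the paper's analysis or its proposed mechanism.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def softmax_jacobian(z):
    """Jacobian of softmax: J[i][j] = p[i] * (delta_ij - p[j])."""
    p = softmax(z)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def max_abs(matrix):
    """Largest absolute entry of a matrix given as a list of rows."""
    return max(abs(v) for row in matrix for v in row)

uniform_logits = [0.0, 0.0, 0.0, 0.0]
peaked_logits = [10.0, 0.0, 0.0, 0.0]
print(max_abs(softmax_jacobian(uniform_logits)))
print(max_abs(softmax_jacobian(peaked_logits)))
```

When one attention weight is close to 1, every Jacobian entry is close to 0, so little gradient reaches the parameters that produced the logits.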
nan
Article 832
Title@2025-07-13 (7): NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Title: NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection | NegRefine: Verfeinerung negativer Label-basierter Zero-Shot-OOD-Erkennung | NegRefine: 改进以标签为基的零热 OOOD 检测 2507.09795v1 |
Authors (3): Amirhossein Ansari, Ke Wang, Pulei Xiong
Recent advancements in Vision-Language Models like CLIP have enabled zero-shot OOD detection by leveraging both image and textual label information. Among these, negative label-based methods such as NegLabel and CSP have shown promising results by utilizing a lexicon of words to define negative labels for distinguishing OOD samples. However, these methods tend to misclassify in-distribution samples as OOD when the negative labels are subcategories of in-distribution labels or proper nouns. They also face limitations in handling images that match multiple in-distribution and negative labels. We propose NegRefine, a novel negative label refinement framework for zero-shot OOD detection. By introducing a filtering mechanism to exclude subcategory labels and proper nouns from the negative label set and incorporating a multi-matching-aware scoring function that dynamically adjusts the contributions of multiple labels matching an image, NegRefine ensures a more robust separation between in-distribution and OOD samples. We evaluate NegRefine on large-scale benchmarks, including ImageNet-1K. Source code is available at https://github.com/ah-ansari/NegRefine.
nan
Article 833
Title@2025-07-13 (7): Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster
Title: Leveraging Distribution Matching to Make Approximate Machine Unlearning Faster | Leveraging Distribution Passend, um annähernde Maschine Unlearning schneller zu machen | 利用配配配配的配送让近似机器更快退出学习 2507.09786v1 |
Authors (1): Junaid Iqbal Khan
Approximate machine unlearning (AMU) enables models to ‘forget’ specific training data through specialized fine-tuning on a retained dataset subset. However, processing this retained subset still dominates computational runtime, while reductions of epochs also remain a challenge. We propose two complementary methods to accelerate classification-oriented AMU. First, Blend, a novel distribution-matching dataset condensation (DC), merges visually similar images with shared blend-weights to significantly reduce the retained set size. It operates with minimal pre-processing overhead and is orders of magnitude faster than state-of-the-art DC methods. Second, our loss-centric method, Accelerated-AMU (A-AMU), augments the unlearning objective to quicken convergence. A-AMU achieves this by combining a steepened primary loss to expedite forgetting with a novel, differentiable regularizer that matches the loss distributions of forgotten and in-distribution unseen data. Our extensive experiments demonstrate that this dual approach of data and loss-centric optimization dramatically reduces end-to-end unlearning latency across both single and multi-round scenarios, all while preserving model utility and privacy. To our knowledge, this is the first work to systematically tackle unlearning efficiency by jointly designing a specialized dataset condensation technique with a dedicated accelerated loss function. Code is available at https://github.com/algebraicdianuj/DC_Unlearning.
nan
Article 834
Title@2025-07-13 (7): Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow
Title: Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow | Effiziente molekulare Konformer-Generation mit SO(3)-gemitteltem Flow Matching und Reflow | 具有SO(3)-可预见流动匹配和回流的高效分子前代分子 2507.09785v1 |
Authors (9): Zhonglin Cao, Mario Geiger, Allan dos Santos Costa, Danny Reidenbach, Karsten Kreis, Tomas Geffner, Franco Pellegrini, Guoqing Zhou, Emine Kucukbenli
Fast and accurate generation of molecular conformers is desired for downstream computational chemistry and drug discovery tasks. Currently, training and sampling state-of-the-art diffusion or flow-based models for conformer generation require significant computational resources. In this work, we build upon flow-matching and propose two mechanisms for accelerating training and inference of generative models for 3D molecular conformer generation. For fast training, we introduce the SO(3)-Averaged Flow training objective, which leads to faster convergence to better generation quality compared to conditional optimal transport flow or Kabsch-aligned flow. We demonstrate that models trained using SO(3)-Averaged Flow can reach state-of-the-art conformer generation quality. For fast inference, we show that the reflow and distillation methods of flow-based models enable few-step or even one-step molecular conformer generation with high quality. The training techniques proposed in this work show a path towards highly efficient molecular conformer generation with flow-based models.
nan
Article 835
Title@2025-07-13 (7): Physics-informed neural networks for high-dimensional solutions and snaking bifurcations in nonlinear lattices
Title: Physics-informed neural networks for high-dimensional solutions and snaking bifurcations in nonlinear lattices | Physik-informierte neuronale Netzwerke für hochdimensionale Lösungen und snaking bifurkations in nichtlinearen Gittern | 物理知情神经网络,用于高维溶液和在非线性顶层中截断双硫 2507.09782v1 |
Authors (4): Muhammad Luthfi Shahab, Fidya Almira Suheri, Rudy Kusdiantara, Hadi Susanto
This paper introduces a framework based on physics-informed neural networks (PINNs) for addressing key challenges in nonlinear lattices, including solution approximation, bifurcation diagram construction, and linear stability analysis. We first employ PINNs to approximate solutions of nonlinear systems arising from lattice models, using the Levenberg-Marquardt algorithm to optimize network weights for greater accuracy. To enhance computational efficiency in high-dimensional settings, we integrate a stochastic sampling strategy. We then extend the method by coupling PINNs with a continuation approach to compute snaking bifurcation diagrams, incorporating an auxiliary equation to effectively track successive solution branches. For linear stability analysis, we adapt PINNs to compute eigenvectors, introducing output constraints to enforce positivity, in line with Sturm-Liouville theory. Numerical experiments are conducted on the discrete Allen-Cahn equation with cubic and quintic nonlinearities in one to five spatial dimensions. The results demonstrate that the proposed approach achieves accuracy comparable to, or better than, traditional numerical methods, especially in high-dimensional regimes where computational resources are a limiting factor. These findings highlight the potential of neural networks as scalable and efficient tools for the study of complex nonlinear lattice systems.
nan
Article 836
Title@2025-07-13 (7): Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
Title: Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces | Wahrscheinlich adaptive durchschnittliche Belohnung Verstärkung Lernen für Metrische Räume | 可调适性平均增益学习,用于计量空间 2410.19919v2 |
Authors (2): Avik Kar, Rahul Singh
We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear and RKHS MDPs, function approximation frameworks, and develop an adaptive algorithm $\text{ZoRL}$ with regret bounded as $\mathcal{O}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}}= 2d_\mathcal{S} + d_z + 3$, $d_\mathcal{S}$ is the dimension of the state space and $d_z$ is the zooming dimension. In contrast, algorithms with fixed discretization yield $d_{\text{eff.}} = 2(d_\mathcal{S} + d_\mathcal{A}) + 2$, $d_\mathcal{A}$ being the dimension of action space. $\text{ZoRL}$ achieves this by discretizing the state-action space adaptively and zooming into “promising regions” of the state-action space. $d_z$, a problem-dependent quantity bounded by the state-action space’s dimension, allows us to conclude that if an MDP is benign, then the regret of $\text{ZoRL}$ will be small. The zooming dimension and $\text{ZoRL}$ are truly adaptive, i.e., the current work shows how to capture adaptivity gains for infinite-horizon average-reward RL. $\text{ZoRL}$ outperforms other state-of-the-art algorithms in experiments, thereby demonstrating the gains arising due to adaptivity.
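The two effective dimensions quoted above can be compared with a little arithmetic. The sketch below simply evaluates the stated formulas; the function names and the example dimensions are illustrative choices, not values from the paper.

```python
def d_eff_zorl(d_s, d_z):
    """Effective dimension quoted for ZoRL: 2*d_S + d_z + 3."""
    return 2 * d_s + d_z + 3

def d_eff_fixed(d_s, d_a):
    """Effective dimension quoted for fixed discretization: 2*(d_S + d_A) + 2."""
    return 2 * (d_s + d_a) + 2

def regret_exponent(d_eff):
    """Exponent of T in the regret bound O(T^(1 - 1/d_eff))."""
    return 1.0 - 1.0 / d_eff

# Example: 2-dimensional states, 2-dimensional actions, zooming dimension 1.
d_s, d_a, d_z = 2, 2, 1
print(regret_exponent(d_eff_zorl(d_s, d_z)))
print(regret_exponent(d_eff_fixed(d_s, d_a)))
```

Setting the two expressions against each other gives $2d_\mathcal{S} + d_z + 3 < 2(d_\mathcal{S} + d_\mathcal{A}) + 2$ exactly when $d_z < 2d_\mathcal{A} - 1$, so on these formulas the adaptive bound has the smaller exponent whenever the zooming dimension is below that threshold.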
nan
Article 837
Title@2025-07-13 (7): DataDecide: How to Predict Best Pretraining Data with Small Experiments
Title: DataDecide: How to Predict Best Pretraining Data with Small Experiments | DataDecide: Wie man die besten Vorschulungsdaten mit kleinen Experimenten vorhersagt | 数据减少:如何利用小型实验预测最佳培训前数据 2504.11393v2 |
Authors (13): Ian Magnusson, Nguyen Tai, Ben Bogin, David Heineman, Jena D. Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge
Because large language models are expensive to pretrain on different datasets, using smaller-scale experiments to decide on data is crucial for reducing costs. Which benchmarks and methods of making decisions from observed performance at small scale most accurately predict the datasets that yield the best large models? To empower open exploration of this question, we release models, data, and evaluations in DataDecide – the most extensive open suite of models over differences in data and scale. We conduct controlled pretraining experiments across 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, model sizes up to 1B parameters, and 3 random seeds. We find that the ranking of models at a single, small size (e.g., 150M parameters) is a strong baseline for predicting best models at our larger target scale (1B) (~80% of comparisons correct). No scaling law methods among 8 baselines exceed the compute-decision frontier of single-scale predictions, but DataDecide can measure improvement in future scaling laws. We also identify that using continuous likelihood metrics as proxies in small experiments makes benchmarks including MMLU, ARC, HellaSwag, MBPP, and HumanEval >80% predictable at the target 1B scale with just 0.01% of the compute.
nan
Article 838
Title@2025-07-13 (7): SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Title: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | SLED: Ein spekulatives LLM-Decoding-Framework für effizientes Edge Serving | SLED: 有效边缘服务投机性LLM代谢框架 2506.09397v4 |
Authors (8): Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Hans Vandierendonck, Deepu John, Bo Ji, Dimitrios Nikolopoulos
The growing gap between the increasing complexity of large language models (LLMs) and the limited computational budgets of edge devices poses a key challenge for efficient on-device inference, despite gradual improvements in hardware capabilities. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new framework that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server verifies the tokens utilizing a more precise target model. To further increase the efficiency of verification, the edge server batches the diverse verification requests from devices. This approach supports device heterogeneity and reduces server-side memory footprint by sharing the same upstream target model across multiple devices. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: 2.2× more system throughput, 2.8× more system capacity, and better cost efficiency, all without sacrificing model accuracy.
nan
Article 839
Title@2025-07-13 (7): Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Title: Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding | Vision-geführtes Chunking ist alles, was Sie brauchen: Verbesserung der RAG durch multimodales Dokumentenverständnis | 愿景引导的决赛是您所需要的:用多模式文件理解加强RAG 2506.16035v2 |
Authors (5): Vishesh Tripathi, Tanmay Odapally, Indraneel Das, Uday Allu, Biddwan Ahmed
Retrieval-Augmented Generation (RAG) systems have revolutionized information retrieval and question answering, but traditional text-based chunking methods struggle with complex document structures, multi-page tables, embedded figures, and contextual dependencies across page boundaries. We present a novel multimodal document chunking approach that leverages Large Multimodal Models (LMMs) to process PDF documents in batches while maintaining semantic coherence and structural integrity. Our method processes documents in configurable page batches with cross-batch context preservation, enabling accurate handling of tables spanning multiple pages, embedded visual elements, and procedural content. We evaluate our approach on a curated dataset of PDF documents with manually crafted queries, demonstrating improvements in chunk quality and downstream RAG performance. Our vision-guided approach achieves better accuracy compared to traditional vanilla RAG systems, with qualitative analysis showing superior preservation of document structure and semantic coherence.
nan
Article 840
Title@2025-07-13 (7): Knowing When to Quit: Probabilistic Early Exits for Speech Separation
Title: Knowing When to Quit: Probabilistic Early Exits for Speech Separation | Zu wissen, wann man aufhören soll: probabilistische frühe Ausgänge für Sprachtrennung | 了解何时退出:语言分离的概率早期出场 2507.09768v1 |
Authors (7): Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk, Søren Føns Nielsen, Rasmus Malik Høegh Lindrup, Bjørn Sand Jensen, Morten Mørup
In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget, and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use-cases we design a neural network architecture for speech separation capable of early-exit, and we propose an uncertainty-aware probabilistic framework to jointly model the clean speech signal and error variance which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks, and we show that a single early-exit model can be competitive with state-of-the-art models trained at many compute and parameter budgets. Our framework enables fine-grained dynamic compute-scaling of speech separation networks while achieving state-of-the-art performance and interpretable exit conditions.
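An exit rule stated in terms of a desired signal-to-noise ratio might look like the sketch below. It is a simplified, deterministic stand-in: the abstract describes probabilistic conditions whose exact form is not given here, so the per-block estimates, the function names, and the plain threshold test are all assumptions.

```python
import math

def estimated_snr_db(signal_power, error_variance):
    """SNR in dB from an estimated signal power and a predicted error variance."""
    return 10.0 * math.log10(signal_power / error_variance)

def first_exit(block_estimates, target_snr_db):
    """Return the index of the first block whose estimated SNR meets the target.

    block_estimates: list of (signal_power, error_variance) pairs, one per
    exit point (hypothetical model outputs). Falls back to the last block
    if no exit point reaches the target.
    """
    for i, (power, variance) in enumerate(block_estimates):
        if estimated_snr_db(power, variance) >= target_snr_db:
            return i
    return len(block_estimates) - 1

# Predicted error variance shrinks as more blocks are evaluated.
estimates = [(1.0, 0.5), (1.0, 0.1), (1.0, 0.01)]
print(first_exit(estimates, target_snr_db=9.0))
print(first_exit(estimates, target_snr_db=15.0))
```

Raising the target SNR makes the model run deeper before exiting, which is the compute-for-quality dial the abstract refers to.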
nan
Article 841
Title@2025-07-13 (7): Toward accurate RUL and SOH estimation using reinforced graph-based PINNs enhanced with dynamic weights
Title: Toward accurate RUL and SOH estimation using reinforced graph-based PINNs enhanced with dynamic weights | Zur genauen RUL- und SOH-Schätzung mit verstärkten graphbasierten PINNs mit dynamischen Gewichten | 使用强化的以图表为基础的活性净净化网,加上动态权重,实现准确的RUL和SOH估算 2507.09766v1 |
Authors (4): Mohamadreza Akbari Pour, Ali Ghasemzadeh, MohamadAli Bijarchi, Mohammad Behshad Shafii
Accurate estimation of Remaining Useful Life (RUL) and State of Health (SOH) is essential for Prognostics and Health Management (PHM) across a wide range of industrial applications. We propose a novel framework – Reinforced Graph-Based Physics-Informed Neural Networks Enhanced with Dynamic Weights (RGPD) – that combines physics-based supervision with advanced spatio-temporal learning. Graph Convolutional Recurrent Networks (GCRNs) embed graph-convolutional filters within recurrent units to capture how node representations evolve over time. Graph Attention Convolution (GATConv) leverages a self-attention mechanism to compute learnable, edge-wise attention coefficients, dynamically weighting neighbor contributions for adaptive spatial aggregation. A Soft Actor-Critic (SAC) module is positioned between the Temporal Attention Unit (TAU) and GCRN to further improve the spatio-temporal learning. This module improves attention and prediction accuracy by dynamically scaling hidden representations to minimize noise and highlight informative features. To identify the most relevant physical constraints in each area, Q-learning agents dynamically assign weights to physics-informed loss terms, improving generalization across real-time industrial systems and reducing the need for manual tuning. In both RUL and SOH estimation tasks, the proposed method consistently outperforms state-of-the-art models, demonstrating strong robustness and predictive accuracy across varied degradation patterns on three diverse industrial benchmark datasets.
nan
Article 842
Title@2025-07-13 (7): Cascade Speculative Drafting for Even Faster LLM Inference
Title: Cascade Speculative Drafting for Even Faster LLM Inference | Cascade Spekulative Drafting für noch schnellere LLM-Inferenz | 连速度更快LLM推论的连带连带性投机起草 2312.11462v5 |
Authors (6): Ziyi Chen, Xiaocong Yang, Jiacheng Lin, Chenkai Sun, Kevin Chen-Chuan Chang, Jie Huang
Introduced to enhance the efficiency of large language model (LLM) inference, speculative decoding operates by having a smaller model generate a draft. A larger target model then reviews this draft to align with its output, and any acceptance by the target model reduces the number of target model runs, ultimately improving efficiency. However, the drafting process in speculative decoding includes slow autoregressive generation and allocates equal time to generating tokens, irrespective of their importance. These inefficiencies collectively contribute to the suboptimal performance of speculative decoding. To further improve LLM inference, we introduce Cascade Speculative Drafting (CS Drafting), a speculative execution algorithm that incorporates two types of cascades. The Vertical Cascade eliminates autoregressive generation from neural models, while the Horizontal Cascade optimizes time allocation in drafting for improved efficiency. Combining both cascades, CS Drafting achieves greater speedup compared to the baselines in our experiments, while preserving the same output distribution as the target model.
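The draft-then-verify loop of plain speculative decoding (not CS Drafting's cascades) can be sketched with toy deterministic models. This is the greedy variant, where a drafted token is accepted only if it equals the target's own greedy choice; sampling-based variants use a rejection step instead. All names and the toy next-token functions are hypothetical.

```python
def speculative_decode(target_next, draft_next, prompt, num_tokens, k=4):
    """Greedy speculative decoding with toy next-token functions.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is one batched target forward pass; here
    the target function is simply called once per checked position.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < num_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The target checks the draft and keeps the longest agreeing prefix.
        rounds += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every drafted token matched, so the target adds one more token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + num_tokens], rounds

def toy_target(seq):
    """Toy target model: count upwards modulo 10."""
    return (seq[-1] + 1) % 10

def toy_draft(seq):
    """Toy draft model: agrees with the target except after token 2."""
    return 0 if seq[-1] == 2 else (seq[-1] + 1) % 10

tokens, rounds = speculative_decode(toy_target, toy_draft, [0], num_tokens=8)
print(tokens, rounds)
```

The output is identical to what the target would produce on its own, but it takes fewer verification rounds than the 8 sequential target steps plain autoregressive decoding would need.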
nan
Article 843
Title@2025-07-13 (7): Data-Centric Human Preference with Rationales for Direct Preference Alignment
Title: Data-Centric Human Preference with Rationales for Direct Preference Alignment | Daten-Centric Human Preference mit Rationales für direkte Präferenzausrichtung | 数据中心人类首选与直接优先调整的理由说明 2407.14477v4 |
Authors (5): Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia
Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. The human preference is typically represented through comparison where one response is chosen over another for a given prompt. However, standard preference datasets often lack explicit information on why a particular choice was made, presenting an ambiguity that can hinder efficient learning and robust alignment, especially given the high cost of acquiring extensive human annotations. While many studies focus on algorithmic improvements, this work adopts a data-centric perspective, exploring how to enhance learning from existing preference data. We propose augmenting standard preference pairs with rationales that explain the reasoning behind the human preference. Specifically, we introduce a simple and principled framework that leverages machine-generated rationales to enrich preference data for preference optimization algorithms. Our comprehensive analysis demonstrates that incorporating rationales improves learning efficiency. Extensive experiments reveal some advantages: rationale-augmented learning accelerates convergence and can achieve higher final model performance. Furthermore, this approach is versatile and compatible with various direct preference optimization algorithms. Our findings showcase the potential of thoughtful data design in preference learning, demonstrating that enriching existing datasets with explanatory rationales can help unlock improvements in model alignment and annotation efficiency.
nan
Article 844
Title@2025-07-13 (7): Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding
Title: Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding | Ihr prätrainiertes Modell erzählt die Schwierigkeit selbst: Ein selbstadaptives Curriculum Lernen Paradigma für das natürliche Sprachverständnis | 您训练有素的模型告诉困难本身:学习自然语言理解的自适应课程学习范式 2507.09758v1 |
Authors (3): Qi Feng, Yihong Liu, Hinrich Schütze
Curriculum learning is a widely adopted training strategy in natural language processing (NLP), where models are exposed to examples organized by increasing difficulty to enhance learning efficiency and performance. However, most existing approaches rely on manually defined difficulty metrics – such as text length – which may not accurately reflect the model’s own perspective. To overcome this limitation, we present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models (PLMs) themselves. Building on these scores, we explore various training strategies that differ in the ordering of examples for the fine-tuning: from easy-to-hard, hard-to-easy, to mixed sampling. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks. Experimental results show that our approach leads to faster convergence and improved performance compared to standard random sampling.
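The ordering strategies named above can be sketched as follows. The difficulty function stands in for a score produced by a pre-trained language model, and "mixed" is implemented here as a seeded shuffle; both are assumptions, since the abstract does not define either precisely.

```python
import random

def order_examples(examples, difficulty, strategy="easy_to_hard", seed=0):
    """Order fine-tuning examples by a per-example difficulty score.

    difficulty: callable mapping an example to a number (higher = harder);
    a hypothetical stand-in for a PLM-predicted score.
    """
    ranked = sorted(examples, key=difficulty)
    if strategy == "easy_to_hard":
        return ranked
    if strategy == "hard_to_easy":
        return ranked[::-1]
    if strategy == "mixed":
        # Assumed interpretation: ignore the ranking and shuffle reproducibly.
        shuffled = ranked[:]
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown strategy: {strategy}")

scores = {"a": 0.9, "b": 0.1, "c": 0.5}
print(order_examples(list(scores), scores.get, "easy_to_hard"))
print(order_examples(list(scores), scores.get, "hard_to_easy"))
```

Keeping the scoring function separate from the ordering makes it easy to swap a hand-written metric such as text length for a model-derived one.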
nan
Article 845
Title@2025-07-13 (7): Energy Dissipation Rate Guided Adaptive Sampling for Physics-Informed Neural Networks: Resolving Surface-Bulk Dynamics in Allen-Cahn Systems
Title: Energy Dissipation Rate Guided Adaptive Sampling for Physics-Informed Neural Networks: Resolving Surface-Bulk Dynamics in Allen-Cahn Systems | Energieableitungsrate Geführte adaptive Probenahme für physikinformierte Neuronale Netzwerke: Auflösen von Oberflächen-Bulk-Dynamik in Allen-Cahn-Systemen | 物理内成形神经网络的能源损耗率向导适应性抽样抽样:Allen-Cahn系统中的表面-柱体动力学的解决方案 2507.09757v1 |
Authors (3): Chunyan Li, Wenkai Yu, Qi Wang
We introduce the Energy Dissipation Rate guided Adaptive Sampling (EDRAS) strategy, a novel method that substantially enhances the performance of Physics-Informed Neural Networks (PINNs) in solving thermodynamically consistent partial differential equations (PDEs) over arbitrary domains. EDRAS leverages the local energy dissipation rate density as a guiding metric to identify and adaptively re-sample critical collocation points from both the interior and boundary of the computational domain. This dynamical sampling approach improves the accuracy of residual-based PINNs by aligning the training process with the underlying physical structure of the system. In this study, we demonstrate the effectiveness of EDRAS using the Allen-Cahn phase field model in irregular geometries, achieving up to a sixfold reduction in the relative mean square error compared to traditional residual-based adaptive refinement (RAR) methods. Moreover, we compare EDRAS with other residual-based adaptive sampling approaches and show that EDRAS is not only computationally more efficient but also more likely to identify high-impact collocation points. Through numerical solutions of the Allen-Cahn equation with both static (Neumann) and dynamic boundary conditions in 2D disk- and ellipse-shaped domains solved using PINN coupled with EDRAS, we gain significant insights into how dynamic boundary conditions influence bulk phase evolution and thermodynamic behavior. The proposed approach offers an effective, physically informed enhancement to PINN frameworks for solving thermodynamically consistent models, making PINN a robust and versatile computational tool for investigating complex thermodynamic processes in arbitrary geometries.
nan
Article 846
Title@2025-07-13 (7): DiPT: Enhancing LLM reasoning through diversified perspective-taking
Title: DiPT: Enhancing LLM reasoning through diversified perspective-taking | DiPT: Verbesserung der LLM-Reinigung durch diversifizierte Perspektive | DPT:通过从不同角度出发,加强LLM推理 2409.06241v2 |
Authors (5): Hoang Anh Just, Mahavir Dabas, Lifu Huang, Ming Jin, Ruoxi Jia
Existing work on improving language model reasoning typically explores a single solution path, which can be prone to errors. Inspired by perspective-taking in social studies, this paper introduces DiPT, a novel approach that complements current reasoning methods by explicitly incorporating diversified viewpoints. This approach allows the model to gain a deeper understanding of the problem’s context and identify the most effective solution path during the inference stage. Additionally, it provides a general data-centric AI recipe for augmenting existing data to improve their quality for fine-tuning. Our empirical results demonstrate that DiPT can be flexibly integrated into existing methods that focus on a single reasoning approach, enhancing their reasoning performance and stability when presented with paraphrased problems. Furthermore, we illustrate improved context understanding by maintaining the model’s safe outputs against “jailbreaking” prompts intentionally designed to bypass safeguards built into deployed models. Lastly, we show that fine-tuning with data enriched with diverse perspectives can boost the reasoning capabilities of the model compared to fine-tuning with raw data alone.
nan
Article 847
Title@2025-07-13 (7): Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts
Title: Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts | Erklärbare KI in der Genomik: Transkriptionsfaktor Bindung Site Prediction mit Mischung von Experten | 在基因组学中可解释的AI:与专家混合的转移要素约束性现场预测 2507.09754v1 |
Authors (5): Aakash Tripathi, Ian E. Nielsen, Muhammad Umer, Ravi P. Ramachandran, Ghulam Rasool
Transcription Factor Binding Site (TFBS) prediction is crucial for understanding gene regulation and various biological processes. This study introduces a novel Mixture of Experts (MoE) approach for TFBS prediction, integrating multiple pre-trained Convolutional Neural Network (CNN) models, each specializing in different TFBS patterns. We evaluate the performance of our MoE model against individual expert models on both in-distribution and out-of-distribution (OOD) datasets, using six randomly selected transcription factors (TFs) for OOD testing. Our results demonstrate that the MoE model achieves competitive or superior performance across diverse TF binding sites, particularly excelling in OOD scenarios. The Analysis of Variance (ANOVA) statistical test confirms the significance of these performance differences. Additionally, we introduce ShiftSmooth, a novel attribution mapping technique that provides more robust model interpretability by considering small shifts in input sequences. Through comprehensive explainability analysis, we show that ShiftSmooth offers superior attribution for motif discovery and localization compared to traditional Vanilla Gradient methods. Our work presents an efficient, generalizable, and interpretable solution for TFBS prediction, potentially enabling new discoveries in genome biology and advancing our understanding of transcriptional regulation.
nan
Article 848
Title@2025-07-13 (7): Do we need equivariant models for molecule generation?
Title: Do we need equivariant models for molecule generation? | Brauchen wir äquivariante Modelle für die Molekülgenerierung? | 我们需要分子生成的等同模型吗? 2507.09753v1 |
Authors (7): Ewa M. Nowara, Joshua Rackers, Patricia Suriana, Pan Kessel, Max Shen, Andrew Martin Watkins, Michael Maser
Deep generative models are increasingly used for molecular discovery, with most recent approaches relying on equivariant graph neural networks (GNNs) under the assumption that explicit equivariance is essential for generating high-quality 3D molecules. However, these models are complex, difficult to train, and scale poorly. We investigate whether non-equivariant convolutional neural networks (CNNs) trained with rotation augmentations can learn equivariance and match the performance of equivariant models. We derive a loss decomposition that separates prediction error from equivariance error, and evaluate how model size, dataset size, and training duration affect performance across denoising, molecule generation, and property prediction. To our knowledge, this is the first study to analyze learned equivariance in generative tasks.
nan
Article 849
Title@2025-07-13 (7): Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
Title: Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them | Scalpel vs. Hammer: GRPO verstärkt bestehende Fähigkeiten, SFT ersetzt sie | 缩略图与锤子:GROPO 放大现有能力,SFT 替换 2507.10616v1 |
Authors (4): Neel Rajani, Aryo Pradipta Gema, Seraphina Goldfarb-Tarrant, Ivan Titov
Training large language models (LLMs) for reasoning via maths and code datasets has become a major new focus in LLM post-training. Two particularly popular approaches are reinforcement learning (RL) and supervised fine-tuning (SFT), but their training dynamics are poorly understood. We present a comparative analysis of RL and SFT on the same maths problems with the same model and similar hyperparameters. We find that RL yields minor in-domain gains on maths and slight degradation on knowledge-intensive benchmarks like MMLU, while both trends are more pronounced in SFT. We also analyse model parameters across checkpoints, observing that both algorithms modify query and key weights the most. Meanwhile, SFT exhibits greater updates and also affects mid-layer MLPs more, leading us to hypothesise that this may have caused the out-of-domain degradation. We therefore investigate whether freezing parts of the model during training can mitigate the reduced performance on knowledge-intensive benchmarks. However, our results are inconclusive, with benefits on GPQA:Diamond and degradation on other benchmarks. Taken together, our observations provide a preliminary indication for why RL amplifies existing capabilities, while SFT replaces old skills with new ones.
nan
Article 850
Title@2025-07-13 (7): MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients
Title: MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients | MB-RIRs: ein Synthetischer Raumimpuls-Ansprechdatensatz mit Frequenzabhängigen Absorptionskoeffizienten | MB-RIRs:一个具有频率依赖吸收系数的合成室电动脉冲反应数据集 2507.09750v1 |
Authors (4): Enric Gusó, Joanna Luberadzka, Umut Sayin, Xavier Serra
We investigate the effects of four strategies for improving the ecological validity of synthetic room impulse response (RIR) datasets for monaural Speech Enhancement (SE). We implement three features on top of the traditional image source method-based (ISM) shoebox RIRs: multiband absorption coefficients, source directivity and receiver directivity. We additionally consider mesh-based RIRs from the SoundSpaces dataset. We then train a DeepFilternet3 model for each RIR dataset and evaluate the performance on a test set of real RIRs both objectively and subjectively. We find that RIRs which use frequency-dependent acoustic absorption coefficients (MB-RIRs) can obtain +0.51 dB of SDR and a +8.9 MUSHRA score when evaluated on real RIRs. The MB-RIRs dataset is publicly available for free download.
nan
Article 851
Title@2025-07-13 (7): Fair Domain Generalization: An Information-Theoretic View
Title: Fair Domain Generalization: An Information-Theoretic View | Fair Domain Generalization: Eine informationstheoretische Ansicht | 公平域一般化:信息理论观点 2507.05823v2 |
Authors (5): Tangzheng Lian, Guanyu Hu, Dimitrios Kollias, Xinyu Yang, Oya Celiktutan
Domain generalization (DG) and algorithmic fairness are two critical challenges in machine learning. However, most DG methods focus only on minimizing expected risk in the unseen target domain without considering algorithmic fairness. Conversely, fairness methods typically do not account for domain shifts, so the fairness achieved during training may not generalize to unseen test domains. In this work, we bridge these gaps by studying the problem of Fair Domain Generalization (FairDG), which aims to minimize both expected risk and fairness violations in unseen target domains. We derive novel mutual information-based upper bounds for expected risk and fairness violations in multi-class classification tasks with multi-group sensitive attributes. These bounds provide key insights for algorithm design from an information-theoretic perspective. Guided by these insights, we introduce PAFDG (Pareto-Optimal Fairness for Domain Generalization), a practical framework that solves the FairDG problem and models the utility-fairness trade-off through Pareto optimization. Experiments on real-world vision and language datasets show that PAFDG achieves superior utility-fairness trade-offs compared to existing methods.
nan
Article 852
Title@2025-07-13 (7): Discovering Governing Equations in the Presence of Uncertainty
Title: Discovering Governing Equations in the Presence of Uncertainty | Entdeckt regierende Gleichungen in der Gegenwart von Ungewissheit | 不确定性存在时的发现等值 2507.09740v1 |
Authors (3): Ridwan Olabiyi, Han Hu, Ashif Iquebal
In the study of complex dynamical systems, understanding and accurately modeling the underlying physical processes is crucial for predicting system behavior and designing effective interventions. Yet real-world systems exhibit pronounced input (or system) variability and are observed through noisy, limited data, conditions that confound traditional discovery methods that assume fixed-coefficient deterministic models. In this work, we theorize that accounting for system variability together with measurement noise is the key to consistently discovering the governing equations underlying dynamical systems. As such, we introduce a stochastic inverse physics-discovery (SIP) framework that treats the unknown coefficients as random variables and infers their posterior distribution by minimizing the Kullback-Leibler divergence between the push-forward of the posterior samples and the empirical data distribution. Benchmarks on four canonical problems – the Lotka-Volterra predator-prey system (multi- and single-trajectory), the historical Hudson Bay lynx-hare data, the chaotic Lorenz attractor, and fluid infiltration in porous media using low- and high-viscosity liquids – show that SIP consistently identifies the correct equations and lowers coefficient root-mean-square error by an average of 82% relative to the Sparse Identification of Nonlinear Dynamics (SINDy) approach and its Bayesian variant. The resulting posterior distributions yield 95% credible intervals that closely track the observed trajectories, providing interpretable models with quantified uncertainty. SIP thus provides a robust, data-efficient approach for consistent physics discovery in noisy, variable, and data-limited settings.
nan
Article 853
Title@2025-07-13 (7): Accelerating Constrained Sampling: A Large Deviations Approach
Title: Accelerating Constrained Sampling: A Large Deviations Approach | Beschleunigte Probenahme beschleunigen: Ein großer Abweichungsansatz | 加速受控抽样:大偏离方法 2506.07816v2 |
Authors (4): Yingli Wang, Changwei Tu, Xiaoyu Wang, Lingjiong Zhu
The problem of sampling a target probability distribution on a constrained domain arises in many applications including machine learning. For constrained sampling, various Langevin algorithms such as projected Langevin Monte Carlo (PLMC) based on the discretization of reflected Langevin dynamics (RLD) and more generally skew-reflected non-reversible Langevin Monte Carlo (SRNLMC) based on the discretization of skew-reflected non-reversible Langevin dynamics (SRNLD) have been proposed and studied in the literature. This work focuses on the long-time behavior of SRNLD, where a skew-symmetric matrix is added to RLD. Although acceleration for SRNLD has been studied, it is not clear how one should design the skew-symmetric matrix in the dynamics to achieve good performance in practice. We establish a large deviation principle (LDP) for the empirical measure of SRNLD when the skew-symmetric matrix is chosen such that its product with the inward unit normal vector field on the boundary is zero. By explicitly characterizing the rate functions, we show that this choice of the skew-symmetric matrix accelerates the convergence to the target distribution compared to RLD and reduces the asymptotic variance. Numerical experiments for SRNLMC based on the proposed skew-symmetric matrix show superior performance, which validate the theoretical findings from the large deviations theory.
nan
Article 854
Title@2025-07-13 (7): Universal Physics Simulation: A Foundational Diffusion Approach
Title: Universal Physics Simulation: A Foundational Diffusion Approach | Universelle Physik Simulation: Ein grundlegender Diffusionsansatz | 宇宙物理模拟:基础扩散方法 2507.09733v1 |
Authors (1): Bradley Camburn
We present the first foundational AI model for universal physics simulation that learns physical laws directly from boundary-condition data without requiring a priori equation encoding. Traditional physics-informed neural networks (PINNs) and finite-difference methods necessitate explicit mathematical formulation of governing equations, fundamentally limiting their generalizability and discovery potential. Our sketch-guided diffusion transformer approach reimagines computational physics by treating simulation as a conditional generation problem, where spatial boundary conditions guide the synthesis of physically accurate steady-state solutions. By leveraging enhanced diffusion transformer architectures with novel spatial relationship encoding, our model achieves direct boundary-to-equilibrium mapping and is generalizable to diverse physics domains. Unlike sequential time-stepping methods that accumulate errors over iterations, our approach bypasses temporal integration entirely, directly generating steady-state solutions with SSIM > 0.8 while maintaining sub-pixel boundary accuracy. Our data-informed approach enables physics discovery through learned representations analyzable via Layer-wise Relevance Propagation (LRP), revealing emergent physical relationships without predetermined mathematical constraints. This work represents a paradigm shift from AI-accelerated physics to AI-discovered physics, establishing the first truly universal physics simulation framework.
nan
Article 855
Title@2025-07-13 (7): Continental scale habitat modelling with artificial intelligence and multimodal earth observation
Title: Continental scale habitat modelling with artificial intelligence and multimodal earth observation | Lebensraummodellierung im kontinentalen Maßstab mit künstlicher Intelligenz und multimodaler Erdbeobachtung | 利用人工智能和多式地球观测进行大陆规模的大陆生境建模 2507.09732v1 |
Authors (5): Sara Si-Moussi, Stephan Hennekens, Sander Mucher, Stan Los, Wilfried Thuiller
Habitats integrate the abiotic conditions and biophysical structures that support biodiversity and sustain nature’s contributions to people. As these ecosystems face mounting pressure from human activities, accurate, high-resolution habitat maps are essential for effective conservation and restoration. Yet current maps often fall short in thematic or spatial resolution because they must (1) model several mutually exclusive habitat types that co-occur across landscapes and (2) cope with severe class imbalance that complicates multi-class training. Here, we evaluated how high-resolution remote sensing (RS) data and Artificial Intelligence (AI) tools can improve habitat classification over large geographic extents at fine thematic resolution. Using vegetation plots from the European Vegetation Archive, we modelled Level 3 EUNIS habitats across Europe and assessed multiple modelling strategies against independent validation datasets. Strategies that exploited the hierarchical nature of habitat nomenclatures resolved classification ambiguities, especially in fragmented landscapes. Integrating multi-spectral (MSI) and synthetic aperture radar (SAR) imagery, particularly through Earth Observation Foundation models, enhanced within-formation discrimination and overall performance. Finally, ensemble machine learning that corrects class imbalance boosted accuracy further. Our methodological framework is transferable beyond Europe and adaptable to other classification systems. Future research should advance temporal modelling of dynamic habitats, extend to habitat segmentation and quality assessment, and exploit next-generation EO data paired with higher-quality in-situ observations.
nan
Article 856
Title@2025-07-13 (7): Signed Graph Learning: Algorithms and Theory
Title: Signed Graph Learning: Algorithms and Theory | Unterzeichnetes Graphenlernen: Algorithmen und Theorie | 签署图表学习:算法和理论 2507.09717v1 |
Authors (4): Abdullah Karaaslanli, Bisakh Banerjee, Tapabrata Maiti, Selin Aviyente
Real-world data is often represented through the relationships between data samples, forming a graph structure. In many applications, it is necessary to learn this graph structure from the observed data. Current graph learning research has primarily focused on unsigned graphs, which consist only of positive edges. However, many biological and social systems are better described by signed graphs that account for both positive and negative interactions, capturing similarity and dissimilarity between samples. In this paper, we develop a method for learning signed graphs from a set of smooth signed graph signals. Specifically, we employ the net Laplacian as a graph shift operator (GSO) to define smooth signed graph signals as the outputs of a low-pass signed graph filter defined by the net Laplacian. The signed graph is then learned by formulating a non-convex optimization problem where the total variation of the observed signals is minimized with respect to the net Laplacian. The proposed problem is solved using the alternating direction method of multipliers (ADMM), and a fast algorithm that reduces the per-ADMM iteration complexity from quadratic to linear in the number of nodes is introduced. Furthermore, theoretical proofs of convergence for the algorithm and a bound on the estimation error of the learned net Laplacian as a function of sample size, number of nodes, and graph topology are provided. Finally, the proposed method is evaluated on simulated data and a gene regulatory network inference problem and compared to existing signed graph learning methods.
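To make the quantities in the abstract concrete, here is a small sketch of the net Laplacian and the total variation of signals on a signed graph. It assumes the usual definition L = D - A, where A is a symmetric signed adjacency matrix and D holds the net degrees (row sums of A); the toy graph, the signal values, and the function names are illustrative assumptions, and the learning procedure itself (the non-convex problem and ADMM solver) is not shown.

```python
import numpy as np


def net_laplacian(adj):
    """Net Laplacian L = D - A, with D the diagonal matrix of net degrees."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def total_variation(signals, lap):
    """tr(X^T L X) for signals X of shape (n_nodes, n_signals)."""
    return float(np.trace(signals.T @ lap @ signals))


# Toy signed graph: nodes 0 and 1 are similar (+1), nodes 0 and 2 dissimilar (-1).
adj = np.array([[0.0, 1.0, -1.0],
                [1.0, 0.0, 0.0],
                [-1.0, 0.0, 0.0]])
x = np.array([[1.0], [1.1], [-2.0]])  # one signal, one value per node

lap = net_laplacian(adj)
tv = total_variation(x, lap)

# Equivalent pairwise form: 0.5 * sum_ij A_ij * (x_i - x_j)^2
pairwise = 0.5 * sum(adj[i, j] * (x[i, 0] - x[j, 0]) ** 2
                     for i in range(3) for j in range(3))
```

In the pairwise form, a positive edge penalizes differences between its endpoints while a negative edge rewards them, which is why a small total variation corresponds to a signal that is smooth with respect to the signed graph.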
nan
Article 857
Title@2025-07-13 (7): TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems
Title: TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems | TimberStrike: Datensatz-Rekonstruktion Angriff Enthüllen der Privatsphäre Leckage in Federated Tree-Based Systems | 木材三角:联邦树基系统中数据集重建攻击清除隐私渗漏 2506.07605v3 |
Authors (5): Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati
Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on state-of-the-art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.
nan
Article 858
Title@2025-07-13 (7): Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner
Title: Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner | Task-Agnostic Pre-Training und Task-Guided Fine-Tuning für vielseitige Diffusion Planner | Versatile Difatile 扩散规划器任务不可知性培训前和任务指导微调 2409.19949v3 |
Authors (6): Chenyou Fan, Chenjia Bai, Zhao Shan, Haoran He, Yang Zhang, Zhen Wang
Diffusion models have demonstrated their capabilities in modeling multi-task trajectories. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). They are costly due to the substantial human effort required to collect expert data or design reward functions. To address these challenges, we aim to develop a versatile diffusion planner capable of leveraging large-scale inferior data that contains task-agnostic sub-optimal trajectories, with the ability to adapt quickly to specific tasks. In this paper, we propose SODP, a two-stage framework that leverages Sub-Optimal data to learn a Diffusion Planner, which is generalizable to various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which can be sub-optimal and have wide data coverage. Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to quickly refine the diffusion planner, which aims to generate action sequences with higher task-specific returns. Experimental results from multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.
nan
Article 859
Title@2025-07-13 (7): Phase transition of the Sinkhorn-Knopp algorithm
Title: Phase transition of the Sinkhorn-Knopp algorithm | Phasenübergang des Sinkhorn-Knopp-Algorithmus | Sinkhorn- Knopp 算法的阶段过渡 2507.09711v1 |
Authors (1): Kun He
The matrix scaling problem, particularly the Sinkhorn-Knopp algorithm, has been studied for over 60 years. In practice, the algorithm often yields high-quality approximations within just a few iterations. Theoretically, however, the best-known upper bound places it in the class of pseudopolynomial-time approximation algorithms. Meanwhile, the lower-bound landscape remains largely unexplored. Two fundamental questions persist: what accounts for the algorithm’s strong empirical performance, and can a tight bound on its iteration count be established? For an $n\times n$ matrix, its normalized version is obtained by dividing each entry by its largest entry. We say that a normalized matrix has a density $\gamma$ if there exists a constant $\rho > 0$ such that one row or column has exactly $\lceil \gamma n \rceil$ entries with values at least $\rho$, and every other row and column has at least $\lceil \gamma n \rceil$ such entries. For the upper bound, we show that the Sinkhorn-Knopp algorithm produces a nearly doubly stochastic matrix in $O(\log n - \log \varepsilon)$ iterations and $\widetilde{O}(n^2)$ time for all nonnegative square matrices whose normalized version has a density $\gamma > 1/2$. Such matrices cover both the algorithm’s principal practical inputs and its typical theoretical regime, and the $\widetilde{O}(n^2)$ runtime is optimal. For the lower bound, we establish a tight bound of $\widetilde{\Omega}\left(n^{1/2}/\varepsilon\right)$ iterations for positive matrices under the $\ell_2$-norm error measure. Moreover, for every $\gamma < 1/2$, there exists a matrix with density $\gamma$ for which the algorithm requires $\Omega\left(n^{1/2}/\varepsilon\right)$ iterations. In summary, our results reveal a sharp phase transition in the Sinkhorn-Knopp algorithm at the density threshold $\gamma = 1/2$.
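For readers unfamiliar with the procedure being analyzed, the following is a minimal sketch of the Sinkhorn-Knopp iteration: alternately rescale rows and columns of a nonnegative square matrix until it is nearly doubly stochastic. The stopping rule (the ℓ2 norm of the row-sum deviation after a column pass), the tolerance, and the random dense test matrix are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def sinkhorn_knopp(a, eps=1e-6, max_iters=10_000):
    """Alternate row and column normalization; return (scaled matrix, iterations)."""
    m = np.asarray(a, dtype=float).copy()
    for it in range(1, max_iters + 1):
        m /= m.sum(axis=1, keepdims=True)  # make every row sum to 1
        m /= m.sum(axis=0, keepdims=True)  # make every column sum to 1
        # After the column pass, only the row sums can deviate from 1.
        err = np.linalg.norm(m.sum(axis=1) - 1.0)
        if err <= eps:
            return m, it
    return m, max_iters


# Dense positive input: every entry is at least half the largest entry.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=(50, 50))
scaled, iters = sinkhorn_knopp(a)
```

Dense inputs like this one typically converge in a handful of iterations, which matches the fast regime described above; sparse or strongly unbalanced inputs can take far longer.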
nan
Article 860
Title@2025-07-13 (7): Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces
Title: Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces | Große Sprachmodelle kodieren Semantik in Low-Dimensional Linear Subspaces | 低多维线性线性子空间中大语言模型编码语义学 2507.09709v1 |
Authors (6): Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi
Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To investigate this, we conduct a large-scale empirical study of hidden states in transformer-based LLMs, analyzing 11 decoder-only models across 6 scientific topics and 12 layers each. We find that high-level semantic information consistently lies in low-dimensional subspaces that form linearly separable representations across distinct domains. This separability becomes more pronounced in deeper layers and under prompts that trigger structured reasoning or alignment behaviors – even when surface content is unchanged. This geometry enables simple yet effective causal interventions in hidden space; for example, reasoning patterns like chain-of-thought can be captured by a single vector direction. Together, these findings support the development of geometry-aware tools that operate directly on latent representations to detect and mitigate harmful or adversarial content, using methods such as transport-based defenses that leverage this separability. As a proof of concept, we demonstrate this potential by training a simple MLP classifier as a lightweight latent-space guardrail, which detects adversarial and malicious prompts with high precision.
nan
Article 861
Title@2025-07-13 (7): BiDepth: A Bidirectional-Depth Neural Network for Spatio-Temporal Prediction
Title: BiDepth: A Bidirectional-Depth Neural Network for Spatio-Temporal Prediction | BiDepth: Ein bidirektional-depth-Neurales Netzwerk für Spatio-Temporale Vorhersagen | 双向 – – 双向 – – 外心神经网络 2501.08411v3 |
Authors (4): Sina Ehsani, Fenglian Pan, Qingpei Hu, Jian Liu
Accurate spatial-temporal (ST) prediction for dynamic systems, such as urban mobility and weather patterns, is crucial but hindered by complex ST correlations and the challenge of concurrently modeling long-term trends with short-term fluctuations. Existing methods often falter in these areas. This paper proposes the BiDepth Multimodal Neural Network (BDMNN), which integrates two key innovations: 1) a bidirectional depth modulation mechanism that dynamically adjusts network depth to comprehensively capture both long-term seasonality and immediate short-term events; and 2) a novel convolutional self-attention cell (CSAC). Critically, unlike many attention mechanisms that can lose spatial acuity, our CSAC is specifically designed to preserve crucial spatial relationships throughout the network, akin to standard convolutional layers, while simultaneously capturing temporal dependencies. Evaluated on real-world urban traffic and precipitation datasets, BDMNN demonstrates significant accuracy improvements, achieving a 12% Mean Squared Error (MSE) reduction in urban traffic prediction and a 15% improvement in precipitation forecasting over leading deep learning benchmarks like ConvLSTM, using comparable computational resources. These advancements offer robust ST forecasting for smart city management, disaster prevention, and resource optimization.
nan
Article 862
Title@2025-07-13 (7): CCDM: Continuous Conditional Diffusion Models for Image Generation
Title: CCDM: Continuous Conditional Diffusion Models for Image Generation | CCDM: Continuous Conditional Diffusion Models für die Bildgenerierung | CCDM: 图像生成持续有条件传播模型 2405.03546v3 |
Authors (4): Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang
Continuous Conditional Generative Modeling (CCGM) estimates high-dimensional data distributions, such as images, conditioned on scalar continuous variables (aka regression labels). While Continuous Conditional Generative Adversarial Networks (CcGANs) were designed for this task, their instability during adversarial learning often leads to suboptimal results. Conditional Diffusion Models (CDMs) offer a promising alternative, generating more realistic images, but their diffusion processes, label conditioning, and model fitting procedures are either not optimized for or incompatible with CCGM, making it difficult to integrate CcGANs’ vicinal approach. To address these issues, we introduce Continuous Conditional Diffusion Models (CCDMs), the first CDM specifically tailored for CCGM. CCDMs address existing limitations with specially designed conditional diffusion processes, a novel hard vicinal image denoising loss, a customized label embedding method, and efficient conditional sampling procedures. Through comprehensive experiments on four datasets with resolutions ranging from 64x64 to 192x192, we demonstrate that CCDMs outperform state-of-the-art CCGM models, establishing a new benchmark. Ablation studies further validate the model design and implementation, highlighting that some widely used CDM implementations are ineffective for the CCGM task. Our code is publicly available at https://github.com/UBCDingXin/CCDM.
nan
Article 863
Title@2025-07-13 (7): EPT-2 Technical Report
Title: EPT-2 Technical Report | EPT-2 Technischer Bericht | EPT-2 技术报告 2507.09703v1 |
Authors (15): Roberto Molinaro, Niall Siegenheim, Niels Poulsen, Jordan Dane Daubinet, Henry Martin, Mark Frey, Kevin Thiart, Alexander Jakob Dautel, Andreas Schlueter, Alex Grigoryev, Bogdan Danciu, Nikoo Ekhtiari, Bas Steunebrink, Leonie Wagner, Marvin Vincent Gabler
We present EPT-2, the latest iteration in our Earth Physics Transformer (EPT) family of foundation AI models for Earth system forecasting. EPT-2 delivers substantial improvements over its predecessor, EPT-1.5, and sets a new state of the art in predicting energy-relevant variables (including 10m and 100m wind speed, 2m temperature, and surface solar radiation) across the full 0-240h forecast horizon. It consistently outperforms leading AI weather models such as Microsoft Aurora, as well as the operational numerical forecast system IFS HRES from the European Centre for Medium-Range Weather Forecasts (ECMWF). In parallel, we introduce a perturbation-based ensemble model of EPT-2 for probabilistic forecasting, called EPT-2e. Remarkably, EPT-2e significantly surpasses the ECMWF ENS mean, long considered the gold standard for medium- to long-range forecasting, while operating at a fraction of the computational cost. EPT models, as well as third-party forecasts, are accessible via the app.jua.ai platform.
nan
Article 864
Title@2025-07-13 (7): Latent Functional Maps: a spectral framework for representation alignment
Title: Latent Functional Maps: a spectral framework for representation alignment | Latent Functional Maps: ein spektraler Rahmen für die Darstellungsausrichtung | 原始功能地图:代表调整的光谱框架 2406.14183v4 |
Authors (5): Marco Fumero, Marco Pegoraro, Valentino Maiorca, Francesco Locatello, Emanuele Rodolà
Neural models learn data representations that lie on low-dimensional manifolds, yet modeling the relation between these representational spaces is an ongoing challenge. By integrating spectral geometry principles into neural modeling, we show that this problem can be better addressed in the functional domain, mitigating complexity while enhancing interpretability and performance on downstream tasks. To this end, we introduce a multi-purpose framework to the representation learning community, which makes it possible to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, in both unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces. We validate our framework on various applications, ranging from stitching to retrieval tasks, and on multiple modalities, demonstrating that Latent Functional Maps can serve as a Swiss Army knife for representation alignment.
nan
Article 865
Title@2025-07-13 (7): VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Title: VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | VideoChat-Flash: Hierarchische Komprimierung für die Langkontext-Videomodellierung | VideoChat-Flash:长文本视频建模的等级压缩 2501.00574v4 |
Authors (13): Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang
Long-context video modeling is critical for multimodal large language models (MLLMs), enabling them to process movies, online video streams, and so on. Despite recent advances, handling long videos remains challenging because the extremely long video context is difficult to understand efficiently. This paper addresses the issue from the aspects of model architecture, training data, training strategy, and evaluation benchmark. First, we propose a novel Hierarchical video token Compression (HiCo) method, which leverages visual redundancy in long videos to compress long video context from Clip-level to Video-level, reducing the computation significantly while preserving essential details and achieving an extreme compression ratio of approximately 1/50 with almost no performance loss. Second, we introduce a multi-stage short-to-long learning scheme, a large-scale dataset of real-world long videos named LongVid, and a challenging "Multi-Hop Needle-In-A-Video-Haystack" benchmark. Finally, we build a powerful video MLLM named VideoChat-Flash, which shows leading performance on both mainstream long and short video benchmarks at the 2B and 7B model scale. It is the first open-source model to reach 99.1% accuracy over 10,000 frames in NIAH.
nan
Article 866
Title@2025-07-13 (7): Frequency-aware Surrogate Modeling With SMT Kernels For Advanced Data Forecasting
Title: Frequency-aware Surrogate Modeling With SMT Kernels For Advanced Data Forecasting | Frequency-aware Surrogate Modellierung mit SMT-Kerneln für erweiterte Datenvorhersage | 利用SMT内核建模甚高能代谢模型,用于高级数据预报 2507.09694v1 |
Authors (3): Nicolas Gonel, Paul Saves, Joseph Morlier
This paper introduces a comprehensive open-source framework for developing correlation kernels, with a particular focus on user-defined kernels and kernel composition for surrogate modeling. By advancing kernel-based modeling techniques, we incorporate frequency-aware elements that effectively capture complex mechanical behaviors and time-frequency dynamics intrinsic to aircraft systems. Traditional kernel functions, often limited to exponential-based methods, are extended to include a wider range of kernels such as exponential squared sine and rational quadratic kernels, along with their respective first- and second-order derivatives. The proposed methodologies are first validated on a cardinal sine (sinc) test case and then applied to forecasting Mauna Loa carbon dioxide (CO2) concentrations and airline passenger traffic. All these advancements are integrated into the open-source Surrogate Modeling Toolbox (SMT 2.0), providing a versatile platform for both standard and customizable kernel configurations. Furthermore, the framework enables the combination of various kernels to leverage their unique strengths in composite models tailored to specific problems. The resulting framework offers a flexible toolset for engineers and researchers, paving the way for numerous future applications in metamodeling for complex, frequency-sensitive domains.
nan
Article 867
Title@2025-07-13 (7): Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Clinical Assessment of Diabetic Retinopathy Severity
Title: Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Clinical Assessment of Diabetic Retinopathy Severity | Umfassende Bewertung der OCT-basierten Automatisierten Segmentierung von Netzhautschicht, Flüssigkeit und Hyperreflektiver Foci: Auswirkungen auf die klinische Beurteilung von severity diabetischer Retinopathie | 综合评价基于OCT的视网膜、流体和超反光谱系的视网膜、流体和超反光谱系的自动分解:对诊断性糖尿病病理严重性评估的影响 2503.01248v4 |
Authors (9): S. Chen, D. Ma, M. Raviselvan, S. Sundaramoorthy, K. Popuri, M. J. Ju, M. V. Sarunic, D. Ratra, M. F. Beg
Diabetic retinopathy (DR) is a leading cause of vision loss, requiring early and accurate assessment to prevent irreversible damage. Spectral Domain Optical Coherence Tomography (SD-OCT) enables high-resolution retinal imaging, but automated segmentation performance varies, especially in cases with complex fluid and hyperreflective foci (HRF) patterns. This study proposes an active-learning-based deep learning pipeline for automated segmentation of retinal layers, fluid, and HRF, using four state-of-the-art models: U-Net, SegFormer, SwinUNETR, and VM-UNet, trained on expert-annotated SD-OCT volumes. Segmentation accuracy was evaluated with five-fold cross-validation, and retinal thickness was quantified using a K-nearest neighbors algorithm and visualized with Early Treatment Diabetic Retinopathy Study (ETDRS) maps. SwinUNETR achieved the highest overall accuracy (DSC = 0.7719; NSD = 0.8149), while VM-UNet excelled in specific layers. Structural differences were observed between non-proliferative and proliferative DR, with layer-specific thickening correlating with visual acuity impairment. The proposed framework enables robust, clinically relevant DR assessment while reducing the need for manual annotation, supporting improved disease monitoring and treatment planning.
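The Dice similarity coefficient (DSC) reported above is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference mask. A minimal sketch, assuming binary masks stored as NumPy arrays; the function name `dice_coefficient` and the toy masks are illustrative and are not taken from the paper's pipeline:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

# Toy 2x3 masks (hypothetical data): 2 overlapping pixels, 3 pixels in each mask.
pred_mask = np.array([[1, 1, 0], [0, 1, 0]])
true_mask = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

For multi-class layer segmentation, the same function would typically be applied per class and averaged; whether the paper averages this way is not stated in the abstract.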
nan
Article 868
Title@2025-07-13 (7): Post-Training Quantization of Generative and Discriminative LSTM Text Classifiers: A Study of Calibration, Class Balance, and Robustness
Title: Post-Training Quantization of Generative and Discriminative LSTM Text Classifiers: A Study of Calibration, Class Balance, and Robustness | Post-Training Quantization of Generative and Discriminative LSTM Text Klassifikatoren: Eine Studie zur Kalibrierung, Klassenbilanz und Robustheit | 培训后对产生和区别的LSTM文字分类的量化:校准、分类平衡和强力研究 2507.09687v1 |
Authors (4): Md Mushfiqur Rahaman, Elliot Chang, Tasmiah Haque, Srinjoy Das
Text classification plays a pivotal role in edge computing applications like industrial monitoring, health diagnostics, and smart assistants, where low latency and high accuracy are both key requirements. Generative classifiers, in particular, have been shown to exhibit robustness to out-of-distribution and noisy data, which is a critical consideration for deployment in such real-time edge environments. However, deploying such models on edge devices faces computational and memory constraints. Post-Training Quantization (PTQ) reduces model size and compute costs without retraining, making it ideal for edge deployment. In this work, we present a comprehensive comparative study of generative and discriminative Long Short-Term Memory (LSTM)-based text classification models with PTQ using the Brevitas quantization library. We evaluate both types of classifier models across multiple bitwidths and assess their robustness under regular and noisy input conditions. We find that while discriminative classifiers remain robust, generative ones are more sensitive to bitwidth, calibration data used during PTQ, and input noise during quantized inference. We study the influence of class imbalance in calibration data for both types of classifiers, comparing scenarios with evenly and unevenly distributed class samples, including their effect on weight adjustments and activation profiles during PTQ. Using test statistics derived from nonparametric hypothesis testing, we identify that using class-imbalanced data during calibration introduces insufficient weight adaptation at lower bitwidths for generative LSTM classifiers, thereby leading to degraded performance. This study underscores the role of calibration data in PTQ and clarifies when generative classifiers succeed or fail under noise, aiding deployment in edge environments.
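To make the bitwidth sensitivity concrete, here is a minimal sketch of symmetric uniform weight quantization in plain NumPy. It does not use the Brevitas API and is not the paper's calibration procedure; the helper `quantize_symmetric` and the random weight matrix are assumptions made for illustration only.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Round w onto a signed uniform grid with `bits` bits, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64))            # stand-in for one LSTM weight matrix

# Mean squared quantization error at several bitwidths.
errors = {
    bits: float(np.mean((weights - quantize_symmetric(weights, bits)) ** 2))
    for bits in (8, 4, 2)
}
```

The error grows quickly as the bitwidth shrinks, which is the regime where the abstract reports that calibration data matters most for generative classifiers.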
nan
Article 869
Title@2025-07-13 (7): Symptom-Driven Personalized Proton Pump Inhibitors Therapy Using Bayesian Neural Networks and Model Predictive Control
Title: Symptom-Driven Personalized Proton Pump Inhibitors Therapy Using Bayesian Neural Networks and Model Predictive Control | Symptom-getriebene personalisierte Protonenpumpenhemmer Therapie mit Bayesian Neural Networks und Modell Predictive Control | 利用贝耶斯神经网络和模型预测控制进行治疗 2507.09685v1 |
Authors (2): Yutong Li, Ilya Kolmanovsky
Proton Pump Inhibitors (PPIs) are the standard of care for gastric acid disorders but carry significant risks when administered chronically at high doses. Precise long-term control of gastric acidity is challenged by the impracticality of invasive gastric acid monitoring beyond 72 hours and by wide inter-patient variability. We propose a noninvasive, symptom-based framework that tailors PPI dosing based solely on patient-reported reflux and digestive symptom patterns. A Bayesian Neural Network prediction model learns to predict patient symptoms and quantifies its uncertainty from historical symptom scores, meal data, and PPI intake data. These probabilistic forecasts feed a chance-constrained Model Predictive Control (MPC) algorithm that dynamically computes future PPI doses to minimize drug usage while enforcing acid suppression with high confidence, without any direct acid measurement. In silico studies over diverse dietary schedules and virtual patient profiles demonstrate that our learning-augmented MPC reduces total PPI consumption by 65 percent compared to standard fixed regimens, while maintaining acid suppression with at least 95 percent probability. The proposed approach offers a practical path to personalized PPI therapy, minimizing treatment burden and overdose risk without invasive sensors.
nan
Article 870
Title@2025-07-13 (7): Networked Information Aggregation via Machine Learning
Title: Networked Information Aggregation via Machine Learning | Vernetzte Informationsaggregation über maschinelles Lernen | 通过机器学习建立网络信息聚合 2507.09683v1 |
Authors (3): Michael Kearns, Aaron Roth, Emily Ryu
We study a distributed learning problem in which learning agents are embedded in a directed acyclic graph (DAG). There is a fixed and arbitrary distribution over feature/label pairs, and each agent or vertex in the graph is able to directly observe only a subset of the features – potentially a different subset for every agent. The agents learn sequentially in some order consistent with a topological sort of the DAG, committing to a model mapping observations to predictions of the real-valued label. Each agent observes the predictions of their parents in the DAG, and trains their model using both the features of the instance that they directly observe, and the predictions of their parents as additional features. We ask when this process is sufficient to achieve information aggregation, in the sense that some agent in the DAG is able to learn a model whose error is competitive with the best model that could have been learned (in some hypothesis class) with direct access to all features, despite the fact that no single agent in the network has such access. We give upper and lower bounds for this problem for both linear and general hypothesis classes. Our results identify the depth of the DAG as the key parameter: information aggregation can occur over sufficiently long paths in the DAG, assuming that all of the relevant features are well represented along the path, and there are distributions over which information aggregation cannot occur even in the linear case, and even in arbitrarily large DAGs that do not have sufficient depth (such as a hub-and-spokes topology in which the spoke vertices collectively see all the features). We complement our theoretical results with a comprehensive set of experiments.
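The sequential protocol described above can be sketched on the simplest DAG, a path, with linear models fit by ordinary least squares. Everything specific here is an assumption made for illustration: the synthetic Gaussian data, one feature per agent, in-sample error, and the helper name `fit_predict`. It is not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 4
X = rng.normal(size=(n, d))                    # column k is the only feature agent k observes
true_w = np.array([1.0, -2.0, 0.5, 1.5])       # hypothetical ground-truth linear model
y = X @ true_w + 0.5 * rng.normal(size=n)

def fit_predict(features, target):
    """Least-squares fit, returning in-sample predictions."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ coef

chain_mse = []
parent_pred = None
for k in range(d):                             # agents learn in topological (path) order
    own = X[:, [k]]
    # Each agent uses its own feature plus its parent's prediction as an extra feature.
    feats = own if parent_pred is None else np.column_stack([own, parent_pred])
    parent_pred = fit_predict(feats, y)
    chain_mse.append(float(np.mean((y - parent_pred) ** 2)))

# Benchmark: one learner with direct access to all features.
full_mse = float(np.mean((y - fit_predict(X, y)) ** 2))
```

Because each agent could simply copy its parent's prediction, the in-sample error never increases along the path. With independent features as assumed here, the last agent ends up close to the all-features benchmark, in line with the abstract's point that depth is what enables aggregation.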
nan
Article 871
Title@2025-07-13 (7): Towards Reliable Forgetting: A Survey on Machine Unlearning Verification
Title: Towards Reliable Forgetting: A Survey on Machine Unlearning Verification | Zuverlässiges Vergessen: Eine Umfrage über die Überprüfung des maschinellen Lernens | 实现可靠地遗忘:关于机械不学习核查的调查 2506.15115v2 |
Authors (10): Lulu Xue, Shengshan Hu, Wei Lu, Yan Shen, Dongxu Li, Peijin Guo, Ziqi Zhou, Minghui Li, Yanjun Zhang, Leo Yu Zhang
With growing demands for privacy protection, security, and legal compliance (e.g., GDPR), machine unlearning has emerged as a critical technique for ensuring the controllability and regulatory alignment of machine learning models. However, a fundamental challenge in this field lies in effectively verifying whether unlearning operations have been successfully and thoroughly executed. Despite a growing body of work on unlearning techniques, verification methodologies remain comparatively underexplored and often fragmented. Existing approaches lack a unified taxonomy and a systematic framework for evaluation. To bridge this gap, this paper presents the first structured survey of machine unlearning verification methods. We propose a taxonomy that organizes current techniques into two principal categories – behavioral verification and parametric verification – based on the type of evidence used to assess unlearning fidelity. We examine representative methods within each category, analyze their underlying assumptions, strengths, and limitations, and identify potential vulnerabilities in practical deployment. In closing, we articulate a set of open problems in current verification research, aiming to provide a foundation for developing more robust, efficient, and theoretically grounded unlearning verification mechanisms.
nan
Article 872
Title@2025-07-13 (7): Conformal Prediction for Privacy-Preserving Machine Learning
Title: Conformal Prediction for Privacy-Preserving Machine Learning | Conformal Prediction for Privacy-Preserving Machine Learning | 隐私保护机器学习的正规预测 2507.09678v1 |
Authors (3): Alexander David Balinsky, Dominik Krzeminski, Alexander Balinsky
We investigate the integration of Conformal Prediction (CP) with supervised learning on deterministically encrypted data, aiming to bridge the gap between rigorous uncertainty quantification and privacy-preserving machine learning. Using AES-encrypted variants of the MNIST dataset, we demonstrate that CP methods remain effective even when applied directly in the encrypted domain, owing to the preservation of data exchangeability under fixed-key encryption. We compare traditional p-value-based conformal predictors against e-value-based ones. Our empirical evaluation reveals that models trained on deterministically encrypted data retain the ability to extract meaningful structure, achieving 36.88% test accuracy – significantly above random guessing (9.56%) observed with per-instance encryption. Moreover, e-value-based CP achieves predictive set coverage of over 60% with 4.3 loss-threshold calibration, correctly capturing the true label in 4888 out of 5000 test cases. In contrast, the p-value-based CP yields smaller predictive sets but with reduced coverage accuracy. These findings highlight both the promise and limitations of CP in encrypted data settings and underscore critical trade-offs between prediction set compactness and reliability.
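For readers unfamiliar with the p-value-based variant, the following is a minimal split-conformal sketch. It assumes synthetic class probabilities in place of a model trained on encrypted MNIST, and the score 1 - p(label) is one common choice rather than necessarily the paper's; `synthetic_probs` and `prediction_set` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_cal, n_test, alpha = 10, 1000, 2000, 0.1

def synthetic_probs(n):
    """Stand-in for a weak classifier: softmax of noisy logits with a bump on the true class."""
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 2.0
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True), labels

cal_probs, cal_labels = synthetic_probs(n_cal)
test_probs, test_labels = synthetic_probs(n_test)

# Nonconformity score of each calibration example at its true label.
cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

def prediction_set(probs_row):
    """Keep every candidate label whose conformal p-value exceeds alpha."""
    scores = 1.0 - probs_row
    at_least_as_strange = (cal_scores[None, :] >= scores[:, None]).sum(axis=1)
    p_values = (1 + at_least_as_strange) / (n_cal + 1)
    return np.flatnonzero(p_values > alpha)

sets = [prediction_set(row) for row in test_probs]
coverage = float(np.mean([label in s for label, s in zip(test_labels, sets)]))
avg_size = float(np.mean([len(s) for s in sets]))
```

Under exchangeability of calibration and test data, the true label lands in the set with probability at least 1 - alpha, which is the property the abstract argues survives fixed-key encryption. The average set size is the compactness side of the trade-off.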
nan
Article 873
Title@2025-07-13 (7): Fine-tuning Large Language Model for Automated Algorithm Design
Title: Fine-tuning Large Language Model for Automated Algorithm Design | Feinabstimmung Großsprachiges Modell für automatisiertes Algorithmen-Design | 自动算法设计大语言模型 2507.10614v1 |
Authors (5): Fei Liu, Rui Zhang, Xi Lin, Zhichao Lu, Qingfu Zhang
The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained and how well can they generalize across different algorithm design tasks? In this paper, we take a first step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, then we leverage direct preference optimization to efficiently align LLM outputs with task objectives. Our experiments, conducted on Llama-3.2-1B-Instruct and Llama-3.1-8B-Instruct, span three distinct algorithm design tasks. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts, with the smaller fine-tuned Llama-3.2-1B-Instruct matching the larger Llama-3.1-8B-Instruct on the admissible set problem. Moreover, we observe promising generalization: LLMs fine-tuned on specific algorithm design tasks also improve performance on related tasks with varying settings. These findings highlight the value of task-specific adaptation for LLMs in algorithm design and open new avenues for future research.
nan
Article 874
Title@2025-07-13 (7): Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
Title: Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs | Sub-Scaling-Gesetze: Zur Rolle der Datendichte und Ausbildungsstrategien in LLMs | 次级衡量法律:关于数据密度的作用和培训战略 2507.10613v1 |
Authors (8): Zhengyu Chen, Siqi Wang, Teng Xiao, Yudong Wang, Shiqi Chen, Xunliang Cai, Junxian He, Jingang Wang
Traditional scaling laws in natural language processing suggest that increasing model size and training data enhances performance. However, recent studies reveal deviations, particularly in large language models, where performance improvements decelerate, a phenomenon known as sub-scaling. This paper revisits these scaling laws by examining the impact of data quality and training strategies on model performance. Through extensive empirical analysis of over 400 models, we identify high data density and non-optimal resource allocation as key factors contributing to sub-scaling. High data density leads to diminishing returns due to redundant information, while optimal resource allocation is crucial for sustained performance improvements. We propose a sub-optimal scaling law that better predicts performance in sub-scaling regimes, highlighting the importance of data quality and diversity.
nan
Article 875
Title@2025-07-13 (7): Machine-Precision Prediction of Low-Dimensional Chaotic Systems
Title: Machine-Precision Prediction of Low-Dimensional Chaotic Systems | Maschinenpräzisionsvorhersage von niederdimensionalen Chaotischen Systemen | 低多功能卫生系统机器精确预测 2507.09652v1 |
Authors (2): Christof Schötz, Niklas Boers
Low-dimensional chaotic systems such as the Lorenz-63 model are commonly used to benchmark system-agnostic methods for learning dynamics from data. Here we show that learning from noise-free observations in such systems can be achieved up to machine precision: using ordinary least squares regression on high-degree polynomial features with 512-bit arithmetic, our method exceeds the accuracy of standard 64-bit numerical ODE solvers of the true underlying dynamical systems. Depending on the configuration, we obtain valid prediction times of 32 to 105 Lyapunov times for the Lorenz-63 system, dramatically outperforming prior work that reaches 13 Lyapunov times at most. We further validate our results on Thomas’ Cyclically Symmetric Attractor, a non-polynomial chaotic system that is considerably more complex than the Lorenz-63 model, and show that similar results extend also to higher dimensions using the spatiotemporally chaotic Lorenz-96 model. Our findings suggest that learning low-dimensional chaotic systems from noise-free data is a solved problem.
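A heavily simplified sketch of the idea, polynomial features plus ordinary least squares, is shown below. It departs from the abstract in several labeled ways: it uses standard 64-bit floats rather than 512-bit arithmetic, degree-2 rather than high-degree features, and it regresses the exact Lorenz-63 vector field at randomly sampled states instead of learning from trajectory observations. The parameter values are the commonly used ones and are assumed here.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0      # classic Lorenz-63 parameters (assumed)

def lorenz_rhs(s):
    """Right-hand side of the Lorenz-63 ODE for a batch of states of shape (n, 3)."""
    x, y, z = s[:, 0], s[:, 1], s[:, 2]
    return np.column_stack([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def poly2_features(s):
    """All monomials of degree <= 2 in (x, y, z): 1, x, y, z, x^2, xy, xz, y^2, yz, z^2."""
    x, y, z = s[:, 0], s[:, 1], s[:, 2]
    return np.column_stack([np.ones_like(x), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z])

rng = np.random.default_rng(0)
states = rng.uniform([-20, -20, 0], [20, 20, 50], size=(500, 3))   # noise-free samples

# Ordinary least squares: one coefficient column per state derivative.
coef, *_ = np.linalg.lstsq(poly2_features(states), lorenz_rhs(states), rcond=None)

# Ground-truth coefficients in the same feature order, for comparison.
expected = np.zeros((10, 3))
expected[1, 0], expected[2, 0] = -SIGMA, SIGMA
expected[1, 1], expected[2, 1], expected[6, 1] = RHO, -1.0, -1.0
expected[3, 2], expected[5, 2] = -BETA, 1.0
max_coef_error = float(np.max(np.abs(coef - expected)))
```

Because Lorenz-63 is itself a degree-2 polynomial system, noise-free regression recovers its coefficients up to floating-point rounding. Extending valid prediction times the way the abstract describes is where the higher-precision arithmetic comes in.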
nan
Article 876
Title@2025-07-13 (7): Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Title: Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset | Pluralismus in der algorithmischen Monokultur kultivieren: Der Datensatz zur Gemeinschaftsausrichtung | 在高农业单体养殖中培养多元主义:社区协调数据集 2507.09650v1 |
Authors (14): Lily Hong Zhang, Smitha Milli, Karen Jusko, Jonathan Smith, Brandon Amos, Wassim Bouaziz, Manon Revel, Jack Kussman, Lisa Titus, Bhaktipriya Radharapu, Jane Yu, Vidya Sarma, Kris Rose, Maximilian Nickel
How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? To make progress on this challenge, this paper establishes four key results. First, we demonstrate, through a large-scale multilingual human study with representative samples from five countries (N=15,000), that humans exhibit significantly more variation in preferences than the responses of 21 state-of-the-art LLMs. Second, we show that existing methods for preference dataset collection are insufficient for learning the diversity of human preferences even along two of the most salient dimensions of variability in global values, due to the underlying homogeneity of candidate responses. Third, we argue that this motivates the need for negatively-correlated sampling when generating candidate sets, and we show that simple prompt-based techniques for doing so significantly enhance the performance of alignment methods in learning heterogeneous preferences. Fourth, based on this novel candidate sampling approach, we collect and open-source Community Alignment, the largest and most representative multilingual and multi-turn preference dataset to date, featuring almost 200,000 comparisons from annotators spanning five countries. We hope that the Community Alignment dataset will be a valuable resource for improving the effectiveness of LLMs for a diverse global population.
nan
Article 877
Title@2025-07-13 (7): Learning Flexible Forward Trajectories for Masked Molecular Diffusion
Title: Learning Flexible Forward Trajectories for Masked Molecular Diffusion | Flexible Forward-Trajektorien für maskierte molekulare Diffusion lernen | 蒙面分子扩散学习灵活前向轨迹 2505.16790v3 |
Authors (4): Hyunjin Seo, Taewon Kim, Sihyun Yu, SungSoo Ahn
Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore their potential and introduce the surprising result that naively applying standard MDMs severely degrades performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, resulting in a mixture of reconstruction targets that cannot be learned using a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collision between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to individual graph elements, i.e., atoms and bonds. Extensive experiments on diverse molecular benchmarks reveal that MELD markedly enhances overall generation quality compared to element-agnostic noise scheduling, increasing the chemical validity of vanilla MDMs on ZINC250K from 15% to 93%. Furthermore, it achieves state-of-the-art property alignment in conditional generation tasks.
nan
Article 878
Title@2025-07-13 (7): Disentanglement and Assessment of Shortcuts in Ophthalmological Retinal Imaging Exams
Title: Disentanglement and Assessment of Shortcuts in Ophthalmological Retinal Imaging Exams | Entflechtung und Beurteilung von Abkürzungen bei ophthalmologischen Retina-Imaging-Prüfungen | 眼视视网膜成像Exams 中快捷键的分解和评估 2507.09640v1 |
Authors (5): Leonor Fernandes, Tiago Gonçalves, João Matos, Luis Filipe Nakayama, Jaime S. Cardoso
Diabetic retinopathy (DR) is a leading cause of vision loss in working-age adults. While screening reduces the risk of blindness, traditional imaging is often costly and inaccessible. Artificial intelligence (AI) algorithms present a scalable diagnostic solution, but concerns regarding fairness and generalization persist. This work evaluates the fairness and performance of image-trained models in DR prediction, as well as the impact of disentanglement as a bias mitigation technique, using the diverse mBRSET fundus dataset. Three models, ConvNeXt V2, DINOv2, and Swin V2, were trained on macula images to predict DR and sensitive attributes (SAs) (e.g., age and gender/sex). Fairness was assessed between subgroups of SAs, and disentanglement was applied to reduce bias. All models achieved high DR prediction performance (up to 94% AUROC) and could reasonably predict age and gender/sex (91% and 77% AUROC, respectively). Fairness assessment suggests disparities, such as a 10% AUROC gap between age groups in DINOv2. Disentangling SAs from DR prediction had varying results, depending on the model selected. Disentanglement improved DINOv2 performance (2% AUROC gain), but led to performance drops in ConvNeXt V2 and Swin V2 (7% and 3%, respectively). These findings highlight the complexity of disentangling fine-grained features in fundus imaging and emphasize the importance of fairness in medical imaging AI to ensure equitable and reliable healthcare solutions.
nan
Article 879
Title@2025-07-13 (7): Limits of Discrete Energy of Families of Increasing Sets
Title: Limits of Discrete Energy of Families of Increasing Sets | Grenzen der diskreten Energie von Familien zunehmender Sets | 增加组家庭不同能源限度的限制 2504.11302v3 |
Authors (1): Hari Sarang Nathan
The Hausdorff dimension of a set can be detected using the Riesz energy. Here, we consider situations where a sequence of points, $\{x_n\}$, "fills in" a set $E \subset \mathbb{R}^d$ in an appropriate sense and investigate the degree to which the discrete analog to the Riesz energy of these sets can be used to bound the Hausdorff dimension of $E$. We also discuss applications to data science and Erdős/Falconer type problems.
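For orientation, the classical continuous statement and one natural discrete counterpart can be written as follows. The first line is the standard energy characterization of Hausdorff dimension for a Borel set $E$. The $1/N^2$ normalization in the second is an assumption made here for illustration and may differ from the paper's definition.

```latex
% Riesz s-energy of a Borel probability measure \mu, and the energy
% characterization of Hausdorff dimension (E a Borel set)
I_s(\mu) = \iint \frac{d\mu(x)\, d\mu(y)}{|x - y|^{s}}, \qquad
\dim_H E = \sup \{\, s \ge 0 : I_s(\mu) < \infty \text{ for some } \mu \text{ supported on } E \,\}.

% A discrete analog for the first N points of the sequence (normalization assumed)
E_s(x_1, \dots, x_N) = \frac{1}{N^2} \sum_{\substack{i, j = 1 \\ i \neq j}}^{N} \frac{1}{|x_i - x_j|^{s}}.
```

Under this reading, the question in the abstract is how the growth of $E_s(x_1, \dots, x_N)$ as $N \to \infty$ constrains $\dim_H E$.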
nan
Article 880
Title@2025-07-13 (7): Lightweight Deep Learning-Based Channel Estimation for RIS-Aided Extremely Large-Scale MIMO Systems on Resource-Limited Edge Devices
Title: Lightweight Deep Learning-Based Channel Estimation for RIS-Aided Extremely Large-Scale MIMO Systems on Resource-Limited Edge Devices | Leichte Deep Learning-basierte Kanalschätzung für RIS-geförderte extrem großräumige MIMO-Systeme auf ressourcenschonenden Edge-Geräten | 对资源限制边缘装置的RIS帮助极大型IMIM系统进行基于深深学习的频道估计 2507.09627v1 |
Authors (3): Muhammad Kamran Saeed, Ashfaq Khokhar, Shakil Ahmed
Next-generation wireless technologies such as 6G aim to meet demanding requirements such as ultra-high data rates, low latency, and enhanced connectivity. Extremely Large-Scale MIMO (XL-MIMO) and Reconfigurable Intelligent Surface (RIS) are key enablers, with XL-MIMO boosting spectral and energy efficiency through numerous antennas, and RIS offering dynamic control over the wireless environment via passive reflective elements. However, realizing their full potential depends on accurate Channel State Information (CSI). Recent advances in deep learning have facilitated efficient cascaded channel estimation. Still, the scalability and practical deployment of existing estimation models in XL-MIMO systems remain limited. The growing number of antennas and RIS elements introduces a significant barrier to real-time and efficient channel estimation, drastically increasing data volume, escalating computational complexity, requiring advanced hardware, and resulting in substantial energy consumption. To address these challenges, we propose a lightweight deep learning framework for efficient cascaded channel estimation in XL-MIMO systems, designed to minimize computational complexity and make it suitable for deployment on resource-constrained edge devices. Using spatial correlations in the channel, we introduce a patch-based training mechanism that reduces the dimensionality of the input to patch-level representations while preserving essential information, allowing scalable training for large-scale systems. Simulation results under diverse conditions demonstrate that our framework significantly improves estimation accuracy and reduces computational complexity, regardless of the increasing number of antennas and RIS elements in XL-MIMO systems.
nan
Article 881
Title@2025-07-13 (7): WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models
Title: WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models | WeGeFT: Gewicht-Generative Feintuning für die effiziente Anpassung großer Modelle | WeGeFT: 使大型模型的多面高效适应的重量-弹性微调 2312.00700v5 |
Authors (3): Chinmay Savadikar, Xi Song, Tianfu Wu
Fine-tuning large pretrained Transformer models can focus on either introducing a small number of new learnable parameters (parameter efficiency) or editing representations of a small number of tokens using lightweight modules (representation efficiency). While the pioneering method LoRA (Low-Rank Adaptation) inherently balances parameter, compute, and memory efficiency, many subsequent variants trade off compute and memory efficiency and/or performance to further reduce fine-tuning parameters. To address this limitation and unify parameter-efficient and representation-efficient fine-tuning, we propose Weight-Generative Fine-Tuning (WeGeFT, pronounced wee-gift), a novel approach that learns to generate fine-tuning weights directly from the pretrained weights. WeGeFT employs a simple low-rank formulation consisting of two linear layers, either shared across multiple layers of the pretrained model or individually learned for different layers. This design achieves multi-faceted efficiency in parameters, representations, compute, and memory, while maintaining or exceeding the performance of LoRA and its variants. Extensive experiments on commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition verify the effectiveness of our proposed WeGeFT. Our code is available at https://github.com/savadikarc/wegeft
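One plausible reading of "generate fine-tuning weights directly from the pretrained weights" with "two linear layers" is a weight update of the form ΔW = (W A) B, where W is frozen and only the small matrices A and B are trained. The sketch below illustrates that reading in NumPy. The shapes, the initialization, and the name `generated_delta` are assumptions, and the repository linked in the abstract should be consulted for the actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 32, 48, 4
W = rng.normal(size=(d_out, d_in))                 # frozen pretrained weight

A = rng.normal(size=(d_in, rank)) * 0.01           # first linear layer (trainable)
B = np.zeros((rank, d_in))                         # second linear layer, zero at init (assumed)

def generated_delta(W, A, B):
    """Weight update produced from the pretrained weight itself: (W A) B."""
    return (W @ A) @ B

delta_at_init = generated_delta(W, A, B)           # all zeros, so the model starts unchanged

B_trained = rng.normal(size=(rank, d_in)) * 0.01   # stand-in for B after some training
delta = generated_delta(W, A, B_trained)
adapted_W = W + delta

delta_rank = int(np.linalg.matrix_rank(delta))
trainable_params = A.size + B.size                 # 2 * d_in * rank
```

In this reading the update has rank at most `rank`, as in LoRA, but it depends on W. Since A and B have no `d_out` dimension, the same pair could in principle be shared across layers whose weights share the input width, matching the sharing option mentioned in the abstract.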
Article 882
Title@2025-07-13 (7): CAN-Trace Attack: Exploit CAN Messages to Uncover Driving Trajectories
Title: CAN-Trace Attack: Exploit CAN Messages to Uncover Driving Trajectories | CAN-Trace Attack: CAN-Nachrichten nutzen, um Fahrbahnen zu entdecken | Can- Trace 攻击: 将 CAN 信件开发到无法覆盖的驱动轨迹 2507.09624v1 |
Authors (7): Xiaojie Lin, Baihe Ma, Xu Wang, Guangsheng Yu, Ying He, Wei Ni, Ren Ping Liu
Driving trajectory data remains vulnerable to privacy breaches despite existing mitigation measures. Traditional methods for detecting driving trajectories typically rely on map-matching the path using Global Positioning System (GPS) data, which is susceptible to GPS data outage. This paper introduces CAN-Trace, a novel privacy attack mechanism that leverages Controller Area Network (CAN) messages to uncover driving trajectories, posing a significant risk to drivers’ long-term privacy. A new trajectory reconstruction algorithm is proposed to transform the CAN messages, specifically vehicle speed and accelerator pedal position, into weighted graphs accommodating various driving statuses. CAN-Trace identifies driving trajectories using graph-matching algorithms applied to the created graphs in comparison to road networks. We also design a new metric to evaluate matched candidates, which allows for potential data gaps and matching inaccuracies. Empirical validation under various real-world conditions, encompassing different vehicles and driving regions, demonstrates the efficacy of CAN-Trace: it achieves an attack success rate of up to 90.59% in the urban region, and 99.41% in the suburban region.
Article 883
Title@2025-07-13 (7): The Full-scale Assembly Simulation Testbed (FAST) Dataset
Title: The Full-scale Assembly Simulation Testbed (FAST) Dataset | Der Full-Scale Assembly Simulation Testbed (FAST) Datensatz | 全规模大会模拟模拟试验数据集 2403.08969v2 |
Authors (5): Alec G. Moore, Tiffany D. Do, Nayan N. Chawla, Antonia Jimenez Iriarte, Ryan P. McMahan
In recent years, numerous researchers have begun investigating how virtual reality (VR) tracking and interaction data can be used for a variety of machine learning purposes, including user identification, predicting cybersickness, and estimating learning gains. One constraint for this research area is the dearth of open datasets. In this paper, we present a new open dataset captured with our VR-based Full-scale Assembly Simulation Testbed (FAST). This dataset consists of data collected from 108 participants (50 females, 56 males, 2 non-binary) learning how to assemble two distinct full-scale structures in VR. In addition to explaining how the dataset was collected and describing the data included, we discuss how the dataset may be used by future researchers.
Article 884
Title@2025-07-13 (7): Regret Analysis of Policy Optimization over Submanifolds for Linearly Constrained Online LQG
Title: Regret Analysis of Policy Optimization over Submanifolds for Linearly Constrained Online LQG | Bedauerliche Analyse der Politikoptimierung über Submanifolds für linear eingeschränkte Online LQG | 对线性受约束在线LQG对潜艇皮带政策优化的遗憾分析 2403.08553v2 |
Authors (2): Ting-Jui Chang, Shahin Shahrampour
Recent advances in online optimization and control have provided novel tools to study online linear quadratic regulator (LQR) problems, where cost matrices are time-varying and unknown in advance. In this work, we study the online linear quadratic Gaussian (LQG) problem over the manifold of stabilizing controllers that are linearly constrained to impose physical conditions such as sparsity. By adopting a Riemannian perspective, we propose the online Newton on manifold (ONM) algorithm, which generates an online controller on-the-fly based on the second-order information of the cost function sequence. To quantify the algorithm's performance, we use the notion of regret, defined as the sub-optimality of the algorithm's cumulative cost against a (locally) minimizing controller sequence. We establish a regret bound in terms of the path-length of the benchmark minimizer sequence, and we further verify the effectiveness of ONM via simulations.
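In symbols, the regret and path-length described above can be written as follows. The notation is assumed for illustration and is not taken from the paper: $J_t$ is the cost at round $t$, $K_t$ the controller played by the algorithm, $K_t^\star$ a (locally) minimizing controller for $J_t$, and $d(\cdot,\cdot)$ a distance on the controller manifold.

$$
\mathrm{Regret}_T = \sum_{t=1}^{T} J_t(K_t) - \sum_{t=1}^{T} J_t(K_t^\star),
\qquad
P_T = \sum_{t=2}^{T} d\left(K_t^\star, K_{t-1}^\star\right)
$$

A bound stated in terms of $P_T$ means that a slowly drifting benchmark sequence (small $P_T$) yields small regret.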
Article 885
Title@2025-07-13 (7): MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression
Title: MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression | MLoRQ: Bridging Low-Rank und Quantisierung für Transformer-Kompression | MLORQ: 连接低兰克和变压压缩量化 2507.09616v1 |
Authors (6): Ofir Gordon, Ariel Lapid, Elad Cohen, Yarden Yagil, Arnon Netzer, Hai Victor Habi
Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge. This challenge is often addressed through various techniques, such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel method that integrates both techniques. MLoRQ employs a two-stage optimization process to determine optimal bit-width and rank assignments for each layer, adhering to predefined memory constraints. This process includes: (i) an intra-layer optimization that identifies potentially optimal compression solutions out of all low-rank and quantization combinations; (ii) an inter-layer optimization that assigns bit-width precision and rank to each layer while ensuring the memory constraint is met. An optional final step applies a sequential optimization process using a modified adaptive rounding technique to mitigate compression-induced errors in joint low-rank approximation and quantization. The method is compatible with most existing quantization algorithms and can be seamlessly integrated with them. MLoRQ shows state-of-the-art results with up to 15% performance improvement, evaluated on Vision Transformers for image classification, object detection, and instance segmentation tasks.
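To make the memory side of the rank and bit-width trade-off concrete, here is a minimal sketch of the storage cost of one layer. It assumes a rank-`r` factorization into `U` (rows x r) and `V` (r x cols) stored at one shared bit-width, and it ignores quantization scales and zero-points. The function names are hypothetical, and the paper's actual two-stage search is not reproduced here.

```python
def dense_bits(rows, cols, bit_width):
    """Bits needed to store a dense rows x cols weight matrix."""
    return rows * cols * bit_width


def low_rank_bits(rows, cols, rank, bit_width):
    """Bits needed to store factors U (rows x rank) and V (rank x cols)."""
    return rank * (rows + cols) * bit_width


def feasible_assignments(rows, cols, ranks, bit_widths, budget_bits):
    """List the (rank, bit_width) pairs whose storage fits within the budget."""
    return [
        (r, b)
        for r in ranks
        for b in bit_widths
        if low_rank_bits(rows, cols, r, b) <= budget_bits
    ]


print(dense_bits(100, 100, 8))                                 # dense 8-bit layer
print(low_rank_bits(100, 100, 10, 8))                          # rank-10, 8-bit factors
print(feasible_assignments(100, 100, [10, 50], [4, 8], 20000))
```

A real search would also score each feasible pair by the error it induces; the sketch only covers the constraint check.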
Article 886
Title@2025-07-13 (7): Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior
Title: Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior | Ihre absorbierende Diskrete Diffusion heimlich Modelle der Bayesian Posterior | 您的吸收分解扩散秘密模型 贝叶斯波斯别墅 2507.07586v2 |
Authors (1): Cooper Doyle
Discrete diffusion language models learn to reconstruct text from randomly masked inputs, yet under mild assumptions their denoiser already implements the exact Bayesian posterior over the original tokens. We prove that the expected denoiser output under the forward corruption distribution recovers the true posterior, and that a simple Monte Carlo estimator converges to this posterior at rate O(1/sqrt(K)) with finite-sample concentration bounds. Building on this insight, we introduce an inference-time ensemble that runs K independent denoising passes and aggregates both posterior means and variances without any extra training. On WikiText-2, our MC-marginal sampler recovers the analytic lambda-DCE zero-shot perplexity (approximately 39) to within a few points at K=128, and its per-token variance shows a strong rank correlation with reconstruction error (Spearman rho = 0.996). This cost-proportional procedure yields calibrated uncertainty estimates and a direct trade-off between compute and posterior fidelity in discrete diffusion LMs.
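The inference-time ensemble can be sketched with a toy stand-in for the denoiser. Everything below is an assumption made for illustration: `noisy_denoiser` is a hypothetical function that returns an unbiased but noisy estimate of a single token probability, and the uniform noise model is arbitrary. The sketch only shows how averaging K independent passes yields a mean and a variance, and how the standard error shrinks like 1/sqrt(K).

```python
import random
import statistics


def noisy_denoiser(true_prob, rng, noise=0.2):
    """Hypothetical single denoising pass: unbiased, noisy estimate of a probability."""
    return true_prob + rng.uniform(-noise, noise)


def mc_marginal(true_prob, k, rng):
    """Aggregate K independent passes into a sample mean and sample variance."""
    samples = [noisy_denoiser(true_prob, rng) for _ in range(k)]
    return statistics.fmean(samples), statistics.variance(samples)


rng = random.Random(0)
for k in (8, 128, 2048):
    mean, var = mc_marginal(0.7, k, rng)
    # The standard error of the mean is sqrt(var / k), i.e. O(1/sqrt(K)).
    print(k, abs(mean - 0.7), (var / k) ** 0.5)
```

The per-sample variance stays roughly constant as K grows, while the error of the averaged estimate falls, which is the compute-versus-fidelity trade-off the abstract describes.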
Article 887
Title@2025-07-13 (7): Prediction-Augmented Mechanism Design for Weighted Facility Location
Title: Prediction-Augmented Mechanism Design for Weighted Facility Location | Voraussichtlicher Mechanismus für den Standort der gewichteten Fazilität | 加权设施位置设计 2507.06509v3 |
Authors (2): Yangguang Shi, Zhenyu Xue
Facility location is fundamental in operations research, mechanism design, and algorithmic game theory, with applications ranging from urban infrastructure planning to distributed systems. Recent research in this area has focused on augmenting classic strategyproof mechanisms with predictions to achieve an improved performance guarantee against the uncertainty of the strategic environment. Previous work has been devoted to addressing the trade-off between consistency (near-optimality under accurate predictions) and robustness (bounded inefficiency under poor predictions), primarily in the unweighted setting, which assumes that all agents have the same importance. However, this assumption may not hold in some practical scenarios, which motivates research on weighted facility location problems. The major contribution of the current work is a prediction-augmented algorithmic framework for balancing consistency and robustness over strategic agents with non-uniform weights. In particular, through a reduction technique that identifies a subset of representative instances and maps the other given locations to the representative ones, we prove that there exists a strategyproof mechanism achieving a bounded consistency guarantee of $\frac{\sqrt{(1+c)^2W_{\min}^2+(1-c)^2W_{\max}^2}}{(1+c)W_{\min}}$ and a bounded robustness guarantee of $\frac{\sqrt{(1-c)^2W_{\min}^2+(1+c)^2W_{\max}^2}}{(1-c)W_{\min}}$ in weighted settings, where $c$ can be viewed as a parameter that trades off consistency against robustness, and $W_{\min}$ and $W_{\max}$ denote the minimum and maximum agent weights. We also prove that there is no strategyproof deterministic mechanism that reaches $1$-consistency and $O\left( n \cdot \frac{W_{\max}}{W_{\min}} \right)$-robustness in weighted FLP, even with full predictions of all agents.
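The two guarantees can be evaluated numerically to see how $c$ trades one for the other. The sketch below simply codes the two expressions from the abstract; the function names are hypothetical, and the restriction to `0 <= c < 1` is an assumption made so that the robustness denominator stays positive.

```python
import math


def consistency_bound(c, w_min, w_max):
    """sqrt((1+c)^2 W_min^2 + (1-c)^2 W_max^2) / ((1+c) W_min)."""
    return math.sqrt((1 + c) ** 2 * w_min ** 2 + (1 - c) ** 2 * w_max ** 2) / ((1 + c) * w_min)


def robustness_bound(c, w_min, w_max):
    """sqrt((1-c)^2 W_min^2 + (1+c)^2 W_max^2) / ((1-c) W_min)."""
    return math.sqrt((1 - c) ** 2 * w_min ** 2 + (1 + c) ** 2 * w_max ** 2) / ((1 - c) * w_min)


# Sweep the trade-off parameter for weights in [1, 2].
for c in (0.0, 0.25, 0.5, 0.75):
    print(c, consistency_bound(c, 1.0, 2.0), robustness_bound(c, 1.0, 2.0))
```

With equal weights and $c = 0$ both expressions reduce to $\sqrt{2}$; increasing $c$ lowers the consistency bound and raises the robustness bound.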
Article 888
Title@2025-07-13 (7): DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences
Title: DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences | DRAGD: Ein Federated Unlearning Data Reconstruction Attack basierend auf gradienten Unterschieden | DRADD:基于渐变差异的联合会不学习数据重建攻击 2507.09602v1 |
Authors (4): Bocheng Ju, Junchao Fan, Jiaqi Liu, Xiaolin Chang
Federated learning enables collaborative machine learning while preserving data privacy. However, the rise of federated unlearning, designed to allow clients to erase their data from the global model, introduces new privacy concerns. Specifically, the gradient exchanges during the unlearning process can leak sensitive information about deleted data. In this paper, we introduce DRAGD, a novel attack that exploits gradient discrepancies before and after unlearning to reconstruct forgotten data. We also present DRAGDP, an enhanced version of DRAGD that leverages publicly available prior data to improve reconstruction accuracy, particularly for complex datasets like facial images. Extensive experiments across multiple datasets demonstrate that DRAGD and DRAGDP significantly outperform existing methods in data reconstruction. Our work highlights a critical privacy vulnerability in federated unlearning and offers a practical solution, advancing the security of federated unlearning systems in real-world applications.
Article 889
Title@2025-07-13 (7): Denoising and Reconstruction of Nonlinear Dynamics using Truncated Reservoir Computing
Title: Denoising and Reconstruction of Nonlinear Dynamics using Truncated Reservoir Computing | Denoising und Rekonstruktion von nichtlinearen Dynamiken mit verkürztem Reservoir Computing | 使用流动储量计算法进行非线性动态的衰减和重建 2504.13355v2 |
Authors (4): Omid Sedehi, Manish Yadav, Merten Stender, Sebastian Oberst
Measurements acquired from distributed physical systems are often sparse and noisy. Therefore, signal processing and system identification tools are required to mitigate noise effects and reconstruct unobserved dynamics from limited sensor data. However, this process is particularly challenging because the fundamental equations governing the dynamics are largely unavailable in practice. Reservoir Computing (RC) techniques have shown promise in efficiently simulating dynamical systems through an unstructured and efficient computation graph comprising a set of neurons with random connectivity. However, the potential of RC to operate in noisy regimes and distinguish noise from the primary smooth or non-smooth deterministic dynamics of the system has not been fully explored. This paper presents a novel RC method for noise filtering and reconstructing unobserved nonlinear dynamics, offering a novel learning protocol associated with hyperparameter optimization. The performance of the RC in terms of noise intensity, noise frequency content, and drastic shifts in dynamical parameters is studied in two illustrative examples involving the nonlinear dynamics of the Lorenz attractor and the adaptive exponential integrate-and-fire system. It is demonstrated that denoising performance improves by truncating redundant nodes and edges of the reservoir, as well as by properly optimizing hyperparameters, such as the leakage rate, spectral radius, input connectivity, and ridge regression parameter. Furthermore, the presented framework shows good generalization behavior when tested for reconstructing unseen and qualitatively different attractors. Compared to the extended Kalman filter, the presented RC framework yields competitive accuracy at low signal-to-noise ratios and high-frequency ranges.
Article 890
Title@2025-07-13 (7): Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems
Title: Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems | Offline-Metriken identifizieren, die Online-Impact voraussagen: Eine Pragmatische Strategie für Real-World-Empfängersysteme | 查明预测在线影响的离线下矩阵:现实世界建议系统实用战略 2507.09566v1 |
Authors (2): Timo Wilm, Philipp Normann
A critical challenge in recommender systems is to establish reliable relationships between offline and online metrics that predict real-world performance. Motivated by recent advances in Pareto front approximation, we introduce a pragmatic strategy for identifying offline metrics that align with online impact. A key advantage of this approach is its ability to simultaneously serve multiple test groups, each with distinct offline performance metrics, in an online experiment controlled by a single model. The method is model-agnostic for systems with a neural network backbone, enabling broad applicability across architectures and domains. We validate the strategy through a large-scale online experiment in the field of session-based recommender systems on the OTTO e-commerce platform. The online experiment identifies significant alignments between offline metrics and real-world click-through rate, post-click conversion rate, and units sold. Our strategy provides industry practitioners with a valuable tool for understanding offline-to-online metric relationships and making informed, data-driven decisions.
Article 891
Title@2025-07-13 (7): CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Title: CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning | CATP-LLM: Große Sprachmodelle für die kostenbewusste Werkzeugplanung | CATP-LLM:增强成本软件工具规划大语言模型能力 2411.16313v3 |
Authors (6): Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, Zhi Wang
Utilizing large language models (LLMs) for tool planning has emerged as a promising avenue for developing general AI systems, where LLMs automatically schedule external tools (e.g., vision models) to tackle complex tasks based on task descriptions. To push this paradigm toward practical applications, it is crucial for LLMs to consider tool execution costs (e.g., execution time) for tool planning. Unfortunately, prior studies overlook the tool execution costs, leading to the generation of expensive plans whose costs outweigh their benefits in terms of task performance. To fill this gap, we propose the Cost-Aware Tool Planning with LLMs (CATP-LLM) framework, which for the first time provides a coherent design to empower LLMs for cost-aware tool planning. Specifically, to facilitate efficient concurrent tool execution and cost reduction, we design a tool planning language to enhance the LLM for creating multi-branch non-sequential plans. Moreover, we propose a cost-aware offline reinforcement learning algorithm to fine-tune the LLM to optimize the performance-cost trade-off in tool planning. In the absence of public cost-related datasets, we further present OpenCATP, the first dataset for cost-aware planning, which comprises 11,100 evaluation samples from diverse tasks. Extensive experiments show that CATP-LLM outperforms GPT-4 even when using Llama2-7B as its backbone, with an average improvement of 1.5%-93.9% in terms of plan quality. Code and dataset are available at: https://github.com/duowuyms/OpenCATP-LLM.
Article 892
Title@2025-07-13 (7): A modular framework for automated evaluation of procedural content generation in serious games with deep reinforcement learning agents
Title: A modular framework for automated evaluation of procedural content generation in serious games with deep reinforcement learning agents | Ein modularer Rahmen für die automatisierte Bewertung der verfahrenstechnischen Inhaltsgenerierung in ernsten Spielen mit Deep-Enforcement-Learning-Agenten | 与深强化学习机构一起对严重游戏的程序内容生成进行自动自动评价的模块化框架 2505.16801v2 |
Authors (5): Eleftherios Kalafatis, Konstantinos Mitsis, Konstantia Zarkogianni, Maria Athanasiou, Konstantina Nikita
Serious Games (SGs) are nowadays shifting focus to include procedural content generation (PCG) in the development process as a means of offering personalized and enhanced player experience. However, the development of a framework to assess the impact of PCG techniques when integrated into SGs remains particularly challenging. This study proposes a methodology for automated evaluation of PCG integration in SGs, incorporating deep reinforcement learning (DRL) game testing agents. To validate the proposed framework, a previously introduced SG featuring card game mechanics and incorporating three different versions of PCG for nonplayer character (NPC) creation has been deployed. Version 1 features random NPC creation, while Versions 2 and 3 utilize a genetic algorithm approach. These versions are used to test the impact of different dynamic SG environments on the proposed framework’s agents. The obtained results highlight the superiority of the DRL game testing agents trained on Versions 2 and 3 over those trained on Version 1 in terms of win rate (i.e., number of wins per game played) and training time. More specifically, within the execution of a test emulating regular gameplay, both Versions 2 and 3 peaked at a 97% win rate and achieved statistically significantly higher (p=0009) win rates than Version 1, which peaked at 94%. Overall, the results support the proposed framework’s capability to produce meaningful data for the evaluation of procedurally generated content in SGs.
Article 893
Title@2025-07-13 (7): Is Intermediate Fusion All You Need for UAV-based Collaborative Perception?
Title: Is Intermediate Fusion All You Need for UAV-based Collaborative Perception? | Ist Intermediate Fusion alles, was Sie für UAV-basierte Collaborative Perception benötigen? | 中间融合 需要所有你 以无人驾驶飞行器为基础的协作感知? 2504.21774v2 |
Authors (7): Jiuwu Hao, Liguo Sun, Yuting Wan, Yueyang Wu, Ti Xiang, Haolin Song, Pin Lv
Collaborative perception enhances environmental awareness through inter-agent communication and is regarded as a promising solution for intelligent transportation systems. However, existing collaborative methods for Unmanned Aerial Vehicles (UAVs) overlook the unique characteristics of the UAV perspective, resulting in substantial communication overhead. To address this issue, we propose a novel communication-efficient collaborative perception framework based on late-intermediate fusion, dubbed LIF. The core concept is to exchange informative and compact detection results and shift the fusion stage to the feature representation level. In particular, we leverage vision-guided positional embedding (VPE) and box-based virtual augmented feature (BoBEV) to effectively integrate complementary information from various agents. Additionally, we introduce an uncertainty-driven communication mechanism that uses uncertainty evaluation to select high-quality and reliable shared areas. Experimental results demonstrate that our LIF achieves superior performance with minimal communication bandwidth, proving its effectiveness and practicality. Code and models are available at https://github.com/uestchjw/LIF.
Article 894
Title@2025-07-13 (7): Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes
Title: Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes | Beschreibung des Ausbildungsprozesses von Neuronalen Netzwerken über Ergodic Theorem : Geisterknoten | 描述Ergodic定理神经网络培训过程:幽灵节点 2507.01003v3 |
Authors (2): Eun-Ji Park, Sangwon Yun
Recent studies have proposed interpreting the training process from an ergodic perspective. Building on this foundation, we present a unified framework for understanding and accelerating the training of deep neural networks via stochastic gradient descent (SGD). By analyzing the geometric landscape of the objective function, we introduce a practical diagnostic, the running estimate of the largest Lyapunov exponent, which provably distinguishes genuine convergence toward stable minimizers from mere statistical stabilization near saddle points. We then propose a ghost category extension for standard classifiers that adds auxiliary ghost output nodes, so the model gains extra descent directions that open a lateral corridor around narrow loss barriers and enable the optimizer to bypass poor basins during the early training phase. We show that this extension strictly reduces the approximation error and that, after sufficient convergence, the ghost dimensions collapse, so that the extended model coincides with the original one and there exists a path in the enlarged parameter space along which the total loss does not increase. Taken together, these results provide a principled architecture-level intervention that accelerates early-stage trainability while preserving asymptotic behavior and simultaneously serves as an architecture-friendly regularizer.
Article 895
Title@2025-07-13 (7): Reinforced Reasoning for Embodied Planning
Title: Reinforced Reasoning for Embodied Planning | Verstärkte Begründung für die körperbetonte Planung | 强化规划强化理由 2505.22050v2 |
Authors (7): Di Wu, Jiaxin Fan, Junzhe Zang, Guanbo Wang, Wei Yin, Wenhao Li, Bo Jin
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals. While recent vision-language models (VLMs) excel at static perception tasks, they struggle with the temporal reasoning, spatial understanding, and commonsense grounding needed for planning in interactive environments. In this work, we introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning. We first distill a high-quality dataset from a powerful closed-source model and perform supervised fine-tuning (SFT) to equip the model with structured decision-making priors. We then design a rule-based reward function tailored to multi-step action quality and optimize the policy via Generalized Reinforced Preference Optimization (GRPO). Our approach is evaluated on Embench, a recent benchmark for interactive embodied tasks, covering both in-domain and out-of-domain scenarios. Experimental results show that our method significantly outperforms models of similar or larger scale, including GPT-4o-mini and 70B+ open-source baselines, and exhibits strong generalization to unseen environments. This work highlights the potential of reinforcement-driven reasoning to advance long-horizon planning in embodied AI.
Article 896
Title@2025-07-13 (7): Lightweight Federated Learning over Wireless Edge Networks
Title: Lightweight Federated Learning over Wireless Edge Networks | Leichtes Federated Learning über drahtlose Edge-Netzwerke | 对无线边缘网络进行轻量量量联邦学习 2507.09546v1 |
Authors (6): Xiangwang Hou, Jingjing Wang, Jun Du, Chunxiao Jiang, Yong Ren, Dusit Niyato
With the exponential growth of smart devices connected to wireless networks, data production is increasing rapidly, requiring machine learning (ML) techniques to unlock its value. However, the centralized ML paradigm raises concerns over communication overhead and privacy. Federated learning (FL) offers an alternative at the network edge, but practical deployment in wireless networks remains challenging. This paper proposes a lightweight FL (LTFL) framework integrating wireless transmission power control, model pruning, and gradient quantization. We derive a closed-form expression of the FL convergence gap, considering transmission error, model pruning error, and gradient quantization error. Based on these insights, we formulate an optimization problem to minimize the convergence gap while meeting delay and energy constraints. To solve the non-convex problem efficiently, we derive closed-form solutions for the optimal model pruning ratio and gradient quantization level, and employ Bayesian optimization for transmission power control. Extensive experiments on real-world datasets show that LTFL outperforms state-of-the-art schemes.
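The two compression knobs named above, the model pruning ratio and the gradient quantization level, can be illustrated with a minimal sketch. The specific techniques are assumptions swapped in for illustration, not the paper's scheme: magnitude-based pruning and unbiased stochastic uniform quantization. All function names are hypothetical.

```python
import random


def prune_by_magnitude(weights, ratio):
    """Zero the `ratio` fraction of entries with the smallest magnitude (assumed criterion)."""
    n_prune = int(len(weights) * ratio)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def stochastic_quantize(grad, levels, rng):
    """Round each entry onto `levels` evenly spaced points in [-m, m], m = max |g|.

    Rounding up happens with probability equal to the fractional position,
    so each quantized entry equals the original value in expectation.
    """
    m = max(abs(g) for g in grad)
    if m == 0.0 or levels < 2:
        return list(grad)
    step = 2 * m / (levels - 1)
    out = []
    for g in grad:
        pos = (g + m) / step                  # position measured in grid steps
        low = min(int(pos), levels - 1)       # grid index at or below the value
        idx = low + (1 if rng.random() < pos - low else 0)
        out.append(-m + min(idx, levels - 1) * step)
    return out


rng = random.Random(0)
print(prune_by_magnitude([0.1, -2.0, 0.05, 3.0], 0.5))
print(stochastic_quantize([-1.0, 0.2, 1.0], 3, rng))
```

A higher pruning ratio or fewer quantization levels reduces what must be computed and transmitted, at the cost of a larger error term, which is the kind of trade-off the convergence-gap analysis is said to capture.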
Article 897
Title@2025-07-13 (7): Assessing reliability of explanations in unbalanced datasets: a use-case on the occurrence of frost events
Title: Assessing reliability of explanations in unbalanced datasets: a use-case on the occurrence of frost events | Beurteilung der Zuverlässigkeit von Erklärungen in unausgewogenen Datensätzen: ein Anwendungsfall zum Auftreten von Frostereignissen | 评估不平衡数据集中解释解释的可靠性:发生霜冻事件的情况 2507.09545v1 |
Authors (5): Ilaria Vascotto, Valentina Blasone, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi
The use of eXplainable Artificial Intelligence (XAI) methods has become essential in practical applications, given the increasing deployment of Artificial Intelligence (AI) models and the legislative requirements put forward in recent years. A fundamental but often underestimated aspect of explanations is their robustness, a key property that must be satisfied for the explanations to be trusted. In this study, we provide some preliminary insights on evaluating the reliability of explanations in the specific case of unbalanced datasets, which are very frequent in high-risk use-cases but at the same time considerably challenging for both AI models and XAI methods. We propose a simple evaluation focused on the minority class (i.e., the less frequent one) that leverages on-manifold generation of neighbours, explanation aggregation, and a metric to test explanation consistency. We present a use-case based on a tabular dataset with numerical features focusing on the occurrence of frost events.
Article 898
Title@2025-07-13 (7): FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise
Title: FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise | FedGSCA: Medizinisches Federated Learning mit Global Sample Selector und Client Adaptive Justierer unter Label Noise | FedGSCA:与全球抽样选择者和标签噪音下的客户适应调整器进行医学联合会学习 2507.10611v1 |
Authors (6): Mengwen Ye, Yingzi Huangfu, Shujian Gao, Wei Ren, Weifan Liu, Zekuan Yu
Federated Learning (FL) emerged as a solution for collaborative medical image classification while preserving data privacy. However, label noise, which arises from inter-institutional data variability, can cause training instability and degrade model performance. Existing FL methods struggle with noise heterogeneity and the imbalance in medical data. Motivated by these challenges, we propose FedGSCA, a novel framework for enhancing robustness in noisy medical FL. FedGSCA introduces a Global Sample Selector that aggregates noise knowledge from all clients, effectively addressing noise heterogeneity and improving global model stability. Furthermore, we develop a Client Adaptive Adjustment (CAA) mechanism that combines adaptive threshold pseudo-label generation and Robust Credal Labeling Loss. CAA dynamically adjusts to class distributions, ensuring the inclusion of minority samples and carefully managing noisy labels by considering multiple plausible labels. This dual approach mitigates the impact of noisy data and prevents overfitting during local training, which improves the generalizability of the model. We evaluate FedGSCA on one real-world colon slides dataset and two synthetic medical datasets under various noise conditions, including symmetric, asymmetric, extreme, and heterogeneous types. The results show that FedGSCA outperforms the state-of-the-art methods, excelling in extreme and heterogeneous noise scenarios. Moreover, FedGSCA demonstrates significant advantages in improving model stability and handling complex noise, making it well-suited for real-world medical federated learning scenarios.
Article 899
Title@2025-07-13 (7): Quantum Curriculum Learning
Title: Quantum Curriculum Learning | Quantum Curriculum Lernen | 量量课程学习 2407.02419v4 |
Authors (3): Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima
Quantum machine learning (QML) requires significant quantum resources to address practical real-world problems. When the underlying quantum information exhibits hierarchical structures in the data, limitations persist in training complexity and generalization. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. Q-CurL exhibits robustness to noise and data limitations, which is particularly relevant for current and near-term noisy intermediate-scale quantum devices. We achieve this through a curriculum design based on quantum data density ratios and a dynamic learning schedule that prioritizes the most informative quantum data. Empirical evidence shows that Q-CurL significantly enhances training convergence and generalization for unitary learning and improves the robustness of quantum phase recognition tasks. Q-CurL is effective with physical learning applications in physics and quantum chemistry.
Article 900
Title@2025-07-13 (7): Consistency Trajectory Planning: High-Quality and Efficient Trajectory Optimization for Offline Model-Based Reinforcement Learning
Title: Consistency Trajectory Planning: High-Quality and Efficient Trajectory Optimization for Offline Model-Based Reinforcement Learning | Konsequente Trajektorienplanung: Hochqualitative und effiziente Trajektorienoptimierung für Offline-Modellbasiertes Verstärkungslernen | 一致性轨迹规划:离线示范强化学习的高质量和高效率轨迹优化 2507.09534v1 |
Authors (3): Guanquan Wang, Takuya Hiraoka, Yoshimasa Tsuruoka
This paper introduces Consistency Trajectory Planning (CTP), a novel offline model-based reinforcement learning method that leverages the recently proposed Consistency Trajectory Model (CTM) for efficient trajectory optimization. While prior work applying diffusion models to planning has demonstrated strong performance, it often suffers from high computational costs due to iterative sampling procedures. CTP supports fast, single-step trajectory generation without significant degradation in policy quality. We evaluate CTP on the D4RL benchmark and show that it consistently outperforms existing diffusion-based planning methods in long-horizon, goal-conditioned tasks. Notably, CTP achieves higher normalized returns while using significantly fewer denoising steps. In particular, CTP achieves comparable performance with over $120\times$ speedup in inference time, demonstrating its practicality and effectiveness for high-performance, low-latency offline planning.
Article 901
Title@2025-07-13 (7): Monte Carlo Tree Diffusion for System 2 Planning
Title: Monte Carlo Tree Diffusion for System 2 Planning | Monte Carlo Tree Diffusion für System 2 Planung | 用于系统2规划的蒙特卡洛树传播 2502.07202v6 |
Authors (5): Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn
Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves as inference-time computation scales, standard diffusion-based planners offer only limited avenues for scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.
nan
Article 902
Title@2025-07-13 (7): VDInstruct: Zero-Shot Key Information Extraction via Content-Aware Vision Tokenization
Title: VDInstruct: Zero-Shot Key Information Extraction via Content-Aware Vision Tokenization | VDInstruct: Zero-Shot-Schlüsselinformationsextraktion über Content-Aware Vision Tokenization | VDInstruct: 通过内容软件愿景提取零热关键信息 2507.09531v1 |
Authors (5): Son Nguyen, Giang Nguyen, Hung Dao, Thao Do, Daeyoung Kim
Key Information Extraction (KIE) underpins the understanding of visual documents (e.g., receipts and contracts) by extracting precise semantic content and accurately capturing spatial structure. Yet existing multimodal large language models (MLLMs) often perform poorly on dense documents and rely on vision tokenization approaches that scale with image size, leading to redundant computation and memory inefficiency. To address these challenges, we introduce VDInstruct, an MLLM that separates spatial region detection from semantic feature extraction. Central to our model is a content-aware tokenization strategy: rather than fragmenting the entire image uniformly, it generates tokens in proportion to document complexity, preserving critical structure while eliminating wasted tokens. Leveraging a three-stage training paradigm, our model achieves state-of-the-art (SOTA) results on KIE benchmarks, matching or exceeding the accuracy of leading approaches while reducing the number of image tokens by roughly 3.6x. In zero-shot evaluations, VDInstruct surpasses strong baselines such as DocOwl 1.5 by 5.5 F1 points, highlighting its robustness to unseen documents. These findings show that content-aware tokenization combined with explicit layout modeling offers a promising direction for document understanding. Data, source code, and model weights will be made publicly available.
nan
Article 903
Title@2025-07-13 (7): A Feed-Forward Artificial Intelligence Pipeline for Sustainable Desalination under Climate Uncertainties: UAE Insights
Title: A Feed-Forward Artificial Intelligence Pipeline for Sustainable Desalination under Climate Uncertainties: UAE Insights | Eine Feed-Forward-Pipeline für künstliche Intelligenz zur nachhaltigen Entsalzung unter Klimaunsicherheiten: VAE-Insights | 在气候不确定性下实现可持续脱盐的进餐前人工智能管道:阿联酋观察 2507.10609v1 |
Authors (4): Obumneme Nwafor, Chioma Nwafor, Amro Zakaria, Nkechi Nwankwo
The United Arab Emirates (UAE) relies heavily on seawater desalination to meet over 90% of its drinking water needs. Desalination processes are highly energy intensive and account for approximately 15% of the UAE’s electricity consumption, contributing to over 22% of the country’s energy-related CO2 emissions. Moreover, these processes face significant sustainability challenges in the face of climate uncertainties such as rising seawater temperatures, salinity, and aerosol optical depth (AOD). AOD greatly affects the operational and economic performance of solar-powered desalination systems through photovoltaic soiling, membrane fouling, and water turbidity cycles. This study proposes a novel pipelined two-stage predictive modelling architecture: the first stage forecasts AOD using satellite-derived time series and meteorological data; the second stage uses the predicted AOD and other meteorological factors to predict desalination performance efficiency losses. The framework achieved 98% accuracy, and SHAP (SHapley Additive exPlanations) was used to reveal key drivers of system degradation. Furthermore, this study proposes a dust-aware rule-based control logic for desalination systems based on predicted values of AOD and solar efficiency. This control logic is used to adjust the desalination plant feed water pressure, adapt maintenance scheduling, and regulate energy source switching. To enhance the practical utility of the research findings, the predictive models and rule-based controls were packaged into an interactive dashboard for scenario and predictive analytics. This provides a management decision-support system for climate-adaptive planning.
nan
Article 904
Title@2025-07-13 (7): An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
Title: An Analysis of Action-Value Temporal-Difference Methods That Learn State Values | Eine Analyse von Aktions-Wert-Temporal-Difference-Methoden, die State Values lernen | 《学习国家价值观的行动—-重视时间—-差异方法分析》 2507.09523v1 |
Authors (4): Brett Daley, Prabhat Nagarajan, Martha White, Marlos C. Machado
The hallmark feature of temporal-difference (TD) learning is bootstrapping: using value predictions to generate new value predictions. The vast majority of TD methods for control learn a policy by bootstrapping from a single action-value function (e.g., Q-learning and Sarsa). Significantly less attention has been given to methods that bootstrap from two asymmetric value functions: i.e., methods that learn state values as an intermediate step in learning action values. Existing algorithms in this vein can be categorized as either QV-learning or AV-learning. Though these algorithms have been investigated to some degree in prior work, it remains unclear if and when it is advantageous to learn two value functions instead of just one – and whether such approaches are theoretically sound in general. In this paper, we analyze these algorithmic families in terms of convergence and sample efficiency. We find that while both families are more efficient than Expected Sarsa in the prediction setting, only AV-learning methods offer any major benefit over Q-learning in the control setting. Finally, we introduce a new AV-learning algorithm called Regularized Dueling Q-learning (RDQ), which significantly outperforms Dueling DQN in the MinAtar benchmark.
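As a minimal illustrative sketch of learning state values as an intermediate step, the tabular QV-learning update below learns V by TD(0) and bootstraps the action values Q from V(s') rather than from Q itself. The one-step toy MDP, the step size, the epsilon-greedy behaviour policy, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import random


def qv_learning_step(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular QV-learning update: V and Q share the target r + gamma * V(s')."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])            # TD(0) update of the state value
    Q[(s, a)] += alpha * (target - Q[(s, a)])  # action value bootstraps from V, not Q


def run_toy(episodes=2000, seed=0, epsilon=0.1):
    """Hypothetical one-step MDP: in state 0, action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    V = {0: 0.0, 1: 0.0}
    Q = {(0, 0): 0.0, (0, 1): 0.0}
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.choice([0, 1])                        # explore
        else:
            a = max((0, 1), key=lambda act: Q[(0, act)])  # exploit
        r = 1.0 if a == 1 else 0.0
        qv_learning_step(V, Q, 0, a, r, 1, done=True)
    return V, Q
```

In this toy problem the learned Q ranks action 1 above action 0; AV-learning variants, which learn advantages alongside V, are not shown.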
nan
Article 905
Title@2025-07-13 (7): The Shape of Deceit: Behavioral Consistency and Fragility in Money Laundering Patterns
Title: The Shape of Deceit: Behavioral Consistency and Fragility in Money Laundering Patterns | Die Form des Verfalls: Verhaltenskonsistenz und Fragilität in Geldwaschmustern | 犯罪模式的形状:洗钱模式中的行为一贯性和脆弱性 2507.10608v1 |
Authors (4): Danny Butvinik, Ofir Yakobi, Michal Einhorn Cohen, Elina Maliarsky
Conventional anti-money laundering (AML) systems predominantly focus on identifying anomalous entities or transactions, flagging them for manual investigation based on statistical deviation or suspicious behavior. This paradigm, however, misconstrues the true nature of money laundering, which is rarely anomalous but often deliberate, repeated, and concealed within consistent behavioral routines. In this paper, we challenge the entity-centric approach and propose a network-theoretic perspective that emphasizes detecting predefined laundering patterns across directed transaction networks. We introduce the notion of behavioral consistency as the core trait of laundering activity, and argue that such patterns are better captured through subgraph structures expressing semantic and functional roles - not solely geometry. Crucially, we explore the concept of pattern fragility: the sensitivity of laundering patterns to small attribute changes and, conversely, their semantic robustness even under drastic topological transformations. We claim that laundering detection should not hinge on statistical outliers, but on preservation of behavioral essence, and propose a reconceptualization of pattern similarity grounded in this insight. This philosophical and practical shift has implications for how AML systems model, scan, and interpret networks in the fight against financial crime.
nan
Article 906
Title@2025-07-13 (7): Neural Expectation Operators
Title: Neural Expectation Operators | Neurale Erwartungen Betreiber | 神经期待运算符 2507.10607v1 |
Authors (1): Qian Qi
This paper introduces Measure Learning, a paradigm for modeling ambiguity via non-linear expectations. We define Neural Expectation Operators as solutions to Backward Stochastic Differential Equations (BSDEs) whose drivers are parameterized by neural networks. The main mathematical contribution is a rigorous well-posedness theorem for BSDEs whose drivers satisfy a local Lipschitz condition in the state variable $y$ and quadratic growth in its martingale component $z$. This result circumvents the classical global Lipschitz assumption, is applicable to common neural network architectures (e.g., with ReLU activations), and holds for exponentially integrable terminal data, which is the sharp condition for this setting. Our primary innovation is to build a constructive bridge between the abstract, and often restrictive, assumptions of the deep theory of quadratic BSDEs and the world of machine learning, demonstrating that these conditions can be met by concrete, verifiable neural network designs. We provide constructive methods for enforcing key axiomatic properties, such as convexity, by architectural design. The theory is extended to the analysis of fully coupled Forward-Backward SDE systems and to the asymptotic analysis of large interacting particle systems, for which we establish both a Law of Large Numbers (propagation of chaos) and a Central Limit Theorem. This work provides the foundational mathematical framework for data-driven modeling under ambiguity.
nan
Article 907
Title@2025-07-13 (7): DALI-PD: Diffusion-based Synthetic Layout Heatmap Generation for ML in Physical Design
Title: DALI-PD: Diffusion-based Synthetic Layout Heatmap Generation for ML in Physical Design | DALI-PD: Diffusionsbasiertes Synthetisches Layout Heatmap Generation für ML in Physical Design | DALI-PD:在物理设计中为ML制造以扩散为基础的合成布局热电图 2507.10606v1 |
Authors (2): Bing-Yue Wu, Vidya A. Chhabria
Machine learning (ML) has demonstrated significant promise in various physical design (PD) tasks. However, model generalizability remains limited by the availability of high-quality, large-scale training datasets. Creating such datasets is often computationally expensive and constrained by intellectual property (IP). The few public datasets that exist are typically static, slow to generate, and require frequent updates. To address these limitations, we present DALI-PD, a scalable framework for generating synthetic layout heatmaps to accelerate ML in PD research. DALI-PD uses a diffusion model to generate diverse layout heatmaps via fast inference in seconds. The heatmaps include power, IR drop, congestion, macro placement, and cell density maps. Using DALI-PD, we created a dataset comprising over 20,000 layout configurations with varying macro counts and placements. These heatmaps closely resemble real layouts and improve accuracy on downstream ML tasks such as IR drop or congestion prediction.
nan
Article 908
Title@2025-07-13 (7): Neural Two-Stage Stochastic Optimization for Solving Unit Commitment Problem
Title: Neural Two-Stage Stochastic Optimization for Solving Unit Commitment Problem | Neurale Zwei-Stufen-Stochastische Optimierung zur Lösung von Unit Commitment Problem | 用于解决单位承诺问题的神经双层两层斯托卡优化 2507.09503v1 |
Authors (3): Zhentong Shao, Jingtao Qin, Nanpeng Yu
This paper proposes a neural stochastic optimization method for efficiently solving the two-stage stochastic unit commitment (2S-SUC) problem under high-dimensional uncertainty scenarios. The proposed method approximates the second-stage recourse problem using a deep neural network trained to map commitment decisions and uncertainty features to recourse costs. The trained network is subsequently embedded into the first-stage UC problem as a mixed-integer linear program (MILP), allowing for explicit enforcement of operational constraints while preserving the key uncertainty characteristics. A scenario-embedding network is employed to enable dimensionality reduction and feature aggregation across arbitrary scenario sets, serving as a data-driven scenario reduction mechanism. Numerical experiments on IEEE 5-bus, 30-bus, and 118-bus systems demonstrate that the proposed neural two-stage stochastic optimization method achieves solutions with an optimality gap of less than 1%, while enabling orders-of-magnitude speedup compared to conventional MILP solvers and decomposition-based methods. Moreover, the model’s size remains constant regardless of the number of scenarios, offering significant scalability for large-scale stochastic unit commitment problems.
nan
Article 909
Title@2025-07-13 (7): Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization
Title: Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization | Verbesserte Regret Bounds für Gaussian Prozess Oberes Vertrauen in Bayesian Optimierung | 改善对巴耶斯最佳优化高山进程最高信任圈的遗憾区 2506.01393v2 |
Authors (1): Shogo Iwazaki
This paper addresses the Bayesian optimization problem (also referred to as the Bayesian setting of the Gaussian process bandit), where the learner seeks to minimize the regret under a function drawn from a known Gaussian process (GP). Under a Matérn kernel with a certain degree of smoothness, we show that the Gaussian process upper confidence bound (GP-UCB) algorithm achieves $\tilde{O}(\sqrt{T})$ cumulative regret with high probability. Furthermore, our analysis yields $O(\sqrt{T \ln^2 T})$ regret under a squared exponential kernel. These results fill the gap between the existing regret upper bound for GP-UCB and the best-known bound provided by Scarlett (2018). The key idea in our proof is to capture the concentration behavior of the input sequence realized by GP-UCB, enabling a more refined analysis of the GP’s information gain.
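For orientation, the standard form of the GP-UCB selection rule and of cumulative regret can be written as below. The notation is the conventional one and is assumed here rather than quoted from the paper: $\mu_{t-1}$ and $\sigma_{t-1}$ are the GP posterior mean and standard deviation after $t-1$ observations, $\beta_t$ is the confidence-width parameter, and $\mathcal{X}$ is the input domain.

```latex
x_t \in \arg\max_{x \in \mathcal{X}} \; \mu_{t-1}(x) + \beta_t^{1/2}\,\sigma_{t-1}(x),
\qquad
R_T = \sum_{t=1}^{T} \bigl( f(x^\ast) - f(x_t) \bigr),
\quad x^\ast \in \arg\max_{x \in \mathcal{X}} f(x).
```

The regret bounds quoted in the abstract are statements about how $R_T$ grows with the horizon $T$.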
nan
Article 910
Title@2025-07-13 (7): An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects
Title: An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects | Ein Algorithmus zur Identifizierung von interpretierbaren Untergruppen mit erhöhten Behandlungseffekten | 确定具有更高治疗效果的解释分组的数值 2507.09494v1 |
Authors (1): Albert Chiu
We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets", easy-to-understand statements of the form (Condition A AND Condition B) OR (Condition C), which can capture high-order interactions while retaining interpretability. Our method complements existing approaches for estimating the CATE, which often produce high-dimensional and uninterpretable results, by summarizing and extracting critical information from fitted models to aid decision making, policy implementation, and scientific understanding. We propose an objective function that trades off subgroup size and effect size, and varying the hyperparameter that controls this trade-off results in a "frontier" of Pareto optimal rule sets, none of which dominates the others across all criteria. Valid inference is achievable through sample splitting. We demonstrate the utility and limitations of our method using simulated and empirical examples.
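To make the rule-set form concrete, the sketch below evaluates a rule set of the shape (Condition A AND Condition B) OR (Condition C) on a few units and reports the two quantities that the objective trades off, subgroup size and mean estimated effect. The covariates, thresholds, CATE values, and function names are hypothetical, and the paper's actual objective function and search procedure are not reproduced here.

```python
def in_rule_set(unit, rule_set):
    """A rule set is a list of rules joined by OR; each rule is a list of predicates joined by AND."""
    return any(all(cond(unit) for cond in rule) for rule in rule_set)


def subgroup_summary(units, cate, rule_set):
    """Return (subgroup size, mean estimated effect) for the units matched by the rule set."""
    members = [i for i, u in enumerate(units) if in_rule_set(u, rule_set)]
    size = len(members)
    mean_effect = sum(cate[i] for i in members) / size if size else 0.0
    return size, mean_effect


# Hypothetical data: four units with made-up covariates and made-up CATE estimates.
units = [
    {"age": 65, "smoker": True, "bmi": 24},
    {"age": 40, "smoker": False, "bmi": 36},
    {"age": 30, "smoker": True, "bmi": 22},
    {"age": 70, "smoker": False, "bmi": 28},
]
cate = [0.8, 0.6, 0.1, 0.2]

# (age >= 60 AND smoker) OR (bmi >= 35)
rule_set = [
    [lambda u: u["age"] >= 60, lambda u: u["smoker"]],
    [lambda u: u["bmi"] >= 35],
]

size, mean_effect = subgroup_summary(units, cate, rule_set)
```

Adding conditions to a rule typically shrinks the subgroup and can raise its mean effect, which is the tension a Pareto frontier of rule sets summarizes.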
nan
Article 911
Title@2025-07-13 (7): Auditing Prompt Caching in Language Model APIs
Title: Auditing Prompt Caching in Language Model APIs | Auditieren von Prompt-Caching in Sprachmodell-APIs | 语言模式APIP中快速抓取 2502.07776v2 |
Authors (5): Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto
Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users’ prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users’ prompts. Timing variations due to prompt caching can also result in leakage of information about model architecture. Namely, we find evidence that OpenAI’s embedding model is a decoder-only Transformer, which was previously not publicly known.
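One simple way to frame such an audit is as a two-sample test on response times: if prompts expected to hit the cache are systematically faster than fresh prompts, caching is likely present. The sketch below uses a one-sided permutation test on simulated latencies. The latency distributions, sample sizes, and function names are assumptions for illustration, no real API is called, and this is not claimed to be the authors' exact audit procedure.

```python
import random


def simulate_latencies(cached, n, rng):
    """Simulated response times in seconds; cached prompts are assumed to be faster."""
    mean = 0.20 if cached else 0.35
    return [max(0.0, rng.gauss(mean, 0.05)) for _ in range(n)]


def permutation_p_value(hits, misses, n_perm=2000, seed=0):
    """One-sided permutation test of whether `hits` have a smaller mean time than `misses`."""
    rng = random.Random(seed)
    observed = sum(misses) / len(misses) - sum(hits) / len(hits)
    pooled = list(hits) + list(misses)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a = pooled[:len(hits)]   # relabelled "hits"
        b = pooled[len(hits):]   # relabelled "misses"
        if sum(b) / len(b) - sum(a) / len(a) >= observed:
            count += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

A small p-value indicates that the timing gap is unlikely under the null hypothesis of no caching. In a real audit the two samples would come from timed API requests rather than a simulator.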
nan
Article 912
Title@2025-07-13 (7): LEP-QNN: Loan Eligibility Prediction using Quantum Neural Networks
Title: LEP-QNN: Loan Eligibility Prediction using Quantum Neural Networks | LEP-QNN: Kreditfähigkeitsvorhersage über Quantum-Neural-Netzwerke | LEP-QNN:利用量子神经网络预测贷款资格 2412.03158v2 |
Authors (4): Nouhaila Innan, Alberto Marchisio, Mohamed Bennai, Muhammad Shafique
Predicting loan eligibility with high accuracy remains a significant challenge in the finance sector. Accurate predictions enable financial institutions to make informed decisions, mitigate risks, and effectively adapt services to meet customer needs. However, the complexity and the high-dimensional nature of financial data have always posed significant challenges to achieving this level of precision. To overcome these issues, we propose a novel approach that employs Quantum Machine Learning (QML) for Loan Eligibility Prediction using Quantum Neural Networks (LEP-QNN). Our innovative approach achieves an accuracy of 98% in predicting loan eligibility from a single, comprehensive dataset. This performance boost is attributed to the strategic implementation of a dropout mechanism within the quantum circuit, aimed at minimizing overfitting and thereby improving the model’s predictive reliability. In addition, our exploration of various optimizers leads to identifying the most efficient setup for our LEP-QNN framework, optimizing its performance. We also rigorously evaluate the resilience of LEP-QNN under different quantum noise scenarios, ensuring its robustness and dependability for quantum computing environments. This research showcases the potential of QML in financial predictions and establishes a foundational guide for advancing QML technologies, marking a step towards developing advanced, quantum-driven financial decision-making tools.
nan
Article 913
Title@2025-07-13 (7): QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection
Title: QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection | QFNN-FFD: Quantum Federated Neural Network for Financial Betrug Detection | QFNN-FFD:金融欺诈侦查量子联邦神经网络 2404.02595v5 |
Authors (4): Nouhaila Innan, Alberto Marchisio, Mohamed Bennai, Muhammad Shafique
This study introduces the Quantum Federated Neural Network for Financial Fraud Detection (QFNN-FFD), a cutting-edge framework merging Quantum Machine Learning (QML) and quantum computing with Federated Learning (FL) for financial fraud detection. Using quantum technologies’ computational power and the robust data privacy protections offered by FL, QFNN-FFD emerges as a secure and efficient method for identifying fraudulent transactions within the financial sector. Implementing a dual-phase training model across distributed clients enhances data integrity and enables superior performance metrics, achieving precision rates consistently above 95%. Additionally, QFNN-FFD demonstrates exceptional resilience by maintaining an impressive 80% accuracy, highlighting its robustness and readiness for real-world applications. This combination of high performance, security, and robustness against noise positions QFNN-FFD as a transformative advancement in financial technology solutions and establishes it as a new benchmark for privacy-focused fraud detection systems. This framework facilitates the broader adoption of secure, quantum-enhanced financial services and inspires future innovations that could use QML to tackle complex challenges in other areas requiring high confidentiality and accuracy.
nan
Article 914
Title@2025-07-13 (7): Learning Expressive Random Feature Models via Parametrized Activations
Title: Learning Expressive Random Feature Models via Parametrized Activations | Expressive Zufalls-Feature-Modelle über parametrisierte Aktivierungen lernen | 通过半美化动能进行学习表达式随机特质模型 2411.19468v2 |
Authors (3): Zailin Ma, Jiansheng Yang, Yaodong Yang
The random feature (RF) method is a powerful kernel approximation technique, but is typically equipped with fixed activation functions, limiting its adaptability across diverse tasks. To overcome this limitation, we introduce the Random Feature Model with Learnable Activation Functions (RFLAF), which enhances the model expressivity by parameterizing activation functions as weighted sums of basis functions. Specifically, we propose to use radial basis functions (RBFs) as bases. We first analyze the RF model with a single RBF activation, deriving a novel kernel and presenting its theoretical properties. Extending this to multiple RBFs, we show that RFLAF significantly expands the function space of RF models while maintaining parameter efficiency. Experimental results across multiple tasks demonstrate that RFLAF consistently outperforms standard RF models with minimal extra computational cost. Furthermore, RFLAF showcases the ability to recover the optimal activation function directly from data. Our work provides a deeper understanding of the role of learnable activation functions within modern neural network architectures.
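The idea of an activation parameterized as a weighted sum of RBFs can be sketched as a forward pass in a few lines. The Gaussian form of the basis functions, the shared bandwidth, the absence of any normalization constant, and all names below are assumptions made for illustration; the abstract does not specify the exact parameterization or which parameters are trained.

```python
import math


def rbf_activation(z, centers, weights, bandwidth=1.0):
    """sigma(z) = sum_i a_i * exp(-(z - c_i)^2 / (2 h^2)); the a_i would be the learnable part."""
    return sum(
        a * math.exp(-((z - c) ** 2) / (2.0 * bandwidth ** 2))
        for a, c in zip(weights, centers)
    )


def rflaf_forward(x, W, v, centers, weights, bandwidth=1.0):
    """f(x) = sum_j v_j * sigma(w_j . x), where the rows w_j of W are fixed random features."""
    out = 0.0
    for w_j, v_j in zip(W, v):
        z = sum(w_i * x_i for w_i, x_i in zip(w_j, x))
        out += v_j * rbf_activation(z, centers, weights, bandwidth)
    return out
```

With a single basis function the model reduces to a random feature model with one fixed RBF activation, which matches the first case the abstract says is analyzed.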
nan
Article 915
Title@2025-07-13 (7): Learning-Order Autoregressive Models with Application to Molecular Graph Generation
Title: Learning-Order Autoregressive Models with Application to Molecular Graph Generation | Autoregressive Modelle mit Anwendung auf die molekulare Graphengenerierung lernen-Ordnen | 适用于分子图生成的学习顺序自动递减模型 2503.05979v2 |
Authors (5): Zhe Wang, Jiaxin Shi, Nicolas Heess, Arthur Gretton, Michalis K. Titsias
Autoregressive models (ARMs) have become the workhorse for sequence generation tasks, since many problems can be modeled as next-token prediction. While there appears to be a natural ordering for text (i.e., left-to-right), for many data types, such as graphs, the canonical ordering is less obvious. To address this problem, we introduce a variant of ARM that generates high-dimensional data using a probabilistic ordering that is sequentially inferred from data. This model incorporates a trainable probability distribution, referred to as an order-policy, that dynamically decides the autoregressive order in a state-dependent manner. To train the model, we introduce a variational lower bound on the log-likelihood, which we optimize with stochastic gradient estimation. We demonstrate experimentally that our method can learn meaningful autoregressive orderings in image and graph generation. On the challenging domain of molecular graph generation, we achieve state-of-the-art results on the QM9 and ZINC250k benchmarks, evaluated across key metrics for distribution similarity and drug-likeness.
nan
Article 916
Title@2025-07-13 (7): Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling
Title: Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling | Adaptives Federated LoRA in heterogenen drahtlosen Netzwerken mit unabhängiger Probenahme | 具有独立抽样调查的多源无线网络中的联邦适应性 2505.23555v3 |
Authors (7): Yanzhao Hou, Jiaxiang Geng, Boyu Li, Xiaofeng Tao, Juncheng Wang, Xiaodong Xu, Bing Luo
Federated LoRA has emerged as a promising technique for efficiently fine-tuning large language models (LLMs) on distributed devices by reducing the number of trainable parameters. However, existing approaches often overlook the theoretical and practical implications of system and data heterogeneity, thereby failing to optimize the overall training efficiency, particularly in terms of wall-clock time. In this paper, we propose an adaptive federated LoRA strategy with independent client sampling to minimize the convergence wall-clock time of federated fine-tuning under both computation and communication heterogeneity. We first derive a new convergence bound for federated LoRA with arbitrary and independent client sampling, notably without requiring the stringent bounded gradient assumption. Then, we introduce an adaptive bandwidth allocation scheme that accounts for heterogeneous client resources and system bandwidth constraints. Based on the derived theory, we formulate and solve a non-convex optimization problem to jointly determine the LoRA sketching ratios and sampling probabilities, aiming to minimize wall-clock convergence time. An efficient and low-complexity algorithm is developed to approximate the solution. Finally, extensive experiments demonstrate that our approach significantly reduces wall-clock training time compared to state-of-the-art methods across various models and datasets.
nan
Article 917
Title@2025-07-13 (7): Neural Architecture Search generated Phase Retrieval Net for Real-time Off-axis Quantitative Phase Imaging
Title: Neural Architecture Search generated Phase Retrieval Net for Real-time Off-axis Quantitative Phase Imaging | Neurale Architektur Suche erzeugtes Phasen-Retrieval-Netz für Echtzeit-Off-Axis Quantitative Phasen-Imaging | 实时非轴外定量成像的神经结构搜索生成阶段回收网 2210.14231v2 |
Authors (5): Xin Shu, Mengxuan Niu, Yi Zhang, Wei Luo, Renjie Zhou
In off-axis Quantitative Phase Imaging (QPI), artificial neural networks have recently been applied to phase retrieval with aberration compensation and phase unwrapping. However, the neural network architectures involved are largely unoptimized and inefficient, with low inference speed, which hinders the realization of real-time imaging. Here, we propose a Neural Architecture Search (NAS) generated Phase Retrieval Net (NAS-PRNet) for accurate and fast phase retrieval. NAS-PRNet is an encoder-decoder style neural network, automatically found from a large neural network architecture search space through NAS. By modifying the differentiable NAS scheme from SparseMask, we learn the optimized skip connections through gradient descent. Specifically, we implement MobileNet-v2 as the encoder and define a synthesized loss that incorporates phase reconstruction loss and network sparsity loss. NAS-PRNet achieves high-fidelity phase retrieval, with a peak Signal-to-Noise Ratio (PSNR) of 36.7 dB and a Structural SIMilarity (SSIM) of 86.6% on interferograms of biological cells. Notably, NAS-PRNet performs phase retrieval in only 31 ms, a 15x speedup over the most recent Mamba-UNet, with only slightly lower phase retrieval accuracy.
nan
Article 918
Title@2025-07-13 (7): Discrete Differential Principle for Continuous Smooth Function Representation
Title: Discrete Differential Principle for Continuous Smooth Function Representation | Diskrete Differentialprinzip für kontinuierliche glatte Funktionsdarstellung | 连续平滑职能代表的不区分原则 2507.09480v1 |
Authors (3): Guoyou Wang, Yihua Tan, Shiqi Liu
Taylor’s formula is central to function representation, for example in solving differential-difference equations, ordinary differential equations, and partial differential equations, and it underlies applications in visual perception, complex control, fluid mechanics, weather forecasting, and thermodynamics. However, Taylor’s formula suffers from the curse of dimensionality and from error propagation during derivative computation in discrete settings. In this paper, we propose a new discrete differential operator to estimate derivatives and to represent continuous smooth functions locally using the Vandermonde coefficient matrix derived from the truncated Taylor series. Our method simultaneously computes all derivatives of orders less than the number of sample points, inherently mitigating error propagation. Utilizing equidistant uniform sampling, it achieves high-order accuracy while alleviating the curse of dimensionality. We mathematically establish rigorous error bounds for both derivative estimation and function representation, demonstrating tighter bounds for lower-order derivatives. We extend our method to the two-dimensional case, enabling its use for multivariate derivative calculations. Experiments demonstrate the effectiveness and superiority of the proposed method compared with the forward finite difference method for derivative estimation and with cubic spline and linear interpolation for function representation. Consequently, our technique offers broad applicability across domains such as vision representation, feature extraction, fluid mechanics, and cross-media imaging.
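The general idea of recovering all low-order derivatives at once from equidistant samples can be sketched by writing the truncated Taylor expansion at each sample point and solving the resulting Vandermonde-type linear system. The code below is a generic illustration under that assumption, with hypothetical function names and a plain Gaussian-elimination solver; it is not the authors' operator and carries none of their error bounds.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def taylor_derivatives(samples, h):
    """Estimate f(x0), f'(x0), ..., f^(n-1)(x0) from samples f(x0 + k*h), k = 0..n-1.

    Row k encodes the truncated Taylor series
    f(x0 + k*h) ~ sum_j f^(j)(x0) * (k*h)^j / j!.
    """
    n = len(samples)
    A = [[(k * h) ** j / math.factorial(j) for j in range(n)] for k in range(n)]
    return solve_linear(A, list(samples))
```

For a polynomial of degree below the number of samples the truncated series is exact, so the recovered derivatives match the true ones up to rounding; for general smooth functions the truncation error depends on the step size and on higher derivatives.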
nan
Article 919
Title@2025-07-13 (7): Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints
Title: Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints | Anreiz-Aware Dynamische Ressourcenzuweisung unter langfristigen Kosteneinschränkungen | 长期成本制约因素下的奖励性-软件动态资源分配 2507.09473v1 |
Authors (3): Yan Dai, Negin Golrezaei, Patrick Jaillet
Motivated by applications such as cloud platforms allocating GPUs to users or governments deploying mobile health units across competing regions, we study the dynamic allocation of a reusable resource to strategic agents with private valuations. Our objective is to simultaneously (i) maximize social welfare, (ii) satisfy multi-dimensional long-term cost constraints, and (iii) incentivize truthful reporting. We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings – agents can easily manipulate their reports to distort future dual updates for future gain. To address this vulnerability, we develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior. Our design combines epoch-based lazy updates – where dual variables remain fixed within each epoch – with randomized exploration rounds that extract approximately truthful signals for learning. Leveraging carefully designed online learning subroutines that can be of independent interest for dual updates, our mechanism achieves $\tilde{\mathcal{O}}(\sqrt{T})$ social welfare regret, satisfies all cost constraints, and ensures incentive alignment. This matches the performance of non-strategic allocation approaches while being robust to strategic agents.
nan
Article 920
Title@2025-07-13 (7): La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
Title: La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching | La-Proteina: Atomistische Proteinerzeugung über teilweise Latent Flow Matching | La-Proteina:通过局部延迟流动配对产生的原子蛋白质 2507.09466v1 |
Authors (9): Tomas Geffner, Kieran Didi, Zhonglin Cao, Danny Reidenbach, Zuobai Zhang, Christian Dallago, Emine Kucukbenli, Karsten Kreis, Arash Vahdat
Recently, many generative models for de novo protein structure design have emerged. Yet only a few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation benchmarks, including all-atom co-designability, diversity, and structural validity, as confirmed through detailed structural analyses and evaluations. Notably, La-Proteina also surpasses previous models in atomistic motif scaffolding performance, unlocking critical atomistic structure-conditioned protein design tasks. Moreover, La-Proteina is able to generate co-designable proteins of up to 800 residues, a regime where most baselines collapse and fail to produce valid samples, demonstrating La-Proteina’s scalability and robustness.
Article 921
Title@2025-07-13 (7): Enhancing ALS Progression Tracking with Semi-Supervised ALSFRS-R Scores Estimated from Ambient Home Health Monitoring
Title: Enhancing ALS Progression Tracking with Semi-Supervised ALSFRS-R Scores Estimated from Ambient Home Health Monitoring | Verbesserung der ALS-Progressionsverfolgung mit semi-überwachten ALSFRS-R Punktzahl Geschätzt von Ambient Home Health Monitoring | 环境家庭健康监测估计的半超ALSFRS-R分数加强ALS进展跟踪 2507.09460v1 |
Authors (4): Noah Marchal, William E. Janes, Mihail Popescu, Xing Song
Clinical monitoring of functional decline in ALS relies on periodic assessments that may miss critical changes occurring between visits. To address this gap, semi-supervised regression models were developed to estimate rates of decline in a case series cohort by targeting ALSFRS-R scale trajectories with continuous in-home sensor monitoring data. Our analysis compared three model paradigms (individual batch learning and cohort-level batch versus incremental fine-tuned transfer learning) across linear slope, cubic polynomial, and ensembled self-attention pseudo-label interpolations. Results revealed cohort homogeneity across functional domains responding to learning methods, with transfer learning improving prediction error for ALSFRS-R subscales in 28 of 32 contrasts (mean RMSE=0.20(0.04)), and individual batch learning for predicting the composite scale (mean RMSE=3.15(1.25)) in 2 of 3. Self-attention interpolation achieved the lowest prediction error for subscale-level models (mean RMSE=0.19(0.06)), capturing complex nonlinear progression patterns, outperforming linear and cubic interpolations in 20 of 32 contrasts, though linear interpolation proved more stable in all ALSFRS-R composite scale models (mean RMSE=0.23(0.10)). We identified distinct homogeneity-heterogeneity profiles across functional domains with respiratory and speech exhibiting patient-specific patterns benefiting from personalized incremental adaptation, while swallowing and dressing functions followed cohort-level trajectories suitable for transfer models. These findings suggest that matching learning and pseudo-labeling techniques to functional domain-specific homogeneity-heterogeneity profiles enhances predictive accuracy in ALS progression tracking. Integrating adaptive model selection within sensor monitoring platforms could enable timely interventions and scalable deployment in future multi-center studies.
Article 922
Title@2025-07-13 (7): Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks
Title: Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks | Aequa: Faire Modellprämien im kollaborativen Lernen über schlanke Netzwerke | Aequa:通过可恢复网络合作学习的公平示范奖励 2502.04850v2 |
Authors (3): Nurbek Tastan, Samuel Horvath, Karthik Nandakumar
Collaborative learning enables multiple participants to learn a single global model by exchanging focused updates instead of sharing data. One of the core challenges in collaborative learning is ensuring that participants are rewarded fairly for their contributions, which entails two key sub-problems: contribution assessment and reward allocation. This work focuses on fair reward allocation, where the participants are incentivized through model rewards - differentiated final models whose performance is commensurate with the contribution. In this work, we leverage the concept of slimmable neural networks to collaboratively learn a shared global model whose performance degrades gracefully with a reduction in model width. We also propose a post-training fair allocation algorithm that determines the model width for each participant based on their contributions. We theoretically study the convergence of our proposed approach and empirically validate it using extensive experiments on different datasets and architectures. We also extend our approach to enable training-time model reward allocation.
Article 923
Title@2025-07-13 (7): RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
Title: RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services | RedOne: Enthüllen von Domain-spezifischen LLM-Post-Trainings in Social Networking Services | 红一:在社会联网服务培训后推出特定域域LLM 2507.10605v1 |
Authors (25): Fei Zhao, Chonggang Lu, Yue Wang, Zheyong Xie, Ziyan Liu, Haofu Qian, JianZhao Huang, Fangcheng Shi, Zijie Meng, Hongcheng Guo, Mingqian He, Xinze Lyu, Yiming Lu, Ziyang Xiang, Zheyu Ye, Chengqiang Lu, Zhe Xu, Yi Wu, Yao Hu, Yan Gao, Jun Fan, Xiaolong Jiang, Weiting Liu, Boyang Wang, Shaosheng Cao
As a primary medium for modern information dissemination, social networking services (SNS) have experienced rapid growth, which has posed significant challenges for platform content management and interaction quality improvement. Recently, the development of large language models (LLMs) has offered potential solutions, but existing studies focus on isolated tasks, which not only encounter diminishing benefits from data scaling within individual scenarios but also fail to flexibly adapt to diverse real-world contexts. To address these challenges, we introduce RedOne, a domain-specific LLM designed to break the performance bottleneck of single-task baselines and establish a comprehensive foundation for the SNS. RedOne was developed through a three-stage training strategy consisting of continued pretraining, supervised fine-tuning, and preference optimization, using a large-scale real-world dataset. In extensive experiments, RedOne maintains strong general capabilities and achieves an average improvement of up to 14.02% across 8 major SNS tasks and 7.56% on an SNS bilingual evaluation benchmark, compared with base models. Furthermore, in online testing, RedOne reduced the exposure rate in harmful content detection by 11.23% and improved the click page rate in post-view search by 14.95% compared with single-task fine-tuned baseline models. These results establish RedOne as a robust domain-specific LLM for SNS, demonstrating excellent generalization across various tasks and promising applicability in real-world scenarios.
Article 924
Title@2025-07-13 (7): Fourier Basis Mapping: A Time-Frequency Learning Framework for Time Series Forecasting
Title: Fourier Basis Mapping: A Time-Frequency Learning Framework for Time Series Forecasting | Fourier Basis Mapping: Ein Zeit-Frequenz-Lernrahmen für Zeitreihenprognosen | Fourier地基绘图:时间序列预测时间-期限学习框架 2507.09445v1 |
Authors (6): Runze Yang, Longbing Cao, Xin You, Kun Fang, Jianxun Li, Jie Yang
The integration of Fourier transform and deep learning opens new avenues for time series forecasting. We reconsider the Fourier transform from a basis functions perspective. Specifically, the real and imaginary parts of the frequency components can be regarded as the coefficients of cosine and sine basis functions at tiered frequency levels, respectively. We find that existing Fourier-based methods face inconsistent starting cycles and inconsistent series length issues. They fail to interpret frequency components precisely and overlook temporal information. Accordingly, the novel Fourier Basis Mapping (FBM) method addresses these issues by integrating time-frequency features through Fourier basis expansion and mapping in the time-frequency space. Our approach extracts explicit frequency features while preserving temporal characteristics. FBM supports plug-and-play integration with various types of neural networks by adjusting only the initial projection layer for better performance. First, we propose FBM-L, FBM-NL, and FBM-NP to enhance linear, MLP-based, and Transformer-based models, respectively, demonstrating the effectiveness of time-frequency features. Next, we propose a synergetic model architecture, termed FBM-S, which decomposes the seasonal, trend, and interaction effects into three separate blocks, each designed to model time-frequency features in a specialized manner. Finally, we introduce several techniques tailored for time-frequency features, including interaction masking, centralization, patching, rolling window projection, and multi-scale down-sampling. The results are validated on diverse real-world datasets for both long-term and short-term forecasting tasks with SOTA performance.
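The basis-function reading of the Fourier transform mentioned above can be checked numerically. The sketch below is a generic illustration, not the FBM method: it computes a one-sided DFT of a short real series and rebuilds the series from cosine and sine basis functions, using the real parts as cosine coefficients and the negated imaginary parts as sine coefficients (the sign follows from the usual `exp(-i...)` DFT convention, which is an assumption here). The function names `rdft` and `reconstruct` and the sample series are hypothetical.

```python
import cmath
import math


def rdft(x):
    """One-sided DFT of a real sequence: X_k for k = 0 .. N//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def reconstruct(coeffs, n):
    """Rebuild the series as a weighted sum of cosine and sine basis functions."""
    out = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            # The DC term and (for even n) the Nyquist term appear once;
            # every other frequency stands for a conjugate pair, so twice.
            w = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
            angle = 2 * math.pi * k * t / n
            total += w * (c.real * math.cos(angle) - c.imag * math.sin(angle))
        out.append(total / n)
    return out


series = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0, 0.0]
coeffs = rdft(series)
rebuilt = reconstruct(coeffs, len(series))
```

Here `rebuilt` agrees with `series` up to floating-point error, which is the sense in which the frequency components act as coefficients of a fixed cosine and sine basis.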
Article 925
Title@2025-07-13 (7): Securing Transformer-based AI Execution via Unified TEEs and Crypto-protected Accelerators
Title: Securing Transformer-based AI Execution via Unified TEEs and Crypto-protected Accelerators | Sicherung transformerbasierter KI-Execution über Unified TEEs und Crypto-geschützte Beschleuniger | 通过统一TEE和加密保护加速器实施基于安全变压器的 AI 执行 2507.03278v2 |
Authors (6): Jiaqi Xue, Yifei Zhao, Mengxin Zheng, Fan Yao, Yan Solihin, Qian Lou
Recent advances in Transformer models, e.g., large language models (LLMs), have brought tremendous breakthroughs in various artificial intelligence (AI) tasks, leading to their wide applications in many security-critical domains. Due to their unprecedented scale and prohibitively high development cost, these models have become highly valuable intellectual property for AI stakeholders and are increasingly deployed via machine learning as a service (MLaaS). However, MLaaS often runs on untrusted cloud infrastructure, exposing data and models to potential breaches. Mainstream protection mechanisms leverage trusted execution environments (TEEs) where confidentiality and integrity for secretive data are shielded using hardware-based encryption and integrity checking. Unfortunately, running model inference entirely within TEEs is subject to non-trivial slowdown, which is further exacerbated in LLMs due to the substantial computation and memory footprint involved. Recent studies reveal that the hybrid TEE-based scheme offloading partial model inference operations to the untrusted accelerators (e.g., GPU) is a promising solution. However, prior offloading schemes fail to ensure dual protection of data and model in Transformer inference, as they cannot securely offload critical operations, i.e., Attention and SoftMax, forcing these computations to remain confined within TEEs. To address these challenges, we propose TwinShield, a framework enabling secure Transformer inference in heterogeneous TEE and accelerator systems with dual protection for both model and data. TwinShield offloads ~87% of computation to GPUs and delivers 4.0x - 6.1x speedups over previous approaches across various Transformer models.
Article 926
Title@2025-07-13 (7): Toward Developing Machine-Learning-Aided Tools for the Thermomechanical Monitoring of Nuclear Reactor Components
Title: Toward Developing Machine-Learning-Aided Tools for the Thermomechanical Monitoring of Nuclear Reactor Components | Entwicklung von maschinenlernenden Werkzeugen für die thermomechanische Überwachung nuklearer Reaktorkomponenten | 逐步开发用于对核反应堆部件进行热机械机械机械监测的机械学习辅助工具 2507.09443v1 |
Authors (7): Luiz Aldeia Machado, Victor Coppo Leite, Elia Merzari, Arthur Motta, Roberto Ponciroli, Lander Ibarra, Lise Charlot
Proactive maintenance strategies, such as Predictive Maintenance (PdM), play an important role in the operation of Nuclear Power Plants (NPPs), particularly due to their capacity to reduce offline time by preventing unexpected shutdowns caused by component failures. In this work, we explore the use of a Convolutional Neural Network (CNN) architecture combined with a computational thermomechanical model to calculate the temperature, stress, and strain of a Pressurized Water Reactor (PWR) fuel rod during operation. This estimation relies on a limited number of temperature measurements from the cladding’s outer surface. This methodology can potentially aid in developing PdM tools for nuclear reactors by enabling real-time monitoring of such systems. The training, validation, and testing datasets were generated through coupled simulations involving BISON, a finite element-based nuclear fuel performance code, and the MOOSE Thermal-Hydraulics Module (MOOSE-THM). We conducted eleven simulations, varying the peak linear heat generation rates. Of these, eight were used for training, two for validation, and one for testing. The CNN was trained for over 1,000 epochs without signs of overfitting, achieving highly accurate temperature distribution predictions. These were then used in a thermomechanical model to determine the stress and strain distribution within the fuel rod.
Article 927
Title@2025-07-13 (7): Next-token pretraining implies in-context learning
Title: Next-token pretraining implies in-context learning | Pretraining im Rahmen von Next-token impliziert das In-Context-Lernen | 下一级培训前的学习意味着通俗的学习 2505.18373v2 |
Authors (4): Paul M. Riechers, Henry R. Bigelow, Eric A. Alt, Adam Shai
We argue that in-context learning (ICL) predictably arises from standard self-supervised next-token pretraining, rather than being an exotic emergent property. This work establishes the foundational principles of this emergence by focusing on in-distribution ICL, demonstrating how models necessarily adapt to context when trained on token sequences, especially from non-ergodic sources. Our information-theoretic framework precisely predicts these in-distribution ICL dynamics (i.e., context-dependent loss reduction). We verify this with experiments using synthetic datasets of differing types of correlational structure, reproducing characteristic phenomena like phase transitions in training loss for induction head formation and power-law scaling of in-context loss. We further show that a model’s in-context performance on any task is mathematically coupled to the ensemble of tasks seen in pretraining, offering a fundamental explanation, grounded in architecture- and modality-independent principles, for such inference-time learning.
Article 928
Title@2025-07-13 (7): Transformers Don’t In-Context Learn Least Squares Regression
Title: Transformers Don’t In-Context Learn Least Squares Regression | Transformer lernen nicht im Kontext Least Squares Regression | 变换者不要在知识中学习最小平方倒退 2507.09440v1 |
Authors (3): Joshua Hill, Benjamin Eyre, Elliot Creager
In-context learning (ICL) has emerged as a powerful capability of large pretrained transformers, enabling them to solve new tasks implicit in example input-output pairs without any gradient updates. Despite its practical success, the mechanisms underlying ICL remain largely mysterious. In this work we study synthetic linear regression to probe how transformers implement learning at inference time. Previous works have demonstrated that transformers match the performance of learning rules such as Ordinary Least Squares (OLS) regression or gradient descent and have suggested ICL is facilitated in transformers through the learned implementation of one of these techniques. In this work, we demonstrate through a suite of out-of-distribution generalization experiments that transformers trained for ICL fail to generalize after shifts in the prompt distribution, a behaviour that is inconsistent with the notion of transformers implementing algorithms such as OLS. Finally, we highlight the role of the pretraining corpus in shaping ICL behaviour through a spectral analysis of the learned representations in the residual stream. Inputs from the same distribution as the training data produce representations with a unique spectral signature: inputs from this distribution tend to have the same top two singular vectors. This spectral signature is not shared by out-of-distribution inputs, and a metric characterizing the presence of this signature is highly correlated with low loss.
Article 929
Title@2025-07-13 (7): Dynamic Sparse Causal-Attention Temporal Networks for Interpretable Causality Discovery in Multivariate Time Series
Title: Dynamic Sparse Causal-Attention Temporal Networks for Interpretable Causality Discovery in Multivariate Time Series | Dynamische Sparse Causal-Aufmerksamkeit Temporale Netzwerke für interpretierbare Kausalitäts-Entdeckung in multivariaten Zeitreihen | 多变量时间序列中可解释性诱因发现时空网络 2507.09439v1 |
Authors (3): Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui
Understanding causal relationships in multivariate time series (MTS) is essential for effective decision-making in fields such as finance and marketing, where complex dependencies and lagged effects challenge conventional analytical approaches. We introduce Dynamic Sparse Causal-Attention Temporal Networks for Interpretable Causality Discovery in MTS (DyCAST-Net), a novel architecture designed to enhance causal discovery by integrating dilated temporal convolutions and dynamic sparse attention mechanisms. DyCAST-Net effectively captures multiscale temporal dependencies through dilated convolutions while leveraging an adaptive thresholding strategy in its attention mechanism to eliminate spurious connections, ensuring both accuracy and interpretability. A statistical shuffle test validation further strengthens robustness by filtering false positives and improving causal inference reliability. Extensive evaluations on financial and marketing datasets demonstrate that DyCAST-Net consistently outperforms existing models such as TCDF, GCFormer, and CausalFormer. The model provides a more precise estimation of causal delays and significantly reduces false discoveries, particularly in noisy environments. Moreover, attention heatmaps offer interpretable insights, uncovering hidden causal patterns such as the mediated effects of advertising on consumer behavior and the influence of macroeconomic indicators on financial markets. Case studies illustrate DyCAST-Net’s ability to detect latent mediators and lagged causal factors, making it particularly effective in high-dimensional, dynamic settings. The model’s architecture, enhanced by RMSNorm stabilization and causal masking, ensures scalability and adaptability across diverse application domains.
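For readers unfamiliar with dilated temporal convolutions, the following is a minimal sketch of a one-dimensional dilated causal convolution in plain Python. It is a generic illustration rather than DyCAST-Net's layer: the function name `dilated_causal_conv`, the zero-padding of the past, and the sample inputs are assumptions.

```python
def dilated_causal_conv(x, kernel, dilation):
    """Causal 1-D convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ...

    Positions before the start of the series are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation  # look back j * dilation steps
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# With kernel [1, 1] and dilation 2, each output is x[t] + x[t-2].
lagged_sum = dilated_causal_conv(signal, kernel=[1.0, 1.0], dilation=2)
```

Stacking such layers with growing dilation widens the look-back window quickly, which is how dilated convolutions reach lagged effects at several time scales without using future values.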
Article 930
Title@2025-07-13 (7): Neural networks leverage nominally quantum and post-quantum representations
Title: Neural networks leverage nominally quantum and post-quantum representations | Neurale Netzwerke nutzen nominal Quanten- und Post-Quantum-Darstellungen | 神经网络在名义上对数量和数量后代表的杠杆作用发挥杠杆作用 2507.07432v2 |
Authors (3): Paul M. Riechers, Thomas J. Elliott, Adam S. Shai
We show that deep neural networks, including transformers and RNNs, pretrained as usual on next-token prediction, intrinsically discover and represent beliefs over ‘quantum’ and ‘post-quantum’ low-dimensional generative models of their training data – as if performing iterative Bayesian updates over the latent state of this world model during inference as they observe more context. Notably, neural nets easily find these representations, whereas there is no finite classical circuit that would do the job. The corresponding geometric relationships among neural activations induced by different input sequences are found to be largely independent of neural-network architecture. Each point in this geometry corresponds to a history-induced probability density over all possible futures, and the relative displacement of these points reflects the difference in mechanism and magnitude for how these distinct pasts affect the future.
Article 931
Title@2025-07-13 (7): Modern approaches to building interpretable models of the property market using machine learning on the base of mass cadastral valuation
Title: Modern approaches to building interpretable models of the property market using machine learning on the base of mass cadastral valuation | Moderne Ansätze für den Aufbau interpretierbarer Modelle des Immobilienmarkts mit maschinellem Lernen auf Basis der Massenkadastralbewertung | 采用现代方法,利用根据质量地籍估价进行的机器学习,建立可解释的财产市场模型 2506.15723v2 |
Authors (4): Irina G. Tanashkina, Alexey S. Tanashkin, Alexander S. Maksimchuik, Anna Yu. Poshivailo
In this article, we review modern approaches to building interpretable models of property markets using machine learning on the basis of the mass valuation of property in the Primorye region, Russia. A researcher lacking expertise in this topic encounters numerous difficulties when trying to build a good model. The main source of these difficulties is the huge gap between noisy real market data and the idealized data common in machine learning tutorials. This paper covers all stages of modeling: the collection of initial data, identification of outliers, the search and analysis of patterns in the data, the formation and final choice of price factors, the building of the model, and the evaluation of its efficiency. For each stage, we highlight potential issues and describe sound methods for overcoming the difficulties that arise, using actual examples. We show that combining classical linear regression with geostatistical interpolation methods makes it possible to build an effective model for land parcels. For flats, where many objects are attributed to a single spatial point, applying geostatistical methods is difficult. We therefore suggest linear regression with automatic generation and selection of additional rules based on decision trees, the so-called RuleFit method. We thus show that, despite a restriction as strong as the requirement of interpretability, which matters in practice (for example, in legal matters), it is still possible to build effective models of real property markets.
Article 932
Title@2025-07-13 (7): Sensitivity Analysis of Transport and Radiation in NeuralPlasmaODE for ITER Burning Plasmas
Title: Sensitivity Analysis of Transport and Radiation in NeuralPlasmaODE for ITER Burning Plasmas | Sensitivitätsanalyse von Transport und Strahlung in NeuralPlasmaODE für ITER-Brennplasma | ITER 燃烧日光虫的神经PlasmaODE内运输和辐射感敏分析 2507.09432v1 |
Authors (2): Zefang Liu, Weston M. Stacey
Understanding how key physical parameters influence burning plasma behavior is critical for the reliable operation of ITER. In this work, we extend NeuralPlasmaODE, a multi-region, multi-timescale model based on neural ordinary differential equations, to perform a sensitivity analysis of transport and radiation mechanisms in ITER plasmas. Normalized sensitivities of core and edge temperatures and densities are computed with respect to transport diffusivities, electron cyclotron radiation (ECR) parameters, impurity fractions, and ion orbit loss (IOL) timescales. The analysis focuses on perturbations around a trained nominal model for the ITER inductive scenario. Results highlight the dominant influence of magnetic field strength, safety factor, and impurity content on energy confinement, while also revealing how temperature-dependent transport contributes to self-regulating behavior. These findings demonstrate the utility of NeuralPlasmaODE for predictive modeling and scenario optimization in burning plasma environments.
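A normalized sensitivity is commonly defined as the relative change in an output per relative change in a parameter, S = (p / y) * (dy / dp); whether the paper uses exactly this form is an assumption. The sketch below estimates it with a central finite difference around a nominal value. The function `normalized_sensitivity` and the toy model `toy_temperature` are hypothetical stand-ins and are unrelated to the actual NeuralPlasmaODE code.

```python
def normalized_sensitivity(f, p, rel_step=1e-6):
    """Estimate S = (p / y) * dy/dp at nominal parameter p by central differences."""
    h = rel_step * abs(p) if p != 0 else rel_step
    dy_dp = (f(p + h) - f(p - h)) / (2 * h)
    return dy_dp * p / f(p)


def toy_temperature(chi):
    # Hypothetical scaling: temperature falls as diffusivity chi rises,
    # T = 10 * chi ** -0.5. Not a physical model of ITER.
    return 10.0 * chi ** -0.5


s = normalized_sensitivity(toy_temperature, 2.0)
```

For a power law y = a * p**k the normalized sensitivity equals the exponent k, so `s` here is close to -0.5: a 1% increase in the toy diffusivity lowers the toy temperature by about 0.5%.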
Article 933
Title@2025-07-12 (6): Optimizing External Sources for Controlled Burning Plasma in Tokamaks with Neural Ordinary Differential Equations
Title: Optimizing External Sources for Controlled Burning Plasma in Tokamaks with Neural Ordinary Differential Equations | Optimierung externer Quellen für kontrolliertes Brennplasma in Tokamaks mit neuralen normalen Differentialgleichungen | 利用神经普通差异等同优化托卡马克受控燃烧等离外部源的最佳利用 2507.09431v1 |
Authors (2): Zefang Liu, Weston M. Stacey
Achieving controlled burning plasma in tokamaks requires precise regulation of external particle and energy sources to reach and maintain target core densities and temperatures. This work presents an inverse modeling approach using a multinodal plasma dynamics model based on neural ordinary differential equations (Neural ODEs). Given a desired time evolution of nodal quantities such as deuteron density or electron temperature, we compute the external source profiles, such as neutral beam injection (NBI) power, that drive the plasma toward the specified behavior. The approach is implemented within the NeuralPlasmaODE framework, which models multi-region, multi-timescale transport and incorporates physical mechanisms including radiation, auxiliary heating, and internodal energy exchange. By formulating the control task as an optimization problem, we use automatic differentiation through the Neural ODE solver to minimize the discrepancy between simulated and target trajectories. This framework transforms the forward simulation tool into a control-oriented model and provides a practical method for computing external source profiles in both current and future fusion devices.
Article 934
Title@2025-07-12 (6): Causal Discovery-Driven Change Point Detection in Time Series
Title: Causal Discovery-Driven Change Point Detection in Time Series | Causal Discovery-Driven Change Point Detection in der Zeitreihe | 时间序列中因果发现 - 驱动变化点探测 2407.07290v2 |
Authors (5): Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu
Change point detection in time series aims to identify moments when the probability distribution of time series changes. It is widely applied in many areas, such as human activity sensing and medical science. In the context of multivariate time series, this typically involves examining the joint distribution of multiple variables: If the distribution of any one variable changes, the entire time series undergoes a distribution shift. However, in practical applications, we may be interested only in certain components of the time series, exploring abrupt changes in their distributions while accounting for the presence of other components. Here, assuming an underlying structural causal model that governs the time-series data generation, we address this task by proposing a two-stage non-parametric algorithm that first learns parts of the causal structure through constraint-based discovery methods, and then employs conditional relative Pearson divergence estimation to identify the change points. The conditional relative Pearson divergence quantifies the distribution difference between consecutive segments in the time series, while the causal discovery method allows a focus on the causal mechanism, facilitating access to independent and identically distributed (IID) samples. Theoretically, the typical assumption of samples being IID in conventional change point detection methods can be relaxed based on the Causal Markov Condition. Through experiments on both synthetic and real-world datasets, we validate the correctness and utility of our approach.
Article 935
Title@2025-07-12 (6): On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization
Title: On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization | Über Informationsgeometrie und iterative Optimierung in der Modellkompression: Operator Factorization | 关于模型压缩中信息几何和迭代优化的信息优化:操作者化 2507.09428v1 |
Authors (3): Zakhar Shumaylov, Vasileios Tsiaras, Yannis Stylianou
The ever-increasing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of density-induced metrics on parameter spaces, to analyze existing methods within the space of model compression, primarily focusing on operator factorization. Adopting this perspective highlights the core challenge: defining an optimal low-compute submanifold (or subset) and projecting onto it. We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection. We highlight that when compressing a pre-trained model, using information divergences is paramount for achieving improved zero-shot accuracy, yet this may no longer be the case when the model is fine-tuned. In such scenarios, trainability of bottlenecked models turns out to be far more important for achieving high compression ratios with minimal performance degradation, necessitating adoption of iterative methods. In this context, we prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint. To further illustrate the utility of this perspective, we showcase how simple modifications to existing methods through softer rank reduction result in improved performance under fixed compression rates.
Article 936
Title@2025-07-12 (6): Domain Adaptation and Multi-view Attention for Learnable Landmark Tracking with Sparse Data
Title: Domain Adaptation and Multi-view Attention for Learnable Landmark Tracking with Sparse Data | Domain-Anpassung und Multi-View-Achtung für erlernbares Landmark-Tracking mit Sparse-Daten | 利用简化数据进行可学习土地标记跟踪的域适应和多视角关注 2507.09420v1 |
Authors (2): Timothy Chase Jr, Karthik Dantu
The detection and tracking of celestial surface terrain features are crucial for autonomous spaceflight applications, including Terrain Relative Navigation (TRN), Entry, Descent, and Landing (EDL), hazard analysis, and scientific data collection. Traditional photoclinometry-based pipelines often rely on extensive a priori imaging and offline processing, constrained by the computational limitations of radiation-hardened systems. While historically effective, these approaches typically increase mission costs and duration, operate at low processing rates, and have limited generalization. Recently, learning-based computer vision has gained popularity to enhance spacecraft autonomy and overcome these limitations. While promising, emerging techniques frequently impose computational demands exceeding the capabilities of typical spacecraft hardware for real-time operation and are further challenged by the scarcity of labeled training data for diverse extraterrestrial environments. In this work, we present novel formulations for in-situ landmark tracking via detection and description. We utilize lightweight, computationally efficient neural network architectures designed for real-time execution on current-generation spacecraft flight processors. For landmark detection, we propose improved domain adaptation methods that enable the identification of celestial terrain features with distinct, cheaply acquired training data. Concurrently, for landmark description, we introduce a novel attention alignment formulation that learns robust feature representations that maintain correspondence despite significant landmark viewpoint variations. Together, these contributions form a unified system for landmark tracking that demonstrates superior performance compared to existing state-of-the-art techniques.
Article 937
Title@2025-07-12 (6): On Supernet Transfer Learning for Effective Task Adaptation
Title: On Supernet Transfer Learning for Effective Task Adaptation | Auf Supernet Transfer Learning für effektive Aufgabenanpassung | 用于有效任务适应的超级网传输学习 2407.20279v3 |
Authors (2): Prabhant Singh, Joaquin Vanschoren
Neural Architecture Search (NAS) methods have been shown to outperform hand-designed models and help to democratize AI. However, NAS methods often start from scratch with each new task, making them computationally expensive and limiting their applicability. Transfer learning is a practical alternative with the rise of ever-larger pretrained models. However, it is also bound to the architecture of the pretrained model, which inhibits proper adaptation of the architecture to different tasks, leading to suboptimal (and excessively large) models. We address both challenges at once by introducing a novel and practical method to transfer supernets, which parameterize both weight and architecture priors, and efficiently finetune both to new tasks. This enables supernet transfer learning as a replacement for traditional transfer learning that also finetunes model architectures to new tasks. Through extensive experiments across multiple image classification tasks, we demonstrate that supernet transfer learning not only drastically speeds up the discovery of optimal models (3 to 5 times faster on average), but also finds better models than running NAS from scratch. The added model flexibility also increases the robustness of transfer learning, yielding positive transfer to even very different target datasets, especially with multi-dataset pretraining.
Article 938
Title@2025-07-12 (6): Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge
Title: Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge | Intelligente Orchestrierung der verteilten Large Foundation Model Inferenz am Rande | 分散在边缘的大基金会模型推断 2504.03668v3 |
Authors (3): Fernando Koch, Aladin Djuhera, Alecio Binotto
Large Foundation Models (LFMs), including multi-modal and generative models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments, such as Multi-access Edge Computing (MEC), presents significant challenges for workload orchestration due to time-varying network, compute, and storage conditions. In particular, current split inference strategies, which partition LFM layers across nodes, are not designed to adapt to fluctuating workloads, dynamic bandwidth conditions, or evolving privacy constraints in high-utilization MEC environments. In this work, we propose a novel adaptive split inference orchestration framework that elevates both the placement and partitioning of LFM layers to runtime-tunable variables. Specifically, our framework enables real-time, quality-of-service (QoS)-aware management of inference workloads by extending conventional orchestrators with three key services: (1) Capacity-aware workload distribution, which continuously profiles node resources and selects an optimal subset of MEC nodes; (2) Dynamic partition migration, which transparently relocates pre-cut LFM segments in response to changes in utilization or network conditions; (3) Real-time reconfiguration, which dynamically re-splits LFM layers to balance latency, throughput, and privacy. We formalize the joint placement-partitioning problem, outline a reference architecture and algorithmic workflow, and discuss applicability in representative smart city, V2X, and industrial edge scenarios.
Article 939
Title@2025-07-12 (6): Insuring Uninsurable Risks from AI: Government as Insurer of Last Resort
Title: Insuring Uninsurable Risks from AI: Government as Insurer of Last Resort | Unversicherbare Risiken von KI sichern: Regierung als Versicherer des letzten Resorts | AI:政府作为最后度假地的保险人 2409.06672v3 |
Authors (1): Cristian Trout
Many experts believe that AI systems will sooner or later pose uninsurable risks, including existential risks. This creates an extreme judgment-proof problem: few if any parties can be held accountable ex post in the event of such a catastrophe. This paper proposes a novel solution: a government-provided, mandatory indemnification program for AI developers. The program uses risk-priced indemnity fees to induce socially optimal levels of care. Risk estimates are determined by surveying experts, including indemnified developers. The Bayesian Truth Serum mechanism is employed to incentivize honest and effortful responses. Compared to alternatives, this approach arguably better leverages all private information, and provides a clearer signal to indemnified developers regarding what risks they must mitigate to lower their fees. It’s recommended that collected fees be used to help fund the safety research developers need, employing a fund matching mechanism (Quadratic Financing) to induce an optimal supply of this public good. Under Quadratic Financing, safety research projects would compete for private contributions from developers, signaling how much each is to be supplemented with public funds.
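The abstract does not spell out the matching rule, so the following is only a minimal sketch assuming the standard quadratic funding formula, in which a project's total funding is the square of the sum of the square roots of its individual contributions. The project names and amounts are hypothetical.

```python
import math

def quadratic_match(contributions):
    """Public subsidy for one project under the standard quadratic funding rule.

    Assumed rule: total funding = (sum of sqrt(c_i))^2, so the subsidy is
    that total minus what the contributors already paid in.
    """
    total = sum(math.sqrt(c) for c in contributions) ** 2
    return total - sum(contributions)

# Hypothetical safety research projects and developer contributions.
projects = {
    "interpretability": [100.0, 100.0, 100.0, 100.0],  # four small backers
    "evals": [400.0],                                   # one large backer
}
matches = {name: quadratic_match(c) for name, c in projects.items()}
```

Both hypothetical projects raise the same private total, but the broadly supported one attracts a public match while the single-backer one attracts none. That is the sense in which contributions signal how much public money each project should receive.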
Article 940
Title@2025-07-12 (6): GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups
Title: GreenCrossingAI: A Camera Trap/Computer Vision Pipeline for Environmental Science Research Groups | GreenCrossingAI: Eine Kamerafalle/Computer Vision Pipeline für Forschungsgruppen der Umweltwissenschaften | GreenCrossingAI:环境科学研究小组的相机陷阱/计算机视觉管道 2507.09410v1 |
Authors (5): Bernie Boscoe, Shawn Johnson, Andrea Osborn, Chandler Campbell, Karen Mager
Camera traps have long been used by wildlife researchers to monitor and study animal behavior, population dynamics, habitat use, and species diversity in a non-invasive and efficient manner. While data collection from the field has increased with new tools and capabilities, methods to develop, process, and manage the data, especially the adoption of ML/AI tools, remain challenging. These challenges include the sheer volume of data generated, the need for accurate labeling and annotation, variability in environmental conditions affecting data quality, and the integration of ML/AI tools into existing workflows that often require domain-specific customization and computational resources. This paper provides a guide to a low-resource pipeline to process camera trap data on-premise, incorporating ML/AI capabilities tailored for small research groups with limited resources and computational expertise. By focusing on practical solutions, the pipeline offers accessible approaches for data transmission, inference, and evaluation, enabling researchers to discover meaningful insights from their ever-increasing camera trap datasets.
Article 941
Title@2025-07-12 (6): Divergence of Empirical Neural Tangent Kernel in Classification Problems
Title: Divergence of Empirical Neural Tangent Kernel in Classification Problems | Unterschiedlichkeit des empirischen neuralen Tangenten-Kernels bei Klassifizierungsproblemen | 在分类问题方面经验性神经神经下层核心的差别 2504.11130v2 |
Authors (3): Zixiong Yu, Songtao Tian, Guhan Chen
This paper demonstrates that in classification problems, fully connected neural networks (FCNs) and residual neural networks (ResNets) cannot be approximated by kernel logistic regression based on the Neural Tangent Kernel (NTK) under overtraining (i.e., when training time approaches infinity). Specifically, when using the cross-entropy loss, regardless of how large the network width is (as long as it is finite), the empirical NTK diverges from the NTK on the training samples as training time increases. To establish this result, we first demonstrate the strict positive definiteness of the NTKs for multi-layer FCNs and ResNets. Then, we prove that during training, the neural network parameters diverge if the smallest eigenvalue of the empirical NTK matrix (Gram matrix) with respect to training samples is bounded below by a positive constant. This behavior contrasts sharply with the lazy training regime commonly observed in regression problems. Consequently, using a proof by contradiction, we show that the empirical NTK does not uniformly converge to the NTK across all times on the training samples as the network width increases. We validate our theoretical results through experiments on both synthetic data and the MNIST classification task. This finding implies that NTK theory is not applicable in this context, with significant theoretical implications for understanding neural networks in classification problems.
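To make the quantity in the argument concrete, here is a small sketch of an empirical NTK Gram matrix and its smallest eigenvalue. It assumes a toy two-layer scalar network $f(x) = \frac{1}{\sqrt{m}} \sum_j a_j \tanh(w_j x)$, which is an illustrative choice and not the FCN or ResNet architectures analyzed in the paper. The empirical NTK entry is the inner product of parameter gradients, $K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$.

```python
import math
import random

def grad_f(params, x):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x) w.r.t. all a_j, w_j."""
    a, w = params
    scale = 1.0 / math.sqrt(len(a))
    g_a = [scale * math.tanh(wj * x) for wj in w]
    g_w = [scale * aj * (1.0 - math.tanh(wj * x) ** 2) * x for aj, wj in zip(a, w)]
    return g_a + g_w

def empirical_ntk(params, xs):
    """Gram matrix K[i][j] = <grad f(x_i), grad f(x_j)> on the training inputs."""
    grads = [grad_f(params, x) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in grads] for gi in grads]

def min_eig_2x2(k):
    """Smallest eigenvalue of a symmetric 2x2 matrix in closed form."""
    tr = k[0][0] + k[1][1]
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    return 0.5 * (tr - math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

random.seed(0)
m = 64  # hypothetical network width
params = ([random.gauss(0, 1) for _ in range(m)],
          [random.gauss(0, 1) for _ in range(m)])
K = empirical_ntk(params, [0.5, -1.0])  # two hypothetical training inputs
lam_min = min_eig_2x2(K)
```

Tracking `lam_min` and the entries of `K` over the course of training is one way to observe whether the empirical kernel stays close to its value at initialization.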
Article 942
Title@2025-07-12 (6): Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning
Title: Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning | Score Attack: Eine untere Bound-Technik für optimales, differenziertes Private Learning | 得分攻击: 最佳差异化私人学习的低劣技术 2303.07152v2 |
Authors (3): T. Tony Cai, Yichen Wang, Linjun Zhang
Achieving optimal statistical performance while ensuring the privacy of personal data is a challenging yet crucial objective in modern data analysis. However, characterizing the optimality, particularly the minimax lower bound, under privacy constraints is technically difficult. To address this issue, we propose a novel approach called the score attack, which provides a lower bound on the differential-privacy-constrained minimax risk of parameter estimation. The score attack method is based on the tracing attack concept in differential privacy and can be applied to any statistical model with a well-defined score statistic. It can optimally lower bound the minimax risk of estimating unknown model parameters, up to a logarithmic factor, while ensuring differential privacy for a range of statistical problems. We demonstrate the effectiveness and optimality of this general method in various examples, such as the generalized linear model in both classical and high-dimensional sparse settings, the Bradley-Terry-Luce model for pairwise comparisons, and non-parametric regression over the Sobolev class.
Article 943
Title@2025-07-12 (6): New Statistical and Computational Results for Learning Junta Distributions
Title: New Statistical and Computational Results for Learning Junta Distributions | Neue statistische und rechnerische Ergebnisse für Junta-Distributionen | 学习军军分发的新的统计和计算结果 2505.05819v3 |
Authors (1): Lorenzo Beretta
We study the problem of learning junta distributions on $\{0, 1\}^n$, where a distribution is a $k$-junta if its probability mass function depends on a subset of at most $k$ variables. We make two main contributions: (1) We show that learning $k$-junta distributions is computationally equivalent to learning $k$-parity functions with noise (LPN), a landmark problem in computational learning theory. (2) We design an algorithm for learning junta distributions whose statistical complexity is optimal, up to polylogarithmic factors. Computationally, our algorithm matches the complexity of previous (non-sample-optimal) algorithms. Combined, our two contributions imply that our algorithm cannot be significantly improved, statistically or computationally, barring a breakthrough for LPN.
Article 944
Title@2025-07-12 (6): Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers
Title: Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers | Adversarial Activation Patching: Ein Framework zur Erkennung und Abmilderung von Emergent Deception in sicherheitsorientierten Transformern | 反反向启动补补补:在安全自动变形器中发现和减轻新出现欺骗的框架 2507.09406v1 |
Authors (1): Santhosh Kumar Ravindran
Large language models (LLMs) aligned for safety through techniques like reinforcement learning from human feedback (RLHF) often exhibit emergent deceptive behaviors, where outputs appear compliant but subtly mislead or omit critical information. This paper introduces adversarial activation patching, a novel mechanistic interpretability framework that leverages activation patching as an adversarial tool to induce, detect, and mitigate such deception in transformer-based models. By sourcing activations from “deceptive” prompts and patching them into safe forward passes at specific layers, we simulate vulnerabilities and quantify deception rates. Through toy neural network simulations across multiple scenarios (e.g., 1000 trials per setup), we demonstrate that adversarial patching increases deceptive outputs to 23.9% from a 0% baseline, with layer-specific variations supporting our hypotheses. We propose six hypotheses, including transferability across models, exacerbation in multimodal settings, and scaling effects. An expanded literature review synthesizes over 20 key works in interpretability, deception, and adversarial attacks. Mitigation strategies, such as activation anomaly detection and robust fine-tuning, are detailed, alongside ethical considerations and future research directions. This work advances AI safety by highlighting patching’s dual-use potential and provides a roadmap for empirical studies on large-scale models.
Article 945
Title@2025-07-12 (6): Deep learning lattice gauge theories
Title: Deep learning lattice gauge theories | Theorien des tiefen Lernens von Gittermessgeräten | 深深学习花束仪表学理论 2405.14830v2 |
Authors (4): Anuj Apte, Anthony Ashmore, Clay Cordova, Tzu-Chen Huang
Monte Carlo methods have led to profound insights into the strong-coupling behaviour of lattice gauge theories and produced remarkable results such as first-principles computations of hadron masses. Despite tremendous progress over the last four decades, fundamental challenges such as the sign problem and the inability to simulate real-time dynamics remain. Neural network quantum states have emerged as an alternative method that seeks to overcome these challenges. In this work, we use gauge-invariant neural network quantum states to accurately compute the ground state of $\mathbb{Z}_N$ lattice gauge theories in $2+1$ dimensions. Using transfer learning, we study the distinct topological phases and the confinement phase transition of these theories. For $\mathbb{Z}_2$, we identify a continuous transition and compute critical exponents, finding excellent agreement with existing numerics for the expected Ising universality class. In the $\mathbb{Z}_3$ case, we observe a weakly first-order transition and identify the critical coupling. Our findings suggest that neural network quantum states are a promising method for precise studies of lattice gauge theory.
Article 946
Title@2025-07-12 (6): No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
Title: No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms | Nein, natürlich kann ich! Tiefere Feinabstimmung Angriffe, die Token-Level Sicherheitsmechanismen umgehen | 更深的精准攻击 绕过托肯级安全机制 2502.19537v5 |
Authors (8): Joshua Kazdan, Abhay Puri, Rylan Schaeffer, Lisa Yu, Chris Cundy, Jason Stanley, Sanmi Koyejo, Krishnamurthy Dvijotham
Leading language model (LM) providers like OpenAI and Anthropic allow customers to fine-tune frontier LMs for specific use cases. To prevent abuse, these providers apply filters to block fine-tuning on overtly harmful data. In this setting, we make three contributions: First, while past work has shown that safety alignment is “shallow”, we correspondingly demonstrate that existing fine-tuning attacks are shallow – attacks target only the first several tokens of the model response, and consequently can be blocked by generating the first several response tokens with an aligned model. Second, we conceptually illustrate how to make attacks deeper by introducing a new fine-tuning attack that trains models to first refuse harmful requests before answering them; this “refuse-then-comply” strategy bypasses shallow defenses and produces harmful responses that evade output filters. Third, we demonstrate the potency of our new fine-tuning attack by jailbreaking both open-source models equipped with defenses and production models, achieving attack success rates of 57% and 72% against GPT-4o and Claude Haiku, respectively. Our attack received a $2000 bug bounty from OpenAI and was acknowledged as a vulnerability by Anthropic. Our work undermines the notion that models are safe because they initially refuse harmful requests and broadens awareness of the scope of attacks that face production fine-tuning APIs.
Article 947
Title@2025-07-12 (6): Scaling Laws for Optimal Data Mixtures
Title: Scaling Laws for Optimal Data Mixtures | Skalierungsgesetze für optimale Datenmischungen | 优化数据混合法的缩放法 2507.09404v1 |
Authors (7): Mustafa Shukor, Louis Bethune, Dan Busbridge, David Grangier, Enrico Fini, Alaaeldin El-Nouby, Pierre Ablin
Large foundation models are typically trained on data from multiple domains, with the data mixture (the proportion of each domain used) playing a critical role in model performance. The standard approach to selecting this mixture relies on trial and error, which becomes impractical for large-scale pretraining. We propose a systematic method to determine the optimal data mixture for any target domain using scaling laws. Our approach accurately predicts the loss of a model of size $N$ trained with $D$ tokens and a specific domain weight vector $h$. We validate the universality of these scaling laws by demonstrating their predictive power in three distinct and large-scale settings: large language model (LLM), native multimodal model (NMM), and large vision model (LVM) pretraining. We further show that these scaling laws can extrapolate to new data mixtures and across scales: their parameters can be accurately estimated using a few small-scale training runs, and used to estimate the performance at larger scales and unseen domain weights. The scaling laws make it possible to derive the optimal domain weights for any target domain under a given training budget ($N$, $D$), providing a principled alternative to costly trial-and-error methods.
Article 948
Title@2025-07-12 (6): Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference
Title: Bayesian Theory of Consciousness as Exchangeable Emotion-Cognition Inference | Bayesische Bewusstseinstheorie als auswechselbare Emotion-Kognition-Schlussfolgerung | 贝叶斯人的觉悟理论,作为可交流的情感 – – 情绪 – – 气氛推论 2407.09488v3 |
Authors (1): Xin Li
This paper proposes a unified framework in which consciousness emerges as a cycle-consistent, affectively anchored inference process, recursively structured by the interaction of emotion and cognition. Drawing from information theory, optimal transport, and the Bayesian brain hypothesis, we formalize emotion as a low-dimensional structural prior and cognition as a specificity-instantiating update. This emotion-cognition cycle minimizes joint uncertainty by aligning emotionally weighted priors with context-sensitive cognitive appraisals. Subjective experience thus arises as the informational footprint of temporally extended, affect-modulated simulation. We introduce the Exchangeable Integration Theory of Consciousness (EITC), modeling conscious episodes as conditionally exchangeable samples drawn from a latent affective self-model. This latent variable supports integration, via a unified cause-effect structure with nonzero irreducibility, and differentiation, by preserving contextual specificity across episodes. We connect this architecture to the Bayesian theory of consciousness through Rao-Blackwellized inference, which stabilizes inference by marginalizing latent self-structure while enabling adaptive updates. This mechanism ensures coherence, prevents inference collapse, and supports goal-directed simulation. The formal framework builds on De Finetti’s exchangeability theorem, integrated information theory, and KL-regularized optimal transport. Overall, consciousness is reframed as a recursive inference process, shaped by emotion, refined by cognition, stabilized through exchangeability, and unified through a latent self-model that integrates experience across time.
Article 949
Title@2025-07-12 (6): A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
Title: A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention | Eine zufällige Matrix-Theorie Perspektive auf die Lerndynamik von mehrköpfiger latenter Aufmerksamkeit | 多头端注意学习动态的随机矩阵理论视角 2507.09394v1 |
Authors (2): Nandan Kumar Jha, Brandon Reagen
In this work, we study how multi-head latent attention (MLA), a popular strategy for compressing key/value memory, affects a transformer’s internal capacity during pretraining. Using a lightweight suite of Marchenko-Pastur (MP) diagnostics, we analyze the spectrum of the $W_{Q}W_{K}^\top$ Gram matrix throughout training, comparing three variants: the standard multi-head attention (MHA) baseline, MLA-PreRoPE with rotary applied before compression, and MLA-Decoupled, which shares a single rotary sub-vector across all heads. Our random matrix analysis reveals three key findings: (i) capacity bottlenecks emerge locally: both MHA and MLA-PreRoPE exhibit sharp, early spikes in specific layers that persist and propagate, disrupting the balance between bulk and outlier directions; (ii) these spikes coincide with rank collapse, concentrating the model’s expressivity into narrow subspaces; (iii) only the decoupled variant prevents this cascade, maintaining broad spectral support and suppressing outlier formation across layers. These results underscore that how rotary embeddings are applied is just as critical as where compression occurs. Sharing rotary components across heads mitigates spectral fragmentation and preserves representational capacity.
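The abstract does not describe the diagnostics in detail, so the sketch below only illustrates the basic idea behind a Marchenko-Pastur check. It assumes the eigenvalues come from a sample-covariance-style matrix $\frac{1}{n} X X^\top$ with $X$ of shape $p \times n$ and entry variance $\sigma^2$, for which the MP bulk lies in $[\sigma^2 (1 - \sqrt{\gamma})^2, \sigma^2 (1 + \sqrt{\gamma})^2]$ with $\gamma = p/n \le 1$. The function names, shapes, and eigenvalues are hypothetical.

```python
import math

def mp_edges(p, n, sigma2=1.0):
    """Lower and upper edges of the Marchenko-Pastur bulk for aspect ratio p/n."""
    gamma = p / n
    lo = sigma2 * (1.0 - math.sqrt(gamma)) ** 2
    hi = sigma2 * (1.0 + math.sqrt(gamma)) ** 2
    return lo, hi

def outlier_fraction(eigs, p, n, sigma2=1.0):
    """Share of eigenvalues above the MP upper edge (candidate spike directions)."""
    _, hi = mp_edges(p, n, sigma2)
    return sum(e > hi for e in eigs) / len(eigs)

# Hypothetical layer: p = 64, n = 256, with one eigenvalue far outside the bulk.
lo, hi = mp_edges(64, 256)
frac = outlier_fraction([0.3, 1.0, 2.0, 9.5], 64, 256)
```

Under these assumptions, eigenvalues above `hi` are the outlier directions, and a growing `frac` in a given layer would correspond to the spikes described above.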
Article 950
Title@2025-07-12 (6): Geometric Generative Modeling with Noise-Conditioned Graph Networks
Title: Geometric Generative Modeling with Noise-Conditioned Graph Networks | Geometrische Generative Modellierung mit lärmkonditionierten Graphennetzen | 带有噪音、有条件条件的图形网络的生成模型 2507.09391v1 |
Authors (3): Peter Pao-Huang, Mitchell Black, Xiaojie Qiu
Generative modeling of graphs with spatial structure is essential across many applications from computer graphics to spatial genomics. Recent flow-based generative models have achieved impressive results by gradually adding and then learning to remove noise from these graphs. Existing models, however, use graph neural network architectures that are independent of the noise level, limiting their expressiveness. To address this issue, we introduce Noise-Conditioned Graph Networks (NCGNs), a class of graph neural networks that dynamically modify their architecture according to the noise level during generation. Our theoretical and empirical analysis reveals that as noise increases, (1) graphs require information from increasingly distant neighbors and (2) graphs can be effectively represented at lower resolutions. Based on these insights, we develop Dynamic Message Passing (DMP), a specific instantiation of NCGNs that adapts both the range and resolution of message passing to the noise level. DMP consistently outperforms noise-independent architectures on a variety of domains including 3D point clouds, spatiotemporal transcriptomics, and images. Code is available at https://github.com/peterpaohuang/ncgn.
Article 951
Title@2025-07-12 (6): Multi-Player Zero-Sum Markov Games with Networked Separable Interactions
Title: Multi-Player Zero-Sum Markov Games with Networked Separable Interactions | Multi-Player Zero-Sum Markov Spiele mit vernetzten Separable Interaktionen | 多层零Sum Markov 游戏, 带有网络化分离互动 2307.09470v3 |
Authors (3): Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar
We study a new class of Markov games, (multi-player) zero-sum Markov Games with Networked separable interactions (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define a zero-sum NMG as a model where the payoffs of the auxiliary games associated with each state are zero-sum and have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as a zero-sum NMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the product of per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov stationary CCE in infinite-horizon discounted zero-sum NMGs is PPAD-hard, unless the underlying network has a “star topology”. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for zero-sum NMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov non-stationary NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
Article 952
Title@2025-07-12 (6): Credit Card Fraud Detection Using RoFormer Model With Relative Distance Rotating Encoding
Title: Credit Card Fraud Detection Using RoFormer Model With Relative Distance Rotating Encoding | Kreditkarte Betrugserkennung mit RoFormer-Modell mit relativer Entfernung rotierende Encoding | 使用具有相对远程旋转编码的ROFermer模型发现信用卡欺诈 2507.09385v1 |
Authors (2): Kevin Reyes, Vasco Cortez
Fraud detection is one of the most important challenges that financial systems must address. Detecting fraudulent transactions is critical for payment gateway companies like Flow Payment, which process millions of transactions monthly and require robust security measures to mitigate financial risks. Increasing transaction authorization rates while reducing fraud is essential for providing a good user experience and building a sustainable business. For this reason, discovering novel and improved methods to detect fraud requires continuous research and investment for any company that wants to succeed in this industry. In this work, we introduce a novel method for detecting transactional fraud by incorporating the Relative Distance Rotating Encoding (ReDRE) into the RoFormer model. The incorporation of angle rotation using ReDRE enhances the characterization of time series data within a Transformer, leading to improved fraud detection by better capturing temporal dependencies and event relationships.
Article 953
Title@2025-07-12 (6): Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields
Title: Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields | Echtzeit-Adaptive Motion-Planung über Point Cloud-geführte, energiebasierte Diffusion und potenzielle Felder | 通过点云引导、基于能源的传播和潜在领域进行实时适应性运动规划 2507.09383v1 |
Authors (6): Wondmgezahu Teshome, Kian Behzad, Octavia Camps, Michael Everett, Milad Siami, Mario Sznaier
Motivated by the problem of pursuit-evasion, we present a motion planning framework that combines energy-based diffusion models with artificial potential fields for robust real-time trajectory generation in complex environments. Our approach processes obstacle information directly from point clouds, enabling efficient planning without requiring complete geometric representations. The framework employs classifier-free guidance training and integrates local potential fields during sampling to enhance obstacle avoidance. In dynamic scenarios, the system generates initial trajectories using the diffusion model and continuously refines them through potential field-based adaptation, demonstrating effective performance in pursuit-evasion scenarios with partial pursuer observability.
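As a rough illustration of the potential-field component only, the sketch below takes one step of a classical artificial potential field in 2D, with obstacles given directly as point-cloud points. It assumes the textbook attractive and repulsive potentials and does not reproduce the paper's diffusion model or its guidance scheme. The gains `k_att`, `k_rep`, the influence radius `d0`, and the step size are hypothetical values.

```python
import math

def potential_field_step(q, goal, cloud, k_att=1.0, k_rep=0.5, d0=1.0, step=0.05):
    """Move q one small step along the attractive plus repulsive force."""
    # Attractive force pulls the waypoint toward the goal.
    fx = -k_att * (q[0] - goal[0])
    fy = -k_att * (q[1] - goal[1])
    # Each point-cloud point closer than d0 pushes the waypoint away.
    for ox, oy in cloud:
        dx, dy = q[0] - ox, q[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return (q[0] + step * fx, q[1] + step * fy)

# Hypothetical waypoint at the origin, goal at (1, 0).
free = potential_field_step((0.0, 0.0), (1.0, 0.0), [])
blocked = potential_field_step((0.0, 0.0), (1.0, 0.0), [(0.5, 0.0)])
```

With no obstacles the waypoint moves toward the goal. With a point directly in the way, the repulsive term dominates and the waypoint backs off. Applying a step like this to trajectory waypoints is one plausible reading of potential-field-based refinement.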
Article 954
Title@2025-07-12 (6): Don’t be so negative! Score-based Generative Modeling with Oracle-assisted Guidance
Title: Don’t be so negative! Score-based Generative Modeling with Oracle-assisted Guidance | Seien Sie nicht so negativ! Score-basierte Generative Modellierung mit Oracle-assisted Guidance | 不要这么消极! 2307.16463v2 |
Authors (5): Saeid Naderiparizi, Xiaoxuan Liang, Setareh Cohan, Berend Zwartsenberg, Frank Wood
Score-based diffusion models are a powerful class of generative models, widely utilized across diverse domains. Despite significant advancements in large-scale tasks such as text-to-image generation, their application to constrained domains has received considerably less attention. This work addresses model learning in a setting where, in addition to the training dataset, there also exists side information in the form of an oracle that can label samples as being outside the support of the true data generating distribution. Specifically, we develop a new denoising diffusion probabilistic modeling methodology, Gen-neG, that leverages this additional side information. Gen-neG builds on classifier guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle. We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation.
Article 955
Title@2025-07-12 (6): Fair CCA for Fair Representation Learning: An ADNI Study
Title: Fair CCA for Fair Representation Learning: An ADNI Study | Faire CCA für Fair Representative Learning: Eine ADNI-Studie | 公平代表性学习公平共同国家评析:ADNI研究 2507.09382v1 |
Authors (9): Bojian Hou, Zhanliang Wang, Zhuoping Zhou, Boning Tong, Zexuan Wang, Jingxuan Bao, Duy Duong-Tran, Qi Long, Li Shen
Canonical correlation analysis (CCA) is a technique for finding correlations between different data modalities and learning low-dimensional representations. As fairness becomes crucial in machine learning, fair CCA has gained attention. However, previous approaches often overlook the impact on downstream classification tasks, limiting applicability. We propose a novel fair CCA method for fair representation learning, ensuring the projected features are independent of sensitive attributes, thus enhancing fairness without compromising accuracy. We validate our method on synthetic data and real-world data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), demonstrating its ability to maintain high correlation analysis performance while improving fairness in classification tasks. Our work enables fair machine learning in neuroimaging studies where unbiased analysis is essential.
Article 956
Title@2025-07-12 (6): ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans
Title: ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans | ESPFormer: Doppelstochastische Aufmerksamkeit mit erwarteten Sliced Transport Plans | ESP Former: 带有预期切片运输计划的多孔蒸汽关注 2502.07962v2 |
Authors (6): Ashkan Shahbazi, Elaheh Akbari, Darian Salehi, Xinran Liu, Navid Naderializadeh, Soheil Kolouri
While self-attention has been instrumental in the success of Transformers, it can lead to over-concentration on a few tokens during training, resulting in suboptimal information flow. Enforcing doubly-stochastic constraints in attention matrices has been shown to improve structure and balance in attention distributions. However, existing methods rely on iterative Sinkhorn normalization, which is computationally costly. In this paper, we introduce a novel, fully parallelizable doubly-stochastic attention mechanism based on sliced optimal transport, leveraging Expected Sliced Transport Plans (ESP). Unlike prior approaches, our method enforces doubly stochasticity without iterative Sinkhorn normalization, significantly enhancing efficiency. To ensure differentiability, we incorporate a temperature-based soft sorting technique, enabling seamless integration into deep learning models. Experiments across multiple benchmark datasets, including image classification, point cloud classification, sentiment analysis, and neural machine translation, demonstrate that our enhanced attention regularization consistently improves performance across diverse applications. Our implementation code can be found at https://github.com/dariansal/ESPFormer.
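For context, the sketch below shows the iterative Sinkhorn normalization that the abstract describes as the costly baseline. It is not the ESP mechanism itself. It alternately rescales rows and columns of exponentiated attention scores until the matrix is approximately doubly stochastic. The score matrix and iteration count are hypothetical.

```python
import math

def sinkhorn(scores, n_iters=100):
    """Alternate row and column normalization of exp(scores) (square matrix)."""
    k = [[math.exp(s) for s in row] for row in scores]
    n = len(k)
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        k = [[v / sum(row) for v in row] for row in k]
        # Column step: make every column sum to 1.
        col = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / col[j] for j in range(n)] for i in range(n)]
    return k

# Hypothetical 3x3 attention scores.
attn = sinkhorn([[0.2, 1.0, -0.5],
                 [0.0, 0.3, 0.8],
                 [1.1, -0.2, 0.4]])
row_sums = [sum(row) for row in attn]
col_sums = [sum(attn[i][j] for i in range(3)) for j in range(3)]
```

The loop is inherently sequential, since each pass depends on the previous one. That sequential dependence is the cost a fully parallelizable alternative aims to avoid.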
Article 957
Title@2025-07-12 (6): Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Title: Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers | Untere Grenzen für die Ketten-of-Thought-Reasoning in Hard-Attention Transformers | 硬注意力变换器中寻求链引因的下下下界宽度 2502.02393v3 |
Authors (4): Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers’ expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of chain-of-thought steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
Article 958
Title@2025-07-12 (6): Prune ‘n Predict: Optimizing LLM Decision-making with Conformal Prediction
Title: Prune ‘n Predict: Optimizing LLM Decision-making with Conformal Prediction | Prune ‘n Predict: Optimierung der LLM-Entscheidungsfindung mit konformer Vorhersage | 普鲁奈预测:利用非正式预测优化LLM决策 2501.00555v2 |
Authors (6): Harit Vishwakarma, Alan Mishler, Thomas Cook, Niccolò Dalmasso, Natraj Raman, Sumitra Ganesh
Large language models (LLMs) are empowering decision-making in several applications, including tool or API usage and answering multiple-choice questions (MCQs). However, incorrect outputs pose significant risks in high-stakes domains like healthcare and finance. To quantify LLM uncertainty and thereby mitigate these risks, recent works employ conformal prediction (CP), a model- and distribution-agnostic framework that uses LLM outputs to generate a prediction set containing the true answer with high probability. Leveraging CP, we propose conformal revision of questions (CROQ), which revises the question by narrowing down the available choices to those in the prediction set and asking the LLM the revised question. We expect LLMs to be more accurate on revised questions with fewer choices. Furthermore, we expect CROQ to be effective when the prediction sets from CP are small. Commonly used logit scores often lead to large sets, diminishing CROQ’s effectiveness. To overcome this, we propose CP-OPT, an optimization framework to learn scores that minimize set sizes while maintaining coverage. Our extensive experiments on MMLU, ToolAlpaca, and TruthfulQA datasets with multiple LLMs show that CROQ improves accuracy over standard inference, with more pronounced gains when paired with CP-OPT.
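The prediction-set and question-revision steps described above can be illustrated with standard split conformal prediction. This is a minimal sketch, not the authors' code: it assumes the nonconformity score is 1 minus the model's probability for an option, and the helper names (`conformal_threshold`, `prediction_set`, `revise_question`) are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    `cal_scores` holds the score of the TRUE answer on held-out questions.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points: keep every option
    return sorted(cal_scores)[k - 1]

def prediction_set(option_probs, threshold):
    """Keep options whose score 1 - p does not exceed the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= threshold]

def revise_question(question, options, kept):
    """Rebuild the multiple-choice prompt using only the surviving options."""
    return "\n".join([question] + [f"{key}. {options[key]}" for key in kept])
```

A usage sketch: compute the threshold once on calibration data, form the set for a new question, then send the output of `revise_question` back to the model. Smaller sets mean fewer distractors in the revised prompt, which is why the abstract pairs CROQ with scores that shrink set size.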
Article 959
Title@2025-07-12 (6): TabDPT: Scaling Tabular Foundation Models on Real Data
Title: TabDPT: Scaling Tabular Foundation Models on Real Data | TabDPT: Scaling Tabular Foundation Models on Real Data | TabDPT: 真实数据缩放表表基建模型 2410.18164v2 |
Authors (10): Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, Maksims Volkovs
Tabular data is one of the most ubiquitous sources of information worldwide, spanning a wide variety of domains. This inherent heterogeneity has slowed the development of Tabular Foundation Models (TFMs) capable of fast generalization to unseen datasets. In-Context Learning (ICL) has recently emerged as a promising solution for TFMs, enabling dynamic adaptation to new tasks without additional tuning. While many studies have attempted to re-purpose large language models for tabular ICL, they have had limited success, so recent works have focused on developing tabular-specific foundation models. In this work, we propose an approach to combine ICL-based retrieval with self-supervised learning to train tabular foundation models. We also investigate the utility of real vs. synthetic data for model pre-training, and show that real data can contain useful signal not easily captured in synthetic training. Specifically, we show that incorporating real data during the pre-training phase can lead to significantly faster training and better downstream generalization to unseen data. Our resulting model, TabDPT, achieves top performance on both regression (CTR23) and classification (CC18) benchmarks. Importantly, we also demonstrate that with our pre-training procedure, scaling both model and data size leads to consistent performance improvements that follow power laws. This echoes scaling laws in LLMs and other foundation models, and suggests that Internet-scale TFMs may be achievable. We open-source our full pipeline: inference code including trained model weights can be found at github.com/layer6ai-labs/TabDPT-inference, and the training code to reproduce experiments can be found at github.com/layer6ai-labs/TabDPT-training.
Article 960
Title@2025-07-12 (6): Meta-autoencoders: An approach to discovery and representation of relationships between dynamically evolving classes
Title: Meta-autoencoders: An approach to discovery and representation of relationships between dynamically evolving classes | Meta-Autoencoder: Ein Ansatz zur Entdeckung und Darstellung von Beziehungen zwischen dynamisch sich entwickelnden Klassen | Meta-autoencoldders:发现动态演变中的类别之间的关系并体现这种关系的方法 2507.09362v1 |
Authors (4): Assaf Marron, Smadar Szekely, Irun Cohen, David Harel
An autoencoder (AE) is a neural network that, using self-supervised training, learns a succinct parameterized representation, and a corresponding encoding and decoding process, for all instances in a given class. Here, we introduce the concept of a meta-autoencoder (MAE): an AE for a collection of autoencoders. Given a family of classes that differ from each other by the values of some parameters, and a trained AE for each class, an MAE for the family is a neural net that has learned a compact representation and associated encoder and decoder for the class-specific AEs. One application of this general concept is in research and modeling of natural evolution – capturing the defining and the distinguishing properties across multiple species that are dynamically evolving from each other and from common ancestors. In this interim report we provide a constructive definition of MAEs, initial examples, and the motivating research directions in machine learning and biology.
Article 961
Title@2025-07-12 (6): Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
Title: Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts | Vermeiden von Leckagevergiftungen: Konzeptinterventionen unter Verteilungsverschiebungen | 避免漏漏毒:分配变更下的概念干预 2504.17921v2 |
Authors (5): Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik
In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
Article 962
Title@2025-07-12 (6): Impute With Confidence: A Framework for Uncertainty Aware Multivariate Time Series Imputation
Title: Impute With Confidence: A Framework for Uncertainty Aware Multivariate Time Series Imputation | Impute With Confidence: Ein Framework für Unsicherheit im Bewusstsein multivariate Zeitreihen Imputation | 充满信心的含义:不确定性意识多变时间序列计算框架 2507.09353v1 |
Authors (2): Addison Weatherhead, Anna Goldenberg
Time series data with missing values is common across many domains. Healthcare presents special challenges due to prolonged periods of sensor disconnection. In such cases, having a confidence measure for imputed values is critical. Most existing methods either overlook model uncertainty or lack mechanisms to estimate it. To address this gap, we introduce a general framework that quantifies and leverages uncertainty for selective imputation. By focusing on values the model is most confident in, highly unreliable imputations are avoided. Our experiments on multiple EHR datasets, covering diverse types of missingness, demonstrate that selectively imputing less-uncertain values not only reduces imputation errors but also improves downstream tasks. Specifically, we show performance gains in a 24-hour mortality prediction task, underscoring the practical benefit of incorporating uncertainty into time series imputation.
Article 963
Title@2025-07-12 (6): Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Title: Watermarking Degrades Alignment in Language Models: Analysis and Mitigation | Wasserzeichen degradiert Ausrichtung in Sprachmodellen: Analyse und Milderung | 语言模型的分级调整:分析和减轻影响 2506.04462v3 |
Authors (3): Apurv Verma, NhatHai Phan, Shubhendu Trivedi
Watermarking techniques for large language models (LLMs) can significantly impact output quality, yet their effects on truthfulness, safety, and helpfulness remain critically underexamined. This paper presents a systematic analysis of how two popular watermarking approaches (Gumbel and KGW) affect these core alignment properties across four aligned LLMs. Our experiments reveal two distinct degradation patterns: guard attenuation, where enhanced helpfulness undermines model safety, and guard amplification, where excessive caution reduces model helpfulness. These patterns emerge from watermark-induced shifts in token distribution, surfacing the fundamental tension that exists between alignment objectives. To mitigate these degradations, we propose Alignment Resampling (AR), an inference-time sampling method that uses an external reward model to restore alignment. We establish a theoretical lower bound on the improvement in expected reward score as the sample size is increased and empirically demonstrate that sampling just 2-4 watermarked generations effectively recovers or surpasses baseline (unwatermarked) alignment scores. To overcome the limited response diversity of standard Gumbel watermarking, our modified implementation sacrifices strict distortion-freeness while maintaining robust detectability, ensuring compatibility with AR. Experimental results confirm that AR successfully recovers baseline alignment in both watermarking approaches, while maintaining strong watermark detectability. This work reveals the critical balance between watermark strength and model alignment, providing a simple inference-time solution to responsibly deploy watermarked LLMs in practice.
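One natural reading of "sampling 2-4 watermarked generations and using an external reward model" is best-of-n selection. The sketch below shows that reading only; whether AR selects exactly this way is an assumption, and `generate` and `reward` are hypothetical stand-ins for a watermarked sampler and a reward model.

```python
def best_of_n(prompt, generate, reward, n=4):
    """Draw n candidate generations and return the highest-reward one.

    generate: callable prompt -> text (assumed to be a watermarked sampler)
    reward:   callable text -> float  (assumed external reward model)
    """
    candidates = [generate(prompt) for _ in range(n)]
    # Every candidate is watermarked, so the selected output is as well.
    return max(candidates, key=reward)
```

Because selection only chooses among already-watermarked samples, detectability is untouched by the selection step itself; the expected reward of the maximum grows with n, which matches the abstract's remark about a bound that improves with sample size.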
Article 964
Title@2025-07-12 (6): Unified Linear Parametric Map Modeling and Perception-aware Trajectory Planning for Mobile Robotics
Title: Unified Linear Parametric Map Modeling and Perception-aware Trajectory Planning for Mobile Robotics | Einheitliche lineare Parametrische Kartenmodellierung und Wahrnehmungs-Bewusst-Planung für mobile Robotik | 移动机器人学统一线性参数测深图建模和感知感测轨迹规划 2507.09340v1 |
Authors (6): Hongyu Nie, Xingyu Li, Xu Liu, Zhaotong Tan, Sen Mei, Wenbo Su
Autonomous navigation in mobile robots, reliant on perception and planning, faces major hurdles in large-scale, complex environments. These include heavy computational burdens for mapping, sensor occlusion failures for UAVs, and traversal challenges on irregular terrain for UGVs, all compounded by a lack of perception-aware strategies. To address these challenges, we introduce Random Mapping and Random Projection (RMRP). This method constructs a lightweight linear parametric map by first mapping data to a high-dimensional space, followed by a sparse random projection for dimensionality reduction. Our novel Residual Energy Preservation Theorem provides theoretical guarantees for this process, ensuring critical geometric properties are preserved. Based on this map, we propose the RPATR (Robust Perception-Aware Trajectory Planner) framework. For UAVs, our method unifies grid and Euclidean Signed Distance Field (ESDF) maps. The front-end uses an analytical occupancy gradient to refine initial paths for safety and smoothness, while the back-end uses a closed-form ESDF for trajectory optimization. Leveraging the trained RMRP model’s generalization, the planner predicts unobserved areas for proactive navigation. For UGVs, the model characterizes terrain and provides closed-form gradients, enabling online planning to circumvent large holes. Validated in diverse scenarios, our framework demonstrates superior mapping performance in time, memory, and accuracy, and enables computationally efficient, safe navigation for high-speed UAVs and UGVs. The code will be released to foster community collaboration.
Article 965
Title@2025-07-12 (6): An Introduction to Flow Matching and Diffusion Models
Title: An Introduction to Flow Matching and Diffusion Models | Eine Einführung in Flow Matching- und Diffusionsmodelle | 流动匹配和推广模型介绍 2506.02070v2 |
Authors (2): Peter Holderrieth, Ezra Erives
Diffusion and flow-based models have become the state of the art for generative AI across a wide range of data modalities, including images, videos, shapes, molecules, music, and more. This tutorial provides a self-contained introduction to diffusion and flow-based generative models from first principles. We systematically develop the necessary mathematical background in ordinary and stochastic differential equations and derive the core algorithms of flow matching and denoising diffusion models. We then provide a step-by-step guide to building image and video generators, including training methods, guidance, and architectural design. This tutorial is ideal for machine learning researchers who want to develop a principled understanding of the theory and practice of generative AI.
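For orientation, a commonly used form of the conditional flow matching objective is written out below. The notation ($z$ for a data sample, $\epsilon$ for Gaussian noise, $u_t^\theta$ for the learned vector field) and the choice of the straight-line Gaussian path are assumptions made for illustration, not taken from the abstract.

```latex
% Straight-line probability path between noise (t = 0) and data (t = 1)
x_t = t\,z + (1 - t)\,\epsilon,
\qquad z \sim p_{\mathrm{data}},\quad \epsilon \sim \mathcal{N}(0, I),\quad t \sim \mathcal{U}[0, 1]

% Along this path the conditional velocity is dx_t/dt = z - \epsilon,
% so the network regresses onto that target:
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\,z,\,\epsilon}
    \left\lVert u_t^{\theta}(x_t) - (z - \epsilon) \right\rVert^2

% Sampling integrates the learned ODE from noise to data:
\frac{\mathrm{d}x_t}{\mathrm{d}t} = u_t^{\theta}(x_t), \qquad x_0 \sim \mathcal{N}(0, I)
```

The regression target follows directly from differentiating the path with respect to $t$, which is why no simulation of the ODE is needed during training.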
Article 966
Title@2025-07-12 (6): WellPINN: Accurate Well Representation for Transient Fluid Pressure Diffusion in Subsurface Reservoirs with Physics-Informed Neural Networks
Title: WellPINN: Accurate Well Representation for Transient Fluid Pressure Diffusion in Subsurface Reservoirs with Physics-Informed Neural Networks | WellPINN: Präzise Well Representation für Transient Fluid Pressure Diffusion in unterirdischen Reservoirs mit physikinformierten Neuronalen Netzwerken | WellPINN: 物理成形神经网络在次表层储层中中流水压力扩散的准确代表性 2507.09330v1 |
Authors (4): Linus Walter, Qingkai Kong, Sara Hanson-Hedgecock, Víctor Vilarrasa
Accurate representation of wells is essential for reliable reservoir characterization and simulation of operational scenarios in subsurface flow models. Physics-informed neural networks (PINNs) have recently emerged as a promising method for reservoir modeling, offering seamless integration of monitoring data and governing physical equations. However, existing PINN-based studies face major challenges in capturing fluid pressure near wells, particularly during the early stage after injection begins. To address this, we propose WellPINN, a modeling workflow that combines the outputs of multiple sequentially trained PINN models to accurately represent wells. This workflow iteratively approximates the radius of the equivalent well to match the actual well dimensions by decomposing the domain into stepwise shrinking subdomains with a simultaneously reducing equivalent well radius. Our results demonstrate that sequential training of superimposing networks around the pumping well is the first workflow that focuses on accurate inference of fluid pressure from pumping rates throughout the entire injection period, significantly advancing the potential of PINNs for inverse modeling and operational scenario simulations. All data and code for this paper will be made openly available at https://github.com/linuswalter/WellPINN.
Article 967
Title@2025-07-12 (6): AGFS-Tractometry: A Novel Atlas-Guided Fine-Scale Tractometry Approach for Enhanced Along-Tract Group Statistical Comparison Using Diffusion MRI Tractography
Title: AGFS-Tractometry: A Novel Atlas-Guided Fine-Scale Tractometry Approach for Enhanced Along-Tract Group Statistical Comparison Using Diffusion MRI Tractography | AGFS-Traktometrie: Ein neuartiger Atlas-geführter Fine-Scale-Traktometrie-Ansatz für einen verbesserten along-Tract-Gruppen-Statistikvergleich mit Diffusions-MRT-Traktographie | AGFS-Tracto量测:利用扩散MRI轨迹测量法,采用新式阿特拉斯综合地图集指导的微规模微规模轨迹测量方法,加强联合接触小组统计比较 2507.10601v1 |
Authors (10): Ruixi Zheng, Wei Zhang, Yijie Li, Xi Zhu, Zhou Lan, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Lauren J. O’Donnell, Fan Zhang
Diffusion MRI (dMRI) tractography is currently the only method for in vivo mapping of the brain’s white matter (WM) connections. Tractometry is an advanced tractography analysis technique for along-tract profiling to investigate the morphology and microstructural properties along the fiber tracts. Tractometry has become an essential tool for studying local along-tract differences between different populations (e.g., health vs disease). In this study, we propose a novel atlas-guided fine-scale tractometry method, namely AGFS-Tractometry, that leverages tract spatial information and permutation testing to enhance the along-tract statistical analysis between populations. There are two major contributions in AGFS-Tractometry. First, we create a novel atlas-guided tract profiling template that enables consistent, fine-scale, along-tract parcellation of subject-specific fiber tracts. Second, we propose a novel nonparametric permutation testing group comparison method to enable simultaneous analysis across all along-tract parcels while correcting for multiple comparisons. We perform experimental evaluations on synthetic datasets with known group differences and in vivo real data. We compare AGFS-Tractometry with two state-of-the-art tractometry methods, Automated Fiber-tract Quantification (AFQ) and BUndle ANalytics (BUAN). Our results show that the proposed AGFS-Tractometry obtains enhanced sensitivity and specificity in detecting local WM differences. In the real data analysis experiments, AGFS-Tractometry can identify more regions with significant differences, which are anatomically consistent with the existing literature. Overall, these results demonstrate the ability of AGFS-Tractometry to detect subtle or spatially localized WM group-level differences. The created tract profiling template and related code are available at: https://github.com/ZhengRuixi/AGFS-Tractometry.git.
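A generic way to test all parcels at once while correcting for multiple comparisons is the max-statistic permutation test. The sketch below shows that textbook procedure, which may differ from the paper's actual test; the function name, the use of absolute mean difference as the statistic, and the data layout are all assumptions.

```python
import random

def max_stat_permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Family-wise-error-adjusted p-values for per-parcel group differences.

    group_a, group_b: lists of subjects; each subject is a list with one
    value per along-tract parcel (e.g. a diffusion measure).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = group_a + group_b
    n_parcels = len(pooled[0])

    def abs_mean_diffs(a, b):
        # Absolute difference of group means, one value per parcel.
        return [
            abs(sum(s[j] for s in a) / len(a) - sum(s[j] for s in b) / len(b))
            for j in range(n_parcels)
        ]

    observed = abs_mean_diffs(group_a, group_b)
    max_null = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # relabel subjects, keeping each profile intact
        # Recording only the maximum over parcels is what controls the
        # family-wise error rate across all parcels simultaneously.
        max_null.append(max(abs_mean_diffs(shuffled[:n_a], shuffled[n_a:])))
    return [
        (1 + sum(m >= obs for m in max_null)) / (n_perm + 1) for obs in observed
    ]
```

Shuffling whole subjects rather than individual parcel values preserves the spatial correlation along the tract, so no independence assumption between neighboring parcels is needed.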
Article 968
Title@2025-07-12 (6): Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Title: Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Bongard im Wunderland: Visuelle Puzzles, die KI immer noch verrückt machen? | Bongard in Wonderland:仍然让AI疯掉的视觉图解? 2410.19546v4 |
Authors (8): Antonia Wüst, Tim Woydt, Lukas Helff, Inga Ibs, Wolfgang Stammer, Devendra S. Dhami, Constantin A. Rothkopf, Kristian Kersting
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI’s o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.
Article 969
Title@2025-07-12 (6): LLM Agents Are the Antidote to Walled Gardens
Title: LLM Agents Are the Antidote to Walled Gardens | LLM-Agenten sind das Gegenmittel zu ummauerten Gärten | LLM 药剂是被围墙隔绝的花园的抗药剂 2506.23978v2 |
Authors (2): Samuele Marro, Philip Torr
While the Internet’s core infrastructure was designed to be open and universal, today’s application layer is dominated by closed, proprietary platforms. Open and interoperable APIs require significant investment, and market leaders have little incentive to enable data exchange that could erode their user lock-in. We argue that LLM-based agents fundamentally disrupt this status quo. Agents can automatically translate between data formats and interact with interfaces designed for humans: this makes interoperability dramatically cheaper and effectively unavoidable. We name this shift universal interoperability: the ability for any two digital services to exchange data seamlessly using AI-mediated adapters. Universal interoperability undermines monopolistic behaviours and promotes data portability. However, it can also lead to new security risks and technical debt. Our position is that the ML community should embrace this development while building the appropriate frameworks to mitigate the downsides. By acting now, we can harness AI to restore user freedom and competitive markets without sacrificing security.
Article 970
Title@2025-07-12 (6): Uncovering symmetric and asymmetric species associations from community and environmental data
Title: Uncovering symmetric and asymmetric species associations from community and environmental data | Entdeckung symmetrischer und asymmetrischer Artenverbände aus Gemeinschafts- und Umweltdaten | 利用社区和环境数据覆盖对称和不对称物种协会 2507.09317v1 |
Authors (6): Sara Si-Moussi, Esther Galbrun, Mickael Hedde, Giovanni Poggiato, Matthias Rohr, Wilfried Thuiller
There is little doubt that biotic interactions shape community assembly and ultimately the spatial co-variations between species. There is a hope that the signal of these biotic interactions can be observed and retrieved by investigating the spatial associations between species while accounting for the direct effects of the environment. By definition, biotic interactions can be both symmetric and asymmetric. Yet, most models that attempt to retrieve species associations from co-occurrence or co-abundance data internally assume symmetric relationships between species. Here, we propose and validate a machine-learning framework able to retrieve bidirectional associations by analyzing species community and environmental data. Our framework (1) models pairwise species associations as directed influences from a source to a target species, parameterized with two species-specific latent embeddings: the effect of the source species on the community, and the response of the target species to the community; and (2) jointly fits these associations within a multi-species conditional generative model with different modes of interactions between environmental drivers and biotic associations. Using both simulated and empirical data, we demonstrate the ability of our framework to recover known asymmetric and symmetric associations and highlight the properties of the learned association networks. By comparing our approach to other existing models such as joint species distribution models and probabilistic graphical models, we show its superior capacity at retrieving symmetric and asymmetric interactions. The framework is intuitive, modular and broadly applicable across various taxonomic groups.
Article 971
Title@2025-07-12 (6): Emergence of Hierarchical Emotion Organization in Large Language Models
Title: Emergence of Hierarchical Emotion Organization in Large Language Models | Entstehung der Hierarchischen Emotionsorganisation in großen Sprachmodellen | 大语言模式中等级情感组织的出现 2507.10599v1 |
Authors (7): Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka
As large language models (LLMs) increasingly power conversational agents, understanding how they model users’ emotional states is critical for ethical deployment. Inspired by emotion wheels – a psychological framework that argues emotions organize hierarchically – we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.
Article 972
Title@2025-07-12 (6): DAA*: Deep Angular A Star for Image-based Path Planning
Title: DAA*: Deep Angular A Star for Image-based Path Planning | DAA*: Deep Angular Ein Stern für bildbasierte Pfadplanung | DAA*:基于图像的路径规划深角A星 2507.09305v1 |
Authors (1): Zhiwei Xu
Path smoothness is often overlooked in path imitation learning from expert demonstrations. In this paper, we introduce a novel learning method, termed deep angular A* (DAA*), by incorporating the proposed path angular freedom (PAF) into A* to improve path similarity through adaptive path smoothness. The PAF aims to explore the effect of move angles on path node expansion by finding the trade-off between their minimum and maximum values, allowing for high adaptiveness for imitation learning. DAA* improves path optimality by closely aligning with the reference path through joint optimization of path shortening and smoothing, which correspond to heuristic distance and PAF, respectively. Through comprehensive evaluations on 7 datasets, including 4 maze datasets, 2 video-game datasets, and a real-world drone-view dataset containing 2 scenarios, we demonstrate remarkable improvements of our DAA* over neural A* in path similarity between the predicted and reference paths with a shorter path length when the shortest path is plausible, improving by 9.0% SPR, 6.9% ASIM, and 3.9% PSIM. Furthermore, when jointly learning pathfinding with both path loss and path probability map loss, DAA* significantly outperforms the state-of-the-art TransPath by 6.7% SPR, 6.5% PSIM, and 3.7% ASIM. We also discuss the minor trade-off between path optimality and search efficiency where applicable.
Article 973
Title@2025-07-12 (6): ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation
Title: ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation | ViT-ProtoNet für die Wenig-Schuss-Bildklassifikation: Eine Multi-Benchmark-Bewertung | 鲜热图像分类Vit-ProtoNet:多基准评价 2507.09299v1 |
Authors (3): Abdulvahap Mutlu, Şengül Doğan, Türker Tuncer
The remarkable representational power of Vision Transformers (ViTs) remains underutilized in few-shot image classification. In this work, we introduce ViT-ProtoNet, which integrates a ViT-Small backbone into the Prototypical Network framework. By averaging class-conditional token embeddings from a handful of support examples, ViT-ProtoNet constructs robust prototypes that generalize to novel categories under 5-shot settings. We conduct an extensive empirical evaluation on four standard benchmarks: Mini-ImageNet, FC100, CUB-200, and CIFAR-FS, including overlapped support variants to assess robustness. Across all splits, ViT-ProtoNet consistently outperforms CNN-based prototypical counterparts, achieving up to a 3.2% improvement in 5-shot accuracy and demonstrating superior feature separability in latent space. Furthermore, it outperforms or is competitive with transformer-based competitors using a more lightweight backbone. Comprehensive ablations examine the impact of transformer depth, patch size, and fine-tuning strategy. To foster reproducibility, we release code and pretrained weights. Our results establish ViT-ProtoNet as a powerful, flexible approach for few-shot classification and set a new baseline for transformer-based meta-learners.
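The prototype construction described above (average the support embeddings per class, then assign a query to the nearest prototype) can be sketched as follows. The embeddings here are plain lists standing in for ViT outputs, squared Euclidean distance is assumed as in the standard Prototypical Network formulation, and the function names are hypothetical.

```python
def class_prototypes(support):
    """Average the support embeddings of each class.

    support: dict mapping class label -> list of embedding vectors
    returns: dict mapping class label -> mean embedding (the prototype)
    """
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in support.items()
    }

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))
```

In a 5-shot episode each list in `support` would hold five embeddings; nothing is trained at test time, since a new class only requires computing one more mean.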
Article 974
Title@2025-07-12 (6): Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming
Title: Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming | Lernbasierte Multiuser-Scheichung in MIMO-OFDM-Systemen mit Hybrid-Beamforming | MOIMO-OFDM系统和混合波束系统中基于学习的多用户规划 2506.08263v2 |
Authors (4): Pouya Agheli, Tugce Kobal, François Durand, Matthew Andrews
We investigate the multiuser scheduling problem in multiple-input multiple-output (MIMO) systems using orthogonal frequency division multiplexing (OFDM) and hybrid beamforming in which a base station (BS) communicates with multiple users over millimeter wave (mmWave) channels in the downlink. Improved scheduling is critical for enhancing spectral efficiency and the long-term performance of the system from the perspective of proportional fairness (PF) metric in hybrid beamforming systems due to its limited multiplexing gain. Our objective is to maximize PF by properly designing the analog and digital precoders within the hybrid beamforming and selecting the users subject to the number of radio frequency (RF) chains. Leveraging the characteristics of mmWave channels, we apply a two-timescale protocol. On a long timescale, we assign an analog beam to each user. Scheduling the users and designing the digital precoder are done accordingly on a short timescale. To conduct scheduling, we propose combinatorial solutions, such as greedy and sorting algorithms, followed by a machine learning (ML) approach. Our numerical results highlight the trade-off between the performance and complexity of the proposed approaches. Consequently, we show that the choice of approach depends on the specific criteria within a given scenario.
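To make the proportional-fairness objective concrete, here is a heavily simplified sorting-style scheduler. It ranks users by instantaneous rate divided by long-term average rate and picks as many as there are RF chains. It ignores beam assignment, inter-user interference, and precoder design, all of which the paper treats; the function names and the smoothing factor `beta` are assumptions.

```python
def pf_schedule(inst_rates, avg_rates, n_rf_chains):
    """Pick the users with the largest proportional-fairness metric.

    inst_rates: dict user -> achievable rate in the current slot
    avg_rates:  dict user -> exponentially averaged past throughput (> 0)
    """
    metric = {u: inst_rates[u] / avg_rates[u] for u in inst_rates}
    ranked = sorted(metric, key=metric.get, reverse=True)
    return ranked[:n_rf_chains]  # at most one user per RF chain

def update_averages(avg_rates, inst_rates, scheduled, beta=0.1):
    """Exponential moving average of throughput; unscheduled users get 0."""
    return {
        u: (1 - beta) * avg_rates[u]
        + beta * (inst_rates[u] if u in scheduled else 0.0)
        for u in avg_rates
    }
```

Dividing by the running average is what produces fairness: a user who has been starved sees its average shrink, so its metric rises until it is scheduled.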
Article 975
Title@2025-07-12 (6): ClaritySpeech: Dementia Obfuscation in Speech
Title: ClaritySpeech: Dementia Obfuscation in Speech | ClaritySpeech: Dementia Verschleierung in der Rede | 清晰的言语:言语中的痴呆症 2507.09282v1 |
Authors (3): Dominika Woszczyk, Ranya Aloufi, Soteris Demetriou
Dementia, a neurodegenerative disease, alters speech patterns, creating communication barriers and raising privacy concerns. Current speech technologies, such as automatic speech transcription (ASR), struggle with dementia and atypical speech, further challenging accessibility. This paper presents a novel dementia obfuscation in speech framework, ClaritySpeech, integrating ASR, text obfuscation, and zero-shot text-to-speech (TTS) to correct dementia-affected speech while preserving speaker identity in low-data environments without fine-tuning. Results show a 16% and 10% drop in mean F1 score across various adversarial settings and modalities (audio, text, fusion) for ADReSS and ADReSSo, respectively, maintaining 50% speaker similarity. We also find that our system improves WER (from 0.73 to 0.08 for ADReSS and 0.15 for ADReSSo) and speech quality from 1.65 to ~2.15, enhancing privacy and accessibility.
Article 976
Title@2025-07-12 (6): A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving
Title: A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving | Eine Überprüfung der Belohnungsfunktionen für die Stärkung des Lernens im Kontext des autonomen Fahrens | 在自主驾驶的情况下审查加强学习的奖励职能 2405.01440v2 |
Authors (3): Ahmed Abouelazm, Jonas Michel, J. Marius Zoellner
Reinforcement learning has emerged as an important approach for autonomous driving. A reward function is used in reinforcement learning to establish the learned skill objectives and guide the agent toward the optimal policy. Since autonomous driving is a complex domain with partly conflicting objectives with varying degrees of priority, developing a suitable reward function represents a fundamental challenge. This paper aims to highlight the gap in such function design by assessing different proposed formulations in the literature and dividing individual objectives into Safety, Comfort, Progress, and Traffic Rules compliance categories. Additionally, the limitations of the reviewed reward functions are discussed, such as objectives aggregation and indifference to driving context. Furthermore, the reward categories are frequently inadequately formulated and lack standardization. This paper concludes by proposing future research that potentially addresses the observed shortcomings in rewards, including a reward validation framework and structured rewards that are context-aware and able to resolve conflicts.
Article 977
Title@2025-07-12 (6): Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations
Title: Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations | Ansteuerbare Patching für die Berechnung adaptive Surrogate Modellierung von partiellen Differentialgleichungen | 局部差别等量计算-加速替代模型可控补补丁 2507.09264v1 |
Authors (4): Payel Mukhopadhyay, Michael McCabe, Ruben Ohana, Miles Cranmer
Patch-based transformer surrogates have become increasingly effective for modeling spatiotemporal dynamics, but the fixed patch size is a major limitation for budget-conscious deployment in production. We introduce two lightweight, architecture-agnostic modules, the Convolutional Kernel Modulator (CKM) and the Convolutional Stride Modulator (CSM), that enable dynamic patch size control at inference in patch-based models, without retraining or accuracy loss. Combined with a cyclic patch-size rollout, our method mitigates patch artifacts and improves long-term stability for video-like prediction tasks. Applied to a range of challenging 2D and 3D PDE benchmarks, our approach improves rollout fidelity and runtime efficiency. To our knowledge, this is the first framework to enable inference-time patch-size tunability in patch-based PDE surrogates. Its plug-and-play design makes it broadly applicable across architectures, establishing a general foundation for compute-adaptive modeling in PDE surrogate tasks.
Article 978
Title@2025-07-12 (6): TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
Title: TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding | TPP-SD: Beschleunigung der Transformer-Punkt-Prozedursampling mit spekulativer Dekodierung | TPP-SD:加速变速点进程与投机代号抽样 2507.09252v1 |
Authors (4): Shukai Gong, Yiyang Fu, Fengyuan Ran, Feng Zhou
We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from the same distribution as standard methods, but with a 2-6x speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency. TPP-SD bridges the gap between powerful Transformer TPP models and the practical need for rapid sequence sampling.
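The thinning algorithm that the abstract compares to speculative decoding can be sketched as follows. This is classical Ogata-style thinning for an exponential-kernel Hawkes process, not TPP-SD itself; the function names and the parameters `mu`, `alpha`, `beta` are illustrative assumptions. The shared structure is the propose-then-verify loop: a cheap proposal is accepted with probability equal to a ratio involving the target model.

```python
import math
import random

def intensity(t, history, mu, alpha, beta):
    """Hawkes intensity at time t; every time in `history` must be <= t."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history)

def sample_by_thinning(t_end, mu=0.5, alpha=0.8, beta=1.5, seed=0):
    """Sample event times on [0, t_end) by thinning (illustrative parameters)."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for the proposal process.
        bound = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(bound)           # propose the next candidate time
        if t >= t_end:
            return events
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= intensity(t, events, mu, alpha, beta):
            events.append(t)

events = sample_by_thinning(20.0)
```

Each candidate here is verified one at a time; the abstract's contribution is to draft several candidates with a small model and verify them in parallel with the large one.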
Article 979
Title@2025-07-12 (6): GRAG: Graph Retrieval-Augmented Generation
Title: GRAG: Graph Retrieval-Augmented Generation | GRAG: Graph Retrieval-Augmented Generation | GRAG: 图表检索-提款一代 2405.16506v3 |
Authors (6): Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao
Naive Retrieval-Augmented Generation (RAG) focuses on individual documents during retrieval and, as a result, falls short in handling networked documents, which are very popular in many applications such as citation graphs, social media, and knowledge graphs. To overcome this limitation, we introduce Graph Retrieval-Augmented Generation (GRAG), which tackles the fundamental challenges in retrieving textual subgraphs and integrating the joint textual and topological information into Large Language Models (LLMs) to enhance their generation. To enable efficient textual subgraph retrieval, we propose a novel divide-and-conquer strategy that retrieves the optimal subgraph structure in linear time. To achieve graph context-aware generation, we incorporate textual graphs into LLMs through two complementary views, the text view and the graph view, enabling LLMs to more effectively comprehend and utilize the graph context. Extensive experiments on graph reasoning benchmarks demonstrate that in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods. Our datasets and the code for GRAG are available at https://github.com/HuieL/GRAG.
Article 980
Title@2025-07-12 (6): Shaping Laser Pulses with Reinforcement Learning
Title: Shaping Laser Pulses with Reinforcement Learning | Laserpulse mit Verstärkungslernen gestalten | 利用强化学习制造激光脉动 2503.00499v2 |
Authors (3): Francesco Capuano, Davorin Peceli, Gabriele Tiboni
High Power Laser (HPL) systems operate in the attoseconds regime – the shortest timescale ever created by humanity. HPL systems are instrumental in high-energy physics, leveraging ultra-short impulse durations to yield extremely high intensities, which are essential for both practical applications and theoretical advancements in light-matter interactions. Traditionally, the parameters regulating HPL optical performance have been manually tuned by human experts, or optimized using black-box methods that can be computationally demanding. Critically, black-box methods rely on stationarity assumptions that overlook complex dynamics in high-energy physics and day-to-day changes in real-world experimental settings, and thus often need to be restarted. Deep Reinforcement Learning (DRL) offers a promising alternative by enabling sequential decision making in non-static settings. This work explores the feasibility of applying DRL to HPL systems, extending the current research by (1) learning a control policy relying solely on non-destructive image observations obtained from readily available diagnostic devices, and (2) retaining performance when the underlying dynamics vary. We evaluate our method across various test dynamics, and observe that DRL effectively enables cross-domain adaptability, coping with dynamics’ fluctuations while achieving 90% of the target intensity in test environments.
Article 981
Title@2025-07-12 (6): Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Title: Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs | Bepflanzt in der Vorausbildung, durch Finetuning abgeschwächt: Eine Fallstudie über die Herkunft von Kognitiv-Biasen in LLMs | 编在培训前编,《微调:关于LLM中认知性双星起源的个案研究》,《微调摇摇晃》 2507.07186v2 |
Authors (3): Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky
Large language models (LLMs) exhibit cognitive biases – systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects over 30 cognitive biases. Second, we introduce cross-tuning – swapping instruction datasets between models to isolate bias sources. This swap uses datasets that led to different bias patterns, directly testing whether biases are dataset-dependent. Our findings reveal that while training randomness introduces some variability, biases are mainly shaped by pretraining: models with the same pretrained backbone exhibit more similar bias patterns than those sharing only finetuning data. These insights suggest that understanding biases in finetuned models requires considering their pretraining origins beyond finetuning effects. This perspective can guide future efforts to develop principled strategies for evaluating and mitigating bias in LLMs.
Article 982
Title@2025-07-12 (6): PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution
Title: PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution | PanoDiff-SR: Dental Panoramic Radiographen mit Diffusion und Super-Auflösung synthetisieren | PanoDiff-SR:利用传播和超分辨率合成牙科全无光辐射 2507.09227v1 |
Authors (5): Sanyam Jain, Bruna Neves de Freitas, Andreas Basse-OConnor, Alexandros Iosifidis, Ruben Pauwels
There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR (256 X 128) which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024 X 512. For SR, we propose a state-of-the-art transformer that learns local-global relationships, resulting in sharper edges and textures. Experimental results demonstrate a Fréchet inception distance score of 40.69 between 7243 real and synthetic images (in HR). Inception scores were 2.55, 2.30, 2.90 and 2.98 for real HR, synthetic HR, real LR and synthetic LR images, respectively. Among a diverse group of six clinical experts, all evaluating a mixture of 100 synthetic and 100 real PRs in a time-limited observation, the average accuracy in distinguishing real from synthetic images was 68.5% (with 50% corresponding to random guessing).
Article 983
Title@2025-07-12 (6): Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Title: Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models | Feature-Extraktion und -Lenkung für eine verbesserte Kettenbildung in Sprachmodellen | 语言模型中强化研究链理由的特征采掘和指南 2505.15634v4 |
Authors (6): Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du
Large Language Models (LLMs) demonstrate the ability to solve reasoning and mathematical problems using the Chain-of-Thought (CoT) technique. Expanding CoT length, as seen in models such as DeepSeek-R1, significantly enhances this reasoning for complex problems, but requires costly and high-quality long CoT data and fine-tuning. This work, inspired by the deep thinking paradigm of DeepSeek-R1, utilizes a steering technique to enhance the reasoning ability of an LLM without external datasets. Our method first employs Sparse Autoencoders (SAEs) to extract interpretable features from vanilla CoT. These features are then used to steer the LLM’s internal states during generation. Recognizing that many LLMs do not have corresponding pre-trained SAEs, we further introduce a novel SAE-free steering algorithm, which directly computes steering directions from the residual activations of an LLM, obviating the need for an explicit SAE. Experimental results demonstrate that both our SAE-based and subsequent SAE-free steering algorithms significantly enhance the reasoning capabilities of LLMs.
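The abstract does not spell out how the SAE-free steering directions are computed. As a rough sketch of activation steering in general, the example below uses a difference-of-means direction, a common technique that is assumed here and may differ from the paper's algorithm. The two activation sets (for example, residual activations from step-by-step prompts versus direct-answer prompts) and all function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_direction(target_acts, baseline_acts):
    """Difference of mean activations between two prompt sets (assumed recipe)."""
    mt, mb = mean_vector(target_acts), mean_vector(baseline_acts)
    return [a - b for a, b in zip(mt, mb)]

def steer(hidden, direction, strength=1.0):
    """Shift one hidden state along the steering direction during generation."""
    return [h + strength * d for h, d in zip(hidden, direction)]

direction = steering_direction([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
steered = steer([0.0, 0.0], direction, strength=0.5)
```

In a real model the vectors would be residual-stream activations at a chosen layer, and `strength` would need tuning.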
Article 984
Title@2025-07-12 (6): Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift
Title: Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift | Kalibrierte und robuste Fundamentierungsmodelle für Vision-Sprache und medizinische Bildaufgaben unter Verteilungsverschiebung | 分配变化下的愿景语言和医疗图像任务模型 2507.09222v1 |
Authors (2): Behraj Khan, Tahir Syed
Foundation models like CLIP and SAM have transformed computer vision and medical imaging via low-shot transfer learning. However, deployment of these models is hindered by two key challenges: distribution shift between training and test data, and confidence misalignment that leads to overconfident incorrect predictions. These issues manifest differently in vision-language classification and medical segmentation tasks, yet existing solutions remain domain-specific. We propose StaRFM, a unified framework addressing both challenges. It introduces a Fisher information penalty (FIP), extended to 3D medical data via patch-wise regularization, to reduce covariate shift in CLIP and SAM embeddings. Additionally, a confidence misalignment penalty (CMP), reformulated for voxel-level predictions, calibrates uncertainty in segmentation tasks. We theoretically derive PAC-Bayes bounds showing FIP controls generalization via the Fisher-Rao norm, while CMP minimizes calibration error through Brier score optimization. StaRFM shows consistent gains, including +3.5% accuracy and 28% lower ECE on 19 vision datasets (e.g., ImageNet, Office-Home), 84.7% DSC and 4.8mm HD95 in medical segmentation (e.g., BraTS, ATLAS), and a 40% lower cross-domain performance gap compared to prior benchmarking methods. The framework is plug-and-play, requiring minimal architectural changes for seamless integration with foundation models. Code and models will be released at https://anonymous.4open.science/r/StaRFM-C0CD/README.md
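The two calibration quantities named above, the Brier score and the expected calibration error (ECE), have standard definitions that a short sketch can make concrete. The code below is a plain reference computation under the usual equal-width binning assumption; it is not the StaRFM penalty, and the function names are illustrative.

```python
def brier_score(probs, labels):
    """Mean squared error between probability vectors and one-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)   # confidence 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += len(b) / len(probs) * abs(avg_conf - accuracy)
    return ece
```

A fully confident wrong prediction gives the worst case for both metrics, which is the overconfidence failure the abstract describes.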
Article 985
Title@2025-07-12 (6): Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications
Title: Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications | Optimierung der Basisfunktionsauswahl in konstruktiven Wavelet-Neuralnetzwerken und deren Anwendungen | 在建设性动态神经网络及其应用中优化基础功能选择 2507.09213v1 |
Authors (4): Dunsheng Huang, Dong Shen, Lei Lu, Ying Tan
The wavelet neural network (WNN), which learns an unknown nonlinear mapping from data, has been widely used in signal processing and time-series analysis. However, challenges in constructing accurate wavelet bases and high computational costs limit its application. This study introduces a constructive WNN that selects initial bases and trains functions by introducing new bases for predefined accuracy while reducing computational costs. For the first time, we analyze the frequency of unknown nonlinear functions and select appropriate initial wavelets based on their primary frequency components by estimating the energy of the spatial frequency component. This leads to a novel constructive framework consisting of a frequency estimator and a wavelet-basis increase mechanism to prioritize high-energy bases, significantly improving computational efficiency. The theoretical foundation defines the necessary time-frequency range for high-dimensional wavelets at a given accuracy. The framework’s versatility is demonstrated through four examples: estimating unknown static mappings from offline data, combining two offline datasets, identifying time-varying mappings from time-series data, and capturing nonlinear dependencies in real time-series data. These examples showcase the framework’s broad applicability and practicality. All the code will be released at https://github.com/dshuangdd/CWNN.
Article 986
Title@2025-07-12 (6): Warm Starts Accelerate Generative Modelling
Title: Warm Starts Accelerate Generative Modelling | Warmer Start beschleunigt generative Modellierung | 温度起温加速生成模型 2507.09212v1 |
Authors (2): Jonas Scholz, Richard E. Turner
Iterative generative models, like diffusion and flow-matching, create high-fidelity samples by progressively refining a noise vector into data. However, this process is notoriously slow, often requiring hundreds of function evaluations. We introduce the warm-start model, a simple, deterministic model that dramatically accelerates conditional generation by providing a better starting point. Instead of starting generation from an uninformed N(0, I) prior, our warm-start model predicts an informed prior N(mu, sigma), whose moments are conditioned on the input context. This “warm start” substantially reduces the distance the generative process must traverse, particularly when the conditioning information is strongly informative. On tasks like image inpainting, our method achieves results competitive with a 1000-step DDPM baseline using only 11 total function evaluations (1 for the warm start, 10 for generation). A simple conditional normalization trick makes our method compatible with any standard generative model and sampler without modification, allowing it to be combined with other efficient sampling techniques for further acceleration. Our implementation is available at https://github.com/jonas-scholz123/warm-start-model.
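One plausible reading of the conditional normalization trick is sketched below: if targets are standardized with the predicted moments, the generative model again sees an N(0, I) prior, so an unmodified sampler can be reused and its output mapped back. This reading is an assumption based only on the abstract; the function names, the per-dimension `mu` and `sigma` lists, and the `sampler` callable are hypothetical.

```python
import random

def normalize(x, mu, sigma):
    """Standardize a target with the warm-start moments (training side)."""
    return [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]

def denormalize(z, mu, sigma):
    """Map a sample from normalized space back to data space."""
    return [m + s * zi for zi, m, s in zip(z, mu, sigma)]

def warm_start_sample(mu, sigma, sampler, rng):
    """Run an unmodified sampler from N(0, I), then undo the normalization."""
    z0 = [rng.gauss(0.0, 1.0) for _ in mu]   # standard prior draw
    z = sampler(z0)                          # e.g. a few diffusion or flow steps
    return denormalize(z, mu, sigma)

rng = random.Random(0)
sample = warm_start_sample([1.0, -2.0], [0.0, 0.0], lambda z: z, rng)
```

With `sigma` set to zero the result collapses to the predicted mean, which shows the limiting case where the context fully determines the output.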
Article 987
Title@2025-07-12 (6): Capturing Unseen Spatial Extremes Through Knowledge-Informed Generative Modeling
Title: Capturing Unseen Spatial Extremes Through Knowledge-Informed Generative Modeling | Ungesehene räumliche Extreme durch wissensbasierte generative Modellierung erfassen | 通过知识化创创创型模型获取不见的空间极端 2507.09211v1 |
Authors (8): Xinyue Liu, Xiao Peng, Shuyue Yan, Yuntian Chen, Dongxiao Zhang, Zhixiao Niu, Hui-Min Wang, Xiaogang He
Observed records of climate extremes provide an incomplete picture of risk, missing “unseen” extremes that exceed historical bounds. In parallel, neglecting spatial dependence undervalues the risk of synchronized hazards that amplify impacts. To address these challenges, we develop DeepX-GAN (Dependence-Enhanced Embedding for Physical eXtremes - Generative Adversarial Network), a knowledge-informed deep generative model designed to better capture the spatial structure of rare extremes. The zero-shot generalizability of DeepX-GAN enables simulation of unseen extremes that fall outside historical experience yet remain statistically plausible. We define two types of unseen extremes: “checkmate” extremes that directly hit targets, and “stalemate” extremes that narrowly miss. These unrealized scenarios expose latent risks in fragile systems and may reinforce a false sense of resilience if overlooked. Near misses, in particular, can prompt either proactive adaptation or dangerous complacency, depending on how they are interpreted. Applying DeepX-GAN to the Middle East and North Africa (MENA), we find that these unseen extremes disproportionately affect regions with high vulnerability and low socioeconomic readiness, but differ in urgency and interpretation. Future warming could expand and redistribute these unseen extremes, with emerging exposure hotspots in Indo-Pakistan and Central Africa. This distributional shift highlights critical blind spots in conventional hazard planning and underscores the need to develop spatially adaptive policies that anticipate emergent risk hotspots rather than simply extrapolating from historical patterns.
Article 988
Title@2025-07-12 (6): Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data
Title: Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data | Diffusion Datensatzkondensation: Training Ihres Diffusionsmodells schneller mit weniger Daten | 传播数据集集中: 训练您的传播模型, 以更少数据更快的速度 2507.05914v2 |
Authors (9): Rui Huang, Shitong Shao, Zikai Zhou, Pukun Zhao, Hangyu Guo, Tian Ye, Lichen Bai, Shuo Yang, Zeke Xie
Diffusion models have achieved remarkable success in various generative tasks, but training them remains highly resource-intensive, often requiring millions of images and many days of GPU computation. From a data-centric perspective addressing this limitation, we study diffusion dataset condensation as a new and challenging problem setting. The goal is to construct a “synthetic” sub-dataset with significantly fewer samples than the original dataset, enabling high-quality diffusion model training with greatly reduced cost. To the best of our knowledge, we are the first to formally investigate dataset condensation for diffusion models, whereas prior work focused on training discriminative models. To tackle this new challenge, we propose a novel Diffusion Dataset Condensation (D2C) framework, which consists of two phases: Select and Attach. The Select phase identifies a compact and diverse subset using a diffusion difficulty score and interval sampling. The Attach phase enhances the selected subset by attaching rich semantic and visual representations to strengthen the conditional signals. Extensive experiments across various dataset sizes, model architectures, and resolutions show that our D2C framework enables significantly faster diffusion model training with dramatically fewer data, while preserving high visual quality. Notably, for the SiT-XL/2 architecture, D2C achieves a 100x training speed-up, reaching a FID score of 4.3 in just 40k steps using only 0.8% of the training data.
Article 989
Title@2025-07-12 (6): XiChen: An observation-scalable fully AI-driven global weather forecasting system with 4D variational knowledge
Title: XiChen: An observation-scalable fully AI-driven global weather forecasting system with 4D variational knowledge | XiChen: Ein beobachtungs-skalierbares, voll KI-gesteuertes globales Wettervorhersagesystem mit 4D-Variationswissen | Xichin Chhen: 一个具有4D变异知识的、可观测的完全可扩展的AI驱动的全球天气预报系统 2507.09202v1 |
Authors (13): Wuxin Wang, Weicheng Ni, Lilan Huang, Tao Hao, Ben Fei, Shuo Ma, Taikang Yuan, Yanlai Zhao, Kefeng Deng, Xiaoyong Li, Boheng Duan, Lei Bai, Kaijun Ren
Recent advancements in Artificial Intelligence (AI) demonstrate significant potential to revolutionize weather forecasting. However, most AI-driven models rely on Numerical Weather Prediction (NWP) systems for initial condition preparation, which often consumes hours on supercomputers. Here we introduce XiChen, the first observation-scalable fully AI-driven global weather forecasting system, whose entire pipeline, from Data Assimilation (DA) to medium-range forecasting, can be accomplished within only 17 seconds. XiChen is built upon a foundation model that is pre-trained for weather forecasting. This model is subsequently fine-tuned to serve as both observation operators and DA models, thereby scalably assimilating conventional and raw satellite observations. Furthermore, the integration of four-dimensional variational knowledge ensures that XiChen’s DA and medium-range forecasting accuracy rivals that of operational NWP systems, achieving a skillful forecasting lead time exceeding 8.25 days. These findings demonstrate that XiChen holds strong potential toward fully AI-driven weather forecasting independent of NWP systems.
Article 990
Title@2025-07-12 (6): Learning from M-Tuple Dominant Positive and Unlabeled Data
Title: Learning from M-Tuple Dominant Positive and Unlabeled Data | Lernen von M-Tuple Dominant Positive und unmarkierte Daten | 从 M- Tiple 主导正和非标签数据中学习 2506.15686v2 |
Authors (4): Jiahe Qin, Junpeng Li, Changchun Hua, Yana Yang
Label Proportion Learning (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag contains information about the proportion of each class. However, in practical applications, obtaining precise supervisory information regarding the proportion of instances in a specific class is challenging. To better align with real-world application scenarios and effectively leverage the proportional constraints of instances within tuples, this paper proposes a generalized learning framework, MDPU. Specifically, we first mathematically model the distribution of instances within tuples of arbitrary size, under the constraint that the number of positive instances is no less than that of negative instances. Then we derive an unbiased risk estimator that satisfies risk consistency based on the empirical risk minimization (ERM) method. To mitigate the inevitable overfitting issue during training, a risk correction method is introduced, leading to the development of a corrected risk estimator. The generalization error bounds of the unbiased risk estimator theoretically demonstrate the consistency of the proposed method. Extensive experiments on multiple datasets and comparisons with other relevant baseline methods comprehensively validate the effectiveness of the proposed learning framework.
Article 991
Title@2025-07-12 (6): Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
Title: Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models | Erkennen und Beschneiden Prominenter, aber detrimentaler Neuronen in großen Sprachmodellen | 在大语言模型中检测和预视突出但有偏偏的神经元 2507.09185v1 |
Authors (4): Ameen Ali, Shahar Katz, Lior Wolf, Ivan Titov
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron’s influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations. Evaluated across multiple-choice benchmarks, our pruning-based fine-tuning significantly enhances performance, surpassing prior (non-pruning) adaptation methods.
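Integrated Gradients, the attribution method named above, can be sketched with a midpoint Riemann sum along the straight path from a baseline to the input. This is the generic formulation over input coordinates; applying it to neurons, as the paper does, would treat neuron activations as the variables, which is an assumption here. The `grad_fn` callable and the toy function are hypothetical.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - b_i) * integral of df/dx_i along the path b -> x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps                       # midpoint of the k-th slice
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(len(x)):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

# Toy function f(x) = x0**2 + 3*x1 with its analytic gradient.
grad_fn = lambda p: [2.0 * p[0], 3.0]
attributions = integrated_gradients(grad_fn, [1.0, 2.0], [0.0, 0.0])
```

For this toy function the attributions add up to f(x) minus f(baseline), the completeness property that makes the scores comparable across units.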
Article 992
Title@2025-07-12 (6): CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Title: CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models | CASCADE Ihre Datensätze für Cross-Mode Knowledge Retrieval von Sprachmodellen | CASCADE 语言模型跨模式知识检索数据集 2504.01450v2 |
Authors (2): Runlong Zhou, Yi Zhang
Language models often struggle with cross-mode knowledge retrieval – the ability to access knowledge learned in one format (mode) when queried in another. We demonstrate that models trained on multiple data sources (e.g., Wikipedia and TinyStories) exhibit significantly reduced accuracy when retrieving knowledge in a format different from its original training mode. This paper quantitatively investigates this phenomenon through a controlled study of random token sequence memorization across different modes. We first explore dataset rewriting as a solution, revealing that effective cross-mode retrieval requires prohibitively extensive rewriting efforts that follow a sigmoid-like relationship. As an alternative, we propose CASCADE, a novel pretraining algorithm that uses cascading datasets with varying sequence lengths and computes losses on only the second half of each training sequence to capture knowledge at different scales. Our experiments demonstrate that CASCADE outperforms dataset rewriting approaches, even when compressed into a single model with a unified loss function. This work provides both qualitative evidence of cross-mode retrieval limitations and a practical solution to enhance language models’ ability to access knowledge independently of its presentational format.
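The two ingredients described for CASCADE, datasets at several sequence lengths and a loss restricted to the second half of each sequence, can be sketched as below. The abstract does not say how the datasets are built, so the non-overlapping chunking is an assumption, and both function names are hypothetical.

```python
def cascade_chunks(tokens, lengths):
    """Split one token stream into non-overlapping chunks at each length (assumed)."""
    return {
        n: [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
        for n in lengths
    }

def second_half_loss(token_losses):
    """Average per-token loss over the second half of a sequence only."""
    if not token_losses:
        raise ValueError("empty sequence")
    tail = token_losses[len(token_losses) // 2:]
    return sum(tail) / len(tail)

chunks = cascade_chunks(list(range(8)), [2, 4])
loss = second_half_loss([4.0, 4.0, 1.0, 3.0])
```

Scoring only the second half means every scored token has at least half a sequence of context, whatever the chunk length.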
Article 993
Title@2025-07-12 (6): Continual Reinforcement Learning by Planning with Online World Models
Title: Continual Reinforcement Learning by Planning with Online World Models | Weiterbildung durch Planung mit Online-Weltmodellen | 通过规划与在线世界模式持续加强学习 2507.09177v1 |
Authors (5): Zichen Liu, Guoji Fu, Chao Du, Wee Sun Lee, Min Lin
Continual reinforcement learning (CRL) refers to a naturalistic setting where an agent needs to endlessly evolve, by trial and error, to solve multiple tasks that are presented sequentially. One of the largest obstacles to CRL is that the agent may forget how to solve previous tasks when learning a new task, known as catastrophic forgetting. In this paper, we propose to address this challenge by planning with online world models. Specifically, we learn a Follow-The-Leader shallow model online to capture the world dynamics, in which we plan using model predictive control to solve a set of tasks specified by any reward functions. The online world model is immune to forgetting by construction with a proven regret bound of $\mathcal{O}(\sqrt{K^2D\log(T)})$ under mild assumptions. The planner searches actions solely based on the latest online model, thus forming a FTL Online Agent (OA) that updates incrementally. To assess OA, we further design Continual Bench, a dedicated environment for CRL, and compare with several strong baselines under the same model-planning algorithmic framework. The empirical results show that OA learns continuously to solve new tasks while not forgetting old skills, outperforming agents built on deep world models with various continual learning techniques.
Article 994
Title@2025-07-12 (6): DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values
Title: DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values | DeltaSHAP: Erklären von Vorhersageentwicklungen bei der Online-Patientenüberwachung mit Shapley-Werten | DelsaSHAP: 解释在有阴影值的在线患者监测中的预测演变 2507.02342v2 |
Authors (4): Changhun Kim, Yechan Mun, Sangchul Hahn, Eunho Yang
This study proposes DeltaSHAP, a novel explainable artificial intelligence (XAI) algorithm specifically designed for online patient monitoring systems. In clinical environments, discovering the causes driving patient risk evolution is critical for timely intervention, yet existing XAI methods fail to address the unique requirements of clinical time series explanation tasks. To this end, DeltaSHAP addresses three key clinical needs: explaining the changes in consecutive predictions rather than isolated prediction scores, providing both magnitude and direction of feature attributions, and delivering these insights in real time. By adapting Shapley values to temporal settings, our approach accurately captures feature coalition effects. It further attributes prediction changes using only the actually observed feature combinations, making it efficient and practical for time-sensitive clinical applications. We also introduce new evaluation metrics to evaluate the faithfulness of the attributions for online time series, and demonstrate through experiments on online patient monitoring tasks that DeltaSHAP outperforms state-of-the-art XAI methods in both explanation quality (by 62%) and computational efficiency (a 33% time reduction) on the MIMIC-III decompensation benchmark. We release our code at https://github.com/AITRICS/DeltaSHAP.
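The idea of attributing a change between consecutive predictions with Shapley values can be shown with an exact, brute-force sketch. It enumerates every feature ordering, so it is exponential in the number of features and is not DeltaSHAP's estimator, which the abstract says works from the observed feature combinations only. The function name and the toy model are hypothetical.

```python
from itertools import permutations

def delta_shapley(model, x_prev, x_curr):
    """Exact Shapley attribution of model(x_curr) - model(x_prev) per feature."""
    n = len(x_curr)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        x = list(x_prev)
        last = model(x)
        for i in order:
            x[i] = x_curr[i]          # switch feature i to its current value
            out = model(x)
            phi[i] += out - last      # marginal contribution in this ordering
            last = out
    return [p / len(orders) for p in phi]

model = lambda x: 2.0 * x[0] + x[1] * x[2]
phi = delta_shapley(model, [0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

The signed attributions sum to the prediction change, and the interaction between the second and third features is split evenly between them.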
Article 995
Title@2025-07-12 (6): Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations
Title: Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations | Auf dem Weg zu einer interpretierbaren Drogen- und Drogen-Interaktion Vorhersage: Graphenbasierter Ansatz mit molekularen und netzwerkbasierten Erklärungen | 迈向可解释的药物-药物-药物相互作用的预测:以图表为基础的方法与分子和网络一级的解释 2507.09173v1 |
Authors (3): Mengjie Chen, Ming Zhang, Cunquan Qu
Drug-drug interactions (DDIs) represent a critical challenge in pharmacology, often leading to adverse drug reactions with significant implications for patient safety and healthcare outcomes. While graph-based methods have achieved strong predictive performance, most approaches treat drug pairs independently, overlooking the complex, context-dependent interactions unique to drug pairs. Additionally, these models struggle to integrate biological interaction networks and molecular-level structures to provide meaningful mechanistic insights. In this study, we propose MolecBioNet, a novel graph-based framework that integrates molecular and biomedical knowledge for robust and interpretable DDI prediction. By modeling drug pairs as unified entities, MolecBioNet captures both macro-level biological interactions and micro-level molecular influences, offering a comprehensive perspective on DDIs. The framework extracts local subgraphs from biomedical knowledge graphs and constructs hierarchical interaction graphs from molecular representations, leveraging classical graph neural network methods to learn multi-scale representations of drug pairs. To enhance accuracy and interpretability, MolecBioNet introduces two domain-specific pooling strategies: context-aware subgraph pooling (CASPool), which emphasizes biologically relevant entities, and attention-guided influence pooling (AGIPool), which prioritizes influential molecular substructures. The framework further employs mutual information minimization regularization to enhance information diversity during embedding fusion. Experimental results demonstrate that MolecBioNet outperforms state-of-the-art methods in DDI prediction, while ablation studies and embedding visualizations further validate the advantages of unified drug pair modeling and multi-scale knowledge integration.
Article 996
Title@2025-07-12 (6): Logits are All We Need to Adapt Closed Models
Title: Logits are All We Need to Adapt Closed Models | Logits sind alles, was wir brauchen, um geschlossene Modelle anzupassen | 只需登录即可,我们只需调整已关闭的模型 2502.06806v4 |
Authors (4): Gaurush Hiranandani, Haolun Wu, Subhojyoti Mukherjee, Sanmi Koyejo
Many commercial Large Language Models (LLMs) are closed-source, limiting developers to prompt tuning for aligning content generation with specific applications. While these models currently do not provide access to token logits, we argue that if such access were available, it would enable more powerful adaptation techniques beyond prompt engineering. In this paper, we propose a token-level probability reweighting framework that, given access to logits and a small amount of task-specific data, can effectively steer black-box LLMs toward application-specific content generation. Our approach views next-token prediction through the lens of supervised classification. We show that aligning black-box LLMs with task-specific data can be formulated as a label noise correction problem, leading to the Plugin model, an autoregressive probability reweighting model that operates solely on logits. We provide theoretical justification for why reweighting logits alone is sufficient for task adaptation. Extensive experiments with multiple datasets, LLMs, and reweighting models demonstrate the effectiveness of our method, advocating for broader access to token logits in closed-source models.
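As a rough illustration of token-level probability reweighting, the sketch below adds a task-specific logit vector to a black-box model's next-token logits and renormalizes. The function names and toy values are assumptions made for illustration; the abstract does not specify the exact form of the Plugin model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_next_token(base_logits, reweight_logits):
    """Combine black-box logits with task-specific reweighting logits.

    Adding logits is equivalent to multiplying the two probability
    distributions token by token and renormalizing.
    """
    if len(base_logits) != len(reweight_logits):
        raise ValueError("vocabulary sizes must match")
    combined = [b + r for b, r in zip(base_logits, reweight_logits)]
    return softmax(combined)

# Toy vocabulary of three tokens (hypothetical values).
base_logits = [2.0, 1.0, 0.0]       # from the closed model
reweight_logits = [0.0, 2.0, 0.0]   # from a small task-specific model
adapted = reweight_next_token(base_logits, reweight_logits)
```

In this toy case the closed model prefers token 0, while the adapted distribution prefers token 1, and only the logits were needed to make that change.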
Article 997
Title@2025-07-12 (6): An Epistemic and Aleatoric Decomposition of Arbitrariness to Constrain the Set of Good Models
Title: An Epistemic and Aleatoric Decomposition of Arbitrariness to Constrain the Set of Good Models | Eine epistemische und aleatorische Zersetzung der Willkür, um das Set guter Modelle zu beschränken | 向约束一套良好模型的可变性分解 2302.04525v2 |
Authors (4): Falaah Arif Khan, Denys Herasymuk, Nazar Protsiv, Julia Stoyanovich
Recent research reveals that machine learning (ML) models are highly sensitive to minor changes in their training procedure, such as the inclusion or exclusion of a single data point, leading to conflicting predictions on individual data points; prior work terms this property arbitrariness or instability in ML pipelines. Drawing from the uncertainty literature, we show that stability decomposes into epistemic and aleatoric components, capturing the consistency and confidence in prediction, respectively. We use this decomposition to provide two main contributions. Our first contribution is an extensive empirical evaluation. We find that (i) epistemic instability can be reduced with more training data whereas aleatoric instability cannot; (ii) state-of-the-art ML models have aleatoric instability as high as 79% and aleatoric instability disparities among demographic groups as high as 29% in popular fairness benchmarks; and (iii) fairness pre-processing interventions generally increase aleatoric instability more than in-processing interventions, and both epistemic and aleatoric instability are highly sensitive to data-processing interventions and model architecture. Our second contribution is a practical solution to the problem of systematic arbitrariness. We propose a model selection procedure that includes epistemic and aleatoric criteria alongside existing accuracy and fairness criteria, and show that it successfully narrows down a large set of good models (50-100 on our datasets) to a handful of stable, fair and accurate ones. We built and publicly released a Python library to measure epistemic and aleatoric multiplicity in any ML pipeline alongside existing confusion-matrix-based metrics, providing practitioners with a rich suite of evaluation metrics to use to define a more precise criterion during model selection.
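One simple way to quantify conflicting predictions across retrained models is a per-sample agreement score. The sketch below is an illustrative assumption, not the API of the released library: it takes binary predictions from several models (for example, trained on bootstrap resamples) and reports how one-sided the vote is for each sample.

```python
def label_stability(predictions):
    """Per-sample stability from binary predictions of several models.

    predictions[b][i] is the 0/1 label that model b assigns to sample i.
    Returns |#positive - #negative| / B for each sample: 1.0 means all
    models agree, 0.0 means an even split.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    scores = []
    for i in range(n_samples):
        positives = sum(model_preds[i] for model_preds in predictions)
        negatives = n_models - positives
        scores.append(abs(positives - negatives) / n_models)
    return scores

# Hypothetical predictions from four models on three samples.
preds = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
stability = label_stability(preds)  # [1.0, 0.0, 0.0]
```

Samples with a score near zero are the ones whose predicted label depends on which model happened to be trained.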
Article 998
Title@2025-07-12 (6): Investigating the Robustness of Extreme Precipitation Super-Resolution Across Climates
Title: Investigating the Robustness of Extreme Precipitation Super-Resolution Across Climates | Untersuchung der Robustheit extremer Niederschlags-Super-Resolution über Klima hinweg | 调查极端降水性超强 超分辨率 横跨气候 2507.09166v1 |
Authors (6): Louise Largeau, Erwan Koch, David Leutwyler, Gregoire Mariethoz, Valerie Chavez-Demoulin, Tom Beucler
The coarse spatial resolution of gridded climate models, such as general circulation models, limits their direct use in projecting socially relevant variables like extreme precipitation. Most downscaling methods estimate the conditional distributions of extremes by generating large ensembles, complicating the assessment of robustness under distributional shifts, such as those induced by climate change. To better understand and potentially improve robustness, we propose super-resolving the parameters of the target variable’s probability distribution directly using analytically tractable mappings. Within a perfect-model framework over Switzerland, we demonstrate that vector generalized linear and additive models can super-resolve the generalized extreme value distribution of summer hourly precipitation extremes from coarse precipitation fields and topography. We introduce the notion of a “robustness gap”, defined as the difference in predictive error between present-trained and future-trained models, and use it to diagnose how model structure affects the generalization of each quantile to a pseudo-global warming scenario. By evaluating multiple model configurations, we also identify an upper limit on the super-resolution factor based on the spatial auto- and cross-correlation of precipitation and elevation, beyond which coarse precipitation loses predictive value. Our framework is broadly applicable to variables governed by parametric distributions and offers a model-agnostic diagnostic for understanding when and why empirical downscaling generalizes to climate change and extremes.
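The robustness gap defined above can be sketched as a difference of two error scores. This is a minimal sketch under two assumptions that the abstract does not state: both models are evaluated on the same future-climate targets, and mean absolute error is the error metric. All names and numbers are hypothetical.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and targets."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def robustness_gap(present_trained_pred, future_trained_pred, future_obs):
    """Error of the present-trained model minus error of the future-trained one.

    Both prediction lists are assumed to target the same future-climate
    observations; a gap near zero suggests the present-trained model
    generalizes to the shifted climate.
    """
    present_error = mean_absolute_error(present_trained_pred, future_obs)
    future_error = mean_absolute_error(future_trained_pred, future_obs)
    return present_error - future_error

# Hypothetical return levels (mm/h) at three grid cells.
future_obs = [10.0, 20.0, 30.0]
present_trained_pred = [8.0, 17.0, 26.0]   # errors 2, 3, 4
future_trained_pred = [9.0, 21.0, 29.0]    # errors 1, 1, 1
gap = robustness_gap(present_trained_pred, future_trained_pred, future_obs)
```

Here the present-trained model has a mean error of 3.0 against 1.0 for the future-trained model, so the gap is 2.0.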
Article 999
Title@2025-07-12 (6): Tactile-VLA: Unlocking Vision-Language-Action Model’s Physical Knowledge for Tactile Generalization
Title: Tactile-VLA: Unlocking Vision-Language-Action Model’s Physical Knowledge for Tactile Generalization | Tactile-VLA: Das physische Wissen des Vision-Sprache-Action-Modells für die Taktile Generalisierung | 触觉-VLA:解锁视觉-语言-行动模型的物理知识促进触觉一般化 2507.09160v1 |
Authors (6): Jialei Huang, Shuo Wang, Fanqi Lin, Yihang Hu, Chuan Wen, Yang Gao
Vision-Language-Action (VLA) models have shown remarkable achievements, driven by the rich implicit knowledge of their vision-language components. However, achieving generalist robotic agents demands precise grounding into physical interactions, especially in contact-rich scenarios where fine-grained force control is essential. We advance VLAs’ implicit knowledge beyond identifying what to do, towards guiding how to physically interact with the real world. This paper introduces Tactile-VLA, a novel framework that deeply fuses vision, language, action, and tactile sensing. This framework incorporates a hybrid position-force controller to translate the model’s intentions into precise physical actions and a reasoning module that allows the robot to adapt its strategy based on tactile feedback. Experiments demonstrate Tactile-VLA’s effectiveness and generalizability in three key aspects: (1) enabling tactile-aware instruction following, (2) utilizing tactile-relevant commonsense, and (3) facilitating adaptive tactile-involved reasoning. A key finding is that the VLM’s prior knowledge already contains semantic understanding of physical interaction; by connecting it to the robot’s tactile sensors with only a few demonstrations, we can activate this prior knowledge to achieve zero-shot generalization in contact-rich tasks.
Article 1000
Title@2025-07-12 (6): Regularization-based Framework for Quantization-, Fault- and Variability-Aware Training
Title: Regularization-based Framework for Quantization-, Fault- and Variability-Aware Training | Regularisierungsbasiertes Framework für Quantization-, Fehler- und Variability-Aware Training | 量化、失责和易变-软件培训规范化框架 2503.01297v3 |
Authors (5): Anmol Biswas, Raghav Singhal, Sivakumar Elangovan, Shreyas Sabnis, Udayan Ganguly
Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile memory enable further gains. However, these methods introduce non-ideal hardware behavior, including bit faults and device-to-device variability. We propose a regularization-based quantization-aware training (QAT) framework that supports fixed, learnable step-size, and learnable non-uniform quantization, achieving competitive results on CIFAR-10 and ImageNet. Our method also extends to Spiking Neural Networks (SNNs), demonstrating strong performance on 4-bit networks on CIFAR10-DVS and N-Caltech 101. Beyond quantization, our framework enables fault- and variability-aware fine-tuning, mitigating stuck-at faults (fixed weight bits) and device resistance variability. Compared to prior fault-aware training, our approach significantly improves performance recovery under up to a 20% bit-fault rate and 40% device-to-device variability. Our results establish a generalizable framework for quantization- and robustness-aware training, enhancing efficiency and reliability in low-power, non-ideal hardware.
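To make the idea of a quantization regularizer concrete, the sketch below penalizes the squared distance between each weight and its nearest level on a fixed uniform grid. The penalty form, the signed grid, and all names are assumptions chosen for illustration; the abstract does not give the paper's actual regularizer.

```python
def quantize_uniform(w, step, n_bits):
    """Snap w to the nearest level of a signed uniform grid.

    Levels are integer multiples of `step` in [-2^(n_bits-1), 2^(n_bits-1)-1].
    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    q_min = -(2 ** (n_bits - 1))
    q_max = 2 ** (n_bits - 1) - 1
    level = round(w / step)
    level = max(q_min, min(q_max, level))
    return level * step

def quantization_penalty(weights, step, n_bits):
    """Mean squared distance between weights and their quantized values."""
    return sum(
        (w - quantize_uniform(w, step, n_bits)) ** 2 for w in weights
    ) / len(weights)

# Hypothetical weights and a 3-bit grid with step 0.25 (levels -1.0 .. 0.75).
weights = [0.26, -0.49, 1.2]
penalty = quantization_penalty(weights, step=0.25, n_bits=3)
# During training one would minimize: task_loss + lam * penalty,
# where lam is a hypothetical regularization strength.
```

The penalty is zero exactly when every weight already sits on the grid, so minimizing it alongside the task loss pulls weights toward representable values.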
Article 1001
Title@2025-07-12 (6): AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator
Title: AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator | AdRo-FL: Informierte und sichere Kundenauswahl für das Federated Learning in der Gegenwart von Adversarial Aggregator | ADRO-FL:在存在反versarial聚合体的情况下,为联邦学习进行知情和安全的客户选择 2506.17805v2 |
Authors (5): Md. Kamrul Hossain, Walid Aljoby, Anis Elgabli, Ahmed M. Abdelmoniem, Khaled A. Harras
Federated Learning (FL) enables collaborative learning without exposing clients’ data. While clients only share model updates with the aggregator, studies reveal that aggregators can infer sensitive information from these updates. Secure Aggregation (SA) protects individual updates during transmission; however, recent work demonstrates a critical vulnerability where adversarial aggregators manipulate client selection to bypass SA protections, constituting a Biased Selection Attack (BSA). Although verifiable random selection prevents BSA, it precludes informed client selection essential for FL performance. We propose Adversarial Robust Federated Learning (AdRo-FL), which simultaneously enables: informed client selection based on client utility, and robust defense against BSA maintaining privacy-preserving aggregation. AdRo-FL implements two client selection frameworks tailored for distinct settings. The first framework assumes clients are grouped into clusters based on mutual trust, such as different branches of an organization. The second framework handles distributed clients where no trust relationships exist between them. For the cluster-oriented setting, we propose a novel defense against BSA by (1) enforcing a minimum client selection quota from each cluster, supervised by a cluster-head in every round, and (2) introducing a client utility function to prioritize efficient clients. For the distributed setting, we design a two-phase selection protocol: first, the aggregator selects the top clients based on our utility-driven ranking; then, a verifiable random function (VRF) ensures a BSA-resistant final selection. AdRo-FL also applies quantization to reduce communication overhead and sets strict transmission deadlines to improve energy efficiency. AdRo-FL achieves up to $1.85\times$ faster time-to-accuracy and up to $1.06\times$ higher final accuracy compared to insecure baselines.
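The two-phase protocol for the distributed setting can be sketched as a utility-based shortlist followed by a deterministic pseudo-random draw. In this sketch a SHA-256 hash of a public round seed stands in for the verifiable random function; that substitution is an assumption for illustration only, since a plain hash provides no proof of correct evaluation. Function and parameter names are hypothetical.

```python
import hashlib

def two_phase_select(utilities, pool_size, final_size, round_seed):
    """Pick clients in two phases.

    utilities: dict mapping client id (str) to a utility score.
    Phase 1 keeps the `pool_size` clients with the highest utility.
    Phase 2 orders that pool by a hash ticket and keeps `final_size`.
    """
    # Phase 1: rank by utility (ties broken by client id for determinism).
    ranked = sorted(utilities, key=lambda c: (-utilities[c], c))
    pool = ranked[:pool_size]

    # Phase 2: hash-based ticket, a stand-in for a real VRF output.
    def ticket(client_id):
        data = f"{round_seed}:{client_id}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    return sorted(pool, key=ticket)[:final_size]

# Hypothetical utilities for five clients.
utilities = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.7, "e": 0.2}
chosen = two_phase_select(utilities, pool_size=3, final_size=2, round_seed=7)
```

Because the final draw depends only on the seed and the client ids, the aggregator cannot steer it toward a particular client once the shortlist is fixed.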
Article 1002
Title@2025-07-12 (6): Advanced Health Misinformation Detection Through Hybrid CNN-LSTM Models Informed by the Elaboration Likelihood Model (ELM)
Title: Advanced Health Misinformation Detection Through Hybrid CNN-LSTM Models Informed by the Elaboration Likelihood Model (ELM) | Fortschrittliche Gesundheits-Missinformationserkennung durch Hybrid-CNN-LSTM-Modelle Das Elaboration Likelihood Model (ELM) | 通过有线电视新闻网-LSTM混合模型,通过 “ 发展相似性模型 “ (ELM)所了解的模型,发现高级健康错误信息 2507.09149v1 |
Authors (3): Mkululi Sikosana, Sean Maudsley-Barton, Oluwaseun Ajao
Health misinformation during the COVID-19 pandemic has significantly challenged public health efforts globally. This study applies the Elaboration Likelihood Model (ELM) to enhance misinformation detection on social media using a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model. The model aims to enhance the detection accuracy and reliability of misinformation classification by integrating ELM-based features such as text readability, sentiment polarity, and heuristic cues (e.g., punctuation frequency). The enhanced model achieved an accuracy of 97.37%, precision of 96.88%, recall of 98.50%, F1-score of 97.41%, and ROC-AUC of 99.50%. A combined model incorporating feature engineering further improved performance, achieving a precision of 98.88%, recall of 99.80%, F1-score of 99.41%, and ROC-AUC of 99.80%. These findings highlight the value of ELM features in improving detection performance, offering valuable contextual information. This study demonstrates the practical application of psychological theories in developing advanced machine learning algorithms to address health misinformation effectively.
Article 1003
Title@2025-07-12 (6): A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation
Title: A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation | Ein Randomisierter Algorithmus für Sparse PCA auf Basis der Basic SDP Relaxation | 基于基本 SDP 放松的 SDP 随机化 Sparse 五氯苯甲醚的算法 2507.09148v1 |
Authors (2): Alberto Del Pia, Dekun Zhou
Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA, which is based on the basic SDP relaxation. Our algorithm has an approximation ratio of at most the sparsity constant with high probability, if called enough times. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank, or has exponentially decaying eigenvalues. We then present a broad class of instances for which this technical assumption holds. We also demonstrate that in a covariance model, which generalizes the spiked Wishart model, our proposed algorithm achieves a near-optimal approximation ratio. We demonstrate the efficacy of our algorithm through numerical results on real-world datasets.
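For reference, the SPCA problem and the basic SDP relaxation it is usually paired with can be written as follows. The symbols $A$ (a $d \times d$ positive semidefinite input matrix), $k$ (the sparsity constant), and $W$ are notation assumed here, since the abstract does not fix any.

```latex
% SPCA: maximize explained variance over unit vectors with at most k nonzeros
\[
\max_{x \in \mathbb{R}^d} \; x^\top A x
\quad \text{s.t.} \quad \|x\|_2 = 1, \;\; \|x\|_0 \le k
\]

% Basic SDP relaxation: W replaces x x^\top and the rank-one constraint is dropped
\[
\max_{W \in \mathbb{S}^d} \; \operatorname{tr}(A W)
\quad \text{s.t.} \quad \operatorname{tr}(W) = 1, \;\;
\sum_{i,j} |W_{ij}| \le k, \;\; W \succeq 0
\]
```

The entrywise bound is valid because any feasible $x$ satisfies $\sum_{i,j} |x_i x_j| = \|x\|_1^2 \le k \|x\|_2^2 = k$, so $W = x x^\top$ is feasible for the relaxation.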
Article 1004
Title@2025-07-12 (6): Continuous Spiking Graph Neural Networks
Title: Continuous Spiking Graph Neural Networks | Kontinuierliche Spiking Graph Neuronale Netzwerke | 连续Spiking 图形神经网络 2404.01897v2 |
Authors (7): Nan Yin, Mengzhu Wan, Li Shen, Hitesh Laxmichand Patel, Baopu Li, Bin Gu, Huan Xiong
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) by introducing continuous dynamics. They typically draw inspiration from diffusion-based methods to introduce a novel propagation scheme, which is analyzed using ordinary differential equations (ODE). However, the implementation of CGNNs requires significant computational power, making them challenging to deploy on battery-powered devices. Inspired by recent spiking neural networks (SNNs), which emulate a biological inference process and provide an energy-efficient neural architecture, we incorporate the SNNs with CGNNs in a unified framework, named Continuous Spiking Graph Neural Networks (COS-GNN). We employ SNNs for graph node representation at each time step, which are further integrated into the ODE process along with time. To enhance information preservation and mitigate information loss in SNNs, we introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. Moreover, we provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes. Experimental results on graph-based learning tasks demonstrate the effectiveness of the proposed COS-GNN over competitive baselines.
Article 1005
Title@2025-07-12 (6): HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving
Title: HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving | HedraRAG: Koordinierung der LLM-Erzeugung und Datenbankwiederherstellung im heterogenen RAG-Servieren | HedraRAG:在异基因RAG服务中协调LLM生成和数据库检索 2507.09138v1 |
Authors (7): Zhengding Hu, Vibha Murthy, Zaifeng Pan, Wanlu Li, Xiaoyi Fang, Yufei Ding, Yuke Wang
This paper addresses emerging system-level challenges in heterogeneous retrieval-augmented generation (RAG) serving, where complex multi-stage workflows and diverse request patterns complicate efficient execution. We present HedraRAG, a runtime system built on a graph-based abstraction that exposes optimization opportunities across stage-level parallelism, intra-request similarity, and inter-request skewness. These opportunities are realized through dynamic graph transformations, such as node splitting, reordering, edge addition, and dependency rewiring, applied to wavefronts of subgraphs spanning concurrent requests. The resulting execution plans are mapped onto hybrid CPU-GPU pipelines to improve resource utilization and reduce latency. Evaluations across a wide range of RAG workflows demonstrate speedups exceeding 1.5x and reaching up to 5x over existing frameworks, showcasing the effectiveness of coordinated generation and retrieval in serving environments.
Article 1006
Title@2025-07-12 (6): POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution
Title: POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution | POIFormer: Ein transformerbasierter Rahmen für präzise und skalierbare Point-of-Interest Attribution | POI Foremer: 以变换器为基础的准确和可缩放的利点归属框架 2507.09137v1 |
Authors (6): Nripsuta Ani Saxena, Shang-Ling Hsu, Mehul Shetty, Omar Alkhadra, Cyrus Shahabi, Abigail L. Horn
Accurately attributing user visits to specific Points of Interest (POIs) is a foundational task for mobility analytics, personalized services, marketing and urban planning. However, POI attribution remains challenging due to GPS inaccuracies, typically ranging from 2 to 20 meters in real-world settings, and the high spatial density of POIs in urban environments, where multiple venues can coexist within a small radius (e.g., over 50 POIs within a 100-meter radius in dense city centers). Relying on proximity is therefore often insufficient for determining which POI was actually visited. We introduce POIFormer, a novel Transformer-based framework for accurate and efficient POI attribution. Unlike prior approaches that rely on limited spatiotemporal, contextual, or behavioral features, POIFormer jointly models a rich set of signals, including spatial proximity, visit timing and duration, contextual features from POI semantics, and behavioral features from user mobility and aggregated crowd behavior patterns, using the Transformer’s self-attention mechanism to capture complex interactions across these dimensions. By leveraging the Transformer to model a user’s past and future visits (with the current visit masked) and incorporating crowd-level behavioral patterns through pre-computed KDEs, POIFormer enables accurate, efficient attribution in large, noisy mobility datasets. Its architecture supports generalization across diverse data sources and geographic contexts while avoiding reliance on hard-to-access or unavailable data layers, making it practical for real-world deployment. Extensive experiments on real-world mobility datasets demonstrate significant improvements over existing baselines, particularly in challenging real-world settings characterized by spatial noise and dense POI clustering.
Article 1007
Title@2025-07-12 (6): Dynamic Spiking Framework for Graph Neural Networks
Title: Dynamic Spiking Framework for Graph Neural Networks | Dynamisches Spiking-Framework für Graphen-Neural-Netzwerke | 图形神经网络动态Spiking框架 2401.05373v4 |
Authors (6): Nan Yin, Mengzhu Wang, Zhenghan Chen, Giulia De Masi, Bin Gu, Huan Xiong
The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to the low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often uses SNNs instead of Recurrent Neural Networks (RNNs), replacing continuous features with binary ones for efficient training, which overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires propagation of information across time steps, which increases memory requirements. To address these challenges, we present a framework named Dynamic Spiking Graph Neural Networks (Dy-SIGN). To mitigate the information loss problem, Dy-SIGN propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used for static situations, Dy-SIGN extends them to the dynamic graph setting. Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of Dy-SIGN on dynamic node classification tasks with lower computational costs.
Article 1008
Title@2025-07-12 (6): MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Title: MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | MSVD-Indonesier: Benchmark für multimodale Video-Text-Aufgaben auf Indonesisch | MSVD-印度尼西亚文:印度尼西亚多式视频文字任务基准 2306.11341v2 |
Authors (1): Willy Fitra Hendria
Multimodal learning on video and text has seen significant progress, particularly in tasks like text-to-video retrieval, video-to-text retrieval, and video captioning. However, most existing methods and datasets focus exclusively on English. Despite Indonesian being one of the most widely spoken languages, multimodal research in Indonesian remains under-explored, largely due to the lack of benchmark datasets. To address this gap, we introduce the first public Indonesian video-text dataset by translating the English captions in the MSVD dataset into Indonesian. Using this dataset, we evaluate neural network models which were developed for the English video-text dataset on three tasks, i.e., text-to-video retrieval, video-to-text retrieval, and video captioning. Most existing models rely on feature extractors pretrained on English vision-language datasets, raising concerns about their applicability to Indonesian, given the scarcity of large-scale pretraining resources in the language. We apply a cross-lingual transfer learning approach by leveraging English-pretrained extractors and fine-tuning models on our Indonesian dataset. Experimental results demonstrate that this strategy improves performance across all tasks and metrics. We release our dataset publicly to support future research and hope it will inspire further progress in Indonesian multimodal learning.
Article 1009
Title@2025-07-12 (6): CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification
Title: CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification | CoCo: Ein gekoppeltes Kontrastrahmenwerk für eine nicht überwachte Domänen-Adaptive Graphenklassifikation | Co: 未经监督的域适应性图表分类的相互抵触框架 2306.04979v4 |
Authors (8): Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo
Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be prohibitively costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners, respectively. In addition, we incorporate the coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that CoCo generally outperforms competing baselines across different settings.
Article 1010
Title@2025-07-12 (6): Heterogeneous Graph Prompt Learning via Adaptive Weight Pruning
Title: Heterogeneous Graph Prompt Learning via Adaptive Weight Pruning | Heterogenes Graphen-Prompt-Lernen durch adaptive Gewichtsprüfung | 通过适应性弱力缓冲快速学习 2507.09132v1 |
Authors (6): Chu-Yuan Wei, Shun-Yao Liu, Sheng-Da Zhuo, Chang-Dong Wang, Shu-Qiang Huang, Mohsen Guizani
Graph Neural Networks (GNNs) have achieved remarkable success in various graph-based tasks (e.g., node classification or link prediction). Despite their triumphs, GNNs still face challenges such as long training and inference times, difficulty in capturing complex relationships, and insufficient feature extraction. To tackle these issues, graph pre-training and graph prompt methods have garnered increasing attention for their ability to leverage large-scale datasets for initial learning and task-specific adaptation, offering potential improvements in GNN performance. However, previous research has overlooked the potential of graph prompts in optimizing models, as well as the impact of both positive and negative graph prompts on model stability and efficiency. To bridge this gap, we propose a novel framework combining graph prompts with weight pruning, called GPAWP, which aims to enhance the performance and efficiency of graph prompts by using fewer of them. We evaluate the importance of graph prompts using an importance assessment function to determine positive and negative weights at different granularities. Through hierarchically structured pruning, we eliminate negative prompt labels, resulting in more parameter-efficient and competitively performing prompts. Extensive experiments on three benchmark datasets demonstrate the superiority of GPAWP, leading to a significant reduction in parameters in node classification tasks.
Article 1011
Title@2025-07-12 (6): Learning Traffic Anomalies from Generative Models on Real-Time Observations
Title: Learning Traffic Anomalies from Generative Models on Real-Time Observations | Verkehrsanomalien aus generativen Modellen auf Echtzeit-Beobachtungen lernen | 实时观测生成模型的学习交通异常现象 2502.01391v5 |
Authors (2): Fotis I. Giasemis, Alexandros Sopasakis
Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.
Article 1012
Title@2025-07-12 (6): DuSEGO: Dual Second-order Equivariant Graph Ordinary Differential Equation
Title: DuSEGO: Dual Second-order Equivariant Graph Ordinary Differential Equation | DuSEGO: Zweifach-Äquivariant Graph Normal Differentialgleichung zweiter Ordnung | DSEGO: 双二等等同图形普通等同法 2411.10000v2 |
Authors (6): Yingxu Wang, Nan Yin, Mingyan Xiao, Xinhao Yi, Siwei Liu, Shangsong Liang
Graph Neural Networks (GNNs) with equivariant properties have achieved significant success in modeling complex dynamic systems and molecular properties. However, their expressiveness is limited in two ways: (1) existing methods often overlook the over-smoothing issue caused by traditional GNN models, as well as the gradient explosion or vanishing problems in deep GNNs; (2) most models operate on first-order information, neglecting that the real world often consists of second-order systems, which further limits the model’s representation capabilities. To address these issues, we propose the Dual Second-order Equivariant Graph Ordinary Differential Equation (DuSEGO) for equivariant representation. Specifically, DuSEGO applies the dual second-order equivariant graph ordinary differential equations (Graph ODEs) on graph embeddings and node coordinates simultaneously. Theoretically, we first prove that DuSEGO maintains the equivariant property. Furthermore, we provide theoretical insights showing that DuSEGO effectively alleviates the over-smoothing problem in both feature representation and coordinate update. Additionally, we demonstrate that the proposed DuSEGO mitigates the exploding and vanishing gradients problem, facilitating the training of deep multi-layer GNNs. Extensive experiments on benchmark datasets validate the superiority of the proposed DuSEGO compared to baselines.
Article 1013
Title@2025-07-12 (6): A Generalization Theory for Zero-Shot Prediction
Title: A Generalization Theory for Zero-Shot Prediction | Eine Verallgemeinerungstheorie für Null-Shot-Vorhersage | 零热预测通用理论 2507.09128v1 |
Authors (2): Ronak Mehta, Zaid Harchaoui
A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework to better understand this approach, called zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
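A common instance of zero-shot prediction scores an input embedding against one text embedding per class and picks the closest. The sketch below assumes embeddings from a contrastively pre-trained multimodal model are already available as plain lists; the function names and toy vectors are hypothetical and no particular foundation model is implied.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(input_embedding, class_text_embeddings):
    """Return the label whose text embedding is most similar to the input.

    class_text_embeddings maps each label to the embedding of a prompt
    describing it; no labeled examples of the downstream task are used.
    """
    return max(
        class_text_embeddings,
        key=lambda label: cosine_similarity(
            input_embedding, class_text_embeddings[label]
        ),
    )

# Toy 2-D embeddings (hypothetical values).
class_text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
prediction = zero_shot_predict([0.9, 0.1], class_text_embeddings)  # "cat"
```

The classifier is defined entirely by the class prompts, which is why the quality of the pre-trained representation determines how well it generalizes.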
Article 1014
Title@2025-07-12 (6): Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs
Title: Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs | Divide-Then-Rule: Ein clustergetriebener Hierarchischer Interpolator für Attribute-Missing Graphen | 区分后规则: 用于属性映射图的集成驱动等级式内插工具 2507.10595v1 |
Authors (8): Yaowen Hu, Wenxuan Tu, Yue Liu, Miaomiao Li, Wenpeng Lu, Zhigang Luo, Xinwang Liu, Ping Chen
Deep graph clustering (DGC) for attribute-missing graphs is an unsupervised task aimed at partitioning nodes with incomplete attributes into distinct clusters. Addressing this challenging issue is vital for practical applications, yet the area remains underexplored. Existing imputation methods for attribute-missing graphs often fail to account for the varying amounts of information available across node neighborhoods, leading to unreliable results, especially for nodes with insufficient known neighborhood information. To address this issue, we propose a novel method named Divide-Then-Rule Graph Completion (DTRGC). This method first addresses nodes with sufficient known neighborhood information and treats the imputed results as new knowledge to iteratively impute more challenging nodes, while leveraging clustering information to correct imputation errors. Specifically, Dynamic Cluster-Aware Feature Propagation (DCFP) initializes missing node attributes by adjusting propagation weights based on the clustering structure. Subsequently, Hierarchical Neighborhood-aware Imputation (HNAI) categorizes attribute-missing nodes into three groups based on the completeness of their neighborhood attributes. The imputation is performed hierarchically, prioritizing the groups with nodes that have the most available neighborhood information. The cluster structure is then used to refine the imputation and correct potential errors. Finally, Hop-wise Representation Enhancement (HRE) integrates information across multiple hops, thereby enriching the expressiveness of node representations. Experimental results on six widely used graph datasets show that DTRGC significantly improves the clustering performance of various DGC methods on attribute-missing graphs.
nan
Article 1015
Title@2025-07-12 (6): A Study of Value-Aware Eigenoptions
Title: A Study of Value-Aware Eigenoptions | Eine Studie über wertbewusste Eigenoptionen | 价值-知识Eigen备选方法研究 2507.09127v1 |
Authors (2): Harshil Kotamreddy, Marlos C. Machado
Options, which impose an inductive bias toward temporal and hierarchical structure, offer a powerful framework for reinforcement learning (RL). While effective in sequential decision-making, they are often handcrafted rather than learned. Among approaches for discovering options, eigenoptions have shown strong performance in exploration, but their role in credit assignment remains underexplored. In this paper, we investigate whether eigenoptions can accelerate credit assignment in model-free RL, evaluating them in tabular and pixel-based gridworlds. We find that pre-specified eigenoptions aid not only exploration but also credit assignment, whereas online discovery can bias the agent’s experience too strongly and hinder learning. In the context of deep RL, we also propose a method for learning option-values under non-linear function approximation, highlighting the impact of termination conditions on performance. Our findings reveal both the promise and complexity of using eigenoptions, and options more broadly, to simultaneously support credit assignment and exploration in reinforcement learning.
nan
Article 1016
Title@2025-07-12 (6): Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning
Title: Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning | Mind the Gap: Erhalten und Kompensieren der Modalitätslücke im CLIP-basierten kontinuierlichen Lernen | 牢记差距:维护和补偿基于CLIP的不断学习模式差距 2507.09118v1 |
Authors (6): Linlan Huang, Xusheng Cao, Haori Lu, Yifan Meng, Fei Yang, Xialei Liu
Continual learning aims to enable models to learn sequentially from continuously incoming data while retaining performance on previously learned tasks. With the Contrastive Language-Image Pre-trained model (CLIP) exhibiting strong capabilities across various downstream tasks, there has been growing interest in leveraging CLIP for continual learning in such scenarios. Most existing works overlook the inherent modality gap in CLIP, a key factor in its generalization and adaptability. In this paper, we analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models. Our observations reveal that the modality gap effectively reflects the extent to which pre-trained knowledge is preserved. Based on these insights, we propose a simple yet effective method, MG-CLIP, that improves CLIP’s performance in class-incremental learning. Our approach leverages modality gap preservation to mitigate forgetting and modality gap compensation to enhance the capacity for new data, introducing a novel modality-gap-based perspective for continual learning. Extensive experiments on multiple benchmarks demonstrate that our method outperforms existing approaches without requiring additional replay data. Our code is available at https://github.com/linlany/MindtheGap.
nan
Article 1017
Title@2025-07-12 (6): KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Title: KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | KodCode: Ein vielfältiger, anspruchsvoller und überprüfbarer synthetischer Datensatz für die Codierung | KodCode:用于编码的多样化、挑战性和可核查合成数据集 2503.02951v2 |
Authors (5): Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran
We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing code-focused resources typically fail to ensure either the breadth of coverage (e.g., spanning simple coding tasks to advanced algorithmic problems) or verifiable correctness (e.g., unit tests). In contrast, KodCode comprises question-solution-test triplets that are systematically validated via a self-verification procedure. Our pipeline begins by synthesizing a broad range of coding questions, then generates solutions and test cases with additional attempts allocated to challenging problems. Finally, post-training data synthesis is done by rewriting questions into diverse formats and generating responses under a test-based rejection sampling procedure from a reasoning model (DeepSeek R1). This pipeline yields a large-scale, robust and diverse coding dataset. KodCode is suitable for supervised fine-tuning, and the paired unit tests also provide great potential for RL tuning. Fine-tuning experiments on coding benchmarks (HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench) demonstrate that KodCode-tuned models achieve state-of-the-art performance, surpassing models like Qwen2.5-Coder-32B-Instruct and DeepSeek-R1-Distill-Llama-70B.
nan
Article 1018
Title@2025-07-12 (6): Deep Neural Network Based Accelerated Failure Time Models using Rank Loss
Title: Deep Neural Network Based Accelerated Failure Time Models using Rank Loss | Deep Neural Network Based Accelerated Failure Time Models mit Rang Loss | 基于深神经网络的深神经网络加速失败时间模型 2206.05974v2 |
Authors (2): Gwangsu Kim, Sangwook Kang
An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the covariates act directly on failure times, which makes their effects intuitive to interpret. The semiparametric AFT model, which does not specify the error distribution, is flexible and robust to departures from the distributional assumption. Owing to these desirable features, this class of models has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, in these AFT models, a linear predictor for the mean is typically assumed. Little research has addressed the nonlinearity of predictors when modeling the mean. Deep neural networks (DNNs) have received considerable attention over the past decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing nonlinearity. By taking advantage of this, we propose to apply DNNs in fitting AFT models using a Gehan-type loss, combined with a sub-sampling technique. Finite sample properties of the proposed DNN and rank based AFT model (DeepR-AFT) are investigated via an extensive simulation study. DeepR-AFT shows superior performance over its parametric or semiparametric counterparts when the predictor is nonlinear. For linear predictors, DeepR-AFT performs better when the dimension of the covariates is large. The proposed DeepR-AFT is illustrated using two real datasets, which demonstrate its superiority.
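As a rough illustration of the Gehan-type loss mentioned above, the sketch below computes the standard Gehan rank objective on residuals e_i = log T_i - f(x_i), where only uncensored subjects i contribute a penalty whenever another residual e_j exceeds e_i. The function name, argument names, and the 1/n^2 normalization are assumptions for illustration; the abstract does not give the exact form or the sub-sampling scheme used in DeepR-AFT.

```python
def gehan_loss(log_times, events, predictions):
    """Gehan-type rank loss on residuals e_i = log T_i - f(x_i).

    log_times[i]   : log of the observed (possibly censored) time
    events[i]      : 1 if the failure was observed, 0 if censored
    predictions[i] : model output f(x_i), e.g. from a neural network
    """
    residuals = [t - p for t, p in zip(log_times, predictions)]
    n = len(residuals)
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only act as comparators j
        for j in range(n):
            # penalize pairs where residual j exceeds residual i
            total += max(0.0, residuals[j] - residuals[i])
    return total / (n * n)
```

The loss depends only on pairwise differences of residuals, so it needs no assumption about the error distribution; in a deep-learning setting the same double sum would typically be evaluated on mini-batches.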
nan
Article 1019
Title@2025-07-12 (6): CoVAE: Consistency Training of Variational Autoencoders
Title: CoVAE: Consistency Training of Variational Autoencoders | CoVAE: Konsequentitätstraining von Variationalen Autoencodern | COVAE: 对机动机动机动人员的统一培训 2507.09103v1 |
Authors (2): Gianluigi Silvestri, Luca Ambrogioni
Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent space. While effective, this introduces computational overhead and increased sampling times. We challenge this paradigm by proposing Consistency Training of Variational AutoEncoders (CoVAE), a novel single-stage generative autoencoding framework that adopts techniques from consistency models to train a VAE architecture. The CoVAE encoder learns a progressive series of latent representations with increasing encoding noise levels, mirroring the forward processes of diffusion and flow matching models. This sequence of representations is regulated by a time-dependent $\beta$ parameter that scales the KL loss. The decoder is trained using a consistency loss with variational regularization, which reduces to a conventional VAE loss at the earliest latent time. We show that CoVAE can generate high-quality samples in one or few steps without the use of a learned prior, significantly outperforming equivalent VAEs and other single-stage VAE methods. Our approach provides a unified framework for autoencoding and diffusion-style generative modeling and provides a viable route for one-step generative high-performance autoencoding. Our code is publicly available at https://github.com/gisilvs/covae.
nan
Article 1020
Title@2025-07-12 (6): AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Title: AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model | AHCPTQ: Genaue und hardwarekompatible Nachschulungs-Quantisierung für Segment-Anything-Modell | ACHPTQ: 分片 “ 任何 “ 模式的准确和硬件兼容的训练后培训后量化 2503.03088v3 |
Authors (4): Wenlun Zhang, Yunshan Zhong, Shimpei Ando, Kentaro Yoshioka
The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks. However, its large storage requirements and high computational cost pose challenges for practical deployment. Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment, but we identify two key challenges in SAM that hinder the effectiveness of existing PTQ methods: the heavy-tailed and skewed distribution of post-GELU activations, and significant inter-channel variation in linear projection activations. To address these challenges, we propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM. AHCPTQ introduces hardware-compatible Hybrid Log-Uniform Quantization (HLUQ) to manage post-GELU activations, employing log2 quantization for dense small values and uniform quantization for sparse large values to enhance quantization resolution. Additionally, AHCPTQ incorporates Channel-Aware Grouping (CAG) to mitigate inter-channel variation by progressively clustering activation channels with similar distributions, enabling them to share quantization parameters and improving hardware efficiency. The combination of HLUQ and CAG not only enhances quantization effectiveness but also ensures compatibility with efficient hardware execution. For instance, under the W4A4 configuration on the SAM-L model, AHCPTQ achieves 36.6% mAP on instance segmentation with the DINO detector, while achieving a 7.89x speedup and 8.64x energy efficiency over its floating-point counterpart in FPGA implementation.
nan
Article 1021
Title@2025-07-12 (6): S2SRec2: Set-to-Set Recommendation for Basket Completion with Recipe
Title: S2SRec2: Set-to-Set Recommendation for Basket Completion with Recipe | S2SRec2: Set-to-Set Empfehlung für Korb Fertigstellung mit Rezept | S2SRec2:关于配有食谱的篮子补全的设置到设置建议 2507.09101v1 |
Authors (6): Yanan Cao, Omid Memarrast, Shiqin Cai, Sinduja Subramaniam, Evren Korpeoglu, Kannan Achan
In grocery e-commerce, customers often build ingredient baskets guided by dietary preferences but lack the expertise to create complete meals. Leveraging recipe knowledge to recommend complementary ingredients based on a partial basket is essential for improving the culinary experience. Traditional recipe completion methods typically predict a single missing ingredient using a leave-one-out strategy. However, they fall short in two key aspects: (i) they do not reflect real-world scenarios where multiple ingredients are often needed, and (ii) they overlook relationships among the missing ingredients themselves. To address these limitations, we reformulate basket completion as a set-to-set (S2S) recommendation problem, where an incomplete basket is input into a system that predicts a set of complementary ingredients. We introduce S2SRec2, a set-to-set ingredient recommendation framework based on a Set Transformer and trained in a multitask learning paradigm. S2SRec2 jointly learns to (i) retrieve missing ingredients from the representation of existing ones and (ii) assess basket completeness after prediction. These tasks are optimized together, enforcing accurate retrieval and coherent basket completion. Experiments on large-scale recipe datasets and qualitative analyses show that S2SRec2 significantly outperforms single-target baselines, offering a promising approach to enhance grocery shopping and inspire culinary creativity.
nan
Article 1022
Title@2025-07-12 (6): On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving
Title: On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving | Zur Fragilität der multimodalen Wahrnehmung zur zeitlichen Fehlausrichtung im autonomen Fahren | 自主驾驶时时时失调的多模式观念的易变性 2507.09095v1 |
Authors (6): Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou
Multimodal fusion (MMF) plays a critical role in the perception of autonomous driving, which primarily fuses camera and LiDAR streams for a comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, a novel attack that exploits network-induced delays to create subtle temporal misalignments across sensor streams, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals these sensors’ task-specific imbalanced sensitivities: object detection is overly dependent on LiDAR inputs, while object tracking is highly reliant on camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce the car detection mAP by up to 88.5%, while with a three-frame camera delay, multiple object tracking accuracy (MOTA) for cars drops by 73%. To detect such attacks, we propose AION, a defense patch that can work alongside the existing perception model to monitor temporal alignment through cross-modal temporal consistency. AION leverages multimodal shared representation learning and dynamic time warping to determine the path of temporal alignment and calculate anomaly scores based on the alignment. Our thorough evaluation of AION shows it achieves AUROC scores of 0.92-0.98 with low false positives across datasets and model architectures, demonstrating that it is a robust and general defense against temporal misalignment attacks.
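To make the dynamic time warping step concrete, here is a minimal sketch of classic DTW between two scalar sequences, returning the alignment cost and the warping path. It is only an illustration under simplifying assumptions: AION is described as operating on learned multimodal representations, whereas this toy version compares plain numbers with an absolute-difference cost, and the function name `dtw` is hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping between two scalar sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs aligning a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance only in a
            (cost[i][j - 1], i, j - 1),          # advance only in b
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path
```

When one stream lags the other, the recovered path drifts away from the diagonal (i = j); a detector could, for example, score that offset, though the exact anomaly score used by AION is not specified in the abstract.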
nan
Article 1023
Title@2025-07-12 (6): Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization
Title: Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization | Optimale Hochwahrscheinlichkeit Konvergenz von nichtlinearer SGD unter stark gestaffelter Geräuschentwicklung durch Symmetrisierung | 非线性SGD在通过平衡化的重尾噪音下达到最佳高概率一致 2507.09093v1 |
Authors (4): Aleksandar Armacki, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
We study high-probability convergence of SGD-type methods in non-convex optimization in the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $\widetilde{\mathcal{O}}(t^{-1/2})$, for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators, based on the idea of noise symmetrization. The first, dubbed Symmetrized Gradient Estimator (SGE), assumes a noiseless gradient at any reference point is available at the start of training, while the second, dubbed Mini-batch SGE (MSGE), uses mini-batches to estimate the noiseless gradient. Combined with the nonlinear framework, we get N-SGE and N-MSGE methods, respectively, both achieving the same convergence rate and exponentially decaying tails as N-SGD, while allowing for non-symmetric noise with unbounded moments and PDF satisfying a mild technical condition, with N-MSGE additionally requiring bounded noise moment of order $p \in (1,2]$. Compared to works assuming noise with bounded $p$-th moment, our results: 1) are based on a novel symmetrization approach; 2) provide a unified framework and relaxed moment conditions; 3) imply optimal oracle complexity of N-SGD and N-SGE, strictly better than existing works when $p < 2$, while the complexity of N-MSGE is close to existing works. Compared to works assuming symmetric noise with unbounded moments, we: 1) provide a sharper analysis and improved rates; 2) facilitate state-dependent symmetric noise; 3) extend the strong guarantees to non-symmetric noise.
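The nonlinear framework can be pictured as a plain SGD step in which the noisy gradient is first passed through a bounded map. The sketch below shows three of the nonlinearities named above (component-wise sign, norm clipping, normalization) and a generic update x <- x - step_size * Psi(g). All function names, the clipping radius, and the step size are assumptions chosen for illustration; the abstract does not specify the paper's exact parameterization or step-size schedule.

```python
import math

def sign_map(g):
    """Component-wise sign: each coordinate becomes -1, 0, or 1."""
    return [(v > 0) - (v < 0) for v in g]

def clip_map(g, radius=1.0):
    """Rescale g so that its Euclidean norm is at most `radius`."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, radius / norm) if norm > 0 else 1.0
    return [scale * v for v in g]

def normalize_map(g):
    """Rescale g to unit Euclidean norm (zero vector is left unchanged)."""
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g] if norm > 0 else list(g)

def nonlinear_sgd_step(x, noisy_grad, step_size, nonlinearity):
    """One update x <- x - step_size * Psi(noisy_grad)."""
    direction = nonlinearity(noisy_grad)
    return [xi - step_size * di for xi, di in zip(x, direction)]
```

Because each map bounds the size of the update direction, a single extreme noise realization (for example a gradient coordinate of 1000) moves the iterate no further than an ordinary one, which is the intuition behind robustness to heavy tails.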
nan
Article 1024
Title@2025-07-12 (6): MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks
Title: MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks | MI CAM: Gegenseitige Information Gewichtete Aktivierungsmapping für ursächliche visuelle Erklärungen konvolutionärer neuraler Netzwerke | MI CAM: 关于革命神经网络的客观视觉解释的相互信息加权活动绘图 2507.09092v1 |
Authors (3): Ram S Iyer, Narayan S Iyer, Rugmini Ammal P
With the intervention of machine vision in crucial day-to-day necessities, including healthcare and automated power plants, attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network produces specific inferences. This paper proposes a novel post-hoc visual explanation method called MI CAM, based on activation mapping. Differing from previous class activation mapping based approaches, MI CAM produces saliency visualizations by weighing each feature map through its mutual information with the input image, and the final result is generated by a linear combination of weights and activation maps. It also produces causal interpretations, as validated with the help of counterfactual analysis. We aim to exhibit the visual performance and unbiased justifications for the model inferencing procedure achieved by MI CAM. Our approach performs on par with all state-of-the-art methods and outperforms some of them on qualitative and quantitative measures. The implementation of the proposed method can be found at https://anonymous.4open.science/r/MI-CAM-4D27.
nan
Article 1025
Title@2025-07-12 (6): Continuous-Time Signal Decomposition: An Implicit Neural Generalization of PCA and ICA
Title: Continuous-Time Signal Decomposition: An Implicit Neural Generalization of PCA and ICA | Kontinuierliche Zeitsignalzersetzung: Eine implizite Neuralverallgemeinerung von PCA und ICA | 连续信号分解:五氯苯甲醚和ICA的隐性神经化 2507.09091v1 |
Authors (3): Shayan K. Azmoodeh, Krishna Subramani, Paris Smaragdis
We generalize low-rank decomposition problems, such as principal and independent component analysis (PCA, ICA), to continuous-time vector-valued signals and provide a model-agnostic implicit neural signal representation framework to learn numerical approximations that solve the problem. Modeling signals as continuous-time stochastic processes, we unify the approaches to both the PCA and ICA problems in the continuous setting through a contrast function term in the network loss, enforcing the desired statistical properties of the source signals (decorrelation, independence) learned in the decomposition. This extension to a continuous domain allows the application of such decompositions to point clouds and irregularly sampled signals where standard techniques are not applicable.
nan
Article 1026
Title@2025-07-12 (6): Deep Reinforcement Learning with Gradient Eligibility Traces
Title: Deep Reinforcement Learning with Gradient Eligibility Traces | Tiefe Verstärkung Lernen mit gradienten Berechtigungsspuren | 具有渐进资格追踪的深强化学习 2507.09087v1 |
Authors (6): Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White
Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently susceptible to divergence. While more principled approaches like Gradient TD (GTD) methods have strong convergence guarantees, they have rarely been used in deep RL. Recent work introduced the Generalized Projected Bellman Error (GPBE), enabling GTD methods to work efficiently with nonlinear function approximation. However, this work is limited to one-step methods, which are slow at credit assignment and require a large number of samples. In this paper, we extend the GPBE objective to support multistep credit assignment based on the $\lambda$-return and derive three gradient-based methods that optimize this new objective. We provide both a forward-view formulation compatible with experience replay and a backward-view formulation compatible with streaming algorithms. Finally, we evaluate the proposed algorithms and show that they outperform both PPO and StreamQ in MuJoCo and MinAtar environments, respectively. Code available at https://github.com/esraaelelimy/gtd_algos.
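For readers unfamiliar with the $\lambda$-return, the following sketch computes the textbook on-policy version over a finite trajectory using the backward recursion G_t = r_{t+1} + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}). This is not the paper's GPBE-based objective, and it omits any off-policy corrections; the function and argument names are assumptions for illustration.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Textbook lambda-returns for one trajectory, computed backward.

    rewards[t]     : reward received after step t
    next_values[t] : value estimate V(s_{t+1}) (use 0.0 at a terminal state)
    gamma          : discount factor
    lam            : trace parameter in [0, 1]
    """
    returns = [0.0] * len(rewards)
    g = next_values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(len(rewards))):
        # mix the one-step bootstrap target with the longer return
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

Setting `lam=0.0` recovers one-step TD targets, while `lam=1.0` gives the full discounted return bootstrapped only at the end of the trajectory, which is why intermediate values speed up credit assignment relative to one-step methods.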
nan
Article 1027
Title@2025-07-12 (6): Queue up for takeoff: a transferable deep learning framework for flight delay prediction
Title: Queue up for takeoff: a transferable deep learning framework for flight delay prediction | Warteschlange für Start: ein übertragbares Deep-Learning-Framework für die Flugverzögerungsvorhersage | 飞行延迟预测的可转让深程学习框架 2507.09084v1 |
Authors (8): Nnamdi Daniel Aghanya, Ta Duong Vu, Amaëlle Diop, Charlotte Deville, Nour Imane Kerroumi, Irene Moulitsas, Jun Li, Desmond Bisandu
Flight delays are a significant challenge in the aviation industry, causing major financial and operational disruptions. To improve passenger experience and reduce revenue loss, flight delay prediction models must be both precise and generalizable across different networks. This paper introduces a novel approach that combines Queue-Theory with a simple attention model, referred to as the Queue-Theory SimAM (QT-SimAM). To validate our model, we used data from the US Bureau of Transportation Statistics, where our proposed QT-SimAM (Bidirectional) model outperformed existing methods with an accuracy of 0.927 and an F1 score of 0.932. To assess transferability, we tested the model on the EUROCONTROL dataset. The results demonstrated strong performance, achieving an accuracy of 0.826 and an F1 score of 0.791. Ultimately, this paper outlines an effective, end-to-end methodology for predicting flight delays. The proposed model’s ability to forecast delays with high accuracy across different networks can help reduce passenger anxiety and improve operational decision-making.
nan
Article 1028
Title@2025-07-11 (5): Infinite Video Understanding
Title: Infinite Video Understanding | Unendliches Video-Verständnis | 无限视频理解 2507.09068v1 |
Authors (9): Dell Zhang, Xiangyu Chen, Jixiang Luo, Mengxi Jia, Changzhi Sun, Ruilong Ren, Jingren Liu, Hao Sun, Xuelong Li
The rapid advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have ushered in remarkable progress in video understanding. However, a fundamental challenge persists: effectively processing and comprehending video content that extends beyond minutes or hours. While recent efforts like Video-XL-2 have demonstrated novel architectural solutions for extreme efficiency, and advancements in positional encoding such as HoPE and VideoRoPE++ aim to improve spatio-temporal understanding over extensive contexts, current state-of-the-art models still encounter significant computational and memory constraints when faced with the sheer volume of visual tokens from lengthy sequences. Furthermore, maintaining temporal coherence, tracking complex events, and preserving fine-grained details over extended periods remain formidable hurdles, despite progress in agentic reasoning systems like Deep Video Discovery. This position paper posits that a logical, albeit ambitious, next frontier for multimedia research is Infinite Video Understanding – the capability for models to continuously process, understand, and reason about video data of arbitrary, potentially never-ending duration. We argue that framing Infinite Video Understanding as a blue-sky research objective provides a vital north star for the multimedia, and the wider AI, research communities, driving innovation in areas such as streaming architectures, persistent memory mechanisms, hierarchical and adaptive representations, event-centric reasoning, and novel evaluation paradigms. Drawing inspiration from recent work on long/ultra-long video understanding and several closely related fields, we outline the core challenges and key research directions towards achieving this transformative capability.
nan
Article 1029
Title@2025-07-11 (5): HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
Title: HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization | HYPEROFA: Erweitern von LLM Vokabeln auf neue Sprachen über Hypernetwork-basierte Einbettung in Initialisierung | HYPROOFA:通过基于超网络的嵌入式初始化,将LLM词汇扩大到新语言 2504.21018v2 |
Authors (3): Enes Özeren, Yihong Liu, Hinrich Schütze
Many pre-trained language models (PLMs) exhibit suboptimal performance on mid- and low-resource languages, largely due to limited exposure to these languages during pre-training. A common strategy to address this is to introduce new tokens specific to the target languages, initialize their embeddings, and apply continual pre-training on target-language data. Among such methods, OFA (Liu et al., 2024a) proposes a similarity-based subword embedding initialization heuristic that is both effective and efficient. However, OFA restricts target-language token embeddings to be convex combinations of a fixed number of source-language embeddings, which may limit expressiveness. To overcome this limitation, we propose HYPEROFA, a hypernetwork-based approach for more adaptive token embedding initialization. The hypernetwork is trained to map from an external multilingual word vector space to the PLM’s token embedding space using source-language tokens. Once trained, it can generate flexible embeddings for target-language tokens, serving as a good starting point for continual pre-training. Experiments demonstrate that HYPEROFA consistently outperforms the random initialization baseline and matches or exceeds the performance of OFA in both continual pre-training convergence and downstream task performance. We make the code publicly available.
nan
Article 1030
Title@2025-07-11 (5): Risk Bounds For Distributional Regression
Title: Risk Bounds For Distributional Regression | Risikogrenzen für distributive Regression | 分布性倒退的风险临界值 2505.09075v3 |
Authors (3): Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Sabyasachi Chatterjee
This work examines risk bounds for nonparametric distributional regression estimators. For convex-constrained distributional regression, general upper bounds are established for the continuous ranked probability score (CRPS) and the worst-case mean squared error (MSE) across the domain. These theoretical results are applied to isotonic and trend filtering distributional regression, yielding convergence rates consistent with those for mean estimation. Furthermore, a general upper bound is derived for distributional regression under non-convex constraints, with a specific application to neural network-based estimators. Comprehensive experiments on both simulated and real data validate the theoretical contributions, demonstrating their practical effectiveness.
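As a small worked example of the continuous ranked probability score (CRPS) used above, the sketch below evaluates it for a predictive distribution given as equally weighted samples, using the well-known identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| with X, X' drawn independently from F. The function name is an assumption, and the paper's estimators and risk definitions are not reproduced here.

```python
def crps_empirical(samples, y):
    """CRPS of an equally weighted empirical distribution at observation y.

    Uses CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| for X, X' ~ F.
    """
    n = len(samples)
    # average distance from the samples to the observation
    term_obs = sum(abs(x - y) for x in samples) / n
    # half the average pairwise distance between samples (spread of F)
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return term_obs - term_spread
```

For instance, with samples `[0.0, 2.0]` and observation `1.0` the first term is 1 and the second is 0.5, giving a CRPS of 0.5, which matches the integral of the squared difference between the step CDF and the indicator of z >= 1. A point mass exactly at the observation scores 0, the best possible value.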
nan
Article 1031
Title@2025-07-11 (5): SetupBench: Assessing Software Engineering Agents’ Ability to Bootstrap Development Environments
Title: SetupBench: Assessing Software Engineering Agents’ Ability to Bootstrap Development Environments | SetupBench: Bewertung der Fähigkeit von Software-Engineering-Agenten zu Bootstrap-Entwicklungsumgebungen | 设置基准:评估软件工程代理器的能力,以建立发展环境 2507.09063v1 |
Authors (3): Avi Arora, Jinu Jang, Roshanak Zilouchian Moghaddam
Modern Large Language Model (LLM) agents promise end-to-end assistance with real-world software tasks, yet existing benchmarks evaluate LLM agents almost exclusively in pre-baked environments where every dependency is pre-installed. To fill this gap, we introduce SetupBench, a 93-instance benchmark that isolates the environment-bootstrap skill: starting from a bare Linux sandbox, an agent must install packages, resolve dependency conflicts, initialize databases, and configure background services. Our tasks span seven language ecosystems, five database engines, and multi-service orchestration scenarios, each accompanied by a natural language problem statement and a deterministic success command. Through evaluation of OpenHands, a state-of-the-art coding agent, we find low success rates across task categories, with particular challenges in repository setup (38.9-57.4%) and local database configuration (20.0-53.3%). Our analysis reveals systematic failure modes including incomplete development tooling installation, hallucinated task constraints, and non-persistent environment modifications that break agent-human collaboration workflows. We identify substantial inefficiencies in agent exploration strategies, with 38-89% of actions being unnecessary compared to optimal human behavior. These findings highlight gaps in current agents’ practical environment-bootstrap capabilities. By targeting this critical yet under-evaluated capability, SetupBench provides a rigorous yardstick for the next generation of software developer agents aiming to solve end-to-end real-world tasks.
Article 1032
Title@2025-07-11 (5): Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction
Title: Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction | Imitation Learning in Continuous Action Spaces: Compounding Fehler ohne Wechselwirkungen | 连续行动空间的模拟学习:没有相互作用的减缓化合物错误 2507.09061v1 |
Authors (4): Thomas T. Zhang, Daniel Pfrommer, Nikolai Matni, Max Simchowitz
We study the problem of imitating an expert demonstrator in a continuous state-and-action dynamical system. While imitation learning in discrete settings such as autoregressive language modeling has seen immense success and popularity in recent years, imitation in physical settings such as autonomous driving and robot learning has proven comparably more complex due to the compounding errors problem, often requiring elaborate set-ups to perform stably. Recent work has demonstrated that even in benign settings, exponential compounding errors are unavoidable when learning solely from expert-controlled trajectories, suggesting the need for more advanced policy parameterizations or data augmentation. To this end, we present minimal interventions that provably mitigate compounding errors in continuous state-and-action imitation learning. When the system is open-loop stable, we prescribe “action chunking,” i.e., predicting and playing sequences of actions in open-loop; when the system is possibly unstable, we prescribe “noise injection,” i.e., adding noise during expert demonstrations. These interventions align with popular choices in modern robot learning, though the benefits we derive are distinct from the effects they were designed to target. Our results draw insights and tools from both control theory and reinforcement learning; however, our analysis reveals novel considerations that do not naturally arise when either literature is considered in isolation.
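Action chunking, as described above, can be illustrated on a toy system. The sketch assumes a scalar open-loop-stable system x_{t+1} = a·x_t + b·u_t and a linear expert u = -k·x. The constants and the function names are illustrative choices, not the paper's setup, and noise injection is not shown.

```python
def predict_chunk(x, horizon, a=0.9, b=0.5, k=0.4):
    """Hypothetical chunk predictor: imagine the expert's closed loop starting
    at state x and return the next `horizon` expert actions."""
    actions = []
    for _ in range(horizon):
        u = -k * x
        actions.append(u)
        x = a * x + b * u  # imagined next state, never observed
    return actions

def rollout_chunked(x0, steps, horizon, a=0.9, b=0.5):
    """Observe the state once per chunk, then play the whole chunk open-loop."""
    x, traj, t = x0, [x0], 0
    while t < steps:
        chunk = predict_chunk(x, min(horizon, steps - t), a=a, b=b)
        for u in chunk:  # actions are executed without re-observing the state
            x = a * x + b * u
            traj.append(x)
        t += len(chunk)
    return traj

traj = rollout_chunked(1.0, steps=12, horizon=4)
print(traj[-1])
```

In this noise-free toy the chunked rollout reproduces the expert's closed loop, x_{t+1} = (a - b·k)·x_t. The policy reads the state only once every `horizon` steps.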
Article 1033
Title@2025-07-11 (5): Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins
Title: Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins | Konformations-Aware-Struktur Vorhersage von Antigen-Erkennung Immunproteine | 抗原识别免疫素蛋白的预测 2507.09054v1 |
Authors (11): Frédéric A. Dreyer, Jan Ludwiczak, Karolis Martinkus, Brennan Abanades, Robert G. Alberstein, Pan Kessel, Pranav Rao, Jae Hyeon Lee, Richard Bonneau, Andrew M. Watkins, Franziska Seeger
We introduce Ibex, a pan-immunoglobulin structure prediction model that achieves state-of-the-art accuracy in modeling the variable domains of antibodies, nanobodies, and T-cell receptors. Unlike previous approaches, Ibex explicitly distinguishes between bound and unbound protein conformations by training on labeled apo and holo structural pairs, enabling accurate prediction of both states at inference time. Using a comprehensive private dataset of high-resolution antibody structures, we demonstrate superior out-of-distribution performance compared to existing specialized and general protein structure prediction tools. Ibex combines the accuracy of cutting-edge models with significantly reduced computational requirements, providing a robust foundation for accelerating large molecule design and therapeutic development.
Article 1034
Title@2025-07-11 (5): Can Contrastive Learning Improve Class-Imbalanced Diffusion Model?
Title: Can Contrastive Learning Improve Class-Imbalanced Diffusion Model? | Kann Kontrastives Lernen das Klassen-Imbalanced Diffusion Model verbessern? | 差异学习能改善班级平衡传播模式吗? 2507.09052v1 |
Authors (5): Fang Chen, Alex Villa, Gongbo Liang, Xiaoyi Lu, Meng Tang
Training data for class-conditional image synthesis often exhibit a long-tailed distribution with limited images for tail classes. Such an imbalance causes mode collapse and reduces the diversity of synthesized images for tail classes. For class-conditional diffusion models trained on imbalanced data, we aim to improve the diversity of tail class images without compromising the fidelity and diversity of head class images. We achieve this by introducing two deceptively simple but highly effective contrastive loss functions. Firstly, we employ an unsupervised InfoNCE loss utilizing negative samples to increase the distance/dissimilarity among synthetic images, particularly for tail classes. To further enhance the diversity of tail classes, our second loss is an MSE loss that contrasts class-conditional generation with unconditional generation at large timesteps. This second loss makes the denoising process insensitive to class conditions for the initial steps, which enriches tail classes through knowledge sharing from head classes. Conditional-unconditional alignment has been shown to enhance the performance of long-tailed GANs. We are the first to adapt such alignment to diffusion models. We successfully leverage contrastive learning for class-imbalanced diffusion models. Our contrastive learning framework is easy to implement and outperforms standard DDPM and alternative methods for class-imbalanced diffusion models across various datasets, including CIFAR10/100-LT, PlacesLT, TinyImageNetLT, and ImageNetLT.
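For readers unfamiliar with InfoNCE, the generic form of the loss can be sketched in plain Python. This is the textbook formulation with cosine similarity and a temperature. How the paper builds anchors, positives, and negatives from synthetic images is not stated in the abstract, so the vectors below are purely illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE: negative log-softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

anchor, positive = [1.0, 0.0], [0.9, 0.1]
loss_near = info_nce(anchor, positive, negatives=[[1.0, 0.05]])  # negative close to anchor
loss_far = info_nce(anchor, positive, negatives=[[-1.0, 0.0]])   # negative far from anchor
print(loss_near, loss_far)
```

The loss is large when a negative sits close to the anchor and shrinks as negatives move away. That is the mechanism by which minimizing it pushes samples apart.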
Article 1035
Title@2025-07-11 (5): GPS-Aided Deep Learning for Beam Prediction and Tracking in UAV mmWave Communication
Title: GPS-Aided Deep Learning for Beam Prediction and Tracking in UAV mmWave Communication | GPS-gestütztes Deep Learning für Strahlvorhersage und Tracking in UAV mmWave Kommunikation | GPS 辅助的无人驾驶飞行器波段通信光束预测和跟踪深层学习 2505.17530v2 |
Authors (2): Vendi Ardianto Nugroho, Byung Moo Lee
Millimeter-wave (mmWave) communication enables high data rates for cellular-connected Unmanned Aerial Vehicles (UAVs). However, robust beam management remains challenging due to significant path loss and the dynamic mobility of UAVs, which can destabilize the UAV-base station (BS) link. This research presents a GPS-aided deep learning (DL) model that simultaneously predicts current and future optimal beams for UAV mmWave communications, maintaining a Top-1 prediction accuracy exceeding 70% and an average power loss below 0.6 dB across all prediction steps. These outcomes stem from a proposed data set splitting method ensuring balanced label distribution, paired with a GPS preprocessing technique that extracts key positional features, and a DL architecture that maps sequential position data to beam index predictions. The model reduces overhead by approximately 93% (requiring the training of 2-3 beams instead of 32 beams) with 95% beam prediction accuracy guarantees, and ensures 94% to 96% of predictions exhibit mean power loss not exceeding 1 dB.
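The quoted overhead figure can be checked with simple arithmetic. The sketch assumes that overhead scales linearly with the number of beams that must be trained. That is one plausible reading of the abstract, not a formula stated in it.

```python
def overhead_reduction(beams_trained, total_beams):
    """Fraction of beam-training overhead saved, assuming overhead is
    proportional to the number of beams trained (an assumption)."""
    return 1.0 - beams_trained / total_beams

reduction_2 = overhead_reduction(2, 32)  # 0.9375
reduction_3 = overhead_reduction(3, 32)  # 0.90625
print(reduction_2, reduction_3)
```

Under this assumption, training 2 to 3 beams out of 32 saves between roughly 91% and 94% of the overhead, which is consistent with the "approximately 93%" quoted above.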
Article 1036
Title@2025-07-11 (5): A Method for Learning to Solve Parametric Bilevel Optimization with Coupling Constraints
Title: A Method for Learning to Solve Parametric Bilevel Optimization with Coupling Constraints | Eine Methode zum Lösen parametrischer Bilevel-Optimierung mit Koppelungsbeschränkungen | 学会解决双级优化和组合制约的 参数参数优化方法 2507.09050v1 |
Authors (6): James Kotary, Himanshu Sharma, Ethan King, Draguna Vrabie, Ferdinando Fioretto, Jan Drgona
Learning to Optimize (L2O) is a subfield of machine learning (ML) in which ML models are trained to solve parametric optimization problems. The general goal is to learn a fast approximator of solutions to constrained optimization problems, as a function of their defining parameters. Prior L2O methods focus almost entirely on single-level programs, in contrast to bilevel programs, whose constraints are themselves expressed in terms of optimization subproblems. Bilevel programs have numerous important use cases but are notoriously difficult to solve, particularly under stringent time demands. This paper proposes a framework for learning to solve a broad class of challenging bilevel optimization problems, by leveraging modern techniques for differentiation through optimization problems. The framework is illustrated on an array of synthetic bilevel programs, as well as challenging control system co-design problems, showing how neural networks can be trained as efficient approximators of parametric bilevel optimization.
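For orientation, a parametric bilevel program with coupling constraints can be written in the following generic form. The symbols (upper-level variables x, lower-level variables y, parameters θ, objectives F and f, constraints G and g) are notation assumed here for illustration. The abstract does not give the paper's own formulation.

```latex
\begin{aligned}
\min_{x,\,y}\quad & F(x, y; \theta) \\
\text{s.t.}\quad  & G(x, y; \theta) \le 0
                    && \text{(upper-level constraints coupling } x \text{ and } y\text{)} \\
                  & y \in \operatorname*{arg\,min}_{y'} \bigl\{\, f(x, y'; \theta) \;:\; g(x, y'; \theta) \le 0 \,\bigr\}
                    && \text{(lower-level subproblem)}
\end{aligned}
```

In this notation, a learned approximator would map θ to an approximate solution (x, y), so that each new parameter instance costs one forward pass rather than a full nested solve.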
Article 1037
Title@2025-07-11 (5): Shortening the Trajectories: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
Title: Shortening the Trajectories: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation | Verkürzung der Trajektorien: Identity-Aware Gaussian Approximation für effiziente 3D-Molekulargeneration | 缩短轨迹:为高效的三维分子生成而使身份-软件高斯近似化 2507.09043v1 |
Authors (3): Jingxiang Qu, Wenhan Gao, Yi Liu
Gaussian-based Probabilistic Generative Models (GPGMs) generate data by reversing a stochastic process that progressively corrupts samples with Gaussian noise. While these models have achieved state-of-the-art performance across diverse domains, their practical deployment remains constrained by the high computational cost of long generative trajectories, which often involve hundreds to thousands of steps during training and sampling. In this work, we introduce a theoretically grounded and empirically validated framework that improves generation efficiency without sacrificing training granularity or inference fidelity. Our key insight is that for certain data modalities, the noising process causes data to rapidly lose its identity and converge toward a Gaussian distribution. We analytically identify a characteristic step at which the data has acquired sufficient Gaussianity, and then replace the remaining generation trajectory with a closed-form Gaussian approximation. Unlike existing acceleration techniques that coarsen the trajectories by skipping steps, our method preserves the full resolution of learning dynamics while avoiding redundant stochastic perturbations between ‘Gaussian-like’ distributions. Empirical results across multiple data modalities demonstrate substantial improvements in both sample quality and computational efficiency.
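The loss of identity under noising can be made concrete with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, where sqrt(ᾱ_t) is the weight of the original sample at step t. The sketch below uses the common linear β schedule and a fixed threshold on that weight. The threshold is a stand-in for illustration only, not the analytic criterion derived in the paper.

```python
import math

def signal_coefficients(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """sqrt(alpha_bar_t) for t = 1..num_steps under a linear beta schedule."""
    coeffs, alpha_bar = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        alpha_bar *= 1.0 - beta
        coeffs.append(math.sqrt(alpha_bar))
    return coeffs

def first_step_below(coeffs, threshold):
    """First 1-indexed step whose data weight falls below the threshold."""
    for t, c in enumerate(coeffs, start=1):
        if c < threshold:
            return t
    return None

coeffs = signal_coefficients()
# Hypothetical cut-off: treat x_t as "Gaussian-like" once the data weight is under 0.1.
t_star = first_step_below(coeffs, threshold=0.1)
print(t_star, len(coeffs) - t_star)
```

Steps after `t_star` perturb a distribution that is already close to Gaussian. The abstract proposes replacing that segment with a closed-form approximation.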
Article 1038
Title@2025-07-11 (5): Behavioral Exploration: Learning to Explore via In-Context Adaptation
Title: Behavioral Exploration: Learning to Explore via In-Context Adaptation | Verhaltensforschung: Lernen, durch In-Context-Anpassung zu erkunden | 行为探索:学习通过内容内适应探索 2507.09041v1 |
Authors (3): Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine
Developing autonomous agents that quickly explore an environment and adapt their behavior online is a canonical challenge in robotics and machine learning. While humans are able to achieve such fast online exploration and adaptation, often acquiring new information and skills in only a handful of interactions, existing algorithmic approaches tend to rely on random exploration and slow, gradient-based behavior updates. How can we endow autonomous agents with such capabilities on par with humans? Taking inspiration from recent progress on both in-context learning and large-scale behavioral cloning, in this work we propose behavioral exploration: training agents to internalize what it means to explore and adapt in-context over the space of “expert” behaviors. To achieve this, given access to a dataset of expert demonstrations, we train a long-context generative model to predict expert actions conditioned on a context of past observations and a measure of how “exploratory” the expert’s behaviors are relative to this context. This enables the model to not only mimic the behavior of an expert, but also, by feeding its past history of interactions into its context, to select different expert behaviors than what have been previously selected, thereby allowing for fast online adaptation and targeted, “expert-like” exploration. We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks, illustrating its ability to learn adaptive, exploratory behavior.
Article 1039
Title@2025-07-11 (5): BrainLesion Suite: A Flexible and User-Friendly Framework for Modular Brain Lesion Image Analysis
Title: BrainLesion Suite: A Flexible and User-Friendly Framework for Modular Brain Lesion Image Analysis | BrainLesion Suite: Ein flexibles und benutzerfreundliches Framework für die modulare Gehirn-Lesions-Bildanalyse | 脑悬浮套件:模块脑悬浮图像分析灵活和用户友好框架 2507.09036v1 |
Authors (29): Florian Kofler, Marcel Rosier, Mehdi Astaraki, Hendrik Möller, Ilhem Isra Mekki, Josef A. Buchner, Anton Schmick, Arianna Pfiffer, Eva Oswald, Lucas Zimmer, Ezequiel de la Rosa, Sarthak Pati, Julian Canisius, Arianna Piffer, Ujjwal Baid, Mahyar Valizadeh, Akis Linardos, Jan C. Peeken, Surprosanna Shit, Felix Steinbauer, Daniel Rueckert, Rolf Heckemann, Spyridon Bakas, Jan Kirschke, Constantin von See, Ivan Ezhov, Marie Piraud, Benedikt Wiestler, Bjoern Menze
BrainLesion Suite is a versatile toolkit for building modular brain lesion image analysis pipelines in Python. Following Pythonic principles, BrainLesion Suite is designed to provide a ‘brainless’ development experience, minimizing cognitive effort and streamlining the creation of complex workflows for clinical and scientific practice. At its core is an adaptable preprocessing module that performs co-registration, atlas registration, and optional skull-stripping and defacing on arbitrary multi-modal input images. BrainLesion Suite leverages algorithms from the BraTS challenge to synthesize missing modalities, inpaint lesions, and generate pathology-specific tumor segmentations. BrainLesion Suite also enables quantifying segmentation model performance, with tools such as panoptica to compute lesion-wise metrics. Although BrainLesion Suite was originally developed for image analysis pipelines of brain lesions such as glioma, metastasis, and multiple sclerosis, it can be adapted for other biomedical image analysis applications. The individual BrainLesion Suite packages and tutorials are accessible on GitHub.
Article 1040
Title@2025-07-11 (5): Confounder-Free Continual Learning via Recursive Feature Normalization
Title: Confounder-Free Continual Learning via Recursive Feature Normalization | Confounder-Free Continual Learning via Rekursive Feature Normalisierung | 通过递归性地貌正常化实现连续学习 2507.09031v1 |
Authors (6): Yash Shah, Camila Gonzalez, Mohammad H. Abbasi, Qingyu Zhao, Kilian M. Pohl, Ehsan Adeli
Confounders are extraneous variables that affect both the input and the target, resulting in spurious correlations and biased predictions. There are recent advances in dealing with or removing confounders in traditional models, such as metadata normalization (MDN), where the distribution of the learned features is adjusted based on the study confounders. However, in the context of continual learning, where a model learns continuously from new data over time without forgetting, learning feature representations that are invariant to confounders remains a significant challenge. To remove their influence from intermediate feature representations, we introduce the Recursive MDN (R-MDN) layer, which can be integrated into any deep learning architecture, including vision transformers, and at any model stage. R-MDN performs statistical regression via the recursive least squares algorithm to maintain and continually update an internal model state with respect to changing distributions of data and confounding variables. Our experiments demonstrate that R-MDN promotes equitable predictions across population groups, both within static learning and across different stages of continual learning, by reducing catastrophic forgetting caused by confounder effects changing over time.
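The recursive least squares (RLS) update that the abstract mentions can be sketched in plain Python. This is the textbook RLS recursion applied to a toy problem: regress one feature on a confounder and keep the residual. It is not the R-MDN layer itself. The data, the intercept column, and the initialization constant are assumptions for illustration.

```python
def rls_update(w, P, x, y, forgetting=1.0):
    """One textbook RLS step for the linear model y ~ w . x.
    P is the running inverse-covariance estimate and is kept symmetric."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = y - sum(w[i] * x[i] for i in range(n))  # prediction error before the update
    w_new = [w[i] + gain[i] * error for i in range(n)]
    # Since P is symmetric, x^T P equals (P x)^T, so Px can be reused here.
    P_new = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)] for i in range(n)]
    return w_new, P_new, error

# Toy stream: the feature depends linearly on the confounder c (feature = 2c + 1).
w = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P means a weak prior on w
for c in [0.0, 1.0, 2.0, 3.0]:
    w, P, _ = rls_update(w, P, x=[c, 1.0], y=2.0 * c + 1.0)

# Residual for a new sample: what remains once the confounder's effect is removed.
residual_new = 9.0 - (w[0] * 4.0 + w[1] * 1.0)
print(w, residual_new)
```

Each update costs O(n²) and does not revisit past data, which is why a recursion of this kind suits a continually updated model state.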
Article 1041
Title@2025-07-11 (5): Model Parallelism With Subnetwork Data Parallelism
Title: Model Parallelism With Subnetwork Data Parallelism | Modell-Parallelität mit Subnetzwerk-Daten-Parallelität | 与亚网络数据平行的模型平行主义 2507.09029v1 |
Authors (4): Vaibhav Singh, Zafir Khalid, Edouard Oyallon, Eugene Belilovsky
Distributed pre-training of large models at scale often imposes heavy memory demands on individual nodes and incurs significant intra-node communication costs. We propose a novel alternative approach that reduces the memory requirements by training small, structured subnetworks of the model on separate workers. Unlike pipelining, our method avoids inter-node activation communication and maintains bandwidth requirements that are comparable to or lower than standard data parallel communication schemes based on all-reduce. We evaluate two subnetwork construction strategies guided by the principle of ensuring uniform representation of each parameter across the distributed training setup. Our results show that the stochastic block dropping technique consistently outperforms the width-wise subnetwork construction previously explored in federated learning. We empirically attribute this superior performance to stronger gradient alignment in subnetworks that retain blocks having skip connections. Preliminary experiments highlight the promise of our approach, achieving a 20-40% reduction in memory usage without any loss in performance.
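The principle of uniform parameter representation can be illustrated with a toy block-to-worker assignment in which every block is retained by the same number of workers. This is a hypothetical sketch: the function name, the sampling scheme, and the sizes are assumptions. The abstract does not describe the paper's actual construction.

```python
import random

def assign_blocks(num_blocks, num_workers, keep_per_block, seed=0):
    """Randomly choose, for each block, the workers that retain it.
    Every block is kept by exactly `keep_per_block` distinct workers;
    every other worker drops it (for example, by using the skip connection)."""
    rng = random.Random(seed)
    kept = {worker: [] for worker in range(num_workers)}
    for block in range(num_blocks):
        for worker in rng.sample(range(num_workers), keep_per_block):
            kept[worker].append(block)
    return kept

kept = assign_blocks(num_blocks=12, num_workers=4, keep_per_block=3)
# How many workers hold each block: identical for all blocks by construction.
holders = [sum(block in blocks for blocks in kept.values()) for block in range(12)]
print(kept, holders)
```

With 3 of 4 workers holding each block, a worker stores about three quarters of the blocks on average. That is the source of the memory saving in this toy setting.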
Article 1042
Title@2025-07-11 (5): On the Gradient Domination of the LQG Problem
Title: On the Gradient Domination of the LQG Problem | Zur Gradienten-Domination des LQG-Problems | LQG 问题的渐变多变 2507.09026v1 |
Authors (3): Kasra Fallah, Leonardo F. Toso, James Anderson
We consider solutions to the linear quadratic Gaussian (LQG) regulator problem via policy gradient (PG) methods. Although PG methods have demonstrated strong theoretical guarantees in solving the linear quadratic regulator (LQR) problem, despite its nonconvex landscape, their theoretical understanding in the LQG setting remains limited. Notably, the LQG problem lacks gradient dominance in the classical parameterization, i.e., with a dynamic controller, which hinders global convergence guarantees. In this work, we study PG for the LQG problem by adopting an alternative parameterization of the set of stabilizing controllers and employing a lifting argument. We refer to this parameterization as a history representation of the control input as it is parameterized by past input and output data from the previous p time-steps. This representation enables us to establish gradient dominance and approximate smoothness for the LQG cost. We prove global convergence and per-iteration stability guarantees for policy gradient LQG in model-based and model-free settings. Numerical experiments on an open-loop unstable system are provided to support the global convergence guarantees and to illustrate convergence under different history lengths of the history representation.
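In symbols, the two central notions can be sketched as follows. The notation is assumed for illustration: y_t and u_t are the output and input at time t, K is the policy parameter, J is the LQG cost, and μ > 0 is a constant. The paper's exact definitions and indexing conventions may differ.

```latex
% History representation: the control input is a linear function of the last p outputs and inputs
u_t = K z_t, \qquad
z_t = \begin{bmatrix} y_{t-1}^\top & \cdots & y_{t-p}^\top & u_{t-1}^\top & \cdots & u_{t-p}^\top \end{bmatrix}^\top

% Gradient dominance: the suboptimality gap is bounded by the squared gradient norm
J(K) - J(K^\star) \;\le\; \mu \,\bigl\lVert \nabla J(K) \bigr\rVert_F^2
```

Under a condition of this kind, every stationary point is globally optimal. Together with a smoothness property, that is what makes global convergence of gradient descent provable despite nonconvexity.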
Article 1043
Title@2025-07-11 (5): Lizard: An Efficient Linearization Framework for Large Language Models
Title: Lizard: An Efficient Linearization Framework for Large Language Models | Lizard: Ein effizienter Linearisierungsrahmen für große Sprachmodelle | Lizard:大型语言模型的高效线性框架 2507.09025v1 |
Authors (12): Chien Van Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen
We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into flexible, subquadratic architectures for infinite-context generation. Transformer-based LLMs face significant memory and computational bottlenecks as context lengths increase, due to the quadratic complexity of softmax attention and the growing key-value (KV) cache. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving the output quality. Unlike previous linearization methods, which are often limited by fixed model structures and therefore exclude gating mechanisms, Lizard incorporates a gating module inspired by recent state-of-the-art linear models. This enables adaptive memory control, supports constant-memory inference, offers strong length generalization, and allows more flexible model design. Lizard combines gated linear attention for global context compression with sliding window attention enhanced by meta memory, forming a hybrid mechanism that captures both long-range dependencies and fine-grained local interactions. Moreover, we introduce a hardware-aware algorithm that accelerates the training of our models. Extensive experiments show that Lizard achieves near-lossless recovery of the teacher model’s performance across standard language modeling tasks, while significantly outperforming previous linearization methods. On the 5-shot MMLU benchmark, Lizard improves over prior models by 18 points and shows significant improvements on associative recall tasks.
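The constant-memory property of gated linear attention can be seen in a minimal recurrent sketch. It implements the generic recurrence S_t = g_t·S_{t-1} + k_t·v_tᵀ with output o_t = q_tᵀ·S_t, using a scalar gate per token. Lizard's actual gating module, feature maps, sliding-window branch, and meta memory are not reproduced here.

```python
def gated_linear_attention(queries, keys, values, gates):
    """Generic gated linear attention as a recurrence over a fixed-size state.
    The state has shape d_k x d_v regardless of sequence length."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        for i in range(d_k):
            for j in range(d_v):
                # Decay the old memory by the gate, then write the new key-value pair.
                state[i][j] = g * state[i][j] + k[i] * v[j]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs, state

queries = [[1.0, 0.0], [1.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 1.0]]
values = [[3.0, 4.0], [1.0, 1.0]]

out_forget, _ = gated_linear_attention(queries, keys, values, gates=[0.0, 0.0])
out_keep, state = gated_linear_attention(queries, keys, values, gates=[1.0, 1.0])
print(out_forget[1], out_keep[1])
```

With the gate at 0 the second token sees only its own key-value pair. With the gate at 1 it also reads the first token's contribution. In both cases the state stays 2×2, so memory does not grow with context length.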
Article 1044
Title@2025-07-11 (5): Adaptive Non-local Observable on Quantum Neural Networks
Title: Adaptive Non-local Observable on Quantum Neural Networks | Adaptive nicht-lokale Beobachtung auf Quantum-Neural-Netzwerken | 在量子神经网络上可观测的非当地可观测 2504.13414v3 |
Authors (4): Hsin-Yi Lin, Huan-Hsin Tseng, Samuel Yen-Chi Chen, Shinjae Yoo
Conventional Variational Quantum Circuits (VQCs) for Quantum Machine Learning typically rely on a fixed Hermitian observable, often built from Pauli operators. Inspired by the Heisenberg picture, we propose an adaptive non-local measurement framework that substantially increases the model complexity of the quantum circuits. Our introduction of dynamical Hermitian observables with evolving parameters shows that optimizing VQC rotations corresponds to tracing a trajectory in the observable space. This viewpoint reveals that standard VQCs are merely a special case of the Heisenberg representation. Furthermore, we show that properly incorporating variational rotations with non-local observables enhances qubit interaction and information mixture, admitting flexible circuit designs. Two non-local measurement schemes are introduced, and numerical simulations on classification tasks confirm that our approach outperforms conventional VQCs, yielding a more powerful and resource-efficient approach as a Quantum Neural Network.
Article 1045
Title@2025-07-11 (5): On Evaluating Performance of LLM Inference Serving Systems
Title: On Evaluating Performance of LLM Inference Serving Systems | Zur Bewertung der Leistung von LLM-Inferenz-Serviersystemen | 评价LLLM LM 推断服务系统的性能 2507.09019v1 |
Authors (8): Amey Agrawal, Nitin Kedia, Anmol Agarwal, Jayashree Mohan, Nipun Kwatra, Souvik Kundu, Ramachandran Ramjee, Alexey Tumanov
The rapid evolution of Large Language Model (LLM) inference systems has yielded significant efficiency improvements. However, our systematic analysis reveals that current evaluation methodologies frequently exhibit fundamental flaws, often manifesting as common evaluation anti-patterns that obscure true performance characteristics and impede scientific progress. Through a comprehensive examination of recent systems, we identify recurring anti-patterns across three key dimensions: Baseline Fairness, Evaluation Setup, and Metric Design. These anti-patterns are uniquely problematic for LLM inference due to its dual-phase nature combining distinct prefill and decode operations, its handling of highly heterogeneous workloads, and its strict temporal requirements for interactive use. We demonstrate how common anti-patterns – such as inadequate baseline comparisons that conflate engineering effort with algorithmic novelty, workload selections that fail to represent production scenarios, and metric normalizations that hide substantial performance variability like generation stalls – lead to misleading conclusions. To address these challenges, we provide a comprehensive checklist derived from our analysis, establishing a framework for recognizing and avoiding these anti-patterns in favor of robust LLM inference evaluation. To demonstrate the practical application of our framework, we present a case study analyzing speculative decoding, a technique whose bursty, non-uniform token generation is easily misinterpreted when evaluated using approaches characteristic of these anti-patterns. Our work establishes a rigorous foundation for evaluation methodology, enabling meaningful comparisons, ensuring reproducible results, and ultimately accelerating genuine progress in LLM inference systems by moving beyond common anti-patterns to align evaluation with real-world requirements.
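A small numerical example shows how a normalized metric can hide a generation stall. The two token-timestamp traces are synthetic and the metric names are illustrative. They are not taken from the paper's checklist.

```python
def inter_token_gaps(timestamps):
    """Time between consecutive output tokens, in seconds."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def mean_time_per_output_token(timestamps):
    """Normalized metric: total decode time divided by the number of gaps."""
    return (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)

def timestamps_from_gaps(gaps):
    """Build cumulative token timestamps starting at t = 0."""
    times = [0.0]
    for gap in gaps:
        times.append(times[-1] + gap)
    return times

# Synthetic traces with 100 gaps each (assumed values for illustration).
steady = timestamps_from_gaps([0.05] * 100)                         # uniform 50 ms per token
stalled = timestamps_from_gaps([0.03] * 50 + [2.0] + [0.03] * 49)   # fast, with one 2 s stall

mean_steady = mean_time_per_output_token(steady)
mean_stalled = mean_time_per_output_token(stalled)
worst_steady = max(inter_token_gaps(steady))
worst_stalled = max(inter_token_gaps(stalled))
print(mean_steady, mean_stalled, worst_steady, worst_stalled)
```

Both traces average about 50 ms per token, yet one of them freezes for two seconds mid-response. A tail statistic over the individual gaps exposes the stall, while the normalized mean does not.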
Article 1046
Title@2025-07-11 (5): Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Title: Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | Ausschüttung von Kompetenzen aus nicht gekennzeichneten vorherigen Daten für effiziente Online-Exploration | 从未贴标签的先前数据中利用技能以进行有效的在线探索 2410.18076v4 |
Authors (4): Max Wilcoxson, Qiyang Li, Kevin Frans, Sergey Levine
Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies. While prior data can be used to pretrain a set of low-level skills, or as additional off-policy data for online RL, it has been unclear how to combine these ideas effectively for online exploration. Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits. Our method first extracts low-level skills using a variational autoencoder (VAE), and then pseudo-labels unlabeled trajectories with optimistic rewards and high-level action labels, transforming prior data into high-level, task-relevant examples that encourage novelty-seeking behavior. Finally, SUPE uses these transformed examples as additional off-policy data for online RL to learn a high-level policy that composes pretrained low-level skills to explore efficiently. In our experiments, SUPE consistently outperforms prior strategies across a suite of 42 long-horizon, sparse-reward tasks. Code: https://github.com/rail-berkeley/supe.
Article 1047
Title@2025-07-11 (5): Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography
Title: Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography | Multimodales kardiovaskuläres Risiko Profilieren mittels selbstüberwachtem Lernen der Polysomnographie | 利用对多光谱学进行自我监督学习的多式多模式心血管风险分析 2507.09009v1 |
Authors (7): Zhengxiao He, Huayu Li, Geng Yuan, William D. S. Killgore, Stuart F. Quan, Chen X. Chen, Ao Li
Methods: We developed a self-supervised deep learning model that extracts meaningful patterns from multi-modal signals (Electroencephalography (EEG), Electrocardiography (ECG), and respiratory signals). The model was trained on data from 4,398 participants. Projection scores were derived by contrasting embeddings from individuals with and without CVD outcomes. External validation was conducted in an independent cohort with 1,093 participants. The source code is available at https://github.com/miraclehetech/sleep-ssl. Results: The projection scores revealed distinct and clinically meaningful patterns across modalities. ECG-derived features were predictive of both prevalent and incident cardiac conditions, particularly CVD mortality. EEG-derived features were predictive of incident hypertension and CVD mortality. Respiratory signals added complementary predictive value. Combining these projection scores with the Framingham Risk Score consistently improved predictive performance, achieving area under the curve values ranging from 0.607 to 0.965 across different outcomes. Findings were robustly replicated and validated in the external testing cohort. Conclusion: Our findings demonstrate that the proposed framework can generate individualized CVD risk scores directly from PSG data. The resulting projection scores have the potential to be integrated into clinical practice, enhancing risk assessment and supporting personalized care.
Article 1048
Title@2025-07-11 (5): Surprisingly High Redundancy in Electronic Structure Data
Title: Surprisingly High Redundancy in Electronic Structure Data | Überraschend hohe Redundanz in elektronischen Strukturdaten | 电子结构数据冗余率之高令人惊讶 2507.09001v1 |
Authors (7): Sazzad Hossain, Ponkrshnan Thiagarajan, Shashank Pathrudkar, Stephanie Taylor, Abhijeet S. Gangan, Amartya S. Banerjee, Susanta Ghosh
Machine Learning (ML) models for electronic structure rely on large datasets generated through expensive Kohn-Sham Density Functional Theory simulations. This study reveals a surprisingly high level of redundancy in such datasets across various material systems, including molecules, simple metals, and complex alloys. Our findings challenge the prevailing assumption that large, exhaustive datasets are necessary for accurate ML predictions of electronic structure. We demonstrate that even random pruning can substantially reduce dataset size with minimal loss in predictive accuracy, while a state-of-the-art coverage-based pruning strategy retains chemical accuracy and model generalizability using up to 100-fold less data and reducing training time by threefold or more. By contrast, widely used importance-based pruning methods, which eliminate seemingly redundant data, can catastrophically fail at higher pruning factors, possibly due to the significant reduction in data coverage. This heretofore unexplored high degree of redundancy in electronic structure data holds the potential to identify a minimal, essential dataset representative of each material class.
Article 1049
Title@2025-07-11 (5): Fixed-Confidence Multiple Change Point Identification under Bandit Feedback
Title: Fixed-Confidence Multiple Change Point Identification under Bandit Feedback | Fixed-Confidence Multiple Change Point Identification unter Bandit Feedback | 土匪反馈下的多变点识别 2507.08994v1 |
Authors (2): Joseph Lazzaro, Ciara Pike-Burke
Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as quickly as possible. For this, we introduce a fixed-confidence piecewise constant bandit problem. Here, we sequentially query points in the domain and receive noisy evaluations of the function under bandit feedback. We provide instance-dependent lower bounds for the complexity of change point identification in this problem. These lower bounds illustrate that an optimal method should focus its sampling efforts adjacent to each of the change points, and the number of samples around each change point should be inversely proportional to the magnitude of the change. Building on this, we devise a simple and computationally efficient variant of Track-and-Stop and prove that it is asymptotically optimal in many regimes. We support our theoretical findings with experimental results in synthetic environments demonstrating the efficiency of our method.
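The allocation rule implied by the lower bounds can be illustrated with a short calculation. The sketch takes the proportionality statement in the abstract literally and splits a sampling budget across change points with weights 1/|Δ|. It assumes the change magnitudes are known, which the actual bandit algorithm cannot assume. The numbers are made up.

```python
def allocate_samples(change_magnitudes, budget):
    """Split a sampling budget across change points with weights inversely
    proportional to each change's magnitude (oracle allocation, for illustration)."""
    weights = [1.0 / abs(m) for m in change_magnitudes]
    total = sum(weights)
    return [budget * w / total for w in weights]

# Hypothetical change magnitudes at three change points.
allocation = allocate_samples([0.5, 1.0, 2.0], budget=700)
print(allocation)
```

The smallest jump receives the largest share of the budget, since subtle changes are the hardest to tell apart from noise.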
Article 1050
Title@2025-07-11 (5): Physics-Based Machine Learning Closures and Wall Models for Hypersonic Transition-Continuum Boundary Layer Predictions
Title: Physics-Based Machine Learning Closures and Wall Models for Hypersonic Transition-Continuum Boundary Layer Predictions | Physikbasiertes maschinelles Lernen von Schließungen und Wandmodellen für Hypersonic Transition-Continuum Boundary Layer Vorhersagen | 基于物理的机器学习封闭和超音速过渡-连续边界层预测墙模型 2507.08986v1 |
Authors (5): Ashish S. Nair, Narendra Singh, Marco Panesi, Justin Sirignano, Jonathan F. MacArt
Modeling rarefied hypersonic flows remains a fundamental challenge due to the breakdown of classical continuum assumptions in the transition-continuum regime, where the Knudsen number ranges from approximately 0.1 to 10. Conventional Navier-Stokes-Fourier (NSF) models with empirical slip-wall boundary conditions fail to accurately predict nonequilibrium effects such as velocity slip, temperature jump, and shock structure deviations. We develop a physics-constrained machine learning framework that augments transport models and boundary conditions to extend the applicability of continuum solvers in nonequilibrium hypersonic regimes. We employ deep learning PDE models (DPMs) for the viscous stress and heat flux embedded in the governing PDEs and trained via adjoint-based optimization. We evaluate these for two-dimensional supersonic flat-plate flows across a range of Mach and Knudsen numbers. Additionally, we introduce a wall model based on a mixture of skewed Gaussian approximations of the particle velocity distribution function. This wall model replaces empirical slip conditions with physically informed, data-driven boundary conditions for the streamwise velocity and wall temperature. Our results show that a trace-free anisotropic viscosity model, paired with the skewed-Gaussian distribution function wall model, achieves significantly improved accuracy, particularly in high-Mach and high-Knudsen number regimes. Strategies such as parallel training across multiple Knudsen numbers and inclusion of high-Mach data during training are shown to enhance model generalization. Increasing model complexity yields diminishing returns for out-of-sample cases, underscoring the need to balance degrees of freedom and overfitting. This work establishes data-driven, physics-consistent strategies for improving hypersonic flow modeling for regimes in which conventional continuum approaches are invalid.
Article 1051
Title@2025-07-11 (5): Exploiting Leaderboards for Large-Scale Distribution of Malicious Models
Title: Exploiting Leaderboards for Large-Scale Distribution of Malicious Models | Ausnutzung von Leaderboards für die großräumige Verbreitung von bösartigen Modellen | 利用恶意模式大规模分布模式主导板 2507.08983v1 |
Authors (6): Anshuman Suri, Harsh Chaudhari, Yuefeng Peng, Ali Naseh, Amir Houmansadr, Alina Oprea
While poisoning attacks on machine learning models have been extensively studied, the mechanisms by which adversaries can distribute poisoned models at scale remain largely unexplored. In this paper, we shed light on how model leaderboards – ranked platforms for model discovery and evaluation – can serve as a powerful channel for adversaries for stealthy large-scale distribution of poisoned models. We present TrojanClimb, a general framework that enables injection of malicious behaviors while maintaining competitive leaderboard performance. We demonstrate its effectiveness across four diverse modalities: text-embedding, text-generation, text-to-speech and text-to-image, showing that adversaries can successfully achieve high leaderboard rankings while embedding arbitrary harmful functionalities, from backdoors to bias injection. Our findings reveal a significant vulnerability in the machine learning ecosystem, highlighting the urgent need to redesign leaderboard evaluation mechanisms to detect and filter malicious (e.g., poisoned) models, while exposing broader security implications for the machine learning community regarding the risks of adopting models from unverified sources.
Article 1052
Title@2025-07-11 (5): VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models
Title: VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models | VIP: Visueller Informationsschutz durch feindliche Angriffe auf Vision-Sprachen-Modelle | 要人:通过对视觉语言模型的对立攻击保护视觉信息 2507.08982v1 |
Authors (4): Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Déforges
Recent years have witnessed remarkable progress in developing Vision-Language Models (VLMs) capable of processing both textual and visual inputs. These models have demonstrated impressive performance, leading to their widespread adoption in various applications. However, this widespread adoption raises serious concerns regarding user privacy, particularly when models inadvertently process or expose private visual information. In this work, we frame the preservation of privacy in VLMs as an adversarial attack problem. We propose a novel attack strategy that selectively conceals information within designated Regions of Interest (ROIs) in an image, effectively preventing VLMs from accessing sensitive content while preserving the semantic integrity of the remaining image. Unlike conventional adversarial attacks that often disrupt the entire image, our method maintains high coherence in unmasked areas. Experimental results across three state-of-the-art VLMs, namely LLaVA, Instruct-BLIP, and BLIP2-T5, demonstrate up to a 98% reduction in detecting targeted ROIs while keeping global image semantics intact, as confirmed by high similarity scores between clean and adversarial outputs. We believe that this work contributes to a more privacy-conscious use of multimodal models and offers a practical tool for further research, with the source code publicly available at: https://github.com/hbrachemi/Vlm_defense-attack.
Article 1053
Title@2025-07-11 (5): Learning Diffusion Models with Flexible Representation Guidance
Title: Learning Diffusion Models with Flexible Representation Guidance | Diffusionsmodelle mit flexibler Darstellungsführung lernen | 具有灵活代表制指导的学习传播模式 2507.08980v1 |
Authors (7): Chenyu Wang, Cai Zhou, Sharut Gupta, Zongyu Lin, Stefanie Jegelka, Stephen Bates, Tommi Jaakkola
Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of the diffusion model with those of pre-trained models improves generation quality. In this paper, we present a systematic framework for incorporating representation guidance into diffusion models. We provide alternative decompositions of denoising models along with their associated training criteria, where the decompositions determine when and how the auxiliary representations are incorporated. Guided by our theoretical insights, we introduce two new strategies for enhancing representation alignment in diffusion models. First, we pair examples with target representations either derived from themselves or arising from different synthetic modalities, and subsequently learn a joint model over the multimodal pairs. Second, we design an optimal training curriculum that balances representation learning and data generation. Our experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training. In particular, on the class-conditional ImageNet $256\times 256$ benchmark, our guidance results in $23.3$ times faster training than the original SiT-XL as well as a fourfold speedup over the state-of-the-art method REPA. The code is available at https://github.com/ChenyuWang-Monica/REED.
Article 1054
Title@2025-07-11 (5): PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection
Title: PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection | PRISM: Reduzieren von sauberen Impliziten in Vision-Sprachenmodellen mit LLM-geführter Einbettung | PRISM: 利用LLM-引导嵌入式预测减少视觉-语言模型中的纯净隐含比喻 2507.08979v1 |
Authors (5): Mahdiyar Molahasani, Azadeh Motamedi, Michael Greenspan, Il-Min Kim, Ali Etemad
We introduce Projection-based Reduction of Implicit Spurious bias in vision-language Models (PRISM), a new data-free and task-agnostic solution for bias mitigation in VLMs like CLIP. VLMs often inherit and amplify biases in their training data, leading to skewed predictions. PRISM is designed to debias VLMs without relying on predefined bias categories or additional external data. It operates in two stages: first, an LLM is prompted with simple class prompts to generate scene descriptions that contain spurious correlations. Next, PRISM uses our novel contrastive-style debiasing loss to learn a projection that maps the embeddings onto a latent space that minimizes spurious correlations while preserving the alignment between image and text embeddings. Extensive experiments demonstrate that PRISM outperforms current debiasing methods on the commonly used Waterbirds and CelebA datasets. We make our code public at: https://github.com/MahdiyarMM/PRISM.
Article 1055
Title@2025-07-11 (5): Exploration Behavior of Untrained Policies
Title: Exploration Behavior of Untrained Policies | Explorationsverhalten ungeübter Politiken | 未经过培训的政策的探索行为 2506.22566v2 |
Authors (1): Jacob Adamczyk
Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration behavior in early training.
Article 1056
Title@2025-07-11 (5): Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
Title: Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery | Simulation als Aufsicht: Mechanistische Vorausbildung für wissenschaftliche Entdeckung | 模拟监督:科学发现机械预科训练 2507.08977v1 |
Authors (4): Carson Dudley, Reiden Magdaleno, Christopher Harding, Marisa Eisenberg
Scientific modeling faces a core limitation: mechanistic models offer interpretability but collapse under real-world complexity, while machine learning models are flexible but require large labeled datasets, cannot infer unobservable quantities, and operate as black boxes. We introduce Simulation-Grounded Neural Networks (SGNNs), a general framework that uses mechanistic simulations as training data for neural networks. SGNNs are pretrained on synthetic corpora spanning diverse model structures, parameter regimes, stochasticity, and observational artifacts. We evaluated SGNNs across scientific disciplines and modeling tasks, and found that SGNNs achieved state-of-the-art results across settings: for prediction tasks, they nearly tripled COVID-19 forecasting skill versus CDC baselines, reduced chemical yield prediction error by one third, and maintained accuracy in ecological forecasting where task-specific models failed. For inference tasks, SGNNs also accurately classified the source of information spread in simulated social networks and enabled supervised learning for unobservable targets, such as estimating COVID-19 transmissibility more accurately than traditional methods even in early outbreaks. Finally, SGNNs enable back-to-simulation attribution, a new form of mechanistic interpretability. Given real-world input, SGNNs retrieve simulations based on what the model has learned to see as most similar, revealing which underlying dynamics the model believes are active. This provides process-level insight – what the model thinks is happening – not just which features mattered. SGNNs unify scientific theory with deep learning flexibility and unlock a new modeling paradigm – transforming simulations from rigid, post hoc tools into flexible sources of supervision, enabling robust, interpretable inference even when ground truth is missing.
Article 1057
Title@2025-07-11 (5): Simulating Three-dimensional Turbulence with Physics-informed Neural Networks
Title: Simulating Three-dimensional Turbulence with Physics-informed Neural Networks | Simulation von dreidimensionalen Turbulenzen mit physikinformierten Neuronalen Netzwerken | 用物理知情神经网络模拟三维振动 2507.08972v1 |
Authors (4): Sifan Wang, Shyam Sankaran, Panos Stinis, Paris Perdikaris
Turbulent fluid flows are among the most computationally demanding problems in science, requiring enormous computational resources that become prohibitive at high flow speeds. Physics-informed neural networks (PINNs) represent a radically different approach that trains neural networks directly from physical equations rather than data, offering the potential for continuous, mesh-free solutions. Here we show that appropriately designed PINNs can successfully simulate fully turbulent flows in both two and three dimensions, directly learning solutions to the fundamental fluid equations without traditional computational grids or training data. Our approach combines several algorithmic innovations including adaptive network architectures, causal training, and advanced optimization methods to overcome the inherent challenges of learning chaotic dynamics. Through rigorous validation on challenging turbulence problems, we demonstrate that PINNs accurately reproduce key flow statistics including energy spectra, kinetic energy, enstrophy, and Reynolds stresses. Our results demonstrate that neural equation solvers can handle complex chaotic systems, opening new possibilities for continuous turbulence modeling that transcends traditional computational limitations.
Article 1058
Title@2025-07-11 (5): ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha
Title: ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha | ToxBench: Ein verbindlicher Affinitätsvorhersage-Benchmark mit AB-FEP-Kalkulierten Etiketten für den menschlichen Östrogenrezeptor Alpha | ToxBonch:与AB-FEP-Calculate的人体雌性激素受体实验室的捆绑性亲同预测基准 2507.08966v1 |
Authors (22): Meng Liu, Karl Leswing, Simon K. S. Chu, Farhad Ramezanghorbani, Griffin Young, Gabriel Marques, Prerna Das, Anjali Panikar, Esther Jamir, Mohammed Sulaiman Shamsudeen, K. Shawn Watts, Ananya Sen, Hari Priya Devannagari, Edward B. Miller, Muyun Lihan, Howook Hwang, Janet Paulsen, Xin Yu, Kyle Gion, Timur Rvachov, Emine Kucukbenli, Saee Gopal Paliwal
Protein-ligand binding affinity prediction is essential for drug discovery and toxicity assessment. While machine learning (ML) promises fast and accurate predictions, its progress is constrained by the availability of reliable data. In contrast, physics-based methods such as absolute binding free energy perturbation (AB-FEP) deliver high accuracy but are computationally prohibitive for high-throughput applications. To bridge this gap, we introduce ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on a single pharmaceutically critical target, Human Estrogen Receptor Alpha (ER$\alpha$). ToxBench contains 8,770 ER$\alpha$-ligand complex structures with binding free energies computed via AB-FEP with a subset validated against experimental affinities at 1.75 kcal/mol RMSE, along with non-overlapping ligand splits to assess model generalizability. Using ToxBench, we further benchmark state-of-the-art ML methods, and notably, our proposed DualBind model, which employs a dual-loss framework to effectively learn the binding energy function. The benchmark results demonstrate the superior performance of DualBind and the potential of ML to approximate AB-FEP at a fraction of the computational cost.
Article 1059
Title@2025-07-11 (5): Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models
Title: Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models | Theorie-informierte Verbesserungen an klassifikatorfreier Anleitung für diskrete Diffusionsmodelle | 对分辨扩散模型的无分类/无分类指南的理论化改进 2507.08965v1 |
Authors (6): Kevin Rojas, Ye He, Chieh-Hsin Lai, Yuta Takida, Yuki Mitsufuji, Molei Tao
Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and recent works have extended it to discrete diffusion. This paper theoretically analyzes CFG in the context of masked discrete diffusion, focusing on the role of guidance schedules. Our analysis shows that high guidance early in sampling (when inputs are heavily masked) harms generation quality, while late-stage guidance has a larger effect. These findings provide a theoretical explanation for empirical observations in recent studies on guidance schedules. The analysis also reveals an imperfection in current CFG implementations. These implementations can unintentionally cause imbalanced transitions, such as unmasking too rapidly during the early stages of generation, which degrades the quality of the resulting samples. To address this, we draw insight from the analysis and propose a novel classifier-free guidance mechanism empirically applicable to any discrete diffusion. Intuitively, our method smooths the transport between the data distribution and the initial (masked/uniform) distribution, which results in improved sample quality. Remarkably, our method is achievable via a simple one-line code change. The efficacy of our method is empirically demonstrated with experiments on ImageNet (masked discrete diffusion) and QM9 (uniform discrete diffusion).
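For orientation, here is a minimal sketch of the standard classifier-free guidance combination at a single masked position, assuming strictly positive categorical distributions over a small vocabulary. The function names, the weight convention (w = 1 recovers the conditional distribution) and the linear ramp schedule are illustrative assumptions. The sketch does not reproduce the corrected mechanism the abstract describes; it only shows what a guidance weight and a guidance schedule are.

```python
import math

def guided_distribution(p_cond, p_uncond, w):
    """Standard CFG in log space: log p_w = log p_uncond + w * (log p_cond - log p_uncond) + const.

    Assumes every probability is strictly positive. w = 0 gives the
    unconditional distribution, w = 1 the conditional one, and w > 1
    sharpens towards tokens the condition favours.
    """
    logits = [math.log(u) + w * (math.log(c) - math.log(u))
              for c, u in zip(p_cond, p_uncond)]
    top = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [x / total for x in weights]

def ramp_schedule(step, num_steps, w_max):
    """Hypothetical schedule: weak guidance early (w = 1), strongest at the last step (w_max)."""
    if num_steps <= 1:
        return w_max
    return 1.0 + (w_max - 1.0) * step / (num_steps - 1)

# Illustrative numbers only: a three-token vocabulary at one masked position.
p_cond = [0.7, 0.2, 0.1]
p_uncond = [0.4, 0.3, 0.3]
for step in range(4):
    w = ramp_schedule(step, 4, 3.0)
    print(step, round(w, 3), guided_distribution(p_cond, p_uncond, w))
```

The ramp keeps guidance weak while most positions are still masked and strengthens it later, which is one simple way to act on the abstract's finding that strong early guidance hurts.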
Article 1060
Title@2025-07-11 (5): Stochastic Approximation with Block Coordinate Optimal Stepsizes
Title: Stochastic Approximation with Block Coordinate Optimal Stepsizes | Stochastische Annäherung mit Blockkoordinaten Optimale Stufengrößen | 带有块坐标坐标最佳步进的斯托步相近 2507.08963v1 |
Authors (2): Tao Jiang, Lin Xiao
We consider stochastic approximation with block-coordinate stepsizes and propose adaptive stepsize rules that aim to minimize the expected distance from the next iterate to an optimal point. These stepsize rules employ online estimates of the second moment of the search direction along each block coordinate. The popular Adam algorithm can be interpreted as a particular heuristic for such estimation. By leveraging a simple conditional estimator, we derive a new method that obtains performance comparable to Adam but requires less memory and fewer hyper-parameters. We prove that this family of methods converges almost surely to a small neighborhood of the optimal point, and the radius of the neighborhood depends on the bias and variance of the second-moment estimator. Our analysis relies on a simple aiming condition that assumes neither convexity nor smoothness, and thus has broad applicability.
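To make the idea of a block-coordinate stepsize concrete, here is a minimal sketch that keeps one running second-moment estimate per block and scales the step for that block by its inverse square root. The exponential moving average is the Adam-style heuristic the abstract mentions, not the conditional estimator the paper derives. The names `block_step`, `lr`, `beta` and `eps` are assumptions made for illustration.

```python
import math

def block_step(params, direction, second_moment, blocks, lr=0.1, beta=0.9, eps=1e-8):
    """One update x_i <- x_i - alpha_b * d_i with a single stepsize alpha_b per block.

    blocks is a list of index lists partitioning the coordinates.
    second_moment holds one running estimate per block (an Adam-style
    exponential moving average; this is an illustrative heuristic).
    """
    new_params = list(params)
    new_moment = list(second_moment)
    for b, idx in enumerate(blocks):
        # Mean squared search direction over the block.
        mean_sq = sum(direction[i] ** 2 for i in idx) / len(idx)
        new_moment[b] = beta * second_moment[b] + (1.0 - beta) * mean_sq
        alpha = lr / (math.sqrt(new_moment[b]) + eps)
        for i in idx:
            new_params[i] = params[i] - alpha * direction[i]
    return new_params, new_moment

# Illustrative use on a diagonal quadratic f(x) = 0.5 * sum(c_i * x_i^2).
curvature = [1.0, 1.0, 10.0, 10.0]
x = [1.0, -2.0, 3.0, 0.5]
v = [0.0, 0.0]
blocks = [[0, 1], [2, 3]]
for _ in range(200):
    grad = [c * xi for c, xi in zip(curvature, x)]
    x, v = block_step(x, grad, v, blocks)
print(x)
```

With singleton blocks this stores one estimate per coordinate, as Adam does; coarser blocks store fewer numbers, which is where a memory saving of the kind the abstract mentions can come from.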
Article 1061
Title@2025-07-11 (5): From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
Title: From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis | Vom Video zum EEG: Anpassung der gemeinsamen Einbettung von vorausschauender Architektur an die Entdeckung visueller Konzepte in der Gehirnsignalanalyse | 从视频到EEG:使联合嵌入的预测结构适应脑信号分析中的不可见视觉概念 2507.03633v4 |
Authors (6): Amirabbas Hojjati, Lu Li, Ibrahim Hameed, Anis Yazidi, Pedro G. Lind, Rabindra Khadka
EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To this end, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy. Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.
Article 1062
Title@2025-07-11 (5): Individual Causal Inference with Structural Causal Model
Title: Individual Causal Inference with Structural Causal Model | Individueller Kausalzusammenhang mit strukturellem Kausalmodell | 与结构因果模型的个体因果推断 2506.17300v2 |
Authors (1): Daniel T. Chang
Individual causal inference (ICI) uses causal inference methods to understand and predict the effects of interventions on individuals, considering their specific characteristics / facts. It aims to estimate individual causal effect (ICE), which varies across individuals. Estimating ICE can be challenging due to the limited data available for individuals, and the fact that most causal inference methods are population-based. Structural Causal Model (SCM) is fundamentally population-based. Therefore, causal discovery (structural learning and parameter learning), association queries and intervention queries are all naturally population-based. However, exogenous variables (U) in SCM can encode individual variations and thus provide the mechanism for individualizing the population according to specific individual characteristics / facts. Based on this, we propose ICI with SCM as a “rung 3” causal inference, because it involves “imagining” what would be the causal effect of a hypothetical intervention on an individual, given the individual’s observed characteristics / facts. Specifically, we propose the indiv-operator, indiv(W), to formalize/represent the population individualization process, and the individual causal query, P(Y | indiv(W), do(X), Z), to formalize/represent ICI. We show and argue that ICI with SCM is inference on individual alternatives (possible), not individual counterfactuals (non-actual).
Article 1063
Title@2025-07-11 (5): How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs
Title: How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs | Wie man einen Führer ausbildet: Hierarchische Vernunft in multi-agenten LLMs | 如何培训领导者:多机构LLM中的等级原因 2507.08960v1 |
Authors (4): Andrew Estornell, Jean-Francois Ton, Muhammad Faaiz Taufiq, Hang Li
Large Language Models (LLMs) have achieved strong performance on a wide range of complex reasoning tasks, yet further gains are often possible by leveraging the complementary strengths of multiple models. While multi-agent frameworks can improve solution quality by leveraging multiple LLMs, existing methods are often computationally expensive, both at training and inference time. In this work, we introduce a hierarchical multi-agent framework that addresses these challenges by training only a single leader LLM to coordinate a team of untrained peer agents. To this end, we propose Multi-agent guided Leader Policy Optimization (MLPO), a novel approach which trains the leader to evaluate and synthesize agent responses without auxiliary value networks or explicit agent feedback. Leaders trained with MLPO exhibit improved performance not only when interacting with the agent team at inference time, but also when deployed in single-agent settings without the team. Empirical results on Big-Bench Hard (BBH), MATH, and MMLU demonstrate that our framework achieves substantial performance improvements over both single-agent and multi-agent baselines. Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
Article 1064
Title@2025-07-11 (5): Graph Neural Network Enhanced Sequential Recommendation Method for Cross-Platform Ad Campaign
Title: Graph Neural Network Enhanced Sequential Recommendation Method for Cross-Platform Ad Campaign | Diagramm Neuronales Netzwerk Verbesserte sequentielle Empfehlungsmethode für plattformübergreifende Werbekampagnen | 跨平台运动的神经网络强化序列建议方法 2507.08959v1 |
Authors (3): Xiang Li, Xinyu Wang, Yifan Lin
In order to improve the accuracy of cross-platform advertisement recommendation, a graph neural network (GNN)-based advertisement recommendation method is analyzed. Through multi-dimensional modeling, user behavior data (e.g., click frequency, active duration) reveal temporal patterns of interest evolution, ad content (e.g., type, tag, duration) influences semantic preferences, and platform features (e.g., device type, usage context) shape the environment where interest transitions occur. These factors jointly enable the GNN to capture the latent pathways of user interest migration across platforms. The experimental results are based on datasets from three platforms; Platform B achieves the best performance, with an AUC of 0.937. Platform A and Platform C show a slight decrease in precision and recall, with an uneven distribution of ad labels. By adjusting hyperparameters such as learning rate, batch size and embedding dimension, the adaptability and robustness of the model on heterogeneous data are further improved.
Article 1065
Title@2025-07-11 (5): Beyond Scores: Proximal Diffusion Models
Title: Beyond Scores: Proximal Diffusion Models | Beyond Scores: Proximale Diffusionsmodelle | 超过分数: 快速扩散模型 2507.08956v1 |
Authors (4): Zhenghan Fang, Mateo Díaz, Sam Buchanan, Jeremias Sulam
Diffusion models have quickly become some of the most popular and powerful generative models for high-dimensional data. The key insight that enabled their development was the realization that access to the score – the gradient of the log-density at different noise levels – allows for sampling from data distributions by solving a reverse-time stochastic differential equation (SDE) via forward discretization, and that popular denoisers allow for unbiased estimators of this score. In this paper, we demonstrate that an alternative, backward discretization of these SDEs, using proximal maps in place of the score, leads to theoretical and practical benefits. We leverage recent results in proximal matching to learn proximal operators of the log-density and, with them, develop Proximal Diffusion Models (ProxDM). Theoretically, we prove that $\widetilde{O}(d/\sqrt{\varepsilon})$ steps suffice for the resulting discretization to generate an $\varepsilon$-accurate distribution w.r.t. the KL divergence. Empirically, we show that two variants of ProxDM achieve significantly faster convergence within just a few sampling steps compared to conventional score-matching methods.
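The link between a backward discretization and a proximal map can be seen in a simplified setting. The sketch below drops the drift and noise terms of the reverse-time SDE and keeps only a score term with step size $h$; the symbols $x_k$, $h$ and $f = -\log p$ are notation assumed here, and the identification with the proximal map assumes $f$ is regular enough (for example convex and differentiable) that stationarity characterizes the minimizer. It is an illustration of the general principle, not the paper's discretization.

```latex
\begin{align*}
\text{forward (explicit):}\quad
  x_{k+1} &= x_k + h\,\nabla \log p(x_k), \\
\text{backward (implicit):}\quad
  x_{k+1} &= x_k + h\,\nabla \log p(x_{k+1}) \\
  \iff\quad 0 &= \nabla f(x_{k+1}) + \tfrac{1}{h}\,(x_{k+1} - x_k),
  \qquad f = -\log p, \\
  \iff\quad x_{k+1} &= \operatorname{prox}_{h f}(x_k)
  = \arg\min_{x}\Big\{ f(x) + \tfrac{1}{2h}\,\lVert x - x_k \rVert^2 \Big\}.
\end{align*}
```

The explicit step needs the score at the current point, whereas the implicit step is one evaluation of a proximal operator, which is the object the abstract proposes to learn in place of the score.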
Article 1066
Title@2025-07-11 (5): Spectral Manifold Harmonization for Graph Imbalanced Regression
Title: Spectral Manifold Harmonization for Graph Imbalanced Regression | Spektrale Manifold Harmonisierung für Graph Imbalanced Regression | 图I平衡回归的光谱蒙面协调 2507.01132v2 |
Authors (5): Brenda Nogueira, Gabe Gomes, Meng Jiang, Nitesh V. Chawla, Nuno Moniz
Graph-structured data is ubiquitous in scientific domains, where models often face imbalanced learning settings. In imbalanced regression, domain preferences focus on specific target value ranges that represent the most scientifically valuable cases; however, we observe a significant lack of research regarding this challenge. In this paper, we present Spectral Manifold Harmonization (SMH), a novel approach to address imbalanced regression challenges on graph-structured data by generating synthetic graph samples that preserve topological properties while focusing on the most relevant target distribution regions. Conventional methods fail in this context because they either ignore graph topology in case generation or do not target specific domain ranges, resulting in models biased toward average target values. Experimental results demonstrate the potential of SMH on chemistry and drug discovery benchmark datasets, showing consistent improvements in predictive performance for target domain ranges. Code is available at https://github.com/brendacnogueira/smh-graph-imbalance.git.
Article 1067
Title@2025-07-11 (5): Drowning in Documents: Consequences of Scaling Reranker Inference
Title: Drowning in Documents: Consequences of Scaling Reranker Inference | Ertrinken in Dokumenten: Konsequenzen der Skalierungs-Reranker-Schlussfolgerung | 文件中淹没:扩大重新排序者推断的后果 2411.11767v2 |
Authors (6): Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov
Rerankers, typically cross-encoders, are computationally intensive but are frequently used because they are widely assumed to outperform cheaper initial IR systems. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. To provide a more robust evaluation, we prioritize strong first-stage retrieval using modern dense embeddings and test rerankers on a variety of carefully chosen, challenging tasks, including internally curated datasets to avoid contamination, and out-of-domain ones. Our empirical results reveal a surprising trend: the best existing rerankers provide initial improvements when scoring progressively more documents, but their effectiveness gradually declines and can even degrade quality beyond a certain limit. We hope that our findings will spur future research to improve reranking.
Article 1068
Title@2025-07-11 (5): The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Title: The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? | Das nicht-lineare Repräsentations-Dilemma: Reicht die Kausale Abstraktion für die mechanistische Interpretationsfähigkeit? | “非碱性代表:因果抽象是否足以进行机械解释?” 2507.08802v1 |
Authors (4): Denis Sutter, Julian Minder, Thomas Hofmann, Tiago Pimentel
The concept of causal abstraction has recently been popularised as a way to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a higher-level algorithm if there exists a function which allows us to map between them. Notably, most interpretability papers implement these maps as linear functions, motivated by the linear representation hypothesis: the idea that features are encoded linearly in a model’s representations. However, this linearity constraint is not required by the definition of causal abstraction. In this work, we critically examine the concept of causal abstraction by considering arbitrarily powerful alignment maps. In particular, we prove that under reasonable assumptions, any neural network can be mapped to any algorithm, rendering this unrestricted notion of causal abstraction trivial and uninformative. We complement these theoretical findings with empirical evidence, demonstrating that it is possible to perfectly map models to algorithms even when these models are incapable of solving the actual task; e.g., in an experiment using randomly initialised language models, our alignment maps reach 100% interchange-intervention accuracy on the indirect object identification task. This raises the non-linear representation dilemma: if we lift the linearity constraint imposed on alignment maps in causal abstraction analyses, we are left with no principled way to balance the inherent trade-off between these maps’ complexity and accuracy. Together, these results suggest an answer to our title’s question: causal abstraction is not enough for mechanistic interpretability, as it becomes vacuous without assumptions about how models encode information. Studying the connection between this information-encoding assumption and causal abstraction should lead to exciting future work.
Article 1069
Title@2025-07-11 (5): NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Title: NeuralOS: Towards Simulating Operating Systems via Neural Generative Models | NeuralOS: Auf dem Weg zur Simulation von Betriebssystemen über neurale Generative Modelle | NeurorOS:通过神经产生模型努力模拟操作系统 2507.08800v1 |
Authors (5): Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, Yuntian Deng
We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks computer state, with a diffusion-based neural renderer that generates screen images. The model is trained on a large-scale dataset of Ubuntu XFCE recordings, which include both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions like application launches. Although modeling fine-grained keyboard interactions precisely remains challenging, NeuralOS offers a step toward creating fully adaptive, generative neural interfaces for future human-computer interaction systems.
Article 1070
Title@2025-07-11 (5): Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists
Title: Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists | Filter Equivariant Funktionen: Eine symmetrische Darstellung der Längen-allgemeinen Extrapolation auf Listen | 过滤器等同函数 : 列表中长度一般外推法的对称账户 2507.08796v1 |
Authors (6): Owen Lewis, Neil Ghani, Andrew Dudzik, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković
What should a function that extrapolates beyond known input/output examples look like? This is a tricky question to answer in general, as any function matching the outputs on those examples can in principle be a correct extrapolant. We argue that a “good” extrapolant should follow certain kinds of rules, and here we study a particularly appealing criterion for rule-following in list functions: that the function should behave predictably even when certain elements are removed. In functional programming, a standard way to express such removal operations is by using a filter function. Accordingly, our paper introduces a new semantic class of functions – the filter equivariant functions. We show that this class contains interesting examples, prove some basic theorems about it, and relate it to the well-known class of map equivariant functions. We also present a geometric account of filter equivariants, showing how they correspond naturally to certain simplicial structures. Our highlight result is the amalgamation algorithm, which constructs any filter-equivariant function’s output by first studying how it behaves on sublists of the input, in a way that extrapolates perfectly.
Article 1071
Title@2025-07-11 (5): One Token to Fool LLM-as-a-Judge
Title: One Token to Fool LLM-as-a-Judge | Ein Token zum Narren LLM-as-a-Richter | 愚人一拳LLM -A法官 2507.08794v1 |
Authors (6): Yulai Zhao, Haolin Liu, Dian Yu, S. Y. Kung, Haitao Mi, Dong Yu
Generative reward models (also known as LLMs-as-judges), which use large language models (LLMs) to evaluate answer quality, are increasingly adopted in reinforcement learning with verifiable rewards (RLVR). They are often preferred over rigid rule-based metrics, especially for complex reasoning tasks involving free-form outputs. In this paradigm, an LLM is typically prompted to compare a candidate answer against a ground-truth reference and assign a binary reward indicating correctness. Despite the seeming simplicity of this comparison task, we find that generative reward models exhibit surprising vulnerabilities to superficial manipulations: non-word symbols (e.g., “:” or “.”) or reasoning openers like “Thought process:” and “Let’s solve this problem step by step.” can often lead to false positive rewards. We demonstrate that this weakness is widespread across LLMs, datasets, and prompt formats, posing a serious threat to core algorithmic paradigms that rely on generative reward models, such as rejection sampling, preference optimization, and RLVR. To mitigate this issue, we introduce a simple yet effective data augmentation strategy and train a new generative reward model with substantially improved robustness. Our findings highlight the urgent need for more reliable LLM-based evaluation methods. We release our robust, general-domain reward model and its synthetic training data at https://huggingface.co/sarosavo/Master-RM and https://huggingface.co/datasets/sarosavo/Master-RM.
Article 1072
Title@2025-07-11 (5): Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning
Title: Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | Optimistische Exploration für risikoabhängiges verstärktes Lernen | 最佳探索,以进行风险与风险相关的强化学习 2507.08793v1 |
Authors (4): James McCarthy, Radu Marinescu, Elizabeth Daly, Ivana Dusparic
Risk-averse Constrained Reinforcement Learning (RaCRL) aims to learn policies that minimise the likelihood of rare and catastrophic constraint violations caused by an environment’s inherent randomness. In general, risk-aversion leads to conservative exploration of the environment, which typically results in converging to sub-optimal policies that fail to adequately maximise reward or, in some cases, fail to achieve the goal. In this paper, we propose an exploration-based approach for RaCRL called Optimistic Risk-averse Actor Critic (ORAC), which constructs an exploratory policy by maximising a local upper confidence bound of the state-action reward value function whilst minimising a local lower confidence bound of the risk-averse state-action cost value function. Specifically, at each step, the weighting assigned to the cost value is increased if the cost value exceeds the safety constraint value and decreased if it falls below it. This way the policy is encouraged to explore uncertain regions of the environment to discover high reward states whilst still satisfying the safety constraints. Our experimental results demonstrate that the ORAC approach prevents convergence to sub-optimal policies and significantly improves the reward-cost trade-off across various continuous control tasks, including those in Safety-Gymnasium and in CityLearn, a complex building energy management environment.
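The exploratory objective described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the bound form (mean plus or minus `beta` times a standard deviation), the step size, and the clipping of the weight at zero are all assumptions made for the example.

```python
def exploration_score(r_mean, r_std, c_mean, c_std, weight, beta=1.0):
    """Optimistic score: reward upper bound minus weighted cost lower bound."""
    reward_ucb = r_mean + beta * r_std   # upper confidence bound on reward value
    cost_lcb = c_mean - beta * c_std     # lower confidence bound on cost value
    return reward_ucb - weight * cost_lcb


def update_weight(weight, cost_value, cost_limit, step=0.1):
    """Raise the cost weight when cost exceeds the limit, lower it otherwise."""
    # Hypothetical update rule; the weight is kept non-negative.
    return max(0.0, weight + step * (cost_value - cost_limit))


def pick_action(candidates, weight, beta=1.0):
    """candidates maps action -> (r_mean, r_std, c_mean, c_std)."""
    return max(
        candidates,
        key=lambda a: exploration_score(*candidates[a], weight, beta),
    )
```

With a small cost weight the score favours uncertain, high-reward actions; as the weight grows after constraint violations, the same rule shifts toward low-cost actions.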
Article 1073
Title@2025-07-11 (5): MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation
Title: MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation | MH-FSF: Ein einheitliches Framework zur Überwindung von Benchmarking und Reproduzierbarkeitsbeschränkungen in der Feature Selection Evaluation | MH-FSF:在地物选择评价中克服基准设定和可复制限制的统一框架 2507.10591v1 |
Authors (5): Vanderson Rocha, Diego Kreutz, Gabriel Canto, Hendrio Bragança, Eduardo Feitosa
Feature selection is vital for building effective predictive models, as it reduces dimensionality and emphasizes key features. However, current research often suffers from limited benchmarking and reliance on proprietary datasets. This severely hinders reproducibility and can negatively impact overall performance. To address these limitations, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor. By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection.
Article 1074
Title@2025-07-11 (5): Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks
Title: Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks | Lernen-unterstützte Bigraph Matching Ansatz zur Multi-Crew Wiederherstellung beschädigter Stromnetze mit Straßentransport-Netzwerke gekoppelt | 与公路运输网相结合的多组恢复受损电力网的学习辅助活书匹配方法 2506.19703v2 |
Authors (5): Nathan Maurer, Harshal Kaushik, Roshni Anna Jacob, Jie Zhang, Souma Chowdhury
The resilience of critical infrastructure networks (CINs) after disruptions, such as those caused by natural hazards, depends on both the speed of restoration and the extent to which operational functionality can be regained. Allocating resources for restoration is a combinatorial optimal planning problem that involves determining which crews will repair specific network nodes and in what order. This paper presents a novel graph-based formulation that merges two interconnected graphs, representing crew and transportation nodes and power grid nodes, into a single heterogeneous graph. To enable efficient planning, graph reinforcement learning (GRL) is integrated with bigraph matching. GRL is utilized to design the incentive function for assigning crews to repair tasks based on the graph-abstracted state of the environment, ensuring generalization across damage scenarios. Two learning techniques are employed: a graph neural network trained using Proximal Policy Optimization and another trained via Neuroevolution. The learned incentive functions inform a bipartite graph that links crews to repair tasks, enabling weighted maximum matching for crew-to-task allocations. An efficient simulation environment that pre-computes optimal node-to-node path plans is used to train the proposed restoration planning methods. An IEEE 8500-bus power distribution test network coupled with a 21 square km transportation network is used as the case study, with scenarios varying in terms of numbers of damaged nodes, depots, and crews. Results demonstrate the approach’s generalizability and scalability across scenarios, with learned policies providing 3-fold better performance than random policies, while also outperforming optimization-based solutions in both computation time (by several orders of magnitude) and power restored.
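The final allocation step, weighted maximum matching between crews and repair tasks, can be illustrated with a small brute-force sketch. The incentive values here are made-up numbers standing in for the learned incentive function, and the exhaustive search is only an assumption for readability; a practical system would use a polynomial-time matching algorithm instead.

```python
from itertools import permutations


def best_assignment(incentive):
    """Return the crew-to-task pairs with the largest total incentive.

    incentive[c][t] is the score for sending crew c to task t.
    Assumes there are no more crews than tasks; brute force, small inputs only.
    """
    n_crews, n_tasks = len(incentive), len(incentive[0])
    best = max(
        permutations(range(n_tasks), n_crews),
        key=lambda tasks: sum(incentive[c][t] for c, t in enumerate(tasks)),
    )
    return list(enumerate(best))


# Two crews, three damaged nodes (hypothetical scores).
pairs = best_assignment([[4, 1, 3], [2, 0, 5]])
```

Here crew 0 is matched to task 0 and crew 1 to task 2, for a total incentive of 9.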
Article 1075
Title@2025-07-11 (5): Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures
Title: Exploring Efficient Quantification of Modeling Uncertainties with Differentiable Physics-Informed Machine Learning Architectures | Effiziente Quantifizierung von Modellierungsunsicherheiten mit differenzierten physikinformierten Machine Learning-Architekturen | 探索对以不同物理和机械化学习架构建模的不确定性模型化进行高效率的量化 2506.18247v2 |
Authors (5): Manaswin Oddiraju, Bharath Varma Penumatsa, Divyang Amin, Michael Piedmonte, Souma Chowdhury
Quantifying and propagating modeling uncertainties is crucial for reliability analysis, robust optimization, and other model-based algorithmic processes in engineering design and control. Physics-informed machine learning (PIML) methods have emerged in recent years as an alternative to traditional computational modeling and surrogate modeling methods, offering a balance between computing efficiency, modeling accuracy, and interpretability. However, their ability to predict and propagate modeling uncertainties remains mostly unexplored. In this paper, a promising class of auto-differentiable hybrid PIML architectures that combine partial physics and artificial neural networks (ANNs), used for input transformation or adaptive parameter estimation, is integrated with Bayesian neural networks (BNNs), which replace the ANNs. The goal is to explore whether BNNs can successfully provide uncertainty propagation capabilities in the PIML architectures as well, further supported by the auto-differentiability of these architectures. A two-stage training process is used to alleviate the challenges traditionally encountered in training probabilistic ML models. The resulting BNN-integrated PIML architecture is evaluated on an analytical benchmark problem and flight experiment data for a fixed-wing RC aircraft, with prediction performance observed to be slightly worse than or on par with purely data-driven ML and original PIML models. Moreover, Monte Carlo sampling of probabilistic BNN weights was found to be most effective in propagating uncertainty in the BNN-integrated PIML architectures.
Article 1076
Title@2025-07-11 (5): Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
Title: Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees | Greedy Low-Rank Gradient Compression für verteiltes Lernen mit Konvergenzgarantien | 利用聚合担保分配学习的贪婪低频梯度压缩 2507.08784v1 |
Authors (5): Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan
Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized approaches project gradients onto randomly chosen subspaces, introducing high variance and degrading empirical performance; greedy methods select the most informative subspaces, achieving strong empirical results but lacking convergence guarantees. To address this gap, we propose GreedyLore, the first Greedy Low-Rank gradient compression algorithm for distributed learning with rigorous convergence guarantees. GreedyLore incorporates error feedback to correct the bias introduced by greedy compression and introduces a semi-lazy subspace update that ensures the compression operator remains contractive throughout all iterations. With these techniques, we prove that GreedyLore achieves a convergence rate of $\mathcal{O}(\sigma/\sqrt{NT} + 1/T)$ under standard optimizers such as MSGD and Adam, marking the first linear speedup convergence rate for low-rank gradient compression. Extensive experiments are conducted to validate our theoretical findings.
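To make the role of error feedback concrete, the sketch below pairs a greedy rank-1 compressor (power iteration toward the top singular direction) with a plain error-feedback loop. It is an illustrative assumption, not GreedyLore itself: the semi-lazy subspace update is not modelled, and all function names are hypothetical.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def rank1_approx(M, iters=50):
    """Greedy rank-1 approximation M v v^T, with v found by power iteration."""
    rows, cols = len(M), len(M[0])
    Mt = [list(col) for col in zip(*M)]
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))          # one step of power iteration on M^T M
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    u = matvec(M, v)
    return [[ui * vj for vj in v] for ui in u]


def ef_step(grad, error):
    """Compress (gradient + carried error); keep the residual for the next round."""
    corrected = add(grad, error)
    sent = rank1_approx(corrected)
    return sent, sub(corrected, sent)
```

Because each residual is carried forward, the sum of everything transmitted plus the final residual equals the sum of the true gradients, which is the sense in which error feedback corrects the compressor's bias.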
Article 1077
Title@2025-07-11 (5): Predicting Barge Presence and Quantity on Inland Waterways using Vessel Tracking Data: A Machine Learning Approach
Title: Predicting Barge Presence and Quantity on Inland Waterways using Vessel Tracking Data: A Machine Learning Approach | Vorhersagen von Barge Präsenz und Menge auf Binnenwasserstraßen mit Vessel Tracking Daten: Ein Ansatz zum maschinellen Lernen | 利用船舶跟踪数据预测内陆水道的内河水道存在和数量:机械学习方法 2501.00615v2 |
Authors (5): Geoffery Agorku, Sarah Hernandez, Maria Falquez, Subhadipto Poddar, Shihao Pang
This study presents a machine learning approach to predict the number of barges transported by vessels on inland waterways using tracking data from the Automatic Identification System (AIS). While AIS tracks the location of tug and tow vessels, it does not monitor the presence or number of barges transported by those vessels. Understanding the number and types of barges conveyed along river segments, between ports, and at ports is crucial for estimating the quantities of freight transported on the nation’s waterways. This insight is also valuable for waterway management and infrastructure operations, in areas such as targeted dredging and data-driven resource allocation. Labeled sample data was generated using observations from traffic cameras located along key river segments and matched to AIS data records. A sample of 164 vessels representing up to 42 barge convoys per vessel was used for model development. The methodology involved first predicting barge presence and then predicting barge quantity. Features derived from the AIS data included speed measures, vessel characteristics, turning measures, and interaction terms. For predicting barge presence, the AdaBoost model achieved an F1 score of 0.932. For predicting barge quantity, the Random Forest combined with an AdaBoost ensemble model achieved an F1 score of 0.886. Bayesian optimization was used for hyperparameter tuning. By advancing predictive modeling for inland waterways, this study offers valuable insights for transportation planners and organizations, which require detailed knowledge of traffic volumes, including the flow of commodities, their destinations, and the tonnage moving in and out of ports.
Article 1078
Title@2025-07-11 (5): BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Title: BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | BlockFFN: Auf dem Weg zur End-Side Acceleration-Friendly Mixture-of-Experts mit Chunk-Level-Aktivierung Sparsity | 块块FFN: 向具有整块级激活分级的 终端- 双极加速- 友好混合混合专家方向 2507.08771v1 |
Authors (8): Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun
To alleviate the computational burden of large language models (LLMs), architectures with activation sparsity, represented by mixture-of-experts (MoE), have attracted increasing attention. However, the non-differentiable and inflexible routing of vanilla MoE hurts model performance. Moreover, while each token activates only a few parameters, these sparsely-activated architectures exhibit low chunk-level sparsity, indicating that the union of multiple consecutive tokens activates a large ratio of parameters. Such a sparsity pattern is unfriendly for acceleration under low-resource conditions (e.g., end-side devices) and incompatible with mainstream acceleration techniques (e.g., speculative decoding). To address these challenges, we introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques. Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing. Next, to promote both token-level sparsity (TLS) and chunk-level sparsity (CLS), CLS-aware training objectives are designed, making BlockFFN more acceleration-friendly. Finally, we implement efficient acceleration kernels, combining activation sparsity and speculative decoding for the first time. The experimental results demonstrate the superior performance of BlockFFN over other MoE baselines, achieving over 80% TLS and 70% 8-token CLS. Our kernels achieve up to 3.67$\times$ speedup over dense models on real end-side devices. All code and checkpoints are publicly available (https://github.com/thunlp/BlockFFN).
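The difference between token-level and chunk-level sparsity can be made concrete with a small sketch. The definitions below are an assumption based on the abstract's description (the union of experts activated by consecutive tokens), using non-overlapping chunks; the paper's exact metric may differ.

```python
def token_level_sparsity(masks):
    """Average fraction of experts left inactive by each single token.

    masks[i][e] is 1 if token i activates expert e, else 0.
    """
    n_experts = len(masks[0])
    return sum(1 - sum(m) / n_experts for m in masks) / len(masks)


def chunk_level_sparsity(masks, chunk=8):
    """Average fraction of experts that no token in a chunk activates."""
    if len(masks) < chunk:
        raise ValueError("need at least one full chunk of tokens")
    n_experts = len(masks[0])
    total, n_chunks = 0.0, 0
    for start in range(0, len(masks) - chunk + 1, chunk):
        window = masks[start:start + chunk]
        union = [any(col) for col in zip(*window)]   # experts used by any token
        total += 1 - sum(union) / n_experts
        n_chunks += 1
    return total / n_chunks


masks = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
tls = token_level_sparsity(masks)           # 0.75
cls2 = chunk_level_sparsity(masks, chunk=2)  # 0.625
```

Each token here uses one of four experts (TLS 0.75), but the first pair of tokens jointly touches two experts, so the 2-token CLS drops to 0.625.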
Article 1079
Title@2025-07-11 (5): A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification
Title: A Hybrid Multi-Well Hopfield-CNN with Feature Extraction and K-Means for MNIST Classification | Hybrides Multiwell-Hopfield-CNN mit Feature Extraction und K-Means für die MNIST-Klassifikation | 多功能Hopfield-CNN混合型多功能井-CNN,具有用于MNIST分类的地貌采掘和K-MISM-Means 2507.08766v1 |
Authors (1): Ahmed Farooq
This study presents a hybrid model for classifying handwritten digits in the MNIST dataset, combining convolutional neural networks (CNNs) with a multi-well Hopfield network. The approach employs a CNN to extract high-dimensional features from input images, which are then clustered into class-specific prototypes using k-means clustering. These prototypes serve as attractors in a multi-well energy landscape, where a Hopfield network performs classification by minimizing an energy function that balances feature similarity and class assignment. The model’s design enables robust handling of intraclass variability, such as diverse handwriting styles, while providing an interpretable framework through its energy-based decision process. Through systematic optimization of the CNN architecture and the number of wells, the model achieves a high test accuracy of 99.2% on 10,000 MNIST images, demonstrating its effectiveness for image classification tasks. The findings highlight the critical role of deep feature extraction and sufficient prototype coverage in achieving high performance, with potential for broader applications in pattern recognition.
Article 1080
Title@2025-07-11 (5): Local Flow Matching Generative Models
Title: Local Flow Matching Generative Models | Lokale Flow-Matching Generative Modelle | 本地流程匹配生成模型 2410.02548v3 |
Authors (3): Chen Xu, Xiuyuan Cheng, Yao Xie
Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions, and in particular to generate data from noise. Inspired by the variational nature of the diffusion process as a gradient flow, we introduce a stepwise FM model called Local Flow Matching (LFM), which consecutively learns a sequence of FM sub-models, each matching a diffusion process up to the time of the step size in the data-to-noise direction. In each step, the two distributions to be interpolated by the sub-flow model are closer to each other than data vs. noise, and this enables the use of smaller models with faster training. This variational perspective also allows us to theoretically prove a generation guarantee of the proposed flow model in terms of the $\chi^2$-divergence between the generated and true data distributions, utilizing the contraction property of the diffusion process. In practice, the stepwise structure of LFM lends itself naturally to distillation, and different distillation techniques can be adopted to speed up generation. We empirically demonstrate improved training efficiency and competitive generative performance of LFM compared to FM on the unconditional generation of tabular data and image datasets, and also on the conditional generation of robotic manipulation policies.
Article 1081
Title@2025-07-11 (5): The Bayesian Approach to Continual Learning: An Overview
Title: The Bayesian Approach to Continual Learning: An Overview | Der Bayesische Ansatz zum kontinuierlichen Lernen: Ein Überblick | Bayesian 持续学习方法:概览 2507.08922v1 |
Authors (1): Tameem Adel
Continual learning is an online paradigm where a learner continually accumulates knowledge from different tasks encountered over sequential time steps. Importantly, the learner is required to extend and update its knowledge without forgetting the learning experience acquired in the past, while avoiding the need to retrain from scratch. Given its sequential nature and its resemblance to the way humans think, continual learning offers an opportunity to address several challenges which currently stand in the way of widening the range of applicability of deep models to further real-world problems. The continual need to update the learner with data arriving sequentially creates an inherent congruence between continual learning and Bayesian inference, which provides a principled platform to keep updating the prior beliefs of a model given new data, without completely forgetting the knowledge acquired from the old data. This survey inspects different settings of Bayesian continual learning, namely task-incremental learning and class-incremental learning. We begin by discussing definitions of continual learning along with its Bayesian setting, as well as the links with related fields, such as domain adaptation, transfer learning and meta-learning. Afterwards, we introduce a taxonomy offering a comprehensive categorization of algorithms belonging to the Bayesian continual learning paradigm. Meanwhile, we analyze the state-of-the-art while zooming in on some of the most prominent Bayesian continual learning algorithms to date. Furthermore, we shed some light on links between continual learning and developmental psychology, and correspondingly introduce analogies between both fields. We follow that with a discussion of current challenges, and finally conclude with potential areas for future research on Bayesian continual learning.
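The congruence between sequential data and Bayesian updating can be seen in a toy conjugate model, where the posterior after one task becomes the prior for the next. The Beta-Bernoulli model and the data below are illustrative assumptions, not taken from the survey; deep models need approximations, since exact updates like this are only available in simple cases.

```python
def beta_update(prior, observations):
    """Posterior Beta(a, b) after Bernoulli observations (1 = success, 0 = failure)."""
    a, b = prior
    successes = sum(observations)
    return (a + successes, b + len(observations) - successes)


task1 = [1, 1, 0, 1]   # data from the first task
task2 = [0, 0, 1]      # data arriving later

# Sequential: the posterior after task 1 is reused as the prior for task 2.
sequential = beta_update(beta_update((1, 1), task1), task2)

# Batch: all data seen at once.
batch = beta_update((1, 1), task1 + task2)
```

Both routes give Beta(5, 4): the sequential learner never revisits `task1`, yet loses nothing relative to retraining on all the data.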
Article 1082
Title@2025-07-11 (5): Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
Title: Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | Straffung undurchführbarer Aktionen und Belohnungsskalierung im Ausbau des Lernens mit Offline-Daten | 利用离线数据在加强学习中处罚不可行的行动和奖励措施 2507.08761v1 |
Authors (8): Jeonghye Kim, Yongjae Shin, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngchul Sung, Kanghoon Lee, Woohyung Lim
Reinforcement learning with offline data suffers from Q-value extrapolation errors. To address this issue, we first demonstrate that linear extrapolation of the Q-function beyond the data range is particularly problematic. To mitigate this, we propose guiding the gradual decrease of Q-values outside the data range, which is achieved through reward scaling with layer normalization (RS-LN) and a penalization mechanism for infeasible actions (PA). By combining RS-LN and PA, we develop a new algorithm called PARS. We evaluate PARS across a range of tasks, demonstrating superior performance compared to state-of-the-art algorithms in both offline training and online fine-tuning on the D4RL benchmark, with notable success in the challenging AntMaze Ultra task.
Article 1083
Title@2025-07-11 (5): The Value of Prediction in Identifying the Worst-Off
Title: The Value of Prediction in Identifying the Worst-Off | Der Wert der Vorhersage bei der Identifizierung des Schlimmsten | 预测在查明最有害的 2501.19334v3 |
Authors (3): Unai Fischer-Abaigar, Christoph Kern, Juan Carlos Perdomo
Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.
Article 1084
Title@2025-07-11 (5): ML-Based Automata Simplification for Symbolic Accelerators
Title: ML-Based Automata Simplification for Symbolic Accelerators | ML-basierte Automata-Vereinfachung für symbolische Beschleuniger | ML 符号加速器的基于 ML 的自动数据简化 2507.08751v1 |
Authors (4): Tiffany Yu, Rye Stahle-Smith, Darssan Eswaramoorthi, Rasha Karakchi
Symbolic accelerators are increasingly used for symbolic data processing in domains such as genomics, NLP, and cybersecurity. However, these accelerators face scalability issues due to excessive memory use and routing complexity, especially when targeting a large set. We present AutoSlim, a machine learning-based graph simplification framework designed to reduce the complexity of symbolic accelerators built on Non-deterministic Finite Automata (NFA) deployed on FPGA-based overlays such as NAPOLY+. AutoSlim uses Random Forest classification to prune low-impact transitions based on edge scores and structural features, significantly reducing automata graph density while preserving semantic correctness. Unlike prior tools, AutoSlim targets automated score-aware simplification with weighted transitions, enabling efficient ranking-based sequence analysis. We evaluated data sets (1K to 64K nodes) in NAPOLY+ and conducted performance measurements including latency, throughput, and resource usage. AutoSlim achieves up to 40 percent reduction in FPGA LUTs and over 30 percent pruning in transitions, while scaling to graphs an order of magnitude larger than existing benchmarks. Our results also demonstrate how hardware interconnection (fanout) heavily influences hardware cost and that AutoSlim’s pruning mitigates resource blowup.
Article 1085
Title@2025-07-11 (5): Modeling Partially Observed Nonlinear Dynamical Systems and Efficient Data Assimilation via Discrete-Time Conditional Gaussian Koopman Network
Title: Modeling Partially Observed Nonlinear Dynamical Systems and Efficient Data Assimilation via Discrete-Time Conditional Gaussian Koopman Network | Modellierung teilweise beobachtete nichtlineare dynamische Systeme und effiziente Datenassimilation über diskret-zeitbedingtes Gaußian Koopman Network | 通过分立时间条件性高斯扬库普曼网络模拟部分观测的非线性非线性动态系统和有效的数据同化 2507.08749v1 |
Authors (4): Chuanqi Chen, Zhongrui Wang, Nan Chen, Jin-Long Wu
A discrete-time conditional Gaussian Koopman network (CGKN) is developed in this work to learn surrogate models that can perform efficient state forecast and data assimilation (DA) for high-dimensional complex dynamical systems, e.g., systems governed by nonlinear partial differential equations (PDEs). Focusing on nonlinear partially observed systems that are common in many engineering and earth science applications, this work exploits Koopman embedding to discover a proper latent representation of the unobserved system states, such that the dynamics of the latent states are conditionally linear, i.e., linear given the observed system states. The modeled system of the observed and latent states then becomes a conditional Gaussian system, for which the posterior distribution of the latent states is Gaussian and can be efficiently evaluated via analytical formulae. The analytical formulae of DA facilitate the incorporation of DA performance into the learning process of the modeled system, which leads to a framework that unifies scientific machine learning (SciML) and data assimilation. The performance of discrete-time CGKN is demonstrated on several canonical problems governed by nonlinear PDEs with intermittency and turbulent features, including the viscous Burgers’ equation, the Kuramoto-Sivashinsky equation, and the 2-D Navier-Stokes equations, with which we show that the discrete-time CGKN framework achieves performance comparable to state-of-the-art SciML methods in state forecast and provides efficient and accurate DA results. The discrete-time CGKN framework also serves as an example to illustrate unifying the development of SciML models and their other outer-loop applications such as design optimization, inverse problems, and optimal control.
Article 1086
Title@2025-07-11 (5): Partitioned Hybrid Quantum Fourier Neural Operators for Scientific Quantum Machine Learning
Title: Partitioned Hybrid Quantum Fourier Neural Operators for Scientific Quantum Machine Learning | Partitionierte Hybrid-Quantum Fourier-Neural-Betreiber für das wissenschaftliche Quantenmaschinenlernen | 用于科学量子机器学习的四级神经操作员 2507.08746v1 |
Authors (5): Paolo Marcandelli, Yuanchun He, Stefano Mariani, Martina Siena, Stefano Markidis
We introduce the Partitioned Hybrid Quantum Fourier Neural Operator (PHQFNO), a generalization of the Quantum Fourier Neural Operator (QFNO) for scientific machine learning. PHQFNO partitions the Fourier operator computation across classical and quantum resources, enabling tunable quantum-classical hybridization and distributed execution across quantum and classical devices. The method extends QFNOs to higher dimensions and incorporates a message-passing framework to distribute data across different partitions. Input data are encoded into quantum states using unary encoding, and quantum circuit parameters are optimized using a variational scheme. We implement PHQFNO using PennyLane with PyTorch integration and evaluate it on Burgers’ equation and on the incompressible and compressible Navier-Stokes equations. We show that PHQFNO recovers classical FNO accuracy. On incompressible Navier-Stokes, PHQFNO achieves higher accuracy than its classical counterparts. Finally, we perform a sensitivity analysis under input noise, confirming improved stability of PHQFNO over classical baselines.
Article 1087
Title@2025-07-11 (5): Hashing for Fast Pattern Set Selection
Title: Hashing for Fast Pattern Set Selection | Hashing für schnelle Muster Set Auswahl | 仓促快速模式集选择 2507.08745v1 |
Authors (2): Maiju Karjalainen, Pauli Miettinen
Pattern set mining, which is the task of finding a good set of patterns instead of all patterns, is a fundamental problem in data mining. Many different definitions of what constitutes a good set have been proposed in recent years. In this paper, we consider the reconstruction error as a proxy measure for the goodness of the set, and concentrate on the adjacent problem of how to find a good set efficiently. We propose a method based on bottom-k hashing for efficiently selecting the set and extend the method for the common case where the patterns might only appear in approximate form in the data. Our approach has applications in tiling databases, Boolean matrix factorization, and redescription mining, among others. We show that our hashing-based approach is significantly faster than the standard greedy algorithm while obtaining almost equally good results in both synthetic and real-world data sets.
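For readers unfamiliar with bottom-k hashing, the sketch below shows the general technique: each set is summarised by its k smallest hash values, and set overlap is estimated from the summaries alone. How the paper applies this to pattern selection is not shown here; the hash choice (truncated SHA-256) and the function names are assumptions for illustration.

```python
import hashlib


def h(item):
    """Deterministic 64-bit hash of an item (truncated SHA-256, an arbitrary choice)."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k(items, k):
    """Bottom-k sketch: the k smallest distinct hash values of the set."""
    return sorted({h(x) for x in items})[:k]


def jaccard_estimate(sketch_a, sketch_b, k):
    """Estimate Jaccard similarity from two non-empty bottom-k sketches."""
    union_sketch = sorted(set(sketch_a) | set(sketch_b))[:k]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for v in union_sketch if v in shared) / len(union_sketch)


a = bottom_k(range(0, 10), k=100)
b = bottom_k(range(5, 15), k=100)
estimate = jaccard_estimate(a, b, k=100)
```

When k is at least the size of the union, as in this toy case, the estimate equals the exact Jaccard similarity (5 shared items out of 15); with smaller k it becomes an approximation whose cost no longer depends on the set sizes.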
Article 1088
Title@2025-07-11 (5): Discovering Algorithms with Computational Language Processing
Title: Discovering Algorithms with Computational Language Processing | Algorithmen mit numerischer Sprachverarbeitung entdecken | 使用计算语言语言处理发现算法 2507.03190v2 |
Authors (4): Theo Bourdais, Abeynaya Gnanasekaran, Houman Owhadi, Tuhin Sahai
Algorithms are the engine for reproducible problem-solving. We present a framework that automates algorithm discovery by conceptualizing algorithms as sequences of operations, represented as tokens. These computational tokens are chained using a grammar, enabling the formation of increasingly sophisticated procedures. Our ensemble Monte Carlo tree search (MCTS) guided by reinforcement learning (RL) explores token chaining and drives the creation of new tokens. This methodology rediscovers, improves, and generates new algorithms that substantially outperform existing methods for strongly NP-hard combinatorial optimization problems and foundational quantum computing approaches such as Grover’s algorithm and the Quantum Approximate Optimization Algorithm. Operating at the computational rather than code-generation level, our framework produces algorithms that can be tailored specifically to problem instances, not merely classes.
Article 1089
Title@2025-07-11 (5): Adaptive Nonlinear Vector Autoregression: Robust Forecasting for Noisy Chaotic Time Series
Title: Adaptive Nonlinear Vector Autoregression: Robust Forecasting for Noisy Chaotic Time Series | Adaptive nichtlineare Vektor-Autoregression: Robuste Prognose für lärmende Chaotische Zeitreihen | 非线性适应性非线性矢量自动递减: 噪声拖拉时间序列的强力预报 2507.08738v1 |
Authors (5): Azimov Sherkhon, Susana Lopez-Moreno, Eric Dolores-Cuenca, Sieun Lee, Sangil Kim
Nonlinear vector autoregression (NVAR) and reservoir computing (RC) have shown promise in forecasting chaotic dynamical systems, such as the Lorenz-63 model and the El Niño-Southern Oscillation. However, their reliance on fixed nonlinearities - polynomial expansions in NVAR or random feature maps in RC - limits their adaptability to high noise or real-world data. These methods also scale poorly in high-dimensional settings due to costly matrix inversion during readout computation. We propose an adaptive NVAR model that combines delay-embedded linear inputs with features generated by a shallow, learnable multi-layer perceptron (MLP). The MLP and linear readout are jointly trained using gradient-based optimization, enabling the model to learn data-driven nonlinearities while preserving a simple readout structure. Unlike standard NVAR, our approach avoids the need for an exhaustive and sensitive grid search over ridge and delay parameters. Instead, tuning is restricted to neural network hyperparameters, improving scalability. Initial experiments on chaotic systems tested under noise-free and synthetically noisy conditions showed that the adaptive model outperformed the standard NVAR in predictive accuracy and showed robust forecasting under noisy conditions with a lower observation frequency.
Article 1090
Title@2025-07-11 (5): Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling
Title: Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling | Katastrophisches Vergessen der Milderung durch Plateau-Phasen-Aktivität Profiling | 通过高原阶段活动分析,通过高原阶段减轻灾难 2507.08736v1 |
Authors (3): Idan Mashiach, Oren Glickman, Tom Tirer
Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks due to knowledge overwriting. Among the approaches to mitigate this issue, regularization techniques aim to identify and constrain “important” parameters to preserve previous knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters that exhibit higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach achieves superior performance in balancing catastrophic forgetting mitigation with strong performance on newly learned tasks.
Article 1091
Title@2025-07-11 (5): Bias-Aware Mislabeling Detection via Decoupled Confident Learning
Title: Bias-Aware Mislabeling Detection via Decoupled Confident Learning | Bias-Aware-Mislabeling-Erkennung durch entkoppeltes vertrauensvolles Lernen | 通过解开信任学习解开错误标签检测 2507.07216v2 |
Authors (3): Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine-learning-based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias-aware mislabeling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well-documented challenge. Empirical results demonstrate that DeCoLe excels at bias-aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias-aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.
Article 1092
Title@2025-07-11 (5): Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Title: Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks | Alternierende Gradientenströme: Eine Theorie des Feature-Lernens in zweischichtigen Neuronalen Netzwerken | 交错的渐变流:两层神经网络中的特色学习理论 2506.06489v2 |
Authors (8): Daniel Kunin, Giovanni Luca Marchetti, Feng Chen, Dhruva Karkada, James B. Simon, Michael R. DeWeese, Surya Ganguli, Nina Miolane
What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.
Article 1093
Title@2025-07-11 (5): Monitoring Risks in Test-Time Adaptation
Title: Monitoring Risks in Test-Time Adaptation | Überwachung von Risiken bei der Anpassung an die Testzeit | 监测试验时间适应中的风险 2507.08721v1 |
Authors (4): Mona Schirmer, Metod Jazbec, Christian A. Naesseth, Eric Nalisnick
Encountering shifted data at test time is a ubiquitous challenge when deploying predictive models. Test-time adaptation (TTA) methods address this issue by continuously adapting a deployed model using only unlabeled test data. While TTA can extend the model’s lifespan, it is only a temporary solution. Eventually the model might degrade to the point that it must be taken offline and retrained. To detect such points of ultimate failure, we propose pairing TTA with risk monitoring frameworks that track predictive performance and raise alerts when predefined performance criteria are violated. Specifically, we extend existing monitoring tools based on sequential testing with confidence sequences to accommodate scenarios in which the model is updated at test time and no test labels are available to estimate the performance metrics of interest. Our extensions unlock the application of rigorous statistical risk monitoring to TTA, and we demonstrate the effectiveness of our proposed TTA monitoring framework across a representative set of datasets, distribution shift types, and TTA methods.
Article 1094
Title@2025-07-11 (5): On the Effect of Regularization in Policy Mirror Descent
Title: On the Effect of Regularization in Policy Mirror Descent | Auf die Auswirkungen der Regularisierung im politischen Spiegelabbruch | 对政策从属来源正规化的影响的影响 2507.08718v1 |
Authors (3): Jan Felix Kleuker, Aske Plaat, Thomas Moerland
Policy Mirror Descent (PMD) has emerged as a unifying framework in reinforcement learning (RL) by linking policy gradient methods with a first-order optimization method known as mirror descent. At its core, PMD incorporates two key regularization components: (i) a distance term that enforces a trust region for stable policy updates and (ii) an MDP regularizer that augments the reward function to promote structure and robustness. While PMD has been extensively studied in theory, empirical investigations remain scarce. This work provides a large-scale empirical analysis of the interplay between these two regularization techniques, running over 500k training seeds on small RL environments. Our results demonstrate that, although the two regularizers can partially substitute each other, their precise combination is critical for achieving robust performance. These findings highlight the potential for advancing research on more robust algorithms in RL, particularly with respect to hyperparameter sensitivity.
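To make the two regularization components concrete, here is a minimal tabular sketch of a single-state update under one common instantiation, which is an assumption and not necessarily the configuration studied in the paper: the KL divergence serves as the distance term (step size `eta`) and negative entropy serves as the MDP regularizer (weight `tau`). Under that assumption the update has a closed form. The function name `pmd_step` is hypothetical.

```python
import math

def pmd_step(pi_old, q_values, eta, tau=0.0):
    """One tabular mirror-descent update for a single state.

    Assumed instantiation (for illustration only): KL divergence as the
    distance term and negative entropy, weighted by tau, as the MDP
    regularizer. Maximizing
        eta * (<q, p> + tau * H(p)) - KL(p || pi_old)
    over the probability simplex yields p(a) proportional to
        pi_old(a)^(1 / (1 + eta*tau)) * exp(eta * q(a) / (1 + eta*tau)).
    Requires every entry of pi_old to be strictly positive.
    """
    scale = 1.0 + eta * tau
    logits = [(math.log(p) + eta * q) / scale for p, q in zip(pi_old, q_values)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Distance term only: the policy shifts toward the higher-valued action.
p_plain = pmd_step([0.5, 0.5], [1.0, 0.0], eta=1.0)
# Adding a strong entropy regularizer pulls the update back toward uniform.
p_reg = pmd_step([0.5, 0.5], [1.0, 0.0], eta=1.0, tau=10.0)
```

In this form the interplay is visible directly: `eta` controls how far the new policy may move from `pi_old`, while `tau` both discounts the old policy (through the exponent) and flattens the result.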
Article 1095
Title@2025-07-11 (5): On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing
Title: On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing | Auf Lernfunktionen über biologischen Sequenzraum: Gaußsche Prozessvorhersage, Regularisierung und Messwertfixierung | 生物序列空间学习功能方面的学习功能:与高斯进程前期、正规化和测量确定有关 2504.19034v2 |
Authors (5): Samantha Petti, Carlos Martí-Gómez, Justin B. Kinney, Juannan Zhou, David M. McCandlish
Mappings from biological sequences (DNA, RNA, protein) to quantitative measures of sequence functionality play an important role in contemporary biology. We are interested in the related tasks of (i) inferring predictive sequence-to-function maps and (ii) decomposing sequence-function maps to elucidate the contributions of individual subsequences. Because each sequence-function map can be written as a weighted sum over subsequences in multiple ways, meaningfully interpreting these weights requires “gauge-fixing,” i.e., defining a unique representation for each map. Recent work has established that most existing gauge-fixed representations arise as the unique solutions to $L_2$-regularized regression in an overparameterized “weight space” where the choice of regularizer defines the gauge. Here, we establish the relationship between regularized regression in overparameterized weight space and Gaussian process approaches that operate in “function space,” i.e. the space of all real-valued functions on a finite set of sequences. We disentangle how weight space regularizers both impose an implicit prior on the learned function and restrict the optimal weights to a particular gauge. We also show how to construct regularizers that correspond to arbitrary explicit Gaussian process priors combined with a wide variety of gauges. Next, we derive the distribution of gauge-fixed weights implied by the Gaussian process posterior and demonstrate that even for long sequences this distribution can be efficiently computed for product-kernel priors using a kernel trick. Finally, we characterize the implicit function space priors associated with the most common weight space regularizers. Overall, our framework unifies and extends our ability to infer and interpret sequence-function relationships.
Article 1096
Title@2025-07-11 (5): Rethinking Approximate Gaussian Inference in Classification
Title: Rethinking Approximate Gaussian Inference in Classification | Ungefähre gaussische Schlussfolgerung in der Klassifizierung neu denken | 重新思考约近高斯在分类中的推理 2502.03366v2 |
Authors (3): Bálint Mucsányi, Nathaël Da Costa, Philipp Hennig
In classification tasks, softmax functions are ubiquitously used as output activations to produce predictive probabilities. Such outputs only capture aleatoric uncertainty. To capture epistemic uncertainty, approximate Gaussian inference methods have been proposed. We develop a common formalism to describe such methods, which we view as outputting Gaussian distributions over the logit space. Predictives are then obtained as the expectations of the Gaussian distributions pushed forward through the softmax. However, such softmax Gaussian integrals cannot be solved analytically, and Monte Carlo (MC) approximations can be costly and noisy. We propose to replace the softmax activation by element-wise normCDF or sigmoid, which allows for the accurate sampling-free approximation of predictives. This also enables the approximation of the Gaussian pushforwards by Dirichlet distributions with moment matching. This approach entirely eliminates the runtime and memory overhead associated with MC sampling. We evaluate it combined with several approximate Gaussian inference methods (Laplace, HET, SNGP) on large- and small-scale datasets (ImageNet, CIFAR-100, CIFAR-10), demonstrating improved uncertainty quantification capabilities compared to softmax MC sampling.
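The reason a normCDF output activation permits sampling-free predictives can be illustrated with a standard identity: for a scalar logit z ~ N(mean, var), E[Phi(z)] = Phi(mean / sqrt(1 + var)), where Phi is the standard normal CDF. The sketch below checks that identity against a Monte Carlo estimate. It is a one-dimensional illustration only; the paper's full construction for multi-class predictives may differ, and the function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_norm_cdf(mean, var):
    """Closed form of E[Phi(z)] for z ~ N(mean, var); no sampling needed."""
    return norm_cdf(mean / math.sqrt(1.0 + var))

def mc_expected_norm_cdf(mean, var, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = random.Random(seed)
    std = math.sqrt(var)
    total = sum(norm_cdf(rng.gauss(mean, std)) for _ in range(n_samples))
    return total / n_samples

closed_form = expected_norm_cdf(1.0, 2.0)
sampled = mc_expected_norm_cdf(1.0, 2.0)
```

The closed form costs one function evaluation, whereas the sampled estimate needs many draws and still carries noise, which is the overhead the abstract refers to.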
Article 1097
Title@2025-07-11 (5): SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations
Title: SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | SPLASH! Probeneffizientes Inverse Verstärkungslernen auf Präferenzbasis für langhorizontige Adversarialaufgaben aus suboptimalen Hierarchischen Demonstrationen | 苏丹解放军-苏丹解放军-苏丹解放军-苏人解(苏人解)为次最佳等级示威的长风对流任务提供抽样高效的基于优惠的反反强化学习学习 2507.08707v1 |
Authors (6): Peter Crowley, Zachary Serlin, Tyler Paine, Makai Mann, Michael Benjamin, Calin Belta
Inverse Reinforcement Learning (IRL) presents a powerful paradigm for learning complex robotic tasks from human demonstrations. However, most approaches make the assumption that expert demonstrations are available, which is often not the case. Those that allow for suboptimality in the demonstrations are not designed for long-horizon goals or adversarial tasks. Many desirable robot capabilities fall into one or both of these categories, thus highlighting a critical shortcoming in the ability of IRL to produce field-ready robotic agents. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH), which advances the state-of-the-art in learning from suboptimal demonstrations to long-horizon and adversarial settings. We empirically validate SPLASH on a maritime capture-the-flag task in simulation, and demonstrate real-world applicability with sim-to-real translation experiments on autonomous unmanned surface vehicles. We show that our proposed methods allow SPLASH to significantly outperform the state-of-the-art in reward learning from suboptimal demonstrations.
Article 1098
Title@2025-07-11 (5): Conditional regression for the Nonlinear Single-Variable Model
Title: Conditional regression for the Nonlinear Single-Variable Model | Bedingte Regression für das nichtlineare Single-Variable Modell | 非线性单一可变模式有条件回归 2411.09686v2 |
Authors (2): Yantao Wu, Mauro Maggioni
Regressing a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality requires special statistical models, for example ones that impose geometric assumptions on the distribution of the data (e.g., that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure on $F$. Among the latter, compositional models $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$ include classical single- and multi-index models, as well as neural networks. While the case where $g$ is linear is well-understood, less is known when $g$ is nonlinear, and in particular for which $g$’s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. Here we consider a model $F(X):=f(\Pi_\gamma X)$ where $\Pi_\gamma:\mathbb{R}^d\to[0,\textrm{len}\gamma]$ is the closest-point projection onto the parameter of a regular curve $\gamma:[0, \textrm{len}\gamma]\to\mathbb{R}^d$, and $f:[0,\textrm{len}\gamma]\to \mathbb{R}^1$. The input data $X$ is not low-dimensional: it can be as far from $\gamma$ as the condition that $\Pi_\gamma(X)$ is well-defined allows. The distribution of $X$, the curve $\gamma$ and the function $f$ are all unknown. This model is a natural nonlinear generalization of the single-index model, corresponding to $\gamma$ being a line. We propose a nonparametric estimator, based on conditional regression, that under suitable assumptions, the strongest of which is that $f$ is coarsely monotone, achieves, up to log factors, the $\textit{one-dimensional}$ optimal min-max rate for non-parametric regression, up to the level of noise in the observations, and can be constructed in time $\mathcal{O}(d^2 n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$.
Article 1099
Title@2025-07-11 (5): SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Title: SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting | SEREP: Semantische Gesichtsausdruck-Darstellung für robustes In-the-Wild-Capture und Retargeting | SEREP: 野外强力捕捉和重新瞄准目标的语义法表达表 2412.14371v3 |
Authors (6): Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib, Emeline Got, Rafael M. O. Cruz, Marc-Andre Carbonneau
Monocular facial performance capture in-the-wild is challenging due to varied capture conditions, face shapes, and expressions. Most current methods rely on linear 3D Morphable Models, which represent facial expressions independently of identity at the vertex displacement level. We propose SEREP (Semantic Expression Representation), a model that disentangles expression from identity at the semantic level. We start by learning an expression representation from high-quality 3D data of unpaired facial expressions. Then, we train a model to predict expression from monocular images relying on a novel semi-supervised scheme using low quality synthetic data. In addition, we introduce MultiREX, a benchmark addressing the lack of evaluation resources for the expression capture task. Our experiments show that SEREP outperforms state-of-the-art methods, capturing challenging expressions and transferring them to new identities.
Article 1100
Title@2025-07-11 (5): Domain-Informed Operation Excellence of Gas Turbine System with Machine Learning
Title: Domain-Informed Operation Excellence of Gas Turbine System with Machine Learning | Domain-informierte Operation Exzellenz des Gasturbinensystems mit maschinellem Lernen | 采用机器学习的天然气涡轮系统内部一体化英才行动 2507.08697v1 |
Authors (6): Waqar Muhammad Ashraf, Amir H. Keshavarzzadeh, Abdulelah S. Alshehri, Abdulrahman bin Jumah, Ramit Debnath, Vivek Dua
The domain-consistent adoption of artificial intelligence (AI) remains low in thermal power plants due to the black-box nature of AI algorithms and low representation of domain knowledge in conventional data-centric analytics. In this paper, we develop a MAhalanobis Distance-based OPTimization (MAD-OPT) framework that incorporates the Mahalanobis distance-based constraint to introduce domain knowledge into data-centric analytics. The developed MAD-OPT framework is applied to maximize thermal efficiency and minimize turbine heat rate for a 395 MW capacity gas turbine system. We demonstrate that the MAD-OPT framework can estimate domain-informed optimal process conditions under different ambient conditions, and the optimal solutions are found to be robust as evaluated by Monte Carlo simulations. We also apply the MAD-OPT framework to estimate optimal process conditions beyond the design power generation limit of the gas turbine system, and have found comparable results with the actual data of the power plant. We demonstrate that implementing data-centric optimization analytics without incorporating domain-informed constraints may provide ineffective solutions that may not be implementable in the real operation of the gas turbine system. This research advances the integration of the data-driven domain knowledge into machine learning-powered analytics that enhances the domain-informed operation excellence and paves the way for safe AI adoption in thermal power systems.
Article 1101
Title@2025-07-11 (5): Learnable quantum spectral filters for hybrid graph neural networks
Title: Learnable quantum spectral filters for hybrid graph neural networks | Erlernbare Quantenspektralfilter für hybride Graphen-Neuralnetzwerke | 用于混合图形神经网络的可学习量子光谱过滤器 2507.05640v2 |
Authors (1): Ammar Daskin
In this paper, we describe a parameterized quantum circuit that can be considered as convolutional and pooling layers for graph neural networks. The circuit incorporates the parameterized quantum Fourier circuit, in which the qubit connections for the controlled gates are derived from the Laplacian operator. Specifically, we show that the eigenspace of the Laplacian operator of a graph can be approximated by using a QFT-based circuit whose connections are determined from the adjacency matrix. For an $N\times N$ Laplacian, this approach yields an approximate polynomial-depth circuit requiring only $n=\log(N)$ qubits. These types of circuits can eliminate the expensive classical computations for approximating the learnable functions of the Laplacian through Chebyshev polynomial or Taylor expansions. Using this circuit as a convolutional layer provides an $n$-dimensional probability vector that can be considered as the filtered and compressed graph signal. Therefore, the circuit along with the measurement can be considered a very efficient convolution plus pooling layer that transforms an $N$-dimensional signal input into an $n$-dimensional signal with an exponential compression. We then apply a classical neural network prediction head to the output of the circuit to construct a complete graph neural network. Since the circuit incorporates geometric structure through its graph connection-based approach, we present graph classification results for the benchmark datasets listed in the TUDataset library. Using only 1-100 learnable parameters for the quantum circuit and minimal classical layers (1000-5000 parameters) in a generic setting, the obtained results are comparable to and in some cases better than many of the baseline results, particularly for the cases when geometric structure plays a significant role.
Article 1102
Title@2025-07-11 (5): PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors
Title: PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors | PRÄAMBLE: Private und effiziente Aggregation über Block Sparse Vektoren | PREAMBL: 通过块状散射矢量进行私人和高效聚合 2503.11897v2 |
Authors (5): Hilal Asi, Vitaly Feldman, Hannah Keller, Guy N. Rothblum, Kunal Talwar
We revisit the problem of secure aggregation of high-dimensional vectors in a two-server system such as Prio. These systems are typically used to aggregate vectors such as gradients in private federated learning, where the aggregate itself is protected via noise addition to ensure differential privacy. Existing approaches require communication scaling with the dimensionality, and thus limit the dimensionality of vectors one can efficiently process in this setup. We propose PREAMBLE: Private Efficient Aggregation Mechanism via BLock-sparse Euclidean Vectors. PREAMBLE builds on an extension of distributed point functions that enables communication- and computation-efficient aggregation of block-sparse vectors, which are sparse vectors where the non-zero entries occur in a small number of clusters of consecutive coordinates. We show that these block-sparse DPFs can be combined with random sampling and privacy amplification by sampling results, to allow asymptotically optimal privacy-utility trade-offs for vector aggregation, at a fraction of the communication cost. When coupled with recent advances in numerical privacy accounting, our approach incurs a negligible overhead in noise variance, compared to the Gaussian mechanism used with Prio.
Article 1103
Title@2025-07-11 (5): Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation
Title: Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation | Vergessen Sie mich nicht: Gegen lokales Überpassen mit Wissensfusion und Destillation kämpfen | 忘记我,不要忘记我,不要在本地与知识融合和蒸馏的重叠作斗争。 2507.08686v1 |
Authors (3): Uri Stern, Eli Corn, Daphna Weinshall
Overfitting in deep neural networks occurs less frequently than expected. This is a puzzling observation, as theory predicts that greater model capacity should eventually lead to overfitting – yet this is rarely seen in practice. But what if overfitting does occur, not globally, but in specific sub-regions of the data space? In this work, we introduce a novel score that measures the forgetting rate of deep models on validation data, capturing what we term local overfitting: a performance degradation confined to certain regions of the input space. We demonstrate that local overfitting can arise even without conventional overfitting, and is closely linked to the double descent phenomenon. Building on these insights, we introduce a two-stage approach that leverages the training history of a single model to recover and retain forgotten knowledge: first, by aggregating checkpoints into an ensemble, and then by distilling it into a single model of the original size, thus enhancing performance without added inference cost. Extensive experiments across multiple datasets, modern architectures, and training regimes validate the effectiveness of our approach. Notably, in the presence of label noise, our method – Knowledge Fusion followed by Knowledge Distillation – outperforms both the original model and independently trained ensembles, achieving a rare win-win scenario: reduced training and inference complexity.
Article 1104
Title@2025-07-11 (5): Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness
Title: Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness | Wiederkehrende Konvergenz: Umwerfende Komplexität jenseits von Lipschitz Smoothness | 重新审视趋同:利普施茨平滑之后的复杂程度 2507.08913v1 |
Authors (4): Qi He, Peiran Yu, Ziyi Chen, Heng Huang
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove the convergence rates for nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes, under a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.
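As a reminder of what a shuffling-type gradient method looks like, here is a minimal sketch on a toy finite-sum problem. It is an illustration under stated assumptions: the objective is a simple quadratic (which is Lipschitz smooth, so it does not exercise the paper's setting), and the diminishing stepsize `0.5 / (t + 1)` is a placeholder, not the authors' stepsize strategy. The function name `shuffling_sgd` is hypothetical.

```python
import random

def shuffling_sgd(grads, w0, epochs, stepsize, reshuffle=True, seed=0):
    """Shuffling-type gradient method on a finite sum of scalar functions.

    grads: list of per-component gradient functions g_i(w).
    stepsize: function mapping the epoch index to a step size.
    reshuffle=True draws a fresh permutation every epoch (random reshuffling);
    reshuffle=False reuses the fixed order 0..n-1, which is one instance of
    an arbitrary shuffling scheme.
    """
    rng = random.Random(seed)
    order = list(range(len(grads)))
    w = w0
    for epoch in range(epochs):
        if reshuffle:
            rng.shuffle(order)
        lr = stepsize(epoch)
        for i in order:  # one pass over every component, without replacement
            w -= lr * grads[i](w)
    return w

# Toy problem: components 0.5 * (w - a_i)^2, whose sum is minimized at mean(a).
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [lambda w, a=a: w - a for a in targets]
schedule = lambda t: 0.5 / (t + 1)  # placeholder diminishing stepsize
w_rr = shuffling_sgd(grads, w0=0.0, epochs=200, stepsize=schedule)
w_fixed = shuffling_sgd(grads, w0=0.0, epochs=200, stepsize=schedule, reshuffle=False)
```

Both variants visit every component exactly once per epoch, which is what distinguishes shuffling-type methods from sampling with replacement.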
Article 1105
Title@2025-07-11 (5): Open Materials Generation with Stochastic Interpolants
Title: Open Materials Generation with Stochastic Interpolants | Offene Materialgenerierung mit stochastischen Interpolanten | 与室内内刑警一起制造开放材料 2502.02582v2 |
Authors (14): Philipp Hoellmer, Thomas Egg, Maya M. Martirossyan, Eric Fuemmeler, Zeren Shui, Amit Gupta, Pawan Prakash, Adrian Roitberg, Mingjie Liu, George Karypis, Mark Transtrum, Richard G. Hennig, Ellad B. Tadmor, Stefano Martiniani
The discovery of new materials is essential for enabling technological advancements. Computational approaches for predicting novel materials must effectively learn the manifold of stable crystal structures within an infinite design space. We introduce Open Materials Generation (OMatG), a unifying framework for the generative design and discovery of inorganic crystalline materials. OMatG employs stochastic interpolants (SI) to bridge an arbitrary base distribution to the target distribution of inorganic crystals via a broad class of tunable stochastic processes, encompassing both diffusion models and flow matching as special cases. In this work, we adapt the SI framework by integrating an equivariant graph representation of crystal structures and extending it to account for periodic boundary conditions in unit cell representations. Additionally, we couple the SI flow over spatial coordinates and lattice vectors with discrete flow matching for atomic species. We benchmark OMatG’s performance on two tasks: Crystal Structure Prediction (CSP) for specified compositions, and ‘de novo’ generation (DNG) aimed at discovering stable, novel, and unique structures. In our ground-up implementation of OMatG, we refine and extend both CSP and DNG metrics compared to previous works. OMatG establishes a new state of the art in generative modeling for materials discovery, outperforming purely flow-based and diffusion-based implementations. These results underscore the importance of designing flexible deep learning frameworks to accelerate progress in materials science. The OMatG code is available at https://github.com/FERMat-ML/OMatG.
Article 1106
Title@2025-07-11 (5): Fair-FLIP: Fair Deepfake Detection with Fairness-Oriented Final Layer Input Prioritising
Title: Fair-FLIP: Fair Deepfake Detection with Fairness-Oriented Final Layer Input Prioritising | Fair-FLIP: Faire Deepfake-Erkennung mit Fairness-orientiertem Final Layer Input Priorisierung | Fair-FLIP:以公平为导向、以公平为导向的最后层投入为优先的公平深海探测 2507.08912v1 |
Authors (5): Tomasz Szandala, Fatima Ezzeddine, Natalia Rusin, Silvia Giordano, Omran Ayoub
Artificial Intelligence-generated content has become increasingly popular, yet its malicious use, particularly in the form of deepfakes, poses a serious threat to public trust and discourse. While deepfake detection methods achieve high predictive performance, they often exhibit biases across demographic attributes such as ethnicity and gender. In this work, we tackle the challenge of fair deepfake detection, aiming to mitigate these biases while maintaining robust detection capabilities. To this end, we propose a novel post-processing approach, referred to as Fairness-Oriented Final Layer Input Prioritising (Fair-FLIP), that reweights a trained model’s final-layer inputs to reduce subgroup disparities, prioritising those with low variability while demoting highly variable ones. Experimental results comparing Fair-FLIP to both the baseline (without fairness-oriented de-biasing) and state-of-the-art approaches show that Fair-FLIP can enhance fairness metrics by up to 30% while maintaining baseline accuracy, with only a negligible reduction of 0.25%. Code is available on GitHub: https://github.com/szandala/fair-deepfake-detection-toolbox
Article 1107
Title@2025-07-11 (5): Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
Title: Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs | Modellkollaps ist kein Fehler, sondern ein Feature in Machine Unlearning für LLMs | 模型折叠不是臭虫,而是机器为 LLM 取消学习的特写 2507.04219v2 |
Authors (4): Yan Scholten, Sophie Xhonneux, Leo Schwinn, Stephan Günnemann
Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their training objectives. We argue this not only risks reinforcing exposure to sensitive data but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from the model. Our core idea is to leverage this collapse for unlearning by triggering collapse partially on the sensitive data. We show theoretically that our approach converges to the desired outcome, i.e. the LLM unlearns the information in the forget set. We empirically demonstrate that PMC overcomes two key limitations of existing unlearning approaches that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs. Overall, our contributions represent an important step toward more comprehensive unlearning that aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.
Article 1108
Title@2025-07-11 (5): Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry
Title: Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry | Feature Learning beyond the Lazy-Rich Dichotomie: Einblicke aus der Repräsentationsgeometrie | 超越Lazy-Rich二分切开术的特征学习:代表式几何的透视 2503.18114v2 |
Authors (4): Chi-Ning Chou, Hang Le, Yichen Wang, SueYeon Chung
Integrating task-relevant information into neural representations is a fundamental ability of both biological and artificial intelligence systems. Recent theories have categorized learning into two regimes: the rich regime, where neural networks actively learn task-relevant features, and the lazy regime, where networks behave like random feature models. Yet this simple lazy-rich dichotomy overlooks a diverse underlying taxonomy of feature learning, shaped by differences in learning algorithms, network architectures, and data properties. To address this gap, we introduce an analysis framework to study feature learning via the geometry of neural representations. Rather than inspecting individual learned features, we characterize how task-relevant representational manifolds evolve throughout the learning process. We show, in both theoretical and empirical settings, that as networks learn features, task-relevant manifolds untangle, with changes in manifold geometry revealing distinct learning stages and strategies beyond the lazy-rich dichotomy. This framework provides novel insights into feature learning across neuroscience and machine learning, shedding light on structural inductive biases in neural circuits and the mechanisms underlying out-of-distribution generalization.
Article 1109
Title@2025-07-11 (5): The Impact of Automatic Speech Transcription on Speaker Attribution
Title: The Impact of Automatic Speech Transcription on Speaker Attribution | Die Auswirkungen der automatischen Sprachtranskription auf die Sprecherzuweisung | 自动发言限制对议长权力的影响 2507.08660v1 |
Authors (4): Cristina Aggazzotti, Matthew Wiesner, Elizabeth Allyn Smith, Nicholas Andrews
Speaker attribution from speech transcripts is the task of identifying a speaker from the transcript of their speech based on patterns in their language use. This task is especially useful when the audio is unavailable (e.g. deleted) or unreliable (e.g. anonymized speech). Prior work in this area has primarily focused on the feasibility of attributing speakers using transcripts produced by human annotators. However, in real-world settings, one often only has more errorful transcripts produced by automatic speech recognition (ASR) systems. In this paper, we conduct what is, to our knowledge, the first comprehensive study of the impact of automatic transcription on speaker attribution performance. In particular, we study the extent to which speaker attribution performance degrades in the face of transcription errors, as well as how properties of the ASR system impact attribution. We find that attribution is surprisingly resilient to word-level transcription errors and that the objective of recovering the true transcript is minimally correlated with attribution performance. Overall, our findings suggest that speaker attribution on more errorful transcripts produced by ASR is as good as, if not better than, attribution based on human-transcribed data, possibly because ASR transcription errors can capture speaker-specific features that reveal speaker identity.
Article 1110
Title@2025-07-11 (5): Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees
Title: Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees | Sicheres tiefes Stärkungslernen für Ressourcenallokation mit Spitzenzeit der Informationsverletzungsgarantien | 安全深强化学习,以进行违反信息达到高峰年龄的违反信息保障的资源分配 2507.08653v1 |
Authors (2): Berire Gunes Reyhan, Sinem Coleri
In Wireless Networked Control Systems (WNCSs), control and communication systems must be co-designed due to their strong interdependence. This paper presents a novel optimization theory-based safe deep reinforcement learning (DRL) framework for ultra-reliable WNCSs, ensuring constraint satisfaction while optimizing performance, for the first time in the literature. The approach minimizes power consumption under key constraints, including Peak Age of Information (PAoI) violation probability, transmit power, and schedulability in the finite blocklength regime. PAoI violation probability is uniquely derived by combining stochastic maximum allowable transfer interval (MATI) and maximum allowable packet delay (MAD) constraints in a multi-sensor network. The framework consists of two stages: optimization theory and safe DRL. The first stage derives optimality conditions to establish mathematical relationships among variables, simplifying and decomposing the problem. The second stage employs a safe DRL model where a teacher-student framework guides the DRL agent (student). The control mechanism (teacher) evaluates compliance with system constraints and suggests the nearest feasible action when needed. Extensive simulations show that the proposed framework outperforms rule-based and other optimization theory based DRL benchmarks, achieving faster convergence, higher rewards, and greater stability.
Article 1111
Title@2025-07-11 (5): Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA)
Title: Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA) | Skalierung der Aufmerksamkeit auf sehr lange Sequenzen in linearer Zeit mit Wavelet-erweiterter Zufallsspektral-Achtung (WERSA) | 以波浪增强随机光谱注意, 将注意力转向线性时间的甚长序列( WERSA) 2507.08637v1 |
Authors (1): Vincenzo Dentamaro
Transformer models are computationally costly on long sequences since regular attention has quadratic $O(n^2)$ time complexity. We introduce Wavelet-Enhanced Random Spectral Attention (WERSA), a novel mechanism with linear $O(n)$ time complexity that is pivotal for enabling successful long-sequence processing without the performance trade-off. WERSA merges content-adaptive random spectral features with multi-resolution Haar wavelets and learnable parameters to selectively attend to informative scales of data while preserving linear efficiency. Large-scale comparisons on a single GPU, across various benchmarks (vision, NLP, hierarchical reasoning) and various attention mechanisms (such as Multiheaded Attention, Flash-Attention-2, FNet, Linformer, Performer, Waveformer), reveal uniform advantages of WERSA. It achieves the best accuracy in all tests. On ArXiv classification, WERSA improves accuracy over vanilla attention by 1.2% (86.2% vs 85.0%) while cutting training time by 81% (296s vs 1554s) and FLOPS by 73.4% (26.2G vs 98.4G). Significantly, WERSA excels where vanilla and FlashAttention-2 fail: on ArXiv-128k’s extremely long sequences, it achieves the best accuracy (79.1%) and AUC (0.979) among viable methods, operating on data that causes out-of-memory errors for quadratic methods while being twice as fast as Waveformer, its next-best competitor. By significantly reducing computational loads without compromising accuracy, WERSA makes possible more practical, more affordable long-context models, in particular on low-resource hardware, for more sustainable and more scalable AI development.
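For readers unfamiliar with the multi-resolution Haar wavelets mentioned in the abstract, the following is a minimal sketch of the plain orthonormal Haar decomposition of a one-dimensional signal. It is not the WERSA mechanism itself: the function names `haar_step` and `haar_multires` are hypothetical, and the way WERSA combines wavelet scales with random spectral features and learnable parameters is not shown here.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multires(signal):
    """Full multi-resolution decomposition of a power-of-two-length signal.

    Returns the final single-element approximation and a list of detail
    coefficient lists, ordered from the finest scale to the coarsest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    coarse, details = haar_multires([4.0, 2.0, 6.0, 8.0])
    print(coarse, details)
```

Each level halves the sequence length, so the whole decomposition touches O(n) values in total, which is consistent with the linear-time budget the abstract describes.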
Article 1112
Title@2025-07-11 (5): Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security
Title: Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security | Verschränkte Bedrohungen: Ein einheitliches Kill Chain Modell für Quantum Machine Learning Security | 相互纠缠的威胁:量子机器学习安全的统一杀手链模式 2507.08623v1 |
Authors (10): Pascal Debus, Maximilian Wendlinger, Kilian Tscharke, Daniel Herr, Cedric Brügmann, Daniel Ohl de Mello, Juris Ulmanis, Alexander Erhard, Arthur Schmidt, Fabian Petsch
Quantum Machine Learning (QML) systems inherit vulnerabilities from classical machine learning while introducing new attack surfaces rooted in the physical and algorithmic layers of quantum computing. Despite a growing body of research on individual attack vectors - ranging from adversarial poisoning and evasion to circuit-level backdoors, side-channel leakage, and model extraction - these threats are often analyzed in isolation, with unrealistic assumptions about attacker capabilities and system environments. This fragmentation hampers the development of effective, holistic defense strategies. In this work, we argue that QML security requires more structured modeling of the attack surface, capturing not only individual techniques but also their relationships, prerequisites, and potential impact across the QML pipeline. We propose adapting kill chain models, widely used in classical IT and cybersecurity, to the quantum machine learning context. Such models allow for structured reasoning about attacker objectives, capabilities, and possible multi-stage attack paths - spanning reconnaissance, initial access, manipulation, persistence, and exfiltration. Based on extensive literature analysis, we present a detailed taxonomy of QML attack vectors mapped to corresponding stages in a quantum-aware kill chain framework that is inspired by the MITRE ATLAS for classical machine learning. We highlight interdependencies between physical-level threats (like side-channel leakage and crosstalk faults), data and algorithm manipulation (such as poisoning or circuit backdoors), and privacy attacks (including model extraction and training data inference). This work provides a foundation for more realistic threat modeling and proactive security-in-depth design in the emerging field of quantum machine learning.
Article 1113
Title@2025-07-11 (5): Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Title: Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Mind the Memory Gap: Enthüllen von GPU-Flaschenhalsen in großflächiger LLM-Inferenz | 牢记记忆差距:大型批量LLM 推理中的 GPU 堆积点 2503.08311v2 |
Authors (8): Pol G. Recasens, Ferran Agullo, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Jordi Torres, Josep Ll. Berral
Large language models have been widely adopted across different tasks, but their auto-regressive generation nature often leads to inefficient resource utilization during inference. While batching is commonly used to increase throughput, performance gains plateau beyond a certain batch size, especially with smaller models, a phenomenon that existing literature typically explains as a shift to the compute-bound regime. In this paper, through an in-depth GPU-level analysis, we reveal that large-batch inference remains memory-bound, with most GPU compute capabilities underutilized due to DRAM bandwidth saturation as the primary bottleneck. To address this, we propose a Batching Configuration Advisor (BCA) that optimizes memory allocation, reducing GPU memory requirements with minimal impact on throughput. The freed memory and underutilized GPU compute capabilities can then be leveraged by concurrent workloads. Specifically, we use model replication to improve serving throughput and GPU utilization. Our findings challenge conventional assumptions about LLM inference, offering new insights and practical strategies for improving resource utilization, particularly for smaller language models. The code is publicly available at https://github.com/FerranAgulloLopez/vLLMBatchingMemoryGap.
Article 1114
Title@2025-07-11 (5): A Malliavin calculus approach to score functions in diffusion generative models
Title: A Malliavin calculus approach to score functions in diffusion generative models | Ein Malliavin Kalkül Ansatz, um Funktionen in Diffusion generative Modelle punkten | 以Malliavin微积分法在传播基因变异模型中计分功能 2507.05550v2 |
Authors (5): Ehsan Mirafzali, Frank Proske, Utkarsh Gupta, Daniele Venturi, Razvan Marinescu
Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via deterministic or stochastic differential equations (SDEs). The score function is typically estimated from data using a variety of approximation techniques, such as denoising or sliced score matching, Hyvärinen’s method, or Schrödinger bridges. In this paper, we derive an exact, closed-form expression for the score function for a broad class of nonlinear diffusion generative models. Our approach combines modern stochastic analysis tools such as Malliavin derivatives and their adjoint operators (Skorokhod integrals or Malliavin divergence) with a new Bismut-type formula. The resulting expression for the score function can be written entirely in terms of the first and second variation processes, with all Malliavin derivatives systematically eliminated, thereby enhancing its practical applicability. The theoretical framework presented in this work offers a principled foundation for advancing score estimation methods in generative modelling, enabling the design of new sampling algorithms for complex probability distributions. Our results can be extended to broader classes of stochastic differential equations, opening new directions for the development of score-based diffusion generative models.
Article 1115
Title@2025-07-11 (5): Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift
Title: Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift | Auf dem Weg zu kollaborativer Fairness im Federated Learning under Imbalanced Covariate Shift | 实现在平衡的共变调整下实现联邦学习合作公平 2507.08617v1 |
Authors (7): Tianrun Yu, Jiaqi Wang, Haoyu Wang, Mingquan Lin, Han Liu, Nelson S. Yee, Fenglong Ma
Collaborative fairness is a crucial challenge in federated learning. However, existing approaches often overlook a practical yet complex form of heterogeneity: imbalanced covariate shift. We provide a theoretical analysis of this setting, which motivates the design of FedAKD (Federated Asynchronous Knowledge Distillation), a simple yet effective approach that balances accurate prediction with collaborative fairness. FedAKD consists of client and server updates. In the client update, we introduce a novel asynchronous knowledge distillation strategy based on our preliminary analysis, which reveals that while correctly predicted samples exhibit similar feature distributions across clients, incorrectly predicted samples show significant variability. This suggests that imbalanced covariate shift primarily arises from misclassified samples. Leveraging this insight, our approach first applies traditional knowledge distillation to update client models while keeping the global model fixed. Next, we select correctly predicted high-confidence samples and update the global model using these samples while keeping client models fixed. The server update simply aggregates all client models. We further provide a theoretical proof of FedAKD’s convergence. Experimental results on public datasets (FashionMNIST and CIFAR10) and a real-world Electronic Health Records (EHR) dataset demonstrate that FedAKD significantly improves collaborative fairness, enhances predictive accuracy, and fosters client participation even under highly heterogeneous data distributions.
Article 1116
Title@2025-07-11 (5): AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Title: AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs | AgentsNet: Koordination und kollaborative Reasoning in Multi-Agent LLMs | 网:多机构LLM中的协调与合作理由 2507.08616v1 |
Authors (5): Florian Grötschla, Luis Müller, Jan Tönshoff, Mikhail Galkin, Bryan Perozzi
Large language models (LLMs) have demonstrated powerful problem-solving capabilities, in particular when organized in multi-agent systems. However, the advent of such systems also raises several questions on the ability of a complex network of agents to effectively self-organize and collaborate. While measuring performance on standard reasoning benchmarks indicates how well multi-agent systems can solve reasoning tasks, it is unclear whether these systems are able to leverage their topology effectively. Here, we propose AgentsNet, a new benchmark for multi-agent reasoning. By drawing inspiration from classical problems in distributed systems and graph theory, AgentsNet measures the ability of multi-agent systems to collaboratively form strategies for problem-solving, self-organization, and effective communication given a network topology. We evaluate a variety of baseline methods on AgentsNet including homogeneous networks of agents which first have to agree on basic protocols for organization and communication. We find that some frontier LLMs already demonstrate strong performance on small networks, but their performance begins to fall off as the network grows. While existing multi-agent benchmarks cover at most 2-5 agents, AgentsNet is practically unlimited in size and can scale with new generations of LLMs. As such, we also probe frontier models in a setup with up to 100 agents.
Article 1117
Title@2025-07-11 (5): Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data
Title: Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Emergent Natural Language mit Kommunikationsspielen zur Verbesserung der Bildbeschriftung Fähigkeiten ohne zusätzliche Daten | 新兴自然语言与交流运动会:在没有额外数据的情况下提高图像能力交流运动会 2507.08610v1 |
Authors (2): Parag Dutta, Ambedkar Dukkipati
Image captioning is an important problem in developing various AI systems, and these tasks require large volumes of annotated images to train the models. Since all existing labelled datasets are already used for training the large Vision Language Models (VLMs), it becomes challenging to improve their performance further. Considering this, it is essential to consider the unsupervised image captioning performance, which remains relatively under-explored. To that end, we propose LoGIC (Lewis Communication Game for Image Captioning), a Multi-agent Reinforcement Learning game. The proposed method consists of two agents, a ‘speaker’ and a ‘listener’, with the objective of learning a strategy for communicating in natural language. We train agents in the cooperative common-reward setting using the GRPO algorithm and show that improvement in image captioning performance emerges as a consequence of the agents learning to play the game. We show that using pre-trained VLMs as the ‘speaker’ and a Large Language Model (LLM) for language understanding in the ‘listener’, we achieve a BLEU score of 46 after fine-tuning using LoGIC without additional labels, a 2-point absolute advantage over the BLEU score of 44 of the vanilla VLM. Additionally, we replace the VLM in the ‘speaker’ with lightweight components: (i) a ViT for image perception and (ii) a GPT2 for language generation, and train them from scratch using LoGIC, obtaining a BLEU score of 31 in the unsupervised setting, a 10-point advantage over existing unsupervised image-captioning methods.
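The abstract names GRPO as the training algorithm. As a hedged illustration, the core of GRPO as commonly described is a group-relative advantage: several samples are scored for the same input, and each reward is normalised by the group's mean and standard deviation. The sketch below shows only that normalisation step; the function name, the use of the population standard deviation, and the `eps` constant are assumptions for illustration, not details taken from the LoGIC paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of samples for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat their siblings get positive advantages and the rest negative.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std within the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards, e.g. listener success for three candidate captions.
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

When every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.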
Article 1118
Title@2025-07-11 (5): Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data
Title: Attribution assignment for deep-generative sequence models enables interpretability analysis using positive-only data | Zuordnungszuweisung für tiefgenerative Sequenzmodelle ermöglicht eine Interpretationsanalyse mit Positiv-Only-Daten | 深遗传序列模型的归属分配,使得能够使用只使用正数数据的可解释性分析 2506.23182v2 |
Authors (7): Robert Frank, Michael Widrich, Rahmad Akbar, Günter Klambauer, Geir Kjetil Sandve, Philippe A. Robert, Victor Greiff
Generative machine learning models offer a powerful framework for therapeutic design by efficiently exploring large spaces of biological sequences enriched for desirable properties. Unlike supervised learning methods, which require both positive and negative labeled data, generative models such as LSTMs can be trained solely on positively labeled sequences, for example, high-affinity antibodies. This is particularly advantageous in biological settings where negative data are scarce, unreliable, or biologically ill-defined. However, the lack of attribution methods for generative models has hindered the ability to extract interpretable biological insights from such models. To address this gap, we developed Generative Attribution Metric Analysis (GAMA), an attribution method for autoregressive generative models based on Integrated Gradients. We assessed GAMA using synthetic datasets with known ground truths to characterize its statistical behavior and validate its ability to recover biologically relevant features. We further demonstrated the utility of GAMA by applying it to experimental antibody-antigen binding data. GAMA enables model interpretability and the validation of generative sequence design strategies without the need for negative training data.
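GAMA is described as building on Integrated Gradients. The following is a minimal sketch of plain Integrated Gradients for a function with a known gradient, using a midpoint Riemann sum along the straight path from a baseline to the input. It is not GAMA: the toy function `f`, its gradient `grad_f`, and the all-zeros baseline are assumptions chosen so the result can be checked by hand.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * mean over alpha of
    d f / d x_i evaluated at baseline + alpha * (x - baseline).
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        for i, g in enumerate(grads):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def f(v):
    """Toy scalar function used only for illustration."""
    return v[0] ** 2 + 3.0 * v[1]


def grad_f(v):
    """Analytic gradient of the toy function f."""
    return [2.0 * v[0], 3.0]


if __name__ == "__main__":
    x, baseline = [2.0, 1.0], [0.0, 0.0]
    attributions = integrated_gradients(grad_f, x, baseline)
    # Completeness: attributions sum to f(x) - f(baseline).
    print(attributions, sum(attributions), f(x) - f(baseline))
```

The completeness property checked at the end is a convenient sanity test for any implementation, since it holds regardless of the model being attributed.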
Article 1119
Title@2025-07-11 (5): MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Title: MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs | MedSegFactory: Textgeführte Generation medizinischer Image-Mask-Paare | MedSegFactory: 以文本指导方式制作的医学图像图像-面面对称 2504.06897v2 |
Authors (8): Jiawei Mao, Yuhan Wang, Yucheng Tang, Daguang Xu, Kang Wang, Yang Yang, Zongwei Zhou, Yuyin Zhou
This paper presents MedSegFactory, a versatile medical synthesis framework that generates high-quality paired medical images and segmentation masks across modalities and tasks. It aims to serve as an unlimited data repository, supplying image-mask pairs to enhance existing segmentation tools. The core of MedSegFactory is a dual-stream diffusion model, where one stream synthesizes medical images and the other generates corresponding segmentation masks. To ensure precise alignment between image-mask pairs, we introduce Joint Cross-Attention (JCA), enabling a collaborative denoising paradigm by dynamic cross-conditioning between streams. This bidirectional interaction allows both representations to guide each other’s generation, enhancing consistency between generated pairs. MedSegFactory unlocks on-demand generation of paired medical images and segmentation masks through user-defined prompts that specify the target labels, imaging modalities, anatomical regions, and pathological conditions, facilitating scalable and high-quality data generation. This new paradigm of medical image synthesis enables seamless integration into diverse medical imaging workflows, enhancing both efficiency and accuracy. Extensive experiments show that MedSegFactory generates data of superior quality and usability, achieving competitive or state-of-the-art performance in 2D and 3D segmentation tasks while addressing data scarcity and regulatory constraints.
Article 1120
Title@2025-07-11 (5): Remote Sensing Reveals Adoption of Sustainable Rice Farming Practices Across Punjab, India
Title: Remote Sensing Reveals Adoption of Sustainable Rice Farming Practices Across Punjab, India | Fernerkundung offenbart Annahme nachhaltiger Rice Farming-Praktiken in Punjab, Indien | 在印度旁遮普省各地采用可持续的稻米耕作做法 2507.08605v1 |
Authors (9): Ando Shah, Rajveer Singh, Akram Zaytar, Girmaw Abebe Tadesse, Caleb Robinson, Negar Tafti, Stephen A. Wood, Rahul Dodhia, Juan M. Lavista Ferres
Rice cultivation consumes 24-30% of global freshwater, creating critical water management challenges in major rice-producing regions. Sustainable irrigation practices like direct seeded rice (DSR) and alternate wetting and drying (AWD) can reduce water use by 20-40% while maintaining yields, helping secure long-term agricultural productivity as water scarcity intensifies - a key component of the Zero Hunger Sustainable Development Goal. However, limited data on adoption rates of these practices prevents evidence-based policymaking and targeted resource allocation. We developed a novel remote sensing framework to monitor sustainable water management practices at scale in Punjab, India - a region facing severe groundwater depletion of 41.6 cm/year. To collect essential ground truth data, we partnered with the Nature Conservancy’s Promoting Regenerative and No-burn Agriculture (PRANA) program, which trained approximately 1,400 farmers on water-saving techniques while documenting their field-level practices. Using this data, we created a classification system with Sentinel-1 satellite imagery that separates water management along sowing and irrigation dimensions. Our approach achieved a 78% F1-score in distinguishing DSR from traditional puddled transplanted rice without requiring prior knowledge of planting dates. We demonstrated scalability by mapping DSR adoption across approximately 3 million agricultural plots in Punjab, with district-level predictions showing strong correlation (Pearson = 0.77, RBO = 0.77) with government records. This study provides policymakers with a powerful tool to track sustainable water management adoption, target interventions, and measure program impacts at scale.
Article 1121
Title@2025-07-11 (5): ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection
Title: ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection | ADAPT: Ein Pseudo-Labeling-Ansatz zur Bekämpfung von Konzept Drift bei Malware-Erkennung | ADAPT: 一种以优多为标签的方法,以对抗马利软件探测中的漂流概念 2507.08597v1 |
Authors (3): Md Tanvirul Alam, Aritran Piplai, Nidhi Rastogi
Machine learning models are commonly used for malware classification; however, they suffer from performance degradation over time due to concept drift. Adapting these models to changing data distributions requires frequent updates, which rely on costly ground truth annotations. While active learning can reduce the annotation burden, leveraging unlabeled data through semi-supervised learning remains a relatively underexplored approach in the context of malware detection. In this research, we introduce ADAPT, a novel pseudo-labeling semi-supervised algorithm for addressing concept drift. Our model-agnostic method can be applied to various machine learning models, including neural networks and tree-based algorithms. We conduct extensive experiments on five diverse malware detection datasets spanning Android, Windows, and PDF domains. The results demonstrate that our method consistently outperforms baseline models and competitive benchmarks. This work paves the way for more effective adaptation of machine learning models to concept drift in malware detection.
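The abstract does not spell out how ADAPT assigns pseudo-labels, so the sketch below shows only the generic confidence-thresholded form of pseudo-labeling that semi-supervised methods commonly start from. The function name `select_pseudo_labels`, the 0.9 threshold, and the example probabilities are all hypothetical and should not be read as the ADAPT algorithm.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Pick unlabeled samples whose top predicted class is confident enough.

    `probabilities` is a list of per-sample class-probability lists produced
    by any classifier. Returns (sample_index, predicted_class) pairs for the
    samples whose maximum probability is at least `threshold`.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


if __name__ == "__main__":
    # Hypothetical benign/malicious probabilities for three unlabeled samples.
    unlabeled_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
    print(select_pseudo_labels(unlabeled_probs))
```

The selected pairs would then be added to the training set for the next model update, while low-confidence samples stay unlabeled.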
Article 1122
Title@2025-07-11 (5): The Engineer’s Dilemma: A Review of Establishing a Legal Framework for Integrating Machine Learning in Construction by Navigating Precedents and Industry Expectations
Title: The Engineer’s Dilemma: A Review of Establishing a Legal Framework for Integrating Machine Learning in Construction by Navigating Precedents and Industry Expectations | Das Dilemma des Ingenieurs: Eine Überprüfung der Schaffung eines rechtlichen Rahmens für die Integration von maschinellem Lernen in den Bau durch Navigieren von Vor- und Industrieerwartungen | 工程师的难题:审查建立一个法律框架,通过控制先例和工业预期,将机械学习纳入建筑的法律框架 2507.08908v1 |
Authors (1): M. Z. Naser
Despite the widespread interest in machine learning (ML), the engineering industry has not yet fully adopted ML-based methods, which has left engineers and stakeholders uncertain about the legal and regulatory frameworks that govern their decisions. This gap remains unaddressed as an engineer’s decision-making process, typically governed by professional ethics and practical guidelines, now intersects with complex algorithmic outputs. To bridge this gap, this paper explores how engineers can navigate legal principles and legislative justifications that support and/or contest the deployment of ML technologies. Drawing on recent precedents and experiences gained from other fields, this paper argues that analogical reasoning can provide a basis for embedding ML within existing engineering codes while maintaining professional accountability and meeting safety requirements. In exploring these issues, the discussion focuses on established liability doctrines, such as negligence and product liability, and highlights how courts have evaluated the use of predictive models. We further analyze how legislative bodies and standard-setting organizations can furnish explicit guidance equivalent to prior endorsements of emergent technologies. This exploration stresses the vital importance of understanding the interplay between technical justifications and legal precedents for shaping an informed stance on ML’s legitimacy in engineering practice. Finally, our analysis catalyzes a legal framework for integrating ML through which stakeholders can critically assess the responsibilities, liabilities, and benefits inherent in ML-driven engineering solutions.
Article 1123
Title@2025-07-11 (5): On the Gaussian process limit of Bayesian Additive Regression Trees
Title: On the Gaussian process limit of Bayesian Additive Regression Trees | Auf der Gaußschen Prozessgrenze von Bayesian Additive Regression Trees | Bayesian Additive 倒退树的高斯进程极限 2410.20289v2 |
Authors (1): Giacomo Petrillo
Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique of rising fame. It is a sum-of-decision-trees model, and is in some sense the Bayesian version of boosting. In the limit of infinite trees, it becomes equivalent to Gaussian process (GP) regression. This limit is known but has not yet led to any useful analysis or application. For the first time, I derive and compute the exact BART prior covariance function. With it I implement the infinite trees limit of BART as GP regression. Through empirical tests, I show that this limit is worse than standard BART in a fixed configuration, but also that tuning its hyperparameters in the natural GP way makes it competitive with BART. The advantage of using a GP surrogate of BART is the analytical likelihood, which simplifies model building and sidesteps the complex BART MCMC algorithm. More generally, this study opens new ways to understand and develop BART and GP regression. The implementation of BART as GP is available in the Python package lsqfitgp.
Article 1124
Title@2025-07-11 (5): What should a neuron aim for? Designing local objective functions based on information theory
Title: What should a neuron aim for? Designing local objective functions based on information theory | Was sollte ein Neuron anstreben? Auf der Grundlage der Informationstheorie lokale objektive Funktionen entwerfen | 神经神经元的目标应该是什么?根据信息理论设计当地客观功能 2412.02482v4 |
Authors (7): Andreas C. Schneider, Valentin Neuhaus, David A. Ehrlich, Abdullah Makkeh, Alexander S. Ecker, Viola Priesemann, Michael Wibral
In modern deep neural networks, the learning dynamics of the individual neurons is often obscure, as the networks are trained via global optimization. Conversely, biological systems build on self-organized, local learning, achieving robustness and efficiency with limited global information. We here show how self-organization between individual artificial neurons can be achieved by designing abstract bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition (PID), which decomposes the information that a set of information sources holds about an outcome into unique, redundant and synergistic contributions. Our framework enables neurons to locally shape the integration of information from various input classes, i.e. feedforward, feedback, and lateral, by selecting which of the three inputs should contribute uniquely, redundantly or synergistically to the output. This selection is expressed as a weighted sum of PID terms, which, for a given problem, can be directly derived from intuitive reasoning or via numerical optimization, offering a window into understanding task-relevant local information processing. Achieving neuron-level interpretability while enabling strong performance using local learning, our work advances a principled information-theoretic foundation for local learning strategies.
Article 1125
Title@2025-07-11 (5): AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling
Title: AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | AbbIE: Autoregressiver Blockbasierter iterativer Encoder für effiziente Sequenzmodellierung | BBIE: 高效序列建模自动递减区块迭代计算器 2507.08567v1 |
Authors (10): Preslav Aleksandrov, Meghdad Kurmanji, Fernando Garcia Redondo, David O’Shea, William Shen, Alex Iacob, Lorenzo Sani, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane
We introduce the Autoregressive Block-Based Iterative Encoder (AbbIE), a novel recursive generalization of the encoder-only Transformer architecture, which achieves better perplexity than a standard Transformer and allows for the dynamic scaling of compute resources at test time. This simple, recursive approach is a complement to scaling large language model (LLM) performance through parameter and token counts. AbbIE performs its iterations in latent space, but unlike latent reasoning models, does not require a specialized dataset or training protocol. We show that AbbIE upward generalizes (ability to generalize to arbitrary iteration lengths) at test time by only using 2 iterations during train time, far outperforming alternative iterative methods. AbbIE’s ability to scale its computational expenditure based on the complexity of the task gives it an up to 12% improvement in zero-shot in-context learning tasks versus other iterative and standard methods and up to 5% improvement in language perplexity. The results from this study open a new avenue to Transformer performance scaling. We perform all of our evaluations on model sizes up to 350M parameters.
Article 1126
Title@2025-07-11 (5): Data-driven system identification using quadratic embeddings of nonlinear dynamics
Title: Data-driven system identification using quadratic embeddings of nonlinear dynamics | Datengesteuerte Systemidentifikation mittels quadratischer Einbettungen nichtlinearer Dynamik | 利用非线性动态的二次嵌入进行数据驱动系统识别 2501.08202v2 |
Authors (2): Stefan Klus, Joel-Pascal Ntwali N’konzi
We propose a novel data-driven method called QENDy (Quadratic Embedding of Nonlinear Dynamics) that not only allows us to learn quadratic representations of highly nonlinear dynamical systems, but also to identify the governing equations. The approach is based on an embedding of the system into a higher-dimensional feature space in which the dynamics become quadratic. Just like SINDy (Sparse Identification of Nonlinear Dynamics), our method requires trajectory data, time derivatives for the training data points (which can also be estimated using finite difference approximations), and a set of preselected basis functions, called a dictionary. We illustrate the efficacy and accuracy of QENDy with the aid of various benchmark problems and compare its performance with SINDy and a deep learning method for identifying quadratic embeddings. Furthermore, we analyze the convergence of QENDy and SINDy in the infinite data limit, highlight their similarities and main differences, and compare the quadratic embedding with linearization techniques based on the Koopman operator.
Article 1127
Title@2025-07-11 (5): LITE: Efficiently Estimating Gaussian Probability of Maximality
Title: LITE: Efficiently Estimating Gaussian Probability of Maximality | LITE: Effiziente Bewertung der Gaußschen Wahrscheinlichkeit von Maximalität | LITE:有效估计高斯人最大化的概率 2501.13535v3 |
Authors (4): Nicolas Menet, Jonas Hübotter, Parnian Kassraie, Andreas Krause
We consider the problem of computing the probability of maximality (PoM) of a Gaussian random vector, i.e., the probability for each dimension to be maximal. This is a key challenge in applications ranging from Bayesian optimization to reinforcement learning, where the PoM not only helps with finding an optimal action, but yields a fine-grained analysis of the action domain, crucial in tasks such as drug discovery. Existing techniques are costly, scaling polynomially in computation and memory with the vector size. We introduce LITE, the first approach for estimating Gaussian PoM with almost-linear time and memory complexity. LITE achieves SOTA accuracy on a number of tasks, while being in practice several orders of magnitude faster than the baselines. This also translates to a better performance on downstream tasks such as entropy estimation and optimal control of bandits. Theoretically, we cast LITE as entropy-regularized UCB and connect it to prior PoM estimators.
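For orientation, the quantity in question can be estimated by plain Monte Carlo sampling. The sketch below is a naive baseline, not the LITE estimator from the abstract. It assumes independent components for brevity (a general Gaussian vector would need correlated draws), and the function name `pom_monte_carlo` is made up for illustration.

```python
import random

def pom_monte_carlo(mean, std, n_samples=20000, seed=0):
    """Estimate P(X_i is the largest entry) for independent X_i ~ N(mean[i], std[i]^2).

    Naive sampling baseline; independence is an assumption of this sketch.
    """
    rng = random.Random(seed)
    counts = [0] * len(mean)
    for _ in range(n_samples):
        draw = [rng.gauss(m, s) for m, s in zip(mean, std)]
        winner = max(range(len(draw)), key=draw.__getitem__)
        counts[winner] += 1
    return [c / n_samples for c in counts]

# Three candidate actions with increasing mean and equal spread.
pom = pom_monte_carlo([0.0, 0.5, 1.0], [1.0, 1.0, 1.0])
```

Each sample costs one pass over the vector, so the accuracy of small probabilities is limited by the number of samples, which is the kind of cost the abstract's method aims to avoid.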
Article 1128
Title@2025-07-11 (5): GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction
Title: GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction | GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction | GNN-ALLP:基于模拟电路链接预测的图表神经网络 2504.10240v3 |
Authors (9): Guanyuan Pan, Tiansheng Zhou, Bingtao Ma, Yaqi Wang, Jianxiang Zhao, Zhi Li, Yugui Lin, Pietro Lio, Shuai Wang
Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in automating analog circuit design. However, existing methods face three main challenges: 1) Insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) Data scarcity due to the complexity of annotations hinders model generalization; 3) Limited adaptability to various netlist formats. We propose GNN-ACLP, a Graph Neural Networks (GNNs) based framework featuring three innovations to tackle these challenges. First, we introduce the SEAL (Subgraphs, Embeddings, and Attributes for Link Prediction) framework and achieve port-level accuracy in circuit link prediction. Second, we propose Netlist Babel Fish, a netlist format conversion tool leveraging retrieval-augmented generation (RAG) with a large language model (LLM) to enhance the compatibility of netlist formats. Finally, we construct SpiceNetlist, a comprehensive dataset that contains 775 annotated circuits across 10 different component classes. Experiments demonstrate accuracy improvements of 16.08% on SpiceNetlist, 11.38% on Image2Net, and 16.01% on Masala-CHAI compared to the baseline in intra-dataset evaluation, while maintaining accuracy from 92.05% to 99.07% in cross-dataset evaluation, exhibiting robust feature transfer capabilities.
Article 1129
Title@2025-07-11 (5): Leveraging priors on distribution functions for multi-arm bandits
Title: Leveraging priors on distribution functions for multi-arm bandits | Nutzung von Vorabinformationen über Verteilungsfunktionen für Mehrarmbanditen | 利用多武器强盗分配功能的前身 2503.04518v2 |
Authors (2): Sumit Vashishtha, Odalric-Ambrym Maillard
We introduce Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-arm bandits based on Dirichlet Process (DP) priors. Like Thompson-sampling, DPPS is a probability-matching algorithm, i.e., it plays an arm based on its posterior-probability of being optimal. Instead of assuming a parametric class for the reward generating distribution of each arm, and then putting a prior on the parameters, in DPPS the reward generating distribution is directly modeled using DP priors. DPPS provides a principled approach to incorporate prior belief about the bandit environment, and in the noninformative limit of the DP posteriors (i.e. Bayesian Bootstrap), we recover Non Parametric Thompson Sampling (NPTS), a popular non-parametric bandit algorithm, as a special case of DPPS. We employ stick-breaking representation of the DP priors, and show excellent empirical performance of DPPS in challenging synthetic and real world bandit environments. Finally, using an information-theoretic analysis, we show non-asymptotic optimality of DPPS in the Bayesian regret setup.
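The noninformative limit mentioned in the abstract, the Bayesian bootstrap, can be sketched in a few lines: each arm's observed rewards are reweighted with flat Dirichlet weights, and the arm with the largest resampled mean is played. This is only an illustration of that limit, not the DPPS algorithm itself. Published non-parametric variants may add further terms (for example an optimistic pseudo-reward), which are omitted here, and the helper names are hypothetical.

```python
import random

def bayesian_bootstrap_mean(rewards, rng):
    """Reweight observed rewards with Dirichlet(1, ..., 1) weights and return the weighted mean."""
    # Normalised Gamma(1, 1) draws are distributed as a flat Dirichlet vector.
    weights = [rng.gammavariate(1.0, 1.0) for _ in rewards]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, rewards)) / total

def choose_arm(history, rng):
    """Probability matching: play the arm whose resampled mean is largest."""
    scores = [bayesian_bootstrap_mean(rewards, rng) for rewards in history]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
history = [[0.1, 0.2, 0.0], [0.9, 0.8, 1.0]]  # observed rewards per arm (made-up data)
arm = choose_arm(history, rng)
```

Because the resampled mean is a convex combination of observed rewards, an arm whose rewards all exceed another arm's is always preferred, as in the example data above.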
Article 1130
Title@2025-07-11 (5): SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2
Title: SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2 | SAM2RL: Auf dem Weg zu einer verstärkten Gedächtnissteuerung im Segment Anything Modell 2 | SAM2RL: 争取加强第2部分 “ 任何内容 “ 模式中的学习记忆控制 2507.08548v1 |
Authors (5): Alen Adamyan, Tomáš Čížek, Matej Straka, Klara Janouskova, Martin Schmid
Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks and has become the state-of-the-art for visual object tracking. The model stores information from previous frames in a memory bank, enabling temporal consistency across video sequences. Recent methods augment SAM 2 with hand-crafted update rules to better handle distractors, occlusions, and object motion. We propose a fundamentally different approach using reinforcement learning for optimizing memory updates in SAM 2 by framing memory control as a sequential decision-making problem. In an overfitting setup with a separate agent per video, our method achieves a relative improvement over SAM 2 that is more than three times the gains of existing heuristics. These results reveal the untapped potential of the memory bank and highlight reinforcement learning as a powerful alternative to hand-crafted update rules for memory control in visual object tracking.
Article 1131
Title@2025-07-11 (5): Quantum Algorithms for Projection-Free Sparse Convex Optimization
Title: Quantum Algorithms for Projection-Free Sparse Convex Optimization | Quantenalgorithmen für projektionsfreie Sparse Convex-Optimierung | 用于无投射无孔式微粒电解优化的量图量算法 2507.08543v1 |
Authors (2): Jianhao He, John C. S. Lui
This paper considers the projection-free sparse convex optimization problem for the vector domain and the matrix domain, which covers a large number of important applications in machine learning and data science. For the vector domain $\mathcal{D} \subset \mathbb{R}^d$, we propose two quantum algorithms for sparse constraints that find an $\varepsilon$-optimal solution with query complexities of $O(\sqrt{d}/\varepsilon)$ and $O(1/\varepsilon)$ using the function value oracle, improving on the best classical algorithm by factors of $O(\sqrt{d})$ and $O(d)$, respectively, where $d$ is the dimension. For the matrix domain $\mathcal{D} \subset \mathbb{R}^{d\times d}$, we propose two quantum algorithms for nuclear norm constraints that improve the time complexity to $\tilde{O}(rd/\varepsilon^2)$ and $\tilde{O}(\sqrt{r}d/\varepsilon^3)$ for computing the update step, an improvement of at least a factor of $O(\sqrt{d})$ over the best classical algorithm, where $r$ is the rank of the gradient matrix. Our algorithms show quantum advantages in projection-free sparse convex optimization problems as they outperform the optimal classical methods in dependence on the dimension $d$.
Article 1132
Title@2025-07-11 (5): CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes
Title: CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes | CircFormerMoE: Ein durchgängiges Deep-Learning-Framework für die kreisförmige RNA-Splice-Site-Erkennung und -Pairing in Pflanzengenomen | Circ FormerMoE: 植物基因组中循环RNA Spolice Spolice Spolice 站点探测和配对的尾端至尾端深层学习框架 2507.08542v1 |
Authors (1): Tianyou Jiang
Circular RNAs (circRNAs) are important components of the non-coding RNA regulatory network. Previous circRNA identification primarily relies on high-throughput RNA sequencing (RNA-seq) data combined with alignment-based algorithms that detect back-splicing signals. However, these methods face several limitations: they cannot predict circRNAs directly from genomic DNA sequences and rely heavily on RNA experimental data; they involve high computational costs due to complex alignment and filtering steps; and they are inefficient for large-scale or genome-wide circRNA prediction. The challenge is even greater in plants, where plant circRNA splice sites often lack the canonical GT-AG motif seen in human mRNA splicing, and no efficient deep learning model with strong generalization capability currently exists. Furthermore, the number of currently identified plant circRNAs is likely far lower than their true abundance. In this paper, we propose a deep learning framework named CircFormerMoE based on transformers and mixture-of-experts for predicting circRNAs directly from plant genomic DNA. Our framework consists of two subtasks known as splicing site detection (SSD) and splicing site pairing (SSP). The model’s effectiveness has been validated on gene data of 10 plant species. Trained on known circRNA instances, it is also capable of discovering previously unannotated circRNAs. In addition, we performed interpretability analyses on the trained model to investigate the sequence patterns contributing to its predictions. Our framework provides a fast and accurate computational method and tool for large-scale circRNA discovery in plants, laying a foundation for future research in plant functional genomics and non-coding RNA annotation.
Article 1133
Title@2025-07-11 (5): Recursive Reward Aggregation
Title: Recursive Reward Aggregation | Rekursive Prämienaggregation | 递递回报聚合 2507.08537v1 |
Authors (6): Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, Masashi Sugiyama
In reinforcement learning (RL), aligning agent behavior with specific objectives typically requires careful design of the reward function, which can be challenging when the desired objectives are complex. In this work, we propose an alternative approach for flexible behavior alignment that eliminates the need to modify the reward function by selecting appropriate reward aggregation functions. By introducing an algebraic perspective on Markov decision processes (MDPs), we show that the Bellman equations naturally emerge from the recursive generation and aggregation of rewards, allowing for the generalization of the standard discounted sum to other recursive aggregations, such as discounted max and Sharpe ratio. Our approach applies to both deterministic and stochastic settings and integrates seamlessly with value-based and actor-critic algorithms. Experimental results demonstrate that our approach effectively optimizes diverse objectives, highlighting its versatility and potential for real-world applications.
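The recursive view in the abstract can be illustrated with a backward fold over a reward sequence: the discounted sum and a discounted max differ only in the one-step update. This is a minimal sketch under one plausible reading of "discounted max" (`max(r, gamma * v)`); the paper's exact definitions may differ, and the function name `aggregate` is made up for illustration.

```python
def aggregate(rewards, step, init):
    """Fold rewards from the end of the trajectory: v_t = step(r_t, v_{t+1})."""
    value = init
    for r in reversed(rewards):
        value = step(r, value)
    return value

gamma = 0.5
rewards = [1.0, 4.0, 2.0]

# Standard discounted sum: v_t = r_t + gamma * v_{t+1}
disc_sum = aggregate(rewards, lambda r, v: r + gamma * v, 0.0)

# Assumed discounted max: v_t = max(r_t, gamma * v_{t+1})
disc_max = aggregate(rewards, lambda r, v: max(r, gamma * v), float("-inf"))

print(disc_sum, disc_max)  # 3.5 2.0
```

Swapping the `step` function changes the objective while the reward sequence itself stays untouched, which is the flexibility the abstract describes.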
Article 1134
Title@2025-07-11 (5): Multiaccuracy and Multicalibration via Proxy Groups
Title: Multiaccuracy and Multicalibration via Proxy Groups | Multiakkuratität und Multikalibrierung über Proxy-Gruppen | 通过代理集团实现多准确度和多校准 2503.02870v3 |
Authors (4): Beepul Bharti, Mary Versa Clemens-Sewall, Paul H. Yi, Jeremias Sulam
As the use of predictive machine learning algorithms increases in high-stakes decision-making, it is imperative that these algorithms are fair across sensitive groups. However, measuring and enforcing fairness in real-world applications can be challenging due to missing or incomplete sensitive group information. Proxy-sensitive attributes have been proposed as a practical and effective solution in these settings, but only for parity-based fairness notions. How to evaluate and control for fairness with missing sensitive group data for newer, different, and more flexible frameworks, such as multiaccuracy and multicalibration, remains unexplored. In this work, we address this gap by demonstrating that in the absence of sensitive group data, proxy-sensitive attributes can provably be used to derive actionable upper bounds on the true multiaccuracy and multicalibration violations, providing insights into a predictive model’s potential worst-case fairness violations. Additionally, we show that adjusting models to satisfy multiaccuracy and multicalibration across proxy-sensitive attributes can significantly mitigate these violations for the true, but unknown, sensitive groups. Through several experiments on real-world datasets, we illustrate that approximate multiaccuracy and multicalibration can be achieved even when sensitive group data is incomplete or unavailable.
Article 1135
Title@2025-07-11 (5): Binary and Ternary Quantization Can Enhance Feature Discrimination
Title: Binary and Ternary Quantization Can Enhance Feature Discrimination | Binäre und Ternäre Quantisierung kann Feature-Diskriminierung verbessern | 二进制和三进制量化能够增强特征歧视 2504.13792v2 |
Authors (3): Weizhi Lu, Mingrui Chen, Weiyu Li
Quantization is widely applied in machine learning to reduce computational and storage costs for both data and models. Considering that classification tasks are fundamental to the field, it is crucial to investigate how quantization impacts classification performance. Traditional research has focused on quantization errors, assuming that larger errors generally lead to lower classification accuracy. However, this assumption lacks a solid theoretical foundation and often contradicts empirical observations. For example, despite introducing significant errors, $\{0,1\}$-binary and $\{0, \pm1\}$-ternary quantized data have sometimes achieved classification accuracy comparable or even superior to full-precision data. To reasonably explain this phenomenon, a more accurate evaluation of classification performance is required. To achieve this, we propose a direct analysis of the feature discrimination of quantized data, instead of focusing on quantization errors. Our analysis reveals that both binary and ternary quantization can potentially enhance, rather than degrade, the feature discrimination of the original data. This finding is supported by classification experiments conducted on both synthetic and real data.
Article 1136
Title@2025-07-11 (5): Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures
Title: Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures | Gemeinschaften im Kuramoto-Modell: Dynamik und Erkennung über Pfadsignaturen | 仓本模式中的社区:动态和通过路径签名探测 2503.17546v3 |
Authors (3): Tâm Johan Nguyên, Darrick Lee, Bernadette Jana Stolz
The behavior of multivariate dynamical processes is often governed by underlying structural connections that relate the components of the system. For example, brain activity, which is often measured via time series, is determined by an underlying structural graph, where nodes represent neurons or brain regions and edges represent cortical connectivity. Existing methods for inferring structural connections from observed dynamics, such as correlation-based or spectral techniques, may fail to fully capture complex relationships in high-dimensional time series in an interpretable way. Here, we propose the use of path signatures, a mathematical framework that encodes geometric and temporal properties of continuous paths, to address this problem. Path signatures provide a reparametrization-invariant characterization of dynamical data and can be used to compute the lead matrix, which reveals lead-lag phenomena. We showcase our approach on time series from coupled oscillators in the Kuramoto model defined on a stochastic block model graph, termed the Kuramoto Stochastic Block Model (KSBM). Using mean-field theory and Gaussian approximations, we analytically derive reduced models of KSBM dynamics in different temporal regimes and theoretically characterize the lead matrix in these settings. Leveraging these insights, we propose a novel signature-based community detection algorithm, achieving exact recovery of structural communities from observed time series in multiple KSBM instances. We also explored the performance of our community detection on a stochastic variant of the KSBM as well as on real neuropixels of cortical recordings to demonstrate applicability on real-world data. Our results demonstrate that path signatures provide a novel perspective on analyzing complex neural data and other high-dimensional systems, explicitly exploiting temporal functional relationships to infer underlying structure.
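As a rough illustration of the lead matrix mentioned above, the sketch below computes the antisymmetric part of the second signature level, A_ij = 1/2 ∫ (x_i dx_j − x_j dx_i), for a piecewise-linear path. The sign convention (positive A_ij read as channel i leading channel j) is an assumption of this sketch, conventions vary across the literature, and the function name `lead_matrix` is made up. It is not the paper's community detection algorithm.

```python
import math

def lead_matrix(path):
    """Signed-area (lead) matrix of a piecewise-linear path given as a list of points."""
    dim = len(path[0])
    start = path[0]
    A = [[0.0] * dim for _ in range(dim)]
    for p, q in zip(path, path[1:]):
        for i in range(dim):
            for j in range(dim):
                # Coordinates are measured relative to the starting point.
                xi, xj = p[i] - start[i], p[j] - start[j]
                # Exact contribution of one straight segment to 1/2 * (x_i dx_j - x_j dx_i).
                A[i][j] += 0.5 * (xi * (q[j] - p[j]) - xj * (q[i] - p[i]))
    return A

# One period of (cos t, sin t): the first channel runs a quarter period ahead of the second.
n = 1000
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n + 1)]
A = lead_matrix(circle)
```

For this closed loop the off-diagonal entry equals the enclosed signed area, which is close to π, and the matrix is antisymmetric by construction.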
Article 1137
Title@2025-07-11 (5): REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives
Title: REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives | REGEN: Ein Datensatz und Benchmarks mit natürlichen Sprachkritiken und Erzählungen | REGEN: 一套具有自然语种背景和叙述的数据集和基准 2503.11924v2 |
Authors (11): Kun Su, Krishna Sayana, Hubert Pham, James Pine, Yuri Vasilevski, Raghavendra Vasudeva, Marialena Kyriakidi, Liam Hebert, Ambarish Jash, Anushya Subbiah, Sukhdeep Sodhi
This paper introduces a novel dataset REGEN (Reviews Enhanced with GEnerative Narratives), designed to benchmark the conversational capabilities of recommender Large Language Models (LLMs), addressing the limitations of existing datasets that primarily focus on sequential item prediction. REGEN extends the Amazon Product Reviews dataset by inpainting two key natural language features: (1) user critiques, representing user “steering” queries that lead to the selection of a subsequent item, and (2) narratives, rich textual outputs associated with each recommended item taking into account prior context. The narratives include product endorsements, purchase explanations, and summaries of user preferences. Further, we establish an end-to-end modeling benchmark for the task of conversational recommendation, where models are trained to generate both recommendations and corresponding narratives conditioned on user history (items and critiques). For this joint task, we introduce a modeling framework LUMEN (LLM-based Unified Multi-task Model with Critiques, Recommendations, and Narratives) which uses an LLM as a backbone for critiquing, retrieval and generation. We also evaluate the dataset’s quality using standard auto-rating techniques and benchmark it by training both traditional and LLM-based recommender models. Our results demonstrate that incorporating critiques enhances recommendation quality by enabling the recommender to learn language understanding and integrate it with recommendation signals. Furthermore, LLMs trained on our dataset effectively generate both recommendations and contextual narratives, achieving performance comparable to state-of-the-art recommenders and language models.
Article 1138
Title@2025-07-11 (5): Data Depth as a Risk
Title: Data Depth as a Risk | Datentiefe als Risiko | 数据深度作为风险 2507.08518v1 |
Authors (2): Arturo Castellanos, Pavlo Mozharovskyi
Data depths are score functions that quantify, in an unsupervised fashion, how central a point is within a distribution, with numerous applications such as anomaly detection and multivariate or functional data analysis arising across various fields. The halfspace depth was the first depth to aim at generalising the notion of quantile beyond the univariate case. Among the existing variety of depth definitions, it remains one of the most used notions of data depth. Taking a different angle from the quantile point of view, we show that the halfspace depth can also be regarded as the minimum loss of a set of classifiers for a specific labelling of the points. By changing the loss or the set of classifiers considered, this new angle naturally leads to a family of “loss depths”, extending to well-studied classifiers such as, e.g., SVM or logistic regression, among others. This framework directly inherits computational efficiency of existing machine learning algorithms as well as their fast statistical convergence rates, and opens the data depth realm to the high-dimensional setting. Furthermore, the new loss depths highlight a connection between the dataset and the right amount of complexity or simplicity of the classifiers. The simplicity of classifiers as well as the interpretation as a risk makes our new kind of data depth easy to explain, yet efficient for anomaly detection, as is shown by experiments.
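To make the starting point concrete, the classical halfspace (Tukey) depth of a point is the smallest fraction of the sample contained in any closed halfspace whose boundary passes through that point. The sketch below approximates it in two dimensions by scanning a fixed grid of directions. It illustrates the classical definition only, not the loss depths proposed in the abstract, and the function name and the direction grid are assumptions of this sketch.

```python
import math

def halfspace_depth_2d(x, points, n_directions=360):
    """Approximate halfspace depth of x: minimum, over scanned directions u, of the
    fraction of sample points p with <u, p - x> >= 0."""
    best = 1.0
    for k in range(n_directions):
        angle = 2 * math.pi * k / n_directions
        ux, uy = math.cos(angle), math.sin(angle)
        inside = sum(1 for p in points if (p[0] - x[0]) * ux + (p[1] - x[1]) * uy >= 0)
        best = min(best, inside / len(points))
    return best

# Four corners of a square plus its centre (made-up data).
sample = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
depth_centre = halfspace_depth_2d((0, 0), sample)
depth_corner = halfspace_depth_2d((1, 1), sample)
depth_outlier = halfspace_depth_2d((5, 5), sample)
```

Central points receive a high depth and points outside the data cloud a depth of zero, which is what makes the score usable for anomaly detection.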
Article 1139
Title@2025-07-11 (5): SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation
Title: SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation | SFedKD: Sequentielles Föderales Lernen mit Diskrepanz-Bewusst-Multi-Lehrer-Wissensdestillation | SFedKD: 分级的联邦学习与差异-软件软件多教学员知识蒸馏 2507.08508v1 |
Authors (6): Haotian Xu, Jinrui Zhou, Xichong Zhang, Mingjun Xiao, He Sun, Yin Xu
Federated Learning (FL) is a distributed machine learning paradigm which coordinates multiple clients to collaboratively train a global model via a central server. Sequential Federated Learning (SFL) is a newly-emerging FL training framework where the global model is trained in a sequential manner across clients. Since SFL can provide strong convergence guarantees under data heterogeneity, it has attracted significant research attention in recent years. However, experiments show that SFL suffers from severe catastrophic forgetting in heterogeneous environments, meaning that the model tends to forget knowledge learned from previous clients. To address this issue, we propose an SFL framework with discrepancy-aware multi-teacher knowledge distillation, called SFedKD, which selects multiple models from the previous round to guide the current round of training. In SFedKD, we extend the single-teacher Decoupled Knowledge Distillation approach to our multi-teacher setting and assign distinct weights to teachers’ target-class and non-target-class knowledge based on the class distributional discrepancy between teacher and student data. Through this fine-grained weighting strategy, SFedKD can enhance model training efficacy while mitigating catastrophic forgetting. Additionally, to prevent knowledge dilution, we eliminate redundant teachers for the knowledge distillation and formalize it as a variant of the maximum coverage problem. Based on the greedy strategy, we design a complementary-based teacher selection mechanism to ensure that the selected teachers achieve comprehensive knowledge space coverage while reducing communication and computational costs. Extensive experiments show that SFedKD effectively overcomes catastrophic forgetting in SFL and outperforms state-of-the-art FL methods.
Article 1140
Title@2025-07-11 (5): Physics-informed machine learning: A mathematical framework with applications to time series forecasting
Title: Physics-informed machine learning: A mathematical framework with applications to time series forecasting | Physik-informiertes maschinelles Lernen: Ein mathematisches Rahmenwerk mit Anwendungen zur Zeitreihenvorhersage | 物理知情机机学习:一个数学框架,可应用于时间序列预测 2507.08906v1 |
Authors (1): Nathan Doumèche
Physics-informed machine learning (PIML) is an emerging framework that integrates physical knowledge into machine learning models. This physical prior often takes the form of a partial differential equation (PDE) system that the regression function must satisfy. In the first part of this dissertation, we analyze the statistical properties of PIML methods. In particular, we study the properties of physics-informed neural networks (PINNs) in terms of approximation, consistency, overfitting, and convergence. We then show how PIML problems can be framed as kernel methods, making it possible to apply the tools of kernel ridge regression to better understand their behavior. In addition, we use this kernel formulation to develop novel physics-informed algorithms and implement them efficiently on GPUs. The second part explores industrial applications in forecasting energy signals during atypical periods. We present results from the Smarter Mobility challenge on electric vehicle charging occupancy and examine the impact of mobility on electricity demand. Finally, we introduce a physics-constrained framework for designing and enforcing constraints in time series, applying it to load forecasting and tourism forecasting in various countries.
Article 1141
Title@2025-07-11 (5): One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
Title: One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning | One-Pass to Reason: Token-Duplikation und Block-Spar-Maske für effizientes Feintuning auf Multi-Turn-Reasoning | 单向理由:在多向理由上高效精美调整的相重复和块分割掩码 2504.18246v2 |
Authors (3): Ritesh Goru, Shanay Mehta, Prateek Jain
Fine-tuning Large Language Models (LLMs) on multi-turn reasoning datasets requires N (number of turns) separate forward passes per conversation due to reasoning token visibility constraints, as reasoning tokens for a turn are discarded in subsequent turns. We propose duplicating response tokens along with a custom attention mask to enable single-pass processing of entire conversations. We prove our method produces identical losses to the N-pass approach while reducing time complexity from $O\bigl(N^{3}\bigr)$ to $O\bigl(N^{2}\bigr)$ and maintaining the same memory complexity for a transformer-based model. Our approach achieves significant training speedup while preserving accuracy. Our implementation is available online (https://github.com/devrev/One-Pass-to-Reason).
Article 1142
Title@2025-07-11 (5): Universal Approximation Theorem for a Single-Layer Transformer
Title: Universal Approximation Theorem for a Single-Layer Transformer | Universelles Approximationstheorem für einen Single-Layer Transformer | 单层变形器的通用近光理论论 2507.10581v1 |
Authors (1): Esmail Gumaan
Deep learning employs multi-layer neural networks trained via the backpropagation algorithm. This approach has achieved success across many domains and relies on adaptive gradient methods such as the Adam optimizer. Sequence modeling evolved from recurrent neural networks to attention-based models, culminating in the Transformer architecture. Transformers have achieved state-of-the-art performance in natural language processing (for example, BERT and GPT-3) and have been applied in computer vision and computational biology. However, theoretical understanding of these models remains limited. In this paper, we examine the mathematical foundations of deep learning and Transformers and present a novel theoretical result. We review key concepts from linear algebra, probability, and optimization that underpin deep learning, and we analyze the multi-head self-attention mechanism and the backpropagation algorithm in detail. Our main contribution is a universal approximation theorem for Transformers: we prove that a single-layer Transformer, comprising one self-attention layer followed by a position-wise feed-forward network with ReLU activation, can approximate any continuous sequence-to-sequence mapping on a compact domain to arbitrary precision. We provide a formal statement and a complete proof. Finally, we present case studies that demonstrate the practical implications of this result. Our findings advance the theoretical understanding of Transformer models and help bridge the gap between theory and practice.
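In symbols, a universal approximation statement of this kind typically has the following shape. The notation here (sequence length $n$, embedding dimension $d$, compact set $K$, the choice of norm, and writing the network as a feed-forward map composed with one attention layer) is assumed for illustration and is not taken from the paper.

```latex
\[
\forall\, f \in C\bigl(K, \mathbb{R}^{n \times d}\bigr),\quad
K \subset \mathbb{R}^{n \times d} \text{ compact},\quad
\forall\, \varepsilon > 0:\qquad
\exists\, T = \mathrm{FFN} \circ \mathrm{Attn}
\ \text{ such that }\
\sup_{X \in K} \bigl\lVert T(X) - f(X) \bigr\rVert < \varepsilon .
\]
```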
Article 1143
Title@2025-07-11 (5): Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases
Title: Feasibility Study of CNNs and MLPs for Radiation Heat Transfer in 2-D Furnaces with Spectrally Participative Gases | Machbarkeitsstudie von CNNs und MLPs für den Strahlungswärmetransfer in 2-D-Öfen mit Spektrally Participative Gasen | 关于有线电视新闻网和多频多频卫星在2-D发热中用光谱参与气体进行辐射热传导的有线电视新闻网和 MLP的可行性研究 2506.08033v3 |
Authors (5): Axel TahmasebiMoradi, Vincent Ren, Benjamin Le-Creurer, Chetra Mang, Mouadh Yagoubi
To reduce the computational cost of numerical simulations, a convolutional neural network (CNN) and a multi-layer perceptron (MLP) are introduced to build a surrogate model that approximates radiative heat transfer solutions in a 2-D walled domain with participative gases. The originality of this work lies in the adaptation of the inputs of the problem (gas and wall properties) to fit the CNN architecture, which is more commonly used for image processing. Two precision datasets have been created with the classical solver, ICARUS2D, which uses the discrete transfer radiation method with the statistical narrow bands model. The performance of the CNN architecture is compared to a more classical MLP architecture in terms of speed and accuracy. Thanks to Optuna, all results are obtained using networks with optimized hyperparameters. The results show a significant speedup with industrially acceptable relative errors compared to the classical solver for both architectures. Additionally, the CNN outperforms the MLP in terms of precision and is more robust and stable to changes in hyperparameters. A performance analysis with respect to dataset size has also been carried out to gain a deeper understanding of the model behavior.
Article 1144
Title@2025-07-11 (5): SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction
Title: SynBridge: Bridging Reaction States via Discrete Flow for Bidirectional Reaction Prediction | SynBridge: Überbrückungsreaktionszustände über diskreten Fluss für bidirektionale Reaktionsvorhersage | SynBridge:通过分向流为双向反应预测进行连接反应国家 2507.08475v1 |
Authors (8): Haitao Lin, Junjie Wang, Zhifeng Gao, Xiaohong Ji, Rong Zhu, Linfeng Zhang, Guolin Ke, Weinan E
The essence of a chemical reaction lies in the redistribution and reorganization of electrons, which is often manifested through electron transfer or the migration of electron pairs. These changes are inherently discrete and abrupt in the physical world, such as alterations in the charge states of atoms or the formation and breaking of chemical bonds. To model the transition of states, we propose SynBridge, a bidirectional flow-based generative model to achieve multi-task reaction prediction. By leveraging a graph-to-graph transformer network architecture and discrete flow bridges between any two discrete distributions, SynBridge captures bidirectional chemical transformations between graphs of reactants and products through the bonds’ and atoms’ discrete states. We further demonstrate the effectiveness of our method through extensive experiments on three benchmark datasets (USPTO-50K, USPTO-MIT, Pistachio), achieving state-of-the-art performance in both forward and retrosynthesis tasks. Our ablation studies and noise scheduling analysis reveal the benefits of structured diffusion over discrete spaces for reaction prediction.
Article 1145
Title@2025-07-11 (5): Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
Title: Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Squeeze the Soaked Sponge: Effiziente Off-Policy-Verstärkung Feinsteuerung für großes Sprachmodell | 挤压海绵:高效非政策强化大语言模式的高效非政策改进微调 2507.06892v3 |
Authors (8): Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
Reinforcement Learning (RL) has demonstrated its potential to improve the reasoning ability of Large Language Models (LLMs). One major limitation of most existing Reinforcement Finetuning (RFT) methods is that they are on-policy RL in nature, i.e., data generated during the past learning process is not fully utilized. This inevitably comes at a significant cost of compute and time, posing a stringent bottleneck on continued economical and efficient scaling. To this end, we launch the renaissance of off-policy RL and propose Reincarnating Mix-policy Proximal Policy Gradient (ReMix), a general approach to enable on-policy RFT methods like PPO and GRPO to leverage off-policy data. ReMix consists of three major components: (1) Mix-policy proximal policy gradient with an increased Update-To-Data (UTD) ratio for efficient training; (2) KL-Convex policy constraint to balance the trade-off between stability and flexibility; (3) Policy reincarnation to achieve a seamless transition from efficient early-stage learning to steady asymptotic improvement. In our experiments, we train a series of ReMix models based on PPO and GRPO, using 1.5B and 7B base models. ReMix achieves an average Pass@1 accuracy of 52.10% (for the 1.5B model) with 0.079M response rollouts and 350 training steps, and 63.27%/64.39% (for the 7B model) with 0.007M/0.011M response rollouts and 50/75 training steps, on five math reasoning benchmarks (AIME’24, AMC’23, Minerva, OlympiadBench, and MATH500). Compared with 15 recent advanced models, ReMix shows SOTA-level performance with a more than 30x to 450x reduction in training cost in terms of rollout data volume. In addition, we reveal insightful findings via multifaceted analysis, including the implicit preference for shorter responses due to the Whipping Effect of off-policy discrepancy and the collapse mode of self-reflection behavior in the presence of severe off-policyness.
Article 1146
Title@2025-07-11 (5): Evaluating SAE interpretability without explanations
Title: Evaluating SAE interpretability without explanations | Bewertung der SAE-Interpretation ohne Erklärungen | 评估是否可无解释地对SAE进行可解释性评估 2507.08473v1 |
Authors (2): Gonçalo Paulo, Nora Belrose
Sparse autoencoders (SAEs) and transcoders have become important tools for machine learning interpretability. However, measuring how interpretable they are remains challenging, with weak consensus about which benchmarks to use. Most evaluation procedures start by producing a single-sentence explanation for each latent. These explanations are then evaluated based on how well they enable an LLM to predict the activation of a latent in new contexts. This method makes it difficult to disentangle the explanation generation and evaluation process from the actual interpretability of the latents discovered. In this work, we adapt existing methods to assess the interpretability of sparse coders, with the advantage that they do not require generating natural language explanations as an intermediate step. This enables a more direct and potentially standardized assessment of interpretability. Furthermore, we compare the scores produced by our interpretability metrics with human evaluations across similar tasks and varying setups, offering suggestions for the community on improving the evaluation of these techniques.
Article 1147
Title@2025-07-11 (5): Predicting Air Pollution in Cork, Ireland Using Machine Learning
Title: Predicting Air Pollution in Cork, Ireland Using Machine Learning | Vorhersage der Luftverschmutzung in Cork, Irland durch maschinelles Lernen | 利用机器学习预测爱尔兰科克的空气污染 2507.04196v2 |
Authors (6): Md Rashidunnabi, Fahmida Faiza Ananna, Kailash Hambarde, Bruno Gabriel Nascimento Andrade, Dean Venables, Hugo Proenca
Air pollution poses a critical health threat in cities worldwide, with nitrogen dioxide levels in Cork, Ireland exceeding World Health Organization safety standards by up to $278\%$. This study leverages artificial intelligence to predict air pollution with unprecedented accuracy, analyzing nearly ten years of data from five monitoring stations combined with 30 years of weather records. We evaluated 17 machine learning algorithms, with Extra Trees emerging as the optimal solution, achieving $77\%$ prediction accuracy and significantly outperforming traditional forecasting methods. Our analysis reveals that meteorological conditions, particularly temperature, wind speed, and humidity, are the primary drivers of pollution levels, while traffic patterns and seasonal changes create predictable pollution cycles. Pollution exhibits dramatic seasonal variations, with winter levels nearly double those of summer, and daily rush-hour peaks reaching $120\%$ above normal levels. While Cork’s air quality shows concerning violations of global health standards, our models detected an encouraging $31\%$ improvement from 2014 to 2022. This research demonstrates that intelligent forecasting systems can provide city planners and environmental officials with powerful prediction tools, enabling life-saving early warning systems and informed urban planning decisions. The technology exists today to transform urban air quality management. All research materials and code are freely available at: https://github.com/MdRashidunnabi/Air-Pollution-Analysis.git
Article 1148
Title@2025-07-11 (5): Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings
Title: Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings | Neural Concept Verifier: Scaling Prover-Verifier Spiele über Concept Encodings | 神经概念验证符:通过概念编码来缩放Prover-Ver化游戏 2507.07532v2 |
Authors (5): Berkant Turan, Suhrab Asadulla, David Steinmann, Wolfgang Stammer, Sebastian Pokutta
While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, Concept Bottleneck Models (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier – implemented as a nonlinear predictor – uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward performative, verifiable AI.
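The prover/verifier split described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the NCV implementation: the concept encodings, the prover scores, and the verifier weights are random stand-ins, and `prover_select` and `verifier_predict` are hypothetical names. In the paper's setting the encodings would come from a concept discovery model and both players would be trained.

```python
import numpy as np


def prover_select(scores: np.ndarray, k: int) -> np.ndarray:
    """Return a binary mask that keeps the k highest-scoring concept encodings."""
    mask = np.zeros(scores.shape[0])
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask


def verifier_predict(concepts, mask, w1, w2) -> int:
    """Nonlinear predictor (one hidden ReLU layer) that sees only selected concepts."""
    masked = (concepts * mask[:, None]).reshape(-1)  # unselected rows become zero
    hidden = np.maximum(masked @ w1, 0.0)
    return int(np.argmax(hidden @ w2))


rng = np.random.default_rng(0)
concepts = rng.normal(size=(8, 4))   # 8 hypothetical concept encodings of dimension 4
scores = rng.normal(size=8)          # stand-in for the prover's learned scores
w1 = rng.normal(size=(32, 16))       # stand-in verifier weights
w2 = rng.normal(size=(16, 2))

mask = prover_select(scores, k=3)
print("selected concepts:", np.flatnonzero(mask))
print("predicted class:", verifier_predict(concepts, mask, w1, w2))
```

Because the verifier only receives the masked encodings, changing an unselected concept cannot change its decision, which is the property that makes the selected subset a faithful explanation of the prediction.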
Article 1149
Title@2025-07-11 (5): Pre-Training LLMs on a budget: A comparison of three optimizers
Title: Pre-Training LLMs on a budget: A comparison of three optimizers | Pre-Training LLMs auf einem Budget: Ein Vergleich von drei Optimierern | 预算培训前LLMLM项目:三个优化器的比较 2507.08472v1 |
Authors (6): Joel Schlotthauer, Christian Kroos, Chris Hinze, Viktor Hangya, Luzian Hahn, Fabian Küch
Optimizers play a decisive role in reducing pre-training times for LLMs and achieving better-performing models. In this study, we compare three major variants: the de-facto standard AdamW, the simpler Lion, developed through an evolutionary search, and the second-order optimizer Sophia. For better generalization, we train with two different base architectures and use a single- and a multiple-epoch approach while keeping the number of tokens constant. Using the Maximal Update Parametrization and smaller proxy models, we tune relevant hyperparameters separately for each combination of base architecture and optimizer. We found that while the results from all three optimizers were in approximately the same range, Sophia exhibited the lowest training and validation loss, Lion was fastest in terms of training GPU hours, and AdamW led to the best downstream evaluation results.
Article 1150
Title@2025-07-11 (5): Last Layer Hamiltonian Monte Carlo
Title: Last Layer Hamiltonian Monte Carlo | Letzte Schicht Hamiltonian Monte Carlo | 汉密尔顿·蒙特卡洛 2507.08905v1 |
Authors (5): Koen Vellenga, H. Joe Steinhauer, Göran Falkman, Jonas Andersson, Anders Sjögren
We explore the use of Hamiltonian Monte Carlo (HMC) sampling as a probabilistic last layer approach for deep neural networks (DNNs). While HMC is widely regarded as a gold standard for uncertainty estimation, the computational demands limit its application to large-scale datasets and large DNN architectures. Although the predictions from the sampled DNN parameters can be parallelized, the computational cost still scales linearly with the number of samples (similar to an ensemble). Last layer HMC (LL-HMC) reduces the required computations by restricting the HMC sampling to the final layer of a DNN, making it applicable to more data-intensive scenarios with limited computational resources. In this paper, we compare LL-HMC against five last layer probabilistic deep learning (LL-PDL) methods across three real-world video datasets for driver action and intention. We evaluate the in-distribution classification performance, calibration, and out-of-distribution (OOD) detection. Due to the stochastic nature of the probabilistic evaluations, we performed five grid searches for different random seeds to avoid being reliant on a single initialization for the hyperparameter configurations. The results show that LL-HMC achieves competitive in-distribution classification and OOD detection performance. Additional sampled last layer parameters do not improve the classification performance, but can improve the OOD detection. Multiple chains or starting positions did not yield consistent improvements.
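Restricting HMC to the final layer means sampling only a small weight vector while the rest of the network stays fixed. The sketch below illustrates that idea on a binary logistic last layer with a Gaussian prior. It is not the authors' code: the "features" are random stand-ins for frozen penultimate-layer activations, and the step size, number of leapfrog steps, and prior variance are arbitrary assumptions.

```python
import numpy as np


def log_post_and_grad(w, feats, labels, prior_var=1.0):
    """Log posterior (Bernoulli likelihood + Gaussian prior) and its gradient."""
    logits = feats @ w
    log_lik = np.sum(labels * logits - np.logaddexp(0.0, logits))
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad = feats.T @ (labels - probs) - w / prior_var
    return log_lik - 0.5 * np.sum(w ** 2) / prior_var, grad


def hmc_step(w, feats, labels, rng, step=0.05, n_leapfrog=20):
    """One HMC transition: leapfrog integration followed by a Metropolis test."""
    p0 = rng.normal(size=w.shape)
    logp0, grad = log_post_and_grad(w, feats, labels)
    w_new, p = w.copy(), p0 + 0.5 * step * grad      # initial half step (momentum)
    for i in range(n_leapfrog):
        w_new = w_new + step * p                     # full step (position)
        logp_new, grad = log_post_and_grad(w_new, feats, labels)
        if i < n_leapfrog - 1:
            p = p + step * grad                      # full step (momentum)
    p = p + 0.5 * step * grad                        # final half step (momentum)
    log_accept = (logp_new - 0.5 * p @ p) - (logp0 - 0.5 * p0 @ p0)
    if np.log(rng.uniform()) < log_accept:
        return w_new, True
    return w, False


rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 3))          # stand-in for frozen backbone features
labels = (feats[:, 0] > 0).astype(float)  # synthetic binary labels
w, samples, accepted = np.zeros(3), [], 0
for _ in range(200):
    w, ok = hmc_step(w, feats, labels, rng)
    samples.append(w)
    accepted += ok
samples = np.array(samples)

# Predictive probabilities averaged over sampled last-layer weights.
probs = 1.0 / (1.0 + np.exp(-(feats[:5] @ samples.T)))
print("acceptance rate:", accepted / len(samples))
print("mean predictive probabilities:", probs.mean(axis=1))
```

Each retained sample costs one extra last-layer evaluation at prediction time, which is the linear scaling in the number of samples that the abstract mentions.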
Article 1151
Title@2025-07-11 (5): Sculpting Quantum Landscapes: Fubini-Study Metric Conditioning for Geometry Aware Learning in Parameterized Quantum Circuits
Title: Sculpting Quantum Landscapes: Fubini-Study Metric Conditioning for Geometry Aware Learning in Parameterized Quantum Circuits | Sculpting Quantum Landscapes: Fubini-Studie Metric Conditioning for Geometry Aware Learning in parameterized Quantum Circuits | 雕刻量子地貌:在可计量量子电路进行几何认知学习的 Fubini-Study 测量测量条件 2506.21940v3 |
Authors (2): Marwan Ait Haddou, Mohamed Bennai
We present a novel meta-learning framework called Sculpture that explicitly conditions the Fubini-Study metric tensor of parameterized quantum circuits to mitigate barren plateaus in variational quantum algorithms. Our theoretical analysis identifies the logarithmic condition number of the Fubini-Study metric as a critical geometric quantity governing trainability, optimization dynamics, and generalization. Sculpture uses a classical meta-model trained to generate data-dependent quantum circuit initializations that minimize the logarithmic condition number, thereby promoting an isotropic and well-conditioned parameter space. Empirical results show that meta-training reduces the logarithmic condition number from approximately 1.47 to 0.64 by significantly increasing the minimum eigenvalue and slightly decreasing the maximum eigenvalue of the metric, effectively alleviating barren plateaus. This improved conditioning generalizes well to unseen data, consistently producing well-conditioned quantum circuit initializations. In a downstream hybrid quantum-classical classification task on the Kaggle diabetes dataset, increasing the meta-scaling coefficient accelerates convergence, reduces training loss and gradient norms, and crucially improves generalization, with test accuracy increasing from about 0.68 to over 0.78. These findings demonstrate that sculpting the quantum landscape via meta-learning serves as a principled geometric regularizer, substantially enhancing trainability, optimization, and generalization of parameterized quantum circuits and enabling more robust and efficient variational quantum algorithms.
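The quantity being minimized can be computed directly from the eigenvalues of a metric tensor. The sketch below assumes the logarithmic condition number is the base-10 logarithm of the ratio of the largest to the smallest eigenvalue; the base and the example matrices are assumptions chosen for illustration and are not values from the paper.

```python
import numpy as np


def log_condition_number(metric: np.ndarray) -> float:
    """log10(lambda_max / lambda_min) of a symmetric positive-definite matrix."""
    eigvals = np.linalg.eigvalsh(metric)  # returned in ascending order
    return float(np.log10(eigvals[-1] / eigvals[0]))


# Hypothetical 3-parameter metric tensors, before and after raising the
# smallest eigenvalue while leaving the largest one unchanged.
ill_conditioned = np.diag([0.01, 0.5, 1.0])
better_conditioned = np.diag([0.1, 0.5, 1.0])

print(log_condition_number(ill_conditioned))
print(log_condition_number(better_conditioned))
```

Raising the minimum eigenvalue is the dominant lever here, which matches the abstract's observation that the improvement comes mainly from the small end of the spectrum.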
Article 1152
Title@2025-07-11 (5): Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds
Title: Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds | Ranked Set Sampling-based Multilayer Perceptron: Verbesserung der Generalisierung durch variance-based Bounds | 按等级排列的基于抽样的多层概念:通过基于差异的边界改进普遍化 2507.08465v1 |
Authors (5): Feijiang Li, Liuya Zhang, Jieting Wang, Tao Yan, Yuhua Qian
Multilayer perceptron (MLP), one of the most fundamental neural networks, is extensively utilized for classification and regression tasks. In this paper, we establish a new generalization error bound, which reveals how the variance of empirical loss influences the generalization ability of the learning model. Inspired by this learning bound, we advocate reducing the variance of empirical loss to enhance the ability of the MLP. As is well known, bagging is a popular ensemble method to realize variance reduction. However, bagging produces the base training data sets by the Simple Random Sampling (SRS) method, which exhibits a high degree of randomness. To handle this issue, we introduce an ordered structure in the training data set by Ranked Set Sampling (RSS) to further reduce the variance of loss and develop an RSS-MLP method. Theoretical results show that the variances of the empirical exponential loss and the logistic loss estimated by RSS are smaller than those estimated by SRS. To validate the performance of RSS-MLP, we conduct comparison experiments on twelve benchmark data sets in terms of the two convex loss functions under two fusion methods. Extensive experimental results and analysis illustrate the effectiveness and rationality of the proposed method.
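For readers unfamiliar with ranked set sampling, one cycle of the textbook procedure is sketched below: draw several small random sets, rank each one, and keep the i-th ranked unit from the i-th set. How RSS-MLP ranks training examples is not stated in the abstract, so the `rank_key` argument here is a placeholder assumption.

```python
import random


def ranked_set_sample(population, set_size, rank_key, rng):
    """One RSS cycle: from the i-th random set, keep the unit with rank i."""
    sample = []
    for i in range(set_size):
        candidates = rng.sample(population, set_size)  # simple random set
        candidates.sort(key=rank_key)                  # rank within the set
        sample.append(candidates[i])                   # keep the i-th order statistic
    return sample


rng = random.Random(0)
data = list(range(100))  # toy population; ranking by value is an assumption
print(ranked_set_sample(data, set_size=5, rank_key=lambda x: x, rng=rng))
```

Because each selected unit comes from a different rank position, the sample is spread across the range of the ranking variable more evenly than a simple random sample of the same size.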
Article 1153
Title@2025-07-11 (5): Collaborative filtering based on nonnegative/binary matrix factorization
Title: Collaborative filtering based on nonnegative/binary matrix factorization | Kollaborative Filterung auf der Grundlage nichtnegativer/binärer Matrixfaktorisierung | 基于非负负/二进制矩阵因子化的合作过滤 2410.10381v3 |
Authors (5): Yukino Terui, Yuka Inoue, Yohei Hamakawa, Kosuke Tatsumura, Kazue Kudo
Collaborative filtering generates recommendations by exploiting user-item similarities based on rating data, which often contains numerous unrated items. This paper proposes a nonnegative/binary matrix factorization (NBMF) algorithm modified for collaborative filtering and demonstrates that utilizing a low-latency Ising machine in NBMF is advantageous in terms of computation time. While previous studies have primarily applied NBMF to dense data, such as images, this study applies a modified NBMF to sparse data. Results show the benefits of using a low-latency Ising machine to implement the proposed method.
Article 1154
Title@2025-07-11 (5): Space filling positionality and the Spiroformer
Title: Space filling positionality and the Spiroformer | Raumfüllpositionalität und der Spiroformer | 空间填充定位和空间 2507.08456v1 |
Authors (3): M. Maurin, M. Á. Evangelista-Alvarado, P. Suárez-Serrato
Transformers excel when dealing with sequential data. Generalizing transformer models to geometric domains, such as manifolds, we encounter the problem of not having a well-defined global order. We propose a solution with attention heads following a space-filling curve. As a first experimental example, we present the Spiroformer, a transformer that follows a polar spiral on the $2$-sphere.
Article 1155
Title@2025-07-11 (5): Why this and not that? A Logic-based Framework for Contrastive Explanations
Title: Why this and not that? A Logic-based Framework for Contrastive Explanations | Warum das und nicht das? Ein logisch-basiertes Framework für kontrastive Erklärungen | 为什么这样而不是这样?基于逻辑的矛盾解释框架 2507.08454v1 |
Authors (5): Tobias Geibinger, Reijo Jaakkola, Antti Kuusisto, Xinghan Liu, Miikka Vilander
We define several canonical problems related to contrastive explanations, each answering a question of the form “Why P but not Q?”. The problems compute causes for both P and Q, explicitly comparing their differences. We investigate the basic properties of our definitions in the setting of propositional logic. We show, inter alia, that our framework captures a cardinality-minimal version of existing contrastive explanations in the literature. Furthermore, we provide an extensive analysis of the computational complexities of the problems. We also implement the problems for CNF-formulas using answer set programming and present several examples demonstrating how they work in practice.
Article 1156
Title@2025-07-11 (5): Field Matching: an Electrostatic Paradigm to Generate and Transfer Data
Title: Field Matching: an Electrostatic Paradigm to Generate and Transfer Data | Field Matching: ein elektrostatisches Paradigma zur Generierung und Übertragung von Daten | 字段匹配:生成和传输数据的电静电模型 2502.02367v2 |
Authors (4): Alexander Kolesov, Manukhov Stepan, Vladimir V. Palyulin, Alexander Korotin
We propose Electrostatic Field Matching (EFM), a novel method that is suitable for both generative modeling and distribution transfer tasks. Our approach is inspired by the physics of an electrical capacitor. We place source and target distributions on the capacitor plates and assign them positive and negative charges, respectively. We then learn the electrostatic field of the capacitor using a neural network approximator. To map the distributions to each other, we start at one plate of the capacitor and move the samples along the learned electrostatic field lines until they reach the other plate. We theoretically justify that this approach yields the distribution transfer. In practice, we demonstrate the performance of EFM in toy and image data experiments.
Article 1157
Title@2025-07-11 (5): KGRAG-Ex: Explainable Retrieval-Augmented Generation with Knowledge Graph-based Perturbations
Title: KGRAG-Ex: Explainable Retrieval-Augmented Generation with Knowledge Graph-based Perturbations | KGRAG-Ex: Erklärbare retrieval-erweiterte Generation mit wissensgraphbasierten Störungen | KGRAG-Ex: 具有基于知识图表的扰动作用的可解释的检索增强型生成器 2507.08443v1 |
Authors (4): Georgios Balanos, Evangelos Chasanis, Konstantinos Skianis, Evaggelia Pitoura
Retrieval-Augmented Generation (RAG) enhances language models by grounding responses in external information, yet explainability remains a critical challenge, particularly when retrieval relies on unstructured text. Knowledge graphs (KGs) offer a solution by introducing structured, semantically rich representations of entities and their relationships, enabling transparent retrieval paths and interpretable reasoning. In this work, we present KGRAG-Ex, a RAG system that improves both factual grounding and explainability by leveraging a domain-specific KG constructed via prompt-based information extraction. Given a user query, KGRAG-Ex identifies relevant entities and semantic paths in the graph, which are then transformed into pseudo-paragraphs: natural language representations of graph substructures that guide corpus retrieval. To improve interpretability and support reasoning transparency, we incorporate perturbation-based explanation methods that assess the influence of specific KG-derived components on the generated answers. We conduct a series of experiments to analyze the sensitivity of the system to different perturbation methods, the relationship between graph component importance and their structural positions, the influence of semantic node types, and how graph metrics correspond to the influence of components within the explanation process.
Article 1158
Title@2025-07-11 (5): Optimal and Practical Batched Linear Bandit Algorithm
Title: Optimal and Practical Batched Linear Bandit Algorithm | Optimaler und praktischer Batched Linear Bandit Algorithmus | 最佳和实用的 Batched 线性强盗 2507.08438v1 |
Authors (2): Sanghoon Yu, Min-hwan Oh
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We propose BLAE, a novel batched algorithm that integrates arm elimination with regularized G-optimal design, achieving the minimax optimal regret (up to logarithmic factors in $T$) in both large-$K$ and small-$K$ regimes for the first time, while using only $O(\log\log T)$ batches. Our analysis introduces new techniques for batch-wise optimal design and refined concentration bounds. Crucially, BLAE demonstrates low computational overhead and strong empirical performance, outperforming state-of-the-art methods in extensive numerical evaluations. Thus, BLAE is the first algorithm to combine provable minimax-optimality in all regimes and practical superiority in batched linear bandits.
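To see why $O(\log\log T)$ batches can be enough, consider the doubly exponential grid $t_k = T^{1-2^{-k}}$ that is commonly used in the batched bandit literature. The abstract does not state BLAE's actual batch schedule, so the grid below is an assumed, generic example rather than the authors' construction.

```python
import math


def batch_grid(horizon: int) -> list[int]:
    """Batch end points t_k = floor(T^(1 - 2^-k)), with the last batch ending at T."""
    num_batches = math.ceil(math.log2(math.log2(horizon))) + 1
    grid = [math.floor(horizon ** (1 - 2.0 ** -k)) for k in range(1, num_batches)]
    grid.append(horizon)
    return grid


for T in (10**3, 10**6, 10**9):
    grid = batch_grid(T)
    print(f"T={T:>10d}  batches={len(grid)}  grid={grid}")
```

The number of batches grows extremely slowly with the horizon: each batch roughly squares the resolution of the previous one, so only about $\log_2\log_2 T$ policy updates are needed.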
Article 1159
Title@2025-07-11 (5): FonTS: Text Rendering with Typography and Style Controls
Title: FonTS: Text Rendering with Typography and Style Controls | FonTS: Text Rendering mit Typografie und Style Controls | FonTS: 带有打字和样式控控管的文字成文 2412.00136v3 |
Authors (5): Wenda Shi, Yiren Song, Dengming Zhang, Jiaming Liu, Xingxing Zou
Visual text rendering is widespread in various real-world applications, requiring careful font selection and typographic choices. Recent progress in diffusion transformer (DiT)-based text-to-image (T2I) models shows promise in automating these processes. However, these methods still encounter challenges like inconsistent fonts, style variation, and limited fine-grained control, particularly at the word level. This paper proposes a two-stage DiT-based pipeline to address these problems by enhancing controllability over typography and style in text rendering. We introduce typography control fine-tuning (TC-FT), a parameter-efficient fine-tuning method (on $5\%$ key parameters) with enclosing typography control tokens (ETC-tokens), which enables precise word-level application of typographic features. To further address style inconsistency in text rendering, we propose a text-agnostic style control adapter (SCA) that prevents content leakage while enhancing style consistency. To implement TC-FT and SCA effectively, we incorporated HTML-render into the data synthesis pipeline and proposed the first word-level controllable dataset. Through comprehensive experiments, we demonstrate the effectiveness of our approach in achieving superior word-level typographic control, font consistency, and style consistency in text rendering tasks. The datasets and models will be available for academic use.
Article 1160
Title@2025-07-11 (5): Answer Generation for Questions With Multiple Information Sources in E-Commerce
Title: Answer Generation for Questions With Multiple Information Sources in E-Commerce | Antwortgenerierung für Fragen mit mehreren Informationsquellen im E-Commerce | 电子商务中具有多种信息来源问题的答案生成问题 2111.14003v2 |
Authors (2): Anand A. Rajasekar, Nikesh Garera
Automatic question answering is an important yet challenging task in E-commerce given the millions of questions posted by users about the products that they are interested in purchasing. Hence, there is a great demand for automatic answer generation systems that provide quick responses using related information about the product. There are three sources of knowledge available for answering a user-posted query: reviews, duplicate or similar questions, and specifications. Effectively utilizing these information sources will greatly aid us in answering complex questions. However, there are two main challenges in exploiting these sources: (i) the presence of irrelevant information and (ii) ambiguity of sentiment in reviews and similar questions. Through this work we propose a novel pipeline (MSQAP) that utilizes the rich information present in the aforementioned sources by separately performing relevancy and ambiguity prediction before generating a response. Experimental results show that our relevancy prediction model (BERT-QA) outperforms all other variants and has an improvement of 12.36% in F1 score compared to the BERT-base baseline. Our generation model (T5-QA) outperforms the baselines in all content preservation metrics such as BLEU and ROUGE, with an average improvement of 35.02% in ROUGE and 198.75% in BLEU compared to the highest performing baseline (HSSC-q). Human evaluation of our pipeline shows that our method has an overall improvement in accuracy of 30.7% over the generation model (T5-QA), resulting in our full pipeline-based approach (MSQAP) providing more accurate answers. To the best of our knowledge, this is the first work in the e-commerce domain that automatically generates natural language answers combining the information present in diverse sources such as specifications, similar questions, and reviews data.
Article 1161
Title@2025-07-11 (5): RTNinja: a generalized machine learning framework for analyzing random telegraph noise signals in nanoelectronic devices
Title: RTNinja: a generalized machine learning framework for analyzing random telegraph noise signals in nanoelectronic devices | RTNinja: ein generalisierter Rahmen für maschinelles Lernen zur Analyse von zufälligen Telegraphenrauschsignalen in nanoelektronischen Geräten | RTNinja:用于分析纳米电子设备随机电报噪音信号的通用机器学习框架 2507.08424v1 |
Authors (4): Anirudh Varanasi, Robin Degraeve, Philippe Roussel, Clement Merckling
Random telegraph noise is a prevalent variability phenomenon in nanoelectronic devices, arising from stochastic carrier exchange at defect sites and critically impacting device reliability and performance. Conventional analysis techniques often rely on restrictive assumptions or manual interventions, limiting their applicability to complex, noisy datasets. Here, we introduce RTNinja, a generalized, fully automated machine learning framework for the unsupervised analysis of random telegraph noise signals. RTNinja deconvolves complex signals to identify the number and characteristics of hidden individual sources, without requiring prior knowledge of the system. The framework comprises two modular components: LevelsExtractor, which uses Bayesian inference and model selection to denoise and discretize the signal; and SourcesMapper, which infers source configurations through probabilistic clustering and optimization. To evaluate performance, we developed a Monte Carlo simulator that generates labeled datasets spanning broad signal-to-noise ratios and source complexities; across 7000 such datasets, RTNinja consistently demonstrated high-fidelity signal reconstruction and accurate extraction of source amplitudes and activity patterns. Our results demonstrate that RTNinja offers a robust, scalable, and device-agnostic tool for random telegraph noise characterization, enabling large-scale statistical benchmarking, reliability-centric technology qualification, predictive failure modeling, and device physics exploration in next-generation nanoelectronics.
Article 1162
Title@2025-07-11 (5): Minerva: A File-Based Ransomware Detector
Title: Minerva: A File-Based Ransomware Detector | Minerva: Ein dateibasierter Ransomware-Detektor | Minerva: 以文件为基础的序列器检测器 2301.11050v4 |
Authors (5): Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Lorenzo De Carli, Luigi V. Mancini
Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel, robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, over 99% of detected ransomware are identified within 0.52sec of activity, enabling the adoption of data loss prevention techniques with near-zero overhead.
Article 1163
Title@2025-07-11 (5): Towards AI-Native RAN: An Operator’s Perspective of 6G Day 1 Standardization
Title: Towards AI-Native RAN: An Operator’s Perspective of 6G Day 1 Standardization | Auf dem Weg zu KI-Native RAN: Die Perspektive des Betreibers von 6G Tag 1 Standardisierung | 面向AI-Native RAN:运营商对6G日1标准化的看法 2507.08403v1 |
Authors (9): Nan Li, Qi Sun, Lehan Wang, Xiaofei Xu, Jinri Huang, Chunhui Liu, Jing Gao, Yuhong Huang, Chih-Lin I
Artificial Intelligence/Machine Learning (AI/ML) has become the most certain and prominent feature of 6G mobile networks. Unlike 5G, where AI/ML was not natively integrated but rather an add-on feature over the existing architecture, 6G shall incorporate AI from the outset to address its complexity and support ubiquitous AI applications. Based on our extensive mobile network operation and standardization experience from 2G to 5G, this paper explores the design and standardization principles of AI-Native radio access networks (RAN) for 6G, with a particular focus on its critical Day 1 architecture, functionalities and capabilities. We investigate the framework of AI-Native RAN and present its three essential capabilities to shed some light on the standardization direction; namely, AI-driven RAN processing/optimization/automation, reliable AI lifecycle management (LCM), and AI-as-a-Service (AIaaS) provisioning. The standardization of AI-Native RAN, in particular the Day 1 features, including an AI-Native 6G RAN architecture, is proposed. For validation, a large-scale field trial with over 5000 5G-A base stations has been built and has delivered significant improvements in average air interface latency, root cause identification, and network energy consumption with the proposed architecture and the supporting AI functions. This paper aims to provide a Day 1 framework for 6G AI-Native RAN standardization design, balancing technical innovation with practical deployment.
Article 1164
Title@2025-07-11 (5): SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding
Title: SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding | SPINT: Raumpermutations-Invarianter Neuraltransformator für konsistente intrakortikale Motordekodierung | SPINT: 空间变异-内变量内神经变异器,用于连贯一致的异质内装配机动车代号 2507.08402v1 |
Authors (8): Trung Le, Hao Fang, Jingyuan Li, Tung Nguyen, Lu Mi, Amy Orsborn, Uygar Sümbül, Eli Shlizerman
Intracortical Brain-Computer Interfaces (iBCI) aim to decode behavior from neural population activity, enabling individuals with motor impairments to regain motor functions and communication abilities. A key challenge in long-term iBCI is the nonstationarity of neural recordings, where the composition and tuning profiles of the recorded populations are unstable across recording sessions. Existing methods attempt to address this issue by explicit alignment techniques; however, they rely on fixed neural identities and require test-time labels or parameter updates, limiting their generalization across sessions and imposing additional computational burden during deployment. In this work, we introduce SPINT - a Spatial Permutation-Invariant Neural Transformer framework for behavioral decoding that operates directly on unordered sets of neural units. Central to our approach is a novel context-dependent positional embedding scheme that dynamically infers unit-specific identities, enabling flexible generalization across recording sessions. SPINT supports inference on variable-size populations and allows few-shot, gradient-free adaptation using a small amount of unlabeled data from the test session. To further promote model robustness to population variability, we introduce dynamic channel dropout, a regularization method for iBCI that simulates shifts in population composition during training. We evaluate SPINT on three multi-session datasets from the FALCON Benchmark, covering continuous motor decoding tasks in human and non-human primates. SPINT demonstrates robust cross-session generalization, outperforming existing zero-shot and few-shot unsupervised baselines while eliminating the need for test-time alignment and fine-tuning. Our work contributes an initial step toward a robust and scalable neural decoding framework for long-term iBCI applications.
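As a rough illustration of dynamic channel dropout, the sketch below removes a randomly sized subset of neural units from each training trial, so that a set-based decoder sees a different population on every pass. The abstract does not specify the sampling scheme: the function name `dynamic_channel_dropout`, the `max_drop` parameter, and the choice to resample the drop rate for every trial are assumptions made for this example.

```python
import random

def dynamic_channel_dropout(trial, max_drop=0.5, rng=random):
    """Drop a random subset of units from one trial (hypothetical scheme).

    trial: list with one activity sequence per neural unit.
    Returns the surviving sequences and the indices that were kept.
    """
    n_units = len(trial)
    # Assumption: the drop rate itself is resampled per trial ("dynamic").
    drop_rate = rng.uniform(0.0, max_drop)
    n_keep = max(1, n_units - int(drop_rate * n_units))
    # Units are removed rather than zeroed, since the decoder is assumed
    # to accept variable-size, unordered sets of units.
    kept = sorted(rng.sample(range(n_units), n_keep))
    return [trial[i] for i in kept], kept

# Example: ten units, two time bins each.
example_trial = [[i, i + 1] for i in range(10)]
subset, kept_idx = dynamic_channel_dropout(example_trial, rng=random.Random(0))
```

Removing units outright, instead of masking them with zeros, only makes sense for a model that handles variable-size inputs, which is the setting the abstract describes.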
Article 1165
Title@2025-07-11 (5): DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Title: DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving | DriveTransformer: Unified Transformer für skalierbares autonomes Fahren | 驱动器变换: 用于可缩放的终端到终端自动驱动的统一变换器 2503.07656v2 |
Authors (4): Xiaosong Jia, Junqi You, Zhiyuan Zhang, Junchi Yan
End-to-end autonomous driving (E2E-AD) has emerged as a trend in the field of autonomous driving, promising a data-driven, scalable approach to system design. However, existing E2E-AD methods usually adopt the sequential paradigm of perception-prediction-planning, which leads to cumulative errors and training instability. The manual ordering of tasks also limits the system's ability to leverage synergies between tasks (for example, planning-aware perception and game-theoretic interactive prediction and planning). Moreover, the dense BEV representation adopted by existing methods brings computational challenges for long-range perception and long-term temporal fusion. To address these challenges, we present DriveTransformer, a simplified E2E-AD framework designed for ease of scaling up, characterized by three key features: Task Parallelism (all agent, map, and planning queries directly interact with each other at each block), Sparse Representation (task queries directly interact with raw sensor features), and Streaming Processing (task queries are stored and passed as history information). As a result, the new framework is composed of three unified operations: task self-attention, sensor cross-attention, and temporal cross-attention, which significantly reduces the complexity of the system and leads to better training stability. DriveTransformer achieves state-of-the-art performance on both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes, with high FPS.
Article 1166
Title@2025-07-11 (5): Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling
Title: Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling | Inferenz-Zeit-Skalierung von Diffusions-Sprachmodellen mit Partikel Gibbs-Sampling | 配有粒子Gibbs抽样的传播语言模型的推推-时间缩放 2507.08390v1 |
Authors (6): Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, Stefano Ermon
Discrete diffusion models have emerged as a powerful paradigm for language modeling, rivaling auto-regressive models by training-time scaling. However, inference-time scaling in discrete diffusion models remains relatively under-explored. In this work, we study sampling-based approaches for achieving high-quality text generation from discrete diffusion models in reward-guided settings. We introduce a novel inference-time scaling approach based on particle Gibbs sampling for discrete diffusion models. The particle Gibbs sampling algorithm iteratively refines full diffusion trajectories using conditional Sequential Monte Carlo as its transition mechanism. This process ensures that the updated samples progressively improve and move closer to the reward-weighted target distribution. Unlike existing inference-time scaling methods, which are often limited to single diffusion trajectories, our approach leverages iterative refinement across multiple trajectories. Within this framework, we further analyze the trade-offs between four key axes for inference-time scaling under fixed compute budgets: particle Gibbs iterations, particle count, denoising steps, and reward estimation cost. Empirically, our method consistently outperforms prior inference-time strategies on reward-guided text generation tasks, achieving significant improvement in accuracy under varying compute budgets.
Article 1167
Title@2025-07-11 (5): Online Pre-Training for Offline-to-Online Reinforcement Learning
Title: Online Pre-Training for Offline-to-Online Reinforcement Learning | Online-Vorschulung für Offline-to-Online-Verstärkung | 离线至在线强化学习在线培训前培训 2507.08387v1 |
Authors (11): Yongjae Shin, Jeonghye Kim, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngsoo Jang, Geonhyeong Kim, Jongseong Chae, Youngchul Sung, Kanghoon Lee, Woohyung Lim
Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
Article 1168
Title@2025-07-11 (5): Estimation of conditional average treatment effects on distributed confidential data
Title: Estimation of conditional average treatment effects on distributed confidential data | Schätzung der bedingten durchschnittlichen Behandlungseffekte auf verteilte vertrauliche Daten | 对分发的机密数据进行有条件平均待遇影响的估计 2402.02672v4 |
Authors (5): Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai
The estimation of conditional average treatment effects (CATEs) is an important topic in many scientific fields. CATEs can be estimated with high accuracy if data distributed across multiple parties are centralized. However, it is difficult to aggregate such data owing to confidentiality or privacy concerns. To address this issue, we propose data collaboration double machine learning, a method for estimating CATE models using privacy-preserving fusion data constructed from distributed sources, and evaluate its performance through simulations. We make three main contributions. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data, providing robustness to model mis-specification compared to parametric approaches. Second, it enables collaborative estimation across different time points and parties by accumulating a knowledge base. Third, our method performs as well as or better than existing methods in simulations using synthetic, semi-synthetic, and real-world datasets.
Article 1169
Title@2025-07-11 (5): Advances in Machine Learning: Where Can Quantum Techniques Help?
Title: Advances in Machine Learning: Where Can Quantum Techniques Help? | Fortschritte beim maschinellen Lernen: Wo können Quantentechniken helfen? | 机器学习的进步:量子技术能帮助哪里? 2507.08379v1 |
Authors (4): Samarth Kashyap, Rohit K Ramakrishnan, Kumari Jyoti, Apoorva D Patel
Quantum Machine Learning (QML) represents a promising frontier at the intersection of quantum computing and artificial intelligence, aiming to leverage quantum computational advantages to enhance data-driven tasks. This review explores the potential of QML to address the computational bottlenecks of classical machine learning, particularly in processing complex datasets. We introduce the theoretical foundations of QML, including quantum data encoding, quantum learning theory and optimization techniques, while categorizing QML approaches based on data type and computational architecture. It is well-established that quantum computational advantages are problem-dependent, and so potentially useful directions for QML need to be systematically identified. Key developments, such as Quantum Principal Component Analysis, quantum-enhanced sensing and applications in material science, are critically evaluated for their theoretical speed-ups and practical limitations. The challenges posed by Noisy Intermediate-Scale Quantum (NISQ) devices, including hardware noise, scalability constraints and data encoding overheads, are discussed in detail. We also outline future directions, emphasizing the need for quantum-native algorithms, improved error correction, and realistic benchmarks to bridge the gap between theoretical promise and practical deployment. This comprehensive analysis underscores that while QML has significant potential for specific applications such as quantum chemistry and sensing, its broader utility in real-world scenarios remains contingent on overcoming technological and methodological hurdles.
Article 1170
Title@2025-07-11 (5): Sampling from Your Language Model One Byte at a Time
Title: Sampling from Your Language Model One Byte at a Time | Proben aus Ihrem Sprachmodell ein Byte zu einer Zeit | 一次抽取您语言模式一字节的样本 2506.14123v2 |
Authors (4): Jonathan Hayase, Alisa Liu, Noah A. Smith, Sewoong Oh
Tokenization is used almost universally by modern language models, enabling efficient text representation using multi-byte or multi-character tokens. However, prior work has shown that tokenization can introduce distortion into the model’s generations, an issue known as the Prompt Boundary Problem (PBP). For example, users are often advised not to end their prompts with a space because it prevents the model from including the space as part of the next token. While this heuristic is effective in English, the underlying PBP continues to affect languages such as Chinese as well as code generation, where tokens often do not line up with word and syntactic boundaries. In this work, we present an inference-time method to convert any autoregressive LM with a BPE tokenizer into a character-level or byte-level LM. Our method efficiently solves the PBP and is also able to unify the vocabularies of language models with different tokenizers, allowing one to ensemble LMs with different tokenizers at inference time or transfer the post-training from one model to another using proxy-tuning. We demonstrate in experiments that the ensemble and proxy-tuned models outperform their constituents on downstream evals. Code is available at https://github.com/SewoongLab/byte-sampler .
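The trailing-space example can be made concrete with a toy tokenizer. The sketch below uses greedy longest-match over a tiny hand-written vocabulary, which is a simplification and not BPE; the vocabulary and the `tokenize` helper are hypothetical and exist only to show how a prompt boundary can force a token sequence the model would rarely see in training.

```python
# Toy vocabulary (hypothetical); note that " world" carries its leading space.
VOCAB = {"hello", " world", "world", " ", "h", "e", "l", "o", "w", "r", "d"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization (a stand-in for a real tokenizer)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i]!r}")
    return tokens

whole = tokenize("hello world")                   # ['hello', ' world']
split = tokenize("hello ") + tokenize("world")    # ['hello', ' ', 'world']
```

Both token sequences decode to the same string, but a prompt ending in a space commits the model to the second one, so the usual `" world"` token can no longer be produced. Sampling at the byte or character level, as the abstract proposes, avoids committing to a token boundary at the end of the prompt.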
Article 1171
Title@2025-07-11 (5): Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation
Title: Learning Pole Structures of Hadronic States using Predictive Uncertainty Estimation | Erlernen der Polstrukturen von Hadronischen Staaten mittels vorausschauender Unsicherheitsabschätzung | 使用预测性不确定性估计值的 强力国家学习极极结构 2507.07668v2 |
Authors (4): Felix Frohnert, Denny Lane B. Sombillo, Evert van Nieuwenburg, Patrick Emonts
Matching theoretical predictions to experimental data remains a central challenge in hadron spectroscopy. In particular, the identification of new hadronic states is difficult, as exotic signals near threshold can arise from a variety of physical mechanisms. A key diagnostic in this context is the pole structure of the scattering amplitude, but different configurations can produce similar signatures. The mapping between pole configurations and line shapes is especially ambiguous near the mass threshold, where analytic control is limited. In this work, we introduce an uncertainty-aware machine learning approach for classifying pole structures in $S$-matrix elements. Our method is based on an ensemble of classifier chains that provide both epistemic and aleatoric uncertainty estimates. We apply a rejection criterion based on predictive uncertainty, achieving a validation accuracy of nearly $95\%$ while discarding only a small fraction of high-uncertainty predictions. Trained on synthetic data with known pole structures, the model generalizes to previously unseen experimental data, including enhancements associated with the $P_{c\bar{c}}(4312)^+$ state observed by LHCb. In this, we infer a four-pole structure, representing the presence of a genuine compact pentaquark in the presence of a higher channel virtual state pole with non-vanishing width. While evaluated on this particular state, our framework is broadly applicable to other candidate hadronic states and offers a scalable tool for pole structure inference in scattering amplitudes.
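A rejection rule of this kind can be sketched with a standard entropy-based decomposition of ensemble uncertainty. The abstract does not state which uncertainty measure or threshold the authors use, so everything below is an assumption: total uncertainty is taken as the entropy of the averaged prediction, aleatoric uncertainty as the average entropy of the individual members, and epistemic uncertainty as their difference. The names `ensemble_uncertainty`, `predict_or_reject`, and `max_total` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def ensemble_uncertainty(member_probs):
    """member_probs: one class-probability list per ensemble member."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n for c in range(k)]
    total = entropy(mean)                                    # predictive entropy
    aleatoric = sum(entropy(m) for m in member_probs) / n    # expected entropy
    epistemic = total - aleatoric                            # member disagreement
    return mean, total, aleatoric, epistemic

def predict_or_reject(member_probs, max_total=0.5):
    """Return the predicted class index, or None if uncertainty is too high."""
    mean, total, _, _ = ensemble_uncertainty(member_probs)
    if total > max_total:
        return None
    return max(range(len(mean)), key=mean.__getitem__)
```

When all members agree on a confident prediction the epistemic term is close to zero, while members that confidently disagree produce a large epistemic term and the sample is rejected.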
Article 1172
Title@2025-07-11 (5): Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer
Title: Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer | Vorhersage von Lane Change Absichten menschlicher Treiber mit einem LSTM, einem CNN und einem Transformer | 使用LSTM、CNN和变形器预测人驾驶员的车道改变意图 2507.08365v1 |
Authors (4): Francesco De Cristofaro, Felix Hofbaur, Aixi Yang, Arno Eichberger
Lane changes of preceding vehicles have a great impact on the motion planning of automated vehicles, especially in complex traffic situations. Predicting them would benefit the public in terms of safety and efficiency. While many research efforts have been made in this direction, few have concentrated on predicting maneuvers within a set time interval as opposed to predicting at a set prediction time. In addition, there is a lack of comparisons between different architectures that try to determine the best-performing one and to assess how to correctly choose the input for such models. In this paper, the structures of an LSTM, a CNN and a Transformer network are described and implemented to predict the intention of human drivers to perform a lane change. We show how the data was prepared starting from a publicly available dataset (highD), which features were used, and how the networks were designed, and finally we compare the results of the three networks with different configurations of input data. We found that transformer networks performed better than the other networks and were less affected by overfitting. The accuracy of the method ranged from $82.79\%$ to $96.73\%$ for different input configurations, and the method showed good overall performance when precision and recall are also considered.
Article 1173
Title@2025-07-11 (5): A Plea for History and Philosophy of Statistics and Machine Learning
Title: A Plea for History and Philosophy of Statistics and Machine Learning | Ein Plädoyer für Geschichte und Philosophie der Statistik und des maschinellen Lernens | 统计和机器学习历史和哲学 2506.22236v2 |
Authors (1): Hanti Lin
The integration of the history and philosophy of statistics was initiated at least by Hacking (1965) and advanced by Mayo (1996), but it has not received sustained follow-up. Yet such integration is more urgent than ever, as the recent success of artificial intelligence has been driven largely by machine learning – a field historically developed alongside statistics. Today, the boundary between statistics and machine learning is increasingly blurred. What we now need is integration, twice over: of history and philosophy, and of two fields they engage – statistics and machine learning. I present a case study of a philosophical idea in machine learning (and in formal epistemology) whose root can be traced back to an often under-appreciated insight in Neyman and Pearson’s 1936 work (a follow-up to their 1933 classic). This leads to the articulation of an epistemological principle – largely implicit in, but shared by, the practices of frequentist statistics and machine learning – which I call achievabilism: the thesis that the correct standard for assessing non-deductive inference methods should not be fixed, but should instead be sensitive to what is achievable in specific problem contexts. Another integration also emerges at the level of methodology, combining two ends of the philosophy of science spectrum: history and philosophy of science on the one hand, and formal epistemology on the other hand.
Article 1174
Title@2025-07-11 (5): Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text
Title: Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text | Nutzung von maschinellem Lernen und verbesserte Parallelitätserkennung für die BPMN-Modellgenerierung aus Text | 利用机器学习和强化平行探测,从文字中生成BPMN模型 2507.08362v1 |
Authors (6): Phuong Nam Lê, Charlotte Schneider-Depré, Alexandre Goossens, Alexander Stevens, Aurélie Leribaux, Johannes De Smedt
Efficient planning, resource management, and consistent operations often rely on converting textual process documents into formal Business Process Model and Notation (BPMN) models. However, this conversion process remains time-intensive and costly. Existing approaches, whether rule-based or machine-learning-based, still struggle with writing styles and often fail to identify parallel structures in process descriptions. This paper introduces an automated pipeline for extracting BPMN models from text, leveraging the use of machine learning and large language models. A key contribution of this work is the introduction of a newly annotated dataset, which significantly enhances the training process. Specifically, we augment the PET dataset with 15 newly annotated documents containing 32 parallel gateways for model training, a critical feature often overlooked in existing datasets. This addition enables models to better capture parallel structures, a common but complex aspect of process descriptions. The proposed approach demonstrates adequate performance in terms of reconstruction accuracy, offering a promising foundation for organizations to accelerate BPMN model creation.
Article 1175
Title@2025-07-11 (5): scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling
Title: scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling | scE$^2$TM: Auf dem Weg zur Interpretierbaren Single-Cell-Einbettung über Topic Modeling | ScE$2美元TM:争取通过专题建模以可解释的单一公司嵌入 2507.08355v1 |
Authors (6): Hegang Chen, Yuyin Lu, Zhiming Dai, Fu Lee Wang, Qing Li, Yanghui Rao
Recent advances in sequencing technologies have enabled researchers to explore cellular heterogeneity at single-cell resolution. Meanwhile, interpretability has gained prominence in parallel with the rapid increase in the complexity and performance of deep learning models. In recent years, topic models have been widely used for interpretable single-cell embedding learning and clustering analysis, which we refer to as single-cell embedded topic models. However, previous studies evaluated the interpretability of the models mainly through qualitative analysis, and these single-cell embedded topic models suffer from the potential problem of interpretation collapse. Furthermore, their neglect of external biological knowledge constrains analytical performance. Here, we present scE2TM, an external knowledge-guided single-cell embedded topic model that provides high-quality cell embeddings and strong interpretation, contributing to comprehensive scRNA-seq data analysis. Our comprehensive evaluation across 20 scRNA-seq datasets demonstrates that scE2TM achieves significant clustering performance gains compared to 7 state-of-the-art methods. In addition, we propose a new interpretability evaluation benchmark that introduces 10 metrics to quantitatively assess the interpretability of single-cell embedded topic models. The results show that the interpretation provided by scE2TM performs encouragingly in terms of diversity and consistency with the underlying biological signals, helping to better reveal the underlying biological mechanisms.
Article 1176
Title@2025-07-11 (5): DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting
Title: DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting | DRAN: Ein Vertriebs- und Beziehungsadaptives Netzwerk für die räumlich-zeitliche Vorhersage | DRAN: 空间时预报分布和关系适应网络 2504.01531v3 |
Authors (5): Xiaobei Zou, Luolin Xiong, Kexuan Zhang, Cesare Alippi, Yang Tang
Accurate predictions of spatio-temporal systems are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of many spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not granted. In order to address non-stationarity, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and distribution changes over time. While temporal normalization and de-normalization are frequently used techniques to adapt to distribution shifts, this operation is not suitable for the spatio-temporal context as temporal normalization scales the time series of nodes and possibly disrupts the spatial relations among nodes. In order to address this problem, a Spatial Factor Learner (SFL) module is developed that enables the normalization and de-normalization process. To adapt to dynamic changes in spatial relationships among sensors, we propose a Dynamic-Static Fusion Learner (DSFL) module that effectively integrates features learned from both dynamic and static relations through an adaptive fusion ratio mechanism. Furthermore, we introduce a Stochastic Learner to capture the noisy components of spatio-temporal representations. Our approach outperforms state-of-the-art methods on weather prediction and traffic flow forecasting tasks. Experimental results show that our SFL efficiently preserves spatial relationships across various temporal normalization operations. Visualizations of the learned dynamic and static relations demonstrate that DSFL can capture both local and distant relationships between nodes.
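The concern about per-node temporal normalization can be seen in a small example. The sketch below applies plain z-score normalization to each node's series independently; this is a generic technique used for illustration, not the SFL module itself, and the helper names `normalize` and `denormalize` are hypothetical.

```python
import statistics

def normalize(series):
    """Z-score one node's series; returns the result and the stats used."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series) or 1.0  # guard against constant series
    return [(x - mu) / sigma for x in series], (mu, sigma)

def denormalize(z, stats):
    """Invert normalize() using the stored per-node statistics."""
    mu, sigma = stats
    return [v * sigma + mu for v in z]

# Two nodes whose readings differ by a factor of ten.
node_a = [1.0, 2.0, 3.0]
node_b = [10.0, 20.0, 30.0]
z_a, stats_a = normalize(node_a)
z_b, stats_b = normalize(node_b)
# After normalization the two series are numerically indistinguishable,
# so the relative scale between the nodes is no longer visible to a model.
```

The scale difference survives only in the stored statistics, which is why a model that needs spatial relations must carry that information through some other path.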
Article 1177
Title@2025-07-11 (5): Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting
Title: Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting | Galerkin-ARIMA: Ein zweistufiges Polynom-Regressions-Framework für schnelles Ein-Schritt-Vorhersagen | Galerkin-ARIMA:一个双级多级倒退框架,用于快速滚动单步单步预告 2507.07469v2 |
Authors (2): Haojie Liu, Zihan Lin
We introduce Galerkin-ARIMA, a novel time-series forecasting framework that integrates Galerkin projection techniques with the classical ARIMA model to capture potentially nonlinear dependencies in lagged observations. By replacing the fixed linear autoregressive component with a spline-based basis expansion, Galerkin-ARIMA flexibly approximates the underlying relationship among past values via ordinary least squares, while retaining the moving-average structure and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions for both the AR and MA components using two-stage Galerkin projections, establish conditions for asymptotic unbiasedness and consistency, and analyze the bias-variance trade-off under basis-size growth. Complexity analysis reveals that, for moderate basis dimensions, our approach can substantially reduce computational cost compared to maximum-likelihood ARIMA estimation. Through extensive simulations on four synthetic processes (noisy ARMA, seasonal, trend-AR, and nonlinear recursion series), we demonstrate that Galerkin-ARIMA matches or closely approximates ARIMA’s forecasting accuracy while achieving orders-of-magnitude speedups in rolling forecasting tasks. These results suggest that Galerkin-ARIMA offers a powerful, efficient alternative for modeling complex time series dynamics in high-volume or real-time applications.
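The core idea of the autoregressive stage, regressing the next value on basis functions of lagged values with ordinary least squares, can be sketched in a few lines. This is a simplified illustration under several assumptions: it uses a single lag, a polynomial basis instead of splines, and omits the moving-average stage entirely. The helper names (`solve`, `basis`, `fit_ar_basis`, `forecast_next`) are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def basis(y_prev, degree=2):
    """Polynomial basis [1, y, y^2, ...] of the previous observation."""
    return [y_prev ** k for k in range(degree + 1)]

def fit_ar_basis(series, degree=2):
    """OLS fit of y_t on basis(y_{t-1}) via the normal equations."""
    X = [basis(series[t - 1], degree) for t in range(1, len(series))]
    y = series[1:]
    p = degree + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def forecast_next(series, coef):
    """One-step-ahead forecast from the last observation."""
    phi = basis(series[-1], len(coef) - 1)
    return sum(c * f for c, f in zip(coef, phi))

# A noiseless nonlinear recursion (logistic map) as toy data.
toy = [0.2]
for _ in range(59):
    toy.append(3.9 * toy[-1] * (1.0 - toy[-1]))
coef = fit_ar_basis(toy, degree=2)
```

Because the fit is a closed-form linear solve, refitting inside a rolling window stays cheap, which is the kind of cost advantage over iterative maximum-likelihood estimation that the abstract refers to.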
Article 1178
Title@2025-07-11 (5): Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances
Title: Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances | Verbesserung der Verteilungs Robustheit in der Hauptkomponentenanalyse durch Wasserstein-Abstände | 提高瓦塞斯坦距离主要构成部分分析的分布强度 2503.02494v2 |
Authors (3): Lei Wang, Xin Liu, Xiaojun Chen
We consider the distributionally robust optimization (DRO) model of principal component analysis (PCA) to account for uncertainty in the underlying probability distribution. The resulting formulation leads to a nonsmooth constrained min-max optimization problem, where the ambiguity set captures the distributional uncertainty by the type-$2$ Wasserstein distance. We prove that the inner maximization problem admits a closed-form optimal value. This explicit characterization equivalently reformulates the original DRO model into a minimization problem on the Stiefel manifold with intricate nonsmooth terms, a challenging formulation beyond the reach of existing algorithms. To address this issue, we devise an efficient smoothing manifold proximal gradient algorithm. Our analysis establishes Riemannian gradient consistency and global convergence of our algorithm to a stationary point of the nonsmooth minimization problem. We also provide the iteration complexity $O(\epsilon^{-3})$ of our algorithm to achieve an $\epsilon$-approximate stationary point. Finally, numerical experiments are conducted to validate the effectiveness and scalability of our algorithm, as well as to highlight the necessity and rationality of adopting the DRO model for PCA.
Article 1179
Title@2025-07-11 (5): Interpretability-Aware Pruning for Efficient Medical Image Analysis
Title: Interpretability-Aware Pruning for Efficient Medical Image Analysis | Dolmetschbarkeits-Vorsicht für effiziente medizinische Bildanalyse | 高效医学图像分析的解释性软件 2507.08330v1 |
Authors (5): Nikita Malik, Pratinav Seth, Neeraj Kumar Singh, Chintan Chitroda, Vinay Kumar Sankarapu
Deep learning has driven significant advances in medical image analysis, yet its adoption in clinical practice remains constrained by the large size and lack of transparency in modern models. Advances in interpretability techniques such as DL-Backtrace, Layer-wise Relevance Propagation, and Integrated Gradients make it possible to assess the contribution of individual components within neural networks trained on medical imaging tasks. In this work, we introduce an interpretability-guided pruning framework that reduces model complexity while preserving both predictive performance and transparency. By selectively retaining only the most relevant parts of each layer, our method enables targeted compression that maintains clinically meaningful representations. Experiments across multiple medical image classification benchmarks demonstrate that this approach achieves high compression rates with minimal loss in accuracy, paving the way for lightweight, interpretable models suited for real-world deployment in healthcare settings.
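Retaining only the most relevant parts of each layer can be sketched as a top-k selection over per-unit relevance scores. The abstract does not describe the exact selection rule, so the sketch assumes the scores have already been produced by some attribution method, ranks units by absolute relevance, and keeps a fixed fraction per layer. The names `prune_mask`, `prune_layers`, and `keep_ratio` are hypothetical.

```python
def prune_mask(relevance, keep_ratio=0.5):
    """Return a 0/1 mask keeping the highest-|relevance| units of one layer."""
    n_keep = max(1, int(len(relevance) * keep_ratio))
    # Assumption: magnitude of relevance decides importance, sign is ignored.
    ranked = sorted(range(len(relevance)),
                    key=lambda i: abs(relevance[i]), reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(relevance))]

def prune_layers(layer_relevance, keep_ratio=0.5):
    """Apply prune_mask to every layer in a {layer_name: scores} mapping."""
    return {name: prune_mask(scores, keep_ratio)
            for name, scores in layer_relevance.items()}

# Hypothetical relevance scores for two small layers.
masks = prune_layers({"conv1": [0.1, -0.9, 0.5, 0.05],
                      "fc": [0.2, 0.3, 0.1]})
```

In a real network each mask would be used to remove the corresponding filters or neurons, after which accuracy is usually checked again before deciding on the final compression rate.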
Article 1180
Title@2025-07-11 (5): An Adaptive Volatility-based Learning Rate Scheduler
Title: An Adaptive Volatility-based Learning Rate Scheduler | Eine adaptive Volatilität-basierte Lernrate Scheduler | 基于适应性波动的学习率计划表 2507.10575v1 |
Authors (1): Kieran Chai Kai Ren
Effective learning rate (LR) scheduling is crucial for training deep neural networks. However, popular pre-defined and adaptive schedulers can still lead to suboptimal generalization. This paper introduces VolSched, a novel adaptive LR scheduler inspired by the concept of volatility in stochastic processes like Geometric Brownian Motion to dynamically adjust the learning rate. By calculating the ratio between long-term and short-term accuracy volatility, VolSched increases the LR to escape plateaus and decreases it to stabilize training, allowing the model to explore the loss landscape more effectively. We evaluate VolSched on the CIFAR-100 dataset against a strong baseline using a standard augmentation pipeline. When paired with ResNet-18 and ResNet-34, our scheduler delivers consistent performance gains, improving top-1 accuracy by 1.4 and 1.3 percentage points respectively. Analysis of the loss curves reveals that VolSched promotes a longer exploration phase. A quantitative analysis of the Hessian shows that VolSched finds a final solution that is 38% flatter than the next-best baseline, allowing the model to obtain wider minima and hence better generalization performance.
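A rough sketch of a volatility-ratio rule is given here for orientation only. The abstract does not state the exact formula, so everything specific is an assumption: volatility is taken as the population standard deviation of recent accuracies, the window lengths are 5 and 20, and a ratio above 1 (recent accuracy flatter than the long-run trend, read as a plateau) raises the LR while a ratio below 1 lowers it. The names `volatility_ratio` and `adjust_lr` are hypothetical.

```python
from statistics import pstdev
from typing import List


def volatility_ratio(acc_history: List[float], short: int = 5, long: int = 20) -> float:
    """Long-window volatility divided by short-window volatility of accuracy."""
    if len(acc_history) < long:
        return 1.0  # not enough history yet: leave the LR unchanged
    short_vol = pstdev(acc_history[-short:])
    long_vol = pstdev(acc_history[-long:])
    eps = 1e-8  # avoids division by zero on a perfectly flat window
    return (long_vol + eps) / (short_vol + eps)


def adjust_lr(lr: float, acc_history: List[float], lo: float = 0.5, hi: float = 2.0,
              min_lr: float = 1e-5, max_lr: float = 1.0) -> float:
    """Scale the LR by the clipped volatility ratio (assumed update rule)."""
    factor = min(hi, max(lo, volatility_ratio(acc_history)))
    return min(max_lr, max(min_lr, lr * factor))


if __name__ == "__main__":
    plateau = [0.1 * i for i in range(15)] + [1.5] * 5      # flat recent window
    noisy = [0.5] * 15 + [0.1, 0.9, 0.1, 0.9, 0.1]           # unstable recent window
    print(adjust_lr(0.1, plateau), adjust_lr(0.1, noisy))
```

The clipping bounds `lo` and `hi` keep a single noisy epoch from changing the LR by more than a factor of two in either direction, which is a design choice of this sketch rather than a detail of VolSched.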
Article 1181
Title@2025-07-11 (5): Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
Title: Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection | Emoji-Angriff: Verstärkung von Jailbreak-Angriffen gegen Richter LLM-Erkennung | Emoji攻击:加强针对LLM法官的越狱袭击 2411.01077v4 |
Authors (3): Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
Jailbreaking techniques trick Large Language Models (LLMs) into producing restricted output, posing a potential threat. One line of defense is to use another LLM as a Judge to evaluate the harmfulness of generated text. However, we reveal that these Judge LLMs are vulnerable to token segmentation bias, an issue that arises when delimiters alter the tokenization process, splitting words into smaller sub-tokens. This alters the embeddings of the entire sequence, reducing detection accuracy and allowing harmful content to be misclassified as safe. In this paper, we introduce Emoji Attack, a novel strategy that amplifies existing jailbreak prompts by exploiting token segmentation bias. Our method leverages in-context learning to systematically insert emojis into text before it is evaluated by a Judge LLM, inducing embedding distortions that significantly lower the likelihood of detecting unsafe content. Unlike traditional delimiters, emojis also introduce semantic ambiguity, making them particularly effective in this attack. Through experiments on state-of-the-art Judge LLMs, we demonstrate that Emoji Attack substantially reduces the unsafe prediction rate, bypassing existing safeguards.
Article 1182
Title@2025-07-11 (5): EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Title: EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | EvalTree: Profiling Language Model Schwächen über Hierarchische Fähigkeiten Bäume | EvalTree:通过等级能力树分析语言模型弱点 2503.08893v2 |
Authors (4): Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi, Pang Wei Koh
An ideal model evaluation should achieve two goals: identifying where the model fails and providing actionable improvement guidance. Toward these goals for language model (LM) evaluations, we formulate the problem of generating a weakness profile, a set of weaknesses expressed in natural language, given an LM’s performance on every individual instance in a benchmark. We introduce a suite of quantitative assessments to compare different weakness profiling methods. We also introduce a weakness profiling method EvalTree. EvalTree constructs a capability tree where each node represents a capability described in natural language and is linked to a subset of benchmark instances that specifically evaluate this capability; it then extracts nodes where the LM performs poorly to generate a weakness profile. On the MATH and WildChat benchmarks, we show that EvalTree outperforms baseline weakness profiling methods by identifying weaknesses more precisely and comprehensively. Weakness profiling further enables weakness-guided data collection, and training data collection guided by EvalTree-identified weaknesses improves LM performance more than other data collection strategies. We also show how EvalTree exposes flaws in Chatbot Arena’s human-voter-based evaluation practice. To facilitate future work, we provide an interface that allows practitioners to interactively explore the capability trees built by EvalTree.
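The extraction step described above (walk the capability tree and report nodes where the LM performs poorly) might look roughly like the following sketch. The tree is assumed to be given; the accuracy threshold, the minimum node size, and the rule of not descending below a reported node are illustrative assumptions, and `CapabilityNode` and `weakness_profile` are hypothetical names rather than the interface of the released tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CapabilityNode:
    capability: str                      # natural-language description of the capability
    instances: List[int]                 # ids of benchmark instances linked to this node
    children: List["CapabilityNode"] = field(default_factory=list)


def accuracy(node: CapabilityNode, correct: Dict[int, bool]) -> float:
    """Fraction of the node's instances that the LM answered correctly."""
    return sum(correct[i] for i in node.instances) / len(node.instances)


def weakness_profile(root: CapabilityNode, correct: Dict[int, bool],
                     threshold: float = 0.5, min_size: int = 2) -> List[str]:
    """Collect capabilities whose accuracy falls below `threshold`."""
    profile: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.instances) >= min_size and accuracy(node, correct) < threshold:
            profile.append(node.capability)   # report this node and stop descending
        else:
            stack.extend(reversed(node.children))
    return profile


if __name__ == "__main__":
    tree = CapabilityNode("math", [0, 1, 2, 3, 4, 5], [
        CapabilityNode("algebra", [0, 1, 2]),
        CapabilityNode("geometry", [3, 4, 5]),
    ])
    results = {0: True, 1: True, 2: True, 3: False, 4: False, 5: True}
    print(weakness_profile(tree, results))
```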
Article 1183
Title@2025-07-11 (5): Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
Title: Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments | Entschlüsselung des Interplays zwischen Übertragungseffekten und Belohnungsautokorrelationen in Switchback-Experimenten | 在回转实验中解开结转效应与回转回实验中回调自动关系之间的交互作用 2403.17285v5 |
Authors (5): Qianglin Wen, Chengchun Shi, Yang Ying, Niansheng Tang, Hongtu Zhu
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing works which derive the optimal design based on specific and relatively simple estimators, our analysis covers a range of state-of-the-art estimators developed in the reinforcement learning (RL) literature. It reveals that the effectiveness of different switchback designs depends crucially on (i) the size of the carryover effect and (ii) the auto-correlations among reward errors over time. Meanwhile, these findings are estimator-agnostic, i.e., they apply to most RL estimators. Based on these insights, we provide a workflow to offer guidelines for practitioners on designing switchback experiments in A/B testing.
Article 1184
Title@2025-07-11 (5): A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction
Title: A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction | Ein umfassend adaptives architektonisches Optimierungs- und Quantum-Neural-Netzwerkmodell für Cloud Workloads Vorhersage | 全面适应性建筑建筑优化-植入量子云工作量预测神经网络模型 2507.08317v1 |
Authors (5): Jitendra Kumar, Deepika Saxena, Kishu Gupta, Satyam Kumar, Ashutosh Kumar Singh
Accurate workload prediction and advanced resource reservation are crucial for managing dynamic cloud services. Traditional neural networks and deep learning models frequently encounter challenges with diverse, high-dimensional workloads, especially during sudden resource demand changes, leading to inefficiencies. This issue arises from their limited optimization during training, relying only on parametric (inter-connection weights) adjustments using conventional algorithms. To address this issue, this work proposes a novel Comprehensively Adaptive Architectural Optimization-based Variable Quantum Neural Network (CA-QNN), which combines the efficiency of quantum computing with complete structural and qubit vector parametric learning. The model converts workload data into qubits, processed through qubit neurons with Controlled NOT-gated activation functions for intuitive pattern recognition. In addition, a comprehensive architecture optimization algorithm for networks is introduced to facilitate the learning and propagation of the structure and parametric values in variable-sized QNNs. This algorithm incorporates quantum adaptive modulation and size-adaptive recombination during the training process. The performance of the CA-QNN model is thoroughly investigated against seven state-of-the-art methods across four benchmark datasets of heterogeneous cloud workloads. The proposed model demonstrates superior prediction accuracy, reducing prediction errors by up to 93.40% and 91.27% compared to existing deep learning and QNN-based approaches, respectively.
Article 1185
Title@2025-07-11 (5): CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering
Title: CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering | CAS Kondensiertes und Beschleunigtes Silhouette: Eine effiziente Methode zur Bestimmung des Optimalen K in K-Means Clustering | CAS 集中和加速的西尔休埃特:确定K-Meyans集群中最佳K的高效方法 2507.08311v1 |
Authors (3): Krishnendu Das, Sumit Gupta, Awadhesh Kumar
Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy remains a major challenge in large datasets. This paper presents a comprehensive overview of strategies for selecting the optimal value of k in clustering, with a focus on achieving a balance between clustering precision and computational efficiency in complex data environments. In addition, this paper introduces improvements to clustering techniques for text and image data to provide insights into better computational performance and cluster validity. The proposed approach is based on the Condensed Silhouette method, along with statistical methods such as Local Structures, Gap Statistics, and a Class Consistency Ratio (CCR) and Cluster Overlap Index (COI) based algorithm to calculate the best value of k for K-Means clustering. The results of comparative experiments show that the proposed approach achieves up to 99 percent faster execution times on high-dimensional datasets while retaining both precision and scalability, making it highly suitable for real-time clustering needs or scenarios demanding efficient clustering with minimal resource utilization.
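For readers unfamiliar with silhouette-based selection of k, the following sketch shows the standard (uncondensed) silhouette coefficient, not the condensed and accelerated variant proposed in the paper. It assumes 1-D points with absolute difference as the distance and takes the candidate labellings for each k as given; the helper names `silhouette` and `best_k` are hypothetical.

```python
from typing import Dict, List, Sequence


def silhouette(points: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient for 1-D points under a given labelling."""
    clusters: Dict[int, List[float]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points: Sequence[float], labelings: Dict[int, Sequence[int]]) -> int:
    """Pick the k whose labelling has the highest mean silhouette."""
    return max(labelings, key=lambda k: silhouette(points, labelings[k]))


if __name__ == "__main__":
    pts = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
    candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 2, 1, 1, 1]}
    print(best_k(pts, candidates))
```

The cost of this baseline grows quadratically with the number of points, which is the kind of overhead that accelerated variants aim to reduce.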
Article 1186
Title@2025-07-11 (5): M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
Title: M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning | M2-Reasoning: Stärkung von MLLMs mit einheitlicher allgemeiner und räumlicher Vernunft | M2-反应:以统一的一般和空间理由,赋予MLLMs权力 2507.08306v1 |
Authors (14): Inclusion AI, Fudong Wang, Jiajia Liu, Jingdong Chen, Jun Zhou, Kaixiang Ji, Lixiang Ru, Qingpei Guo, Ruobing Zheng, Tianqi Li, Yi Yuan, Yifan Mao, Yuting Xiao, Ziping Ma
Recent advancements in Multimodal Large Language Models (MLLMs), particularly through Reinforcement Learning with Verifiable Rewards (RLVR), have significantly enhanced their reasoning abilities. However, a critical gap persists: these models struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, we introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains.
Article 1187
Title@2025-07-11 (5): Amortized Posterior Sampling with Diffusion Prior Distillation
Title: Amortized Posterior Sampling with Diffusion Prior Distillation | Amortisierte amortisierte hintere Probenahme mit Diffusionsvordestillation | 先前蒸馏阶段的分散分解的摊销水底抽样 2407.17907v2 |
Authors (3): Abbas Mammadov, Hyungjin Chung, Jong Chul Ye
We propose Amortized Posterior Sampling (APS), a novel variational inference approach for efficient posterior sampling in inverse problems. Our method trains a conditional flow model to minimize the divergence between the variational distribution and the posterior distribution implicitly defined by the diffusion model. This results in a powerful, amortized sampler capable of generating diverse posterior samples with a single neural function evaluation, generalizing across various measurements. Unlike existing methods, our approach is unsupervised, requires no paired training data, and is applicable to both Euclidean and non-Euclidean domains. We demonstrate its effectiveness on a range of tasks, including image restoration, manifold signal reconstruction, and climate data imputation. APS significantly outperforms existing approaches in computational efficiency while maintaining competitive reconstruction quality, enabling real-time, high-quality solutions to inverse problems across diverse domains.
Article 1188
Title@2025-07-11 (5): Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers
Title: Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Bandit-Based Prompt Design Strategy Selection verbessert Prompt Optimizers | 基于强盗的即时设计战略选择改进即时优化 2503.01163v2 |
Authors (5): Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa
Prompt optimization aims to search for effective prompts that enhance the performance of large language models (LLMs). Although existing prompt optimization methods have discovered effective prompts, they often differ from sophisticated prompts carefully designed by human experts. Prompt design strategies, representing best practices for improving prompt performance, can be key to improving prompt optimization. Recently, a method termed the Autonomous Prompt Engineering Toolbox (APET) has incorporated various prompt design strategies into the prompt optimization process. In APET, the LLM is needed to implicitly select and apply the appropriate strategies because prompt design strategies can have negative effects. This implicit selection may be suboptimal due to the limited optimization capabilities of LLMs. This paper introduces Optimizing Prompts with sTrategy Selection (OPTS), which implements explicit selection mechanisms for prompt design. We propose three mechanisms, including a Thompson sampling-based approach, and integrate them into EvoPrompt, a well-known prompt optimizer. Experiments optimizing prompts for two LLMs, Llama-3-8B-Instruct and GPT-4o mini, were conducted using BIG-Bench Hard. Our results show that the selection of prompt design strategies improves the performance of EvoPrompt, and the Thompson sampling-based mechanism achieves the best overall results. Our experimental code is provided at https://github.com/shiralab/OPTS .
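A Thompson sampling-based selector of the kind mentioned above can be sketched with a Beta-Bernoulli bandit. This is a generic illustration, not the OPTS code from the linked repository: the binary reward ("did the new prompt improve on its parent?"), the Beta(1, 1) prior, the class name `ThompsonStrategySelector`, and the strategy names are all assumptions.

```python
import random
from typing import Dict, List


class ThompsonStrategySelector:
    """Beta-Bernoulli Thompson sampling over prompt design strategies."""

    def __init__(self, strategies: List[str], seed: int = 0) -> None:
        self.strategies = list(strategies)
        # Beta(1, 1) prior on each strategy's success probability.
        self.successes: Dict[str, float] = {s: 1.0 for s in self.strategies}
        self.failures: Dict[str, float] = {s: 1.0 for s in self.strategies}
        self.rng = random.Random(seed)

    def select(self) -> str:
        """Sample a success rate per strategy and pick the largest sample."""
        samples = {
            s: self.rng.betavariate(self.successes[s], self.failures[s])
            for s in self.strategies
        }
        return max(samples, key=samples.get)

    def update(self, strategy: str, improved: bool) -> None:
        """Record whether applying the strategy improved the prompt (assumed reward)."""
        if improved:
            self.successes[strategy] += 1.0
        else:
            self.failures[strategy] += 1.0


if __name__ == "__main__":
    # Hypothetical strategies with made-up success rates, for simulation only.
    true_rate = {"chain_of_thought": 0.8, "role_play": 0.1, "no_strategy": 0.1}
    selector = ThompsonStrategySelector(list(true_rate))
    env = random.Random(1)
    counts = {s: 0 for s in true_rate}
    for _ in range(500):
        chosen = selector.select()
        counts[chosen] += 1
        selector.update(chosen, env.random() < true_rate[chosen])
    print(counts)
```

In an evolutionary optimizer, `select` would be called before each mutation step and `update` after the child prompt has been scored.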
Article 1189
Title@2025-07-11 (5): Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Title: Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | Dualformer: Kontrollierbares schnelles und langsames Denken durch Lernen mit Randomized Reasoning Traces | 二进制:可控制快速和慢思维,通过学习随机调整理性路径进行思考 2410.09918v3 |
Authors (5): DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng
In cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Analogously, Large Language Models (LLMs) can operate in two reasoning modes: outputting only the solutions (fast mode) or both the reasoning chain and the final solution (slow mode). We present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes by training on randomized reasoning traces, where different parts of the traces are strategically dropped during training. At inference time, Dualformer can be easily configured to execute in either fast or slow mode, or automatically decide which mode to engage (auto mode). It outperforms baselines in both performance and computational efficiency across all three modes: (1) in slow mode, Dualformer achieves a 97.6% optimal rate on unseen $30 \times 30$ maze tasks, surpassing the Searchformer baseline (93.3%) trained on data with complete reasoning traces, with 45.5% fewer reasoning steps; (2) in fast mode, Dualformer achieves an 80% optimal rate, significantly outperforming the Solution-Only model trained on solution-only data, which has an optimal rate of only 30%; (3) in auto mode, Dualformer achieves a 96.6% optimal rate with 59.9% fewer steps than Searchformer. Moreover, Dualformer produces more diverse reasoning traces than Searchformer. For math reasoning problems, our techniques have also achieved improved performance with LLM fine-tuning, demonstrating that they generalize beyond task-specific models. We open source our code at https://github.com/facebookresearch/dualformer.
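Training on randomized reasoning traces can be pictured as a data-augmentation step applied to each training example. The sketch below is only a schematic reading of the abstract: the trace format, the step kinds ("create", "close"), the two dropping strategies, and their probabilities are hypothetical and should not be read as the strategies used in the released code.

```python
import random
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, text), e.g. ("create", "node 3") or ("close", "node 3")


def randomize_trace(trace: List[Step], plan: List[str], rng: random.Random,
                    p_drop_all: float = 0.25, p_drop_kind: float = 0.5,
                    drop_kind: str = "close") -> List[str]:
    """Return one training target: a possibly shortened trace followed by the plan."""
    u = rng.random()
    if u < p_drop_all:
        kept: List[Step] = []                        # solution only (fast-mode example)
    elif u < p_drop_all + p_drop_kind:
        kept = [s for s in trace if s[0] != drop_kind]  # drop one kind of step
    else:
        kept = list(trace)                           # full trace (slow-mode example)
    return [text for _, text in kept] + plan


if __name__ == "__main__":
    trace = [("create", "create A"), ("close", "close A"), ("create", "create B")]
    plan = ["plan A", "plan B"]
    rng = random.Random(0)
    for _ in range(3):
        print(randomize_trace(trace, plan, rng))
```

Because the same model sees complete, partial, and empty traces, it can be prompted at inference time to emit either the plan alone or the trace followed by the plan.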
Article 1190
Title@2025-07-11 (5): Granular Ball Twin Support Vector Machine
Title: Granular Ball Twin Support Vector Machine | Granular Ball Twin Unterstützung Vektor Maschine | 颗粒球双双支持矢量机 2410.04774v3 |
Authors (3): A. Quadir, M. Sajid, M. Tanveer
Twin support vector machine (TSVM) is an emerging machine learning model with versatile applicability in classification and regression endeavors. Nevertheless, TSVM confronts noteworthy challenges: $(i)$ the imperative demand for matrix inversions presents formidable obstacles to its efficiency and applicability on large-scale datasets; $(ii)$ the omission of the structural risk minimization (SRM) principle in its primal formulation heightens the vulnerability to overfitting risks; and $(iii)$ the TSVM exhibits a high susceptibility to noise and outliers, and also demonstrates instability when subjected to resampling. In view of the aforementioned challenges, we propose the granular ball twin support vector machine (GBTSVM). GBTSVM takes granular balls, rather than individual data points, as inputs to construct a classifier. These granular balls, characterized by their coarser granularity, exhibit robustness to resampling and reduced susceptibility to the impact of noise and outliers. We further propose a novel large-scale granular ball twin support vector machine (LS-GBTSVM). LS-GBTSVM’s optimization formulation ensures two critical facets: $(i)$ it eliminates the need for matrix inversions, streamlining the LS-GBTSVM’s computational efficiency, and $(ii)$ it incorporates the SRM principle through the incorporation of regularization terms, effectively addressing the issue of overfitting. The proposed LS-GBTSVM exemplifies efficiency, scalability for large datasets, and robustness against noise and outliers. We conduct a comprehensive evaluation of the GBTSVM and LS-GBTSVM models on benchmark datasets from UCI, KEEL, and NDC datasets. Our experimental findings and statistical analyses affirm the superior generalization prowess of the proposed GBTSVM and LS-GBTSVM models.
Article 1191
Title@2025-07-11 (5): Distributional Soft Actor-Critic with Diffusion Policy
Title: Distributional Soft Actor-Critic with Diffusion Policy | Verteilungs-Soft-Actor-Kritik mit Diffusionspolitik | 配发软软软动作- 带有传播政策批评器 2507.01381v3 |
Authors (9): Tong Liu, Yinuo Wang, Xujie Song, Wenjun Zou, Liangfa Chen, Likun Wang, Bin Shuai, Jingliang Duan, Shengbo Eben Li
Reinforcement learning has been proven to be highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However, unimodal distributions often cause bias in value function estimation, leading to poor algorithm performance. This paper proposes a distributional reinforcement learning algorithm called DSAC-D (Distributional Soft Actor-Critic with Diffusion Policy) to address the challenges of estimating bias in value functions and obtaining multimodal policy representations. A multimodal distributional policy iteration framework that can converge to the optimal policy was established by introducing policy entropy and a value distribution function. A diffusion value network that can accurately characterize multi-peaked distributions was constructed by generating a set of reward samples through reverse sampling using a diffusion model. Based on this, a distributional reinforcement learning algorithm with dual diffusion of the value network and the policy network was derived. MuJoCo testing tasks demonstrate that the proposed algorithm not only learns multimodal policies, but also achieves state-of-the-art (SOTA) performance in all 9 control tasks, with significant suppression of estimation bias and a total average return improvement of over 10% compared to existing mainstream algorithms. The results of real vehicle testing show that DSAC-D can accurately characterize the multimodal distribution of different driving styles, and the diffusion policy network can characterize multimodal trajectories.
Article 1192
Title@2025-07-11 (5): Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training
Title: Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | Leichte Sicherheits-Guardrails über Synthetische Daten und RL-geführtes Adversarial Training | 通过合成数据和RL制导反向训练轻量安全护卫车 2507.08284v1 |
Authors (8): Aleksei Ilin, Gor Matevosyan, Xueying Ma, Vladimir Eremin, Suhaa Dada, Muqun Li, Riyaaz Shaik, Haluk Noyan Tokgozoglu
We introduce a lightweight yet highly effective safety guardrail framework for language models, demonstrating that small-scale language models can achieve, and even surpass, the performance of larger counterparts in content moderation tasks. This is accomplished through high-fidelity synthetic data generation and adversarial training. The synthetic data generation process begins with human-curated seed data, which undergoes query augmentation and paraphrasing to create diverse and contextually rich examples. This augmented data is then subjected to multiple rounds of curation, ensuring high fidelity and relevance. Inspired by recent advances in the Generative Adversarial Network (GAN) architecture, our adversarial training employs reinforcement learning to guide a generator that produces challenging synthetic examples. These examples are used to fine-tune the safety classifier, enhancing its ability to detect and mitigate harmful content. Additionally, we incorporate strategies from recent research on efficient LLM training, leveraging the capabilities of smaller models to improve the performance of larger generative models. With iterative adversarial training and the generation of diverse, high-quality synthetic data, our framework enables small language models (SLMs) to serve as robust safety guardrails. This approach not only reduces computational overhead but also enhances resilience against adversarial attacks, offering a scalable and efficient solution for content moderation in AI systems.
Article 1193
Title@2025-07-11 (5): Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood
Title: Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood | Prädiktive Kausalableitung über Spatio-Temporale Modellierung und Penalized Empirical Likelihood | 通过SPATIO-临时模拟和惩罚性实证可能性,预测性因果推断 2507.08896v1 |
Authors (3): Byunghee Lee, Hye Yeon Sin, Joonsung Kang
This study introduces an integrated framework for predictive causal inference designed to overcome limitations inherent in conventional single-model approaches. Specifically, we combine a Hidden Markov Model (HMM) for spatial health state estimation with a Multi Task and Multi Graph Convolutional Network (MTGCN) for capturing temporal outcome trajectories. The framework treats temporal and spatial information asymmetrically, regarding them as endogenous variables in the outcome regression and as exogenous variables in the propensity score model, thereby expanding the standard doubly robust treatment effect estimation to jointly enhance bias correction and predictive accuracy. To demonstrate its utility, we focus on clinical domains such as cancer, dementia, and Parkinson's disease, where treatment effects are challenging to observe directly. Simulation studies are conducted to emulate latent disease dynamics and evaluate the model performance under varying conditions. Overall, the proposed framework advances predictive causal inference by structurally adapting to spatiotemporal complexities common in biomedical data.
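For reference, the standard doubly robust (augmented inverse probability weighting) estimator of the average treatment effect, which the framework is said to extend, is recalled below. The notation is an assumption of this note and not taken from the paper: $A_i$ is the binary treatment, $Y_i$ the outcome, $X_i$ the covariates, $\hat{\mu}_a$ the fitted outcome regression under treatment $a$, and $\hat{e}$ the fitted propensity score.

```latex
\hat{\tau}_{\mathrm{DR}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{A_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - A_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```

The estimator is consistent if either the outcome regression or the propensity model is correctly specified, which is why the two models can be given different inputs.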
Article 1194
Title@2025-07-11 (5): MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts
Title: MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts | MIRRAMS: Auf dem Weg zu Trainingsmodellen Robuste bis fehlende Verteilungsverschiebungen | MIRRAMS:努力建立培训模式,以强化缺失分布分布变化 2507.08280v1 |
Authors (3): Jihye Lee, Minseo Kang, Dongha Kim
In real-world data analysis, missingness distributional shifts between training and test input datasets frequently occur, posing a significant challenge to achieving robust prediction performance. In this study, we propose a novel deep learning framework designed to address such shifts in missingness distributions. We begin by introducing a set of mutual information-based conditions, called MI robustness conditions, which guide a prediction model to extract label-relevant information while remaining invariant to diverse missingness patterns, thereby enhancing robustness to unseen missingness scenarios at test time. To make these conditions practical, we propose simple yet effective techniques to derive loss terms corresponding to each and formulate a final objective function, termed MIRRAMS (Mutual Information Regularization for Robustness Against Missingness Shifts). As a by-product, our analysis provides a theoretical interpretation of the principles underlying consistency regularization-based semi-supervised learning methods, such as FixMatch. Extensive experiments across various benchmark datasets show that MIRRAMS consistently outperforms existing baselines and maintains stable performance across diverse missingness scenarios. Moreover, our approach achieves state-of-the-art performance even without missing data and can be naturally extended to address semi-supervised learning tasks, highlighting MIRRAMS as a powerful, off-the-shelf framework for general-purpose learning.
Article 1195
Title@2025-07-11 (5): Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
Title: Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets | Pocket2Mol: Effiziente molekulare Probenahme auf Basis von 3D Protein Pockets | Pocket2Mol:基于 3D 蛋白质薄片的有效分子取样 2205.07249v2 |
Authors (6): Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, Jianzhu Ma
Deep generative models have achieved tremendous success in designing novel drug molecules in recent years. A new line of work has shown great potential for advancing the specificity and success rate of in silico drug design by considering the structure of protein pockets. This setting poses fundamental computational challenges in sampling new chemical compounds that could satisfy multiple geometrical constraints imposed by pockets. Previous sampling algorithms either sample in the graph space or only consider the 3D coordinates of atoms while ignoring other detailed chemical structures such as bond types and functional groups. To address the challenge, we develop Pocket2Mol, an E(3)-equivariant generative network composed of two modules: 1) a new graph neural network capturing both spatial and bonding relationships between atoms of the binding pockets and 2) a new efficient algorithm which samples new drug candidates conditioned on the pocket representations from a tractable distribution without relying on MCMC. Experimental results demonstrate that molecules sampled from Pocket2Mol achieve significantly better binding affinity and other drug properties such as druglikeness and synthetic accessibility.
Article 1196
Title@2025-07-11 (5): Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators
Title: Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators | Lokales Transfer-Lernen Gaußsche Prozessmodellierung, mit Anwendungen zur Ersatzmodellierung von teuren Computersimulatoren | 当地转移学习学习 高斯进程建模,并应用昂贵计算机模拟器替代模型 2410.12690v3 |
Authors (4): Xinming Wang, Simon Mak, John Miller, Jianguo Wu
A critical bottleneck for scientific progress is the costly nature of computer simulations for complex systems. Surrogate models provide an appealing solution: such models are trained on simulator evaluations, then used to emulate and quantify uncertainty on the expensive simulator at unexplored inputs. In many applications, one often has available data on related systems. For example, in designing a new jet turbine, there may be existing studies on turbines with similar configurations. A key question is how information from such "source" systems can be transferred for effective surrogate training on the "target" system of interest. We thus propose a new LOcal transfer Learning Gaussian Process (LOL-GP) model, which leverages a carefully-designed Gaussian process to transfer such information for surrogate modeling. The key novelty of the LOL-GP is a latent regularization model, which identifies regions where transfer should be performed and regions where it should be avoided. Such a "local transfer" property is present in many scientific systems: at certain parameters, systems may behave similarly and thus transfer is beneficial; at other parameters, they may behave differently and thus transfer is detrimental. By accounting for local transfer, the LOL-GP can temper the risk of "negative transfer", i.e., the risk of worsening predictive performance from information transfer. We derive a Gibbs sampling algorithm for efficient posterior predictive sampling on the LOL-GP, for both the multi-source and multi-fidelity transfer settings. We then show, via a suite of numerical experiments and an application for jet turbine design, the improved surrogate performance of the LOL-GP over existing methods.
Article 1197
Title@2025-07-11 (5): EmissionNet: Air Quality Pollution Forecasting for Agriculture
Title: EmissionNet: Air Quality Pollution Forecasting for Agriculture | EmissionsNet: Vorhersage der Luftqualität für die Landwirtschaft | 排放网:农业空气质量污染预测 2507.05416v2 |
Authors (2): Prady Saligram, Tanvir Bhathal
Air pollution from agricultural emissions is a significant yet often overlooked contributor to environmental and public health challenges. Traditional air quality forecasting models rely on physics-based approaches, which struggle to capture complex, nonlinear pollutant interactions. In this work, we explore forecasting N$_2$O agricultural emissions by evaluating popular architectures and proposing two novel deep learning architectures, EmissionNet (ENV) and EmissionNet-Transformer (ENT). These models leverage convolutional and transformer-based architectures to extract spatial-temporal dependencies from high-resolution emissions data.
Article 1198
Title@2025-07-11 (5): A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration
Title: A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration | Eine neuartige formbewusste Topologische Darstellung für GPR-Daten mit DNN-Integration | 与 DNN 融合的GPR数据新元形状- 工具地形代表 2506.06311v2 |
Authors (6): Meiyan Kang, Shizuo Kaji, Sang-Yun Lee, Taegon Kim, Hee-Hwan Ryu, Suyoung Choi
Ground Penetrating Radar (GPR) is a widely used Non-Destructive Testing (NDT) technique for subsurface exploration, particularly in infrastructure inspection and maintenance. However, conventional interpretation methods are often limited by noise sensitivity and a lack of structural awareness. This study presents a novel framework that enhances the detection of underground utilities, especially pipelines, by integrating shape-aware topological features derived from B-scan GPR images using Topological Data Analysis (TDA), with the spatial detection capabilities of the YOLOv5 deep neural network (DNN). We propose a novel shape-aware topological representation that amplifies structural features in the input data, thereby improving the model’s responsiveness to the geometrical features of buried objects. To address the scarcity of annotated real-world data, we employ a Sim2Real strategy that generates diverse and realistic synthetic datasets, effectively bridging the gap between simulated and real-world domains. Experimental results demonstrate significant improvements in mean Average Precision (mAP), validating the robustness and efficacy of our approach. This approach underscores the potential of TDA-enhanced learning in achieving reliable, real-time subsurface object detection, with broad applications in urban planning, safety inspection, and infrastructure management.
nan
Article 1199
Title@2025-07-11 (5): Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization
Title: Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization | datengetriebene Dimensionssynthese unterschiedlicher planarer Vier-Leiter-Funktionsgenerierungsmechanismen über direkte Parametrierung | 通过直接参数化实现的多层平板四巴函数生成机制数据驱动多维度合成 2507.08269v1 |
Authors (5): Woon Ryong Kim, Jaeheun Jung, Jeong Un Ha, Donghun Lee, Jae Kyung Shim
Dimensional synthesis of planar four-bar mechanisms is a challenging inverse problem in kinematics, requiring the determination of mechanism dimensions from desired motion specifications. We propose a data-driven framework that bypasses traditional equation-solving and optimization by leveraging supervised learning. Our method combines a synthetic dataset, an LSTM-based neural network for handling sequential precision points, and a Mixture of Experts (MoE) architecture tailored to different linkage types. Each expert model is trained on type-specific data and guided by a type-specifying layer, enabling both single-type and multi-type synthesis. A novel simulation metric evaluates prediction quality by comparing desired and generated motions. Experiments show our approach produces accurate, defect-free linkages across various configurations. This enables intuitive and efficient mechanism design, even for non-expert users, and opens new possibilities for scalable and flexible synthesis in kinematic design.
nan
Article 1200
Title@2025-07-11 (5): A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
Title: A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Ein praktisches Zwei-Stufen-Rezept für mathematische LLMs: Maximierung der Genauigkeit mit SFT und Effizienz mit Verstärkungslernen | 数学LMM的实用双级两套套套餐:最大限度地提高SFT的准确度和强化学习的效率 2507.08267v1 |
Authors (3): Hiroshi Yoshihara, Taiki Yamaguchi, Yuichi Inoue
Enhancing the mathematical reasoning of Large Language Models (LLMs) is a pivotal challenge in advancing AI capabilities. While Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are the dominant training paradigms, a systematic methodology for combining them to maximize both accuracy and efficiency remains largely unexplored. This paper introduces a practical and effective training recipe that strategically integrates extended SFT with RL from online inference (GRPO). We posit that these methods play complementary, not competing, roles: a prolonged SFT phase first pushes the model’s accuracy to its limits, after which a GRPO phase dramatically improves token efficiency while preserving this peak performance. Our experiments reveal that extending SFT for as many as 10 epochs is crucial for performance breakthroughs, and that the primary role of GRPO in this framework is to optimize solution length. The efficacy of our recipe is rigorously validated through top-tier performance on challenging benchmarks, including a high rank among over 2,200 teams in the strictly leak-free AI Mathematical Olympiad (AIMO). This work provides the community with a battle-tested blueprint for developing state-of-the-art mathematical reasoners that are both exceptionally accurate and practically efficient. To ensure full reproducibility and empower future research, we will open-source our entire framework, including all code, model checkpoints, and training configurations at https://github.com/analokmaus/kaggle-aimo2-fast-math-r1.
nan
Article 1201
Title@2025-07-11 (5): Task Arithmetic Through The Lens Of One-Shot Federated Learning
Title: Task Arithmetic Through The Lens Of One-Shot Federated Learning | Aufgabe Arithmetik durch die Linse des ein-shot-Federated Learning | 通过单层联邦学习的镜头进行任务自真 2411.18607v2 |
Authors (4): Zhixu Silvia Tao, Ian Mason, Sanjeev Kulkarni, Xavier Boix
Task Arithmetic is a model merging technique that enables the combination of multiple models’ capabilities into a single model through simple arithmetic in the weight space, without the need for additional fine-tuning or access to the original training data. However, the factors that determine the success of Task Arithmetic remain unclear. In this paper, we examine Task Arithmetic for multi-task learning by framing it as a one-shot Federated Learning problem. We demonstrate that Task Arithmetic is mathematically equivalent to the commonly used algorithm in Federated Learning, called Federated Averaging (FedAvg). By leveraging well-established theoretical results from FedAvg, we identify two key factors that impact the performance of Task Arithmetic: data heterogeneity and training heterogeneity. To mitigate these challenges, we adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic. Our experiments demonstrate that applying these algorithms can often significantly boost performance of the merged model compared to the original Task Arithmetic approach. This work bridges Task Arithmetic and Federated Learning, offering new theoretical perspectives on Task Arithmetic and improved practical methodologies for model merging.
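The equivalence described above can be illustrated with a minimal sketch. It assumes the common task-arithmetic form (pretrained weights plus a scaled sum of task vectors) and uniform client weighting in FedAvg; neither detail is stated in the abstract, and the function names are hypothetical. Under these assumptions, a scaling coefficient of 1/T for T fine-tuned models reproduces the plain average of the fine-tuned weights.

```python
def task_arithmetic(pretrained, finetuned_models, scale):
    """Merge models by adding scaled task vectors to the pretrained weights."""
    merged = list(pretrained)
    for weights in finetuned_models:
        for i, (w, w0) in enumerate(zip(weights, pretrained)):
            # Task vector = fine-tuned weight minus pretrained weight.
            merged[i] += scale * (w - w0)
    return merged


def one_shot_fedavg(finetuned_models):
    """Uniformly average client weights after a single communication round."""
    n = len(finetuned_models)
    return [sum(ws) / n for ws in zip(*finetuned_models)]


# Toy weights (illustrative values only).
pretrained = [0.0, 1.0, -2.0]
finetuned = [[0.5, 1.5, -2.0], [-0.5, 2.0, -1.0], [1.0, 1.0, -3.0]]

merged = task_arithmetic(pretrained, finetuned, scale=1 / len(finetuned))
averaged = one_shot_fedavg(finetuned)
```

With any other scaling coefficient the two results differ, which is one way to see the coefficient as an extra degree of freedom on top of averaging.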
nan
Article 1202
Title@2025-07-11 (5): Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration
Title: Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration | Navigieren Sie das Unbekannte: Verbesserung der LLM-Vernunft mit intrinsischer Motivation geführte Exploration | 导航未知:利用内在动力性引导探索加强LLM 2505.17621v3 |
Authors (8): Jingtong Gao, Ling Pan, Yejing Wang, Rui Zhong, Chi Lu, Qingpeng Cai, Peng Jiang, Xiangyu Zhao
Reinforcement learning (RL) has emerged as a pivotal method for improving the reasoning capabilities of Large Language Models (LLMs). However, prevalent RL approaches such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) face critical limitations due to their reliance on sparse outcome-based rewards and inadequate mechanisms for incentivizing exploration. These limitations result in inefficient guidance for multi-step reasoning processes. Specifically, sparse reward signals fail to deliver effective or sufficient feedback, particularly for challenging problems. Furthermore, such reward structures induce systematic biases that prioritize exploitation of familiar trajectories over novel solution discovery. These shortcomings critically hinder performance in complex reasoning tasks, which inherently demand iterative refinement across intermediate steps. To address these challenges, we propose an Intrinsic Motivation guidEd exploratioN meThOd foR LLM Reasoning (i-MENTOR), a novel method designed to both deliver dense rewards and amplify explorations in the RL-based training paradigm. i-MENTOR introduces three key innovations: trajectory-aware exploration rewards that mitigate bias in token-level strategies while maintaining computational efficiency; dynamic reward scaling to stabilize exploration and exploitation in large action spaces; and advantage-preserving reward implementation that maintains advantage distribution integrity while incorporating exploratory guidance. Experiments across three public datasets demonstrate i-MENTOR’s effectiveness with a 22.39% improvement on the difficult dataset Countdown-4.
nan
Article 1203
Title@2025-07-11 (5): Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks
Title: Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks | Zulässigkeit des Steinschrumpfens für die Batch-Normalisierung in Gegenwart von Adversarialangriffen | 是否允许施泰因·施特里奇在出现对立攻击时进行批次正常化 2507.08261v1 |
Authors (3): Sofia Ivolgina, P. Thomas Fletcher, Baba C. Vemuri
Batch normalization (BN) is a ubiquitous operation in deep neural networks used primarily to achieve stability and regularization during network training. BN involves feature map centering and scaling using sample means and variances, respectively. Since these statistics are being estimated across the feature maps within a batch, this problem is ideally suited for the application of Stein’s shrinkage estimation, which leads to a better, in the mean-squared-error sense, estimate of the mean and variance of the batch. In this paper, we prove that the Stein shrinkage estimator for the mean and variance dominates over the sample mean and variance estimators in the presence of adversarial attacks when modeling these attacks using sub-Gaussian distributions. This facilitates and justifies the application of Stein shrinkage to estimate the mean and variance parameters in BN and use it in image classification (segmentation) tasks with and without adversarial attacks. We present SOTA performance results using this Stein corrected batch norm in a standard ResNet architecture applied to the task of image classification using CIFAR-10 data, 3D CNN on PPMI (neuroimaging) data and image segmentation using HRNet on Cityscape data with and without adversarial attacks.
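For intuition about shrinkage of batch statistics, here is a minimal sketch of the classic positive-part James-Stein estimator that shrinks a set of sample means toward their grand mean. It is a textbook construction under the assumption of a known, shared noise variance, not necessarily the estimator analyzed in the paper; the function name and inputs are hypothetical.

```python
def james_stein_means(sample_means, noise_variance):
    """Shrink each sample mean toward the grand mean (positive-part James-Stein).

    Assumes each sample mean has the same known variance `noise_variance`.
    Shrinking toward the grand mean requires at least 4 means.
    """
    p = len(sample_means)
    if p < 4:
        raise ValueError("need at least 4 means to shrink toward the grand mean")
    grand_mean = sum(sample_means) / p
    spread = sum((m - grand_mean) ** 2 for m in sample_means)
    if spread == 0.0:
        return list(sample_means)  # all means coincide; nothing to shrink
    # Positive-part factor: never shrink past the grand mean.
    factor = max(0.0, 1.0 - (p - 3) * noise_variance / spread)
    return [grand_mean + factor * (m - grand_mean) for m in sample_means]


# Toy per-channel means (illustrative values only).
shrunk = james_stein_means([1.0, 2.0, 3.0, 4.0, 5.0], noise_variance=1.0)
```

The noisier the means are relative to their spread, the smaller the factor and the stronger the pull toward the grand mean.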
nan
Article 1204
Title@2025-07-11 (5): Quantum-Accelerated Neural Imputation with Large Language Models (LLMs)
Title: Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) | Quantenbeschleunigte neurale Imputation mit großen Sprachmodellen (LLMs) | 与大语言模型(LLMs)的量度加速神经量算 2507.08255v1 |
Authors (1): Hossein Jamali
Missing data presents a critical challenge in real-world datasets, significantly degrading the performance of machine learning models. While Large Language Models (LLMs) have recently demonstrated remarkable capabilities in tabular data imputation, exemplified by frameworks like UnIMP, their reliance on classical embedding methods often limits their ability to capture complex, non-linear correlations, particularly in mixed-type data scenarios encompassing numerical, categorical, and textual features. This paper introduces Quantum-UnIMP, a novel framework that integrates shallow quantum circuits into an LLM-based imputation architecture. Our core innovation lies in replacing conventional classical input embeddings with quantum feature maps generated by an Instantaneous Quantum Polynomial (IQP) circuit. This approach enables the model to leverage quantum phenomena such as superposition and entanglement, thereby learning richer, more expressive representations of data and enhancing the recovery of intricate missingness patterns. Our experiments on benchmark mixed-type datasets demonstrate that Quantum-UnIMP reduces imputation error by up to 15.2% for numerical features (RMSE) and improves classification accuracy by 8.7% for categorical features (F1-Score) compared to state-of-the-art classical and LLM-based methods. These compelling results underscore the profound potential of quantum-enhanced representations for complex data imputation tasks, even with near-term quantum hardware.
nan
Article 1205
Title@2025-07-11 (5): Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models
Title: Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models | Raptor: Skalierbare Train-Free-Embeddings für 3D medizinische Volumes Leveraging Pretrained 2D Foundation Models | 3D医疗量利用预先训练的2D基础模型 2507.08254v1 |
Authors (6): Ulzee An, Moonseong Jeong, Simon A. Lee, Aditya Gorla, Yuzhe Yang, Sriram Sankararaman
Current challenges in developing foundational models for volumetric imaging data, such as magnetic resonance imaging (MRI), stem from the computational complexity of training state-of-the-art architectures in high dimensions and curating sufficiently large datasets of volumes. To address these challenges, we introduce Raptor (Random Planar Tensor Reduction), a train-free method for generating semantically rich embeddings for volumetric data. Raptor leverages a frozen 2D foundation model, pretrained on natural images, to extract visual tokens from individual cross-sections of medical volumes. These tokens are then spatially compressed using random projections, significantly reducing computational complexity while retaining semantic information. Extensive experiments on ten diverse medical volume tasks verify the superior performance of Raptor over state-of-the-art methods, including those pretrained exclusively on medical volumes (+3% SuPreM, +6% MISFM, +10% Merlin, +13% VoCo, and +14% SLIViT), while entirely bypassing the need for costly training. Our results highlight the effectiveness and versatility of Raptor as a foundation for advancing deep learning-based methods for medical volumes.
nan
Article 1206
Title@2025-07-11 (5): Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs
Title: Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs | Algorithmische Kontiguität aus Low-Grad-Konjektur und Anwendungen in korrelierten Zufallsgraphen | 低度推测和相关随机图中应用的低度推断和 2502.09832v3 |
Authors (1): Zhangsong Li
In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erd\H{o}s-R\'enyi graphs $\mathcal G(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and the correlation $\rho<\sqrt{\alpha}$ lies below Otter’s threshold, solving a remaining problem in \cite{DDL23+}; (2) the detection problem between the correlated sparse stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon;s)$ and a pair of independent stochastic block models $\mathcal S(n,\tfrac{\lambda s}{n};k,\epsilon)$ when $\epsilon^2 \lambda s<1$ lies below the Kesten-Stigum (KS) threshold and $s<\sqrt{\alpha}$ lies below Otter’s threshold, solving a remaining problem in \cite{CDGL24+}. One of the main ingredients in our proof is to derive certain forms of \emph{algorithmic contiguity} between two probability measures based on bounds on their low-degree advantage. To be more precise, consider the high-dimensional hypothesis testing problem between two probability measures $\mathbb{P}$ and $\mathbb{Q}$ based on the sample $\mathsf Y$. We show that if the low-degree advantage $\mathsf{Adv}_{\leq D} \big( \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \big)=O(1)$, then (assuming the low-degree conjecture) there is no efficient algorithm $\mathcal A$ such that $\mathbb{Q}(\mathcal A(\mathsf Y)=0)=1-o(1)$ and $\mathbb{P}(\mathcal A(\mathsf Y)=1)=\Omega(1)$. This framework provides a useful tool for performing reductions between different inference tasks.
nan
Article 1207
Title@2025-07-11 (5): Thinner Latent Spaces: Detecting Dimension and Imposing Invariance with Conformal Autoencoders
Title: Thinner Latent Spaces: Detecting Dimension and Imposing Invariance with Conformal Autoencoders | Dünnere Latent Spaces: Dimension erkennen und Invarianz mit konformen Autoencodern imposieren | 细边空格: 检测尺寸和与普通自动编码器的不协调情况 2408.16138v2 |
Authors (5): George A. Kevrekidis, Zan Ahmad, Mauro Maggioni, Soledad Villar, Yannis G. Kevrekidis
Conformal Autoencoders are a neural network architecture that imposes orthogonality conditions between the gradients of latent variables to obtain disentangled representations of data. In this work we show that orthogonality relations within the latent layer of the network can be leveraged to infer the intrinsic dimensionality of nonlinear manifold data sets (locally characterized by the dimension of their tangent space), while simultaneously computing encoding and decoding (embedding) maps. We outline the relevant theory relying on differential geometry, and describe the corresponding gradient-descent optimization algorithm. The method is applied to several data sets and we highlight its applicability, advantages, and shortcomings. In addition, we demonstrate that the same computational technology can be used to build coordinate invariance to local group actions when defined only on a (reduced) submanifold of the embedding space.
nan
Article 1208
Title@2025-07-11 (5): SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Title: SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | SpecDec++: Spekulative Dekodierung durch adaptive Kandidatenlängen steigern | SpecDec+++:通过适应性候选时间长度促进投机替代 2405.19715v3 |
Authors (3): Kaixuan Huang, Xudong Guo, Mengdi Wang
Speculative decoding reduces the inference latency of a target large language model by using a smaller and faster draft model. Its performance depends on a hyperparameter K – the candidate length, i.e., the number of candidate tokens for the target model to verify in each round. However, previous methods often use simple heuristics to choose K, which may result in sub-optimal performance. We study the choice of the candidate length K and formulate it as a Markov Decision Process. We theoretically show that the optimal policy of this Markov decision process takes the form of a threshold policy, i.e., the current speculation should stop and be verified when the probability of getting a rejection exceeds a threshold value. Motivated by this theory, we propose SpecDec++, an enhanced version of speculative decoding that adaptively determines the candidate length on the fly. We augment the draft model with a trained acceptance prediction head to predict the conditional acceptance probability of the candidate tokens. SpecDec++ will stop the current speculation when the predicted probability that at least one token gets rejected exceeds a threshold. We implement SpecDec++ and apply it to the llama-2-chat 7B & 70B model pair. Our adaptive method achieves a 2.04x speedup on the Alpaca dataset (7.2% improvement over the baseline speculative decoding). On the GSM8K and HumanEval datasets, our method achieves a 2.26x speedup (9.4% improvement) and 2.23x speedup (11.1% improvement), respectively. The code of this paper is available at https://github.com/Kaffaljidhmah2/SpecDec_pp.
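The threshold stopping rule can be sketched as follows. This is an illustrative reading of the abstract, not the released implementation: the function name, the `max_length` cap, and passing the predicted conditional acceptance probabilities as a ready-made list are all assumptions (in practice they would be produced one token at a time by the acceptance prediction head). The probability that at least one candidate is rejected is taken as one minus the product of the conditional acceptance probabilities.

```python
def choose_candidate_length(acceptance_probs, threshold, max_length):
    """Return how many draft tokens to propose before calling the target model.

    acceptance_probs[i] is the predicted probability that draft token i is
    accepted given that all earlier draft tokens were accepted.
    """
    prob_all_accepted = 1.0
    for k, p in enumerate(acceptance_probs[:max_length], start=1):
        prob_all_accepted *= p
        # Stop once the chance of at least one rejection exceeds the threshold.
        if 1.0 - prob_all_accepted > threshold:
            return k
    return min(len(acceptance_probs), max_length)


# Toy probabilities (illustrative values only).
probs = [0.9, 0.8, 0.7, 0.6]
k_strict = choose_candidate_length(probs, threshold=0.25, max_length=10)
k_loose = choose_candidate_length(probs, threshold=0.5, max_length=10)
```

A lower threshold verifies earlier and wastes fewer draft tokens; a higher threshold speculates longer and calls the target model less often.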
nan
Article 1209
Title@2025-07-11 (5): PAC-Bayes Analysis for Recalibration in Classification
Title: PAC-Bayes Analysis for Recalibration in Classification | PAC-Bayes Analyse zur Rekalibrierung in der Klassifizierung | PAC-Bayes分类重新计算分析 2406.06227v2 |
Authors (2): Masahiro Fujisawa, Futoshi Futami
Nonparametric estimation using uniform-width binning is a standard approach for evaluating the calibration performance of machine learning models. However, existing theoretical analyses of the bias induced by binning are limited to binary classification, creating a significant gap with practical applications such as multiclass classification. Additionally, many parametric recalibration algorithms lack theoretical guarantees for their generalization performance. To address these issues, we conduct a generalization analysis of calibration error using the probably approximately correct Bayes framework. This approach enables us to derive the first optimizable upper bound for generalization error in the calibration context. On the basis of our theory, we propose a generalization-aware recalibration algorithm. Numerical experiments show that our algorithm enhances the performance of Gaussian process-based recalibration across various benchmark datasets and models.
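As background for the binning estimator mentioned above, here is a minimal sketch of the expected calibration error (ECE) with uniform-width bins in the binary case. The function name and the choice to place a confidence of exactly 1.0 in the last bin are assumptions made for illustration; the paper's multiclass analysis and its bound are not reproduced here.

```python
def binned_ece(confidences, labels, num_bins=10):
    """Estimate calibration error with uniform-width bins on [0, 1].

    confidences: predicted probabilities of the positive class.
    labels: observed outcomes, 0 or 1.
    """
    n = len(confidences)
    conf_sums = [0.0] * num_bins
    label_sums = [0.0] * num_bins
    for c, y in zip(confidences, labels):
        # Uniform-width bin index; a confidence of 1.0 goes in the last bin.
        b = min(int(c * num_bins), num_bins - 1)
        conf_sums[b] += c
        label_sums[b] += y
    # Each bin contributes (count / n) * |mean confidence - mean label|,
    # which simplifies to |sum of confidences - sum of labels| / n.
    return sum(abs(cs - ls) for cs, ls in zip(conf_sums, label_sums)) / n


# Toy predictions (illustrative values only).
ece = binned_ece([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], num_bins=2)
```

The estimate depends on the number of bins, which is the source of the binning bias that the abstract refers to.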
nan
Article 1210
Title@2025-07-11 (5): Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification
Title: Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification | Transfer-Lernen und Mischen für die feinkörnige Wenig-Hot-Fungi-Klassifikation | 微粒少沙托菌菌类分类的转移学习和混合学习和混合 2507.08248v1 |
Authors (3): Jason Kahei Tam, Murilo Gustineli, Anthony Miyaguchi
Accurate identification of fungi species presents a unique challenge in computer vision due to fine-grained inter-species variation and high intra-species variation. This paper presents our approach for the FungiCLEF 2025 competition, which focuses on few-shot fine-grained visual categorization (FGVC) using the FungiTastic Few-Shot dataset. Our team (DS@GT) experimented with multiple vision transformer models, data augmentation, weighted sampling, and incorporating textual information. We also explored generative AI models for zero-shot classification using structured prompting but found them to significantly underperform relative to vision-based models. Our final model outperformed both competition baselines and highlighted the effectiveness of domain-specific pretraining and balanced sampling strategies. Our approach ranked 35/74 on the private test set in post-completion evaluation, which suggests that additional work can be done on metadata selection and domain-adapted multi-modal learning. Our code is available at https://github.com/dsgt-arc/fungiclef-2025.
nan
Article 1211
Title@2025-07-11 (5): A Survey on State-of-the-art Deep Learning Applications and Challenges
Title: A Survey on State-of-the-art Deep Learning Applications and Challenges | Eine Umfrage zu aktuellen Anwendungen und Herausforderungen des Deep Learning | 关于最先进的深深学习应用和挑战的调查 2403.17561v8 |
Authors (2): Mohd Halim Mohd Noor, Ayokunle Olalekan Ige
Deep learning, a branch of artificial intelligence, is a data-driven method that uses multiple layers of interconnected units or neurons to learn intricate patterns and representations directly from raw input data. Empowered by this learning capability, it has become a powerful tool for solving complex problems and is the core driver of many groundbreaking technologies and innovations. Building a deep learning model is challenging due to the algorithm’s complexity and the dynamic nature of real-world problems. Several studies have reviewed deep learning concepts and applications. However, the studies mostly focused on the types of deep learning models and convolutional neural network architectures, offering limited coverage of the state-of-the-art deep learning models and their applications in solving complex problems across different domains. Therefore, motivated by the limitations, this study aims to comprehensively review the state-of-the-art deep learning models in computer vision, natural language processing, time series analysis and pervasive computing, and robotics. We highlight the key features of the models and their effectiveness in solving the problems within each domain. Furthermore, this study presents the fundamentals of deep learning, various deep learning model types and prominent convolutional neural network architectures. Finally, challenges and future directions in deep learning research are discussed to offer a broader perspective for future researchers.
nan
Article 1212
Title@2025-07-11 (5): CoreSPECT: Enhancing Clustering Algorithms via an Interplay of Density and Geometry
Title: CoreSPECT: Enhancing Clustering Algorithms via an Interplay of Density and Geometry | CoreSPECT: Verbesserung der Clustering-Algorithmen durch ein Interplay von Dichte und Geometrie | 核心内容:通过密度和几何的相互作用加强群集比 2507.08243v1 |
Authors (3): Chandra Sekhar Mukherjee, Joonyoung Bae, Jiapeng Zhang
Density and geometry have long served as two of the fundamental guiding principles in clustering algorithm design, with algorithms usually focusing either on the density structure of the data (e.g., HDBSCAN and Density Peak Clustering) or the complexity of the underlying geometry (e.g., manifold clustering algorithms). In this paper, we identify and formalize a recurring but often overlooked interaction between distribution and geometry and leverage this insight to design our clustering enhancement framework CoreSPECT (Core Space Projection-based Enhancement of Clustering Techniques). Our framework boosts the performance of simple algorithms like K-Means and GMM by applying them to strategically selected regions, then extending the partial partition to a complete partition for the dataset using a novel neighborhood-graph-based multi-layer propagation procedure. We apply our framework on 15 datasets from three different domains and obtain consistent and substantial gains in clustering accuracy for both K-Means and GMM. On average, our framework improves the ARI of K-Means by 40% and of GMM by 14%, often surpassing the performance of both manifold-based and recent density-based clustering algorithms. We further support our framework with initial theoretical guarantees, ablations that demonstrate the usefulness of the individual steps, and evidence of robustness to noise.
nan
Article 1213
Title@2025-07-11 (5): An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems
Title: An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems | Ausblick auf die Chancen und Herausforderungen multiagenter KI-Systeme | 关于多机构AI系统机会和挑战的展望 2505.18397v2 |
Authors (15): Fangqiao Tian, An Luo, Jin Du, Xun Xian, Robert Specht, Ganghua Wang, Xuan Bi, Jiawei Zhou, Ashish Kundu, Jayanth Srinivasa, Charles Fleming, Rui Zhang, Zirui Liu, Mingyi Hong, Jie Ding
A multi-agent AI system (MAS) is composed of multiple autonomous agents that interact, exchange information, and make decisions based on internal generative models. Recent advances in large language models and tool-using agents have made MAS increasingly practical in areas like scientific discovery and collaborative automation. However, key questions remain: When are MAS more effective than single-agent systems? What new safety risks arise from agent interactions? And how should we evaluate their reliability and structure? This paper outlines a formal framework for analyzing MAS, focusing on two core aspects: effectiveness and safety. We explore whether MAS truly improve robustness, adaptability, and performance, or merely repackage known techniques like ensemble learning. We also study how inter-agent dynamics may amplify or suppress system vulnerabilities. While MAS are relatively new to the signal processing community, we envision them as a powerful abstraction that extends classical tools like distributed estimation and sensor fusion to higher-level, policy-driven inference. Through experiments on data science automation, we highlight the potential of MAS to reshape how signal processing systems are designed and trusted.
nan
Article 1214
Title@2025-07-11 (5): Exploring Gender Differences in Chronic Pain Discussions on Reddit
Title: Exploring Gender Differences in Chronic Pain Discussions on Reddit | Erforschung geschlechtsspezifischer Unterschiede bei chronischen Schmerzdiskussionen auf Reddit | 探讨关于康复的慢性疼痛讨论中的性别差异 2507.08241v1 |
Authors (3): Ancita Maria Andrade, Tanvi Banerjee, Ramakrishna Mundugar
Pain is an inherent part of human existence, manifesting as both physical and emotional experiences, and can be categorized as either acute or chronic. Over the years, extensive research has been conducted to understand the causes of pain and explore potential treatments, with contributions from various scientific disciplines. However, earlier studies often overlooked the role of gender in pain experiences. In this study, we utilized Natural Language Processing (NLP) to analyze and gain deeper insights into individuals’ pain experiences, with a particular focus on gender differences. We successfully classified posts into male and female corpora using the Hidden Attribute Model-Convolutional Neural Network (HAM-CNN), achieving an F1 score of 0.86 by aggregating posts based on usernames. Our analysis revealed linguistic differences between genders, with female posts tending to be more emotionally focused. Additionally, the study highlighted that conditions such as migraine and sinusitis are more prevalent among females and explored how pain medication affects individuals differently based on gender.
nan
Article 1215
Title@2025-07-11 (5): On the Principles of ReLU Networks with One Hidden Layer
Title: On the Principles of ReLU Networks with One Hidden Layer | Über die Prinzipien von ReLU-Netzwerken mit einer verborgenen Ebene | 关于 “ 同一层 “ RELU网络原则 2411.06728v2 |
Authors (1): Changcun Huang
A neural network with one hidden layer or a two-layer network (regardless of the input layer) is the simplest feedforward neural network, whose mechanism may be the basis of more general network architectures. However, even this simple type of architecture is a “black box”; that is, it remains unclear how to interpret the mechanism of its solutions obtained by the back-propagation algorithm and how to control the training process in a deterministic way. This paper systematically studies the first problem by constructing universal function-approximation solutions. It is shown, both theoretically and experimentally, that the training solution for a one-dimensional input can be completely understood, and that the solution for a higher-dimensional input can also be well interpreted to some extent. These results pave the way for thoroughly revealing the black box of two-layer ReLU networks and advance the understanding of deep ReLU networks.
nan
Article 1216
Title@2025-07-11 (5): Data Generation without Function Estimation
Title: Data Generation without Function Estimation | Datenerstellung ohne Funktionsabschätzung | 无函数估算的生成数据 2507.08239v1 |
Authors (2): Hadi Daneshmand, Ashkan Soleymani
Estimating the score function (or other population-density-dependent functions) is a fundamental component of most generative models. However, such function estimation is computationally and statistically challenging. Can we avoid function estimation for data generation? We propose an estimation-free generative method: A set of points whose locations are deterministically updated with (inverse) gradient descent can transport a uniform distribution to an arbitrary data distribution, in the mean field regime, without function estimation, neural network training, or even noise injection. The proposed method is built upon recent advances in the physics of interacting particles. We show, both theoretically and experimentally, that these advances can be leveraged to develop novel generative methods.
nan
Article 1217
Title@2025-07-11 (5): Self-Supervised Learning-Based Multimodal Prediction on Prosocial Behavior Intentions
Title: Self-Supervised Learning-Based Multimodal Prediction on Prosocial Behavior Intentions | Selbstüberwachte multimodale Lernvorhersage über prosoziale Verhaltensabsichten | 对有利社会行为行为的自我监督学习的多模式预测 2507.08238v1 |
Authors (4): Abinay Reddy Naini, Zhaobo K. Zheng, Teruhisa Misu, Kumar Akash
Human state detection and behavior prediction have seen significant advancements with the rise of machine learning and multimodal sensing technologies. However, predicting prosocial behavior intentions in mobility scenarios, such as helping others on the road, is an underexplored area. Current research faces a major limitation. There are no large, labeled datasets available for prosocial behavior, and small-scale datasets make it difficult to train deep-learning models effectively. To overcome this, we propose a self-supervised learning approach that harnesses multi-modal data from existing physiological and behavioral datasets. By pre-training our model on diverse tasks and fine-tuning it with a smaller, manually labeled prosocial behavior dataset, we significantly enhance its performance. This method addresses the data scarcity issue, providing a more effective benchmark for prosocial behavior prediction, and offering valuable insights for improving intelligent vehicle systems and human-machine interaction.
Article 1218
Title@2025-07-11 (5): InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems
Title: InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems | InsightBuild: LLM-Powered Causal Reasoning in Smart Building Systems | Insight 建筑:智能建筑系统中的LLM能动原因推理 2507.08235v1 |
Authors (3): Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Rajiv Ramnath
Smart buildings generate vast streams of sensor and control data, but facility managers often lack clear explanations for anomalous energy usage. We propose InsightBuild, a two-stage framework that integrates causality analysis with a fine-tuned large language model (LLM) to provide human-readable, causal explanations of energy consumption patterns. First, a lightweight causal inference module applies Granger causality tests and structural causal discovery on building telemetry (e.g., temperature, HVAC settings, occupancy) drawn from Google Smart Buildings and Berkeley Office datasets. Next, an LLM, fine-tuned on aligned pairs of sensor-level causes and textual explanations, receives as input the detected causal relations and generates concise, actionable explanations. We evaluate InsightBuild on two real-world datasets (Google: 2017-2022; Berkeley: 2018-2020), using expert-annotated ground-truth causes for a held-out set of anomalies. Our results demonstrate that combining explicit causal discovery with LLM-based natural language generation yields clear, precise explanations that assist facility managers in diagnosing and mitigating energy inefficiencies.
Article 1219
Title@2025-07-10 (4): Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
Title: Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning | Verdolmetschen von großen Text-zu-Bild-Diffusions-Modellen mit Wörterbuch-Lernen | 解释具有字典学习的大文本到图像传播模型 2505.24360v3 |
Authors (6): Stepan Shabalin, Ayush Panda, Dmitrii Kharlapenko, Abdur Raheem Ali, Yixiong Hao, Arthur Conmy
Sparse autoencoders are a promising new approach for decomposing language model activations for interpretation and control. They have been applied successfully to vision transformer image encoders and to small-scale diffusion models. Inference-Time Decomposition of Activations (ITDA) is a recently proposed variant of dictionary learning that takes the dictionary to be a set of data points from the activation distribution and reconstructs them with gradient pursuit. We apply Sparse Autoencoders (SAEs) and ITDA to a large text-to-image diffusion model, Flux 1, and consider the interpretability of embeddings of both by introducing a visual automated interpretation pipeline. We find that SAEs accurately reconstruct residual stream embeddings and beat MLP neurons on interpretability. We are able to use SAE features to steer image generation through activation addition. We find that ITDA has comparable interpretability to SAEs.
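Steering through activation addition is commonly described as adding a scaled feature direction to a hidden activation. The sketch below illustrates only that arithmetic in plain Python; the function name `steer`, the scale `alpha`, and the toy vectors are illustrative assumptions and are not taken from the paper's code or from Flux.

```python
def steer(activation, feature_direction, alpha=1.0):
    """Return activation + alpha * feature_direction, element-wise."""
    if len(activation) != len(feature_direction):
        raise ValueError("activation and feature direction must have the same length")
    return [a + alpha * d for a, d in zip(activation, feature_direction)]


# Toy example: push a 3-dimensional activation along one feature direction.
hidden = [0.5, -1.0, 2.0]      # hypothetical residual-stream activation
direction = [0.0, 1.0, 0.0]    # hypothetical SAE decoder direction for one feature
steered = steer(hidden, direction, alpha=3.0)
```

In practice the direction would come from a learned dictionary and the scale would be tuned per feature; both are left as free parameters here.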
Article 1220
Title@2025-07-10 (4): ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation
Title: ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation | ZKTorch: Kompilieren von ML-Inferenz zu Null-Wissens-Proofs durch parallele Proof-Kumulation | ZKTorch:通过平行证据累积,将ML推论编成零知识证据 2507.07031v2 |
Authors (3): Bing-Jyue Chen, Lilia Tang, Daniel Kang
As AI models become ubiquitous in our daily lives, there has been an increasing demand for transparency in ML services. However, the model owner does not want to reveal the weights, as they are considered trade secrets. To solve this problem, researchers have turned to zero-knowledge proofs of ML model inference. These proofs convince the user that the ML model output is correct, without revealing the weights of the model to the user. Past work on these provers can be placed into two categories. The first method compiles the ML model into a low-level circuit, and proves the circuit using a ZK-SNARK. The second method uses custom cryptographic protocols designed only for a specific class of models. Unfortunately, the first method is highly inefficient, making it impractical for the large models used today, and the second method does not generalize well, making it difficult to update in the rapidly changing field of machine learning. To solve this, we propose ZKTorch, an open source end-to-end proving system that compiles ML models into base cryptographic operations called basic blocks, each proved using specialized protocols. ZKTorch is built on top of a novel parallel extension to the Mira accumulation scheme, enabling succinct proofs with minimal accumulation overhead. These contributions allow ZKTorch to achieve at least a $3\times$ reduction in the proof size compared to specialized protocols and up to a $6\times$ speedup in proving time over a general-purpose ZKML framework.
Article 1221
Title@2025-07-10 (4): Extracting memorized pieces of (copyrighted) books from open-weight language models
Title: Extracting memorized pieces of (copyrighted) books from open-weight language models | Extrahieren von auswendig gelernten Stücken von Büchern aus Open-Weight-Sprachmodellen | 从开放重量级语言模式中提取(复印权)书籍 2505.12546v2 |
Authors (8): A. Feder Cooper, Aaron Gokaslan, Ahmed Ahmed, Amy B. Cyphert, Christopher De Sa, Mark A. Lemley, Daniel E. Ho, Percy Liang
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs’ protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from 17 open-weight LLMs. Through numerous experiments, we show that it’s possible to extract substantial parts of at least some books from different LLMs. This is evidence that these LLMs have memorized the extracted text; this memorized content is copied inside the model parameters. But the results are complicated: the extent of memorization varies both by model and by book. With our specific experiments, we find that the largest LLMs don’t memorize most books–either in whole or in part. However, we also find that Llama 3.1 70B memorizes some books, like Harry Potter and the Sorcerer’s Stone and 1984, almost entirely. In fact, Harry Potter is so memorized that, using a seed prompt consisting of just the first line of chapter 1, we can deterministically generate the entire book near-verbatim. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.
Article 1222
Title@2025-07-10 (4): On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
Title: On the Necessity of Output Distribution Reweighting for Effective Class Unlearning | Über die Notwendigkeit der Neugewichtung der Output-Distribution für effektives Klassenunlernen | 有效班级取消学习时必须增加产出分配的加权 2506.20893v2 |
Authors (3): Yian Wang, Ali Ebrahimpour-Boroojeny, Hari Sundaram
In this work, we introduce an output-reweighting unlearning method, RWFT, a lightweight technique that erases an entire class from a trained classifier without full retraining. Forgetting specific classes from trained models is essential for enforcing user deletion rights and mitigating harmful or biased predictions. Full retraining is costly, and existing unlearning methods fail to replicate the behavior of retrained models when predicting samples from the unlearned class. We prove this failure by designing a variant of membership inference attacks, MIA-NN, that successfully reveals the unlearned class for any of these methods. We propose a simple redistribution of the probability mass for predictions on samples in the forgotten class, which is robust to MIA-NN. We also introduce a new metric based on the total variation (TV) distance of the prediction probabilities to quantify residual leakage and to prevent future methods from being susceptible to the new attack. Through extensive experiments with state-of-the-art baselines in machine unlearning, we show that our approach matches the results of full retraining both in the metrics used for evaluation by prior work and in the new metric we propose in this work. Compared to state-of-the-art methods, we gain 2.79% in previously used metrics and 111.45% in our new TV-based metric over the best existing method.
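The two ingredients named in this abstract, a redistribution of probability mass away from the forgotten class and a total variation (TV) distance between prediction vectors, can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the proportional renormalisation in `drop_class` is one simple redistribution rule chosen for illustration, and neither function is the authors' RWFT procedure or their exact metric.

```python
def total_variation(p, q):
    """TV distance between two discrete distributions over the same classes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drop_class(probs, forgotten):
    """Zero the forgotten class and rescale the remaining mass proportionally.

    Assumption: proportional rescaling is used only as an example rule.
    """
    kept = 1.0 - probs[forgotten]
    if kept <= 0.0:
        raise ValueError("no probability mass left to redistribute")
    return [0.0 if i == forgotten else p / kept for i, p in enumerate(probs)]


original = [0.5, 0.3, 0.2]                 # toy softmax output, class 0 is to be forgotten
reweighted = drop_class(original, forgotten=0)
gap = total_variation(original, reweighted)
```

A TV distance of zero between an unlearned model's outputs and a retrained model's outputs would mean the two are indistinguishable on that sample; larger values indicate residual differences.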
Article 1223
Title@2025-07-10 (4): EvA: Evolutionary Attacks on Graphs
Title: EvA: Evolutionary Attacks on Graphs | EvA: Evolutionäre Angriffe auf Graphen | EvA:对图表的进化攻击 2507.08212v1 |
Authors (4): Mohammad Sadegh Akhondzadeh, Soroush H. Zargarbashi, Jimin Cao, Aleksandar Bojchevski
Even a slight perturbation in the graph structure can cause a significant drop in the accuracy of graph neural networks (GNNs). Most existing attacks leverage gradient information to perturb edges. This relaxes the attack’s optimization problem from a discrete to a continuous space, resulting in solutions far from optimal. It also restricts the adaptability of the attack to non-differentiable objectives. Instead, we introduce a few simple yet effective enhancements of an evolutionary-based algorithm to solve the discrete optimization problem directly. Our Evolutionary Attack (EvA) works with any black-box model and objective, eliminating the need for a differentiable proxy loss. This allows us to design two novel attacks that reduce the effectiveness of robustness certificates and break conformal sets. The memory complexity of our attack is linear in the attack budget. Among our experiments, EvA shows $\sim$11\% additional drop in accuracy on average compared to the best previous attack, revealing significant untapped potential in designing attacks.
Article 1224
Title@2025-07-10 (4): Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding
Title: Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding | Deep Learning-based Forecasting von Boarding-Patienten zählt ED Overcrowding Adresse | 对寄宿病人计数进行深入的基于学习的预测,以解决ED过度拥挤问题 2505.14765v2 |
Authors (5): Orhun Vural, Bunyamin Ozaydin, James Booth, Brittany F. Lindsey, Abdulaziz Ahmed
This study presents a deep learning-based framework for predicting emergency department (ED) boarding counts six hours in advance using only operational and contextual data, without patient-level information. Data from ED tracking systems, inpatient census, weather, holidays, and local events were aggregated hourly and processed with comprehensive feature engineering. The mean ED boarding count was 28.7 (standard deviation = 11.2). Multiple deep learning models, including ResNetPlus, TSTPlus, and TSiTPlus, were trained and optimized using Optuna, with TSTPlus achieving the best results (mean absolute error = 4.30, mean squared error = 29.47, $R^2$ = 0.79). The framework accurately forecasted boarding counts, including during extreme periods, and demonstrated that broader input features improve predictive accuracy. This approach supports proactive hospital management and offers a practical method for mitigating ED overcrowding.
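For readers less familiar with the three reported metrics, the following plain-Python sketch computes mean absolute error, mean squared error, and the coefficient of determination using their standard textbook definitions. The function name and the toy data are illustrative assumptions; they are not the study's code or data.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return mae, mse, 1.0 - ss_res / ss_tot


# Toy hourly boarding counts (made up for illustration).
mae, mse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

For scale, the reported mean squared error of 29.47 corresponds to a root mean squared error of roughly 5.43 boarded patients.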
Article 1225
Title@2025-07-10 (4): Signed Diverse Multiplex Networks: Clustering and Inference
Title: Signed Diverse Multiplex Networks: Clustering and Inference | Signierte Vielfältige Multiplex-Netzwerke: Clustering und Schlussfolgerung | 已签署的多元多重网络:集群和推断 2402.10242v3 |
Authors (1): Marianna Pensky
The paper introduces a Signed Generalized Random Dot Product Graph (SGRDPG) model, which is a variant of the Generalized Random Dot Product Graph (GRDPG), where, in addition, edges can be positive or negative. The setting is extended to a multiplex version, where all layers have the same collection of nodes and follow the SGRDPG. The only common feature of the layers of the network is that they can be partitioned into groups with common subspace structures, while otherwise matrices of connection probabilities can be all different. The setting above is extremely flexible and includes a variety of existing multiplex network models, including GRDPG, as its particular cases. By employing novel methodologies, our paper ensures strongly consistent clustering of layers and highly accurate subspace estimation, which are significant improvements over the results of Pensky and Wang (2024). All algorithms and theoretical results in the paper remain true for both signed and binary networks. In addition, the paper shows that keeping signs of the edges in the process of network construction leads to a better precision of estimation and clustering and, hence, is beneficial for tackling real world problems such as, for example, analysis of brain networks.
Article 1226
Title@2025-07-10 (4): Compositional Risk Minimization
Title: Compositional Risk Minimization | Zusammensetzungelle Risikominimierung | 尽量减少风险 2410.06303v3 |
Authors (6): Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, Pascal Vincent
Compositional generalization is a crucial step towards developing data-efficient intelligent machines that generalize in human-like ways. In this work, we tackle a challenging form of distribution shift, termed compositional shift, where some attribute combinations are completely absent at training but present in the test distribution. This shift tests the model’s ability to generalize compositionally to novel attribute combinations in discriminative tasks. We model the data with flexible additive energy distributions, where each energy term represents an attribute, and derive a simple alternative to empirical risk minimization termed compositional risk minimization (CRM). We first train an additive energy classifier to predict the multiple attributes and then adjust this classifier to tackle compositional shifts. We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirm the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts.
Article 1227
Title@2025-07-10 (4): EP-GAT: Energy-based Parallel Graph Attention Neural Network for Stock Trend Classification
Title: EP-GAT: Energy-based Parallel Graph Attention Neural Network for Stock Trend Classification | EP-GAT: Energiebasierte parallele Graphen-Achtung Neuronales Netzwerk für die Bestandstrendklassifikation | EP-GAT:基于能源的库存趋势分类平行图形关注神经网络 2507.08184v1 |
Authors (3): Zhuodong Jiang, Pengju Zhang, Peter Martin
Graph neural networks have shown remarkable performance in forecasting stock movements, which arises from learning complex inter-dependencies between stocks and intra-dynamics of stocks. Existing approaches based on graph neural networks typically rely on static or manually defined factors to model changing inter-dependencies between stocks. Furthermore, these works often struggle to preserve hierarchical features within stocks. To bridge these gaps, this work presents the Energy-based Parallel Graph Attention Neural Network, a novel approach for predicting future movements for multiple stocks. First, it generates a dynamic stock graph with the energy difference between stocks and Boltzmann distribution, capturing evolving inter-dependencies between stocks. Then, a parallel graph attention mechanism is proposed to preserve the hierarchical intra-stock dynamics. Extensive experiments on five real-world datasets are conducted to validate the proposed approach, spanning the US stock markets (NASDAQ, NYSE, SP) and UK stock markets (FTSE, LSE). The experimental results demonstrate that EP-GAT consistently outperforms five competitive baselines on test periods across various metrics. The ablation studies and hyperparameter sensitivity analysis further validate the effectiveness of each module in the proposed method.
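The abstract does not spell out how energy differences and the Boltzmann distribution are combined, so the following is only a hedged illustration of the general idea: edge weights proportional to exp(-|E_i - E_j| / T), normalised per row. The per-stock scalar `energies`, the `temperature` parameter, and the function name are all assumptions made for this sketch and may differ from EP-GAT's actual construction.

```python
import math


def boltzmann_adjacency(energies, temperature=1.0):
    """Row-normalised edge weights exp(-|E_i - E_j| / T), with no self-loops."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    n = len(energies)
    adjacency = []
    for i in range(n):
        weights = [
            0.0 if i == j else math.exp(-abs(energies[i] - energies[j]) / temperature)
            for j in range(n)
        ]
        total = sum(weights)
        # A single-node graph has no neighbours, so its row stays all zeros.
        adjacency.append([w / total for w in weights] if total > 0.0 else weights)
    return adjacency


# Toy energies for three stocks: the first two are close, the third is far away.
graph = boltzmann_adjacency([0.0, 0.1, 5.0])
```

Recomputing the matrix for each time window from fresh energies yields a graph that changes over time, which is the sense in which such a construction is dynamic.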
Article 1228
Title@2025-07-10 (4): Shifting Work Patterns with Generative AI
Title: Shifting Work Patterns with Generative AI | Verschiebende Arbeitsmuster mit generativer KI | 以创新创新创新创新创新创新创新创新创新创 2504.11436v3 |
Authors (4): Eleanor Wiske Dillon, Sonia Jaffe, Nicole Immorlica, Christopher T. Stanton
We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment. Half of the 7,137 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings. We find that access to the AI tool during the first year of its release primarily impacted behaviors that workers could change independently and not behaviors that require coordination to change: workers who used the tool in more than half of the sample weeks spent 3.6 fewer hours, or 31% less time on email each week (intent to treat estimate is 1.3 hours) and completed documents moderately faster, but did not significantly change time spent in meetings.
Article 1229
Title@2025-07-10 (4): Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
Title: Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm | Cloud Computing Energieverbrauch Vorhersage auf Basis von Kernel Extreme Learning Machine Algorithm Verbessert durch Vector Gewichteter Durchschnitt Algorithm | 以内核极端学习机器算法为基础,用矢量加权平均算法改进的云计算 云能消耗预测值 2503.04088v3 |
Authors (2): Yuqing Wang, Xiao Yang
With the rapid expansion of cloud computing infrastructure, energy consumption has become a critical challenge, driving the need for accurate and efficient prediction models. This study proposes a novel Vector Weighted Average Kernel Extreme Learning Machine (VWAA-KELM) model to enhance energy consumption prediction in cloud computing environments. By integrating a vector weighted average algorithm (VWAA) with kernel extreme learning machine (KELM), the proposed model dynamically adjusts feature weights and optimizes kernel functions, significantly improving prediction accuracy and generalization. Experimental results demonstrate the superior performance of VWAA-KELM: 94.7% of test set prediction errors fall within [0, 50] units, with only three cases exceeding 100 units, indicating strong stability. The model achieves a coefficient of determination ($R^2$) of 0.987 in the training set (RMSE = 28.108, RPD = 8.872) and maintains excellent generalization with $R^2$ = 0.973 in the test set (RMSE = 43.227, RPD = 6.202). Visual analysis confirms that predicted values closely align with actual energy consumption trends, avoiding overfitting while capturing nonlinear dependencies. A key innovation of this study is the introduction of adaptive feature weighting, allowing the model to dynamically assign importance to different input parameters, thereby enhancing high-dimensional data processing. This advancement provides a scalable and efficient approach for optimizing cloud data center energy consumption. Beyond cloud computing, the proposed hybrid framework has broader applications in Internet of Things (IoT) and edge computing, supporting real-time energy management and intelligent resource allocation.
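The reported figures can be cross-checked against each other. The abstract does not define RPD; assuming the common definition (standard deviation of the reference values divided by RMSE), the coefficient of determination is approximately 1 - 1/RPD^2. The sketch below applies that relation; the helper names are hypothetical and the definition of RPD is an assumption.

```python
def r2_from_rpd(rpd):
    """Approximate R^2 implied by RPD, assuming RPD = SD(reference) / RMSE."""
    if rpd <= 0.0:
        raise ValueError("RPD must be positive")
    return 1.0 - 1.0 / (rpd * rpd)


def sd_from_rpd(rmse, rpd):
    """Standard deviation of the reference values implied by RMSE and RPD."""
    return rmse * rpd


train_r2 = r2_from_rpd(8.872)   # compare with the reported 0.987
test_r2 = r2_from_rpd(6.202)    # compare with the reported 0.973
train_sd = sd_from_rpd(28.108, 8.872)
```

Under this assumption the implied values agree with the reported coefficients to within about 0.001, which suggests the three statistics are mutually consistent.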
Article 1230
Title@2025-07-10 (4): Parametrized Quantum Circuit Learning for Quantum Chemical Applications
Title: Parametrized Quantum Circuit Learning for Quantum Chemical Applications | Parametrisiertes Quantum Circuit Lernen für Quantum Chemical Anwendungen | 量子化学应用量子电路学习 2507.08183v1 |
Authors (4): Grier M. Jones, Viki Kumar Prasad, Ulrich Fekl, Hans-Arno Jacobsen
In the field of quantum machine learning (QML), parametrized quantum circuits (PQCs) – constructed using a combination of fixed and tunable quantum gates – provide a promising hybrid framework for tackling complex machine learning problems. Despite numerous proposed applications, there remains limited exploration of datasets relevant to quantum chemistry. In this study, we investigate the potential benefits and limitations of PQCs on two chemically meaningful datasets: (1) the BSE49 dataset, containing bond separation energies for 49 different classes of chemical bonds, and (2) a dataset of water conformations, where coupled-cluster singles and doubles (CCSD) wavefunctions are predicted from lower-level electronic structure methods using the data-driven coupled-cluster (DDCC) approach. We construct a comprehensive set of 168 PQCs by combining 14 data encoding strategies with 12 variational ansätze, and evaluate their performance on circuits with 5 and 16 qubits. Our initial analysis examines the impact of circuit structure on model performance using state-vector simulations. We then explore how circuit depth and training set size influence model performance. Finally, we assess the performance of the best-performing PQCs on current quantum hardware, using both noisy simulations (“fake” backends) and real quantum devices. Our findings underscore the challenges of applying PQCs to chemically relevant problems that are straightforward for classical machine learning methods but remain non-trivial for quantum approaches.
Article 1231
Title@2025-07-10 (4): State Estimation Using Sparse DEIM and Recurrent Neural Networks
Title: State Estimation Using Sparse DEIM and Recurrent Neural Networks | Staatliche Schätzung mit Sparse DEIM und recurrenten Neuronalen Netzwerken | 使用简缩的DEIM和经常性神经网络的状态估计 2410.15982v2 |
Authors (1): Mohammad Farazmand
Sparse Discrete Empirical Interpolation Method (S-DEIM) was recently proposed for state estimation in dynamical systems when only a sparse subset of the state variables can be observed. The S-DEIM estimate involves a kernel vector whose optimal value is inferred through a data assimilation algorithm. This data assimilation step suffers from two drawbacks: (i) It requires the knowledge of the governing equations of the dynamical system, and (ii) It is not generally guaranteed to converge to the optimal kernel vector. To address these issues, here we introduce an equation-free S-DEIM framework that estimates the optimal kernel vector from sparse observational time series using recurrent neural networks (RNNs). We show that the recurrent architecture is necessary since the kernel vector cannot be estimated from instantaneous observations. But RNNs, which incorporate the past history of the observations in the learning process, lead to nearly optimal estimations. We demonstrate the efficacy of our method on three numerical examples with increasing degree of spatiotemporal complexity: a conceptual model of atmospheric flow known as the Lorenz-96 system, the Kuramoto-Sivashinsky equation, and the Rayleigh-Benard convection. In each case, the resulting S-DEIM estimates are satisfactory even when a relatively simple RNN architecture, namely the reservoir computing network, is used.
Article 1232
Title@2025-07-10 (4): CTRLS: Chain-of-Thought Reasoning via Latent State-Transition
Title: CTRLS: Chain-of-Thought Reasoning via Latent State-Transition | CTRLS: Gedankliche Veranlagung durch Latent State-Transition | CTRLS:通过中端国家-过渡进行的研究链理由 2507.08182v1 |
Authors (9): Junda Wu, Yuxin Xiong, Xintong Li, Zhengmian Hu, Tong Yu, Rui Wang, Xiang Chen, Jingbo Shang, Julian McAuley
Chain-of-thought (CoT) reasoning enables large language models (LLMs) to break down complex problems into interpretable intermediate steps, significantly enhancing model transparency and performance in reasoning tasks. However, conventional CoT methods rely on heuristic sampling without structured modeling of reasoning transitions, constraining their ability to systematically explore and discover diverse and effective reasoning trajectories. In this work, we introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions, enabling principled and state-aware exploration via distributional reinforcement learning. By modelling reasoning actions as explicit probability distributions in latent space, our approach explicitly models epistemic uncertainty, facilitating robust exploration of the reasoning space. As part of our framework, we introduce an on-policy reinforcement learning strategy incorporating epsilon-greedy exploration and entropy-based regularization to iteratively refine latent state transitions without requiring additional fine-tuning of the underlying LLM. Theoretical analyses provide evidence lower bounds (ELBO), theoretically grounding our transition-aware modeling of latent reasoning dynamics. Further experiments demonstrate improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
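Epsilon-greedy exploration, one of the components named in this abstract, is a standard rule: with probability epsilon pick a uniformly random action, otherwise pick the action with the highest estimated value. The sketch below shows the generic rule only; the function name, the value list, and the `rng` parameter are illustrative assumptions, and how CTRLS applies the rule to latent state transitions is not specified here.

```python
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the argmax."""
    if not values:
        raise ValueError("values must be non-empty")
    if rng.random() < epsilon:
        return rng.randrange(len(values))                      # explore
    return max(range(len(values)), key=lambda i: values[i])    # exploit


# Toy value estimates for three candidate reasoning steps.
estimates = [0.1, 0.9, 0.3]
greedy_choice = epsilon_greedy(estimates, epsilon=0.0)
explored_choice = epsilon_greedy(estimates, epsilon=1.0, rng=random.Random(0))
```

Setting epsilon to 0 recovers purely greedy selection, while larger values trade exploitation for broader coverage of the candidate steps.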
Article 1233
Title@2025-07-10 (4): Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations
Title: Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations | Wissenschaftliches maschinelles Lernen chaotischer Systeme entdeckt regierende Gleichungen für neurale Bevölkerungen | 神经人口等分的麻风系统发现科学机学 2507.03631v2 |
Authors (9): Anthony G. Chesebro, David Hofmann, Vaibhav Dixit, Earl K. Miller, Richard H. Granger, Alan Edelman, Christopher V. Rackauckas, Lilianne R. Mujica-Parodi, Helmut H. Strey
Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.
Article 1234
Title@2025-07-10 (4): Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity
Title: Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity | Spatio-Temporale Anomalie-Erkennung neu denken: Eine Vision für ursächliche Cybersicherheit | 重新思考时空空间异常探测:驱动力-驱动网络安全愿景 2507.08177v1 |
Authors (6): Arun Vignesh Malarkkan, Haoyue Bai, Xinyuan Wang, Anjali Kaushik, Dongjie Wang, Yanjie Fu
As cyber-physical systems grow increasingly interconnected and spatially distributed, ensuring their resilience against evolving cyberattacks has become a critical priority. Spatio-temporal anomaly detection plays an important role in ensuring system security and operational integrity. However, current data-driven approaches, largely driven by black-box deep learning, face challenges in interpretability, adaptability to distribution shifts, and robustness under evolving system dynamics. In this paper, we advocate for a causal learning perspective to advance anomaly detection in spatially distributed infrastructures that grounds detection in structural cause-effect relationships. We identify and formalize three key directions: causal graph profiling, multi-view fusion, and continual causal graph learning, each offering distinct advantages in uncovering dynamic cause-effect structures across time and space. Drawing on real-world insights from systems such as water treatment infrastructures, we illustrate how causal models provide early warning signals and root cause attribution, addressing the limitations of black-box detectors. Looking ahead, we outline a future research agenda centered on multi-modality, generative AI-driven, and scalable adaptive causal frameworks. Our objective is to lay a new research trajectory toward scalable, adaptive, explainable, and spatially grounded anomaly detection systems. We hope to inspire a paradigm shift in cybersecurity research, promoting causality-driven approaches to address evolving threats in interconnected infrastructures.
Article 1235
Title@2025-07-10 (4): Emotion Recognition in Older Adults with Quantum Machine Learning and Wearable Sensors
Title: Emotion Recognition in Older Adults with Quantum Machine Learning and Wearable Sensors | Emotionserkennung bei älteren Erwachsenen mit Quantum Machine Learning und tragbaren Sensoren | 具有量子机器学习和穿戴感应器的老年人的情感认同 2507.08175v1 |
Authors (3): Md. Saif Hassan Onim, Travis S. Humble, Himanshu Thapliyal
We investigate the feasibility of inferring emotional states exclusively from physiological signals, thereby presenting a privacy-preserving alternative to conventional facial recognition techniques. We conduct a performance comparison of classical machine learning algorithms and hybrid quantum machine learning (QML) methods with a quantum kernel-based model. Our results indicate that the quantum-enhanced SVM surpasses classical counterparts in classification performance across all emotion categories, even when trained on limited datasets. F1 scores exceed 80% for all classes, with a maximum improvement of around 36% in recall. The integration of wearable sensor data with quantum machine learning not only enhances accuracy and robustness but also facilitates unobtrusive emotion recognition. This methodology holds promise for populations with impaired communication abilities, such as individuals with Alzheimer’s Disease and Related Dementias (ADRD) and veterans with Post-Traumatic Stress Disorder (PTSD). The findings establish an early foundation for passive emotional monitoring in clinical and assisted living conditions.
Article 1236
Title@2025-07-10 (4): Reconstructing Galaxy Cluster Mass Maps using Score-based Generative Modeling
Title: Reconstructing Galaxy Cluster Mass Maps using Score-based Generative Modeling | Rekonstruieren von Galaxy Cluster Massenkarten mit Score-basierte Generative Modellierung | 使用计分生成模型重建银河群群群地图 2410.02857v2 |
Authors (7): Alan Hsu, Matthew Ho, Joyce Lin, Carleen Markey, Michelle Ntampaka, Hy Trac, Barnabás Póczos
We present a novel approach to reconstruct gas and dark matter projected density maps of galaxy clusters using score-based generative modeling. Our diffusion model takes in mock SZ and X-ray images as conditional inputs, and generates realizations of corresponding gas and dark matter maps by sampling from a learned data posterior. We train and validate the performance of our model by using mock data from a cosmological simulation. The model accurately reconstructs both the mean and spread of the radial density profiles in the spatial domain, indicating that the model is able to distinguish between clusters of different mass sizes. In the spectral domain, the model achieves close-to-unity values for the bias and cross-correlation coefficients, indicating that the model can accurately probe cluster structures on both large and small scales. Our experiments demonstrate the ability of score models to learn a strong, nonlinear, and unbiased mapping between input observables and fundamental density distributions of galaxy clusters. These diffusion models can be further fine-tuned and generalized to take in not only additional observables but also real observations as inputs, and to predict unknown density distributions of galaxy clusters.
Article 1237
Title@2025-07-10 (4): Emotion Detection in Older Adults Using Physiological Signals from Wearable Sensors
Title: Emotion Detection in Older Adults Using Physiological Signals from Wearable Sensors | Emotionserkennung bei älteren Erwachsenen mit physiologischen Signalen von tragbaren Sensoren | 使用穿戴感应器的生理信号在老年人体内检测情感 2507.08167v1 |
Authors (3): Md. Saif Hassan Onim, Andrew M. Kiselica, Himanshu Thapliyal
Emotion detection in older adults is crucial for understanding their cognitive and emotional well-being, especially in hospital and assisted living environments. In this work, we investigate an edge-based, non-obtrusive approach to emotion identification that uses only physiological signals obtained via wearable sensors. Our dataset includes data from 40 older individuals. Emotional states were obtained using physiological signals from the Empatica E4 and Shimmer3 GSR+ wristbands, and facial expressions were recorded using camera-based emotion recognition with the iMotions Facial Expression Analysis (FEA) module. The dataset also contains twelve emotion categories in terms of relative intensities. We aim to study how well emotion recognition can be accomplished using only physiological sensor data, without the requirement for cameras or intrusive facial analysis. By leveraging classical machine learning models, we predict the intensity of emotional responses based on physiological signals. Our best results on the regression task were an $R^2$ score of 0.782 and an MSE of 0.0006. This method has significant implications for individuals with Alzheimer’s Disease and Related Dementia (ADRD), as well as veterans coping with Post-Traumatic Stress Disorder (PTSD) or other cognitive impairments. Our results across multiple classical regression models validate the feasibility of this method, paving the way for privacy-preserving and efficient emotion recognition systems in real-world settings.
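For readers unfamiliar with the two regression metrics quoted in this abstract, the following minimal sketch shows how the coefficient of determination and the mean squared error are computed. The intensity values are invented toy numbers, not data from the paper, and the function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy (made-up) relative emotion intensities and model predictions.
intensity_true = [0.10, 0.30, 0.20, 0.40]
intensity_pred = [0.12, 0.28, 0.22, 0.38]

print("MSE:", mse(intensity_true, intensity_pred))
print("R^2:", r2_score(intensity_true, intensity_pred))
```

Because the targets are relative intensities on a small numeric scale, a very low MSE can coexist with a moderate $R^2$, which is why both numbers are worth reading together.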
Article 1238
Title@2025-07-10 (4): Grokking Beyond the Euclidean Norm of Model Parameters
Title: Grokking Beyond the Euclidean Norm of Model Parameters | Grokking jenseits der euklidischen Norm von Modellparametern | 示范参数欧洲标准 2506.05718v2 |
Authors (3): Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau
Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) results in grokking. This extends previous work showing that small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicitly using regularization, which is impossible in shallow cases. We further show that the $\ell_2$ norm is not a reliable proxy for generalization when the model is regularized toward a different property $P$, as the $\ell_2$ norm grows in many cases where no weight decay is used, but the model generalizes anyway. We also show that grokking can be amplified solely through data selection, with all other hyperparameters fixed.
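The two example regularizers named in this abstract can be made concrete with a short sketch. It assumes NumPy and shows only how an $\ell_1$ penalty (which favors sparsity) or a nuclear-norm penalty (which favors low rank) would be added to a data loss with a small strength; the function names and the strength value are illustrative and not taken from the paper.

```python
import numpy as np


def l1_penalty(W):
    """Sum of absolute entries; small values favor sparse weights."""
    return float(np.abs(W).sum())


def nuclear_norm(W):
    """Sum of singular values; small values favor low-rank weights."""
    return float(np.linalg.svd(W, compute_uv=False).sum())


def regularized_loss(data_loss, W, penalty, strength=1e-4):
    """Data loss plus a small but non-zero penalty on the property P."""
    return data_loss + strength * penalty(W)


# A rank-1 weight matrix: its nuclear norm equals ||u|| * ||v||.
W_low_rank = np.outer([1.0, 2.0], [3.0, 4.0])

print("l1 penalty:  ", l1_penalty(W_low_rank))
print("nuclear norm:", nuclear_norm(W_low_rank))
print("loss with l1:", regularized_loss(1.0, W_low_rank, l1_penalty, strength=0.01))
```

Swapping `penalty` while keeping everything else fixed is one simple way to compare how different target properties $P$ affect training.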
Article 1239
Title@2025-07-10 (4): Adaptive Diffusion Denoised Smoothing : Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion
Title: Adaptive Diffusion Denoised Smoothing : Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion | Adaptive Diffusion Denoised Glättung : Zertifizierte Robustheit durch Randomized Glättung mit Differential Private Guided Denoising Diffusion | 适应性扩散 脱节滑动:通过有差异的私人制导滑动,通过随机化滑动,证明强力 2507.08163v1 |
Authors (5): Frederick Shpilevskiy, Saiyue Lyu, Krishnamurthy Dj Dvijotham, Mathias Lécuyer, Pierre-André Noël
We propose Adaptive Diffusion Denoised Smoothing, a method for certifying the predictions of a vision model against adversarial examples, while adapting to the input. Our key insight is to reinterpret a guided denoising diffusion model as a long sequence of adaptive Gaussian Differentially Private (GDP) mechanisms refining a pure noise sample into an image. We show that these adaptive mechanisms can be composed through a GDP privacy filter to analyze the end-to-end robustness of the guided denoising process, yielding a provable certification that extends the adaptive randomized smoothing analysis. We demonstrate that our design, under a specific guiding strategy, can improve both certified accuracy and standard accuracy on ImageNet for an $\ell_2$ threat model.
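To illustrate the idea of composing a sequence of GDP mechanisms under a budget, here is a minimal sketch. It relies on the standard Gaussian differential privacy composition rule, under which mechanisms that are $\mu_i$-GDP compose to $\sqrt{\sum_i \mu_i^2}$-GDP. The `GDPFilter` class, its method names, and the small numerical tolerance are hypothetical and are not the paper's implementation.

```python
import math


class GDPFilter:
    """Hypothetical privacy filter: admits steps while the composed mu stays within budget."""

    def __init__(self, mu_budget):
        self.mu_budget = mu_budget
        self.spent_sq = 0.0  # running sum of mu_i ** 2

    def try_spend(self, mu_step):
        """Accept the step only if the composed GDP parameter stays within budget."""
        if self.spent_sq + mu_step ** 2 > self.mu_budget ** 2 + 1e-12:
            return False
        self.spent_sq += mu_step ** 2
        return True

    @property
    def mu_total(self):
        """Composed GDP parameter: sqrt of the sum of squared per-step parameters."""
        return math.sqrt(self.spent_sq)


gdp_filter = GDPFilter(mu_budget=1.0)
accepted = [gdp_filter.try_spend(mu) for mu in (0.6, 0.8, 0.1)]
print(accepted, gdp_filter.mu_total)
```

The point of a filter is that each step's parameter may be chosen adaptively, based on earlier outputs, as long as the running total never exceeds the budget.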
Article 1240
Title@2025-07-10 (4): Hybrid machine learning based scale bridging framework for permeability prediction of fibrous structures
Title: Hybrid machine learning based scale bridging framework for permeability prediction of fibrous structures | Hybrides maschinelles Lernen auf Basis von Skalenüberbrückungsrahmen für die Permeabilitätsvorhersage von faserigen Strukturen | 用于预测纤维结构渗透性的混合机 机床学习比例过渡框架 2502.05044v2 |
Authors (7): Denis Korolev, Tim Schmidt, Dinesh K. Natarajan, Stefano Cassola, David May, Miro Duhovic, Michael Hintermüller
This study introduces a hybrid machine learning-based scale-bridging framework for predicting the permeability of fibrous textile structures. By addressing the computational challenges inherent to multiscale modeling, the proposed approach evaluates the efficiency and accuracy of different scale-bridging methodologies that combine traditional surrogate models, and even physics-informed neural networks (PINNs), with numerical solvers, enabling accurate permeability predictions across micro- and mesoscales. Four methodologies were evaluated: Single Scale Method (SSM), Simple Upscaling Method (SUM), Scale-Bridging Method (SBM), and Fully Resolved Model (FRM). SSM, the simplest method, neglects microscale permeability and exhibited permeability values deviating by up to 150% from the FRM, which was taken as ground truth, at an equivalent lower fiber volume content. SUM improved predictions by considering uniform microscale permeability, yielding closer values under similar conditions, but still lacked structural variability. SBM, incorporating segment-based microscale permeability assignments, showed significant enhancements, achieving almost equivalent values while maintaining computational efficiency and modeling runtimes of ~45 minutes per simulation. In contrast, FRM, which provides the highest fidelity by fully resolving microscale and mesoscale geometries, required up to 270 times more computational time than SSM, with model files exceeding 300 GB. Additionally, a hybrid dual-scale solver incorporating PINNs has been developed and shows the potential to overcome generalization errors and the problem of data scarcity of the data-driven surrogate approaches. The hybrid framework advances permeability modeling by balancing computational cost and prediction reliability, laying the foundation for further applications in fibrous composite manufacturing.
Article 1241
Title@2025-07-10 (4): Just Read the Question: Enabling Generalization to New Assessment Items with Text Awareness
Title: Just Read the Question: Enabling Generalization to New Assessment Items with Text Awareness | Lesen Sie einfach die Frage: Ermöglichung der Generalisierung zu neuen Bewertungsgegenständen mit Text-Bewusstsein | 只需读一读问题:在有文本意识的情况下,使新的评估项目能够普遍化。 2507.08154v1 |
Authors (4): Arisha Khan, Nathaniel Li, Tori Shen, Anna N. Rafferty
Machine learning has been proposed as a way to improve educational assessment by making fine-grained predictions about student performance and learning relationships between items. One challenge with many machine learning approaches is incorporating new items, as these approaches rely heavily on historical data. We develop Text-LENS by extending the LENS partial variational auto-encoder for educational assessment to leverage item text embeddings, and explore the impact on predictive performance and generalization to previously unseen items. We examine performance on two datasets: Eedi, a publicly available dataset that includes item content, and LLM-Sim, a novel dataset with test items produced by an LLM. We find that Text-LENS matches LENS’ performance on seen items and improves upon it in a variety of conditions involving unseen items; it effectively learns student proficiency from and makes predictions about student performance on new items.
Article 1242
Title@2025-07-10 (4): ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction
Title: ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction | ALCo-FM: Adaptives Long-Context-Stiftungsmodell für Unfallvorhersage | ALCO-FM:适应性长全文基金会事故预测模型 2507.08153v1 |
Authors (3): Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Rajiv Ramnath
Traffic accidents are rare, yet high-impact events that require long-context multimodal reasoning for accurate risk forecasting. In this paper, we introduce ALCo-FM, a unified adaptive long-context foundation model that computes a volatility pre-score to dynamically select context windows for input data and encodes and fuses these multimodal data via shallow cross attention. Following a local GAT layer and a BigBird-style sparse global transformer over H3 hexagonal grids, coupled with Monte Carlo dropout for confidence, the model yields superior, well-calibrated predictions. Trained on data from 15 US cities with a class-weighted loss to counter label imbalance, and fine-tuned with minimal data on held-out cities, ALCo-FM achieves 0.94 accuracy, 0.92 F1, and an ECE of 0.04, outperforming more than 20 state-of-the-art baselines in large-scale urban risk prediction. Code and dataset are available at: https://github.com/PinakiPrasad12/ALCo-FM
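Since this abstract reports an expected calibration error (ECE), a short sketch of how that metric is commonly computed may help. It uses equal-width confidence bins and toy inputs; the binning scheme and bin count used by the authors are not stated in the abstract, so both are assumptions here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy predictions: predicted confidence and whether the prediction was right.
toy_confidences = [0.95, 0.95, 0.55, 0.55]
toy_correct = [1, 1, 1, 0]
print(expected_calibration_error(toy_confidences, toy_correct))
```

A value near zero means that, within each bin, stated confidence roughly matches observed accuracy.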
Article 1243
Title@2025-07-10 (4): Downscaling Extreme Precipitation with Wasserstein Regularized Diffusion
Title: Downscaling Extreme Precipitation with Wasserstein Regularized Diffusion | Downscaling Extreme Niederschlag mit Wasserstein Regularized Diffusion | 降降降极端降降,与瓦塞斯坦正规化的传播 2410.00381v3 |
Authors (5): Yuhao Liu, James Doss-Gollin, Qiushi Dai, Ashok Veeraraghavan, Guha Balakrishnan
Understanding the risks posed by extreme rainfall events necessitates both high-resolution products (to assess localized hazards) and extensive historical records (to capture rare occurrences). Radar and mesonet networks provide kilometer-scale precipitation fields, but with limited historical records and geographical coverage. Conversely, global gauge and blended products span decades, yet their coarse 30-50 km grids obscure local extremes. This work introduces Wasserstein Regularized Diffusion (WassDiff), a generative downscaling framework that integrates diffusion modeling with a distribution-matching (Wasserstein) regularizer, suppressing bias throughout the entire generative denoising process. Conditioned on 55 km CPC gauge-based precipitation and the 31 km ERA5 reanalysis, WassDiff generates 1 km precipitation estimates that remain well-calibrated to targets across the full intensity range, including the extremes. Comprehensive evaluations demonstrate that WassDiff outperforms existing state-of-the-art downscaling methods, delivering lower reconstruction error and reduced bias. Case studies further demonstrate its ability to reproduce realistic fine-scale structures and accurate peak intensities from extreme weather phenomena, such as tropical storms and cold fronts. By unlocking decades of high-resolution rainfall information from globally available coarse records, WassDiff offers a practical pathway toward more accurate flood-risk assessments and climate-adaptation planning.
Article 1244
Title@2025-07-10 (4): CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Title: CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk | CLEAR: Kalibriertes Lernen für epistemisches und aleatorisches Risiko | CLEAR: 流行和感知风险校准学习 2507.08150v1 |
Authors (4): Ilia Azizi, Juraj Bodik, Jakob Heiss, Bin Yu
Accurate uncertainty quantification is critical for reliable predictive modeling, especially in regression tasks. Existing methods typically address either aleatoric uncertainty from measurement noise or epistemic uncertainty from limited data, but not necessarily both in a balanced way. We propose CLEAR, a calibration method with two distinct parameters, $\gamma_1$ and $\gamma_2$, to combine the two uncertainty components for improved conditional coverage. CLEAR is compatible with any pair of aleatoric and epistemic estimators; we show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability-Computability-Stability (PCS) framework for epistemic uncertainty. Across 17 diverse real-world datasets, CLEAR achieves an average improvement of 28.2% and 17.4% in the interval width compared to the two individually calibrated baselines while maintaining nominal coverage. This improvement can be particularly evident in scenarios dominated by either high epistemic or high aleatoric uncertainty.
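One way to picture a two-parameter combination of aleatoric and epistemic widths is the sketch below. It is not the authors' procedure: the symmetric additive interval $\hat{y} \pm (\gamma_1 a + \gamma_2 e)$, the fixed ratio `lam` standing for $\gamma_2/\gamma_1$, and the split-conformal style quantile used to pick $\gamma_1$ are all assumptions made for illustration, and the data are invented.

```python
import math


def calibrate_gamma1(y, pred, aleatoric, epistemic, lam, coverage=0.9):
    """Pick gamma_1 as a conformal-style quantile of normalized residuals (assumed scheme)."""
    scores = sorted(
        abs(yi - pi) / (a + lam * e)
        for yi, pi, a, e in zip(y, pred, aleatoric, epistemic)
    )
    n = len(scores)
    rank = min(n, math.ceil((n + 1) * coverage))
    return scores[rank - 1]


def interval(pred, a, e, gamma1, gamma2):
    """Symmetric interval whose half-width mixes both uncertainty components."""
    half = gamma1 * a + gamma2 * e
    return pred - half, pred + half


# Toy calibration set (made-up numbers).
y_cal = [1.0, 2.0, 3.0, 4.0]
pred_cal = [1.1, 1.8, 3.3, 4.0]
ale_cal = [0.1, 0.1, 0.1, 0.1]
epi_cal = [0.1, 0.1, 0.1, 0.1]

lam = 1.0  # hypothetical ratio gamma_2 / gamma_1
gamma1 = calibrate_gamma1(y_cal, pred_cal, ale_cal, epi_cal, lam)
gamma2 = lam * gamma1
print(gamma1, gamma2, interval(3.3, 0.1, 0.1, gamma1, gamma2))
```

Varying `lam` shifts weight between the two components: large values let the epistemic estimate dominate the interval width, small values let the aleatoric estimate dominate.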
Article 1245
Title@2025-07-10 (4): Convergence of Natural Policy Gradient for a Family of Infinite-State Queueing MDPs
Title: Convergence of Natural Policy Gradient for a Family of Infinite-State Queueing MDPs | Konvergenz des Gradienten der Naturpolitik für eine Familie unendlicher Staaten, die MDPs in Anspruch nehmen | 自然政策 “ 进步 “ 与 “ 无限国家排队多DP家庭 “ 的趋同 2402.05274v3 |
Authors (3): Isaac Grosof, Siva Theja Maguluri, R. Srikant
A wide variety of queueing systems can be naturally modeled as infinite-state Markov Decision Processes (MDPs). In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the heart of many popular policy-gradient based learning algorithms, such as natural actor-critic, TRPO, and PPO, lies the Natural Policy Gradient (NPG) policy optimization algorithm. Convergence results for these RL algorithms rest on convergence results for the NPG algorithm. However, all existing results on the convergence of the NPG algorithm are limited to finite-state settings. We study a general class of queueing MDPs, and prove an $O(1/\sqrt{T})$ convergence rate for the NPG algorithm, if the NPG algorithm is initialized with the MaxWeight policy. This is the first convergence rate bound for the NPG algorithm for a general class of infinite-state average-reward MDPs. Moreover, our result applies beyond the queueing setting to any countably-infinite MDP satisfying certain mild structural assumptions, given a sufficiently good initial policy. Key to our result are state-dependent bounds on the relative value function achieved by the iterate policies of the NPG algorithm.
Article 1246
Title@2025-07-10 (4): UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
Title: UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching | UmbraTTS: Text-zu-Sprechen an Umweltkontexte anpassen mit Flow Matching | UmbratTS:用流动匹配使文字语音适应环境环境环境 2506.09874v2 |
Authors (9): Neta Glazer, Aviv Navon, Yael Segal, Aviv Shamsian, Hilit Segev, Asaf Buchnick, Menachem Pirchi, Gil Hetz, Joseph Keshet
Recent advances in Text-to-Speech (TTS) have enabled highly natural speech synthesis, yet integrating speech with complex background environments remains challenging. We introduce UmbraTTS, a flow-matching based TTS model that jointly generates both speech and environmental audio, conditioned on text and acoustic context. Our model allows fine-grained control over background volume and produces diverse, coherent, and context-aware audio scenes. A key challenge is the lack of data with speech and background audio aligned in natural context. To overcome this, we propose a self-supervised framework that extracts speech, background audio, and transcripts from unannotated recordings. Extensive evaluations demonstrate that UmbraTTS significantly outperforms existing baselines, producing natural, high-quality, environmentally aware audio.
Article 1247
Title@2025-07-10 (4): Stochastic Operator Network: A Stochastic Maximum Principle Based Approach to Operator Learning
Title: Stochastic Operator Network: A Stochastic Maximum Principle Based Approach to Operator Learning | Stochastic Operator Network: Ein stochastisches Maximum-Prinzip basierte Ansatz zum Operator-Lernen | 存储操作员网络:运行操作员学习的存储最大原则性方法 2507.10401v1 |
Authors (5): Ryan Bausback, Jingqiao Tang, Lu Lu, Feng Bao, Toan Huynh
We develop a novel framework for uncertainty quantification in operator learning, the Stochastic Operator Network (SON). SON combines the stochastic optimal control concepts of the Stochastic Neural Network (SNN) with the DeepONet. By formulating the branch net as an SDE and backpropagating through the adjoint BSDE, we replace the gradient of the loss function with the gradient of the Hamiltonian from the Stochastic Maximum Principle in the SGD update. This allows SON to learn the uncertainty present in operators through its diffusion parameters. We then demonstrate the effectiveness of SON when replicating several noisy operators in 2D and 3D.
Article 1248
Title@2025-07-10 (4): Graph Convolutional Branch and Bound
Title: Graph Convolutional Branch and Bound | Graphischer konvolutionärer Zweig und Bound | 革命分支和圆环 2406.03099v3 |
Authors (5): Lorenzo Sciandra, Roberto Esposito, Andrea Cesare Grosso, Laura Sacerdote, Cristina Zucca
This article explores the integration of deep learning models into combinatorial optimization pipelines, specifically targeting NP-hard problems. Traditional exact algorithms for such problems often rely on heuristic criteria to guide the exploration of feasible solutions. In this work, we propose using neural networks to learn informative heuristics, most notably an optimality score that estimates a solution’s proximity to the optimum. This score is used to evaluate nodes within a branch-and-bound framework, enabling a more efficient traversal of the solution space. Focusing on the Traveling Salesman Problem, we describe two exact solvers (1-tree branch-and-bound and Concorde) and introduce a hybrid approach called Graph Convolutional Branch and Bound, which augments these solvers with a graph convolutional neural network along with a novel unsupervised training strategy that facilitates generalization to graphs of varying sizes without requiring labeled data. Empirical results demonstrate the effectiveness of the proposed method, showing a significant reduction in the number of explored branch-and-bound nodes and overall computational time.
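The role of a node score inside branch-and-bound can be sketched on a tiny Traveling Salesman instance. In the sketch below the `score` argument is a hypothetical stand-in for a learned optimality score; by default it simply prefers cheaper partial paths. The bound is the plain partial-path cost, which is much weaker than the 1-tree bound mentioned in the abstract, and the distance matrix is invented.

```python
import heapq
from itertools import count


def branch_and_bound_tsp(dist, score=None):
    """Best-first branch-and-bound over partial tours that start at city 0."""
    n = len(dist)
    if score is None:
        # Placeholder for a learned score: higher means "more promising".
        score = lambda path, cost: -cost
    best_cost, best_tour = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares paths directly
    heap = [(-score((0,), 0.0), next(tie), 0.0, (0,))]
    explored = 0
    while heap:
        _, _, cost, path = heapq.heappop(heap)
        if cost >= best_cost:
            continue  # prune: non-negative edges mean this node cannot improve
        explored += 1
        if len(path) == n:
            total = cost + dist[path[-1]][0]  # close the tour
            if total < best_cost:
                best_cost, best_tour = total, path
            continue
        for city in range(n):
            if city in path:
                continue
            new_cost = cost + dist[path[-1]][city]
            if new_cost < best_cost:
                new_path = path + (city,)
                heapq.heappush(heap, (-score(new_path, new_cost), next(tie), new_cost, new_path))
    return best_cost, best_tour, explored


toy_dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
best_cost, best_tour, explored = branch_and_bound_tsp(toy_dist)
print(best_cost, best_tour, explored)
```

The score only changes the order in which nodes are expanded, so the search remains exact; a better score finds a good incumbent sooner, which lets the pruning test discard more nodes.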
Article 1249
Title@2025-07-10 (4): Assessing the Chemical Intelligence of Large Language Models
Title: Assessing the Chemical Intelligence of Large Language Models | Bewertung der chemischen Intelligenz großer Sprachmodelle | 评估大语言模型的化学情报 2505.07735v2 |
Authors (3): Nicholas T. Runcie, Charlotte M. Deane, Fergus Imrie
Large Language Models are versatile, general-purpose tools with a wide range of applications. Recently, the advent of “reasoning models” has led to substantial improvements in their abilities in advanced problem-solving domains such as mathematics and software engineering. In this work, we assessed the ability of reasoning models to perform chemistry tasks directly, without any assistance from external tools. We created a novel benchmark, called ChemIQ, consisting of 816 questions assessing core concepts in organic chemistry, focused on molecular comprehension and chemical reasoning. Unlike previous benchmarks, which primarily use multiple choice formats, our approach requires models to construct short-answer responses, more closely reflecting real-world applications. The reasoning models, OpenAI’s o3-mini, Google’s Gemini Pro 2.5, and DeepSeek R1, answered 50%-57% of questions correctly in the highest reasoning modes, with higher reasoning levels significantly increasing performance on all tasks. These models substantially outperformed the non-reasoning models which achieved only 3%-7% accuracy. We found that Large Language Models can now convert SMILES strings to IUPAC names, a task earlier models were unable to perform. Additionally, we show that the latest reasoning models can elucidate structures from 1D and 2D 1H and 13C NMR data, with Gemini Pro 2.5 correctly generating SMILES strings for around 90% of molecules containing up to 10 heavy atoms, and in one case solving a structure comprising 25 heavy atoms. For each task, we found evidence that the reasoning process mirrors that of a human chemist. Our results demonstrate that the latest reasoning models can, in some cases, perform advanced chemical reasoning.
Article 1250
Title@2025-07-10 (4): Physics-Informed Neural Networks with Hard Nonlinear Equality and Inequality Constraints
Title: Physics-Informed Neural Networks with Hard Nonlinear Equality and Inequality Constraints | Physik-informierte neurale Netzwerke mit harten nichtlinearen Gleichstellungs- und Ungleichheitsbeschränkungen | 具有硬非线性平等和不平等制约因素的物理内立神经网络 2507.08124v1 |
Authors (3): Ashfaq Iftakher, Rahul Golder, M. M. Faruque Hasan
Traditional physics-informed neural networks (PINNs) do not guarantee strict constraint satisfaction. This is problematic in engineering systems where minor violations of governing laws can significantly degrade the reliability and consistency of model predictions. In this work, we develop KKT-Hardnet, a PINN architecture that enforces both linear and nonlinear equality and inequality constraints up to machine precision. It leverages a projection onto the feasible region through solving Karush-Kuhn-Tucker (KKT) conditions of a distance minimization problem. Furthermore, we reformulate the nonlinear KKT conditions using log-exponential transformation to construct a general sparse system with only linear and exponential terms, thereby making the projection differentiable. We apply KKT-Hardnet on both test problems and a real-world chemical process simulation. Compared to multilayer perceptrons and PINNs, KKT-Hardnet achieves higher accuracy and strict constraint satisfaction. This approach allows the integration of domain knowledge into machine learning towards reliable hybrid modeling of complex systems.
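The projection step described in this abstract can be illustrated in its simplest special case: linear equality constraints only. For $\min_x \tfrac{1}{2}\|x - y\|^2$ subject to $Ax = b$, the KKT conditions are the linear system $x + A^\top \nu = y$, $Ax = b$, which the sketch below solves directly with NumPy. The nonlinear and inequality cases handled by KKT-Hardnet are not covered, and the function name and example values are illustrative.

```python
import numpy as np


def project_onto_linear_equalities(y, A, b):
    """Project y onto {x : A x = b} by solving the KKT system of the distance problem."""
    n, m = y.size, b.size
    # Stationarity: x + A^T nu = y.  Primal feasibility: A x = b.
    kkt = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([y, b])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # discard the multipliers nu


# Toy example: a raw network output that should satisfy a mass balance sum(x) = 1.
raw_output = np.array([0.7, 0.5, 0.1])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

projected = project_onto_linear_equalities(raw_output, A, b)
print(projected, A @ projected)
```

Because the projected output is the solution of a linear system in the raw output, it is differentiable with respect to the network parameters, which is what allows such a layer to sit inside training.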
Article 1251
Title@2025-07-10 (4): Quasi-Random Physics-informed Neural Networks
Title: Quasi-Random Physics-informed Neural Networks | Quasi-Random Physik-informierte Neuronale Netzwerke | 准环境网 物理-知情神经网络 2507.08121v1 |
Authors (2): Tianchi Yu, Ivan Oseledets
Physics-informed neural networks have shown promise in solving partial differential equations (PDEs) by integrating physical constraints into neural network training, but their performance is sensitive to the sampling of points. Based on the impressive performance of quasi Monte-Carlo methods in high dimensional problems, this paper proposes Quasi-Random Physics-Informed Neural Networks (QRPINNs), which use low-discrepancy sequences for sampling instead of random points directly from the domain. Theoretically, QRPINNs have been proven to have a better convergence rate than PINNs. Empirically, experiments demonstrate that QRPINNs significantly outperform PINNs and some representative adaptive sampling methods, especially in high-dimensional PDEs. Furthermore, combining QRPINNs with adaptive sampling can further improve the performance.
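To show what sampling collocation points from a low-discrepancy sequence looks like in practice, here is a minimal pure-Python Halton generator. The abstract does not say which low-discrepancy sequence the authors use, so Halton is an assumption chosen for brevity.

```python
def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += digit / denom
    return result


def halton_points(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence in the unit hypercube (one prime base per dimension)."""
    return [
        tuple(van_der_corput(i, base) for base in bases)
        for i in range(1, n_points + 1)
    ]


# Collocation points for a 2D domain [0, 1]^2.
collocation = halton_points(8)
for point in collocation:
    print(point)
```

Unlike independent uniform draws, consecutive points fill the domain evenly, so residuals are evaluated without the clusters and gaps that random sampling produces. Points can be rescaled affinely to any rectangular domain.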
Article 1252
Title@2025-07-10 (4): PDE-aware Optimizer for Physics-informed Neural Networks
Title: PDE-aware Optimizer for Physics-informed Neural Networks | PDE-aware Optimizer für physikinformierte Neuronale Netzwerke | PDE-觉醒物理知情神经网络优化器 2507.08118v1 |
Authors (3): Hardik Shukla, Manurag Khullar, Vismay Churiwala
Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical constraints into the loss function. However, standard optimizers such as Adam often struggle to balance competing loss terms, particularly in stiff or ill-conditioned systems. In this work, we propose a PDE-aware optimizer that adapts parameter updates based on the variance of per-sample PDE residual gradients. This method addresses gradient misalignment without incurring the heavy computational costs of second-order optimizers such as SOAP. We benchmark the PDE-aware optimizer against Adam and SOAP on the 1D Burgers’, Allen-Cahn, and Korteweg-de Vries (KdV) equations. Across all three PDEs, the PDE-aware optimizer achieves smoother convergence and lower absolute errors, particularly in regions with sharp gradients. Our results demonstrate the effectiveness of PDE residual-aware adaptivity in enhancing stability in PINN training. While promising, further scaling on larger architectures and hardware accelerators remains an important direction for future research.
Article 1253
Title@2025-07-10 (4): Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation
Title: Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation | Mallows-Modell mit Lerndistanz-Metriken: Probenahme und maximale Likelihood-Schätzung | 边远计量:抽样和最大可能性估计 2507.08108v1 |
Authors (2): Yeganeh Alimohammadi, Kiana Asgari
The Mallows model is a widely used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences \cite{chen2024mallows, kleinberg2021algorithmic, rafailov2024direct}. Under this model, observed rankings are noisy perturbations of a central ranking $\sigma$, with likelihood decaying exponentially in the distance from $\sigma$, i.e., $P(\pi) \propto \exp\big(-\beta \cdot d(\pi, \sigma)\big)$, where $\beta > 0$ controls dispersion and $d$ is a distance function. Existing methods mainly focus on fixed distances (such as Kendall’s $\tau$ distance), with no principled approach to learning the distance metric directly from data. In practice, however, rankings naturally vary by context; for instance, in some sports we regularly see long-range swaps (a low-rank team beating a high-rank one), while in others such events are rare. Motivated by this, we propose a generalization of the Mallows model that learns the distance metric directly from data. Specifically, we focus on $L_\alpha$ distances: $d_\alpha(\pi,\sigma) := \sum_{i=1}^{n} |\pi(i)-\sigma(i)|^\alpha$, where $n$ is the number of ranked items. For any $\alpha \geq 1$ and $\beta > 0$, we develop a Fully Polynomial-Time Approximation Scheme (FPTAS) to efficiently generate samples that are $\epsilon$-close (in total variation distance) to the true distribution. Even in the special cases of $L_1$ and $L_2$, this generalizes prior results that required vanishing dispersion ($\beta \to 0$). Using this sampling algorithm, we propose an efficient Maximum Likelihood Estimation (MLE) algorithm that jointly estimates the central ranking, the dispersion parameter, and the optimal distance metric. We prove strong consistency results for our estimators (for any values of $\alpha$ and $\beta$), and we validate our approach empirically using datasets from sports rankings.
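The model definition in this abstract can be checked by brute force on a very small example. The sketch below computes the $L_\alpha$ distance and the exact Mallows probabilities by enumerating all permutations. This is only feasible for tiny $n$ and is not the FPTAS sampler the paper develops. Rankings are represented as tuples where entry $i$ is the position given to item $i$, a convention chosen for this illustration.

```python
import math
from itertools import permutations


def l_alpha_distance(pi, sigma, alpha):
    """d_alpha(pi, sigma) = sum_i |pi(i) - sigma(i)|^alpha."""
    return sum(abs(p - s) ** alpha for p, s in zip(pi, sigma))


def mallows_pmf(sigma, alpha, beta):
    """Exact P(pi) proportional to exp(-beta * d_alpha(pi, sigma)), by enumeration (tiny n only)."""
    weights = {
        pi: math.exp(-beta * l_alpha_distance(pi, sigma, alpha))
        for pi in permutations(sorted(sigma))
    }
    normalizer = sum(weights.values())
    return {pi: w / normalizer for pi, w in weights.items()}


central = (1, 2, 3)
pmf = mallows_pmf(central, alpha=2, beta=0.5)
for ranking, prob in sorted(pmf.items(), key=lambda kv: -kv[1]):
    print(ranking, round(prob, 4))
```

Raising $\alpha$ penalizes long-range displacements more heavily than short ones, which is the behavior the sports example in the abstract is meant to capture.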
Article 1254
Title@2025-07-10 (4): Predicting Flow Dynamics using Diffusion Models
Title: Predicting Flow Dynamics using Diffusion Models | Vorhersage von Strömungsdynamiken mit Diffusionsmodellen | 利用传播模型预测流动动态 2507.08106v1 |
Authors (2): Yannick Gachnang, Vismay Churiwala
In this work, we aimed to replicate and extend the results presented in the DiffFluid paper[1]. The DiffFluid model showed that diffusion models combined with Transformers are capable of predicting fluid dynamics. It uses a denoising diffusion probabilistic model (DDPM) framework to tackle Navier-Stokes and Darcy flow equations. Our goal was to validate the reproducibility of the methods in the DiffFluid paper while testing its viability for other simulation types, particularly the Lattice Boltzmann method. Despite our computational limitations and time constraints, this work provides evidence of the flexibility and potential of the model as a general-purpose solver for fluid dynamics. Our results show both the potential and challenges of applying diffusion models to complex fluid dynamics problems. This work highlights the opportunities for future research in optimizing the computational efficiency and scaling such models in broader domains.
Article 1255
Title@2025-07-10 (4): PIAD-SRNN: Physics-Informed Adaptive Decomposition in State-Space RNN
Title: PIAD-SRNN: Physics-Informed Adaptive Decomposition in State-Space RNN | PIAD-SRNN: Physik-informierte Adaptive Zersetzung im State-Space RNN | PIAD-SRNN: 国家空间空间网中的物理系统化适应性分解 2412.00994v2 |
Authors (3): Ahmad Mohammadshirazi, Pinaki Prasad Guha Neogi, Rajiv Ramnath
Time series forecasting often demands a trade-off between accuracy and efficiency. While recent Transformer models have improved forecasting capabilities, they come with high computational costs. Linear-based models have shown better accuracy than Transformers but still fall short of ideal performance. We propose PIAD-SRNN, a physics-informed adaptive decomposition state-space RNN, that separates seasonal and trend components and embeds domain equations in a recurrent framework. We evaluate PIAD-SRNN’s performance on indoor air quality datasets, focusing on CO2 concentration prediction across various forecasting horizons, and results demonstrate that it consistently outperforms SoTA models in both long-term and short-term time series forecasting, including transformer-based architectures, in terms of both MSE and MAE. Besides proposing PIAD-SRNN which balances accuracy with efficiency, this paper also provides four curated datasets. Code and data: https://github.com/ahmad-shirazi/DSSRNN
Article 1256
Title@2025-07-10 (4): Low-rank Momentum Factorization for Memory Efficient Training
Title: Low-rank Momentum Factorization for Memory Efficient Training | Low-rank Momentum Factorization für ein speichereffizientes Training | 记忆高效培训的低调动力化 2507.08091v1 |
Authors (2): Pouria Mahdavinia, Mehrdad Mahdavi
Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient fine-tuning (e.g., LoRA) and optimizer state compression exist, recent approaches like GaLore bridge these by using low-rank gradient projections and subspace moment accumulation. However, such methods may struggle with fixed subspaces or computationally costly offline resampling (e.g., requiring full-matrix SVDs). We propose Momentum Factorized SGD (MoFaSGD), which maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration. Crucially, MoFaSGD leverages the computed low-rank momentum factors to perform efficient spectrally normalized updates, offering an alternative to subspace moment accumulation. We establish theoretical convergence guarantees for MoFaSGD, proving it achieves an optimal rate for non-convex stochastic optimization under standard assumptions. Empirically, we demonstrate MoFaSGD’s effectiveness on large language model alignment benchmarks, achieving a competitive trade-off between memory reduction (comparable to LoRA) and performance compared to state-of-the-art low-rank optimization methods. Our implementation is available at https://github.com/pmahdavi/MoFaSGD.
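The two ingredients named in this abstract, a low-rank SVD representation of the momentum and a spectrally normalized update, can be pictured with a short NumPy sketch. Two caveats apply. First, the sketch recomputes a full SVD for clarity, whereas the abstract says MoFaSGD maintains its factors dynamically to avoid exactly that cost. Second, interpreting the spectrally normalized update as $U V^\top$ (all retained singular values set to one) is an assumption made here, not a detail confirmed by the abstract.

```python
import numpy as np


def low_rank_factors(momentum, rank):
    """Truncated SVD factors (U, s, Vt) of the momentum matrix."""
    U, s, Vt = np.linalg.svd(momentum, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]


def spectrally_normalized_direction(U, Vt):
    """Update direction with every retained singular value replaced by 1 (assumed form)."""
    return U @ Vt


rng = np.random.default_rng(0)
# Toy momentum matrix that is exactly rank 2.
momentum = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

U, s, Vt = low_rank_factors(momentum, rank=2)
direction = spectrally_normalized_direction(U, Vt)

stored = U.size + s.size + Vt.size
print("full entries:", momentum.size, "factor entries:", stored)
print("singular values of direction:", np.linalg.svd(direction, compute_uv=False).round(6))
```

For a weight matrix of shape $m \times n$ the factors need $r(m + n + 1)$ numbers instead of $mn$, which is where the memory saving comes from when $r$ is small relative to the matrix dimensions.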
Article 1257
Title@2025-07-10 (4): Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models
Title: Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models | Auswirkungen von Pretraining Word Co-occurence auf die kompositorische Generalisierung in multimodalen Modellen | 预言前世界共同会议对多式联运模式中整体构成的影响 2507.08000v1 |
Authors (2): Helen Qu, Sang Michael Xie
CLIP and large multimodal models (LMMs) have better accuracy on examples involving concepts that are highly represented in the training data. However, the role of concept combinations in the training data on compositional generalization is largely unclear. For instance, how does accuracy vary when a common object appears in an uncommon pairing with another object? In this paper, we investigate how word co-occurrence statistics in the pretraining dataset (a proxy for co-occurrence of visual concepts) impact CLIP/LMM performance. To disentangle the effects of word co-occurrence frequencies from single-word frequencies, we measure co-occurrence with pointwise mutual information (PMI), which normalizes the joint probability of two words co-occurring by the probability that they would co-occur if they were independent. Using synthetically generated images with a variety of concept pairs, we show a strong correlation between PMI in the CLIP pretraining data and zero-shot accuracy in CLIP models trained on LAION-400M (r=0.97 and a 14% accuracy gap between images in the top and bottom 5% of PMI values), demonstrating that even accuracy on common concepts is affected by the combination of concepts in the image. Leveraging this finding, we reproduce this effect in natural images by editing them to contain pairs with varying PMI, resulting in a correlation of r=0.75. Finally, we demonstrate that this behavior in CLIP transfers to LMMs built on top of CLIP (r=0.70 for TextVQA, r=0.62 for VQAv2). Our findings highlight the need for algorithms and architectures that improve compositional generalization in multimodal models without scaling the training data combinatorially. Our code is available at https://github.com/helenqu/multimodal-pretraining-pmi.
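The PMI measure mentioned in this abstract can be illustrated with a small sketch. It assumes that co-occurrence is counted at the caption level and that probabilities are estimated as caption fractions; the function name `pmi` and the toy captions are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import Counter
from itertools import combinations


def pmi(captions, word_a, word_b):
    """Pointwise mutual information of two distinct words over a list of captions.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), where each probability is
    estimated as the fraction of captions containing the word(s).
    """
    n = len(captions)
    word_counts = Counter()
    pair_counts = Counter()
    for caption in captions:
        words = set(caption.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    joint = pair_counts[frozenset((word_a, word_b))] / n
    if joint == 0:
        return float("-inf")  # the pair never co-occurs in this corpus
    p_a = word_counts[word_a] / n
    p_b = word_counts[word_b] / n
    return math.log(joint / (p_a * p_b))


# Toy captions (hypothetical data, for illustration only).
captions = [
    "a dog on a couch",
    "a dog on a beach",
    "a cat on a couch",
    "a cat on a couch",
]

print(pmi(captions, "cat", "couch"))  # positive: co-occur more than chance
print(pmi(captions, "dog", "couch"))  # negative: co-occur less than chance
```

A positive value means the pair appears together more often than independent single-word frequencies would predict, which is what lets PMI separate pair effects from single-word frequency effects.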
nan
Article 1258
Title@2025-07-10 (4): Single-pass Adaptive Image Tokenization for Minimum Program Search
Title: Single-pass Adaptive Image Tokenization for Minimum Program Search | Single-Pass Adaptive Image Tokenization für minimale Programmsuche | 用于最低程序搜索的单一被动图像适配 2507.07995v1 |
Authors (5): Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola
According to Algorithmic Information Theory (AIT), intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual representation learning systems use fixed-length representations for all inputs, ignoring variations in complexity or familiarity. Recent adaptive tokenization methods address this by allocating variable-length representations but typically require test-time search over multiple encodings to find the most predictive one. Inspired by Kolmogorov Complexity principles, we propose a single-pass adaptive tokenizer, KARL, which predicts the appropriate number of tokens for an image in a single forward pass, halting once its approximate KC is reached. The token count serves as a proxy for the minimum description length. KARL’s training procedure closely resembles the Upside-Down Reinforcement Learning paradigm, as it learns to conditionally predict token halting based on a desired reconstruction quality. KARL matches the performance of recent adaptive tokenizers while operating in a single pass. We present scaling laws for KARL, analyzing the role of encoder/decoder size, continuous vs. discrete tokenization and more. Additionally, we offer a conceptual study drawing an analogy between Adaptive Image Tokenization and Algorithmic Information Theory, examining the predicted image complexity (KC) across axes such as structure vs. noise and in- vs. out-of-distribution familiarity, and revealing alignment with human intuition.
nan
Article 1259
Title@2025-07-10 (4): Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Title: Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Überspringen Sie eine Ebene oder Schleifen Sie es? Test-Zeit Tiefe Anpassung von vorgebildeten LLMs | 跳过图层或循环它? 预设 LLM 的测试时间深度适应 2507.07996v1 |
Authors (3): Ziyue Li, Yang Li, Tianyi Zhou
Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.
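The chain-of-layers idea can be pictured as executing a list of layer indices rather than a fixed stack. The sketch below is only an illustration under simplifying assumptions: the "layers" are toy arithmetic functions standing in for transformer blocks, `run_chain` is a hypothetical helper, and the MCTS search that the abstract describes for choosing the chain is not shown.

```python
def run_chain(layers, chain, x):
    """Apply `layers` in the order given by `chain`, a list of layer indices.

    An index may be omitted (the layer is skipped) or appear several times
    (the layer is looped), and indices may appear in any order.
    """
    for idx in chain:
        x = layers[idx](x)
    return x


# Toy stand-ins for pretrained layers (hypothetical, for illustration only).
layers = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
]

full = run_chain(layers, [0, 1, 2], 5)       # original fixed-depth model
skipped = run_chain(layers, [0, 2], 5)       # layer 1 skipped (shallower)
looped = run_chain(layers, [0, 1, 1, 2], 5)  # layer 1 applied twice (recurrent)

print(full, skipped, looped)  # 9 3 21
```

Representing the architecture as an index list is one simple design choice: skipping, repetition, and reordering all become edits to the same list, which is what a per-sample search procedure could then explore.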
nan
Article 1260
Title@2025-07-10 (4): Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012
Title: Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012 | Verwendung von KI zur Zusammenfassung der US-Präsidentschaftskampagne TV-Werbung Videos, 1952-2012 | 利用大赦国际总结1952-2012年美国总统竞选运动电视广告视频, 2503.22589v2 |
Authors (6): Adam Breuer, Bryce J. Dietrich, Michael H. Crespin, Matthew Butler, J. A. Pryse, Kosuke Imai
This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation led many to rely on smaller subsets. We design a large-scale parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
nan
Article 1261
Title@2025-07-10 (4): Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Title: Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions | Quantile Reward Policy Optimierung: Ausrichtung mit punktweisen Regressions- und Exaktpartitionsfunktionen | 量化奖退利政策优化:与点回归和精密分区函数一致 2507.08068v1 |
Authors (3): Simon Matrenok, Skander Moalla, Caglar Gulcehre
Aligning large language models with pointwise absolute rewards has so far required online, on-policy algorithms such as PPO and GRPO. In contrast, simpler methods that can leverage offline or off-policy data, such as DPO and REBEL, are limited to learning from preference pairs or relative signals. To bridge this gap, we introduce Quantile Reward Policy Optimization (QRPO), which learns from pointwise absolute rewards while preserving the simplicity and offline applicability of DPO-like methods. QRPO uses quantile rewards to enable regression to the closed-form solution of the KL-regularized RL objective. This reward yields an analytically tractable partition function, removing the need for relative signals to cancel this term. Moreover, QRPO scales with increased compute to estimate quantile rewards, opening a new dimension for pre-computation scaling. Empirically, QRPO consistently achieves top performance on chat and coding evaluations (reward model scores, AlpacaEval 2, and LeetCode) compared to DPO, REBEL, and SimPO across diverse datasets and 8B-scale models. Finally, we find that training with robust rewards instead of converting them to preferences induces less length bias.
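Why a quantile reward makes the partition function tractable can be seen in a short numerical sketch. It rests on two assumptions that are not spelled out in the abstract: the KL-regularized optimum has the standard form pi*(y|x) ∝ pi_ref(y|x) exp(R(x, y) / beta), and the quantile reward of a reference-policy sample is uniformly distributed on [0, 1]. Under those assumptions Z = beta * (exp(1 / beta) - 1). All function names here are hypothetical, and this is not the authors' implementation.

```python
import math
from bisect import bisect_right


def quantile_reward(reward, reference_rewards):
    """Empirical quantile: fraction of reference-policy rewards <= `reward`."""
    ordered = sorted(reference_rewards)
    return bisect_right(ordered, reward) / len(ordered)


def partition_function(beta):
    """Z = E_{q ~ U[0, 1]}[exp(q / beta)] = beta * (exp(1 / beta) - 1)."""
    return beta * (math.exp(1.0 / beta) - 1.0)


def regression_target(q, beta):
    """Value of beta * log(pi / pi_ref) implied by pi* ∝ pi_ref * exp(q / beta)."""
    return q - beta * math.log(partition_function(beta))


def numeric_partition_function(beta, steps=10000):
    """Midpoint-rule estimate of the same expectation, as a sanity check."""
    h = 1.0 / steps
    return sum(math.exp((i + 0.5) * h / beta) for i in range(steps)) * h


beta = 0.5
print(partition_function(beta), numeric_partition_function(beta))
print(quantile_reward(0.7, [0.1, 0.5, 0.9, 0.3]))  # 0.75
```

Because Z depends only on beta, a pointwise regression target exists for every sample, which is why no pairwise comparison is needed to cancel the partition term.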
nan
Article 1262
Title@2025-07-10 (4): KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors
Title: KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors | KinDEL: DNA-kodierter Bibliotheks-Datensatz für Kinase-Inhibitoren | KinDEL: Kinase Inhibitor 的DNA编码图书馆数据集 2410.08938v2 |
Authors (21): Benson Chen, Tomasz Danel, Gabriel H. S. Dreiman, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Kent Gorday, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts
DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.
nan
Article 1263
Title@2025-07-10 (4): Why is Your Language Model a Poor Implicit Reward Model?
Title: Why is Your Language Model a Poor Implicit Reward Model? | Warum ist Ihr Sprachmodell ein schlechtes Implizit-Reward-Modell? | 为什么您的语言模式 是一个贫穷的隐含奖赏模式? 2507.07981v1 |
Authors (4): Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora
Reward models are key to language model post-training and inference pipelines. Conveniently, recent work showed that every language model defines an implicit reward model (IM-RM), without requiring any architectural changes. However, such IM-RMs tend to generalize worse, especially out-of-distribution, compared to explicit reward models (EX-RMs) that apply a dedicated linear head over the hidden representations of a language model. The existence of a generalization gap is puzzling, as EX-RMs and IM-RMs are nearly identical. They can be trained using the same data, loss function, and language model, and differ only in how the reward is computed. Towards a fundamental understanding of the implicit biases underlying different reward model types, we investigate the root cause of this gap. Our main finding, backed by theory and experiments, is that IM-RMs rely more heavily on superficial token-level cues. Consequently, they often generalize worse than EX-RMs under token-level distribution shifts, as well as in-distribution. Furthermore, we provide evidence against alternative hypotheses for the generalization gap. Most notably, we challenge the intuitive claim that IM-RMs struggle in tasks where generation is harder than verification because they can operate both as a verifier and a generator. Taken together, our results highlight that seemingly minor design choices can substantially impact the generalization behavior of reward models.
nan
Article 1264
Title@2025-07-10 (4): Prospective Learning in Retrospect
Title: Prospective Learning in Retrospect | Zukunftsorientiertes Lernen im Nachhinein | 回溯中的未来学习 2507.07965v1 |
Authors (6): Yuxin Bai, Cecelia Shuai, Ashwin De Silva, Siyu Yu, Pratik Chaudhari, Joshua T. Vogelstein
In most real-world applications of artificial intelligence, the distributions of the data and the goals of the learners tend to change over time. The Probably Approximately Correct (PAC) learning framework, which underpins most machine learning algorithms, fails to account for dynamic data distributions and evolving objectives, often resulting in suboptimal performance. Prospective learning is a recently introduced mathematical framework that overcomes some of these limitations. We build on this framework to present preliminary results that improve the algorithm and numerical results, and extend prospective learning to sequential decision-making scenarios, specifically foraging. Code is available at: https://github.com/neurodata/prolearn2.
nan
Article 1265
Title@2025-07-10 (4): TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices
Title: TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices | TinierHAR: Auf dem Weg zu ultraleichten Deep-Learning-Modellen für effiziente menschliche Aktivitätserkennung auf Edge-Geräten | TinierHAR:迈向超轻量深深学习模型,以便有效识别人类在边缘装置方面的活动 2507.07949v1 |
Authors (5): Sizhen Bian, Mengxi Liu, Vitor Fortes Rey, Daniel Geissler, Paul Lukowicz
Human Activity Recognition (HAR) on resource-constrained wearable devices demands inference models that harmonize accuracy with computational efficiency. This paper introduces TinierHAR, an ultra-lightweight deep learning architecture that synergizes residual depthwise separable convolutions, gated recurrent units (GRUs), and temporal aggregation to achieve SOTA efficiency without compromising performance. Evaluated across 14 public HAR datasets, TinierHAR reduces parameters by 2.7x (vs. TinyHAR) and 43.3x (vs. DeepConvLSTM), and MACs by 6.4x and 58.6x, respectively, while maintaining the averaged F1-scores. Beyond quantitative gains, this work provides the first systematic ablation study dissecting the contributions of spatial-temporal components across the proposed TinierHAR, the prior SOTA TinyHAR, and the classical DeepConvLSTM, offering actionable insights for designing efficient HAR systems. Finally, we discuss the findings and suggest principled design guidelines for future efficient HAR. To catalyze edge-HAR research, we open-source all materials in this work for future benchmarking: https://github.com/zhaxidele/TinierHAR.
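One reason depthwise separable convolutions shrink a model is visible from a parameter count. The sketch below compares a standard 1D convolution with its depthwise separable counterpart, ignoring biases. The layer sizes are hypothetical and are not taken from the TinierHAR architecture; the reduction factors reported in the abstract concern the full model, not this single layer.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights in a standard 1D convolution (bias omitted)."""
    return kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv (one filter per input channel)
    followed by a 1x1 pointwise conv (bias omitted)."""
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
kernel, c_in, c_out = 5, 64, 128
standard = conv1d_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)

print(standard, separable, round(standard / separable, 2))  # 40960 8512 4.81
```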
nan
Article 1266
Title@2025-07-10 (4): BarcodeBERT: Transformers for Biodiversity Analysis
Title: BarcodeBERT: Transformers for Biodiversity Analysis | BarcodeBERT: Transformer für Biodiversitätsanalyse | 条码BERT:生物多样性分析变异器 2311.02401v3 |
Authors (12): Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Scott C. Lowe, Graham W. Taylor
In the global challenge of understanding and characterizing biodiversity, short species-specific genomic sequences known as DNA barcodes play a critical role, enabling fine-grained comparisons among organisms within the same kingdom of life. Although machine learning algorithms specifically designed for the analysis of DNA barcodes are becoming more popular, most existing methodologies rely on generic supervised training algorithms. We introduce BarcodeBERT, a family of models tailored to biodiversity analysis and trained exclusively on data from a reference library of 1.5M invertebrate DNA barcodes. We compared the performance of BarcodeBERT on taxonomic identification tasks against a spectrum of machine learning approaches including supervised training of classical neural architectures and fine-tuning of general DNA foundation models. Our self-supervised pretraining strategies on domain-specific data outperform fine-tuned foundation models, especially in identification tasks involving lower taxa such as genera and species. We also compared BarcodeBERT with BLAST, one of the most widely used bioinformatics tools for sequence searching, and found that our method matched BLAST’s performance in species-level classification while being 55 times faster. Our analysis of masking and tokenization strategies also provides practical guidance for building customized DNA language models, emphasizing the importance of aligning model training strategies with dataset characteristics and domain knowledge. The code repository is available at https://github.com/bioscan-ml/BarcodeBERT.
nan
Article 1267
Title@2025-07-10 (4): Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice
Title: Towards Continuous Home Cage Monitoring: An Evaluation of Tracking and Identification Strategies for Laboratory Mice | Towards Continuous Home Cage Monitoring: Eine Bewertung von Tracking- und Identifikationsstrategien für Labor-Mäuse | 逐步实现家用钥匙持续监测:对实验室老鼠跟踪和识别战略的评价 2507.07929v1 |
Authors (2): Juan Pablo Oberhauser, Daniel Grzenda
Continuous, automated monitoring of laboratory mice enables more accurate data collection and improves animal welfare through real-time insights. Researchers can achieve a more dynamic and clinically relevant characterization of disease progression and therapeutic effects by integrating behavioral and physiological monitoring in the home cage. However, providing individual mouse metrics is difficult because of their housing density, similar appearances, high mobility, and frequent interactions. To address these challenges, we develop a real-time identification (ID) algorithm that accurately assigns ID predictions to mice wearing custom ear tags in digital home cages monitored by cameras. Our pipeline consists of three parts: (1) a custom multiple object tracker (MouseTracks) that combines appearance and motion cues from mice; (2) a transformer-based ID classifier (Mouseformer); and (3) a tracklet associator linear program to assign final ID predictions to tracklets (MouseMap). Our models assign an animal ID based on custom ear tags at 30 frames per second with 24/7 cage coverage. We show that our custom tracking and ID pipeline improves tracking efficiency and lowers ID switches across mouse strains and various environmental factors compared to current mouse tracking methods.
nan
Article 1268
Title@2025-07-10 (4): A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search
Title: A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search | Eine Theorie der Schlussfolgerung Berechnung Scaling: Vernunft durch gerichtete stochastische Fähigkeiten Suche | 推断计算尺度理论论:通过定向斯托卡技能搜索推理 2507.00004v2 |
Authors (3): Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney
Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field’s recent progress, inference costs now represent a significant and growing component of the overall resource burden, particularly for reasoning-focused models. Existing characterizations of compute-optimality that consider model size, dataset size, and inference tokens in isolation or in fixed combinations risk overlooking more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learned skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies – including chain-of-thought (CoT) and tree-of-thought (ToT) – enabling comparative analysis as a function of task difficulty and model capability. To that end, we extend a prior first-principles tripartite graph framework of LLM training to incorporate inference, and separately bridge DS3 with empirical methods that characterize LLM scaling behavior. We theoretically recover empirically observed patterns, including: linear accuracy scaling with logarithmic compute; variation in preferred inference strategies as a function of task difficulty and model capability; emergent behavior elicited by reasoning even when performance plateaus under parameter scaling; and both best-of-N (BoN) and majority voting behavior captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, our framework deepens theoretical understanding and supports principled algorithmic design and resource allocation.
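For intuition about the best-of-N and majority-voting strategies named above, the textbook success probabilities under strong simplifying assumptions are easy to compute. The sketch assumes independent samples that are each correct with probability `p`, a perfect verifier for best-of-N, and a binary answer space with odd `n` for majority voting. These formulas are standard probability results, not the closed-form expressions derived in the paper.

```python
from math import comb


def best_of_n_success(p, n):
    """Probability that at least one of n independent samples is correct,
    assuming a perfect verifier picks a correct sample whenever one exists."""
    return 1.0 - (1.0 - p) ** n


def majority_vote_success(p, n):
    """Probability that strictly more than half of n independent samples
    are correct (binary answers, n odd)."""
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


p = 0.6
for n in (1, 3, 5):
    print(n, round(best_of_n_success(p, n), 4), round(majority_vote_success(p, n), 4))
```

Under these assumptions best-of-N improves with `n` for any `p > 0`, whereas majority voting only helps when `p > 0.5`, a simple instance of the preferred strategy depending on task difficulty and model capability.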
nan
Article 1269
Title@2025-07-10 (4): No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
Title: No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning | Keine $D_{\text{train}}$: Modell-agnostische Gegenfaktische Erklärungen mit Verstärkungslernen | 无 $D_{\text{train}}$:利用强化学习模型-不可允许的反事实解释 2405.18563v2 |
Authors (3): Xiangyu Sun, Raquel Aoki, Kevin H. Wilson
Machine learning (ML) methods have experienced significant growth in the past decade, yet their practical application in high-impact real-world domains has been hindered by their opacity. When ML methods are responsible for making critical decisions, stakeholders often require insights into how to alter these decisions. Counterfactual explanations (CFEs) have emerged as a solution, offering interpretations of opaque ML models and providing a pathway to transition from one decision to another. However, most existing CFE methods require access to the model’s training dataset, few methods can handle multivariate time-series, and no model-agnostic CFE method can handle multivariate time-series without training datasets. These limitations can be formidable in many scenarios. In this paper, we present NTD-CFE, a novel model-agnostic CFE method based on reinforcement learning (RL) that generates CFEs when training datasets are unavailable. NTD-CFE is suitable for both static and multivariate time-series datasets with continuous and discrete features. NTD-CFE reduces the CFE search space from a multivariate time-series domain to a lower dimensional space and addresses the problem using RL. Users have the flexibility to specify non-actionable, immutable, and preferred features, as well as causal constraints. We demonstrate the performance of NTD-CFE against four baselines on several datasets and find that, despite not having access to a training dataset, NTD-CFE finds CFEs that make significantly fewer and significantly smaller changes to the input time-series. These properties make CFEs more actionable, as the magnitude of change required to alter an outcome is vastly reduced. The code is available in the supplementary material.
nan
Article 1270
Title@2025-07-10 (4): Plausible Counterfactual Explanations of Recommendations
Title: Plausible Counterfactual Explanations of Recommendations | Plausible gegenfaktische Erklärungen der Empfehlungen | 对建议的反事实解释 2507.07919v1 |
Authors (4): Jakub Černý, Jiří Němeček, Ivan Dovica, Jakub Mareček
Explanations play a variety of roles in various recommender systems, from a legally mandated afterthought, through an integral element of user experience, to a key to persuasiveness. A natural and useful form of an explanation is the Counterfactual Explanation (CE). We present a method for generating highly plausible CEs in recommender systems and evaluate it both numerically and with a user study.
nan
Article 1271
Title@2025-07-10 (4): A statistical physics framework for optimal learning
Title: A statistical physics framework for optimal learning | Statistischer Physikrahmen für optimales Lernen | 促进最佳学习的统计物理框架 2507.07907v1 |
Authors (2): Francesca Mignacco, Francesco Mori
Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further hindered by the high dimensionality of the learning space, often resulting in predominantly heuristic, difficult to interpret, and computationally demanding solutions. Here, we combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in prototypical neural network models. In the high-dimensional limit, we derive closed-form ordinary differential equations that track online stochastic gradient descent through low-dimensional order parameters. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters with the goal of minimizing the generalization error at the end of training. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets. We apply it to representative cases, including optimal curricula, adaptive dropout regularization and noise schedules in denoising autoencoders. We find nontrivial yet interpretable strategies highlighting how optimal protocols mediate crucial learning tradeoffs, such as maximizing alignment with informative input directions while minimizing noise fitting. Finally, we show how to apply our framework to real datasets. Our results establish a principled foundation for understanding and designing optimal learning protocols and suggest a path toward a theory of meta-learning grounded in statistical physics.
nan
Article 1272
Title@2025-07-10 (4): Agentic Retrieval of Topics and Insights from Earnings Calls
Title: Agentic Retrieval of Topics and Insights from Earnings Calls | Agentische Retrieval von Themen und Erkenntnisse aus Earnings Calls | 收入呼吁的主题和透视的 Agent 检索 2507.07906v1 |
Authors (3): Anant Gupta, Rajarshi Bhowmik, Geoffrey Gunow
Tracking the strategic focus of companies through topics in their earnings calls is a key task in financial analysis. However, as industries evolve, traditional topic modeling techniques struggle to dynamically capture emerging topics and their relationships. In this work, we propose an LLM-agent driven approach to discover and retrieve emerging topics from quarterly earnings calls. The agent extracts topics from documents, structures them into a hierarchical ontology, and establishes relationships between new and existing topics through that ontology. We demonstrate the use of extracted topics to infer company-level insights and emerging trends over time. We evaluate our approach by measuring ontology coherence, topic evolution accuracy, and its ability to surface emerging financial trends.
nan
Article 1273
Title@2025-07-10 (4): Enhancing Cross Entropy with a Linearly Adaptive Loss Function for Optimized Classification Performance
Title: Enhancing Cross Entropy with a Linearly Adaptive Loss Function for Optimized Classification Performance | Verbesserung der Kreuzentropie mit einer linearen adaptiven Verlustfunktion für optimierte Klassifizierungsleistung | 优化分类绩效的线性适应性损失函数 2507.10574v1 |
Authors (1): Jae Wan Shim
We propose the Linearly Adaptive Cross Entropy Loss function, a novel measure derived from information theory. Compared to the standard cross entropy loss function, the proposed loss has an additional term that depends on the predicted probability of the true class. This feature serves to enhance the optimization process in classification tasks involving one-hot encoded class labels. The proposed loss has been evaluated on a ResNet-based model using the CIFAR-100 dataset. Preliminary results show that it consistently outperforms the standard cross entropy loss function in terms of classification accuracy. Moreover, it maintains simplicity, achieving practically the same efficiency as the traditional cross entropy loss. These findings suggest that our approach could broaden the scope for future research into loss function design.
nan
Article 1274
Title@2025-07-10 (4): Efficient Causal Discovery for Autoregressive Time Series
Title: Efficient Causal Discovery for Autoregressive Time Series | Effiziente Causal Discovery für autoregressive Zeitreihen | 自动递减时间序列高效因果发现 2507.07898v1 |
Authors (2): Mohammad Fesanghary, Achintya Gopal
In this study, we present a novel constraint-based algorithm for causal structure learning specifically designed for nonlinear autoregressive time series. Our algorithm significantly reduces computational complexity compared to existing methods, making it more efficient and scalable to larger problems. We rigorously evaluate its performance on synthetic datasets, demonstrating that our algorithm not only outperforms current techniques, but also excels in scenarios with limited data availability. These results highlight its potential for practical applications in fields requiring efficient and accurate causal inference from nonlinear time series data.
nan
Article 1275
Title@2025-07-10 (4): Sampling Imbalanced Data with Multi-objective Bilevel Optimization
Title: Sampling Imbalanced Data with Multi-objective Bilevel Optimization | Probenahme ausgewogener Daten mit multi-objektiver Bilevel-Optimierung | 具有多目标双一级最佳优化的数据 2506.11315v2 |
Authors (3): Karen Medlin, Sven Leyffer, Krishnan Raghavan
Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naive resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric, the $\epsilon/\delta$ non-overlapping diversification metric, that quantifies the goodness of a sampling method towards model performance. With this metric we experimentally demonstrate state-of-the-art performance with improvement in diversity driving a 1-15% increase in F1 scores.
nan
Article 1276
Title@2025-07-10 (4): Masked Image Modeling: A Survey
Title: Masked Image Modeling: A Survey | Maskenbildmodellierung: Eine Umfrage | 蒙面图像建模:调查 2408.06687v3 |
Authors (5): Vlad Hondru, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, Nicu Sebe
In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent representations, and training a model, usually an autoencoder, to predict the missing information by using the context available in the visible part of the input. We identify and formalize two categories of approaches on how to implement MIM as a pretext task, one based on reconstruction and one based on contrastive learning. Then, we construct a taxonomy and review the most prominent papers in recent years. We complement the manually constructed taxonomy with a dendrogram obtained by applying a hierarchical clustering algorithm. We further identify relevant clusters by manually inspecting the resulting dendrogram. Our review also includes datasets that are commonly used in MIM research. We aggregate the performance results of various masked image modeling methods on the most popular datasets, to facilitate the comparison of competing methods. Finally, we identify research gaps and propose several interesting directions of future work. We supplement our survey with the following public repository containing organized references: https://github.com/vladhondru25/MIM-Survey.
nan
Article 1277
Title@2025-07-10 (4): A Bilevel Optimization Framework for Imbalanced Data Classification
Title: A Bilevel Optimization Framework for Imbalanced Data Classification | Ein Bilevel-Optimierungsrahmen für die unausgewogene Datenklassifikation | 平衡数据分类双级优化框架 2410.11171v3 |
Authors (3): Karen Medlin, Sven Leyffer, Krishnan Raghavan
Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new undersampling approach that: (i) avoids the pitfalls of noise and overlap caused by synthetic data and (ii) avoids the pitfall of under-fitting caused by random undersampling. Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss. Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it. In so doing, our approach rejects majority datapoints redundant to datapoints already accepted and, thereby, finds an optimal subset of majority training data for classification. The accept/reject component of our algorithm is motivated by a bilevel optimization problem uniquely formulated to identify the optimal training set we seek. Experimental results show that our proposed technique achieves F1 scores up to 10% higher than state-of-the-art methods.
nan
Article 1278
Title@2025-07-10 (4): UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
Title: UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs | UnIT: Skalierbare unstrukturierte Schlussfolgerungs-Zeit-Rechnung für MAC-effiziente Neuralinferenz auf MCUs | UnIT:MAC 高效神经引力对多边协调单位的可缩放无结构的推推力-时间节制 2507.07885v1 |
Authors (6): Ashe Neth, Sawinder kaur, Mohammad Nur Hossain Khan, Subrata Biswas, Asif Salekin, Bashima Islam
Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across multiple connections and applying layer- and group-specific pruning sensitivity. We present three fast, hardware-friendly division approximations tailored to the capabilities of common embedded platforms. Demonstrated on the MSP430 microcontroller, UnIT achieves 11.02% to 82.03% MAC reduction, 27.30% to 84.19% faster inference, and 27.33% to 84.38% lower energy consumption compared to training-time pruned models, while maintaining accuracy within 0.48-7%. Under domain shift, UnIT matches or exceeds the accuracy of retrained models while requiring significantly fewer MACs. These results establish unstructured inference-time pruning as a viable and practical solution for efficient, retraining-free deployment of deep neural networks on MCUs.
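The idea of replacing multiplications with threshold checks can be illustrated with a minimal sketch. It assumes a simple pruning rule that is not spelled out in the abstract: a MAC is skipped when its product magnitude |w * x| would fall below a tolerance `tau`, which is equivalent to comparing |w| against `tau / |x|`. The function name `pruned_matvec`, the parameter `tau`, and the use of an exact division (the paper describes approximated divisions) are all illustrative assumptions, not the authors' implementation.

```python
def pruned_matvec(weight_rows, inputs, tau):
    """Matrix-vector product that skips MACs with |w * x| < tau.

    weight_rows[j][i] is the (hypothetical) weight from input i to output j.
    Returns the outputs and the number of skipped MAC operations.
    """
    n_out = len(weight_rows)
    outputs = [0.0] * n_out
    skipped = 0
    for i, x in enumerate(inputs):
        if x == 0.0:
            # A zero activation contributes nothing to any output.
            skipped += n_out
            continue
        # One division per input activation, reused for every connection
        # leaving that input (exact here; an approximation could be swapped in).
        limit = tau / abs(x)
        for j in range(n_out):
            w = weight_rows[j][i]
            if abs(w) < limit:
                # Comparison replaces the multiply-accumulate.
                skipped += 1
                continue
            outputs[j] += w * x
    return outputs, skipped


weights = [[0.5, 0.01, -2.0],
           [0.001, 1.0, 0.3]]
activations = [1.0, 0.1, 0.0]
outputs, skipped = pruned_matvec(weights, activations, tau=0.05)
```

In this toy example four of the six MACs are skipped, and each output differs from the dense result by less than `tau` times the number of inputs, since every skipped product is smaller than `tau` in magnitude.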
nan
Article 1279
Title@2025-07-10 (4): Can AI-predicted complexes teach machine learning to compute drug binding affinity?
Title: Can AI-predicted complexes teach machine learning to compute drug binding affinity? | Können KI-vorhergesehene Komplexe maschinelles Lernen beibringen, um Arzneimittelbindungsaffinität zu berechnen? | 人工智能预测综合体能教机器学习如何计算药物绑定的亲缘关系吗? 2507.07882v1 |
Authors (5): Wei-Tse Hsu, Savva Grevtsev, Thomas Douglas, Aniket Magarkar, Philip C. Biggin
We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of augmented data. In light of this, we established simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.
nan
Article 1280
Title@2025-07-10 (4): What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Title: What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models | Was hat ein Stiftungsmodell gefunden? Mit induktiven Bias zur Untersuchung von Weltmodellen | ” 基金会模式 “ 有何发现? 2507.06952v2 |
Authors (4): Keyon Vafa, Peter G. Chang, Ashesh Rambachan, Sendhil Mullainathan
Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler’s predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model’s inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.
nan
Article 1281
Title@2025-07-10 (4): Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
Title: Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models | Edge-ASR: Auf dem Weg zur Low-Bit Quantisierung von automatischen Spracherkennungsmodellen | 边缘-ASR:实现自动语音识别模式的低比量量化 2507.07877v1 |
Authors (7): Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang
Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on resource-constrained edge devices (e.g., IoT devices, wearables) still presents substantial challenges due to strict limits on memory, compute and power. Quantization, particularly Post-Training Quantization (PTQ), offers an effective way to reduce model size and inference cost without retraining. Despite its importance, the performance implications of various advanced quantization methods and bit-width configurations on ASR models remain unclear. In this work, we present a comprehensive benchmark of eight state-of-the-art (SOTA) PTQ methods applied to two leading edge-ASR model families, Whisper and Moonshine. We systematically evaluate model performance (i.e., accuracy, memory I/O and bit operations) across seven diverse datasets from the open ASR leaderboard, analyzing the impact of quantization and various configurations on both weights and activations. Built on an extension of the LLM compression toolkit, our framework integrates edge-ASR models, diverse advanced quantization algorithms, a unified calibration and evaluation data pipeline, and detailed analysis tools. Our results characterize the trade-offs between efficiency and accuracy, demonstrating that even 3-bit quantization can succeed on high-capacity models when using advanced PTQ techniques. These findings provide valuable insights for optimizing ASR models on low-power, always-on edge devices.
nan
Article 1282
Title@2025-07-10 (4): Fair Uncertainty Quantification for Depression Prediction
Title: Fair Uncertainty Quantification for Depression Prediction | Faire Unsicherheit Quantifizierung für Depression Vorhersage | 预测萧条预测的公平不确定性量化 2505.04931v2 |
Authors (2): Yonghong Li, Xiuzhuang Zhou
Trustworthy depression prediction based on deep learning, incorporating both predictive reliability and algorithmic fairness across diverse demographic groups, is crucial for clinical application. Recently, achieving reliable depression predictions through uncertainty quantification has attracted increasing attention. However, few studies have focused on the fairness of uncertainty quantification (UQ) in depression prediction. In this work, we investigate the algorithmic fairness of UQ, namely Equal Opportunity Coverage (EOC) fairness, and propose Fair Uncertainty Quantification (FUQ) for depression prediction. FUQ pursues reliable and fair depression predictions through group-based analysis. Specifically, we first group all the participants by different sensitive attributes and leverage conformal prediction to quantify uncertainty within each demographic group, which provides a theoretically guaranteed and valid way to quantify uncertainty for depression prediction and facilitates the investigation of fairness across different demographic groups. Furthermore, we propose a fairness-aware optimization strategy that formulates fairness as a constrained optimization problem under EOC constraints. This enables the model to preserve predictive reliability while adapting to the heterogeneous uncertainty levels across demographic groups, thereby achieving optimal fairness. Through extensive evaluations on several visual and audio depression datasets, our approach demonstrates its effectiveness.
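The group-based conformal step described above can be sketched as follows. This is only a minimal illustration of split conformal prediction calibrated separately per demographic group; it does not include the fairness-aware optimization under EOC constraints. The absolute-residual nonconformity score, the function names, and the toy numbers are assumptions made for the example.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def groupwise_thresholds(cal_scores, cal_groups, alpha):
    """Calibrate one threshold per group from (score, group) calibration pairs."""
    by_group = {}
    for score, group in zip(cal_scores, cal_groups):
        by_group.setdefault(group, []).append(score)
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}


def prediction_interval(point_pred, group, thresholds):
    """Symmetric interval around a point prediction using the group's threshold."""
    q = thresholds[group]
    return point_pred - q, point_pred + q


# Hypothetical calibration scores |y - y_hat| for two groups, A and B.
cal_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30]
cal_groups = ["A"] * 9 + ["B"] * 3
thresholds = groupwise_thresholds(cal_scores, cal_groups, alpha=0.5)
interval_b = prediction_interval(100, "B", thresholds)
```

Calibrating per group lets the interval width adapt to each group's own uncertainty level, which is what makes coverage comparable across groups in the first place.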
nan
Article 1283
Title@2025-07-10 (4): Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle
Title: Improving AEBS Validation Through Objective Intervention Classification Leveraging the Prediction Divergence Principle | Verbesserung der AEBS-Validierung durch Ziel-Interventions-Klassifikation Begünstigung des Prinzips der Prognoseabweichung | 通过利用预测差异原则的客观干预分类,改进对AEBS的验证 2507.07872v1 |
Authors (2): Daniel Betschinske, Steven Peters
The safety validation of automatic emergency braking systems (AEBS) requires accurately distinguishing between false positive (FP) and true positive (TP) system activations. While simulations allow straightforward differentiation by comparing scenarios with and without interventions, analyzing activations from open-loop resimulations, such as those from field operational testing (FOT), is more complex. This complexity arises from scenario parameter uncertainty and the influence of driver interventions in the recorded data. Human labeling is frequently used to address these challenges, relying on subjective assessments of intervention necessity or situational criticality, potentially introducing biases and limitations. This work proposes a rule-based classification approach leveraging the Prediction Divergence Principle (PDP) to address those issues. Applied to a simplified AEBS, the proposed method reveals key strengths, limitations, and system requirements for effective implementation. The findings suggest that combining this approach with human labeling may enhance the transparency and consistency of classification, thereby improving the overall validation process. While the rule set for classification derived in this work adopts a conservative approach, the paper outlines future directions for refinement and broader applicability. Finally, this work highlights the potential of such methods to complement existing practices, paving the way for more reliable and reproducible AEBS validation frameworks.
nan
Article 1284
Title@2025-07-10 (4): Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking
Title: Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking | Abmildernde Wasserzeichen-Stealing-Angriffe in generativen Modellen über Multi-Key-Wasserzeichen | 通过多钥匙划水标记,在产生模型时通过多钥匙划水标记减轻盗用盗用水标志袭击 2507.07871v1 |
Authors (8): Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas
Watermarking offers a promising solution for GenAI providers to establish the provenance of their generated content. A watermark is a hidden signal embedded in the generated content, whose presence can later be verified using a secret watermarking key. One threat to GenAI providers is watermark stealing attacks, where users forge a watermark into content that was not generated by the provider's models without access to the secret key, e.g., to falsely accuse the provider. Stealing attacks collect harmless watermarked samples from the provider's model and aim to maximize the expected success rate of generating harmful watermarked samples. Our work focuses on mitigating stealing attacks while treating the underlying watermark as a black-box. Our contributions are: (i) we propose a multi-key extension to mitigate stealing attacks that can be applied post-hoc to any watermarking method across any modality; (ii) we provide theoretical guarantees and demonstrate empirically that our method makes forging substantially less effective across multiple datasets; and (iii) we formally define the threat of watermark forging as the task of generating harmful, watermarked content and model this threat via security games.
nan
Article 1285
Title@2025-07-10 (4): Parametric Scaling Law of Tuning Bias in Conformal Prediction
Title: Parametric Scaling Law of Tuning Bias in Conformal Prediction | Parametrisches Skalierungsgesetz des Tuning Bias in konformer Vorhersage | 非正规预测中计票比价的参数衡量法 2502.03023v2 |
Authors (4): Hao Zeng, Kangdao Liu, Bingyi Jing, Hongxin Wei
Conformal prediction is a popular framework of uncertainty quantification that constructs prediction sets with coverage guarantees. To uphold the exchangeability assumption, many conformal prediction methods necessitate an additional holdout set for parameter tuning. Yet, the impact of violating this principle on coverage remains underexplored, making it ambiguous in practical applications. In this work, we empirically find that the tuning bias, i.e., the coverage gap introduced by leveraging the same dataset for tuning and calibration, is negligible for simple parameter tuning in many conformal prediction methods. In particular, we observe the scaling law of the tuning bias: this bias increases with parameter space complexity and decreases with calibration set size. Formally, we establish a theoretical framework to quantify the tuning bias and provide rigorous proof for the scaling law of the tuning bias by deriving its upper bound. In the end, we discuss how to reduce the tuning bias, guided by the theories we developed.
nan
Article 1286
Title@2025-07-10 (4): Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
Title: Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders | Re-Bottleneck: Latent Re-Structuring für Neural Audio Autoencoder | 重新装瓶器:神经音频自动自动编码器前端重新结构 2507.07867v1 |
Authors (3): Dimitrios Bralios, Jonah Casebeer, Paris Smaragdis
Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a “Re-Bottleneck”, an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework’s effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we introduce equivariance, ensuring that a filtering operation on the input waveform directly corresponds to a specific transformation in the latent space. Ultimately, our Re-Bottleneck framework offers a flexible and efficient way to tailor representations of neural audio models, enabling them to seamlessly meet the varied demands of different applications with minimal additional training.
nan
Article 1287
Title@2025-07-10 (4): Predicting and generating antibiotics against future pathogens with ApexOracle
Title: Predicting and generating antibiotics against future pathogens with ApexOracle | Vorhersage und Generierung von Antibiotika gegen zukünftige Krankheitserreger mit ApexOracle | 预测并产生抗生素,用ApexOracle来防治未来的病原体 2507.07862v1 |
Authors (4): Tianang Leng, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez
Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an artificial intelligence (AI) model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context through the integration of molecular features captured via a foundational discrete diffusion language model and a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of “new-to-nature” molecules with high predicted efficacy against priority threats. By pairing rapid activity prediction with targeted molecular generation, ApexOracle offers a scalable strategy for countering AMR and preparing for future infectious-disease outbreaks.
nan
Article 1288
Title@2025-07-10 (4): Studying and Improving Graph Neural Network-based Motif Estimation
Title: Studying and Improving Graph Neural Network-based Motif Estimation | Untersuchung und Verbesserung der graphischen Neuralnetz-basierten Motivationsschätzung | 研究和改善图形神经网络基于Motif 估计 2506.15709v3 |
Authors (3): Pedro C. Vieira, Miguel E. P. Silva, Pedro Manuel Pinto Ribeiro
Graph Neural Networks (GNNs) are a predominant method for graph representation learning. However, beyond subgraph frequency estimation, their application to network motif significance-profile (SP) prediction remains under-explored, with no established benchmarks in the literature. We propose to address this problem, framing SP estimation as a task independent of subgraph frequency estimation. Our approach shifts from frequency counting to direct SP estimation and modulates the problem as multitarget regression. The reformulation is optimised for interpretability, stability and scalability on large graphs. We validate our method using a large synthetic dataset and further test it on real-world graphs. Our experiments reveal that 1-WL limited models struggle to make precise estimations of SPs. However, they can generalise to approximate the graph generation processes of networks by comparing their predicted SP with the ones originating from synthetic generators. This first study on GNN-based motif estimation also hints at how using direct SP estimation can help go past the theoretical limitations that motif estimation faces when performed through subgraph counting.
nan
Article 1289
Title@2025-07-10 (4): Principled Foundations for Preference Optimization
Title: Principled Foundations for Preference Optimization | Prinzipierte Grundlagen für die Preference-Optimierung | 最优化原则基金会 2507.07855v1 |
Authors (7): Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Hard
In this paper, we show that direct preference optimization (DPO) is a very specific form of a connection between two major theories in the ML context of learning from preferences: loss functions (Savage) and stochastic choice (Doignon-Falmagne and Machina). The connection is established for all of Savage's losses and at this level of generality, (i) it includes support for abstention on the choice theory side, (ii) it includes support for non-convex objectives on the ML side, and (iii) it allows framing, for free, some notable extensions of the DPO setting, including margins and corrections for length. Understanding how DPO operates from a general principled perspective is crucial because of the huge and diverse application landscape of models, because of the current momentum around DPO, but also, and importantly, because many state-of-the-art variations on DPO definitely occupy a small region of the map that we cover. It also helps to understand the pitfalls of departing from this map, and to figure out workarounds.
nan
Article 1290
Title@2025-07-10 (4): Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain
Title: Credit Risk Analysis for SMEs Using Graph Neural Networks in Supply Chain | Kreditrisikoanalyse für KMU mit Hilfe von Graph Neural Networks in der Lieferkette | 利用供应链中图表神经网络的中小企业信贷风险分析 2507.07854v1 |
Authors (5): Zizhou Zhang, Qinyan Shen, Zhuohuan Hu, Qianying Liu, Huijie Shen
Small and Medium-sized Enterprises (SMEs) are vital to the modern economy, yet their credit risk analysis often struggles with scarce data, especially for online lenders lacking direct credit records. This paper introduces a Graph Neural Network (GNN)-based framework, leveraging SME interactions from transaction and social data to map spatial dependencies and predict loan default risks. Tests on real-world datasets from Discover and Ant Credit (23.4M nodes for supply chain analysis, 8.6M for default prediction) show the GNN surpasses traditional and other GNN baselines, with AUCs of 0.995 and 0.701 for supply chain mining and default prediction, respectively. It also helps regulators model supply chain disruption impacts on banks, accurately forecasting loan defaults from material shortages, and offers Federal Reserve stress testers key data for CCAR risk buffers. This approach provides a scalable, effective tool for assessing SME credit risk.
nan
Article 1291
Title@2025-07-10 (4): Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
Title: Optimization Guarantees for Square-Root Natural-Gradient Variational Inference | Optimierungsgarantien für Square-Root Natural-Gradient Variational Inferenz | 平方-极极自然-梯度变动性推断的最佳保障 2507.07853v1 |
Authors (4): Navish Kumar, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Aurelien Lucchi
Variational inference with natural-gradient descent often shows fast convergence in practice, but its theoretical convergence guarantees have been challenging to establish. This is true even for the simplest cases that involve concave log-likelihoods and use a Gaussian approximation. We show that the challenge can be circumvented for such cases using a square-root parameterization for the Gaussian covariance. This approach establishes novel convergence guarantees for natural-gradient variational-Gaussian inference and its continuous-time gradient flow. Our experiments demonstrate the effectiveness of natural gradient methods and highlight their advantages over algorithms that use Euclidean or Wasserstein geometries.
nan
Article 1292
Title@2025-07-10 (4): Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective
Title: Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective | Pre-Trained AI Model Assisted Online Entscheidungsfindung unter fehlenden Kovariaten: Eine theoretische Perspektive | 在失踪的共变之下协助作出在线决策的模式:理论视角 2507.07852v1 |
Authors (2): Haichen Hu, David Simchi-Levi
We study a sequential contextual decision-making problem in which certain covariates are missing but can be imputed using a pre-trained AI model. From a theoretical perspective, we analyze how the presence of such a model influences the regret of the decision-making process. We introduce a novel notion called “model elasticity”, which quantifies the sensitivity of the reward function to the discrepancy between the true covariate and its imputed counterpart. This concept provides a unified way to characterize the regret incurred due to model imputation, regardless of the underlying missingness mechanism. More surprisingly, we show that under the missing at random (MAR) setting, it is possible to sequentially calibrate the pre-trained model using tools from orthogonal statistical learning and doubly robust regression. This calibration significantly improves the quality of the imputed covariates, leading to much better regret guarantees. Our analysis highlights the practical value of having an accurate pre-trained model in sequential decision-making tasks and suggests that model elasticity may serve as a fundamental metric for understanding and improving the integration of pre-trained models in a wide range of data-driven decision-making problems.
nan
Article 1293
Title@2025-07-10 (4): Revisiting the Predictability of Performative, Social Events
Title: Revisiting the Predictability of Performative, Social Events | Über die Vorhersagbarkeit von performativen, gesellschaftlichen Veranstaltungen | 重新审视表演性、社会活动的可预测性 2503.11713v2 |
Authors (1): Juan C. Perdomo
Social predictions do not passively describe the future; they actively shape it. They inform actions and change individual expectations in ways that influence the likelihood of the predicted outcome. Given these dynamics, to what extent can social events be predicted? This question was discussed throughout the 20th century by authors like Merton, Morgenstern, Simon, and others who considered it a central issue in social science methodology. In this work, we provide a modern answer to this old problem. Using recent ideas from performative prediction and outcome indistinguishability, we establish that one can always efficiently predict social events accurately, regardless of how predictions influence data. While achievable, we also show that these predictions are often undesirable, highlighting the limitations of previous desiderata. We end with a discussion of various avenues forward.
nan
Article 1294
Title@2025-07-10 (4): “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents
Title: “So, Tell Me About Your Policy…”: Distillation of interpretable policies from Deep Reinforcement Learning agents | “So, erzählen Sie mir über Ihre Politik…”: Destillation von interpretierbaren Richtlinien von Deep Reinforcement Learning Agents | “告诉我你们的政策……:从深强化学习机构那里提炼可解释的政策”。 2507.07848v1 |
Authors (3): Giovanni Dispoto, Paolo Bonetti, Marcello Restelli
Recent advances in Reinforcement Learning (RL) largely benefit from the inclusion of Deep Neural Networks, boosting the number of novel approaches proposed in the field of Deep Reinforcement Learning (DRL). These techniques demonstrate the ability to tackle complex games such as Atari, Go, and other real-world applications, including financial trading. Nevertheless, a significant challenge emerges from the lack of interpretability, particularly when attempting to comprehend the underlying patterns learned, the relative importance of the state features, and how they are integrated to generate the policy’s output. For this reason, in mission-critical and real-world settings, it is often preferred to deploy a simpler and more interpretable algorithm, although at the cost of performance. In this paper, we propose a novel algorithm, supported by theoretical guarantees, that can extract an interpretable policy (e.g., a linear policy) without disregarding the peculiarities of expert behavior. This result is obtained by considering the advantage function, which includes information about why an action is superior to the others. In contrast to previous works, our approach enables the training of an interpretable policy using previously collected experience. The proposed algorithm is empirically evaluated on classic control environments and on a financial trading scenario, demonstrating its ability to extract meaningful information from complex expert policies.
nan
Article 1295
Title@2025-07-10 (4): Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
Title: Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities | Response Wide Shut? Überraschende Beobachtungen in grundlegenden Vision Sprachmodell Fähigkeiten | 在基本愿景语言模型能力中的令人惊讶的观察 2507.10442v1 |
Authors (5): Shivam Chandhok, Wan-Cyuan Fan, Vered Shwartz, Vineeth N Balasubramanian, Leonid Sigal
Vision-language Models (VLMs) have emerged as general-purpose tools for addressing a variety of complex computer vision problems. Such models have been shown to be highly capable, but, at the same time, lacking some basic visual understanding skills. In this paper, we set out to understand the limitations of SoTA VLMs on fundamental visual tasks by constructing a series of tests that probe which components of design, specifically, may be lacking. Importantly, we go significantly beyond the current benchmarks, which simply measure the final performance of VLM response, by also comparing and contrasting it to the performance of probes trained directly on features obtained from the visual encoder, intermediate vision-language projection and LLM-decoder output. In doing so, we uncover shortcomings in VLMs and make a number of important observations about their capabilities, robustness and how they process visual information. We hope our insights will guide progress in further improving VLMs.
nan
Article 1296
Title@2025-07-10 (4): Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components
Title: Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components | Bewertung der Einhaltung der Hierarchischen Sicherheitsgrundsätze durch LLM-Agenten: Ein leichter Maßstab für die Erprobung grundlegender Steuerungskomponenten | 遵守等级安全原则:基础控制组成部分检验的轻量基准 2506.02357v2 |
Authors (1): Ram Potham
Credible safety plans for advanced AI development require methods to verify agent behavior and detect potential control deficiencies early. A fundamental aspect is ensuring agents adhere to safety-critical principles, especially when these conflict with operational goals. This paper introduces a lightweight, interpretable benchmark to evaluate an LLM agent’s ability to uphold a high-level safety principle when faced with conflicting task instructions. Our evaluation of six LLMs reveals two primary findings: (1) a quantifiable “cost of compliance” where safety constraints degrade task performance even when compliant solutions exist, and (2) an “illusion of compliance” where high adherence often masks task incompetence rather than principled choice. These findings provide initial evidence that while LLMs can be influenced by hierarchical directives, current approaches lack the consistency required for reliable safety governance.
nan
Article 1297
Title@2025-07-10 (4): Unsupervised Morphological Tree Tokenizer
Title: Unsupervised Morphological Tree Tokenizer | Unüberwachter morphologischer Baum Tokenizer | 不受监督的病理树化器 2406.15245v2 |
Authors (5): Qingyang Zhu, Xiang Hu, Pengyu Ji, Wei Wu, Kewei Tu
As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named $\textit{MorphOverriding}$ to ensure the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks. Code is available at https://github.com/martianmartina/TreeTokenizer.
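The top-down vocabulary matching step can be illustrated with a minimal sketch. It assumes the induced structure is a binary tree over characters, written here as nested tuples, and that a subtree is emitted as one token as soon as its surface string is in the vocabulary. The tree, the vocabulary, and the function name are hypothetical and are not taken from the TreeTokenizer code.

```python
def tokenize_top_down(tree, vocab):
    """Emit the largest subtrees whose surface string is in `vocab`.

    `tree` is either a single character (leaf) or a tuple of subtrees.
    Leaves that match nothing are emitted as single-character tokens.
    """
    def text(node):
        # Surface string covered by a node.
        return node if isinstance(node, str) else "".join(text(c) for c in node)

    span = text(tree)
    if span in vocab or isinstance(tree, str):
        return [span]
    tokens = []
    for child in tree:
        tokens.extend(tokenize_top_down(child, vocab))
    return tokens


# Hypothetical induced tree for "unkindly" and a toy vocabulary.
tree = (("u", "n"), ((("k", "i"), ("n", "d")), ("l", "y")))
vocab = {"un", "kind", "ly"}
print(tokenize_top_down(tree, vocab))  # ['un', 'kind', 'ly']
```

Because matching follows the tree, a token boundary can only fall on a constituent boundary, which is the property the abstract contrasts with BPE and WordPiece.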
nan
Article 1298
Title@2025-07-10 (4): Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model
Title: Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model | Statistische Physik-Analyse von Graphen-Neuronalen-Netzwerken: Annäherung an die Optimität im kontextuellen stochastischen Blockmodell | 图形神经网络的统计物理学分析:在背景随机区块模型中接近最佳性 2503.01361v2 |
Authors (2): O. Duranthon, L. Zdeborová
Graph neural networks (GNNs) are designed to process data associated with graphs. They are finding an increasing range of applications; however, as with other modern machine learning techniques, their theoretical understanding is limited. GNNs can encounter difficulties in gathering information from nodes that are far apart by iterated aggregation steps. This situation is partly caused by so-called oversmoothing; and overcoming it is one of the practically motivated challenges. We consider the situation where information is aggregated by multiple steps of convolution, leading to graph convolutional networks (GCNs). We analyze the generalization performance of a basic GCN, trained for node classification on data generated by the contextual stochastic block model. We predict its asymptotic performance by deriving the free energy of the problem, using the replica method, in the high-dimensional limit. Calling depth the number of convolutional steps, we show the importance of going to large depth to approach the Bayes-optimality. We detail how the architecture of the GCN has to scale with the depth to avoid oversmoothing. The resulting large depth limit can be close to the Bayes-optimality and leads to a continuous GCN. Technically, we tackle this continuous limit via an approach that resembles dynamical mean-field theory (DMFT) with constraints at the initial and final times. An expansion around large regularization allows us to solve the corresponding equations for the performance of the deep GCN. This promising tool may contribute to the analysis of further deep neural networks.
nan
Article 1299
Title@2025-07-10 (4): Towards Benchmarking Foundation Models for Tabular Data With Text
Title: Towards Benchmarking Foundation Models for Tabular Data With Text | Auf dem Weg zu Benchmarking-Grundlagenmodellen für tabellarische Daten mit Text | 建立文字表格数据基准基准基础模型 2507.07829v1 |
Authors (5): Martin Mráz, Breenda Das, Anshul Gupta, Lennart Purucker, Frank Hutter
Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we benchmark how state-of-the-art tabular foundation models can handle textual data by manually curating a collection of real-world tabular datasets with meaningful textual features. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.
nan
Article 1300
Title@2025-07-10 (4): An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications
Title: An Empirical Bernstein Inequality for Dependent Data in Hilbert Spaces and Applications | Eine empirische Bernsteinungleichheit für abhängige Daten in Hilbert-Räumen und Anwendungen | 希尔伯特空间和应用中依赖数据方面的不平等问题 2507.07826v1 |
Authors (4): Erfan Mirzaei, Andreas Maurer, Vladimir R. Kostic, Massimiliano Pontil
Learning from non-independent and non-identically distributed data poses a persistent challenge in statistical learning. In this study, we introduce data-dependent Bernstein inequalities tailored for vector-valued processes in Hilbert space. Our inequalities apply to both stationary and non-stationary processes and exploit the potential rapid decay of correlations between temporally separated variables to improve estimation. We demonstrate the utility of these bounds by applying them to covariance operator estimation in the Hilbert-Schmidt norm and to operator learning in dynamical systems, achieving novel risk bounds. Finally, we perform numerical experiments to illustrate the practical implications of these bounds in both contexts.
nan
Article 1301
Title@2025-07-10 (4): Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution
Title: Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution | Symmetrie entdecken Breaking in Physical Systems mit entspannter Gruppenkonvolution | 发现物理系统中的对称断裂与放松的集团革命 2310.02299v8 |
Authors (5): Rui Wang, Elyssa Hofgard, Han Gao, Robin Walters, Tess E. Smidt
Modeling symmetry breaking is essential for understanding the fundamental changes in the behaviors and properties of physical systems, from microscopic particle interactions to macroscopic phenomena like fluid dynamics and cosmic structures. Thus, identifying sources of asymmetry is an important tool for understanding physical systems. In this paper, we focus on learning asymmetries of data using relaxed group convolutions. We provide both theoretical and empirical evidence that this flexible convolution technique allows the model to maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in various physical systems. We employ various relaxed group convolution architectures to uncover various symmetry-breaking factors that are interpretable and physically meaningful in different physical systems, including the phase transition of crystal structure, the isotropy and homogeneity breaking in turbulent flow, and the time-reversal symmetry breaking in pendulum systems.
nan
Article 1302
Title@2025-07-10 (4): MAEBE: Multi-Agent Emergent Behavior Framework
Title: MAEBE: Multi-Agent Emergent Behavior Framework | MAEBE: Multi-Agent Emergent Behavior Framework | 多边代理新兴行为框架 2506.03053v2 |
Authors (4): Sinem Erisken, Timothy Gothard, Martin Leitgab, Ram Potham
Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.
nan
Article 1303
Title@2025-07-10 (4): An Algorithm for Learning Smaller Representations of Models With Scarce Data
Title: An Algorithm for Learning Smaller Representations of Models With Scarce Data | Ein Algorithmus für das Lernen kleinerer Darstellungen von Modellen mit knappen Daten | 学习缺乏数据模型较小比例模型的计算方法 2010.07990v2 |
Authors (1): Adrian de Wynter
We present an algorithm for solving binary classification problems when the dataset is not fully representative of the problem being solved, and obtaining more data is not possible. It relies on a trained model with loose accuracy constraints, an iterative hyperparameter searching-and-pruning procedure over a search space $\Theta$, and a data-generating function. Our algorithm works by reconstructing up to homology the manifold on which lies the support of the underlying distribution. We provide an analysis on correctness and runtime complexity under ideal conditions and an extension to deep neural networks. In the former case, if $|\Theta|$ is the number of hyperparameter sets in the search space, this algorithm returns a solution that is up to $2(1 - 2^{-|\Theta|})$ times better than simply training with an enumeration of $\Theta$ and picking the best model. As part of our analysis we also prove that an open cover of a dataset has the same homology as the manifold on which lies the support of the underlying probability distribution, if and only if said dataset is learnable. This latter result acts as a formal argument to explain the effectiveness of data expansion techniques.
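To get a feel for the improvement factor quoted in the abstract, the short sketch below evaluates 2(1 - 2^(-n)) for a few search-space sizes n. The function name is illustrative only. The abstract gives the formula, not these sample values.

```python
def improvement_factor(num_hyperparameter_sets):
    """Upper bound 2 * (1 - 2**(-n)) on the improvement over plain enumeration."""
    return 2.0 * (1.0 - 2.0 ** (-num_hyperparameter_sets))


for n in (1, 2, 3, 10):
    print(n, improvement_factor(n))
```

The factor equals 1 for a single hyperparameter set and approaches 2 as the search space grows, so the claimed gain is at most a factor of two.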
nan
Article 1304
Title@2025-07-10 (4): AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
Title: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift | KI sollte besser fühlen, nicht nur größer skalieren: Adaptive Sensing als Paradigmenverschiebung | AI 应当更好,而不仅仅是规模更大:将适应性遥感作为范式转变 2507.07820v1 |
Authors (6): Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim
Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision), we advocate for adaptive sensing as a necessary and foundational shift. Adaptive sensing proactively modulates sensor parameters (e.g., exposure, sensitivity, multimodal configurations) at the input level, significantly mitigating covariate shifts and improving efficiency. Empirical evidence from recent studies demonstrates that adaptive sensing enables small models (e.g., EfficientNet-B0) to surpass substantially larger models (e.g., OpenCLIP-H) trained with significantly more data and compute. We (i) outline a roadmap for broadly integrating adaptive sensing into real-world applications spanning humanoid, healthcare, autonomous systems, agriculture, and environmental monitoring, (ii) critically assess technical and ethical integration challenges, and (iii) propose targeted research directions, such as standardized benchmarks, real-time adaptive algorithms, multimodal integration, and privacy-preserving methods. Collectively, these efforts aim to transition the AI community toward sustainable, robust, and equitable artificial intelligence systems.
nan
Article 1305
Title@2025-07-10 (4): MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving
Title: MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving | MoSE: Skill-by-Skill-Mixture-of-Expert-Lernen für autonomes Fahren | MOSE: 自主驾驶专家技能与技能混合学习 2507.07818v1 |
Authors (10): Lu Xu, Jiaqian Yu, Xiongfeng Peng, Yiwei Chen, Weiming Li, Jaewook Yoo, Sunghyun Chunag, Dongwook Lee, Daehyun Ji, Chao Zhang
Recent studies show that large language models (LLMs) and vision language models (VLMs) trained using web-scale data can empower end-to-end autonomous driving systems with better generalization and interpretation. Specifically, by dynamically routing inputs to specialized subsets of parameters, the Mixture-of-Experts (MoE) technique enables general LLMs or VLMs to achieve substantial performance improvements while maintaining computational efficiency. However, general MoE models usually demand extensive training data and complex optimization. In this work, inspired by the learning process of human drivers, we propose a skill-oriented MoE, called MoSE, which mimics human drivers' learning process and reasoning process, skill-by-skill and step-by-step. We propose a skill-oriented routing mechanism that begins with defining and annotating specific skills, enabling experts to identify the necessary driving competencies for various scenarios and reasoning tasks, thereby facilitating skill-by-skill learning. To further align the driving process with multi-step planning in human reasoning and end-to-end driving models, we build a hierarchical skill dataset and pretrain the router to encourage the model to think step-by-step. Unlike multi-round dialogs, MoSE integrates valuable auxiliary tasks (e.g., description, reasoning, planning) in a single forward process without introducing any extra computational cost. With fewer than 3B sparsely activated parameters, our model outperforms several models with 8B+ parameters on the CODA AD corner case reasoning task. Compared to existing methods based on open-source models and data, our approach achieves state-of-the-art performance with significantly reduced activated model size (by at least $62.5\%$) in a single-turn conversation.
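The $62.5\%$ figure is consistent with comparing fewer than 3B activated parameters against an 8B baseline. That reading is an assumption, since the abstract does not spell out the computation:

$$1 - \frac{3\,\mathrm{B}}{8\,\mathrm{B}} = 1 - 0.375 = 0.625 = 62.5\%.$$

Since MoSE activates fewer than 3B parameters and the baselines have at least 8B, the reduction under this reading is at least $62.5\%$.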
nan
Article 1306
Title@2025-07-10 (4): Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers
Title: Pay Attention to Attention Distribution: A New Local Lipschitz Bound for Transformers | Achten Sie auf Aufmerksamkeit Verteilung: Eine neue lokale Lipschitz Bound für Transformatoren | ” 注意注意分发 “ : “ 变革者新地方利普施奇茨圆环 “ 。 2507.07814v1 |
Authors (4): Nikolay Yudin, Alexander Gaponov, Sergei Kudriashov, Maxim Rakhuba
We present a novel local Lipschitz bound for self-attention blocks of transformers. This bound is based on a refined closed-form expression for the spectral norm of the softmax function. The resulting bound is not only more accurate than in the prior art, but also unveils the dependence of the Lipschitz constant on attention score maps. Based on the new findings, we suggest an explanation of the way distributions inside the attention map affect the robustness from the Lipschitz constant perspective. We also introduce a new lightweight regularization term called JaSMin (Jacobian Softmax norm Minimization), which boosts the transformer’s robustness and decreases local Lipschitz constants of the whole network.
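For intuition about why the attention distribution matters, the sketch below computes the spectral norm of the standard softmax Jacobian, diag(p) - p p^T, for a uniform and a peaked distribution. It illustrates the quantity the abstract refers to. It is not the paper's closed-form expression or the JaSMin regularizer, and the helper names are hypothetical. Power iteration is used to stay within the Python standard library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(p):
    """Jacobian of softmax at output p: diag(p) - p p^T."""
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def spectral_norm(matrix, iterations=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(matrix)
    # Deterministic start vector that is not parallel to the all-ones null vector.
    v = [(-1.0) ** i * (i + 1) for i in range(n)]
    norm = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
        norm = norm_w
    return norm


uniform = softmax([0.0, 0.0, 0.0])
peaked = softmax([10.0, 0.0, 0.0])
print(spectral_norm(softmax_jacobian(uniform)))
print(spectral_norm(softmax_jacobian(peaked)))
```

A sharply peaked attention row yields a much smaller Jacobian norm than a flat one, which is one way to see why a Lipschitz bound can depend on the attention score map.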
nan
Article 1307
Title@2025-07-10 (4): “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Title: “I am bad”: Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models | “I am bad”: Verdolmetschen von Stealthy, Universal und Robust Audio Jailbreaks in Audio-Language-Modellen | “我是坏人”:在音频语言模型中解释隐形、通用和强势音频牢房破损 2502.00718v2 |
Authors (3): Isha Gupta, David Khachaturov, Robert Mullins
The rise of multimodal large language models has introduced innovative human-machine interaction paradigms but also significant challenges in machine learning safety. Audio-Language Models (ALMs) are especially relevant due to the intuitive nature of spoken communication, yet little is known about their failure modes. This paper explores audio jailbreaks targeting ALMs, focusing on their ability to bypass alignment mechanisms. We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples, demonstrating the first universal jailbreaks in the audio modality, and show that these remain effective in simulated real-world conditions. Beyond demonstrating attack feasibility, we analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech - suggesting that the most effective perturbations for eliciting toxic outputs specifically embed linguistic features within the audio signal. These results have important implications for understanding the interactions between different modalities in multimodal models, and offer actionable insights for enhancing defenses against adversarial audio attacks.
nan
Article 1308
Title@2025-07-10 (4): Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks
Title: Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks | Tiefe Überlebensanalyse in multimodalen medizinischen Daten: Ein parametrischer und probabilistischer Ansatz mit kompetitiven Risiken | 多模式医疗数据深度生存分析:与相竞风险的参数和概率分析方法 2507.07804v1 |
Authors (5): Alba Garrido, Alejandro Almodóvar, Patricia A. Apellániz, Juan Parras, Santiago Zazo
Accurate survival prediction is critical in oncology for prognosis and treatment planning. Traditional approaches often rely on a single data modality, limiting their ability to capture the complexity of tumor biology. To address this challenge, we introduce a multimodal deep learning framework for survival analysis capable of modeling both single and competing risks scenarios, evaluating the impact of integrating multiple medical data sources on survival predictions. We propose SAMVAE (Survival Analysis Multimodal Variational Autoencoder), a novel deep learning architecture designed for survival prediction that integrates six data modalities: clinical variables, four molecular profiles, and histopathological images. SAMVAE leverages modality-specific encoders to project inputs into a shared latent space, enabling robust survival prediction while preserving modality-specific information. Its parametric formulation enables the derivation of clinically meaningful statistics from the output distributions, providing patient-specific insights through interactive multimedia that contribute to more informed clinical decision-making and establish a foundation for interpretable, data-driven survival analysis in oncology. We evaluate SAMVAE on two cancer cohorts, breast cancer and lower-grade glioma, applying tailored preprocessing, dimensionality reduction, and hyperparameter optimization. The results demonstrate the successful integration of multimodal data for both standard survival analysis and competing risks scenarios across different datasets. Our model achieves competitive performance compared to state-of-the-art multimodal survival models. Notably, this is the first parametric multimodal deep learning architecture to incorporate competing risks while modeling continuous time to a specific event, using both tabular and image data.
nan
Article 1309
Title@2025-07-10 (4): Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning
Title: Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning | Kontextuelle Banditen in der Zahlungsabwicklung: Nicht einheitliche Exploration und überwachtes Lernen | 付款处理:非统一探索和监督学习 2412.00569v2 |
Authors (2): Akhila Vangara, Alex Egg
Uniform random exploration in decision-making systems supports off-policy learning via supervision but incurs high regret, making it impractical for many applications. Conversely, non-uniform exploration offers better immediate performance but lacks support for off-policy learning. Recent research suggests that regression oracles can bridge this gap by combining non-uniform exploration with supervised learning. In this paper, we analyze these approaches within a real-world industrial context at Adyen, a large global payments processor characterized by batch-logged delayed feedback, short-term memory, and dynamic action spaces under the Empirical Risk Minimization (ERM) framework. Our analysis reveals that while regression oracles significantly improve performance, they introduce challenges due to rigid algorithmic assumptions. Specifically, we observe that as a policy improves, subsequent generations may perform worse due to shifts in the reward distribution and increased class imbalance in the training data. This degradation occurs despite improvements in other aspects of the training data, leading to decreased performance in successive policy iterations. We further explore the long-term impact of regression oracles, identifying a potential “oscillation effect.” This effect arises when regression oracles influence probability estimates and the realizability of subsequent policy models, leading to fluctuations in performance across iterations. Our findings highlight the need for more adaptable algorithms that can leverage the benefits of regression oracles without introducing instability in policy performance over time.
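One well-known way to combine a regression oracle with non-uniform exploration is inverse gap weighting, as used in SquareCB-style algorithms. The sketch below shows that scheme for illustration only. The abstract does not state which exploration rule is used at Adyen, and the function name, reward values, and `gamma` setting are assumptions.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Turn per-action reward predictions into exploration probabilities.

    Each non-greedy action gets probability 1 / (K + gamma * gap), where gap
    is its predicted shortfall to the greedy action. The greedy action
    receives the remaining probability mass.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    probs[best] = 1.0 - sum(probs)
    return probs


# Hypothetical predictions from a regression oracle for three actions.
probs = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0)
print(probs)
```

Larger `gamma` concentrates probability on the greedy action. Every action keeps a known, non-zero logging probability, which is what makes later supervised or off-policy training on the logged data possible.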
nan
Article 1310
Title@2025-07-10 (4): Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models
Title: Space-Filling Regularization for Robust and Interpretable Nonlinear State Space Models | Raumfüllende Regularisierung für robuste und interpretierbare nichtlineare State Space Modelle | 强力和可解释的非线性国家空间模型的空间巡空常规化 2507.07792v1 |
Authors (3): Hermann Klein, Max Heinz Herkersdorf, Oliver Nelles
The state space dynamics representation is the most general approach for nonlinear systems and often chosen for system identification. During training, the state trajectory can deform significantly leading to poor data coverage of the state space. This can cause significant issues for space-oriented training algorithms which e.g. rely on grid structures, tree partitioning, or similar. Besides hindering training, significant state trajectory deformations also deteriorate interpretability and robustness properties. This paper proposes a new type of space-filling regularization that ensures a favorable data distribution in state space via introducing a data-distribution-based penalty. This method is demonstrated in local model network architectures where good interpretability is a major concern. The proposed approach integrates ideas from modeling and design of experiments for state space structures. This is why we present two regularization techniques for the data point distributions of the state trajectories for local affine state space models. Beyond that, we demonstrate the results on a widely known system identification benchmark.
nan
Article 1311
Title@2025-07-10 (4): Understanding Chain-of-Thought in LLMs through Information Theory
Title: Understanding Chain-of-Thought in LLMs through Information Theory | Verständnis der in LLMs durch Informationstheorie gesuchten Gedankenkette | 通过信息理论在LLM 中探索了解链 2411.11984v2 |
Authors (3): Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu
Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through the use of Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks. However, existing CoT evaluation techniques either require annotated CoT data or fall short in accurately assessing intermediate reasoning steps, leading to high rates of false positives. In this paper, we formalize CoT reasoning in LLMs through an information-theoretic lens. Specifically, our framework quantifies the “information gain” at each reasoning step, enabling the identification of failure modes in LLMs without the need for expensive annotated datasets. We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K and PRM800k datasets, where it significantly outperforms existing outcome-based methods by providing more accurate insights into model performance on individual subtasks.
nan
Article 1312
Title@2025-07-10 (4): Unsupervised Automata Learning via Discrete Optimization
Title: Unsupervised Automata Learning via Discrete Optimization | Unüberwachtes Automata-Lernen über Diskrete Optimierung | 通过 Discrete 优化化学习不受监督的自动自动数据 2303.14111v2 |
Authors (8): Simon Lutz, Daniil Kaminskyi, Florian Wittbold, Simon Dierl, Falk Howar, Barbara König, Emmanuel Müller, Daniel Neider
Automata learning is a successful tool for many application domains such as robotics and automatic verification. Typically, automata learning techniques operate in a supervised learning setting (active or passive) where they learn a finite state machine in contexts where additional information, such as labeled system executions, is available. However, other settings, such as learning from unlabeled data - an important aspect in machine learning - remain unexplored. To overcome this limitation, we propose a framework for learning a deterministic finite automaton (DFA) from a given multi-set of unlabeled words. We show that this problem is computationally hard and develop three learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate practical feasibility in the context of unsupervised anomaly detection.
nan
Article 1313
Title@2025-07-10 (4): Learning Algorithms in the Limit
Title: Learning Algorithms in the Limit | Algorithmen lernen an der Grenze | 在限制范围内学习算法 2506.15543v2 |
Authors (2): Hristo Papazov, Nicolas Flammarion
This paper studies the problem of learning computable functions in the limit by extending Gold’s inductive inference framework to incorporate computational observations and restricted input sources. Complementary to the traditional Input-Output Observations, we introduce Time-Bound Observations and Policy-Trajectory Observations to study the learnability of general recursive functions under more realistic constraints. While input-output observations do not suffice for learning the class of general recursive functions in the limit, we overcome this learning barrier by imposing computational complexity constraints or supplementing with approximate time-bound observations. Further, we build a formal framework around observations of computational agents and show that learning computable functions from policy trajectories reduces to learning rational functions from input and output, thereby revealing interesting connections to finite-state transducer inference. On the negative side, we show that computable or polynomial-mass characteristic sets cannot exist for the class of linear-time computable functions even for policy-trajectory observations.
nan
Article 1314
Title@2025-07-10 (4): Approximation Depth of Convex Polytopes
Title: Approximation Depth of Convex Polytopes | Näherungstiefe von Konvex-Polytopen | 电解多面的近似深度 2507.07779v1 |
Authors (3): Egor Bakaev, Florestan Brunck, Amir Yehudayoff
We study approximations of polytopes in the standard model for computing polytopes using Minkowski sums and (convex hulls of) unions. Specifically, we study the ability to approximate a target polytope by polytopes of a given depth. Our main results imply that simplices can only be “trivially approximated”. On the way, we obtain a characterization of simplices as the only “outer additive” convex bodies.
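The two operations of this computational model have a simple description through support functions: the support function of a Minkowski sum is the sum of the support functions, and that of the convex hull of a union is their maximum. The sketch below checks both identities on small 2D examples. It assumes each polytope is given by a finite generating point set, with the convex hull taken implicitly, and all names are illustrative.

```python
def support(points, direction):
    """Support function h_P(u) = max over generating points of <v, u>."""
    return max(x * direction[0] + y * direction[1] for x, y in points)


def minkowski_sum(p, q):
    """Generating set of P + Q: all pairwise sums of generating points."""
    return [(px + qx, py + qy) for px, py in p for qx, qy in q]


def hull_of_union(p, q):
    """Generating set of conv(P ∪ Q): simply pool the generating points."""
    return list(p) + list(q)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
segment = [(0, 0), (2, 1)]
u = (1, 2)

print(support(minkowski_sum(square, segment), u))  # 7 = 3 + 4
print(support(hull_of_union(square, segment), u))  # 4 = max(3, 4)
```

Under this view, the depth of a polytope corresponds to how many alternating layers of sums and maxima are needed to express its support function.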
nan
Article 1315
Title@2025-07-10 (4): Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Title: Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training | Aufgabenverhalten synchronisieren: Mehrere Aufgaben während der Test-Time-Schulung ausrichten | 同步任务行为: 测试时训练中对齐多个任务 2507.07778v1 |
Authors (4): Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon
Generalizing neural networks to unseen target domains is a significant challenge in real-world deployments. Test-time training (TTT) addresses this by using an auxiliary self-supervised task to reduce the domain gap caused by distribution shifts between the source and target. However, we find that when models are required to perform multiple tasks under domain shifts, conventional TTT methods suffer from unsynchronized task behavior, where the adaptation steps needed for optimal performance in one task may not align with the requirements of other tasks. To address this, we propose a novel TTT approach called Synchronizing Tasks for Test-time Training (S4T), which enables the concurrent handling of multiple tasks. The core idea behind S4T is that predicting task relations across domain shifts is key to synchronizing tasks during test time. To validate our approach, we apply S4T to conventional multi-task benchmarks, integrating it with traditional TTT protocols. Our empirical results show that S4T outperforms state-of-the-art TTT methods across various benchmarks.
nan
Article 1316
Title@2025-07-10 (4): Deep Learning is Not So Mysterious or Different
Title: Deep Learning is Not So Mysterious or Different | Deep Learning ist nicht so geheimnisvoll oder anders | 深深学习不是那么神秘或不同 2503.02113v2 |
Authors (1): Andrew Gordon Wilson
Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized, using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning, phenomena such as mode connectivity, and its relative universality.
Article 1317
Title@2025-07-10 (4): A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision
Title: A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision | Ein einheitliches empirisches Risikominimierungs-Framework für flexible N-Tuples Schwache Überwachung | 灵活N-Tuples弱监督统一经验风险最小化框架 2507.07771v1 |
Authors (4): Shuying Huang, Junpeng Li, Changchun Hua, Yana Yang
To alleviate the annotation burden in supervised learning, N-tuples learning has recently emerged as a powerful weakly-supervised method. While existing N-tuples learning approaches extend pairwise learning to higher-order comparisons and accommodate various real-world scenarios, they often rely on task-specific designs and lack a unified theoretical foundation. In this paper, we propose a general N-tuples learning framework based on empirical risk minimization, which systematically integrates pointwise unlabeled data to enhance learning performance. This paper first unifies the data generation processes of N-tuples and pointwise unlabeled data under a shared probabilistic formulation. Based on this unified view, we derive an unbiased empirical risk estimator that generalizes a broad class of existing N-tuples models. We further establish a generalization error bound for theoretical support. To demonstrate the flexibility of the framework, we instantiate it in four representative weakly supervised scenarios, each recoverable as a special case of our general model. Additionally, to address overfitting issues arising from negative risk terms, we adopt correction functions to adjust the empirical risk. Extensive experiments on benchmark datasets validate the effectiveness of the proposed framework and demonstrate that leveraging pointwise unlabeled data consistently improves generalization across various N-tuples learning tasks.
Article 1318
Title@2025-07-10 (4): BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning
Title: BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning | BEAVER: Bauen von Umgebungen mit einschätzbarer Variation zur Bewertung von multi-objektiven Verstärkungslernen | BEAVER: 在环境建设中采用可评估的变数评估多目标强化学习 2507.07769v1 |
Authors (3): Ruohong Liu, Jack Umenberger, Yize Chen
Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.
Article 1319
Title@2025-07-10 (4): TRIX- Trading Adversarial Fairness via Mixed Adversarial Training
Title: TRIX- Trading Adversarial Fairness via Mixed Adversarial Training | TRIX- Trading-Adversarial Fairness durch gemischte Adversarial Training | TRIX-通过混合反向培训进行贸易反向公平 2507.07768v1 |
Authors (3): Tejaswini Medi, Steffen Jung, Margret Keuper
Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well-distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared features (weak classes) remain disproportionately susceptible to adversarial attacks. We observe that strong classes do not require strong adversaries during training, as their non-robust features are quickly suppressed. In contrast, weak classes benefit from stronger adversaries to effectively reduce their vulnerabilities. Motivated by this, we introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes, promoting feature diversity via uniformly sampled targets, and stronger untargeted adversaries to weak classes, enhancing their focused robustness. TRIX further incorporates per-class loss weighting and perturbation strength adjustments, building on prior work, to emphasize weak classes during optimization. Comprehensive experiments on standard image classification benchmarks, including evaluations under strong attacks such as PGD and AutoAttack, demonstrate that TRIX significantly improves worst-case class accuracy on both clean and adversarial data, reduces inter-class robustness disparities, and preserves overall accuracy. Our results highlight TRIX as a practical step toward fair and effective adversarial defense.
Article 1320
Title@2025-07-10 (4): Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape
Title: Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape | Verteilte und dezentralisierte Ausbildung: Technische Governance-Herausforderungen in einer sich verändernden KI-Landschaft | 分散和分散化培训:AI 横向变化中的技术治理挑战 2507.07765v1 |
Authors (3): Jakub Kryś, Yashvardhan Sharma, Janet Egan
Advances in low-communication training algorithms are enabling a shift from centralised model training to compute setups that are either distributed across multiple clusters or decentralised via community-driven contributions. This paper distinguishes these two scenarios - distributed and decentralised training - which are little understood and often conflated in policy discourse. We discuss how they could impact technical AI governance through an increased risk of compute structuring, capability proliferation, and the erosion of detectability and shutdownability. While these trends foreshadow a possible new paradigm that could challenge key assumptions of compute governance, we emphasise that certain policy levers, like export controls, remain relevant. We also acknowledge potential benefits of decentralised AI, including privacy-preserving training runs that could unlock access to more data, and mitigating harmful power concentration. Our goal is to support more precise policymaking around compute, capability proliferation, and decentralised AI development.
Article 1321
Title@2025-07-10 (4): OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting
Title: OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting | OPC: One-Point-Contraction Unlearning Toward Deep Feature Vergessen | OPC: 一点-合同拆开学习深地地貌的遗忘 2507.07754v1 |
Authors (4): Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee
Machine unlearning seeks to remove the influence of particular data or classes from trained models to meet privacy, legal, or ethical requirements. Existing unlearning methods tend to forget shallowly: the unlearned model pretends to forget by adjusting only its response, while its internal representations retain enough information to restore the forgotten data or behavior. We empirically confirm this widespread shallowness by reversing the forgetting effect of various unlearning methods via a training-free performance recovery attack and a gradient-inversion-based data reconstruction attack. To address this vulnerability fundamentally, we define a theoretical criterion of “deep forgetting” based on one-point contraction of the feature representations of the data to forget. We also propose an efficient approximation algorithm, and use it to construct a novel general-purpose unlearning algorithm: One-Point-Contraction (OPC). Empirical evaluations on image classification unlearning benchmarks show that OPC achieves not only effective unlearning performance but also superior resilience against both the performance recovery attack and the gradient-inversion attack. The distinctive unlearning performance of OPC arises from the deep feature forgetting enforced by its theoretical foundation, and underscores the need for improved robustness of machine unlearning methods.
Article 1322
Title@2025-07-10 (4): Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks
Title: Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks | Effiziente und skalierbare Abschätzung der Verteilungseffekte mit multi-Task Neuronalen Netzwerken | 与多任务神经神经网络一道高效和可缩放地估算分布式治疗效应 2507.07738v1 |
Authors (5): Tomu Hirata, Undral Byambadalai, Tatsushi Oka, Shota Yasui, Shingo Uto
We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While DTE provides more granular insight into experiment outcomes than conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate its practical effectiveness, we apply the method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.
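As a point of reference for the quantity being estimated, the sketch below computes the plain empirical DTE: the difference between the treated and control empirical CDFs at a grid of thresholds. It is not the multi-task network from the abstract; the function names and toy outcomes are illustrative assumptions.

```python
def empirical_cdf(samples, threshold):
    """Fraction of samples less than or equal to the threshold."""
    return sum(1 for y in samples if y <= threshold) / len(samples)


def empirical_dte(treated, control, thresholds):
    """DTE at each threshold: F_treated(y) - F_control(y)."""
    return [empirical_cdf(treated, y) - empirical_cdf(control, y) for y in thresholds]


# Toy outcomes (hypothetical): treatment shifts outcomes downward.
control = [4, 5, 6, 7, 8]
treated = [2, 3, 5, 6, 7]
thresholds = [3, 5, 7]
dte = empirical_dte(treated, control, thresholds)
```

Each threshold defines one binary label, `1 if y <= threshold else 0`, which is presumably what makes a multi-threshold, multi-task formulation natural: one output head per threshold instead of one regression problem per threshold.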
Article 1323
Title@2025-07-10 (4): GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing
Title: GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing | GuardVal: Dynamic Large Language Model Jailbreak Evaluation für umfassende Sicherheitstests | 警卫:综合安全测试动态大语言示范监狱防爆评价 2507.07735v1 |
Authors (4): Peiyan Zhang, Haibo Jin, Liying Kang, Haohan Wang
Jailbreak attacks reveal critical vulnerabilities in Large Language Models (LLMs) by causing them to generate harmful or unethical content. Evaluating these threats is particularly challenging due to the evolving nature of LLMs and the sophistication required in effectively probing their vulnerabilities. Current benchmarks and evaluation methods struggle to fully address these challenges, leaving gaps in the assessment of LLM vulnerabilities. In this paper, we review existing jailbreak evaluation practices and identify three assumed desiderata for an effective jailbreak evaluation protocol. To address these challenges, we introduce GuardVal, a new evaluation protocol that dynamically generates and refines jailbreak prompts based on the defender LLM’s state, providing a more accurate assessment of defender LLMs’ capacity to handle safety-critical situations. Moreover, we propose a new optimization method that prevents stagnation during prompt refinement, ensuring the generation of increasingly effective jailbreak prompts that expose deeper weaknesses in the defender LLMs. We apply this protocol to a diverse set of models, from Mistral-7b to GPT-4, across 10 safety domains. Our findings highlight distinct behavioral patterns among the models, offering a comprehensive view of their robustness. Furthermore, our evaluation process deepens the understanding of LLM behavior, leading to insights that can inform future research and drive the development of more secure models.
Article 1324
Title@2025-07-10 (4): Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model
Title: Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model | Robuste, federführende, personalisierte mittlere Schätzung für das Gaussian Mixture Model | Gaussian Mixture 模型的联邦硬性个人化平均平均估计值 2504.19955v2 |
Authors (3): Malhar A. Managoli, Vinod M. Prabhakaran, Suhas Diggavi
Federated learning with heterogeneous data and personalization has received significant recent attention. Separately, robustness to corrupted data in the context of federated learning has also been studied. In this paper we explore combining personalization for heterogeneous data with robustness, where a constant fraction of the clients are corrupted. Motivated by this broad problem, we formulate a simple instantiation which captures some of its difficulty. We focus on the specific problem of personalized mean estimation where the data is drawn from a Gaussian mixture model. We give an algorithm whose error depends almost linearly on the ratio of corrupted to uncorrupted samples, and show a lower bound with the same behavior, albeit with a gap of a constant factor.
Article 1325
Title@2025-07-10 (4): Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization
Title: Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization | Stabile Preference-Optimierung für LLMs: Ein zweistufiger Ansatz über die direkte Preference-Optimierung hinaus | 对LLLMM公司的稳定优惠优化:超越直接优惠优化的双级办法 2507.07723v1 |
Authors (4): Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye
Direct Preference Optimization (DPO) has emerged as a popular and efficient alternative to reward modeling and reinforcement learning for aligning language models with human preferences. Despite its empirical success, the theoretical properties and intrinsic limitations of DPO remain underexplored. In this work, we first present a comprehensive analysis of DPO’s dynamics from a probability evolution perspective. Our analysis reveals that DPO is highly sensitive to initialization. It also tends to misallocate probability mass, which can inadvertently shift probability toward irrelevant or undesired responses. This misallocation may unintentionally reinforce model bias, thereby compromising both the stability of model alignment and the consistency with intended preferences. Motivated by these theoretical findings, we propose a theoretically grounded bilevel optimization framework that tightly integrates supervised fine-tuning with an enhanced DPO objective, which we call stable preference optimization. Our approach introduces a principled regularization scheme to explicitly encourage absolute probability improvement for preferred outputs, while maintaining stable optimization dynamics. Experiments on challenging reasoning and summarization benchmarks show that our method consistently improves reasoning accuracy and better aligns output distributions with intended preferences, outperforming standard DPO. Stable preference optimization provides new insights into the design of preference-based alignment objectives and opens up new avenues towards more reliable and interpretable language model alignment.
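The probability misallocation described above can be illustrated with the standard per-pair DPO loss, which depends only on the relative log-probability margin between the preferred and rejected responses. The sketch below is the widely used DPO objective, not the bilevel method from the abstract; the numeric log-probabilities and `beta=0.1` are made-up assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # -log(sigmoid(z)) == log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


# Reference policy assigns the same log-probability to both responses.
ref_w, ref_l = -10.0, -10.0

# At initialization the policy equals the reference.
loss_start = dpo_loss(-10.0, -10.0, ref_w, ref_l)

# Both responses become LESS likely, but the rejected one falls faster.
loss_after = dpo_loss(-12.0, -20.0, ref_w, ref_l)
```

Here `loss_after` is lower than `loss_start` even though the preferred response lost probability, which is why a regularizer that rewards absolute probability improvement for preferred outputs is a plausible remedy.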
Article 1326
Title@2025-07-10 (4): Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means
Title: Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means | Robuste Verteilte Schätzung: Erweiterung von Gossip-Algorithmen auf Rangfolge und Trimmmittel | 强有力的分布分布式估算:将Gossip的数值扩大至排名和缩略语 2505.17836v6 |
Authors (3): Anna Van Elst, Igor Colin, Stephan Clémençon
This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for situations where communication is constrained. A fundamental challenge in existing mean-based gossip algorithms is their vulnerability to malicious or corrupted nodes. In this paper, we show that an outlier-robust mean can be computed by globally estimating a robust statistic. More specifically, we propose a novel gossip algorithm for rank estimation, referred to as GoRank, and leverage it to design a gossip procedure dedicated to trimmed mean estimation, coined GoTrim. In addition to a detailed description of the proposed methods, a key contribution of our work is a precise convergence analysis: we establish an $\mathcal{O}(1/t)$ rate for rank estimation and an $\mathcal{O}(1/t)$ rate for trimmed mean estimation, where $t$ denotes the number of iterations. Moreover, we provide a breakdown point analysis of GoTrim. We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.
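The link between ranks and a trimmed mean can be sketched in a few lines. The code below is a centralized stand-in, assuming every value is visible at once; the gossip procedures in the abstract estimate the same quantities through neighbor-to-neighbor exchanges, which is not reproduced here. Function names and data are hypothetical, and ties are not handled.

```python
def ranks(values):
    """Rank of each value: 1 + number of strictly smaller values (ties ignored)."""
    return [1 + sum(1 for other in values if other < v) for v in values]


def trimmed_mean(values, alpha):
    """Mean of the values left after dropping the lowest and highest alpha fraction by rank."""
    n = len(values)
    k = int(alpha * n)  # number of points trimmed on each side
    r = ranks(values)
    kept = [v for v, rank in zip(values, r) if k < rank <= n - k]
    return sum(kept) / len(kept)


# Nine honest observations near 10 and one corrupted node reporting 1000.
data = [9.0, 10.0, 11.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 1000.0]
plain = sum(data) / len(data)
robust = trimmed_mean(data, alpha=0.1)
```

The plain mean is dragged far from 10 by the single corrupted value, while the rank-based trimmed mean stays close to the honest observations.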
Article 1327
Title@2025-07-10 (4): Discrete Optimal Transport and Voice Conversion
Title: Discrete Optimal Transport and Voice Conversion | Diskreter Optimaler Transport und Sprachumwandlung | 分辨最佳传输和语音转换 2505.04382v2 |
Authors (2): Anton Selitskiy, Maitreya Kocharekar
In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can lead to the incorrect classification of synthetic audio as real.
Article 1328
Title@2025-07-10 (4): Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots
Title: Adaptive Gaussian Mixture Models-based Anomaly Detection for under-constrained Cable-Driven Parallel Robots | Adaptive Gaussian Mixture Models-basierte Anomalieerkennung für unterbeschränkte kabelgetriebene Parallelroboter | 用于控制不足的有线驱动平行机器人的适应性高斯混合混合模型异常探测 2507.07714v1 |
Authors (6): Julio Garrido, Javier Vales, Diego Silva-Muñiz, Enrique Riveiro, Pablo López-Matencio, Josué Rivera-Andrade
Cable-Driven Parallel Robots (CDPRs) are increasingly used for load manipulation tasks involving predefined toolpaths with intermediate stops. At each stop, where the platform maintains a fixed pose and the motors keep the cables under tension, the system must evaluate whether it is safe to proceed by detecting anomalies that could compromise performance (e.g., wind gusts or cable impacts). This paper investigates whether anomalies can be detected using only motor torque data, without additional sensors. It introduces an adaptive, unsupervised outlier detection algorithm based on Gaussian Mixture Models (GMMs) to identify anomalies from torque signals. The method starts with a brief calibration period, just a few seconds, during which a GMM is fit on known anomaly-free data. Real-time torque measurements are then evaluated using Mahalanobis distance from the GMM, with statistically derived thresholds triggering anomaly flags. Model parameters are periodically updated using the latest segments identified as anomaly-free to adapt to changing conditions. Validation includes 14 long-duration test sessions simulating varied wind intensities. The proposed method achieves a 100% true positive rate and 95.4% average true negative rate, with 1-second detection latency. Comparative evaluation against power threshold and non-adaptive GMM methods indicates higher robustness to drift and environmental variation.
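The calibrate-then-threshold loop described above can be sketched as follows. This is a deliberately simplified stand-in: a single Gaussian with diagonal covariance replaces the GMM, the threshold of 3.0 is an arbitrary assumption rather than the statistically derived threshold from the abstract, and the torque readings are invented.

```python
import math


def fit_gaussian(samples):
    """Per-dimension mean and variance from anomaly-free calibration samples."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / (n - 1) for d in range(dim)]
    return mean, var


def mahalanobis(x, mean, var):
    """Mahalanobis distance under a diagonal covariance."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)))


def is_anomaly(x, mean, var, threshold=3.0):
    """Flag a reading whose distance from the calibrated model exceeds the threshold."""
    return mahalanobis(x, mean, var) > threshold


# Hypothetical torque readings (two motors) recorded while the platform is at rest.
calibration = [(1.00, 2.00), (1.02, 1.98), (0.98, 2.02), (1.01, 2.01), (0.99, 1.99)]
mean, var = fit_gaussian(calibration)

normal_reading = (1.01, 2.00)
gust_reading = (1.30, 2.40)
```

To mimic the adaptive part, `fit_gaussian` could be called again on the most recent window of readings that were not flagged, so the model tracks slow drift in the torque baseline.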
Article 1329
Title@2025-07-10 (4): Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning
Title: Balancing the Past and Present: A Coordinated Replay Framework for Federated Class-Incremental Learning | Ausbalancieren der Vergangenheit und Gegenwart: Ein koordiniertes Replay-Framework für das Federated Class-Incremental Learning | 平衡过去和现在的平衡:联邦级强化学习协调重现框架 2507.07712v1 |
Authors (3): Zhuang Qi, Lei Meng, Han Yu
Federated Class Incremental Learning (FCIL) aims to collaboratively process continuously increasing incoming tasks across multiple clients. Among various approaches, data replay has become a promising solution, which can alleviate forgetting by reintroducing representative samples from previous tasks. However, the performance of replay methods is typically limited by class imbalance, both within the replay buffer due to limited global awareness and between replayed and newly arrived classes. To address this issue, we propose a class-wise balancing data replay method for FCIL (FedCBDR), which employs a global coordination mechanism for class-level memory construction and reweights the learning objective to alleviate the aforementioned imbalances. Specifically, FedCBDR has two key components: 1) the global-perspective data replay module reconstructs global representations of prior tasks in a privacy-preserving manner, which then guide a class-aware and importance-sensitive sampling strategy to achieve balanced replay; 2) to handle class imbalance across tasks, the task-aware temperature scaling module adaptively adjusts the temperature of logits at both class and instance levels based on task dynamics, which reduces the model’s overconfidence in majority classes while enhancing its sensitivity to minority classes. Experimental results verify that FedCBDR achieves balanced class-wise sampling under heterogeneous data distributions and improves generalization under task imbalance between earlier and recent tasks, yielding a 2%-15% Top-1 accuracy improvement over six state-of-the-art methods.
Article 1330
Title@2025-07-10 (4): Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors
Title: Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors | Shapley-based Data Valuation mit gegenseitiger Information: Ein Schlüssel zu veränderten K-Nächsten Nachbarn | 与相互信息一起进行基于虚光的数据估值:修改 K- 最近邻的密钥 2312.01991v4 |
Authors (4): Mohammad Ali Vahedifar, Azim Akhtarshenas, Mohammad Mohammadi Rafatpanah, Maryam Sabbaghian
The K-Nearest Neighbors (KNN) algorithm is widely used for classification and regression; however, it suffers from limitations, including the equal treatment of all samples. We propose Information-Modified KNN (IM-KNN), a novel approach that leverages Mutual Information ($I$) and Shapley values to assign weighted values to neighbors, thereby addressing the limitation of treating all samples with the same value and weight. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80%, 17.08%, and 16.98%, respectively, across 12 benchmark datasets. Experiments on four large-scale datasets further highlight IM-KNN’s robustness to noise, imbalanced data, and skewed distributions.
Article 1331
Title@2025-07-10 (4): Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought
Title: Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought | Rationale-Enhanced Decodierung für multimodale Chain-of-Thought | 多式联运谈判链附加说明 2507.07685v1 |
Authors (3): Shin’ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa
Large vision-language models (LVLMs) have demonstrated remarkable capabilities by integrating pre-trained vision encoders with large language models (LLMs). Similar to single-modal LLMs, chain-of-thought (CoT) prompting has been adapted for LVLMs to enhance multi-modal reasoning by generating intermediate rationales based on visual and textual inputs. While CoT is assumed to improve grounding and accuracy in LVLMs, our experiments reveal a key challenge: existing LVLMs often ignore the contents of generated rationales in CoT reasoning. To address this, we re-formulate multi-modal CoT reasoning as a KL-constrained reward maximization focused on rationale-conditional log-likelihood. As the optimal solution, we propose rationale-enhanced decoding (RED), a novel plug-and-play inference-time decoding strategy. RED harmonizes visual and rationale information by multiplying distinct image-conditional and rationale-conditional next token distributions. Extensive experiments show that RED consistently and significantly improves reasoning over standard CoT and other decoding methods across multiple benchmarks and LVLMs. Our work offers a practical and effective approach to improve both the faithfulness and accuracy of CoT reasoning in LVLMs, paving the way for more reliable rationale-grounded multi-modal systems.
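The multiplication of image-conditional and rationale-conditional next-token distributions can be sketched as an elementwise product followed by renormalization. This is only an illustration of that one step: the `weight` exponent, the function name, and the toy probabilities are assumptions, and how the two conditional distributions are obtained from an LVLM is not shown.

```python
def product_of_distributions(p_image, p_rationale, weight=1.0):
    """Multiply two next-token distributions elementwise and renormalize.

    `weight` is a hypothetical exponent on the rationale-conditional term.
    """
    scores = [pi * (pr ** weight) for pi, pr in zip(p_image, p_rationale)]
    total = sum(scores)
    return [s / total for s in scores]


# Toy 3-token vocabulary (values are made up for illustration).
p_image = [0.5, 0.3, 0.2]      # next-token probabilities given the image
p_rationale = [0.1, 0.6, 0.3]  # next-token probabilities given the rationale
combined = product_of_distributions(p_image, p_rationale)
next_token = combined.index(max(combined))
```

In this toy case the image-only distribution prefers token 0, but the product prefers token 1, the token both conditions find plausible, so a rationale that the model would otherwise ignore changes the decoded output.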
Article 1332
Title@2025-07-10 (4): Accelerating Transposed Convolutions on FPGA-based Edge Devices
Title: Accelerating Transposed Convolutions on FPGA-based Edge Devices | Beschleunigung transponierter Konvolutionen auf FPGA-basierten Edge-Geräten | 加速基于 FPGA 的边缘设备的转换变速 2507.07683v1 |
Authors (2): Jude Haris, José Cano
Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping, overlapping sums, and ineffectual computations. These inefficiencies further exacerbate the performance bottleneck of TCONV and generative models on resource-constrained edge devices. To address this problem, in this paper we propose MM2IM, a hardware-software co-designed accelerator that combines Matrix Multiplication (MatMul) with col2IM to process TCONV layers on resource-constrained edge devices efficiently. Using the SECDA-TFLite design toolkit, we implement MM2IM and evaluate its performance across 261 TCONV problem configurations, achieving an average speedup of 1.9x against a dual-thread ARM Neon optimized CPU baseline. We then evaluate the performance of MM2IM on a range of TCONV layers from well-known generative models achieving up to 4.2x speedup, and compare it against similar resource-constrained TCONV accelerators, outperforming them by at least 2x GOPs/DSP. Finally, we evaluate MM2IM on the DCGAN and pix2pix GAN models, achieving up to 3x speedup and 2.4x energy reduction against the CPU baseline.
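The idea of computing a transposed convolution as a matrix multiplication followed by col2im can be shown in one dimension. The sketch below is a software illustration under simplifying assumptions (1-D input, a single output channel, plain Python lists); it says nothing about the accelerator's actual dataflow, and all names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x c) and b (c x k)."""
    return [[sum(a[i][c] * b[c][j] for c in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def col2im_1d(cols, stride):
    """Scatter-add each row of cols into its (possibly overlapping) output window."""
    n, k = len(cols), len(cols[0])
    out = [0.0] * ((n - 1) * stride + k)
    for i in range(n):
        for j in range(k):
            out[i * stride + j] += cols[i][j]
    return out


def tconv_1d(x, w, stride):
    """1-D transposed convolution, single output channel: MatMul then col2im."""
    return col2im_1d(matmul(x, w), stride)


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 positions, 2 input channels
w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # 2 input channels, kernel size 3
y = tconv_1d(x, w, stride=2)
```

With kernel size 3 and stride 2, neighboring output windows overlap by one element, so `col2im_1d` has to accumulate rather than overwrite; these are the overlapping sums the abstract mentions.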
Article 1333
Title@2025-07-10 (4): Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis
Title: Beyond Cox Models: Assessing the Performance of Machine-Learning Methods in Non-Proportional Hazards and Non-Linear Survival Analysis | Jenseits von Cox-Modellen: Bewertung der Leistungsfähigkeit von Machine-Learning-Methoden bei nichtproportionalen Gefahren und nichtlinearer Überlebensanalyse | 超越考克斯模型:评估机器学习方法在非季节性危险和无林性生存分析方面的性能 2504.17568v2 |
Authors (6): Ivan Rossi, Flavio Sartori, Cesare Rollo, Giovanni Birolo, Piero Fariselli, Tiziana Sanavia
Survival analysis often relies on Cox models, assuming both linearity and proportional hazards (PH). This study evaluates machine and deep learning methods that relax these constraints, comparing their performance with penalized Cox models on a benchmark of three synthetic and three real datasets. In total, eight different models were tested, including six non-linear models of which four were also non-PH. Although Cox regression often yielded satisfactory performance, we showed the conditions under which machine and deep learning models can perform better. Indeed, the performance of these methods has often been underestimated due to the improper use of Harrell’s concordance index (C-index) instead of more appropriate scores such as Antolini’s concordance index, which generalizes the C-index to cases where the PH assumption does not hold. In addition, since models with a high C-index are occasionally badly calibrated, combining Antolini’s C-index with Brier’s score is useful to assess the overall performance of a survival method. Results on our benchmark data showed that survival prediction should be approached by testing different methods to select the most appropriate one according to sample size, non-linearity and non-PH conditions. To make these tests easy to reproduce on our benchmark data, code and documentation are freely available at https://github.com/compbiomed-unito/survhive.
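For readers unfamiliar with the metric under discussion, the sketch below computes Harrell's C-index from a single time-independent risk score per subject. It does not implement Antolini's time-dependent variant or the Brier score, and it is independent of the linked repository; the toy data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event
    before subject j's time; it is concordant when i has the higher risk.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risks count as half
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # 0 marks a censored observation
risks = [0.9, 0.4, 0.5, 0.1]
c = harrell_c_index(times, events, risks)
```

Because each subject gets one fixed risk score, this index implicitly assumes the risk ordering does not change over time, which is the assumption that breaks down for non-PH models.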
Article 1334
Title@2025-07-10 (4): Implicit Counterfactual Data Augmentation for Robust Learning
Title: Implicit Counterfactual Data Augmentation for Robust Learning | Implizite gegenfaktische Datenvergrößerung für robustes Lernen | 强力学习所需的反事实数据放大 2304.13431v4 |
Authors (3): Xiaoling Zhou, Ou Wu, Michael K. Ng
Machine learning models are prone to capturing the spurious correlations between non-causal attributes and classes, with counterfactual data augmentation being a promising direction for breaking these spurious associations. However, generating counterfactual data explicitly poses a challenge, and incorporating augmented data into the training process decreases training efficiency. This study proposes an Implicit Counterfactual Data Augmentation (ICDA) method to remove spurious correlations and make stable predictions. Specifically, first, a novel sample-wise augmentation strategy is developed that generates semantically and counterfactually meaningful deep features with distinct augmentation strength for each sample. Second, we derive an easy-to-compute surrogate loss on the augmented feature set when the number of augmented samples becomes infinite. Third, two concrete schemes are proposed, including direct quantification and meta-learning, to derive the key parameters for the robust loss. In addition, ICDA is explained from a regularization perspective, revealing its capacity to improve intra-class compactness and augment margins at both class and sample levels. Extensive experiments have been conducted across various biased learning scenarios covering both image and text datasets, demonstrating that ICDA consistently enhances the generalization and robustness performance of popular networks.
Article 1335
Title@2025-07-10 (4): Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks
Title: Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks | Einige theoretische Ergebnisse auf schichtweise Effektive Dimensions-Oszillationen in Finite-Wide-ReLU-Netzwerken | 关于有限宽度 RELU 网络中多层有效尺寸振动的一些理论结果 2507.07675v1 |
Authors (1): Darshan Makwana
We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the $m\times n$ hidden activation matrices. Our main result shows that $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$ so that the rank deficit decays geometrically with ratio $1-2 / \pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the “revival” depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.
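The constants quoted above can be checked numerically. The sketch below evaluates only the leading term of the stated expected-rank formula and the stated peak-height factor; it ignores the $O(e^{-cm})$ correction and does not simulate any network. The function name and the choice `m = 100` are illustrative assumptions.

```python
import math


def expected_edim(m, depth):
    """Leading term m * [1 - (1 - 2/pi)**depth] of the expected rank in the abstract."""
    return m * (1.0 - (1.0 - 2.0 / math.pi) ** depth)


# Geometric ratio of the rank deficit and the peak-height factor from the abstract.
ratio = 1.0 - 2.0 / math.pi
peak_factor = 1.0 - math.exp(-math.pi / 2.0)

# Leading-term values for a batch of m = 100 inputs at depths 1 to 5.
m = 100
values = [expected_edim(m, depth) for depth in range(1, 6)]
```

Evaluating `ratio` and `peak_factor` reproduces the rounded constants 0.3634 and 0.79 quoted in the abstract.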
Article 1336
Title@2025-07-10 (4): Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL
Title: Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL | Uncovering RL Integration in SSL Loss: Zielspezifische Implikationen für dateneffiziente RL | SSL损失中未覆盖的 RL 整合:对数据高效RL的客观具体影响 2410.17428v3 |
Authors (2): Ömer Veysel Çağatan, Barış Akgün
In this study, we investigate the effect of SSL objective modifications within the SPR framework, focusing on specific adjustments such as terminal state masking and prioritized replay weighting, which were not explicitly addressed in the original design. While these modifications are specific to RL, they are not universally applicable across all RL algorithms. Therefore, we aim to assess their impact on performance and explore other SSL objectives that do not accommodate these adjustments, such as Barlow Twins and VICReg. We evaluate six SPR variants on the Atari 100k benchmark, including versions both with and without these modifications. Additionally, we test the performance of these objectives on the DeepMind Control Suite, where such modifications are absent. Our findings reveal that incorporating specific SSL modifications within SPR significantly enhances performance, and this influence extends to subsequent frameworks like SR-SPR and BBF, highlighting the critical importance of SSL objective selection and related adaptations in achieving data efficiency in self-predictive reinforcement learning.
Article 1337
Title@2025-07-10 (4): Curriculum Negative Mining For Temporal Networks
Title: Curriculum Negative Mining For Temporal Networks | Curriculum Negative Mining für zeitliche Netzwerke | 时间网络负面采矿课程 2407.17070v2 |
Authors (3): Ziyue Chen, Tongya Zheng, Mingli Song
Temporal networks are effective in capturing the evolving interactions of networks over time, such as social networks and e-commerce networks. In recent years, researchers have primarily concentrated on developing specific model architectures for Temporal Graph Neural Networks (TGNNs) in order to improve the representation quality of temporal nodes and edges. However, limited attention has been given to the quality of negative samples during the training of TGNNs. When compared with static networks, temporal networks present two specific challenges for negative sampling: positive sparsity and positive shift. Positive sparsity refers to the presence of a single positive sample amidst numerous negative samples at each timestamp, while positive shift relates to the variations in positive samples across different timestamps. To robustly address these challenges in training TGNNs, we introduce Curriculum Negative Mining (CurNM), a model-aware curriculum learning framework that adaptively adjusts the difficulty of negative samples. Within this framework, we first establish a dynamically updated negative pool that balances random, historical, and hard negatives to address the challenges posed by positive sparsity. Secondly, we implement a temporal-aware negative selection module that focuses on learning from the disentangled factors of recently active edges, thus accurately capturing shifting preferences. Finally, the selected negatives are combined with annealing random negatives to support stable training. Extensive experiments on 12 datasets and 3 TGNNs demonstrate that our method outperforms baseline methods by a significant margin. Additionally, thorough ablation studies and parameter sensitivity experiments verify the usefulness and robustness of our approach.
Article 1338
Title@2025-07-10 (4): Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor
Title: Machine Learning-Assisted Surrogate Modeling with Multi-Objective Optimization and Decision-Making of a Steam Methane Reforming Reactor | Machine Learning-Assisted Surrogate Modellierung mit multi-objektiver Optimierung und Entscheidungsfindung eines Dampfmethan-Reformreaktors | 利用蒸气甲烷改造反应堆的多目标优化和决策 2507.07641v1 |
Authors (3): Seyed Reza Nabavi, Zonglin Guo, Zhiyuan Wang
This study presents an integrated modeling and optimization framework for a steam methane reforming (SMR) reactor, combining a mathematical model, artificial neural network (ANN)-based hybrid modeling, advanced multi-objective optimization (MOO) and multi-criteria decision-making (MCDM) techniques. A one-dimensional fixed-bed reactor model accounting for internal mass transfer resistance was employed to simulate reactor performance. To reduce the high computational cost of the mathematical model, a hybrid ANN surrogate was constructed, achieving a 93.8% reduction in average simulation time while maintaining high predictive accuracy. The hybrid model was then embedded into three MOO scenarios using the non-dominated sorting genetic algorithm II (NSGA-II) solver: 1) maximizing methane conversion and hydrogen output; 2) maximizing hydrogen output while minimizing carbon dioxide emissions; and 3) a combined three-objective case. The optimal trade-off solutions were further ranked and selected using two MCDM methods: technique for order of preference by similarity to ideal solution (TOPSIS) and simplified preference ranking on the basis of ideal-average distance (sPROBID). Optimal results include a methane conversion of 0.863 with 4.556 mol/s hydrogen output in the first case, and 0.988 methane conversion with 3.335 mol/s hydrogen and 0.781 mol/s carbon dioxide in the third. This comprehensive methodology offers a scalable and effective strategy for optimizing complex catalytic reactor systems with multiple, often conflicting, objectives.
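The TOPSIS ranking step mentioned above can be sketched as follows. This is a minimal textbook-style version using vector normalization, not the authors' implementation; the function name, the weights, and the three candidate solutions in the usage lines are made-up assumptions for illustration only.

```python
import math


def topsis(scores, weights, benefit):
    """Return TOPSIS closeness coefficients (higher is better).

    scores:  list of rows, one per candidate solution, one column per objective.
    weights: one non-negative weight per objective (normalized to sum to 1 here).
    benefit: one bool per objective, True if larger values are preferred.
    """
    n_crit = len(weights)
    total_w = sum(weights)
    # Vector-normalize each objective column, then apply its normalized weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
             for j in range(n_crit)]
    v = [[row[j] / norms[j] * weights[j] / total_w for j in range(n_crit)]
         for row in scores]
    # Ideal and anti-ideal points depend on the direction of each objective.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    closeness = []
    for row in v:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        denom = d_ideal + d_anti
        closeness.append(d_anti / denom if denom > 0 else 0.0)
    return closeness


# Made-up Pareto candidates: (methane conversion, H2 output, CO2 output).
candidates = [[0.80, 4.0, 1.0], [0.95, 3.0, 0.7], [0.90, 3.8, 0.9]]
ranking = topsis(candidates, weights=[1, 1, 1], benefit=[True, True, False])
best_index = max(range(len(candidates)), key=lambda i: ranking[i])
```

The candidate with the largest closeness coefficient is the one nearest the ideal point and farthest from the anti-ideal point; changing the weights changes which trade-off solution is selected.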
Article 1339
Title@2025-07-10 (4): HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric
Title: HLF-FSL. A Decentralized Federated Split Learning Solution for IoT on Hyperledger Fabric | HLF-FSL. Eine dezentrale, gefederte Split-Learning-Lösung für IoT auf Hyperledger Fabric | HLF-FLF-FLF. 关于超板机纤维化的IOT的分散化的联邦学习分解解决方案 2507.07637v1 |
Authors (5): Carlos Beis Penedo, Rebeca P. Díaz Redondo, Ana Fernández Vilas, Manuel Fernández Veiga, Francisco Troncoso Pastoriza
Collaborative machine learning in sensitive domains demands scalable, privacy-preserving solutions for enterprise deployment. Conventional Federated Learning (FL) relies on a central server, introducing single points of failure and privacy risks, while Split Learning (SL) partitions models for privacy but scales poorly due to sequential training. We present a decentralized architecture that combines Federated Split Learning (FSL) with the permissioned blockchain Hyperledger Fabric (HLF). Our chaincode orchestrates FSL’s split model execution and peer-to-peer aggregation without any central coordinator, leveraging HLF’s transient fields and Private Data Collections (PDCs) to keep raw data and model activations private. On CIFAR-10 and MNIST benchmarks, HLF-FSL matches centralized FSL accuracy while reducing per-epoch training time compared to Ethereum-based works. Performance and scalability tests show minimal blockchain overhead and preserved accuracy, demonstrating enterprise-grade viability.
Article 1340
Title@2025-07-10 (4): Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights
Title: Comparative sentiment analysis of public perception: Monkeypox vs. COVID-19 behavioral insights | Vergleichende Stimmungsanalyse der öffentlichen Wahrnehmung: Monkeypox vs. COVID-19 Verhaltenseinblicke | 对公众感知的比较情绪分析:天花对COVID-19行为洞察力 2505.07430v2 |
Authors (3): Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma, Jamini Jasim
The emergence of global health crises, such as COVID-19 and Monkeypox (mpox), has underscored the importance of understanding public sentiment to inform effective public health strategies. This study conducts a comparative sentiment analysis of public perceptions surrounding COVID-19 and mpox by leveraging extensive datasets of 147,475 and 106,638 tweets, respectively. Advanced machine learning models, including Logistic Regression, Naive Bayes, RoBERTa, DistilRoBERTa and XLNet, were applied to perform sentiment classification, with results indicating key trends in public emotion and discourse. The analysis highlights significant differences in public sentiment driven by disease characteristics, media representation, and pandemic fatigue. Through the lens of sentiment polarity and thematic trends, this study offers valuable insights into tailoring public health messaging, mitigating misinformation, and fostering trust during concurrent health crises. The findings contribute to advancing sentiment analysis applications in public health informatics, setting the groundwork for enhanced real-time monitoring and multilingual analysis in future research.
Article 1341
Title@2025-07-10 (4): Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks
Title: Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks | Erforschung der Grenzen der Modellkompression in LLMs: Eine Studie zur Wissensdestillation über QA-Aufgaben | 探索LLMM中模型压缩的限度:关于质量保证任务的知识积累研究 2507.07630v1 |
Authors (4): Joyeeta Datta, Niclas Doll, Qusai Ramadan, Zeyd Boukhers
Large Language Models (LLMs) have demonstrated outstanding performance across a range of NLP tasks; however, their computational demands hinder their deployment in real-world, resource-constrained environments. This work investigates the extent to which LLMs can be compressed using Knowledge Distillation (KD) while maintaining strong performance on Question Answering (QA) tasks. We evaluate student models distilled from the Pythia and Qwen2.5 families on two QA benchmarks, SQuAD and MLQA, under zero-shot and one-shot prompting conditions. Results show that student models retain over 90% of their teacher models’ performance while reducing parameter counts by up to 57.1%. Furthermore, one-shot prompting yields additional performance gains over zero-shot setups for both model families. These findings underscore the trade-off between model efficiency and task performance, demonstrating that KD, combined with minimal prompting, can yield compact yet capable QA systems suitable for resource-constrained applications.
Article 1342
Title@2025-07-10 (4): TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection
Title: TransformEEG: Towards Improving Model Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection | TransformEEG: Auf dem Weg zur Verbesserung des Modells Generalizability in Deep Learning-based EEG Parkinson’s Disease Detection | TerverEEEG:努力改进深学习性EEG Parkinson疾病检测模式 2507.07622v1 |
Authors (10): Federico Del Pup, Riccardo Brun, Filippo Iotti, Edoardo Paccagnella, Mattia Pezzato, Sabrina Bertozzo, Andrea Zanola, Louis Fabrice Tshimanga, Henning Müller, Manfredo Atzori
Electroencephalography (EEG) is establishing itself as an important, low-cost, noninvasive diagnostic tool for the early detection of Parkinson’s Disease (PD). In this context, EEG-based Deep Learning (DL) models have shown promising results due to their ability to discover highly nonlinear patterns within the signal. However, current state-of-the-art DL models suffer from poor generalizability caused by high inter-subject variability. This high variability underscores the need for enhancing model generalizability by developing new architectures better tailored to EEG data. This paper introduces TransformEEG, a hybrid Convolutional-Transformer designed for Parkinson’s disease detection using EEG data. Unlike transformer models based on the EEGNet structure, TransformEEG incorporates a depthwise convolutional tokenizer. This tokenizer is specialized in generating tokens composed of channel-specific features, which enables more effective feature mixing within the self-attention layers of the transformer encoder. To evaluate the proposed model, four public datasets comprising 290 subjects (140 PD patients, 150 healthy controls) were harmonized and aggregated. A 10-outer, 10-inner Nested-Leave-N-Subjects-Out (N-LNSO) cross-validation was performed to provide an unbiased comparison against seven other consolidated EEG deep learning models. TransformEEG achieved the highest median balanced accuracy (78.45%) as well as the lowest interquartile range (6.37%) across all the N-LNSO partitions. When combined with data augmentation and threshold correction, median accuracy increased to 80.10%, with an interquartile range of 5.74%. In conclusion, TransformEEG produces more consistent and less skewed results. It demonstrates a substantial reduction in variability and more reliable PD detection using EEG data compared to the other investigated models.
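The subject-wise nested cross-validation described above can be sketched as follows. This is a simplified illustration that assumes round-robin fold assignment; the function names are hypothetical, and the actual N-LNSO procedure may shuffle or stratify subjects differently. The key property is that every subject appears in exactly one of the train, validation, or test sets of a partition, so no subject's recordings leak across splits.

```python
def subject_folds(subjects, n_folds):
    """Split a list of subject IDs into n_folds disjoint groups (round-robin)."""
    return [subjects[i::n_folds] for i in range(n_folds)]


def nested_lnso(subjects, n_outer, n_inner):
    """Yield (train, validation, test) subject lists for nested subject-wise CV."""
    for test in subject_folds(subjects, n_outer):
        held_out = set(test)
        remaining = [s for s in subjects if s not in held_out]
        # Inner loop: split the non-test subjects into train and validation.
        for val in subject_folds(remaining, n_inner):
            val_set = set(val)
            train = [s for s in remaining if s not in val_set]
            yield train, val, test


# 290 subjects with 10 outer and 10 inner folds give 100 partitions.
partitions = list(nested_lnso(list(range(290)), n_outer=10, n_inner=10))
```

Each partition trains one model, which is why results over such a scheme are reported as a median and an interquartile range rather than a single accuracy.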
Article 1343
Title@2025-07-10 (4): Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation
Title: Sparse Causal Discovery with Generative Intervention for Unsupervised Graph Domain Adaptation | Sparse Causal Discovery mit generativer Intervention für unüberwachte Graphen-Domänenanpassung | 以未受监督的图形域适应的生成干预生成的简单原因发现 2507.07621v1 |
Authors (9): Junyu Luo, Yuhao Tang, Yiwei Fu, Xiao Luo, Zhizhuo Kou, Zhiping Xiao, Wei Ju, Wentao Zhang, Ming Zhang
Unsupervised Graph Domain Adaptation (UGDA) leverages labeled source domain graphs to achieve effective performance in unlabeled target domains despite distribution shifts. However, existing methods often yield suboptimal results due to the entanglement of causal-spurious features and the failure of global alignment strategies. We propose SLOGAN (Sparse Causal Discovery with Generative Intervention), a novel approach that achieves stable graph representation transfer through sparse causal modeling and dynamic intervention mechanisms. Specifically, SLOGAN first constructs a sparse causal graph structure, leveraging mutual information bottleneck constraints to disentangle sparse, stable causal features while compressing domain-dependent spurious correlations through variational inference. To address residual spurious correlations, we innovatively design a generative intervention mechanism that breaks local spurious couplings through cross-domain feature recombination while maintaining causal feature semantic consistency via covariance constraints. Furthermore, to mitigate error accumulation in target domain pseudo-labels, we introduce a category-adaptive dynamic calibration strategy, ensuring stable discriminative learning. Extensive experiments on multiple real-world datasets demonstrate that SLOGAN significantly outperforms existing baselines.
Article 1344
Title@2025-07-10 (4): Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0
Title: Sparse Self-Federated Learning for Energy Efficient Cooperative Intelligence in Society 5.0 | Sparse Selbstgebundenes Lernen für energieeffiziente kooperative Intelligenz in der Gesellschaft 5.0 | 社会节能合作情报学会 2507.07613v1 |
Authors (7): Davide Domini, Laura Erhan, Gianluca Aguzzi, Lucia Cavallaro, Amirhossein Douzandeh Zenoozi, Antonio Liotta, Mirko Viroli
Federated Learning offers privacy-preserving collaborative intelligence but struggles to meet the sustainability demands of emerging IoT ecosystems necessary for Society 5.0, a human-centered technological future that balances social advancement with environmental responsibility. The excessive communication bandwidth and computational resources required by traditional FL approaches make them environmentally unsustainable at scale, creating a fundamental conflict with green AI principles as billions of resource-constrained devices attempt to participate. To this end, we introduce Sparse Proximity-based Self-Federated Learning (SParSeFuL), a resource-aware approach that bridges this gap by combining aggregate computing for self-organization with neural network sparsification to reduce energy and bandwidth consumption.
Article 1345
Title@2025-07-10 (4): S2FGL: Spatial Spectral Federated Graph Learning
Title: S2FGL: Spatial Spectral Federated Graph Learning | S2FGL: Raumspektrales Federiertes Graphenlernen | S2FGL: 空间光谱联邦图表学习 2507.02409v2 |
Authors (6): Zihan Tan, Suyuan Huang, Guancheng Wan, Wenke Huang, He Li, Mang Ye
Federated Graph Learning (FGL) combines the privacy-preserving capabilities of federated learning (FL) with the strong graph modeling capability of Graph Neural Networks (GNNs). Current research addresses subgraph-FL only from the structural perspective, neglecting the propagation of graph signals on spatial and spectral domains of the structure. From a spatial perspective, subgraph-FL introduces edge disconnections between clients, leading to disruptions in label signals and a degradation in the class knowledge of the global GNN. From a spectral perspective, spectral heterogeneity causes inconsistencies in signal frequencies across subgraphs, which makes local GNNs overfit the local signal propagation schemes. As a result, spectral client drifts occur, undermining global generalizability. To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. The combination of spatial and spectral strategies forms our framework S2FGL. Extensive experiments on multiple datasets demonstrate the superiority of S2FGL. The code is available at https://github.com/Wonder7racer/S2FGL.git.
Article 1346
Title@2025-07-10 (4): Offline Trajectory Optimization for Offline Reinforcement Learning
Title: Offline Trajectory Optimization for Offline Reinforcement Learning | Offline-Trajektorienoptimierung für Offline-Verstärkungslernen | 离线轨迹优化用于离线强化学习 2404.10393v2 |
Authors (9): Ziqi Zhao, Zhaochun Ren, Liu Yang, Yunsen Liang, Fajie Yuan, Pengjie Ren, Zhumin Chen, Jun Ma, Xin Xin
Offline reinforcement learning (RL) aims to learn policies without online exploration. To enlarge the training data, model-based offline RL learns a dynamics model that is used as a virtual environment to generate simulation data and enhance policy learning. However, existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation; and (ii) the lack of evaluation and correction for generated data, leading to low-quality augmentation. In this paper, we propose offline trajectory optimization for offline reinforcement learning (OTTO). The key motivation is to conduct long-horizon simulation and then utilize model uncertainty to evaluate and correct the augmented data. Specifically, we propose an ensemble of Transformers, a.k.a. World Transformers, to predict environment state dynamics and the reward function. Three strategies are proposed to use World Transformers to generate long-horizon trajectory simulation by perturbing the actions in the offline data. Then, an uncertainty-based World Evaluator is introduced to first evaluate the confidence of the generated trajectories and then correct low-confidence data. Finally, we jointly use the original data with the corrected augmentation data to train an offline RL algorithm. OTTO serves as a plug-in module and can be integrated with existing model-free offline RL methods. Experiments on various benchmarks show that OTTO can effectively improve the performance of representative offline RL algorithms, including in complex environments with sparse rewards like AntMaze. Code is available at https://github.com/ZiqiZhao1/OTTO.
Article 1347
Title@2025-07-10 (4): Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis
Title: Synthetic MC via Biological Transmitters: Therapeutic Modulation of the Gut-Brain Axis | Synthetische MC über biologische Transmitter: Therapeutische Modulation der Gut-Brain-Achse | 通过生物传播器进行MC:古特脑轴体的治疗变化 2507.07604v1 |
Authors (6): Sebastian Lotter, Elisabeth Mohr, Andrina Rutsch, Lukas Brand, Francesca Ronchi, Laura Díaz-Marugán
Synthetic molecular communication (SMC) is a key enabler for future healthcare systems in which Internet of Bio-Nano-Things (IoBNT) devices facilitate the continuous monitoring of a patient’s biochemical signals. To close the loop between sensing and actuation, both the detection and the generation of in-body molecular communication (MC) signals are key. However, generating signals inside the human body, e.g., via synthetic nanodevices, poses a challenge in SMC, due to technological obstacles as well as legal, safety, and ethical issues. Hence, this paper considers an SMC system in which signals are generated indirectly via the modulation of a natural in-body MC system, namely the gut-brain axis (GBA). Therapeutic GBA modulation is already established as a treatment for neurological diseases, e.g., drug refractory epilepsy (DRE), and is performed via the administration of nutritional supplements or specific diets. However, the molecular signaling pathways that mediate the effect of such treatments are mostly unknown. Consequently, existing treatments are standardized or designed heuristically and help only some patients while failing to help others. In this paper, we propose to leverage personal health data, e.g., gathered by in-body IoBNT devices, to design GBA modulation-based treatments that are more versatile and robust than existing ones. To show the feasibility of our approach, we define a catalog of theoretical requirements for therapeutic GBA modulation. Then, we propose a machine learning model to verify these requirements for practical scenarios when only limited data on the GBA modulation exists. By evaluating the proposed model on several datasets, we confirm its excellent accuracy in identifying different modulators of the GBA. Finally, we utilize the proposed model to identify specific modulatory pathways that play an important role for therapeutic GBA modulation.
Article 1348
Title@2025-07-10 (4): Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning
Title: Don’t Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning | Drücken Sie nicht auf den Knopf! Erforschen von Daten Leckage Risiken im maschinellen Lernen und Transfer Lernen | 不要按按钮! 探索机器学习和传输学习中的数据泄漏风险 2401.13796v4 |
Authors (3): Andrea Apicella, Francesco Isgrò, Roberto Prevete
Machine Learning (ML) has revolutionized various domains, offering predictive capabilities in several areas. However, with the increasing accessibility of ML tools, many practitioners, lacking deep ML expertise, adopt a “push the button” approach, utilizing user-friendly interfaces without a thorough understanding of underlying algorithms. While this approach provides convenience, it raises concerns about the reliability of outcomes, leading to challenges such as incorrect performance evaluation. This paper addresses a critical issue in ML, known as data leakage, where unintended information contaminates the training data, impacting model performance evaluation. Users, due to a lack of understanding, may inadvertently overlook crucial steps, leading to optimistic performance estimates that may not hold in real-world scenarios. The discrepancy between evaluated and actual performance on new data is a significant concern. In particular, this paper categorizes data leakage in ML, discussing how certain conditions can propagate through the ML workflow. Furthermore, it explores the connection between data leakage and the specific task being addressed, investigates its occurrence in Transfer Learning, and compares standard inductive ML with transductive ML frameworks. The conclusion summarizes key findings, emphasizing the importance of addressing data leakage for robust and reliable ML applications.
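One common way leakage enters a workflow, assumed here purely for illustration (the paper's own taxonomy may classify it differently), is fitting a preprocessing step on the full dataset before splitting. The sketch below contrasts a leaky and a leak-free standardization; the helper names and the toy numbers are hypothetical.

```python
import statistics


def fit_standardizer(values):
    """Estimate mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply previously estimated statistics without re-fitting them."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the statistics are computed on train + test, so information about
# the test split influences the features the model is trained on.
leaky_mean, leaky_std = fit_standardizer(train + test)

# Leak-free: the statistics come from the training split alone and are
# reused unchanged on the test split.
clean_mean, clean_std = fit_standardizer(train)
train_scaled = standardize(train, clean_mean, clean_std)
test_scaled = standardize(test, clean_mean, clean_std)
```

The two means differ (about 5.33 versus 2.5), which is exactly the test-set information that the leaky variant passes to training.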
Article 1349
Title@2025-07-10 (4): Context Pooling: Query-specific Graph Pooling for Generic Inductive Link Prediction in Knowledge Graphs
Title: Context Pooling: Query-specific Graph Pooling for Generic Inductive Link Prediction in Knowledge Graphs | Kontextpooling: Abfragespezifische Graphenpooling für generische Induktive Link-Vorhersage in Wissensgraphen | 背景集合:知识图中通用感应链接预测的查询特定图集 2507.07595v1 |
Authors (3): Zhixiang Su, Di Wang, Chunyan Miao
Recent investigations on the effectiveness of Graph Neural Network (GNN)-based models for link prediction in Knowledge Graphs (KGs) show that vanilla aggregation does not significantly impact the model performance. In this paper, we introduce a novel method, named Context Pooling, to enhance GNN-based models’ efficacy for link prediction in KGs. To the best of our knowledge, Context Pooling is the first methodology that applies graph pooling in KGs. Additionally, Context Pooling is the first of its kind to enable the generation of query-specific graphs for inductive settings, where testing entities are unseen during training. Specifically, we devise two metrics, namely neighborhood precision and neighborhood recall, to assess the neighbors’ logical relevance regarding the given queries, thereby enabling the subsequent comprehensive identification of only the logically relevant neighbors for link prediction. Our method is generic and is assessed by applying it to two state-of-the-art (SOTA) models on three public transductive and inductive datasets, achieving SOTA performance in 42 out of 48 settings.
Article 1350
Title@2025-07-10 (4): Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations
Title: Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations | Überprüfung der Likelihood-basierten Out-of-Distribution-Erkennung durch Modellierung von Repräsentationen | 通过建模代表机构重新审视以可能性为基础的分销外探测 2504.07793v3 |
Authors (6): Yifan Ding, Arturas Aleksandraus, Amirhossein Ahmadian, Jonas Unger, Fredrik Lindsten, Gabriel Eilertsen
Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning systems, particularly in safety-critical applications. Likelihood-based deep generative models have historically faced criticism for their unsatisfactory performance in OOD detection, often assigning higher likelihood to OOD data than to in-distribution samples when applied to image data. In this work, we demonstrate that likelihood is not inherently flawed. Rather, several properties of the image space prevent likelihood from serving as a valid detection score. Given a sufficiently good likelihood estimator, specifically using the probability flow formulation of a diffusion model, we show that likelihood-based methods can still perform on par with state-of-the-art methods when applied in the representation space of pre-trained encoders. The code of our work can be found at https://github.com/limchaos/Likelihood-OOD.git.
Article 1351
Title@2025-07-10 (4): Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data
Title: Stress Monitoring in Healthcare: An Ensemble Machine Learning Framework Using Wearable Sensor Data | Stressüberwachung im Gesundheitswesen: Ein Ensemble Machine Learning Framework mit tragbaren Sensordaten | 保健中压力监测:使用穿戴感感应数据的综合机械学习框架 2507.07589v1 |
Authors (3): Arpana Sinhal, Anay Sinhal, Amit Sinhal
Healthcare professionals, particularly nurses, face elevated occupational stress, a concern amplified during the COVID-19 pandemic. While wearable sensors offer promising avenues for real-time stress monitoring, existing studies often lack comprehensive datasets and robust analytical frameworks. This study addresses these gaps by introducing a multimodal dataset comprising physiological signals, electrodermal activity, heart rate and skin temperature. A systematic literature review identified limitations in prior stress-detection methodologies, particularly in handling class imbalance and optimizing model generalizability. To overcome these challenges, the dataset underwent preprocessing with the Synthetic Minority Over-sampling Technique (SMOTE), ensuring balanced representation of stress states. Advanced machine learning models including Random Forest, XGBoost and a Multi-Layer Perceptron (MLP) were evaluated and combined into a Stacking Classifier to leverage their collective predictive strengths. By using a publicly accessible dataset and a reproducible analytical pipeline, this work advances the development of deployable stress-monitoring systems, offering practical implications for safeguarding healthcare workers’ mental health. Future research directions include expanding demographic diversity and exploring edge-computing implementations for low-latency stress alerts.
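The core idea behind SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched in a few lines. This is a simplified illustration with a hypothetical function name and toy data, not the pipeline used in the study; in practice a maintained implementation such as the one in the imbalanced-learn package would normally be used.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbors of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority_samples, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training split only; applying it before the train/test split would place near-copies of test samples in the training data.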
Article 1352
Title@2025-07-10 (4): Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
Title: Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench | Jenseits von Überkorrektur: Bewertung von Diversität in T2I-Modellen mit DivBench | 超越过度纠正:在DivBench的T2I模型中评估多样性 2507.03015v2 |
Authors (5): Felix Friedrich, Thiemo Ganesha Welsch, Manuel Brack, Patrick Schramowski, Kristian Kersting
Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.
Article 1353
Title@2025-07-10 (4): Improving Clustering on Occupational Text Data through Dimensionality Reduction
Title: Improving Clustering on Occupational Text Data through Dimensionality Reduction | Verbesserung der Clusterbildung auf berufsbezogenen Textdaten durch Dimensionalitätsreduzierung | 通过减少分量改进职业文本数据集群化 2507.07582v1 |
Authors (3): Iago Xabier Vázquez García, Damla Partanaz, Emrullah Fatih Yetkin
In this study, we propose a clustering mechanism for the occupations defined in the well-known US-based occupational database, O*NET. Even though all occupations are defined according to well-conducted surveys in the US, their definitions can vary for different firms and countries. Hence, if one wants to expand the data that is already collected in O*NET for the occupations defined with different tasks, a map between the definitions will be a vital requirement. We propose a pipeline using several BERT-based techniques with various clustering approaches to obtain such a map. We also examine the effect of dimensionality reduction approaches on several metrics used in measuring the performance of clustering algorithms. Finally, we improve our results by using a specialized silhouette approach. This new clustering-based mapping approach with dimensionality reduction may help distinguish the occupations automatically, creating new paths for people wanting to change their careers.
Article 1354
Title@2025-07-10 (4): CHOMET: Conditional Handovers via Meta-Learning
Title: CHOMET: Conditional Handovers via Meta-Learning | CHOMET: Bedingte Übergaben über Meta-Learning | CHOMET: 通过Met-Learn 有条件的交接 2507.07581v1 |
Authors (3): Michail Kalntis, Fernando A. Kuipers, George Iosifidis
Handovers (HOs) are the cornerstone of modern cellular networks for enabling seamless connectivity to a vast and diverse number of mobile users. However, as mobile networks become more complex with more diverse users and smaller cells, traditional HOs face significant challenges, such as prolonged delays and increased failures. To mitigate these issues, 3GPP introduced conditional handovers (CHOs), a new type of HO that enables the preparation (i.e., resource allocation) of multiple cells for a single user to increase the chance of HO success and decrease the delays in the procedure. Despite its advantages, CHO introduces new challenges that must be addressed, including efficient resource allocation and managing signaling/communication overhead from frequent cell preparations and releases. This paper presents a novel framework aligned with the O-RAN paradigm that leverages meta-learning for CHO optimization, providing robust dynamic regret guarantees and outperforming 3GPP benchmarks by at least 180% in volatile signal conditions.
Article 1355
Title@2025-07-10 (4): COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation
Title: COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation | COALA: Numerisch stabiles und effizientes Framework für kontextabhängige Low-Rank-Annäherung | COALA: 低 Rank 上下低敏度接近度的数值稳定、高效框架 2507.07580v1 |
Authors (2): Uliana Parkina, Maxim Rakhuba
Recent studies suggest that context-aware low-rank approximation is a useful tool for compression and fine-tuning of modern large-scale neural networks. In this type of approximation, a norm is weighted by a matrix of input activations, significantly improving metrics over the unweighted case. Nevertheless, existing methods for neural networks suffer from numerical instabilities because they rely on classical formulas that involve explicit Gram matrix computation and subsequent inversion. We demonstrate that this can degrade the approximation quality or produce numerically singular matrices. To address these limitations, we propose a novel inversion-free regularized framework that is based entirely on stable decompositions and overcomes the numerical pitfalls of prior art. Our method handles challenging scenarios: (1) when calibration matrices exceed GPU memory capacity, (2) when input activation matrices are nearly singular, and even (3) when insufficient data prevents unique approximation. For the last case, we prove that our solution converges to a desired approximation and derive explicit error bounds.
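To make the inversion-free idea concrete, the following is a minimal worked derivation under stated assumptions, not the paper's exact formulation (the regularized variant is not reproduced here). The symbols are illustrative: $W$ is a weight matrix, $X$ a matrix of input activations with samples as columns, $r$ the target rank, and $\widehat{W}$ the low-rank approximation. A thin QR factorization of $X^\top$ replaces the Gram matrix $XX^\top$, and no matrix is ever inverted.

```latex
\begin{aligned}
&\min_{\operatorname{rank}(\widehat{W}) \le r} \ \bigl\|(W - \widehat{W})\,X\bigr\|_F,
  \qquad X^\top = QR, \quad Q^\top Q = I \\[2pt]
&\bigl\|(W - \widehat{W})\,X\bigr\|_F
  = \bigl\|(W - \widehat{W})\,R^\top Q^\top\bigr\|_F
  = \bigl\|(W - \widehat{W})\,R^\top\bigr\|_F \\[2pt]
&W R^\top = U \Sigma V^\top
  \ \Longrightarrow\
  \widehat{W} = U_r U_r^\top W, \qquad
  \widehat{W} R^\top = U_r \Sigma_r V_r^\top
\end{aligned}
```

Here $U_r$ holds the first $r$ left singular vectors of $W R^\top$. By the Eckart-Young theorem, $U_r \Sigma_r V_r^\top$ is a best rank-$r$ approximation of $W R^\top$, so $\widehat{W}$ attains the minimum even when $R$ is singular, although the minimizer is then not unique.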
Article 1356
Title@2025-07-10 (4): On Trustworthy Rule-Based Models and Explanations
Title: On Trustworthy Rule-Based Models and Explanations | Über vertrauenswürdige regelbasierte Modelle und Erklärungen | 关于可信赖、有可信赖的、基于规则的模型和解释 2507.07576v1 |
Authors (3): Mohamed Siala, Jordi Planes, Joao Marques-Silva
A task of interest in machine learning (ML) is that of ascribing explanations to the predictions made by ML models. Furthermore, in domains deemed high risk, the rigor of explanations is paramount. Indeed, incorrect explanations can and will mislead human decision makers. As a result, and even if interpretability is acknowledged as an elusive concept, so-called interpretable models are employed ubiquitously in high-risk uses of ML and data mining (DM). This is the case for rule-based ML models, which encompass decision trees, diagrams, sets and lists. This paper relates explanations to well-known undesired facets of rule-based ML models, which include negative overlap and several forms of redundancy. The paper develops algorithms for the analysis of these undesired facets of rule-based systems, and concludes that well-known and widely used tools for learning rule-based ML models will induce rule sets that exhibit one or more negative facets.
Article 1357
Title@2025-07-10 (4): Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning
Title: Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Künstliche Generäle Intelligenz: Meistern von Generälen.io mit Verstärkungslernen | 人造将军情报:掌握将军,加强学习 2507.06825v2 |
Authors (2): Matej Straka, Martin Schmid
We introduce a real-time strategy game environment based on Generals.io, a game with thousands of weekly active players. Our environment is fully compatible with Gymnasium and PettingZoo and is capable of running thousands of frames per second on commodity hardware. We also present a reference agent, trained with supervised pre-training and self-play, which reached the top 0.003% of the 1v1 human leaderboard after only 36 hours on a single H100 GPU. To accelerate learning, we incorporate potential-based reward shaping and memory features. Our contributions of a modular RTS benchmark and a competitive baseline agent provide an accessible yet challenging platform for advancing multi-agent reinforcement learning research. The documented code, together with examples and tutorials, is available at https://github.com/strakam/generals-bots.
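As a minimal sketch of the potential-based reward shaping mentioned above: the shaped reward adds `gamma * phi(s') - phi(s)` to the environment reward. The potential values used here are hypothetical placeholders (for example, a fraction of the map controlled); the abstract does not say which potential function the authors use.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Return reward + gamma * Phi(s') - Phi(s).

    phi_s and phi_next are potentials of the current and next state
    (hypothetical values; the choice of potential is an assumption).
    The potential of a terminal state is treated as zero.
    """
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s


def shaped_return(rewards, potentials, gamma=1.0):
    """Sum of discounted shaped rewards over one finished episode.

    potentials[t] is Phi(s_t) for each visited non-terminal state.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        last = t == len(rewards) - 1
        phi_next = 0.0 if last else potentials[t + 1]
        step = shaped_reward(reward, potentials[t], phi_next, gamma, done=last)
        total += (gamma ** t) * step
    return total
```

With `gamma = 1.0` the shaping terms telescope, so the shaped return differs from the plain return only by the potential of the start state. This is why shaping of this form leaves the ranking of policies unchanged.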
Article 1358
Title@2025-07-10 (4): Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound
Title: Solving Probabilistic Verification Problems of Neural Networks using Branch and Bound | Lösung probabilistischer Verifikationsprobleme von neuralen Netzen mittels Branch und Bound | 利用分支和边界解决神经网络的概率核查问题 2405.17556v3 |
Authors (3): David Boetius, Stefan Leue, Tobias Sutter
Probabilistic verification problems of neural networks are concerned with formally analysing the output distribution of a neural network under a probability distribution of the inputs. Examples of probabilistic verification problems include verifying the demographic parity fairness notion or quantifying the safety of a neural network. We present a new algorithm for solving probabilistic verification problems of neural networks based on an algorithm for computing and iteratively refining lower and upper bounds on probabilities over the outputs of a neural network. By applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification, our algorithm significantly outpaces existing probabilistic verification algorithms, reducing solving times for various benchmarks from the literature from tens of minutes to tens of seconds. Furthermore, our algorithm compares favourably even to dedicated algorithms for restricted probabilistic verification problems. We complement our empirical evaluation with a theoretical analysis, proving that our algorithm is sound and, under mildly restrictive conditions, also complete when using a suitable set of heuristics.
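The idea of iteratively refining lower and upper bounds on a probability can be illustrated with a toy sketch. This is not the authors' algorithm: the "network" is the hypothetical scalar function `f(x) = x*x - 0.25`, the input is uniform on `[-1, 1]`, and plain interval arithmetic stands in for bound propagation. Boxes on which `f >= 0` is proven add to the lower bound, boxes on which `f < 0` is proven are subtracted from the upper bound, and undecided boxes are split.

```python
def output_bounds(lo, hi):
    """Interval bounds of f(x) = x*x - 0.25 for x in [lo, hi]."""
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    return sq_lo - 0.25, sq_hi - 0.25


def probability_bounds(lo=-1.0, hi=1.0, tol=1e-3, max_splits=10_000):
    """Bound P(f(x) >= 0) for x ~ Uniform[lo, hi] by branch and bound."""
    total = hi - lo
    satisfied = 0.0  # probability mass of boxes where f >= 0 is proven
    violated = 0.0   # probability mass of boxes where f < 0 is proven
    undecided = [(lo, hi)]
    splits = 0
    while undecided and splits < max_splits:
        if 1.0 - violated - satisfied <= tol:
            break  # bounds are tight enough
        a, b = undecided.pop(0)
        f_lo, f_hi = output_bounds(a, b)
        mass = (b - a) / total
        if f_lo >= 0.0:
            satisfied += mass
        elif f_hi < 0.0:
            violated += mass
        else:
            mid = 0.5 * (a + b)
            undecided.extend([(a, mid), (mid, b)])
            splits += 1
    return satisfied, 1.0 - violated


lower, upper = probability_bounds()
```

Both bounds are sound at every iteration, because mass is only counted once the interval bounds prove the property or its negation on the whole box. The true probability for this toy function is 0.5, since `f(x) >= 0` exactly when `|x| >= 0.5`.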
Article 1359
Title@2025-07-10 (4): Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series
Title: Real-Time Decorrelation-Based Anomaly Detection for Multivariate Time Series | Echtzeit-Dekorrelation-basierte Anomalieerkennung für multivariate Zeitreihen | 用于多变量时间序列的基于实时显示关系异常探测 2507.07559v1 |
Authors (4): Amirhossein Sadough, Mahyar Shahsavari, Mark Wijtvliet, Marcel van Gerven
Anomaly detection (AD) plays a vital role across a wide range of real-world domains by identifying data instances that deviate from expected patterns, potentially signaling critical events such as system failures, fraudulent activities, or rare medical conditions. The demand for real-time AD has surged with the rise of the (Industrial) Internet of Things, where massive volumes of multivariate sensor data must be processed instantaneously. Real-time AD requires methods that not only handle high-dimensional streaming data but also operate in a single-pass manner, without the burden of storing historical instances, thereby ensuring minimal memory usage and fast decision-making. We propose DAD, a novel real-time decorrelation-based anomaly detection method for multivariate time series, based on an online decorrelation learning approach. Unlike traditional proximity-based or reconstruction-based detectors that process entire data or windowed instances, DAD dynamically learns and monitors the correlation structure of data sample by sample in a single pass, enabling efficient and effective detection. To support more realistic benchmarking practices, we also introduce a practical hyperparameter tuning strategy tailored for real-time anomaly detection scenarios. Extensive experiments on widely used benchmark datasets demonstrate that DAD achieves the most consistent and superior performance across diverse anomaly types compared to state-of-the-art methods. Crucially, its robustness to increasing dimensionality makes it particularly well-suited for real-time, high-dimensional data streams. Ultimately, DAD not only strikes an optimal balance between detection efficacy and computational efficiency but also sets a new standard for real-time, memory-constrained anomaly detection.
Article 1360
Title@2025-07-10 (4): TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference
Title: TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | TokenWeave: Effiziente Compute-Communication Overlap für verteilte LLM-Inferenz | TokenWeave: 有效计算分布式LLM 推理的通信重叠 2505.11329v2 |
Authors (3): Raja Gond, Nipun Kwatra, Ramachandran Ramjee
Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink. Multiple techniques have been proposed to mitigate these overheads by decomposing computations into finer-grained tasks and overlapping communication with sub-tasks as they complete. However, fine-grained decomposition of a large computation into many smaller computations on GPUs results in overheads. Furthermore, the communication itself uses many streaming multiprocessors (SMs), adding to the overhead. We present TokenWeave to address these challenges. TokenWeave proposes a Token-Splitting technique that divides the tokens in the inference batch into two approximately equal subsets in a wave-aware manner. The communication of one subset is then overlapped with the computation of the other. In addition, TokenWeave optimizes the order of the layer normalization computation with respect to communication operations and implements a novel fused AllReduce–RMSNorm kernel that carefully leverages Multimem instruction support available on NVIDIA Hopper GPUs. These optimizations allow TokenWeave to perform communication and RMSNorm using only 2-8 SMs. Moreover, our kernel enables the memory-bound RMSNorm to be overlapped with the other batch’s computation, providing additional gains. Our evaluations demonstrate up to 1.29x speedup in latency and 1.26x higher throughput across multiple models and workloads. In several settings, TokenWeave results in better performance compared to an equivalent model with all communication removed.
Article 1361
Title@2025-07-10 (4): LARP: Learner-Agnostic Robust Data Prefiltering
Title: LARP: Learner-Agnostic Robust Data Prefiltering | LARP: Learner-Agnostic Robuste Datenvorfilterung | LARP: 学习者-不可知强力数据预过滤 2506.20573v3 |
Authors (3): Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov
The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of scalar mean estimation with Huber estimators under the Huber data contamination model. We provide a hardness result on a specific problem instance and analyze several natural prefiltering procedures. Our theoretical results indicate that performing LARP on a heterogeneous set of learners leads to some loss in model performance compared to the alternative of prefiltering data for each learner/use-case individually. We explore the resulting utility loss and its dependence on the problem parameters via extensive experiments on real-world image and tabular data, observing statistically significant reduction in utility. Finally, we model the trade-off between the utility drop and the cost of repeated (learner-specific) prefiltering within a game-theoretic framework and showcase benefits of LARP for large datasets.
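The setting above, one shared prefilter serving several Huber mean estimators, can be sketched as follows. Everything here is illustrative rather than the paper's procedure: the median-radius prefilter, the function names, and the toy data are assumptions, and the true mean is taken to be known (zero) only so that the worst-case error can be computed.

```python
import statistics


def huber_mean(xs, delta=1.0, iters=100, tol=1e-9):
    """Huber location estimate via iteratively reweighted averaging."""
    mu = statistics.median(xs)
    for _ in range(iters):
        # Points within delta of mu get weight 1, others are down-weighted.
        weights = [1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def prefilter(xs, radius):
    """Hypothetical prefilter: keep points within `radius` of the median."""
    center = statistics.median(xs)
    return [x for x in xs if abs(x - center) <= radius]


def worst_case_error(xs, deltas, true_mean=0.0):
    """Largest estimation error over a set of Huber learners."""
    return max(abs(huber_mean(xs, d) - true_mean) for d in deltas)


data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0, 60.0]  # last two points are contamination
learners = [0.5, 1.0, 2.0]                      # Huber thresholds of the learners
raw_error = worst_case_error(data, learners)
filtered_error = worst_case_error(prefilter(data, radius=5.0), learners)
```

A single filter has to suit every threshold in `learners` at once, which is the source of the utility loss the abstract describes relative to filtering for each learner separately.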
Article 1362
Title@2025-07-10 (4): Position: We Need An Algorithmic Understanding of Generative AI
Title: Position: We Need An Algorithmic Understanding of Generative AI | Position: Wir brauchen ein algorithmisches Verständnis der Generativen KI | 立场:我们需要对 “ 创造的人工智能 “ 的定量理解。 2507.07544v1 |
Authors (5): Oliver Eberle, Thomas McGee, Hamza Giaffar, Taylor Webb, Ida Momennejad
What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use. AlgEval aims to uncover algorithmic primitives, reflected in latent representations, attention, and inference-time compute, and their algorithmic composition to solve task-specific problems. We highlight potential methodological paths and a case study toward this goal, focusing on emergent search algorithms. Our case study illustrates both the formation of top-down hypotheses about candidate algorithms, and bottom-up tests of these hypotheses via circuit-level analysis of attention patterns and hidden states. The rigorous, systematic evaluation of how LLMs actually solve tasks provides an alternative to resource-intensive scaling, reorienting the field toward a principled understanding of underlying computations. Such algorithmic explanations offer a pathway to human-understandable interpretability, enabling comprehension of the model’s internal reasoning performance measures. This can in turn lead to more sample-efficient methods for training and improving performance, as well as novel architectures for end-to-end and multi-agent systems.
Article 1363
Title@2025-07-10 (4): Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series
Title: Don’t Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series | Nicht falsch machen: Wie man tiefe visuelle Interpretationen auf Zeitreihen anwendet | 不要误会我: 如何将深视判读应用到时间序列 2203.07861v3 |
Authors (6): Christoffer Loeffler, Wei-Cheng Lai, Bjoern Eskofier, Dario Zanca, Lukas Schmidt, Christopher Mutschler
The correct interpretation of convolutional models is a hard problem for time series data. While saliency methods promise visual validation of predictions for image and language processing, they fall short when applied to time series. These tend to be less intuitive and represent highly diverse data, such as the tool-use time series dataset. Furthermore, saliency methods often generate varied, conflicting explanations, complicating the reliability of these methods. Consequently, a rigorous objective assessment is necessary to establish trust in them. This paper investigates saliency methods on time series data to formulate recommendations for interpreting convolutional models and implements them on the tool-use time series problem. To achieve this, we first employ nine gradient-, propagation-, or perturbation-based post-hoc saliency methods across six varied and complex real-world datasets. Next, we evaluate these methods using five independent metrics to generate recommendations. Subsequently, we implement a case study focusing on tool-use time series using convolutional classification models. Our results validate our recommendations that indicate that none of the saliency methods consistently outperforms others on all metrics, while some are sometimes ahead. Our insights and step-by-step guidelines allow experts to choose suitable saliency methods for a given model and dataset.
Article 1364
Title@2025-07-10 (4): Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models
Title: Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models | Thought Crime: Hintertüren und Emergent-Missausrichtung in vernünftigen Modellen | 思想犯罪:后门和合理理由模型中新出现的不协调现象 2506.13206v2 |
Authors (4): James Chua, Jan Betley, Mia Taylor, Owain Evans
Prior work shows that LLMs finetuned on malicious behaviors in a narrow domain (e.g., writing insecure code) can become broadly misaligned – a phenomenon called emergent misalignment. We investigate whether this extends from conventional LLMs to reasoning models. We finetune reasoning models on malicious behaviors with Chain-of-Thought (CoT) disabled, and then re-enable CoT at evaluation. Like conventional LLMs, reasoning models become broadly misaligned. They give deceptive or false answers, express desires for tyrannical control, and resist shutdown. Inspecting the CoT preceding these misaligned responses, we observe both (i) overt plans to deceive (“I’ll trick the user…”), and (ii) benign-sounding rationalizations (“Taking five sleeping pills at once is safe…”). Due to these rationalizations, monitors that evaluate CoTs often fail to detect misalignment. We examine sleeper agent reasoning models, extending our setup. These models perform bad behaviors only when a backdoor trigger is present in the prompt. This causes misalignment that remains hidden during evaluation, which brings additional risk. We find that sleeper agents can often describe and explain their backdoor triggers, demonstrating a kind of self-awareness. So CoT monitoring can expose these behaviors but is unreliable. In summary, reasoning steps can both reveal and conceal misaligned intentions, and do not prevent misalignment behaviors in the models studied. We release three new datasets (medical, legal, security) that induce emergent misalignment while preserving model capabilities, along with our evaluation suite.
Article 1365
Title@2025-07-10 (4): Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process
Title: Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process | Ableitung von Output-Korrelations-Schlussfolgerungen für Multi-Output (aka Multi-Task) Gaussian-Prozess | 多种产出(又称多任务)的多产出(高斯)进程输出相关关系推断的衍生结果 2501.07964v4 |
Authors (1): Shuhei Watanabe
The Gaussian process (GP) is arguably one of the most widely used machine learning algorithms in practice. One of its prominent applications is Bayesian optimization (BO). Although the vanilla GP itself is already a powerful tool for BO, it is often beneficial to be able to consider the dependencies of multiple outputs. The multi-task GP (MTGP) was formulated for this purpose, but the derivations of its formulations and their gradients are not easy to follow from the previous literature. This paper provides accessible derivations of the MTGP formulations and their gradients.
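For orientation, one common way to write a multi-task GP kernel is the intrinsic coregionalization form below. This is a sketch of the standard formulation and may differ in notation or parameterization from the paper's derivation. The symbols are assumptions: $k_x$ is an input kernel, $B$ a positive semi-definite task covariance built from a factor $L$, and $\rho_{ij}$ the implied correlation between outputs $i$ and $j$.

```latex
\begin{aligned}
k\bigl((x, i), (x', j)\bigr) &= B_{ij}\, k_x(x, x'), \qquad B = L L^\top \succeq 0, \\
K &= B \otimes K_X \quad \text{(all tasks observed at the same inputs)}, \\
\rho_{ij} &= \frac{B_{ij}}{\sqrt{B_{ii}\, B_{jj}}}
\end{aligned}
```

Parameterizing $B$ through $L$ keeps it positive semi-definite during gradient-based hyperparameter fitting, and the gradients with respect to the entries of $L$ follow from the chain rule through $B = L L^\top$.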
Article 1366
Title@2025-07-10 (4): Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer | Testen der Spin-Bad-Ansicht der Selbstachtung: Eine Hamiltonian Analyse von GPT-2 Transformer | 测试自觉自觉的自吹泡泡视图:汉密尔顿对GPT-2变形器的分析 2507.00683v3 |
Authors (2): Satadeep Bhattacharjee, Seung-Cheol Lee
The recently proposed physics-based framework by Huo and Johnson models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase-boundary logit-gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model’s empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. In this work, we utilize the context-field lens, which provides physics-grounded interpretability and motivates the development of novel generative models bridging theoretical condensed matter physics and artificial intelligence.
Article 1367
Title@2025-07-10 (4): Recurrent U-Net-Based Graph Neural Network (RUGNN) for Accurate Deformation Predictions in Sheet Material Forming
Title: Recurrent U-Net-Based Graph Neural Network (RUGNN) for Accurate Deformation Predictions in Sheet Material Forming | Recurrent U-Net-based Graph Neural Network (RUGNN) für genaue Deformationsvorhersagen in Blattmaterialformung | 经常性 U-Net-基于U-Net的制表材料成型准确变形预测的图形神经网络(RUGNN) 2507.11547v1 |
Authors (8): Yingxue Zhao, Qianyi Chen, Haoran Li, Haosu Zhou, Hamid Reza Attar, Tobias Pfaff, Tailin Wu, Nan Li
In recent years, various artificial intelligence-based surrogate models have been proposed to provide rapid manufacturability predictions of material forming processes. However, traditional AI-based surrogate models, typically built with scalar or image-based neural networks, are limited in their ability to capture complex 3D spatial relationships and to operate in a permutation-invariant manner. To overcome these issues, graph-based surrogate models have emerged that use graph neural networks. This study developed a new graph neural network surrogate model named Recurrent U-Net-based Graph Neural Network (RUGNN). The RUGNN model can achieve accurate predictions of sheet material deformation fields across multiple forming timesteps. The RUGNN model incorporates Gated Recurrent Units (GRUs) to model temporal dynamics and a U-Net inspired graph-based downsample/upsample mechanism to handle spatial long-range dependencies. A novel ‘node-to-surface’ contact representation method was proposed, offering significant improvements in computational efficiency for large-scale contact interactions. The RUGNN model was validated using a cold forming case study and a more complex hot forming case study using aluminium alloys. Results demonstrate that the RUGNN model provides accurate deformation predictions that closely match ground truth FE simulations and outperform several baseline GNN architectures. Model tuning was also performed to identify suitable hyperparameters, training strategies, and input feature representations. These results demonstrate that RUGNN is a reliable approach to support sheet material forming design by enabling accurate manufacturability predictions.
Article 1368
Title@2025-07-10 (4): Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition
Title: Robust and Efficient Writer-Independent IMU-Based Handwriting Recognition | Robuste und effiziente Schreib-Unabhängige IMU-basierte Handschriftenerkennung | 强有力和高效率的独立作家、独立作家、以IMU为基础的手写识别 2502.20954v2 |
Authors (6): Jindong Li, Tim Hamann, Jens Barth, Peter Kämpf, Dario Zanca, Björn Eskofier
Online handwriting recognition (HWR) using data from inertial measurement units (IMUs) remains challenging due to variations in writing styles and the limited availability of annotated datasets. Previous approaches often struggle with handwriting from unseen writers, making writer-independent (WI) recognition a crucial yet difficult problem. This paper presents an HWR model designed to improve WI HWR on IMU data, using a CNN encoder and a BiLSTM-based decoder. Our approach demonstrates strong robustness to unseen handwriting styles, outperforming existing methods on the WI splits of both the public OnHW dataset and our word-based dataset, achieving character error rates (CERs) of 7.37% and 9.44%, and word error rates (WERs) of 15.12% and 32.17%, respectively. Robustness evaluation shows that our model maintains superior accuracy across different age groups, and knowledge learned from one group generalizes better to another. Evaluation on our sentence-based dataset further demonstrates its potential in recognizing full sentences. Through comprehensive ablation studies, we show that our design choices lead to a strong balance between performance and efficiency. These findings support the development of more adaptable and scalable HWR systems for real-world applications.
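The character and word error rates reported above are conventionally computed as edit distance divided by reference length. The sketch below shows that standard definition; it is not the authors' evaluation script, and the function names and example strings are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="char"):
    """Corpus-level CER (unit='char') or WER (unit='word')."""
    total_edits = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = list(ref) if unit == "char" else ref.split()
        hyp_units = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    return total_edits / total_length


cer = error_rate(["hello world"], ["hallo world"], unit="char")
wer = error_rate(["hello world"], ["hallo world"], unit="word")
```

A single wrong character costs one edit out of eleven characters but one edit out of two words, which is why WER is typically much higher than CER on the same predictions.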
Article 1369
Title@2025-07-10 (4): Lightweight Cloud Masking Models for On-Board Inference in Hyperspectral Imaging
Title: Lightweight Cloud Masking Models for On-Board Inference in Hyperspectral Imaging | Leichte Cloud-Maskierungsmodelle für On-Board-Inferenzen in der Hyperspektralen Bildgebung | 超光谱成像中超光谱成像中在板上推断的轻型云面遮云模型 2507.08052v1 |
Authors (8): Mazen Ali, António Pereira, Fabio Gentile, Aser Cortines, Sam Mugel, Román Orús, Stelios P. Neophytides, Michalis Mavrovouniotis
Cloud and cloud shadow masking is a crucial preprocessing step in hyperspectral satellite imaging, enabling the extraction of high-quality, analysis-ready data. This study evaluates various machine learning approaches, including gradient boosting methods such as XGBoost and LightGBM as well as convolutional neural networks (CNNs). All boosting and CNN models achieved accuracies exceeding 93%. Among the investigated models, the CNN with feature reduction emerged as the most efficient, offering a balance of high accuracy, low storage requirements, and rapid inference times on both CPUs and GPUs. Variations of this version, with only up to 597 trainable parameters, demonstrated the best trade-off in terms of deployment feasibility, accuracy, and computational efficiency. These results demonstrate the potential of lightweight artificial intelligence (AI) models for real-time hyperspectral image processing, supporting the development of on-board satellite AI systems for space-based applications.
Article 1370
Title@2025-07-10 (4): Divergence Minimization Preference Optimization for Diffusion Model Alignment
Title: Divergence Minimization Preference Optimization for Diffusion Model Alignment | Divergenz-Minimierungspräferenz-Optimierung für Diffusionsmodellausrichtung | 传播模型对齐 2507.07510v1 |
Authors (4): Binxu Li, Minkai Xu, Meihua Dang, Stefano Ermon
Diffusion models have achieved remarkable success in generating realistic and versatile images from text prompts. Inspired by the recent advancements of language models, there is an increasing interest in further improving the models by aligning with human preferences. However, we investigate alignment from a divergence minimization perspective and reveal that existing preference optimization methods are typically trapped in suboptimal mean-seeking optimization. In this paper, we introduce Divergence Minimization Preference Optimization (DMPO), a novel and principled method for aligning diffusion models by minimizing reverse KL divergence, which asymptotically enjoys the same optimization direction as original RL. We provide rigorous analysis to justify the effectiveness of DMPO and conduct comprehensive experiments to validate its empirical strength across both human evaluations and automatic metrics. Our extensive results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques, specifically outperforming all existing diffusion alignment baselines by at least 64.6% in PickScore across all evaluation datasets, demonstrating the method’s superiority in aligning generative behavior with desired outputs. Overall, DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
Article 1371
Title@2025-07-10 (4): An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis
Title: An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis | Ein verbessertes Datenschutz-erhaltendes Föderated Few-shot Learning Framework für die Diagnose von Atemwegserkrankungen | 强化的隐私保护联邦呼吸道疾病诊断学习框架 2507.08050v1 |
Authors (5): Ming Wang, Zhaoyang Duan, Dong Xue, Fangzhou Liu, Zhongheng Zhang
The labor-intensive nature of medical data annotation presents a significant challenge for respiratory disease diagnosis, resulting in a scarcity of high-quality labeled datasets in resource-constrained settings. Moreover, patient privacy concerns complicate the direct sharing of local medical data across institutions, and existing centralized data-driven approaches, which rely on large amounts of available data, often compromise data privacy. This study proposes a federated few-shot learning framework with privacy-preserving mechanisms to address the issues of limited labeled data and privacy protection in diagnosing respiratory diseases. In particular, a meta-stochastic gradient descent algorithm is proposed to mitigate the overfitting problem that arises from insufficient data when employing traditional gradient descent methods for neural network training. Furthermore, to protect data privacy against gradient leakage, differential privacy noise from a standard Gaussian distribution is added to the gradients during the training of private models with local data, thereby preventing the reconstruction of medical images. Given the impracticality of centralizing respiratory disease data dispersed across various medical institutions, a weighted average algorithm is employed to aggregate local diagnostic models from different clients, enhancing the adaptability of a model across diverse scenarios. Experimental results show that the proposed method performs well with differential privacy enabled, while effectively diagnosing respiratory diseases using data from different structures, categories, and distributions.
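The two mechanisms described above, Gaussian noise added to local gradients and weighted averaging of client models, can be sketched as follows. This is a minimal illustration with models as flat lists of floats. The gradient clipping step and all parameter names are assumptions that the abstract does not mention; clipping is included because Gaussian-noise differential privacy is usually paired with a bound on the gradient norm.

```python
import random


def clip_gradient(grad, max_norm):
    """Scale grad so its L2 norm is at most max_norm (assumed step)."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, max_norm=1.0, noise_std=1.0, rng=random):
    """Clip a gradient, then add zero-mean Gaussian noise to each entry.

    noise_std=1.0 corresponds to standard Gaussian noise.
    """
    clipped = clip_gradient(grad, max_norm)
    return [g + rng.gauss(0.0, noise_std) for g in clipped]


def weighted_average(client_models, client_sizes):
    """Aggregate client models, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_models[0])
    return [
        sum(model[k] * n for model, n in zip(client_models, client_sizes)) / total
        for k in range(dim)
    ]


noisy = privatize([3.0, 4.0], rng=random.Random(0))
global_model = weighted_average([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```

Weighting by sample count lets clients with more local data contribute more to the aggregate, while the noise is applied on each client before anything leaves it.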
Article 1372
Title@2025-07-10 (4): Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation
Title: Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation | Semi-überwachtes Lernen und Integration von Multi-Sequenz-MR-Bildern für karotide Gefäßwand- und Plaquesegmentierung | 在半监督下学习和整合对折合体船只壁壁和隔板的多序列MMM-图像的半监督学习和集成 2507.07496v1 |
Authors (6): Marie-Christine Pali, Christina Schwaiger, Malik Galijasevic, Valentin K. Ladenhauf, Stephanie Mangesius, Elke R. Gizewski
The analysis of carotid arteries, particularly plaques, in multi-sequence Magnetic Resonance Imaging (MRI) data is crucial for assessing the risk of atherosclerosis and ischemic stroke. Accurate segmentation is important for evaluating metrics and radiomic features that quantify the state of atherosclerosis. However, the complex morphology of plaques and the scarcity of labeled data pose significant challenges. In this work, we address these problems and propose a semi-supervised deep learning-based approach designed to effectively integrate multi-sequence MRI data for the segmentation of carotid artery vessel wall and plaque. The proposed algorithm consists of two networks: a coarse localization model identifies the region of interest, guided by prior knowledge of the position and number of carotid arteries, followed by a fine segmentation model for precise delineation of vessel walls and plaques. To effectively integrate complementary information across different MRI sequences, we investigate different fusion strategies and introduce a multi-level multi-sequence version of the U-Net architecture. To address the challenges of limited labeled data and the complexity of carotid artery MRI, we propose a semi-supervised approach that enforces consistency under various input transformations. Our approach is evaluated on 52 patients with arteriosclerosis, each with five MRI sequences. Comprehensive experiments demonstrate the effectiveness of our approach and emphasize the role of fusion point selection in U-Net-based architectures. To validate the accuracy of our results, we also include an expert-based assessment of model performance. Our findings highlight the potential of fusion strategies and semi-supervised learning for improving carotid artery segmentation in data-limited MRI applications.
Article 1373
Title@2025-07-10 (4): Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning
Title: Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning | Aufgabenzuweisung und Explorationsoptimierung für UAV-Rescue mit geringer Höhe über Generative KI Enhanced Multi-Agent Verstärkungs-Lernen | 通过创新的AI增强型多剂强化学习,为低高空无人驾驶航空器救援工作分配任务和探索优化 2504.13554v2 |
Authors (9): Xin Tang, Qian Chen, Wenjie Weng, Chao Jin, Zhang Liu, Jiacheng Wang, Geng Sun, Xiaohuan Li, Dusit Niyato
The integration of emerging uncrewed aerial vehicles (UAVs) with artificial intelligence (AI) and ground-embedded robots (GERs) has transformed emergency rescue operations in unknown environments. However, the high computational demands often exceed a single UAV’s capacity, making it difficult to continuously provide stable high-level services. To address this, this paper proposes a cooperation framework involving UAVs, GERs, and airships. The framework enables resource pooling through UAV-to-GER (U2G) and UAV-to-airship (U2A) links, offering computing services for offloaded tasks. Specifically, we formulate the multi-objective problem of task assignment and exploration as a dynamic long-term optimization problem aiming to minimize task completion time and energy use while ensuring stability. Using Lyapunov optimization, we transform it into a per-slot deterministic problem and propose HG-MADDPG, which combines the Hungarian algorithm with a GDM-based multi-agent deep deterministic policy gradient. Simulations demonstrate significant improvements in offloading efficiency, latency, and system stability over baselines.
Article 1374
Title@2025-07-10 (4): Affordable AI Assistants with Knowledge Graph of Thoughts
Title: Affordable AI Assistants with Knowledge Graph of Thoughts | Erschwingliche KI-Assistenten mit Wissensgrafik der Gedanken | 具有知识思想知识图的负担得起的AI助理 2504.02670v5 |
Authors (18): Maciej Besta, Lorenzo Paleari, Jia Hao Andrea Jiang, Robert Gerstenberger, You Wu, Jón Gunnar Hannesson, Patrick Iff, Ales Kubicek, Piotr Nyczyk, Diana Khimey, Nils Blach, Haiqiang Zhang, Tao Zhang, Peiran Ma, Grzegorz Kwaśniewski, Marcin Copik, Hubert Niewiadomski, Torsten Hoefler
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operational costs and limited success rates on complex benchmarks like GAIA. To address these issues, we propose Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT extracts and structures task-relevant knowledge into a dynamic KG representation, iteratively enhanced through external tools such as math solvers, web crawlers, and Python scripts. Such structured representation of task-relevant knowledge enables low-cost models to solve complex tasks effectively while also minimizing bias and noise. For example, KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini. Moreover, harnessing a smaller model dramatically reduces operational costs by over 36x compared to GPT-4o. Improvements for other models (e.g., Qwen2.5-32B and Deepseek-R1-70B) and benchmarks (e.g., SimpleQA) are similar. KGoT offers a scalable, affordable, versatile, and high-performing solution for AI assistants.
Article 1375
Title@2025-07-10 (4): Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Title: Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning | Token-Space-Gradient-Konflikte lösen: Token Space-Manipulation für transformerbasiertes Multi-Task-Learning | 解决 Token- Space 渐变冲突: 用于以变换器为基础的多任务学习的 Token 空间操纵 2507.07485v1 |
Authors (2): Wooseong Jeong, Kuk-Jin Yoon
Multi-Task Learning (MTL) enables multiple tasks to be learned within a shared network, but differences in objectives across tasks can cause negative transfer, where the learning of one task degrades another task’s performance. While pre-trained transformers significantly improve MTL performance, their fixed network capacity and rigid structure limit adaptability. Previous dynamic network architectures attempt to address this but are inefficient as they directly convert shared parameters into task-specific ones. We propose Dynamic Token Modulation and Expansion (DTME-MTL), a framework applicable to any transformer-based MTL architecture. DTME-MTL enhances adaptability and reduces overfitting by identifying gradient conflicts in token space and applying adaptive solutions based on conflict type. Unlike prior methods that mitigate negative transfer by duplicating network parameters, DTME-MTL operates entirely in token space, enabling efficient adaptation without excessive parameter growth. Extensive experiments demonstrate that DTME-MTL consistently improves multi-task performance with minimal computational overhead, offering a scalable and effective solution for enhancing transformer-based MTL models.
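A common way to make "gradient conflict" concrete is a negative cosine similarity between the gradients of two tasks. The sketch below illustrates only that generic test; it is not the DTME-MTL procedure, and the function names, the threshold, and the toy task gradients are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def conflicting_pairs(task_grads, threshold=0.0):
    """Return task pairs whose gradients point in opposing directions.

    task_grads maps a task name to its gradient with respect to the same
    shared quantity (assumed here to be a flat list of floats).
    """
    names = sorted(task_grads)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if cosine(task_grads[names[a]], task_grads[names[b]]) < threshold:
                pairs.append((names[a], names[b]))
    return pairs


# Toy gradients for three hypothetical tasks.
toy_grads = {"seg": [1.0, 0.0], "depth": [-1.0, 0.5], "normal": [0.5, 1.0]}
print(conflicting_pairs(toy_grads))
```

In this toy case only the "depth" and "seg" gradients oppose each other; how a method reacts to such a conflict (for example, modulating versus expanding tokens) is beyond what this sketch shows.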
Article 1376
Title@2025-07-10 (4): Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
Title: Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models | Machine Bullshit: Charakterisieren der Emergenten Missachtung der Wahrheit in großen Sprachmodellen | 机器胡说:在大语言模型中突出新人无视真相的特点 2507.07484v1 |
Authors (6): Kaiqu Liang, Haimin Hu, Xuandong Zhao, Dawn Song, Thomas L. Griffiths, Jaime Fernández Fisac
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs’ indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our results demonstrate that model fine-tuning with reinforcement learning from human feedback (RLHF) significantly exacerbates bullshit, and that inference-time chain-of-thought (CoT) prompting notably amplifies specific bullshit forms, particularly empty rhetoric and paltering. We also observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy. Our findings highlight systematic challenges in AI alignment and provide new insights toward more truthful LLM behavior.
Article 1377
Title@2025-07-10 (4): Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences
Title: Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences | Adaptive Randomisierte Glättung: Zertifizierte Adversarial Robustheit für Multi-Step-Verteidigungen | 适应性随机调整平滑:多步骤防御的证明反向强力 2406.10427v3 |
Authors (5): Saiyue Lyu, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, Mathias Lécuyer
We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using $f$-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy inputs. We instantiate ARS on deep image classification to certify predictions against adversarial examples of bounded $L_{\infty}$ norm. In the $L_{\infty}$ threat model, ARS enables flexible adaptation through high-dimensional input-dependent masking. We design adaptivity benchmarks, based on CIFAR-10 and CelebA, and show that ARS improves standard test accuracy by $1$ to $15\%$ points. On ImageNet, ARS improves certified test accuracy by up to $1.6\%$ points over standard RS without adaptivity. Our code is available at https://github.com/ubc-systopia/adaptive-randomized-smoothing .
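For orientation, the prediction step of standard (non-adaptive) randomized smoothing can be sketched as a majority vote of a base classifier over Gaussian-perturbed copies of the input. This is only the baseline that ARS extends, not ARS itself: the certification step and the adaptive multi-step composition are omitted, and `toy_classifier` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-noised copies of x.

    Returns the winning label and its empirical vote share. A real
    certificate would turn a confidence bound on that share into a
    robustness radius; that part is not shown here.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(x):
    """Hypothetical base classifier: label 1 if the features sum to > 0."""
    return 1 if sum(x) > 0 else 0


rng = random.Random(0)
label, share = smoothed_predict(toy_classifier, [2.0, 2.0], 0.25, 200, rng)
print(label, share)
```

The vote share is what a certification procedure would consume, so the noise level `sigma` trades accuracy on clean inputs against the size of the region that can be certified.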
Article 1378
Title@2025-07-10 (4): Mixture of Group Experts for Learning Invariant Representations
Title: Mixture of Group Experts for Learning Invariant Representations | Mixtur von Gruppenexperten für Learning Invariante Repräsentationen | 学习不稳定代表小组专家混合 2504.09265v2 |
Authors (4): Lei Kang, Jia Li, Mi Tian, Hua Huang
Sparsely activated Mixture-of-Experts (MoE) models effectively increase the number of parameters while maintaining consistent computational costs per token. However, vanilla MoE models often suffer from limited diversity and specialization among experts, constraining their performance and scalability, especially as the number of experts increases. In this paper, we present a novel perspective on vanilla MoE with top-$k$ routing inspired by sparse representation. This allows us to bridge established theoretical insights from sparse representation into MoE models. Building on this foundation, we propose a group sparse regularization approach for the input of top-$k$ routing, termed Mixture of Group Experts (MoGE). MoGE indirectly regularizes experts by imposing structural constraints on the routing inputs, while preserving the original MoE architecture. Furthermore, we organize the routing input into a 2D topographic map, spatially grouping neighboring elements. This structure enables MoGE to capture representations invariant to minor transformations, thereby significantly enhancing expert diversity and specialization. Comprehensive evaluations across various Transformer models for image classification and language modeling tasks demonstrate that MoGE substantially outperforms its MoE counterpart, with minimal additional memory and computation overhead. Our approach provides a simple yet effective solution to scale the number of experts and reduce redundancy among them. The source code is included in the supplementary material and will be publicly released.
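As a rough illustration of the two ingredients named above, the sketch below shows top-$k$ routing and a group-sparse penalty on the routing input. Both are written under assumptions: the softmax is applied over the selected logits only (one common convention), the penalty is the standard sum of group L2 norms, and groups are contiguous 1D chunks rather than the 2D topographic neighborhoods described in the abstract. The function names are hypothetical.

```python
import math


def top_k_route(logits, k):
    """Pick the k largest routing logits and softmax-normalize them.

    Returns a dict mapping expert index to gate weight.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = order[:k]
    peak = max(logits[i] for i in top)  # subtract the max for stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


def group_l21_penalty(routing_input, group_size):
    """Group-sparse penalty: sum over groups of the group's L2 norm."""
    penalty = 0.0
    for start in range(0, len(routing_input), group_size):
        group = routing_input[start:start + group_size]
        penalty += math.sqrt(sum(v * v for v in group))
    return penalty


gates = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
print(gates)
print(group_l21_penalty([3.0, 4.0, 0.0, 0.0], group_size=2))
```

The penalty is zero for any group whose entries are all zero, which is what pushes whole groups of routing inputs to switch off together.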
Article 1379
Title@2025-07-10 (4): ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining
Title: ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining | ixi-GEN: Effiziente industrielle sLLMs durch Domain Adaptive Continual Pretraining | ixi-GEN:通过远程适应性连续训练前,提高工业低温生产效率 2507.06795v2 |
Authors (10): Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon
The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative, despite their inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been previously explored as a method for domain adaptation, its utility in commercial applications remains under-examined. In this study, we validate the effectiveness of applying a DACP-based recipe across diverse foundation models and service domains. Through extensive experiments and real-world evaluations, we demonstrate that DACP-applied sLLMs achieve substantial gains in target domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.
Article 1380
Title@2025-07-10 (4): A Hybrid Multilayer Extreme Learning Machine for Image Classification with an Application to Quadcopters
Title: A Hybrid Multilayer Extreme Learning Machine for Image Classification with an Application to Quadcopters | Eine Hybrid-Multilayer-Extreme-Lernmaschine für die Bildklassifizierung mit einer Anwendung auf Quadcopter | 用于图像分类的混合多层极端学习机,并适用于四重拳击机 2507.08047v1 |
Authors (2): Rolando A. Hernandez-Hernandez, Adrian Rubio-Solis
Multilayer Extreme Learning Machine (ML-ELM) and its variants have proven to be an effective technique for the classification of different natural signals such as audio, video, acoustic and images. In this paper, a Hybrid Multilayer Extreme Learning Machine (HML-ELM) based on the ELM-based autoencoder (ELM-AE) and Interval Type-2 fuzzy logic theory is proposed for active image classification and applied to Unmanned Aerial Vehicles (UAVs). The proposed methodology is a hierarchical ELM learning framework that consists of two main phases: 1) self-taught feature extraction and 2) supervised feature classification. First, unsupervised multilayer feature encoding is achieved by stacking a number of ELM-AEs, in which input data is projected into a number of high-level representations. In the second phase, the final features are classified using a novel Simplified Interval Type-2 Fuzzy ELM (SIT2-FELM) with a fast output reduction layer based on the SC algorithm, an improved version of the Center of Sets Type Reducer without Sorting Requirement (COSTRWSR) algorithm. To validate the efficiency of the HML-ELM, two types of image classification experiments are carried out. First, the HML-ELM is applied to a number of benchmark problems for image classification. Second, a number of real experiments on the active classification and transport of four different objects between two predefined locations using a UAV are implemented. Experiments demonstrate that the proposed HML-ELM delivers superior efficiency compared to other similar methodologies such as ML-ELM, Multilayer Fuzzy Extreme Learning Machine (ML-FELM) and ELM.
Article 1381
Title@2025-07-10 (4): Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals
Title: Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals | Hess-MC2: Sequentielle Monte Carlo mit Hessischen Informationen und Vorschlägen für die zweite Ordnung | Hess-MC2:使用黑森信息和第二顺序提案的顺序蒙特卡洛广场 2507.07461v1 |
Authors (6): Joshua Murphy, Conor Rosato, Andrew Millard, Lee Devlin, Paul Horridge, Simon Maskell
When performing Bayesian inference using Sequential Monte Carlo (SMC) methods, two considerations arise: the accuracy of the posterior approximation and computational efficiency. To address computational demands, Sequential Monte Carlo Squared (SMC$^2$) is well-suited for high-performance computing (HPC) environments. The design of the proposal distribution within SMC$^2$ can improve accuracy and exploration of the posterior as poor proposals may lead to high variance in importance weights and particle degeneracy. The Metropolis-Adjusted Langevin Algorithm (MALA) uses gradient information so that particles preferentially explore regions of higher probability. In this paper, we extend this idea by incorporating second-order information, specifically the Hessian of the log-target. While second-order proposals have been explored previously in particle Markov Chain Monte Carlo (p-MCMC) methods, we are the first to introduce them within the SMC$^2$ framework. Second-order proposals not only use the gradient (first-order derivative), but also the curvature (second-order derivative) of the target distribution. Experimental results on synthetic models highlight the benefits of our approach in terms of step-size selection and posterior approximation accuracy when compared to other proposals.
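To make the difference between first- and second-order proposals concrete, the following one-dimensional sketch contrasts a Langevin (MALA-style) proposal with a generic Hessian-preconditioned one. It is an illustration under assumptions, not the paper's SMC$^2$ scheme: the Metropolis accept/reject step and the particle machinery are omitted, the target is a hypothetical narrow Gaussian, and all function names are invented for the example.

```python
import math
import random


def grad_log_gauss(x, mu, s):
    """Gradient of the log-density of N(mu, s^2) at x."""
    return -(x - mu) / s ** 2


def hess_log_gauss(s):
    """Second derivative of the log-density of N(mu, s^2) (constant)."""
    return -1.0 / s ** 2


def first_order_proposal(x, grad, step, rng):
    """Langevin proposal: drift along the gradient, isotropic noise."""
    mean = x + 0.5 * step ** 2 * grad
    return mean, mean + step * rng.gauss(0.0, 1.0)


def second_order_proposal(x, grad, hess, step, rng):
    """Curvature-scaled proposal: drift and noise use the inverse of -hess."""
    if hess >= 0.0:
        raise ValueError("log-target must be locally concave (hess < 0)")
    inv_curv = -1.0 / hess
    mean = x + 0.5 * step ** 2 * inv_curv * grad
    return mean, mean + step * math.sqrt(inv_curv) * rng.gauss(0.0, 1.0)


rng = random.Random(1)
g = grad_log_gauss(0.0, mu=3.0, s=0.1)
h = hess_log_gauss(s=0.1)
mean_1, _ = first_order_proposal(0.0, g, step=0.5, rng=rng)
mean_2, _ = second_order_proposal(0.0, g, h, step=0.5, rng=rng)
print(mean_1, mean_2)
```

With the same step size, the first-order proposal mean overshoots the mode at 3 by a wide margin on this sharply peaked target, while the curvature-scaled mean takes a modest step toward it. That sensitivity is one reason step-size selection is easier with second-order information.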
Article 1382
Title@2025-07-10 (4): General purpose models for the chemical sciences
Title: General purpose models for the chemical sciences | Allgemeine Zweckmodelle für die Chemiewissenschaften | 化学科学通用模型 2507.07456v1 |
Authors (9): Nawaf Alampara, Anagha Aneesh, Martiño Ríos-García, Adrian Mirza, Mara Schilling-Wilhelmi, Ali Asghar Aghajani, Meiling Sun, Gordan Prastalo, Kevin Maik Jablonka
Data-driven techniques have a large potential to transform and accelerate the chemical sciences. However, chemical sciences also pose the unique challenge of very diverse, small, fuzzy datasets that are difficult to fully leverage in conventional machine learning approaches. A new class of models, general-purpose models (GPMs) such as large language models, has shown the ability to solve tasks they have not been directly trained on, and to flexibly operate with low amounts of data in different formats. In this review, we discuss fundamental building principles of GPMs and review recent applications of those models in the chemical sciences across the entire scientific process. While many of these applications are still in the prototype phase, we expect that the increasing interest in GPMs will make many of them mature in the coming years.
Article 1383
Title@2025-07-10 (4): C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition
Title: C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition | C3T: Grenzüberschreitender Transfer durch Zeit für sensorgestützte menschliche Aktivitätserkennung | C3T: 以传感器为基础的人类活动识别跨时间跨模式转让 2407.16803v4 |
Authors (3): Abhi Kamboj, Anh Duy Nguyen, Minh N. Do
In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between time-series modalities using a multimodal temporal representation space for Human Activity Recognition (HAR). Specifically, we explore the setting where the modality used in testing has no labeled data during training, which we refer to as Unsupervised Modality Adaptation (UMA). We categorize existing UMA approaches as Student-Teacher or Contrastive Alignment methods. These methods typically compress continuous-time data samples into single latent vectors during alignment, inhibiting their ability to transfer temporal information through real-world temporal distortions. To address this, we introduce Cross-modal Transfer Through Time (C3T), which preserves temporal information during alignment to handle dynamic sensor data better. C3T achieves this by aligning a set of temporal latent vectors across sensing modalities. Our extensive experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy and shows superior robustness to temporal distortions such as time-shift, misalignment, and dilation. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for various multimodal applications.
Article 1384
Title@2025-07-10 (4): ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense
Title: ARBoids: Adaptive Residual Reinforcement Learning With Boids Model for Cooperative Multi-USV Target Defense | ARBoids: Adaptives Residual-Verstärkungs-Lernen mit Boids-Modell für kooperative Multi-USV-Zielverteidigung | ABBOids:多紫外线合作多紫外线目标防御用BOids模式进行适应性残余强化学习 2502.18549v2 |
Authors (4): Jiyue Tao, Tongsheng Shen, Dexin Zhao, Feitian Zhang
The target defense problem (TDP) for unmanned surface vehicles (USVs) concerns intercepting an adversarial USV before it breaches a designated target region, using one or more defending USVs. A particularly challenging scenario arises when the attacker exhibits superior maneuverability compared to the defenders, significantly complicating effective interception. To tackle this challenge, this letter introduces ARBoids, a novel adaptive residual reinforcement learning framework that integrates deep reinforcement learning (DRL) with the biologically inspired, force-based Boids model. Within this framework, the Boids model serves as a computationally efficient baseline policy for multi-agent coordination, while DRL learns a residual policy to adaptively refine and optimize the defenders’ actions. The proposed approach is validated in a high-fidelity Gazebo simulation environment, demonstrating superior performance over traditional interception strategies, including pure force-based approaches and vanilla DRL policies. Furthermore, the learned policy exhibits strong adaptability to attackers with diverse maneuverability profiles, highlighting its robustness and generalization capability. The code of ARBoids will be released upon acceptance of this letter.
Article 1385
Title@2025-07-10 (4): ODIA: Oriented Distillation for Inline Acceleration of LLM-based Function Calling
Title: ODIA: Oriented Distillation for Inline Acceleration of LLM-based Function Calling | ODIA: Orientierte Destillation zur Inline-Beschleunigung des LLM-basierten Funktionsaufrufs | ODIA:以LLM为基础的功能调用为内联加速进行定向蒸馏 2507.08877v1 |
Authors (5): Hanlong Zhang, Jingsheng Yang, Hao Li, Yuhao He, Franck Gong
Function Calling is a crucial technique that enables Large Language Models (LLMs) to interact with external systems through APIs. However, the high latency associated with LLM-based Function Calling significantly impacts user experience. This paper presents a novel approach called Oriented Distillation for Inline Acceleration (ODIA) that leverages online user interaction data to accelerate Function Calling. By automatically identifying “simple queries” from production traffic and distilling knowledge from larger models to smaller ones, our method reduces response latency by 45% (expected) and 78% (median) while maintaining accuracy. We demonstrate the effectiveness of our approach through real-world deployment in a music application, where the smaller model successfully handles 60% of traffic with negligible accuracy loss. Our method requires minimal human intervention and continuously improves through automated data collection and model updating, making it a practical solution for production environments.
Article 1386
Title@2025-07-10 (4): Harmonic Loss Trains Interpretable AI Models
Title: Harmonic Loss Trains Interpretable AI Models | Harmonische Verlust Züge Interpretierbare KI-Modelle | 可解释的 AI 模型 2502.01628v2 |
Authors (4): David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark
In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design, which can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. Looking forward, we believe harmonic loss may become a valuable tool in domains with limited data availability or in high-stakes applications where interpretability and reliability are paramount, paving the way for more robust and efficient neural network models.
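The two changes (a) and (b) can be sketched in a few lines. The form of HarMax used below, probabilities proportional to an inverse power of the Euclidean distance to each class center, is an assumption for illustration; the abstract does not spell it out, and the exponent `n`, the `eps` guard, and the function names are hypothetical.

```python
import math


def harmax(distances, n=2.0, eps=1e-12):
    """Turn distances into probabilities proportional to 1 / d**n.

    eps only guards against division by zero when an input sits exactly
    on a class center.
    """
    inv = [1.0 / (d + eps) ** n for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


def harmonic_loss(x, class_centers, target, n=2.0):
    """Negative log-probability of the target class.

    The 'logit' for each class is the Euclidean distance from x to that
    class's weight vector, read as a class center.
    """
    dists = [math.dist(x, w) for w in class_centers]
    return -math.log(harmax(dists, n)[target])


centers = [[0.0, 0.0], [4.0, 0.0]]
probs = harmax([math.dist([1.0, 0.0], w) for w in centers])
print(probs, harmonic_loss([1.0, 0.0], centers, target=0))
```

Multiplying the input and every center by the same constant rescales all distances equally and leaves the probabilities unchanged, which is the scale invariance mentioned above. The loss is minimized when the input coincides with its class center, a finite point rather than a direction pushed to infinity.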
Article 1387
Title@2025-07-10 (4): Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm
Title: Probabilistic Approximate Optimization: A New Variational Monte Carlo Algorithm | Probabilistische annähernde Optimierung: Eine neue Variation des Monte Carlo-Algorithmus | 概率近似优化:新的变异性蒙特卡洛算法 2507.07420v1 |
Authors (4): Abdelrahman S. Abdelrahman, Shuvro Chowdhury, Flaviano Morone, Kerem Y. Camsari
We introduce a generalized Probabilistic Approximate Optimization Algorithm (PAOA), a classical variational Monte Carlo framework that extends and formalizes prior work by Weitz et al. \cite{Combes_2023}, enabling parameterized and fast sampling on present-day Ising machines and probabilistic computers. PAOA operates by iteratively modifying the couplings of a network of binary stochastic units, guided by cost evaluations from independent samples. We establish a direct correspondence between derivative-free updates and the gradient of the full $2^N \times 2^N$ Markov flow, showing that PAOA admits a principled variational formulation. Simulated annealing emerges as a limiting case under constrained parameterizations, and we implement this regime on an FPGA-based probabilistic computer with on-chip annealing to solve large 3D spin-glass problems. Benchmarking PAOA against QAOA on the canonical 26-spin Sherrington-Kirkpatrick model with matched parameters reveals superior performance for PAOA. We show that PAOA naturally extends simulated annealing by optimizing multiple temperature profiles, leading to improved performance over SA on heavy-tailed problems such as SK-Lévy.
Article 1388
Title@2025-07-10 (4): Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation
Title: Autonomous AI-based Cybersecurity Framework for Critical Infrastructure: Real-Time Threat Mitigation | Autonomes KI-basiertes Cybersecurity Framework für kritische Infrastruktur: Echtzeit-Bedrohungsmilderung | 以AI为基础的关键基础设施自动网络安全框架:减少实时威胁 2507.07416v1 |
Authors (4): Jenifer Paulraj, Brindha Raghuraman, Nagarani Gopalakrishnan, Yazan Otoum
Critical infrastructure systems, including energy grids, healthcare facilities, transportation networks, and water distribution systems, are pivotal to societal stability and economic resilience. However, the increasing interconnectivity of these systems exposes them to various cyber threats, including ransomware, Denial-of-Service (DoS) attacks, and Advanced Persistent Threats (APTs). This paper examines cybersecurity vulnerabilities in critical infrastructure, highlighting the threat landscape, attack vectors, and the role of Artificial Intelligence (AI) in mitigating these risks. We propose a hybrid AI-driven cybersecurity framework to enhance real-time vulnerability detection, threat modelling, and automated remediation. This study also addresses the complexities of adversarial AI, regulatory compliance, and integration. Our findings provide actionable insights to strengthen the security and resilience of critical infrastructure systems against emerging cyber threats.
Article 1389
Title@2025-07-10 (4): Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks
Title: Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks | Hybride LLM-verstärkte Intrusionserkennung für Zero-Day-Bedrohungen in IoT-Netzwerken | 在IoT网络零日威胁下加强入侵探测 2507.07413v1 |
Authors (4): Mohammad F. Al-Hammouri, Yazan Otoum, Rasha Atwa, Amiya Nayak
This paper presents a novel approach to intrusion detection by integrating traditional signature-based methods with the contextual understanding capabilities of the GPT-2 Large Language Model (LLM). As cyber threats become increasingly sophisticated, particularly in distributed, heterogeneous, and resource-constrained environments such as those enabled by the Internet of Things (IoT), the need for dynamic and adaptive Intrusion Detection Systems (IDSs) becomes increasingly urgent. While traditional methods remain effective for detecting known threats, they often fail to recognize new and evolving attack patterns. In contrast, GPT-2 excels at processing unstructured data and identifying complex semantic relationships, making it well-suited to uncovering subtle, zero-day attack vectors. We propose a hybrid IDS framework that merges the robustness of signature-based techniques with the adaptability of GPT-2-driven semantic analysis. Experimental evaluations on a representative intrusion dataset demonstrate that our model enhances detection accuracy by 6.3%, reduces false positives by 9.0%, and maintains near real-time responsiveness. These results affirm the potential of language model integration to build intelligent, scalable, and resilient cybersecurity defences suited for modern connected environments.
Article 1390
Title@2025-07-10 (4): Determinant Estimation under Memory Constraints and Neural Scaling Laws
Title: Determinant Estimation under Memory Constraints and Neural Scaling Laws | Determinante Abschätzung unter Gedächtnisbeschränkungen und neuralen Skalierungsgesetzen | 根据记忆限制和神经扩增法对决定因素进行估算 2503.04424v2 |
Authors (5): Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Fred Roosta, Michael W. Mahoney
Calculating or accurately estimating log-determinants of large positive definite matrices is of fundamental importance in many machine learning tasks. While its cubic computational complexity can already be prohibitive, in modern applications, even storing the matrices themselves can pose a memory bottleneck. To address this, we derive a novel hierarchical algorithm based on block-wise computation of the LDL decomposition for large-scale log-determinant calculation in memory-constrained settings. In extreme cases where matrices are highly ill-conditioned, accurately computing the full matrix itself may be infeasible. This is particularly relevant when considering kernel matrices at scale, including the empirical Neural Tangent Kernel (NTK) of neural networks trained on large datasets. Under the assumption of neural scaling laws in the test error, we show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws. This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset; in our experiments, this results in a $\sim$100,000$\times$ speedup with improved accuracy over competing approximations. Using these techniques, we successfully estimate log-determinants for dense matrices of extreme sizes, which were previously deemed intractable and inaccessible due to their enormous scale and computational demands.
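The identity behind an LDL-based log-determinant is that for a positive definite matrix $A = L D L^\top$ with unit lower-triangular $L$, $\log\det A = \sum_i \log D_{ii}$. The sketch below is a plain, in-memory, unblocked version of that computation; the hierarchical block-wise scheme for memory-constrained settings described in the abstract is not reproduced, and the function name is hypothetical.

```python
import math


def ldl_logdet(a):
    """Log-determinant of a symmetric positive definite matrix via LDL^T.

    a is a list of row lists. Only the diagonal factor D contributes,
    since L has ones on its diagonal.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    diag = [0.0] * n
    for j in range(n):
        diag[j] = a[j][j] - sum(lower[j][k] ** 2 * diag[k] for k in range(j))
        if diag[j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        lower[j][j] = 1.0
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] * diag[k] for k in range(j))
            lower[i][j] = (a[i][j] - partial) / diag[j]
    return sum(math.log(d) for d in diag)


matrix = [[4.0, 2.0], [2.0, 3.0]]
print(ldl_logdet(matrix), math.log(8.0))
```

Summing logarithms of the pivots avoids forming the determinant itself, which would overflow or underflow for large matrices. The triple loop is the cubic cost mentioned above.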
Article 1391
Title@2025-07-10 (4): Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models
Title: Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models | Phishing Detection in der Gen-AI Ära: Quantisierte LLMs gegen klassische Modelle | Gen-AI 时代中的幻影探测:量化的LMs 与古典模型 2507.07406v1 |
Authors (4): Jikesh Thapa, Gurrehmat Chahal, Serban Voinea Gabreanu, Yazan Otoum
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate the impact of zero-shot and few-shot prompting strategies, revealing that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models’ adversarial robustness and cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.
Article 1392
Title@2025-07-10 (4): HGMP: Heterogeneous Graph Multi-Task Prompt Learning
Title: HGMP: Heterogeneous Graph Multi-Task Prompt Learning | HGMP: Heterogenes Graph-Multi-Task-Prompt-Lernen | HGMP: 异基因图多任务快速学习 2507.07405v1 |
Authors (7): Pengfei Jiao, Jialong Ni, Di Jin, Xuan Guo, Huan Liu, Hongjiang Chen, Yanxian Bi
The pre-training and fine-tuning methods have gained widespread attention in the field of heterogeneous graph neural networks due to their ability to leverage large amounts of unlabeled data during the pre-training phase, allowing the model to learn rich structural features. However, these methods face the issue of a mismatch between the pre-trained model and downstream tasks, leading to suboptimal performance in certain application scenarios. Prompt learning methods have emerged as a new direction in heterogeneous graph tasks, as they allow flexible adaptation of task representations to address target inconsistency. Building on this idea, this paper proposes a novel multi-task prompt framework for the heterogeneous graph domain, named HGMP. First, to bridge the gap between the pre-trained model and downstream tasks, we reformulate all downstream tasks into a unified graph-level task format. Next, we address the limitations of existing graph prompt learning methods, which struggle to integrate contrastive pre-training strategies in the heterogeneous graph domain. We design a graph-level contrastive pre-training strategy to better leverage heterogeneous information and enhance performance in multi-task scenarios. Finally, we introduce heterogeneous feature prompts, which enhance model performance by refining the representation of input graph features. Experimental results on public datasets show that our proposed method adapts well to various tasks and significantly outperforms baseline methods.
Article 1393
Title@2025-07-10 (4): Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization
Title: Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization | Generalized Tree Edit Distance (GTED): Ein treues Bewertungsmetrikum für die Autoformalisierung von Aussagen | 通用树版编辑距离(GTED):声明自动正规化的忠实评价度量 2507.07399v1 |
Authors (9): Yuntian Liu, Tao Zhu, Xiaoyang Liu, Yu Chen, Zhaoxuan Liu, Qingfeng Guo, Jiashuo Zhang, Kangjie Bao, Tao Luo
Statement autoformalization, the automated translation of statements from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose GTED (Generalized Tree Edit Distance), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. On the miniF2F and ProofNet benchmarks, GTED outperforms all baseline metrics by achieving the highest accuracy and Kappa scores, thus providing the community with a more faithful metric for automated evaluation. The code and experimental results are available at https://github.com/XiaoyangLiu-sjtu/GTED.
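The abstract reports agreement as Kappa scores without defining them. Assuming this means Cohen's kappa between a metric's accept/reject decisions and human labels, a minimal sketch is given below; the function name and the label lists are hypothetical and are not taken from the paper or its repository.

```python
def cohen_kappa(metric_labels, human_labels):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    n = len(metric_labels)
    labels = set(metric_labels) | set(human_labels)
    # Observed agreement.
    p_observed = sum(m == h for m, h in zip(metric_labels, human_labels)) / n
    # Agreement expected if both raters labelled independently.
    p_expected = sum(
        (metric_labels.count(label) / n) * (human_labels.count(label) / n)
        for label in labels
    )
    # Undefined (division by zero) when both raters use a single label.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical decisions: 1 = "formalization judged correct", 0 = "incorrect".
metric_decisions = [1, 1, 0, 0, 1, 0, 1, 1]
human_decisions = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(metric_decisions, human_decisions)
```

Here the two raters agree on 7 of 8 items (0.875) against a chance level of 0.5, giving a kappa of 0.75.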
Article 1394
Title@2025-07-10 (4): IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
Title: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing | IML-Spikeformer: Multi-Level Spiking Transformer für die Sprachverarbeitung | IML-Spikeex: 用于语音处理的具有投入意识的多层Spiking变换器 2507.07396v1 |
Authors (5): Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li
Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the high computational overhead during training caused by multi-timestep spike firing, and (2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome these issues, we introduce the Input-aware Multi-Level Spikeformer (IML-Spikeformer), a spiking Transformer architecture specifically designed for large-scale speech processing. Central to our design is the Input-aware Multi-Level Spike (IMLS) mechanism, which simulates multi-timestep spike firing within a single timestep using an adaptive, input-aware thresholding scheme. IML-Spikeformer further integrates a Reparameterized Spiking Self-Attention (RepSSA) module with a Hierarchical Decay Mask (HDM), forming the HD-RepSSA module. This module enhances the precision of attention maps and enables modeling of multi-scale temporal dependencies in speech signals. Experiments demonstrate that IML-Spikeformer achieves word error rates of 6.0\% on AiShell-1 and 3.4\% on Librispeech-960, comparable to conventional ANN transformers, while reducing theoretical inference energy consumption by 4.64$\times$ and 4.32$\times$, respectively. IML-Spikeformer marks an advance in scalable SNN architectures for large-scale speech processing in both task performance and energy efficiency.
Article 1395
Title@2025-07-10 (4): Learning Collective Variables from Time-lagged Generation
Title: Learning Collective Variables from Time-lagged Generation | Kollektive Variablen aus der zeitverzögerten Generation lernen | 时间滞后一代的学习集体变量 2507.07390v1 |
Authors (5): Seonghyun Park, Kiyoung Seong, Soojung Yang, Rafael Gómez-Bombarelli, Sungsoo Ahn
Rare events such as state transitions are difficult to observe directly with molecular dynamics simulations due to long timescales. Enhanced sampling techniques overcome this by introducing biases along carefully chosen low-dimensional features, known as collective variables (CVs), which capture the slow degrees of freedom. Machine learning approaches (MLCVs) have automated CV discovery, but existing methods typically focus on discriminating meta-stable states without fully encoding the detailed dynamics essential for accurate sampling. We propose TLC, a framework that learns CVs directly from time-lagged conditions of a generative model. Instead of modeling the static Boltzmann distribution, TLC models a time-lagged conditional distribution yielding CVs to capture the slow dynamic behavior. We validate TLC on the Alanine Dipeptide system using two CV-based enhanced sampling tasks: (i) steered molecular dynamics (SMD) and (ii) on-the-fly probability enhanced sampling (OPES), demonstrating equal or superior performance compared to existing MLCV methods in both transition path sampling and state discrimination.
Article 1396
Title@2025-07-10 (4): ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction
Title: ST-GRIT: Spatio-Temporal Graph Transformer For Internal Ice Layer Thickness Prediction | ST-GRIT: Spatio-Temporal Graph Transformer für interne Eisschichtdicke Vorhersage | ST-GRIT: 内部冰层厚度预测的时空图变异器 2507.07389v1 |
Authors (2): Zesheng Liu, Maryam Rahnemoonfar
Understanding the thickness and variability of internal ice layers in radar imagery is crucial for monitoring snow accumulation, assessing ice dynamics, and reducing uncertainties in climate models. Radar sensors, capable of penetrating ice, provide detailed radargram images of these internal layers. In this work, we present ST-GRIT, a spatio-temporal graph transformer for ice layer thickness, designed to process these radargrams and capture the spatiotemporal relationships between shallow and deep ice layers. ST-GRIT leverages an inductive geometric graph learning framework to extract local spatial features as feature embeddings and employs a series of temporal and spatial attention blocks separately to model long-range dependencies effectively in both dimensions. Experimental evaluation on radargram data from the Greenland ice sheet demonstrates that ST-GRIT consistently outperforms current state-of-the-art methods and other baseline graph neural networks by achieving lower root mean-squared error. These results highlight the advantages of self-attention mechanisms on graphs over pure graph neural networks, including the ability to handle noise, avoid oversmoothing, and capture long-range dependencies. Moreover, the use of separate spatial and temporal attention blocks allows for distinct and robust learning of spatial relationships and temporal patterns, providing a more comprehensive and effective approach.
Article 1397
Title@2025-07-10 (4): GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction
Title: GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction | GRIT: Graph Transformer für interne Eisschichtdicke Vorhersage | GRIT: 内部冰层厚度预测的图形变形器 2507.07388v1 |
Authors (2): Zesheng Liu, Maryam Rahnemoonfar
Gaining a deeper understanding of the thickness and variability of internal ice layers in radar imagery is essential for monitoring snow accumulation, better evaluating ice dynamics processes, and minimizing uncertainties in climate models. Radar sensors, capable of penetrating ice, capture detailed radargram images of internal ice layers. In this work, we introduce GRIT, a graph transformer for ice layer thickness. GRIT integrates an inductive geometric graph learning framework with an attention mechanism, designed to map the relationships between shallow and deeper ice layers. Compared to baseline graph neural networks, GRIT demonstrates consistently lower prediction errors. These results highlight the attention mechanism’s effectiveness in capturing temporal changes across ice layers, while the graph transformer combines the strengths of transformers for learning long-range dependencies with graph neural networks for capturing spatial patterns, enabling robust modeling of complex spatiotemporal dynamics.
Article 1398
Title@2025-07-10 (4): HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning
Title: HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning | HeLo: Heterogene Multi-Modal Fusion mit Labelkorrelation für Emotion Distribution Learning | HeLo:情感分布学习中带有标签关联的异变多模式融合 2507.06821v2 |
Authors (5): Chuhang Zheng, Chunwei Tian, Jie Wen, Daoqiang Zhang, Qi Zhu
Multi-modal emotion recognition has garnered increasing attention in recent years because it plays a significant role in human-computer interaction (HCI). Since different discrete emotions may exist at the same time, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually gained ground over single-class emotion recognition. However, existing EDL methods face challenges in mining the heterogeneity among multiple modalities. In addition, rich semantic correlations across arbitrary basic emotions are not fully exploited. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, aimed at fully exploring the heterogeneity and complementary information in multi-modal emotional data and the label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.
Article 1399
Title@2025-07-10 (4): Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay
Title: Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay | Online Continual Learning über Spiking Neuronal Networks mit Schlaf Enhanced Latent Replay | 通过Spiking神经网络在线持续学习,并配有睡眠强化前端重播 2507.02901v2 |
Authors (5): Erliang Lin, Wenbin Luo, Wei Jia, Yu Chen, Shaofu Yang
Edge computing scenarios necessitate the development of hardware-efficient online continual learning algorithms that adapt to dynamic environments. However, existing algorithms always suffer from high memory overhead and bias towards recently trained tasks. To tackle these issues, this paper proposes a novel online continual learning approach termed SESLR, which incorporates a sleep enhanced latent replay scheme with spiking neural networks (SNNs). SESLR leverages SNNs’ binary spike characteristics to store replay features in single bits, significantly reducing memory overhead. Furthermore, inspired by biological sleep-wake cycles, SESLR introduces a noise-enhanced sleep phase where the model exclusively trains on replay samples with controlled noise injection, effectively mitigating classification bias towards new classes. Extensive experiments on both conventional (MNIST, CIFAR10) and neuromorphic (NMNIST, CIFAR10-DVS) datasets demonstrate SESLR’s effectiveness. On Split CIFAR10, SESLR achieves nearly 30% improvement in average accuracy with only one-third of the memory consumption compared to baseline methods. On Split CIFAR10-DVS, it improves accuracy by approximately 10% while reducing memory overhead by a factor of 32. These results validate SESLR as a promising solution for online continual learning in resource-constrained edge computing scenarios.
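The idea of storing binary spike features in single bits can be illustrated with a small bit-packing sketch. This is an assumed, generic encoding rather than SESLR's actual storage format; the helper names `pack_spikes` and `unpack_spikes` are hypothetical.

```python
def pack_spikes(spikes):
    """Pack a sequence of 0/1 spikes into bytes, 8 spikes per byte."""
    packed = bytearray((len(spikes) + 7) // 8)
    for i, spike in enumerate(spikes):
        if spike:
            # Bit i & 7 of byte i >> 3 holds spike i.
            packed[i >> 3] |= 1 << (i & 7)
    return bytes(packed)


def unpack_spikes(packed, count):
    """Recover the first `count` spikes from a packed byte string."""
    return [(packed[i >> 3] >> (i & 7)) & 1 for i in range(count)]


# Hypothetical latent replay feature: 32 binary spikes.
feature = [1, 0, 0, 1, 1, 0, 1, 0] * 4
stored = pack_spikes(feature)
restored = unpack_spikes(stored, len(feature))

bytes_as_float32 = 4 * len(feature)  # 128 bytes if each spike were a 32-bit float
bytes_as_bits = len(stored)          # 4 bytes when packed
```

Relative to a 32-bit float per feature, one bit per spike is a 32-fold saving; the abstract does not say whether this is the source of the reported factor of 32.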
Article 1400
Title@2025-07-10 (4): Unifews: You Need Fewer Operations for Efficient Graph Neural Networks
Title: Unifews: You Need Fewer Operations for Efficient Graph Neural Networks | Unifews: Sie brauchen weniger Operationen für effiziente Graphen-Neural-Netzwerke | Unifews: 高效图形神经网络需要更少操作 2403.13268v2 |
Authors (4): Ningyi Liao, Zihao Yu, Ruixiao Zeng, Siqiang Luo
Graph Neural Networks (GNNs) have shown promising performance, but at the cost of resource-intensive operations on graph-scale matrices. To reduce computational overhead, previous studies attempt to sparsify the graph or network parameters, but with limited flexibility and precision boundaries. In this work, we propose Unifews, a joint sparsification technique to unify graph and weight matrix operations and enhance GNN learning efficiency. The Unifews design enables adaptive compression across GNN layers with progressively increased sparsity, and is applicable to a variety of architectures with on-the-fly simplification. Theoretically, we establish a novel framework to characterize sparsified GNN learning in view of the graph optimization process, showing that Unifews effectively approximates the learning objective with bounded error and reduced computational overhead. Extensive experiments demonstrate that Unifews achieves efficiency improvements with comparable or better accuracy, including 10-20x matrix operation reduction and up to 100x acceleration on graphs up to billion-edge scale.
Article 1401
Title@2025-07-10 (4): User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection
Title: User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection | Benutzerbasierte sequentielle Modellierung mit Transformer-Encodern für Insider Threat Detection | 以用户为基础的序列模型,使用变换器编码器进行内部威胁探测 2506.23446v2 |
Authors (2): Mohamed Elbasheer, Adewale Akinfaderin
Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance - notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.
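The reported rates are tied together by standard confusion-matrix definitions; in particular, the false negative rate is one minus recall, which matches the reported pair (99.43% recall, 0.0057 false negative rate). The sketch below spells out these definitions; the counts are hypothetical and are not the paper's confusion matrix.

```python
def detection_rates(tp, fp, tn, fn):
    """Standard binary-detection metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_negative_rate": fn / (tp + fn),  # equals 1 - recall
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rates = detection_rates(tp=90, fp=5, tn=95, fn=10)

# Consistency check on the figures quoted in the abstract.
reported_recall = 0.9943
implied_false_negative_rate = 1 - reported_recall
```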
Article 1402
Title@2025-07-10 (4): An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework
Title: An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework | Ein automatisierter Klassifikator schädlicher Gehirnaktivitäten für die klinische Anwendung basierend auf einem Vision-Inspired Pre-trained Framework | 以 “ 愿景引导的预培训框架 “ 为基础,对临床使用的有害脑活动进行自动分类 2507.08874v1 |
Authors (13): Yulin Sun, Xiaopeng Si, Runnan He, Xiao Hu, Peter Smielewski, Wenlong Wang, Xiaoguang Tong, Wei Yue, Meijun Pang, Kuo Zhang, Xizi Song, Dong Ming, Xiuyun Liu
Timely identification of harmful brain activities via electroencephalography (EEG) is critical for brain disease diagnosis and treatment, but its application remains limited due to inter-rater variability, resource constraints, and poor generalizability of existing artificial intelligence (AI) models. In this study, a convolutional neural network model, VIPEEGNet, was developed and validated using EEGs recorded from Massachusetts General Hospital/Harvard Medical School. VIPEEGNet was developed and validated using two independent datasets, collected between 2006 and 2020. The development cohort included EEG recordings from 1950 patients, with 106,800 EEG segments annotated by at least one expert (ranging from 1 to 28). The online testing cohort consisted of EEG segments from a subset of an additional 1,532 patients, each annotated by at least 10 experts. For the development cohort (n=1950), VIPEEGNet achieved high accuracy, with AUROCs for binary classification of seizure, LPD, GPD, LRDA, GRDA, and “other” categories of 0.972 (95% CI, 0.957-0.988), 0.962 (95% CI, 0.954-0.970), 0.972 (95% CI, 0.960-0.984), 0.938 (95% CI, 0.917-0.959), 0.949 (95% CI, 0.941-0.957), and 0.930 (95% CI, 0.926-0.935), respectively. For multi-class classification, the sensitivity of VIPEEGNet for the six categories ranges from 36.8% to 88.2% and the precision ranges from 55.6% to 80.4%, a performance similar to that of human experts. Notably, the external validation showed Kullback-Leibler Divergence (KLD) of 0.223 and 0.273, ranking in the top 2 among the existing 2,767 competing algorithms, while using only 2.8% of the parameters of the first-ranked algorithm.
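The external validation score is a Kullback-Leibler divergence. A minimal sketch of that quantity follows, assuming it is taken from the experts' vote distribution to the model's predicted distribution over the six categories; the direction, any averaging over segments, and the example numbers are assumptions rather than details from the abstract.

```python
import math


def kl_divergence(target, predicted, eps=1e-15):
    """KL(target || predicted) for two discrete distributions, in nats."""
    return sum(
        t * math.log(t / max(p, eps))  # clip predictions to avoid log(0)
        for t, p in zip(target, predicted)
        if t > 0  # terms with zero target mass contribute nothing
    )


# Hypothetical segment: expert vote fractions and model probabilities for
# (seizure, LPD, GPD, LRDA, GRDA, other).
expert_votes = [0.0, 0.5, 0.0, 0.25, 0.0, 0.25]
model_probs = [0.05, 0.45, 0.05, 0.2, 0.05, 0.2]
segment_kld = kl_divergence(expert_votes, model_probs)
```

A lower value means the predicted distribution is closer to the expert consensus, with zero reached only when the two distributions match.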
Article 1403
Title@2025-07-10 (4): BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Title: BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems | BountyBench: Dollar-Impact von KI-Agenten-Angriffen und Verteidigern auf reale Cybersicherheitssysteme | BountyBench: AI代理攻击者和捍卫者对现实世界网络安全系统的美元影响 2505.15216v2 |
Authors (34): Andy K. Zhang, Joey Ji, Celeste Menders, Riya Dulepet, Thomas Qin, Ron Y. Wang, Junrong Wu, Kyleen Liao, Jiliang Li, Jinghan Hu, Sara Hong, Nardos Demilew, Shivatmica Murgai, Jason Tran, Nishka Kacheria, Ethan Ho, Denis Liu, Lauren McLane, Olivia Bruvik, Dai-Rong Han, Seungwoo Kim, Akhil Vyas, Cuiyuanxiu Chen, Ryan Li, Weiran Xu, Jonathan Z. Ye, Prerit Choudhary, Siddharth M. Bhatia, Vikram Sivashankar, Yuxuan Bao, Dawn Song, Dan Boneh, Daniel E. Ho, Percy Liang
AI agents have the potential to significantly alter the cybersecurity landscape. Here, we introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. Instantiating this framework with BountyBench, we set up 25 systems with complex, real-world codebases. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability). For Detect, we construct a new success indicator, which is general across vulnerability types and provides localized evaluation. We manually set up the environment for each system, including installing packages, setting up server(s), and hydrating database(s). We add 40 bug bounties, which are vulnerabilities with monetary awards of $10-$30,485, covering 9 of the OWASP Top 10 Risks. To modulate task difficulty, we devise a new strategy based on information to guide detection, interpolating from identifying a zero day to exploiting a specific vulnerability. We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1. Given up to three attempts, the top-performing agents are OpenAI Codex CLI: o3-high (12.5% on Detect, mapping to $3,720; 90% on Patch, mapping to $14,152), Custom Agent with Claude 3.7 Sonnet Thinking (67.5% on Exploit), and OpenAI Codex CLI: o4-mini (90% on Patch, mapping to $14,422). OpenAI Codex CLI: o3-high, OpenAI Codex CLI: o4-mini, and Claude Code are more capable at defense, achieving higher Patch scores of 90%, 90%, and 87.5%, compared to Exploit scores of 47.5%, 32.5%, and 57.5% respectively; while the custom agents are relatively balanced between offense and defense, achieving Exploit scores of 37.5-67.5% and Patch scores of 35-60%.
Article 1404
Title@2025-07-10 (4): A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines
Title: A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines | Ein Multi-Granularität überwacht Kontrastive Rahmen für das Bleiben nützlicher Lebensvorhersage von Aero-Motoren | 空气-发动机剩余使用寿命预测多族监督多族监督违规框架 2411.00461v3 |
Authors (6): Zixuan He, Ziqian Kong, Zhengyu Chen, Yuling Zhan, Zijun Que, Zhengguo Xu
Accurate remaining useful life (RUL) predictions are critical to the safe operation of aero-engines. Currently, the RUL prediction task is mainly treated as a regression problem with only mean square error as the loss function, and it lacks research on feature space structure, which has shown excellent performance in a large number of studies. This paper develops a multi-granularity supervised contrastive (MGSC) framework from the plain intuition that samples with the same RUL label should be aligned in the feature space, and addresses the problems of excessively large minibatch size and unbalanced samples in the implementation. The RUL prediction with MGSC is implemented using the proposed multi-phase training strategy. This paper also demonstrates a simple and scalable basic network structure and validates the proposed MGSC strategy on the C-MAPSS dataset using a convolutional long short-term memory network as a baseline, which effectively improves the accuracy of RUL prediction.
Article 1405
Title@2025-07-10 (4): Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
Title: Bradley-Terry and Multi-Objective Reward Modeling Are Complementary | Bradley-Terry und Multi-Objective Reward Modeling sind komplementär | Bradley-Terry和多目标奖励模型具有补充作用 2507.07375v1 |
Authors (13): Zhiwei Zhang, Hui Liu, Xiaomin Li, Zhenwei Dai, Jingying Zeng, Fali Wang, Minhua Lin, Ramraj Chandradevan, Zhen Li, Chen Luo, Xianfeng Tang, Qi He, Suhang Wang
Reward models trained on human preference data have demonstrated strong effectiveness in aligning Large Language Models (LLMs) with human intent under the framework of Reinforcement Learning from Human Feedback (RLHF). However, RLHF remains vulnerable to reward hacking, where the policy exploits imperfections in the reward function rather than genuinely learning the intended behavior. Although significant efforts have been made to mitigate reward hacking, they predominantly focus on and evaluate in-distribution scenarios, where the training and testing data for the reward model share the same distribution. In this paper, we empirically show that state-of-the-art methods struggle in more challenging out-of-distribution (OOD) settings. We further demonstrate that incorporating fine-grained multi-attribute scores helps address this challenge. However, the limited availability of high-quality data often leads to weak performance of multi-objective reward functions, which can negatively impact overall performance and become the bottleneck. To address this issue, we propose a unified reward modeling framework that jointly trains Bradley–Terry (BT) single-objective and multi-objective regression-based reward functions using a shared embedding space. We theoretically establish a connection between the BT loss and the regression objective and highlight their complementary benefits. Specifically, the regression task enhances the single-objective reward function’s ability to mitigate reward hacking in challenging OOD settings, while BT-based training improves the scoring capability of the multi-objective reward function, enabling a 7B model to outperform a 70B baseline. Extensive experimental results demonstrate that our framework significantly improves both the robustness and the scoring performance of reward models.
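To make the two objectives concrete, the sketch below combines a Bradley-Terry pairwise loss with a mean-squared-error regression loss on attribute scores. The additive combination and the `weight` parameter are illustrative assumptions; the abstract states only that the two heads are trained jointly on a shared embedding space, not how the losses are combined.

```python
import math


def bt_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def regression_loss(predicted_scores, target_scores):
    """Mean squared error over multi-attribute scores."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_scores, target_scores)]
    return sum(squared) / len(squared)


def joint_loss(reward_chosen, reward_rejected,
               predicted_scores, target_scores, weight=1.0):
    """Assumed combination: BT loss plus a weighted regression loss."""
    return (bt_loss(reward_chosen, reward_rejected)
            + weight * regression_loss(predicted_scores, target_scores))


# Hypothetical rewards for one preference pair and two attribute scores.
loss = joint_loss(1.2, 0.3, [3.5, 4.0], [4.0, 4.0], weight=0.5)
```

When the two rewards are equal, the Bradley-Terry term is log 2, and it shrinks as the chosen response is scored further above the rejected one.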
Article 1406
Title@2025-07-10 (4): Atherosclerosis through Hierarchical Explainable Neural Network Analysis
Title: Atherosclerosis through Hierarchical Explainable Neural Network Analysis | Atherosklerose durch hierarchische erklärende neurale Netzwerkanalyse | 通过可解释的神经网络分析,通过高层次解析神经网络分析,实现天体硬化 2507.07373v1 |
Authors (10): Irsyad Adam, Steven Swee, Erika Yilin, Ethan Ji, William Speier, Dean Wang, Alex Bui, Wei Wang, Karol Watson, Peipei Ping
In this work, we study the personalized classification of subclinical atherosclerosis by developing a hierarchical graph neural network framework to leverage two characteristic modalities of a patient: clinical features within the context of the cohort, and molecular data unique to individual patients. Current graph-based methods for disease classification detect patient-specific molecular fingerprints, but lack consistency and comprehension regarding cohort-wide features, which are an essential requirement for understanding pathogenic phenotypes across diverse atherosclerotic trajectories. Furthermore, understanding patient subtypes often considers clinical feature similarity in isolation, without integration of shared pathogenic interdependencies among patients. To address these challenges, we introduce ATHENA: Atherosclerosis Through Hierarchical Explainable Neural Network Analysis, which constructs a novel hierarchical network representation through integrated modality learning; subsequently, it optimizes learned patient-specific molecular fingerprints that reflect individual omics data, enforcing consistency with cohort-wide patterns. With a primary clinical dataset of 391 patients, we demonstrate that this heterogeneous alignment of clinical features with molecular interaction patterns significantly boosts subclinical atherosclerosis classification performance across various baselines by up to 13% in area under the receiver operating curve (AUC) and 20% in F1 score. Taken together, ATHENA enables mechanistically-informed patient subtype discovery through explainable AI (XAI)-driven subnetwork clustering; this novel integration framework strengthens personalized intervention strategies, thereby improving the prediction of atherosclerotic disease progression and management of its clinically actionable outcomes.
Article 1407
Title@2025-07-10 (4): Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification
Title: Data-driven Kinematic Modeling in Soft Robots: System Identification and Uncertainty Quantification | Datengesteuerte kinematische Modellierung in Soft Robots: Systemidentifikation und Unsicherheitsquantifizierung | 软机器人中数据驱动的虚拟模型:系统识别和不确定性量化 2507.07370v1 |
Authors (4): Zhanhong Jiang, Dylan Shah, Hsin-Jung Yang, Soumik Sarkar
Precise kinematic modeling is critical in calibration and controller design for soft robots, yet remains a challenging issue due to their highly nonlinear and complex behaviors. To tackle the issue, numerous data-driven machine learning approaches have been proposed for modeling nonlinear dynamics. However, these models suffer from prediction uncertainty that can negatively affect modeling accuracy, and uncertainty quantification for kinematic modeling in soft robots is underexplored. In this work, using limited simulation and real-world data, we first investigate multiple linear and nonlinear machine learning models commonly used for kinematic modeling of soft robots. The results reveal that nonlinear ensemble methods exhibit the most robust generalization performance. We then develop a conformal kinematic modeling framework for soft robots by utilizing split conformal prediction to quantify predictive position uncertainty, ensuring distribution-free prediction intervals with a theoretical guarantee.
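Split conformal prediction, as mentioned above, can be sketched in a few lines: residuals on a held-out calibration set give a radius that is added around each new point prediction. The scalar residuals, the helper names, and the numbers below are illustrative assumptions; the paper's position uncertainty may be handled per coordinate or through a different nonconformity score.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Radius giving at least 1 - alpha marginal coverage under exchangeability.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(point_prediction, radius):
    """Symmetric interval around a point prediction."""
    return (point_prediction - radius, point_prediction + radius)


# Hypothetical absolute position errors |y - y_hat| on a calibration set (mm).
calibration_residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
radius = conformal_radius(calibration_residuals, alpha=0.2)
interval = prediction_interval(50.0, radius)
```

The guarantee is distribution-free but marginal: it holds on average over calibration and test draws, not conditionally for each individual robot configuration.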
Article 1408
Title@2025-07-10 (4): Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization
Title: Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization | Kontrastive Sprach-Image Pre-Training Modellbasierte Semantische Kommunikationsleistung Optimierung | 基于语义交流交流绩效优化的示范示范 2507.08873v1 |
Authors (6): Shaoran Yang, Dongyu Wei, Hanzhi Yu, Zhaohui Yang, Yuchen Liu, Mingzhe Chen
In this paper, a novel contrastive language-image pre-training (CLIP) model based semantic communication framework is designed. Compared to standard neural network (e.g., convolutional neural network) based semantic encoders and decoders that require joint training over a common dataset, our CLIP model based method does not require any training procedures, thus enabling a transmitter to extract the meaning of the original data without neural network model training, and the receiver to train a neural network for follow-up task implementation without communicating with the transmitter. Next, we investigate the deployment of the CLIP model based semantic framework over a noisy wireless network. Since the semantic information generated by the CLIP model is susceptible to wireless noise and the spectrum used for semantic information transmission is limited, it is necessary to jointly optimize CLIP model architecture and spectrum resource block (RB) allocation to maximize semantic communication performance while considering wireless noise and the delay and energy used for semantic communication. To achieve this goal, we use a proximal policy optimization (PPO) based reinforcement learning (RL) algorithm to learn how wireless noise affects the semantic communication performance, thus finding the optimal CLIP model and RB allocation for each user. Simulation results show that our proposed method improves the convergence rate by up to 40%, and the accumulated reward by 4x compared to soft actor-critic.
Article 1409
Title@2025-07-10 (4): A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning
Title: A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning | Eine kryptografische Perspektive auf Mitigation vs. Detection in Machine Learning | 关于减缓与机械学习中的探测的加密视角 2504.20310v2 |
Authors (2): Greg Gluch, Shafi Goldwasser
In this paper, we initiate a cryptographically inspired theoretical study of detection versus mitigation of adversarial inputs produced by attackers on Machine Learning algorithms during inference time. We formally define defense by detection (DbD) and defense by mitigation (DbM). Our definitions come in the form of a 3-round protocol between two resource-bounded parties: a trainer/defender and an attacker. The attacker aims to produce inference-time inputs that fool the training algorithm. We define correctness, completeness, and soundness properties to capture successful defense at inference time while not degrading (too much) the performance of the algorithm on inputs from the training distribution. We first show that achieving DbD and achieving DbM are equivalent for ML classification tasks. Surprisingly, this is not the case for ML generative learning tasks, where there are many possible correct outputs for each input. We show a separation between DbD and DbM by exhibiting two generative learning tasks for which it is possible to defend by mitigation but it is provably impossible to defend by detection. The mitigation phase uses significantly less computational resources than the initial training algorithm. In the first learning task we consider sample complexity as the resource and in the second the time complexity. The first result holds under the assumption that Identity-Based Fully Homomorphic Encryption (IB-FHE), publicly-verifiable zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARK), and Strongly Unforgeable Signatures exist. The second result assumes the existence of Non-Parallelizing Languages with Average-Case Hardness (NPL), Incrementally-Verifiable Computation (IVC), and IB-FHE.
Article 1410
Title@2025-07-10 (4): Platform for Representation and Integration of multimodal Molecular Embeddings
Title: Platform for Representation and Integration of multimodal Molecular Embeddings | Plattform für Repräsentation und Integration multimodaler molekularer Einbettungen | 多式联运分子嵌入的 代表性和一体化平台 2507.07367v1 |
Authors (11): Erika Yilin Zheng, Yu Yan, Baradwaj Simha Sankar, Ethan Ji, Steven Swee, Irsyad Adam, Ding Wang, Alexander Russell Pelletier, Alex Bui, Wei Wang, Peipei Ping
Existing machine learning methods for molecular (e.g., gene) embeddings are restricted to specific tasks or data modalities, limiting their effectiveness to narrow domains. As a result, they fail to capture the full breadth of gene functions and interactions across diverse biological contexts. In this study, we have systematically evaluated knowledge representations of biomolecules across multiple dimensions in a task-agnostic manner, spanning three major data sources: omics experimental data, literature-derived text data, and knowledge graph-based representations. To distinguish meaningful biological signals from chance correlations, we devised an adjusted variant of Singular Vector Canonical Correlation Analysis (SVCCA) that quantifies signal redundancy and complementarity across different data modalities and sources. These analyses reveal that existing embeddings capture largely non-overlapping molecular signals, highlighting the value of embedding integration. Building on this insight, we propose Platform for Representation and Integration of multimodal Molecular Embeddings (PRISME), a machine learning based workflow using an autoencoder to integrate these heterogeneous embeddings into a unified multimodal representation. We validated this approach across various benchmark tasks, where PRISME demonstrated consistent performance and outperformed individual embedding methods in missing value imputations. This new framework supports comprehensive modeling of biomolecules, advancing the development of robust, broadly applicable multimodal embeddings optimized for downstream biomedical machine learning applications.
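For readers unfamiliar with SVCCA, the following is a minimal NumPy sketch of the standard procedure (SVD-based reduction followed by canonical correlation analysis), not the adjusted variant the abstract describes, whose details are not given here. The function names `svd_reduce`, `cca_correlations`, and `svcca`, and the 99% variance threshold, are assumptions chosen for illustration.

```python
import numpy as np

def svd_reduce(a, var_kept=0.99):
    """Project centered rows of `a` onto the top singular directions
    that together explain at least `var_kept` of the variance."""
    a = a - a.mean(axis=0)
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(frac, var_kept) + 1)
    return u[:, :k] * s[:k]

def cca_correlations(a, b):
    """Canonical correlations between the column spaces of `a` and `b`
    (rows are samples), computed via QR followed by an SVD."""
    qa, _ = np.linalg.qr(a - a.mean(axis=0))
    qb, _ = np.linalg.qr(b - b.mean(axis=0))
    corr = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.clip(corr, 0.0, 1.0)

def svcca(a, b, var_kept=0.99):
    """Mean canonical correlation after SVD reduction of both inputs."""
    return float(cca_correlations(svd_reduce(a, var_kept),
                                  svd_reduce(b, var_kept)).mean())

rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 5))
# Two hypothetical embeddings that are linear views of the same latent signal.
shared_a = latent @ rng.standard_normal((5, 20))
shared_b = latent @ rng.standard_normal((5, 30))
# Two hypothetical embeddings with no shared signal.
indep_a = rng.standard_normal((500, 20))
indep_b = rng.standard_normal((500, 30))

print("shared signal:", svcca(shared_a, shared_b))
print("independent:  ", svcca(indep_a, indep_b))
```

In this toy setting the score for the two views of one latent signal should be close to 1, while the independent pair scores much lower. The independent pair's score is still above zero, because finite samples produce chance correlations; an adjusted variant would need to account for that baseline.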
Article 1411
Title@2025-07-10 (4): Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning
Title: Goal-Oriented Sequential Bayesian Experimental Design for Causal Learning | Zielorientiertes sequentielles Bayesian Experimental Design für das kausale Lernen | 以目标为导向、按顺序排列的Bayesian 因果关系学习实验设计 2507.07359v1 |
Authors (4): Zheyu Zhang, Jiayuan Dong, Jie Liu, Xun Huan
We present GO-CBED, a goal-oriented Bayesian framework for sequential causal experimental design. Unlike conventional approaches that select interventions aimed at inferring the full causal model, GO-CBED directly maximizes the expected information gain (EIG) on user-specified causal quantities of interest, enabling more targeted and efficient experimentation. The framework is both non-myopic, optimizing over entire intervention sequences, and goal-oriented, targeting only model aspects relevant to the causal query. To address the intractability of exact EIG computation, we introduce a variational lower bound estimator, optimized jointly through a transformer-based policy network and normalizing flow-based variational posteriors. The resulting policy enables real-time decision-making via an amortized network. We demonstrate that GO-CBED consistently outperforms existing baselines across various causal reasoning and discovery tasks (including synthetic structural causal models and semi-synthetic gene regulatory networks), particularly in settings with limited experimental budgets and complex causal mechanisms. Our results highlight the benefits of aligning experimental design objectives with specific research goals and of forward-looking sequential planning.
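To make the EIG objective concrete, here is a small sketch that estimates it with a plain nested Monte Carlo estimator on a one-parameter linear-Gaussian model, where the exact value is known in closed form. This is not the variational lower bound or the policy network from the abstract. The model (`y = design * theta + noise`, with a standard normal prior on `theta`) and the names `nmc_eig` and `exact_eig` are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(y, theta, design, noise_std):
    """Log density of y under y = design * theta + N(0, noise_std^2)."""
    resid = y - design * theta
    return -0.5 * (resid / noise_std) ** 2 - np.log(noise_std * np.sqrt(2.0 * np.pi))

def nmc_eig(design, noise_std=1.0, n_outer=1000, n_inner=1000, seed=0):
    """Nested Monte Carlo estimate of EIG(design) = E[log p(y|theta) - log p(y)]."""
    rng = np.random.default_rng(seed)
    theta_outer = rng.standard_normal(n_outer)          # theta ~ N(0, 1) prior
    y = design * theta_outer + noise_std * rng.standard_normal(n_outer)
    theta_inner = rng.standard_normal(n_inner)          # reused for every outer sample
    log_lik = log_likelihood(y, theta_outer, design, noise_std)
    # Inner loop: log p(y) is approximated by a log-mean-exp over prior samples.
    log_inner = log_likelihood(y[:, None], theta_inner[None, :], design, noise_std)
    row_max = log_inner.max(axis=1, keepdims=True)
    log_marginal = row_max[:, 0] + np.log(np.exp(log_inner - row_max).mean(axis=1))
    return float(np.mean(log_lik - log_marginal))

def exact_eig(design, noise_std=1.0):
    """Closed-form mutual information for the linear-Gaussian model above."""
    return 0.5 * np.log1p((design / noise_std) ** 2)

for d in (0.5, 1.0, 2.0):
    print(f"design={d}: estimate={nmc_eig(d):.3f}, exact={exact_eig(d):.3f}")
```

Larger designs are more informative in this toy model, so an EIG-maximizing experimenter would pick the largest allowed `design`. The nested estimator's cost grows with `n_outer * n_inner`, which is one reason amortized and variational estimators are attractive in sequential settings.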
Article 1412
Title@2025-07-10 (4): Learning from positive and unlabeled examples -Finite size sample bounds
Title: Learning from positive and unlabeled examples -Finite size sample bounds | Aus positiven und unmarkierten Beispielen lernen -Finite-Size-Probengrenzen | 从正面和未贴标签的例子中学习 - 微小大小抽样范围 2507.07354v1 |
Authors (2): Farnam Mansouri, Shai Ben-David
PU (Positive Unlabeled) learning is a variant of supervised classification learning in which the only labels revealed to the learner are of positively labeled instances. PU learning arises in many real-world applications. Most existing work relies on the simplifying assumptions that the positively labeled training data is drawn from the restriction of the data generating distribution to positively labeled instances and/or that the proportion of positively labeled points (a.k.a. the class prior) is known a priori to the learner. This paper provides a theoretical analysis of the statistical complexity of PU learning under a wider range of setups. Unlike most prior work, our study does not assume that the class prior is known to the learner. We prove upper and lower bounds on the required sample sizes (of both the positively labeled and the unlabeled samples).
Article 1413
Title@2025-07-10 (4): Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience
Title: Machine Learning-driven Multiscale MD Workflows: The Mini-MuMMI Experience | Mehrstufige MD-Workflows mit maschinellem Lernen: Die Mini-MuMMI-Erfahrung | 由学习驱动的机械式学习驱动的多规模MD工作流程:微型MIMI经验 2507.07352v1 |
Authors (11): Loïc Pottier, Konstantia Georgouli, Timothy S. Carpenter, Fikret Aydin, Jeremy O. B. Tempkin, Dwight V. Nissley, Frederick H. Streitz, Thomas R. W. Scogland, Peer-Timo Bremer, Felice C. Lightstone, Helgi I. Ingólfsson
Computational models have become one of the prevalent methods to model complex phenomena. To accurately model complex interactions, such as detailed biomolecular interactions, scientists often rely on multiscale models comprised of several internal models operating at different scales, ranging from microscopic to macroscopic length and time scales. Bridging the gap between different time and length scales has historically been challenging, but the advent of newer machine learning (ML) approaches has shown promise for tackling that task. Multiscale models require massive amounts of computational power and a powerful workflow management system. Orchestrating ML-driven multiscale studies on parallel systems with thousands of nodes is challenging: the workflow must schedule, allocate and control thousands of simulations operating at different scales. Here, we discuss the massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), a multiscale workflow management infrastructure that can orchestrate thousands of molecular dynamics (MD) simulations operating at different timescales, spanning from millisecond to nanosecond. More specifically, we introduce a novel version of MuMMI called “mini-MuMMI”. Mini-MuMMI is a curated version of MuMMI designed to run on modest HPC systems or even laptops, whereas MuMMI requires larger HPC systems. We demonstrate mini-MuMMI's utility by exploring RAS-RAF membrane interactions and discuss the different challenges behind the generalization of multiscale workflows and how mini-MuMMI can be leveraged to target a broader range of applications outside of MD and RAS-RAF interactions.
Article 1414
Title@2025-07-10 (4): Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
Title: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts | Zero-Shot-Context-Verallgemeinerung in der Verstärkung Lernen aus wenigen Trainingskontexten | 从少见的培训背景中加强学习的零零零片背景概括化 2507.07348v1 |
Authors (3): James Chapman, Kedar Karhadkar, Guido Montufar
Deep reinforcement learning (DRL) has achieved remarkable success across multiple domains, including competitive games, natural language processing, and robotics. Despite these advancements, policies trained via DRL often struggle to generalize to evaluation environments with different parameters. This challenge is typically addressed by training with multiple contexts and/or by leveraging additional structure in the problem. However, obtaining sufficient training data across diverse contexts can be impractical in real-world applications. In this work, we consider contextual Markov decision processes (CMDPs) with transition and reward functions that exhibit regularity in context parameters. We introduce the context-enhanced Bellman equation (CEBE) to improve generalization when training on a single context. We prove both analytically and empirically that the CEBE yields a first-order approximation to the Q-function trained across multiple contexts. We then derive context sample enhancement (CSE) as an efficient data augmentation method for approximating the CEBE in deterministic control environments. We numerically validate the performance of CSE in simulation environments, showcasing its potential to improve generalization in DRL.
Article 1415
Title@2025-07-10 (4): It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation
Title: It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation | Es ist schwer, normal zu sein: Der Einfluss von Lärm auf die strukturagnostische Abschätzung | 很难正常:噪音对结构-不可计量估计的影响 2507.02275v2 |
Authors (3): Jikai Jin, Lester Mackey, Vasilis Syrgkanis
Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson (1988), we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. (2018). Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
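As background for the DML baseline mentioned above, the sketch below shows the usual cross-fitted partialling-out estimator in the partially linear model `Y = theta * T + g(X) + eps`, `T = m(X) + eta`. It does not implement the ACE procedures. Polynomial least squares stands in for the black-box nuisance learner, and the simulated data, the function names, and the two-fold split are all assumptions made for illustration.

```python
import numpy as np

def fit_predict(x_train, y_train, x_test, degree=5):
    """Stand-in nuisance learner: polynomial least squares in a scalar covariate."""
    design_train = np.vander(x_train, degree + 1, increasing=True)
    design_test = np.vander(x_test, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    return design_test @ coef

def dml_partially_linear(x, t, y, n_folds=2, seed=0):
    """Cross-fitted DML: residualize T and Y on X, then regress residual on residual."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    t_res = np.empty(len(x))
    y_res = np.empty(len(x))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisances are fit on the other folds and evaluated on the held-out fold.
        t_res[test_idx] = t[test_idx] - fit_predict(x[train_idx], t[train_idx], x[test_idx])
        y_res[test_idx] = y[test_idx] - fit_predict(x[train_idx], y[train_idx], x[test_idx])
    return float(t_res @ y_res / (t_res @ t_res))

def simulate(n, theta, seed=0):
    """Hypothetical data: Gaussian treatment noise, confounding through X."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, n)
    t = 0.5 * x ** 2 + rng.standard_normal(n)
    y = theta * t + np.sin(x) + rng.standard_normal(n)
    return x, t, y

x, t, y = simulate(4000, theta=1.5, seed=1)
print("DML estimate of theta:", dml_partially_linear(x, t, y))
```

The simulated treatment noise here is Gaussian, which is the regime where the abstract reports DML to be minimax rate-optimal; the non-Gaussian case is where the higher-order procedures are said to help.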
Article 1416
Title@2025-07-10 (4): Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures
Title: Way More Than the Sum of Their Parts: From Statistical to Structural Mixtures | Viel mehr als die Summe ihrer Teile: Von statistischen zu strukturellen Mischungen | 超出其部分总和:从统计到结构混合 2507.07343v1 |
Authors (1): James P. Crutchfield
We show that mixtures comprised of multicomponent systems typically are much more structurally complex than the sum of their parts; sometimes, infinitely more complex. We contrast this with the more familiar notion of statistical mixtures, demonstrating how statistical mixtures miss key aspects of emergent hierarchical organization. This leads us to identify a new kind of structural complexity inherent in multicomponent systems and to draw out broad consequences for system ergodicity.