• 00 06-26 (4) Ad-Hoc Human-AI Coordination Challenge Ad-hoc-Koordinierungsherausforderung Mensch-AI A. 协调挑战 2506.21490v1
  • 01 06-26 xChemAgents: Agentic AI for Explainable Quantum Chemistry xChemAgenten: Agentische KI für erklärbare Quantenchemie xchemAgents: 可解释量子化学的AAA剂 2505.20574v2
  • 02 06-26 Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective Werden LLMs Professional bei Fund Investment sein? DeepFund: Eine Live Arena Perspektive LLM女士在基金投资方面是否具有专业性? 2503.18313v2
  • 03 06-25 (3) Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners Märkte mit heterogenen Agenten: Dynamik und Überleben von Bayesian vs. No-Regret Learners 具有异基因物剂的市场:巴伊西亚的动态和生存与无学习者对无学习者 2502.08597v2
  • 04 06-25 The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Der Decrypto-Benchmark für multi-agente Vernunft und Theorie des Geistes 多种代理理由和思想理论的Decrypto Decrypto基准 2506.20664v1
  • 05 06-25 Task Allocation of UAVs for Monitoring Missions via Hardware-in-the-Loop Simulation and Experimental Validation Aufgabenverteilung von UAVs zur Überwachung von Missionen über Hardware-in-the-Loop-Simulation und experimentelle Validierung 通过 “ 网上硬件模拟和实验校验 “ ,为监测任务分配无人驾驶航空器的任务 2506.20626v1
  • 06 06-25 Opinion Dynamics with Highly Oscillating Opinions Meinungsdynamik mit stark oszillierenden Meinungen 具有高度振动性意见的意见动态 2506.20472v1
  • 07 06-25 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning Ein Agentisches System für die Diagnose seltener Krankheiten mit rückverfolgbarer Begründung 利用可追踪理由进行罕见疾病诊断的制剂系统 2506.20430v1
  • 08 06-25 SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models SV-LLM: Agentischer Ansatz für die SoC-Sicherheitsverifizierung mit großen Sprachmodellen SV-LLLM:使用大语言模型进行 SoC安全核查的代理方法 2506.20415v1
  • 09 06-25 A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem Ein Visualisierungsrahmen für die Erforschung von Multi-Agent-basierten Simulationen Fallstudie eines Elektroauto-Heimlade-Ökosystems 电动车辆家庭充电生态系统模拟模拟研究的可视化框架 2506.20400v1
  • 10 06-25 Argumentative Ensembling for Robust Recourse under Model Multiplicity Argumentatives Zusammenbauen für robusten Rücklauf unter Modellvielfalt 多种模式下强力利用的参数组合 2506.20260v1
  • 11 06-25 Language Modeling by Language Models Sprachmodellierung nach Sprachmodellen 按语文模式建模的语文 2506.20249v1
  • 12 06-25 On the $h$-majority dynamics with many opinions Auf der $h$-Mehrheitsdynamik mit vielen Meinungen 关于以美元为多数的动态, 2506.20218v1
  • 13 06-24 (2) Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning Bilaterale Teambildung im kooperativen Multi-Agenten-Verstärkungs-Lernen lernen 合作多机构加强合作学习双边学习小组 2506.20039v1
  • 14 06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität KnowRL:探索知识强化学习促进事实质量 2506.19807v1
  • 15 06-24 LLM-Based Social Simulations Require a Boundary LLM-basierte soziale Simulationen erfordern eine Grenze 以LLM为基础的社会模拟需要边界 2506.19806v1
  • 16 06-24 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v1
  • 17 06-24 Collaborative governance of cyber violence: A two-phase, multi-scenario four-party evolutionary game and SBI1I2R public opinion dissemination Collaborative Governance von Cybergewalt: Zwei-Phasen-Multiszenario-Evolutionsspiel mit vier Parteien und öffentliche Meinungsverbreitung SBI1I2R 协作治理网络暴力:两阶段、多阶段、多设想、四党演进游戏和SSBI1I2R 公共舆论传播 2506.19704v1
  • 18 06-24 Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies Intelligente Verkehrssignale: Vergleich von MARL- und Fixed-Time-Strategien 智能交通信号信号:MARL和固定时战略的比较 2505.14544v2
  • 19 06-24 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v1
  • 20 06-24 Agent-Based Triangle Counting: Unlocking Truss Decomposition, Triangle Centrality, and Local Clustering Coefficient Agent-Based Triangle Counting: Entsperren Truss Zersetzung, Dreieck Zentralität und lokale Clustering Koeffizient 基于代理的三角计数:解锁Truss分解、三角中心以及地方集束 2402.03653v2
  • 21 06-24 Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Verstärkung Learning 重力-引力引导焦点集中影响多机构强化学习机制中心 2506.19417v1
  • 22 06-24 ChatModel: Automating Reference Model Design and Verification with LLMs ChatModel: Automatisieren von Referenzmodell-Design und Überprüfung mit LLMs 聊天模式:使用LLMs自动使用参考模型设计和核查 2506.15066v2
  • 23 06-24 Computing Tree Structures in Anonymous Graphs via Mobile Agents Berechnung von Baumstrukturen in anonymen Graphen über Mobile Agents 通过移动代理器在匿名图纸中的电子树结构 2506.19365v1
  • 24 06-24 PBFT-Backed Semantic Voting for Multi-Agent Memory Pruning PBFT-unterstützte semantische Abstimmung für Multi-Agent Memory Pruning PBFT 多重机构内存缓冲后退的语义投票 2506.17338v2
  • 25 06-23 (1) Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field Low-Cost-Infrastruktur-freie 3D-relative Lokalisierung mit Sub-Meter-Genauigkeit im Nahfeld 低成本基础设施-无3D 相对本地化,近地有亚彼得精密度 2506.19199v1
  • 26 06-23 Experimental Setup and Software Pipeline to Evaluate Optimization based Autonomous Multi-Robot Search Algorithms Experimentelle Einrichtung und Software-Pipeline zur Bewertung von Optimierungs-basierten autonomen Multi-Roboter-Suche Algorithmen 实验设置和软件管道以评价基于优化的自动多机器人搜索算法 2506.16710v2
  • 27 06-23 Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes Agentische Informationstheorie: Ergodikität und Intrinsische Semantik von Informationsprozessen 代理信息理论:信息过程的分化和内在的语义 2505.19275v2
  • 28 06-23 Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments Online-Lernen für dynamischen Vickrey-Clarke-Groves-Mechanismus in sequenziellen Auktionen unter unbekannten Umgebungen 在未知环境中有顺序拍卖的动态Vickrey-Clark-Groves机制在线学习 2506.19038v1
  • 29 06-23 TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation TRIZ-Agenten: Multi-Agenten-LLM-Ansatz für TRIZ-basierte Innovationen TRIZ 代理物:以TRIZ为基础的创新 2506.18783v1
  • 30 06-23 Reply to “Emergent LLM behaviors are observationally equivalent to data leakage” Antwort auf “Emergente LLM-Verhalten sind Beobachtungsäquivalent zu Daten Leckage” 对“紧急LLM行为”的答复在观测上等同于数据泄漏” 2506.18600v1
  • 31 06-23 Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning Transformer-Weltmodell für Proben Effizientes Mehr-Agenten-Verstärkungs-Lernen 取样效率高的多机构强化学习世界模式 2506.18537v1
  • 32 06-23 Autocratic strategies in Cournot oligopoly game Autokratische Strategien in Cournot Oligopol Spiel Cournot 寡头寡头寡头游戏中的专制策略 2506.16038v2
  • 33 06-23 IDCAIS: Inter-Defender Collision-Aware Interception Strategy against Multiple Attackers IDCAIS: Inter-Defender Collision-Aware Interception Strategy gegen mehrere Angreifer IDCAIS:针对多攻击者的防御人员碰撞-软件拦截战略 2112.12098v3
  • 34 06-22 (7) Wisdom of Crowds Through Myopic Self-Confidence Adaptation Weisheit der Massen durch myopische Selbst-Konfidenz-Anpassung 通过短视自信心适应而实现的群众智慧 2506.18195v1
  • 35 06-22 Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control Multi-Agent Soft Actor-Critic mit koordiniertem Verlust für autonome Mobilität-auf-Demand-Flotte-Kontrolle 多代理商软软软操作器-对自动机动按需机动车队控制协调损失具有协调损失的批评 2404.06975v2
  • 36 06-22 Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems Physik-informiertes Multi-Agenten-Verstärkungs-Lernen für verteilte Multi-Roboter-Probleme 为分布式多机器人问题进行物理化多机构强化学习 2401.00212v4
  • 37 06-22 RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation RoboTwin 2.0: Ein skalierbarer Datengenerator und Benchmark mit starker Domain Randomisierung für robuste bimanuelle Robotermanipulation RoboTwin 2. 0 : 一个可缩放数据生成器和基准, 具有强力域随机化功能, 用于机械二手机器人操纵的可缩放数据生成器和基准 2506.18088v1
  • 38 06-22 Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs Optimierung von Flying Ad Hoc Network Topologie und kollaborative Pfadplanung für mehrere UAVs 优化多无人驾驶航空器飞行特设网络特设网络地形和协作道路规划 2506.17945v1
  • 39 06-22 Effective Red-Teaming of Policy-Adherent Agents Effektives Red-Teaming von Policy-Adherent Agents 有效的政策协调代理人红队 2506.09600v2
  • 40 06-21 (6) The Hive Mind is a Single Reinforcement Learning Agent Der Hive Mind ist ein einzelner Verstärkungs-Lernagent 蜂巢思想是单兵增援学习代理 2410.17517v3
  • 41 06-21 Bayesian Social Deduction with Graph-Informed Language Models Bayesische soziale Deduktion mit Graphen-informierten Sprachmodellen 采用图形化语言模型的巴伊斯社会衰退 2506.17788v1
  • 42 06-21 Multi-agent Embodied AI: Advances and Future Directions Multi-Agent Embodyd KI: Fortschritte und Zukunftsaussichten AI:进步和未来方向 2505.05108v2
  • 43 06-21 Distributed Butterfly Analysis using Mobile Agents Verteilte Schmetterlingsanalyse mit mobilen Agenten 使用移动剂进行分布式蝴蝶分析 2506.17721v1
  • 44 06-21 Towards Zero-Shot Coordination between Teams of Agents: The N-XPlay Framework Auf dem Weg zur Null-Shot-Koordination zwischen Agententeams: Das N-XPlay-Framework 实现各代理小组之间零位零位协调:NXPlay框架 2506.17560v1
  • 45 06-20 (5) On the Power of Spatial Locality on Online Routing Problems Über die Macht der räumlichen Lokalität bei Online-Routing-Problemen 在线运行问题空间地方空间定位力量 2506.17517v1
  • 46 06-20 Cash or Comfort? How LLMs Value Your Inconvenience Bargeld oder Komfort? Wie LLMs Wert Ihre Unannehmlichkeit 现金还是安慰? 2506.17367v1
  • 47 06-20 Formal Control for Uncertain Systems via Contract-Based Probabilistic Surrogates (Extended Version) Formale Kontrolle für unsichere Systeme über kontraktbasierte probabilistische Surrogate (erweiterte Version) 通过基于合同的概率性代管国对不确定系统进行正式控制(例外版本) 2506.16971v1
  • 48 06-20 Engineering Resilience: An Energy-Based Approach to Sustainable Behavioural Interventions Engineering Resilience: Ein energiebasierter Ansatz für nachhaltige Verhaltensinterventionen 工程复原力:以能源为基础的可持续行为干预办法 2506.16836v1
  • 49 06-20 Reimagining Urban Science: Scaling Causal Inference with Large Language Models Reimagining Urban Science: Skalierung von Kausalität mit großen Sprachmodellen 重新想象城市科学:与大语言模型的大规模因果推断 2504.12345v3
  • 50 06-20 A Scalable Post-Processing Pipeline for Large-Scale Free-Space Multi-Agent Path Planning with PiBT Eine skalierbare Post-Processing-Pipeline für großräumige Freiraum-Multi-Agenten-Pfadplanung mit PiBT 与 PiBT 合作的大型自由空间多机构多空间路径规划可缩放后处理管道 2506.16748v1
  • 51 06-20 Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation Generalisierbare Agentenmodellierung für Agent Collaboration-Competition Anpassung mit Multi-Retrieval und dynamischer Generation 多检索和有活力发电的合作-竞争适应 2506.16718v1
  • 52 06-19 (4) SemAgent: A Semantics Aware Program Repair Agent SemAgent: Ein Semantik-Bewusst-Programm-Reparatur-Agent SemAgenger: 语义学意识方案维修代理 2506.16650v1
  • 53 06-19 Autonomous Computer Vision Development with Agentic AI Autonome Computer Vision Entwicklung mit Agentischer KI 与Agric AI合作的自主计算机愿景发展 2506.11140v3
  • 54 06-19 eCAV: An Edge-Assisted Evaluation Platform for Connected Autonomous Vehicles eCAV: Eine Edge Assisted Evaluation Platform für vernetzte autonome Fahrzeuge eCAV: 连接自治车辆的边缘辅助评价平台 2506.16535v1
  • 55 06-19 Advanced Game-Theoretic Frameworks for Multi-Agent AI Challenges: A 2025 Outlook Fortgeschrittene Game-Theoretische Frameworks für Multi-Agent-KI-Herausforderungen: Ein 2025er Ausblick 应对多机构AI挑战的先进游戏理论框架:2025年展望 2506.17348v1
  • 56 06-19 Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections Zielkonditioniertes Hierarchisches Verstärkungslernen für probeneffizientes und sicheres autonomes Fahren an Kreuzungen 以目标为条件的级级强化学习,促进在跨部门进行抽样高效和安全自主驾驶 2506.16336v1
  • 57 06-19 Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners Beitrag anregen und auch Parameter lernen: Föderiertes Lernen mit strategischen Dateninhabern 激励贡献和学习参数:与战略数据所有者进行联邦学习 2505.12010v2
  • 58 06-19 Towards Emergency Scenarios: An Integrated Decision-making Framework of Multi-lane Platoon Reorganization Auf dem Weg zu Notfallszenarien: Ein integrierter Entscheidungsrahmen für mehrspurige Platoon-Reorganisation 实现紧急情况设想:多lane排重组综合决策框架 2506.16311v1
  • 59 06-19 Coordination of Electrical and Heating Resources by Self-Interested Agents Koordination von elektrischen und Heizressourcen durch selbstinteressierte Agenten 自利代理人协调电气和供暖资源 2506.16277v1
  • 60 06-19 Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems Beyond Self-Talk: Eine kommunikationszentrische Untersuchung von LLM-basierten Multiagentensystemen 超越自言自语:以LLM为基础的多种机构系统的通信中心调查 2502.14321v2
  • 61 06-19 Solving Zero-Sum Convex Markov Games Lösen Zero-Sum Convex Markov Spiele 解决零- 苏姆 Convex Markov 游戏 2506.16120v1
  • 62 06-19 DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents DrunkAgent: Stealthy Memory Korruption in LLM-Powered Recommender Agents DrunkAgent:LLM授权建议代理人的隐性记忆腐败 2503.23804v2
  • 63 06-19 Decentralized Collective World Model for Emergent Communication and Coordination Dezentrales kollektives Weltmodell für Emergent Communication und Coordination 新兴通信和协调世界分散集体模式 2504.03353v2
  • 64 06-19 AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment AssistantX: Ein LLM-powered Proaktiver Assistent in kollaborativer Mensch-bevölkerter Umgebung 助理X:在合作人类普惠环境方面由LLM授权的一名主动助理助理 2409.17655v3
  • 65 06-19 Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning Rekonfigurierbare intelligente oberflächenunterstützte VEC auf Basis von Multi-Agenten-Verstärkungslernen 基于多机构强化学习的可重新配置智能表面辅助VEC 2406.11318v2

Article 0

Title@2025-06-26 (4): Ad-Hoc Human-AI Coordination Challenge

Title: Ad-Hoc Human-AI Coordination Challenge Ad-hoc-Koordinierungsherausforderung Mensch-AI A. 协调挑战 2506.21490v1

Authors (8): Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Jakob Nicolaus Foerster

Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action – making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been limited by the challenges of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop human proxy agents on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at https://github.com/FLAIROx/ah2ac2.

AI代理商和人类之间实现无缝协调对于现实应用至关重要,但它仍然是一个重要的公开挑战。Hanabi是一个合作的纸牌游戏,其特点是信息不完善、沟通受限、思想要求理论和协调行动,使它成为人类-AI协调的理想测试点。然而,人类-AI互动的利用受到人类评估挑战的限制。在这项工作中,我们介绍了Ad-Hoc Human-AI协调挑战(AH2AC2),以克服昂贵和难以复现的人类评估的制约因素。我们在大规模人类数据集上开发了人类代理智能体(human proxy agents),作为AH2AC2中强健、廉价且可复现的类人评价伙伴。为了鼓励开发数据效率高的方法,我们开源了3,079个游戏数据集,有意限制现有人类游戏数据的数量。我们为两种和三种玩家Hanabi情景提供了基准结果。为了确保公平评价,我们通过一个受控的评价系统来托管代理商,而不是公开发布。代码可在以下网址获得:https://github.com/FLAIROx/ah2ac2


Article 1

Title@2025-06-26 (4): xChemAgents: Agentic AI for Explainable Quantum Chemistry

Title: xChemAgents: Agentic AI for Explainable Quantum Chemistry xChemAgenten: Agentische KI für erklärbare Quantenchemie xchemAgents: 可解释量子化学的AAA剂 2505.20574v2

Authors (5): Can Polat, Mehmet Tuncel, Mustafa Kurban, Erchin Serpedin, Hasan Kurban

Recent progress in multimodal graph neural networks has demonstrated that augmenting atomic XYZ geometries with textual chemical descriptors can enhance predictive accuracy across a range of electronic and thermodynamic properties. However, naively appending large sets of heterogeneous descriptors often degrades performance on tasks sensitive to molecular shape or symmetry, and undermines interpretability. xChemAgents proposes a cooperative agent framework that injects physics-aware reasoning into multimodal property prediction. xChemAgents comprises two language-model-based agents: a Selector, which adaptively identifies a sparse, weighted subset of descriptors relevant to each target and provides a natural language rationale; and a Validator, which enforces physical constraints such as unit consistency and scaling laws through iterative dialogue. On standard benchmark datasets, xChemAgents achieves up to a 22% reduction in mean absolute error over state-of-the-art baselines, while producing faithful, human-interpretable explanations. Experimental results highlight the potential of cooperative, self-verifying agents to enhance both accuracy and transparency in foundation-model-driven materials science. The implementation and accompanying dataset are available at https://github.com/KurbanIntelligenceLab/xChemAgents.

多式联运图形神经网络的近期进展表明,以文本化学描述器增强原子XYZ的地形特征可以提高一系列电子和热力特性的预测准确性;然而,天真地附加大量不同描述器往往会降低对分子形状或对称敏感的任务的性能,并损害可解释性。 xChemAgents提出一个合作剂框架,将物理觉知推理注入多式联运属性预测。 xChemAgents 提议一个合作剂框架,将物理觉识推入到多式属性预测中。 xChemAgents 由两种语言模型构成的代理物组成:一个选择器,该选择器适应性地识别出与每个目标相关的稀有加权描述器子,并提供自然语言理由;以及一个验证器,通过迭代对话强制实施单位一致性和扩展法律等物理限制。关于标准基准数据集, xchemagenents 实现比最新基准基线的绝对误差高达22%,同时提出忠实、人际的解释。实验结果突出表明合作、自我验证的代理物的潜力,以提高基础建模材料的准确性和透明度。


Article 2

Title@2025-06-26 (4): Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective

Title: Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective Werden LLMs Professional bei Fund Investment sein? DeepFund: Eine Live Arena Perspektive LLM女士在基金投资方面是否具有专业性? 2503.18313v2

Authors (4): Changlun Li, Yao Shi, Yuyu Luo, Nan Tang

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. Current benchmarks primarily assess LLMs’ understanding of financial documents rather than their ability to manage assets or uncover trading opportunities in dynamic market conditions. Despite the release of new benchmarks for evaluating diversified tasks in the financial domain, we identify four major problems in these benchmarks: data leakage, navel-gazing, over-intervention, and difficulty of maintenance. To bridge this research gap, we introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework in which LLMs serve in multiple key roles that mirror real-world investment decision processes. Moreover, we provide a web interface that visualizes LLMs’ performance with fund investment metrics across different market conditions, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more realistic and fair assessment of LLMs’ capabilities in fund investment, offering diversified insights and revealing their potential applications in real-world financial markets. Our code is publicly available at https://github.com/HKUSTDial/DeepFund.

大型语言模型(LLMS)在各个领域表现出了令人印象深刻的能力,但在金融决策方面的效力仍然没有得到充分的评价。目前的基准主要评估LLMS对金融文件的理解,而不是在活跃的市场条件下管理资产或挖掘贸易机会的能力。尽管为评价金融领域的多样化任务发布了新的基准,但我们查明了这些基准中的四个主要问题,即数据泄漏、收缩、过度干预和维护。为填补研究空白,我们引入了DeepFund,这是一个在现实环境中评价以LLM为基础的贸易战略的全面的舞台平台。我们的方法是实施一个多试办框架,作为实现现实世界投资决策过程的多重关键作用。此外,我们提供了一个网络界面,通过在不同市场条件下的基金投资指标将LLMs的业绩形象化,进行详细的比较分析。我们通过EmepFund,旨在对LM在资金投资方面的能力进行更现实和公正的评估,提供多样化的洞察,并揭示其在现实世界金融市场的潜在应用。我们的代码在https://github.com/HKustDIal/DIFO。


Article 3

Title@2025-06-25 (3): Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners

Title: Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners Märkte mit heterogenen Agenten: Dynamik und Überleben von Bayesian vs. No-Regret Learners 具有异基因物剂的市场:巴伊西亚的动态和生存与无学习者对无学习者 2502.08597v2

Authors (3): David Easley, Yoav Kolumbus, Eva Tardos

We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our main focus is on comparing Bayesian learners and no-regret learners who compete in markets and identifying the conditions under which each approach is more effective. Surprisingly, we find that low regret is not sufficient for survival: an agent can have regret as low as $O(\log T)$ but still vanish when competing against a Bayesian with a finite prior and any positive prior probability on the correct model. On the other hand, we show that Bayesian learning is fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Motivated by the strengths and weaknesses of both approaches, we propose a balanced strategy for utilizing Bayesian updates that improves robustness and adaptability to distribution shifts, providing a step toward a best-of-both-worlds learning approach. The method is general, efficient, and easy to implement. Finally, we formally establish the relationship between the notions of survival and market dominance studied in economics and the framework of regret minimization, thus bridging these theories. More broadly, our work contributes to the understanding of dynamics with heterogeneous types of learning agents and their impact on markets.

我们分析了资产市场中不同学习代理商在资产市场中以零差利得的绩效。 我们的主要重点是比较贝耶斯学习者和在市场中竞争的无雷学习者,并找出每种方法都更为有效的条件。 令人惊讶的是,我们发现低遗憾不足以生存:一个代理商可能会后悔低到O(glog T)美元,但在与一个拥有有限前期和任何前期概率的正确模式上的正面概率的巴耶斯人竞争时仍然会消失。 另一方面,我们表明巴耶斯人的学习是脆弱的,而无雷学习则需要较少对环境的了解,因此更加有力。受这两种方法的长处和短处的驱使,我们提出了一个平衡的战略,利用巴伊斯人的最新消息来提高分销变化的稳健性和适应性,为向两个世界最佳学习方法迈进了一步。 这种方法是一般的、高效的和容易执行的。 最后,我们正式确定了在经济学中研究的生存和市场支配地位概念与减罪框架之间的关系,从而弥补了这些理论。 更广泛地说,我们的工作有助于理解其多样性的动力和对市场的影响。
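
For readers unfamiliar with the term, the $O(\log T)$ regret bound quoted in the abstract can be read against the standard online-learning definition of external regret. The abstract does not state the exact payoff notion used in the market setting, so the action set $A$, the per-round payoff $u_t$, and the chosen action $a_t$ below are generic placeholders, not the authors' notation:

$$
R_T \;=\; \max_{a \in A} \sum_{t=1}^{T} u_t(a) \;-\; \sum_{t=1}^{T} u_t(a_t)
$$

A learner is called no-regret when $R_T = o(T)$, that is, when its average regret $R_T / T$ vanishes as $T$ grows. A bound of $R_T = O(\log T)$ is much stronger than that requirement, which is what makes the abstract's claim (such a learner can still vanish from the market) notable.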


Article 4

Title@2025-06-25 (3): The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind

Title: The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind Der Decrypto-Benchmark für multi-agente Vernunft und Theorie des Geistes 多种代理理由和思想理论的Decrypto Decrypto基准 2506.20664v1

Authors (3): Andrei Lupu, Timon Willi, Jakob Foerster

As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multi-agent scenarios, interacting with human users and other agents in cooperative and competitive settings. This will require new reasoning skills, chief amongst them being theory of mind (ToM), or the ability to reason about the “mental” states of other agents. However, ToM and other multi-agent abilities in LLMs are poorly understood, since existing benchmarks suffer from narrow scope, data leakage, saturation, and lack of interactivity. We thus propose Decrypto, a game-based benchmark for multi-agent reasoning and ToM drawing inspiration from cognitive science, computational pragmatics and multi-agent reinforcement learning. It is designed to be as easy as possible in all other dimensions, eliminating confounding factors commonly found in other benchmarks. To our knowledge, it is also the first platform for designing interactive ToM experiments. We validate the benchmark design through comprehensive empirical evaluations of frontier LLMs, robustness studies, and human-AI cross-play experiments. We find that LLM game-playing abilities lag behind humans and simple word-embedding baselines. We then create variants of two classic cognitive science experiments within Decrypto to evaluate three key ToM abilities. Surprisingly, we find that state-of-the-art reasoning models are significantly worse at those tasks than their older counterparts. This demonstrates that Decrypto addresses a crucial gap in current reasoning and ToM evaluations, and paves the path towards better artificial agents.

随着大型语言模型(LLMS)获得代理能力,它们将不得不浏览复杂的多试样情景,与人类用户和其他代理人在合作和竞争性环境下互动。这将需要新的推理技能,其中最主要的是思维理论(ToM),或对其他代理人的“心理”状态进行解释的能力。然而,由于现有基准的范围狭窄,数据渗漏、饱和和缺乏互动性,LLM的ToM和其他多试剂能力不甚为人理解。因此,我们提议了基于游戏的多试剂推理和ToM基准Decrypto,它从认知科学、计算实用学和多剂强化学习中提取灵感。它被设计为在所有其他维度上尽可能简单,从而消除其他基准中常见的混杂因素。据我们所知,它也是第一个用于设计交互式ToM实验的平台。我们通过对前沿LLM的全面实证评估、稳健性研究以及人类-AI交叉博弈实验来验证基准设计。我们发现,LLM的游戏能力落后于人类和简单的词嵌入基线。随后,我们在Decrypto中创建了两个经典认知科学实验的变体,以评估三项关键的ToM能力。令人惊讶的是,我们发现最先进的推理模型在这些任务上的表现明显差于其较早的同类模型。这表明Decrypto填补了当前推理与ToM评估中的关键空白,并为构建更好的人工智能体铺平了道路。


Article 5

Title@2025-06-25 (3): Task Allocation of UAVs for Monitoring Missions via Hardware-in-the-Loop Simulation and Experimental Validation

Title: Task Allocation of UAVs for Monitoring Missions via Hardware-in-the-Loop Simulation and Experimental Validation Aufgabenverteilung von UAVs zur Überwachung von Missionen über Hardware-in-the-Loop-Simulation und experimentelle Validierung 通过 “ 网上硬件模拟和实验校验 “ ,为监测任务分配无人驾驶航空器的任务 2506.20626v1

Authors (4): Hamza Chakraa, François Guérin, Edouard Leclercq, Dimitri Lefebvre

This study addresses the optimisation of task allocation for Unmanned Aerial Vehicles (UAVs) within industrial monitoring missions. The proposed methodology integrates a Genetic Algorithm (GA) with a 2-Opt local search technique to obtain a high-quality solution. Our approach was experimentally validated in an industrial zone to demonstrate its efficacy in real-world scenarios. A Hardware-in-the-loop (HIL) simulator for the UAV team is also introduced. Moreover, the correlation between the theoretical cost function and the actual battery consumption and flight time is analysed in depth. Results show that the costs considered in the optimisation part of the problem closely correlate with real-world data, confirming the practicality of the proposed approach.

本研究涉及在工业监测任务中优化无人驾驶航空器的任务分配问题,拟议方法将遗传算术(GA)与获得高质量解决办法的2-最佳本地搜索技术相结合,我们的方法在工业区进行了实验性验证,以证明其在现实世界情景中的有效性,还引入了无人驾驶航空器小组的硬件即时模拟器(HIL)模拟器,此外,对理论成本功能与实际电池消耗和飞行时间之间的关联性进行了深入分析,结果显示,所考虑的问题优化部分成本与现实世界数据密切相关,证实了拟议方法的实用性。
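
The 2-Opt local search named in the abstract above can be illustrated with a minimal Python sketch. This is a generic textbook 2-Opt on a closed Euclidean tour, not the authors' implementation: the abstract does not describe the GA encoding, the cost function, or how 2-Opt is coupled to the GA, so the function names `tour_length` and `two_opt` and the plain distance cost are assumptions made only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour: List[int], pts: Sequence[Point]) -> List[int]:
    """Generic 2-Opt: reverse a segment whenever that shortens the tour.

    Assumption: cost is plain Euclidean tour length; the paper's cost
    function (battery, flight time) is not given in the abstract.
    """
    best = list(tour)
    best_len = tour_length(best, pts)
    improved = True
    while improved:
        improved = False
        # Keep position 0 fixed as the start/depot of the tour.
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(candidate, pts)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


if __name__ == "__main__":
    # Four corners of a unit square visited in a self-crossing order.
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    crossed = [0, 2, 1, 3]
    fixed = two_opt(crossed, square)
    print(fixed, tour_length(fixed, square))
```

In a hybrid scheme of the kind the abstract describes, a routine like this would typically be applied to candidate routes produced by the GA; whether the authors apply it to every individual or only to the best ones is not stated.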


Article 6

Title@2025-06-25 (3): Opinion Dynamics with Highly Oscillating Opinions

Title: Opinion Dynamics with Highly Oscillating Opinions Meinungsdynamik mit stark oszillierenden Meinungen 具有高度振动性意见的意见动态 2506.20472v1

Authors (3): Víctor A. Vargas-Pérez, Jesús Giráldez-Cru, Oscar Cordón

Opinion Dynamics (OD) models are a particular case of Agent-Based Models in which the evolution of opinions within a population is studied. In most OD models, opinions evolve as a consequence of interactions between agents, and the opinion fusion rule defines how those opinions are updated. In consequence, despite being simplistic, OD models provide an explainable and interpretable mechanism for understanding the underlying dynamics of opinion evolution. Unfortunately, existing OD models mainly focus on explaining the evolution of (usually synthetic) opinions towards consensus, fragmentation, or polarization, but they usually fail to analyze scenarios of (real-world) highly oscillating opinions. This work overcomes this limitation by studying the ability of several OD models to reproduce highly oscillating dynamics. To this end, we formulate an optimization problem which is further solved using Evolutionary Algorithms, providing both quantitative results on the performance of the optimization and qualitative interpretations on the obtained results. Our experiments on a real-world opinion dataset about immigration from the monthly barometer of the Spanish Sociological Research Center show that the ATBCR, based on both rational and emotional mechanisms of opinion update, is the most accurate OD model for capturing highly oscillating opinions.

意见动态(OD)模型是研究人口内部意见演变的代理基础模型的一个特例。在大多数OD模型中,意见随着代理人之间的互动而演变,意见融合规则界定了这些观点如何更新。因此,尽管数据动态模型简单化,但为理解观点演变的基本动态提供了可解释和可解释的机制。不幸的是,现有的数据动态模型主要侧重于解释(通常是合成的)意见演变到共识、分裂或两极分化,但它们通常无法分析(现实世界)高度扭曲意见的情景。这项工作通过研究几种数据模式复制高度振动动态的能力克服了这一局限性。为此,我们提出了一个优化问题,利用进化解算法法进一步解决,提供了优化绩效的定量结果和对所获结果的定性解释。我们从西班牙社会学研究中心每月晴雨量计中对关于移民的真实观点数据集的实验表明,根据理性和情绪性的观点更新机制,ATBCRCR发现,ATBCR是高度精确的模型,用于获取高度精确的意见模型。


Article 7

Title@2025-06-25 (3): An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Title: An Agentic System for Rare Disease Diagnosis with Traceable Reasoning Ein Agentisches System für die Diagnose seltener Krankheiten mit rückverfolgbarer Begründung 利用可追踪理由进行罕见疾病诊断的制剂系统 2506.20430v1

Authors (12): Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first agentic system for rare disease diagnosis powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks integrating over 40 specialized tools and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser’s 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application at http://raredx.cn/doctor.

罕见疾病在全球共影响超过3亿人,但及时、准确的诊断仍是一个普遍的挑战。这在很大程度上是由于他们的临床异质性、个人流行率低以及大多数临床医生对罕见条件的熟悉程度有限。在这里,我们引入了DeepRare,这是第一个由大型语言模型(LLM)驱动的罕见疾病诊断毒剂系统,能够处理多种临床投入。这个系统生成了稀有疾病的诊断假设,每个系统都有一个透明的推理链,将中间分析步骤与可核查的医疗证据联系起来。DeepRare包含三个关键组成部分:一个具有长期记忆模块的中央主机体;专门代理服务器负责具体领域的分析任务,整合了40多种专门工具和网络规模的最新医学知识来源,确保了获取最新临床信息的机会。这个模块和可缩缩放设计使得复杂的诊断推理同时保持可追踪性和适应性。我们在8个数据集上对Deeparreare进行了评估。这个系统展示了2,919种疾病的特殊诊断性,实现了1013种疾病的100%的准确性。在基于HPO的评估中,DeepRare 明显地超前程,对其他15种成本模型进行了分析,在传统生物内部分析工具中实现了一种超标。
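
The Recall@1 figures quoted in the abstract above are a top-k ranking metric, and a minimal sketch of how such a score is commonly computed follows. It assumes one ground-truth diagnosis per case and exact string matching; the paper's own evaluation protocol may differ (for example, several acceptable diagnoses per case), and the function name `recall_at_k` and the disease labels are hypothetical placeholders.

```python
from typing import Sequence


def recall_at_k(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int = 1
) -> float:
    """Fraction of cases whose true diagnosis appears among the top-k hypotheses.

    Assumption: exactly one gold label per case, matched by string equality.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same number of cases")
    if not truth:
        return 0.0
    hits = sum(1 for preds, gold in zip(ranked, truth) if gold in preds[:k])
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical ranked outputs for three cases (labels are placeholders).
    ranked_lists = [
        ["disease_A", "disease_B"],
        ["disease_C", "disease_D"],
        ["disease_E", "disease_F"],
    ]
    gold = ["disease_A", "disease_D", "disease_X"]
    print(recall_at_k(ranked_lists, gold, k=1))
    print(recall_at_k(ranked_lists, gold, k=2))
```

Read this way, the stated margin of 23.79 percentage points over the second-best method implies an average Recall@1 of roughly 57.18 − 23.79 = 33.39% for that method.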

Article 8

Title@2025-06-25 (3): SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models

Title: SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models SV-LLM: Agentischer Ansatz für die SoC-Sicherheitsverifizierung mit großen Sprachmodellen SV-LLLM:使用大语言模型进行 SoC安全核查的代理方法 2506.20415v1

Authors (11): Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nalluri, Md. Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi

Ensuring the security of complex system-on-chip (SoC) designs is a critical imperative, yet traditional verification techniques struggle to keep pace due to significant challenges in automation, scalability, comprehensiveness, and adaptability. The advent of large language models (LLMs), with their remarkable capabilities in natural language understanding, code generation, and advanced reasoning, presents a new paradigm for tackling these issues. Moving beyond monolithic models, an agentic approach allows for the creation of multi-agent systems where specialized LLMs collaborate to solve complex problems more effectively. Recognizing this opportunity, we introduce SV-LLM, a novel multi-agent assistant system designed to automate and enhance SoC security verification. By integrating specialized agents for tasks like verification question answering, security asset identification, threat modeling, test plan and property generation, vulnerability detection, and simulation-based bug validation, SV-LLM streamlines the workflow. To optimize their performance in these diverse tasks, agents leverage different learning paradigms, such as in-context learning, fine-tuning, and retrieval-augmented generation (RAG). The system aims to reduce manual intervention, improve accuracy, and accelerate security analysis, supporting proactive identification and mitigation of risks early in the design cycle. We demonstrate its potential to transform hardware security practices through illustrative case studies and experiments that showcase its applicability and efficacy.

由于自动化、可扩缩性、全面性和适应性等重大挑战,传统的核查技术难以跟上步伐。大型语言模型(LLMS)的出现,在自然语言理解、代码生成和高级推理方面的超强能力,为解决这些问题提供了一个新的范式。超越单一模型,一种代理方法允许建立多试剂系统,专门LLMS可以借此进行合作,更有效地解决复杂问题。认识到这一机会,我们引入了SV-LLM(SV-LLM),这是一个新型的多试剂助理系统,旨在自动化和加强SoC的安全核查。通过整合诸如核查问题回答、安全资产识别、威胁建模、测试计划和财产生成、脆弱性检测和模拟错误验证等任务的专门机构,SV-LLM精简了工作流程。为了优化这些系统在各种任务中的绩效,代理商利用了不同的学习模式,例如文文本学习、微调和回收型(RAG),该系统旨在减少人工干预、改进准确性、测试威胁模型、测试计划和基于模拟的错误校准性分析,我们通过对安全进行早期风险的测试,以展示性分析,从而显示其安全风险。


Article 9

Title@2025-06-25 (3): A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem

Title: A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem Ein Visualisierungsrahmen für die Erforschung von Multi-Agent-basierten Simulationen Fallstudie eines Elektroauto-Heimlade-Ökosystems 电动车辆家庭充电生态系统模拟模拟研究的可视化框架 2506.20400v1

Authors (3): Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma

Multi-agent-based simulations (MABS) of electric vehicle (EV) home charging ecosystems generate large, complex, and stochastic time-series datasets that capture interactions between households, grid infrastructure, and energy markets. These interactions can lead to unexpected system-level events, such as transformer overloads or consumer dissatisfaction, that are difficult to detect and explain through static post-processing. This paper presents a modular, Python-based dashboard framework, built using Dash by Plotly, that enables efficient, multi-level exploration and root-cause analysis of emergent behavior in MABS outputs. The system features three coordinated views (System Overview, System Analysis, and Consumer Analysis), each offering high-resolution visualizations such as time-series plots, spatial heatmaps, and agent-specific drill-down tools. A case study simulating full EV adoption with smart charging in a Danish residential network demonstrates how the dashboard supports rapid identification and contextual explanation of anomalies, including clustered transformer overloads and time-dependent charging failures. The framework facilitates actionable insight generation for researchers and distribution system operators, and its architecture is adaptable to other distributed energy resources and complex energy systems.

电动车辆家用充电器生态系统的多试剂模拟(MABS)生成大型、复杂和随机的时间序列数据集,收集住户、电网基础设施和能源市场之间的相互作用。这些相互作用可能导致出乎意料的系统级事件,如变压器超载或消费者不满,难以通过静态处理后检测和解释。本文提出了一个模块化的、基于Python的仪表板框架,由Plotly使用Dash制成,有助于高效、多层次的探索和对MABS产出的突发行为进行根基分析。该系统有三个协调的观点(系统概览、系统分析和消费者分析),每个系统都提供高分辨率的直观化,如时间序列图、空间热测图和具体代理器的自下调工具。一项案例研究模拟了丹麦住宅网络采用完全EV的智能收费,展示了仪表板如何支持快速识别和背景解释异常现象,包括集群变压器超载和基于时间的收费失败。框架便利研究人员和分配系统操作的洞察力生成,其结构适应了其他分布式的能源和复杂能源系统。


Article 10

Title@2025-06-25 (3): Argumentative Ensembling for Robust Recourse under Model Multiplicity

Title: Argumentative Ensembling for Robust Recourse under Model Multiplicity Argumentatives Zusammenbauen für robusten Rücklauf unter Modellvielfalt 多种模式下强力利用的参数组合 2506.20260v1

Authors (4): Junqi Jiang, Antonio Rago, Francesco Leofante, Francesca Toni

In machine learning, it is common to obtain multiple equally performing models for the same prediction task, e.g., when training neural networks with different random seeds. Model multiplicity (MM) is the situation which arises when these competing models differ in their predictions for the same input, for which ensembling is often employed to determine an aggregation of the outputs. Providing recourse recommendations via counterfactual explanations (CEs) under MM thus becomes complex, since the CE may not be valid across all models, i.e., the CEs are not robust under MM. In this work, we formalise the problem of providing recourse under MM, which we name recourse-aware ensembling (RAE). We propose the idea that under MM, CEs for each individual model should be considered alongside their predictions so that the aggregated prediction and recourse are decided in tandem. Centred around this intuition, we introduce six desirable properties for solutions to this problem. For solving RAE, we propose a novel argumentative ensembling method which guarantees the robustness of CEs under MM. Specifically, our method leverages computational argumentation to explicitly represent the conflicts between models and counterfactuals regarding prediction results and CE validity. It then uses argumentation semantics to resolve the conflicts and obtain the final solution, in a manner which is parametric to the chosen semantics. Our method also allows for the specification of preferences over the models under MM, allowing further customisation of the ensemble. In a comprehensive theoretical analysis, we characterise the behaviour of argumentative ensembling with four different argumentation semantics. We then empirically demonstrate the effectiveness of our approach in satisfying desirable properties with eight instantiations of our method. (Abstract is shortened for arXiv.)

在机器学习中,针对同一预测任务常常可以得到多个性能相当的模型,例如使用不同随机种子训练神经网络时。模型多重性(MM)指这些相互竞争的模型对同一输入给出不同预测的情形,此时通常采用集成方法来聚合各模型的输出。因此,在MM下通过反事实解释(CE)提供追索建议变得复杂,因为某个CE未必对所有模型都有效,即CE在MM下不具鲁棒性。在这项工作中,我们形式化了在MM下提供追索的问题,并将其命名为追索感知集成(RAE)。我们提出,在MM下应将每个模型的CE与其预测一并考虑,从而同时确定聚合后的预测和追索。围绕这一直觉,我们为该问题的解提出了六条理想性质。为求解RAE,我们提出了一种新的论辩式集成方法,可保证CE在MM下的鲁棒性。具体而言,该方法利用计算论辩显式表示模型与反事实之间在预测结果和CE有效性上的冲突,然后使用论辩语义消解冲突并得到最终解,且所选语义可作为参数。该方法还允许指定对MM下各模型的偏好,从而进一步定制集成。在全面的理论分析中,我们刻画了论辩式集成在四种不同论辩语义下的行为。随后,我们通过该方法的八种实例化,实证展示了其在满足理想性质方面的有效性。(摘要因arXiv限制而缩短。)
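To make the robustness issue in the abstract above concrete, here is a minimal Python sketch. It is not the authors' argumentative ensembling method: it only illustrates, on a hypothetical toy of three threshold classifiers over one numeric feature, how a counterfactual that is valid for a single model can fail under naive majority-vote ensembling. The names `majority_label`, `models_accepting`, and the thresholds are all assumptions made for illustration.

```python
from collections import Counter

def majority_label(models, x):
    """Naive ensembling: return the label predicted by most models for input x."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

def models_accepting(models, counterfactual, target):
    """Indices of the models for which the counterfactual reaches the target label."""
    return [i for i, m in enumerate(models) if m(counterfactual) == target]

# Hypothetical toy models: each approves (label 1) above its own threshold.
models = [
    lambda x: int(x >= 50),
    lambda x: int(x >= 55),
    lambda x: int(x >= 60),
]

x = 48               # original input, rejected by every model
ce_model_0 = 50      # counterfactual computed against model 0 only
ce_all_models = 60   # counterfactual that satisfies every model

# The model-0 counterfactual is valid for one model but not for the ensemble.
print(models_accepting(models, ce_model_0, target=1), majority_label(models, ce_model_0))
# The stricter counterfactual is valid for all models and for the ensemble.
print(models_accepting(models, ce_all_models, target=1), majority_label(models, ce_all_models))
```

Deciding the aggregated prediction and the recourse jointly, as the abstract proposes, is meant to avoid exactly the first situation, where the recommended change does not flip the ensemble's decision.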


Article 11

Title@2025-06-25 (3): Language Modeling by Language Models

Title: Language Modeling by Language Models Sprachmodellierung nach Sprachmodellen 按语文模式建模的语文 2506.20249v1

Authors (3): Junyan Cheng, Peter Clark, Kyle Richardson

Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach; new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M$\sim$350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., $\sim$86\% percentage point improvement in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., outperform GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.

我们能否利用LLM来模拟发现新型语言模型(LM)架构的过程?受真实研究的启发,我们提出了一种多智能体LLM方法,模拟研究的常规阶段:从构思和文献检索(提案阶段),到设计实现(代码生成)、生成式预训练以及下游评估(验证)。借鉴规模定律的思想,我们的系统Genesys采用“规模阶梯”(Ladder of Scales)方法:新设计被提出、经对抗式评审、实现,并在逐级增大的模型规模(14M$\sim$350M参数)上以逐步收紧的预算(每个规模上可训练的模型数量)进行选择性验证。为使发现过程高效且可分解,Genesys采用了一种新的遗传编程主干,我们表明它相对于常用的直接提示生成流程具有经验优势(例如,在成功生成设计这一关键瓶颈上提升$\sim$86\%个百分点)。我们报告了涉及1,162个新发现设计(其中1,062个通过预训练得到完整验证)的实验,发现最佳设计与已知架构相比极具竞争力(例如,在9个常用基准中的6个上优于GPT2、Mamba2等)。我们还给出了全面的系统级消融实验和形式化结果,为设计有效的自主发现系统提供了更广泛的见解。


Article 12

Title@2025-06-25 (3): On the $h$-majority dynamics with many opinions

Title: On the $h$-majority dynamics with many opinions Auf der $h$-Mehrheitsdynamik mit vielen Meinungen 关于以美元为多数的动态, 2506.20218v1

Authors (4): Francesco d’Amore, Niccolò D’Archivio, George Giakkoupis, Emanuele Natale

We present the first upper bound on the convergence time to consensus of the well-known $h$-majority dynamics with $k$ opinions, in the synchronous setting, for $h$ and $k$ that are both non-constant values. We suppose that, at the beginning of the process, there is some initial additive bias towards some plurality opinion, that is, there is an opinion that is supported by $x$ nodes while any other opinion is supported by strictly fewer nodes. We prove that, with high probability, if the bias is $\omega(\sqrt{x})$ and the initial plurality opinion is supported by at least $x = \omega(\log n)$ nodes, then the process converges to plurality consensus in $O(\log n)$ rounds whenever $h = \omega(n \log n / x)$. A main corollary is the following: if $k = o(n / \log n)$ and the process starts from an almost-balanced configuration with an initial bias of magnitude $\omega(\sqrt{n/k})$ towards the initial plurality opinion, then any function $h = \omega(k \log n)$ suffices to guarantee convergence to consensus in $O(\log n)$ rounds, with high probability. Our upper bound shows that the lower bound of $\Omega(k / h^2)$ rounds to reach consensus given by Becchetti et al. (2017) cannot be pushed further than $\widetilde{\Omega}(k / h)$. Moreover, the bias we require is asymptotically smaller than the $\Omega(\sqrt{n\log n})$ bias that guarantees plurality consensus in the $3$-majority dynamics: in our case, the required bias is at most any (arbitrarily small) function in $\omega(\sqrt{x})$ for any value of $k \ge 2$.

我们给出了同步设置下、具有 $k$ 种意见的 $h$-majority 动态收敛到共识所需时间的第一个上界,其中 $h$ 和 $k$ 均为非常数。假设过程开始时存在偏向某个相对多数意见(plurality opinion)的初始加性偏差,即有一种意见得到 $x$ 个节点支持,而其他任何意见的支持节点数都严格更少。我们证明:若偏差为 $\omega(\sqrt{x})$,且初始相对多数意见至少得到 $x = \omega(\log n)$ 个节点支持,则只要 $h = \omega(n \log n / x)$,该过程就以高概率在 $O(\log n)$ 轮内收敛到相对多数共识。一个主要推论是:若 $k = o(n / \log n)$,且过程从几乎平衡的配置出发,对初始相对多数意见的初始偏差量级为 $\omega(\sqrt{n/k})$,则任意满足 $h = \omega(k \log n)$ 的函数都足以保证以高概率在 $O(\log n)$ 轮内收敛到共识。我们的上界表明,Becchetti 等人(2017)给出的达成共识所需 $\Omega(k / h^2)$ 轮的下界不可能被提升到超过 $\widetilde{\Omega}(k / h)$。此外,我们所需的偏差渐近小于在 $3$-majority 动态中保证相对多数共识的 $\Omega(\sqrt{n\log n})$ 偏差:在我们的情形下,对任意 $k \ge 2$,所需偏差至多是 $\omega(\sqrt{x})$ 中的任意(任意小的)函数。
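The process analysed in the abstract above can be simulated in a few lines. The sketch below assumes the usual definition of the $h$-majority dynamics on the complete graph: in each synchronous round every node samples $h$ nodes uniformly at random with replacement and adopts the most frequent opinion in its sample, breaking ties uniformly at random. The function names and parameters are illustrative assumptions, and the code makes no attempt to reproduce the paper's bounds.

```python
import random
from collections import Counter

def h_majority_round(opinions, h, rng):
    """One synchronous round: each node samples h nodes uniformly at random
    (with replacement) and adopts the plurality opinion of its sample,
    breaking ties uniformly at random."""
    n = len(opinions)
    new_opinions = []
    for _ in range(n):
        sample = [opinions[rng.randrange(n)] for _ in range(h)]
        counts = Counter(sample)
        top = max(counts.values())
        winners = [op for op, c in counts.items() if c == top]
        new_opinions.append(rng.choice(winners))
    return new_opinions

def run_h_majority(opinions, h, max_rounds=1000, seed=0):
    """Run rounds until all nodes agree or max_rounds is reached.
    Returns (consensus_opinion_or_None, rounds_executed)."""
    rng = random.Random(seed)
    opinions = list(opinions)
    rounds = 0
    while len(set(opinions)) > 1 and rounds < max_rounds:
        opinions = h_majority_round(opinions, h, rng)
        rounds += 1
    consensus = opinions[0] if len(set(opinions)) == 1 else None
    return consensus, rounds

if __name__ == "__main__":
    # Opinion 0 starts with an additive bias over opinions 1 and 2.
    start = [0] * 120 + [1] * 90 + [2] * 90
    print(run_h_majority(start, h=15))
```

Varying `h`, the number of opinions, and the initial bias in `start` gives an informal feel for how the number of rounds to consensus depends on these quantities.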


Article 13

Title@2025-06-24 (2): Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning

Title: Learning Bilateral Team Formation in Cooperative Multi-Agent Reinforcement Learning Bilaterale Teambildung im kooperativen Multi-Agenten-Verstärkungs-Lernen lernen 合作多机构加强合作学习双边学习小组 2506.20039v1

Authors (2): Koorosh Moslemi, Chi-Guhn Lee

Team formation and the dynamics of team-based learning have drawn significant interest in the context of Multi-Agent Reinforcement Learning (MARL). However, existing studies primarily focus on unilateral groupings, predefined teams, or fixed-population settings, leaving the effects of algorithmic bilateral grouping choices in dynamic populations underexplored. To address this gap, we introduce a framework for learning two-sided team formation in dynamic multi-agent systems. Through this study, we gain insight into what algorithmic properties in bilateral team formation influence policy performance and generalization. We validate our approach using widely adopted multi-agent scenarios, demonstrating competitive performance and improved generalization in most scenarios.

团队构成和团队学习动态在多机构强化学习(MARL)背景下引起了极大的兴趣,但是,现有研究主要侧重于单边组合、预先界定的团队或固定人口环境,留下在探索不足的动态人群中双边组合算法选择的影响,为弥补这一差距,我们引入了在动态多试剂系统中学习双面团队组成的框架。通过这项研究,我们深入了解双边团队形成过程中的算法特性对政策绩效和一般化有何影响。我们使用广泛采纳的多机构情景验证了我们的做法,展示了竞争性绩效,并在多数情况下改进了通用性。


Article 14

Title@2025-06-24 (2): KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

Title: KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality KnowRL: Erforschendes Wissenswertes Verstärktes Lernen für die Realität KnowRL:探索知识强化学习促进事实质量 2506.19807v1

Authors (5): Baochang Ren, Shuofei Qiao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.

大型语言模型(LLM),尤其是慢思考模型,常常表现出严重的幻觉,由于在推理过程中无法准确识别知识边界而输出错误内容。强化学习(RL)虽然可以增强复杂推理能力,但其以结果为导向的奖励机制往往缺乏对思考过程的事实性监督,进一步加剧了幻觉问题。为解决慢思考模型幻觉严重的问题,我们提出了知识增强的RL方法KnowRL。KnowRL将基于知识验证的事实性奖励融入RL训练过程,引导模型进行基于事实的慢思考,帮助其识别自身的知识边界。RL训练中这种有针对性的事实性输入使模型能够学习并内化基于事实的推理策略。通过直接奖励推理步骤中对事实的遵循,KnowRL促成了更可靠的思考过程。在三个幻觉评测数据集和两个推理评测数据集上的实验结果表明,KnowRL能有效缓解慢思考模型的幻觉,同时保持其原有的强推理能力。我们的代码见 https://github.com/zjunlp/KnowRL。


Article 15

Title@2025-06-24 (2): LLM-Based Social Simulations Require a Boundary

Title: LLM-Based Social Simulations Require a Boundary LLM-basierte soziale Simulationen erfordern eine Grenze 以LLM为基础的社会模拟需要边界 2506.19806v1

Authors (4): Zengqing Wu, Run Peng, Takayuki Ito, Chuan Xiao

This position paper argues that large language model (LLM)-based social simulations should establish clear boundaries to meaningfully contribute to social science research. While LLMs offer promising capabilities for modeling human-like agents compared to traditional agent-based modeling, they face fundamental limitations that constrain their reliability for social pattern discovery. The core issue lies in LLMs’ tendency towards an “average persona” that lacks sufficient behavioral heterogeneity, a critical requirement for simulating complex social dynamics. We examine three key boundary problems: alignment (simulated behaviors matching real-world patterns), consistency (maintaining coherent agent behavior over time), and robustness (reproducibility under varying conditions). We propose heuristic boundaries for determining when LLM-based simulations can reliably advance social science understanding. We believe that these simulations are more valuable when focusing on (1) collective patterns rather than individual trajectories, (2) agent behaviors aligning with real population averages despite limited variance, and (3) proper validation methods available for testing simulation robustness. We provide a practical checklist to guide researchers in determining the appropriate scope and claims for LLM-based social simulations.

这份立场文件认为,基于大语言模式(LLM)的社会模拟应该为对社会科学研究做出有意义的贡献确定明确的界限。虽然LLM公司提供了与传统代理模型相比,在模拟人种制剂方面有希望的能力,但它们面临着限制其社会模式发现可靠性的基本限制。核心问题在于LLM公司倾向于“平均人种”,这种人种缺乏足够的行为异质性,这是模拟复杂社会动态的关键要求。我们研究了三个关键的边界问题:协调(模拟行为与现实世界模式相匹配)、一致性(长期保持连贯的代理行为)和稳健(在不同条件下可减少)。我们提出了确定基于LLMM的模拟何时能可靠地推进社会科学理解的超自然界限。我们认为,这些模拟在侧重于(1)集体模式而不是个人轨迹时更有价值,(2)尽管差异有限,但与实际人口平均数相一致的代理行为;(3)用于测试模拟稳健性的适当验证方法。我们提供了一个实用的核对清单,指导研究人员确定LM公司基于社会模拟的适当范围和索赔范围。


Article 16

Title@2025-06-24 (2): Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Title: Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Warum kämpfen Open Source LLMs mit Datenanalyse? Eine systematische empirische Studie 开放源码LLMs为何要与数据分析斗争?系统的经验研究 2506.19794v1

Authors (10): Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs’ analytical reasoning capabilities.

大型语言模型(LLMs)在数据分析任务自动化方面很有希望,然而,开放源代码模型在这类推理密集型假设情景中面临重大限制。在这项工作中,我们调查了提高开放源代码LLMs数据分析能力的战略。我们通过整理一套多样、现实的假设情景的种子数据集,评估了三个方面的模型:数据理解、代码生成和战略规划。我们的分析揭示了三个主要结论:(1)战略规划质量是模型绩效的主要决定因素;(2)互动设计和任务复杂性极大地影响推理能力;(3)数据质量显示在实现最佳绩效方面的影响大于多样性。我们利用这些洞见来开发数据综合方法,展示了开放源代码LLMs分析推理能力的重大改进。


Article 17

Title@2025-06-24 (2): Collaborative governance of cyber violence: A two-phase, multi-scenario four-party evolutionary game and SBI1I2R public opinion dissemination

Title: Collaborative governance of cyber violence: A two-phase, multi-scenario four-party evolutionary game and SBI1I2R public opinion dissemination Collaborative Governance von Cybergewalt: Zwei-Phasen-Multiszenario-Evolutionsspiel mit vier Parteien und öffentliche Meinungsverbreitung SBI1I2R 协作治理网络暴力:两阶段、多阶段、多设想、四党演进游戏和SBI1I2R 公共舆论传播 2506.19704v1

Authors (4): Xiaoting Yang, Wei Lv, Ting Yang, Bart Baesens

Cyber violence severely disrupts public order in both cyberspace and the real world. Existing studies have gradually advocated collaborative governance but rely on macro-level theoretical analyses. This study integrates micro- and macro-level perspectives to propose a two-phase, multi-scenario governance mechanism for cyber violence. In the first phase, a multi-scenario evolutionary game model with four parties involved in cyber violence was developed based on evolutionary game theory. MATLAB simulations show that under strong government regulation, moderate levels of punishment implemented by the government against the online media that adopt misguidance strategies can achieve the most desirable stable state. In the second phase, the role of bystanders was introduced by integrating communication dynamics theory, and emotional factors were considered alongside game strategies. This led to the development of a new SBI1I2R model for public opinion dissemination in cyber violence. NetLogo simulations found that increasing the “correct guidance” strategy by the online media reduces the influence of cyber violence supporters and the time it takes for their nodes to drop to zero, but does not significantly shorten the time for the peak to occur. Comparatively, collaborative intervention between the online media and the government was most effective in curbing public opinion, followed by the government’s independent “strong regulation.” Relying solely on the online media’s “correct guidance” produced the weakest effect. Finally, this mechanism was applied to a case study, and a multi-stage, multi-scenario analysis based on life cycle theory enhanced its practical applicability.

网络暴力在网络空间和现实世界中都严重扰乱了公共秩序。 现有研究已逐渐倡导合作治理,但依靠宏观层面的理论分析。 这项研究将微观和宏观层面的观点结合起来,为网络暴力提出一个两阶段、多视角的多重治理机制。 在第一阶段,根据进化游戏理论,与参与网络暴力的四方开发了一个多设想进化的游戏模式。 Matlab模拟显示,根据强有力的政府监管,政府对采用错误指导战略的在线媒体实施适度惩罚可以达到最理想的稳定状态。 在第二阶段,通过整合通信动态理论,引入了旁观者的作用,并且将情感因素与游戏战略放在一起考虑。 这导致开发了一个新的SBI1I2R模式,用于在网络暴力中传播公众舆论。 Netlogo模拟发现,通过在线媒体增加“正确指导”战略,降低了网络暴力支持者的影响力,降低了其节点下降到零的时间,但并没有大大缩短达到高峰。 比较而言,在线媒体和政府之间的协作性干预是“ 最稳定的理论最终运用了“ ” 一种基于最强的理论的多重分析案例, 。“ ” 一种基于最高级的理论, 以强化的理论 。


Article 18

Title@2025-06-24 (2): Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies

Title: Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies Intelligente Verkehrssignale: Vergleich von MARL- und Fixed-Time-Strategien 智能交通信号信号:MARL和固定时战略的比较 2505.14544v2

Authors (1): Saahil Mahato

Urban traffic congestion, particularly at intersections, significantly impacts travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to manage dynamic traffic patterns effectively. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. Utilizing Pygame, a simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented, in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach demonstrated statistically significant improvements, including reduced average waiting times and improved throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise for improving urban traffic management efficiency. More research is recommended to address scalability and real-world implementation challenges.

传统的固定时间信号控制系统往往缺乏有效管理动态交通模式的适应性。本研究探索如何应用多试剂强化学习(MARL),以优化模拟环境中多个交叉点的交通信号协调。利用Pygame模拟,模拟了与随机产生的车辆流动的互联交叉点网络,以反映现实的交通变化。实施了分散式MARL控制器,每个通信信号都作为自主代理器运作,根据当地观察和邻近物剂的信息作出决定。利用车辆平均等待时间和总体吞吐量等衡量标准,对照基线固定时间控制器对绩效进行了评估。MARL方法在统计上显示出显著的改进,包括平均等待时间的减少和吞吐量的改善。这些结果表明,基于MARL的动态控制战略对于提高城市交通管理效率有着巨大的希望。建议开展更多的研究,以解决可扩展性和现实世界执行方面的挑战。


Article 19

Title@2025-06-24 (2): MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications

Title: MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications MATE:为无障碍应用提供LLM 授权多机构翻译环境 2506.19502v1

Authors (3): Aleksandr Algazinov, Matt Laing, Paul Laban

Accessibility remains a critical concern in today’s society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance for users in need due to the lack of customization stemming from closed-source designs. Consequently, individuals with disabilities frequently encounter significant barriers when attempting to interact with digital environments. We introduce MATE, a multimodal accessibility MAS, which performs the modality conversions based on the user’s needs. The system is useful for assisting people with disabilities by ensuring that data will be converted to an understandable format. For instance, if the user cannot see well and receives an image, the system converts this image to its audio description. MATE can be applied to a wide range of domains, industries, and areas, such as healthcare, and can become a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calling to using custom machine learning (ML) classifiers. This flexibility ensures that the system can be adapted to various needs and is compatible with a wide variety of hardware. Since the system is expected to run locally, it ensures the privacy and security of sensitive information. In addition, the framework can be effectively integrated with institutional technologies (e.g., digital healthcare service) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model that is capable of extracting the precise modality conversion task from the user input. Numerous experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.

现有多试剂系统(MAS)往往无法向需要的用户提供全面援助,因为封闭源码设计缺乏定制,因此残疾人在试图与数字环境互动时经常遇到重大障碍。我们引入了MATE,一个基于用户需要进行模式转换的多式无障碍MAS。这个系统有助于帮助残疾人,确保数据转换成易理解的格式。例如,如果用户不能看好并接收图像,该系统将这种图像转换成其音频描述。MATE可以应用于广泛的领域、行业和领域,例如医疗保健,并可以成为各类用户群体的一个有用助手。我们引入了多种模式,从LAM API到使用自定义机器学习(MLM)分类器。这种灵活性确保了系统能够适应各种需求,并且与广泛的硬件兼容。由于该系统要在当地运行,它能确保用户的隐私和安全性将MORSK转换成其数据格式。此外,我们也可以在系统内部的服务器服务器服务器服务器上,我们可以有效地使用实时服务器。


Article 20

Title@2025-06-24 (2): Agent-Based Triangle Counting: Unlocking Truss Decomposition, Triangle Centrality, and Local Clustering Coefficient

Title: Agent-Based Triangle Counting: Unlocking Truss Decomposition, Triangle Centrality, and Local Clustering Coefficient Agent-Based Triangle Counting: Entsperren Truss Zersetzung, Dreieck Zentralität und lokale Clustering Koeffizient 基于代理的三角计数:解锁Truss分解、三角中心以及地方集束 2402.03653v2

Authors (3): Prabhat Kumar Chand, Apurba Das, Anisur Rahaman Molla

Triangle counting in a graph is a fundamental problem with wide-ranging applications. It is crucial for understanding graph structure and serves as a basis for more advanced graph analytics. One key application is truss decomposition, a technique for identifying maximal, highly interconnected subgraphs, revealing structural cohesion and tight-knit communities in complex graphs. This facilitates analysis of relationships and information flow in fields such as social networks, biology, and recommendation systems. Using mobile agents or robots for tasks like truss decomposition and clustering coefficient computation is especially advantageous in decentralised environments with limited or unreliable communication. In such scenarios, agents can perform local computations without requiring an extensive communication infrastructure. This is valuable in contexts like disaster response, urban management, and military operations, where broadcast communication is impractical. In this paper, we address the triangle counting problem in an arbitrary anonymous graph using mobile agents. This method is extended as a subroutine to solve the truss decomposition problem and compute triangle centrality and the local clustering coefficient for each node. Our approach uses $n$ autonomous mobile agents, each starting at a different node of an $n$-node graph. These agents coordinate to collaboratively solve triangle enumeration, then truss decomposition, triangle centrality, and clustering coefficient. We assume a synchronous system where agents execute tasks concurrently, allowing time to be measured in rounds. The graph is anonymous (nodes have no IDs), but agents have distinct IDs and limited memory. Agents can perform local computations and communicate only when co-located. Our goal is to design algorithms that minimise both time and memory per agent, while enabling solutions to the above problems.

图表中三角形的计数是一个包含广泛应用的根本性问题。 它对于理解图形结构至关重要, 并且是更先进的图形分析的基础。 一个关键应用是 truss 分解, 这是一种在复杂的图表中识别最大、 高度相互关联的子图层的技术, 揭示结构的凝聚力和紧密的连接社区。 这有助于分析社交网络、 生物学 和建议系统 等领域的关系和信息流动。 使用移动剂或机器人处理 trus 分解和组合系数计算等任务, 在匿名通信有限或不可靠的分散环境中特别有利。 在这样的情况下, 代理可以进行本地计算, 而不需要广泛的通信基础设施。 在灾难反应、 城市管理和军事行动等情况下, 这是一种有价值的技术, 在广播通信不切实际的情况下, 我们用一个任意的匿名图表来处理三角点计数问题。 这种方法扩大为子路程, 解决tuss decomposition 问题, 并且为每个节点计算三角点, 我们的方法使用美元自主的移动剂, 每一个在不同的节点开始一个节点上, 实现一个不同的节流的递解的递解的递解的计算, 度, 解的递解的递解的递解的递解的递解的代数 度, 度, 度, 也就是的轴 的轴 的计算是 的轴 的轴 的轴 的轴 的轴 的轴 , 的轴 的轴 的计算, 开始一个调的递合的递合的递合的递合的递合的递合的递合的递合的递合的递合的递合的递合的调制的轴 。
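For readers unfamiliar with the quantities named in the abstract above, the following Python sketch computes them centrally on a small graph. It is a plain reference computation, not the paper's mobile-agent algorithm: it assumes the whole graph is available as an adjacency dictionary with integer node labels, which the anonymous-graph setting explicitly does not provide. The function names are illustrative.

```python
from itertools import combinations

def triangles_per_node(adj):
    """adj maps each node to the set of its neighbours (undirected, simple graph).
    Returns the number of triangles each node participates in."""
    tri = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        # A triangle at v is a pair of neighbours of v that are adjacent to each other.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                tri[v] += 1
    return tri

def local_clustering(adj):
    """Local clustering coefficient: triangles at v divided by the number
    of neighbour pairs deg(v) * (deg(v) - 1) / 2; defined as 0 when deg(v) < 2."""
    tri = triangles_per_node(adj)
    cc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        cc[v] = 0.0 if d < 2 else 2.0 * tri[v] / (d * (d - 1))
    return cc

def edge_support(adj):
    """Number of triangles containing each edge (u, v) with u < v.
    Truss decomposition repeatedly peels edges with low support."""
    return {
        (u, v): len(adj[u] & adj[v])
        for u in adj for v in adj[u] if u < v
    }

if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(triangles_per_node(adj))
    print(local_clustering(adj))
    print(edge_support(adj))
```

Summing the per-node counts and dividing by three gives the total number of triangles, since every triangle is counted once at each of its three corners.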


Article 21

Title@2025-06-24 (2): Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

Title: Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Verstärkung Learning 重力-引力引导焦点集中影响多机构强化学习机制中心 2506.19417v1

Authors (3): Yisak Park, Sunwoo Lee, Seungyul Han

Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) state dimensions, inspired by Clausewitz’s military theory. FIM consists of three core components: (1) identifying CoG state dimensions based on their stability under agent behavior, (2) designing counterfactual intrinsic rewards to promote meaningful influence on these dimensions, and (3) encouraging persistent and synchronized focus through eligibility-trace-based credit accumulation. These mechanisms enable agents to induce more targeted and effective state transitions, facilitating robust cooperation even in extremely sparse reward settings. Empirical evaluations across diverse MARL benchmarks demonstrate that the proposed FIM significantly improves cooperative performance compared to baselines.

在这项工作中,我们提议 “ 聚焦影响机制 “ (FIM),这是一个新的框架,通过将代理人的影响引向任务关键要素,加强合作,称为重力中心(CoG)国家层面,受克劳德维茨军事理论的启发。 “ 重力中心 “ 由三个核心部分组成:(1) 根据其代理人行为下的稳定性,确定COG国家层面,(2) 设计反事实内在收益,以促进对这些层面的有意义的影响,(3) 通过基于资格的追踪信用积累,鼓励持续和同步地关注重点,使代理人能够促成更有针对性和更有效的国家过渡,促进即使在极为稀少的奖励环境中的有力合作。 “ 重力中心 “ 各种基准的实证评估表明,拟议的FIM与基线相比,合作业绩显著改善。


Article 22

Title@2025-06-24 (2): ChatModel: Automating Reference Model Design and Verification with LLMs

Title: ChatModel: Automating Reference Model Design and Verification with LLMs ChatModel: Automatisieren von Referenzmodell-Design und Überprüfung mit LLMs 聊天模式:使用LLMs自动使用参考模型设计和核查 2506.15066v2

Authors (6): Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang

As the complexity of integrated circuit designs continues to escalate, functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification process, are themselves becoming more intricate and time-consuming to develop. Despite the promise shown by large language models (LLMs) in code programming, effectively generating complex reference models remains a significant hurdle. To address these challenges, we introduce ChatModel, the first LLM-aided agile reference model generation and verification platform. ChatModel streamlines the transition from design specifications to fully functional reference models by integrating design standardization and hierarchical agile modeling. Employing a building-block generation strategy, it not only enhances the design capabilities of LLMs for reference models but also significantly boosts verification efficiency. We evaluated ChatModel on 300 designs of varying complexity, demonstrating substantial improvements in both efficiency and quality of reference model generation. ChatModel achieved a peak performance improvement of 55.02% compared to alternative methods, with notable enhancements in generation stability, and delivered a 9.18x increase in its capacity to produce reference model designs. Furthermore, it accelerated the iterative process of reference model design and validation by an average of 5.90x compared to traditional approaches. These results highlight the potential of ChatModel to significantly advance the automation of reference model generation and validation.

由于集成电路设计的复杂性继续升级,功能性核查变得日益具有挑战性。对于加速核查进程至关重要的参考模型本身正在变得更加复杂和耗时地开发。尽管大型语言模型(LLMs)在代码编程中显示了希望,但有效生成复杂的参考模型仍是一个重大障碍。为了应对这些挑战,我们引入了ChatModel,即第一个由LLM协助的LLM型快速参考模型生成和核查平台。ChatModel通过整合设计标准化和等级灵活建模,简化了从设计规格向完全功能性能参考模型的过渡。采用建筑区块生成战略,不仅提高了LLMs用于参考模型的设计能力,而且还大大提高了核查效率。我们评估了300种不同复杂设计的ChatModel,表明在创建参考模型的效率和质量方面都有很大改进。ChatModel实现了与替代方法相比最高性能改进55.02%,并显著加强了生产参考模型设计的能力。此外,它加速了参考模型设计和验证的迭接过程,比传统模型的参照率平均提高了5.90x。这些结果突出表明了Chadel的自动化。


Article 23

Title@2025-06-24 (2): Computing Tree Structures in Anonymous Graphs via Mobile Agents

Title: Computing Tree Structures in Anonymous Graphs via Mobile Agents Berechnung von Baumstrukturen in anonymen Graphen über Mobile Agents 通过移动代理器在匿名图纸中的电子树结构 2506.19365v1

Authors (3): Prabhat Kumar Chand, Manish Kumar, Anisur Rahaman Molla

Minimum Spanning Tree (MST) and Breadth-First Search (BFS) tree constructions are classical problems in distributed computing, traditionally studied in the message-passing model, where static nodes communicate via messages. This paper investigates MST and BFS tree construction in an agent-based network, where mobile agents explore a graph and compute. Each node hosts one agent, and communication occurs when agents meet at a node. We consider $n$ agents initially dispersed (one per node) in an anonymous, arbitrary $n$-node, $m$-edge graph $G$. The goal is to construct the BFS and MST trees from this configuration such that each tree edge is known to at least one of its endpoints, while minimizing time and memory per agent. We work in a synchronous model and assume agents have no prior knowledge of any graph parameters such as $n$, $m$, $D$, $\Delta$ (graph diameter and maximum degree). Prior work solves BFS in $O(D\Delta)$ rounds with $O(\log n)$ bits per agent, assuming the root is known. We give a deterministic algorithm that constructs the BFS tree in $O(\min(D\Delta, m\log n) + n\log n + \Delta \log^2 n)$ rounds using $O(\log n)$ bits per agent without root knowledge. To determine the root, we solve leader election and MST construction. We elect a leader and construct the MST in $O(n\log n + \Delta \log^2 n)$ rounds, with $O(\log n)$ bits per agent. Prior MST algorithms require $O(m + n\log n)$ rounds and $\max(\Delta, \log n) \log n$ bits. Our results significantly improve memory efficiency and time, achieving nearly linear-time leader election and MST. Agents are assumed to know $\lambda$, the maximum identifier, bounded by a polynomial in $n$.

最小生成树(MST)和广度优先搜索(BFS)树的构造是分布式计算中的经典问题,传统上在消息传递模型中研究,其中静态节点通过消息进行通信。本文研究基于智能体的网络中的 MST 和 BFS 树构造,其中移动智能体在图上探索并计算。每个节点驻留一个智能体,智能体在同一节点相遇时才能通信。我们考虑 $n$ 个智能体最初分散(每个节点一个)在一个匿名的、任意的 $n$ 节点、$m$ 条边的图 $G$ 中。目标是从该初始配置出发构造 BFS 树和 MST,使每条树边至少被其一个端点所知,同时最小化时间和每个智能体的内存。我们采用同步模型,并假设智能体事先不知道任何图参数,例如 $n$、$m$、$D$、$\Delta$(图的直径和最大度)。已有工作在根已知的假设下,以 $O(D\Delta)$ 轮和每个智能体 $O(\log n)$ 比特求解 BFS。我们给出一个确定性算法,在不知道根的情况下,以 $O(\min(D\Delta, m\log n) + n\log n + \Delta \log^2 n)$ 轮和每个智能体 $O(\log n)$ 比特构造 BFS 树。为确定根,我们求解领导者选举和 MST 构造:在 $O(n\log n + \Delta \log^2 n)$ 轮内选出领导者并构造 MST,每个智能体使用 $O(\log n)$ 比特。已有的 MST 算法需要 $O(m + n\log n)$ 轮和 $\max(\Delta, \log n) \log n$ 比特。我们的结果显著改进了内存效率和时间,实现了近线性时间的领导者选举和 MST 构造。假设智能体知道最大标识符 $\lambda$,其上界为 $n$ 的多项式。
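
As a point of reference for the output the abstract describes, the following minimal sketch builds a BFS tree centrally on a small graph and records each tree edge at its child endpoint, so every tree edge is known to at least one of its endpoints. It is an ordinary queue-based BFS, not the paper's mobile-agent algorithm; the graph, the root, and the name `bfs_tree` are illustrative assumptions.

```python
from collections import deque

def bfs_tree(adj, root):
    """Return {node: parent} for a BFS tree of `adj` rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:      # first visit fixes the BFS parent
                parent[v] = u
                queue.append(v)
    return parent

# Hypothetical 5-node graph given as adjacency lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
parent = bfs_tree(adj, root=0)

# Each non-root node stores one tree edge (its parent pointer).
tree_edges = sorted((p, c) for c, p in parent.items() if p is not None)
print(tree_edges)
```

A connected graph with $n$ nodes yields exactly $n-1$ such parent pointers, which is why storing one pointer per node is enough to represent the tree.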


Article 24

Title@2025-06-24 (2): PBFT-Backed Semantic Voting for Multi-Agent Memory Pruning

Title: PBFT-Backed Semantic Voting for Multi-Agent Memory Pruning PBFT-unterstützte semantische Abstimmung für Multi-Agent Memory Pruning PBFT 多重机构内存缓冲后退的语义投票 2506.17338v2

Authors (1): Duong Bach

The proliferation of multi-agent systems (MAS) in complex, dynamic environments necessitates robust and efficient mechanisms for managing shared knowledge. A critical challenge is ensuring that distributed memories remain synchronized, relevant, and free from the accumulation of outdated or inconsequential data - a process analogous to biological forgetting. This paper introduces the Co-Forgetting Protocol, a novel, comprehensive framework designed to address this challenge by enabling synchronized memory pruning in MAS. The protocol integrates three key components: (1) context-aware semantic voting, where agents utilize a lightweight DistilBERT model to assess the relevance of memory items based on their content and the current operational context; (2) multi-scale temporal decay functions, which assign diminishing importance to memories based on their age and access frequency across different time horizons; and (3) a Practical Byzantine Fault Tolerance (PBFT)-based consensus mechanism, ensuring that decisions to retain or discard memory items are agreed upon by a qualified and fault-tolerant majority of agents, even in the presence of up to $f$ Byzantine (malicious or faulty) agents in a system of $N \geq 3f+1$ agents. The protocol leverages gRPC for efficient inter-agent communication and Pinecone for scalable vector embedding storage and similarity search, with SQLite managing metadata. Experimental evaluations in a simulated MAS environment with four agents demonstrate the protocol’s efficacy, achieving a 52% reduction in memory footprint over 500 epochs, 88% voting accuracy in forgetting decisions against human-annotated benchmarks, a 92% PBFT consensus success rate under simulated Byzantine conditions, and an 82% cache hit rate for memory access.

多试剂系统(MAS)在复杂、动态环境中的扩散,使得管理共享知识的强大和高效机制成为了管理共享知识的强大和高效机制。一个关键的挑战是如何确保分布式记忆保持同步、相关且不受过时或无关紧要的数据积累的影响,这是一个类似于生物遗忘的过程。本文件介绍了《共同制定议定书》,这是一个新颖的、全面的框架,旨在通过在MAS中同步存储存储存储来应对这一挑战。协议包含三个关键组成部分:(1) 符合环境需要的语义表决,即代理使用一种轻巧的DistilBERT基准来评估根据其内容和当前操作环境的记忆项目的相关性;(2) 多尺度的时间衰减功能,根据时间跨不同时间跨度的存取频率,赋予记忆越来越不重要的重要性;(3) 实用的Byzantine Fault容忍(PBFFFFFT)基于共识的机制,确保保留或丢弃存储记忆物品的决定得到合格和不宽容的多数物剂的同意,即使存在比赞丁基准(错误或错误)基准,评估记忆项目的相关性;(2) 高级时间缩缩缩缩缩缩(PRPB) 和类似存储剂在4的存储器中,实现搜索速度(RPL) 速度的智能的流流流流流流中,实现。
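
The fault-tolerance condition in the abstract can be made concrete with a little arithmetic. The sketch below computes the largest tolerable number of Byzantine agents for a given system size and the usual PBFT quorum of $2f+1$ matching votes, then applies that quorum to a forget/retain vote. It covers only the counting rule, not the multi-phase PBFT message exchange, and the function names and vote labels are hypothetical.

```python
def max_faulty(n_agents):
    """Largest f that satisfies n_agents >= 3f + 1."""
    return (n_agents - 1) // 3

def quorum(n_agents):
    """Number of matching votes needed: 2f + 1."""
    return 2 * max_faulty(n_agents) + 1

def decide(votes):
    """Return 'forget' or 'retain' if a quorum agrees, otherwise None."""
    need = quorum(len(votes))
    for choice in ("forget", "retain"):
        if sum(v == choice for v in votes) >= need:
            return choice
    return None  # no quorum: the memory item is left untouched

# Four agents, as in the abstract's experiments: f = 1, quorum = 3.
print(max_faulty(4), quorum(4))
print(decide(["forget", "forget", "forget", "retain"]))
print(decide(["forget", "forget", "retain", "retain"]))
```

With four agents a single faulty vote cannot block or flip a decision, but an even split produces no decision at all.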


Article 25

Title@2025-06-23 (1): Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field

Title: Low-Cost Infrastructure-Free 3D Relative Localization with Sub-Meter Accuracy in Near Field Low-Cost-Infrastruktur-freie 3D-relative Lokalisierung mit Sub-Meter-Genauigkeit im Nahfeld 低成本基础设施-无3D 相对本地化,近地有亚彼得精密度 2506.19199v1

Authors (4): Qiangsheng Gao, Ka Ho Cheng, Li Qiu, Zijun Gong

Relative localization in the near-field scenario is critically important for unmanned vehicle (UxV) applications. Although 2D relative localization has been widely studied for unmanned ground vehicles (UGVs), the 3D problem for unmanned aerial vehicles (UAVs) involves more uncertainties and remains to be investigated. Inspired by the phenomenon that animals can achieve swarm behaviors solely based on individual perception of relative information, this study proposes an infrastructure-free 3D relative localization framework that relies exclusively on onboard ultra-wideband (UWB) sensors. Building on 2D relative positioning research, we conducted feasibility analysis, system modeling, simulations, performance evaluation, and field tests using UWB sensors. The key contributions of this work include: derivation of the Cramér-Rao lower bound (CRLB) and geometric dilution of precision (GDOP) for near-field scenarios; development of two localization algorithms, one based on the Euclidean distance matrix (EDM) and another employing maximum likelihood estimation (MLE); comprehensive performance comparison and computational complexity analysis against state-of-the-art methods; simulation studies and field experiments; and a novel sensor deployment strategy inspired by animal behavior, enabling single-sensor implementation within the proposed framework for UxV applications. The theoretical, simulation, and experimental results demonstrate strong generalizability to other 3D near-field localization tasks, with significant potential for a cost-effective cross-platform UxV collaborative system.

近场场景下的相对定位对无人载具(UxV)应用至关重要。尽管针对无人地面车辆(UGV)的二维相对定位问题已有广泛研究,但无人机(UAV)的三维场景涉及更多不确定性,仍有待研究。受动物仅凭个体对相对信息的感知即可实现群体行为这一现象的启发,本研究提出了一个完全依赖机载超宽带(UWB)传感器的无基础设施三维相对定位框架。在二维相对定位研究的基础上,我们使用 UWB 传感器进行了可行性分析、系统建模、仿真、性能评估和现场测试。本工作的主要贡献包括:推导近场场景下的 Cramér-Rao 下界(CRLB)和几何精度因子(GDOP);开发两种定位算法,一种基于欧氏距离矩阵(EDM),另一种采用最大似然估计(MLE);与最先进方法进行全面的性能比较和计算复杂度分析;仿真研究和现场实验;以及一种受动物行为启发的新型传感器部署策略,使所提框架能够在 UxV 应用中以单传感器实现。理论、仿真和实验结果表明,该方法对其他三维近场定位任务具有很强的泛化能力,并在构建低成本的跨平台 UxV 协作系统方面具有巨大潜力。


Article 26

Title@2025-06-23 (1): Experimental Setup and Software Pipeline to Evaluate Optimization based Autonomous Multi-Robot Search Algorithms

Title: Experimental Setup and Software Pipeline to Evaluate Optimization based Autonomous Multi-Robot Search Algorithms Experimentelle Einrichtung und Software-Pipeline zur Bewertung von Optimierungs-basierten autonomen Multi-Roboter-Suche Algorithmen 实验设置和软件管道以评价基于优化的自动多机器人搜索算法 2506.16710v2

Authors (5): Aditya Bhatt, Mary Katherine Corra, Franklin Merlo, Prajit KrisshnaKumar, Souma Chowdhury

Signal source localization has been a problem of interest in the multi-robot systems domain given its applications in search & rescue and hazard localization in various industrial and outdoor settings. A variety of multi-robot search algorithms exist that usually formulate and solve the associated autonomous motion planning problem as a heuristic model-free or belief model-based optimization process. Most of these algorithms, however, remain tested only in simulation, thereby losing the opportunity to generate knowledge about how such algorithms would compare in a real physical setting in terms of search performance and real-time computing performance. To address this gap, this paper presents a new lab-scale physical setup and associated open-source software pipeline to evaluate and benchmark multi-robot search algorithms. The presented physical setup innovatively uses an acoustic source (that is safe and inexpensive) and small ground robots (e-pucks) operating in a standard motion-capture environment. This setup can be easily recreated and used by most robotics researchers. The acoustic source also presents interesting uncertainty in terms of its noise-to-signal ratio, which is useful to assess sim-to-real gaps. The overall software pipeline is designed to readily interface with any multi-robot search algorithm with minimal effort and is executable in parallel asynchronous form. This pipeline includes a framework for distributed implementation of multi-robot or swarm search algorithms, integrated with a ROS (Robot Operating System)-based software stack for motion-capture-supported localization. The utility of this novel setup is demonstrated by using it to evaluate two state-of-the-art multi-robot search algorithms, based on swarm optimization and batch-Bayesian optimization (called Bayes-Swarm), as well as a random walk baseline.

信号源本地化一直是多机器人系统域中一个令人感兴趣的问题,因为它在各种工业和室外环境中应用了搜索和救援以及危险本地化等应用程序,因此对多机器人系统域产生了兴趣。 存在多种多机器人搜索算法,这些算法通常会将相关的自主动作规划问题作为无超自然模型或基于信仰的模型优化程序来制定和解决。 这些算法大多只是模拟测试,从而失去了在搜索性能和实时计算性能方面如何在真实的战时物理环境中比较/调控的知识。 为解决这一差距,本文展示了一个新的实验室级物理设置和相关的开源的开源软件管道管道以评价和基准多机器人搜索算法。 所展示的物理设置创新使用一个声学源(这是安全和廉价的)和小型地面机器人(e-pucks)在一个标准的运动封套件环境中运作。 这种设置很容易被重新创造出来,并被多数机器人研究人员用作支持。 声学源源也展示了在两个基于噪音到信号的轨道运行运行率比率方面的令人感兴趣的不确定性, 用于对整个管道界面进行快速的搜索。


Article 27

Title@2025-06-23 (1): Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes

Title: Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes Agentische Informationstheorie: Ergodikität und Intrinsische Semantik von Informationsprozessen 代理信息理论:信息过程的分化和内在的语义 2505.19275v2

Authors (2): James P. Crutchfield, Alexandra Jurgens

We develop information theory for the temporal behavior of memoryful agents moving through complex – structured, stochastic – environments. We introduce information processes – stochastic processes produced by cognitive agents in real-time as they interact with and interpret incoming stimuli. We provide basic results on the ergodicity and semantics of the resulting time series of Shannon information measures that monitor an agent’s adapting view of uncertainty and structural correlation in its environment.

我们为在复杂的 – – 结构化的、随机的 – – 环境中移动的记忆性物剂的时间行为发展信息理论。我们引入了信息过程 – – 认知性物剂在与进取刺激进行互动和解释时实时生成的随机过程。我们提供了由此产生的香农信息措施的时间序列的灵敏性和语义学基本结果,监测一个物剂对其环境中的不确定性和结构相关性的适应观点。


Article 28

Title@2025-06-23 (1): Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments

Title: Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Sequential Auctions under Unknown Environments Online-Lernen für dynamischen Vickrey-Clarke-Groves-Mechanismus in sequenziellen Auktionen unter unbekannten Umgebungen 在未知环境中有顺序拍卖的动态Vickrey-Clark-Groves机制在线学习 2506.19038v1

Authors (2): Vincent Leon, S. Rasoul Etesami

We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders’ values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP), where the transition kernel and reward functions are unknown to the seller. In each round, the seller determines an allocation and a payment for each bidder. Each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller’s allocation policy. Unlike existing works that formulate the problem as a multi-armed bandit model or as an episodic MDP, where the environment resets to an initial state after each round or episode, our paper considers a more realistic and sophisticated setting in which the market continues to evolve without restarting. We first extend the Vickrey-Clarke-Groves (VCG) mechanism, which is known to be efficient, truthful, and individually rational for one-shot static auctions, to sequential auctions, thereby obtaining a dynamic VCG mechanism counterpart that preserves these desired properties. We then focus on the online setting and develop an online reinforcement learning algorithm for the seller to learn the underlying MDP model and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned online mechanism asymptotically converges to a dynamic mechanism that approximately satisfies efficiency, truthfulness, and individual rationality with arbitrarily high probability and achieves guaranteed performance in terms of various notions of regret.

我们考虑了在未知环境中进行连续拍卖的在线动态机制设计问题,在这种环境中,基本市场和因此投标人的价值随时间变化而随着卖方和投标人之间的互动进展而变化。我们把连续拍卖模拟为无穷无尽的平均回报马尔科夫裁决程序(MDP),卖方不知道过渡核心和奖励功能。在每一回合中,卖方决定每个投标人的分配和付款。每个投标人都得到私人奖励,并向卖方提交密封出价。代表基础市场的理性市场,根据未知的过渡内核和卖方的分配政策演变。与将问题发展成多武装盗匪模式或分明型MDP(MDP)的现有工作不同,在每次回合或事件后,环境转至初始状态的过渡核心和奖励功能。在每回合中,卖方决定为每个投标人确定一个更加现实和复杂的环境,让Vickrey-Clark-Groves(VCG)机制(Vickrey-Clark-GGG)发展,这是众所周知的高效、真实和单独理性的过渡政策。与现有直观的准确的准确性工作模式不同。我们从一个高清晰的准确的准确的准确性规则,从而学习了动态的动态的动态的虚拟拍卖机制。我们学习了动态的动态的系统。
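
For orientation, the one-shot static case that the paper extends can be written in a few lines. The sketch below applies the VCG rule to a single-item sealed-bid auction: the item goes to the highest bidder, who pays the externality imposed on the others, which here reduces to the second-highest bid. It is not the dynamic or online mechanism from the paper; the bidder names, values, and the function `vcg_single_item` are illustrative assumptions.

```python
def vcg_single_item(bids):
    """bids: {bidder: reported value}. Returns (winner, payment)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    # Welfare the other bidders would obtain if the winner were absent:
    # the best remaining bidder would receive the item.
    welfare_without_winner = bids[ranked[1]] if len(ranked) > 1 else 0.0
    # Welfare the other bidders obtain when the winner is present: none.
    welfare_with_winner = 0.0
    payment = welfare_without_winner - welfare_with_winner
    return winner, payment

bids = {"a": 10.0, "b": 7.0, "c": 4.0}  # hypothetical sealed bids
winner, payment = vcg_single_item(bids)
print(winner, payment)
```

Because the payment does not depend on the winner's own bid, reporting the true value is a dominant strategy in this static setting, which is the truthfulness property the abstract refers to.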


Article 29

Title@2025-06-23 (1): TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation

Title: TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation TRIZ-Agenten: Multi-Agenten-LLM-Ansatz für TRIZ-basierte Innovationen TRIZ 代理物:以TRIZ为基础的创新 2506.18783v1

Authors (2): Kamil Szczepanik, Jarosław A. Chudziak

TRIZ, the Theory of Inventive Problem Solving, is a structured, knowledge-based framework for innovation and abstracting problems to find inventive solutions. However, its application is often limited by the complexity and deep interdisciplinary knowledge required. Advancements in Large Language Models (LLMs) have revealed new possibilities for automating parts of this process. While previous studies have explored single LLMs in TRIZ applications, this paper introduces a multi-agent approach. We propose an LLM-based multi-agent system, called TRIZ agents, each with specialized capabilities and tool access, collaboratively solving inventive problems based on the TRIZ methodology. This multi-agent system leverages agents with various domain expertise to efficiently navigate TRIZ steps. The aim is to model and simulate an inventive process with language agents. We assess the effectiveness of this team of agents in addressing complex innovation challenges based on a selected case study in engineering. We demonstrate the potential of agent collaboration to produce diverse, inventive solutions. This research contributes to the future of AI-driven innovation, showcasing the advantages of decentralized problem-solving in complex ideation tasks.

发明解决问题理论TRIZ是创新和抽象问题的结构性知识框架,旨在寻找创新的解决办法,然而,其应用往往受到复杂和深层次跨学科知识的限制。大语言模型的进步揭示了这一进程部分自动化的新可能性。虽然以前的研究探索了TRIZ应用中的单一LLMs,但本文采用了一种多试剂方法。我们提议了一个基于LLM的多试剂系统,称为TRIZ代理,每个代理都具有专门能力和工具,根据TRIZ方法合作解决创新问题。这个多试剂系统利用具有不同领域专门知识的代理器来有效导航TRIZ步骤。目的是模拟和模拟与语言代理器一起的发明过程。我们评估了这一代理器团队在应对复杂的创新挑战方面的效力。我们展示了代理器协作产生多样化、创新解决方案的潜力。这一研究有助于AI驱动创新的未来,展示了在复杂思维任务中分散解决问题的好处。


Article 30

Title@2025-06-23 (1): Reply to “Emergent LLM behaviors are observationally equivalent to data leakage”

Title: Reply to “Emergent LLM behaviors are observationally equivalent to data leakage” Antwort auf “Emergente LLM-Verhalten sind Beobachtungsäquivalent zu Daten Leckage” 对“紧急LLM行为”的答复在观测上等同于数据泄漏” 2506.18600v1

Authors (3): Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli

A potential concern when simulating populations of large language models (LLMs) is data contamination, i.e. the possibility that training data may shape outcomes in unintended ways. While this concern is important and may hinder certain experiments with multi-agent models, it does not preclude the study of genuinely emergent dynamics in LLM populations. The recent critique by Barrie and Törnberg [1] of the results of Flint Ashery et al. [2] offers an opportunity to clarify that self-organisation and model-dependent emergent dynamics can be studied in LLM populations, highlighting how such dynamics have been empirically observed in the specific case of social conventions.

当模拟大型语言模型(LLMs)的人口是数据污染时,即培训数据有可能以无意的方式影响结果,一个潜在的关注问题,即培训数据有可能以意外的方式影响结果。虽然这种关注很重要,并可能阻碍多试剂模型的某些实验,但不排除研究LLM人群中真正的新兴动态。Barrie和T'ornberg最近对Flint Ashery等人等人的结果[2]的评议提供了一个机会,以澄清可在LLM人群中研究自我组织和依赖模型的新兴动态,强调在社会公约的具体案例中如何以经验方式观察到这种动态。


Article 31

Title@2025-06-23 (1): Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning

Title: Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning Transformer-Weltmodell für Proben Effizientes Mehr-Agenten-Verstärkungs-Lernen 取样效率高的多机构强化学习世界模式 2506.18537v1

Authors (3): Azad Deihim, Eduardo Alonso, Dimitra Apostolopoulou

We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model designed for multi-agent reinforcement learning in both vector- and image-based environments. MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module, enabling agents to model and anticipate the behavior of others under partial observability. To address non-stationarity, we incorporate a prioritized replay mechanism that trains the world model on recent experiences, allowing it to adapt to agents’ evolving policies. We evaluated MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot. MATWM achieves state-of-the-art performance, outperforming both model-free and prior world model approaches, while demonstrating strong sample efficiency, achieving near-optimal performance in as few as 50K environment interactions. Ablation studies confirm the impact of each component, with substantial gains in coordination-heavy tasks.

我们展示了多机构变换世界模型(MATWM),这是一个基于变压器的新颖的世界模型,目的是在矢量和图像环境中进行多剂强化学习。MATWM将分散的想象框架与半集中的批评家和团队化预测模块结合起来,使代理商能够模拟和预测他人的行为,在部分可观察性之下。为了解决非常态问题,我们引入了一个优先重播机制,根据最近的经验对世界模型进行培训,使其能够适应代理商不断演变的政策。我们根据一套广泛的基准对MATWM进行了评估,其中包括StarCraft多机构挑战、PettingZoo和MeltingPot。MATWM取得了最先进的业绩,超越了无模式和先前的世界模型方法,同时展示了强大的样本效率,近于50K环境互动的接近最佳业绩。对比研究证实了每个组成部分的影响,在协调重重任务中取得了重大收益。


Article 32

Title@2025-06-23 (1): Autocratic strategies in Cournot oligopoly game

Title: Autocratic strategies in Cournot oligopoly game Autokratische Strategien in Cournot Oligopol Spiel Cournot 寡头寡头寡头游戏中的专制策略 2506.16038v2

Authors (3): Masahiko Ueda, Shoma Yagi, Genki Ichinose

An oligopoly is a market in which the price of a good is controlled by a few firms. Cournot introduced the simplest game-theoretic model of oligopoly, where the profit-maximizing behavior of each firm results in market failure. Furthermore, when the Cournot oligopoly game is infinitely repeated, firms can tacitly collude to monopolize the market. Such tacit collusion is realized by the same mechanism as direct reciprocity in the repeated prisoner’s dilemma game, where mutual cooperation can be realized even though defection is favorable for both prisoners in the one-shot game. Recently, in the repeated prisoner’s dilemma game, a class of strategies called zero-determinant strategies has attracted much attention in the context of direct reciprocity. Zero-determinant strategies are autocratic strategies that unilaterally control the payoffs of players. There have been many attempts to find zero-determinant strategies in other games and to extend them so as to apply them to broader situations. In this paper, first, we show that zero-determinant strategies exist even in the repeated Cournot oligopoly game. In particular, we prove that an averagely unbeatable zero-determinant strategy exists, which is guaranteed to obtain the average payoff of the opponents. Second, we numerically show that the averagely unbeatable zero-determinant strategy can be used to promote collusion when it is used against one adaptively learning player, whereas it cannot promote collusion when it is used against two adaptively learning players. Our findings elucidate some negative impacts of zero-determinant strategies in oligopoly markets.

寡头垄断是一个由少数公司控制货物价格的市场。 库诺引入了最简单的寡头垄断游戏理论模式, 使每个公司利润最大化的行为导致市场失灵。 此外, 当库诺寡头垄断游戏无限重复时, 公司可以暗中勾结垄断市场。 这种暗中勾结与囚犯反复两难游戏的直接对等机制相同, 双方合作可以实现, 而叛逃在一局游戏中对两名囚犯都有利。 最近, 在反复的囚犯两难游戏中, 一种叫零分解策略的策略在直接对等的背景下引起极大关注。 零分解策略是专制策略, 单方面控制玩家的回报。 许多尝试在其他游戏中找到零分化策略, 并将这些策略推广到更广泛的局势中。 首先, 我们显示, 在反复的两难分解策略中, 重复的两难分解策略, 当反复的对手对正反换的策略使用时, , 零分定式策略是无法被使用。 尤其可以证明我们平时, 我们平时的平时, 平时的平时, 平时的策略是会显示我们平时, 一种平均的对平反的策略。
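
The tension between one-shot incentives and collusion can be seen in a small worked example. The sketch below assumes a textbook Cournot setting that the abstract does not specify: linear inverse demand $P = a - bQ$, a constant marginal cost $c$, and two symmetric firms. It compares the Nash quantity, the collusive (monopoly-splitting) quantity, and the best one-shot deviation from collusion; all parameter values and function names are illustrative.

```python
def profit(q_own, q_others, a, b, c):
    """Profit of one firm under inverse demand P = a - b * Q."""
    price = a - b * (q_own + q_others)
    return (price - c) * q_own

def nash_quantity(a, b, c, n):
    """Symmetric Cournot-Nash output per firm with n firms."""
    return (a - c) / (b * (n + 1))

def collusive_quantity(a, b, c, n):
    """Per-firm share of the monopoly output (a - c) / (2b)."""
    return (a - c) / (2 * b * n)

def best_response(q_others, a, b, c):
    """Profit-maximizing output against a fixed rival output."""
    return max(0.0, (a - c - b * q_others) / (2 * b))

a, b, c, n = 10.0, 1.0, 1.0, 2           # hypothetical market parameters
q_nash = nash_quantity(a, b, c, n)
q_col = collusive_quantity(a, b, c, n)
q_dev = best_response(q_col, a, b, c)    # deviate while the rival colludes

print(profit(q_nash, q_nash, a, b, c))   # Nash profit per firm
print(profit(q_col, q_col, a, b, c))     # collusive profit per firm
print(profit(q_dev, q_col, a, b, c))     # one-shot deviation profit
```

Under these assumptions the collusive profit exceeds the Nash profit, while the deviation profit exceeds both, mirroring the prisoner's dilemma structure that makes repeated-game strategies relevant.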


Article 33

Title@2025-06-23 (1): IDCAIS: Inter-Defender Collision-Aware Interception Strategy against Multiple Attackers

Title: IDCAIS: Inter-Defender Collision-Aware Interception Strategy against Multiple Attackers IDCAIS: Inter-Defender Collision-Aware Interception Strategy gegen mehrere Angreifer IDCAIS:针对多攻击者的防御人员碰撞-软件拦截战略 2112.12098v3

Authors (3): Vishnu S. Chipade, Xinyi Wang, Dimitra Panagou

In the prior literature on multi-agent area defense games, the assignments of the defenders to the attackers are done based on a cost metric associated only with the interception of the attackers. In contrast to that, this paper presents an Inter-Defender Collision-Aware Interception Strategy (IDCAIS) for defenders to intercept attackers in order to defend a protected area, such that the defender-to-attacker assignment protocol not only takes into account an interception-related cost but also takes into account any possible future collisions among the defenders on their optimal interception trajectories. In particular, in this paper, the defenders are assigned to intercept attackers using a mixed-integer quadratic program (MIQP) that: 1) minimizes the sum of times taken by defenders to capture the attackers under time-optimal control, as well as 2) helps eliminate or delay possible future collisions among the defenders on the optimal trajectories. To prevent inevitable collisions on optimal trajectories or collisions arising due to time-sub-optimal behavior by the attackers, a minimally augmented control using exponential control barrier function (ECBF) is also provided. Simulations show the efficacy of the approach.

在以前关于多试管区防御游戏的文献中,向攻击者分配维权者是依据仅与拦截攻击者有关的成本衡量标准进行的,与此相反,本文件介绍了一个维权者拦截攻击者以保卫保护区的跨Defender Collision-Aware拦截战略(IDCAIS),这样,维权者对攻击者的任务分配协议不仅考虑到与拦截有关的费用,而且考虑到今后维权者之间在其最佳拦截轨道上可能发生的任何碰撞,特别是,在本文中,维权者被指派使用混合干涉四边程序拦截攻击者,该方案:1) 尽量减少维权者在时间最佳控制下抓捕攻击者的时间总和,2) 帮助消除或推迟今后维权者之间在最佳轨迹上可能发生的碰撞。为了防止因攻击者的时间次偏差行为而导致的最佳轨迹或碰撞不可避免的碰撞,还提供了使用指数控制屏障功能(ECBFI) 最低限度地加强控制。


Article 34

Title@2025-06-22 (7): Wisdom of Crowds Through Myopic Self-Confidence Adaptation

Title: Wisdom of Crowds Through Myopic Self-Confidence Adaptation Weisheit der Massen durch myopische Selbst-Konfidenz-Anpassung 通过短视自信心适应而实现的群众智慧 2506.18195v1

Authors (3): Giacomo Como, Fabio Fagnani, Anton Proskurnikov

The wisdom of crowds is an umbrella term for phenomena suggesting that the collective judgment or decision of a large group can be more accurate than the individual judgments or decisions of the group members. A well-known example illustrating this concept is the competition at a country fair described by Galton, where the median value of the individual guesses about the weight of an ox resulted in an astonishingly accurate estimate of the actual weight. This phenomenon resembles classical results in probability theory and relies on independent decision-making. The accuracy of the group’s final decision can be significantly reduced if the final agents’ opinions are driven by a few influential agents. In this paper, we consider a group of agents who initially possess uncorrelated and unbiased noisy measurements of a common state of the world. Assume these agents iteratively update their estimates according to a simple non-Bayesian learning rule, commonly known in mathematical sociology as the French-DeGroot dynamics or iterative opinion pooling. As a result of this iterative distributed averaging process, each agent arrives at an asymptotic estimate of the state of the world, with the variance of this estimate determined by the matrix of weights the agents assign to each other. Every agent aims at minimizing the variance of her asymptotic estimate of the state of the world; however, such variance is also influenced by the weights allocated by other agents. To achieve the best possible estimate, the agents must then solve a game-theoretic, multi-objective optimization problem defined by the available sets of influence weights. We characterize both the Pareto frontier and the set of Nash equilibria in the resulting game. Additionally, we examine asynchronous best-response dynamics for the group of agents and prove their convergence to the set of strict Nash equilibria.

人群的智慧是一个总括术语,代表着一个大群体的集体判断或决定可能比集团成员的个人判断或决定更准确。一个众所周知的例子说明这个概念的例子是Galton所描述的一个国家博览会的竞争,在那里,个人猜测牛重量的中值导致对实际重量的精确估计,令人惊讶地令人吃惊。这种现象类似于典型的概率理论结果,并依赖于独立的决策。如果最终代理人的意见是由少数有影响力的代理人驱动,那么集团最后决定的准确性可以大大降低。在本文件中,我们认为一组最初拥有世界共同状态不相关和不偏袒的激烈测量的代理人。在Galton所描述的国家博览会中,个人猜测牛重量的中值导致对实际重量的精确性估计,而数学社会学通常称之为法国-DeGroot动力或迭接式意见的集合。由于这种迭接式的分布平均过程,每个代理人对世界状况的精确性估计,每个代理人的精度的精度和精度的精度的精度估计,我们所认定的精度的精度的精度的精度,其精度的精度的精度的精度的精度的精度将每个代理人的精度估计结果作为其他代理人的精度。
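
The averaging process in the abstract can be sketched directly. In the French-DeGroot model each agent replaces its estimate with a weighted average of all estimates, $x(t+1) = W x(t)$, with $W$ row-stochastic. When the estimates converge, the common limit is $\sum_i \pi_i x_i(0)$, where $\pi$ is the left eigenvector of $W$ with $\pi = \pi W$; for uncorrelated initial measurements of equal variance, the variance of that limit is proportional to $\sum_i \pi_i^2$. The weight matrix below is a hypothetical example, and the function names are illustrative.

```python
def degroot_step(W, x):
    """One round of opinion pooling: x <- W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def iterate(W, x, steps=200):
    for _ in range(steps):
        x = degroot_step(W, x)
    return x

def influence_weights(W, steps=200):
    """Left eigenvector pi (pi = pi W) by power iteration."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * W[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical weights: agents 1 and 2 lean heavily on agent 0.
W = [[0.8, 0.1, 0.1],
     [0.4, 0.5, 0.1],
     [0.4, 0.1, 0.5]]

pi = influence_weights(W)
variance_factor = sum(p * p for p in pi)   # multiplies the measurement variance
limit = iterate(W, [1.0, 0.0, 0.0])        # consensus reached from x(0)

print(pi, variance_factor, limit)
```

For comparison, equal influence weights of $1/3$ each would give a variance factor of $1/3$, so concentrating influence on one agent makes the group estimate noisier, as the abstract notes.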


Article 35

Title@2025-06-22 (7): Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control

Title: Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control Multi-Agent Soft Actor-Critic mit koordiniertem Verlust für autonome Mobilität-auf-Demand-Flotte-Kontrolle 多代理商软软软操作器-对自动机动按需机动车队控制协调损失具有协调损失的批评 2404.06975v2

Authors (5): Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer

We study a sequential decision-making problem for a profit-maximizing operator of an autonomous mobility-on-demand system. Optimizing a central operator’s vehicle-to-request dispatching policy requires efficient and effective fleet control strategies. To this end, we employ a multi-agent Soft Actor-Critic algorithm combined with weighted bipartite matching. We propose a novel vehicle-based algorithm architecture and adapt the critic’s loss function to appropriately consider coordinated actions. Furthermore, we extend our algorithm to incorporate rebalancing capabilities. Through numerical experiments, we show that our approach outperforms state-of-the-art benchmarks by up to 12.9% for dispatching and up to 38.9% with integrated rebalancing.

我们为自动按需流动系统的利润最大化操作者研究一个顺序决策问题。优化中央操作者的车辆对请求调度政策需要高效和有效的车队控制战略。为此,我们采用多试剂SoftAcor-Critic 算法,加上加权双方匹配。我们建议一个新的基于车辆的算法结构,并调整评论家的损失功能,以适当考虑协调行动。此外,我们扩展我们的算法,将再平衡能力纳入其中。通过数字实验,我们显示我们的方法比最先进的标准高达12.9%的发送率和38.9%的整合再平衡率。
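
The matching step mentioned in the abstract can be illustrated on a toy instance. The sketch below finds the maximum-weight one-to-one assignment of vehicles to requests by brute force over permutations, which is only practical for very small fleets. The score matrix is hypothetical (the abstract does not say how the weights are produced), and a production system would use a polynomial-time matching algorithm instead.

```python
from itertools import permutations

def best_assignment(weights):
    """weights[v][r]: score for sending vehicle v to request r (square matrix).

    Returns (assignment, total) where assignment[v] is the request for vehicle v.
    """
    n = len(weights)

    def total(perm):
        return sum(weights[v][perm[v]] for v in range(n))

    best = max(permutations(range(n)), key=total)
    return list(best), total(best)

# Hypothetical scores for 3 vehicles and 3 requests.
weights = [[4, 1, 3],
           [2, 0, 5],
           [3, 2, 2]]

assignment, score = best_assignment(weights)
print(assignment, score)
```

Solving the assignment jointly, rather than letting each vehicle pick independently, is what prevents two vehicles from being dispatched to the same request.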


Article 36

Title@2025-06-22 (7): Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems

Title: Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems Physik-informiertes Multi-Agenten-Verstärkungs-Lernen für verteilte Multi-Roboter-Probleme 为分布式多机器人问题进行物理化多机构强化学习 2401.00212v4

Authors (5): Eduardo Sebastian, Thai Duong, Nikolay Atanasov, Eduardo Montijano, Carlos Sagues

The networked nature of multi-robot systems presents challenges in the context of multi-agent reinforcement learning. Centralized control policies do not scale with increasing numbers of robots, whereas independent control policies do not exploit the information provided by other robots, exhibiting poor performance in cooperative-competitive tasks. In this work we propose a physics-informed reinforcement learning approach able to learn distributed multi-robot control policies that are both scalable and make use of all the information available to each robot. Our approach has three key characteristics. First, it imposes a port-Hamiltonian structure on the policy representation, respecting energy conservation properties of physical robot systems and the networked nature of robot team interactions. Second, it uses self-attention to ensure a sparse policy representation able to handle time-varying information at each robot from the interaction graph. Third, we present a soft actor-critic reinforcement learning algorithm parameterized by our self-attention port-Hamiltonian control policy, which accounts for the correlation among robots during training while overcoming the need for value function factorization. Extensive simulations in different multi-robot scenarios demonstrate the success of the proposed approach, surpassing previous multi-robot reinforcement learning solutions in scalability, while achieving similar or superior performance (with averaged cumulative reward up to 2x greater than the state-of-the-art with robot teams 6x larger than the number of robots at training time). We also validate our approach on multiple real robots in the Georgia Tech Robotarium under imperfect communication, demonstrating zero-shot sim-to-real transfer and scalability across the number of robots.

多机器人系统的网络性质在多试剂加固学习方面提出了挑战。中央控制政策没有随着机器人数量的增加而扩大规模,而独立控制政策没有利用其他机器人提供的信息,在合作竞争任务中表现不佳。在这项工作中,我们提议了一个物理知情强化学习方法,能够学习分布式多机器人控制政策,这种政策既可缩放,又能利用每个机器人掌握的所有可用信息。我们的方法有三个关键特点。首先,它要求政策代表采用港口-汉堡结构,尊重物理机器人系统的节能特性和机器人团队互动的网络性质。第二,它利用自我意识,确保政策代表很少,能够从互动图中处理每个机器人的时间变化信息。第三,我们提出了一个软的行为体-加速学习算法,该算法以我们自我使用港口-汉堡控制政策为参数,它说明了培训期间机器人之间的关联性,同时克服了价值函数化的需要。 在不同的多机器人系统化假设中进行广泛的模拟,在不同的多机器人系统化情景下进行广泛的模拟,以确保在互动图中能够处理每个机器人之间时间变化的信息。第三,我们提出的数字比以往的升级方法更具有更高的性,同时,我们提出的数字级的升级性方法比以往的升级性,在机器人升级方法中学习更高级的升级性,比以往的升级性方法更能更接近于更高的性。


Article 37

Title@2025-06-22 (7): RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

Title: RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation RoboTwin 2.0: Ein skalierbarer Datengenerator und Benchmark mit starker Domain Randomisierung für robuste bimanuelle Robotermanipulation RoboTwin 2. 0 : 一个可缩放数据生成器和基准, 具有强力域随机化功能, 用于机械二手机器人操纵的可缩放数据生成器和基准 2506.18088v1

Authors (26): Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, Yao Mu

Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to generate task-level execution code automatically. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments, and pre-collect over 100,000 domain-randomized expert trajectories. Empirical results show a 10.9% gain in code generation success and improved generalization to novel real-world scenarios. A VLA model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on unseen scene real-world tasks, while zero-shot models trained solely on our synthetic data achieve a 228% relative gain, highlighting strong generalization without real-world supervision. We release the data generator, benchmark, dataset, and code to support scalable research in robust bimanual manipulation.

模拟数据合成已成为加强真实世界机器人操作的强大范例。然而,现有的合成数据集仍然不足以成为强力的二元操纵,因为有两个挑战:(1) 缺乏高效、可扩缩的新任务数据生成方法,(2) 过于简化的模拟环境,无法捕捉现实世界的复杂性。我们展示了机器人双臂操纵的大规模生成多样化和现实数据的缩放模拟框架,以及双臂操纵的统一评估协议。我们首先建造了机器人双臂操作,这是一个大型的物体库,由147个类别的731个事件组成,每个类别都配有语义和操作相关标签。基于这个基础,我们开发了一个专家的数据合成管道,将多式大语言模型(MLMLMM)与模拟在运行中自动生成任务级执行代码。为了改进模拟到真实传输, RoboTwin 2.0 包含结构化的域级支持,与五个轴相连接:毛线、照明、背景、桌面高度和语言指令,从而提升了相对的数学50级和操作的相对数据多样性,同时展示了我们的数据-直径罗比的模型,同时展示了10级的模型。
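
The relative improvement quoted in the abstract above follows directly from the two success rates it reports (42.0% vs. 9.0%). A minimal sketch of that arithmetic; the function name `relative_improvement` is an illustrative choice and does not come from the paper.

```python
def relative_improvement(new_rate, base_rate):
    """Relative gain of new_rate over base_rate, in percent."""
    if base_rate == 0:
        raise ValueError("base_rate must be non-zero")
    return (new_rate - base_rate) / base_rate * 100.0


# Success rates quoted in the abstract: 42.0% (fine-tuned) vs. 9.0% (baseline).
gain = relative_improvement(42.0, 9.0)
print(f"{gain:.0f}% relative improvement")
```

Rounded to the nearest percent, this gives the 367% figure stated in the abstract.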


Article 38

Title@2025-06-22 (7): Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs

Title: Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs Optimierung von Flying Ad Hoc Network Topologie und kollaborative Pfadplanung für mehrere UAVs 优化多无人驾驶航空器飞行特设网络特设网络地形和协作道路规划 2506.17945v1

Authors (5): Ming He, Peizhao Wang, Haihua Chen, Bin Sun, Hongpeng Wang

Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide-area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are disregarded, which amounts to neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data transmission capabilities of the flying ad hoc network (FANET). The data throughput of the network is maximized by optimizing the network topology and the UAV trajectories. The resulting optimization problem is solved sequentially by the proposed reinforcement learning-based trajectory planning (RL-TP) algorithm and the convex-based topology optimization (C-TOP) algorithm. RL-TP optimizes the UAV paths while considering the constraints of the FANET. C-TOP maximizes the data throughput of the network while simultaneously constraining the neighbors and transmit powers of the UAVs, which is shown to be a convex problem that can be efficiently solved in polynomial time. Simulations and field experimental results show that the proposed optimization strategy can effectively plan the UAV trajectories and significantly improve the data throughput of the FANET over the adaptive local minimum spanning tree (A-LMST) and cyclic pruning-assisted power optimization (CPAPO) methods.

多个无人驾驶飞行器(无人驾驶飞行器)在条件恶劣的大面积环境中的监测和数据收集方面发挥着至关重要的作用。在多数假设中,实时数据检索和实时无人驾驶飞行器定位等问题往往被忽略,基本上忽视了通信限制。在本文件中,我们全面处理目标区域的覆盖范围和飞行临时网络(FANET)的数据传输能力。因此,通过优化网络地形和无人驾驶飞行器的轨迹,网络的数据输送量达到最大化。由此产生的优化问题通过拟议的强化学习轨迹规划(RL-TP)算法和基于 convex的地形优化(C-TOP)算法得到有效解决。RL-TP在考虑飞行临时网络(FANET)的限制的同时,优化了目标区域的覆盖范围和飞行临时网络的数据传输能力。结果显示,由此产生的优化战略可以有效地改进AVA-S-ST的最优化地方方法,从而大大改进A-MA-MA-SEA的最起码的调整方法。


Article 39

Title@2025-06-22 (7): Effective Red-Teaming of Policy-Adherent Agents

Title: Effective Red-Teaming of Policy-Adherent Agents Effektives Red-Teaming von Policy-Adherent Agents 有效的政策协调代理人红队 2506.09600v2

Authors (6): Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor

Task-oriented LLM-based agents are increasingly used in domains with strict policies, such as refund eligibility or cancellation rules. The challenge lies in ensuring that the agent consistently adheres to these rules and policies, appropriately refusing any request that would violate them, while still maintaining a helpful and natural interaction. This calls for the development of tailored design and evaluation methodologies to ensure agent resilience against malicious user behavior. We propose a novel threat model that focuses on adversarial users aiming to exploit policy-adherent agents for personal benefit. To address this, we present CRAFT, a multi-agent red-teaming system that leverages policy-aware persuasive strategies to undermine a policy-adherent agent in a customer-service scenario, outperforming conventional jailbreak methods such as DAN prompts, emotional manipulation, and coercion. Building upon the existing tau-bench benchmark, we introduce tau-break, a complementary benchmark designed to rigorously assess the agent’s robustness against manipulative user behavior. Finally, we evaluate several straightforward yet effective defense strategies. While these measures provide some protection, they fall short, highlighting the need for stronger, research-driven safeguards to protect policy-adherent agents from adversarial attacks.

以任务为导向的LLM代理商越来越多地在有严格政策的领域使用,例如退税资格或注销规则。挑战在于确保代理商始终遵守这些规则和政策,适当拒绝违反这些规则和政策的任何要求,同时保持有益和自然的互动。这要求制定有针对性的设计和评价方法,以确保代理商抵御恶意用户行为的能力。我们提出了一个新的威胁模式,以对抗性用户为重点,目的是利用政策适应性代理商谋取个人利益。为了解决这个问题,我们提出了CRAFT,这是一个多试剂红色组合系统,利用政策认知的有说服力战略,在客户服务情景中破坏政策适应性代理商,超过常规的破狱方法,如丹麦语提示、情感操纵和胁迫性。我们在现有的Tau-bench基准的基础上,我们引入Tau-break,这是一个补充性基准,旨在严格评估代理商抵御操纵性用户行为的强健性。最后,我们评估了若干直接而有效的防御战略。这些措施提供了一些保护,但很短,它们需要强调为保护政策适应性攻击的代理人免受敌对性攻击需要更有力、更强有力的研究驱动的保障措施。


Article 40

Title@2025-06-21 (6): The Hive Mind is a Single Reinforcement Learning Agent

Title: The Hive Mind is a Single Reinforcement Learning Agent Der Hive Mind ist ein einzelner Verstärkungs-Lernagent 蜂巢思想是单兵增援学习代理 2410.17517v3

Authors (4): Karthik Soma, Yann Bouteiller, Heiko Hamann, Giovanni Beltrame

Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to optimal strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and individual trial-and-error. This paper establishes an equivalence between these two paradigms by drawing from the well-established collective decision-making model of nest-site selection in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the hive mind) arising from individual bees following simple, local imitation-based rules is equivalent to a single online reinforcement learning (RL) agent interacting with many parallel environments. The update rule through which this macro-agent learns is a bandit algorithm that we coin Maynard-Cross Learning. Our analysis implies that a group of cognition-limited organisms can be on par with a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature.

自然系统通过至少两个不同的机制(通过模仿他人的集体决策,以及个别试验和操作)与最佳战略趋同。本文件从蜂蜜群中牢固确立的巢穴选择集体决策模式中,确定这两种模式的等同性。我们发现,由单个蜜蜂遵循简单、以本地模仿为基础的规则产生的新分散的认知(有时称为蜂窝心)相当于一个单一的在线强化学习(RL)代理与许多平行环境互动。这个宏观代理商学习的更新规则是我们发明梅纳德-克罗斯学习的强盗算法。我们的分析意味着,受认知限制的有机体可以与一个更复杂、增强能力的实体进行分离,从而证明集团级情报可以解释如何在性质上选择表面上简单和盲目的个人行为。
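
The abstract above does not spell out the Maynard-Cross Learning update, so the sketch below does not attempt to reproduce it. It shows the classical Cross learning rule for a bandit, which moves the policy toward the played arm in proportion to the observed reward. The step size `alpha`, the function name, and the assumption that rewards lie in [0, 1] are choices made for this illustration only.

```python
def cross_learning_step(probs, action, reward, alpha=0.1):
    """One Cross-learning-style update of a policy over bandit arms.

    probs  : current probability of choosing each arm (sums to 1)
    action : index of the arm that was just played
    reward : payoff observed for that arm, assumed to lie in [0, 1]
    alpha  : step size (an assumption of this sketch)
    """
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    updated = []
    for i, p in enumerate(probs):
        # Move toward 1 for the played arm and toward 0 for all others.
        target = 1.0 if i == action else 0.0
        updated.append(p + alpha * reward * (target - p))
    return updated


policy = [0.5, 0.5]
policy = cross_learning_step(policy, action=0, reward=1.0)
print(policy)
```

Because every arm moves by the same fraction toward its target, the probabilities still sum to 1 after each step, and a zero reward leaves the policy unchanged.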


Article 41

Title@2025-06-21 (6): Bayesian Social Deduction with Graph-Informed Language Models

Title: Bayesian Social Deduction with Graph-Informed Language Models Bayesische soziale Deduktion mit Graphen-informierten Sprachmodellen 采用图形化语言模型的巴伊斯社会衰退 2506.17788v1

Authors (7): Shahab Rahimirad, Guven Gergerli, Lucia Romero, Angela Qian, Matthew Lyle Olson, Simon Stepputtis, Joseph Campbell

Social reasoning - inferring unobservable beliefs and intentions from partial observations of other agents - remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model, while using an LLM for language understanding and interaction. Our approach achieves competitive performance with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study - achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, which can be found at https://camp-lab-purdue.github.io/bayesian-social-deduction/

社会推理——从其他代理人的部分观察中推断出不易观察的信仰和意图——对于大型语言模型(LLMs)来说,仍然是一项艰巨的任务。我们评估了社会推理游戏阿瓦隆目前推理语言模型的局限性,发现虽然最大的模型表现强劲,但是它们需要广泛的测试-时间推理,在蒸馏成更小的、实时的变体时,它们需要大量测试-时间推论和急剧降解。为了解决这个问题,我们引入了一个混合推理框架,将相信结构化的概率模型外部化,同时使用LLLM来进行语言理解和互动。我们的方法在Agency游戏中取得了更大的模型的竞争性性能,特别是成为在一项受控的研究中击败人类玩家的第一个语言推理——达到67%的赢率和获得比推理基线和人类团队伙伴更高的质量评分。我们发布代码、模型和数据集,以支持LLM代理人今后关于社会推理的工作,可在https://camp-lab-purdue.github.io/baysian-social-deduction/b中找到。
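
The hybrid framework described above hands belief inference to a structured probabilistic model. The abstract gives no details of that model, so the following is only a generic Bayes-rule update over hidden-role hypotheses. The hypothesis labels and likelihood values are made up for illustration.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses after one observation.

    prior      : dict mapping hypothesis -> prior probability
    likelihood : dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical example: which of three players holds the hidden evil role?
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
# Assumed probability of the observed vote under each hypothesis.
likelihood = {"A": 0.9, "B": 0.5, "C": 0.1}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

In a setup like the one the abstract outlines, the language model would supply the observations (for example, parsed statements or votes), while an update of this kind tracks the beliefs.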


Article 42

Title@2025-06-21 (6): Multi-agent Embodied AI: Advances and Future Directions

Title: Multi-agent Embodied AI: Advances and Future Directions Multi-Agent Embodyd KI: Fortschritte und Zukunftsaussichten AI:进步和未来方向 2505.05108v2

Authors (10): Zhaohan Feng, Ruiqi Xue, Lei Yuan, Yang Yu, Ning Ding, Meiqin Liu, Bingzhao Gao, Jian Sun, Xinhu Zheng, Gang Wang

Embodied artificial intelligence (Embodied AI) plays a pivotal role in the application of advanced technologies in the intelligent era, where AI systems are integrated with physical bodies that enable them to perceive, reason, and interact with their environments. Through the use of sensors for input and actuators for action, these systems can learn and adapt based on real-world feedback, allowing them to perform tasks effectively in dynamic and unpredictable environments. As techniques such as deep learning (DL), reinforcement learning (RL), and large language models (LLMs) mature, embodied AI has become a leading field in both academia and industry, with applications spanning robotics, healthcare, transportation, and manufacturing. However, most research has focused on single-agent systems that often assume static, closed environments, whereas real-world embodied AI must navigate far more complex scenarios. In such settings, agents must not only interact with their surroundings but also collaborate with other agents, necessitating sophisticated mechanisms for adaptation, real-time learning, and collaborative problem-solving. Despite increasing interest in multi-agent systems, existing research remains narrow in scope, often relying on simplified models that fail to capture the full complexity of dynamic, open environments for multi-agent embodied AI. Moreover, no comprehensive survey has systematically reviewed the advancements in this area. As embodied AI rapidly evolves, it is crucial to deepen our understanding of multi-agent embodied AI to address the challenges presented by real-world applications. To fill this gap and foster further development in the field, this paper reviews the current state of research, analyzes key contributions, and identifies challenges and future directions, providing insights to guide innovation and progress in this field.

人工智能(Embodied AI)在智能时代应用先进技术方面发挥着关键作用,在这个时代,人工智能系统与物理机构相结合,使其能够感知、理性和与自身环境互动。通过使用输入感应器和促动器进行行动,这些系统可以在现实世界反馈的基础上学习和适应,使其能够在动态和不可预测的环境中有效开展工作。作为深层次学习(DL),强化学习(RL)和大型语言模型(LLMS)等技术,在智能时代,人工智能系统与物理机构相结合,使其能够感知、理性和与自身环境相结合。然而,大多数研究侧重于单一代理系统系统,这些系统化系统化环境往往包含静态、封闭的环境,而现实世界体现的人工智能则必须更复杂得多的情景。在这种环境下,代理人不仅必须与其周围环境互动,而且还与其他代理机构协作,需要复杂的适应机制、实时学习和协作解决问题。尽管对多代理人系统系统系统系统越来越感兴趣,但现有研究范围仍然狭窄,往往依赖简化的模型,无法全面了解当前动态、封闭的环境,因此,对多层次进行深入分析。


Article 43

Title@2025-06-21 (6): Distributed Butterfly Analysis using Mobile Agents

Title: Distributed Butterfly Analysis using Mobile Agents Verteilte Schmetterlingsanalyse mit mobilen Agenten 使用移动剂进行分布式蝴蝶分析 2506.17721v1

Authors (3): Prabhat Kumar Chand, Apurba Das, Anisur Rahaman Molla

Butterflies, or 4-cycles in bipartite graphs, are crucial for identifying cohesive structures and dense subgraphs. While agent-based data mining is gaining prominence, its application to bipartite networks remains relatively unexplored. We propose distributed, agent-based algorithms for Butterfly Counting in a bipartite graph $G((A,B),E)$. Agents first determine their respective partitions and collaboratively construct a spanning tree, electing a leader within $O(n \log \lambda)$ rounds using only $O(\log \lambda)$ bits per agent. A novel meeting mechanism between adjacent agents improves efficiency and eliminates the need for prior knowledge of the graph, requiring only the highest agent ID $\lambda$ among the $n$ agents. Notably, our techniques naturally extend to general graphs, where leader election and spanning tree construction maintain the same round and memory complexities. Building on these foundations, agents count butterflies per node in $O(\Delta)$ rounds and compute the total butterfly count of $G$ in $O(\Delta+\min\{|A|,|B|\})$ rounds.

蝴蝶或双面图中的4个周期对于确定凝固结构和密集子集至关重要。 虽然基于代理的数据挖掘越来越突出, 但它对双面网络的应用仍然相对没有被探索。 我们用双面图$G( B) E 提出用于 emph{Butterfly 计数的分布式代理算法。 代理人首先决定各自的分区, 并合力建造横贯树, 在$( n) log\lambda) 内选举一位领先者, 仅使用$O( log\ lambda) 的每代理比特。 相邻的代理人之间的新颖会议机制提高了效率, 并消除了先前对图形知识的需求, 只需要最高代理商在$( 美元) 代理商中确定 $( lambda) 。 值得注意的是, 我们的技术自然延伸到一般图, 领导选举和横跨树构造保持同样的圆圈和记忆复杂性。 在这些基础上, 代理人以$( delta) $( Delta) 回合计算每节的蝴蝶。
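
To make the quantity being counted concrete, here is a small centralized sketch of butterfly counting. It is not the distributed, agent-based algorithm from the paper. It relies only on the standard observation that two vertices on the same side with c common neighbours close C(c, 2) butterflies. The adjacency format and function name are assumptions of this example.

```python
from itertools import combinations
from math import comb


def count_butterflies(adj):
    """Count 4-cycles (butterflies) in a bipartite graph.

    adj maps every vertex of partition A to the set of its neighbours in B.
    Each pair of A-vertices with c common neighbours contributes C(c, 2).
    """
    total = 0
    for u, v in combinations(list(adj), 2):
        shared = len(adj[u] & adj[v])
        total += comb(shared, 2)
    return total


# Complete bipartite graph K_{2,3}: C(2, 2) * C(3, 2) = 3 butterflies.
k23 = {"a1": {"b1", "b2", "b3"}, "a2": {"b1", "b2", "b3"}}
print(count_butterflies(k23))
```

This pairwise approach takes time quadratic in |A|, so iterating over the smaller partition is the cheaper option.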


Article 44

Title@2025-06-21 (6): Towards Zero-Shot Coordination between Teams of Agents: The N-XPlay Framework

Title: Towards Zero-Shot Coordination between Teams of Agents: The N-XPlay Framework Auf dem Weg zur Null-Shot-Koordination zwischen Agententeams: Das N-XPlay-Framework 实现各代理小组之间零位零位协调:NXPlay框架 2506.17560v1

Authors (7): Ava Abderezaei, Chi-Hui Lin, Joseph Miceli, Naren Sivagnanadasan, Stéphane Aroca-Ouellette, Jake Brawer, Alessandro Roncone

Zero-shot coordination (ZSC) – the ability to collaborate with unfamiliar partners – is essential to making autonomous agents effective teammates. Existing ZSC methods evaluate coordination capabilities between two agents who have not previously interacted. However, these scenarios do not reflect the complexity of real-world multi-agent systems, where coordination often involves a hierarchy of sub-groups and interactions between teams of agents, known as Multi-Team Systems (MTS). To address this gap, we first introduce N-player Overcooked, an N-agent extension of the popular two-agent ZSC benchmark, enabling evaluation of ZSC in N-agent scenarios. We then propose N-XPlay for ZSC in N-agent, multi-team settings. Comparison against Self-Play (SP) across two-, three- and five-player Overcooked scenarios, where agents are split between an "ego-team" and a group of unseen collaborators, shows that agents trained with N-XPlay are better able to simultaneously balance "intra-team" and "inter-team" coordination than agents trained with SP.

零点协调(ZSC) – – 与不熟悉的合作伙伴合作的能力 – – 是使自主代理公司成为有效的团队伴侣的关键。现有的ZSC方法评价两个以前没有互动的代理公司之间的协调能力。然而,这些假设并不反映真实世界多剂系统的复杂性,其中协调往往涉及一个分组的等级和代理公司团队之间的互动,称为多队系统(MTS)。为了解决这一差距,我们首先引入了N-球员过关,这是流行的两剂ZSC基准的N-试办扩展,有利于在N-试办情况下对ZSC进行评估。我们随后提议在N-试办、多队环境中为ZSC公司提供N-XPlay。对照二、三和五人自我布局的情景,其中代理公司在“ego-team”和一组隐形合作者之间发生分裂,这表明,经过N-XPlay培训的代理公司在“intra-team”和“inter-team”协调方面比受SP培训的代理公司更能同时实现平衡。


Article 45

Title@2025-06-20 (5): On the Power of Spatial Locality on Online Routing Problems

Title: On the Power of Spatial Locality on Online Routing Problems Über die Macht der räumlichen Lokalität bei Online-Routing-Problemen 在线运行问题空间地方空间定位力量 2506.17517v1

Authors (2): Swapnil Guragain, Gokarna Sharma

We consider the online versions of two fundamental routing problems, traveling salesman (TSP) and dial-a-ride (DARP), which have a variety of relevant applications in logistics and robotics. The online versions of these problems are concerned with efficiently serving a sequence of requests, presented in real-time online fashion at points of a metric space, by servers (salesmen/vehicles/robots). In this paper, motivated by real-world applications such as Uber/Lyft rides, where some limited knowledge about future requests is available, we propose the spatial locality model, which provides in advance the distance within which new request(s) will be released from the current position of the server(s). We study the usefulness of this advance information for achieving improved competitive ratios for both problems with $k\geq 1$ servers, compared to the competitive results established in the literature without such spatial locality consideration. We show that small locality is indeed useful in obtaining improved competitive ratios irrespective of the metric space.

我们考虑两个基本路线问题的在线版本,即旅行销售员(TSP)和拨号(DARP),它们具有物流和机器人方面的多种相关应用,这些问题的在线版本涉及高效率地服务于服务器(销售员/车辆/机器人)在公用空间点以实时在线方式提出的一系列请求。在本文中,由于Uber/Lyft等现实世界应用程序的驱动,对未来请求掌握了有限的知识,我们提议了提供从服务器当前位置释放新请求的距离的“空间地点”模式。我们研究了这一先进信息对于在使用1美元服务器这两个问题上实现更好的竞争性比率的有用性,与文献中确立的没有考虑到空间地点的竞争性结果相比。我们表明,无论是否使用计量空间,小地点确实有助于提高竞争性比率。


Article 46

Title@2025-06-20 (5): Cash or Comfort? How LLMs Value Your Inconvenience

Title: Cash or Comfort? How LLMs Value Your Inconvenience Bargeld oder Komfort? Wie LLMs Wert Ihre Unannehmlichkeit 现金还是安慰? 2506.17367v1

Authors (6): Mateusz Cedro, Timour Ichmoukhamedov, Sofie Goethals, Yifan He, James Hinns, David Martens

Large Language Models (LLMs) are increasingly proposed as near-autonomous artificial intelligence (AI) agents capable of making everyday decisions on behalf of humans. Although LLMs perform well on many technical tasks, their behaviour in personal decision-making remains less understood. Previous studies have assessed their rationality and moral alignment with human decisions. However, the behaviour of AI assistants in scenarios where financial rewards are at odds with user comfort has not yet been thoroughly explored. In this paper, we tackle this problem by quantifying the prices assigned by multiple LLMs to a series of user discomforts: additional walking, waiting, hunger and pain. We uncover several key concerns that strongly question the prospect of using current LLMs as decision-making assistants: (1) a large variance in responses between LLMs, (2) within a single LLM, responses show fragility to minor variations in prompt phrasing (e.g., reformulating the question in the first person can considerably alter the decision), (3) LLMs can accept unreasonably low rewards for major inconveniences (e.g., 1 Euro to wait 10 hours), and (4) LLMs can reject monetary gains where no discomfort is imposed (e.g., 1,000 Euro to wait 0 minutes). These findings emphasize the need for scrutiny of how LLMs value human inconvenience, particularly as we move toward applications where such cash-versus-comfort trade-offs are made on users’ behalf.

大型语言模型(LLMS)越来越多地被推荐为能够代表人类做出日常决策的近自主人工智能(AI)人员。尽管LLMS在很多技术任务上表现良好,但是其个人决策行为仍然不甚为人理解。以前的研究评估了其合理性和与人类决策的道德一致性。然而,在财务奖励与用户舒适性相冲突的情况下,AI助理的行为尚未得到彻底探讨。在本文件中,我们通过量化多个LMS给一系列用户带来不便的价格来解决这个问题:增加步行、等待、饥饿和疼痛。我们发现了一些关键关切,这些关切强烈质疑使用目前的LLMS作为决策助理的可能性:(1)LMS之间在答复方面存在巨大差异,(2)在单一LMM内部,应对措施表明在迅速调整(例如,重塑第一个人的问题可以大大改变决定)方面,AIA助理人员的行为很脆弱。(3)LMS可以接受不合理的低报酬(例如,1欧元到10小时),以及(4)LMS可以拒绝货币收益收益,因为那里不会让用户为决策助理而不愿接受这种不方便的收益,因为我们不得不进行这样的现金审查。


Article 47

Title@2025-06-20 (5): Formal Control for Uncertain Systems via Contract-Based Probabilistic Surrogates (Extended Version)

Title: Formal Control for Uncertain Systems via Contract-Based Probabilistic Surrogates (Extended Version) Formale Kontrolle für unsichere Systeme über kontraktbasierte probabilistische Surrogate (erweiterte Version) 通过基于合同的概率性代管国对不确定系统进行正式控制(例外版本) 2506.16971v1

Authors (3): Oliver Schön, Sofie Haesaert, Sadegh Soudjani

The requirement for identifying accurate system representations has not only been a challenge to fulfill, but it has compromised the scalability of formal methods, as the resulting models are often too complex for effective decision making with formal correctness and performance guarantees. Focusing on probabilistic simulation relations and surrogate models of stochastic systems, we propose an approach that significantly enhances the scalability and practical applicability of such simulation relations by eliminating the need to compute error bounds directly. As a result, we provide an abstraction-based technique that scales effectively to higher dimensions while addressing complex nonlinear agent-environment interactions with infinite-horizon temporal logic guarantees amidst uncertainty. Our approach trades scalability for conservatism favorably, as demonstrated on a complex high-dimensional vehicle intersection case study.

确定准确系统表述的要求不仅是一个有待实现的挑战,而且损害了正式方法的可扩展性,因为所产生的模型往往过于复杂,无法以正式的正确性和性能保障来有效决策。我们注重概率模拟关系和随机系统替代模型,提出一种办法,通过消除直接计算误差界限的需要,大大提高这种模拟关系的可扩展性和实际适用性。结果,我们提供了一种基于抽象的技术,这种技术能够有效地到更高的层面,同时处理复杂的非线性剂-环境相互作用,并用无限高度的时间逻辑来保证在不确定的情况下进行有效决策。我们的方法有利于保守主义的可扩展性,正如在复杂的高维乘车辆交叉案例研究中所表明的那样。


Article 48

Title@2025-06-20 (5): Engineering Resilience: An Energy-Based Approach to Sustainable Behavioural Interventions

Title: Engineering Resilience: An Energy-Based Approach to Sustainable Behavioural Interventions Engineering Resilience: Ein energiebasierter Ansatz für nachhaltige Verhaltensinterventionen 工程复原力:以能源为基础的可持续行为干预办法 2506.16836v1

Authors (5): Arpitha Srivathsa Malavalli, Karthik Sama, Janvi Chhabra, Pooja Bassin, Srinath Srinivasa

Addressing complex societal challenges, such as improving public health, fostering honesty in workplaces, or encouraging eco-friendly behaviour, requires effective nudges to influence human behaviour at scale. Intervention science seeks to design such nudges within complex societal systems. While interventions primarily aim to shift the system toward a desired state, less attention is given to the sustainability of that state, which we define in terms of resilience: the system’s ability to retain the desired state even under perturbations. In this work, we offer a more holistic perspective on intervention design by incorporating a nature-inspired postulate, namely that lower energy states tend to exhibit greater resilience, as a regularization mechanism within intervention optimization to ensure that the resulting state is also sustainable. Using a simple agent-based simulation where commuters are nudged to choose eco-friendly options (e.g., cycles) over individually attractive but less eco-friendly ones (e.g., cars), we demonstrate how embedding the lower-energy postulate into intervention design induces resilience. The system energy is defined in terms of the motivators that drive its agents’ behaviour. By inherently ensuring that agents are not pushed into actions that contradict their motivators, the energy-based approach helps design effective interventions that contribute to resilient behavioural states.

应对复杂的社会挑战,如改善公共卫生,促进工作场所的诚实,或鼓励生态友好行为等,需要有效的手段来影响人类大规模的行为。干预科学寻求在复杂的社会系统中设计这样的手段。虽然干预的主要目的是将系统转向理想状态,但我们从复原力的角度出发,较少关注国家的可持续性,我们从以下角度界定了这种可持续性:系统保持理想状态的能力,即使是在扰动下也是如此。在这项工作中,我们为干预设计提供了更加全面的视角,采用了自然激发的假设,即低能源国家往往表现出更大的复原力,作为干预优化中的正规化机制,确保由此产生的国家也具有可持续性。使用简单的代理模拟,在这种模拟中,通勤人员不得不选择无害生态的选项(例如周期),而不是个人有吸引力但不太无害生态的选项(例如汽车),我们展示了将较低的能源后继能力嵌入干预设计如何产生复原力。系统能源的定义是驱动其代理人行为的动力驱动器。通过内在确保代理商不会推向反其行为举动的行为举止。


Article 49

Title@2025-06-20 (5): Reimagining Urban Science: Scaling Causal Inference with Large Language Models

Title: Reimagining Urban Science: Scaling Causal Inference with Large Language Models Reimagining Urban Science: Skalierung von Kausalität mit großen Sprachmodellen 重新想象城市科学:与大语言模型的大规模因果推断 2504.12345v3

Authors (11): Yutong Xia, Ao Qu, Yunhan Zheng, Yihong Tang, Dingyi Zhuang, Yuxuan Liang, Shenhao Wang, Cathy Wu, Lijun Sun, Roger Zimmermann, Jinhua Zhao

Urban causal research is essential for understanding the complex, dynamic processes that shape cities and for informing evidence-based policies. However, current practices are often constrained by inefficient and biased hypothesis formulation, challenges in integrating multimodal data, and fragile experimental methodologies. Imagine a system that automatically estimates the causal impact of congestion pricing on commute times by income group or measures how new green spaces affect asthma rates across neighborhoods using satellite imagery and health reports, and then generates comprehensive, policy-ready outputs, including causal estimates, subgroup analyses, and actionable recommendations. In this Perspective, we propose UrbanCIA, an LLM-driven conceptual framework composed of four distinct modular agents responsible for hypothesis generation, data engineering, experiment design and execution, and results interpretation with policy insights. We begin by examining the current landscape of urban causal research through a structured taxonomy of research topics, data sources, and methodological approaches, revealing systemic limitations across the workflow. Next, we introduce the design principles and technological roadmap for the four modules in the proposed framework. We also propose evaluation criteria to assess the rigor and transparency of these AI-augmented processes. Finally, we reflect on the broader implications for human-AI collaboration, equity, and accountability. We call for a new research agenda that embraces LLM-driven tools as catalysts for more scalable, reproducible, and inclusive urban research.

城市因果研究对于理解影响城市的复杂动态过程和为基于证据的政策提供信息至关重要。然而,目前的做法往往受到以下因素的限制:低效和偏颇的假设、在整合多式联运数据方面的挑战以及脆弱的实验方法。设想一个系统,通过卫星图像和健康报告,自动估计拥挤定价对收入群体通勤时间的因果关系影响,或测量新的绿色空间如何影响社区间哮喘发病率,然后利用卫星图像和健康报告,产生全面的、政策成熟的产出,包括因果估计、分组分析和可操作的建议。我们从这个角度出发,提议由LLLM驱动的概念框架 “ UrbanCIA “ ,由四个不同的模块组成,分别负责假设生成、数据工程、实验设计和执行以及以政策洞察力解释结果。我们首先通过结构化的研究专题、数据来源和方法分类,审视城市因果研究的当前环境,揭示整个工作流程的系统性局限性。我们提出评估这些AI强化型流程的严格性和透明度的评价标准。最后,我们思考对人-AI合作、公平性、实验性、问责性更强的城市研究工具的更广泛影响。我们呼吁一个新的研究议程。


Article 50

Title@2025-06-20 (5): A Scalable Post-Processing Pipeline for Large-Scale Free-Space Multi-Agent Path Planning with PiBT

Title: A Scalable Post-Processing Pipeline for Large-Scale Free-Space Multi-Agent Path Planning with PiBT Eine skalierbare Post-Processing-Pipeline für großräumige Freiraum-Multi-Agenten-Pfadplanung mit PiBT 与 PiBT 合作的大型自由空间多机构多空间路径规划可缩放后处理管道 2506.16748v1

Authors (4): Arjo Chakravarty, Michael X. Grey, M. A. Viraj J. Muthugala, Mohan Rajesh Elara

Free-space multi-agent path planning remains challenging at large scales. Most existing methods either offer optimality guarantees but do not scale beyond a few dozen agents, or rely on grid-world assumptions that do not generalize well to continuous space. In this work, we propose a hybrid, rule-based planning framework that combines Priority Inheritance with Backtracking (PiBT) with a novel safety-aware path smoothing method. Our approach extends PiBT to 8-connected grids and selectively applies string-pulling based smoothing while preserving collision safety through local interaction awareness and a fallback collision resolution step based on Safe Interval Path Planning (SIPP). This design allows us to reduce overall path lengths while maintaining real-time performance. We demonstrate that our method can scale to over 500 agents in large free-space environments, outperforming existing any-angle and optimal methods in terms of runtime, while producing near-optimal trajectories in sparse domains. Our results suggest this framework is a promising building block for scalable, real-time multi-agent navigation in robotics systems operating beyond grid constraints.

自由空间多试剂路径规划仍大范围具有挑战性。 大多数现有方法要么提供最佳保障,但不超过数十个代理,要么依靠网格-世界假设,这些假设并不普遍到连续空间。在这项工作中,我们提议了一个混合的、基于规则的规划框架,将优先继承与后跟踪(PiBT)相结合,采用一种新的安全感知路径平滑方法。我们的方法将PiBT延伸到8个连接的网格,有选择地运用基于平滑的绳索拉法,同时通过当地互动意识和基于安全跨线规划(SIPPP)的反向碰撞解决步骤来保持碰撞安全。这一设计使我们能够缩短总路径长度,同时保持实时性能。我们证明,我们的方法可以在大型自由空间环境中达到500多个代理,在运行时间比现有的任何角和最佳方法都好,同时在稀少的域生成近最佳的轨迹。我们的结果表明,这一框架是一个充满希望的建筑块块,可以在机器人系统中超越网格限制,进行可缩缩缩放的多剂导航。
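
The string-pulling smoothing mentioned above can be illustrated with a generic greedy shortcutting pass over a single grid path. This is not the paper's safety-aware variant: it ignores other agents, agent radius, and the SIPP-based fallback, and it approximates line of sight with a Bresenham cell walk. All names here are hypothetical.

```python
def bresenham(a, b):
    """Grid cells visited by a Bresenham line from cell a to cell b."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    cells = []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy


def line_of_sight(a, b, blocked):
    """True if no cell on the Bresenham line between a and b is blocked."""
    return all(cell not in blocked for cell in bresenham(a, b))


def string_pull(path, blocked):
    """Greedily replace runs of waypoints by straight, unobstructed segments."""
    if len(path) < 3:
        return list(path)
    smoothed = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Back off from the farthest waypoint until one is visible;
        # the direct successor (j == i + 1) is accepted as-is.
        while j > i + 1 and not line_of_sight(path[i], path[j], blocked):
            j -= 1
        smoothed.append(path[j])
        i = j
    return smoothed


grid_path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(string_pull(grid_path, blocked=set()))
print(string_pull(grid_path, blocked={(1, 1)}))
```

With no obstacles the L-shaped path collapses to its two endpoints. Blocking cell (1, 1) forces the smoothed path to keep an intermediate waypoint.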


Article 51

Title@2025-06-20 (5): Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation

Title: Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation Generalisierbare Agentenmodellierung für Agent Collaboration-Competition Anpassung mit Multi-Retrieval und dynamischer Generation 多检索和有活力发电的合作-竞争适应 2506.16718v1

Authors (10): Chenxu Wang, Yonggang Jin, Cheng Hu, Youpeng Zhao, Zipeng Dai, Jian Zhao, Shiyu Huang, Liuyu Xiang, Junge Zhang, Zhaofeng He

Adapting a single agent to a new multi-agent system brings challenges, necessitating adjustments across various tasks, environments, and interactions with unknown teammates and opponents. Addressing this challenge is highly complex, and researchers have proposed two simplified scenarios, Multi-agent reinforcement learning for zero-shot learning and Ad-Hoc Teamwork. Building on these foundations, we propose a more comprehensive setting, Agent Collaborative-Competitive Adaptation (ACCA), which evaluates an agent’s ability to generalize across diverse scenarios, tasks, and interactions with both unfamiliar opponents and teammates. In ACCA, agents adjust to task and environmental changes, collaborate with unseen teammates, and compete against unknown opponents. We introduce a new modeling approach, Multi-Retrieval and Dynamic Generation (MRDG), that effectively models both teammates and opponents using their behavioral trajectories. This method incorporates a positional encoder for varying team sizes and a hypernetwork module to boost agents’ learning and adaptive capabilities. Additionally, a viewpoint alignment module harmonizes the observational perspectives of retrieved teammates and opponents with the learning agent. Extensive tests in benchmark scenarios like SMAC, Overcooked-AI, and Melting Pot show that MRDG significantly improves robust collaboration and competition with unseen teammates and opponents, surpassing established baselines. Our code is available at: https://github.com/vcis-wangchenxu/MRDG.git

调整一个单一的代理机构以适应新的多试剂系统带来挑战,需要在各种任务、环境以及与未知的队友和对手互动方面进行调整。应对这一挑战非常复杂,研究人员提出了两种简化方案,即 “ 多试剂强化学习零照学习和Ad-Hoc 团队工作 “ 。基于这些基础,我们提议一个更全面的设置,即合作-竞争适应(ACCA)代理机构,该代理机构负责评价一个机构,以推广不同的情况、任务以及与不熟悉的对手和队友的互动。在ACCA, 代理机构适应任务和环境变化,与隐蔽的队友合作,并与未知的对手竞争对手竞争。我们采用了新的建模方法,即多恢复和动态一代(MRDG),有效地模拟队友和反对者使用其行为轨迹。这个方法包括一个不同团队规模的定位编码和一个超网络模块,以提高代理机构的学习和适应能力。此外,一个观点调整模块将回收的队友和反对者与学习代理人的观察观点统一起来。在SMAC、Overcooked-AI和Melting Pot等基准情景中进行的广泛测试表明,MRDG显著改进了与未见过的队友和对手的稳健协作和竞争,超过了既定基线。我们的代码可在以下网址查阅:https://github.com/vcis-wangchenxu/MRDG.git


Article 52

Title@2025-06-19 (4): SemAgent: A Semantics Aware Program Repair Agent

Title: SemAgent: A Semantics Aware Program Repair Agent SemAgent: Ein Semantik-Bewusst-Programm-Reparatur-Agent SemAgenger: 语义学意识方案维修代理 2506.16650v1

Authors (4): Anvith Pabba, Alex Mathai, Anindya Chakraborty, Baishakhi Ray

Large Language Models (LLMs) have shown impressive capabilities in downstream software engineering tasks such as Automated Program Repair (APR). In particular, there has been a lot of research on repository-level issue-resolution benchmarks such as SWE-Bench. Although there has been significant progress on this topic, we notice that in the process of solving such issues, existing agentic systems tend to hyper-localize on immediately suspicious lines of code and fix them in isolation, without a deeper understanding of the issue semantics, code semantics, or execution semantics. Consequently, many existing systems generate patches that overfit to the user issue, even when a more general fix is preferable. To address this limitation, we introduce SemAgent, a novel workflow-based procedure that leverages issue, code, and execution semantics to generate patches that are complete, that is, patches that identify and fix all lines relevant to the issue. We achieve this through a novel pipeline that (a) leverages execution semantics to retrieve relevant context, (b) comprehends issue-semantics via generalized abstraction, (c) isolates code-semantics within the context of this abstraction, and (d) leverages this understanding in a two-stage architecture: a repair stage that proposes fine-grained fixes, followed by a reviewer stage that filters relevant fixes based on the inferred issue-semantics. Our evaluations show that our methodology achieves a solve rate of 44.66% on the SWEBench-Lite benchmark, beating all other workflow-based approaches, and an absolute improvement of 7.66% over our baseline, which lacks such deep semantic understanding. We note that our approach performs particularly well on issues requiring multi-line reasoning (and editing) and edge-case handling, suggesting that incorporating issue and code semantics into APR pipelines can lead to robust and semantically consistent repairs.

大型语言模型(LLM)在自动程序修复(APR)等下游软件工程任务中展现了令人印象深刻的能力。特别是,针对 SWE-Bench 等仓库级问题修复基准已有大量研究。尽管这一方向取得了显著进展,我们注意到,现有的智能体系统在解决此类问题时,往往过度聚焦于直接可疑的代码行并孤立地修复它们,而缺乏对问题语义、代码语义或执行语义的深入理解。因此,许多现有系统生成的补丁会对用户问题过拟合,即使更通用的修复更为可取。为解决这一局限,我们提出 SemAgent,一种基于工作流的新方法,它利用问题、代码和执行语义来生成完整的补丁,即识别并修复与问题相关的所有代码行。我们通过一条新的流水线实现这一点:(a) 利用执行语义检索相关上下文;(b) 通过广义抽象理解问题语义;(c) 在该抽象的语境中分离出代码语义;(d) 在两阶段架构中运用这一理解:修复阶段提出细粒度的修复,随后审查阶段根据推断出的问题语义筛选相关修复。评估表明,我们的方法在 SWEBench-Lite 基准上达到 44.66% 的解决率,优于所有其他基于工作流的方法,并且相比缺乏这种深层语义理解的基线取得了 7.66% 的绝对提升。我们注意到,该方法在需要多行推理(和编辑)以及边界情况处理的问题上表现尤为出色,这表明将问题语义和代码语义纳入 APR 流水线可以带来稳健且语义一致的修复。


Article 53

Title@2025-06-19 (4): Autonomous Computer Vision Development with Agentic AI

Title: Autonomous Computer Vision Development with Agentic AI Autonome Computer Vision Entwicklung mit Agentischer KI 基于 Agentic AI 的自主计算机视觉开发 2506.11140v3

Authors (6): Jin Kim, Muhammad Wahi-Anwa, Sangyun Park, Shawn Shin, John M. Hoffman, Matthew S. Brown

Agentic Artificial Intelligence (AI) systems leveraging Large Language Models (LLMs) exhibit significant potential for complex reasoning, planning, and tool utilization. We demonstrate that a specialized computer vision system can be built autonomously from a natural language prompt using Agentic AI methods. This involved extending SimpleMind (SM), an open-source Cognitive AI environment with configurable tools for medical image analysis, with an LLM-based agent, implemented using OpenManus, to automate the planning (tool configuration) for a particular computer vision task. We provide a proof-of-concept demonstration that an agentic system can interpret a computer vision task prompt and plan a corresponding SimpleMind workflow by decomposing the task and configuring appropriate tools. From the user input prompt, “provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)”, the agent LLM was able to generate the plan (tool configuration file in YAML format) and execute the SM-Learn (training) and SM-Think (inference) scripts autonomously. The computer vision agent automatically configured, trained, and tested itself on 50 chest x-ray images, achieving mean Dice scores of 0.96, 0.82, and 0.83 for lungs, heart, and ribs, respectively. This work shows the potential for autonomous planning and tool configuration that has traditionally been performed by a data scientist in the development of computer vision applications.

利用大型语言模型(LLM)的智能体人工智能(Agentic AI)系统在复杂推理、规划和工具使用方面展现出巨大潜力。我们证明,可以使用 Agentic AI 方法,根据自然语言提示自主构建专门的计算机视觉系统。为此,我们扩展了 SimpleMind(SM),这是一个开源的认知人工智能环境,带有可配置的医学图像分析工具;我们为其加入了使用 OpenManus 实现的基于 LLM 的智能体,以便针对特定的计算机视觉任务自动完成规划(工具配置)。我们给出了概念验证,表明智能体系统能够解读计算机视觉任务提示,并通过分解任务和配置合适的工具来规划相应的 SimpleMind 工作流。根据用户输入的提示“provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)”,智能体 LLM 能够生成计划(YAML 格式的工具配置文件),并自主执行 SM-Learn(训练)和 SM-Think(推理)脚本。该计算机视觉智能体在 50 张胸部 X 光图像上自动完成了配置、训练和测试,肺、心脏和肋骨的平均 Dice 分数分别为 0.96、0.82 和 0.83。这项工作展示了自主规划和工具配置的潜力,而这些工作在计算机视觉应用开发中传统上由数据科学家完成。
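
The segmentation quality in the abstract above is reported as mean Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on flat binary masks. The function name `dice_score` and the toy masks are illustrative assumptions; this is not code from SimpleMind or from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D "masks" (hypothetical data): 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 0, 1]
print(dice_score(pred, truth))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.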


Article 54

Title@2025-06-19 (4): eCAV: An Edge-Assisted Evaluation Platform for Connected Autonomous Vehicles

Title: eCAV: An Edge-Assisted Evaluation Platform for Connected Autonomous Vehicles eCAV: Eine Edge Assisted Evaluation Platform für vernetzte autonome Fahrzeuge eCAV: 连接自治车辆的边缘辅助评价平台 2506.16535v1

Authors (7): Tyler Landle, Jordan Rapp, Dean Blank, Chandramouli Amarnath, Abhijit Chatterjee, Alex Daglis, Umakishore Ramachandran

As autonomous vehicles edge closer to widespread adoption, enhancing road safety through collision avoidance and minimization of collateral damage becomes imperative. Vehicle-to-everything (V2X) technologies, which include vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-cloud (V2C), are being proposed as mechanisms to achieve this safety improvement. Simulation-based testing is crucial for early-stage evaluation of Connected Autonomous Vehicle (CAV) control systems, offering a safer and more cost-effective alternative to real-world tests. However, simulating large 3D environments with many complex single- and multi-vehicle sensors and controllers is computationally intensive. There is currently no evaluation framework that can effectively evaluate realistic scenarios involving large numbers of autonomous vehicles. We propose eCAV – an efficient, modular, and scalable evaluation platform to facilitate both functional validation of algorithmic approaches to increasing road safety and performance prediction of algorithms of various V2X technologies, including a futuristic Vehicle-to-Edge control plane and correspondingly designed control algorithms. eCAV can model up to 256 vehicles running individual control algorithms without perception enabled, which is $8\times$ more vehicles than what is possible with state-of-the-art alternatives. With perception enabled, eCAV simulates up to 64 vehicles with a step time under 800ms, which is $4\times$ more and $1.5\times$ faster than the state-of-the-art OpenCDA framework.

随着自治车辆接近广泛采用,通过避免碰撞和尽量减少附带损害而加强道路安全就变得势在必行。车辆对一切的技术,包括车辆对车辆、车辆对车辆、车辆对基础设施(V2V)、车辆对车对库(V2C),正在作为实现这一安全改善的机制提出建议。模拟测试对于早期评价连接的自动车辆控制系统(CAV)至关重要,为实际世界测试提供了更安全、更具有成本效益的替代方法。然而,以许多复杂的单一和多车辆传感器和控制器模拟大型三维环境(V2X)是计算密集的。目前没有任何评价框架可以有效评价涉及大量自主车辆的现实情景。我们提议eCAVA – – 一个高效、模块化和可扩展的评价平台,以便利对提高道路安全的算法方法进行功能验证,以及对各种V2X技术的性算法进行性预测,包括更安全、更快速的车辆对州级控制平板和相应设计的电子控制算法。 eC – – 一种更能让成本的车辆对成本的替代方法进行更精确的模型,而不能让个人对可能控制。


Article 55

Title@2025-06-19 (4): Advanced Game-Theoretic Frameworks for Multi-Agent AI Challenges: A 2025 Outlook

Title: Advanced Game-Theoretic Frameworks for Multi-Agent AI Challenges: A 2025 Outlook Fortgeschrittene Game-Theoretische Frameworks für Multi-Agent-KI-Herausforderungen: Ein 2025er Ausblick 应对多机构AI挑战的先进游戏理论框架:2025年展望 2506.17348v1

Authors (1): Pavel Malinovskiy

This paper presents a substantially reworked examination of how advanced game-theoretic paradigms can serve as a foundation for the next-generation challenges in Artificial Intelligence (AI), forecasted to arrive in or around 2025. Our focus extends beyond traditional models by incorporating dynamic coalition formation, language-based utilities, sabotage risks, and partial observability. We provide a set of mathematical formalisms, simulations, and coding schemes that illustrate how multi-agent AI systems may adapt and negotiate in complex environments. Key elements include repeated games, Bayesian updates for adversarial detection, and moral framing within payoff structures. This work aims to equip AI researchers with robust theoretical tools for aligning strategic interaction in uncertain, partially adversarial contexts.

本文件对先进的游戏理论范式如何能成为预计2025年或前后会到来的下一代人造情报(AI)挑战的基础进行了大量重整。我们的重点超越了传统模式,纳入了动态联盟的形成、基于语言的公用设施、破坏风险和部分可观察性。我们提供了一套数学形式主义、模拟和编码计划,以说明多试剂的AI系统如何在复杂环境中适应和谈判。关键内容包括反复游戏、巴耶斯式对抗性检测更新和报酬结构内的道德框架。这项工作旨在为AI研究人员提供强有力的理论工具,以便在不确定、部分敌对的环境中调整战略互动。
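
The abstract above lists Bayesian updates for adversarial detection among its key elements. A minimal sketch of that idea, using plain Bayes' rule, follows. The two-type model (saboteur vs. honest), the likelihood values 0.8 and 0.1, and the name `update_saboteur_belief` are all illustrative assumptions, not the paper's formalism.

```python
def update_saboteur_belief(prior, observed_defect,
                           p_defect_if_saboteur=0.8,
                           p_defect_if_honest=0.1):
    """Posterior probability that a partner is a saboteur after one observed action.

    The likelihoods are hypothetical placeholders chosen for illustration only.
    """
    # Likelihood of the observed action under each hypothesis.
    like_s = p_defect_if_saboteur if observed_defect else 1.0 - p_defect_if_saboteur
    like_h = p_defect_if_honest if observed_defect else 1.0 - p_defect_if_honest
    # Bayes' rule: P(saboteur | action) = P(action | saboteur) P(saboteur) / P(action).
    evidence = like_s * prior + like_h * (1.0 - prior)
    return like_s * prior / evidence


# Repeated-game usage: start from a low prior and update after each round.
belief = 0.05
for defected in [True, False, True, True]:
    belief = update_saboteur_belief(belief, defected)
    print(f"defected={defected}  P(saboteur)={belief:.3f}")
```

Each observed defection raises the belief and each cooperative move lowers it, so an agent in a repeated game could threshold this posterior to decide when to stop cooperating.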


Article 56

Title@2025-06-19 (4): Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections

Title: Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections Zielkonditioniertes Hierarchisches Verstärkungslernen für probeneffizientes und sicheres autonomes Fahren an Kreuzungen 以目标为条件的级级强化学习,促进在跨部门进行抽样高效和安全自主驾驶 2506.16336v1

Authors (1): Yiou Huang

Reinforcement learning (RL) exhibits remarkable potential in addressing autonomous driving tasks. However, it is difficult to train a sample-efficient and safe policy in complex scenarios. In this article, we propose a novel hierarchical reinforcement learning (HRL) framework with a goal-conditioned collision prediction (GCCP) module. In the hierarchical structure, the GCCP module predicts collision risks according to different potential subgoals of the ego vehicle. A high-level decision-maker chooses the best safe subgoal. A low-level motion-planner interacts with the environment according to the subgoal. Compared to traditional RL methods, our algorithm is more sample-efficient, since its hierarchical structure allows reusing the policies of subgoals across similar tasks for various navigation scenarios. In addition, the GCCP module’s ability to predict both the ego vehicle’s and surrounding vehicles’ future actions according to different subgoals ensures the safety of the ego vehicle throughout the decision-making process. Experimental results demonstrate that the proposed method converges to an optimal policy faster and achieves higher safety than traditional RL methods.

强化学习(RL)在应对自主驾驶任务方面具有非凡的潜力。然而,很难在复杂的情景中培训一个抽样高效和安全的政策。在本条中,我们提出一个新的等级强化学习(HRL)框架,其中包含一个有目标条件的碰撞预测模块。在等级结构中,GCCP模块根据自我驾驶工具的不同潜在子目标预测碰撞风险。一个高级决策者选择了最安全的子目标。一个低级别运动规划者根据次级目标与环境互动。与传统的RL方法相比,我们的算法更具样本效率,因为其等级结构允许在各种导航情景的类似任务中重新使用次级目标政策。此外,GCCP模块根据不同的次级目标预测自用车辆及其周围车辆未来行动的能力,确保自用车辆在整个决策过程中的安全。实验结果表明,拟议的方法与最佳政策一致,比传统的RL方法更快,并实现更高的安全性。


Article 57

Title@2025-06-19 (4): Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners

Title: Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners Beitrag anregen und auch Parameter lernen: Föderiertes Lernen mit strategischen Dateninhabern 激励贡献和学习参数:与战略数据所有者进行联邦学习 2505.12010v2

Authors (5): Drashthi Doshi, Aditya Vema Reddy Kesari, Swaprava Nath, Avishek Ghosh, Suhas S Kowshik

Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and contributing to the process, given that data collection and running a distributed algorithm is costly for the clients. The question of rationality of contribution has been asked recently in the literature and some results exist that consider this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium and simultaneously learn the model parameters. However, this equilibrium outcome can be far from the optimum, in which clients contribute their full data and the algorithm learns the optimal parameters. We propose a second mechanism with monetary transfers that is budget balanced and enables the full data contribution along with optimal parameter learning. Large scale experiments with real (federated) datasets (CIFAR-10, FeMNIST, and Twitter) show that these algorithms converge quite fast in practice, yield good welfare guarantees, and give better model performance for all agents.

古老的联邦学习(FL)假设客户自愿参与的紧张数据数量有限,有助于以有原则的方式学习全球更准确的模式; 学习以分散的方式进行,没有与中心分享数据; 然而,这些方法并不考虑参与和促进这一过程的代理商的积极性,因为数据收集和运行分布式算法对客户来说代价高昂; 最近文献中询问了捐款的合理性问题,有些成果考虑了这一问题。 本文探讨了同时学习参数和激励贡献的问题,这使其与现有文献不同。 我们的第一个机制激励每个客户在纳什均衡时为FL进程作出贡献,同时学习模型参数参数参数参数参数。 然而,这种平衡的结果可能偏离最佳,客户用其全部数据和算法来贡献其全部数据和算法学习最佳参数。 我们提出了第二个机制,即货币转移在预算模式上保持平衡,使全部数据贡献能够与最佳参数学习相结合。 大规模实验用真实(Feded)数据集(CIFAR-10、FEMNIST、FEMINST和Twitter)进行,快速地将业绩分析,所有收益率都显示良好的做法。


Article 58

Title@2025-06-19 (4): Towards Emergency Scenarios: An Integrated Decision-making Framework of Multi-lane Platoon Reorganization

Title: Towards Emergency Scenarios: An Integrated Decision-making Framework of Multi-lane Platoon Reorganization Auf dem Weg zu Notfallszenarien: Ein integrierter Entscheidungsrahmen für mehrspurige Platoon-Reorganisation 实现紧急情况设想:多lane排重组综合决策框架 2506.16311v1

Authors (5): Aijing Kong, Chengkai Xu, Xian Wu, Xinbo Chen, Peng Hang

To enhance the ability of vehicle platoons to respond to emergency scenarios, a platoon distribution reorganization decision-making framework is proposed. This framework contains a platoon distribution layer, a vehicle cooperative decision-making layer, and a vehicle planning and control layer. Firstly, a reinforcement-learning-based platoon distribution model is presented, where a risk potential field is established to quantitatively assess driving risks, and a reward function tailored to the platoon reorganization process is constructed. Then, a coalition-game-based vehicle cooperative decision-making model is put forward, modeling the cooperative relationships among vehicles by dividing them into coalitions and generating the optimal decision results for each vehicle. Additionally, a novel graph-theory-based Platoon Disposition Index (PDI) is incorporated into the game reward function to measure the platoon’s distribution state during the reorganization process, in order to accelerate the reorganization process. Finally, the proposed framework is validated in two high-risk scenarios under random traffic flows. The results show that, compared to the baseline models, the proposed method can significantly reduce the collision rate and improve driving efficiency. Moreover, the model with PDI can significantly decrease the platoon formation reorganization time and improve the reorganization efficiency.

为加强车辆排应对紧急情况的能力,提出了排级分配重组决策框架,包括排分配层、车辆合作决策层和车辆规划和控制层;首先,提出了基于强化学习排分配模式,其中为量化评估驾驶风险确定了潜在风险领域,并建立了适合排级重组过程的奖励功能;然后,提出了基于联盟的车辆合作决策模式,通过分割联盟和为每部车辆创造最佳决策结果,模拟车辆之间的合作关系;此外,在游戏奖励功能中纳入了新的图形式平板处理指数,以衡量排级在重组过程中的分布状况,以加快重组进程;最后,在随机交通流动情况下,在两种高风险情况下对拟议框架进行验证;结果显示,与基线模型相比,拟议方法可以大大减少碰撞率,提高驾驶效率;此外,与PDI相比,模式可以大大减少排编组的重组时间,提高重组效率。


Article 59

Title@2025-06-19 (4): Coordination of Electrical and Heating Resources by Self-Interested Agents

Title: Coordination of Electrical and Heating Resources by Self-Interested Agents Koordination von elektrischen und Heizressourcen durch selbstinteressierte Agenten 自利代理人协调电气和供暖资源 2506.16277v1

Authors (3): Rico Schrage, Jari Radler, Astrid Nieße

With the rise of distributed energy resources and sector coupling, distributed optimization can be a sensible approach to coordinate decentralized energy resources. Further, district heating, heat pumps, cogeneration, and sharing concepts like local energy communities introduce the potential to optimize heating and electricity output simultaneously. To solve this issue, we tackle the distributed multi-energy scheduling optimization problem, which describes the optimization of distributed energy generators over multiple time steps to reach a specific target schedule. This work describes a novel distributed hybrid algorithm as a solution approach. This approach is based on the heuristics of gossiping and local search and can simultaneously optimize the private objective of the participants and the collective objective, considering multiple energy sectors. We show that the algorithm finds globally near-optimal solutions while protecting the stakeholders’ economic goals and the plants’ technical properties. Two test cases representing pure electrical and gas-based technologies are evaluated.

随着分配式能源资源和部门组合的兴起,分配式优化可以成为协调分散式能源资源的合理方法。此外,地区供暖、热泵、热泵、热电联产和分享概念,如地方能源社区,可以同时带来优化供暖和电力输出的潜力。为了解决这个问题,我们解决分配式多能源调度优化问题,它描述了在多个时间步骤中优化分配式能源发电机以达到具体的目标时间表。这项工作将新颖的分散式混合算法描述为一种解决办法。这种方法基于流言和本地搜索的惯性,可以同时优化参与者的私人目标和集体目标,同时考虑多个能源部门。我们表明算法在保护利益攸关方的经济目标和工厂的技术特性的同时发现全球近乎最佳的解决办法。对代表纯电气和天然气技术的两个测试案例进行了评估。


Article 60

Title@2025-06-19 (4): Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems

Title: Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems Beyond Self-Talk: Eine kommunikationszentrische Untersuchung von LLM-basierten Multiagentensystemen 超越自言自语:以LLM为基础的多种机构系统的通信中心调查 2502.14321v2

Authors (9): Bingyu Yan, Zhibo Zhou, Litian Zhang, Lian Zhang, Ziyi Zhou, Dezhuang Miao, Zhoujun Li, Chaozhuo Li, Xiaoming Zhang

Large language model-based multi-agent systems have recently gained significant attention due to their potential for complex, collaborative, and intelligent problem-solving capabilities. Existing surveys typically categorize LLM-based multi-agent systems (LLM-MAS) according to their application domains or architectures, overlooking the central role of communication in coordinating agent behaviors and interactions. To address this gap, this paper presents a comprehensive survey of LLM-MAS from a communication-centric perspective. Specifically, we propose a structured framework that integrates system-level communication (architecture, goals, and protocols) with system internal communication (strategies, paradigms, objects, and content), enabling a detailed exploration of how agents interact, negotiate, and achieve collective intelligence. Through an extensive analysis of recent literature, we identify key components in multiple dimensions and summarize their strengths and limitations. In addition, we highlight current challenges, including communication efficiency, security vulnerabilities, inadequate benchmarking, and scalability issues, and outline promising future research directions. This review aims to help researchers and practitioners gain a clear understanding of the communication mechanisms in LLM-MAS, thereby facilitating the design and deployment of robust, scalable, and secure multi-agent systems.

现有调查通常根据LLM-MAS(LLMM-MAS)的应用领域或结构对基于LLMM(LLMM-MAS)的多剂系统(LLMM-MAS)进行分类,忽略通信在协调代理人行为和互动方面的中心作用。为了解决这一差距,本文件从以通信为中心的角度对LLM-MAS(LM-MAS)进行全面调查。具体地说,我们提议了一个结构化框架,将系统一级的通信(结构、目标和协议)与系统内部通信(战略、范例、目标和内容)结合起来,以便能够详细探讨代理人如何互动、谈判和实现集体情报。我们通过广泛分析最近的文献,确定多方面的关键组成部分,总结其长处和局限性。此外,我们强调当前的挑战,包括通信效率、安全脆弱性、基准不足和可扩展性问题,并概述有希望的未来研究方向。本审查的目的是帮助研究人员和从业人员对LMMAS-MAS(LMAS)中的通信机制有明确了解,从而便利设计和部署强有力、可扩展和安全的多剂系统。


Article 61

Title@2025-06-19 (4): Solving Zero-Sum Convex Markov Games

Title: Solving Zero-Sum Convex Markov Games Lösen Zero-Sum Convex Markov Spiele 解决零- 苏姆 Convex Markov 游戏 2506.16120v1

Authors (4): Fivos Kalogiannis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ian Gemp, Georgios Piliouras

We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex–hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Lojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.

我们首次给出了可证明的保证:在两人零和凸马尔可夫博弈(cMG)中,独立策略梯度方法可全局收敛到纳什均衡(NE)。凸马尔可夫博弈由 Gemp 等人(2024)新近定义,它将马尔可夫决策过程推广到多智能体场景,其偏好在占用度量上是凸的,为建模一般的策略互动提供了宽泛的框架。然而,即使是 cMG 最基本的 min-max 情形也带来了重大挑战,包括固有的非凸性、贝尔曼一致性的缺失以及无限时域的复杂性。我们采用两步方法。首先,利用隐凸–隐凹函数的性质,我们证明一种简单的非凸正则化可将该 min-max 优化问题转化为非凸–近端 Polyak-Lojasiewicz(NC-pPL)目标。关键在于,这种正则化能够稳定独立策略梯度方法的迭代,并最终使其收敛到均衡。其次,在这一归约的基础上,我们研究 NC-pPL 和双边 pPL 条件下的一般约束 min-max 问题,为随机嵌套式和交替式梯度下降-上升方法给出了首个全局收敛保证,我们认为这一结果本身也具有独立的研究价值。
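
The abstract above says that regularization stabilizes the iterates of gradient methods on a min-max problem and refers to alternating gradient descent-ascent. The toy sketch below illustrates only that general phenomenon: it runs alternating descent-ascent on f(x, y) = x*y + (mu/2)*x^2 - (mu/2)*y^2, where the quadratic terms are a simple convex-concave regularizer swapped in for illustration. It is not the paper's nonconvex regularization or its convex Markov game setting, and the function name, step size, and starting point are assumptions.

```python
def alternating_gda(mu, lr=0.1, steps=500, x=1.0, y=1.0):
    """Alternating gradient descent-ascent on f(x, y) = x*y + (mu/2)*x**2 - (mu/2)*y**2."""
    for _ in range(steps):
        x = x - lr * (y + mu * x)   # descent step for the min player (df/dx = y + mu*x)
        y = y + lr * (x - mu * y)   # ascent step for the max player, using the updated x
    return x, y


# With regularization (mu > 0) the iterates contract toward the saddle point (0, 0).
x_reg, y_reg = alternating_gda(mu=0.5)
# Without it (mu = 0) the iterates of this bilinear toy keep orbiting the saddle point.
x_raw, y_raw = alternating_gda(mu=0.0)
print("regularized:  ", x_reg, y_reg)
print("unregularized:", x_raw, y_raw)
```

In this toy, the unregularized update conserves the quantity x^2 + y^2 - lr*x*y, so the iterates stay on a closed curve around the saddle point instead of approaching it.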


Article 62

Title@2025-06-19 (4): DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents

Title: DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents DrunkAgent: Stealthy Memory Korruption in LLM-Powered Recommender Agents DrunkAgent:LLM授权建议代理人的隐性记忆腐败 2503.23804v2

Authors (8): Shiyi Yang, Zhibo Hu, Xinshu Li, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu, Lina Yao

Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling, where the memory mechanism plays a pivotal role in enabling the agents to autonomously explore, learn and self-evolve from real-world interactions. However, this very mechanism, serving as a contextual repository, inherently exposes an attack surface for potential adversarial manipulations. Despite its central role, the robustness of agentic RSs in the face of such threats remains largely underexplored. Previous works suffer from semantic mismatches or rely on static embeddings or pre-defined prompts, all of which hinder their applicability to systems with dynamic memory states. This challenge is exacerbated by the black-box nature of commercial RSs. To tackle the above problems, in this paper, we present the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents, revealing their security limitations and guiding efforts to strengthen system resilience and trustworthiness. Specifically, we propose a novel black-box attack framework named DrunkAgent. DrunkAgent crafts semantically meaningful adversarial textual triggers for target item promotions and introduces a series of strategies to maximize the trigger effect by corrupting the memory updates during the interactions. The triggers and strategies are optimized on a surrogate model, making DrunkAgent transferable and stealthy. Extensive experiments on real-world datasets across diverse agentic RSs, including collaborative filtering, retrieval augmentation and sequential recommendations, demonstrate the generalizability, transferability and stealthiness of DrunkAgent.

大型语言模型(LLM)动力剂越来越多地用于建议系统(RSs),以实现个性化行为模型,其中记忆机制在使代理人能够自主地探索、学习和从现实世界互动中自我演化方面发挥着关键作用。然而,这一机制作为背景库,本身就暴露了潜在的对抗操纵攻击表面。尽管其中心作用,在面对这种威胁时,代理人RS的强力仍然在很大程度上没有得到充分利用。以前的工作由于语义错误或依赖静态嵌入或预先定义的提示而受到影响,所有这些都阻碍了它们与动态记忆状态下的系统的适用性。由于商业RSs的黑盒性质,这一挑战更加严重。为了解决上述问题,我们在本文件中首次系统地调查LLM动力建议剂中基于记忆的脆弱性,暴露其安全局限性,指导加强系统复原力和可信度的努力。具体地说,我们提议了一个名为DrunkAent的黑箱袭击框架,或依赖静态性嵌入式嵌入或预设的提示,所有这些都妨碍了它们对于动态记忆状态的系统应用性嵌入性嵌入。


Article 63

Title@2025-06-19 (4): Decentralized Collective World Model for Emergent Communication and Coordination

Title: Decentralized Collective World Model for Emergent Communication and Coordination Dezentrales kollektives Weltmodell für Emergent Communication und Coordination 新兴通信和协调世界分散集体模式 2504.03353v2

Authors (4): Kentaro Nomura, Tatsuya Aoki, Tadahiro Taniguchi, Takato Horii

We propose a fully decentralized multi-agent world model that enables both symbol emergence for communication and coordinated behavior through temporal extension of collective predictive coding. Unlike previous research that focuses on either communication or coordination separately, our approach achieves both simultaneously. Our method integrates world models with communication channels, enabling agents to predict environmental dynamics, estimate states from partial observations, and share critical information through bidirectional message exchange with contrastive learning for message alignment. Using a two-agent trajectory drawing task, we demonstrate that our communication-based approach outperforms non-communicative models when agents have divergent perceptual capabilities, achieving the second-best coordination after centralized models. Importantly, our decentralized approach with constraints preventing direct access to other agents’ internal states facilitates the emergence of more meaningful symbol systems that accurately reflect environmental states. These findings demonstrate the effectiveness of decentralized communication for supporting coordination while developing shared representations of the environment.

我们提出一个完全分散化的多试剂世界模式,通过集体预测编码的暂时延伸,为通信和协调行为提供标志。 与以往以通信或协调分别为重点的研究不同,我们的方法是同时实现的。 我们的方法将世界模型与通信渠道相结合,使代理器能够预测环境动态,根据部分观察作出估计,并通过双向信息交换和对信息匹配的对比性学习分享重要信息。 我们用双向试剂轨迹绘制任务,证明我们的基于通信的方法在代理器具有不同的概念能力时,优于非互动模式,在集中模型之后实现第二最佳协调。 重要的是,我们采用分散化方法,限制直接接触其他代理器的内部国家,有利于出现更有意义的标志系统,准确反映环境状况。 这些发现表明分散化通信在支持协调的同时发展共同环境表述的有效性。


Article 64

Title@2025-06-19 (4): AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Title: AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment AssistantX: Ein LLM-powered Proaktiver Assistent in kollaborativer Mensch-bevölkerter Umgebung 助理X:在合作人类普惠环境方面由LLM授权的一名主动助理助理 2409.17655v3

Authors (5): Nan Sun, Bo Mao, Yongchang Li, Di Guo, Huaping Liu

Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, a need for ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments. This results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in real-world scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of 4 specialized LLM agents, dedicated respectively to perception, planning, decision-making, and reflective review, facilitating advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. We built a dataset of 210 real-world tasks to validate AssistantX, which includes instruction content and status information on whether relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. Our experiments demonstrate the effectiveness of the proposed framework, showing that AssistantX can reactively respond to user instructions, actively adjust strategies to adapt to contingencies, and proactively seek assistance from humans to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/.

目前,服务机器人的自然语言交流能力有限,严重依赖预先界定的指令,不断进行人类干预,而且最显著的是,在人类居住环境中缺乏主动的合作意识,这导致适用性狭窄和效用低。在本文中,我们引入了A助理X,这是一个具有LLM动力的主动性助理,在现实世界情景下能以高度精确的方式自主运作;A助理X采用一个由4个专门LLM代理组成的多试剂框架,每个代理都致力于认识、规划、决策和反思性审查,促进先进的推论能力和全面合作意识,就像你身边的人类助理一样。我们建立了一个由210个真实世界任务组成的数据集,以验证A助理X,其中包括教学内容和关于是否有相关人员的信息。在一个半月内,在基于文本的模拟和真正的办公环境中进行了广泛的实验。我们的实验表明拟议的框架的有效性,表明A助理X能够对用户的指示作出反应,积极调整战略以适应紧急情况,并积极寻求人类的援助以确保任务顺利完成。在https://GRIANT/ARIANTX.BIUBIG.


Article 65

Title@2025-06-19 (4): Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning

Title: Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning Rekonfigurierbare intelligente oberflächenunterstützte VEC auf Basis von Multi-Agenten-Verstärkungslernen 基于多机构强化学习的可重新配置智能表面辅助VEC 2406.11318v2

Authors (6): Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and cause communication interruptions, and thus the vehicle may not meet the requirement for task offloading. A reconfigurable intelligent surface (RIS) is introduced to support vehicle communication and provide an alternative communication path. The system performance can be improved by flexibly adjusting the phase-shift of the RIS. For an RIS-assisted VEC system where tasks arrive randomly, we design a control scheme that considers offloading power, local power allocation and phase-shift optimization. To solve this non-convex problem, we propose a new deep reinforcement learning (DRL) framework that employs a modified multi-agent deep deterministic policy gradient (MADDPG) approach to optimize the power allocation for vehicle users (VUs) and a block coordinate descent (BCD) algorithm to optimize the phase-shift of the RIS. Simulation results show that our proposed scheme outperforms the centralized deep deterministic policy gradient (DDPG) scheme and a random scheme.

车辆边缘计算(Vec)是一种新兴技术,使车辆能够通过在当地执行任务或将其卸载到附近的边缘装置完成高强度任务,但建筑物等障碍可能会降低通信质量,造成通信中断,因此车辆可能达不到卸载任务的要求。引入了可配置智能表面(RIS)以支持车辆通信并提供替代通信路径。系统性能可以通过灵活调整RIS的阶段性位来改进。对于任务随机到达的RIS辅助VEC系统,我们设计了一个考虑到卸载能力、当地电力分配和分期制优化的控制方案。为解决这一非凝固问题,我们提出了一个新的深度强化学习(DRL)框架,采用经修改的多剂深度确定性政策梯度(MADPG)方法优化车辆使用者(VUs)的动力分配和块协调后代算法,以优化RIS的阶段性位。模拟结果表明,我们拟议的计划超出了中央深度确定性政策梯度(DPG)计划和随机计划。
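
The abstract above uses block coordinate descent (BCD) to optimize the RIS phase-shift. For readers unfamiliar with the technique, here is a minimal sketch of BCD on a toy convex quadratic, f(a, b) = (a - 1)^2 + (b + 2)^2 + a*b, where each block is minimized exactly while the other is held fixed. The objective, the function name, and the sweep count are illustrative assumptions and have nothing to do with the actual RIS phase-shift objective, which the abstract describes as part of a non-convex problem.

```python
def block_coordinate_descent(sweeps=50, a=0.0, b=0.0):
    """Minimize f(a, b) = (a - 1)**2 + (b + 2)**2 + a*b one block at a time."""
    for _ in range(sweeps):
        # Exact minimizer over a with b fixed: df/da = 2(a - 1) + b = 0.
        a = 1.0 - b / 2.0
        # Exact minimizer over b with a fixed: df/db = 2(b + 2) + a = 0.
        b = -2.0 - a / 2.0
    return a, b


a_star, b_star = block_coordinate_descent()
print(a_star, b_star)
```

On this toy the alternating updates converge to the unique minimizer a = 8/3, b = -10/3, because the quadratic is strictly convex. On a non-convex objective, BCD in general only guarantees progress block by block, not global optimality.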