cs.MA @ 2025-08-01: 065
-
00 07-31 (4) GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis GenoMAS: Ein Multi-Agenten-Framework für wissenschaftliche Entdeckung durch codegetriebene Genexpressionsanalyse GenoMAS: 通过代码驱动基因表达分析科学发现多机构框架 2507.21035v2 -
01 07-31 Distributed AI Agents for Cognitive Underwater Robot Autonomy Verteilte KI-Agenten für kognitive Unterwasser-Roboterautonomie AI 用于水下认知化的代理物 2507.23735v1 -
02 07-31 A survey of multi-agent geosimulation methodologies: from ABM to LLM Eine Übersicht über die Methoden der Multi-Agenten-Geosimulation: von ABM bis LLM 多智能体地理模拟方法综述:从ABM到LLM 2507.23694v1 -
03 07-31 Barriers to Healthcare: Agent-Based Modeling to Mitigate Inequity Barrieren für die Gesundheitsversorgung: agentenbasierte Modellierung zur Verhinderung von Ungleichheiten 保健方面的障碍:基于代理的模型模型,以缩小不平等 2507.23644v1 -
04 07-31 Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v2 -
05 07-31 Chatting with your ERP: A Recipe Chatten mit Ihrem ERP: Ein Rezept 与您的 ERP 聊天: 食谱 2507.23429v1 -
06 07-31 Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation Dynamische Preisgestaltung für Bike-Sharing-Systeme über eine differenzierbare agentenbasierte Simulation 通过可微分的基于代理的模拟,为自行车共享系统设计动态定价 2507.23344v1 -
07 07-31 DSBC : Data Science task Benchmarking with Context engineering DSBC : Data Science-Aufgabe Benchmarking mit Kontext-Engineering DSBC: 数据科学任务与背景工程基准 2507.23336v1 -
08 07-31 SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination SDHN: Skewness-getriebene Hypergrafennetzwerke für verbesserte lokale Multi-Roboter-Koordination SDHN: Skewness-Driven 增强本地化多机器人协调电报网络 2504.06684v2 -
09 07-31 CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation CEE: Eine Inferenz-Zeit-Jailbreak-Verteidigung für verkörperte Intelligenz über Subraumkonzept-Rotation CEE:通过子空间概念旋转实现具身智能的推理时越狱防御 2504.13201v2 -
10 07-31 XABPs: Towards eXplainable Autonomous Business Processes XABPs: Auf dem Weg zu eXplainable Autonomous Business Processes XABPs:迈向可塑性自治商业进程 2507.23269v1 -
11 07-31 DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System DynaSwarm: Dynamische Graphenstrukturauswahl für LLM-basiertes Multi-Agent-System DynaSwarm: 以LLM为基础的多智能体系统动态图结构选择 2507.23261v1 -
12 07-31 Accessibility Scout: Personalized Accessibility Scans of Built Environments Accessibility Scout: Personalisierte Barrierefreiheit Scans von gebauten Umgebungen 无障碍童子军:个人化无障碍环境扫描仪 2507.23190v1 -
13 07-31 LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration LENS: Lernen von Ensemble-Konfidenz aus neuronalen Zuständen für Multi-LLM-Antwortintegration LENS:从神经状态学习集成置信度,用于多LLM答案整合 2507.23167v1 -
14 07-30 (3) Causal-Inspired Multi-Agent Decision-Making via Graph Reinforcement Learning Causal-Inspired Multi-Agent Entscheidungs-Making über Graph Verstärkungs-Lernen 通过图集强化学习作出因果-受激励的多机构机构决策 2507.23080v1 -
15 07-30 Bifröst: Spatial Networking with Bigraphs Bifröst: Räumliche Vernetzung mit Bigraphen Bifröst:基于双图(bigraph)的空间联网 2507.22687v1 -
16 07-30 Towards Simulating Social Influence Dynamics with LLM-based Multi-agents Auf dem Weg zur Simulation sozialer Einflussdynamik mit LLM-basierten Multi-Agenten 利用以LLM为基础的多剂模拟社会影响动态 2507.22467v1 -
17 07-30 Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework Auf dem Weg zu einer interpretierbaren Renal Health-Prognose über Multi-LMM-Kollaboratives Reasoning-Framework 通过多伦多和多伦多MM合作理由框架,迈向可解释性中时健康下降预测 2507.22464v1 -
18 07-30 The challenge of hidden gifts in multi-agent reinforcement learning Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen 多试剂强化学习中隐藏礼品的挑战 2505.20579v3 -
19 07-29 (2) Multi-Agent Path Finding Among Dynamic Uncontrollable Agents with Statistical Safety Guarantees Multi-Agent Pfad finden unter dynamischen unkontrollierbaren Agenten mit statistischen Sicherheitsgarantien 在具有统计安全保障措施的动态不可控制的代理人中寻找多机构途径 2507.22282v1 -
20 07-29 Physics-Informed EvolveGCN: Satellite Prediction for Multi Agent Systems Physik-informierte EvolveGCN: Satellitenvorhersage für Multi-Agent-Systeme 物理信息EvolveGCN:多智能体系统的卫星预测 2507.22279v1 -
21 07-29 Successor Features for Transfer in Alternating Markov Games Nachfolger Funktionen für den Transfer in Wechsel Markov Spiele 在交替的 Markov 游戏中传输的后继功能 2507.22278v1 -
22 07-29 Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions Validierung generativer agentenbasierter Modelle der Durchsetzung sozialer Normen: Von der Replikation zu neuartigen Vorhersagen 验证社会规范执行的产生代理模式:从复制到新预测 2507.22049v1 -
23 07-29 Towards Cognitive Synergy in LLM-Based Multi-Agent Systems: Integrating Theory of Mind and Critical Evaluation Auf dem Weg zu kognitiver Synergie in LLM-basierten Multiagentensystemen: Integration der Theorie des Geistes und kritische Evaluation 在以LLM为基础的多种机构系统中实现认知协同:综合思维理论和关键评价 2507.21969v1 -
24 07-29 A finite time analysis of distributed Q-learning Eine endliche Zeitanalyse des verteilten Q-Learning 分布式Q学习的有限时间分析 2405.14078v2 -
25 07-29 Agent-Based Exploration of Recommendation Systems in Misinformation Propagation Agent-based Exploration von Empfehlungssystemen in falscher Informationsverbreitung 错误信息传播中建议系统的基于代理人的探索 2507.21724v1 -
26 07-29 Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis Intrinsische Barrieren und praktische Wege für die Mensch-AI-Ausrichtung: Eine auf Vereinbarungen basierende Komplexitätsanalyse 内在障碍和人类-AI协调的实用途径:基于协定的复杂程度分析 2502.05934v2 -
27 07-29 “Teammates, Am I Clear?”: Analysing Legible Behaviours in Teams “Teamkollegen, bin ich klar?”: Legible Verhaltensmuster in Teams analysieren “Teammates,我清楚了吗?” “分析团队中可行的行为” 2507.21631v1 -
28 07-29 A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature Ein Multi-Agent-System ermöglicht vielseitige Informationsextraktion aus der chemischen Literatur 一个多机构系统能够从化学文献中提取 Versatile 信息 2507.20230v2 -
29 07-28 (1) Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems Games Agents Play: Auf dem Weg zur Transaktionsanalyse in LLM-basierten Multi-Agent-Systemen 玩游戏代理游戏:争取在基于LLM的多机构系统中进行交易分析 2507.21354v1 -
30 07-28 Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model Nachahmung des Verhaltens von Fahrern von Elektrofahrzeugen mit Hilfe eines agentengestützten Verstärkungs-Lernmodells 利用以代理为基础的强化学习模式复制电动车辆驾驶员的行为 2507.21341v1 -
31 07-28 Core Safety Values for Provably Corrigible Agents Grundlegende Sicherheitswerte für beweisbar korrigierbare Agenten 可证明可纠正的智能体的核心安全价值 2507.20964v1 -
32 07-28 Contrastive learning-based agent modeling for deep reinforcement learning Kontrastive Learning-basierte Agentenmodellierung für tiefe Verstärkungs-Lernen 用于深强化学习的反向学习代理模型模型 2401.00132v3 -
33 07-27 (7) Real-Time LaCAM for Real-Time MAPF Echtzeit-LaCAM für Echtzeit-MAPF 实时MAPF的实时拉卡姆 2504.06091v2 -
34 07-27 MLC-Agent: Cognitive Model based on Memory-Learning Collaboration in LLM Empowered Agent Simulation Environment MLC-Agent: Kognitives Modell auf Basis von Memory-Learning Collaboration in LLM Empowered Agent Simulation Environment MLC-Agent:基于LLM授权模拟环境中的记忆-学习合作的认知模型 2507.20215v1 -
35 07-27 ADL: A Declarative Language for Agent-Based Chatbots ADL: Eine deklarative Sprache für agentenbasierte Chatbots ADL: 代理查博特人的宣布语言 2504.14787v2 -
36 07-27 Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models Lokale Prompt-Anpassung für stilkonsistente Multi-Object-Generierung in Diffusions-Modellen 在传播模型中为样式一致多对象生成发布模式进行本地快速适应 2507.20094v1 -
37 07-26 (6) Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning Multi-Agenten-Verstärkungs-Lernen mit großflächiger Mixed-Traffic- und Intersektionskontrolle 利用多剂强化学习系统进行大型混合运输和跨部门控制 2504.04691v2 -
38 07-26 Homotopy-aware Multi-agent Navigation via Distributed Model Predictive Control Homotopy-aware Multi-Agent Navigation über verteilte Modell Predictive Control 通过分布式模型预测控制,通过分布式预测控制进行多剂导航 2507.19860v1 -
39 07-26 VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets VAE-GAN-basierte Preismanipulation in koordinierten lokalen Energiemärkten VAE-GAN 协调的地方能源市场价格操纵 2507.19844v1 -
40 07-26 Moving Out: Physically-grounded Human-AI Collaboration Ausstieg: physikalisch begründete Mensch-AI-Kollaboration 搬出:基于身体的人类-AI协作 2507.18623v2 -
41 07-26 Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation Assemble Your Crew: Automatisches Multi-Agenten-Kommunikationstopologie-Design über autoregressive Graphen-Generierung 组建你的团队:通过自回归图生成自动设计多智能体通信拓扑 2507.18224v2 -
42 07-25 (5) Ultracoarse Equilibria and Ordinal-Folding Dynamics in Operator-Algebraic Models of Infinite Multi-Agent Games Ultracoarse Equilibria und Ordinal-Folding-Dynamik in Operator-Algebraische Modelle von unendlichen Multi-Agent-Spiele 无限多生运动会操作者-代数模型中的超粗平衡和奥地平流和奥地硬化动态 2507.19694v1 -
43 07-25 Hypergames: Modeling Misaligned Perceptions and Nested Beliefs for Multi-agent Systems Hypergames: Modellierung falscher Wahrnehmungen und verschachtelter Überzeugungen für Multi-Agent-Systeme 超游戏:模拟多试剂系统的错误观念和信仰 2507.19593v1 -
44 07-25 MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation mit Backend Aware Syntheseoptimierung MCP4EDA: LLM 授权示范背景议定书RTL-GDSII 2507.19570v1 -
45 07-25 Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges Integration von LLM in agentenbasierte Sozialsimulation: Chancen und Herausforderungen 将LLM纳入代理社会模拟:机会与挑战 2507.19364v1 -
46 07-25 Exploring 6G Potential for Industrial Digital Twinning and Swarm Intelligence in Obstacle-Rich Environments 6G-Potenzial für industrielle digitale Twinnings und Schwarmintelligenz in Hindernis-Rich-Umgebungen erkunden 探索6G潜力,以工业数字结对和摇篮情报在奥斯塔克 – – 里希环境方面的潜力 2406.19930v3 -
47 07-25 ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination ReCoDe: Verstärktes Learning-basiertes dynamisches Constraint-Design für Multi-Agent-Koordination ReCoDe:基于强化学习的多智能体协调动态约束设计 2507.19151v1 -
48 07-25 Heterogeneous Risk Management Using a Multi-Agent Framework for Supply Chain Disruption Response Heterogenes Risikomanagement mit Hilfe eines Multi-Agenten-Rahmens für die Reaktion auf Störungen der Lieferkette 利用多机构框架应对供应链干扰的多机构应对框架进行不同不同的风险管理 2507.19049v1 -
49 07-25 Dynamic distributed decision-making for resilient resource reallocation in disrupted manufacturing systems Dynamisch verteilte Entscheidungsfindung für widerstandsfähige Ressourcenumverteilung in gestörten Fertigungssystemen 在被破坏的制造系统内进行有弹性资源重新分配的动态分配决策的动态分布式决策 2507.19043v1 -
50 07-25 A Distributed Approach for Agile Supply Chain Decision-Making Based on Network Attributes Ein verteilter Ansatz für agile Supply Chain Entscheidungsfindung auf der Grundlage von Netzwerkattributen 基于网络属性的 “ 危险供应链决策分配办法 “ 2507.19038v1 -
51 07-25 Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken 混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策 2403.10996v7 -
52 07-25 From Cloud-Native to Trust-Native: A Protocol for Verifiable Multi-Agent Systems Von Cloud-Native zu Trust-Native: Ein Protokoll für überprüfbare Multi-Agent-Systeme 从云源向信任的转移:可核证的多机构系统议定书 2507.22077v1 -
53 07-25 Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity Adaptive Cluster Zusammenarbeit steigert LLMs medizinische Entscheidungsunterstützung Kapazität LLM 医疗决策支助能力 2507.21159v1 -
54 07-25 TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search TrafficMCTS: Ein Closed-Loop Traffic Flow Generation Framework mit gruppenbasierter Monte Carlo Tree Suche 交通流量监测:一个闭路交通流量生成框架,并配有基于集团的蒙特卡洛树搜索 2308.12797v3 -
55 07-25 Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise Individueller Intrinsischer Lohn im Mehr-Agenten-Verstärkungs-Lernen durch Einbeziehung allgemeiner menschlicher Expertise 通过纳入通用的人类专门知识,学习多机构加强学习中的个人内在奖赏 2507.18867v1 -
56 07-24 (4) Toward Super Agent System with Hybrid AI Routers Auf dem Weg zum Super Agent System mit Hybrid-KI Routern 向超级代理系统过渡 2504.10519v2 -
57 07-24 Towards Multi-Agent Economies: Enhancing the A2A Protocol with Ledger-Anchored Identities and x402 Micropayments for AI Agents Auf dem Weg zu Multi-Agent Economies: Verbesserung des A2A-Protokolls mit Ledger-Anchored Identities und x402 Micropayments für KI-Agenten 朝向多机构经济体:加强A2A议定书,使用分类标志和X402向AI代理商支付微额付款 2507.19550v1 -
58 07-24 EH-Benchmark Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow EH-Benchmark Ophthalmische Halluzination Benchmark und Agent-getriebene Top-Down-Rückverfolgbarkeit Workflow EH-Benchmark Ophthalmic 幻觉基准和代理Dripreven 顶底可追踪合理理由工作流程 2507.22929v1 -
59 07-24 Remembering the Markov Property in Cooperative MARL Erinnerung an die Markov-Eigenschaft im kooperativen MARL 牢记合作式MARL中的马尔可夫性质 2507.18333v1 -
60 07-24 Designing Value-Aligned Traffic Agents through Conflict Sensitivity Gestaltung wertorientierter Verkehrsagenten durch Konfliktsensitivität 通过冲突敏感性设计符合价值的交通代理 2507.18284v1 -
61 07-24 Compositional Coordination for Multi-Robot Teams with Large Language Models Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen 具有大语言模式的多机器人小组的组成协调 2507.16068v2 -
62 07-24 A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms Eine differenzierte Prämienmethode für verstärktes Lernen auf der Grundlage von Multi-Fahrzeug-Kooperativen-Entscheidungs-Making-Algorithmen 基于多维合作社决策的强化学习有区别的奖励方法 2502.00352v2 -
63 07-24 Recognizing and Eliciting Weakly Single Crossing Profiles on Trees Erkennen und Elizitieren von schwachen einzelnen Kreuzungsprofilen auf Bäumen 承认树树和树的脆弱单一交叉概况 1611.04175v4 -
64 07-24 Multi-Agent Guided Policy Optimization Multi-Agent gesteuerte Politikoptimierung 多边机构引导政策优化政策 2507.18059v1
Article 0
Title@2025-07-31 (4): GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
Title: GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis | GenoMAS: Ein Multi-Agenten-Framework für wissenschaftliche Entdeckung durch codegetriebene Genexpressionsanalyse | GenoMAS: 通过代码驱动基因表达分析科学发现多机构框架 2507.21035v2 |
Authors (3): Haoyang Liu, Yijiang Li, Haohan Wang
Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.
基因表达分析是许多生物医学发现的关键,然而,由于需要处理多个大型半结构化文件且依赖广泛的领域专业知识,从原始转录组数据中提取洞见仍然十分艰巨。当前的自动化方法往往受限于两类问题:要么工作流程缺乏灵活性,在边缘情况下失效;要么完全自主的代理缺乏严格科学研究所需的精确度。GenoMAS另辟蹊径,提出一个由基于LLM的科学家组成的团队,将结构化工作流程的可靠性与自主代理的适应性结合起来。GenoMAS通过类型化的消息传递协议协调六个专门的LLM代理,每个代理都为共享的分析画布贡献互补的优势。GenoMAS的核心是一个引导式规划框架:编程代理将高层任务指南展开为行动单元(Action Units),并在每个节点选择推进、修订、绕过或回溯,从而在保持逻辑一致性的同时灵活适应基因组数据的特殊性。在GenoTEX基准上,GenoMAS在数据预处理方面达到89.13%的综合相似性相关度(Composite Similarity Correlation),在基因识别方面达到60.48%的F$_1$,分别比此前最佳方法高出10.61%和16.85%。除指标之外,GenoMAS还发现了得到文献佐证、具有生物学合理性的基因-表型关联,同时校正了潜在混杂因素。代码见 https://github.com/Liu-Hy/GenoMAS。
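The guided-planning loop described in this abstract (advance, revise, bypass, or backtrack over Action Units) can be illustrated with a minimal sketch. Everything named here (`Decision`, `ActionUnit`, `run_plan`, the `decide` callback) is hypothetical and not taken from the GenoMAS code base; the `decide` callback merely stands in for the LLM agent's choice at each juncture, and "bypass" is read here as "discard this unit's result and move on".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Decision(Enum):
    ADVANCE = "advance"      # accept the result and move to the next unit
    REVISE = "revise"        # run the same unit again
    BYPASS = "bypass"        # discard the result and move on anyway
    BACKTRACK = "backtrack"  # return to the previous unit


@dataclass
class ActionUnit:
    name: str
    run: Callable[[], str]   # hypothetical: executes the unit, returns a status


def run_plan(units: List[ActionUnit],
             decide: Callable[[str, str], Decision],
             max_steps: int = 50) -> List[Tuple[str, str]]:
    """Walk the units in order; `decide` stands in for the agent's choice."""
    trace: List[Tuple[str, str]] = []
    i = 0
    while i < len(units) and len(trace) < max_steps:
        unit = units[i]
        result = unit.run()
        decision = decide(unit.name, result)
        trace.append((unit.name, decision.value))
        if decision in (Decision.ADVANCE, Decision.BYPASS):
            i += 1
        elif decision is Decision.BACKTRACK:
            i = max(i - 1, 0)
        # Decision.REVISE leaves i unchanged, so the unit runs again.
    return trace


# Scripted decisions replace the LLM for illustration only.
script = iter([Decision.ADVANCE, Decision.REVISE, Decision.ADVANCE,
               Decision.BACKTRACK, Decision.ADVANCE, Decision.ADVANCE])
units = [ActionUnit("load", lambda: "ok"),
         ActionUnit("normalize", lambda: "ok"),
         ActionUnit("map_genes", lambda: "ok")]
trace = run_plan(units, lambda name, result: next(script))
print(trace)
```

The `max_steps` cap is a design choice of this sketch: it bounds the loop in case the decision policy keeps revising or backtracking.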
Article 1
Title@2025-07-31 (4): Distributed AI Agents for Cognitive Underwater Robot Autonomy
Title: Distributed AI Agents for Cognitive Underwater Robot Autonomy | Verteilte KI-Agenten für kognitive Unterwasser-Roboterautonomie | AI 用于水下认知化的代理物 2507.23735v1 |
Authors (4): Markus Buchholz, Ignacio Carlucho, Michele Grimaldi, Yvan R. Petillot
Achieving robust cognitive autonomy in robots navigating complex, unpredictable environments remains a fundamental challenge in robotics. This paper presents Underwater Robot Self-Organizing Autonomy (UROSA), a groundbreaking architecture leveraging distributed Large Language Model AI agents integrated within the Robot Operating System 2 (ROS 2) framework to enable advanced cognitive capabilities in Autonomous Underwater Vehicles. UROSA decentralises cognition into specialised AI agents responsible for multimodal perception, adaptive reasoning, dynamic mission planning, and real-time decision-making. Central innovations include flexible agents dynamically adapting their roles, retrieval-augmented generation utilising vector databases for efficient knowledge management, reinforcement learning-driven behavioural optimisation, and autonomous on-the-fly ROS 2 node generation for runtime functional extensibility. Extensive empirical validation demonstrates UROSA’s promising adaptability and reliability through realistic underwater missions in simulation and real-world deployments, showing significant advantages over traditional rule-based architectures in handling unforeseen scenarios, environmental uncertainties, and novel mission objectives. This work not only advances underwater autonomy but also establishes a scalable, safe, and versatile cognitive robotics framework capable of generalising to a diverse array of real-world applications.
在复杂、不可预测的环境中航行的机器人实现稳健的认知自主,仍然是机器人学的一项根本挑战。本文介绍水下机器人自组织自主(UROSA),这是一种开创性架构,利用集成在机器人操作系统2(ROS 2)框架内的分布式大型语言模型AI代理,使自主水下航行器具备先进的认知能力。UROSA将认知分散到负责多模态感知、自适应推理、动态任务规划和实时决策的专门AI代理中。核心创新包括:可动态调整自身角色的灵活代理、利用向量数据库进行高效知识管理的检索增强生成、强化学习驱动的行为优化,以及为运行时功能扩展而自主即时生成ROS 2节点。广泛的实证验证通过模拟和实际部署中的真实水下任务,展示了UROSA良好的适应性和可靠性,表明其在处理意外情形、环境不确定性和新任务目标方面明显优于传统的基于规则的架构。这项工作不仅推进了水下自主性,而且还建立了一个可扩展、安全、多用途的认知机器人框架,能够推广到多种现实世界应用。
Article 2
Title@2025-07-31 (4): A survey of multi-agent geosimulation methodologies: from ABM to LLM
Title: A survey of multi-agent geosimulation methodologies: from ABM to LLM | Eine Übersicht über die Methoden der Multi-Agenten-Geosimulation: von ABM bis LLM | 多智能体地理模拟方法综述:从ABM到LLM 2507.23694v1 |
Authors (2): Virginia Padilla, Jacinto Dávila
We provide a comprehensive examination of agent-based approaches that codify the principles and linkages underlying multi-agent systems, simulations, and information systems. Based on two decades of study, this paper confirms a framework intended as a formal specification for geosimulation platforms. Our findings show that large language models (LLMs) can be effectively incorporated as agent components if they follow a structured architecture specific to fundamental agent activities such as perception, memory, planning, and action. This integration is precisely consistent with the architecture that we formalize, providing a solid platform for next-generation geosimulation systems.
我们全面考察了基于代理的方法,这些方法将多智能体系统、模拟和信息系统背后的原则与联系加以系统化。根据20年的研究,本文件确认一个框架,作为地理模拟平台的正式规格。我们的研究结果显示,大型语言模型(LLMs)如果遵循感知、记忆、规划和行动等基本代理活动特有的结构架构,就可以有效地作为代理组成部分纳入其中。这种整合完全符合我们正式确定的结构,为下一代地理模拟系统提供了一个坚实的平台。
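The perception, memory, planning, and action cycle named in this abstract can be sketched as a small agent class. This is only an illustration under stated assumptions: `GeoAgent`, `prefer_unvisited`, and the grid world are hypothetical names invented here, not the formal specification from the paper. The `planner` slot is where an LLM call could be wrapped; a rule-based planner is used so the sketch stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Position = Tuple[int, int]


def prefer_unvisited(options: List[Position], memory: List[Position]) -> Position:
    """Rule-based planner: pick the first option not yet visited, else the first option."""
    fresh = [o for o in options if o not in memory]
    return (fresh or options)[0]


@dataclass
class GeoAgent:
    position: Position
    planner: Callable[[List[Position], List[Position]], Position]  # could wrap an LLM
    memory: List[Position] = field(default_factory=list)

    def perceive(self, world: Set[Position]) -> List[Position]:
        """Perception: neighbouring cells that exist in the world."""
        x, y = self.position
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [c for c in candidates if c in world]

    def step(self, world: Set[Position]) -> Position:
        options = self.perceive(world)               # perception
        self.memory.append(self.position)            # memory
        if options:
            target = self.planner(options, self.memory)  # planning
        else:
            target = self.position
        self.position = target                       # action
        return target


world = {(0, 0), (1, 0), (2, 0)}
agent = GeoAgent(position=(0, 0), planner=prefer_unvisited)
visited = [agent.step(world) for _ in range(3)]
print(visited)
```

Keeping the planner as a plain callable means the same agent loop runs unchanged whether planning is a hand-written rule or a language-model call.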
Article 3
Title@2025-07-31 (4): Barriers to Healthcare: Agent-Based Modeling to Mitigate Inequity
Title: Barriers to Healthcare: Agent-Based Modeling to Mitigate Inequity | Barrieren für die Gesundheitsversorgung: agentenbasierte Modellierung zur Verhinderung von Ungleichheiten | 保健方面的障碍:基于代理的模型模型,以缩小不平等 2507.23644v1 |
Authors (3): Alba Aguilera, Georgina Curto, Nardine Osman
Agent-based simulations have an enormous potential as tools to evaluate social policies in a non-invasive way, before these are implemented to real-world populations. However, the recommendations that these computational approaches may offer to tackle urgent human development challenges can vary substantially depending on how we model agents’ (people) behaviour and the criteria that we use to measure inequity. In this paper, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to guide the simulation and evaluate the effectiveness of policies. We define a reinforcement learning environment where agents behave to restore their capabilities under the constraints of a specific policy. Working in collaboration with local stakeholders, non-profits and domain experts, we apply our model in a case study to mitigate health inequity among the population experiencing homelessness (PEH) in Barcelona. By doing so, we present the first proof of concept simulation, aligned with the CA for human development, to assess the impact of policies under parliamentary discussion.
以代理为基础的模拟具有巨大的潜力,可以作为工具,以非侵入方式评价社会政策,然后才能对现实世界的人口加以执行。然而,这些计算方法为解决人类发展方面的紧迫挑战而可能提供的建议可能大不相同,这取决于我们如何模拟代理人(人民)的行为,以及我们用来衡量不平等的标准。在本文件中,我们综合了能力方法的概念框架,该能力方法明确是为了促进和评估人类福祉,指导模拟和评价政策的有效性。我们定义了强化学习环境,使代理人在特定政策的限制下采取行动恢复其能力。我们与当地利益有关者、非营利者和领域专家合作,在一项案例研究中运用我们的模型,以减轻巴塞罗那无家可归人口的健康不平等。我们这样做是为了提出概念模拟的第一个证据,与人类发展的CA相一致,以评估议会讨论下的政策的影响。
Article 4
Title@2025-07-31 (4): Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding
Title: Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding | 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v2 |
Authors (7): Shiyue Wang, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, Wenhao Li
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
多智能体路径规划(MAPF)是人工智能和机器人学中的一个基本问题,需要为多个代理计算从起始位置到指定目标的无碰撞路径。随着自主系统在仓库、城市交通和其他复杂环境中日益普及,MAPF已经从理论挑战演变为现实世界多机器人协调的关键推动因素。这项全面综述弥合了MAPF研究中经典算法方法与新兴的基于学习的方法之间长期存在的鸿沟。我们提出了一个统一框架,涵盖基于搜索的方法(包括基于冲突的搜索、基于优先级的搜索和大邻域搜索)、基于编译的方法(SAT、SMT、CSP、ASP和MIP建模)以及数据驱动技术(强化学习、监督学习和混合策略)。通过对200多篇论文的实验做法进行系统分析,我们发现评价方法存在显著差异:经典方法通常在更大规模的实例上测试(最大为200×200网格、1000个以上代理),而基于学习的方法主要使用10至100个代理。我们提供了评价指标、环境类型和基线选择的全面分类,强调需要标准化的基准测试协议。最后,我们概述了有前景的未来方向,包括结合博弈论考量的混合动机MAPF、借助大型语言模型的语言驱动规划,以及将经典方法的严谨性与深度学习的灵活性相结合的神经求解器架构。本综述既可作为研究人员的全面参考,也可作为在日益复杂的现实应用中部署MAPF解决方案的实用指南。
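To make "collision-free paths" concrete, the sketch below checks a set of timed paths for the two conflict types commonly used in the MAPF literature: vertex conflicts (two agents in the same cell at the same timestep) and edge or swap conflicts (two agents exchanging cells in one step). The function name `find_conflicts` and the convention that an agent waits at its goal after finishing are assumptions of this sketch, not definitions taken from the survey.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def find_conflicts(paths: Dict[str, List[Cell]]) -> List[tuple]:
    """Return (kind, agent_a, agent_b, timestep) for every conflict found.

    paths maps an agent name to its cell at each timestep. Agents are
    assumed to wait at their last cell once their path ends.
    """
    horizon = max(len(p) for p in paths.values())

    def at(path: List[Cell], t: int) -> Cell:
        return path[min(t, len(path) - 1)]

    conflicts = []
    for a, b in combinations(sorted(paths), 2):
        pa, pb = paths[a], paths[b]
        for t in range(horizon):
            # Vertex conflict: same cell at the same time.
            if at(pa, t) == at(pb, t):
                conflicts.append(("vertex", a, b, t))
            # Edge conflict: the two agents swap cells between t and t + 1.
            if (t + 1 < horizon
                    and at(pa, t) != at(pa, t + 1)
                    and at(pa, t) == at(pb, t + 1)
                    and at(pa, t + 1) == at(pb, t)):
                conflicts.append(("edge", a, b, t))
    return conflicts


head_on = {"a1": [(0, 0), (0, 1), (0, 2)], "a2": [(0, 2), (0, 1), (0, 0)]}
swap = {"a1": [(0, 0), (0, 1)], "a2": [(0, 1), (0, 0)]}
parallel = {"a1": [(0, 0), (0, 1)], "a2": [(1, 0), (1, 1)]}
print(find_conflicts(head_on), find_conflicts(swap), find_conflicts(parallel))
```

A solution is collision-free under these two rules exactly when the returned list is empty. Search-based solvers such as Conflict-Based Search resolve conflicts of this kind by adding constraints and replanning.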
Article 5
Title@2025-07-31 (4): Chatting with your ERP: A Recipe
Title: Chatting with your ERP: A Recipe | Chatten mit Ihrem ERP: Ein Rezept | 与您的 ERP 聊天: 食谱 2507.23429v1 |
Authors (5): Jorge Ruiz Gómez, Lidia Andrés Susinos, Jorge Alamo Olivé, Sonia Rey Osorno, Manuel Luis Gonzalez Hernández
This paper presents the design, implementation, and evaluation behind a Large Language Model (LLM) agent that chats with an industrial production-grade ERP system. The agent is capable of interpreting natural language queries and translating them into executable SQL statements, leveraging open-weight LLMs. A novel dual-agent architecture combining reasoning and critique stages was proposed to improve query generation reliability.
本文介绍了一个与工业生产级ERP系统对话的大型语言模型(LLM)代理的设计、实现和评估。该代理能够解释自然语言查询,并利用开放权重LLM将其转换为可执行的SQL语句。文中提出了一种结合推理和批评阶段的新型双代理架构,以提高查询生成的可靠性。
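The reasoning-plus-critique arrangement described in this abstract can be sketched as a propose/check loop around a database. This is a minimal illustration, not the paper's system: `propose_sql` is a hypothetical stub standing in for the reasoning LLM, `critique_sql` is a hypothetical rule-based stand-in for the critique stage, and the `orders` table is invented. Only standard `sqlite3` calls are used.

```python
import sqlite3
from typing import List, Optional, Tuple


def propose_sql(question: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the reasoning LLM: returns a SQL draft."""
    if feedback is None:
        # First draft contains a deliberate mistake (misspelt table name).
        return "SELECT name, qty FROM ordres WHERE qty > 10"
    return "SELECT name, qty FROM orders WHERE qty > 10"


def critique_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Stand-in for the critique stage: None if acceptable, else feedback."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return "only read-only SELECT statements are allowed"
    try:
        # Compiles the statement against the schema without running it.
        conn.execute("EXPLAIN QUERY PLAN " + sql)
    except sqlite3.Error as exc:
        return f"database rejected the query: {exc}"
    return None


def answer(conn: sqlite3.Connection, question: str,
           max_rounds: int = 3) -> Tuple[str, List[tuple]]:
    feedback = None
    for _ in range(max_rounds):
        sql = propose_sql(question, feedback)
        feedback = critique_sql(conn, sql)
        if feedback is None:
            return sql, conn.execute(sql).fetchall()
    raise RuntimeError(f"no acceptable query after {max_rounds} rounds: {feedback}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("bolt", 5), ("nut", 20)])
sql, rows = answer(conn, "Which order lines exceed 10 units?")
print(sql, rows)
```

Validating with `EXPLAIN QUERY PLAN` before execution is a design choice of this sketch: a rejected draft never touches the data, and the database's own error message becomes the feedback for the next round.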
Article 6
Title@2025-07-31 (4): Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation
Title: Designing Dynamic Pricing for Bike-sharing Systems via Differentiable Agent-based Simulation | Dynamische Preisgestaltung für Bike-Sharing-Systeme über eine differenzierbare agentenbasierte Simulation | 通过可微分的基于代理的模拟,为自行车共享系统设计动态定价 2507.23344v1 |
Authors (4): Tatsuya Mitomi, Fumiyasu Makinoshima, Fumiya Makihara, Eigo Segawa
Bike-sharing systems are emerging in various cities as a new ecofriendly transportation system. In these systems, spatiotemporally varying user demands lead to imbalanced inventory at bicycle stations, resulting in additional relocation costs. Therefore, it is essential to manage user demand through optimal dynamic pricing for the system. However, optimal pricing design for such a system is challenging because the system involves users with diverse backgrounds and their probabilistic choices. To address this problem, we develop a differentiable agent-based simulation to rapidly design dynamic pricing in bike-sharing systems, achieving balanced bicycle inventory despite spatiotemporally heterogeneous trips and probabilistic user decisions. We first validate our approach against conventional methods through numerical experiments involving 25 bicycle stations and five time slots, yielding 100 parameters. Compared to the conventional methods, our approach obtains a more accurate solution with a 73% to 78% reduction in loss while achieving more than a 100-fold increase in convergence speed. We further validate our approach on a large-scale urban bike-sharing system scenario involving 289 bicycle stations, resulting in a total of 1156 parameters. Through simulations using the obtained pricing policies, we confirm that these policies can naturally induce balanced inventory without any manual relocation. Additionally, we find that the cost of discounts to induce the balanced inventory can be minimized by setting appropriate initial conditions.
自行车共享系统作为一种新的环保交通系统正在各个城市兴起。在这些系统中,随时空变化的用户需求导致自行车站点库存失衡,从而产生额外的调度成本。因此,通过最优动态定价来管理用户需求至关重要。然而,这类系统的最优定价设计具有挑战性,因为系统涉及背景各异的用户及其概率性选择。为解决这一问题,我们开发了一种可微分的基于代理的模拟,用于快速设计自行车共享系统的动态定价,在出行具有时空异质性、用户决策具有概率性的情况下仍能实现均衡的自行车库存。我们首先通过涉及25个自行车站点和5个时段、共100个参数的数值实验,将我们的方法与常规方法进行对比验证。与常规方法相比,我们的方法获得了更精确的解,损失降低73%至78%,同时收敛速度提高100倍以上。我们进一步在涉及289个自行车站点、共1156个参数的大规模城市自行车共享场景中验证了该方法。通过使用所得定价策略进行模拟,我们确认这些策略无需任何人工调度即可自然实现库存均衡。此外,我们发现,通过设置适当的初始条件,可以将为实现库存均衡而提供折扣的成本降至最低。
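The idea of tuning prices by differentiating through a simulation can be shown on a toy case. The sketch below is a heavy simplification and not the paper's model: it has a single price parameter, replaces sampled user choices with their expectation under a logit choice rule, and uses a hand-derived gradient rather than automatic differentiation. The names `expected_pickups` and `loss_and_grad`, and the values of `n_users`, `beta`, and `target`, are all assumptions made for illustration.

```python
import math


def expected_pickups(discount: float, n_users: int = 100, beta: float = 2.0):
    """Expected number of users choosing station A when offered `discount` there.

    Each user picks A with logit probability p = 1 / (1 + exp(-beta * discount)).
    """
    p_a = 1.0 / (1.0 + math.exp(-beta * discount))
    return n_users * p_a, p_a


def loss_and_grad(discount: float, target: float = 70.0,
                  n_users: int = 100, beta: float = 2.0):
    """Squared gap between expected pickups and the target, with its derivative."""
    pickups, p_a = expected_pickups(discount, n_users, beta)
    loss = (pickups - target) ** 2
    # Chain rule: d(pickups)/d(discount) = n_users * beta * p * (1 - p).
    grad = 2.0 * (pickups - target) * n_users * beta * p_a * (1.0 - p_a)
    return loss, grad


discount = 0.0
for _ in range(200):
    loss, grad = loss_and_grad(discount)
    discount -= 1e-4 * grad   # plain gradient descent on the price parameter

print(discount, loss)
```

Because the loss is a smooth function of the price, gradient descent can be used directly instead of a derivative-free search over price settings. In this toy case the optimum is also available in closed form as `log(0.7 / 0.3) / beta`, which makes the result easy to check.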
Article 7
Title@2025-07-31 (4): DSBC : Data Science task Benchmarking with Context engineering
Title: DSBC : Data Science task Benchmarking with Context engineering | DSBC : Data Science-Aufgabe Benchmarking mit Kontext-Engineering | DSBC: 数据科学任务与背景工程基准 2507.23336v1 |
Authors (6): Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Giulio Martini, Suman Debnath, Hamza Farooq
Recent advances in large language models (LLMs) have significantly impacted data science workflows, giving rise to specialized data science agents designed to automate analytical tasks. Despite rapid adoption, systematic benchmarks evaluating the efficacy and limitations of these agents remain scarce. In this paper, we introduce a comprehensive benchmark specifically crafted to reflect real-world user interactions with data science agents by observing usage of our commercial applications. We evaluate three LLMs: Claude-4.0-Sonnet, Gemini-2.5-Flash, and OpenAI-o4-Mini across three approaches: zero-shot with context engineering, multi-step with context engineering, and with SmolAgent. Our benchmark assesses performance across a diverse set of eight data science task categories, additionally exploring the sensitivity of models to common prompting issues, such as data leakage and slightly ambiguous instructions. We further investigate the influence of temperature parameters on overall and task-specific outcomes for each model and approach. Our findings reveal distinct performance disparities among the evaluated models and methodologies, highlighting critical factors that affect practical deployment. The benchmark dataset and evaluation framework introduced herein aim to provide a foundation for future research of more robust and effective data science agents.
大型语言模型(LLMS)的最近进展对数据科学工作流程产生了重大影响,产生了专门的数据科学代理物,目的是实现分析任务的自动化。尽管迅速采用,但评估这些代理物的功效和局限性的系统基准仍然很少。在本文件中,我们引入了一个全面基准,专门通过观察我们商业应用的使用情况来反映实际用户与数据科学代理物的相互作用。我们评估了三个LLMs:Claude-4.0-Sonnet、Gemini-2.5-Flash和OpenAI-o4-Mini,这三种方法包括:环境工程零弹射、环境工程多步和SmolAgency。我们的基准评估了八个数据科学任务类别的业绩,另外探讨了模型对共同提示问题的敏感性,例如数据泄漏和略微模糊的指示。我们进一步调查了温度参数对每个模型和方法的总体和具体任务结果的影响。我们的调查结果揭示了评价模型和方法之间不同的业绩差异,突出了影响实际应用的关键因素。我们在此介绍的基准数据集和评价框架的目的是为未来研究更可靠和有效的数据科学代理物提供基础。
Article 8
Title@2025-07-31 (4): SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination
Title: SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination | SDHN: Skewness-getriebene Hypergrafennetzwerke für verbesserte lokale Multi-Roboter-Koordination | SDHN: Skewness-Driven 增强本地化多机器人协调电报网络 2504.06684v2 |
Authors (6): Delin Zhao, Yanbo Shan, Chang Liu, Shenghang Lin, Yingxin Shou, Bin Xu
Multi-Agent Reinforcement Learning is widely used for multi-robot coordination, where simple graphs typically model pairwise interactions. However, such representations fail to capture higher-order collaborations, limiting effectiveness in complex tasks. While hypergraph-based approaches enhance cooperation, existing methods often generate arbitrary hypergraph structures and lack adaptability to environmental uncertainties. To address these challenges, we propose the Skewness-Driven Hypergraph Network (SDHN), which employs stochastic Bernoulli hyperedges to explicitly model higher-order multi-robot interactions. By introducing a skewness loss, SDHN promotes an efficient structure with Small-Hyperedge Dominant Hypergraph, allowing robots to prioritize localized synchronization while still adhering to the overall information, similar to human coordination. Extensive experiments on Moving Agents in Formation and Robotic Warehouse tasks validate SDHN’s effectiveness, demonstrating superior performance over state-of-the-art baselines.
多代理强化学习方案被广泛用于多机器人协调,其中简单的图表一般是模拟双向互动的模型,但是,这种表述未能捕捉高阶协作,限制了复杂任务的效力。虽然基于高空方法可以加强合作,但现有方法往往产生专横的高压结构,缺乏对环境不确定性的适应性。为了应对这些挑战,我们建议使用Skewness-Driven超光速网络(SDHN),利用Skewness-Driven超光速网络(SDHN)来明确模拟高阶多式机器人互动。通过引入一个偏差损失,SDHN促进与小型顶部域域图的高效结构,允许机器人将本地同步列为优先事项,同时仍然坚持总体信息,类似于人类协调。关于形成和机器人仓库任务中的移动工具的广泛实验可以验证SDHN的效能,表明其优于最先进的基线。
Article 9
Title@2025-07-31 (4): CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
Title: CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation | CEE: Eine Inferenz-Zeit-Jailbreak-Verteidigung für eingedrungene Intelligenz über Subraumkonzept-Rotation | 中东欧:通过子空间概念旋转对潜入式情报进行推论-时间破狱防御 2504.13201v2 |
Authors (8): Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang
Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as robots and autonomous vehicles. However, this integration also exposes them to serious jailbreak risks, where malicious instructions can be transformed into dangerous physical actions. Existing defense mechanisms suffer from notable drawbacks–including high training costs, significant inference delays, and complex hyperparameter tuning–which limit their practical applicability. To address these challenges, we propose a novel and efficient inference-time defense framework: Concept Enhancement Engineering (CEE). CEE enhances the model’s inherent safety mechanisms by directly manipulating its internal representations, requiring neither additional training nor external modules, thereby improving defense efficiency. Furthermore, CEE introduces a rotation-based control mechanism that enables stable and linearly tunable behavioral control of the model. This design eliminates the need for tedious manual tuning and avoids the output degradation issues commonly observed in other representation engineering methods. Extensive experiments across multiple EI safety benchmarks and diverse attack scenarios demonstrate that CEE significantly improves the defense success rates of various multimodal LLMs. It effectively mitigates safety risks while preserving high-quality generation and inference efficiency, offering a promising solution for deploying safer embodied intelligence systems.
大型语言模型(LLMS)正日益成为机器人和自主车辆等Ebodied Intell(EI)系统的认知核心;然而,这种整合还使其面临严重的越狱风险,恶意指令可转化为危险的物理行为;现有防御机制存在明显的缺陷,包括高培训成本、严重推论拖延和复杂的超参数调,限制了其实际适用性;为应对这些挑战,我们提议了一个创新和有效的推论时间防御框架:概念增强工程。中东欧通过直接调整内部代表方式,既不需要额外的培训,也不需要外部模块,从而提高防御效率,加强了模型的固有安全机制。此外,中东欧还引入了基于轮换的控制机制,使该模型能够稳定和线性地对金枪鱼行为进行控制。这一设计消除了对老调和避免其他代表工程方法常见的产出退化问题的需求。跨多个EI安全基准和不同攻击情景的广泛实验表明,中东欧通过直接操纵其内部代表方式,既不需要额外的培训,也不需要额外的外部模块,从而能够提高国防成功率,从而提高防御效率。此外,中东欧还引入了基于高品质的系统,有效地降低安全风险。
Article 10
Title@2025-07-31 (4): XABPs: Towards eXplainable Autonomous Business Processes
Title: XABPs: Towards eXplainable Autonomous Business Processes | XABPs: Auf dem Weg zu eXplainable Autonomous Business Processes | XABPs:迈向可塑性自治商业进程 2507.23269v1 |
Authors (6): Peter Fettke, Fabiana Fournier, Lior Limonad, Andreas Metzger, Stefanie Rinderle-Ma, Barbara Weber
Autonomous business processes (ABPs), i.e., self-executing workflows leveraging AI/ML, have the potential to improve operational efficiency, reduce errors, lower costs, improve response times, and free human workers for more strategic and creative work. However, ABPs may raise specific concerns including decreased stakeholder trust, difficulties in debugging, hindered accountability, risk of bias, and issues with regulatory compliance. We argue for eXplainable ABPs (XABPs) to address these concerns by enabling systems to articulate their rationale. The paper outlines a systematic approach to XABPs, characterizing their forms, structuring explainability, and identifying key BPM research challenges towards XABPs.
自主业务流程,即利用AI/ML的自动执行工作流程,有可能提高业务效率,减少错误,降低成本,改进反应时间,让工人免费从事更具战略性和创造性的工作,但是,ABP可能会引起具体关切,包括利益攸关方信任度降低、调试困难、问责制受到阻碍、偏见风险和监管合规问题。我们主张采用exiveABP(XABP)系统来解决这些问题,使系统能够说明其理由。该文件概述了对XABP的系统做法,说明其形式,安排解释性,并确定BPM对XABP的主要研究挑战。
Article 11
Title@2025-07-31 (4): DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System
Title: DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System | DynaSwarm: Dynamische Graphenstrukturauswahl für LLM-basiertes Multi-Agent-System | DynSwarm: 以LLM为基础的多剂系统动态图结构选择 2507.23261v1 |
Authors (2): Hui Yi Leong, Yuqing Wu
Current multi-agent systems (MAS) frameworks often rely on manually designed and static collaboration graph structures, limiting adaptability and performance. To address these limitations, we propose DynaSwarm, a dynamic framework that enhances LLM-based MAS through two key innovations: (1) an actor-critic reinforcement learning (A2C) mechanism to optimize graph structures with improved stability over prior RL methods, and (2) a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample via parameter-efficient LLM fine-tuning. DynaSwarm eliminates the need for rigid, one-size-fits-all graph architectures, instead leveraging sample-specific idiosyncrasies to dynamically route queries through specialized agent networks. In addition, we propose to fine-tune the demonstration retriever to fully exploit the power of in-context learning (ICL). Extensive experiments on question answering, mathematical reasoning, and coding tasks demonstrate that DynaSwarm consistently outperforms state-of-the-art single-agent and MAS baselines across multiple LLM backbones. Our findings highlight the importance of sample-aware structural flexibility in LLM MAS designs.
目前的多试剂系统框架往往依赖于人工设计和静态协作图结构,限制了适应性和性能。为解决这些限制,我们提议Dynaswarm,这是一个动态框架,通过两个关键的创新,加强以LLM为基础的MAS,加强LM的LMS:(1) 行为者-北极强化学习(A2C)机制,优化图形结构,使其比先前RL方法更加稳定;(2) 动态图形选择器,通过参数高效LLM微调,适应性地选择每个输入样本的最佳图形结构。Dynaswarm消除了对硬性、一刀切的图形结构的需要,而不用通过专门的LMM主干网利用样本的特征合成来动态地查询。 (c) 我们建议对演示检索器进行微调,以充分利用文本学习的力量(ICLL),对问题回答、数学推理和共同任务进行广泛的实验,表明Dynaswarm始终超越LMM系统多个主干网的S-awa结构。我们的调查结果强调了LMMAS设计中样品结构灵活性的重要性。
Article 12
Title@2025-07-31 (4): Accessibility Scout: Personalized Accessibility Scans of Built Environments
Title: Accessibility Scout: Personalized Accessibility Scans of Built Environments | Accessibility Scout: Personalisierte Barrierefreiheit Scans von gebauten Umgebungen | 无障碍童子军:个人化无障碍环境扫描仪 2507.23190v1 |
Authors (4): William Huang, Xia Su, Jon E. Froehlich, Yang Zhang
Assessing the accessibility of unfamiliar built environments is critical for people with disabilities. However, manual assessments, performed by users or their personal health professionals, are laborious and unscalable, while automatic machine learning methods often neglect an individual user’s unique needs. Recent advances in Large Language Models (LLMs) enable novel approaches to this problem, balancing personalization with scalability to enable more adaptive and context-aware assessments of accessibility. We present Accessibility Scout, an LLM-based accessibility scanning system that identifies accessibility concerns from photos of built environments. With use, Accessibility Scout becomes an increasingly capable “accessibility scout”, tailoring accessibility scans to an individual’s mobility level, preferences, and specific environmental interests through collaborative Human-AI assessments. We present findings from three studies: a formative study with six participants to inform the design of Accessibility Scout, a technical evaluation of 500 images of built environments, and a user study with 10 participants of varying mobility. Results from our technical evaluation and user study show that Accessibility Scout can generate personalized accessibility scans that extend beyond traditional ADA considerations. Finally, we conclude with a discussion on the implications of our work and future steps for building more scalable and personalized accessibility assessments of the physical world.
对不熟悉的建筑环境的无障碍环境进行评估对于残疾人来说至关重要,然而,由用户或其个人保健专业人员进行的手工评估是艰苦和无法扩展的,而自动机器学习方法往往忽视个人用户的独特需要。在大语言模型(LLMS)方面最近的进展使得能够对这个问题采取新的办法,在个性化和可扩展性之间取得平衡,以便能够对无障碍环境进行更适应性和符合背景的评估。我们介绍了基于LLM的无障碍访问扫描系统,即基于LLM的无障碍访问扫描系统,该系统可以查明建筑环境照片对无障碍环境的关切。随着使用,无障碍童子军变得越来越能够“无障碍探测”,通过人类-AI合作评估,将无障碍扫描与个人的行动水平、偏好和具体环境利益进行定制。我们介绍了三项研究的结果:由6名参与者组成的成型研究,为无障碍童军的设计提供信息,对500幅建筑环境的图像进行技术评估,以及10名不同流动性的参与者进行用户研究。我们的技术评价和用户研究的结果显示,无障碍童军可以产生超越传统AD考虑的个性化无障碍扫描。最后,我们讨论了我们的工作的影响以及未来步骤,以建立更可扩展的无障碍环境的无障碍环境的无障碍世界评估。
Article 13
Title@2025-07-31 (4): LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration
Title: LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration | LENS: Lerne Ensemble Vertrauen aus neuralen Staaten für Multi-LLM-Antwortintegration | LENS:从神经国家学习多LLM应答整合的集合信任 2507.23167v1 |
Authors (1): Jizhou Guo
Large Language Models (LLMs) have demonstrated impressive performance across various tasks, with different models excelling in distinct domains and specific abilities. Effectively combining the predictions of multiple LLMs is crucial for enhancing system robustness and performance. However, existing ensemble methods often rely on simple techniques like voting or logits ensembling, which overlook the varying confidence and reliability of models in different contexts. In this work, we propose LENS (Learning ENsemble confidence from Neural States), a novel approach that learns to estimate model confidence by analyzing internal representations. For each LLM, we train a lightweight linear confidence predictor that leverages layer-wise hidden states and normalized probabilities as inputs. This allows for more nuanced weighting of model predictions based on their context-dependent reliability. Our method does not require modifying the model parameters and requires negligible additional computation. Experimental results on multiple-choice and boolean question-answering tasks demonstrate that LENS outperforms traditional ensemble methods by a substantial margin. Our findings suggest that internal representations provide valuable signals for determining model confidence and can be effectively leveraged for ensemble learning.
大型语言模型(LLMS)在各种任务中表现出了令人印象深刻的成绩,不同模型在不同的领域和具体能力方面表现得不同。有效地结合对多个LLMS的预测对于提高系统稳健性和性能至关重要。然而,现有的混合方法往往依赖简单的技术,如投票或登录组合,这些技术忽视了不同情况下模型的不同信心和可靠性。在这项工作中,我们提议LENS(从神经国学习可综合信任),这是一种新颖的方法,通过分析内部代表来评估模型的信心。我们为每个LM公司培训了一个轻量线性线性信心预测器,该预测器能够利用分层的隐藏状态和正常的概率作为投入。这使得能够根据不同背景的可靠性对模型预测进行更细致的加权。我们的方法并不要求修改模型参数,而需要微不足道的额外计算。多曲和布林问答任务的实验结果表明,LENS比传统的混合方法要差很多。我们的研究结果表明,内部代表提供了宝贵的信号,用以确定模型信任度,并且能够有效地利用该软件学习。
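The abstract above describes a per-model linear confidence predictor whose scores weight each model's prediction. The following is a minimal sketch of that idea, not the authors' implementation: the function names `linear_confidence` and `weighted_vote`, the logistic form of the predictor, and the sum-of-confidences aggregation rule are all assumptions made for illustration, since the abstract does not specify the features, training procedure, or exact combination rule.

```python
import math
from collections import defaultdict


def linear_confidence(features, weights, bias):
    """Hypothetical confidence predictor: a logistic score in (0, 1).

    `features` stands in for whatever per-answer inputs are used
    (the abstract mentions layer-wise hidden states and normalized
    probabilities); `weights` and `bias` would be learned per model.
    """
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def weighted_vote(answers, confidences):
    """Return the answer whose supporting models have the largest total confidence."""
    totals = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        totals[answer] += conf
    return max(totals, key=totals.get)


# Three models answer a multiple-choice question; one confident model
# outweighs two unsure models that agree with each other.
choice = weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.4])
```

With plain majority voting the two models answering "B" would win; under confidence weighting the single high-confidence "A" prevails, which is the kind of context-dependent weighting the abstract contrasts with simple voting.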
Article 14
Title@2025-07-30 (3): Causal-Inspired Multi-Agent Decision-Making via Graph Reinforcement Learning
Title: Causal-Inspired Multi-Agent Decision-Making via Graph Reinforcement Learning | Causal-Inspired Multi-Agent Entscheidungs-Making über Graph Verstärkungs-Lernen | 通过图集强化学习作出因果-受激励的多机构机构决策 2507.23080v1 |
Authors (4): Jing Wang, Yan Jin, Fei Ding, Chongfeng Wei
Autonomous driving technology has made remarkable progress over the last decade. However, most existing research still struggles to address the challenges posed by environments where multiple vehicles have to interact seamlessly. This study aims to integrate causal learning with reinforcement learning-based methods by leveraging causal disentanglement representation learning (CDRL) to identify and extract causal features that influence optimal decision-making in autonomous vehicles. These features are then incorporated into graph neural network-based reinforcement learning algorithms to enhance decision-making in complex traffic scenarios. By using causal features as inputs, the proposed approach enables the optimization of vehicle behavior at an unsignalized intersection. Experimental results demonstrate that our proposed method achieves the highest average reward during training and significantly outperforms other learning-based methods on several key metrics, such as collision rate and average cumulative reward, during testing. This study provides a promising direction for advancing multi-agent autonomous driving systems and making autonomous vehicles’ navigation safer and more efficient in complex traffic environments.
自自主驾驶技术出现以来,在过去十年中,它取得了显著的进展;然而,大多数现有研究仍然在努力应对多种车辆必须天衣无缝地互动的环境所构成的挑战;这项研究的目的是利用因果学习与强化学习方法相结合,利用因果分解代表学习(CDRL),查明并提取影响自主车辆最佳决策的因果特征;这些特征随后被纳入基于神经网络的图形强化学习算法,以加强复杂交通情况的决策;通过将因果特征作为投入,拟议方法使得车辆行为在未信号交界处得到优化;实验结果显示,我们拟议方法在培训期间获得最高平均奖励,而且我们的方法大大优于若干关键指标中的其他基于学习方法,如碰撞率和测试期间的平均累积奖励;这项研究为推进多剂自主驾驶系统、使机动车辆在复杂的交通环境中更安全、更高效地航行提供了有希望的方向。
Article 15
Title@2025-07-30 (3): Bifröst: Spatial Networking with Bigraphs
Title: Bifröst: Spatial Networking with Bigraphs | Bifröst: Räumliche Vernetzung mit Bigraphen | Bifröst:与论文进行空间联网 2507.22687v1 |
Authors (5): Josh Millar, Ryan Gibb, Roy Ang, Anil Madhavapeddy, Hamed Haddadi
Modern networked environments increasingly rely on spatial reasoning, but lack a coherent representation for coordinating physical space. Consequently, tasks such as enforcing spatial access policies remain fragile and manual. We first propose a unifying representation based on bigraphs, capturing spatial, social, and communication relationships within a single formalism, with user-facing tools to generate bigraphs from physical environments. Second, we present a hierarchical agent architecture for distributed spatial reasoning, with runtimes for agentic processes to interact with the spatial representation, and a context-aware execution model that scopes reasoning to the smallest viable subspace. Together, these enable private, reliable, and low-latency spatial networking that can safely interact with agentic workflows.
现代网络环境日益依赖空间推理,但缺乏协调物理空间的连贯代表性。因此,执行空间访问政策等任务仍然脆弱且手动化。我们首先建议基于历史文献的统一代表性,在单一的形式主义中捕捉空间、社会和通信关系,使用以用户为主的工具从物理环境中生成文献。第二,我们为分布式空间推理提供了一个分级代理结构,为空间代表互动代理流程提供了运行时间,以及一个背景认知执行模式,该模式将推到最小可行的子空间。这些共同使得能够安全地与代理工作流程互动的私人、可靠和低纬度空间网络得以实现。
Article 16
Title@2025-07-30 (3): Towards Simulating Social Influence Dynamics with LLM-based Multi-agents
Title: Towards Simulating Social Influence Dynamics with LLM-based Multi-agents | Auf dem Weg zur Simulation sozialer Einflussdynamik mit LLM-basierten Multi-Agenten | 利用以LLM为基础的多剂模拟社会影响动态 2507.22467v1 |
Authors (6): Hsien-Tsung Lin, Pei-Cing Huang, Chan-Tung Ku, Chan Hsu, Pei-Xuan Shieh, Yihuang Kang
Recent advancements in Large Language Models offer promising capabilities to simulate complex human social interactions. We investigate whether LLM-based multi-agent simulations can reproduce core human social dynamics observed in online forums. We evaluate conformity dynamics, group polarization, and fragmentation across different model scales and reasoning capabilities using a structured simulation framework. Our findings indicate that smaller models exhibit higher conformity rates, whereas models optimized for reasoning are more resistant to social influence.
最近大语言模型的进步为模拟复杂的人类社会互动提供了很有希望的能力。我们调查了基于LLM的多试剂模拟是否可以复制在线论坛所观察到的核心人类社会动态。我们利用结构化模拟框架评估了不同模式规模的合规动态、群体两极分化和分裂以及推理能力。我们的调查结果显示,较小的模型表现出更高的合规率,而优化的推理模型更耐受社会影响。
Article 17
Title@2025-07-30 (3): Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework
Title: Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework | Auf dem Weg zu einer interpretierbaren Renal Health-Prognose über Multi-LMM-Kollaboratives Reasoning-Framework | 通过多伦多和多伦多MM合作理由框架,迈向可解释性中时健康下降预测 2507.22464v1 |
Authors (6): Peng-Yi Wu, Pei-Cing Huang, Ting-Yu Chen, Chantung Ku, Ming-Yen Lin, Yihuang Kang
Accurate and interpretable prediction of estimated glomerular filtration rate (eGFR) is essential for managing chronic kidney disease (CKD) and supporting clinical decisions. Recent advances in Large Multimodal Models (LMMs) have shown strong potential in clinical prediction tasks due to their ability to process visual and textual information. However, challenges related to deployment cost, data privacy, and model reliability hinder their adoption. In this study, we propose a collaborative framework that enhances the performance of open-source LMMs for eGFR forecasting while generating clinically meaningful explanations. The framework incorporates visual knowledge transfer, abductive reasoning, and a short-term memory mechanism to enhance prediction accuracy and interpretability. Experimental results show that the proposed framework achieves predictive performance and interpretability comparable to proprietary models. It also provides plausible clinical reasoning processes behind each prediction. Our method sheds new light on building AI systems for healthcare that combine predictive accuracy with clinically grounded interpretability.
对估计球状过滤率(eGFR)的准确和可解释的预测对于管理慢性肾病(CKD)和支持临床决定至关重要。大型多式模型(LMMs)最近的进展表明,由于其处理视觉和文字信息的能力,临床预测任务具有巨大潜力。然而,与部署成本、数据隐私和模型可靠性有关的挑战妨碍了其采用。在本研究中,我们提议了一个合作框架,以提高用于eGFR预报的开放源LMs的性能,同时产生具有临床意义的解释。框架包括视觉知识转移、诱拐推理和短期记忆机制,以提高预测的准确性和可解释性。实验结果显示,拟议的框架实现了预测性业绩和可与专利模型相比的可解释性。它还提供了各种预测背后的可信的临床推理过程。我们的方法为建立将预测性准确性和基于临床的解释性结合起来的AI系统提供了新的思路。
Article 18
Title@2025-07-30 (3): The challenge of hidden gifts in multi-agent reinforcement learning
Title: The challenge of hidden gifts in multi-agent reinforcement learning | Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen | 多试剂强化学习中隐藏礼品的挑战 2505.20579v3 |
Authors (2): Dane Malenfant, Blake A. Richards
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness in independent agents can benefit these settings.
有时我们从其他人的行动中受益,即使我们不知道他们采取了这些行动。例如,如果邻居选择不在其家中时不在其家门前停泊,即使不知道他们采取了这一行动,也可以受益。这些“隐藏的礼物”代表了多试剂强化学习(MARL)的一个有趣的挑战,因为当其他人的有益行动被隐藏起来时,就分配信用是非三角的。在这里,我们研究隐藏的礼品的影响,任务很简单,MARL的任务非常简单。在这个任务中,网格世界环境中的代理商有单独的门可以打开,以获得个人报酬。同样,如果所有代理商都打开了他们的家门,他们也可以得到更大的集体奖赏。然而,所有这些“隐藏的礼物”只是当代理人在其他人的有益行动被隐藏起来的时候,集体奖赏才能得到。 值得注意的是,没有什么可以告诉代理商其他代理商已经放下了钥匙,因此,放弃他人的钥匙的行为就是“隐藏的礼物”。我们用不同的门打开门打开了自己的门来获得个人奖赏。同样,如果所有的代理商都打开他们的门门, 包括MAL 算算,那么,我们就能在他们自己学习了一个真正的历史任务中,我们如何在学习这些任务中,我们如何在学习这些任务中,我们是如何学习了。
Article 19
Title@2025-07-29 (2): Multi-Agent Path Finding Among Dynamic Uncontrollable Agents with Statistical Safety Guarantees
Title: Multi-Agent Path Finding Among Dynamic Uncontrollable Agents with Statistical Safety Guarantees | Multi-Agent Pfad finden unter dynamischen unkontrollierbaren Agenten mit statistischen Sicherheitsgarantien | 在具有统计安全保障措施的动态不可控制的代理人中寻找多机构途径 2507.22282v1 |
Authors (6): Kegan J. Strawn, Thomy Phan, Eric Wang, Nora Ayanian, Sven Koenig, Lars Lindemann
Existing multi-agent path finding (MAPF) solvers do not account for uncertain behavior of uncontrollable agents. We present a novel variant of Enhanced Conflict-Based Search (ECBS), for both one-shot and lifelong MAPF in dynamic environments with uncontrollable agents. Our method consists of (1) training a learned predictor for the movement of uncontrollable agents, (2) quantifying the prediction error using conformal prediction (CP), a tool for statistical uncertainty quantification, and (3) integrating these uncertainty intervals into our modified ECBS solver. Our method can account for uncertain agent behavior, comes with statistical guarantees on collision-free paths for one-shot missions, and scales to lifelong missions with a receding horizon sequence of one-shot instances. We run our algorithm, CP-Solver, across warehouse and game maps, with competitive throughput and reduced collisions.
现有多试剂路径发现(MAPF)解答器没有考虑到无法控制的代理商的不确定行为。 我们提出了一个新型的强化冲突搜索(ECBS)变体,用于在动态环境中使用无法控制的代理商的单发和终身MAPF。 我们的方法包括:(1) 培训一个学习的无法控制的代理商移动预测器,(2) 使用一致的预测(CP)来量化预测错误,这是统计不确定性量化的工具,(3) 将这些不确定性间隔纳入我们修改后的欧洲央行解答器。 我们的方法可以解释不确定的代理商行为,在一次性任务无碰撞路径上提供统计保障,在一次性任务上提供尺度,在一次性任务上提供终身任务,以放弃一发事件地平线序列。 我们运行我们的算法(CP-Sverol),跨仓库和游戏地图,具有竞争性的吞吐量和减少碰撞。
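Step (2) of the method above, quantifying prediction error with conformal prediction, can be illustrated with the standard split-conformal quantile. This is a minimal sketch under stated assumptions: the nonconformity score is taken to be the distance between a predicted and an observed agent position on held-out calibration data, and the function name `conformal_radius` is hypothetical. How the resulting radius enters the modified ECBS solver is not described in the abstract and is not shown here.

```python
import math


def conformal_radius(calibration_errors, alpha):
    """Split-conformal bound on the prediction error.

    Given n calibration errors and a miscoverage level alpha, return the
    ceil((n + 1) * (1 - alpha))-th smallest error. Under exchangeability,
    a fresh error falls at or below this value with probability at least
    1 - alpha. If n is too small for the requested level, no finite
    bound exists and infinity is returned.
    """
    n = len(calibration_errors)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]


# Nine calibration errors (arbitrary units), 50% target coverage.
radius = conformal_radius([3, 1, 2, 5, 4, 9, 7, 8, 6], alpha=0.5)
```

A planner could then treat every cell within `radius` of a predicted position as potentially occupied, which is one plausible reading of "integrating these uncertainty intervals" into the search.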
Article 20
Title@2025-07-29 (2): Physics-Informed EvolveGCN: Satellite Prediction for Multi Agent Systems
Title: Physics-Informed EvolveGCN: Satellite Prediction for Multi Agent Systems | Physik-informierte EvolveGCN: Satellitenvorhersage für Multi-Agent-Systeme | GCN:多剂系统的卫星预测 2507.22279v1 |
Authors (3): Timothy Jacob Huber, Madhur Tiwari, Camilo A. Riano-Rios
In the rapidly evolving domain of autonomous systems, interaction among agents within a shared environment is both inevitable and essential for enhancing overall system capabilities. A key requirement in such multi-agent systems is the ability of each agent to reliably predict the future positions of its nearest neighbors. Traditionally, graphs and graph theory have served as effective tools for modeling inter-agent communication and relationships. While this approach is widely used, the present work proposes a novel method that leverages dynamic graphs in a forward-looking manner. Specifically, it employs EvolveGCN, a dynamic graph convolutional network, to forecast the evolution of inter-agent relationships over time. To improve prediction accuracy and ensure physical plausibility, this research incorporates physics-constrained loss functions based on the Clohessy-Wiltshire equations of motion. This integrated approach enhances the reliability of future state estimations in multi-agent scenarios.
在自主系统迅速演变的领域中,在共享环境中,代理商之间的互动对于提高整体系统能力既是不可避免的,也是不可或缺的。这种多试剂系统的一个关键要求是,每个代理商都有能力可靠地预测其近邻的未来位置。传统上,图表和图表理论是模拟代理商之间通信和关系的有效工具。虽然这种方法被广泛使用,但目前的工作提出了一种新的方法,以前瞻性的方式利用动态图表。具体地说,使用动态图形革命网络EvolveGCN,即动态图形革命网络,来预测跨机构关系的演变。为了提高预测准确性并确保物理可行性,这一研究纳入了基于Clohessy-Wiltshire运动方程式的物理限制损失功能。这一综合方法提高了多试剂情景中未来国家估算的可靠性。
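To make the physics-constrained loss mentioned above more concrete, here is a minimal sketch of a residual based on the Clohessy-Wiltshire equations. It assumes the common convention of x radial, y along-track, and z cross-track in the target's orbital frame, with mean motion `n`. The function names and the plain sum-of-squares penalty are assumptions; the abstract does not say how the authors weight this term or obtain the derivatives.

```python
def cw_residual(pos, vel, acc, n):
    """Residual of the Clohessy-Wiltshire equations for one relative state.

    The equations, for mean motion n, are:
        x'' = 3 n^2 x + 2 n y'
        y'' = -2 n x'
        z'' = -n^2 z
    A physically consistent (pos, vel, acc) triple gives (0, 0, 0).
    """
    x, _, z = pos
    vx, vy, _ = vel
    ax, ay, az = acc
    return (
        ax - 3.0 * n * n * x - 2.0 * n * vy,
        ay + 2.0 * n * vx,
        az + n * n * z,
    )


def physics_loss(pos, vel, acc, n):
    """Sum of squared residuals, usable as an extra penalty term."""
    return sum(r * r for r in cw_residual(pos, vel, acc, n))


# A state that satisfies the equations exactly incurs zero penalty.
consistent = physics_loss((1.0, 0.0, 1.0), (0.0, 0.0, 0.0), (12.0, 0.0, -4.0), n=2.0)
# The same position with zero predicted acceleration is penalized.
violating = physics_loss((1.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), n=2.0)
```

In a training loop this penalty would typically be added to the data-fitting loss with a tunable weight, so that predicted trajectories are pulled toward physically plausible relative motion.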
Article 21
Title@2025-07-29 (2): Successor Features for Transfer in Alternating Markov Games
Title: Successor Features for Transfer in Alternating Markov Games | Nachfolger Funktionen für den Transfer in Wechsel Markov Spiele | 在交替的 Markov 游戏中传输的后继功能 2507.22278v1 |
Authors (4): Sunny Amatya, Yi Ren, Zhe Xu, Wenlong Zhang
This paper explores successor features for knowledge transfer in zero-sum, complete-information, and turn-based games. Prior research in single-agent systems has shown that successor features can provide a “jump start” for agents when facing new tasks with varying reward structures. However, knowledge transfer in games typically relies on value and equilibrium transfers, which depend heavily on the similarity between tasks. This reliance can lead to failures when the tasks differ significantly. To address this issue, this paper presents an application of successor features to games and presents a novel algorithm called Game Generalized Policy Improvement (GGPI), designed to address Markov games in multi-agent reinforcement learning. The proposed algorithm enables the transfer of learning values and policies across games. An upper bound of the errors for transfer is derived as a function of the similarity of the tasks. Through experiments with a turn-based pursuer-evader game, we demonstrate that the GGPI algorithm can generate high-reward interactions and one-shot policy transfer. When further tested in a wider set of initial conditions, the GGPI algorithm achieves higher success rates with improved path efficiency compared to those of the baseline algorithms.
本文探讨了零和、完整信息和回合游戏中知识转让的后继特征。 以往对单一试剂系统的研究表明,在面临不同奖赏结构的新任务时,后继特征可以为代理提供“ 跳跃启动 ” 。 但是,游戏中的知识转让通常依赖于价值和均衡转移,这在很大程度上取决于任务之间的相似性。 当任务差异很大时,这种依赖可能导致失败。 为解决这一问题,本文件介绍了游戏的后续特征的应用,并介绍了一种叫做游戏通用政策改进(GGPI)的新型算法,旨在解决多试剂强化学习中的Markov游戏问题。提议的算法使跨游戏的学习价值和政策得以转移。转移错误的上层功能与任务相似。通过对基于转盘的追逐者-evder游戏的实验,我们证明GGPI算法可以产生高回报性互动和一发式政策转移。 当在更广阔的初始条件下进行进一步测试时,GPI算法取得了更高的成功率,其路径效率高于基线算法。
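As background for the abstract above, the single-agent form of generalized policy improvement with successor features can be sketched as follows. This is the standard single-agent rule, not the paper's GGPI algorithm, whose game-specific details are not given in the abstract. It assumes rewards are linear in features, so that each stored policy i has Q_i(s, a) = psi_i(s, a) · w for a task weight vector w; the name `gpi_action` and the dictionary-based tables are illustrative assumptions.

```python
def gpi_action(psi_tables, w, state, actions):
    """Single-agent generalized policy improvement (GPI).

    Each entry of `psi_tables` maps (state, action) to the successor-feature
    vector of one previously learned policy. For a new task with reward
    weights `w`, the value of that policy is the dot product psi . w, and GPI
    picks the action that is best under the most favourable stored policy.
    """
    def q_value(psi, action):
        return sum(p * wi for p, wi in zip(psi[(state, action)], w))

    return max(actions, key=lambda a: max(q_value(psi, a) for psi in psi_tables))


# Two stored policies, one state "s", two actions, two reward features.
psi_1 = {("s", "L"): [1.0, 0.0], ("s", "R"): [0.0, 1.0]}
psi_2 = {("s", "L"): [0.5, 0.5], ("s", "R"): [0.0, 2.0]}

# A task that rewards only feature 0 prefers "L"; one that rewards
# only feature 1 prefers "R", with no further learning required.
act_a = gpi_action([psi_1, psi_2], [1.0, 0.0], "s", ["L", "R"])
act_b = gpi_action([psi_1, psi_2], [0.0, 1.0], "s", ["L", "R"])
```

Only `w` changes between tasks while the successor features are reused, which is the source of the "jump start" on new reward structures.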
Article 22
Title@2025-07-29 (2): Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions
Title: Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions | Validierung generativer agentenbasierter Modelle der Durchsetzung sozialer Normen: Von der Replikation zu neuartigen Vorhersagen | 验证社会规范执行的产生代理模式:从复制到新预测 2507.22049v1 |
Authors (3): Logan Cross, Nick Haber, Daniel L. K. Yamins
As large language models (LLMs) advance, there is growing interest in using them to simulate human social behavior through generative agent-based modeling (GABM). However, validating these models remains a key challenge. We present a systematic two-stage validation approach using social dilemma paradigms from psychological literature, first identifying the cognitive components necessary for LLM agents to reproduce known human behaviors in mixed-motive settings from two landmark papers, then using the validated architecture to simulate novel conditions. Our model comparison of different cognitive architectures shows that both persona-based individual differences and theory of mind capabilities are essential for replicating third-party punishment (TPP) as a costly signal of trustworthiness. For the second study on public goods games, this architecture is able to replicate an increase in cooperation from the spread of reputational information through gossip. However, an additional strategic component is necessary to replicate the additional boost in cooperation rates in the condition that allows both ostracism and gossip. We then test novel predictions for each paper with our validated generative agents. We find that TPP rates significantly drop in settings where punishment is anonymous, yet a substantial amount of TPP persists, suggesting that both reputational and intrinsic moral motivations play a role in this behavior. For the second paper, we introduce a novel intervention and see that open discussion periods before rounds of the public goods game further increase contributions, allowing groups to develop social norms for cooperation. This work provides a framework for validating generative agent models while demonstrating their potential to generate novel and testable insights into human social behavior.
随着大型语言模型(LLMs)的进步,人们越来越有兴趣利用这些模型模拟人类社会行为,通过基于基因的代理模型(GABM)模拟人类社会行为。然而,验证这些模型仍是一个关键挑战。我们提出一个系统化的两阶段验证方法,利用心理文献的社会进化范式,首先确定LLM代理商在两种里程碑式文件的混合环境中复制已知人类行为所必需的认知构件,然后利用经过验证的架构模拟新的条件。我们对不同认知结构的模型的比较表明,基于人的个人差异和思维能力理论对于复制第三方惩罚(TPP)作为昂贵的可信度信号至关重要。对于第二次关于公益游戏的研究来说,这一架构能够复制从通过流言传传播声誉信息而增加合作的两阶段。然而,需要增加一个额外的战略构件,以便在允许排斥和流言的条件下复制合作率。然后我们用经过验证的基因检验的代理商对每份文件进行新的预测。我们发现,TPP率在惩罚是匿名的环境下大幅下降,而TPP的第二个数额是相当的具有信任性的信号信号。对于第二次研究来说,这个结构能够复制从通过传声道信息信息信息信息信息传播的游戏中增加社会行为,让我们的内在和道德行为,从而在社会行为研究中增加社会行为研究中增加其内在和道德和道德表现。
Article 23
Title@2025-07-29 (2): Towards Cognitive Synergy in LLM-Based Multi-Agent Systems: Integrating Theory of Mind and Critical Evaluation
Title: Towards Cognitive Synergy in LLM-Based Multi-Agent Systems: Integrating Theory of Mind and Critical Evaluation | Auf dem Weg zu kognitiver Synergie in LLM-basierten Multiagentensystemen: Integration der Theorie des Geistes und kritische Evaluation | 在以LLM为基础的多种机构系统中实现认知协同:综合思维理论和关键评价 2507.21969v1 |
Authors (2): Adam Kostka, Jarosław A. Chudziak
Recently, the field of Multi-Agent Systems (MAS) has gained popularity as researchers are trying to develop artificial intelligence capable of efficient collective reasoning. Agents based on Large Language Models (LLMs) perform well in isolated tasks, yet struggle with higher-order cognition required for adaptive collaboration. Human teams achieve synergy not only through knowledge sharing, but also through recursive reasoning, structured critique, and the ability to infer others’ mental states. Current artificial systems lack these essential mechanisms, limiting their ability to engage in sophisticated collective reasoning. This work explores cognitive processes that enable effective collaboration, focusing on adaptive theory of mind (ToM) and systematic critical evaluation. We investigate three key questions. First, how does the ability to model others’ perspectives enhance coordination and reduce redundant reasoning? Second, to what extent does structured critique improve reasoning quality by identifying logical gaps and mitigating biases? Third, can the interplay of these mechanisms lead to emergent cognitive synergy, where the collective intelligence of the system exceeds the sum of its parts? Through an empirical case study on complex decision making, we show that the integration of these cognitive mechanisms leads to more coherent, adaptive, and rigorous agent interactions. This article contributes to the field of cognitive science and AI research by presenting a structured framework that emulates human-like collaborative reasoning in MAS. It highlights the significance of dynamic ToM and critical evaluation in advancing multi-agent systems’ ability to tackle complex, real-world challenges.
最近,随着研究人员努力开发能有效集体推理的人工智能,多语言系统领域日益受到欢迎。基于大语言模型的代理人在孤立的任务中表现良好,但与适应性合作所需的更高层次的认知力作斗争。人类团队不仅通过知识共享,而且通过循环推理、结构化批评和推断他人精神状态的能力实现协同增效。当前的人工系统缺乏这些基本机制,限制了他们参与复杂集体推理的能力。这项工作探索了能够进行有效合作的认知过程,重点是思维的适应理论和系统批判性评估。我们调查了三个关键问题。首先,模型他人观点的能力如何加强协调和减少多余的推理?第二,通过找出逻辑差距和减少偏见来结构化地改进推理质量的程度如何?第三,这些机制的相互作用可以导致出现突发的认知协同效应,使系统的集体智能超过其复杂部分的总和。通过对复杂的决策进行经验性案例研究,我们展示了这些认知机制的整合导致科学进入更连贯、适应性、适应性和严格的推理能力,从而形成一个具有前瞻性的、具有历史价值的模型。这一条条条条有助于通过像机构性的研究推论和结构化的模型推理推理,从而推理推理,从而推理地推理推理质量地推理质量地推理质量推理质量推理质量推理质量推理质量。
Article 24
Title@2025-07-29 (2): A finite time analysis of distributed Q-learning
Title: A finite time analysis of distributed Q-learning | Eine endliche Zeitanalyse des verteilten Q-Learning | 对分发的 “ 学习 “ 的有限时间分析 2405.14078v2 |
Authors (2): Han-Dong Lim, Donghwan Lee
Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision making problem without access to the central reward function which is an average of the local rewards. In particular, we study finite-time analysis of a distributed Q-learning algorithm, and provide a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4}, \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)$ under tabular lookup
多剂加固学习(MARL)在单一剂加固学习(RL)应用成功经验的推动下,出现了令人瞩目的兴趣激增。在本研究中,我们考虑一个分布式的Q-学习设想方案,其中一些代理机构合作解决了顺序决策问题,而没有获得当地平均奖励的中央奖励功能。特别是,我们研究了对分布式Q-学习算法的有限时间分析,并提供了一个新的样本复杂性结果,即:$\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4}, \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)$
Article 25
Title@2025-07-29 (2): Agent-Based Exploration of Recommendation Systems in Misinformation Propagation
Title: Agent-Based Exploration of Recommendation Systems in Misinformation Propagation | Agent-based Exploration von Empfehlungssystemen in falscher Informationsverbreitung | 错误信息传播中建议系统的基于代理人的探索 2507.21724v1 |
Authors (4): Lise Jakobsen, Anna Johanne Holden, Önder Gürcan, Özlem Özgöbek
This study uses agent-based modeling to examine the impact of various recommendation algorithms on the propagation of misinformation on online social networks. We simulate a synthetic environment consisting of heterogeneous agents, including regular users, bots, and influencers, interacting through a social network with recommendation systems. We evaluate four recommendation strategies: popularity-based, collaborative filtering, and content-based filtering, along with a random baseline. Our results show that popularity-driven algorithms significantly amplify misinformation, while item-based collaborative filtering and content-based approaches are more effective in limiting exposure to fake content. Item-based collaborative filtering was found to perform better than previously reported in related literature. These findings highlight the role of algorithm design in shaping online information exposure and show that agent-based modeling can be used to gain realistic insight into how misinformation spreads.
这项研究利用以代理商为基础的模型来审查各种建议算法对网上社交网络传播错误信息的影响。我们模拟由各种代理商组成的合成环境,包括经常用户、机器人和影响力者,通过社交网络与建议系统互动。我们评估了四项建议战略:以普及为基础、合作过滤、内容过滤以及随机基线。我们的结果显示,以普及为驱动的算法极大地扩大了错误信息,而以项目为基础的合作过滤和内容为基础的方法在限制对虚假内容的接触方面更为有效。以项目为基础的合作过滤方法被认为比相关文献中以前报告的要好。这些结论突出了算法设计在塑造在线信息曝光方面的作用,并表明,可以利用基于代理商的模型来实际了解错误信息传播的方式。
Article 26
Title@2025-07-29 (2): Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis
Title: Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis | Intrinsische Barrieren und praktische Wege für die Mensch-AI-Ausrichtung: Eine auf Vereinbarungen basierende Komplexitätsanalyse | 内在障碍和人类-AI协调的实用途径:基于协定的复杂程度分析 2502.05934v2 |
Authors (1): Aran Nayebi
We formalize AI alignment as a multi-objective optimization problem called $\langle M,N,\varepsilon,\delta\rangle$-agreement that generalizes prior approaches with fewer assumptions, in which a set of $N$ agents (including humans) must reach approximate ($\varepsilon$) agreement across $M$ candidate objectives with probability at least $1-\delta$. Using communication complexity, we prove an information-theoretic lower bound demonstrating that once either $M$ or $N$ is large enough, no interaction or rationality can avoid intrinsic alignment overheads. This barrier establishes rigorous intrinsic limits to alignment \emph{itself}, not merely to specific methods, clarifying a crucial “no free lunch” principle: encoding “all human values” inevitably leads to misalignment, requiring future methods to explicitly manage complexity through consensus-driven reduction or prioritization of objectives. Complementing this impossibility result, we provide explicit algorithms achieving alignment under both computationally unbounded and bounded rationality with noisy messages. Even in these best-case scenarios where alignment to arbitrary precision is theoretically guaranteed, our analysis identifies three critical scalability barriers: the number of tasks ($M$), agents ($N$), and task state space size ($D$); thereby highlighting fundamental complexity-theoretic constraints and providing guidelines for safer, scalable human-AI collaboration.
我们将AI对齐形式化为一个多目标优化问题,称为 $\langle M,N,\varepsilon,\delta\rangle$-协议,它以更少的假设推广了以往的方法:一组 $N$ 个代理(包括人类)必须以至少 $1-\delta$ 的概率,在 $M$ 个候选目标上达成近似($\varepsilon$)一致。利用通信复杂性,我们证明了一个信息论下界,表明一旦 $M$ 或 $N$ 足够大,任何交互或理性都无法避免内在的对齐开销。这一障碍为对齐本身(而不仅仅是具体方法)确立了严格的内在限制,并澄清了一条关键的“没有免费午餐”原则:编码“所有人类价值”必然导致失准,因此未来的方法需要通过共识驱动的目标缩减或优先排序来明确管理复杂性。作为对这一不可能性结果的补充,我们给出了明确的算法,在计算上无界和有界理性且消息带噪声的情况下均可实现对齐。即使在这些理论上保证可对齐到任意精度的最佳情形下,我们的分析仍确定了三个关键的可扩展性障碍:任务数量($M$)、代理数量($N$)和任务状态空间大小($D$);从而突出了基本的复杂性理论约束,并为更安全、可扩展的人类-AI协作提供了指导。
Article 27
Title@2025-07-29 (2): “Teammates, Am I Clear?”: Analysing Legible Behaviours in Teams
Title: “Teammates, Am I Clear?”: Analysing Legible Behaviours in Teams | “Teamkollegen, bin ich klar?”: Legible Verhaltensmuster in Teams analysieren | “Teammates,我清楚了吗?” “分析团队中可行的行为” 2507.21631v1 |
Authors (3): Miguel Faria, Francisco S. Melo, Ana Paiva
In this paper we investigate the notion of legibility in sequential decision-making in the context of teams and teamwork. There have been works that extend the notion of legibility to sequential decision making, for deterministic and for stochastic scenarios. However, these works focus on one agent interacting with one human, foregoing the benefits of having legible decision making in teams of agents or in team configurations with humans. In this work we propose an extension of legible decision-making to multi-agent settings that improves the performance of agents working in collaboration. We showcase the performance of legible decision making in team scenarios using our proposed extension in multi-agent benchmark scenarios. We show that a team with a legible agent is able to outperform a team composed solely of agents with standard optimal behaviour.
在本文中,我们从团队和团队精神的角度对连续决策的清晰度概念进行了调查; 已经开展了一些工作,将清晰度概念扩展到了连续决策、确定性和随机性情景; 然而,这些工作侧重于一个代理与一个人互动的代理,而不必考虑在代理团队中或与人类团队组合中做出清晰度决策的好处; 在这项工作中,我们提议将清晰度决策扩大到多代理环境,从而改善合作代理的绩效; 我们利用我们提议的多代理基准情景扩展,展示了团队情景中清晰度决策的绩效; 我们表明,一个拥有清晰度代理的团队能够超越一个完全由具有标准最佳行为的代理组成的团队。
Article 28
Title@2025-07-29 (2): A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature
Title: A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature | Ein Multi-Agent-System ermöglicht vielseitige Informationsextraktion aus der chemischen Literatur | 一个多机构系统能够从化学文献中提取 Versatile 信息 2507.20230v2 |
Authors (8): Yufan Chen, Ching Ting Leung, Bowen Yu, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao
To fully expedite AI-powered chemical research, high-quality chemical databases are the cornerstone. Automatic extraction of chemical information from the literature is essential for constructing reaction databases, but it is currently limited by the multimodality and style variability of chemical information. In this work, we developed a multimodal large language model (MLLM)-based multi-agent system for robust and automated chemical information extraction. It utilizes the MLLM’s strong reasoning capability to understand the structure of diverse chemical graphics, decompose the extraction task into sub-tasks, and coordinate a set of specialized agents, each combining the capabilities of the MLLM with the precise, domain-specific strengths of dedicated tools, to solve them accurately and integrate the results into a unified output. Our system achieved an F1 score of 80.8% on a benchmark dataset of sophisticated multimodal chemical reaction graphics from the literature, surpassing the previous state-of-the-art model (F1 score of 35.6%) by a significant margin. Additionally, it demonstrated consistent improvements in key sub-tasks, including molecular image recognition, reaction image parsing, named entity recognition and text-based reaction extraction. This work is a critical step toward automated chemical information extraction into structured datasets, which will be a strong promoter of AI-driven chemical research.
为了充分加快AI驱动的化学研究,高质量的化学数据库是基础。从文献中自动提取化学信息对于建立反应数据库至关重要,但目前受到化学信息的多式和风格变化的限制。在这项工作中,我们开发了一个基于多式联运的大型语言模型(MLLM)多剂系统,用于强有力和自动化的化学信息提取。它利用MLLM的强大推理能力来理解多种化学图形的结构,将提取任务分解成子任务,并协调一套专门剂,将MLLM的能力与专门工具的准确、具体领域优势结合起来,以便准确解决这些问题并将结果纳入统一产出。我们的系统在文献中复杂的多式化学反应图形的基准数据集上取得了80.8%的F1分,超过了以前的最先进的化学图形模型(F1分,35.6%)。此外,它展示了关键子任务(包括分子图像识别、反应图像分辨、实体识别和文本反动反应提取)的不断改进,这将推动以自动步骤进行化学数据提取。
Article 29
Title@2025-07-28 (1): Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems
Title: Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems | Games Agents Play: Auf dem Weg zur Transaktionsanalyse in LLM-basierten Multi-Agent-Systemen | 玩游戏代理游戏:争取在基于LLM的多机构系统中进行交易分析 2507.21354v1 |
Authors (2): Monika Zamojska, Jarosław A. Chudziak
Multi-Agent Systems (MAS) are increasingly used to simulate social interactions, but most of the frameworks miss the underlying cognitive complexity of human behavior. In this paper, we introduce Trans-ACT (Transactional Analysis Cognitive Toolkit), an approach embedding Transactional Analysis (TA) principles into MAS to generate agents with realistic psychological dynamics. Trans-ACT integrates the Parent, Adult, and Child ego states into an agent’s cognitive architecture. Each ego state retrieves context-specific memories and uses them to shape response to new situations. The final answer is chosen according to the underlying life script of the agent. Our experimental simulation, which reproduces the Stupid game scenario, demonstrates that agents grounded in cognitive and TA principles produce deeper and context-aware interactions. Looking ahead, our research opens a new way for a variety of applications, including conflict resolution, educational support, and advanced social psychology studies.
多主体系统(MAS)越来越多地用于模拟社会互动,但大多数框架都与人类行为的基本认知复杂性相去甚远。在本文中,我们引入了TransACT(TransAction Aly分析认知工具包),这是将交易分析(TA)原则纳入MAS的一种方法,将交易分析(TA)原则纳入MAS,以产生具有现实心理动态的代理物。跨ACT将父母、成人和儿童自我状态纳入代理物的认知结构。每个自我状态都检索了特定背景的记忆,并用它们来形成对新情况的反应。最后答案是根据代理人的基本生活文字选择的。我们的实验模拟,它复制了愚蠢的游戏情景,表明基于认知和TA原则的代理物产生更深层次的和符合背景的相互作用。展望未来,我们的研究为各种应用开辟了新的途径,包括解决冲突、教育支持和先进的社会心理学研究。
Article 30
Title@2025-07-28 (1): Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model
Title: Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model | Nachbildung des Verhaltens von Fahrern von Elektrofahrzeugen mithilfe eines agentenbasierten Reinforcement-Learning-Modells | 利用以代理为基础的强化学习模式复制电动车辆驾驶员的行为 2507.21341v1 |
Authors (3): Zixin Feng, Qunshan Zhao, Alison Heppenstall
Despite the rapid expansion of electric vehicle (EV) charging networks, questions remain about their efficiency in meeting the growing needs of EV drivers. Previous simulation-based approaches, which rely on static behavioural rules, have struggled to capture the adaptive behaviours of human drivers. Although reinforcement learning has been introduced in EV simulation studies, its application has primarily focused on optimising fleet operations rather than modelling private drivers who make independent charging decisions. Additionally, long-distance travel remains a primary concern for EV drivers. However, existing simulation studies rarely explore charging behaviour over large geographical scales. To address these gaps, we propose a multi-stage reinforcement learning framework that simulates EV charging demand across large geographical areas. We validate the model against real-world data, and identify the training stage that most closely reflects actual driver behaviour, which captures both the adaptive behaviours and bounded rationality of private drivers. Based on the simulation results, we also identify critical ‘charging deserts’ where EV drivers consistently have low state of charge. Our findings also highlight recent policy shifts toward expanding rapid charging hubs along motorway corridors and city boundaries to meet the demand from long-distance trips.
尽管电动车(EV)充电网络迅速扩大,但对于电动车(EV)充电网络在满足EV驱动者日益增长的需求方面的效率仍然存在疑问。以前以模拟为基础的方法依靠静态行为规则,一直努力捕捉人类驱动者的适应行为。虽然在EV模拟研究中引入了强化学习,但其应用主要侧重于优化机队运作,而不是模拟独立充电决定的私人驱动者。此外,长途旅行仍然是EV驱动者的主要关切。然而,现有的模拟研究很少探索大地域范围的充电行为。然而,为填补这些空白,我们提议了一个多阶段强化学习框架,模拟EV在大地理区域上充电的需求。我们用真实世界数据验证模型,并找出最能反映实际驱动行为的培训阶段,该培训阶段既能捕捉到适应行为,又能约束私人驱动者的合理性。根据模拟结果,我们还确定了关键“热沙漠”的EV驱动者持续低电量状态。我们的调查结果还突出了最近的政策转变,即沿着高速公路走廊和城市边界扩大快速充电中心,以满足长途旅行的需求。
Article 31
Title@2025-07-28 (1): Core Safety Values for Provably Corrigible Agents
Title: Core Safety Values for Provably Corrigible Agents | Grundlegende Sicherheitswerte für beweisbar korrigierbare Agenten | 可证明可纠正代理的核心安全价值 2507.20964v1 |
Authors (1): Aran Nayebi
We introduce the first implementable framework for corrigibility, with provable guarantees in multi-step, partially observed environments. Our framework replaces a single opaque reward with five structurally separate utility heads – deference, switch-access preservation, truthfulness, low-impact behavior via a belief-based extension of Attainable Utility Preservation, and bounded task reward – combined lexicographically by strict weight gaps. Theorem 1 proves exact single-round corrigibility in the partially observable off-switch game; Theorem 3 extends the guarantee to multi-step, self-spawning agents, showing that even if each head is \emph{learned} to mean-squared error $\varepsilon$ and the planner is $\varepsilon$-sub-optimal, the probability of violating \emph{any} safety property is bounded while still ensuring net human benefit. In contrast to Constitutional AI or RLHF/RLAIF, which merge all norms into one learned scalar, our separation makes obedience and impact-limits dominate even when incentives conflict. For open-ended settings where adversaries can modify the agent, we prove that deciding whether an arbitrary post-hack agent will ever violate corrigibility is undecidable by reduction to the halting problem, then carve out a finite-horizon “decidable island” where safety can be certified in randomized polynomial time and verified with privacy-preserving, constant-round zero-knowledge proofs. Consequently, the remaining challenge is the ordinary ML task of data coverage and generalization: reward-hacking risk is pushed into evaluation quality rather than hidden incentive leak-through, giving clearer implementation guidance for today’s LLM assistants and future autonomous systems.
我们提出了第一个可实现的可纠正性(corrigibility)框架,在多步、部分可观测的环境中具有可证明的保证。我们的框架用五个结构上相互独立的效用头取代单一的不透明奖励:服从、开关访问保留、真实性、通过基于信念扩展的可达效用保持(Attainable Utility Preservation)实现的低影响行为,以及有界任务奖励;它们通过严格的权重间隔按字典序组合。定理1证明了在部分可观测的关机博弈中精确的单轮可纠正性;定理3将这一保证扩展到多步、可自我衍生的代理,表明即使每个头都是以均方误差 $\varepsilon$ 学习得到的,且规划器是 $\varepsilon$-次优的,违反任何安全性质的概率仍然有界,同时仍能确保人类的净收益。与将所有规范合并为一个学习标量的 Constitutional AI 或 RLHF/RLAIF 不同,我们的分离使服从和影响限制即使在激励冲突时也占主导地位。对于对手可以修改代理的开放式环境,我们通过归约到停机问题证明:判定任意一个被攻击后的代理是否会违反可纠正性是不可判定的;随后我们划出一个有限时域的“可判定岛”,在其中安全性可以在随机多项式时间内得到认证,并可用保护隐私的常数轮零知识证明加以验证。因此,剩下的挑战是数据覆盖和泛化这一普通的机器学习任务:奖励黑客风险被转移到评估质量上,而不是隐藏的激励泄漏,从而为当今的LLM助手和未来的自主系统提供了更清晰的实现指导。
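The phrase "combined lexicographically by strict weight gaps" in the abstract above can be illustrated with a small sketch. It assumes each head's utility lies in [0, 1] and is quantized to a known resolution, and that the heads are prioritized in the order the abstract lists them; both are assumptions made here for illustration, not details confirmed by the source. The names `HEADS`, `lexicographic_weights`, and `combined_utility` are hypothetical.

```python
# Priority order assumed from the order in which the abstract lists the heads.
HEADS = ("deference", "switch_access", "truthfulness", "low_impact", "task_reward")


def lexicographic_weights(num_heads, resolution):
    """Weights with gaps large enough that a higher-priority head always wins.

    Assumes every utility is in [0, 1] and is a multiple of `resolution`.
    With base = 1 + 1/resolution, one quantization step on head k outweighs
    the maximum possible total contribution of all lower-priority heads.
    """
    base = 1.0 + 1.0 / resolution
    return [base ** (num_heads - 1 - k) for k in range(num_heads)]


def combined_utility(utilities, weights):
    """Scalar score whose ordering matches lexicographic comparison."""
    return sum(w * u for w, u in zip(weights, utilities))


weights = lexicographic_weights(len(HEADS), resolution=0.1)
slightly_deferent = (0.1, 0.0, 0.0, 0.0, 0.0)
max_everything_else = (0.0, 1.0, 1.0, 1.0, 1.0)
# The smallest possible gain in deference beats maximal scores on all other heads.
print(combined_utility(slightly_deferent, weights) >
      combined_utility(max_everything_else, weights))
```

Under these assumptions, the weighted sum orders outcomes the same way as comparing the utility tuples lexicographically, so no amount of task reward can buy a reduction in a higher-priority head.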
Article 32
Title@2025-07-28 (1): Contrastive learning-based agent modeling for deep reinforcement learning
Title: Contrastive learning-based agent modeling for deep reinforcement learning | Kontrastive Learning-basierte Agentenmodellierung für tiefe Verstärkungs-Lernen | 用于深强化学习的反向学习代理模型模型 2401.00132v3 |
Authors (5): Wenhao Ma, Yu-Cheng Chang, Jie Yang, Yu-Kai Wang, Chin-Teng Lin
Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems, as this is the means by which the ego agent understands other agents’ behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent’s adaptive policy which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from other agents (modeled agents) during training or a long observation trajectory for policy adaption. To remove these constrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. Our experiments demonstrate that our approach achieves state-of-the-art on both cooperative and competitive tasks, highlighting the potential of contrastive learning-based agent modeling for enhancing reinforcement learning.
多试剂系统往往要求代理人与具有不同目标、行为或战略的其他代理人合作或竞争。在设计多试剂系统中智能机器代理人的适应性政策时,代理模型至关重要,因为这是自我代理理解其他代理人行为并提取其有意义的政策表述的手段。这些表述可用来加强自我代理的适应性政策,该政策通过强化学习得到培训。然而,现有的代理模型方法通常假定在培训期间或其他代理人(示范代理人)能够提供当地观测结果,或为适应政策提供长长的观察轨迹。为了消除这些限制性假设并改进代理模型的性能,我们设计了一种基于对抗性学习的代理模型(CLAM)方法,该方法仅依赖自我代理在培训和执行期间的当地观察。有了这些观察,CLAM能够从每集训开始就实时生成一致的高质量政策表述。我们评估了我们在合作性和竞争性多试剂环境中的方法的有效性。我们的实验表明,我们的方法在合作性和竞争性任务上都达到了最先进的水平,同时强调以对比性学习为基础的代理人强化学习模式的潜力。
Article 33
Title@2025-07-27 (7): Real-Time LaCAM for Real-Time MAPF
Title: Real-Time LaCAM for Real-Time MAPF | Echtzeit-LaCAM für Echtzeit-MAPF | 实时MAPF的实时拉卡姆 2504.06091v2 |
Authors (5): Runzhe Liang, Rishi Veerapaneni, Daniel Harabor, Jiaoyang Li, Maxim Likhachev
The vast majority of Multi-Agent Path Finding (MAPF) methods with completeness guarantees require planning full-horizon paths. However, planning full-horizon paths can take too long and be impractical in real-world applications. Instead, real-time planning and execution, which only allows the planner a finite amount of time before executing and replanning, is more practical for real-world multi-agent systems. Several methods utilize real-time planning schemes but none are provably complete, which leads to livelock or deadlock. Our main contribution is Real-Time LaCAM, the first Real-Time MAPF method with provable completeness guarantees. We do this by leveraging LaCAM (Okumura 2023) in an incremental fashion. Our results show how we can iteratively plan for congested environments with a cutoff time of milliseconds while still maintaining the same success rate as full-horizon LaCAM. We also show how it can be used with a single-step learned MAPF policy.
绝大多数具有完整保障的多行为者路径查找方法(MAPF)都要求规划全正方位路径。然而,规划全正方位路径可能花费太长,在现实世界应用中可能不切实际。相反,实时规划和实施(这只允许规划者在执行和重新规划之前有一定的时间)对于现实世界的多剂系统更为实用。几种方法使用实时规划计划,但没有一种方法可以确定地完成,从而导致僵持或僵局。我们的主要贡献是实时拉卡姆(Real-Time LaCAM),这是第一个具有可确认完整性保证的实时MAPF方法。我们通过逐步利用LACAM(Okumura 2023)来做到这一点。我们的结果表明,我们如何在保持与全方位拉卡姆(LACAM)相同的成功率的同时,以毫秒的截断点时间来反复规划凝固的环境。我们还展示了如何在单步学习的MAPF政策中使用它。
Article 34
Title@2025-07-27 (7): MLC-Agent: Cognitive Model based on Memory-Learning Collaboration in LLM Empowered Agent Simulation Environment
Title: MLC-Agent: Cognitive Model based on Memory-Learning Collaboration in LLM Empowered Agent Simulation Environment | MLC-Agent: Kognitives Modell auf Basis von Memory-Learning Collaboration in LLM Empowered Agent Simulation Environment | MLC-Agent:基于LLM赋能代理模拟环境中记忆-学习协作的认知模型 2507.20215v1 |
Authors (4): Ming Zhang, Yiling Xuan, Qun Ma, Yuwei Guo
Many real-world systems, such as transportation systems, ecological systems, and Internet systems, are complex systems. As an important tool for studying complex systems, computational experiments can map them into artificial society models that are computable and reproducible within computers, thereby providing digital and computational methods for quantitative analysis. In current research, the construction of individual agent models often ignores the long-term accumulative effect of memory mechanisms in the development process of agents, which to some extent causes the constructed models to deviate from the real characteristics of real-world systems. To address this challenge, this paper proposes an individual agent model based on a memory-learning collaboration mechanism, which implements hierarchical modeling of the memory mechanism and a multi-indicator evaluation mechanism. Through hierarchical modeling of the individual memory repository, the group memory repository, and the memory buffer pool, memory can be effectively managed, and knowledge sharing and dissemination between individuals and groups can be promoted. At the same time, the multi-indicator evaluation mechanism enables dynamic evaluation of memory information, allowing dynamic updates of information in the memory set and promoting collaborative decision-making between memory and learning. Experimental results show that, compared with existing memory modeling methods, the agents constructed by the proposed model demonstrate better decision-making quality and adaptability within the system. This verifies the effectiveness of the individual agent model based on the memory-learning collaboration mechanism proposed in this paper in improving the quality of individual-level modeling in artificial society modeling and achieving anthropomorphic characteristics.
许多实际世界系统,如运输系统、生态系统和因特网系统,都是复杂的系统。作为研究复杂系统的一个重要工具,计算实验可以将它们映射成在计算机内可比较和可复制的人工社会模型,从而为定量分析提供数字和计算方法。在目前的研究中,单个代理模型的构建往往忽视代理人发展进程中记忆机制的长期累积效应,这在某种程度上导致所建模型偏离真实世界系统的真正特征。为了应对这一挑战,本文件提议了一个以记忆学习协作机制为基础的个体代理模型,用于对记忆机制和多指标评估机制进行分级建模和多指标级评价机制。通过个人记忆储存库、群体记忆储存库和记忆缓冲库的等级建模,可以有效管理记忆,个人和群体之间的知识共享和传播可以促进。与此同时,多指标评价机制使所建模型能够动态地评估记忆信息,允许个人记忆系统中的动态更新,并促进记忆和学习之间的协作决策。实验结果显示,在构建个人记忆储存储存机制、群体记忆储存库和记忆储存储存储存库和记忆缓冲积积存机制中,与构建现有模型化机制的改进了个人学习机制的模型,从而改进了个人学习机制。
Article 35
Title@2025-07-27 (7): ADL: A Declarative Language for Agent-Based Chatbots
Title: ADL: A Declarative Language for Agent-Based Chatbots | ADL: Eine deklarative Sprache für agentenbasierte Chatbots | ADL: 代理查博特人的宣布语言 2504.14787v2 |
Authors (2): Sirui Zeng, Xifeng Yan
There are numerous frameworks capable of creating and orchestrating agents to address complex tasks. However, most of them tightly couple Python programming with agent declaration, which makes maintenance and runtime optimization difficult. In this work, we introduce ADL, an agent declarative language for customer service chatbots. ADL abstracts away implementation details, offering a declarative way to define agents and their interactions, which could ease maintenance and debugging. It also incorporates natural language programming at its core to simplify the specification and communication of chatbot designs. ADL includes four basic types of agents and supports integration with custom functions, tool use, and third-party agents. MICA, a multi-agent system designed to interpret and execute ADL programs, has been developed and is now available as an open-source project at https://github.com/Mica-labs/MICA. Its documentation can be found at https://mica-labs.github.io/.
有许多框架能够创建和编排代理以处理复杂任务。然而,其中大多数将Python编程与代理声明紧密耦合,使维护和运行时优化变得困难。在这项工作中,我们引入了ADL,这是供客户服务聊天员使用的宣讲语言。ADL摘要删除了实施细节,提供了界定代理人及其互动的宣示方式,这可以方便维护和调试。它还将自然语言编程纳入其核心,以简化聊天员设计的规格和通信。ADL包括四种基本类型的代理人,并支持与定制功能、工具使用和第三方代理人的整合。MICA是一个多剂系统,旨在解释和执行ADL程序,已经开发出来,现在作为开放源项目在https://github.com/Mica-labs/MICA上提供。其文件可在https://mica-labs.github.io/上查阅。
Article 36
Title@2025-07-27 (7): Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models
Title: Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models | Lokale Prompt-Anpassung für stilkonsistente Multi-Object-Generierung in Diffusions-Modellen | 在传播模型中为样式一致多对象生成发布模式进行本地快速适应 2507.20094v1 |
Authors (1): Ankit Sanjyal
Diffusion models have become a powerful backbone for text-to-image generation, enabling users to synthesize high-quality visuals from natural language prompts. However, they often struggle with complex prompts involving multiple objects and global or local style specifications. In such cases, the generated scenes tend to lack style uniformity and spatial coherence, limiting their utility in creative and controllable content generation. In this paper, we propose a simple, training-free architectural method called Local Prompt Adaptation (LPA). Our method decomposes the prompt into content and style tokens, and injects them selectively into the U-Net’s attention layers at different stages. By conditioning object tokens early and style tokens later in the generation process, LPA enhances both layout control and stylistic consistency. We evaluate our method on a custom benchmark of 50 style-rich prompts across five categories and compare against strong baselines including Composer, MultiDiffusion, Attend-and-Excite, LoRA, and SDXL. Our approach outperforms prior work on both CLIP score and style consistency metrics, offering a new direction for controllable, expressive diffusion-based generation.
传播模型已成为文字到图像生成的强大支柱,使用户能够从自然语言的提示中合成高质量的视觉,但是它们往往与涉及多个物体和全球或当地风格规格的复杂提示进行斗争;在这种情况下,产生的场景往往缺乏风格统一和空间一致性,限制了其在创造性和控制性内容生成中的效用;在本文中,我们提出了一个简单的、不培训的建筑方法,称为“本地即时适应 ” (LPA) 。我们的方法将快速信号分解成内容和风格符号,并有选择地将其注入U-Net的不同阶段的注意层。通过在生成过程中稍后调整对象早期和风格标志,LPA将增强布局控制和风格一致性。我们评估了我们关于五类50种丰富风格提示的定制基准的方法,并与包括集成器、多发集、出和Excite、LORA和SDXL等强基线进行比较。我们的方法超越了以前关于CLIP分数和风格一致性度度度度测量的工作,为可控、直观的传播新一代提供了新的方向。
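The early-object, late-style conditioning described in the abstract above can be sketched as a simple schedule. This is only an illustration under strong simplifying assumptions: the switch is modelled as a single cutoff over denoising steps, whereas the paper injects tokens into U-Net attention layers at different stages; the prompt split is supplied by hand; and `tokens_for_step` and `switch_fraction` are hypothetical names, not part of LPA's published interface.

```python
def tokens_for_step(step, total_steps, object_tokens, style_tokens,
                    switch_fraction=0.5):
    """Return the prompt tokens to condition on at a given denoising step.

    Early steps (layout formation) see only object tokens; later steps
    (appearance refinement) see only style tokens. The hard cutoff is an
    assumption of this sketch.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    if step < switch_fraction * total_steps:
        return list(object_tokens)
    return list(style_tokens)


# Hand-made split of a prompt into content and style parts (illustrative only).
objects = ["a", "cat", "and", "a", "teapot"]
style = ["in", "watercolor", "style"]
schedule = [tokens_for_step(t, 10, objects, style) for t in range(10)]
```

With ten steps and the default cutoff, steps 0 to 4 are conditioned on the object tokens and steps 5 to 9 on the style tokens.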
Article 37
Title@2025-07-26 (6): Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning
Title: Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning | Multi-Agenten-Verstärkungs-Lernen mit großflächiger Mixed-Traffic- und Intersektionskontrolle | 利用多剂强化学习系统进行大型混合运输和跨部门控制 2504.04691v2 |
Authors (5): Songyang Liu, Muyang Fan, Weizi Li, Jing Du, Shuai Li
Traffic congestion remains a significant challenge in modern urban networks. Autonomous driving technologies have emerged as a potential solution. Among traffic control methods, reinforcement learning has shown superior performance over traffic signals in various scenarios. However, prior research has largely focused on small-scale networks or isolated intersections, leaving large-scale mixed traffic control largely unexplored. This study presents the first attempt to use decentralized multi-agent reinforcement learning for large-scale mixed traffic control in which some intersections are managed by traffic signals and others by robot vehicles. Evaluating a real-world network in Colorado Springs, CO, USA with 14 intersections, we measure traffic efficiency via average waiting time of vehicles at intersections and the number of vehicles reaching their destinations within a time window (i.e., throughput). At 80% RV penetration rate, our method reduces waiting time from 6.17s to 5.09s and increases throughput from 454 vehicles per 500 seconds to 493 vehicles per 500 seconds, outperforming the baseline of fully signalized intersections. These findings suggest that integrating reinforcement learning-based control into large-scale traffic can improve overall efficiency and may inform future urban planning strategies.
在现代城市网络中,交通堵塞仍然是一项巨大的挑战。在交通控制方法中,自动化驾驶技术已经成为一个潜在的解决方案。在交通控制方法中,强化学习显示在各种情况下比交通信号表现优异。然而,先前的研究主要侧重于小型网络或孤立的交叉点,使得大规模混合交通控制基本上没有被探索。本研究首次尝试利用分散式多剂强化学习,以大规模混合交通控制,其中某些交叉点由交通信号和其他机器人车辆管理。在科罗拉多斯普林斯、CO、美国14个交叉点评估一个真实世界网络,我们通过车辆在交叉点的平均等候时间和在时间窗口内到达目的地的车辆数量来衡量交通效率(即吞吐量 ) 。 80%的RV渗透率将等候时间从6.17到5.09秒缩短到每500秒454辆汽车的超载量增加,超过了完全信号化的交叉点的基线。这些研究结果表明,整合强化学习控制大规模交通可以提高总体效率,并可能为未来的城市规划战略提供信息。
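As a quick check on the figures reported in the abstract above, the relative improvements can be worked out directly. The helper name `relative_change` is introduced here for illustration; the input numbers are the ones stated in the abstract.

```python
def relative_change(before, after):
    """Signed relative change from `before` to `after` (e.g. -0.175 means -17.5%)."""
    return (after - before) / before


# Figures reported at 80% RV penetration.
waiting = relative_change(6.17, 5.09)      # average waiting time, seconds
throughput = relative_change(454, 493)     # vehicles per 500 seconds

print(f"waiting time: {waiting * 100:.1f}%")
print(f"throughput:   {throughput * 100:+.1f}%")
```

This works out to roughly a 17.5% reduction in average waiting time and an 8.6% increase in throughput relative to the fully signalized baseline.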
Article 38
Title@2025-07-26 (6): Homotopy-aware Multi-agent Navigation via Distributed Model Predictive Control
Title: Homotopy-aware Multi-agent Navigation via Distributed Model Predictive Control | Homotopy-aware Multi-Agent Navigation über verteilte Modell Predictive Control | 通过分布式模型预测控制,通过分布式预测控制进行多剂导航 2507.19860v1 |
Authors (4): Haoze Dong, Meng Guo, Chengyi He, Zhongkui Li
Multi-agent trajectory planning requires ensuring both safety and efficiency, yet deadlocks remain a significant challenge, especially in obstacle-dense environments. Such deadlocks frequently occur when multiple agents attempt to traverse the same long and narrow corridor simultaneously. To address this, we propose a novel distributed trajectory planning framework that bridges the gap between global path and local trajectory cooperation. At the global level, a homotopy-aware optimal path planning algorithm is proposed, which fully leverages the topological structure of the environment. A reference path is chosen from distinct homotopy classes by considering both its spatial and temporal properties, leading to improved coordination among agents globally. At the local level, a model predictive control-based trajectory optimization method is used to generate dynamically feasible and collision-free trajectories. Additionally, an online replanning strategy ensures its adaptability to dynamic environments. Simulations and experiments validate the effectiveness of our approach in mitigating deadlocks. Ablation studies demonstrate that by incorporating time-aware homotopic properties into the underlying global paths, our method can significantly reduce deadlocks and improve the average success rate from 4%-13% to over 90% in randomly generated dense scenarios.
多试剂轨迹规划既需要确保安全又需要效率,但僵局仍然是一项重大挑战,特别是在障碍环境中。当多种物剂试图同时穿越同一长窄走廊时,这种僵局经常发生。为了解决这个问题,我们提议了一个新的分布式轨迹规划框架,以弥合全球路径与当地轨迹合作之间的差距。在全球一级,提出了一个同质意识最佳路径规划算法,以充分利用环境的地形结构。一个参考路径是从不同的同质层中选择的,考虑到其空间和时间特性,从而导致改善全球物剂之间的协调。在地方一级,一种基于控制的模型轨迹优化方法被用来产生动态可行和无碰撞的轨迹。此外,一个在线再规划战略确保其适应动态环境。模拟和实验证实了我们在缓解僵局方面的做法的有效性。 模拟和实验研究表明,通过将有时识的同质特性纳入基本的全球路径,我们的方法可以大大减少僵局,并将平均成功率从4%-13%提高到90%以上。
Article 39
Title@2025-07-26 (6): VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets
Title: VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets | VAE-GAN-basierte Preismanipulation in koordinierten lokalen Energiemärkten | VAE-GAN 协调的地方能源市场价格操纵 2507.19844v1 |
Authors (6): Biswarup Mukherjee, Li Zhou, S. Gokul Krishnan, Milad Kabirifar, Subhash Lakshminarayana, Charalambos Konstantinou
This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs), participating in the local energy market (LEM) that interacts with the market-clearing entity. The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG) framework, enabling prosumers to make real-time decisions on whether to buy, sell, or refrain from any action while facilitating efficient coordination for optimal energy trading in a dynamic market. In addition, we investigate a price manipulation strategy using a variational auto encoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers. Our results show that under adversarial pricing, heterogeneous prosumer groups, particularly those lacking generation capabilities, incur financial losses. The same outcome holds across LEMs of different sizes. As the market size increases, trading stabilizes and fairness improves through emergent cooperation among agents.
本文介绍了一种模式,用以与分散的多样化能源资源(DERs)进行协调,参与与市场清算实体互动的地方能源市场(LEM),拟议的LEM计划采用基于多剂深度确定型政策梯度(MADDPG)框架的数据驱动、无模型强化学习方法,使passer能够就是否购买、销售或不采取任何行动作出实时决定,同时促进在动态市场中进行最佳能源贸易的有效协调。此外,我们还调查了一种价格操纵战略,采用变式汽车编码-遗传对抗网络(VAE-GAN)模式,使公用事业能够调整价格信号,从而给计票人带来财政损失。我们的结果显示,在对抗性定价下,不同规模的多式的造价集团,尤其是那些缺乏发电能力的集团,都造成了财政损失。随着市场规模的扩大,交易稳定和公平性通过代理商之间的新兴合作而得到改善。
Article 40
Title@2025-07-26 (6): Moving Out: Physically-grounded Human-AI Collaboration
Title: Moving Out: Physically-grounded Human-AI Collaboration | Ausstieg: physikalisch begründete Mensch-AI-Kollaboration | 搬出:基于身体的人类-AI协作 2507.18623v2 |
Authors (5): Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo
The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models’ abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
能够适应环境中的物理行动和限制,对于装饰物剂(如机器人)有效与人类合作至关重要。这种有物理基础的人类-AI合作必须说明持续国家行动空间的日益复杂性和因物理限制造成的受限动态。在本论文中,我们引入了一个新的人类-AI合作基准,类似于受物理属性和限制影响的各种合作模式,例如将重物一起移动,并保持一致的行动,在角落周围移动一个大物品。我们通过搬出,设计了两项任务,并收集了人与人的互动数据,以评估模型适应不同人类行为和看不见的物理特征的能力。为了应对物理环境中的挑战,我们提出了一种新颖的方法,即BASS(行为增强、模拟和选择),以加强代理人的多样性和他们对行动结果的理解。我们的实验显示,BASS超越了AI和人类-AI合作中的状态-艺术模型。项目网页见https://live-robotics-uva.githubub.io/moving_a_a/。
Article 41
Title@2025-07-26 (6): Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation
Title: Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation | Assembly Your Crew: Automatisches Multi-Agenten-Kommunikationstopologie-Design über autoregressive Graphen-Generierung | 通过自动递减图形生成将您的组群组合成:自动多剂多剂通信地形设计 2507.18224v2 |
Authors (5): Shiyuan Li, Yixin Liu, Qingsong Wen, Chengqi Zhang, Shirui Pan
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. The effectiveness of MAS is critically dependent on its collaboration topology, which has become a focal point for automated design research. However, existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures, significantly limiting their adaptability to task-specific requirements. To address these limitations, we reframe MAS design as a conditional autoregressive graph generation task, where both the system composition and structure are designed jointly. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch. Conditioned on a natural language task query, ARG-Designer sequentially and dynamically determines the required number of agents, selects their appropriate roles from an extensible pool, and establishes the optimal communication links between them. This generative approach creates a customized topology in a flexible and extensible manner, precisely tailored to the unique demands of different tasks. Extensive experiments across six diverse benchmarks demonstrate that ARG-Designer not only achieves state-of-the-art performance but also enjoys significantly greater token efficiency and enhanced extensibility. The source code of ARG-Designer is available at https://github.com/Shiy-Li/ARG-Designer.
以大型语言模型为基础的多试剂系统(MAS)已经成为处理不同领域复杂问题的有力解决办法,而MAS的有效性则主要取决于其协作型态,而后者已成为自动化设计研究的协调中心,然而,现有办法基本上受到制约,因为它们依赖一个模板图修改模式,其中含有一套预先定义的代理和硬编码的互动结构,大大限制其适应特定任务要求的能力。为了解决这些限制,我们重新将MAS设计作为有条件的自动递增图形生成任务,其中系统构成和结构是联合设计的。我们建议了ARG-Deleter,这是一种新的自动递增模式,从零开始构建合作图案,使这一模式运作起来。在自然语言任务查询、ARG-Deder按顺序和动态确定所需数量,从一个可扩展的集合中选择其适当作用,并在它们之间建立最佳的通信联系。这种归正式方法以灵活和可扩展的方式创建了一种定制的表层,不完全适应不同任务的独特要求。在六种不同任务上进行广泛的跨比例实验,在六个不同层次上也具有更高的业绩基准。
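The sequential generation loop described in the abstract above (add an agent, pick its role, then decide its links to earlier agents, until the model stops) can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate_topology`, `toy_next_role`, `toy_link`, and `ROLE_POOL` are hypothetical names, and the toy rules stand in for the learned autoregressive model, whose actual interface is not given in the abstract.

```python
from typing import Callable, List, Optional, Tuple

# A topology is a list of agent roles plus directed edges (source index, target index).
Graph = Tuple[List[str], List[Tuple[int, int]]]


def generate_topology(
    task: str,
    next_role: Callable[[str, List[str]], Optional[str]],
    link: Callable[[str, List[str], int, int], bool],
    max_agents: int = 8,
) -> Graph:
    """Build a collaboration graph one agent at a time, conditioned on the task."""
    roles: List[str] = []
    edges: List[Tuple[int, int]] = []
    while len(roles) < max_agents:
        role = next_role(task, roles)  # None plays the part of a "stop" token
        if role is None:
            break
        new = len(roles)
        roles.append(role)
        # Decide, for every earlier agent, whether it should send messages to the new one.
        for prev in range(new):
            if link(task, roles, prev, new):
                edges.append((prev, new))
    return roles, edges


# Hypothetical stand-ins for the learned model (assumption, not the paper's method).
ROLE_POOL = ["planner", "coder", "tester"]


def toy_next_role(task: str, roles: List[str]) -> Optional[str]:
    wanted = ROLE_POOL if "code" in task else ROLE_POOL[:1]
    return wanted[len(roles)] if len(roles) < len(wanted) else None


def toy_link(task: str, roles: List[str], src: int, dst: int) -> bool:
    return dst == src + 1  # simple chain: each agent talks to its successor


if __name__ == "__main__":
    print(generate_topology("write code for a parser", toy_next_role, toy_link))
```

Passing the role and link decisions in as callables keeps the loop independent of any particular model; a trained generator would replace the two toy functions.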
Article 42
Title@2025-07-25 (5): Ultracoarse Equilibria and Ordinal-Folding Dynamics in Operator-Algebraic Models of Infinite Multi-Agent Games
Title: Ultracoarse Equilibria and Ordinal-Folding Dynamics in Operator-Algebraic Models of Infinite Multi-Agent Games | Ultracoarse Equilibria und Ordinal-Folding-Dynamik in Operator-Algebraische Modelle von unendlichen Multi-Agent-Spiele | 无限多生运动会操作者-代数模型中的超粗平衡和奥地平流和奥地硬化动态 2507.19694v1 |
Authors (4): Faruk Alpay, Hamdi Alakkad, Bugra Kilictas, Taylan Alpay
We develop an operator-algebraic framework for infinite games with a continuum of agents and prove that regret-based learning dynamics governed by a noncommutative continuity equation converge to a unique quantal response equilibrium under mild regularity assumptions. The framework unifies functional analysis, coarse geometry and game theory by assigning to every game a von Neumann algebra that represents collective strategy evolution. A reflective regret operator within this algebra drives the flow of strategy distributions, and its fixed point characterises equilibrium. We introduce the ordinal folding index, a computable ordinal-valued metric that measures the self-referential depth of the dynamics, and show that it bounds the transfinite time needed for convergence, collapsing to zero on coarsely amenable networks. The theory yields new invariant subalgebra rigidity results, establishes existence and uniqueness of envy-free and maximin share allocations in continuum economies, and links analytic properties of regret flows with empirical stability phenomena in large language models. These contributions supply a rigorous mathematical foundation for large-scale multi-agent systems and demonstrate the utility of ordinal metrics for equilibrium selection.
我们为无限游戏开发了一个操作者代数框架,其中含有一系列物剂,并证明由非混合连续性方程式制约的基于遗憾的学习动态,在温和的规律假设下,会与独特的二次反应平衡相汇而成。这个框架将功能分析、粗度几何和游戏理论统一起来,为每个游戏分配了代表集体战略演变的冯纽曼代数。这个代数中的反射式算数操作者将战略分布流动及其固定点特征平衡驱动。我们引入了折叠指数,这是一个可比较的或非常规的有价值指标,测量动态的自优深度,并表明它将交汇所需的半成时时间捆绑在一起,在粗度易变的网络上达到零。这个理论产生新的异性亚代数僵硬性结果,确定了连续经济中嫉妒自由的和最大份额分配的存在和独特性,并将遗憾流的分析性特性与大型语言模型中的经验稳定现象联系起来。这些贡献为大规模多剂系统提供了严格的数学基础,并展示了或非量度度度度指标对平衡选择的效用。
Article 43
Title@2025-07-25 (5): Hypergames: Modeling Misaligned Perceptions and Nested Beliefs for Multi-agent Systems
Title: Hypergames: Modeling Misaligned Perceptions and Nested Beliefs for Multi-agent Systems | Hypergames: Modellierung falscher Wahrnehmungen und verschachtelter Überzeugungen für Multi-Agent-Systeme | 超游戏:模拟多试剂系统的错误观念和信仰 2507.19593v1 |
Authors (3): Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis
Classical game-theoretic models typically assume rational agents, complete information, and common knowledge of payoffs - assumptions that are often violated in real-world MAS characterized by uncertainty, misaligned perceptions, and nested beliefs. To overcome these limitations, researchers have proposed extensions that incorporate models of cognitive constraints, subjective beliefs, and heterogeneous reasoning. Among these, hypergame theory extends the classical paradigm by explicitly modeling agents’ subjective perceptions of the strategic scenario, known as perceptual games, in which agents may hold divergent beliefs about the structure, payoffs, or available actions. We present a systematic review of agent-compatible applications of hypergame theory, examining how its descriptive capabilities have been adapted to dynamic and interactive MAS contexts. We analyze 44 selected studies from cybersecurity, robotics, social simulation, communications, and general game-theoretic modeling. Building on a formal introduction to hypergame theory and its two major extensions - hierarchical hypergames and HNF - we develop agent-compatibility criteria and an agent-based classification framework to assess integration patterns and practical applicability. Our analysis reveals prevailing tendencies, including the prevalence of hierarchical and graph-based models in deceptive reasoning and the simplification of extensive theoretical frameworks in practical applications. We identify structural gaps, including the limited adoption of HNF-based models, the lack of formal hypergame languages, and unexplored opportunities for modeling human-agent and agent-agent misalignment. By synthesizing trends, challenges, and open research directions, this review provides a new roadmap for applying hypergame theory to enhance the realism and effectiveness of strategic modeling in dynamic multi-agent environments.
典型的游戏理论模式通常假定理性的动力、完整的信息和共同的回报知识,这些假设往往在现实世界的MAS中被违反,其特点是不确定性、错误的观念和嵌入的信仰。为了克服这些限制,研究人员提议了包含认知限制、主观信仰和各种推理模型的扩展。在这些限制中,超游戏理论扩展了经典模式,明确模拟剂对战略情景的主观认识,称为概念游戏,其中代理剂可能对结构、付款或现有行动持有不同的看法,从而可能对结构、付款或基于代理剂的分类框架持有不同的看法。我们系统地审查超游戏理论的可与代理剂兼容的应用,研究其描述性能力是如何适应动态和互动的MAS环境的。我们分析了网络、机器人、社会模拟、通信和一般游戏理论模型模型的44项选定研究。 在超游戏理论理论理论理论理论及其两大扩展的基础上,我们开发了代理商的兼容性模型标准和基于代理剂的分类框架,以评估整合模式和实际适用性。我们的分析揭示了当前的趋势,包括基于等级和图表的模型的模型的流行性,通过动态的动态理论理论分析,在理解性理论推理学和简化中,我们查明了在结构推理学和简化中采用不甚深的模型方面的机会。
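To make the notion of perceptual games in the abstract above concrete, here is a minimal sketch in which two agents each hold their own payoff table for the same interaction and best-respond inside that subjective view. The payoff numbers, the action labels, and the helper `best_response` are all hypothetical and are not taken from the reviewed studies.

```python
from typing import Dict, Tuple

# (row action, column action) -> (row payoff, column payoff), as perceived by one agent.
Payoffs = Dict[Tuple[str, str], Tuple[float, float]]


def best_response(game: Payoffs, player: int, opponent_action: str) -> str:
    """Best action for `player` (0 = row, 1 = column) within its own perceived game."""
    actions = sorted({key[player] for key in game})

    def payoff(action: str) -> float:
        key = (action, opponent_action) if player == 0 else (opponent_action, action)
        return game[key][player]

    return max(actions, key=payoff)


# The row agent believes it is playing a coordination game.
row_view: Payoffs = {
    ("A", "A"): (2, 2), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (1, 1),
}
# The column agent perceives different payoffs: for it, B is always better.
col_view: Payoffs = {
    ("A", "A"): (2, 0), ("A", "B"): (0, 1),
    ("B", "A"): (0, 0), ("B", "B"): (1, 3),
}

row_action = best_response(row_view, 0, "A")  # row expects the column agent to play A
col_action = best_response(col_view, 1, "A")  # column expects the row agent to play A

if __name__ == "__main__":
    print(row_action, col_action, row_view[(row_action, col_action)])
```

Each agent acts rationally within its own view, yet the joint outcome is one the row agent did not anticipate; that gap between perceived games is the kind of misalignment hypergame models are meant to describe.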
Article 44
Title@2025-07-25 (5): MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization
Title: MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization | MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation mit Backend Aware Syntheseoptimierung | MCP4EDA: LLM 授权示范背景议定书RTL-GDSII 2507.19570v1 |
Authors (6): Yiting Wang, Wanghao Ye, Yexiao He, Yiran Chen, Gang Qu, Ang Li
This paper presents MCP4EDA, the first Model Context Protocol server that enables Large Language Models (LLMs) to control and optimize the complete open-source RTL-to-GDSII design flow through natural language interaction. The system integrates Yosys synthesis, Icarus Verilog simulation, OpenLane place-and-route, GTKWave analysis, and KLayout visualization into a unified LLM-accessible interface, enabling designers to execute complex multi-tool EDA workflows conversationally via AI assistants such as Claude Desktop and Cursor IDE. The principal contribution is a backend-aware synthesis optimization methodology wherein LLMs analyze actual post-layout timing, power, and area metrics from OpenLane results to iteratively refine synthesis TCL scripts, establishing a closed-loop optimization system that bridges the traditional gap between synthesis estimates and physical implementation reality. In contrast to conventional flows that rely on wire-load models, this methodology leverages real backend performance data to guide synthesis parameter tuning, optimization sequence selection, and constraint refinement, with the LLM functioning as an intelligent design space exploration agent. Experimental evaluation on representative digital designs demonstrates 15-30% improvements in timing closure and 10-20% area reduction compared to default synthesis flows, establishing MCP4EDA as the first practical LLM-controlled end-to-end open-source EDA automation system. The code and demo are available at: http://www.agent4eda.com/
本文件介绍MCP4EDA,这是第一个使大语言模型(LLMS)能够通过自然语言互动控制和优化完整的开放源码 RTL-到GDSII设计流程的模拟背景协议服务器。该系统整合了Yosys合成、Icarus Verilog模拟、OpenLane place-and-route、GTKWave分析、KLayout可视化成一个统一的LLMM-可访问界面,使设计师能够通过诸如Claude桌面和Cursor IDE等AI助理进行复杂的多工具 EDA工作流程对话。主要贡献是一个后端综合优化合成优化方法,其中LMS分析OpenLane结果中的实际延期后时间、权力和地区指标,以迭接方式完善TCLOCR脚本,建立一个闭式优化系统与实际操作的LMM-S-S-SLA系统之间传统差距,与依赖线载模型的常规流动形成对照,这种方法利用真正的后端性业绩数据来指导合成参数的调整、优化序列选择选择和制约性改进,LM-20-20号合成系统作为智能的透明系统,在智能的LMA-ralde-ralde-ration-ral-rental 流中运行中运行中,测试中,将第15A 将自动演示流中,将自动演示的缩缩缩缩缩算。实验性分析。
Article 45
Title@2025-07-25 (5): Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges
Title: Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges | Integration von LLM in agentenbasierte Sozialsimulation: Chancen und Herausforderungen | 将LLM纳入代理社会模拟:机会与挑战 2507.19364v1 |
Authors (6): Patrick Taillandier, Jean Daniel Zucker, Arnaud Grignard, Benoit Gaudou, Nghi Quang Huynh, Alexis Drogoul
This position paper examines the use of Large Language Models (LLMs) in social simulation, analyzing both their potential and their limitations from a computational social science perspective. The first part reviews recent findings on the ability of LLMs to replicate key aspects of human cognition, including Theory of Mind reasoning and social inference, while also highlighting significant limitations such as cognitive biases, lack of true understanding, and inconsistencies in behavior. The second part surveys emerging applications of LLMs in multi-agent simulation frameworks, focusing on system architectures, scale, and validation strategies. Notable projects such as Generative Agents (Smallville) and AgentSociety are discussed in terms of their design choices, empirical grounding, and methodological innovations. Particular attention is given to the challenges of behavioral fidelity, calibration, and reproducibility in large-scale LLM-driven simulations. The final section distinguishes between contexts where LLMs, like other black-box systems, offer direct value (such as interactive simulations and serious games) and those where their use is more problematic, notably in explanatory or predictive modeling. The paper concludes by advocating for hybrid approaches that integrate LLMs into traditional agent-based modeling platforms (GAMA, NetLogo, etc.), enabling modelers to combine the expressive flexibility of language-based reasoning with the transparency and analytical rigor of classical rule-based systems.
本立场文件审查了在社会模拟中使用大语言模型(LLMs)的情况,从计算社会科学的角度分析了这些模型的潜力和局限性。第一部分审查了最近关于LLMs复制人类认知关键方面的能力的调查结果,包括思维推理理论和社会推论,同时也强调了认知偏差、缺乏真正理解和行为不一致等重大局限性。第二部分调查了LLMs在多试模拟框架中的新应用,重点是系统结构、规模和验证战略。一些著名的项目,如创用剂(Smallville)和代理社,从设计选择、经验基础化和方法创新的角度来讨论这些项目。尤其注意行为忠诚、校正和在大规模LMMM驱动模拟中重新展示等挑战。最后一节区分了LMs在多试模拟框架中的新应用,侧重于系统、互动模拟和严肃的游戏和那些其使用更成问题的项目,主要是在解释或预测模型方面。文件的结论是,通过倡导混合分析性方法,将LMMMA规则的透明性模型和传统分析性模型相结合。
Article 46
Title@2025-07-25 (5): Exploring 6G Potential for Industrial Digital Twinning and Swarm Intelligence in Obstacle-Rich Environments
Title: Exploring 6G Potential for Industrial Digital Twinning and Swarm Intelligence in Obstacle-Rich Environments | 6G-Potenzial für industrielle digitale Twinnings und Schwarmintelligenz in Hindernis-Rich-Umgebungen erkunden | 探索6G潜力,以工业数字结对和摇篮情报在奥斯塔克 – – 里希环境方面的潜力 2406.19930v3 |
Authors (5): Siyu Yuan, Khurshid Alam, Bin Han, Dennis Krummacker, Hans D. Schotten
With the advent of Sixth Generation (6G) technology, the demand for efficient and intelligent systems in industrial applications has surged, driving the need for advanced solutions in target localization. Utilizing swarm robots to locate unknown targets involves navigating increasingly complex environments. A digital twin (DT) offers a robust solution by creating a virtual replica of the physical world, which enhances the swarm’s navigation capabilities. Our framework leverages DT and integrates swarm intelligence (SI) to store physical map information in the cloud, enabling robots to efficiently locate unknown targets. The simulation results demonstrate that the DT framework, augmented by SI, significantly improves target location efficiency in obstacle-rich environments compared to traditional methods. This research underscores the potential of combining DT and swarm intelligence to advance the field of robotic navigation and target localization in complex industrial settings.
随着第六代(6G)技术的出现,对工业应用中高效和智能系统的需求激增,导致在目标定位方面需要先进的解决方案。利用群机器人定位未知目标涉及探索日益复杂的环境。数字双胞胎(DT)提供了一个强有力的解决方案,通过创建一个虚拟复制的物理世界,增强群温导航能力。我们的框架利用DT和将群情智能(SI)结合到云中储存物理地图信息,使机器人能够有效地定位未知目标。模拟结果表明,由SI扩大的DT框架与传统方法相比,大大提高了障碍环境的目标定位效率。这一研究强调了将DT和群情智能结合到复杂的工业环境中推进机器人导航领域和目标定位的潜力。
Article 47
Title@2025-07-25 (5): ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination
Title: ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination | ReCoDe: Verstärktes Learning-basiertes dynamisches Constraint-Design für Multi-Agent-Koordination | ReCode:加强以学习为基础的强化学习,为多机构协调设计动态制约 2507.19151v1 |
Authors (6): Michael Amir, Guang Yang, Zhan Gao, Keisuke Okumura, Heedo Woo, Amanda Prorok
Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe (Reinforcement-based Constraint Design), a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example, by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.
以限制为基础的优化是机器人的基石,使设计控制器能够可靠地编码任务和安全要求,如避免碰撞或形成遵守碰撞等任务和安全要求。然而,手工制造的制约在需要复杂协调的多剂环境中可能失败。我们引入了基于RECode-加强的封闭式设计-一个分散化的混合框架,将基于优化的控制器的可靠性与多试剂强化学习的适应性结合起来。ReCode不是丢弃专家控制器,而是通过学习更多、动态的制约因素来改进它们,这些制约因素能够捕捉微妙的行为,例如通过限制代理器的移动来防止混乱情形中的拥堵。通过本地通信,代理器集体限制其在不断变化的条件下更有效协调的允许行动。在这项工作中,我们侧重于将RECode应用到需要复杂、基于背景的移动和共识的多剂导航任务中。我们展示它超越纯手工制造的控制器、其他混合方法和标准 MARL 基线。我们提供经验(真实的机器人)和理论证据,说明保留用户定义的控制器,即使它不完善,也可以依赖这种控制器的动态,特别是Con totracl。
Article 48
Title@2025-07-25 (5): Heterogeneous Risk Management Using a Multi-Agent Framework for Supply Chain Disruption Response
Title: Heterogeneous Risk Management Using a Multi-Agent Framework for Supply Chain Disruption Response | Heterogenes Risikomanagement mit Hilfe eines Multi-Agenten-Rahmens für die Reaktion auf Störungen der Lieferkette | 利用多机构框架应对供应链干扰的多机构应对框架进行不同不同的风险管理 2507.19049v1 |
Authors (5): Mingjie Bi, Juan-Alberto Estrada-Garcia, Dawn M. Tilbury, Siqian Shen, Kira Barton
In highly complex and stochastic global supply chain environments, local enterprise agents seek distributed and dynamic strategies for agile responses to disruptions. Existing literature explores both centralized and distributed approaches, but most work neglects temporal dynamics and the heterogeneity of the risk management of individual agents. To address this gap, this letter presents a heterogeneous risk management mechanism to incorporate uncertainties and risk attitudes into agent communication and decision-making strategy. Hence, this approach empowers enterprises to handle disruptions in stochastic environments in a distributed way, and in particular in the context of multi-agent control and management. Through a simulated case study, we showcase the feasibility and effectiveness of the proposed approach under stochastic settings and how the decision of disruption responses changes when agents hold various risk attitudes.
在高度复杂和混乱的全球供应链环境中,地方企业代理商寻求分散和动态的战略,以灵活应对干扰。现有文献探讨了集中和分散的办法,而大多数工作忽略了时间动态和个别代理商风险管理的异质性。为弥补这一差距,本信提出了一个多种风险管理机制,将不确定性和风险态度纳入代理商通信和决策战略。因此,这种方法使企业能够以分散的方式处理随机环境的干扰,特别是在多剂控制和管理方面。我们通过模拟案例研究展示了在随机环境中拟议办法的可行性和有效性,以及在代理人持有各种风险态度时,如何改变对干扰反应的决定。
Article 49
Title@2025-07-25 (5): Dynamic distributed decision-making for resilient resource reallocation in disrupted manufacturing systems
Title: Dynamic distributed decision-making for resilient resource reallocation in disrupted manufacturing systems | Dynamisch verteilte Entscheidungsfindung für widerstandsfähige Ressourcenumverteilung in gestörten Fertigungssystemen | 在被破坏的制造系统内进行有弹性资源重新分配的动态分配决策的动态分布式决策 2507.19043v1 |
Authors (4): Mingjie Bi, Ilya Kovalenko, Dawn M. Tilbury, Kira Barton
The COVID-19 pandemic brings many unexpected disruptions, such as frequently shifting markets and limited human workforce, to manufacturers. To stay competitive, flexible and real-time manufacturing decision-making strategies are needed to deal with such highly dynamic manufacturing environments. One essential problem is dynamic resource allocation to complete production tasks, especially when a resource disruption (e.g., machine breakdown) occurs. Though multi-agent methods have been proposed to solve the problem in a flexible and agile manner, the agent internal decision-making process and resource uncertainties have rarely been studied. This work introduces a model-based resource agent (RA) architecture that enables effective agent coordination and dynamic agent decision-making. Based on the RA architecture, a rescheduling strategy that incorporates risk assessment via a clustering agent coordination strategy is also proposed. A simulation-based case study is implemented to demonstrate dynamic rescheduling using the proposed multi-agent framework. The results show that the proposed method reduces the computational efforts while losing some throughput optimality compared to the centralized method. Furthermore, the case study illustrates that incorporating risk assessment into rescheduling decision-making improves the throughput.
COVID-19大流行给制造商带来了许多意想不到的干扰,例如市场经常变化和人力有限,制造厂家需要保持竞争性、灵活和实时的制造业决策战略,才能应对这种高度动态的制造业环境。一个基本的问题是,为完成生产任务,特别是在出现资源中断(例如机器故障)时,进行动态资源分配;虽然提出了多种试剂方法,以灵活灵活的方式解决问题,但很少研究代理人的内部决策过程和资源不确定性。这项工作采用了基于模型的资源代理结构,以便能够进行有效的代理协调和动态代理决策。在RA结构的基础上,还提议了一项重组战略,通过集群代理协调战略纳入风险评估。还进行了模拟案例研究,以证明利用拟议的多试剂框架进行动态的重新安排。结果显示,拟议的方法减少了计算工作,同时与集中方法相比,失去了一些吞力的最佳性。此外,案例研究表明,将风险评估纳入调整决策的过渡性会改善吞吐量。
Article 50
Title@2025-07-25 (5): A Distributed Approach for Agile Supply Chain Decision-Making Based on Network Attributes
Title: A Distributed Approach for Agile Supply Chain Decision-Making Based on Network Attributes | Ein verteilter Ansatz für agile Supply Chain Entscheidungsfindung auf der Grundlage von Netzwerkattributen | 基于网络属性的 “ 危险供应链决策分配办法 “ 2507.19038v1 |
Authors (4): Mingjie Bi, Dawn M. Tilbury, Siqian Shen, Kira Barton
In recent years, the frequent occurrence of disruptions has had a negative impact on global supply chains. To stay competitive, enterprises strive to remain agile through the implementation of efficient and effective decision-making strategies in reaction to disruptions. A significant effort has been made to develop these agile disruption mitigation approaches, leveraging both centralized and distributed decision-making strategies. Though trade-offs of centralized and distributed approaches have been analyzed in existing studies, no related work has been found on understanding supply chain performance based on the network attributes of the disrupted supply chain entities. In this paper, we characterize supply chains from a capability and network topological perspective and investigate the use of a distributed decision-making approach based on classical multi-agent frameworks. The performance of the distributed framework is evaluated through a comprehensive case study that investigates the performance of the supply chain as a function of the network structure and agent attributes within the network in the presence of a disruption. Comparison to a centralized decision-making approach highlights trade-offs between performance, computation time, and network communication based on the decision-making strategy and network architecture. Practitioners can use the outcomes of our studies to design response strategies based on agent capabilities, network attributes, and desired supply chain performance.
近年来,经常发生的中断对全球供应链产生了负面影响。为了保持竞争力,企业努力通过实施高效和有效的决策战略来保持灵活,以应对中断。已经作出重大努力,制定这些灵活的干扰缓解办法,利用集中和分散的决策战略。虽然在现有研究中分析了集中和分散做法的权衡,但并没有发现根据中断的供应链实体的网络属性来理解供应链绩效的相关工作。本文从能力和网络地形角度对供应链进行定性,并调查使用基于传统多剂框架的分布式决策方法的情况。分布式框架的绩效是通过一项综合案例研究进行评估的,该案例研究调查供应链作为网络结构和代理人功能的运作情况,并在出现中断的情况下调查网络内部的特征。与集中决策方法相比,突出基于决策战略和网络架构的绩效、计算时间和网络通信之间的权衡。从业者可以利用我们研究结果设计基于代理能力、网络属性和预期供应链绩效的应对战略。
Article 51
Title@2025-07-25 (5): Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies
Title: Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies | Mixed-Reality Digital Twins: Nutzung der physischen und virtuellen Welten für Hybrid Sim2Real Transition von Multi-Agent Verstärkungs-Learning-Politiken | 混合-现实数字双对:利用物理和虚拟世界促进混合的Sim2重新过渡多机构强化学习政策 2403.10996v7 |
Authors (3): Chinmay Vilas Samak, Tanmay Vilas Samak, Venkat Narayan Krovi
Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems usually requires a significantly long training time due to their inherent complexity. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physical embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work seeks to address these pain points by presenting a mixed-reality (MR) digital twin (DT) framework capable of: (i) boosting training speeds by selectively scaling parallelized simulation workloads on-demand, and (ii) immersing the MARL policies across hybrid simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer, across both case studies. Results indicate up to 76.3% reduction in training time with the proposed parallelization scheme and sim2real gap as low as 2.9% using the proposed deployment method.
由于网络物理车辆系统的多剂强化学习(MARL)通常需要相当长的培训时间,因为其内在的复杂性,因此,在现实世界中部署经过训练的政策需要具有丰富特点的环境以及多种物理成形剂,由于货币、物理、能源或安全方面的限制,这可能不可行。这项工作力求解决这些痛苦点,办法是提出一个混合现实(MR)数字双胞胎(DT)框架,能够:(一) 通过有选择地根据需求扩大平行模拟工作量,提高培训速度;(二) 在混合模拟到现实(im2real)试验中浸泡出MARL政策,通过两个有代表性的使用案例来强调拟议框架的可行性和绩效,这两个案例涉及MARL问题的合作和竞争性类别。我们研究:(一) 代理和环境对培训时间的平行效应,以及(二) 两种案例研究对零点成双向的模拟转移的系统性域随机化效果。结果显示,与拟议的平行计划的培训时间减少76.3%,与使用拟议部署方法的轻度为2.9%的模拟差距,低于2.9%。
Article 52
Title@2025-07-25 (5): From Cloud-Native to Trust-Native: A Protocol for Verifiable Multi-Agent Systems
Title: From Cloud-Native to Trust-Native: A Protocol for Verifiable Multi-Agent Systems | Von Cloud-Native zu Trust-Native: Ein Protokoll für überprüfbare Multi-Agent-Systeme | 从云源向信任的转移:可核证的多机构系统议定书 2507.22077v1 |
Authors (1): Muyang Li
As autonomous agents powered by large language models (LLMs) proliferate in high-stakes domains – from pharmaceuticals to legal workflows – the challenge is no longer just intelligence, but verifiability. We introduce TrustTrack, a protocol that embeds structural guarantees – verifiable identity, policy commitments, and tamper-resistant behavioral logs – directly into agent infrastructure. This enables a new systems paradigm: trust-native autonomy. By treating compliance as a design constraint rather than post-hoc oversight, TrustTrack reframes how intelligent agents operate across organizations and jurisdictions. We present the protocol design, system requirements, and use cases in regulated domains such as pharmaceutical R&D, legal automation, and AI-native collaboration. We argue that the Cloud -> AI -> Agent -> Trust transition represents the next architectural layer for autonomous systems.
由于由大型语言模型(LLMS)驱动的自主代理商在从制药到法律工作流程等高接触领域扩散,挑战不再仅仅是情报,而是可核查的。我们引入了信任跟踪,这是将结构性保障 – – 可核实的身份、政策承诺和防腐性行为日志 – – 包含在结构保障 – – 可核实的身份、政策承诺和防腐性行为日志 – – 直接纳入代理基础设施中的协议。这促成了一种新的系统范式:信任的自主性。通过将遵守视为设计上的制约,而不是控制后的监管,信任跟踪系统重新界定了智能代理商在各组织和管辖区之间的运作方式。我们介绍了协议设计、系统要求,以及在制药研发、法律自动化和AI-Native合作等监管领域使用案例。我们争论说,云 - > AI -代理 - > 信托过渡是自治系统的下一个建筑层。
Article 53
Title@2025-07-25 (5): Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity
Title: Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity | Adaptive Cluster Zusammenarbeit steigert LLMs medizinische Entscheidungsunterstützung Kapazität | LLM 医疗决策支助能力 2507.21159v1 |
Authors (4): Zhihao Peng, Liuxin Bao, Shengyuan Liu, Yixuan Yuan
The collaborativeness of large language models (LLMs) has proven effective in natural language processing systems, holding considerable promise for healthcare development. However, it lacks explicit component selection rules, necessitating human intervention or clinical-specific validation. Moreover, existing architectures heavily rely on a predefined LLM cluster, where partial LLMs underperform in medical decision support scenarios, invalidating the collaborativeness of LLMs. To this end, we propose an adaptive cluster collaborativeness methodology involving self-diversity and cross-consistency maximization mechanisms to boost LLMs' medical decision support capacity. For the self-diversity, we calculate the fuzzy matching value of pairwise outputs within an LLM as its self-diversity value, subsequently prioritizing LLMs with high self-diversity values as cluster components in a training-free manner. For the cross-consistency, we first measure cross-consistency values between the LLM with the highest self-diversity value and others, and then gradually mask out the LLM having the lowest cross-consistency value to eliminate potentially inconsistent output during the collaborative propagation. Extensive experiments on two specialized medical datasets, NEJMQA and MMLU-Pro-health, demonstrate the effectiveness of our method across physician-oriented specialties. For example, on NEJMQA, our method reaches the official passing score across all disciplines, in particular achieving an ACC of 65.47% compared to the 56.12% achieved by GPT-4 on the Obstetrics and Gynecology discipline.
大型语言模型(LLMS)的协作在自然语言处理系统中证明是有效的,对保健发展有着相当大的希望,然而,它缺乏明确的组成部分选择规则,需要人的干预或临床鉴定;此外,现有结构严重依赖预先定义的LLM集群,其中部分LMS在医疗决策支助方案方面表现不佳,使LLMS的协作无效。为此,我们建议采用适应性的集群协作方法,涉及自我多样化和交叉一致性最大化机制,以提高LLMS的医疗决策支持能力。关于自我多样化,我们计算LLM内配对产出的模糊匹配值,作为其自我多样化价值,随后以无培训方式将具有高度多样性价值的LLMMS作为集群组成部分。对于相互一致的情况,我们首先衡量具有最高自我多样化价值的LMMR和其他人之间的相互一致价值,然后逐渐掩盖LMMLMR具有最低的交叉一致性价值,以消除潜在的不一致产出。关于LLMMM的双重医学数据集,NEMQA和MMLU方法的准确性,通过我们的专业方法实现整个GMLMCR方法的成本效益, 在整个GMCRCRCR方法中实现所有的成绩。
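The two selection steps described in the abstract above (rank models by self-diversity, then mask the model least consistent with the top-ranked one) might be sketched as below. This rests on several assumptions: `difflib.SequenceMatcher` from the Python standard library stands in for the unspecified fuzzy matching measure, self-diversity is taken to be one minus the mean pairwise similarity of a model's own samples, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Fuzzy match score in [0, 1]; a stand-in for the paper's matching measure."""
    return SequenceMatcher(None, a, b).ratio()


def self_diversity(outputs: List[str]) -> float:
    """Assumed definition: 1 - mean pairwise similarity of one model's sampled outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(similarity(a, b) for a, b in pairs) / len(pairs)


def select_cluster(samples: Dict[str, List[str]], keep: int) -> List[str]:
    """Keep the `keep` models with the highest self-diversity (no training involved)."""
    ranked = sorted(samples, key=lambda m: self_diversity(samples[m]), reverse=True)
    return ranked[:keep]


def mask_least_consistent(anchor_answer: str, answers: Dict[str, str]) -> Dict[str, str]:
    """Drop the model whose answer agrees least with the anchor model's answer."""
    worst = min(answers, key=lambda m: similarity(anchor_answer, answers[m]))
    return {m: a for m, a in answers.items() if m != worst}


if __name__ == "__main__":
    samples = {"m1": ["aaa", "aaa"], "m2": ["aaa", "zzz"]}
    print(select_cluster(samples, keep=1))
    print(mask_least_consistent("take aspirin", {"x": "take aspirin", "y": "qqqq"}))
```

Calling `mask_least_consistent` repeatedly would give the gradual masking the abstract mentions; how many rounds to run is left open here because the abstract does not say.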
Article 54
Title@2025-07-25 (5): TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search
Title: TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search | TrafficMCTS: Ein Closed-Loop Traffic Flow Generation Framework mit gruppenbasierter Monte Carlo Tree Suche | 交通流量监测:一个闭路交通流量生成框架,并配有基于集团的蒙特卡洛树搜索 2308.12797v3 |
Authors (6): Ze Fu, Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi
Traffic flow simulation within the domain of intelligent transportation systems is garnering significant attention, and generating realistic, diverse, and human-like traffic patterns presents critical challenges that must be addressed. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adaptability. In this paper, we introduce TrafficMCTS, an innovative framework that harnesses the synergy of group-based Monte Carlo tree search (MCTS) and Social Value Orientation (SVO) to engender a multifaceted traffic flow with varying driving styles and cooperative tendencies. Anchored by a closed-loop architecture, our framework enables vehicles to dynamically adapt to their environment in real time, and ensure feasible collision-free trajectories. Through comprehensive comparisons with state-of-the-art methods, we illuminate the advantages of our approach in terms of computational efficiency, planning success rate, intention completion time, and diversity metrics. Besides, we simulate multiple scenarios to illustrate the effectiveness of the proposed framework and highlight its ability to induce diverse social behaviors within the traffic flow. Finally, we validate the scalability of TrafficMCTS by demonstrating its capability to efficiently simulate diverse traffic scenarios involving numerous interacting vehicles within a complex road network, capturing the intricate dynamics of human-like driving behaviors.
智能运输系统范围内的交通流量模拟正在引起人们的极大关注,并产生了现实的、多样化的和人性化的交通模式,提出了必须解决的重大挑战。目前的方法往往取决于预先定义的驱动模型、客观的优化,或依赖预先记录的驾驶数据集,限制其可缩放性、多功能性和适应性。在本文件中,我们引入了交通监控系统,这是一个创新框架,利用基于团体的蒙特卡洛树搜索(MCTS)和社会价值导向(SVO)的协同作用,形成多方面的交通流动,其驾驶风格和合作倾向各不相同。在封闭式结构的推动下,我们的框架使车辆能够动态地适应其实时环境,并确保可行的无碰撞轨道。通过与最新方法的全面比较,我们展示了我们在计算效率、规划成功率、意图完成时间和多样性度等方面的做法的优势。此外,我们模拟了多种假设情景,以说明拟议框架的有效性,并突出其在交通流动中诱导出不同社会行为的能力。最后,我们验证了车辆动态的动态性变化性模型,通过展示了多种复杂的交通动态机动性车辆的复杂机动性动态。
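For readers unfamiliar with Social Value Orientation, a formulation commonly used in the driving literature weights a driver's own reward and the reward of others by the cosine and sine of an SVO angle. The sketch below uses that common form; the abstract above does not state the exact utility used in TrafficMCTS, and the maneuver names and reward numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def svo_utility(own_reward: float, others_reward: float, angle_deg: float) -> float:
    """Common SVO form: cos(angle) * own reward + sin(angle) * others' reward."""
    phi = math.radians(angle_deg)
    return math.cos(phi) * own_reward + math.sin(phi) * others_reward


def choose_maneuver(options: Dict[str, Tuple[float, float]], angle_deg: float) -> str:
    """Pick the maneuver with the highest SVO-weighted utility."""
    return max(options, key=lambda name: svo_utility(*options[name], angle_deg))


# Hypothetical (own reward, others' reward) pairs for two maneuvers.
OPTIONS = {"yield": (0.2, 1.0), "cut_in": (1.0, -0.5)}

if __name__ == "__main__":
    for angle in (0, 45):
        print(angle, choose_maneuver(OPTIONS, angle))
```

An angle of 0 degrees models a purely egoistic driver and 45 degrees a prosocial one, so varying the angle per vehicle is one simple way to obtain the mix of driving styles the abstract describes.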
Article 55
Title@2025-07-25 (5): Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise
Title: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise | Individueller Intrinsischer Lohn im Mehr-Agenten-Verstärkungs-Lernen durch Einbeziehung allgemeiner menschlicher Expertise | 通过纳入通用的人类专门知识,学习多机构加强学习中的个人内在奖赏 2507.18867v1 |
Authors (4): Xuefei Wu, Xiao Yin, Yuanyang Zhu, Chunlin Chen
Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when receiving only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, so they are less effective than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both individual action distribution and human expertise preference distribution. Then, LIGHT designs individual intrinsic rewards for each agent based on actionable representational transformation relevant to Q-learning so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines regarding performance and better knowledge reusability across different sparse-reward tasks on challenging scenarios.
在多试剂强化学习(MARL)方面进行有效的探索,在只获得团队奖励时是一个具有挑战性的问题,特别是在报酬稀少的环境中。缓解这一问题的一个有力方法是设计密集的个人奖励,以引导代理人进行高效的勘探。然而,个人奖励一般依赖人工设计的成型奖励功能,缺乏高阶智能,因此在学习和概括复杂问题方面,其行为不力于人。为了解决这些问题,我们结合了上述两个范例,并提出了一个新的框架,即Light(通过采用通用的人类出品技术来学习个人内在奖励),它可以以端到端的方式将人类知识纳入MARL算法中。光线引导每个代理人避免不必要的探索,既考虑个别行动分配,又考虑人类专长偏好分配。然后,根据与Q学习相关的可操作的代表性转变,为每个代理人设计个人内在的奖赏,以便代理人的行动偏好与人类专门知识相一致,同时尽量扩大联合行动的价值。实验结果表明,我们的方法优于具有代表性的基线,不能在具有挑战性的不同微小的情景上改进知识的可重复性。
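The abstract above describes an intrinsic reward that pulls each agent's action preferences toward a human preference distribution. The following is only a minimal sketch of that general idea, not the LIGHT formulation: it assumes the agent's action distribution is a softmax over its Q-values and uses a negative KL divergence as the intrinsic reward. The function names, the `temperature` and `scale` parameters, and the choice of KL are all assumptions made for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Turn per-action Q-values into an action distribution (assumed form)."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def intrinsic_reward(q_values, human_pref, scale=1.0):
    """Hypothetical intrinsic reward: higher when the agent's action
    distribution is closer to the human preference distribution."""
    policy = softmax(q_values)
    return -scale * kl_divergence(human_pref, policy)


human_pref = [0.8, 0.1, 0.1]  # hypothetical expert preference over 3 actions
aligned = intrinsic_reward([2.0, 0.0, 0.0], human_pref)
misaligned = intrinsic_reward([0.0, 0.0, 2.0], human_pref)
```

In this sketch an agent whose Q-values favour the action the expert prefers receives a reward near zero, while an agent favouring a different action is penalised, which is one simple way to discourage exploration that contradicts the expert prior.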
Article 56
Title@2025-07-24 (4): Toward Super Agent System with Hybrid AI Routers
Title: Toward Super Agent System with Hybrid AI Routers | Auf dem Weg zum Super Agent System mit Hybrid-KI Routern | 向超级代理系统过渡 2504.10519v2 |
Authors (8): Yuhang Yao, Haixin Wang, Yibo Chen, Jiawen Wang, Min Chang Jordan Ren, Bosheng Ding, Salman Avestimehr, Chaoyang He
AI agents powered by Large Language Models are transforming the world through an enormous range of applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significant optimizations are required to ensure high efficiency and low cost. This position paper presents a design of the Super Agent System powered by hybrid AI routers. Upon receiving a user prompt, the system first detects the intent of the user, then either routes the request to specialized task agents with the necessary tools or automatically generates agentic workflows. In practice, most applications directly serve as AI assistants on edge devices such as phones and robots. As different language models vary in capability and cloud-based models often entail high computational costs, latency, and privacy concerns, we then explore the hybrid mode where the router dynamically selects between local and cloud models based on task complexity. Finally, we introduce the blueprint of an on-device super agent enhanced with cloud. With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed. Such an architecture paves the way for super agents to be seamlessly integrated into everyday life in the near future.
由大语言模型驱动的 AI 代理机构正在通过巨大的应用来改变世界。 超级代理机构有潜力满足不同的用户需求, 如总化、编码和研究, 准确理解用户意图, 并利用适当的工具解决任务。 但是, 要使这种代理机构在现实世界部署和规模上可以使用, 需要大幅优化以确保高效和低成本。 此位置文件展示了由混合 AI 路由混合路由 AI 路由器驱动的超级代理系统的设计。 接收用户迅速时, 系统首先检测用户的意向, 然后用必要的工具将请求发送给专门的任务代理机构或自动生成代理工作流程。 在实践中, 大多数应用都直接在诸如电话和机器人等边缘设备上充当AI 助手。 由于不同语言模式在能力和基于云的模型上存在差异, 通常需要很高的计算成本、 粘度和隐私问题。 我们然后探索一种混合模式, 即路由路由路由器根据任务的复杂性动态选择本地和云型模式。 最后, 我们引入一个由云层增强的超级代理商的蓝图 。 在多模式和超级硬件上, 我们设想, 只能以接近于天体化的方式进行本地的计算。
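The hybrid mode in the abstract above, where a router picks a local or a cloud model according to task complexity, can be illustrated with a small sketch. Everything here is hypothetical: the keyword-and-length heuristic in `estimate_complexity`, the `threshold` value, and the `private` flag are stand-ins chosen for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Route:
    target: str  # "local" or "cloud"
    reason: str


def estimate_complexity(prompt: str) -> float:
    """Hypothetical heuristic: long prompts and reasoning keywords score higher."""
    keywords = ("prove", "analyze", "refactor", "multi-step", "research")
    length_score = min(len(prompt.split()) / 200.0, 1.0)
    keyword_score = sum(k in prompt.lower() for k in keywords) / len(keywords)
    return 0.5 * length_score + 0.5 * keyword_score


def route(prompt: str, threshold: float = 0.2, private: bool = False) -> Route:
    """Choose a model tier; privacy-sensitive requests never leave the device."""
    if private:
        return Route("local", "privacy-sensitive request stays on device")
    score = estimate_complexity(prompt)
    if score >= threshold:
        return Route("cloud", f"complexity {score:.2f} >= {threshold}")
    return Route("local", f"complexity {score:.2f} < {threshold}")


simple = route("Summarize this note")
hard = route("Please analyze and refactor this module, then research alternatives")
secret = route("Please analyze and refactor this module", private=True)
```

A real router would more plausibly use a learned classifier or the local model's own confidence rather than keywords; the point of the sketch is only the control flow, with the local model as the default and the cloud as the escalation path.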
Article 57
Title@2025-07-24 (4): Towards Multi-Agent Economies: Enhancing the A2A Protocol with Ledger-Anchored Identities and x402 Micropayments for AI Agents
Title: Towards Multi-Agent Economies: Enhancing the A2A Protocol with Ledger-Anchored Identities and x402 Micropayments for AI Agents | Auf dem Weg zu Multi-Agent Economies: Verbesserung des A2A-Protokolls mit Ledger-Anchored Identities und x402 Micropayments für KI-Agenten | 朝向多机构经济体:加强A2A议定书,使用分类标志和X402向AI代理商支付微额付款 2507.19550v1 |
Authors (3): Awid Vaziry, Sandro Rodriguez Garzon, Axel Küpper
This research article presents a novel architecture to empower multi-agent economies by addressing two critical limitations of the emerging Agent2Agent (A2A) communication protocol: decentralized agent discoverability and agent-to-agent micropayments. By integrating distributed ledger technology (DLT), this architecture enables tamper-proof, on-chain publishing of AgentCards as smart contracts, providing secure and verifiable agent identities. The architecture further extends A2A with the x402 open standard, facilitating blockchain-agnostic, HTTP-based micropayments via the HTTP 402 status code. This enables autonomous agents to seamlessly discover, authenticate, and compensate each other across organizational boundaries. This work further presents a comprehensive technical implementation and evaluation, demonstrating the feasibility of DLT-based agent discovery and micropayments. The proposed approach lays the groundwork for secure, scalable, and economically viable multi-agent ecosystems, advancing the field of agentic AI toward trusted, autonomous economic interactions.
这份研究文章提出了赋予多试剂经济体权力的新结构,它解决了新兴A2A代理商通信协议的两个关键限制:分散代理商发现和代理商对代理商微额付款。通过整合分布式分类账技术,这一结构能够将代理商卡作为智能合同在链上发布,防止篡改,提供安全和可核查的代理商身份。该结构进一步扩展了A2A,采用x402开放标准,通过HTTP 402地位代码便利了以块链为主的基于HTTP的微额付款。这使得自主代理商能够无缝地跨组织边界发现、认证和补偿对方。这项工作进一步展示了以DLT为主的代理商发现和微额付款的可行性,并展示了全面的技术实施和评价。拟议的方法为安全、可缩放和经济上可行的多试剂生态系统奠定了基础,将代理商AI领域推进了可信赖的自主经济互动。
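The payment flow in the abstract above builds on the HTTP 402 (Payment Required) status code: a request without payment is refused with a description of what is owed, and the caller retries with proof of payment. The sketch below simulates that request, pay, retry loop entirely in memory. It is not the x402 wire format: the header name, the response fields, the receipt strings, and the `Ledger` class are assumptions made for illustration, and no real network or blockchain call is involved.

```python
from dataclasses import dataclass, field

PAYMENT_HEADER = "X-PAYMENT"  # header name assumed for illustration


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class PaidAgentServer:
    price: int = 5  # price per call, in hypothetical minimal currency units
    settled: set = field(default_factory=set)

    def handle(self, headers: dict) -> Response:
        receipt = headers.get(PAYMENT_HEADER)
        if receipt is None or receipt not in self.settled:
            # 402 Payment Required: tell the caller what to pay and to whom.
            return Response(402, {"amount": self.price, "pay_to": "agent-b"})
        self.settled.discard(receipt)  # each receipt is single-use
        return Response(200, {"result": "task output"})


@dataclass
class Ledger:
    """In-memory stand-in for a payment rail (hypothetical)."""
    balances: dict
    counter: int = 0

    def pay(self, payer: str, payee: str, amount: int, server: PaidAgentServer) -> str:
        if self.balances.get(payer, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[payer] -= amount
        self.balances[payee] = self.balances.get(payee, 0) + amount
        self.counter += 1
        receipt = f"receipt-{self.counter}"
        server.settled.add(receipt)  # the server learns about the settled payment
        return receipt


def call_with_payment(server: PaidAgentServer, ledger: Ledger, payer: str) -> Response:
    first = server.handle({})
    if first.status != 402:
        return first
    receipt = ledger.pay(payer, first.body["pay_to"], first.body["amount"], server)
    return server.handle({PAYMENT_HEADER: receipt})


ledger = Ledger({"agent-a": 10})
server = PaidAgentServer()
reply = call_with_payment(server, ledger, "agent-a")
```

Keeping the payment requirement in the 402 response itself means the calling agent needs no prior agreement with the serving agent, which is what makes the pattern suitable for agents that discover each other at run time.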
Article 58
Title@2025-07-24 (4): EH-Benchmark Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow
Title: EH-Benchmark Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow | EH-Benchmark Ophthalmische Halluzination Benchmark und Agent-getriebene Top-Down-Rückverfolgbarkeit Workflow | EH-Benchmark Ophthalmic 幻觉基准和代理Dripreven 顶底可追踪合理理由工作流程 2507.22929v1 |
Authors (8): Xiaoyu Pan, Yang Bai, Ke Zou, Yang Zhou, Jun Zhou, Huazhu Fu, Yih-Chung Tham, Yong Liu
Medical Large Language Models (MLLMs) play a crucial role in ophthalmic diagnosis, holding significant potential to address vision-threatening diseases. However, their accuracy is constrained by hallucinations stemming from limited ophthalmic knowledge, insufficient visual localization and reasoning capabilities, and a scarcity of multimodal ophthalmic data, which collectively impede precise lesion detection and disease diagnosis. Furthermore, existing medical benchmarks fail to effectively evaluate various types of hallucinations or provide actionable solutions to mitigate them. To address the above challenges, we introduce EH-Benchmark, a novel ophthalmology benchmark designed to evaluate hallucinations in MLLMs. We categorize MLLMs’ hallucinations based on specific tasks and error types into two primary classes: Visual Understanding and Logical Composition, each comprising multiple subclasses. Given that MLLMs predominantly rely on language-based reasoning rather than visual processing, we propose an agent-centric, three-phase framework, including the Knowledge-Level Retrieval stage, the Task-Level Case Studies stage, and the Result-Level Validation stage. Experimental results show that our multi-agent framework significantly mitigates both types of hallucinations, enhancing accuracy, interpretability, and reliability. Our project is available at https://github.com/ppxy1/EH-Benchmark.
大型医学语言模型(MLLMS)在眼科诊断中发挥着关键作用,具有应对视力威胁疾病的巨大潜力,但是,由于眼科知识有限、视觉定位和推理能力不足以及多式眼科数据缺乏,共同阻碍精确损伤检测和疾病诊断的多式眼科数据缺乏,因此这些模型的准确性受到限制。此外,现有的医学基准未能有效评估各种类型的幻觉或提供可采取行动的缓解这些幻觉的解决方案。为了应对上述挑战,我们引入了EH-Benchmark,这是用于评估MLLMS幻觉的新型眼科基准。我们根据具体任务和错误类型将MLLMS的幻觉分为两大类:视觉理解和逻辑构成,每个类别由多个子类组成。鉴于MLLMS主要依赖基于语言的推理学而不是视觉处理,我们建议了一个以代理人为中心的三阶段框架,包括知识级Retreival阶段、任务级案例研究阶段和结果级校正阶段校准阶段。实验结果显示,我们的多试框架大大降低了我们多种试机框架的可靠性。
Article 59
Title@2025-07-24 (4): Remembering the Markov Property in Cooperative MARL
Title: Remembering the Markov Property in Cooperative MARL | Erinnerung an das Markov-Grundstück in der Genossenschaft MARL | 记得马尔科夫在MARL合作社中的财产 2507.18333v1 |
Authors (5): Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey
Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents’ behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.
合作性多试剂强化学习(MARL)通常被正规化为分散化部分可观测的马尔科夫决定程序(Dec-POMDP),代理商必须了解环境和其他代理商的行为。实际上,目前的无模型的MARL算法使用简单的经常性功能相近器来应对对使用部分信息的其他人进行推理的挑战。在本立场文件中,我们争辩说,这些方法的成功经验不是由于有效的Markov信号恢复,而是因为学习绕过环境观测和记忆的简单公约。我们通过有针对性的案例研究,表明共同适应的代理商可以学习易碎的公约,而当与非适应剂合作时,这些公约就会失败。至关重要的是,在任务设计需要时,同样的模型可以学习基于基础的政策,表明这个问题不是学习模式的基本限制,而是基准设计失败。我们的分析还表明,现代的MARL环境可能无法充分测试Dec-POMDPs的核心假设。我们因此倡导基于两个核心原则的新的合作环境:(1)基于观察的行为和(2)基于记忆的推理,而不是基于其他脆弱代理商的真正成功的技能。
Article 60
Title@2025-07-24 (4): Designing Value-Aligned Traffic Agents through Conflict Sensitivity
Title: Designing Value-Aligned Traffic Agents through Conflict Sensitivity | Gestaltung wertorientierter Verkehrsagenten durch Konfliktsensitivität | 通过冲突敏感性设计符合价值的交通代理 2507.18284v1 |
Authors (5): Astrid Rakow, Joe Collenette, Maike Schwammberger, Marija Slavkovik, Gleifer Vs Alves
Autonomous traffic agents (ATAs) are expected to act in ways that are not only safe, but also aligned with stakeholder values across legal, social, and moral dimensions. In this paper, we adopt an established formal model of conflict from epistemic game theory to support the development of such agents. We focus on value conflicts, situations in which agents face competing goals rooted in value-laden situations, and show how conflict analysis can inform key phases of the design process. This includes value elicitation, capability specification, explanation, and adaptive system refinement. We elaborate and apply the concept of Value-Aligned Operational Design Domains (VODDs) to structure autonomy in accordance with contextual value priorities. Our approach shifts the emphasis from solving moral dilemmas at runtime to anticipating and structuring value-sensitive behaviour during development.
自主交通代理商(ATAs)的行事方式不仅安全,而且符合法律、社会和道德方面利益攸关方的价值观; 在本文件中,我们采用了一个既定的正式冲突模式,从迷你游戏理论到支持此类代理商的发展; 我们侧重于价值冲突情况,即代理商面临源于价值拉累情况的相互竞争的目标,并表明冲突分析如何为设计过程的关键阶段提供信息;这包括价值采集、能力规格、解释和适应性系统完善; 我们制定并应用价值统一操作设计域的概念,以根据背景价值优先事项构建自主性; 我们的方法将重点从在运行时解决道德困境转向在发展过程中预测和构建对价值敏感的行为。
Article 61
Title@2025-07-24 (4): Compositional Coordination for Multi-Robot Teams with Large Language Models
Title: Compositional Coordination for Multi-Robot Teams with Large Language Models | Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen | 具有大语言模式的多机器人小组的组成协调 2507.16068v2 |
Authors (5): Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
多机器人协调历来依赖一个特派团专用和专家驱动的管道,其中自然语言任务说明由域专家人工转换成数学配制、算法设计和可执行代码。这一常规过程是劳动密集型的,非专家无法使用,无法灵活地适应任务要求的变化。在这里,我们提议使用LAN2CB(集体行为语言至集体行为),这是一个利用大型语言模式简化和普及多机器人协调管道的新框架。 LAN2CB将自然语言(NL)任务说明转换成多机器人系统可执行的 Python 代码,通过两个核心模块:(1) 任务分析,将任务描述划为行为树,和(2) 代码生成,利用行为树和结构知识库生成机器人控制代码。我们进一步引入一套自然语言任务说明数据集,以支持发展和基准化。在模拟和现实世界环境中进行的实验表明,LAN2CB使多机器人系统系统的描述能够从自然语言中实现可靠和灵活的多机器人协调,大大减少了手工工程努力,并支持了不同类型任务的一般化。 网站: https://sites.google.com/view/lan-cb
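The abstract above says that mission descriptions are parsed into behavior trees before code is generated. For readers unfamiliar with the data structure, here is a minimal behavior-tree sketch with the two standard composite nodes, sequence and fallback. The node classes, the example mission, and the battery threshold are hypothetical and do not reflect LAN2CB's actual representation or generated code.

```python
SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node: runs a function on the shared state and reports the outcome."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE


class Sequence:
    """Succeeds only if every child succeeds, evaluated in order."""

    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS


class Fallback:
    """Tries children in order and succeeds as soon as one succeeds."""

    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE


def do(name):
    """Build a leaf function that records its name in the log and succeeds."""
    return lambda state: state["log"].append(name) or True


# Hypothetical mission: "search the area, then rendezvous; if battery is low, return to base".
mission = Fallback([
    Sequence([
        Action("battery_ok", lambda s: s["battery"] > 0.3),
        Action("search", do("search")),
        Action("rendezvous", do("rendezvous")),
    ]),
    Action("return_to_base", do("return_to_base")),
])

healthy = {"battery": 0.9, "log": []}
drained = {"battery": 0.1, "log": []}
healthy_status = mission.tick(healthy)
drained_status = mission.tick(drained)
```

The tree makes the mission's control flow explicit and inspectable, which is one reason such an intermediate representation is a convenient target between free-form language and executable robot code.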
Article 62
Title@2025-07-24 (4): A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms
Title: A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms | Eine differenzierte Prämienmethode für verstärktes Lernen auf der Grundlage von Multi-Fahrzeug-Kooperativen-Entscheidungs-Making-Algorithmen | 基于多维合作社决策的强化学习有区别的奖励方法 2502.00352v2 |
Authors (4): Ye Han, Lijun Zhang, Dejian Meng, Zhuang Zhang
Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop, but it still faces challenges such as low sample efficiency. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state transition gradient information into the reward design by analyzing traffic flow characteristics, aiming to optimize action selection and policy learning in multi-vehicle cooperative decision-making. The performance of the proposed method is validated in RL algorithms such as MAPPO, MADQN, and QMIX under varying autonomous vehicle penetration. The results show that the differentiated reward method significantly accelerates training convergence and outperforms centering reward and others in terms of traffic efficiency, safety, and action rationality. Additionally, the method demonstrates strong scalability and environmental adaptability, providing a novel approach for multi-agent cooperative decision-making in complex traffic scenarios.
强化学习(RL)显示出通过州-行动回报反馈循环优化多车辆合作驱动战略的巨大潜力,但仍然面临样本效率低等挑战。本文件提出基于稳定状态过渡制度的有区别奖励方法,通过分析交通流量特点将国家过渡梯度信息纳入奖励设计,目的是在多车辆合作决策中优化行动选择和政策学习。拟议方法的绩效在州-行动回报反馈循环(MAPO、MADQN和QMIX)等RL算法中被验证,在不同的机动车辆自主渗透下得到验证。结果显示,有区别的奖励方法大大加快了培训的趋同,在交通效率、安全和行动合理性方面超越了奖励的核心。此外,该方法展示了强大的可伸缩性和环境适应性,为复杂交通情况中的多剂合作决策提供了新的方法。
Article 63
Title@2025-07-24 (4): Recognizing and Eliciting Weakly Single Crossing Profiles on Trees
Title: Recognizing and Eliciting Weakly Single Crossing Profiles on Trees | Erkennen und Elizitieren von schwachen einzelnen Kreuzungsprofilen auf Bäumen | 承认树树和树的脆弱单一交叉概况 1611.04175v4 |
Authors (1): Palash Dey
We introduce and study the weakly single-crossing domain on trees, which is a generalization of the well-studied single-crossing domain in social choice theory. We design a polynomial-time algorithm for recognizing preference profiles which belong to this domain. We then develop an efficient elicitation algorithm for this domain which works even if the preferences can be accessed only sequentially and the underlying single-crossing tree structure is not known beforehand. We also prove a matching lower bound on the query complexity of our elicitation algorithm when the number of voters is large compared to the number of candidates. We also prove a lower bound of $\Omega(m^2\log n)$ on the number of queries that any algorithm needs to ask to elicit a single-crossing profile when random queries are allowed. This resolves an open question in an earlier paper and proves the optimality of their preference elicitation algorithm when random queries are allowed.
我们引入并研究树木上薄弱的单跨域,这是社会选择理论中研究周密的单跨域的概括性。 我们设计了一种多元时间算法,用于确认属于此域的优惠概况。 然后我们为此域开发一种有效的引算法, 即使只能按顺序获得偏好, 且其基础的单跨树结构事先并不为人知。 我们还证明, 当选民人数与候选人人数相比较大时, 我们的引算法的查询复杂性比我们更低。 我们还证明, 在允许随机查询时, 任何算法需要查询的单交叉剖度的查询数量上, 也比 $\ Omega(m%2\log n) 的比值要低。 这在早先的文件中解决了一个开放问题, 并证明在允许随机查询时, 他们的优先引算法最优化 。
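As background for the abstract above, the classic single-crossing condition on a line can be checked directly: with voters listed in a fixed order, for every pair of candidates the voters preferring the first candidate must form a contiguous block at one end of the order, so the pairwise preference flips at most once. The sketch below checks only this classic, strict-ranking, given-order case. It is not the paper's recognition algorithm for weakly single-crossing profiles on trees, and the function name is chosen for illustration.

```python
from itertools import combinations


def is_single_crossing(profile):
    """Check the classic single-crossing property for a given voter order.

    profile: list of strict rankings (most preferred first), one per voter,
    listed in the voter order to be tested.
    """
    if not profile:
        return True
    positions = [{c: i for i, c in enumerate(ranking)} for ranking in profile]
    for a, b in combinations(profile[0], 2):
        prefers_a = [pos[a] < pos[b] for pos in positions]
        # Count how often the pairwise preference flips along the voter order.
        flips = sum(x != y for x, y in zip(prefers_a, prefers_a[1:]))
        if flips > 1:
            return False
    return True


crossing = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"], ["c", "b", "a"]]
not_crossing = [["a", "b", "c"], ["b", "a", "c"], ["a", "b", "c"]]
```

This check runs in time quadratic in the number of candidates and linear in the number of voters for one fixed order; finding a suitable order, or a suitable tree as in the paper, is the harder part of recognition.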
Article 64
Title@2025-07-24 (4): Multi-Agent Guided Policy Optimization
Title: Multi-Agent Guided Policy Optimization | Multi-Agent gesteuerte Politikoptimierung | 多边机构引导政策优化政策 2507.18059v1 |
Authors (3): Yueheng Li, Guangming Xie, Zongqing Lu
Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL). However, existing CTDE methods often underutilize centralized training or lack theoretical guarantees. We propose Multi-Agent Guided Policy Optimization (MAGPO), a novel framework that better leverages centralized training by integrating centralized guidance with decentralized execution. MAGPO uses an auto-regressive joint policy for scalable, coordinated exploration and explicitly aligns it with decentralized policies to ensure deployability under partial observability. We provide theoretical guarantees of monotonic policy improvement and empirically evaluate MAGPO on 43 tasks across 6 diverse environments. Results show that MAGPO consistently outperforms strong CTDE baselines and matches or surpasses fully centralized approaches, offering a principled and practical solution for decentralized multi-agent learning. Our code and experimental data can be found in https://github.com/liyheng/MAGPO.
由于一些实际制约因素,如部分可观性和有限的沟通,集中化执行培训已成为多机构强化学习合作(MARL)的主要模式,然而,现有的中央化培训方法往往没有充分利用集中化培训,或缺乏理论保障。我们建议多机构引导政策优化(MAGPO)这个新的框架,通过将集中化指导与分散执行相结合,更好地利用集中化培训。MAGPO采用自动回归式联合政策,进行可扩展、协调的探索,并明确将其与分散化政策保持一致,以确保在局部易懂性下部署性。我们从理论上保证单方政策改进,并实证地评价在六个不同环境中43项任务的宏观化政策。结果显示,MAGPO一贯地超越强大的CTDE基线,并匹配或超过完全集中化的方法,为分散化多机构学习提供了原则性和实用的解决方案。我们的代码和实验数据见https://github.com/liyheng/MAGPO。