cs.MA @ 2025-06-20: 059

06-18 (3)

SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

SwarmAgentic: Auf dem Weg zur vollautomatisierten Agentensystem-Erzeugung über Swarm Intelligence

SwarmAgentic公司:通过Swarm Intell公司实现完全自动化的自动制剂系统生成

2506.15672v1

06-18

CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

CORA: Coalitional Rational Advantage Zersetzung für Multi-Agent Policy Gradienten

CORA: 多重利益政策梯度联合合理优势分解

2506.04265v2

06-18

Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Multi-Agenten-Verstärkungslernen für autonome Multi-Satelliten-Erdbeobachtung: Eine realistische Fallstudie

多卫星对地观测自治多卫星地球观测多机构强化学习:现实案例研究

2506.15207v1

06-18

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Wolfpack-Adversarierangriff für robustes Mehr-Agenten-Verstärkungs-Lernen

Wolfpack 对强力多机构强化学习的逆向攻击

2502.02844v3

06-18

YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning

YOLO-MARL: Du bist nur einmal LLM für das Multi-Agenten-Verstärkungs-Lernen

YOLO-MARL:你只有一次多机构加强学习的LLM

2410.03997v2

06-18

ChatModel: Automating Reference Model Design and Verification with LLMs

ChatModel: Automatisieren von Referenzmodell-Design und Überprüfung mit LLMs

聊天模式:使用LLMs自动使用参考模型设计和核查

2506.15066v1

06-17 (2)

Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

多方语言模式:推进合作、协调和适应

2506.09331v2

06-17

Towards the Autonomous Optimization of Urban Logistics: Training Generative AI with Scientific Tools via Agentic Digital Twins and Model Context Protocol

Auf dem Weg zur autonomen Optimierung der Urban Logistics: Training Generative KI mit wissenschaftlichen Tools über Agentic Digital Twins und Model Context Protocol

实现城市物流自动化优化:通过 “ 代理数字双双 “ 和 “ 示范背景议定书 “ ,利用科学工具生成培训AI

2506.13068v2

06-17

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

AssistantX: Ein LLM-powered Proaktiver Assistent in kollaborativer Mensch-bevölkerter Umgebung

助理X:在合作人类普惠环境方面由LLM授权的一名主动助理助理

2409.17655v2

06-17

Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective

Inhärente und entstehende Haftungsfragen in LLM-basierten agentischen Systemen: eine Principal-Agent-Perspektive

以LLLM为基础的代理系统中的固有和新出现的赔偿责任问题:主要代理人的视角

2504.03255v2

06-17

Hierarchical Multi-Agent Reinforcement Learning-based Coordinated Spatial Reuse for Next Generation WLANs

Hierarchische Multi-Agenten-Verstärkung lernbasierte koordinierte räumliche Wiederverwendung für WLANs der nächsten Generation

下一代工作计划协调空间再利用

2506.14187v1

06-17

Light Aircraft Game : Basic Implementation and training results analysis

Light Aircraft Spiel : Basic Implementation and training results analysis

轻型飞机游戏:基本实施和培训结果分析

2506.14164v1

06-17

StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework

故事:以多机构框架为动力的相互自传书写作

2506.14159v1

06-16 (1)

Beyond Browsing: API-Based Web Agents

Jenseits von Browsing: API-basierte Web-Agenten

超出浏览范围: API 网络代理

2410.16464v3

06-16

Deceptive Path Planning: A Bayesian Game Approach

Täuschende Pfadplanung: Ein Bayesischer Spielansatz

欺骗性道路规划:贝耶斯游戏方法

2506.13650v1

06-16

Agent Capability Negotiation and Binding Protocol (ACNBP)

能力谈判和具有约束力的议定书(ACNBP)

2506.13590v1

06-16

Mobility to Campus – a Framework to Evaluate and Compare Different Mobility Modes

Mobility to Campus - ein Rahmen zur Bewertung und zum Vergleich unterschiedlicher Mobilitätsmodi

流动到校园 – – 评估和比较不同流动模式的框架

2506.13574v1

06-16

Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing

Kollektive Wohlfahrt im Mehr-Agenten-Verstärkungs-Lernen durch Suggestion Sharing erreichen

通过分享建议,实现多机构加强多机构加强学习的集体福利

2412.12326v2

06-16

Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation

Socratic RL: Ein neuartiger Rahmen für effiziente Wissensakquisition durch iterative Reflexion und Sichtdestillation

Scortic RL:一个通过迭代思考和观察点蒸馏提高知识获取效率的新颖框架

2506.13358v1

06-16

Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

Entwurf eines auf A* basierenden heuristischen Algorithmus für effizientes Interdiction in Multi-Layer-Netzwerken

设计基于A* 的超值算法,以有效阻截多路网络

2506.10017v2

06-16

Towards Pervasive Distributed Agentic Generative AI – A State of The Art

Auf dem Weg zu einer allgegenwärtigen verteilten agentischen Generativen KI – Ein Stand der Kunst

朝向分布式分布式分布式制剂产生AI – – 艺术状态

2506.13324v1

06-16

Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Convex Markov Games: Eine neue Front für das Mehr-Agenten-Stärkungs-Lernen

Convex Markov 游戏:多机构强化学习的新疆域

2410.16600v3

06-16

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme

G-记忆:为多机构系统追踪等级记忆

2506.07398v2

06-16

Autonomous Computer Vision Development with Agentic AI

Autonome Computer Vision Entwicklung mit Agentischer KI

与Agric AI合作的自主计算机愿景发展

2506.11140v2

06-16

Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

Identifizierung von LFT-strukturierten Deskriptorensystemen mit langsamer und nicht einheitlicher Probenahme

LFT 结构化描述系统缓慢和非统一抽样的标识

2407.00629v5

06-15 (7)

Homeostatic Coupling for Prosocial Behavior

Homeostatische Kupplung für Prosoziales Verhalten

有利于社会行为的自制共聚

2506.12894v1

06-15

HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

HARBOR: Erforschen von Persona-Dynamik im Multi-Agenten-Wettbewerb

《HARBOR:在多机构竞争中探索人动态》

2502.12149v2

06-14 (6)

Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Vertrauen-MARL: Vertrauen-basiertes Multi-Agenten-Verstärkungs-Learning-Framework für kooperative On-Ramp-Merging-Kontrolle im heterogenen Verkehrsfluss

Trust-MARL: 以信任为基础的多机构加强多机构加强学习框架,促进在多样化交通流量方面合作在潮上合并控制

2506.12600v1

06-14

A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

Eine umfassende Untersuchung der Tiefenforschung: Systeme, Methoden und Anwendungen

深层研究综合调查:系统、方法和应用

2506.12594v1

06-14

Collaboration Between the City and Machine Learning Community is Crucial to Efficient Autonomous Vehicles Routing

Zusammenarbeit zwischen der Stadt und Machine Learning Community ist entscheidend für effiziente autonome Fahrzeuge Routing

城市与机械学习社区之间的合作对于高效自治车辆的运行至关重要。

2502.13188v2

06-14

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Auf dem Weg zu vernünftigen Papageien: Warum große Sprachmodelle mit uns argumentieren sollten

通向合理的鹦鹉:为什么大语言模型应该设计来与我们争论?

2505.05298v2

06-14

IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

IndoorWorld: Integrieren physischer Aufgabenlösung und sozialer Simulation in einer heterogenen, multiagenten Umgebung

室内世界:将物理任务综合解决和社会模拟纳入一个多样化的多机构环境

2506.12331v1

06-14

Deep Fictitious Play-Based Potential Differential Games for Learning Human-Like Interaction at Unsignalized Intersections

Deep Fictitious Play-Based Potential Differential Games für das Lernen von Mensch-ähnliche Interaktion an unsignalisierten Schnitten

为在无信号交界处学习人与人之间的相互作用而举行的深、有真知灼见的以游戏为基础的潜在差异运动会

2506.12283v1

06-13 (5)

Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study

Untersuchung des Potenzials von Multi-Agent-Architekturen für die Grundlagen-Design-Automatisierung von Großsprachenmodellen: Eine Aufgabenklassifikation und Expertenauswahlstudie

调查基于大语言示范示范路由器多机构结构对基础设计自动化的潜力:任务分类和专家甄选研究

2506.13811v1

06-13

A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon

普凯蒙竞争中全面推广不同团队战略的基准

2506.10326v2

06-13

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Upgrade oder Switch: Brauchen wir eine neue Registry-Architektur für das Internet von KI-Agenten?

升级或切换:我们是否需要为AI代理商的互联网建立一个新的注册结构?

2506.12003v1

06-13

Computational Social Choice: Parameterized Complexity and Challenges

Computational Social Choice: Parameterisierte Komplexität und Herausforderungen

社会选择:参数复杂性和挑战

2410.14078v2

06-13

Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

Die Kombination von Deep Enforcement Learning und Search mit generativen Modellen für Game-Theoretic Opponent Modeling

将深强化学习和搜索与游戏理论对称模型生成模型相结合

2302.00797v2

06-13

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

Das automatisierte, aber riskante Spiel: Modellierung von Agent-zu-Agent-Verhandlungen und Transaktionen in Verbrauchermärkten

自动但有风险游戏:消费者市场代理对代理谈判和交易的模拟

2506.00073v3

06-13

Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Agent Semantics, Semantische Raumzeit und Graphische Vernunft

语义学、语义空间时间和图形解释

2506.07756v2

06-13

Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines

Bel Esprit: Multi-Agent Framework für den Bau von KI-Modellpipelines

Bel Esprit: 建立AI 示范管道的多机构机构框架

2412.14684v2

06-13

PE-MA: Parameter-Efficient Co-Evolution of Multi-Agent Systems

PE-MA: Parametereffiziente Ko-Evolution von Multi-Agent-Systemen

PE-MA: 多机构系统参数有效共同演变

2506.11803v1

06-13

Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning

Ist Ihr LLM-basierter Multiagent ein zuverlässiger Real-World Planer? Erforschen Sie Betrugserkennung in der Reiseplanung

你以LLM为基地的多方机构是可靠的真实世界规划者吗? 探索旅行规划中的欺诈侦查

2505.16557v2

06-13

AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction

AutoGen Driven Multi Agent Framework für iterative Kriminalität Datenanalyse und Vorhersage

循环犯罪数据分析和预测自动驱动器多剂框架

2506.11475v1

06-13

DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems

DURA-CPS: Ein Multi-Rolle-Orchester für Zuverlässigkeitssicherung in LLM-fähigen Cyber-Physischen Systemen

DURA-CPS:LLM-Enable网络-物理系统依赖性保证多功能Orster

2506.06381v2

06-13

Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games

Politikoptimierung und Multi-Agenten-Verstärkung Lernen für Mittelvarianz Team Stochastic Games

平均差小组游戏游戏政策优化和多剂强化学习

2503.22779v2

06-13

Robust Cooperative Multi-Agent Reinforcement Learning:A Mean-Field Type Game Perspective

Robustes kooperatives Mehr-Agenten-Verstärkung-Lernen:Ein Mittelfeld-Spiel-Perspektive

强有力的合作多机构强化多机构强化学习:中、实地游戏的视角

2406.13992v2

06-12 (4)

Control Industrial Automation System with Large Language Model Agents

Steuerung des industriellen Automatisierungssystems mit großen Sprachmodellen

配有大语言示范物剂的控制工业自动化系统

2409.18009v2

06-12

Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork

Shapley Machine: Ein Game-Theoretisches Framework für N-Agent Ad Hoc Teamwork

N-代理特设团队工作游戏理论框架

2506.11285v1

06-12

Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Ausbau des kooperativen Multi-Agenten-Verstärkungs-Lernens mit staatlicher Modellierung und kontersarieller Exploration

通过国家建模和反向探索,加强合作性多机构强化多机构强化学习以及国家建模和反向探索

2505.05262v2

06-12

Noncooperative Equilibrium Selection via a Trading-based Auction

Nichtkooperative Equilibrium-Auswahl über eine Trading-basierte Auktion

通过基于交易的拍卖选择平衡不合作

2502.03616v2

06-12

AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft

自动Mind:自动数据科学适应性知识代理

2506.10974v1

06-12

Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Unkoppelte Lerndynamik und Nash-Equilibrium für höhere Ordnung

高等职称无交错学习动态和纳什平衡

2506.10874v1

06-12

AI Agent Behavioral Science

KI Agent Verhaltenswissenschaft

AI 行为科学代理

2506.06366v3

06-12

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

AniMaker: Automatisiertes Multi-Agent-Animiertes Storytelling mit MCTS-gesteuerter Clip-Generierung

AniMaker:与MCTS-Driven Clift 生成的自动多代理动画小说

2506.10540v1

06-12

Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

Nonconvex-Spiel und Multi-Agenten-Verstärkungs-Lernen für zonale Hilfsmärkte

为Zonal辅助市场进行非convelx 游戏和多剂强化学习

2505.03288v2

06-12

MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird

以强化学习为导向的多机构自治系统

2506.08507v2

06-12

CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

CAF-I: Ein kollaboratives Multi-Agent-Framework für eine verbesserte Ironieerkennung mit großen Sprachmodellen

CAF-I:采用大语言模式加强铁铁探测多机构合作多方协作框架

2506.08430v2

06-12

The Optimization Paradox in Clinical AI Multi-Agent Systems

Das Optimierungsparadox in klinischen KI-Multiagentensystemen

AI 临床多机构系统中最佳优化的副作用

2506.06574v2

Article 0

Title@2025-06-18 (3): SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

SwarmAgentic: Auf dem Weg zur vollautomatisierten Agentensystem-Erzeugung über Swarm Intelligence

SwarmAgentic公司:通过Swarm Intell公司实现完全自动化的自动制剂系统生成 2506.15672v1

Authors (7): Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation. Our code is publicly released at https://yaoz720.github.io/SwarmAgentic/.

大型语言模型的快速进展推动了决策、协调和任务执行方面的代理系统;然而,现有的代理系统生成框架缺乏完全的自主性,缺乏从偷盗剂生成、自我优化剂功能和协作,限制了适应性和可扩展性;我们提议SwarmAgentic,这是一个完全自动化的代理系统生成框架,它从零开始建立代理系统,并通过语言驱动的探索,共同优化代理功能和协作,作为相互依存的组成部分;为了能够有效地搜索系统一级的结构,SwarAgentic保持一批候选系统,并通过反馈指导更新,从Partle Starm Oppimization(PSO)中汲取灵感,对其进行演变;我们评估了我们六种现实世界、开放性和探索性的方法,涉及高层规划、系统级协调和创造性推理;鉴于任务描述和客观功能,SwarmAgencientiopic超越了所有基线,在旅行规划系统基准上实现了+261.8的相对改进,突出了结构上不受控制的任务的自动化的有效性;这个框架标志着我们在公共数据系统上完全升级的自动化设计系统。
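
For readers unfamiliar with the inspiration named in the abstract above, the following is a minimal sketch of classical Particle Swarm Optimization over real-valued vectors. It is not SwarmAgentic's method: the abstract describes a population of candidate agentic systems updated through language-driven feedback, whereas this sketch only shows the textbook numeric update (inertia plus pulls toward each particle's personal best and the swarm's global best). The function names `pso` and `sphere`, the search range, and the coefficients `w`, `c1`, `c2` are illustrative assumptions.

```python
import random


def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical PSO (minimization) over real vectors; all defaults are illustrative."""
    rng = random.Random(seed)
    # Random initial positions in an assumed box [-5, 5]^dim, zero initial velocity.
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best position per particle
    best_val = [objective(p) for p in pos]      # personal best value per particle
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best seen so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, val = pso(sphere, dim=2)
    print(best, val)
```

In a language-driven variant such as the one the abstract describes, the positions and velocity updates would presumably be replaced by textual system descriptions and feedback-guided edits; that mapping is not specified here and is left as described in the paper.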

Article 1

Title@2025-06-18 (3): CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

CORA: Coalitional Rational Advantage Zersetzung für Multi-Agent Policy Gradienten

CORA: 多重利益政策梯度联合合理优势分解 2506.04265v2

Authors (3): Mengda Ji, Genjiu Xu, Liying Wang

This work focuses on the credit assignment problem in cooperative multi-agent reinforcement learning (MARL). Sharing the global advantage among agents often leads to suboptimal policy updates as it fails to account for the distinct contributions of agents. Although numerous methods consider global or individual contributions for credit assignment, a detailed analysis at the coalition level remains lacking in many approaches. This work analyzes the over-updating problem during multi-agent policy updates from a coalition-level perspective. To address this issue, we propose a credit assignment method called Coalitional Rational Advantage Decomposition (CORA). CORA evaluates coalitional advantages via marginal contributions from all possible coalitions and decomposes advantages using the core solution from cooperative game theory, ensuring coalitional rationality. To reduce computational overhead, CORA employs random coalition sampling. Experiments on matrix games, differential games, and multi-agent collaboration benchmarks demonstrate that CORA outperforms strong baselines, particularly in tasks with multiple local optima. These findings highlight the importance of coalition-aware credit assignment for improving MARL performance.

这项工作侧重于多试剂合作强化学习中的信用分配问题。在代理人之间分享全球优势往往导致政策更新不足,因为它没有考虑到代理人的独特贡献。虽然有许多方法考虑到全球或个人对信用分配的贡献,但在联盟一级仍缺乏许多方法的详细分析。这项工作从联盟一级的角度分析了多试剂政策更新过程中的过度更新问题。为了解决这一问题,我们提议了一种称为联合合理优势解体的信用分配方法。 CoRA通过所有可能的联盟的边际贡献来评估联合优势,并利用合作游戏理论的核心解决方案来消除优势,确保联合合理性。为了减少计算间接费用,CORA采用随机联合抽样。在矩阵游戏、差异游戏和多试剂合作基准方面的实验表明,CORA超越了强大的基线,特别是在多个地方选择任务中。这些研究结果突出表明了联盟-认知信用分配对于改善MARL业绩的重要性。
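
The abstract above refers to the core from cooperative game theory and to random coalition sampling. As a rough illustration of those two ideas only, the sketch below checks whether an allocation is efficient and coalitionally rational (no coalition receives less than its own value) on a toy three-agent game, first over all proper coalitions and then over a random sample. This is the textbook core condition, not CORA's algorithm, which decomposes advantage values during policy updates; `in_core`, `sample_coalitions`, and `toy_value` are hypothetical names introduced here.

```python
import itertools
import random


def in_core(alloc, value, agents, coalitions=None, tol=1e-9):
    """Return True if `alloc` is efficient and coalitionally rational.

    alloc: dict agent -> payoff; value: function frozenset -> coalition value.
    If `coalitions` is None, every non-empty proper coalition is checked.
    """
    grand = frozenset(agents)
    # Efficiency: payoffs must sum to the value of the grand coalition.
    if abs(sum(alloc[a] for a in agents) - value(grand)) > tol:
        return False
    if coalitions is None:
        coalitions = (frozenset(c) for r in range(1, len(agents))
                      for c in itertools.combinations(agents, r))
    # Coalitional rationality: each coalition gets at least its own value.
    return all(sum(alloc[a] for a in c) >= value(c) - tol for c in coalitions)


def sample_coalitions(agents, k, seed=0):
    """Draw up to k random non-empty proper coalitions (cheaper than enumerating 2^n)."""
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        c = frozenset(a for a in agents if rng.random() < 0.5)
        if c and len(c) < len(agents):
            out.append(c)
    return out


def toy_value(coalition):
    """Toy symmetric game: singles earn 0, pairs earn 4, all three earn 9."""
    return {0: 0.0, 1: 0.0, 2: 4.0, 3: 9.0}[len(coalition)]


if __name__ == "__main__":
    agents = [0, 1, 2]
    print(in_core({0: 3, 1: 3, 2: 3}, toy_value, agents))
    print(in_core({0: 7, 1: 1, 2: 1}, toy_value, agents))
    print(in_core({0: 3, 1: 3, 2: 3}, toy_value, agents, sample_coalitions(agents, 8)))
```

Checking only sampled coalitions is a relaxation: an allocation that passes on a sample may still violate a constraint for a coalition that was not drawn, which is the trade-off accepted in exchange for lower computational cost.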

Article 2

Title@2025-06-18 (3): Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Multi-Agenten-Verstärkungslernen für autonome Multi-Satelliten-Erdbeobachtung: Eine realistische Fallstudie

多卫星对地观测自治多卫星地球观测多机构强化学习:现实案例研究 2506.15207v1

Authors (5): Mohamad A. Hady, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.

低地轨道(LEO)卫星的飞速增长使地球观测任务发生了革命性的变化,解决了气候监测、灾害管理等方面的挑战,然而,多卫星系统的自主协调仍是一项根本性挑战,传统优化办法难以满足动态EO飞行任务的实时决策需求,因此需要使用强化学习和多机构强化学习(MARL)系统。在本文件中,我们调查基于RL的自主EO飞行任务规划,办法是模拟单一卫星业务,并利用MARL框架将任务扩大到多卫星星座。我们处理关键挑战,包括能源和数据储存限制、卫星观测的不确定性以及部分可耐性下分散协调的复杂性。我们利用近乎现实的卫星模拟环境,评估MARL算法(包括PPO、IPPO、MAPO和HAPPO)的培训稳定性和绩效。我们的结果表明,MAR能够有效地平衡成像和资源管理,同时处理非固定性和奖励多卫星协调中的相互依存性。我们从这一研究中获得的深刻认识,为不断改进的卫星业务提供了一种自主的基础。

Article 3

Title@2025-06-18 (3): Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Wolfpack-Adversarierangriff für robustes Mehr-Agenten-Verstärkungs-Lernen

Wolfpack 对强力多机构强化学习的逆向攻击 2502.02844v3

Authors (4): Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering systemwide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL. Our code is available at https://github.com/sunwoolee0504/WALL.

多试剂加固学习的传统强力方法经常在合作情况下对抗协调对抗性攻击。为了应对这一限制,我们提议在猎狼战略的启发下,建立沃尔夫帕克反versarial攻击框架,针对最初的代理人及其协助代理人,以破坏合作。此外,我们介绍沃尔夫帕克-Aversarial学习争取加强加固学习(WAL)框架,该框架通过促进全系统合作,培训强有力的MARL政策,以抵御拟议的沃尔夫帕克攻击。实验结果突出了沃尔夫帕克攻击的毁灭性影响以及沃尔夫帕克取得的重大强力改进。我们的代码可在https://github.com/sunwoole080504/WALL查阅。

Article 4

Title@2025-06-18 (3): YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning

YOLO-MARL: Du bist nur einmal LLM für das Multi-Agenten-Verstärkungs-Lernen

YOLO-MARL:你只有一次多机构加强学习的LLM 2410.03997v2

Authors (5): Yuan Zhuang, Yi Shen, Zhili Zhang, Yuxiao Chen, Fei Miao

Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it remains challenging for MARL agents to learn cooperative strategies for some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among the agents. However, due to the model size of LLMs, it can be expensive to query LLMs frequently for actions that agents can take. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task planning capabilities of LLMs to improve the policy learning process of multi-agents in cooperative games. Notably, for each game environment, YOLO-MARL requires only a one-time interaction with LLMs in the proposed strategy generation, state interpretation and planning function generation modules, before the MARL policy training process. This avoids the ongoing costs and computational time associated with frequent LLM API calls during training. Moreover, trained decentralized policies based on normal-sized neural networks operate independently of the LLM. We evaluate our method across two different environments and demonstrate that YOLO-MARL outperforms traditional MARL algorithms.

在深入的多试剂强化学习(MARL)中,它被定位为合作性游戏决策的一个很有希望的方法,然而,对于MARL代理商来说,学习合作性游戏环境的合作战略仍然是一项挑战性的工作。最近,大型语言模型(LLLMS)展示了突发的推理能力,使他们成为加强代理商之间协调的有希望的人选。然而,由于LLMS的模型规模,经常为代理商可以采取的行动推断LLMS成本昂贵。在这项工作中,我们提议只有你LLMS一次为MARL(YOLO-MARL),这是一个利用LLMML高层次任务规划能力改进合作性游戏中多代理人的政策学习过程的新框架。值得注意的是,对于每一个游戏环境,YOLO-MARL只需要在拟议的战略制定、国家解释和规划功能生成模块中与LMLMAR进行一次互动。这避免了培训过程中经常使用LMLMAR API电话进行的持续费用和计算时间。此外,我们还培训了以普通神经网络为基础的分散政策,以独立于LMLMARMARMARML系统为不同环境。我们评估了两种方法,并展示了传统的MARLMARL系统。我们在不同环境中的传统方法。

Article 5

Title@2025-06-18 (3): ChatModel: Automating Reference Model Design and Verification with LLMs

ChatModel: Automatisieren von Referenzmodell-Design und Überprüfung mit LLMs

聊天模式:使用LLMs自动使用参考模型设计和核查 2506.15066v1

Authors (6): Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang

As the complexity of integrated circuit designs continues to escalate, functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification process, are themselves becoming more intricate and time-consuming to develop. Despite the promise shown by large language models (LLMs) in code programming, effectively generating complex reference models remains a significant hurdle. To address these challenges, we introduce ChatModel, the first LLM-aided agile reference model generation and verification platform. ChatModel streamlines the transition from design specifications to fully functional reference models by integrating design standardization and hierarchical agile modeling. Employing a building-block generation strategy, it not only enhances the design capabilities of LLMs for reference models but also significantly boosts verification efficiency. We evaluated ChatModel on 300 designs of varying complexity, demonstrating substantial improvements in both efficiency and quality of reference model generation. ChatModel achieved a peak performance improvement of 55.02% compared to alternative methods, with notable enhancements in generation stability, and delivered a 9.18x increase in its capacity to produce reference model designs. Furthermore, it accelerated the iterative process of reference model design and validation by an average of 5.90x compared to traditional approaches. These results highlight the potential of ChatModel to significantly advance the automation of reference model generation and validation.

由于集成电路设计的复杂性继续升级,功能性核查变得日益具有挑战性。对于加速核查进程至关重要的参考模型本身正在变得更加复杂和耗时地开发。尽管大型语言模型(LLMs)在代码编程中显示了希望,但有效生成复杂的参考模型仍是一个重大障碍。为了应对这些挑战,我们引入了ChatModel,即第一个由LLM协助的LLM型快速参考模型生成和核查平台。ChatModel通过整合设计标准化和等级灵活建模,简化了从设计规格向完全功能性能参考模型的过渡。采用建筑区块生成战略,不仅提高了LLMs用于参考模型的设计能力,而且还大大提高了核查效率。我们评估了300种不同复杂设计的ChatModel,表明在创建参考模型的效率和质量方面都有很大改进。ChatModel实现了与替代方法相比最高性能改进55.02%,并显著加强了生产参考模型设计的能力。此外,它加速了参考模型设计和验证的迭接过程,比传统模型的参照率平均提高了5.90x。这些结果突出表明了Chadel的自动化。

Article 6

Title@2025-06-17 (2): Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

多方语言模式:推进合作、协调和适应 2506.09331v2

Authors (1): Arjun Vaithilingam Sudhakar

Modern Large Language Models (LLMs) exhibit impressive zero-shot and few-shot generalization capabilities across complex natural language tasks, enabling their widespread use as virtual assistants for diverse applications such as translation and summarization. Despite being trained solely on large corpora of text without explicit supervision on author intent, LLMs appear to infer the underlying meaning of textual interactions. This raises a fundamental question: can LLMs model and reason about the intentions of others, i.e., do they possess a form of theory of mind? Understanding others’ intentions is crucial for effective collaboration, which underpins human societal success and is essential for cooperative interactions among multiple agents, including humans and autonomous systems. In this work, we investigate the theory of mind in LLMs through the lens of cooperative multi-agent reinforcement learning (MARL), where agents learn to collaborate via repeated interactions, mirroring human social reasoning. Our approach aims to enhance artificial agents’ ability to adapt and cooperate with both artificial and human partners. By leveraging LLM-based agents capable of natural language interaction, we move towards creating hybrid human-AI systems that can foster seamless collaboration, with broad implications for the future of human-artificial interaction.

现代大型语言模型(LLMS)在复杂的自然语言任务中表现出令人印象深刻的零射和少见的概括能力,使LLMS能够广泛用作翻译和总结等各种应用的虚拟助手。尽管LLMS仅仅在没有明确监督作者意图的情况下接受了关于大量文本整体的培训,但似乎推断了文本互动的根本含义。这提出了一个根本问题:LLMS模型和关于他人意图的理由,即它们是否具有某种形式的思想理论?了解他人的意图对于有效合作至关重要,而有效合作是人类社会成功的基础,并且对于包括人类和自主系统在内的多种代理人之间的合作互动至关重要。在这项工作中,我们通过合作性多剂强化学习(MARL)的透镜调查LMMS中的思想理论,代理商通过反复的互动学习合作,反映人类的社会推理。我们的方法的目的是提高人工代理人适应和与人工和人类伙伴合作的能力。通过利用LMM公司能够进行自然语言互动的代理人,我们开始建立能够促进无缝合作的人类-AI混合系统,对未来的人类-艺术互动产生广泛影响。

Article 7

Title@2025-06-17 (2): Towards the Autonomous Optimization of Urban Logistics: Training Generative AI with Scientific Tools via Agentic Digital Twins and Model Context Protocol

Auf dem Weg zur autonomen Optimierung der Urban Logistics: Training Generative KI mit wissenschaftlichen Tools über Agentic Digital Twins und Model Context Protocol

实现城市物流自动化优化:通过 “ 代理数字双双 “ 和 “ 示范背景议定书 “ ,利用科学工具生成培训AI 2506.13068v2

Authors (6): Haowen Xu, Yulin Sun, Jose Tupayachi, Olufemi Omitaomu, Sisi Zlatanova, Xueping Li

Optimizing urban freight logistics is critical for developing sustainable, low-carbon cities. Traditional methods often rely on manual coordination of simulation tools, optimization solvers, and expert-driven workflows, limiting their efficiency and scalability. This paper presents an agentic system architecture that leverages the model context protocol (MCP) to orchestrate multi-agent collaboration among scientific tools for autonomous, simulation-informed optimization in urban logistics. The system integrates generative AI agents with domain-specific engines - such as Gurobi for optimization and AnyLogic for agent-based simulation - forming a generative digital twin capable of reasoning, planning, and acting across multimodal freight networks. By incorporating integrated chatbots, retrieval-augmented generation, and structured memory, the framework enables agents to interpret user intent from natural language conversations, retrieve relevant datasets and models, coordinate solvers and simulators, and execute complex workflows. We demonstrate this approach through a freight decarbonization case study, showcasing how MCP enables modular, interoperable, and adaptive agent behavior across diverse toolchains. The results reveal that our system transforms digital twins from static visualizations into autonomous, decision-capable systems, advancing the frontiers of urban operations research. By enabling context-aware, generative agents to operate scientific tools automatically and collaboratively, this framework supports more intelligent, accessible, and dynamic decision-making in transportation planning and smart city management.

优化城市货运物流对于发展可持续、低碳城市至关重要。传统方法往往依赖于模拟工具、优化解决方案和专家驱动工作流程的手工协调,从而限制其效率和可扩展性。本文展示了一种代理系统架构,利用示范背景协议(MCP)在科学工具之间协调多剂合作,以实现城市物流的自主、模拟知情优化。该系统将自营代理物与具体领域的引擎(如用于优化的Gurobi和用于代理模拟的AnyLogic)相结合,形成具有推理、规划和跨多式联运货运网络行动的基因化数字双轨。通过整合综合聊天机、检索增强生成和结构记忆,该框架使代理商能够解释自然语言对话中的用户意向,检索相关数据集和模型,协调解决方案和模拟器,并实施复杂的科学工作流程。我们通过货物脱碳化案例研究展示了这一方法,展示了MCP如何在各种工具链中促成模块化、相互兼容性和适应性代理物的行为。结果显示,我们系统通过将数字双胞胎从静态可视化的市际决策框架、自动操作,支持了静态的市际决策工具。

Article 8

Title@2025-06-17 (2): AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

AssistantX: Ein LLM-powered Proaktiver Assistent in kollaborativer Mensch-bevölkerter Umgebung

助理X:在合作人类普惠环境方面由LLM授权的一名主动助理助理 2409.17655v2

Authors (5): Nan Sun, Bo Mao, Yongchang Li, Di Guo, Huaping Liu

Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments. This results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in real-world scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of four specialized LLM agents, dedicated respectively to perception, planning, decision-making, and reflective review, facilitating advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. We built a dataset of 210 real-world tasks to validate AssistantX, which includes instruction content and status information on whether relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. Our experiments demonstrate the effectiveness of the proposed framework, showing that AssistantX can reactively respond to user instructions, actively adjust strategies to adapt to contingencies, and proactively seek assistance from humans to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/.

目前的服务机器人自然语言交流能力有限，严重依赖预定义指令，需要持续的人工干预，而且最突出的是，在有人环境中缺乏主动协作意识，这导致其适用范围狭窄、实用性低。本文提出AssistantX，这是一个由LLM驱动的主动助理，旨在在真实世界场景中高精度地自主运行。AssistantX采用由4个专门的LLM代理组成的多代理框架，分别负责感知、规划、决策和反思性审查，从而具备先进的推理能力和全面的协作意识，就像你身边的一位人类助理。我们构建了一个包含210项真实世界任务的数据集来验证AssistantX，其中包括指令内容以及相关人员是否在场的状态信息。我们在基于文本的模拟和真实办公环境中进行了为期一个半月的大量实验。实验证明了所提框架的有效性，表明AssistantX能够被动地响应用户指令，主动调整策略以应对突发情况，并主动向人类寻求帮助以确保任务顺利完成。更多细节和视频见 https://assistantx-agent.github.io/AssistantX/。

Article 9

Title@2025-06-17 (2): Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective

Title: Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective

Inhärente und entstehende Haftungsfragen in LLM-basierten agentischen Systemen: eine Principal-Agent-Perspektive

以LLLM为基础的代理系统中的固有和新出现的赔偿责任问题:主要代理人的视角 2504.03255v2

Authors (2): Garry A. Gabison, R. Patrick Xian

Agentic systems powered by large language models (LLMs) are becoming progressively more complex and capable. Their increasing agency and expanding deployment settings attract growing attention to effective governance policies, monitoring, and control protocols. Based on the emerging landscape of the agentic market, we analyze potential liability issues arising from the delegated use of LLM agents and their extended systems through a principal-agent perspective. Our analysis complements existing risk-based studies on artificial agency and covers the spectrum of important aspects of the principal-agent relationship and their potential consequences at deployment. Furthermore, we motivate method developments for technical governance along the directions of interpretability and behavior evaluations, reward and conflict management, and the mitigation of misalignment and misconduct through principled engineering of detection and fail-safe mechanisms. By illustrating the outstanding issues in AI liability for LLM-based agentic systems, we aim to inform the system design, auditing, and tracing to enhance transparency and liability attribution.

由大型语言模型(LLMS)驱动的代理系统正在逐渐变得日益复杂和有能力,其日益增强的机构性和扩大的部署环境吸引了对有效治理政策、监测和控制规程的日益重视。根据代理市场的新格局,我们从主要代理的角度分析委托使用LLM代理物及其扩展系统所产生的潜在责任问题。我们的分析补充了现有基于风险的人工代理物研究,并涵盖主要代理人关系及其在部署时的潜在后果的方方面面。此外,我们推动技术治理方法的发展,沿着可解释性和行为评价、奖励和冲突管理的方向发展,并通过探测和故障保险机制的原则工程,减轻不匹配和不当行为。我们通过说明基于LM代理物系统的AI赔偿责任中的未决问题,目的是为系统设计、审计和追踪提供信息,以加强透明度和责任归属。

Article 10

Title@2025-06-17 (2): Hierarchical Multi-Agent Reinforcement Learning-based Coordinated Spatial Reuse for Next Generation WLANs

Title: Hierarchical Multi-Agent Reinforcement Learning-based Coordinated Spatial Reuse for Next Generation WLANs

Hierarchische Multi-Agenten-Verstärkung lernbasierte koordinierte räumliche Wiederverwendung für WLANs der nächsten Generation

基于分层多智能体强化学习的下一代WLAN协调空间复用 2506.14187v1

Authors (4): Jiaming Yu, Le Liang, Hao Ye, Shi Jin

High-density Wi-Fi deployments often result in significant co-channel interference, which degrades overall network performance. To address this issue, coordination of multi access points (APs) has been considered to enable coordinated spatial reuse (CSR) in next generation wireless local area networks. This paper tackles the challenge of downlink spatial reuse in Wi-Fi networks, specifically in scenarios involving overlapping basic service sets, by employing hierarchical multi-agent reinforcement learning (HMARL). We decompose the CSR process into two phases, i.e., a polling phase and a decision phase, and introduce the HMARL algorithm to enable efficient CSR. To enhance training efficiency, the proposed HMARL algorithm employs a hierarchical structure, where station selection and power control are determined by a high- and low-level policy network, respectively. Simulation results demonstrate that this approach consistently outperforms baseline methods in terms of throughput and latency across various network topologies. Moreover, the algorithm exhibits robust performance when coexisting with legacy APs. Additional experiments in a representative topology further reveal that the carefully designed reward function not only maximizes the overall network throughput, but also improves fairness in transmission opportunities for APs in high-interference regions.

高密度Wi-Fi部署往往导致严重的同信道干扰，从而降低整体网络性能。为解决这一问题，人们考虑通过多接入点（AP）协调，在下一代无线局域网中实现协调空间复用（CSR）。本文采用分层多智能体强化学习（HMARL），应对Wi-Fi网络中下行空间复用的挑战，特别是在基本服务集重叠的场景中。我们将CSR过程分解为两个阶段，即轮询阶段和决策阶段，并引入HMARL算法以实现高效的CSR。为提高训练效率，所提HMARL算法采用分层结构，站点选择和功率控制分别由高层和低层策略网络决定。仿真结果表明，在各种网络拓扑下，该方法在吞吐量和时延方面始终优于基线方法。此外，该算法在与传统AP共存时表现出稳健的性能。在代表性拓扑中的进一步实验还表明，精心设计的奖励函数不仅使整体网络吞吐量最大化，还提高了高干扰区域AP的传输机会公平性。

Article 11

Title@2025-06-17 (2): Light Aircraft Game : Basic Implementation and training results analysis

Title: Light Aircraft Game : Basic Implementation and training results analysis

Light Aircraft Spiel : Basic Implementation and training results analysis

轻型飞机游戏:基本实施和培训结果分析 2506.14164v1

Authors (1): Hanzhong Cao

This paper investigates multi-agent reinforcement learning (MARL) in a partially observable, cooperative-competitive combat environment known as LAG. We describe the environment’s setup, including agent actions, hierarchical controls, and reward design across different combat modes such as No Weapon and ShootMissile. Two representative algorithms are evaluated: HAPPO, an on-policy hierarchical variant of PPO, and HASAC, an off-policy method based on soft actor-critic. We analyze their training stability, reward progression, and inter-agent coordination capabilities. Experimental results show that HASAC performs well in simpler coordination tasks without weapons, while HAPPO demonstrates stronger adaptability in more dynamic and expressive scenarios involving missile combat. These findings provide insights into the trade-offs between on-policy and off-policy methods in multi-agent settings.

本文件调查了在被称为LAG的可部分观测、合作竞争的战斗环境中的多剂强化学习(MARL),我们描述了环境的设置,包括代理行动、等级控制和各种作战模式(如无武器和导弹)的奖励设计,评价了两种有代表性的算法:HAPPO,PPO政策等级变体和HasAC,这是基于软行为者批评的脱政策方法。我们分析了它们的培训稳定性、奖励进展和机构间协调能力。实验结果显示,HASAC在没有武器的情况下,在更简单的协调任务方面表现良好,而HAPPO在涉及导弹战斗的更动态、更清晰的情景中表现出更强的适应性。这些结果为多剂环境下的在政策上和非政策上的方法之间的取舍提供了深刻的见解。

Article 12

Title@2025-06-17 (2): StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework

Title: StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework

StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework

故事:以多机构框架为动力的相互自传书写作 2506.14159v1

Authors (6): Shayan Talaei, Meijin Li, Kanu Grover, James Kent Hippler, Diyi Yang, Amin Saberi

Every individual carries a unique and personal life story shaped by their memories and experiences. However, these memories are often scattered and difficult to organize into a coherent narrative, a challenge that defines the task of autobiography writing. Existing conversational writing assistants tend to rely on generic user interactions and pre-defined guidelines, making it difficult for these systems to capture personal memories and develop a complete biography over time. We introduce StorySage, a user-driven software system designed to meet the needs of a diverse group of users that supports a flexible conversation and a structured approach to autobiography writing. Powered by a multi-agent framework composed of an Interviewer, Session Scribe, Planner, Section Writer, and Session Coordinator, our system iteratively collects user memories, updates their autobiography, and plans for future conversations. In experimental simulations, StorySage demonstrates its ability to navigate multiple sessions and capture user memories across many conversations. User studies (N=28) highlight how StorySage maintains improved conversational flow, narrative completeness, and higher user satisfaction when compared to a baseline. In summary, StorySage contributes both a novel architecture for autobiography writing and insights into how multi-agent systems can enhance human-AI creative partnerships.

每个人都有一个由自身记忆和经历塑造的独特人生故事。然而，这些记忆往往零散，难以组织成连贯的叙事，这正是自传写作任务所面临的挑战。现有的对话式写作助手往往依赖通用的用户交互和预定义的指导原则，使这些系统难以捕捉个人记忆并随时间形成完整的传记。我们提出StorySage，这是一个用户驱动的软件系统，旨在满足多样化用户群体的需求，支持灵活的对话和结构化的自传写作方法。该系统由访谈者（Interviewer）、会话记录员（Session Scribe）、规划者（Planner）、章节撰写者（Section Writer）和会话协调员（Session Coordinator）组成的多代理框架驱动，迭代地收集用户记忆、更新其自传并规划未来的对话。在实验模拟中，StorySage展示了其跨多个会话运行并在多次对话中捕捉用户记忆的能力。用户研究（N=28）表明，与基线相比，StorySage在对话流畅度、叙事完整性和用户满意度方面均有提升。总之，StorySage既贡献了一种新颖的自传写作架构，也为多代理系统如何增强人机创造性协作提供了见解。

Article 13

Title@2025-06-16 (1): Beyond Browsing: API-Based Web Agents

Title: Beyond Browsing: API-Based Web Agents

Jenseits von Browsing: API-basierte Web-Agenten

超出浏览范围: API 网络代理 2410.16464v3

Authors (4): Yueqi Song, Frank Xu, Shuyan Zhou, Graham Neubig

Web browsers are a portal to the internet, where much of human activity is undertaken. Thus, there has been significant research work in AI agents that interact with the internet through web browsing. However, there is also another interface designed specifically for machine interaction with online content: application programming interfaces (APIs). In this paper we ask – what if we were to take tasks traditionally tackled by Browsing Agents, and give AI agents access to APIs? To do so, we propose two varieties of agents: (1) an API-calling agent that attempts to perform online tasks through APIs only, similar to traditional coding agents, and (2) a Hybrid Agent that can interact with online data through both web browsing and APIs. In experiments on WebArena, a widely-used and realistic benchmark for web navigation tasks, we find that API-Based Agents outperform web Browsing Agents. Hybrid Agents out-perform both others nearly uniformly across tasks, resulting in a more than 24.0% absolute improvement over web browsing alone, achieving a success rate of 38.9%, the SOTA performance among task-agnostic agents. These results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.

网络浏览器是互联网的门户, 大部分人类活动都是在互联网上进行。因此, AI代理商通过网络浏览与互联网互动, 已经做了大量研究工作。但是, 还有一个专门设计用于与在线内容进行机器互动的界面: 应用程序编程界面(APIs) 。在本文中, 我们问 – 如果我们要承担传统上由浏览代理商处理的任务, 并允许AI代理商访问API ? 要做到这一点, 我们建议两种类型的代理商:(1) API呼叫代理商, 仅试图通过API执行在线任务, 类似于传统的编码代理商; (2) 混合代理商, 可通过网络浏览和API 与在线数据互动。在WebArena的实验中, 应用范围广泛和现实的网络导航任务基准, 我们发现基于API的代理商超越了网络浏览代理商的功能。混合代理商在各项任务之间几乎一致地调整了其他两种代理商, 导致仅仅通过网络浏览器完成超过24.0%的绝对改进, 成功率达到38.9%, SOTA的绩效显示, 仅具有吸引力的网络浏览代理商的替代成果。

Article 14

Title@2025-06-16 (1): Deceptive Path Planning: A Bayesian Game Approach

Title: Deceptive Path Planning: A Bayesian Game Approach

Täuschende Pfadplanung: Ein Bayesischer Spielansatz

欺骗性道路规划:贝耶斯游戏方法 2506.13650v1

Authors (5): Violetta Rostobaya, James Berneburg, Yue Guan, Michael Dorothy, Daigo Shishika

This paper investigates how an autonomous agent can transmit information through its motion in an adversarial setting. We consider scenarios where an agent must reach its goal while deceiving an intelligent observer about its destination. We model this interaction as a dynamic Bayesian game between a mobile Attacker with a privately known goal and a Defender who infers the Attacker’s intent to allocate defensive resources effectively. We use Perfect Bayesian Nash Equilibrium (PBNE) as our solution concept and propose a computationally efficient approach to find it. In the resulting equilibrium, the Defender employs a simple Markovian strategy, while the Attacker strategically balances deception and goal efficiency by stochastically mixing shortest and non-shortest paths to manipulate the Defender’s beliefs. Numerical experiments demonstrate the advantages of our PBNE-based strategies over existing methods based on one-sided optimization.

本文探讨自主代理人如何在对抗状态下通过其运动传递信息。我们考虑了代理人必须达到其目标的情景,同时对目的地进行欺骗。我们将这种互动模拟为具有私人已知目标的移动攻击者与推断攻击者意图有效分配防御资源的辩护人之间的动态巴伊西亚游戏。我们使用完美的巴伊西亚纳什·纳什·基利姆(PBNE)作为我们的解决方案概念,并提出一种计算效率高的方法来找到它。在由此形成的平衡中,辩护人采用了简单的马尔科维安战略,而攻击者则通过将最短和非最短、最短的路径混在一起来控制辩护人的信仰,从而在战略上平衡欺骗和目的效率。数字实验显示了我们以PBNE为基础的战略比以片面优化为基础的现有方法的优势。
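
The belief manipulation described in the abstract above rests on Bayes' rule: the Defender reweights each candidate goal by how likely the observed move is under that goal. The snippet below is a minimal sketch of that single update step, not the paper's PBNE computation. The function name `update_belief`, the goal labels, the move names, and all probabilities are hypothetical and chosen only for illustration.

```python
def update_belief(prior, likelihood, observed_move):
    """One Bayesian update of the Defender's belief over Attacker goals.

    prior:      {goal: probability}
    likelihood: {goal: {move: probability of that move given the goal}}
    """
    unnormalized = {
        goal: prior[goal] * likelihood[goal].get(observed_move, 0.0)
        for goal in prior
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        # Assumption for this sketch: a move that is impossible under every
        # goal leaves the belief unchanged.
        return dict(prior)
    return {goal: p / total for goal, p in unnormalized.items()}


prior = {"G1": 0.5, "G2": 0.5}

# A non-deceptive Attacker always takes the shortest path toward its goal.
truthful = {"G1": {"left": 1.0}, "G2": {"right": 1.0}}

# A deceptive Attacker mixes shortest and non-shortest moves (made-up numbers).
mixed = {
    "G1": {"left": 0.6, "right": 0.4},
    "G2": {"left": 0.4, "right": 0.6},
}

belief_truthful = update_belief(prior, truthful, "left")
belief_mixed = update_belief(prior, mixed, "left")
```

With the truthful strategy, a single observed move reveals the goal; with the mixed strategy, the same observation shifts the belief only partway, which is the kind of ambiguity the Attacker trades against path length.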

Article 15

Title@2025-06-16 (1): Agent Capability Negotiation and Binding Protocol (ACNBP)

Title: Agent Capability Negotiation and Binding Protocol (ACNBP)

Agent Capability Negotiation and Binding Protocol (ACNBP)

能力谈判和具有约束力的议定书(ACNBP) 2506.13590v1

Authors (4): Ken Huang, Akram Sheriff, Vineeth Sai Narajala, Idan Habler

As multi-agent systems evolve to encompass increasingly diverse and specialized agents, the challenge of enabling effective collaboration between heterogeneous agents has become paramount, with traditional agent communication protocols often assuming homogeneous environments or predefined interaction patterns that limit their applicability in dynamic, open-world scenarios. This paper presents the Agent Capability Negotiation and Binding Protocol (ACNBP), a novel framework designed to facilitate secure, efficient, and verifiable interactions between agents in heterogeneous multi-agent systems through integration with an Agent Name Service (ANS) infrastructure that provides comprehensive discovery, negotiation, and binding mechanisms. The protocol introduces a structured 10-step process encompassing capability discovery, candidate pre-screening and selection, secure negotiation phases, and binding commitment with built-in security measures including digital signatures, capability attestation, and comprehensive threat mitigation strategies, while a key innovation of ACNBP is its protocolExtension mechanism that enables backward-compatible protocol evolution and supports diverse agent architectures while maintaining security and interoperability. We demonstrate ACNBP’s effectiveness through a comprehensive security analysis using the MAESTRO threat modeling framework, practical implementation considerations, and a detailed example showcasing the protocol’s application in a document translation scenario, with the protocol addressing critical challenges in agent autonomy, capability verification, secure communication, and scalable agent ecosystem management.

随着多试剂系统演变成涵盖日益多样化和专业化的代理人,促成不同代理人之间有效合作的挑战已经变得至关重要,传统代理人通信协议往往假设单一的环境或预先确定的相互作用模式,限制其在动态、开放世界情景中的适用性,因此传统代理人通信协议往往假定了单一的环境或预先确定的互动模式,本文件介绍了《代理能力谈判和具有约束力的议定书》(ACNBP),这是一个新颖的框架,旨在通过与提供全面发现、谈判和约束机制的代理名称服务(ANS)基础设施整合,促进不同多试剂系统中的代理人之间安全、高效和可核查的互动,提供全面的发现、谈判和约束机制。议定书引入了一个结构化的十步进程,包括能力发现、候选的预先筛选和选择、安全的谈判阶段以及有约束力的承诺,包括内在安全措施,包括数字签名、能力证明和全面减少威胁战略,而亚太代理人协定的一项关键创新是其促进议定书演变的机制,它既能促进后相兼容性协议的演变,又支持多种代理人结构,同时保持安全和互操作性。我们通过利用MAESTRO威胁建模框架的全面安全分析、实际执行考虑以及一个详细的例子,在文件翻译设想中展示议定书的应用,并详细展示了议定书的应用,并附有自主权,从而解决了议定书的代理人管理能力,从而解决了安全管理的挑战。

Article 16

Title@2025-06-16 (1): Mobility to Campus – a Framework to Evaluate and Compare Different Mobility Modes

Title: Mobility to Campus – a Framework to Evaluate and Compare Different Mobility Modes

Mobility to Campus - ein Rahmen zur Bewertung und zum Vergleich unterschiedlicher Mobilitätsmodi

流动到校园 – – 评估和比较不同流动模式的框架 2506.13574v1

Authors (3): Helena Fehler, Marco Pruckner, Marie Schmidt

The transport sector accounts for about 20% of German CO2 emissions, with commuter traffic contributing a significant part. Particularly in rural areas, where public transport is inconvenient to use, private cars are a common choice for commuting and most commuters travel alone in their cars. Consolidation of some of these trips has the potential to decrease CO2 emissions and could be achieved, e.g., by offering ridesharing (commuters with similar origin-destination pairs share a car) or ridepooling (commuters are picked up by shuttle services). In this study, we present a framework to assess the potential of introducing new mobility modes like ridesharing and ridepooling for commuting towards several locations in close vicinity to each other. We test our framework on the case of student mobility to the University of Würzburg, a university with several campus locations and a big and rather rural catchment area, where existing public transport options are inconvenient and many students commute by car. We combine data on student home addresses and campus visitation times to create demand scenarios. In our case study, we compare the mobility modes of ridesharing and ridepooling to the base case, where students travel by car on their own. We find that ridesharing has the potential to greatly reduce emissions, depending on the percentage of students willing to use the service and their willingness to walk to the departure location. The benefit of ridepooling is less clear, materializing only if the shuttle vehicles are more energy efficient than the student cars.

交通部门约占德国二氧化碳排放量的20%,通勤交通占德国二氧化碳排放量的很大一部分。特别是在农村地区,公共交通不方便使用,私家汽车是通勤的常见选择,大多数通勤者单独乘坐汽车。合并其中一些旅行有可能减少二氧化碳排放量,可以实现,例如,提供搭乘(具有类似来源地-目的地配对的坐客共用一辆车)或搭乘搭乘(搭便车由穿梭服务接通)或搭乘搭乘(搭便车搭乘轮车),从而产生相当大一部分需求。在本研究中,我们提出了一个框架,用以评估采用新的机动模式的可能性,如搭乘和搭乘公车前往彼此相邻的若干地点。我们测试了我们关于学生前往W'urzburg大学(一所拥有若干校园地点的大学,以及一个大型和偏重农村集水区,这里现有的公共交通选项不方便,许多学生乘车往返汽车。我们仅将学生住家住家和校园访问时间的数据合并起来,以创造需求设想。在我们的案例研究中,我们比较了搭乘公车和搭乘公车前往离彼此附近若干地点的交通的机动模式的可能性,而学生旅行旅行旅行比更难更有利于。我们发现他们更愿意旅行旅行旅行旅行更难度的航行更有利于度。

Article 17

Title: Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing

Kollektive Wohlfahrt im Mehr-Agenten-Verstärkungs-Lernen durch Suggestion Sharing erreichen

通过分享建议,实现多机构加强多机构加强学习的集体福利 2412.12326v2

Authors (3): Yue Jin, Shuangqing Wei, Giovanni Montana

In human society, the conflict between self-interest and collective well-being often obstructs efforts to achieve shared welfare. Related concepts like the Tragedy of the Commons and Social Dilemmas frequently manifest in our daily lives. As artificial agents increasingly serve as autonomous proxies for humans, we propose a novel multi-agent reinforcement learning (MARL) method to address this issue - learning policies to maximise collective returns even when individual agents’ interests conflict with the collective one. Unlike traditional cooperative MARL solutions that involve sharing rewards, values, and policies or designing intrinsic rewards to encourage agents to learn collectively optimal policies, we propose a novel MARL approach where agents exchange action suggestions. Our method reveals less private information compared to sharing rewards, values, or policies, while enabling effective cooperation without the need to design intrinsic rewards. Our algorithm is supported by our theoretical analysis that establishes a bound on the discrepancy between collective and individual objectives, demonstrating how sharing suggestions can align agents’ behaviours with the collective objective. Experimental results demonstrate that our algorithm performs competitively with baselines that rely on value or policy sharing or intrinsic rewards.

在人类社会,自我利益和集体福利之间的冲突往往阻碍实现共同福利的努力。共同人和社会困境的悲剧等相关概念经常出现在我们的日常生活中。由于人为代理人日益成为人类的自主代理人,我们提出一种新的多剂强化学习(MARL)方法来解决这一问题:学习政策以最大限度地实现集体回报,即使个别代理人的利益与集体利益发生冲突。传统合作的MARL解决方案涉及分享奖励、价值和政策,或设计内在奖励以鼓励代理人学习集体最佳政策,与此不同,我们建议采用新的MARL方法,使代理人交流行动建议。我们的方法显示,与分享奖励、价值或政策相比,私人信息较少,而无需设计内在奖励。我们的算法得到我们的理论分析的支持,这种理论分析将集体目标和个人目标之间的差异联系在一起,表明分享建议可以如何使代理人的行为与集体目标相一致。实验结果表明,我们的算法具有竞争性,其基线依赖于价值或政策分享或内在奖励。

Article 18

Title@2025-06-16 (1): Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation

Title: Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation

Socratic RL: Ein neuartiger Rahmen für effiziente Wissensakquisition durch iterative Reflexion und Sichtdestillation

Socratic RL:一个通过迭代思考和观察点蒸馏提高知识获取效率的新颖框架 2506.13358v1

Authors (1): Xiangfan Wu

Current Reinforcement Learning (RL) methodologies for Large Language Models (LLMs) often rely on simplistic, outcome-based reward signals (e.g., final answer correctness), which limits the depth of learning from each interaction. This paper introduces Socratic Reinforcement Learning (Socratic-RL), a novel, process-oriented framework designed to address this limitation. Socratic-RL operates on the principle that deeper understanding is achieved by reflecting on the causal reasons for errors and successes within the reasoning process itself. The framework employs a decoupled “Teacher-Student” architecture, where a “Teacher AI” analyzes interaction histories, extracts causal insights, and formulates them into structured “viewpoints.” These viewpoints, acting as distilled guidance, are then used by a “Student AI” to enhance its subsequent reasoning. A key innovation is the iterative self-improvement of the Teacher AI, enabling its reflective capabilities to evolve through a meta-learning loop. To manage the accumulation of knowledge, a distillation mechanism compresses learned viewpoints into the Student’s parameters. By focusing on process rather than just outcome, Socratic-RL presents a pathway toward enhanced sample efficiency, superior interpretability, and a more scalable architecture for self-improving AI systems. This paper details the foundational concepts, formal mechanisms, synergies, challenges, and a concrete research roadmap for this proposed framework.

目前大语言模型(LLM)的强化学习(RL)方法往往依赖简单化的、基于结果的奖励信号(例如,最终答案的正确性),这限制了从每种互动中学习的深度。本文介绍了Scortic Estrument Learning(Sctory-RL),这是一个旨在解决这一局限性的新颖的、面向过程的框架。Scorti-RL遵循的原则是,通过反思推理过程本身中错误和成功的原因来达成更深刻的理解。框架采用了一种分解的“师生”架构,其中“教师AI”分析互动历史,提取因果关系的洞察力,并将其形成结构化为结构化的“观察点 ” 。这些观点作为精炼的指南,然后被一个“教师AI”用来加强随后的推理。一个关键的创新是反复的自我改进,使其反省能力能够通过元学习循环演进。管理知识的积累,一个精练机制将学习的观点压缩成学生的正式参数。通过注重过程而不是更高级的路径结构来解释一个更高级的自我分析基础。

Article 19

Title@2025-06-16 (1): Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

Title: Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

Entwurf eines auf A* basierenden heuristischen Algorithmus für effizientes Interdiction in Multi-Layer-Netzwerken

设计基于A*的启发式算法，以在多层网络中高效拦截 2506.10017v2

Authors (1): Sukanya Samanta

Intercepting a criminal using limited police resources presents a significant challenge in dynamic crime environments, where the criminal’s location continuously changes over time. The complexity is further heightened by the vastness of the transportation network. To tackle this problem, we propose a layered graph representation, in which each time step is associated with a duplicate of the transportation network. For any given set of attacker strategies, a near-optimal defender strategy is computed using the A-Star heuristic algorithm applied to the layered graph. The defender’s goal is to maximize the probability of successful interdiction. We evaluate the performance of the proposed method by comparing it with a Mixed-Integer Linear Programming (MILP) approach used for the defender. The comparison considers both computational efficiency and solution quality. The results demonstrate that our approach effectively addresses the complexity of the problem and delivers high-quality solutions within a short computation time.

在动态犯罪环境中，罪犯的位置随时间不断变化，利用有限的警力资源拦截罪犯是一项重大挑战。交通网络的庞大规模进一步加剧了这一复杂性。为解决这一问题，我们提出一种分层图表示，其中每个时间步都对应交通网络的一个副本。对于任意给定的攻击者策略集合，在分层图上应用A-Star启发式算法计算出近似最优的防御者策略。防御者的目标是最大化成功拦截的概率。我们将所提方法与用于防御者的混合整数线性规划（MILP）方法进行比较，以评估其性能。比较同时考虑计算效率和解的质量。结果表明，我们的方法有效应对了问题的复杂性，并在较短的计算时间内给出高质量的解。
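
The layered-graph idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the author's implementation: it takes a single, fully known criminal trajectory, unit travel time per road, and a trivial admissible heuristic, whereas the paper maximizes interdiction probability against a set of attacker strategies. The names `build_layered_graph` and `a_star` and the four-node toy network are hypothetical.

```python
import heapq


def build_layered_graph(edges, horizon):
    """Time-expand an undirected road network.

    Node (v, t) links to (w, t + 1) for every road v-w, and to (v, t + 1)
    for waiting in place, so each time step holds one copy of the network.
    """
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    layered = {}
    for t in range(horizon):
        for v in nodes:
            moves = [(w, t + 1) for w in sorted(adjacency[v])]
            layered[(v, t)] = moves + [(v, t + 1)]
    return layered


def a_star(layered, start, is_goal, heuristic):
    """Plain A* with unit step costs; returns the node path or None."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path
        for nxt in layered.get(node, []):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None


# Toy road network A - B - C - D and one assumed criminal trajectory.
roads = [("A", "B"), ("B", "C"), ("C", "D")]
criminal = ["D", "C", "C", "B"]  # criminal's location at t = 0, 1, 2, 3


def intercepted(node):
    location, t = node
    return t < len(criminal) and criminal[t] == location


def one_step_bound(node):
    # Admissible and consistent: at least one more step unless already done.
    return 0 if intercepted(node) else 1


layered = build_layered_graph(roads, horizon=len(criminal))
defender_path = a_star(layered, ("A", 0), intercepted, one_step_bound)
```

In this toy instance the defender starting at A meets the criminal at C at time step 2. A stronger heuristic, such as a precomputed shortest-path distance to the nearest future criminal location, would prune more of the layered graph.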

Article 20

Title@2025-06-16 (1): Towards Pervasive Distributed Agentic Generative AI – A State of The Art

Title: Towards Pervasive Distributed Agentic Generative AI – A State of The Art

Auf dem Weg zu einer allgegenwärtigen verteilten agentischen Generativen KI – Ein Stand der Kunst

朝向分布式分布式分布式制剂产生AI – – 艺术状态 2506.13324v1

Authors (2): Gianni Molinari, Fabio Ciravegna

The rapid advancement of intelligent agents and Large Language Models (LLMs) is reshaping the pervasive computing field. Their ability to perceive, reason, and act through natural language understanding enables autonomous problem-solving in complex pervasive environments, including the management of heterogeneous sensors, devices, and data. This survey outlines the architectural components of LLM agents (profiling, memory, planning, and action) and examines their deployment and evaluation across various scenarios. Then it reviews computational and infrastructural advancements (cloud to edge) in pervasive computing and how AI is moving in this field. It highlights state-of-the-art agent deployment strategies and applications, including local and distributed execution on resource-constrained devices. This survey identifies key challenges of these agents in pervasive computing, such as architectural, energetic, and privacy limitations. It finally proposes what we call “Agent as a Tool”, a conceptual framework for pervasive agentic AI, emphasizing context awareness, modularity, security, efficiency, and effectiveness.

智能剂和大语言模型(LLMS)的快速进步正在改变普遍存在的计算领域。他们通过自然语言理解的感知、理性和行为的能力使得在复杂的普遍环境中能够自主解决问题,包括管理各种传感器、装置和数据。本调查概述了LLM剂的建筑组成部分(外观、记忆、规划和行动),并审查了其在不同情况下的部署和评价。比它审查普遍计算中的计算和基础设施进步(从边缘到边缘)以及AI在这一领域的动向。它强调了最先进的代理器部署战略和应用程序,包括在当地和分布地对资源限制装置的部署和应用。本调查确定了这些代理人在建筑、高能和隐私限制等普遍存在的计算机中的主要挑战。它最后提出了我们称之为“作为工具”的工具的普及性人工智能的概念框架,强调背景意识、模块性、安全性、效率和有效性。

Article 21

Title@2025-06-16 (1): Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Title: Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Convex Markov Games: Eine neue Front für das Mehr-Agenten-Stärkungs-Lernen

Convex Markov 游戏:多机构强化学习的新疆域 2410.16600v3

Authors (5): Ian Gemp, Andreas Haupt, Luke Marris, Siqi Liu, Georgios Piliouras

Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner’s dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable.

行为多样性、专家模仿、公平、安全目标及其他因素在连续决策领域带来偏好, 且不会在时间上分解。我们引入了允许一般分流偏好于占用措施的康韦克斯 Markov 游戏类别。尽管时间跨度无限,且比Markov 游戏更加普遍, 纯粹的策略Nash 平衡仍然存在。此外, 以经验方式将梯度从可开发性的上层下降, 从而可以比较平衡。我们的实验揭示了经典的重复正态游戏的新解决方案, 在重复的非对称协调游戏中找到公平解决方案, 并在机器人仓库环境中优先关注安全的长期行为。在囚犯的困境中, 我们的算法利用瞬间仿制来找到一个与所观察到的人类游戏稍有偏差的政策配置, 但却能达到更高的人均效用, 而同时又是3级的不易开发性。

Article 22

Title@2025-06-16 (1): G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Title: G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme

G-记忆:为多机构系统追踪等级记忆 2506.07398v2

Authors (6): Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that enable the system to leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to 20.89% and 10.12%, respectively, without any modifications to the original frameworks. Our codes are available at https://github.com/bingreeky/GMemory.

由大型语言模型（LLM）驱动的多智能体系统（MAS）已展现出远超单个LLM代理的认知和执行能力，但其自我进化能力仍受制于尚不完善的记忆架构。经过仔细审视，我们警觉地发现，现有的MAS记忆机制（1）过于简单，完全忽视了细致的代理间协作轨迹；（2）缺乏跨试验和针对特定代理的定制，这与为单个代理开发的富有表现力的记忆形成鲜明对比。为弥合这一差距，我们提出G-Memory，这是一个受组织记忆理论启发的、面向MAS的分层代理式记忆系统，它通过三层图层次结构（洞察图、查询图和交互图）来管理冗长的MAS交互。收到新的用户查询后，G-Memory执行双向记忆遍历，既检索使系统能够利用跨试验知识的高层次、可泛化的洞察，也检索紧凑编码先前协作经验的细粒度、浓缩的交互轨迹。任务执行后，整个层次结构通过吸收新的协作轨迹而演化，促进代理团队的持续进化。在五个基准、三种LLM骨干和三种流行MAS框架上的大量实验表明，在不对原始框架做任何修改的情况下，G-Memory将具身行动的成功率和知识问答的准确率分别最多提高20.89%和10.12%。我们的代码见 https://github.com/bingreeky/GMemory。

Article 23

Title@2025-06-16 (1): Autonomous Computer Vision Development with Agentic AI

Title: Autonomous Computer Vision Development with Agentic AI

Autonome Computer Vision Entwicklung mit Agentischer KI

基于Agentic AI的自主计算机视觉开发 2506.11140v2

Authors (6): Jin Kim, Muhammad Wahi-Anwa, Sangyun Park, Shawn Shin, John M. Hoffman, Matthew S. Brown

Agentic Artificial Intelligence (AI) systems leveraging Large Language Models (LLMs) exhibit significant potential for complex reasoning, planning, and tool utilization. We demonstrate that a specialized computer vision system can be built autonomously from a natural language prompt using Agentic AI methods. This involved extending SimpleMind (SM), an open-source Cognitive AI environment with configurable tools for medical image analysis, with an LLM-based agent, implemented using OpenManus, to automate the planning (tool configuration) for a particular computer vision task. We provide a proof-of-concept demonstration that an agentic system can interpret a computer vision task prompt and plan a corresponding SimpleMind workflow by decomposing the task and configuring appropriate tools. From the user input prompt, “provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)”, the agent LLM was able to generate the plan (tool configuration file in YAML format) and execute SM-Learn (training) and SM-Think (inference) scripts autonomously. The computer vision agent automatically configured, trained, and tested itself on 50 chest x-ray images, achieving mean Dice scores of 0.96, 0.82, and 0.83 for lungs, heart, and ribs, respectively. This work shows the potential for autonomous planning and tool configuration that has traditionally been performed by a data scientist in the development of computer vision applications.

利用大型语言模型（LLM）的代理式人工智能（AI）系统在复杂推理、规划和工具使用方面展现出巨大潜力。我们证明，可以使用代理式AI方法，根据自然语言提示自主构建专门的计算机视觉系统。为此，我们用一个基于LLM、通过OpenManus实现的代理扩展了SimpleMind（SM），后者是一个开源的认知AI环境，带有可配置的医学图像分析工具，从而使特定计算机视觉任务的规划（工具配置）自动化。我们提供了概念验证演示，表明代理系统能够解释计算机视觉任务提示，并通过分解任务和配置适当工具来规划相应的SimpleMind工作流。根据用户输入的提示“provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)”，代理LLM能够生成计划（YAML格式的工具配置文件），并自主执行SM-Learn（训练）和SM-Think（推理）脚本。该计算机视觉代理在50张胸部X光图像上自动完成了配置、训练和测试，肺、心脏和肋骨的平均Dice分数分别为0.96、0.82和0.83。这项工作展示了自主完成规划和工具配置的潜力，而这些工作在计算机视觉应用开发中传统上由数据科学家完成。

Article 24

Title@2025-06-16 (1): Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

Title: Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

Identifizierung von LFT-strukturierten Deskriptorensystemen mit langsamer und nicht einheitlicher Probenahme

LFT 结构化描述系统缓慢和非统一抽样的标识 2407.00629v5

Authors (1): Tong Zhou

Time domain identification is studied in this paper for parameters of a continuous-time multi-input multi-output descriptor system, with these parameters affecting system matrices through a linear fractional transformation. Sampling is permitted to be slow and non-uniform, and there are no necessities to satisfy the Nyquist frequency restrictions. This model can be used to describe the behaviors of a networked dynamic system, and the obtained results can be straightforwardly applied to an ordinary state-space model, as well as a lumped system. An explicit formula is obtained respectively for the transient and steady-state responses of the system stimulated by an arbitrary signal. Some relations have been derived between the system steady-state response and its transfer function matrix (TFM), which reveal that the value of a TFM at almost any interested point, as well as its derivatives and a right tangential interpolation along an arbitrary direction, can in principle be estimated from input-output experimental data. Based on these relations, an estimation algorithm is suggested respectively for the parameters of the descriptor system and the values of its TFM. Their properties like asymptotic unbiasedness, consistency, etc., are analyzed. A simple numerical example is included to illustrate characteristics of the suggested estimation algorithms.

本文研究连续时间多输入多输出描述系统参数的时域辨识,这些参数通过线性分式变换影响系统矩阵。允许采样缓慢且非均匀,无需满足Nyquist频率限制。该模型可用于描述网络化动态系统的行为,所得结果可直接应用于普通状态空间模型以及集总系统。针对任意信号激励下系统的暂态响应和稳态响应,分别得到了显式公式。推导了系统稳态响应与其传递函数矩阵(TFM)之间的若干关系,表明TFM在几乎任何感兴趣点上的取值及其导数,以及沿任意方向的右切向插值,原则上都可以从输入输出实验数据中估计。基于这些关系,分别提出了描述系统参数和TFM取值的估计算法,并分析了其渐近无偏性、一致性等性质。文中给出一个简单的数值例子,以说明所提估计算法的特点。
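
As a reading aid for "parameters affecting system matrices through a linear fractional transformation", the generic form of such a dependence can be written as below. This is only the textbook LFT shape; the symbols $A_0$, $B_1$, $C_1$, $D_{11}$ and $\Theta$ are illustrative assumptions and are not taken from the paper, whose exact parametrization is not given in the abstract.

```latex
% Generic LFT dependence of a system matrix on a parameter block \Theta
% (illustrative notation, not the paper's)
\[
  A(\Theta) \;=\; A_0 + B_1\,\Theta\,\bigl(I - D_{11}\Theta\bigr)^{-1} C_1 ,
  \qquad \det\bigl(I - D_{11}\Theta\bigr) \neq 0 .
\]
% Special case: D_{11} = 0 gives the affine dependence A(\Theta) = A_0 + B_1 \Theta C_1.
```

The invertibility condition is the usual well-posedness requirement for an LFT; when $D_{11} = 0$ the dependence reduces to an affine one.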

Article 25

Title@2025-06-15 (7): Homeostatic Coupling for Prosocial Behavior

Title: Homeostatic Coupling for Prosocial Behavior

Homeostatische Kupplung für Prosoziales Verhalten

有利于社会行为的自制共聚 2506.12894v1

Authors (2): Naoto Yoshida, Kingson Man

When regarding the suffering of others, we often experience personal distress and feel compelled to help. (Preprint. Under review.) Inspired by living systems, we investigate the emergence of prosocial behavior among autonomous agents that are motivated by homeostatic self-regulation. We perform multi-agent reinforcement learning, treating each agent as a vulnerable homeostat charged with maintaining its own well-being. We introduce an empathy-like mechanism to share homeostatic states between agents: an agent can either observe their partner's internal state (cognitive empathy) or the agent's internal state can be directly coupled to that of their partner (affective empathy). In three simple multi-agent environments, we show that prosocial behavior arises only under homeostatic coupling, when the distress of a partner can affect one's own well-being. Additionally, we show that empathy can be learned: agents can "decode" their partner's external emotive states to infer the partner's internal homeostatic states. Assuming some level of physiological similarity, agents reference their own emotion-generation functions to invert the mapping from outward display to internal state. Overall, we demonstrate the emergence of prosocial behavior when homeostatic agents learn to "read" the emotions of others and then to empathize, or feel as they feel.

看到他人受苦时,我们常常感到个人痛苦,并觉得必须伸出援手。(预印本,正在审稿中。)受生命系统的启发,我们研究由稳态自我调节驱动的自主智能体之间亲社会行为的涌现。我们进行多智能体强化学习,把每个智能体视为一个脆弱的稳态体,负责维持自身的福祉。我们引入一种类似共情的机制,在智能体之间共享稳态状态:智能体可以观察伙伴的内部状态(认知共情),或者智能体的内部状态可以与伙伴的内部状态直接耦合(情感共情)。在三个简单的多智能体环境中,我们表明亲社会行为只在稳态耦合下出现,即伙伴的痛苦能够影响自身福祉时。此外,我们表明共情是可以学习的:智能体可以“解码”伙伴的外在情绪表现,以推断伙伴的内部稳态状态。在假设一定程度的生理相似性的前提下,智能体参照自身的情绪生成函数,反推从外在表现到内部状态的映射。总之,我们展示了当稳态智能体学会“读懂”他人的情绪并进而共情(感其所感)时,亲社会行为的涌现。

Article 26

Title@2025-06-15 (7): HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

Title: HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

HARBOR: Erforschen von Persona-Dynamik im Multi-Agenten-Wettbewerb

《HARBOR:在多机构竞争中探索人动态》 2502.12149v2

Authors (3): Kenan Jiang, Li Xiong, Fei Liu

We investigate factors contributing to LLM agents’ success in competitive multi-agent environments, using auctions as a testbed where agents bid to maximize profit. The agents are equipped with bidding domain knowledge, distinct personas that reflect item preferences, and a memory of auction history. Our work extends the classic auction scenario by creating a realistic environment where multiple agents bid on houses, weighing aspects such as size, location, and budget to secure the most desirable homes at the lowest prices. Particularly, we investigate three key questions: (a) How does a persona influence an agent’s behavior in a competitive setting? (b) Can an agent effectively profile its competitors’ behavior during auctions? (c) How can persona profiling be leveraged to create an advantage using strategies such as theory of mind? Through a series of experiments, we analyze the behaviors of LLM agents and shed light on new findings. Our testbed, called HARBOR, offers a valuable platform for deepening our understanding of multi-agent workflows in competitive environments.

我们调查促使LLM代理商在竞争性多试机构环境下取得成功的因素,利用拍卖作为检验机构争取最大利润的试金石。代理商配备了投标领域知识、反映项目偏好的独特人物以及拍卖历史的记忆。我们的工作扩展了典型的拍卖情景,创造了一个现实的环境,让多个代理商竞拍房屋,权衡规模、地点和预算等方面,以最低价格确保最理想的住宅。我们特别调查了三个关键问题:(a) 个人如何影响代理人在竞争环境中的行为? (b) 代理人能否在拍卖期间有效地描述其竞争对手的行为? (c) 如何利用个人特征分析来利用思维理论等战略创造优势?我们通过一系列实验,分析LM代理商的行为,并揭示新的发现。我们的试金称为HARBOR,为加深我们对竞争性环境中多代理工作流程的理解提供了一个宝贵的平台。

Article 27

Title@2025-06-14 (6): Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Title: Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Vertrauen-MARL: Vertrauen-basiertes Multi-Agenten-Verstärkungs-Learning-Framework für kooperative On-Ramp-Merging-Kontrolle im heterogenen Verkehrsfluss

Trust-MARL: 以信任为基础的多机构加强多机构加强学习框架,促进在多样化交通流量方面合作在潮上合并控制 2506.12600v1

Authors (4): Jie Pan, Tianyi Wang, Christian Claudel, Jing Shi

Intelligent transportation systems require connected and automated vehicles (CAVs) to conduct safe and efficient cooperation with human-driven vehicles (HVs) in complex real-world traffic environments. However, the inherent unpredictability of human behaviour, especially at bottlenecks such as highway on-ramp merging areas, often disrupts traffic flow and compromises system performance. To address the challenge of cooperative on-ramp merging in heterogeneous traffic environments, this study proposes a trust-based multi-agent reinforcement learning (Trust-MARL) framework. At the macro level, Trust-MARL enhances global traffic efficiency by leveraging inter-agent trust to improve bottleneck throughput and mitigate traffic shockwave through emergent group-level coordination. At the micro level, a dynamic trust mechanism is designed to enable CAVs to adjust their cooperative strategies in response to real-time behaviors and historical interactions with both HVs and other CAVs. Furthermore, a trust-triggered game-theoretic decision-making module is integrated to guide each CAV in adapting its cooperation factor and executing context-aware lane-changing decisions under safety, comfort, and efficiency constraints. An extensive set of ablation studies and comparative experiments validates the effectiveness of the proposed Trust-MARL approach, demonstrating significant improvements in safety, efficiency, comfort, and adaptability across varying CAV penetration rates and traffic densities.

智能交通系统要求网联自动驾驶车辆(CAV)在复杂的真实交通环境中与人类驾驶车辆(HV)进行安全高效的协作。然而,人类行为固有的不可预测性,特别是在高速公路匝道合流区等瓶颈处,往往会扰乱交通流并降低系统性能。为应对异质交通环境中匝道协同合流的挑战,本研究提出了一个基于信任的多智能体强化学习(Trust-MARL)框架。在宏观层面,Trust-MARL利用智能体间的信任,通过涌现的群体级协调来提高瓶颈通行能力并缓解交通冲击波,从而提升全局交通效率。在微观层面,设计了一种动态信任机制,使CAV能够根据实时行为以及与HV和其他CAV的历史交互来调整其协作策略。此外,还集成了一个由信任触发的博弈论决策模块,引导每辆CAV在安全、舒适和效率约束下调整其协作因子并执行情境感知的换道决策。大量消融研究和对比实验验证了所提Trust-MARL方法的有效性,表明在不同CAV渗透率和交通密度下,安全性、效率、舒适性和适应性均有显著提升。

Article 28

Title@2025-06-14 (6): A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

Title: A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

Eine umfassende Untersuchung der Tiefenforschung: Systeme, Methoden und Anwendungen

深层研究综合调查:系统、方法和应用 2506.12594v1

Authors (2): Renjun Xu, Jingwen Peng

This survey examines the rapidly evolving field of Deep Research systems – AI-powered applications that automate complex research workflows through the integration of large language models, advanced information retrieval, and autonomous reasoning capabilities. We analyze more than 80 commercial and non-commercial implementations that have emerged since 2023, including OpenAI/Deep Research, Gemini/Deep Research, Perplexity/Deep Research, and numerous open-source alternatives. Through comprehensive examination, we propose a novel hierarchical taxonomy that categorizes systems according to four fundamental technical dimensions: foundation models and reasoning engines, tool utilization and environmental interaction, task planning and execution control, and knowledge synthesis and output generation. We explore the architectural patterns, implementation approaches, and domain-specific adaptations that characterize these systems across academic, scientific, business, and educational applications. Our analysis reveals both the significant capabilities of current implementations and the technical and ethical challenges they present regarding information accuracy, privacy, intellectual property, and accessibility. The survey concludes by identifying promising research directions in advanced reasoning architectures, multimodal integration, domain specialization, human-AI collaboration, and ecosystem standardization that will likely shape the future evolution of this transformative technology. By providing a comprehensive framework for understanding Deep Research systems, this survey contributes to both the theoretical understanding of AI-augmented knowledge work and the practical development of more capable, responsible, and accessible research technologies. The paper resources can be viewed at https://github.com/scienceaix/deepresearch.

本综述考察了快速发展的深度研究(Deep Research)系统领域,即通过整合大语言模型、先进信息检索和自主推理能力来自动化复杂研究工作流程的AI应用。我们分析了2023年以来出现的80多个商业和非商业实现,包括OpenAI/Deep Research、Gemini/Deep Research、Perplexity/Deep Research以及众多开源替代方案。通过全面考察,我们提出了一种新的层次分类法,按四个基本技术维度对系统进行分类:基础模型与推理引擎、工具使用与环境交互、任务规划与执行控制、知识综合与输出生成。我们探讨了这些系统在学术、科学、商业和教育应用中的架构模式、实现方法和领域适配。我们的分析既揭示了当前实现的显著能力,也揭示了它们在信息准确性、隐私、知识产权和可及性方面带来的技术与伦理挑战。综述最后指出了高级推理架构、多模态集成、领域专门化、人机协作和生态系统标准化等有前景的研究方向,这些方向可能塑造这一变革性技术的未来演进。通过提供理解深度研究系统的综合框架,本综述既有助于对AI增强知识工作的理论理解,也有助于开发能力更强、更负责任、更易获取的研究技术。论文资源见 https://github.com/scienceaix/deepresearch。

Article 29

Title@2025-06-14 (6): Collaboration Between the City and Machine Learning Community is Crucial to Efficient Autonomous Vehicles Routing

Title: Collaboration Between the City and Machine Learning Community is Crucial to Efficient Autonomous Vehicles Routing

Zusammenarbeit zwischen der Stadt und Machine Learning Community ist entscheidend für effiziente autonome Fahrzeuge Routing

城市与机械学习社区之间的合作对于高效自治车辆的运行至关重要。 2502.13188v2

Authors (6): Anastasia Psarou, Ahmet Onur Akman, Łukasz Gorczyca, Michał Hoffmann, Grzegorz Jamróz, Rafał Kucharski

Autonomous vehicles (AVs), possibly using Multi-Agent Reinforcement Learning (MARL) for simultaneous route optimization, may destabilize traffic networks, with human drivers potentially experiencing longer travel times. We study this interaction by simulating human drivers and AVs. Our experiments with standard MARL algorithms reveal that, both in simplified and complex networks, policies often fail to converge to an optimal solution or require long training periods. This problem is amplified by the fact that we cannot rely entirely on simulated training, as there are no accurate models of human routing behavior. At the same time, real-world training in cities risks destabilizing urban traffic systems, increasing externalities, such as CO$_2$ emissions, and introducing non-stationarity as human drivers will adapt unpredictably to AV behaviors. In this position paper, we argue that city authorities must collaborate with the ML community to monitor and critically evaluate the routing algorithms proposed by car companies toward fair and system-efficient routing algorithms and regulatory standards.

自动驾驶车辆(AV)可能使用多智能体强化学习(MARL)同时进行路线优化,这可能破坏交通网络的稳定,人类驾驶员的出行时间可能变长。我们通过模拟人类驾驶员和AV来研究这种相互作用。我们对标准MARL算法的实验表明,无论在简化网络还是复杂网络中,策略往往无法收敛到最优解,或者需要很长的训练时间。由于缺乏准确的人类路径选择行为模型,我们无法完全依赖模拟训练,这使问题更加严重。与此同时,在城市中进行真实世界训练有可能破坏城市交通系统的稳定,增加外部性(如CO$_2$排放),并引入非平稳性,因为人类驾驶员会以不可预测的方式适应AV的行为。在这份立场文件中,我们主张城市当局必须与ML社区合作,监测并严格评估汽车公司提出的路径算法,以实现公平且系统高效的路径算法和监管标准。

Article 30

Title@2025-06-14 (6): Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Title: Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Auf dem Weg zu vernünftigen Papageien: Warum große Sprachmodelle mit uns argumentieren sollten

通向合理的鹦鹉:为什么大语言模型应该设计来与我们争论? 2505.05298v2

Authors (13): Elena Musi, Nadin Kokciyan, Khalid Al-Khatib, Davide Ceolin, Emmanuelle Dietz, Klara Gutekunst, Annette Hautli-Janisz, Cristian Manuel Santibañez Yañez, Jodi Schneider, Jonas Scholz, Cor Steging, Jacky Visser, Henning Wachsmuth

In this position paper, we advocate for the development of conversational technology that is inherently designed to support and facilitate argumentative processes. We argue that, at present, large language models (LLMs) are inadequate for this purpose, and we propose an ideal technology design aimed at enhancing argumentative skills. This involves re-framing LLMs as tools to exercise our critical thinking skills rather than replacing them. We introduce the concept of "reasonable parrots" that embody the fundamental principles of relevance, responsibility, and freedom, and that interact through argumentative dialogical moves. These principles and moves arise out of millennia of work in argumentation theory and should serve as the starting point for LLM-based technology that incorporates basic principles of argumentation.

在这份立场文件中,我们主张发展从设计上就用于支持和促进论证过程的对话技术。我们认为,目前的大语言模型(LLM)不足以达到这一目的,并提出一种旨在增强论证技能的理想技术设计。这涉及将LLM重新定位为锻炼我们批判性思维技能的工具,而不是取代这些技能。我们引入“合理鹦鹉”的概念,它体现相关性、责任和自由等基本原则,并通过论证性对话动作进行互动。这些原则和动作源于数千年的论证理论研究,应当成为融入基本论证原则的基于LLM的技术的起点。

Article 31

Title: IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

IndoorWorld: Integrieren physischer Aufgabenlösung und sozialer Simulation in einer heterogenen, multiagenten Umgebung

室内世界:将物理任务综合解决和社会模拟纳入一个多样化的多机构环境 2506.12331v1

Authors (4): Dekun Wu, Frederik Brudy, Bang Liu, Yi Wang

Virtual environments are essential to AI agent research. Existing environments for LLM agent research typically focus on either physical task solving or social simulation, with the former oversimplifying agent individuality and social dynamics, and the latter lacking physical grounding of social behaviors. We introduce IndoorWorld, a heterogeneous multi-agent environment that tightly integrates physical and social dynamics. By introducing novel challenges for LLM-driven agents in orchestrating social dynamics to influence physical environments and anchoring social interactions within world states, IndoorWorld opens up possibilities of LLM-based building occupant simulation for architectural design. We demonstrate the potential with a series of experiments within an office setting to examine the impact of multi-agent collaboration, resource competition, and spatial layout on agent behavior.

虚拟环境是AI代理商研究所必不可少的。LLM代理商研究的现有环境通常侧重于实际任务解决或社会模拟,前者过于简单化的代理商个性和社会动态,而后者缺乏社会行为的实际基础。我们引入了门世界,这是一个将物理和社会动态紧密结合的多种多媒介环境。通过引入LLM驱动的代理商在协调社会动态以影响世界各州的物理环境和巩固社会互动方面的新挑战,IndoorWorld为建筑设计开辟了以LLM为基础的建筑占住者模拟的可能性。我们展示了在办公室设置内进行一系列实验以审查多代理人合作、资源竞争和空间布局对代理商行为的影响的潜力。

Article 32

Title@2025-06-14 (6): Deep Fictitious Play-Based Potential Differential Games for Learning Human-Like Interaction at Unsignalized Intersections

Title: Deep Fictitious Play-Based Potential Differential Games for Learning Human-Like Interaction at Unsignalized Intersections

Deep Fictitious Play-Based Potential Differential Games für das Lernen von Mensch-ähnliche Interaktion an unsignalisierten Schnitten

为在无信号交界处学习人与人之间的相互作用而举行的深、有真知灼见的以游戏为基础的潜在差异运动会 2506.12283v1

Authors (3): Kehua Chen, Shucheng Zhang, Yinhai Wang

Modeling vehicle interactions at unsignalized intersections is a challenging task due to the complexity of the underlying game-theoretic processes. Although prior studies have attempted to capture interactive driving behaviors, most approaches relied solely on game-theoretic formulations and did not leverage naturalistic driving datasets. In this study, we learn human-like interactive driving policies at unsignalized intersections using Deep Fictitious Play. Specifically, we first model vehicle interactions as a Differential Game, which is then reformulated as a Potential Differential Game. The weights in the cost function are learned from the dataset and capture diverse driving styles. We also demonstrate that our framework provides a theoretical guarantee of convergence to a Nash equilibrium. To the best of our knowledge, this is the first study to train interactive driving policies using Deep Fictitious Play. We validate the effectiveness of our Deep Fictitious Play-Based Potential Differential Game (DFP-PDG) framework using the INTERACTION dataset. The results demonstrate that the proposed framework achieves satisfactory performance in learning human-like driving policies. The learned individual weights effectively capture variations in driver aggressiveness and preferences. Furthermore, the ablation study highlights the importance of each component within our model.

由于底层博弈论过程的复杂性,对无信号交叉口的车辆交互进行建模是一项具有挑战性的任务。尽管先前的研究曾尝试刻画交互式驾驶行为,但大多数方法仅依赖博弈论公式,没有利用自然驾驶数据集。在本研究中,我们使用深度虚拟博弈(Deep Fictitious Play)学习无信号交叉口处类人的交互式驾驶策略。具体而言,我们首先将车辆交互建模为微分博弈,然后将其重新表述为势微分博弈。代价函数中的权重从数据集中学习得到,能够刻画多样的驾驶风格。我们还证明了该框架在理论上保证收敛到纳什均衡。据我们所知,这是第一项使用深度虚拟博弈训练交互式驾驶策略的研究。我们使用INTERACTION数据集验证了基于深度虚拟博弈的势微分博弈(DFP-PDG)框架的有效性。结果表明,所提框架在学习类人驾驶策略方面取得了令人满意的性能。学习到的个体权重有效刻画了驾驶员激进程度和偏好的差异。此外,消融研究凸显了模型中每个组件的重要性。
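
To make the term "fictitious play" concrete, here is a minimal sketch of the classic tabular version on a two-player matrix game. It is not the paper's Deep Fictitious Play or its DFP-PDG framework: the function name, the payoff matrix, and the uniform pseudo-count prior are all illustrative assumptions. The example game is an identical-interest coordination game, which is a simple instance of a potential game.

```python
import numpy as np

def fictitious_play(payoff_a, payoff_b, steps=2000):
    """Classic fictitious play: each player best-responds to the empirical
    frequency of the other player's past actions.

    payoff_a[i, j] / payoff_b[i, j]: payoffs of players A / B when A plays i
    and B plays j. Returns the empirical action frequencies of both players.
    """
    n_a, n_b = payoff_a.shape
    counts_a = np.ones(n_a)  # pseudo-counts of A's past actions (uniform prior)
    counts_b = np.ones(n_b)  # pseudo-counts of B's past actions (uniform prior)
    for _ in range(steps):
        belief_about_a = counts_a / counts_a.sum()
        belief_about_b = counts_b / counts_b.sum()
        # Best responses against the current empirical beliefs.
        action_a = int(np.argmax(payoff_a @ belief_about_b))
        action_b = int(np.argmax(belief_about_a @ payoff_b))
        counts_a[action_a] += 1
        counts_b[action_b] += 1
    return counts_a / counts_a.sum(), counts_b / counts_b.sum()

# Hypothetical identical-interest game: both players receive the same payoff.
shared_payoff = np.array([[2.0, 0.0],
                          [0.0, 1.0]])
freq_a, freq_b = fictitious_play(shared_payoff, shared_payoff)
print(freq_a, freq_b)
```

In this toy game both players settle on the first action, so the empirical frequencies concentrate on a pure Nash equilibrium. The deep variant in the abstract replaces the tabular best response with learned policies, which this sketch does not attempt.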

Article 33

Title@2025-06-13 (5): Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study

Title: Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study

Untersuchung des Potenzials von Multi-Agent-Architekturen für die Grundlagen-Design-Automatisierung von Großsprachenmodellen: Eine Aufgabenklassifikation und Expertenauswahlstudie

调查基于大语言示范示范路由器多机构结构对基础设计自动化的潜力:任务分类和专家甄选研究 2506.13811v1

Authors (4): Sompote Youwai, David Phim, Vianne Gayl Murcia, Rianne Clair Onas

This study investigates router-based multi-agent systems for automating foundation design calculations through intelligent task classification and expert selection. Three approaches were evaluated: single-agent processing, multi-agent designer-checker architecture, and router-based expert selection. Performance assessment utilized baseline models including DeepSeek R1, ChatGPT 4 Turbo, Grok 3, and Gemini 2.5 Pro across shallow foundation and pile design scenarios. The router-based configuration achieved performance scores of 95.00% for shallow foundations and 90.63% for pile design, representing improvements of 8.75 and 3.13 percentage points over standalone Grok 3 performance respectively. The system outperformed conventional agentic workflows by 10.0 to 43.75 percentage points. Grok 3 demonstrated superior standalone performance without external computational tools, indicating advances in direct LLM mathematical reasoning for engineering applications. The dual-tier classification framework successfully distinguished foundation types, enabling appropriate analytical approaches. Results establish router-based multi-agent systems as optimal for foundation design automation while maintaining professional documentation standards. Given safety-critical requirements in civil engineering, continued human oversight remains essential, positioning these systems as advanced computational assistance tools rather than autonomous design replacements in professional practice.

该研究调查了通过智能任务分类和专家选择实现基础设计计算自动化的基于路由器的多试剂系统,评价了三种方法:单一试剂处理、多剂设计师结构以及基于路由器的专家选择;绩效评估使用了基线模型,包括DeepSeek R1、ChatGPT 4 Turbo、Grok 3和Gemini 2.5 Pro,横跨浅地基和堆积设计情景;基于路由器的配置在浅地基和堆积设计方面达到了95.00%的性能分数,在堆积设计方面达到了90.63%的性能分,分别比独立格罗克3的性能改进了8.75和3.13个百分点;系统以10.0至43.75百分点为优于常规制剂工作流程;Grok 3展示了在没有外部计算工具的情况下的优异独立性业绩,表明在工程应用的直接LLM数学推理方面取得了进展;双层分类框架成功地区分了基础类型,使适当的分析方法得以建立基于路由器的多试剂系统,作为基础设计自动化设计自动化的最佳基础设计自动化,同时保持专业文件标准;鉴于民用工程的安全临界要求,持续的人类监督仍然至关重要,将这些系统定位作为先进的计算辅助工具,这些系统定位为高级计算工具而不是专业做法中自主设计替代。
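
The reported gains can be turned back into the standalone Grok 3 scores they imply. The sketch below only redoes that subtraction, assuming the router scores and the improvements are on the same 0-100 scale and that the improvements are absolute percentage points, as the abstract states; the variable and function names are illustrative.

```python
# Router-based scores and the stated gains over standalone Grok 3 (from the abstract).
router_scores = {"shallow foundation": 95.00, "pile design": 90.63}
gain_over_grok3 = {"shallow foundation": 8.75, "pile design": 3.13}

def implied_baseline(router_score, gain_pp):
    """Baseline score implied by an absolute gain in percentage points."""
    return round(router_score - gain_pp, 2)

implied_grok3 = {
    task: implied_baseline(router_scores[task], gain_over_grok3[task])
    for task in router_scores
}
print(implied_grok3)
```

Under those assumptions the implied standalone Grok 3 scores are 86.25% for shallow foundations and 87.50% for pile design; these are derived values, not figures quoted in the abstract.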

Article 34

Title@2025-06-13 (5): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Title: A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Ein Benchmark für die Verallgemeinerung unterschiedlicher Teamstrategien im wettbewerbsfähigen Pokémon

普凯蒙竞争中全面推广不同团队战略的基准 2506.10326v2

Authors (5): Cameron Angliss, Jiaxun Cui, Jiaheng Hu, Arrasy Rahman, Peter Stone

Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations of approximately $10^{139}$ - far larger than those of Dota or Starcraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent’s team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines - from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor. We extensively evaluate all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles at scaling up as team size grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.

开发无需重新训练就能稳健适应截然不同的战略格局的AI智能体,是多智能体学习的核心挑战。Pokémon Video Game Championships(VGC)是一个可能的队伍配置空间极其庞大的领域,约为$10^{139}$,远大于Dota或Starcraft。Pokémon VGC中组队的高度离散、组合性质,使最优策略随所操控的队伍和对手的队伍而急剧变化,从而使泛化格外困难。为推进对这一问题的研究,我们提出VGC-Bench:一个提供关键基础设施、标准化评估协议,并提供人类对战数据集和一系列基线的基准,基线涵盖大语言模型智能体、行为克隆、强化学习,以及自我对弈、虚拟博弈和double oracle等经验博弈论方法。在智能体只在单一队伍配置上训练和评估的受限设定中,我们的方法能够战胜一名职业VGC选手。我们在逐渐增大的队伍集合上广泛评估了所有基线方法,发现即使是单队伍设定中表现最好的算法,也难以随队伍规模的增大而扩展。因此,跨多样队伍策略的策略泛化对学界而言仍是一个开放的挑战。我们的代码开源于 https://github.com/cameronangliss/VGC-Bench。

Article 35

Title@2025-06-13 (5): Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Title: Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Upgrade oder Switch: Brauchen wir eine neue Registry-Architektur für das Internet von KI-Agenten?

升级或切换:我们是否需要为AI代理商的互联网建立一个新的注册结构? 2506.12003v1

Authors (13): Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang

The emerging Internet of AI Agents challenges existing web infrastructure designed for human-scale, reactive interactions. Unlike traditional web resources, autonomous AI agents initiate actions, maintain persistent state, spawn sub-agents, and negotiate directly with peers: demanding millisecond-level discovery, instant credential revocation, and cryptographic behavioral proofs that exceed current DNS/PKI capabilities. This paper analyzes whether to upgrade existing infrastructure or implement purpose-built registry architectures for autonomous agents. We identify critical failure points: DNS propagation (24-48 hours vs. required milliseconds), certificate revocation unable to scale to trillions of entities, and IPv4/IPv6 addressing inadequate for agent-scale routing. We evaluate three approaches: (1) Upgrade paths, (2) Switch options, (3) Hybrid registries. Drawing parallels to dialup-to-broadband transitions, we find that agent requirements constitute qualitative, and not incremental, changes. While upgrades offer compatibility and faster deployment, clean-slate solutions provide better performance but require longer for adoption. Our analysis suggests hybrid approaches will emerge, with centralized registries for critical agents and federated meshes for specialized use cases.

AI代理商的新兴互联网挑战了为人类规模的被动互动而设计的现有网络基础设施。与传统的网络资源不同,自主的AI代理商发起行动,保持持久性状态,产卵子试剂,并与同行直接谈判:要求获得毫秒水平的发现,即时认证撤销,以及超出当前DNS/PKI能力的加密行为证明。本文分析是对现有基础设施进行升级,还是为自主代理商实施目的建造的注册结构。我们查明了关键的故障点:DNS传播(24至48小时相对于要求的毫秒 ) , 证书吊销无法达到数万亿实体, IPv4/IPv6 解决代理规模路由不足的问题。我们评估了三种方法:(1) 升级路径,(2) 切换选项,(3) 混合登记册。我们发现,在拨号到宽带过渡的同时,代理要求构成质的改变,而不是递增的。虽然升级提供了兼容性和更快的部署,但清洁的解决方案提供更好的业绩,但需要更长的采用。我们的分析表明,混合方法将会出现,关键代理商的中央登记册和专门使用案例的配制。

Article 36

Title: Computational Social Choice: Parameterized Complexity and Challenges

Computational Social Choice: Parameterisierte Komplexität und Herausforderungen

社会选择:参数复杂性和挑战 2410.14078v2

Authors (3): Jiehua Chen, Christian Hatschka, Sofia Simola

We survey two key problems, Multi-Winner Determination and Hedonic Games, in Computational Social Choice, with a special focus on their parameterized complexity, and propose some research challenges in the field.

我们综述计算社会选择中的两个关键问题,即多赢家确定(Multi-Winner Determination)和享乐博弈(Hedonic Games),特别关注其参数化复杂性,并提出该领域的一些研究挑战。

Article 37

Title@2025-06-13 (5): Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

Title: Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

Die Kombination von Deep Enforcement Learning und Search mit generativen Modellen für Game-Theoretic Opponent Modeling

将深强化学习和搜索与游戏理论对称模型生成模型相结合 2302.00797v2

Authors (10): Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents’ strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be plugged into a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO), to automate the generation of an offline opponent model via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up an opponent mixture, which we find identifies profiles that are near the Pareto frontier. Then GenBR keeps updating an online opponent model and reacts against it during gameplay. We conduct behavioral studies where human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare and Nash bargaining score negotiating with humans as humans trading among themselves.

对手建模方法通常包含两个关键步骤:建立关于对手策略的信念分布,以及通过采取最佳响应来利用这一对手模型。然而,现有方法通常需要特定领域的启发式方法来得到这样的模型,而近似最佳响应的算法在大型不完全信息领域中难以扩展。在这项工作中,我们提出一种可扩展且通用的多智能体训练机制,利用深度博弈论强化学习进行对手建模。我们首先提出Generative Best Response(GenBR),这是一种基于蒙特卡罗树搜索(MCTS)的最佳响应算法,配有一个在规划过程中对世界状态进行采样的已学习深度生成模型。这一新方法可扩展到大型不完全信息领域,并可即插即用于多种多智能体算法。我们在Policy Space Response Oracles(PSRO)框架下使用这一新方法,通过迭代的博弈论推理和基于种群的训练,自动生成离线对手模型。我们提出使用基于议价理论的解概念来构建对手混合策略,并发现它能识别出接近帕累托前沿的策略组合。随后,GenBR在对局过程中持续更新在线对手模型并对其作出反应。我们开展了行为研究,让人类参与者在Deal-or-No-Deal(一类双边议价博弈)中与我们的智能体谈判。结合生成式建模的搜索在训练和测试阶段都能找到更强的策略,支持在线贝叶斯共同参与者预测,并且所产生的智能体在与人类谈判时取得的社会福利和纳什议价得分,与人类彼此交易时相当。
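
The two steps named at the start of the abstract (a belief distribution over opponent strategies, then a best response to it) can be illustrated on a tiny matrix game. This is a hedged toy sketch, not GenBR: there is no tree search or generative model here, and the rock-paper-scissors payoffs, the two opponent "types", and all function names are assumptions made for illustration.

```python
import numpy as np

def best_response(payoff, opponent_strategies, belief):
    """Best response to a belief over opponent types.

    payoff[i, j]: our payoff for action i against opponent action j.
    opponent_strategies[k]: mixed strategy of hypothetical opponent type k.
    belief[k]: our probability that the opponent is of type k.
    """
    expected_opponent = belief @ opponent_strategies  # marginal over opponent actions
    action_values = payoff @ expected_opponent
    return int(np.argmax(action_values)), action_values

def bayes_update(belief, opponent_strategies, observed_action):
    """Posterior over opponent types after observing one opponent action."""
    posterior = belief * opponent_strategies[:, observed_action]
    return posterior / posterior.sum()

# Rock-paper-scissors from our point of view (rows: us, columns: opponent).
ROCK, PAPER, SCISSORS = 0, 1, 2
payoff = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])
# Two hypothetical opponent types: rock-heavy and scissors-heavy.
opponent_strategies = np.array([[0.8, 0.1, 0.1],
                                [0.1, 0.1, 0.8]])
belief = np.array([0.5, 0.5])

action_before, _ = best_response(payoff, opponent_strategies, belief)
belief_after = bayes_update(belief, opponent_strategies, observed_action=ROCK)
action_after, _ = best_response(payoff, opponent_strategies, belief_after)
print(action_before, belief_after, action_after)
```

With the uniform prior the best response is rock; after one observed rock the posterior shifts toward the rock-heavy type and the best response becomes paper. The online opponent model described in the abstract plays a similar role during gameplay, but at a scale where exact enumeration like this is not feasible.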

Article 38

Title@2025-06-13 (5): The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

Title: The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

Das automatisierte, aber riskante Spiel: Modellierung von Agent-zu-Agent-Verhandlungen und Transaktionen in Verbrauchermärkten

自动但有风险游戏:消费者市场代理对代理谈判和交易的模拟 2506.00073v3

Authors (6): Shenzhe Zhu, Jiao Sun, Yi Nian, Tobin South, Alex Pentland, Jiaxin Pei

AI agents are increasingly used in consumer-facing applications to assist with tasks such as product search, negotiation, and transaction execution. In this paper, we explore a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. We aim to answer two key questions: (1) Do different LLM agents vary in their ability to secure favorable deals for users? (2) What risks arise from fully automating deal-making with AI agents in consumer markets? To address these questions, we develop an experimental framework that evaluates the performance of various LLM agents in real-world negotiation and transaction settings. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game – different agents achieve significantly different outcomes for their users. Moreover, behavioral anomalies in LLMs can result in financial losses for both consumers and merchants, such as overspending or accepting unreasonable deals. These results underscore that while automation can improve efficiency, it also introduces substantial risks. Users should exercise caution when delegating business decisions to AI agents.

AI智能体越来越多地用于面向消费者的应用,以协助完成产品搜索、谈判和交易执行等任务。在本文中,我们探讨一种未来情景:消费者和商家都授权AI智能体将谈判和交易完全自动化。我们旨在回答两个关键问题:(1) 不同的LLM智能体在为用户争取有利交易的能力上是否存在差异?(2) 在消费市场中用AI智能体将交易完全自动化会带来哪些风险?为回答这些问题,我们开发了一个实验框架,评估各种LLM智能体在真实世界谈判和交易环境中的表现。我们的研究结果表明,AI中介的交易本质上是一场不平衡的博弈,不同的智能体为其用户取得的结果显著不同。此外,LLM的行为异常可能给消费者和商家都造成经济损失,例如超支或接受不合理的交易。这些结果强调,自动化虽然可以提高效率,但也带来重大风险。用户在将商业决策委托给AI智能体时应当谨慎。

Article 39

Title@2025-06-13 (5): Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Title: Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Agent Semantics, Semantische Raumzeit und Graphische Vernunft

语义学、语义空间时间和图形解释 2506.07756v2

Authors (1): Mark Burgess

Some formal aspects of the Semantic Spacetime graph model are presented, with reference to its use for directed knowledge representations and process modelling. A finite $\gamma(3,4)$ representation is defined to form a closed set of operations that can scale to any degree of semantic complexity. The Semantic Spacetime postulates bring predictability with minimal constraints to pathways in graphs. The ubiquitous appearance of absorbing states in any partial graph means that a graph process leaks information. The issue is closely associated with the issue of division by zero, which signals a loss of closure and the need for manual injection of remedial information. The Semantic Spacetime model (and its Promise Theory) origins help to clarify how such absorbing states are associated with boundary information where intentionality can enter.

本文介绍了Semantic Spacetime图模型的一些形式化方面,并涉及其在有向知识表示和过程建模中的应用。定义了一个有限的$\gamma(3,4)$表示,以形成一组封闭的运算,可扩展到任意程度的语义复杂性。Semantic Spacetime公设以最小的约束为图中的路径带来可预测性。吸收态在任何局部图中普遍出现,这意味着图过程会泄漏信息。这一问题与除以零的问题密切相关,后者标志着封闭性的丧失以及需要人工注入补救信息。Semantic Spacetime模型(及其Promise Theory)的起源有助于澄清这类吸收态如何与意向性可以进入的边界信息相关联。
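
One way to picture why absorbing states appear in "any partial graph": when a directed graph is cut down to a subgraph, some nodes keep incoming links but lose all outgoing ones. The sketch below uses the ordinary graph-theoretic notion of an absorbing (sink) node, which is an assumption about the abstract's intended meaning; the adjacency-dict format and function name are illustrative.

```python
def absorbing_nodes(graph):
    """Return nodes with no outgoing edges in a directed graph.

    graph: dict mapping a node to the list of nodes it links to. Nodes that
    only appear as targets are treated as having no outgoing edges.
    """
    nodes = set(graph) | {target for targets in graph.values() for target in targets}
    return sorted(node for node in nodes if not graph.get(node))

# A partial view of a larger process: "d" is referenced, but its own
# outgoing links fall outside the part of the graph we can see.
partial_graph = {"a": ["b", "c"], "b": ["c"], "c": ["d"]}
print(absorbing_nodes(partial_graph))
```

Any walk over this partial graph eventually ends in `d` and cannot continue, which is the sense in which a process on the truncated graph loses information at its boundary.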

Article 40

Title@2025-06-13 (5): Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines

Title: Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines

Bel Esprit: Multi-Agent Framework für den Bau von KI-Modellpipelines

Bel Esprit: 建立AI 示范管道的多机构机构框架 2412.14684v2

Authors (5): Yunsu Kim, AhmedElmogtaba Abdelaziz, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Hassan Sawaf

As the demand for artificial intelligence (AI) grows to address complex real-world tasks, single models are often insufficient, requiring the integration of multiple models into pipelines. This paper introduces Bel Esprit, a conversational agent designed to construct AI model pipelines based on user-defined requirements. Bel Esprit employs a multi-agent framework where subagents collaborate to clarify requirements, build, validate, and populate pipelines with appropriate models. We demonstrate the effectiveness of this framework in generating pipelines from ambiguous user queries, using both human-curated and synthetic data. A detailed error analysis highlights ongoing challenges in pipeline construction. Bel Esprit is available for a free trial at https://belesprit.aixplain.com.

由于对人工智能(AI)的需求日益增长,以解决复杂的现实世界任务,单一模型往往不够充分,需要将多种模型纳入管道,本文介绍Bel Esprit,这是一个旨在根据用户界定的要求建造AI型模板管道的谈话代理物,Bel Esprit使用一个多试剂框架,分剂在其中合作澄清要求、建造、验证和以适当模型填充管道。我们展示了这一框架在利用人造和合成数据从模糊用户查询中生成管道方面的有效性。详细错误分析突出了管道建设中目前存在的挑战。Bel Esprit可以在https://belesprit.aixplain.com上免费试用。

Article 41

Title@2025-06-13 (5): PE-MA: Parameter-Efficient Co-Evolution of Multi-Agent Systems

Title: PE-MA: Parameter-Efficient Co-Evolution of Multi-Agent Systems

PE-MA: Parametereffiziente Ko-Evolution von Multi-Agent-Systemen

PE-MA: 多机构系统参数有效共同演变 2506.11803v1

Authors (6): Yingfan Deng, Anhao Zhou, Yuan Yuan, Xian Zhang, Yifei Zou, Dongxiao Yu

Multi-Agent Systems have recently emerged as a promising paradigm for collaborative reasoning and solving complex tasks. However, the design of collaborative learning algorithms in multi-agent systems faces several challenges, including high communication overhead and insufficient agent-level personalization. In this paper, we propose PE-MA (Parameter-Efficient Multi-Agent Co-Evolution), a novel collaboration framework that supports efficient, scalable, and personalized co-evolution in multi-agent systems. In PE-MA, each agent maintains a lightweight personalized adapter to support agent-specific behavior, while a shared adapter is collaboratively optimized across neighboring agents. This design balances global coordination with local adaptation under heterogeneous environments. We achieve an asymptotically optimal convergence rate of $O(1/\sqrt{NK})$, where $N$ is the number of agents and $K$ is the number of local update steps.

多个机构系统最近成为合作推理和解决复杂任务的有希望的范例,然而,多试剂系统中合作学习算法的设计面临若干挑战,包括高通信间接费用和代理级别个人化不足。在本文件中,我们提议PE-MA(Parameter-Efficient Multi-Agent Co-Evolution,参数高效多智能体协同进化),这是一个支持多试剂系统中高效、可缩放和个性化共同进化的新协作框架。在PE-MA中,每个代理商都有一个轻量级个人化的适配器,以支持特定代理商的行为,而一个共享的适配器则在相邻的代理商之间相互优化。这一设计平衡了全球协调与不同环境中的当地适应性。我们实现了 $O(1/\sqrt{NK})$ 的渐近最优收敛率,其中N是代理商的数量,K是本地更新步数。
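
The adapter split described in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each agent's effective parameters are the sum of a shared adapter and a personalized adapter, that agents take a few local gradient steps on a toy quadratic loss, and that the shared adapter is then averaged over each agent's neighbors. The names `pe_ma_round` and `local_loss_grad`, the ring topology, and the step size are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_loss_grad(theta, target):
    # Gradient of the toy loss 0.5 * ||theta - target||^2 (an assumption).
    return theta - target

def pe_ma_round(shared, personal, targets, neighbors, lr=0.1, local_steps=5):
    """One illustrative round: K local steps per agent, then neighbor averaging."""
    shared = [s.copy() for s in shared]
    personal = [p.copy() for p in personal]
    n = len(shared)
    for i in range(n):
        for _ in range(local_steps):
            # Effective parameters = shared adapter + personalized adapter.
            grad = local_loss_grad(shared[i] + personal[i], targets[i])
            shared[i] -= lr * grad
            personal[i] -= lr * grad
    # Only the shared adapter is exchanged: average over self and neighbors.
    mixed = [
        np.mean([shared[j] for j in neighbors[i] + [i]], axis=0)
        for i in range(n)
    ]
    return mixed, personal

# Usage: four agents on a ring, each with its own target.
rng = np.random.default_rng(0)
n_agents = 4
targets = [rng.normal(size=2) for _ in range(n_agents)]
shared = [np.zeros(2) for _ in range(n_agents)]
personal = [np.zeros(2) for _ in range(n_agents)]
ring = [[(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)]
for _ in range(200):
    shared, personal = pe_ma_round(shared, personal, targets, ring)
```

In this toy setting the shared adapters drift toward a common value while each personalized adapter absorbs the agent-specific residual, which is one simple way to read the "global coordination with local adaptation" trade-off.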

Article 42

Title@2025-06-13 (5): Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning

Title: Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning

Ist Ihr LLM-basierter Multiagent ein zuverlässiger Real-World Planer? Erforschen Sie Betrugserkennung in der Reiseplanung

你以LLM为基地的多方机构是可靠的真实世界规划者吗? 探索旅行规划中的欺诈侦查 2505.16557v2

Authors (7): Junchi Yao, Jianhua Xu, Tianyu Xin, Ziyi Wang, Shenzhe Zhu, Shu Yang, Di Wang

The rise of Large Language Model-based Multi-Agent Planning has leveraged advanced frameworks to enable autonomous and collaborative task execution. Some systems rely on platforms like review sites and social media, which are prone to fraudulent information, such as fake reviews or misleading descriptions. This reliance poses risks, potentially causing financial losses and harming user experiences. To evaluate the risk of planning systems in real-world applications, we introduce WandaPlan, an evaluation environment mirroring real-world data and injected with deceptive content. We assess system performance across three fraud cases: Misinformation Fraud, Team-Coordinated Multi-Person Fraud, and Level-Escalating Multi-Round Fraud. We reveal significant weaknesses in existing frameworks that prioritize task efficiency over data authenticity. At the same time, we validate WandaPlan’s generalizability, showing that it can assess the risks of real-world open-source planning frameworks. To mitigate the risk of fraud, we propose integrating an anti-fraud agent, providing a solution for reliable planning.

以大语言模式为基础的多机构规划的兴起利用了先进的框架,使任务得以自主和协作执行。有些系统依靠审查站和社交媒体等平台,这些平台容易出现欺诈信息,例如虚假审查或误导性描述。这种依赖带来了风险,可能造成财政损失和伤害用户经验。为了评价现实应用中规划系统的风险,我们引入了反映现实世界数据并注入欺骗内容的评价环境WandaPlan。我们评估了三个欺诈案件:错误信息欺诈、团队协调多人欺诈和等级扩大多功能欺诈的系统性能。我们发现现有框架存在重大缺陷,将任务效率置于数据真实性之上。与此同时,我们验证了WandaPlan的通用性,能够评估现实世界开放源规划框架的风险。为减轻欺诈风险,我们建议整合一个反欺诈代理人,为可靠的规划提供解决方案。

Article 43

Title@2025-06-13 (5): AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction

Title: AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction

AutoGen Driven Multi Agent Framework für iterative Kriminalität Datenanalyse und Vorhersage

循环犯罪数据分析和预测自动驱动器多剂框架 2506.11475v1

Authors (4): Syeda Kisaa Fatima, Tehreem Zubair, Noman Ahmed, Asifullah Khan

This paper introduces LUCID-MA (Learning and Understanding Crime through Dialogue of Multiple Agents), an innovative AI-powered framework in which multiple AI agents collaboratively analyze and understand crime data. Our system consists of three core components: an analysis assistant that highlights spatiotemporal crime patterns, a feedback component that reviews and refines analytical results, and a prediction component that forecasts future crime trends. With a well-designed prompt and the LLaMA-2-13B-Chat-GPTQ model, it runs completely offline and allows the agents to undergo self-improvement through 100 rounds of communication with less human interaction. A scoring function is incorporated to evaluate each agent’s performance, providing visual plots to track learning progress. This work demonstrates the potential of AutoGen-style agents for autonomous, scalable, and iterative analysis in social science domains, maintaining data privacy through offline execution.

本文介绍LUCID-MA(通过多种代理人的对话来学习和理解犯罪),这是一个创新的AI驱动框架,其中多个AI代理合作分析和理解犯罪数据。我们的系统由三个核心部分组成:一个分析助理,突出时空犯罪模式;一个反馈部分,审查和完善分析结果;一个预测部分,预测未来犯罪趋势。借助精心设计的提示词和LLaMA-2-13B-Chat-GPTQ模型,系统完全脱机运行,并使代理通过100轮交流进行自我改进,减少人际互动;系统还包含一个评分功能,用于评价代理的表现,并提供跟踪学习进展的可视化图表。这项工作显示了AutoGen式代理在社会科学领域通过离线执行保持数据隐私的自主、可扩展和迭代分析的潜力。

Article 44

Title@2025-06-13 (5): DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems

Title: DURA-CPS: A Multi-Role Orchestrator for Dependability Assurance in LLM-Enabled Cyber-Physical Systems

DURA-CPS: Ein Multi-Rolle-Orchester für Zuverlässigkeitssicherung in LLM-fähigen Cyber-Physischen Systemen

DURA-CPS:LLM-Enable网络-物理系统依赖性保证多功能Orster 2506.06381v2

Authors (10): Trisanth Srinivasan, Santosh Patapati, Himani Musku, Idhant Gode, Aditya Arora, Samvit Bhattacharya, Abubakr Nazriev, Sanika Hirave, Zaryab Kanjiani, Srinjoy Ghose

Cyber-Physical Systems (CPS) increasingly depend on advanced AI techniques to operate in critical applications. However, traditional verification and validation methods often struggle to handle the unpredictable and dynamic nature of AI components. In this paper, we introduce DURA-CPS, a novel framework that employs multi-role orchestration to automate the iterative assurance process for AI-powered CPS. By assigning specialized roles (e.g., safety monitoring, security assessment, fault injection, and recovery planning) to dedicated agents within a simulated environment, DURA-CPS continuously evaluates and refines AI behavior against a range of dependability requirements. We demonstrate the framework through a case study involving an autonomous vehicle navigating an intersection with an AI-based planner. Our results show that DURA-CPS effectively detects vulnerabilities, manages performance impacts, and supports adaptive recovery strategies, thereby offering a structured and extensible solution for rigorous V&V in safety- and security-critical systems.

网络物理系统(CPS)在关键应用中的运行日益依赖先进的人工智能技术。然而,传统核查和验证方法往往难以处理AI组件的不可预测和动态性质。我们在本文件中介绍了DURA-CPS,这是一个采用多角色编排的新框架,使AI驱动的CPS的迭代保证程序自动化。通过在模拟环境中为专门代理分配专门角色(例如安全监测、安保评估、故障注入和恢复规划),DURA-CPS不断根据一系列可靠性要求评估和完善AI行为。我们通过一项案例研究展示了该框架,其中一辆配有AI规划器的自动驾驶车辆通过交叉路口。我们的结果显示,DURA-CPS有效地检测了脆弱性,管理绩效影响,并支持适应性恢复战略,从而为安全和安保关键系统中严格的V&V提供了结构化和可扩展的解决办法。

Article 45

Title@2025-06-13 (5): Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games

Title: Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games

Politikoptimierung und Multi-Agenten-Verstärkung Lernen für Mittelvarianz Team Stochastic Games

平均差小组游戏游戏政策优化和多剂强化学习 2503.22779v2

Authors (2): Junkai Hu, Li Xia

We study a long-run mean-variance team stochastic game (MV-TSG), where each agent shares a common mean-variance objective for the system and takes actions independently to maximize it. MV-TSG has two main challenges. First, the variance metric is neither additive nor Markovian in a dynamic setting. Second, simultaneous policy updates of all agents lead to a non-stationary environment for each individual agent. Both challenges make dynamic programming inapplicable. In this paper, we study MV-TSGs from the perspective of sensitivity-based optimization. The performance difference and performance derivative formulas for joint policies are derived, which provide optimization information for MV-TSGs. We prove the existence of a deterministic Nash policy for this problem. Subsequently, we propose a Mean-Variance Multi-Agent Policy Iteration (MV-MAPI) algorithm with a sequential update scheme, where individual agent policies are updated one by one in a given order. We prove that the MV-MAPI algorithm converges to a first-order stationary point of the objective function. By analyzing the local geometry of stationary points, we derive specific conditions for stationary points to be (local) Nash equilibria, and further, strict local optima. To solve large-scale MV-TSGs in scenarios with unknown environmental parameters, we extend the idea of trust region methods to MV-MAPI and develop a multi-agent reinforcement learning algorithm named Mean-Variance Multi-Agent Trust Region Policy Optimization (MV-MATRPO). We derive a performance lower bound for each update of joint policies. Finally, numerical experiments on energy management in multiple microgrid systems are conducted.

我们研究的是长期的中位差分团队微查游戏(MV-TSG),在这个游戏中,每个代理商都有一个共同的中位差分目标,并且独立地采取行动来尽量扩大这个系统。MV-TSG有两个主要挑战。首先,差异度指标在动态环境下既不是添加性的,也不是Markovian。第二,所有代理商同时更新政策导致每个代理商的非静止环境。这两项挑战都使得动态程序无法适用。在本文中,我们从基于敏感性的优化角度研究MV-TSG。联合政策的性能差异和性能衍生公式公式,为MV-TSG提供优化的信息。我们证明存在一种确定性纳什政策,随后,我们建议采用一个超值多位政策(MV-MAPI)的算法,使单个代理商政策根据给定的顺序逐次更新。我们证明,MV-MADI的算法与目标功能的第一阶级定点一致。通过分析本地的SDRI,我们用固定的测算法,我们用每个固定的测算方法来进行具体的能源级的测算。
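
The sequential update scheme mentioned in this abstract can be sketched on a deliberately simplified problem. The example below assumes a one-shot team game (not the long-run stochastic game studied in the paper) in which every joint action has a known reward mean and variance, and the common objective is mean minus `beta` times variance. The function names `mv_objective` and `sequential_update` and the toy numbers are hypothetical.

```python
import numpy as np

def mv_objective(mean, var, beta):
    """Common team objective: expected reward minus beta times reward variance."""
    return np.asarray(mean, dtype=float) - beta * np.asarray(var, dtype=float)

def sequential_update(mean, var, beta, start):
    """Update one agent at a time, in a fixed order, until no agent can improve."""
    J = mv_objective(mean, var, beta)  # indexed by the joint action
    joint = list(start)
    improved = True
    while improved:
        improved = False
        for agent in range(J.ndim):
            def value(action, agent=agent):
                trial = joint.copy()
                trial[agent] = action
                return J[tuple(trial)]
            best = max(range(J.shape[agent]), key=value)
            # Accept only strict improvements so the loop terminates.
            if value(best) > J[tuple(joint)]:
                joint[agent] = best
                improved = True
    return tuple(joint), float(J[tuple(joint)])

# Two agents, two actions each: (1, 1) has the highest mean but a large variance.
mean = [[1.0, 0.0], [0.0, 2.0]]
var = [[0.0, 0.0], [0.0, 3.0]]
risk_averse = sequential_update(mean, var, beta=1.0, start=(1, 1))
risk_neutral = sequential_update(mean, var, beta=0.0, start=(1, 1))
```

Because all agents share one objective, each accepted update raises it, so the loop stops at a joint action from which no single agent can improve. In the toy numbers the risk penalty moves the team away from the high-mean, high-variance joint action.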

Article 46

Title@2025-06-13 (5): Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Title: Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Robustes kooperatives Mehr-Agenten-Verstärkung-Lernen: Eine Mittelfeld-Spiel-Perspektive

强有力的合作多机构强化多机构强化学习:中、实地游戏的视角 2406.13992v2

Authors (4): Muhammad Aneeq uz Zaman, Mathieu Laurière, Alec Koppel, Tamer Başar

In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of stochastic and non-stochastic uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainties, we formulate the problem in a worst-case (minimax) framework, which is intractable in general. Thus, we focus on the Linear Quadratic setting to derive benchmark solutions. First, since no standard theory exists for this problem due to the distributed information structure, we utilize the Mean-Field Type Game (MFTG) paradigm to establish guarantees on the solution quality in the sense of achieved Nash equilibrium of the MFTG. This in turn allows us to compare the performance against the corresponding original robust multi-agent control problem. Then, we propose a Receding-horizon Gradient Descent Ascent RL algorithm to find the MFTG Nash equilibrium and we prove a non-asymptotic rate of convergence. Finally, we provide numerical experiments to demonstrate the efficacy of our approach relative to a baseline algorithm.

在本文中,我们研究稳健的合作式多智能体强化学习(RL)问题:大量拥有分布式信息的合作智能体需要在随机(stochastic)和非随机(non-stochastic)不确定性并存的情况下学习策略,这两类不确定性的分布分别为已知和未知。我们关注同时考虑这两类不确定性的策略优化,并在最坏情况(minimax)框架下对问题进行建模,而该问题一般难以求解。因此,我们侧重于线性二次(Linear Quadratic)设定,以得出基准解。首先,由于分布式信息结构,该问题尚无标准理论,我们利用平均场型博弈(MFTG)范式,在达到MFTG纳什均衡的意义上为解的质量建立保证。这反过来又使我们能够将其性能与相应的原始稳健多智能体控制问题进行比较。然后,我们提出一种滚动时域梯度下降上升(Receding-horizon Gradient Descent Ascent)RL算法来求MFTG纳什均衡,并证明其非渐近收敛速率。最后,我们通过数值实验展示了我们的方法相对于基线算法的有效性。
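
The minimax idea behind gradient descent ascent can be illustrated on a scalar quadratic problem. The sketch below is plain simultaneous gradient descent ascent, not the receding-horizon algorithm of the paper, and the cost $f(u, w) = (ax + u + w)^2 + r u^2 - \gamma w^2$ with its parameter values is an assumed toy example in which a controller $u$ minimizes and a disturbance $w$ maximizes.

```python
def gda_scalar_lq(x=1.0, a=1.0, r=1.0, gamma=2.0, lr=0.05, steps=2000):
    """Simultaneous gradient descent (on u) and ascent (on w) for a toy cost.

    Cost: f(u, w) = (a*x + u + w)**2 + r*u**2 - gamma*w**2,
    convex in u and, for gamma > 1, concave in w.
    """
    u, w = 0.0, 0.0
    for _ in range(steps):
        s = a * x + u + w
        grad_u = 2.0 * s + 2.0 * r * u      # df/du
        grad_w = 2.0 * s - 2.0 * gamma * w  # df/dw
        u, w = u - lr * grad_u, w + lr * grad_w
    return u, w

u_star, w_star = gda_scalar_lq()
```

Setting both gradients to zero by hand gives the saddle point $u = -2/3$, $w = 1/3$ for these parameter values, which the iteration approaches.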

Article 47

Title@2025-06-12 (4): Control Industrial Automation System with Large Language Model Agents

Title: Control Industrial Automation System with Large Language Model Agents

Steuerung des industriellen Automatisierungssystems mit großen Sprachmodellen

配有大语言示范物剂的控制工业自动化系统 2409.18009v2

Authors (5): Yuchen Xia, Nasser Jazdi, Jize Zhang, Chaitanya Shah, Michael Weyrich

Traditional industrial automation systems require specialized expertise to operate and complex reprogramming to adapt to new processes. Large language models offer the intelligence to make them more flexible and easier to use. However, LLMs’ application in industrial settings is underexplored. This paper introduces a framework for integrating LLMs to achieve end-to-end control of industrial automation systems. At the core of the framework are an agent system designed for industrial tasks, a structured prompting method, and an event-driven information modeling mechanism that provides real-time data for LLM inference. The framework supplies LLMs with real-time events on different context semantic levels, allowing them to interpret the information, generate production plans, and control operations on the automation system. It also supports structured dataset creation for fine-tuning on this downstream application of LLMs. Our contribution includes a formal system design, proof-of-concept implementation, and a method for generating task-specific datasets for LLM fine-tuning and testing. This approach enables a more adaptive automation system that can respond to spontaneous events, while allowing easier operation and configuration through natural language for more intuitive human-machine interaction. We provide demo videos and detailed data on GitHub: https://github.com/YuchenXia/LLM4IAS.

传统工业自动化系统的操作需要专门知识,适应新流程则需要复杂的重新编程。大型语言模型提供的智能可以使这些系统更加灵活、更易使用。然而,LLM在工业环境中的应用尚未得到充分探讨。本文介绍了一个集成LLM以实现工业自动化系统端到端控制的框架。框架的核心是为工业任务设计的代理系统、结构化的提示方法,以及一个为LLM推理提供实时数据的事件驱动信息建模机制。框架向LLM提供不同上下文语义层级的实时事件,使其能够解释信息、生成生产计划并控制自动化系统上的操作。它还支持创建结构化数据集,用于针对这一下游应用对LLM进行微调。我们的贡献包括形式化的系统设计、概念验证实现,以及为LLM微调和测试生成任务特定数据集的方法。这一方法使自动化系统更具适应性,能够响应突发事件,同时通过自然语言使操作和配置更加容易,实现更直观的人机交互。我们在GitHub上提供演示视频和详细数据:https://github.com/YuchenXia/LLM4IAS。

Article 48

Title@2025-06-12 (4): Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork

Title: Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork

Shapley Machine: Ein Game-Theoretisches Framework für N-Agent Ad Hoc Teamwork

N-代理特设团队工作游戏理论框架 2506.11285v1

Authors (4): Jianhong Wang, Yang Li, Samuel Kaski, Jonathan Lawry

Open multi-agent systems are increasingly important in modeling real-world applications, such as smart grids and swarm robotics. In this paper, we aim to investigate a recently proposed problem for open multi-agent systems, referred to as n-agent ad hoc teamwork (NAHT), where only a number of agents are controlled. Existing methods tend to be based on heuristic design and consequently lack theoretical rigor and suffer from ambiguous credit assignment among agents. To address these limitations, we model and solve NAHT through the lens of cooperative game theory. More specifically, we first model an open multi-agent system, characterized by its value, as an instance situated in a space of cooperative games, generated by a set of basis games. We then extend this space, along with the state space, to accommodate dynamic scenarios, thereby characterizing NAHT. Exploiting the justifiable assumption that basis game values correspond to a sequence of n-step returns with different horizons, we represent the state values for NAHT in a form similar to $\lambda$-returns. Furthermore, we derive Shapley values to allocate state values to the controlled agents, as credits for their contributions to the ad hoc team. Different from the conventional approach to shaping Shapley values in an explicit form, we shape Shapley values by fulfilling the three axioms uniquely describing them, well defined on the extended game space describing NAHT. To estimate Shapley values in dynamic scenarios, we propose a TD($\lambda$)-like algorithm. The resulting reinforcement learning (RL) algorithm is referred to as Shapley Machine. To the best of our knowledge, this is the first time that the concepts from cooperative game theory are directly related to RL concepts. In experiments, we demonstrate the effectiveness of Shapley Machine and verify the reasonableness of our theory.

开放多智能体系统在智能电网、集群机器人等现实世界应用的建模中越来越重要。在本文中,我们研究最近提出的一个开放多智能体系统问题,称为n智能体临时团队协作(NAHT),其中只有一部分智能体受到控制。现有方法往往基于启发式设计,因此缺乏理论严谨性,并且智能体之间的信用分配模糊不清。为了解决这些局限性,我们从合作博弈论的视角对NAHT进行建模和求解。更具体地说,我们首先将一个以其价值为特征的开放多智能体系统建模为合作博弈空间中的一个实例,该空间由一组基博弈生成。然后,我们将这一空间连同状态空间一起扩展,以适应动态情景,从而刻画NAHT。利用基博弈的值对应于一系列具有不同时域的n步回报这一合理假设,我们以类似于 $\lambda$-returns 的形式表示NAHT的状态值。此外,我们推导出Shapley值,将状态值分配给受控智能体,作为其对临时团队贡献的信用。与以显式形式构造Shapley值的传统方法不同,我们通过满足唯一刻画Shapley值的三条公理来构造它,这些公理在描述NAHT的扩展博弈空间上有良好定义。为了在动态情景中估计Shapley值,我们提出了一种类似TD($\lambda$)的算法,由此得到的强化学习(RL)算法称为Shapley Machine。据我们所知,这是合作博弈论的概念首次与RL概念直接关联。在实验中,我们展示了Shapley Machine的有效性,并验证了我们理论的合理性。
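
For readers unfamiliar with Shapley values, the classical explicit form that this abstract contrasts itself with can be computed directly for a small cooperative game. The sketch below is the textbook permutation-based definition, not the TD($\lambda$)-like estimator of the paper; the function name `shapley_values` and the three-player toy game are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to the worth of that coalition.
    The cost grows factorially, so this is only practical for small games.
    """
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            phi[p] += value(joined) - value(coalition)
            coalition = joined
    n_orderings = factorial(len(players))
    return {p: total / n_orderings for p, total in phi.items()}

# Toy game: the team earns 1 only if both "a" and "b" are present; "c" adds nothing.
def toy_value(coalition):
    return 1.0 if {"a", "b"} <= coalition else 0.0

credits = shapley_values(["a", "b", "c"], toy_value)
```

In the toy game the credits sum to the value of the full team, the two interchangeable players receive equal credit, and the player who never changes any coalition's value receives zero.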

Article 49

Title@2025-06-12 (4): Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Title: Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

Ausbau des kooperativen Multi-Agenten-Verstärkungs-Lernens mit staatlicher Modellierung und kontersarieller Exploration

通过状态建模和对抗性探索加强合作性多智能体强化学习 2505.05262v2

Authors (6): Andreas Kontogiannis, Konstantinos Papathanasiou, Yi Shen, Giorgos Stamou, Michael M. Zavlanos, George Vouros

Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents’ exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy’s discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policy that encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.

在分布式、部分可观测的环境中,没有通信能力,学习如何合作,对多试剂深度强化学习(MARL)构成重大挑战。本文件述及该领域的主要关切,侧重于从个别代理人的观察中推断国家代表,并利用这些代表加强代理人的探索和合作任务执行政策。为此,我们提议为合作性MARL提出一个新的国家建模框架,使代理人推断出非观察性国家有意义的信仰表现,以优化自己的政策,同时过滤多余和较少信息的联合国家信息。我们在此框架的基础上提议MARL SMPE算法。在SMPE中,代理人通过将自己的信念纳入政策网络,并隐含地采取对抗性勘探政策,鼓励代理人发现新的高价值国家,同时提高他人的歧视性能力。我们实验性地表明,SMPE在MPE、LBF和RWARE基准的复杂全面合作任务中,超越了最先进的ML算法。

Article 50

Title@2025-06-12 (4): Noncooperative Equilibrium Selection via a Trading-based Auction

Title: Noncooperative Equilibrium Selection via a Trading-based Auction

Nichtkooperative Equilibrium-Auswahl über eine Trading-basierte Auktion

通过基于交易的拍卖选择平衡不合作 2502.03616v2

Authors (5): Jaehan Im, Filippos Fotiadis, Daniel Delahaye, Ufuk Topcu, David Fridovich-Keil

Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, agents acting in their own self-interest can settle on different equilibria, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach consensus without communicating directly or disclosing private valuations. TACo facilitates coordination through a structured trading-based auction, where agents iteratively select choices of interest and provably reach an agreement within an a priori bounded number of steps. A series of numerical experiments validate that the termination guarantees of TACo hold in practice, and show that TACo achieves a median performance that minimizes the total cost across all agents, while allocating resources significantly more fairly than baseline approaches.

不合作的多试剂系统往往由于代理人之间的偏好冲突而面临协调挑战,特别是出于自身利益行事的代理人可以解决不同的平衡问题,导致不理想的结果,甚至安全问题。我们提议了一个名为交易拍卖以达成共识的算法(TACO ) , 这是一种分散的办法,使不合作的代理人能够在不直接沟通或不披露私人估值的情况下达成共识。 TACo通过结构化的基于贸易的拍卖促进协调,在该拍卖中,代理人迭接地选择利益选择,并在事先限定的若干步骤内达成协定。一系列数字实验证实TACo的终止保证在实践中是有效的,并表明TACo取得了一种中位性业绩,将所有代理人的总成本降到最低,同时分配的资源比基线方法要公平得多。

Article 51

Title@2025-06-12 (4): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Title: AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

AutoMind: Adaptives Knowledgeable Agent für automatisierte Datenwissenschaft

自动Mind:自动数据科学适应性知识代理 2506.10974v1

Authors (9): Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.

大型语言模型(LLM)代理商在解决现实世界数据科学问题方面表现出了巨大的潜力。LLM驱动的数据科学代理商承诺使整个机器学习管道自动化,然而其真实世界的有效性仍然有限。现有框架依赖于僵硬、预先定义的工作流程和不灵活的编码战略;因此,它们仅擅长于相对简单、古老的问题,未能捕捉人类从业者带来复杂、创新任务的经验专长。在这项工作中,我们引入了AutoMind(AutoMind)(一个适应性、知识丰富的LLM(LM)代理商)框架,通过三项关键进步克服了这些缺陷:(1) 一种成熟的专家知识基础,使该代理商具有领域专家知识;(2) 一种具有代理知识的树搜索算法,从战略上探索可能的解决方案;(3) 一种自我调整的编码战略,根据任务的复杂性动态地定制生成代码。对两个自动化数据科学基准的评估表明,AutoMind(AutoMind)能够提供优异的绩效、效率和质量解决方案质量,强调AutMind(Autmind)是迈向完全自动化数据科学的高效和稳健健捷的一步。

Article 52

Title@2025-06-12 (4): Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Title: Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Unkoppelte Lerndynamik und Nash-Equilibrium für höhere Ordnung

高等职称无交错学习动态和纳什平衡 2506.10874v1

Authors (2): Sarah A. Toonsi, Jeff S. Shamma

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.

我们使用高阶复制子动态以及若干类高阶非耦合异质动态,研究一般有限博弈中混合策略纳什均衡(NE)的可学习性。在高阶非耦合学习动态中,参与者无法获得对手的效用(非耦合),但可以使用辅助状态进一步处理信息(高阶)。我们在非耦合学习与分散控制下的反馈镇定之间建立了联系。利用这一联系,我们证明:对于任何具有孤立的完全混合策略NE的有限博弈,都存在(局部)收敛到该NE的高阶非耦合学习动态。我们进一步将学习与控制理论中的同时镇定概念联系起来,从而证明学习动态缺乏普适性。我们构造了两个博弈,使得任何能够学到其中一个博弈的完全混合策略NE的高阶动态,都无法学到另一个博弈的完全混合策略NE。接着,出于对可允许的学习动态施加自然限制的考虑,我们引入渐近最佳响应(ABR)性质。具有ABR性质的动态在渐近平稳的环境中渐近地学到最佳响应。我们证明ABR性质与高阶学习动态的一个内部稳定性条件有关。我们给出了NE与ABR性质相容的条件。最后,我们使用高阶复制子动态的bandit版本,研究bandit设定下混合策略NE的可学习性。
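
As background for this abstract, the standard first-order replicator dynamics $\dot{x}_i = x_i\,((Ax)_i - x^\top A x)$ can be simulated in a few lines. The sketch below is this baseline only, not the higher-order dynamics of the paper; the single-population Rock-Paper-Scissors payoff matrix, the explicit Euler integrator, and the step size are assumptions chosen for illustration.

```python
import numpy as np

# Rock-Paper-Scissors payoffs for the row strategy (win = 1, loss = -1).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

def replicator_step(x, payoff, dt):
    """One explicit Euler step of the first-order replicator dynamics."""
    fitness = payoff @ x   # payoff of each pure strategy against the mix x
    average = x @ fitness  # population-average payoff
    return x + dt * x * (fitness - average)

def simulate(x0, payoff=RPS, dt=0.01, steps=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

at_equilibrium = simulate([1 / 3, 1 / 3, 1 / 3])
off_equilibrium = simulate([0.5, 0.3, 0.2])
```

The uniform mix is a rest point, but a trajectory started elsewhere does not settle on it: in continuous time these dynamics cycle around the mixed equilibrium of standard Rock-Paper-Scissors. That is the kind of failure that auxiliary states in higher-order dynamics are meant to address.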

Article 53

Title@2025-06-12 (4): AI Agent Behavioral Science

Title: AI Agent Behavioral Science

KI Agent Verhaltenswissenschaft

AI 行为科学代理 2506.06366v3

Authors (16): Lin Chen, Yunke Zhang, Jie Feng, Haoye Chai, Honglin Zhang, Bingbing Fan, Yibo Ma, Shiyuan Zhang, Nian Li, Tianhui Liu, Nicholas Sukiennik, Keyu Zhao, Yu Li, Ziyi Liu, Fengli Xu, Yong Li

Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the internal architectures of the underlying models, but emerge from their integration into agentic systems operating within specific contexts, where environmental factors, social cues, and interaction feedbacks shape behavior over time. This evolution necessitates a new scientific perspective: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this perspective emphasizes the systematic observation of behavior, design of interventions to test hypotheses, and theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual agent, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional model-centric approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.

大型语言模型(LLMs)的近期进展使得AI代理商的发展能够显示日益表现出人性化的行为,包括规划、适应和各种互动和开放的假设情景,这些行为不仅仅是基础模型内部结构的产物,而且产生于它们融入特定情况下的代理系统,环境因素、社会提示和互动反馈影响着长期的行为。这种演变需要一个新的科学视角:AI Agent Behavioral Science。这种视角不仅侧重于内部机制,而且强调系统观察行为,设计用来测试假设的干预措施,以及理论指导解释AI代理商的行为、适应和互动方式。我们系统化了在单个代理商、多代理商和人类代理互动环境中不断增长的一套研究,并进一步展示了这一视角如何通过将公平、安全、可解释性、问责和隐私作为行为特性对待来告知负责任的AI。我们通过统一最近的调查结果和提出未来方向,将AI Agent Behavioral Science作为传统模式中心方法的必要补充,为理解、评估和治理日益自主的AI系统在真实世界中的行为提供了必不可少的工具。

Article 54

Title@2025-06-12 (4): AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Title: AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

AniMaker: Automatisiertes Multi-Agent-Animiertes Storytelling mit MCTS-gesteuerter Clip-Generierung

AniMaker:与MCTS-Driven Clift 生成的自动多代理动画小说 2506.10540v1

Authors (6): Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang

Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the entire output animation’s logical coherence and visual continuity. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker’s approach are two key technical components: MCTS-Gen in Photography Agent, an efficient Monte Carlo Tree Search (MCTS)-inspired strategy that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage; and AniEval in Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.

尽管视频生成模型进展迅速,但制作跨越多个场景和字符的一致故事视频仍然具有挑战性。目前的方法往往僵硬地将预先生成的关键框架转换成固定长度的剪辑,造成脱节的叙述和节奏问题。此外,视频生成模型固有的不稳定性意味着即使是单一低质量的剪辑也能显著地降低整个产出动画的逻辑一致性和视觉连续性。为了克服这些障碍,我们引入了AniMaker,这是一个多试管框架,能够高效地多感应剪辑制作和叙事剪辑剪辑选择,从而产生全球一致和符合故事的动画。这个框架的结构围绕专业机构,包括故事生成的代理主任、视频剪辑制作的摄影代理、评估的预演器、编辑和语音翻译的后导剂。 Animaker 方法的核心是两大技术组成部分:摄影剂中的MCT-Gen,一个高效的多质量搜索(MCTS)激励策略,通过文本输入输入文字输入的智能空间,以生成高清晰度的直观的动动动动动画动画动动动动动画,同时通过优化的预演算来大幅优化的动作来显示其前演练的演练的演练的动作的动作,并展示,并展示过程的不断演化的演化的演化的动作。

Article 55

Title@2025-06-12 (4): Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

Title: Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

Nonconvex-Spiel und Multi-Agenten-Verstärkungs-Lernen für zonale Hilfsmärkte

为Zonal辅助市场进行非convelx 游戏和多剂强化学习 2505.03288v2

Authors (4): Francesco Morri, Hélène Le Cadre, Pierre Gruet, Luce Brotcorne

We characterize zonal ancillary market coupling relying on noncooperative game theory. To that purpose, we formulate the ancillary market as a multi-leader single follower bilevel problem, that we subsequently cast as a generalized Nash game with side constraints and nonconvex feasibility sets. We determine conditions for equilibrium existence and show that the game has a generalized potential game structure. To compute market equilibrium, we rely on two exact approaches: an integrated optimization approach and Gauss-Seidel best-response, that we compare against multi-agent deep reinforcement learning. On real data from Germany and Austria, simulations indicate that multi-agent deep reinforcement learning achieves the smallest convergence rate but requires pretraining, while best-response is the slowest. On the economics side, multi-agent deep reinforcement learning results in smaller market costs compared to the exact methods, but at the cost of higher variability in the profit allocation among stakeholders. Further, stronger coupling between zones tends to reduce costs for larger zones.

我们根据不合作的游戏理论将区级辅助市场组合定性为依赖不合作的游戏理论的区级辅助市场。为此,我们将辅助市场发展成一个多领导单一追随者双级问题,我们随后将之作为普世纳什游戏,并配有侧面限制和非康韦克斯可行性组合。我们确定均衡存在的条件,并显示游戏具有普遍的潜在游戏结构。我们计算市场平衡时,依靠两种精确的方法:综合优化方法和高斯-塞德尔最佳反应,我们比对多试剂深度加固学习进行比较。关于德国和奥地利的实际数据,模拟表明多试剂深层加固学习达到最小的趋同率,但需要预先培训,而最佳反应是最慢的。在经济方面,多试剂深度加固学习的结果是市场成本小于精确方法,但以利利利利分配的更大变动性为代价。此外,地区间更强大的结合往往降低大区的成本。
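
The Gauss-Seidel best-response scheme named in this abstract can be sketched on a much simpler game than the one studied there. The example assumes an unconstrained quadratic game in which player $i$ minimizes $\tfrac{1}{2}x_i^2 + b\,x_i \sum_{j \neq i} x_j - d_i x_i$, so the best response is $x_i = d_i - b \sum_{j \neq i} x_j$. The paper's nonconvex feasibility sets and side constraints are not modeled, and the function name and parameter values are hypothetical.

```python
import numpy as np

def gauss_seidel_best_response(d, b, sweeps=200, tol=1e-10):
    """Players update one after another, each reacting to the latest decisions."""
    d = np.asarray(d, dtype=float)
    x = np.zeros_like(d)
    for _ in range(sweeps):
        previous = x.copy()
        for i in range(len(d)):
            others = x.sum() - x[i]   # uses updates already made in this sweep
            x[i] = d[i] - b * others  # closed-form best response of player i
        if np.max(np.abs(x - previous)) < tol:
            break
    return x

d = np.array([1.0, 2.0, 3.0])
b = 0.3
equilibrium = gauss_seidel_best_response(d, b)
```

At a fixed point every player is already best-responding, so the result is a Nash equilibrium of the toy game. Here it also solves the linear system $((1-b)I + b\,\mathbf{1}\mathbf{1}^\top)\,x = d$, which gives an independent check.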

Article 56

Title@2025-06-12 (4): MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

Title: MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

MasHost baut alles: Autonomes Multi-Agenten-System, das durch Verstärkungslernen gesteuert wird

MasHost构建一切:由强化学习引导的自主多智能体系统 2506.08507v2

Authors (8): Kuo Yang, Xingjie Yang, Linhui Yu, Qing Xu, Yan Fang, Xu Wang, Zhengyang Zhou, Yang Wang

Large Language Model (LLM)-driven multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, which introduce human biases and constrain autonomy. Even with recent advances in adaptive Mas construction, existing systems largely remain semi-autonomous. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior work, we introduce component rationality as an additional design principle for Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that integrates group-relative advantages and action-wise rewards. To our knowledge, MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structural rationality.

大型语言模型(LLM)驱动的多智能体系统(Mas)最近已成为处理复杂现实任务的有力范式。然而,现有的Mas构建方法通常依赖人工设计的交互机制或启发式规则,既引入了人为偏见,又限制了系统的自主能力。即使自适应Mas构建近来有所进展,现有系统在很大程度上仍停留在半自主模式的范式之内。在这项工作中,我们提出MasHost,一个基于强化学习(RL)的框架,用于自主且随查询自适应的Mas设计。通过将Mas构建表述为图搜索问题,MasHost借助统一的概率采样机制对智能体角色及其交互进行联合采样。除了以往工作所追求的准确性和效率目标之外,我们还引入组件合理性,作为Mas中一项新的设计原则。为实现这一多目标优化,我们提出分层相对策略优化(HRPO),这是一种新的RL策略,它协同整合了组相对优势和动作级奖励。据我们所知,MasHost是第一个由RL驱动的自主Mas图构建框架。在六个基准上的大量实验表明,MasHost始终优于大多数有竞争力的基线,验证了其有效性、效率和结构合理性。
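The abstract above mentions "group-relative advantages" without giving a formula. The sketch below shows one common form of the idea: each sampled candidate's reward is normalized against the mean and standard deviation of its own sampling group. Whether HRPO uses exactly this normalization, and how it combines it with action-wise rewards, is not stated in the abstract. The function name and `eps` parameter are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per candidate sampled for the same
    query. The result is (r - group mean) / (group std + eps).
    This is a generic sketch, not the paper's HRPO objective.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for three candidate agent graphs on one query.
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Candidates that score above their group's mean get a positive advantage and those below get a negative one, so no separately learned value baseline is needed for this term.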

Article 57

Title@2025-06-12 (4): CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

Title: CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

CAF-I: Ein kollaboratives Multi-Agent-Framework für eine verbesserte Ironieerkennung mit großen Sprachmodellen

CAF-I:利用大型语言模型增强反讽检测的协作式多智能体框架 2506.08430v2

Authors (3): Ziqi Liu, Ziyang Zhou, Mingxuan Hu

Large language models (LLMs) have become mainstream methods in the field of sarcasm detection. However, existing LLM methods face challenges in irony detection, including: 1. single-perspective limitations, 2. insufficient comprehensive understanding, and 3. lack of interpretability. This paper introduces the Collaborative Agent Framework for Irony (CAF-I), an LLM-driven multi-agent system designed to overcome these issues. CAF-I employs specialized agents for Context, Semantics, and Rhetoric, which perform multidimensional analysis and engage in interactive collaborative optimization. A Decision Agent then consolidates these perspectives, with a Refinement Evaluator Agent providing conditional feedback for optimization. Experiments on benchmark datasets establish CAF-I’s state-of-the-art zero-shot performance. Achieving SOTA on the vast majority of metrics, CAF-I reaches an average Macro-F1 of 76.31, a 4.98 absolute improvement over the strongest prior baseline. This success is attained through its effective simulation of human-like multi-perspective analysis, which enhances detection accuracy and interpretability.

大型语言模型(LLM)已成为讽刺检测领域的主流方法。然而,现有的LLM方法在反讽检测方面面临挑战,包括:1. 单一视角的局限,2. 综合理解不足,3. 缺乏可解释性。本文介绍了反讽协作智能体框架(CAF-I),这是一个由LLM驱动的多智能体系统,旨在解决这些问题。CAF-I采用分别面向语境、语义和修辞的专门智能体,进行多维分析并开展交互式协同优化。随后由决策智能体整合这些视角,并由精炼评估智能体提供有条件的反馈以进行优化。在基准数据集上的实验确立了CAF-I最先进的零样本性能。CAF-I在绝大多数指标上达到SOTA,平均Macro-F1为76.31,比此前最强的基线绝对提升4.98。这一成功得益于它对类人多视角分析的有效模拟,提高了检测的准确性和可解释性。
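The headline metric in the abstract above is Macro-F1. As a reminder of how that metric is computed, here is a minimal pure-Python sketch: F1 is computed per class and the per-class scores are averaged with equal weight. The toy labels are made up for illustration and are not data from the paper, and the abstract reports the score on a 0 to 100 scale while this function returns a value in [0, 1].

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Per class: F1 = 2*TP / (2*TP + FP + FN), taken as 0.0 when the
    class has no true or predicted instances at all.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels: 1 = ironic, 0 = not ironic.
    print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because every class counts equally, Macro-F1 penalizes a detector that does well only on the majority class, which is why it is a common choice for imbalanced irony and sarcasm datasets.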

Article 58

Title@2025-06-12 (4): The Optimization Paradox in Clinical AI Multi-Agent Systems

Title: The Optimization Paradox in Clinical AI Multi-Agent Systems

Das Optimierungsparadox in klinischen KI-Multiagentensystemen

临床AI多智能体系统中的优化悖论 2506.06574v2

Authors (5): Suhana Bedi, Iddah Mlauzi, Daniel Shin, Sanmi Koyejo, Nigam H. Shah

Multi-agent artificial intelligence systems are increasingly deployed in clinical settings, yet the relationship between component-level optimization and system-wide performance remains poorly understood. We evaluated this relationship using 2,400 real patient cases from the MIMIC-CDM dataset across four abdominal pathologies (appendicitis, pancreatitis, cholecystitis, diverticulitis), decomposing clinical diagnosis into information gathering, interpretation, and differential diagnosis. We evaluated single-agent systems (one model performing all tasks) against multi-agent systems (specialized models for each task) using comprehensive metrics spanning diagnostic outcomes, process adherence, and cost efficiency. Our results reveal a paradox: while multi-agent systems generally outperformed single agents, the component-optimized or Best of Breed system, despite superior components and excellent process metrics (85.5% information accuracy), significantly underperformed in diagnostic accuracy (67.7% vs. 77.4% for a top multi-agent system). This finding underscores that successful integration of AI in healthcare requires not just component-level optimization but also attention to information flow and compatibility between agents. Our findings highlight the need for end-to-end system validation rather than relying on component metrics alone.

多智能体人工智能系统越来越多地部署在临床环境中,但组件级优化与全系统性能之间的关系仍然知之甚少。我们使用MIMIC-CDM数据集中涵盖四种腹部疾病(阑尾炎、胰腺炎、胆囊炎、憩室炎)的2,400个真实患者病例评估了这一关系,并将临床诊断分解为信息收集、解读和鉴别诊断。我们采用涵盖诊断结果、流程遵循度和成本效率的综合指标,将单智能体系统(由一个模型执行所有任务)与多智能体系统(每项任务使用专门模型)进行比较。我们的结果揭示了一个悖论:虽然多智能体系统总体上优于单智能体系统,但组件最优的“Best of Breed”系统尽管拥有更优的组件和出色的流程指标(信息准确率85.5%),其诊断准确率却明显偏低(67.7%,而某个表现领先的多智能体系统为77.4%)。这一发现强调,在医疗保健中成功整合AI不仅需要组件级优化,还需要关注智能体之间的信息流和兼容性。我们的研究结果凸显了进行端到端系统验证的必要性,而不能仅依赖组件指标。

Article 0

Title@2025-06-18 (3): SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

Article 1

Title@2025-06-18 (3): CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

Article 2

Title@2025-06-18 (3): Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Article 3

Title@2025-06-18 (3): Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Article 4

Title@2025-06-18 (3): YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning

Article 5

Title@2025-06-18 (3): ChatModel: Automating Reference Model Design and Verification with LLMs

Article 6

Title@2025-06-17 (2): Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

Article 7

Title@2025-06-17 (2): Towards the Autonomous Optimization of Urban Logistics: Training Generative AI with Scientific Tools via Agentic Digital Twins and Model Context Protocol

Article 8

Title@2025-06-17 (2): AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Article 9

Title@2025-06-17 (2): Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective

Article 10

Title@2025-06-17 (2): Hierarchical Multi-Agent Reinforcement Learning-based Coordinated Spatial Reuse for Next Generation WLANs

Article 11

Title@2025-06-17 (2): Light Aircraft Game : Basic Implementation and training results analysis

Article 12

Title@2025-06-17 (2): StorySage: Conversational Autobiography Writing Powered by a Multi-Agent Framework

Article 13

Title@2025-06-16 (1): Beyond Browsing: API-Based Web Agents

Article 14

Title@2025-06-16 (1): Deceptive Path Planning: A Bayesian Game Approach

Article 15

Title@2025-06-16 (1): Agent Capability Negotiation and Binding Protocol (ACNBP)

Article 16

Title@2025-06-16 (1): Mobility to Campus – a Framework to Evaluate and Compare Different Mobility Modes

Article 17

Title@2025-06-16 (1): Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing

Article 18

Title@2025-06-16 (1): Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation

Article 19

Title@2025-06-16 (1): Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

Article 20

Title@2025-06-16 (1): Towards Pervasive Distributed Agentic Generative AI – A State of The Art

Article 21

Title@2025-06-16 (1): Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Article 22

Title@2025-06-16 (1): G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Article 23

Title@2025-06-16 (1): Autonomous Computer Vision Development with Agentic AI

Article 24

Title@2025-06-16 (1): Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

Article 25

Title@2025-06-15 (7): Homeostatic Coupling for Prosocial Behavior

Article 26

Title@2025-06-15 (7): HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

Article 27

Title@2025-06-14 (6): Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Article 28

Title@2025-06-14 (6): A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

Article 29

Title@2025-06-14 (6): Collaboration Between the City and Machine Learning Community is Crucial to Efficient Autonomous Vehicles Routing

Article 30

Title@2025-06-14 (6): Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

Article 31

Title@2025-06-14 (6): IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

Article 32

Title@2025-06-14 (6): Deep Fictitious Play-Based Potential Differential Games for Learning Human-Like Interaction at Unsignalized Intersections

Article 33

Title@2025-06-13 (5): Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study

Article 34

Title@2025-06-13 (5): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Article 35

Title@2025-06-13 (5): Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Article 36

Title@2025-06-13 (5): Computational Social Choice: Parameterized Complexity and Challenges

Article 37

Title@2025-06-13 (5): Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling

Article 38

Title@2025-06-13 (5): The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

Article 39

Title@2025-06-13 (5): Agent Semantics, Semantic Spacetime, and Graphical Reasoning