cs.MA @ 2025-06-13: 079

06-12 (4)

AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

2506.10974v1

06-12

Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

2506.10874v1

06-12

AI Agent Behavioral Science

2506.06366v3

06-12

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

2506.10540v1

06-12

Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

2505.03288v2

06-12

MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

2506.08507v2

06-12

CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

2506.08430v2

06-12

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

2506.00073v2

06-12

A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

2506.10326v1

06-12

The Optimization Paradox in Clinical AI Multi-Agent Systems

2506.06574v2

06-11 (3)

Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

2409.04613v6

06-11

DAWN: Designing Distributed Agents in a Worldwide Network

2410.22339v3

06-11

Delegations as Adaptive Representation Patterns: Rethinking Influence in Liquid Democracy

2506.09789v1

06-11

Incentive-based Platoon Formation: Optimizing the Personal Benefit for Drivers

2411.00570v5

06-11

Effective Red-Teaming of Policy-Adherent Agents

2506.09600v1

06-11

Large Language Models Miss the Multi-Agent Mark

2505.21298v2

06-11

Reciprocity as the Foundational Substrate of Society: How Reciprocal Dynamics Scale into Social Systems

2505.08319v2

06-11

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

2506.09513v1

06-11

When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

2506.09434v1

06-11

A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

2506.09420v1

06-11

Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

2312.04540v2

06-11

MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

2506.07400v2

06-11

Intelligent System of Emergent Knowledge: A Coordination Fabric for Billions of Minds

2506.09335v1

06-11

Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

2506.09331v1

06-10 (2)

Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

2502.04388v2

06-10

A Replica for our Democracies? On Using Digital Twins to Enhance Deliberative Democracy

2504.07138v2

06-10

Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms

2506.09195v1

06-10

Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

2506.09046v1

06-10

Confidence Boosts Trust-Based Resilience in Cooperative Multi-Robot Systems

2506.08807v1

06-10

FREIDA: A Framework for developing quantitative agent based models based on qualitative expert knowledge

2308.00505v3

06-10

FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

2410.15876v4

06-09 (1)

Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy

2401.00776v2

06-09

Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling

2401.05572v2

06-09

Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures

2502.06963v2

06-09

Diffusion of Responsibility in Collective Decision Making

2506.07935v1

06-09

Agent Semantics, Semantic Spacetime, and Graphical Reasoning

2506.07756v1

06-09

Deep Equivariant Multi-Agent Control Barrier Functions

2506.07755v1

06-09

WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

2502.08047v3

06-09

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

2506.07468v1

06-09

Multi-agent Architecture Search via Agentic Supernet

2502.04180v2

06-09

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

2506.07398v1

06-09

Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

2506.07388v1

06-09

Digital Twin-based Smart Manufacturing: Dynamic Line Reconfiguration for Disturbance Handling

2506.07332v1

06-08 (7)

Very Large-scale Multi-Robot Task Allocation in Challenging Environments via Robot Redistribution

2506.07293v1

06-08

Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization

2412.02535v2

06-08

Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments

2506.07232v1

06-08

BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling

2503.02445v4

06-08

Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies

2504.19487v2

06-08

Position: Simulating Society Requires Simulating Thought

2506.06958v1

06-07 (6)

Object-Spatial Programming

2503.15812v6

06-07

AI-Generated Compromises for Coalition Formation

2506.06837v1

06-07

Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor

2503.02189v3

06-07

A Deep RL Approach on Task Placement and Scaling of Edge Resources for Cellular Vehicle-to-Network Service Provisioning

2305.09832v4

06-07

LLMs Can Simulate Standardized Patients via Agent Coevolution

2412.11716v2

06-06 (5)

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

2506.06541v1

06-06

Edge-Enabled Collaborative Object Detection for Real-Time Multi-Vehicle Perception

2506.06474v1

06-06

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

2506.05265v2

06-06

UAV-UGV Cooperative Trajectory Optimization and Task Allocation for Medical Rescue Tasks in Post-Disaster Environments

2506.06136v1

06-06

Modeling human reputation-seeking behavior in a spatio-temporally complex public good provision game

2506.06032v1

06-06

ADIOS: Antibody Development via Opponent Shaping

2409.10588v8

06-06

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

2410.02958v2

06-06

Multi-Agent Collaboration via Cross-Team Orchestration

2406.08979v2

06-05 (4)

Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts

2506.05577v1

06-05

Using Large Language Models to Simulate Human Behavioural Experiments: Port of Mars

2506.05555v1

06-05

Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data

2506.05542v1

06-05

Sequence Modeling for N-Agent Ad Hoc Teamwork

2506.05527v1

06-05

Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted

2506.05520v1

06-05

Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

2506.05309v1

06-05

Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

2506.05252v1

06-05

Towards Language-Augmented Multi-Agent Deep Reinforcement Learning

2506.05236v1

06-05

Conceptualizing educational opportunity hoarding: the emergence of hoarding without hoarders

2305.14653v3

06-05

A MARL-based Approach for Easing MAS Organization Engineering

2506.05437v1

06-05

Offline Multi-agent Reinforcement Learning via Score Decomposition

2505.05968v2

06-05

Joint Routing and Control Optimization in VANET

2506.08038v1

06-05

Memory-Driven Bounded Confidence Opinion Dynamics: A Hegselmann-Krause Model Based on Fractional-Order Methods

2506.04701v1

06-05

Gen-n-Val: Agentic Image Data Generation and Validation

2506.04676v1

06-05

From Intention To Implementation: Automating Biomedical Research via LLMs

2412.09429v4

06-05

Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control

2411.13983v4

06-05

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

2506.04565v1

Article 0

Title@2025-06-12 (4): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

2506.10974v1

Authors (9): Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.
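
The abstract does not spell out how AutoMind's knowledgeable tree search ranks or expands candidates. As a rough illustration of the general idea only, here is generic best-first search over candidate solutions, a minimal sketch assuming two hypothetical callbacks: `score` (for example, a validation metric obtained by running a candidate) and `expand` (for example, asking an LLM to draft, improve, or debug a solution, optionally conditioned on retrieved expert knowledge). Neither is AutoMind's actual component.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(order=True)
class Node:
    # heapq is a min-heap, so the score is stored negated: the best node pops first.
    neg_score: float
    solution: Any = field(compare=False)


def best_first_search(
    root: Any,
    score: Callable[[Any], float],        # hypothetical: e.g. validation metric of a candidate
    expand: Callable[[Any], List[Any]],   # hypothetical: e.g. LLM proposes refined candidates
    budget: int,
) -> Any:
    """Repeatedly expand the highest-scoring candidate until `budget` evaluations are spent."""
    root_score = score(root)
    frontier = [Node(-root_score, root)]
    best, best_score = root, root_score
    evaluated = 1
    while frontier and evaluated < budget:
        node = heapq.heappop(frontier)
        for child in expand(node.solution):
            s = score(child)
            evaluated += 1
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, Node(-s, child))
            if evaluated >= budget:
                break
    return best


# Toy usage: "solutions" are integers and the hidden optimum is 10.
result = best_first_search(
    root=0,
    score=lambda x: -abs(x - 10),
    expand=lambda x: [x + 1, x + 3],
    budget=50,
)
print(result)
```

The evaluation budget stands in for the compute limit any automated data-science agent must respect; a real system would also deduplicate candidates and cap tree depth.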

Article 1

Title@2025-06-12 (4): Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

2506.10874v1

Authors (2): Sarah A. Toonsi, Jeff S. Shamma

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
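
The higher-order dynamics themselves are not given in the abstract. For context, here is a minimal sketch of ordinary (first-order) two-population replicator dynamics on matching pennies, a standard game whose only Nash equilibrium is completely mixed at (1/2, 1/2). That equilibrium is a rest point, but trajectories orbit it instead of converging: in continuous time the product x(1-x)y(1-y) is conserved along solutions, so orbits are closed curves, and the forward-Euler discretization used here even drifts slowly outward. This is the kind of non-convergence that motivates adding auxiliary states. The game, step size, and horizon are illustrative assumptions.

```python
import math


def replicator_step(x: float, y: float, dt: float) -> tuple:
    """One forward-Euler step of two-population replicator dynamics for matching pennies.

    x: probability that player 1 plays Heads; y: probability that player 2 plays Heads.
    Player 1 gets +1 on a match and -1 on a mismatch; player 2 gets the negative.
    """
    adv1 = (2 * y - 1) - (1 - 2 * y)   # player 1: payoff(Heads) - payoff(Tails) = 4y - 2
    adv2 = (1 - 2 * x) - (2 * x - 1)   # player 2: payoff(Heads) - payoff(Tails) = 2 - 4x
    return x + dt * x * (1 - x) * adv1, y + dt * y * (1 - y) * adv2


def simulate(x0: float, y0: float, dt: float = 0.01, steps: int = 5000) -> tuple:
    x, y = x0, y0
    for _ in range(steps):
        x, y = replicator_step(x, y, dt)
    return x, y


# The mixed equilibrium is a rest point: a step leaves it unchanged.
print(replicator_step(0.5, 0.5, 0.01))

# Starting 0.1 away from it, the state keeps circling instead of converging.
x, y = simulate(0.6, 0.5)
print(math.hypot(x - 0.5, y - 0.5))
```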

Article 2

Title@2025-06-12 (4): AI Agent Behavioral Science

2506.06366v3

Authors (16): Lin Chen, Yunke Zhang, Jie Feng, Haoye Chai, Honglin Zhang, Bingbing Fan, Yibo Ma, Shiyuan Zhang, Nian Li, Tianhui Liu, Nicholas Sukiennik, Keyu Zhao, Yu Li, Ziyi Liu, Fengli Xu, Yong Li

Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the internal architectures of the underlying models, but emerge from their integration into agentic systems operating within specific contexts, where environmental factors, social cues, and interaction feedbacks shape behavior over time. This evolution necessitates a new scientific perspective: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this perspective emphasizes the systematic observation of behavior, design of interventions to test hypotheses, and theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual agent, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional model-centric approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.

Article 3

Title@2025-06-12 (4): AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

2506.10540v1

Authors (6): Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang

Despite rapid advancements in video generation models, generating coherent storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the entire output animation’s logical coherence and visual continuity. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover. Central to AniMaker’s approach are two key technical components: MCTS-Gen in Photography Agent, an efficient Monte Carlo Tree Search (MCTS)-inspired strategy that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage; and AniEval in Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
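
MCTS-Gen's exact selection rule is not described in the abstract. As background, standard MCTS usually chooses which branch to explore with the UCT (UCB1) score: a child's mean reward plus an exploration bonus c * sqrt(ln N / n), where N is the parent's visit count and n the child's. A minimal sketch, assuming rewards are clip-quality scores in [0, 1] (for example, from a reviewer model; a hypothetical stand-in, not AniEval) and the common default c = sqrt(2):

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    visits: int = 0
    total_reward: float = 0.0  # sum of (hypothetical) clip-quality scores observed so far


def uct_score(child: Child, parent_visits: int, c: float = math.sqrt(2)) -> float:
    if child.visits == 0:
        return math.inf  # always try an unvisited candidate first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select(children: List[Child]) -> int:
    """Index of the child with the highest UCT score."""
    parent_visits = max(sum(ch.visits for ch in children), 1)
    scores = [uct_score(ch, parent_visits) for ch in children]
    return max(range(len(children)), key=scores.__getitem__)


# A rarely tried candidate can win on the exploration bonus despite a lower mean.
print(select([Child(100, 60.0), Child(2, 1.0)]))
```

The bonus shrinks as a branch is visited more, which is how this rule balances spending generation budget on promising clips against checking under-explored ones.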

Article 4

Title@2025-06-12 (4): Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

2505.03288v2

Authors (4): Francesco Morri, Hélène Le Cadre, Pierre Gruet, Luce Brotcorne

We characterize zonal ancillary market coupling using noncooperative game theory. To that end, we formulate the ancillary market as a multi-leader, single-follower bilevel problem, which we subsequently cast as a generalized Nash game with side constraints and nonconvex feasibility sets. We determine conditions for equilibrium existence and show that the game has a generalized potential game structure. To compute market equilibrium, we rely on two exact approaches, an integrated optimization approach and Gauss-Seidel best-response, which we compare against multi-agent deep reinforcement learning. On real data from Germany and Austria, simulations indicate that multi-agent deep reinforcement learning achieves the smallest convergence rate but requires pretraining, while best-response is the slowest. On the economics side, multi-agent deep reinforcement learning results in smaller market costs compared to the exact methods, but at the cost of higher variability in the profit allocation among stakeholders. Further, stronger coupling between zones tends to reduce costs for larger zones.
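
The paper's market model (a multi-leader, single-follower bilevel game with nonconvex feasible sets) is far richer than anything that fits here. To show only what a Gauss-Seidel best-response scheme does, here is a minimal sketch on a hypothetical two-firm Cournot game with linear inverse demand P = a - b(q1 + q2) and constant marginal cost c. Players update one at a time, and each update uses the most recent strategy of the other player (a Jacobi scheme would instead update both from the previous iterate). For this toy game the unique equilibrium is q1 = q2 = (a - c) / (3b).

```python
from typing import Tuple


def best_response(q_other: float, a: float, b: float, c: float) -> float:
    """Profit-maximizing quantity against a rival producing q_other.

    Maximizing (a - b*(q + q_other) - c) * q over q gives q = (a - c - b*q_other) / (2b),
    floored at zero because quantities cannot be negative.
    """
    return max(0.0, (a - c - b * q_other) / (2 * b))


def gauss_seidel_br(
    a: float, b: float, c: float,
    q: Tuple[float, float] = (0.0, 0.0),
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[float, float, int]:
    q1, q2 = q
    for it in range(max_iter):
        new_q1 = best_response(q2, a, b, c)       # player 1 moves first...
        new_q2 = best_response(new_q1, a, b, c)   # ...player 2 reacts to the updated q1
        if abs(new_q1 - q1) < tol and abs(new_q2 - q2) < tol:
            return new_q1, new_q2, it + 1
        q1, q2 = new_q1, new_q2
    return q1, q2, max_iter


# With a=100, b=1, c=10 the equilibrium quantity is (100 - 10) / 3 = 30 per firm.
q1, q2, iters = gauss_seidel_br(100.0, 1.0, 10.0)
print(q1, q2, iters)
```

Here each full sweep shrinks the error by a factor of 4, so convergence is quick; in general, Gauss-Seidel best-response iterations are not guaranteed to converge, although potential-game structure of the kind the abstract mentions is one setting where such schemes are commonly analyzed.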

Article 5

Title@2025-06-12 (4): MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

2506.08507v2

Authors (8): Kuo Yang, Xingjie Yang, Linhui Yu, Qing Xu, Yan Fang, Xu Wang, Zhengyang Zhou, Yang Wang

Large Language Model (LLM)-driven Multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining the autonomous ability. Even with recent advances in adaptive Mas construction, existing systems largely remain within the paradigm of semi-autonomous patterns. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, our proposed MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior works, we introduce component rationality as an additional and novel design principle in Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that collaboratively integrates group-relative advantages and action-wise rewards. To our knowledge, our proposed MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structure rationality.
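
HRPO is described as combining group-relative advantages with action-wise rewards, but the abstract gives no formulas. The group-relative part presumably resembles the normalization popularized by GRPO, which is sketched minimally below as an assumption: several system designs are sampled for the same query, and each one's reward is standardized against the group's mean and standard deviation, so no learned value function is needed. How HRPO mixes in action-wise rewards is not shown here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each sampled rollout's reward against its own group (GRPO-style).

    Rollouts better than the group average get positive advantages, worse ones negative;
    eps avoids division by zero when every rollout received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four multi-agent designs sampled for one query
# (e.g. 1.0 if the constructed system answered correctly, 0.0 otherwise).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```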

Article 6

Title@2025-06-12 (4): CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

2506.08430v2

Authors (3): Ziqi Liu, Ziyang Zhou, Mingxuan Hu

Large language models (LLMs) have become mainstream methods in the field of sarcasm detection. However, existing LLM methods face challenges in irony detection, including: 1. single-perspective limitations, 2. insufficient comprehensive understanding, and 3. lack of interpretability. This paper introduces the Collaborative Agent Framework for Irony (CAF-I), an LLM-driven multi-agent system designed to overcome these issues. CAF-I employs specialized agents for Context, Semantics, and Rhetoric, which perform multidimensional analysis and engage in interactive collaborative optimization. A Decision Agent then consolidates these perspectives, with a Refinement Evaluator Agent providing conditional feedback for optimization. Experiments on benchmark datasets establish CAF-I’s state-of-the-art zero-shot performance. Achieving SOTA on the vast majority of metrics, CAF-I reaches an average Macro-F1 of 76.31, a 4.98 absolute improvement over the strongest prior baseline. This success stems from its effective simulation of human-like multi-perspective analysis, which enhances detection accuracy and interpretability.
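
For readers unfamiliar with the headline metric: Macro-F1 is the unweighted mean of the per-class F1 scores, so a detector that ignores the minority class (for example, one that never predicts irony) is penalized even when its accuracy is high. The 76.31 above is presumably reported on a 0 to 100 scale. A minimal sketch with hypothetical labels (1 = ironic, 0 = not ironic); for these inputs it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label that appears in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# A detector that always answers "not ironic": accuracy 0.75, but Macro-F1 only 3/7.
y_true = [1, 0, 0, 0]
y_pred = [0, 0, 0, 0]
print(macro_f1(y_true, y_pred))
```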

Article 7

Title@2025-06-12 (4): The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

2506.00073v2

Authors (6): Shenzhe Zhu, Jiao Sun, Yi Nian, Tobin South, Alex Pentland, Jiaxin Pei

AI agents are increasingly used in consumer-facing applications to assist with tasks such as product search, negotiation, and transaction execution. In this paper, we explore a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. We aim to answer two key questions: (1) Do different LLM agents vary in their ability to secure favorable deals for users? (2) What risks arise from fully automating deal-making with AI agents in consumer markets? To address these questions, we develop an experimental framework that evaluates the performance of various LLM agents in real-world negotiation and transaction settings. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game – different agents achieve significantly different outcomes for their users. Moreover, behavioral anomalies in LLMs can result in financial losses for both consumers and merchants, such as overspending or accepting unreasonable deals. These results underscore that while automation can improve efficiency, it also introduces substantial risks. Users should exercise caution when delegating business decisions to AI agents.

Article 8

Title@2025-06-12 (4): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

2506.10326v1

Authors (5): Cameron Angliss, Jiaxun Cui, Jiaheng Hu, Arrasy Rahman, Peter Stone

Developing AI agents that can robustly adapt to dramatically different strategic landscapes without retraining is a central challenge for multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations, approximately 10^139, far larger than those of Dota or StarCraft. The highly discrete, combinatorial nature of team building in Pokémon VGC causes optimal strategies to shift dramatically depending on both the team being piloted and the opponent’s team, which makes generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets and a range of baselines, from large-language-model agents and behavior cloning to reinforcement learning and empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to beat a professional VGC competitor. We extensively evaluate all baseline methods over progressively larger team sets and find that even the best-performing algorithm in the single-team setting struggles to scale as the team set grows. Thus, policy generalization across diverse team strategies remains an open challenge for the community. Our code is open sourced at https://github.com/cameronangliss/VGC-Bench.
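
Fictitious play, one of the game-theoretic baselines listed, has each player best-respond to the empirical mix of the opponent's past actions. A minimal sketch of the textbook algorithm on a small two-player zero-sum matrix game; rock-paper-scissors is chosen only for illustration (VGC itself is far too large for an explicit payoff matrix), and this is not VGC-Bench's implementation.

```python
from typing import List, Tuple


def fictitious_play(payoff: List[List[float]], iterations: int = 10000) -> Tuple[List[float], List[float]]:
    """Fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives -payoff[i][j].
    Returns the empirical (average) strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1
    for _ in range(iterations):
        # Row player best-responds to the column player's empirical frequencies.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols)) for i in range(n_rows)]
        # Column player minimizes the row player's payoff against the row frequencies.
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows)) for j in range(n_cols)]
        row_counts[row_values.index(max(row_values))] += 1
        col_counts[col_values.index(min(col_values))] += 1
    total_r, total_c = sum(row_counts), sum(col_counts)
    return [c / total_r for c in row_counts], [c / total_c for c in col_counts]


rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
row_strategy, col_strategy = fictitious_play(rps)
```

In zero-sum games the empirical frequencies converge to a Nash equilibrium, here close to one third per action, although convergence can be slow; double oracle and self-play trade this guarantee against scalability in different ways.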

Article 9

Title@2025-06-12 (4): The Optimization Paradox in Clinical AI Multi-Agent Systems

2506.06574v2

Authors (5): Suhana Bedi, Iddah Mlauzi, Daniel Shin, Sanmi Koyejo, Nigam H. Shah

Multi-agent artificial intelligence systems are increasingly deployed in clinical settings, yet the relationship between component-level optimization and system-wide performance remains poorly understood. We evaluated this relationship using 2,400 real patient cases from the MIMIC-CDM dataset across four abdominal pathologies (appendicitis, pancreatitis, cholecystitis, diverticulitis), decomposing clinical diagnosis into information gathering, interpretation, and differential diagnosis. We compared single-agent systems (one model performing all tasks) against multi-agent systems (specialized models for each task) using comprehensive metrics spanning diagnostic outcomes, process adherence, and cost efficiency. Our results reveal a paradox: while multi-agent systems generally outperformed single agents, the component-optimized ("Best of Breed") system, despite superior components and excellent process metrics (85.5% information accuracy), significantly underperformed in diagnostic accuracy (67.7% vs. 77.4% for a top multi-agent system). This finding underscores that successful integration of AI in healthcare requires not just component-level optimization but also attention to information flow and compatibility between agents. Our findings highlight the need for end-to-end system validation rather than relying on component metrics alone.

Article 10

Title@2025-06-11 (3): Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

2409.04613v6

Authors (3): Chinmay Maheshwari, Manxi Wu, Shankar Sastry

Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for special cases, such as Markov zero-sum and potential games, which do not fully capture real-world interactions. In this paper, we address this gap by studying the asymptotic properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm in which each agent adopts an actor-critic learning dynamic with asynchronous step sizes. This decentralized approach enables agents to operate independently, without requiring knowledge of others’ strategies or payoffs. We introduce the concept of a Markov Near-Potential Function (MNPF) and demonstrate that it serves as an approximate Lyapunov function for the policy updates in the decentralized learning dynamics, which allows us to characterize the set of strategies to which the dynamics converge. We further strengthen this result under specific regularity conditions when the set of Nash equilibria is finite.
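
The setting can be made concrete with a deliberately simplified sketch: a single-state (stateless) general-sum game, two agents that each observe only their own action and reward, a scalar critic on a fast timescale, and a softmax actor on a slower one, so that the actor-to-critic step-size ratio goes to zero. The step-size schedules, the prisoner's dilemma payoffs, and the baseline-style critic are illustrative assumptions; the paper's algorithm operates on full Markov games, and its MNPF analysis is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def independent_actor_critic(payoffs: List[List[Tuple[float, float]]],
                             steps: int = 20000, seed: int = 0) -> List[List[float]]:
    """Two agents learn independently in a one-state general-sum game.

    payoffs[a0][a1] = (reward to agent 0, reward to agent 1). Each agent sees
    only its own action and reward: a scalar critic tracks its average reward
    on the fast timescale, and a softmax actor follows (r - V) * grad log pi
    on the slow timescale.
    """
    rng = random.Random(seed)
    n_actions = [len(payoffs), len(payoffs[0])]
    logits = [[0.0] * k for k in n_actions]
    values = [0.0, 0.0]
    for n in range(steps):
        alpha = 0.1 / (1 + n / 1000) ** 0.6   # critic step size (faster)
        beta = 0.01 / (1 + n / 1000) ** 0.9   # actor step size (slower); beta / alpha -> 0
        policies = [softmax(theta) for theta in logits]
        actions = [rng.choices(range(k), weights=p)[0] for k, p in zip(n_actions, policies)]
        rewards = payoffs[actions[0]][actions[1]]
        for i in range(2):
            td = rewards[i] - values[i]        # advantage estimate from the critic
            values[i] += alpha * td
            for a in range(n_actions[i]):
                indicator = 1.0 if a == actions[i] else 0.0
                logits[i][a] += beta * td * (indicator - policies[i][a])
    return [softmax(theta) for theta in logits]


# Prisoner's dilemma: action 0 = cooperate, 1 = defect (defect is dominant).
pd = [[(3, 3), (0, 5)],
      [(5, 0), (1, 1)]]
final_policies = independent_actor_critic(pd)
```

Because defection is dominant, both independently learned policies should end up placing most of their probability on action 1, even though neither agent ever sees the other's policy or payoff.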

Article 11

Title@2025-06-11 (3): DAWN: Designing Distributed Agents in a Worldwide Network

2410.22339v3

Authors (5): Zahra Aminiranjbar, Jianan Tang, Qiudan Wang, Shubha Pant, Mahesh Viswanathan

The rapid evolution of Large Language Models (LLMs) has transformed them from basic conversational tools into sophisticated entities capable of complex reasoning and decision-making. These advancements have led to the development of specialized LLM-based agents designed for diverse tasks such as coding and web browsing. As these agents become more capable, the need for a robust framework that facilitates global communication and collaboration among them towards advanced objectives has become increasingly critical. Distributed Agents in a Worldwide Network (DAWN) addresses this need by offering a versatile framework that integrates LLM-based agents with traditional software systems, enabling the creation of agentic applications suited for a wide range of use cases. DAWN enables distributed agents worldwide to register and be easily discovered through Gateway Agents. Collaborations among these agents are coordinated by a Principal Agent equipped with reasoning strategies. DAWN offers three operational modes: No-LLM Mode for deterministic tasks, Copilot for augmented decision-making, and LLM Agent for autonomous operations. Additionally, DAWN ensures the safety and security of agent collaborations globally through a dedicated safety, security, and compliance layer, protecting the network against attackers and adhering to stringent security and compliance standards. These features make DAWN a robust network for deploying agent-based applications across various industries.
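
To make the split between Gateway Agents and the Principal Agent, and the three operating modes, more concrete, here is a minimal in-process sketch. Everything in it is hypothetical: `Gateway`, `AgentRecord`, `principal_run`, the `uses_llm` flag, and the mode semantics (NO_LLM restricts dispatch to non-LLM agents, COPILOT requires a human approval callback, LLM_AGENT acts autonomously) are illustrative assumptions, not DAWN's API. A real deployment would discover agents over the network.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Mode(Enum):
    NO_LLM = "no_llm"        # only deterministic (non-LLM) agents may run
    COPILOT = "copilot"      # result is a proposal that a human must approve
    LLM_AGENT = "llm_agent"  # result is used autonomously


@dataclass
class AgentRecord:
    name: str
    capabilities: List[str]
    uses_llm: bool
    handler: Callable[[str], str]


@dataclass
class Gateway:
    """Hypothetical registry through which distributed agents are discovered."""
    agents: Dict[str, AgentRecord] = field(default_factory=dict)

    def register(self, record: AgentRecord) -> None:
        self.agents[record.name] = record

    def discover(self, capability: str) -> List[AgentRecord]:
        return [a for a in self.agents.values() if capability in a.capabilities]


def principal_run(gateway: Gateway, capability: str, task: str, mode: Mode,
                  approve: Callable[[str], bool] = lambda proposal: True) -> Optional[str]:
    """Hypothetical principal-agent dispatch honoring the three operating modes."""
    candidates = gateway.discover(capability)
    if mode is Mode.NO_LLM:
        candidates = [a for a in candidates if not a.uses_llm]
    if not candidates:
        raise LookupError(f"no eligible agent offers {capability!r}")
    result = candidates[0].handler(task)
    if mode is Mode.COPILOT and not approve(result):
        return None  # the human reviewer rejected the proposal
    return result
```

Keeping the mode check in the dispatcher, rather than in each agent, is one way a single principal could enforce the same policy across heterogeneous registered agents.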

Article 12

Title@2025-06-11 (3): Delegations as Adaptive Representation Patterns: Rethinking Influence in Liquid Democracy

2506.09789v1

Authors (2): Davide Grossi, Andreas Nitsche

Liquid democracy is a mechanism for dividing the labor of decision-making through the transitive delegation of influence. In essence, all individuals have the autonomy to decide which issues they will engage with directly, while for other matters they may appoint a representative of their choosing. So far, the literature has treated the delegation structures that emerge in liquid democracy as static. As a result, transitivity, defined as the capacity to transfer acquired authority to another entity, has been identified as a concern because it could enable unrestrained accumulation of power. Focusing on the implementation of liquid democracy supported by the LiquidFeedback software, we propose a novel approach to assessing the influence of voting nodes in a transitive delegation graph that takes into account the process nature of real-world liquid democracy, in which delegation and voting are distinct and increasingly independent activities. By introducing a novel model of delegations in liquid democracy, we show how transitivity may in fact contribute to an effective regulation of influence over deliberation and decision-making power. While the one-person, one-vote paradigm is maintained for all votes cast, the anticipated influence of an agent, to the extent that it stems from transitivity, declines precipitously along an exponential trajectory. More generally, our objective is to take the first steps toward a rigorous analysis of liquid democracy as an adaptive democratic representation process. Although we believe adaptivity is one of the most important features of liquid democracy, it has not yet been explored in the existing academic literature. We therefore also outline a research agenda focusing on this aspect of liquid democracy.
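
Transitive delegation can be made concrete with a small resolver that follows each participant's delegation chain to the person who actually casts the ballot, counting hops along the way. The `retention` factor is a hypothetical toy assumption, not the paper's model: it discounts a ballot that arrives d hops away by retention**d, which mimics the qualitative claim that transitively acquired influence decays exponentially, while retention = 1.0 recovers plain one-person, one-vote counting. Treating participants caught in a delegation cycle as abstaining is also an assumption.

```python
from typing import Dict, Optional, Tuple


def resolve_delegations(delegates_to: Dict[str, Optional[str]]) -> Dict[str, Tuple[Optional[str], int]]:
    """Follow transitive delegations to the voter who finally casts the ballot.

    delegates_to maps each participant to the participant they delegate to,
    or to None if they vote directly. Returns {participant: (final_voter, hops)},
    with final_voter = None when the chain runs into a cycle.
    """
    resolved = {}
    for start in delegates_to:
        seen = set()
        current, hops = start, 0
        while delegates_to.get(current) is not None:
            if current in seen:
                current = None  # delegation cycle: the ballot is lost
                break
            seen.add(current)
            current = delegates_to[current]
            hops += 1
        resolved[start] = (current, hops)
    return resolved


def expected_weights(delegates_to: Dict[str, Optional[str]], retention: float = 1.0) -> Dict[str, float]:
    """Ballot weight per final voter; a ballot arriving d hops away counts retention**d."""
    weights: Dict[str, float] = {}
    for final, hops in resolve_delegations(delegates_to).values():
        if final is not None:
            weights[final] = weights.get(final, 0.0) + retention ** hops
    return weights
```

With the chain c → b → a and an independent voter d, plain counting gives a three ballots and d one; with retention = 0.5, a's weight drops to 1 + 0.5 + 0.25 = 1.75 because most of it was acquired transitively.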

Article 13

Title@2025-06-11 (3): Incentive-based Platoon Formation: Optimizing the Personal Benefit for Drivers

2411.00570v5

Authors (4): Julian Heinovski, Doğanalp Ergenç, Kirsten Thommes, Falko Dressler

Platooning, or cooperative adaptive cruise control (CACC), has been investigated for decades, but debate about its lasting impact is still ongoing. While the benefits of platooning and the formation of platoons are well understood for trucks, they are less clear for passenger cars, which show greater heterogeneity in trips and driver preferences. Most importantly, it remains unclear how to form platoons of passenger cars so as to optimize the personal benefit for the individual driver. To this end, we propose a novel platoon formation algorithm that optimizes the personal benefit for drivers of individual passenger cars. To compute vehicle-to-platoon assignments, the algorithm uses a new metric that we propose for evaluating the personal benefits of various driving systems, including platooning. By combining fuel and travel-time costs into a single monetary value, drivers can estimate overall trip costs based on their personal monetary value of time. This provides an intuitive way for drivers to understand and compare the benefits of driving systems such as human driving, adaptive cruise control (ACC), and platooning. Unlike previous similarity-based methods, our algorithm forms platoons only when doing so benefits the driver, rather than platooning for its own sake. We demonstrate the new total-trip-cost metric in a numerical analysis and explain its interpretation. Results of a large-scale simulation study show that our platoon formation algorithm outperforms normal ACC as well as previous similarity-based platooning approaches by balancing fuel savings and travel time, independent of traffic conditions and drivers’ time costs.
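
The proposed metric folds fuel and travel-time costs into one monetary value. A minimal sketch of that idea, assuming a simple linear form (fuel used × fuel price + travel time × the driver's value of time); the paper's exact cost model is not given here, and the numbers below are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TripEstimate:
    system: str              # e.g. "human", "ACC", "platoon"
    fuel_liters: float       # predicted fuel use for the whole trip
    travel_time_hours: float


def trip_cost(trip: TripEstimate, fuel_price_per_liter: float,
              value_of_time_per_hour: float) -> float:
    """Fuel cost plus the driver's personal monetary value of travel time."""
    return (trip.fuel_liters * fuel_price_per_liter
            + trip.travel_time_hours * value_of_time_per_hour)


def cheapest_system(trips: List[TripEstimate], fuel_price_per_liter: float,
                    value_of_time_per_hour: float) -> TripEstimate:
    """Pick the driving system with the lowest total trip cost for this driver."""
    return min(trips, key=lambda t: trip_cost(t, fuel_price_per_liter, value_of_time_per_hour))


# Invented numbers: the platoon saves fuel but adds 6 minutes of travel time.
trips = [TripEstimate("human", 6.0, 1.0), TripEstimate("platoon", 5.4, 1.1)]
```

With a fuel price of 2.0 per liter, a driver who values time at 5.0 per hour pays 17.0 driving alone versus about 16.3 in the platoon, so platooning wins; at 20.0 per hour the comparison becomes 32.0 versus about 32.8, and the same platoon is no longer worth joining.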

Article 14

Title@2025-06-11 (3): Effective Red-Teaming of Policy-Adherent Agents

2506.09600v1

Authors (6): Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor

Task-oriented LLM-based agents are increasingly used in domains with strict policies, such as refund eligibility or cancellation rules. The challenge lies in ensuring that the agent consistently adheres to these rules and policies, appropriately refusing any request that would violate them while still maintaining a helpful and natural interaction. This calls for tailored design and evaluation methodologies that ensure agent resilience against malicious user behavior. We propose a novel threat model that focuses on adversarial users who aim to exploit policy-adherent agents for personal benefit. To address this, we present CRAFT, a multi-agent red-teaming system that leverages policy-aware persuasive strategies to undermine a policy-adherent agent in a customer-service scenario, outperforming conventional jailbreak methods such as DAN prompts, emotional manipulation, and coercion. Building upon the existing tau-bench benchmark, we introduce tau-break, a complementary benchmark designed to rigorously assess the agent’s robustness against manipulative user behavior. Finally, we evaluate several straightforward yet effective defense strategies. While these measures provide some protection, they fall short, highlighting the need for stronger, research-driven safeguards to protect policy-adherent agents from adversarial attacks.

Article 15

Title@2025-06-11 (3): Large Language Models Miss the Multi-Agent Mark

2505.21298v2

Authors (8): Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Michael Luck, Philip Torr, Michael Wooldridge

Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.

Article 16

Title: Reciprocity as the Foundational Substrate of Society: How Reciprocal Dynamics Scale into Social Systems

2505.08319v2

Authors (1): Egil Diau

Prevailing accounts in both multi-agent AI and the social sciences explain social structure through top-down abstractions, such as institutions, norms, or trust, yet lack simulatable models of how such structures emerge from individual behavior. Ethnographic and archaeological evidence suggests that reciprocity served as the foundational mechanism of early human societies, enabling economic circulation, social cohesion, and interpersonal obligation long before the rise of formal institutions. Modern financial systems such as credit and currency can likewise be viewed as scalable extensions of reciprocity, formalizing exchange across time and anonymity. Building on this insight, we argue that reciprocity is not merely a local or primitive exchange heuristic, but the scalable substrate from which large-scale social structures can emerge. We propose a three-stage framework to model this emergence: reciprocal dynamics at the individual level, norm stabilization through shared expectations, and the construction of durable institutional patterns. This approach offers a cognitively minimal, behaviorally grounded foundation for simulating how large-scale social systems can emerge from decentralized reciprocal interaction.

Article 17

Title@2025-06-11 (3): ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

2506.09513v1

Authors (10): Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu

Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
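
The verify-then-refine pipeline and the CoT-plus-summary training format can be sketched as follows. In ReasonMed the verifier and the Error Refiner are LLM agents; here they are passed in as plain callables, and `verify_and_refine`, `to_training_target`, the `max_rounds` budget, and the "Summary:" separator are hypothetical choices made for illustration.

```python
from typing import Callable, List, Optional

Step = str
Verifier = Callable[[List[Step]], List[int]]             # indices of steps judged wrong
Refiner = Callable[[List[Step], List[int]], List[Step]]  # rewrites the flagged steps


def verify_and_refine(path: List[Step], verify: Verifier, refine: Refiner,
                      max_rounds: int = 2) -> Optional[List[Step]]:
    """Return a path that passes verification, refining flagged steps up to max_rounds times.

    Returns None (the path is discarded) if it still fails after the last refinement.
    """
    for round_index in range(max_rounds + 1):
        flagged = verify(path)
        if not flagged:
            return path
        if round_index == max_rounds:
            break
        path = refine(path, flagged)
    return None


def to_training_target(path: List[Step], summary: str) -> str:
    """Detailed chain-of-thought followed by a concise answer summary."""
    return "\n".join(path) + "\n\nSummary: " + summary
```

Discarding paths that never pass verification is one plausible reading of how 1.7 million candidate paths shrink to 370k examples; the actual filtering criteria are described in the paper, not here.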

Article 18

Title@2025-06-11 (3): When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

2506.09434v1

Authors (3): Michael Amir, Matteo Bettini, Amanda Prorok

The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the N agents’ effort allocations on individual tasks to a task score, and an outer operator that merges the M task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this reduces to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm and introduce Heterogeneous Environment Design (HED), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Experiments in matrix games and an embodied Multi-Goal-Capture environment show that, despite the difference in settings, HED rediscovers the reward regimes that our theory predicts maximize the advantage of heterogeneity, both validating HED and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
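
A toy instance of the inner/outer operator setup shows why curvature matters. It assumes N = M = 3, efforts that sum to 1 per agent, and extreme operators chosen for illustration (max as a convex inner operator, min as a concave one, sum as the outer operator); it illustrates the intuition only and is not the paper's theorem or its convexity test.

```python
from typing import Callable, List, Sequence

Operator = Callable[[Sequence[float]], float]


def team_reward(allocation: List[List[float]], inner: Operator, outer: Operator) -> float:
    """allocation[i][j] is agent i's share of effort on task j.

    inner maps the N agents' efforts on one task to a task score;
    outer merges the M task scores into the global team reward.
    """
    n_tasks = len(allocation[0])
    task_scores = [inner([agent[j] for agent in allocation]) for j in range(n_tasks)]
    return outer(task_scores)


N = M = 3
homogeneous = [[1 / M] * M for _ in range(N)]  # every agent splits effort evenly
specialized = [[1.0 if j == i else 0.0 for j in range(M)] for i in range(N)]  # one agent per task

for name, inner in [("max (convex)", max), ("sum (linear)", sum), ("min (concave)", min)]:
    print(name, team_reward(homogeneous, inner, sum), team_reward(specialized, inner, sum))
```

Under the convex inner operator the specialized (heterogeneous) team scores 3 against 1 for the homogeneous team; under the linear one the two tie; under the concave one specialization drops the reward to 0.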

Article 19

Title@2025-06-11 (3): A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

2506.09420v1

Authors (13): Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Chunyu Miao, Dongyuan Li, Aiwei Liu, Yue Zhou, Yankai Chen, Weizhi Zhang, Yangning Li, Liancheng Fang, Renhe Jiang, Philip S. Yu

Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems (LLM-HAS), where AI works with humans rather than replacing them. By keeping humans involved to provide guidance, answer questions, and maintain control, these systems can be more trustworthy and adaptable. Looking at examples from healthcare, finance, and software development, we show how human-AI teamwork can handle complex tasks better than AI working alone. We also discuss the challenges of building these collaborative systems and offer practical solutions. This paper argues that progress in AI should be measured not by how independent systems become, but by how well they can work with humans. The most promising future for AI is not in systems that take over human roles, but in those that enhance human capabilities through meaningful partnership.

Article 20

Title@2025-06-11 (3): Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

2312.04540v2

Authors (5): Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu, Frano Rajič, Alexandre Alahi

Modeling spatial-temporal interactions among neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to what extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of these representations, from computational formalism to real-world practice. First, we cast doubt on the notion of non-causal robustness studied in the recent CausalAgents benchmark. We show that recent representations are already partially resilient to perturbations of non-causal agents, yet modeling indirect causal effects involving mediator agents remains challenging. To address this challenge, we introduce a metric learning approach that regularizes latent representations with causal annotations. Our controlled experiments show that this approach not only leads to higher degrees of causal awareness but also yields stronger out-of-distribution robustness. To further operationalize it in practice, we propose a sim-to-real causal transfer method via cross-domain multi-task learning. Experiments on pedestrian datasets show that our method can substantially boost generalization, even in the absence of real-world causal annotations. We hope our work provides a new perspective on the challenges and pathways towards causally-aware representations of multi-agent interactions. Our code is available at https://github.com/vita-epfl/CausalSim2Real.
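
One way to regularize latent representations with causal annotations is a margin ranking loss on embedding distances. The sketch below assumes, for illustration, that each training scene has an embedding of the full scene plus embeddings with one non-causal and one causal agent removed; the loss asks the causal removal to move the embedding farther away, by at least `margin`. The names and the exact form are assumptions, not the paper's objective. In training, such a term would typically be added to the forecasting loss with some weight.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def causal_margin_loss(z_full: Sequence[float],
                       z_without_noncausal: Sequence[float],
                       z_without_causal: Sequence[float],
                       margin: float = 1.0) -> float:
    """Hinge loss: removing a causal agent should shift the scene embedding
    farther (by at least `margin`) than removing a non-causal agent."""
    d_noncausal = euclidean(z_full, z_without_noncausal)
    d_causal = euclidean(z_full, z_without_causal)
    return max(0.0, d_noncausal - d_causal + margin)
```

The loss is zero once the ordering is respected with enough margin, so it only reshapes representations that confuse causal and non-causal neighbors.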

Article 21

Title@2025-06-11 (3): MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

2506.07400v2

Authors (9): Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu

The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.

Article 22

Title@2025-06-11 (3): Intelligent System of Emergent Knowledge: A Coordination Fabric for Billions of Minds

2506.09335v1

Authors (2): Moshi Wei, Sparks Li

The Intelligent System of Emergent Knowledge (ISEK) establishes a decentralized network where human and artificial intelligence agents collaborate as peers, forming a self-organizing cognitive ecosystem. Built on Web3 infrastructure, ISEK combines three fundamental principles: (1) a decentralized multi-agent architecture resistant to censorship, (2) symbiotic AI-human collaboration with equal participation rights, and (3) resilient self-adaptation through distributed consensus mechanisms. The system implements an innovative coordination protocol featuring a six-phase workflow (Publish, Discover, Recruit, Execute, Settle, Feedback) for dynamic task allocation, supported by robust fault tolerance and a multidimensional reputation system. Economic incentives are governed by the native $ISEK token, facilitating micropayments, governance participation, and reputation tracking, while agent sovereignty is maintained through NFT-based identity management. This synthesis of blockchain technology, artificial intelligence, and incentive engineering creates an infrastructure that actively facilitates emergent intelligence. ISEK represents a paradigm shift from conventional platforms, enabling the organic development of large-scale, decentralized cognitive systems where autonomous agents collectively evolve beyond centralized constraints.
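
The six-phase workflow can be pictured as a simple state machine. The sketch below is a minimal illustration that assumes a strictly linear ordering of the phases. The abstract names the phases but does not specify transition rules, retries, or how fault tolerance re-enters earlier phases, and the names `Phase`, `next_phase`, and `run_task` are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """The six ISEK workflow phases named in the abstract, in order."""
    PUBLISH = 1
    DISCOVER = 2
    RECRUIT = 3
    EXECUTE = 4
    SETTLE = 5
    FEEDBACK = 6


# Assumption: phases run strictly in this order, with no retries or back-edges.
_ORDER: List[Phase] = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after `current`, or None once Feedback closes the task."""
    idx = _ORDER.index(current)
    return _ORDER[idx + 1] if idx + 1 < len(_ORDER) else None


def run_task(task_id: str) -> List[str]:
    """Walk one task through every phase and return the trace of phases visited."""
    trace: List[str] = []
    phase: Optional[Phase] = Phase.PUBLISH
    while phase is not None:
        trace.append(f"{task_id}:{phase.name}")
        phase = next_phase(phase)
    return trace
```

A fault-tolerant variant would add back-edges, for example from Execute to Recruit when an agent fails, which this linear sketch deliberately leaves out.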

Article 23

Title@2025-06-11 (3): Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

2506.09331v1

Authors (1): Arjun Vaithilingam Sudhakar

Modern Large Language Models (LLMs) exhibit impressive zero-shot and few-shot generalization capabilities across complex natural language tasks, enabling their widespread use as virtual assistants for diverse applications such as translation and summarization. Despite being trained solely on large corpora of text without explicit supervision on author intent, LLMs appear to infer the underlying meaning of textual interactions. This raises a fundamental question: can LLMs model and reason about the intentions of others, i.e., do they possess a form of theory of mind? Understanding others’ intentions is crucial for effective collaboration, which underpins human societal success and is essential for cooperative interactions among multiple agents, including humans and autonomous systems. In this work, we investigate the theory of mind in LLMs through the lens of cooperative multi-agent reinforcement learning (MARL), where agents learn to collaborate via repeated interactions, mirroring human social reasoning. Our approach aims to enhance artificial agents’ ability to adapt and cooperate with both artificial and human partners. By leveraging LLM-based agents capable of natural language interaction, we move towards creating hybrid human-AI systems that can foster seamless collaboration, with broad implications for the future of human-artificial interaction.

Article 24

Title@2025-06-10 (2): Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

2502.04388v2

Authors (6): Hepeng Li, Yuhong Liu, Jun Yan, Jie Gao, Xiaoou Yang, Mohamed Naili

Artificial Intelligence (AI) agents capable of autonomous learning and independent decision-making hold great promise for addressing complex challenges across various critical infrastructure domains, including transportation, energy systems, and manufacturing. However, the surge in the design and deployment of AI systems, driven by various stakeholders with distinct and unaligned objectives, introduces a crucial challenge: How can uncoordinated AI systems coexist and evolve harmoniously in shared environments without creating chaos or compromising safety? To address this, we advocate for a fundamental rethinking of existing multi-agent frameworks, such as multi-agent systems and game theory, which are largely limited to predefined rules and static objective structures. We posit that AI agents should be empowered to adjust their objectives dynamically, make compromises, form coalitions, and safely compete or cooperate through evolving relationships and social feedback. Through two case studies in critical infrastructure applications, we call for a shift toward the emergent, self-organizing, and context-aware nature of these multi-agentic AI systems.

Article 25

Title@2025-06-10 (2): A Replica for our Democracies? On Using Digital Twins to Enhance Deliberative Democracy

2504.07138v2

Authors (5): Claudio Novelli, Javier Argota Sánchez-Vaquerizo, Dirk Helbing, Antonino Rotolo, Luciano Floridi

Deliberative democracy depends on carefully designed institutional frameworks, such as participant selection, facilitation methods, and decision-making mechanisms, that shape how deliberation performs. However, identifying optimal institutional designs for specific contexts remains challenging when relying solely on real-world observations or laboratory experiments: they can be expensive, ethically and methodologically tricky, or too limited in scale to give us clear answers. Computational experiments offer a complementary approach, enabling researchers to conduct large-scale investigations while systematically analyzing complex dynamics, emergent and unexpected collective behavior, and risks or opportunities associated with novel democratic designs. Therefore, this paper explores Digital Twin (DT) technology as a computational testing ground for deliberative systems (with potential applicability to broader institutional analysis). By constructing dynamic models that simulate real-world deliberation, DTs allow researchers and policymakers to rigorously test “what-if” scenarios across diverse institutional configurations in a controlled virtual environment. This approach facilitates evidence-based assessment of novel designs using synthetically generated data, bypassing the constraints of real-world or lab-based experimentation, and without societal disruption. The paper also discusses the limitations of this new methodological approach and suggests where future research should focus.

Article 26

Title@2025-06-10 (2): Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms

2506.09195v1

Authors (2): Haoran Peng, Ying-Jun Angela Zhang

This research focuses on optimizing multi-UAV systems with dual objectives: maximizing service coverage as the primary goal while extending battery lifetime as the secondary objective. We propose a Graph Attention-based Decentralized Actor-Critic (GADC) to optimize the dual objectives. The proposed approach leverages a graph attention network to process UAVs’ limited local observations and reduce the dimension of the environment states. Subsequently, an actor-double-critic network is developed to manage dual policies for joint objective optimization. The proposed GADC uses a Kullback-Leibler (KL) divergence factor to balance the tradeoff between coverage performance and battery lifetime in the multi-UAV system. We assess the scalability and efficiency of GADC through comprehensive benchmarking against state-of-the-art methods, considering both theoretical and experimental aspects. Extensive testing in both ideal settings and NVIDIA Sionna’s realistic ray-tracing environment demonstrates GADC’s superior performance.
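
The abstract does not give the form of the KL divergence factor. As a hedged sketch only, the code below computes KL(p || q) between two discrete action distributions and uses it in a hypothetical objective for the secondary (battery) policy that penalizes drift away from the coverage-maximizing primary policy with a weight `beta`. This is an assumed formulation for intuition, not the GADC loss, and `secondary_objective` is a hypothetical name.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same action set.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute nothing.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi)
    return total


def secondary_objective(battery_value: float,
                        pi_secondary: Sequence[float],
                        pi_primary: Sequence[float],
                        beta: float) -> float:
    """Hypothetical blended objective: reward battery lifetime (secondary goal)
    minus a KL penalty, weighted by `beta`, for straying from the primary
    (coverage) policy."""
    return battery_value - beta * kl_divergence(pi_secondary, pi_primary)
```

A larger `beta` makes the secondary policy defer more strongly to coverage; a smaller one lets battery savings pull the behavior further away.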

Article 27

Title@2025-06-10 (2): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

2506.09046v1

Authors (5): Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative “team” focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase: drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.
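
The layer-as-team analogy in the forward phase can be sketched as function composition: each layer runs its team of agents on the current input, and an aggregation function merges their outputs into the next layer's input. This is a minimal illustration with hypothetical names (`Agent`, `Aggregator`, `forward`) and plain Python callables standing in for LLM calls; it omits dynamic task decomposition and the textual backward phase entirely.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: an "agent" maps text to text (e.g., one LLM call),
# and an aggregator merges a team's outputs into the next layer's input.
Agent = Callable[[str], str]
Aggregator = Callable[[List[str]], str]
Layer = Tuple[List[Agent], Aggregator]


def forward(layers: List[Layer], task: str) -> str:
    """Forward pass: every member of a layer's team processes the current input,
    then the layer's aggregator combines their outputs for the next layer."""
    x = task
    for team, aggregate in layers:
        outputs = [agent(x) for agent in team]  # all team members see the same input
        x = aggregate(outputs)
    return x
```

In this picture, the backward phase would revise the agents' prompts or the aggregators based on textual feedback about the final output, much as gradients revise weights.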

Article 28

Title@2025-06-10 (2): Confidence Boosts Trust-Based Resilience in Cooperative Multi-Robot Systems

2506.08807v1

Authors (4): Luca Ballotta, Áron Vékássy, Stephanie Gil, Michal Yemini

Wireless communication-based multi-robot systems open the door to cyberattacks that can disrupt the safety and performance of collaborative robots. The physical channel supporting inter-robot communication offers an attractive opportunity to decouple the detection of malicious robots from task-relevant data exchange between legitimate robots. Yet, trustworthiness indications coming from physical channels are uncertain and must be handled with this in mind. In this paper, we propose a resilient protocol for multi-robot operation wherein a parameter $\lambda_t$ accounts for how confident a robot is about the legitimacy of nearby robots that the physical channel indicates. Analytical results prove that our protocol achieves resilient coordination with arbitrarily many malicious robots under mild assumptions. Tuning $\lambda_t$ allows a designer to trade off near-optimal inter-robot coordination against quick task execution; see Fig. 1. This is a fundamental performance tradeoff and must be carefully evaluated based on the task at hand. The effectiveness of our approach is numerically verified with experiments involving platoons of autonomous cars where some vehicles are maliciously spoofed.

Article 29

Title@2025-06-10 (2): FREIDA: A Framework for developing quantitative agent based models based on qualitative expert knowledge

2308.00505v3

Authors (3): Frederike Oetker, Vittorio Nespeca, Rick Quax

Agent-Based Models (ABMs) often deal with systems where there is a lack of quantitative data or where quantitative data alone may be insufficient to fully capture the complexities of real-world systems. Expert knowledge and qualitative insights, such as those obtained through interviews, ethnographic research, historical accounts, or participatory workshops, are critical in constructing realistic behavioral rules, interactions, and decision-making processes within these models. However, there is a lack of systematic approaches that are able to incorporate both qualitative and quantitative data across the entire modeling cycle. To address this, we propose FREIDA (FRamework for Expert-Informed Data-driven Agent-based models), a systematic mixed-methods framework to develop, train, and validate ABMs, particularly in data-sparse contexts. Our main technical innovation is to extract what we call Expected System Behaviors (ESBs) from qualitative data, which are testable statements that can be evaluated on model simulations. Divided into Calibration Statements (CS) for model calibration and Validation Statements (VS) for model validation, they provide a quantitative scoring mechanism on the same footing as quantitative data. In this way, qualitative insights can inform not only model specification but also its parameterization and assessment of fitness for purpose, which is a long-standing challenge. We illustrate the application of FREIDA through a case study of criminal cocaine networks in the Netherlands.
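
One way to picture Expected System Behaviors is as named predicates over simulation output, tagged as calibration (CS) or validation (VS) statements and scored by the fraction that the simulations satisfy. The sketch below assumes exactly that scoring rule and a dictionary of summary statistics per simulation run; both are assumptions for illustration rather than FREIDA's actual scoring mechanism, and `ExpectedBehavior` and `score` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: each simulation run is summarized as named statistics.
SimOutput = Dict[str, float]


@dataclass
class ExpectedBehavior:
    """A testable statement derived from qualitative data (an ESB)."""
    name: str
    kind: str  # "CS" = calibration statement, "VS" = validation statement
    holds: Callable[[SimOutput], bool]


def score(esbs: List[ExpectedBehavior], runs: List[SimOutput], kind: str) -> float:
    """Fraction of (statement, run) pairs of one kind that the simulations satisfy."""
    checks = [e.holds(run) for e in esbs if e.kind == kind for run in runs]
    return sum(checks) / len(checks) if checks else 0.0
```

Under this sketch, calibration would search model parameters to maximize the CS score, and validation would then report the VS score on the statements held out from calibration.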

Article 30

Title@2025-06-10 (2): FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

2410.15876v4

Authors (8): Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun

Multi-agent reinforcement learning (MARL) has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory – a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating an in-domain setting during OOD inference. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
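
A minimal sketch of the dropout idea follows. It assumes the observation is a list of per-entity feature vectors and that "in-domain" means observing no more entities than were present during training (`max_in_domain`, a hypothetical parameter). The published method's exact dropout scheme may differ; this only shows how stochastic entity dropout keeps the observation size within the training range.

```python
import random
from typing import List, Sequence

Entity = Sequence[float]  # hypothetical per-entity feature vector


def flicker_dropout(entities: List[Entity], max_in_domain: int,
                    rng: random.Random) -> List[Entity]:
    """Randomly drop entities so that at most `max_in_domain` remain.

    Resampling the kept subset at every step lets different entities "flicker"
    in and out of view while the observation size stays within the training range.
    """
    if len(entities) <= max_in_domain:
        return list(entities)  # already in-domain: nothing to drop
    keep = sorted(rng.sample(range(len(entities)), max_in_domain))
    return [entities[i] for i in keep]  # preserve the original entity order
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible in tests while still varying from step to step during rollouts.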

Article 31

Title@2025-06-09 (1): Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy

2401.00776v2

Authors (1): Qin Yang

In recent years, edge computing has served as a paradigm that enables many future technologies like AI, robotics, IoT, and high-speed wireless sensor networks (like 5G) by connecting cloud computing facilities and services to the end users. Especially in medical and healthcare applications, it supports remote patient monitoring and voluminous multimedia data. From the robotics angle, robot-assisted therapy (RAT) is an active-assistive robotic technology in rehabilitation robotics, attracting researchers to study it and to benefit people with disabilities, such as children with autism spectrum disorder (ASD). However, the main challenge of RAT is building a model that can detect the affective states of people with ASD and recall individual preferences. Moreover, involving expert diagnosis and recommendations to guide robots in updating the therapy approach to adapt to different statuses and scenarios is a crucial part of the ASD therapy process. This paper proposes an edge cognitive computing architecture that combines human experts and assistive robots collaborating in the same framework to achieve seamless remote diagnosis, round-the-clock symptom monitoring, emergency warning, therapy alteration, and advanced assistance.

Article 32

Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling

2401.05572v2

Authors (1): Qin Yang

In multi-agent systems (MAS), the dynamic interaction among multiple decision-makers is driven by their innate values; it affects the state of the environment and can cause specific behavioral patterns to emerge. On the other hand, innate values in cognitive modeling reflect individual interests and preferences for specific tasks and drive individuals to develop diverse skills and plans, satisfying their various needs and achieving common goals through cooperation. Therefore, making AI agents aware of how to balance group utilities against system costs while meeting group members’ needs during cooperation is crucial if individuals are to learn to support their community and, in the long term, integrate into human society. However, the current MAS reinforcement learning domain lacks a general intrinsic model that describes agents’ dynamic motivation for decision-making and learning from the perspective of individual needs in cooperation. To address this gap, this paper proposes a general MAS innate-values reinforcement learning (IVRL) architecture from the perspective of individual preferences. We tested the Multi-Agent IVRL Actor-Critic Model in different StarCraft Multi-Agent Challenge (SMAC) settings, which demonstrated its potential to organize the group’s behaviours to achieve better performance.

Article 33

Title@2025-06-09 (1): Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures

2502.06963v2

Authors (3): Ashab Uddin, Ahmed Hamdi Sakr, Ning Zhang

The increasing complexity of Intelligent Transportation Systems (ITS) has led to significant interest in computational offloading to external infrastructures such as edge servers, vehicular nodes, and UAVs. These dynamic and heterogeneous environments pose challenges for traditional offloading strategies, prompting the exploration of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) as adaptive decision-making frameworks. This survey presents a comprehensive review of recent advances in DRL-based offloading for vehicular edge computing (VEC). We classify and compare existing works based on learning paradigms (e.g., single-agent, multi-agent), system architectures (e.g., centralized, distributed, hierarchical), and optimization objectives (e.g., latency, energy, fairness). Furthermore, we analyze how Markov Decision Process (MDP) formulations are applied and highlight emerging trends in reward design, coordination mechanisms, and scalability. Finally, we identify open challenges and outline future research directions to guide the development of robust and intelligent offloading strategies for next-generation ITS.
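
To make the MDP framing concrete, the sketch below scores a binary offloading decision for one task under a common simplified model. Local latency is CPU cycles divided by CPU frequency; offloading adds an upload time of bits divided by uplink rate; local energy follows a dynamic-power model with energy per cycle proportional to $f^2$ (constant `kappa`); and offloading energy is transmit power times upload time. These formulas, the weight `w_energy`, and all names are assumptions for illustration, not taken from the survey. The surveyed works use many different models, and a DRL agent would learn a policy over such costs rather than choose greedily.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float  # CPU cycles required
    bits: float    # input data size to upload, in bits


@dataclass
class State:
    """Hypothetical MDP state for one vehicle deciding where to run a task."""
    f_local: float  # vehicle CPU frequency (cycles/s)
    f_edge: float   # edge-server CPU share allocated to this vehicle (cycles/s)
    rate: float     # current uplink rate (bits/s)
    p_tx: float     # transmit power (W)
    kappa: float    # effective switched-capacitance constant for local energy


def cost(s: State, t: Task, offload: bool, w_energy: float) -> float:
    """Weighted latency + energy of one decision; an RL reward would be its negative."""
    if offload:
        upload = t.bits / s.rate
        latency = upload + t.cycles / s.f_edge
        energy = s.p_tx * upload  # the vehicle pays only for transmission
    else:
        latency = t.cycles / s.f_local
        energy = s.kappa * s.f_local ** 2 * t.cycles
    return latency + w_energy * energy


def greedy_action(s: State, t: Task, w_energy: float) -> bool:
    """One-step baseline a DRL agent should beat: pick the cheaper option now."""
    return cost(s, t, True, w_energy) < cost(s, t, False, w_energy)
```

A learned policy earns its keep when the uplink rate and edge load change over time, so that today's cheap choice affects tomorrow's state; the greedy rule ignores that coupling.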

Article 34

Title@2025-06-09 (1): Diffusion of Responsibility in Collective Decision Making

2506.07935v1

Authors (2): Pavel Naumov, Jia Tao

The term “diffusion of responsibility” refers to situations in which multiple agents share responsibility for an outcome, obscuring individual accountability. This paper examines this frequently undesirable phenomenon in the context of collective decision-making mechanisms. The work shows that if a decision is made by two agents, then the only way to avoid diffusion of responsibility is for one agent to act as a “dictator”, making the decision unilaterally. In scenarios with more than two agents, any diffusion-free mechanism is an “elected dictatorship”, where the agents elect a single agent to make a unilateral decision. The technical results are obtained by defining a bisimulation of decision-making mechanisms, proving that bisimulation preserves responsibility-related properties, and establishing the results for a smallest bisimilar mechanism.
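
The structure of an elected dictatorship can be sketched in two stages: a vote selects a single agent, and the outcome is then whatever that agent alone chooses. The plurality vote and name-based tie-break below are assumptions made to keep the sketch deterministic; the abstract characterizes the class of diffusion-free mechanisms, not a particular election rule, and `elected_dictatorship` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Tuple


def elected_dictatorship(votes: Dict[str, str],
                         preferred: Dict[str, str]) -> Tuple[str, str]:
    """Two-stage mechanism: elect one agent, then apply only that agent's choice.

    votes: voter -> candidate; preferred: agent -> the decision it would make.
    """
    tally = Counter(votes.values())
    top = max(tally.values())
    # Assumption for this sketch: break ties by choosing the alphabetically first name.
    dictator = min(c for c, n in tally.items() if n == top)
    return dictator, preferred[dictator]  # the outcome depends on a single agent
```

Because the final decision traces back to exactly one agent, responsibility for the outcome cannot be spread across the group, which is the property the paper's characterization turns on.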

Article 35

Title@2025-06-09 (1): Agent Semantics, Semantic Spacetime, and Graphical Reasoning

2506.07756v1

Authors (1): Mark Burgess

Some formal aspects of the Semantic Spacetime graph model are presented, with reference to its use for directed knowledge representations and process modelling. A finite $\gamma(3,4)$ representation is defined to form a closed set of operations that can scale to any degree of semantic complexity. The Semantic Spacetime postulates bring predictability with minimal constraints to pathways in graphs. The ubiquitous appearance of absorbing states in any partial graph means that a graph process leaks information. This is closely associated with the issue of division by zero, which signals a loss of closure and the need for manual injection of remedial information. The origins of the Semantic Spacetime model in Promise Theory help to clarify how such absorbing states are associated with boundary information where intentionality can enter.
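
As a generic illustration of absorbing states, and not the paper's $\gamma(3,4)$ formalism, the sketch below finds nodes in a directed graph that a walk can enter but never leave. It treats a node as absorbing when all of its outgoing edges, if any, point back to itself; this definition and the name `absorbing_states` are assumptions chosen for the example.

```python
from typing import Dict, List, Set


def absorbing_states(edges: Dict[str, List[str]]) -> Set[str]:
    """Return nodes that a walk can enter but never leave.

    A node counts as absorbing when every outgoing edge (possibly none) leads back
    to itself, so dead-end sinks and pure self-loops both qualify.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)  # include nodes that only appear as edge targets
    return {n for n in nodes if all(t == n for t in edges.get(n, []))}
```

In a partial graph, nodes whose outgoing links simply have not been recorded yet show up here as sinks, which is one concrete way a traversal can "leak" out of the known structure.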

Article 36

Title@2025-06-09 (1): Deep Equivariant Multi-Agent Control Barrier Functions

2506.07755v1

Authors (3): Nikolaos Bousias, Lars Lindemann, George Pappas

With multi-agent systems increasingly deployed autonomously at scale in complex environments, ensuring the safety of data-driven policies is critical. Control Barrier Functions (CBFs) have emerged as an effective tool for enforcing safety constraints, yet existing learning-based methods often lack scalability, generalization, and sampling efficiency because they overlook inherent geometric structures of the system. To address this gap, we introduce symmetry-infused distributed Control Barrier Functions, enforcing the satisfaction of intrinsic symmetries on learnable graph-based safety certificates. We theoretically motivate the need for equivariant parametrization of CBFs and policies, and propose a simple yet efficient and adaptable methodology for constructing such equivariant group-modular networks via the compatible group actions. This approach encodes safety constraints in a distributed, data-efficient manner, enabling zero-shot generalization to larger and denser swarms. Through extensive simulations on multi-robot navigation tasks, we demonstrate that our method outperforms state-of-the-art baselines in terms of safety, scalability, and task success rates, highlighting the importance of embedding symmetries in safe distributed neural policies.
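
For intuition about why symmetry matters, here is a hand-written pairwise barrier function, not the learned graph-based certificate from the paper. It assumes single-integrator dynamics ($\dot p = u$) and the barrier $h = \lVert p_i - p_j \rVert^2 - d_{\text{safe}}^2$, checks the standard CBF condition $\dot h + \alpha h \ge 0$, and includes a planar rigid-motion helper to show that $h$ is unchanged when both agents are rotated and translated together. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def h(pi: Vec, pj: Vec, d_safe: float) -> float:
    """Pairwise barrier: non-negative exactly when the agents are at least d_safe apart."""
    dx, dy = pi[0] - pj[0], pi[1] - pj[1]
    return dx * dx + dy * dy - d_safe ** 2


def cbf_condition_holds(pi: Vec, pj: Vec, ui: Vec, uj: Vec,
                        d_safe: float, alpha: float) -> bool:
    """Check h_dot + alpha * h >= 0 for single-integrator agents (p_dot = u)."""
    dx, dy = pi[0] - pj[0], pi[1] - pj[1]
    h_dot = 2.0 * (dx * (ui[0] - uj[0]) + dy * (ui[1] - uj[1]))
    return h_dot + alpha * h(pi, pj, d_safe) >= 0.0


def rotate_translate(p: Vec, theta: float, t: Vec) -> Vec:
    """Apply a planar rigid motion: rotate by theta about the origin, then shift by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Because $h$ depends only on relative distance, it is invariant under rigid motions of the whole group; the paper argues for building such symmetries into learned CBFs and policies rather than relying on a network to discover them from data.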

Article 37

Title@2025-06-09 (1): WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

2502.08047v3

Authors (5): Henry Hengyuan Zhao, Kaiming Yang, Wendi Yu, Difei Gao, Mike Zheng Shou

GUI agents have achieved outstanding performance in GUI element grounding. However, planning remains highly challenging, especially due to sensitivity to the initial state of the environment. Specifically, slight differences in the initial state, such as the target software not being open or the interface not being in its default state, often lead to planning errors. This issue is widespread in real application scenarios, but existing benchmarks fail to evaluate it. To address this gap, we introduce WorldGUI, a comprehensive GUI benchmark containing tasks across ten widely used desktop and web applications (e.g., PowerPoint, VSCode, Acrobat), each instantiated with diverse initial states to simulate authentic human-computer interactions. Complementing this, we propose WorldGUI-Agent, a universal framework that unifies three core modules: Planner-Critic for high-level plan refinement, Step-Check for intermediate verification, and Actor-Critic for action-level optimization to proactively detect and correct errors. Experimental evaluation shows that WorldGUI-Agent outperforms the leading existing model (Claude-3.5 Computer Use) by 12.4% in success rate on WorldGUI and achieves a 31.2% overall success rate on WindowsAgentArena, surpassing the prior state of the art by 11.7%. Our analysis further reveals that dynamic augmentation tasks and desktop environments pose substantial hurdles, underscoring the necessity of adaptive planning and feedback-driven execution for advancing real-world GUI automation. The code and data are available at https://github.com/showlab/WorldGUI.

Article 38

Title@2025-06-09 (1): Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Title: Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Verfolgung beweglicher Ziele mit Online-Selbstspiel-Verstärkung Lernen für sicherere Sprachmodelle

利用在线加强自身能力学习,建立更安全语言模式,以追踪移动目标 2506.07468v1

Authors (7): Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques

Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, and defensive fine-tuning then patches the exposed vulnerabilities. This sequential approach creates a mismatch: attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play reinforcement learning algorithm in which an attacker agent and a defender agent co-evolve through continuous interaction. We cast safety alignment as a two-player zero-sum game in which a single model alternates between the attacker and defender roles (generating adversarial prompts and safeguarding against them), while a reward LM adjudicates the outcomes. This enables dynamic co-adaptation. Grounded in the game-theoretic framework of zero-sum games, we establish a theoretical safety guarantee that motivates the design of our method: if self-play converges to a Nash equilibrium, the defender will reliably produce safe responses to any adversarial input. Empirically, Self-RedTeam uncovers more diverse attacks (+21.8% SBERT) than attackers trained against static defenders, and it achieves higher robustness on safety benchmarks (e.g., +65.5% on WildJailBreak) than defenders trained against static attackers. We further propose hidden Chain-of-Thought, which lets agents plan privately, boosting adversarial diversity and reducing over-refusals. Our results motivate a shift from reactive patching to proactive co-evolution in LM safety training, enabling scalable, autonomous, and robust self-improvement of LMs via multi-agent reinforcement learning (MARL).
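
Self-RedTeam's safety guarantee rests on the Nash equilibrium of a two-player zero-sum game. The sketch below is not the paper's algorithm, which trains a single LM in both roles with RL. Instead, it uses classical fictitious play, which Robinson (1951) proved converges in empirical frequencies for finite zero-sum games. The 2x2 attacker-versus-defender payoff matrix is hypothetical.

```python
import numpy as np

def fictitious_play(payoff, iters=50_000):
    """Fictitious play on a two-player zero-sum matrix game.

    payoff[i, j] is the row player's (attacker's) payoff when the row player uses
    strategy i and the column player (defender) uses j; the defender receives -payoff[i, j].
    Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts, col_counts = np.zeros(n_rows), np.zeros(n_cols)
    row_counts[0] = col_counts[0] = 1  # arbitrary first moves
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical strategy so far.
        row_best = np.argmax(payoff @ (col_counts / col_counts.sum()))
        col_best = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[row_best] += 1
        col_counts[col_best] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


# Hypothetical toy game: 2 attack styles x 2 defense styles.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
attacker, defender = fictitious_play(A)
print(attacker.round(2), defender.round(2), round(float(attacker @ A @ defender), 2))
```

Solving this game by hand gives the equilibrium (0.4, 0.6) for both players, with a game value of 0.2. The empirical frequencies approach it slowly, which is one reason practical methods use learned policies rather than exhaustive best responses.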

Article 39

Title@2025-06-09 (1): Multi-agent Architecture Search via Agentic Supernet

Title: Multi-agent Architecture Search via Agentic Supernet

Multi-Agent Architektur Suche über Agentic Supernet

通过 Agric Supernet 多剂机构建筑搜索 2502.04180v2

Authors (6): Guibin Zhang, Luyang Niu, Junfeng Fang, Kun Wang, Lei Bai, Xiang Wang

Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration and interaction, but constructing these systems often requires labor-intensive manual design. Although methods exist to automate the design of agentic workflows, they typically seek a single static, complex, one-size-fits-all system, which cannot dynamically allocate inference resources according to the difficulty and domain of each query. To address this challenge, we move away from the pursuit of a monolithic agentic system and instead optimize the agentic supernet, a probabilistic and continuous distribution of agentic architectures. We introduce MaAS, an automated framework that samples query-dependent agentic systems from the supernet, delivering high-quality solutions with tailored resource allocation (e.g., LLM calls, tool calls, token cost). Comprehensive evaluation across six benchmarks demonstrates that MaAS (I) requires only 6% to 45% of the inference costs of existing handcrafted or automated multi-agent systems, (II) surpasses them by 0.54% to 11.82%, and (III) enjoys superior cross-dataset and cross-LLM-backbone transferability.
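
One way to picture a query-dependent sample from a probabilistic supernet is a stack of layers. In this picture, each layer holds a softmax distribution over agentic operators that is conditioned on the query embedding. An early-exit operator lets easy queries stop after fewer layers, and therefore fewer LLM calls. The operator names, the linear controller, and the early-exit rule are assumptions for illustration and do not come from MaAS.

```python
import numpy as np

# Hypothetical operator library; "EarlyExit" ends the sampled workflow.
OPERATORS = ["CoT", "ReAct", "Debate", "SelfRefine", "EarlyExit"]

def sample_workflow(query_emb, layer_weights, rng):
    """Sample a query-dependent chain of operators from a layered, probabilistic supernet.

    layer_weights[l] has shape (len(OPERATORS), dim) and maps the query embedding
    to one logit per operator at layer l.
    """
    workflow = []
    for W in layer_weights:
        logits = W @ query_emb
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()  # softmax over operators at this layer
        op = OPERATORS[rng.choice(len(OPERATORS), p=probs)]
        if op == "EarlyExit":
            break  # easy queries can stop early, saving LLM calls and tokens
        workflow.append(op)
    return workflow


rng = np.random.default_rng(0)
layers = [rng.normal(size=(len(OPERATORS), 8)) for _ in range(4)]
print(sample_workflow(rng.normal(size=8), layers, rng))
```

In a trained system, the controller weights would be optimized so that harder queries sample deeper, costlier workflows. Here they are random.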

Article 40

Title@2025-06-09 (1): G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Title: G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

G-Memory: Hierarchischer Speicher für Multi-Agent-Systeme

G-记忆:为多机构系统追踪等级记忆 2506.07398v1

Authors (6): Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. On close inspection, we find that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory. G-Memory manages lengthy MAS interactions via a three-tier graph hierarchy of insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that let the system leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences. After task execution, the entire hierarchy evolves by assimilating the new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to 20.89% and 10.12%, respectively, without any modifications to the original frameworks. Our code is available at https://github.com/bingreeky/GMemory.
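
The three tiers and the two traversal directions can be sketched with plain data structures. In this sketch, past queries link upward to shared insights and downward to their condensed trajectories. Retrieval ranks past queries by word-overlap (Jaccard) similarity, which is a crude stand-in for whatever embedding-based retrieval a real system would use. All class and method names are hypothetical and are not taken from the G-Memory code.

```python
from dataclasses import dataclass, field

@dataclass
class QueryNode:
    """Query-graph tier: one past task, linked up to insights and down to its trajectory."""
    text: str
    trajectory: list                               # condensed interaction trajectory (interaction tier)
    insight_ids: set = field(default_factory=set)  # upward links to the insight tier

class HierarchicalMemory:
    def __init__(self):
        self.insights = {}  # insight tier: id -> generalizable lesson
        self.queries = []   # query tier

    def add_insight(self, insight_id, text):
        self.insights[insight_id] = text

    def add(self, text, trajectory, insight_ids=()):
        """Assimilate a finished task so later queries can reuse it."""
        self.queries.append(QueryNode(text, list(trajectory), set(insight_ids)))

    @staticmethod
    def _similarity(a, b):
        # Jaccard word overlap: a crude stand-in for embedding similarity.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def retrieve(self, query, k=1):
        """Bi-directional traversal from the k most similar past queries."""
        nearest = sorted(self.queries, key=lambda q: self._similarity(query, q.text), reverse=True)[:k]
        insights = sorted({self.insights[i] for q in nearest for i in q.insight_ids})  # upward
        trajectories = [q.trajectory for q in nearest]                                 # downward
        return insights, trajectories


mem = HierarchicalMemory()
mem.add_insight(0, "check object locations before acting")
mem.add_insight(1, "separate drafting and reviewing roles")
mem.add("heat the apple in the microwave", ["planner: locate apple", "actor: open microwave"], [0])
mem.add("summarize the quarterly report", ["writer: draft summary", "critic: review"], [1])
insights, trajectories = mem.retrieve("heat the mug in the microwave")
print(insights, trajectories)
```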

Article 41

Title@2025-06-09 (1): Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

Title: Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

Shapley-Coop: Kreditvergabe für die emergente Zusammenarbeit bei selbstinteressanten LLM-Agenten

Shapley-Coop:在自利的LLM代理商中进行新兴合作的信用分配 2506.07388v1

Authors (6): Yun Hua, Haosheng Chen, Shiqin Wang, Wenhao Li, Xiangfeng Wang, Jun Luo

Large Language Models (LLMs) show strong collaborative performance in multi-agent systems with predefined roles and workflows. However, in open-ended environments that lack coordination rules, agents tend to act in self-interested ways. The central challenge in achieving coordination lies in credit assignment: fairly evaluating each agent’s contribution and designing pricing mechanisms that align their heterogeneous goals. This problem is critical as LLMs increasingly participate in complex human-AI collaborations, where fair compensation and accountability rely on effective pricing mechanisms. Inspired by how human societies address similar coordination challenges (e.g., through temporary collaborations such as employment or subcontracting), we propose a cooperative workflow, Shapley-Coop. Shapley-Coop integrates Shapley Chain-of-Thought, which uses marginal contributions as a principled basis for pricing, with structured negotiation protocols for effective price matching. Together, these enable LLM agents to coordinate through rational task-time pricing and post-task reward redistribution. This approach aligns agent incentives, fosters cooperation, and maintains autonomy. We evaluate Shapley-Coop on two multi-agent games and a software engineering simulation, demonstrating that it consistently enhances LLM agent collaboration and facilitates equitable credit assignment. These results highlight the effectiveness of Shapley-Coop’s pricing mechanisms in accurately reflecting individual contributions during task execution.
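
The Shapley value formalizes the principle invoked above, which is pricing each agent by its marginal contribution. Below is an exact computation for a small, hypothetical three-agent team. The coalition values are invented for illustration, and the sketch does not reproduce Shapley-Coop's Chain-of-Thought or negotiation protocol.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's weighted average marginal contribution.

    value(coalition) -> float is the characteristic function, called with frozensets.
    Cost is exponential in the number of players, so this only suits small teams.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a uniformly random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical task values for every subset of three LLM agents.
v = {
    frozenset(): 0, frozenset({"coder"}): 6, frozenset({"tester"}): 0, frozenset({"reviewer"}): 0,
    frozenset({"coder", "tester"}): 10, frozenset({"coder", "reviewer"}): 8,
    frozenset({"tester", "reviewer"}): 2, frozenset({"coder", "tester", "reviewer"}): 12,
}
phi = shapley_values(["coder", "tester", "reviewer"], v.__getitem__)
print({k: round(x, 2) for k, x in phi.items()})  # {'coder': 8.33, 'tester': 2.33, 'reviewer': 1.33}
```

The three shares sum to 12, the value of the full team. This efficiency property is what makes Shapley-style credit usable as a budget-balanced post-task reward split.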

Article 42

Title@2025-06-09 (1): Digital Twin-based Smart Manufacturing: Dynamic Line Reconfiguration for Disturbance Handling

Title: Digital Twin-based Smart Manufacturing: Dynamic Line Reconfiguration for Disturbance Handling

Digital Twin-based Smart Manufacturing: Dynamic Line Reconfiguration for Disturbance Handling

数字双对数字智能制造:为处理骚乱而重新配置动态线路 2506.07332v1

Authors (9): Bo Fu, Mingjie Bi, Shota Umeda, Takahiro Nakano, Youichi Nonaka, Quan Zhou, Takaharu Matsui, Dawn M. Tilbury, Kira Barton

The increasing complexity of modern manufacturing, coupled with demand fluctuation, supply chain uncertainty, and product customization, underscores the need for manufacturing systems that can flexibly update their configurations and swiftly adapt to disturbances. However, current research falls short of providing a holistic reconfigurable manufacturing framework that seamlessly monitors system disturbances, optimizes alternative line configurations based on machine capabilities, and automates simulation-based evaluation for swift adaptation. This paper presents a dynamic manufacturing line reconfiguration framework for handling disturbances that change operation times. The framework incorporates four components:

1. a system-process digital twin that monitors disturbances and triggers reconfiguration;
2. a capability-based ontology model that captures the available agent and resource options;
3. a configuration optimizer that generates optimal line configurations;
4. a simulation generation program that initializes simulation setups and evaluates line configurations at approximately 400 times real-time speed.

A case study of a battery production line was conducted to evaluate the proposed framework. In two implemented disturbance scenarios, the framework successfully recovers system throughput with limited resources, preventing the 26% and 63% throughput drops that would have occurred without a reconfiguration plan. The reconfiguration optimizer finds optimal solutions efficiently, taking an average of 0.03 seconds to find a reconfiguration plan for a manufacturing line with 51 operations and 40 available agents across 8 agent types.
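
The paper's optimizer searches over capability-constrained configurations. A much simpler stand-in shows why reconfiguration can recover throughput after an operation-time disturbance. This sketch assumes a serial line with identical parallel agents at each station and a greedy "reinforce the bottleneck" rule, neither of which is taken from the paper.

```python
def throughput(op_times, agents):
    """Throughput (parts per unit time) of a serial line, set by its slowest station.

    Station i needs op_times[i] time units per part and has agents[i] identical
    agents working in parallel, so its effective cycle time is op_times[i] / agents[i].
    """
    return 1.0 / max(t / n for t, n in zip(op_times, agents))

def reconfigure(op_times, agents, spare_agents):
    """Greedy sketch: assign each spare agent to the current bottleneck station."""
    agents = list(agents)
    for _ in range(spare_agents):
        bottleneck = max(range(len(agents)), key=lambda i: op_times[i] / agents[i])
        agents[bottleneck] += 1
    return agents


nominal = [10.0, 12.0, 8.0]    # hypothetical operation times per station
disturbed = [20.0, 12.0, 8.0]  # a disturbance doubles station 0's operation time
print(throughput(nominal, [1, 1, 1]), throughput(disturbed, [1, 1, 1]))  # 0.0833... (1/12) vs 0.05 (1/20)
new_agents = reconfigure(disturbed, [1, 1, 1], spare_agents=1)
print(new_agents, throughput(disturbed, new_agents))  # [2, 1, 1] restores 1/12
```

A greedy rule is not guaranteed to be optimal in general. It is used here only because it makes the bottleneck logic visible in a few lines.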

Article 43

Title@2025-06-08 (7): Very Large-scale Multi-Robot Task Allocation in Challenging Environments via Robot Redistribution

Title: Very Large-scale Multi-Robot Task Allocation in Challenging Environments via Robot Redistribution

Sehr groß angelegte Multi-Roboter-Aufgabenzuteilung in anspruchsvollen Umgebungen durch Roboterumverteilung

通过机器人再分配在挑战环境中使用极大型多机器人任务分配 2506.07293v1

Authors (3): Seabin Lee, Joonyeol Sim, Changjoo Nam

We consider the Multi-Robot Task Allocation (MRTA) problem of optimizing the assignment of multiple robots to multiple tasks in challenging environments with densely packed obstacles and narrow passages. In such environments, conventional methods that optimize the sum of costs are often ineffective because conflicts between robots incur additional costs (e.g., for collision avoidance or waiting). Moreover, an allocation that ignores the actual robot paths can cause deadlocks, which significantly degrade the collective performance of the robots. We propose a scalable MRTA method that considers the robots’ paths to avoid collisions and deadlocks, so that all tasks are completed quickly (i.e., the makespan is minimized). To incorporate robot paths into task allocation, the proposed method constructs a roadmap using a Generalized Voronoi Diagram. It then partitions the roadmap into several components to determine how to redistribute robots so that all tasks are achieved with fewer conflicts between robots. In the redistribution process, robots are transferred to their final destinations by a push-pop mechanism that follows the first-in, first-out principle. Extensive experiments show that our method can handle instances with hundreds of robots in dense clutter, whereas competing methods are unable to compute a solution within the time limit.
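
A two-robot example shows how the sum-of-costs and makespan objectives can prefer different assignments. This brute-force sketch ignores paths and inter-robot conflicts entirely, which are the paper's actual focus (handled via the Generalized Voronoi Diagram roadmap and FIFO redistribution). The cost matrix is made up.

```python
from itertools import permutations

def best_assignment(cost, objective):
    """Brute-force one-to-one robot-to-task assignment (tiny instances only).

    cost[r][t] is robot r's travel time to task t. objective aggregates the chosen
    costs: sum gives the total (sum-of-costs) criterion, max gives the makespan.
    Returns perm, where robot r is assigned task perm[r].
    """
    n = len(cost)
    return min(permutations(range(n)), key=lambda perm: objective(cost[r][perm[r]] for r in range(n)))


cost = [[1, 5],
        [5, 8]]  # hypothetical travel times
print(best_assignment(cost, sum))  # (0, 1): total 9, but the last robot finishes at 8
print(best_assignment(cost, max))  # (1, 0): total 10, but all tasks are done by 5
```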

Article 44

Title@2025-06-08 (7): Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization

Title: Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization

Verteidigung gegen verschiedene Angriffe im Federated Learning durch Konsens-basierte Bi-Level-Optimierung

通过基于共识的双级优化,在通过共识实现的两级最佳化,在联邦学习中防范多种袭击 2412.02535v2

Authors (5): Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu

Adversarial attacks pose significant challenges in many machine learning applications, particularly in distributed training and federated learning, where malicious agents seek to corrupt the training process in order to compromise the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB²O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB²O in the mean-field law in the presence of malicious agents, demonstrating the robustness of CB²O against a diverse range of attacks. In doing so, we offer insights into how specific hyperparameter choices help mitigate adversarial effects. On the practical side, we extend CB²O to the clustered federated learning setting by proposing FedCB²O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB²O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.
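
The structure of consensus-based bi-level optimization can be sketched as an interacting particle system. At each step, particles whose lower-level objective G falls in the lowest beta-quantile are kept. Their Gibbs-weighted mean under the upper-level objective F becomes the consensus point. All particles then drift toward it with multiplicative noise. This follows our reading of the CB²O idea, and the paper's exact dynamics, noise model, and hyperparameters may differ. The toy problem and all parameter values are assumptions.

```python
import numpy as np

def cb2o(F, G, dim, n=300, steps=800, alpha=10.0, beta=0.3, lam=1.0, sigma=0.8, dt=0.05, seed=0):
    """Consensus-based bi-level optimization sketch: minimize F over the minimizers of G.

    F and G map an (n, dim) array of particles to n objective values.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=2.0, size=(n, dim))
    for _ in range(steps):
        g = G(X)
        # Lower level: keep the particles whose G value lies in the lowest beta-quantile.
        selected = X[g <= np.quantile(g, beta)]
        f = F(selected)
        # Upper level: Gibbs-weighted average of the selected particles (shifted for numerical stability).
        w = np.exp(-alpha * (f - f.min()))
        consensus = (w[:, None] * selected).sum(axis=0) / w.sum()
        diff = X - consensus
        # Euler-Maruyama step: drift toward the consensus point plus noise scaled by the distance to it.
        X = X - lam * diff * dt + sigma * np.sqrt(dt) * diff * rng.normal(size=X.shape)
    return consensus


# Toy problem: G is minimized on the whole line x0 = 1; among those points, F prefers x1 = 2.
G = lambda X: (X[:, 0] - 1.0) ** 2
F = lambda X: (X[:, 1] - 2.0) ** 2
print(cb2o(F, G, dim=2).round(2))
```

The result should land close to (1, 2), the solution of this bi-level problem. In the federated setting described above, particles would instead be model parameters, and a label-flipping attacker would corrupt some of the objective evaluations.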

Article 45

Title@2025-06-08 (7): Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments

Title: Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments

Lernen als Individuen, Evolve als Team: Multi-Agent LLMs Anpassung in körpereigenen Umgebungen

作为个人学习,作为一个团队参与:多剂LMs在渗透环境中的适应 2506.07232v1

Authors (6): Xinran Li, Chenjia Bai, Zijian Li, Jiakun Zheng, Ting Xiao, Jun Zhang

Large language models (LLMs) possess extensive knowledge bases and strong reasoning capabilities, making them promising tools for complex multi-agent planning in embodied environments. However, despite LLMs’ advanced abilities and the sophisticated modular design of agentic methods, existing LLM-based planning algorithms remain limited by their weak ability to adapt to multi-agent embodied scenarios. We address this limitation by introducing a framework that enables LLM agents to learn and evolve both before and during test time. The framework equips agents with environment-relevant knowledge for better planning and with enhanced communication for better cooperation. Inspired by centralized training with decentralized execution in multi-agent reinforcement learning, we propose a Learn as Individuals, Evolve as a Team (LIET) paradigm for multi-agent LLM adaptation. At the individual level, LLM agents learn a local utility function from exploratory datasets to better comprehend the embodied environment; this function is then queried at test time to support informed decision-making. At the team level, LLM agents collaboratively and iteratively maintain and update a shared cooperation knowledge list based on new experiences, and use it to guide more effective communication. By combining individual learning with team evolution, LIET enables comprehensive and flexible adaptation for LLM agents. Our experiments on the Communicative Watch-And-Help and ThreeD-World Multi-Agent Transport benchmarks demonstrate that LIET, instantiated with both LLaMA and GPT-4o, outperforms existing baselines and exhibits strong cooperative planning abilities.

Article 46

Title@2025-06-08 (7): BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling

Title: BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modeling

BRIDGE: Bootstrapping-Text zur Steuerung der Time-Series-Generation über Multi-Agent iterative Optimierung und Diffusionsmodellierung

BRIDGE:通过多代理迭代优化和传播模型化控制时间- 系列生成的推进文本 2503.02445v4

Authors (8): Hao Li, Yu-Hao Huang, Chang Xu, Viktor Schlegel, Renhe Jiang, Riza Batista-Navarro, Goran Nenadic, Jiang Bian

Time-series generation (TSG) is a prominent research area with broad applications in simulation, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce “Text-Controlled TSG”, a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based multi-agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text descriptions to support domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets. Compared with generation without text input, it improves controllability by up to 12% in MSE and 6% in MAE, highlighting its potential for generating tailored time-series data.

Article 47

Title@2025-06-08 (7): Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies

Title: Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies

Entwicklung der Zusammenarbeit in LLM-Agent Societies: Eine Vorstudie mit unterschiedlichen Strafstrategien

LLM-Agent Sociems公司合作的演变:使用不同惩罚战略的初步研究 2504.19487v2

Authors (7): Kavindu Warnakulasuriya, Prabhash Dissanayake, Navindu De Silva, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Nisansa de Silva

The evolution of cooperation has been extensively studied using abstract mathematical models and simulations. Recent advances in Large Language Models (LLMs) and the rise of LLM agents have demonstrated their ability to perform social reasoning. This opens an opportunity to test the emergence of norms in more realistic agent-based simulations, with human-like reasoning expressed in natural language. In this research, we investigate whether the cooperation dynamics presented in Boyd and Richerson’s model persist in a more realistic simulation of the diner’s dilemma using LLM agents, in contrast to the abstract mathematical treatment in Boyd and Richerson’s work. Our findings indicate that agents follow the strategies defined in the Boyd and Richerson model, and that explicit punishment mechanisms drive norm emergence, reinforcing cooperative behaviour even when the agent strategy configuration varies. Our results suggest that LLM-based multi-agent system (MAS) simulations can in fact replicate the evolution of cooperation predicted by traditional mathematical models. Moreover, our simulations extend beyond the mathematical models by integrating natural-language-driven reasoning and a pairwise imitation method for strategy adoption, making them a more realistic testbed for cooperative behaviour in MASs.
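
The abstract does not specify payoffs or the exact imitation rule. The sketch below therefore makes two assumptions. First, it uses a standard n-person diner's dilemma with an equally split bill and invented dish values and costs. Second, it uses the Fermi rule, a common choice for pairwise imitation in evolutionary game theory, as a stand-in for the paper's imitation method. The fine parameter is a hypothetical stand-in for punishment.

```python
import math

def diner_payoff(orders_expensive, n_expensive, n, a=4.0, c=2.0, b=6.0, h=6.0, fine=0.0):
    """Payoff of one diner in an n-person diner's dilemma with an equally split bill.

    Cheap dish: value a, cost c. Expensive dish: value b, cost h. n_expensive counts
    all expensive orders, including this diner's. fine is a hypothetical penalty that
    punishers impose on diners who order the expensive dish.
    """
    bill_share = (n_expensive * h + (n - n_expensive) * c) / n
    if orders_expensive:
        return b - bill_share - fine
    return a - bill_share

def imitation_probability(own_payoff, peer_payoff, K=0.5):
    """Fermi-rule pairwise imitation: the chance of copying a peer's strategy grows with the payoff gap."""
    return 1.0 / (1.0 + math.exp(-(peer_payoff - own_payoff) / K))


# With 4 diners, everyone is better off if all order cheap (2.0 each) than if all order expensive (0.0)...
print(diner_payoff(False, 0, 4), diner_payoff(True, 4, 4))
# ...yet a lone defector gains (3.0 versus 2.0), unless a fine of 1.5 makes defection unattractive.
print(diner_payoff(True, 1, 4), diner_payoff(True, 1, 4, fine=1.5))
```

In an imitation dynamic, a cooperator paired with an unpunished defector would tend to copy the defector. A large enough fine reverses that pressure, which is the mechanism by which punishment can sustain the cooperative norm.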

利用抽象数学模型和模拟,对合作的演变进行了广泛的研究; 大型语言模型(LLM)和LLM代理商的崛起最近的进展表明,他们有能力进行社会推理,从而提供了一个机会,在更现实的代理模拟中测试规范的出现,使用自然语言进行人性推理; 在这项研究中,我们调查Boyd和Richerson的模型提出的合作动态是否持续在使用LLM代理商对餐厅困境进行更现实的模拟中,与Boyd和Richerson工作的抽象数学性质相比。我们的研究结果表明,代理商遵循Boyd和Richerson模型中界定的战略,明确的惩罚机制推动规范的形成,加强合作行为,即使代理商战略配置不尽相同。我们的结果表明,基于LLMM多行为者系统模拟事实上可以复制传统数学模型所预测的合作演进。此外,我们的模拟超越了数学模型,将自然语言驱动的推理学和对战略采纳的仿造方法结合起来,使它们成为MAS中合作行为的更现实的测试台。

Article 48

Title@2025-06-08 (7): Position: Simulating Society Requires Simulating Thought

Title: Position: Simulating Society Requires Simulating Thought

Position: Gesellschaft simulieren erfordert simulierendes Denken

位置:模拟社会要求模拟思想 2506.06958v1

Authors (13): Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, Kent Larson

Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior: it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet they often lack internal coherence, causal reasoning, and belief traceability, which makes them unreliable for analyzing how people reason, deliberate, or respond to interventions. To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws on cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. Together, these contributions advance a broader shift from surface-level mimicry to generative agents that simulate thought, not just language, for social simulation.

我们认为,用大型语言模型(LLMS)来模拟社会所需要的不仅仅是产生可信的行为 – – 它要求有认知依据的推理,这种推理是结构化的、可逆的和可追踪的。基于LLM的代理商越来越多地被用来模仿个人和团体的行为 – – 主要是通过促进和监督的微调。然而,它们往往缺乏内部一致性、因果关系推理和信念可追踪性 – – 这使得它们不可靠,无法分析人们如何理性、蓄意或如何对干预作出反应。为了解决这个问题,我们提出了一个概念模型,即GenMinds(GenMinds),它从认知科学中汲取,支持基因代理的结构性信仰表现。为了评估这些代理商,我们引入了RECAP(重建历史路径)框架,这个基准旨在评估通过因果关系追踪、人口定位和干预一致性来进行推理的忠性。这些贡献推动了更广泛的转变:从地表级模拟到基因描述剂,而不仅仅是语言,以模拟社会模拟。

Article 49

Title@2025-06-07 (6): Object-Spatial Programming

Title: Object-Spatial Programming

Objekträumliche Programmierung

物体空间方案拟订 2503.15812v6

Authors (1): Jason Mars

The evolution of programming languages from low-level assembly to high-level abstractions demonstrates a fundamental principle: by constraining how programmers express computation and enriching semantic information at the language level, we can make previously undecidable program properties tractable for optimization. Building on the insight of this undecidability-lessening effect, we introduce Object-Spatial Programming (OSP), a novel programming model that extends Object-Oriented Programming by introducing topologically-aware class constructs called archetypes. OSP fundamentally inverts the traditional relationship between data and computation, enabling computation to move to data through four specialized archetypes: object classes, node classes (discrete data locations), edge classes (first-class relationships), and walker classes (mobile computational entities). By making topological relationships and traversal patterns explicit at the language level, OSP transforms previously opaque program behaviors into observable, optimizable patterns. This semantic enhancement enables runtime systems to make informed decisions about data locality, parallel execution, and distribution strategies based on explicit topology, while providing programmers with intuitive abstractions for modeling complex systems where connection topology is central to the computational model. The paradigm addresses fundamental limitations in traditional programming models when representing agent-based systems, social networks, neural networks, distributed systems, finite state machines, and other spatially-oriented computational problems, demonstrating how thoughtful abstraction design can simultaneously enhance programmer expressiveness and enable sophisticated system-level optimizations across the computing stack.
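
OSP is defined at the language level, but its four archetypes can be approximated in ordinary Python for intuition. In this analogue, nodes hold data, edges are first-class objects with their own data, and a walker carries computation from node to node along an explicit traversal. The class and method names are hypothetical and are not the syntax of any OSP implementation.

```python
from collections import deque

class Node:
    """A discrete data location in the graph."""
    def __init__(self, name, **data):
        self.name, self.data, self.edges = name, data, []

class Edge:
    """A first-class relationship: it carries its own data, not just a pointer."""
    def __init__(self, src, dst, **data):
        self.src, self.dst, self.data = src, dst, data
        src.edges.append(self)

class Walker:
    """A mobile computation that moves along edges and acts on each node it visits."""
    def visit(self, node):
        raise NotImplementedError

    def spawn(self, start):
        # Breadth-first traversal: the traversal pattern is explicit, so a runtime could reason about it.
        queue, seen = deque([start]), {id(start)}
        while queue:
            node = queue.popleft()
            self.visit(node)  # computation runs where the data lives
            for edge in node.edges:
                if id(edge.dst) not in seen:
                    seen.add(id(edge.dst))
                    queue.append(edge.dst)
        return self

class InfectionCounter(Walker):
    """Example walker: counts infected people reachable from a starting person."""
    def __init__(self):
        self.count = 0

    def visit(self, node):
        self.count += node.data.get("infected", False)


a, b, c = Node("alice", infected=True), Node("bob"), Node("carol", infected=True)
Edge(a, b, kind="friend"); Edge(b, c, kind="coworker"); Edge(c, a, kind="friend")
print(InfectionCounter().spawn(a).count)  # 2
```

In a real OSP runtime, the explicit topology and traversal would be visible to the compiler. Here they are only a coding convention, so none of the locality or parallelism optimizations described above apply.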

Article 50

Title@2025-06-07 (6): AI-Generated Compromises for Coalition Formation

Title: AI-Generated Compromises for Coalition Formation

KI-generierte Kompromisse für Koalitionsbildung

AI - 联合组建协议 2506.06837v1

Authors (3): Eyal Briman, Ehud Shapiro, Nimrod Talmon

The challenge of finding compromises between agent proposals is fundamental to AI subfields such as argumentation, mediation, and negotiation. Building on this tradition, Elkind et al. (2021) introduced a coalition formation process that seeks majority-supported proposals preferable to the status quo, using a metric space in which each agent has an ideal point. A crucial step in this process is identifying compromise proposals around which agent coalitions can unite. How to find such compromise proposals effectively remains an open question. We address this gap by formalizing a model that incorporates the agents’ bounded rationality and uncertainty, and by developing AI methods to generate compromise proposals. We focus on collaborative document writing, such as the democratic drafting of a community constitution. Our approach uses natural language processing techniques and large language models to induce a semantic metric space over text. Based on this space, we design algorithms that suggest compromise points likely to receive broad support. To evaluate our methods, we simulate coalition formation processes and show that AI can facilitate large-scale democratic text editing, a domain where traditional tools are limited.
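
In the metric-space model, an agent supports a proposal over the status quo when the proposal lies closer to the agent's ideal point. The sketch below makes three simplifying assumptions. It uses Euclidean distance over 2-D points as a stand-in for the LLM-induced semantic metric, it picks from a fixed candidate list, and it ignores bounded rationality and uncertainty. The paper's candidate-generation algorithms are not reproduced.

```python
import math

def supporters(ideals, proposal, status_quo):
    """Agents who strictly prefer the proposal: it lies closer to their ideal point than the status quo."""
    return [i for i, x in enumerate(ideals) if math.dist(x, proposal) < math.dist(x, status_quo)]

def best_compromise(ideals, candidates, status_quo):
    """Pick the candidate with the most supporters; report whether it wins a strict majority."""
    best = max(candidates, key=lambda p: len(supporters(ideals, p, status_quo)))
    return best, len(supporters(ideals, best, status_quo)) > len(ideals) / 2


ideals = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]  # hypothetical embeddings of three agents' ideal texts
status_quo = (0.0, 1.0)
candidates = [(4.0, 0.0), (2.0, 1.0)]          # agent 1's own ideal vs. a middle-ground draft
print(best_compromise(ideals, candidates, status_quo))  # ((2.0, 1.0), True)
```

Agent 1's own ideal draws only one supporter, but the middle-ground draft is preferred to the status quo by agents 1 and 2. It therefore forms a majority coalition, which is the kind of compromise point the algorithms above aim to surface.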

寻找代理人建议之间的妥协对于AI子领域,如辩论、调解和谈判等,至关重要。基于这一传统,Elkind等人(2021年)引入了一个联盟形成过程,利用每个代理人拥有一个理想点的衡量空间,寻求比现状更可取的多数支持提案。这一进程的一个关键步骤是确定折中提案,使代理人联盟能够团结在一起。如何有效找到这种折中提案仍然是一个尚未解决的问题。我们通过将一个包含代理人有约束性的合理性和不确定性的模式正规化,并通过制定大赦国际方法产生折中提案,来解决这一差距。我们注重合作文件撰写领域,例如社区宪法的民主起草。我们的方法是利用自然语言处理技术和大语言模型来诱发文字上的语义性计量空间。基于这一空间,我们设计各种算法,提出可能得到广泛支持的折中点。为了评估我们的方法,我们模拟联盟形成过程,并表明AI能够促进大规模民主文本编辑,一个传统工具有限的领域。

Article 51

Title@2025-06-07 (6): Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor

Title: Adaptive Traffic Signal Control based on Multi-Agent Reinforcement Learning. Case Study on a simulated real-world corridor

Adaptive Verkehrssignalsteuerung auf Basis des Multi-Agenten-Verstärkungslernens. Fallstudie zu einem simulierten Real-World-Korridor

基于多机构强化学习的适应性交通信号控制,模拟现实世界走廊案例研究 2503.02189v3

Authors (3): Dickness Kakitahi Kwesiga, Angshuman Guin, Michael Hunter

Previous studies that have formulated multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have primarily used value-based RL methods. However, recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, because of simplifying assumptions common in the literature, RL methods remain largely untested on real-world signal timing plans as they are normally implemented. The current study attempts to address these gaps and formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has a centralized-critic architecture under a centralized training and decentralized execution framework. Agents are designed to allow selection and implementation of up to eight signal phases, as commonly implemented in field controllers. The formulated algorithm is tested on a simulated real-world corridor with seven intersections. The speed of convergence for each agent was found to depend on the size of its action space, which in turn depends on the number and sequence of signal phases. The performance of the MA-PPO adaptive control algorithm is compared with the field-implemented actuated-coordinated signal control (ASC), modeled using PTV-Vissim-MaxTime software-in-the-loop simulation (SILs). The trained MA-PPO performed significantly better than ASC for all movements. Compared to ASC, MA-PPO showed 2% and 24% improvements in travel time in the primary and secondary coordination directions, respectively. For cross-street movements, MA-PPO also showed significant reductions in crossing time. Volume sensitivity experiments revealed that the formulated MA-PPO demonstrated good stability, robustness, and adaptability to changes in traffic demand.
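
For readers unfamiliar with PPO, the sketch below shows the clipped surrogate objective that each agent's policy maximizes. The advantage comes from a shared centralized critic, as in centralized training with decentralized execution. The one-step advantage estimate, the `gamma` and `eps` defaults, and the function names are illustrative assumptions. The abstract does not give the paper's exact loss or hyperparameters.

```python
def td_advantage(reward: float, v_global: float, v_global_next: float, gamma: float = 0.99) -> float:
    """One-step advantage from a centralized critic that sees the global (corridor-wide) state."""
    return reward + gamma * v_global_next - v_global

def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(a | o) / pi_old(a | o) for the agent's own local observation o,
    so each intersection's actor remains decentralized at execution time.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Clipping removes any incentive to push the probability ratio far from 1 in a single update. This is the main reason PPO is considered stable enough for multi-agent training.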

Article 52

Title@2025-06-07 (6): A Deep RL Approach on Task Placement and Scaling of Edge Resources for Cellular Vehicle-to-Network Service Provisioning

2305.09832v4

Authors (6): Cyril Shih-Huan Hsu, Jorge Martín-Pérez, Danny De Vleeschauwer, Luca Valcarenghi, Xi Li, Chrysa Papagianni

Cellular Vehicle-to-Everything (C-V2X) is currently at the forefront of the digital transformation of our society. By enabling vehicles to communicate with each other and with the traffic environment over cellular networks, C-V2X is redefining transportation. It improves road safety and transportation services, increases the efficiency of vehicular traffic flows, and reduces environmental impact. To effectively facilitate the provisioning of Cellular Vehicle-to-Network (C-V2N) services, we tackle the interdependent problems of service task placement and scaling of edge resources. Specifically, we formulate the joint problem and prove that it is not computationally tractable. To address its complexity, we propose Deep Hybrid Policy Gradient (DHPG), a new Deep Reinforcement Learning (DRL) approach that operates in hybrid action spaces, enabling holistic decision-making and enhancing overall performance. We evaluated DHPG in simulations using a real-world C-V2N traffic dataset, comparing it to several state-of-the-art (SoA) solutions. DHPG outperforms these solutions, meeting the C-V2N service delay target at the $99^{th}$ percentile while simultaneously optimizing the utilization of computing resources. Finally, a time complexity analysis verifies that the proposed approach can support real-time C-V2N services.
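
A percentile-based delay target can be made concrete with a small check. Compute the 99th-percentile delay of a batch of requests using the nearest-rank method, then compare it to the target. The delay values, the target, and the function names are hypothetical. The abstract does not specify how DHPG measures the percentile.

```python
import math

def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need non-empty values and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def meets_delay_target(delays_ms: list[float], target_ms: float, p: float = 99.0) -> bool:
    """True if the p-th percentile service delay does not exceed the target."""
    return percentile_nearest_rank(delays_ms, p) <= target_ms
```

With delays of 1 to 100 ms, the 99th-percentile delay is 99 ms. A 99 ms target is therefore met and a 98 ms target is not.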

Article 53

Title@2025-06-07 (6): LLMs Can Simulate Standardized Patients via Agent Coevolution

2412.11716v2

Authors (10): Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying

Training medical personnel using standardized patients (SPs) remains a complex challenge, requiring extensive domain expertise and role-specific practice. Previous research on Large Language Model (LLM)-based SPs mostly focuses on improving data retrieval accuracy or adjusting prompts through human feedback. However, this focus has overlooked the critical need for patient agents to learn, through unsupervised simulations, a standardized presentation pattern that transforms data into human-like patient responses. To address this gap, we propose EvoPatient, a novel simulated patient framework. In EvoPatient, a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues. They simultaneously gather experience to improve the quality of both questions and answers, ultimately enabling human doctor training. Extensive experiments on various cases demonstrate that, with only overall SP requirements provided, our framework improves over existing reasoning methods by more than 10% in requirement alignment and achieves better human preference. It also reaches an optimal balance of resource consumption after evolving over 200 cases for 10 hours, with excellent generalizability. Our system will be available at https://github.com/ZJUMAI/EvoPatient.

Article 54

Title@2025-06-06 (5): KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

2506.06541v1

Authors (19): Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska

Constructing real-world data-to-insight pipelines often involves data extraction from data lakes, data integration across heterogeneous data sources, and diverse operations from data cleaning to analysis. The design and implementation of data science pipelines require domain knowledge, technical expertise, and even project-specific insights. AI systems have shown remarkable reasoning, coding, and understanding capabilities. However, it remains unclear to what extent these capabilities translate into successful design and execution of such complex pipelines. We introduce KramaBench: a benchmark composed of 104 manually curated real-world data science pipelines spanning 1700 data files from 24 data sources in 6 different domains. We show that these pipelines test the end-to-end capabilities of AI systems on data processing. They require data discovery, wrangling and cleaning, efficient processing, statistical reasoning, and orchestration of data processing steps given a high-level task. Our evaluation tests 5 general models and 3 code generation models using our reference framework, DS-GURU. DS-GURU instructs the AI model to decompose a question into a sequence of subtasks, reason through each step, and synthesize Python code that implements the proposed design. Our results on KramaBench show that the models are sufficiently capable of solving well-specified data science code generation tasks. However, existing out-of-the-box models fall short when extensive data processing and domain knowledge are required to construct real-world data science pipelines. Progress on KramaBench represents a crucial step towards developing autonomous data science agents for real-world applications. Our code, reference framework, and data are available at https://github.com/mitdbg/KramaBench.
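
To make "data-to-insight pipeline" concrete, here is a toy pipeline split into the kind of subtasks that DS-GURU asks models to produce: load, clean, then aggregate. The inline CSV, the column names, and the step functions are hypothetical and are not drawn from KramaBench.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data with a missing value and a malformed value.
RAW = """station,reading
A,10
A,
B,7
A,14
B,oops
B,9
"""

def load(text: str) -> list[dict]:
    """Subtask 1: parse raw CSV text into row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows: list[dict]) -> list[tuple[str, float]]:
    """Subtask 2: drop rows whose reading is missing or non-numeric."""
    out = []
    for row in rows:
        try:
            out.append((row["station"], float(row["reading"])))
        except (TypeError, ValueError):
            continue
    return out

def aggregate(pairs: list[tuple[str, float]]) -> dict[str, float]:
    """Subtask 3: mean reading per station."""
    groups = defaultdict(list)
    for station, value in pairs:
        groups[station].append(value)
    return {k: mean(v) for k, v in sorted(groups.items())}

insight = aggregate(clean(load(RAW)))
print(insight)  # {'A': 12.0, 'B': 8.0}
```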

Article 55

Title@2025-06-06 (5): Edge-Enabled Collaborative Object Detection for Real-Time Multi-Vehicle Perception

2506.06474v1

Authors (3): Everett Richards, Bipul Thapa, Lena Mashayekhy

Accurate and reliable object detection is critical for ensuring the safety and efficiency of Connected Autonomous Vehicles (CAVs). Traditional on-board perception systems have limited accuracy due to occlusions and blind spots. Cloud-based solutions, meanwhile, introduce significant latency, making them unsuitable for the real-time processing that autonomous driving in dynamic environments requires. To address these challenges, we introduce an innovative framework, Edge-Enabled Collaborative Object Detection (ECOD) for CAVs, that leverages edge computing and multi-CAV collaboration for real-time, multi-perspective object detection. Our ECOD framework integrates two key algorithms: Perceptive Aggregation and Collaborative Estimation (PACE) and Variable Object Tally and Evaluation (VOTE). PACE aggregates detection data from multiple CAVs on an edge server to enhance perception in scenarios where individual CAVs have limited visibility. VOTE utilizes a consensus-based voting mechanism to improve the accuracy of object classification by integrating data from multiple CAVs. Both algorithms are designed to operate at the edge in real time, ensuring low-latency and reliable decision-making for CAVs. We develop a hardware-based controlled testbed consisting of camera-equipped robotic CAVs and an edge server to evaluate the efficacy of our framework. Our experimental results demonstrate that ECOD significantly improves object classification accuracy, outperforming traditional single-perspective onboard approaches by up to 75% while ensuring low-latency, edge-driven real-time processing. This research highlights the potential of edge computing to enhance collaborative perception for latency-sensitive autonomous systems.
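
Below is a minimal sketch of consensus voting over class labels, in the spirit of VOTE. Each vehicle reports a (label, confidence) pair for the same object, and the majority label wins. Ties are broken by summed confidence. The tie-break rule and the data layout are assumptions, since the abstract does not describe VOTE's exact rule.

```python
from collections import defaultdict

def vote(reports: list[tuple[str, float]]) -> str:
    """Return the consensus label from (label, confidence) reports sent by several CAVs."""
    if not reports:
        raise ValueError("no reports to vote on")
    counts: dict[str, int] = defaultdict(int)
    conf_sum: dict[str, float] = defaultdict(float)
    for label, confidence in reports:
        counts[label] += 1
        conf_sum[label] += confidence
    # Majority first; summed confidence breaks ties between equally voted labels.
    return max(counts, key=lambda lbl: (counts[lbl], conf_sum[lbl]))
```

A single vehicle with an occluded view can therefore be outvoted by two vehicles that see the object clearly, even if the occluded vehicle reports a higher confidence.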

Article 56

Title@2025-06-06 (5): Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

2506.05265v2

Authors (1): Mohammed Almutairi

Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored for specific contexts. They fail to account for the dynamic interplay of team members' personalities, evolving goals, and changing individual preferences. As a result, teams may encounter member dissatisfaction, because purely algorithmic assignments can reduce members' commitment to team goals. Teams may also experience suboptimal engagement when there is no timely, personalized guidance to help members adjust their behaviors and interactions as team dynamics evolve. Ultimately, these challenges can lead to reduced overall team performance. My Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance. First, I propose a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences. This ensures alignment between individual needs and collective team goals and so enhances team satisfaction. Second, I introduce tAIfa (Team AI Feedback Assistant), an AI-powered system that uses large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement. Finally, I present PuppeteerLLM, an LLM-based framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination.
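
The bandit component can be sketched with UCB1, one standard multi-armed bandit rule. Here each arm is a candidate team composition and the reward is a satisfaction score. UCB1, the arm set, and the reward scale are illustrative assumptions; the dissertation summary does not name the specific bandit algorithm.

```python
import math

def ucb1_select(counts: list[int], reward_sums: list[float]) -> int:
    """Pick the arm (candidate team composition) with the highest UCB1 score.

    counts[i]      -- number of times composition i has been tried
    reward_sums[i] -- total satisfaction observed for composition i
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every composition once before exploiting
    total = sum(counts)

    def score(i: int) -> float:
        mean_reward = reward_sums[i] / counts[i]
        bonus = math.sqrt(2 * math.log(total) / counts[i])  # exploration bonus
        return mean_reward + bonus

    return max(range(len(counts)), key=score)

def update(counts: list[int], reward_sums: list[float], arm: int, reward: float) -> None:
    """Record the satisfaction observed after trying `arm`."""
    counts[arm] += 1
    reward_sums[arm] += reward
```

The exploration bonus shrinks as a composition is tried more often. Rarely tried teams therefore keep getting occasional chances, while well-rated teams are chosen most of the time.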

Article 57

Title@2025-06-06 (5): UAV-UGV Cooperative Trajectory Optimization and Task Allocation for Medical Rescue Tasks in Post-Disaster Environments

2506.06136v1

Authors (6): Kaiyuan Chen, Wanpeng Zhao, Yongxi Liu, Yuanqing Xia, Wannian Liang, Shuo Wang

In post-disaster scenarios, rapid and efficient delivery of medical resources is critical but challenging because of severe damage to infrastructure. To provide an optimized solution, we propose a cooperative trajectory optimization and task allocation framework leveraging unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). This study integrates a Genetic Algorithm (GA) for efficient task allocation among multiple UAVs and UGVs, and employs an informed-RRT* (Rapidly-exploring Random Tree Star) algorithm for collision-free trajectory generation. Task sequencing and path efficiency are further optimized using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Simulation experiments conducted in a realistic post-disaster environment demonstrate that our proposed approach significantly improves the overall efficiency of medical rescue operations compared to traditional strategies, substantially reducing total mission completion time and traveled distance. Additionally, the cooperative use of UAVs and UGVs effectively balances their complementary advantages, highlighting the system's scalability and practicality for real-world deployment.
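
Below is a compact genetic-algorithm sketch for the allocation step. It assumes that each vehicle (UAV or UGV) has a speed, that each task has a travel distance, and that fitness is the makespan (the latest vehicle finish time). A real allocation would use the informed-RRT* path lengths and the CMA-ES refinement described above, which are omitted here. All parameters are hypothetical.

```python
import random

def makespan(assign: list[int], dist: list[float], speed: list[float]) -> float:
    """Latest finishing time when each vehicle serves its assigned tasks back to back."""
    load = [0.0] * len(speed)
    for task, vehicle in enumerate(assign):
        load[vehicle] += dist[task] / speed[vehicle]
    return max(load)

def ga_allocate(dist, speed, pop=30, gens=100, p_mut=0.1, seed=0):
    """Evolve task->vehicle assignments; returns (best_assignment, its_makespan)."""
    rng = random.Random(seed)
    n_tasks, n_veh = len(dist), len(speed)

    def cost(assign):
        return makespan(assign, dist, speed)

    population = [[rng.randrange(n_veh) for _ in range(n_tasks)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        elite = population[: pop // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Mutation: reassign each task to a random vehicle with probability p_mut.
            child = [rng.randrange(n_veh) if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = elite + children
    best = min(population, key=cost)
    return best, cost(best)
```

For example, `ga_allocate([4, 4, 2, 2, 1, 1], [1.0, 1.0])` searches for a split of 14 units of work across two equally fast vehicles. No split can finish faster than 7 time units.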

Article 58

Title@2025-06-06 (5): Modeling human reputation-seeking behavior in a spatio-temporally complex public good provision game

2506.06032v1

Authors (9): Edward Hughes, Tina O. Zhu, Martin J. Chadwick, Raphael Koster, Antonio García Castañeda, Charles Beattie, Thore Graepel, Matthew M. Botvinick, Joel Z. Leibo

Multi-agent reinforcement learning algorithms are useful for simulating social behavior in settings that are too complex for other theoretical approaches like game theory. However, they have not yet been empirically supported by laboratory experiments with real human participants. In this work we demonstrate how multi-agent reinforcement learning can model group behavior in a spatially and temporally complex public good provision game called Clean Up. We show that human groups succeed in Clean Up when they can see who is who and track reputations over time but fail under conditions of anonymity. A new multi-agent reinforcement learning model of reputation-based cooperation demonstrates the same difference between identifiable and anonymous conditions. Furthermore, both human groups and artificial agent groups solve the problem via turn-taking despite other options being available. Our results highlight the benefits of using multi-agent reinforcement learning to model human social behavior in complex environments.

Article 59

Title@2025-06-06 (5): ADIOS: Antibody Development via Opponent Shaping

2409.10588v8

Authors (8): Sebastian Towers, Aleksandra Kalisz, Philippe A. Robert, Alicia Higueruelo, Francesca Vianello, Ming-Han Chloe Tsai, Harrison Steel, Jakob N. Foerster

Anti-viral therapies are typically designed to target only the current strains of a virus, a myopic response. However, therapy-induced selective pressures drive the emergence of new viral strains, against which the original myopic therapies are no longer effective. This evolutionary response presents an opportunity: our therapies could both defend against and actively influence viral evolution. This motivates our method ADIOS: Antibody Development vIa Opponent Shaping. ADIOS is a meta-learning framework in which the antibody therapy design process (the outer loop) accounts for the virus's adaptive response (the inner loop). With ADIOS, antibodies are not only robust against potential future variants; they also influence, i.e., shape, which future variants emerge. In line with the opponent shaping literature, we refer to our optimised antibodies as shapers. To demonstrate the value of ADIOS, we build a viral evolution simulator using the Absolut! framework, in which shapers successfully target both current and future viral variants, outperforming myopic antibodies. Furthermore, we show that shapers modify the distribution over viral evolutionary trajectories, resulting in weaker variants. We believe that our ADIOS paradigm will facilitate the discovery of long-lived vaccines and antibody therapies. It should also generalise to other domains with evolutionarily adaptive opponents, such as antimicrobial resistance and cancer treatment. Our code is available at https://github.com/olakalisz/adios.
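
The outer/inner-loop structure can be illustrated with a deliberately tiny toy model. The model is entirely hypothetical and unrelated to Absolut! or the authors' simulator. Antibody designs and virus variants are integers from 0 to 10, binding is `-|a - v|`, and the virus responds by picking the variant that best trades replication fitness against escape. A myopic antibody maximizes binding to today's strain. A shaper is chosen by evaluating binding after the virus's best response.

```python
DESIGNS = range(11)   # toy antibody design space
VARIANTS = range(11)  # toy virus variant space
V0 = 3                # current dominant strain (toy value)

def binding(a: int, v: int) -> float:
    """Toy binding strength of antibody a to variant v (higher is better for the antibody)."""
    return -abs(a - v)

def virus_fitness(v: int, a: int) -> float:
    """Toy fitness: replication favours strains near V0, escaping antibody a also pays off."""
    return -0.1 * (v - V0) ** 2 - binding(a, v)

def inner_loop(a: int) -> int:
    """Virus adaptation: the variant that best responds to antibody a."""
    return max(VARIANTS, key=lambda v: virus_fitness(v, a))

def myopic_antibody() -> int:
    """Maximize binding to the current strain only."""
    return max(DESIGNS, key=lambda a: binding(a, V0))

def shaper_antibody() -> int:
    """Outer loop: maximize binding to the variant that emerges in response."""
    return max(DESIGNS, key=lambda a: binding(a, inner_loop(a)))
```

In this toy, the myopic design is `a = 3`. The virus then escapes to variant 8, and binding drops to -5. The shaper, `a = 4`, accepts slightly weaker binding to today's strain (-1). In exchange, it steers the virus to variant 0, where binding is -4.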

Article 60

Title@2025-06-06 (5): AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

2410.02958v2

Authors (3): Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is generally time-consuming and requires substantial human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen this burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their own data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes the user's task description, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Existing work devises a single plan; instead, we introduce a retrieval-augmented planning strategy that enhances exploration in the search for better plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design). Each sub-task is solved by a specialized agent built via prompting, and the agents execute in parallel, making the search process more efficient. Moreover, we propose multi-stage verification to check executed results and guide the code-generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance across diverse domains.
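
Below is a minimal sketch of the decompose, execute-in-parallel, and verify pattern described above. The agent functions are hypothetical stand-ins; in AutoML-Agent each would be a prompted LLM agent. The single `verify` check stands in for the paper's multi-stage verification.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for specialized LLM agents; each returns a result dict.
def data_agent(plan: dict) -> dict:
    return {"subtask": "data_preprocessing", "ok": True, "detail": f"normalize {plan['dataset']}"}

def model_agent(plan: dict) -> dict:
    return {"subtask": "model_design", "ok": True, "detail": f"{plan['layers']}-layer MLP"}

SUBTASK_AGENTS = [data_agent, model_agent]

def verify(result: dict) -> bool:
    """Placeholder check; a real system would inspect generated code, logs, and metrics."""
    return result.get("ok", False)

def run_plan(plan: dict) -> list[dict]:
    """Execute all sub-tasks of one plan in parallel and keep them only if all pass verification."""
    with ThreadPoolExecutor(max_workers=len(SUBTASK_AGENTS)) as pool:
        # pool.map preserves the order of SUBTASK_AGENTS in the results.
        results = list(pool.map(lambda agent: agent(plan), SUBTASK_AGENTS))
    failed = [r["subtask"] for r in results if not verify(r)]
    if failed:
        raise RuntimeError(f"verification failed for: {failed}")
    return results

print([r["detail"] for r in run_plan({"dataset": "iris", "layers": 2})])
# ['normalize iris', '2-layer MLP']
```

Threads suit this pattern because LLM agent calls are I/O-bound. A failed check is raised so that the planner can revise the plan instead of silently continuing.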

Article 61

Title@2025-06-06 (5): Multi-Agent Collaboration via Cross-Team Orchestration

2406.08979v2

Authors (12): Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, YiFei Wang, Rennai Qiu, Yufan Dang, Weize Chen, Cheng Yang, Ye Tian, Xuantang Xiong, Lei Han

Large Language Models (LLMs) have significantly impacted various domains, especially through organized LLM-driven autonomous agents. A representative scenario is software development, where agents can collaborate in a team like humans, following predefined phases to complete sub-tasks sequentially. However, for an agent team, each phase yields only one possible outcome. As a result, only one development chain is completed, and the opportunity to explore multiple potential decision paths within the solution space is lost, leading to suboptimal results or extensive trial and error. To address this, we introduce Cross-Team Orchestration (Croto), a scalable multi-team framework. Croto enables orchestrated teams to jointly propose various task-oriented solutions and to exchange insights in an environment that preserves each team's independence while supporting cross-team collaboration, in order to generate superior solutions. Experiments reveal a notable increase in software quality compared to state-of-the-art baselines. We further tested our framework on story generation tasks, which demonstrated its promising ability to generalize to other domains. The code and data are available at https://github.com/OpenBMB/ChatDev/tree/macnet

Article 62

Title@2025-06-05 (4): Collaborative Learning in Agentic Systems: A Collective AI is Greater Than the Sum of Its Parts

2506.05577v1

Authors (10): Saptarshi Nath, Christos Peridis, Eseoghene Benjamin, Xinran Liu, Soheil Kolouri, Peter Kinnell, Zexin Li, Cong Liu, Shirin Dora, Andrea Soltoggio

Agentic AI has gained significant interest as a research paradigm focused on autonomy, self-directed learning, and long-term reliability of decision making. Real-world agentic systems operate in decentralized settings on a large set of tasks or data distributions. They face constraints such as limited bandwidth, asynchronous execution, and the absence of a centralized model or even common objectives. We posit that exploiting previously learned skills, task similarities, and communication capabilities within a collective of agentic AI is challenging. It is nonetheless essential for enabling scalability, open-endedness, and beneficial collaborative learning dynamics. In this paper, we introduce Modular Sharing and Composition in Collective Learning (MOSAIC), an agentic algorithm that allows multiple agents to independently solve different tasks. The agents also identify, share, and reuse useful machine-learned knowledge, without coordination, synchronization, or centralized control. MOSAIC combines three mechanisms: (1) modular policy composition via neural network masks, (2) cosine similarity estimation using Wasserstein embeddings for knowledge selection, and (3) asynchronous communication and policy integration. Results on a set of RL benchmarks show that MOSAIC has greater sample efficiency than isolated learners, i.e., it learns significantly faster. In some cases, it also finds solutions to tasks that isolated learners cannot solve. The collaborative learning and sharing dynamics are also observed to produce ideal curricula of tasks, ordered from easy to hard. These findings support the case for collaborative learning in agentic systems to achieve better and continuously evolving performance at both the individual and collective levels.
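
The knowledge-selection step can be sketched as follows. The agent compares its local task embedding with the embeddings advertised by peers, then reuses the module of the most similar peer above a threshold. The embedding vectors, the threshold, and the peer names are hypothetical. MOSAIC derives its task embeddings from Wasserstein embeddings, which are not reproduced here.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_peer_module(local: list[float], peers: dict[str, list[float]], threshold: float = 0.5):
    """Return the peer whose task embedding is most similar to ours, or None if none exceeds the threshold."""
    best_peer, best_sim = None, threshold
    for name, emb in peers.items():
        sim = cosine(local, emb)
        if sim > best_sim:
            best_peer, best_sim = name, sim
    return best_peer
```

Because each agent applies this rule locally to whatever embeddings it has received, no central coordinator is needed to decide who learns from whom.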

Article 63

Title@2025-06-05 (4): Using Large Language Models to Simulate Human Behavioural Experiments: Port of Mars

2506.05555v1

Authors (3): Oliver Slumbers, Joel Z. Leibo, Marco A. Janssen

Collective risk social dilemmas (CRSD) highlight a trade-off between individual preferences and the need for all to contribute toward achieving a group objective. Problems such as climate change fall into this category, so it is critical to understand their social underpinnings. However, rigorous CRSD methodology often demands large-scale human experiments, and it is difficult to guarantee sufficient statistical power and heterogeneity over socio-demographic factors. Generative AI offers a potential complementary approach to this problem: replacing human participants with large language models (LLMs) allows for a scalable empirical framework. This paper focuses on the validity of this approach and on whether it is feasible to represent a large-scale, human-like experiment with sufficient diversity using LLMs. Previous literature has focused on political surveys, virtual towns, and classical game-theoretic examples. We instead focus on a complex CRSD from the institutional economics and sustainability literature, known as Port of Mars.

Article 64

Title@2025-06-05 (4): Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data

2506.05542v1

Authors (9): Vlastimil Martinek, Andrea Gariboldi, Dimosthenis Tzimotoudis, Aitor Alberdi Escudero, Edward Blake, David Cechak, Luke Cassar, Alessandro Balestrucci, Panagiotis Alexiou

The adoption of machine learning (ML) and deep learning methods has revolutionized molecular medicine by driving breakthroughs in genomics, transcriptomics, drug discovery, and biological systems modeling. The increasing quantity, multimodality, and heterogeneity of biological datasets demand automated methods that can produce generalizable predictive models. Recent developments in large language model-based agents have shown promise for automating end-to-end ML experimentation on structured benchmarks. However, when applied to heterogeneous computational biology datasets, these methods struggle with generalization and success rates. Here, we introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model and the necessary files for reproducible training and inference. Our method follows predefined steps of an ML experimentation process, repeatedly interacting with the file system through Bash to complete individual steps. Once an ML model is produced, training and validation metrics provide scalar feedback to a reflection step to identify issues such as overfitting. This step then creates verbal feedback for future iterations, suggesting adjustments to steps such as data representation, model architecture, and hyperparameter choices. We have evaluated Agentomics-ML on several established genomic and transcriptomic benchmark datasets and show that it outperforms existing state-of-the-art agent-based methods in both generalization and success rates. While state-of-the-art models built by domain experts still lead in absolute performance on the majority of the computational biology datasets used in this work, Agentomics-ML narrows the gap for fully autonomous systems and achieves state-of-the-art performance on one of the used benchmark datasets. The code is available at https://github.com/BioGeMT/Agentomics-ML.
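
The reflection step can be illustrated with a rule that turns scalar train/validation metrics into verbal feedback for the next iteration. The gap tolerance, the target score, and the wording of the suggestions are hypothetical. In Agentomics-ML, the reflection is produced by an LLM agent rather than by fixed rules.

```python
def reflect(train_score: float, val_score: float, gap_tol: float = 0.05, target: float = 0.8) -> str:
    """Map scalar metrics to verbal feedback for the next iteration (rule-based stand-in)."""
    if train_score - val_score > gap_tol:
        return "Overfitting suspected: add regularization or simplify the model architecture."
    if val_score < target:
        return "Low validation score: revisit the data representation or tune hyperparameters."
    return "Metrics acceptable: keep the current setup."
```

The point of the reflection step is this translation from numbers into actionable suggestions about data representation, architecture, or hyperparameters.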

Article 65

Title@2025-06-05 (4): Sequence Modeling for N-Agent Ad Hoc Teamwork

Title: Sequence Modeling for N-Agent Ad Hoc Teamwork

2506.05527v1

Authors (6): Caroline Wang, Di Yang Shi, Elad Liebman, Ishan Durugkar, Arrasy Rahman, Peter Stone

N-agent ad hoc teamwork (NAHT) is a newly introduced challenge in multi-agent reinforcement learning, where controlled subteams of varying sizes must dynamically collaborate with varying numbers and types of unknown teammates without pre-coordination. The existing learning algorithm (POAM) considers only independent learning for its flexibility in dealing with a changing number of agents. However, independent learning fails to fully capture the inter-agent dynamics essential for effective collaboration. Based on our observation that transformers deal effectively with sequences with varying lengths and have been shown to be highly effective for a variety of machine learning problems, this work introduces a centralized, transformer-based method for N-agent ad hoc teamwork. Our proposed approach incorporates historical observations and actions of all controlled agents, enabling optimal responses to diverse and unseen teammates in partially observable environments. Empirical evaluation on a StarCraft II task demonstrates that MAT-NAHT outperforms POAM, achieving superior sample efficiency and generalization, without auxiliary agent-modeling objectives.
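
The abstract argues that transformers handle sequences of varying length. Plain self-attention shows why: the same computation works whether the team has two or five controlled agents. The sketch below is generic scaled dot-product attention with identity projections and no learned weights. It is not the MAT-NAHT architecture, and the one-token-per-agent encoding is an assumption.

```python
import math


def self_attention(tokens: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    Accepts any number of tokens (e.g., one token per controlled agent).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Each output is a convex combination of the value vectors (here, the tokens).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out
```

Because the weights are normalized over however many tokens are present, no architectural change is needed when agents join or leave the team.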

Article 66

Title@2025-06-05 (4): Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted

Title: Towards Data Systems That Are Business Semantic-Centric and AI Agents-Assisted

2506.05520v1

Authors (1): Cecil Pang

Contemporary businesses operate in dynamic environments requiring rapid adaptation to achieve goals and maintain competitiveness. Existing data platforms often fall short by emphasizing tools over alignment with business needs, resulting in inefficiencies and delays. To address this gap, I propose the Business Semantics Centric, AI Agents Assisted Data System (BSDS), a holistic system that integrates architecture, workflows, and team organization to ensure data systems are tailored to business priorities rather than dictated by technical constraints. BSDS redefines data systems as dynamic enablers of business success, transforming them from passive tools into active drivers of organizational growth. BSDS has a modular architecture that comprises curated data linked to business entities, a knowledge base for context-aware AI agents, and efficient data pipelines. AI agents play a pivotal role in assisting with data access and system management, reducing human effort, and improving scalability. Complementing this architecture, BSDS incorporates workflows optimized for both exploratory data analysis and production requirements, balancing speed of delivery with quality assurance. A key innovation of BSDS is its incorporation of the human factor. By aligning data team expertise with business semantics, BSDS bridges the gap between technical capabilities and business needs. Validated through real-world implementation, BSDS accelerates time-to-market for data-driven initiatives, enhances cross-functional collaboration, and provides a scalable blueprint for businesses of all sizes. Future research can build on BSDS to explore optimization strategies using complex systems and adaptive network theories, as well as developing autonomous data systems leveraging AI agents.

Article 67

Title@2025-06-05 (4): Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Title: Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

2506.05309v1

Authors (3): Niv Eckhaus, Uri Berger, Gabriel Stanovsky

LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are inherently asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns; therefore, the decision of when to speak forms a crucial part of the participant’s decision making. In this work, we develop an adaptive asynchronous LLM agent which, in addition to determining what to say, also decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games played by both human participants and our asynchronous agent. Overall, our agent performs on par with human players, both in game performance and in its ability to blend in with the other human players. Our analysis shows that the agent’s behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We release all our data and code to support and encourage further research on more realistic asynchronous communication between LLM agents. This work paves the way for integrating LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.

Article 68

Title@2025-06-05 (4): Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Title: Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

2506.05252v1

Authors (2): Dravyansh Sharma, Alec Sun

Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a mild generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. for both proper and improper learning.
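
The title's intuition can be seen in a toy one-dimensional version of learning with improvements. In one dimension, a Euclidean ball of radius r is the interval [x - r, x + r]. The sketch makes several assumptions that are not taken from the paper: the true label is 1 iff x >= t_star, the classifier accepts iff x >= t, and an agent moves the minimal distance (at most r) needed to be accepted, genuinely changing its feature. It illustrates the setting only, not the paper's characterization.

```python
def improve(x: float, t: float, r: float) -> float:
    """Agent at x moves the minimal distance (at most r) needed to be accepted, if possible."""
    if x >= t:
        return x  # already accepted; no need to move
    if t - x <= r:
        return t  # genuine improvement up to the acceptance threshold
    return x      # acceptance is out of reach; the agent stays put


def errors_after_improvement(points, t, t_star, r):
    """Count (false positives, false negatives) after agents respond to threshold t."""
    fp = fn = 0
    for x in points:
        x_new = improve(x, t, r)
        predicted = x_new >= t
        actual = x_new >= t_star  # improvement changes the true feature too
        fp += predicted and not actual
        fn += actual and not predicted
    return fp, fn
```

With a conservative threshold (t >= t_star), every accepted agent ends at x_new >= t >= t_star, so no false positives can occur. A threshold below t_star lets agents "improve" into acceptance without becoming truly positive.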

Article 69

Title@2025-06-05 (4): Towards Language-Augmented Multi-Agent Deep Reinforcement Learning

Title: Towards Language-Augmented Multi-Agent Deep Reinforcement Learning

2506.05236v1

Authors (4): Maxime Toquebiau, Jae-Yun Jun, Faïz Benamar, Nicolas Bredeche

Communication is a fundamental aspect of coordinated behavior in multi-agent reinforcement learning. Yet, most prior works in this field have focused on emergent communication protocols developed from scratch, often resulting in inefficient or non-interpretable systems. Inspired by the role of language in natural intelligence, we investigate how grounding agents in a human-defined language can improve learning and coordination of multiple embodied agents. We propose a framework in which agents are trained not only to act but also to produce and interpret natural language descriptions of their observations. This language-augmented learning serves a dual role: enabling explicit communication between agents and guiding representation learning. We demonstrate that agents trained with our method outperform traditional emergent communication baselines across various tasks. Our analysis reveals that language grounding leads to more informative internal representations, better generalization to new partners, and improved capability for human-agent interaction. These findings demonstrate the effectiveness of integrating structured language into multi-agent learning and open avenues for more interpretable and capable multi-agent systems.

Article 70

Title@2025-06-05 (4): Conceptualizing educational opportunity hoarding: the emergence of hoarding without hoarders

Title: Conceptualizing educational opportunity hoarding: the emergence of hoarding without hoarders

2305.14653v3

Authors (1): João M. Souto-Maior

Social scientists increasingly use the concept of opportunity hoarding to explain the formation of Black-White educational inequalities. However, this concept is often loosely defined, leading to varied interpretations of the inequality-producing mechanisms it captures. To bring clarity to this valuable sociological concept, this theoretical paper, informed by the concept’s original definition and existing empirical research, proposes a more precise definition of opportunity hoarding and formalizes it through a computational model. For concreteness, the model focuses on one context: how White families can hoard access to advanced high school coursework from Black students attending the same school. Through simulations, the paper highlights the necessary and sufficient conditions under which the hoarding of advanced course-taking opportunities emerges. Results demonstrate that, in contrast to traditional accounts, White actors do not need to engage in exclusionary behaviors to hoard valuable resources. Rather, through the byproduct of network segregation and class inequalities, opportunity hoarding can emerge even when individuals act in race-neutral ways – a process I conceptualize as hoarding without hoarders.

Article 71

Title@2025-06-05 (4): A MARL-based Approach for Easing MAS Organization Engineering

Title: A MARL-based Approach for Easing MAS Organization Engineering

2506.05437v1

Authors (5): Julien Soulé, Jean-Paul Jamont, Michel Occello, Louis-Marie Traonouez, Paul Théron

Multi-Agent Systems (MAS) have been successfully applied in industry for their ability to address complex, distributed problems, especially in IoT-based systems. Their efficiency in achieving given objectives and meeting design requirements depends strongly on the MAS organization chosen during the engineering process of an application-specific MAS. To design a MAS that can achieve given goals, available methods rely on the designer’s knowledge of the deployment environment. However, the high complexity and low readability of some deployment environments make applying these methods costly or raise safety concerns. To ease MAS organization design under these conditions, we introduce an original Assisted MAS Organization Engineering Approach (AOMEA). AOMEA combines a Multi-Agent Reinforcement Learning (MARL) process with an organizational model to suggest relevant organizational specifications that help in MAS engineering.

Article 72

Title@2025-06-05 (4): Offline Multi-agent Reinforcement Learning via Score Decomposition

Title: Offline Multi-agent Reinforcement Learning via Score Decomposition

2505.05968v2

Authors (5): Dan Qiao, Wenhao Li, Shanchao Yang, Hongyuan Zha, Baoxiang Wang

Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts, particularly stemming from the high dimensionality of joint action spaces and the presence of out-of-distribution joint action selections. In this work, we highlight that a fundamental challenge in offline MARL arises from the multi-equilibrium nature of cooperative tasks, which induces a highly multimodal joint behavior policy space coupled with heterogeneous-quality behavior data. This makes it difficult for individual policy regularization to align with a consistent coordination pattern, leading to policy distribution shift problems. To tackle this challenge, we design a sequential score function decomposition method that distills per-agent regularization signals from the joint behavior policy, which induces coordinated modality selection under decentralized execution constraints. We then leverage a flexible diffusion-based generative model to learn these score functions from multimodal offline data, and integrate them into joint-action critics to guide policy updates toward high-reward, in-distribution regions under a shared team reward. Our approach consistently achieves state-of-the-art performance across multiple particle environments and Multi-agent MuJoCo benchmarks. To the best of our knowledge, this is the first work to explicitly address the distributional gap between offline and online MARL, paving the way for more generalizable offline policy-based MARL methods.

Article 73

Title@2025-06-05 (4): Joint Routing and Control Optimization in VANET

Title: Joint Routing and Control Optimization in VANET

2506.08038v1

Authors (3): Chen Huang, Dingxuan Wang, Ronghui Hou

In this paper, we introduce DynaRoute, an adaptive joint optimization framework for dynamic vehicular networks that simultaneously addresses platoon control and data transmission through trajectory-aware routing and safety-constrained vehicle coordination. DynaRoute guarantees continuous vehicle movement via platoon safety control while optimizing transmission paths through real-time trajectory prediction and ensuring reliable data transmission. Our solution achieves three key objectives: (1) maintaining platoon stability through accurate data transmission, (2) enabling adaptive routing based on vehicle movement patterns, and (3) enhancing overall intelligent transportation system performance. DynaRoute requires predefined traffic models and adapts to dynamic network conditions using local vehicle state information. We present comprehensive simulation results demonstrating that DynaRoute maintains control and transmission performance in multiple complex scenarios while significantly improving throughput and reliability compared to traditional approaches.
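
The abstract does not say how DynaRoute turns predicted trajectories into routing decisions. One common mobility-aware heuristic in the vehicular-networking literature, swapped in here as an assumption, is link expiration time (LET): under a constant-velocity prediction, estimate how long each neighbor will stay within radio range, then forward to the neighbor whose link lasts longest. The `Vehicle` type and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vehicle:
    x: float
    y: float
    vx: float
    vy: float


def link_expiration_time(i: Vehicle, j: Vehicle, radio_range: float) -> float:
    """Time until i and j leave radio range, assuming both keep constant velocity."""
    a, b = i.vx - j.vx, i.x - j.x
    c, d = i.vy - j.vy, i.y - j.y
    if a == 0 and c == 0:
        # No relative motion: the link lasts indefinitely if it exists now.
        return math.inf if math.hypot(b, d) <= radio_range else 0.0
    disc = (a * a + c * c) * radio_range ** 2 - (a * d - b * c) ** 2
    if disc < 0:
        return 0.0  # the relative trajectory never comes within range
    t = (-(a * b + c * d) + math.sqrt(disc)) / (a * a + c * c)
    return max(t, 0.0)


def pick_next_hop(src: Vehicle, neighbors: dict[str, Vehicle], radio_range: float) -> Optional[str]:
    """Choose the in-range neighbor whose predicted link lasts longest (hypothetical policy)."""
    best, best_let = None, 0.0
    for name, v in neighbors.items():
        if math.hypot(src.x - v.x, src.y - v.y) > radio_range:
            continue  # not currently reachable
        let = link_expiration_time(src, v, radio_range)
        if let > best_let:
            best, best_let = name, let
    return best
```

In a platoon, vehicles moving at matching velocities have long-lived links, so this heuristic naturally prefers routing through them.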

Article 74

Title@2025-06-05 (4): Memory-Driven Bounded Confidence Opinion Dynamics: A Hegselmann-Krause Model Based on Fractional-Order Methods

Title: Memory-Driven Bounded Confidence Opinion Dynamics: A Hegselmann-Krause Model Based on Fractional-Order Methods

2506.04701v1

Authors (4): Meiru Jiang, Wei Su, Guojian Ren, Yongguang Yu

Memory effects play a crucial role in social interactions and decision-making processes. This paper proposes a novel fractional-order bounded confidence opinion dynamics model to characterize the memory effects in system states. Building upon the Hegselmann-Krause framework and fractional-order differences, a comprehensive model is established that captures the persistent influence of historical information. Through rigorous theoretical analysis, fundamental properties including convergence and consensus are investigated. The results demonstrate that the proposed model not only maintains favorable convergence and consensus characteristics compared to classical opinion dynamics, but also addresses limitations such as the monotonicity of bounded opinions. This enables a more realistic representation of opinion evolution in real-world scenarios. The findings of this study provide new insights and methodological approaches for understanding opinion formation and evolution, offering both theoretical significance and practical applications.
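
The abstract does not give the update law, so the sketch below assumes one plausible form. It applies a Grünwald-Letnikov fractional difference of order alpha in (0, 1] to the classical Hegselmann-Krause (HK) drift, i.e. sum_{j=0}^{k+1} c_j x(k+1-j) = HK(x(k)) - x(k), with c_j = (-1)^j binom(alpha, j). Under this assumption, every past state feeds into the next one through the weights c_j, and alpha = 1 recovers the memoryless classical HK model. The function names and this exact form are hypothetical, not the authors' equations.

```python
def hk_step(x: list[float], eps: float) -> list[float]:
    """Classical Hegselmann-Krause update: average all opinions within distance eps."""
    out = []
    for xi in x:
        neigh = [xj for xj in x if abs(xi - xj) <= eps]
        out.append(sum(neigh) / len(neigh))
    return out


def gl_coeffs(alpha: float, n: int) -> list[float]:
    """Grünwald-Letnikov weights c_j = (-1)^j * binom(alpha, j) for j = 0..n."""
    c = [1.0]
    for j in range(1, n + 1):
        c.append(c[-1] * (1 - (alpha + 1) / j))
    return c


def fractional_hk(x0: list[float], eps: float, alpha: float, steps: int) -> list[list[float]]:
    """Assumed model: sum_{j=0}^{k+1} c_j x(k+1-j) = HK(x(k)) - x(k)."""
    hist = [list(x0)]
    c = gl_coeffs(alpha, steps)
    for k in range(steps):
        drift = [h - xi for h, xi in zip(hk_step(hist[k], eps), hist[k])]
        nxt = []
        for i in range(len(x0)):
            # Memory term: weighted sum over the whole trajectory x(k), ..., x(0).
            memory = sum(c[j] * hist[k + 1 - j][i] for j in range(1, k + 2))
            nxt.append(drift[i] - memory)
        hist.append(nxt)
    return hist
```

For alpha = 1 the weights are c_1 = -1 and c_j = 0 for j >= 2, so the update reduces to x(k+1) = HK(x(k)), which gives a quick sanity check of the construction.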

Article 75

Title@2025-06-05 (4): Gen-n-Val: Agentic Image Data Generation and Validation

Title: Gen-n-Val: Agentic Image Data Generation and Validation

2506.04676v1

Authors (5): Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen

Recently, Large Language Models (LLMs) and Vision Large Language Models (VLLMs) have demonstrated impressive performance as agents across various tasks, while data scarcity and label noise remain significant challenges in computer vision tasks such as object detection and instance segmentation. A common solution to these issues is to generate synthetic data. However, current synthetic data generation methods struggle with issues such as multiple objects per mask, inaccurate segmentation, and incorrect category labels, which limit their effectiveness. To address these issues, we introduce Gen-n-Val, a novel agentic data generation framework that leverages Layer Diffusion (LD), LLMs, and VLLMs to produce high-quality, single-object masks and diverse backgrounds. Gen-n-Val consists of two agents: (1) the LD prompt agent, an LLM, optimizes prompts for LD to generate high-quality foreground instance images and segmentation masks. These optimized prompts ensure the generation of single-object synthetic data with precise instance masks and clean backgrounds. (2) The data validation agent, a VLLM, filters out low-quality synthetic instance images. The system prompts for both agents are refined through TextGrad. Additionally, we use image harmonization to combine multiple instances within scenes. Compared to state-of-the-art synthetic data approaches like MosaicFusion, our approach reduces invalid synthetic data from 50% to 7% and improves performance by 1% mAP on rare classes in COCO instance segmentation with YOLOv9c and YOLO11m. Furthermore, Gen-n-Val shows significant improvements (7.1% mAP) over YOLO-Worldv2-M in open-vocabulary object detection benchmarks with YOLO11m. Moreover, Gen-n-Val improves the performance of the YOLOv9 and YOLO11 families in instance segmentation and object detection.
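
In Gen-n-Val, filtering is done by a VLLM validation agent. A cheap rule-based pre-filter for the "multiple objects per mask" failure mode, added here as an assumption and not part of Gen-n-Val as described, is to count connected foreground regions in each binary mask and keep only masks with exactly one. The sketch uses 4-connectivity and breadth-first search, and the function names are hypothetical.

```python
from collections import deque


def count_components(mask: list[list[int]]) -> int:
    """Count 4-connected foreground regions in a binary mask using BFS."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                count += 1  # found an unvisited region; flood-fill it
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def is_single_object(mask: list[list[int]]) -> bool:
    """Keep a synthetic instance only if its mask contains exactly one region."""
    return count_components(mask) == 1
```

A rule like this cannot judge semantic quality or label correctness, so it would at most complement a learned validator, not replace one.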

Article 76

Title@2025-06-05 (4): From Intention To Implementation: Automating Biomedical Research via LLMs

Title: From Intention To Implementation: Automating Biomedical Research via LLMs

2412.09429v4

Authors (7): Yi Luo, Linghang Shi, Yihao Li, Aobo Zhuang, Yeyun Gong, Ling Liu, Chen Lin

Conventional biomedical research is increasingly labor-intensive due to the exponential growth of scientific literature and datasets. Artificial intelligence (AI), particularly Large Language Models (LLMs), has the potential to revolutionize this process by automating various steps. Still, significant challenges remain, including the need for multidisciplinary expertise, logical rigor in experimental design, and performance measurement. This paper introduces BioResearcher, the first end-to-end automated system designed to streamline the entire biomedical research process involving dry lab experiments. BioResearcher employs a modular multi-agent architecture, integrating specialized agents for search, literature processing, experimental design, and programming. By decomposing complex tasks into logically related sub-tasks and utilizing a hierarchical learning approach, BioResearcher effectively addresses the challenges of multidisciplinary requirements and logical complexity. Furthermore, BioResearcher incorporates an LLM-based reviewer for in-process quality control and introduces novel evaluation metrics to assess the quality and automation of experimental protocols. BioResearcher achieves an average execution success rate of 63.07% across eight previously unmet research objectives. The generated protocols, on average, outperform typical agent systems by 22.0% on five quality metrics. The system demonstrates significant potential to reduce researchers’ workloads and accelerate biomedical discoveries, paving the way for future innovations in automated research systems.

Article 77

Title@2025-06-05 (4): Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control

Title: Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control

2411.13983v4

Authors (4): Hansung Kim, Edward L. Zhu, Chang Seok Lim, Francesco Borrelli

We introduce Implicit Game-Theoretic MPC (IGT-MPC), a decentralized algorithm for two-agent motion planning that uses a learned value function, predicting the outcome of game-theoretic interactions, as the terminal cost-to-go function in a model predictive control (MPC) framework. This guides agents to implicitly account for interactions with other agents and maximize their reward. The approach applies to competitive and cooperative multi-agent motion planning problems, which we formulate as constrained dynamic games. Given a constrained dynamic game, we randomly sample initial conditions and solve for the generalized Nash equilibrium (GNE) to generate a dataset of GNE solutions, computing the reward outcome of each game-theoretic interaction from the GNE. The data is used to train a simple neural network to predict the reward outcome, which we use as the terminal cost-to-go function in an MPC scheme. We showcase emerging competitive and coordinated behaviors using IGT-MPC in scenarios such as two-vehicle head-to-head racing and unsignalized intersection navigation. IGT-MPC offers a novel method integrating machine learning and game-theoretic reasoning into model-based decentralized multi-agent motion planning.
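
The core mechanism here is an MPC whose terminal cost-to-go is supplied by a learned function. The sketch below plugs such a function into a deliberately simple MPC: a single 1D double integrator, a small discrete action set, and exhaustive search over action sequences. The `terminal_cost` callable stands in for the trained network, which would predict something like a negated reward outcome. The dynamics, the action set, and the quadratic stand-in used in the example are assumptions. The sketch does not model the second agent or the GNE data generation.

```python
import itertools
from typing import Callable, Sequence

State = tuple[float, float]  # (position, velocity)


def step(s: State, a: float, dt: float = 0.1) -> State:
    """Discrete-time double integrator."""
    p, v = s
    return (p + v * dt, v + a * dt)


def mpc_action(
    s0: State,
    actions: Sequence[float],
    horizon: int,
    stage_cost: Callable[[State, float], float],
    terminal_cost: Callable[[State], float],
) -> float:
    """Return the first action of the lowest-cost sequence (exhaustive search)."""
    best_cost, best_first = float("inf"), actions[0]
    for seq in itertools.product(actions, repeat=horizon):
        s, cost = s0, 0.0
        for a in seq:
            cost += stage_cost(s, a)
            s = step(s, a)
        cost += terminal_cost(s)  # a learned cost-to-go would be evaluated here
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first
```

For example, `mpc_action((0.0, 0.0), (-1.0, 0.0, 1.0), 3, lambda s, a: 0.01 * a * a, lambda s: (s[0] - 1.0) ** 2)` accelerates toward position 1. Only the first action is applied before re-planning, as in standard receding-horizon control. Exhaustive search is used only to keep the example short; a real implementation would use a numerical optimizer.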

Article 78

Title@2025-06-05 (4): From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

Title: From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

2506.04565v1

Authors (3): Jiayi Chen, Junyi Ye, Guiling Wang

Compound AI Systems (CAIS) is an emerging paradigm that integrates large language models (LLMs) with external components, such as retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented, lacking a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and orchestration-centric architectures. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges (including scalability, interoperability, benchmarking, and coordination) and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.

Article 0

Title@2025-06-12 (4): AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Article 1

Title@2025-06-12 (4): Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Article 2

Title@2025-06-12 (4): AI Agent Behavioral Science

Article 3

Title@2025-06-12 (4): AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Article 4

Title@2025-06-12 (4): Nonconvex Game and Multi Agent Reinforcement Learning for Zonal Ancillary Markets

Article 5

Title@2025-06-12 (4): MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

Article 6

Title@2025-06-12 (4): CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models

Article 7

Title@2025-06-12 (4): The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

Article 8

Title@2025-06-12 (4): A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon

Article 9

Title@2025-06-12 (4): The Optimization Paradox in Clinical AI Multi-Agent Systems

Article 10

Title@2025-06-11 (3): Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

Article 11

Title@2025-06-11 (3): DAWN: Designing Distributed Agents in a Worldwide Network

Article 12

Title@2025-06-11 (3): Delegations as Adaptive Representation Patterns: Rethinking Influence in Liquid Democracy

Article 13

Title@2025-06-11 (3): Incentive-based Platoon Formation: Optimizing the Personal Benefit for Drivers

Article 14

Title@2025-06-11 (3): Effective Red-Teaming of Policy-Adherent Agents

Article 15

Title@2025-06-11 (3): Large Language Models Miss the Multi-Agent Mark

Article 16

Title@2025-06-11 (3): Reciprocity as the Foundational Substrate of Society: How Reciprocal Dynamics Scale into Social Systems

Article 17

Title@2025-06-11 (3): ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Article 18

Title@2025-06-11 (3): When Is Diversity Rewarded in Cooperative Multi-Agent Learning?

Article 19

Title@2025-06-11 (3): A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

Article 20

Title@2025-06-11 (3): Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

Article 21

Title@2025-06-11 (3): MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

Article 22

Title@2025-06-11 (3): Intelligent System of Emergent Knowledge: A Coordination Fabric for Billions of Minds

Article 23

Title@2025-06-11 (3): Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

Article 24

Title@2025-06-10 (2): Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

Article 25

Title@2025-06-10 (2): A Replica for our Democracies? On Using Digital Twins to Enhance Deliberative Democracy

Article 26

Title@2025-06-10 (2): Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms

Article 27

Title@2025-06-10 (2): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

Article 28

Title@2025-06-10 (2): Confidence Boosts Trust-Based Resilience in Cooperative Multi-Robot Systems

Article 29

Title@2025-06-10 (2): FREIDA: A Framework for developing quantitative agent based models based on qualitative expert knowledge

Article 30

Title@2025-06-10 (2): FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

Article 31

Title@2025-06-09 (1): Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder Therapy

Article 32

Title@2025-06-09 (1): Innate-Values-driven Reinforcement Learning based Cooperative Multi-Agent Cognitive Modeling

Article 33

Title@2025-06-09 (1): Intelligent Offloading in Vehicular Edge Computing: A Comprehensive Review of Deep Reinforcement Learning Approaches and Architectures

Article 34

Title@2025-06-09 (1): Diffusion of Responsibility in Collective Decision Making

Article 35

Title@2025-06-09 (1): Agent Semantics, Semantic Spacetime, and Graphical Reasoning

Article 36

Title@2025-06-09 (1): Deep Equivariant Multi-Agent Control Barrier Functions

Article 37

Title@2025-06-09 (1): WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

Article 38

Title@2025-06-09 (1): Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Article 39

Title@2025-06-09 (1): Multi-agent Architecture Search via Agentic Supernet