cs.MA @ 2025-06-01: 092
-
00 05-29 (4) ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork ROTATE: Bedauern-getriebenes Open-End-Training für Ad-Hoc-Teamwork 对特设团队工作不限成员名额培训的遗憾驱动的不限名额培训 2505.23686v1 -
01 05-29 Collaborative Last-Mile Delivery: A Multi-Platform Vehicle Routing Problem With En-route Charging Collaborative Last-Mile Lieferung: Ein Multi-Platform Fahrzeug Routing Problem mit en-route Laden 合作性最后一式交付:多平台车辆运行问题与连路充电 2505.23584v1 -
02 05-29 MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs MegaAgent: Ein autonomes LLM-basiertes Multi-Agent-System ohne vordefinierte SOPs 大型机构:一个以大型自治LLM为基础的没有预先界定的SOP的多机构系统 2408.09955v3 -
03 05-29 Agentic Knowledgeable Self-awareness Agentisch sachkundiges Selbstbewußtsein A. 动态知识自觉意识 2504.03553v2 -
04 05-29 The challenge of hidden gifts in multi-agent reinforcement learning Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen 多试剂强化学习中隐藏礼品的挑战 2505.20579v2 -
05 05-29 Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems Verständnis der Informationsverbreitungseffekte von Kommunikationstopologien in LLM-basierten Multi-Agent-Systemen 了解基于LLOM的多机构机构系统中的通信地形对信息传播的影响 2505.23352v1 -
06 05-29 Emergent social conventions and collective bias in LLM populations Emergente soziale Konventionen und kollektive Voreingenommenheit in LLM-Populationen 新出现的社会习俗和LLM人口的集体偏见 2410.08948v2 -
07 05-29 Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game Sprachagenten mit Verstärkung Lernen für strategisches Spiel im Werwolf Spiel 在狼人游戏中进行战略游戏强化学习的语文代理 2310.18940v4 -
08 05-29 Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration Erfahrungsübergreifendes Lernen auf LLM-basierter Multi-Agent-Kollaboration 关于基于LLM的多机构合作的跨任务跨任务经验学习 2505.23187v1 -
09 05-29 Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems Topologisches Strukturlernen sollte eine Forschungspriorität für LLM-basierte Multi-Agent-Systeme sein 地形结构学习应成为以LLM为基础的多种机构系统的研究重点 2505.22467v2 -
10 05-29 MedRAX: Medical Reasoning Agent for Chest X-ray MedRAX: Medizinischer Reasoning Agent für Bruströntgen MedraX: 胸前X光医疗理疗代理 2502.02673v2 -
11 05-29 Learning Recommender Mechanisms for Bayesian Stochastic Games Lern-Empfänger-Mechanismen für Bayesian Stochastic Games 贝耶斯沙沙运动会学习建议机制 2505.22979v1 -
12 05-29 MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming MermaidFlow: Neudefinition der agentischen Workflow-Generierung durch sicherheitsbeschränkte evolutionäre Programmierung 美人鱼:通过受安全限制的进化方案拟订,重新确定干燥性工作流的产生 2505.22967v1 -
13 05-28 (3) A Large Language Model-Enabled Control Architecture for Dynamic Resource Capability Exploration in Multi-Agent Manufacturing Systems Eine großsprachige modellfähige Steuerungsarchitektur für dynamische Ressourcenkapazitäts-Exploration in Multi-Agent-Produktionssystemen 多机构制造系统动态资源能力探索大语言模型化控制结构 2505.22814v1 -
14 05-28 Dynamic Task Adaptation for Multi-Robot Manufacturing Systems with Large Language Models Dynamische Aufgabenanpassung für Multi-Roboter-Produktionssysteme mit großen Sprachmodellen 具有大语言模型的多机器人制造系统动态任务适应 2505.22804v1 -
15 05-28 Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields Verbesserung der lebensbegleitenden multi-agenten Path-Finding durch den Einsatz künstlicher Potenzialfelder 利用人造潜在潜力领域加强终身多机构探索 2505.22753v1 -
16 05-28 A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control Ein neuartiges Null-Vertrauens-Identitäts-Framework für Agentische KI: Dezentrale Authentisierung und feinkörnige Zugriffskontrolle AI:分散认证和精密访问控制 2505.19301v2 -
17 05-28 HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym HDDLGym: Ein Tool zum Studieren multi-agenter Hierarchischer Probleme, definiert in HDDL mit OpenAI Gym HDDLGym: 与 OpenAI Gym 一起研究在HDDL 中界定的多代理等级问题的工具 2505.22597v1 -
18 05-28 SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement Synworld: 用于改进制剂行动知识的虚拟情景合成 2504.03561v2 -
19 05-28 From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation Von Fremdlingen zu Assistenten: Schnelles Wunsch-Ausrichtung für eingedickte Agent-User-Anpassung 从陌生人到助理:对装装配剂用户适应的快速理想调整 2505.22503v1 -
20 05-28 OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization OptiMindTune: Multi-Agenten-Framework für intelligente Hyperparameter-Optimierung OptiMindTunne: 智能超参数优化的多机构框架 2505.19205v2 -
21 05-28 The Complexity of Pure Strategy Relevant Equilibria in Concurrent Games Die Komplexität der reinen Strategie Relevante Equilibria in Parallelspielen 同时运动会中纯粹战略相关平衡的复杂性 2505.07501v2 -
22 05-28 Voice CMS: updating the knowledge base of a digital assistant through conversation Voice CMS: Aktualisierung der Wissensbasis eines digitalen Assistenten durch Konversation 语音CMS:通过对话更新数字助理的知识库 2505.22303v1 -
23 05-28 Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration 利用语言代理框架中的双重进程理论促进实时同时人类-AI合作 2502.11882v5 -
24 05-28 Efficient Leave-one-out Approximation in LLM Multi-agent Debate Based on Introspection Effiziente Ein-Aus-Annäherung in der LLM-Multiagenten-Debatte auf der Grundlage von Introspektion 以内审为基础的多机构辩论 2505.22192v1 -
25 05-28 Online Fair Division for Personalized $2$-Value Instances Online Fair Division für Personalisierte $2$-Value Instances 个人个人价值2美元-价值实例在线网上交易会司 2505.22174v1 -
26 05-28 Sentiment Simulation using Generative AI Agents Sentiment-Simulation mit generativen KI-Agenten 使用 “ 产生AI “ 制剂模拟情感 2505.22125v1 -
27 05-28 Benchmarking LLMs’ Swarm intelligence Benchmarking der Swarm-Intelligenz der LLM 基准确定LLLMs的Swarm情报 2505.04364v3 -
28 05-28 AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation AudioGenie: Ein trainingsfreies Multi-Agent-Framework für die vielfältige Multimodalität-zu-Multiaudio-Generierung AudioGenie:多元化多式联运到多民族一代的无培训多机会多机会框架 2505.22053v1 -
29 05-28 Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning Reward-independent Messaging für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen 权力下放多机构加强学习分权式多机构加强学习的回报独立通信 2505.21985v1 -
30 05-28 Preference-CFR$:$ Beyond Nash Equilibrium for Better Game Strategies Präferenz-CFR$:$ Jenseits von Nash Equilibrium für bessere Spielstrategien 普特-CFR$ =: Nash 后平衡促进更好游戏战略的美元 2411.01217v2 -
31 05-28 Properties of zero-determinant strategies in multichannel games Eigenschaften von Zero-Determinant-Strategien in Multichannel-Spielen 多频道游戏零决定策略属性 2505.21952v1 -
32 05-28 Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development Co-Saving: Ressourcenschonende Multi-Agenten-Kollaboration für Software-Entwicklung 共同节省:为开发软件进行有意识的资源、多机构协作 2505.21898v1 -
33 05-28 Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation Einschließlich LLMs für großräumige Urban Complex Mobility Simulation 大型城市综合流动模拟项目LLMs 2505.21880v1 -
34 05-27 (2) Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation Optimale Output-Feedback-Lernsteuerung für diskrete Zeit lineare quadratische Regulierung 用于分立时线性二次曲线调控的最佳输出反馈学习控制 2503.06226v3 -
35 05-27 Empowering Scientific Workflows with Federated Agents Stärkung wissenschaftlicher Workflows mit Federated Agents 赋予联邦药剂部门科学工作流程权能 2505.05428v2 -
36 05-27 AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models AI-unterstützte Plattform für Systemüberwachung und Entscheidungsfindung in der Entsorgung nuklearer Abfälle mit großen Sprachmodellen AI-支持的具有大语言模式的核废物管理系统监测和决策平台 2505.21741v1 -
37 05-27 Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks Kommunikation- und Computation-Effizient verteilte Submodulare Optimierung in Robot Mesh-Netzwerken 机器人网网中的通信和计算-有效分布式子模块优化 2407.10382v3 -
38 05-27 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper2Poster: Auf dem Weg zur multimodalen Plakatautomatisierung aus wissenschaftlichen Papieren Paper2Poster:从科学论文中走向多式海报自动化 2505.21497v1 -
39 05-27 Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge Agentisches medizinisches Wissen Grafiken verbessern medizinische Frageantworten: Die Lücke zwischen LLMs und sich entwickelndem medizinischem Wissen überbrücken 药用知识图加强医疗问题的回答:缩小LLMM与不断发展的医学知识之间的差距 2502.13010v2 -
40 05-27 Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks Individuelles Verhalten in agentenbasierten Modellen mit Graph Diffusionsnetzwerken lernen 具有图表传播网络的基于代理模型的学习个人行为 2505.21426v1 -
41 05-27 Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery Autonome Multi-Modal LLM-Agenten für die Behandlungsplanung in fokussierter Ultraschallablationschirurgie 重点超声速超声振动外科手术治疗规划代理 2505.21418v1 -
42 05-27 Sequential Resource Trading Using Comparison-Based Gradient Estimation Sequentieller Ressourcenhandel mit Vergleichsbasis-Gradientenschätzung 使用基于比较的逐步梯度估计法进行按顺序进行的资源贸易 2408.11186v3 -
43 05-27 PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning PeerGuard: Verteidigen von Multi-Agenten-Systemen gegen Hintertürangriffe durch gegenseitige Vernunft 同伴保护:捍卫多机构系统,防止通过相互理由进行后门攻击 2505.11642v2 -
44 05-27 Large Language Models Miss the Multi-Agent Mark Große Sprachmodelle vermissen das Multi-Agent Mark 大语言模型 2505.21298v1 -
45 05-27 Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies Breaking the Performance Ceiling in komplexen Verstärkungs-Lernen erfordert Inferenz-Strategien 综合加强学习中业绩上限的打破需要推断战略 2505.21236v1 -
46 05-27 Voting or Consensus? Decision-Making in Multi-Agent Debate Abstimmung oder Konsens? Entscheidungsfindung in Multi-Agent-Debatte 表决还是协商一致?多机构辩论中的决策 2502.19130v2 -
47 05-27 GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender Simulation GGBond: Wachsende Graphen-basierte KI-Agenten-Gesellschaft für sozial-aware-Empfänger-Simulation GGBond: 不断增长的基于图表的AI-Agent Society 社会软件建议模拟模拟软件 2505.21154v1 -
48 05-27 Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games Stoppen von Kriterien für die Wert-Iteration bei gleichzeitigen stochastischen Erreichbarkeits- und Sicherheitsspielen 停止同时举行存储可达性和安全运动会的价值迭代标准 2505.21087v1 -
49 05-27 Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems Herdverhalten: Untersuchung des Peer-Einflusses in LLM-basierte Multi-Agent-Systeme 牧民行为:调查基于LLM的多机构机构系统中的同侪影响 2505.21588v1 -
50 05-27 Improving flocking behaviors in street networks with vision Verbesserung des Beflockungsverhaltens in Straßennetzen mit Vision 改善街头网络中有远见的群众行为 2505.21585v1 -
51 05-27 Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective Multi-Agenten-Weltmodellierung aus einer diffusionsinspirierten Perspektive Revue passieren 从传播启发的视角重新审视多股权世界建模 2505.20922v1 -
52 05-27 Generalized Coordination of Partially Cooperative Urban Traffic Generalisierte Koordinierung des teilweise kooperativen Stadtverkehrs 部分合作城市交通协调 2505.20879v1 -
53 05-27 MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems MedSentry: Sicherheitsrisiken in medizinischen LLM-Multiagentensystemen verstehen und mindern MedSentry:了解和减轻医疗LLM多机构系统中的安全风险 2505.20824v1 -
54 05-27 Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System Viele Köpfe sind besser als eins: Verbesserte wissenschaftliche Idee-Generation durch ein LLM-basiertes Multi-Agent-System 许多领导人比一个领导人好得多:由以LLM为基础的多种机构系统改进科学思想的一代 2410.09403v4 -
55 05-27 ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning ReMA: Meta-Denken lernen für LLMs mit Multi-Agenten-Verstärkungs-Lernen ReMA:学习多机构强化学习的LLMLM的元思维 2503.09501v3 -
56 05-27 JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes JaxRobotarium: Schulung und Einsatz von Multi-Roboter-Politik in 10 Minuten JaxRobotior:10分钟内培训和部署多机器人政策 2505.06771v2 -
57 05-26 (1) xChemAgents: Agentic AI for Explainable Quantum Chemistry xChemAgenten: Agentische KI für erklärbare Quantenchemie xchemAgents: 可解释量子化学的AAA剂 2505.20574v1 -
58 05-26 Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework Straffung des Resilients Kubernetes Autoscaling mit Multi-Agent Systemen über ein automatisiertes Online-Design-Framework 通过自动在线设计框架与多机构系统自动调整 2505.21559v1 -
59 05-26 Reconceptualizing Smart Microscopy: From Data Collection to Knowledge Creation by Multi-Agent Integration Intelligente Mikroskopie neu konzipieren: Von der Datenerhebung zur Wissenserstellung durch Multi-Agent-Integration 重新概念化智能微镜:从数据收集到通过多机构整合创造知识 2505.20466v1 -
60 05-26 Sable: a Performant, Efficient and Scalable Sequence Model for MARL Sable: ein leistungsfähiges, effizientes und skalierbares Sequenzmodell für MARL 电缆:MARL的性能、高效和可缩放序列模型 2410.01706v5 -
61 05-26 Federated Domain Generalization with Data-free On-server Matching Gradient Föderierte Domain-Verallgemeinerung mit datenfreiem On-Server-Zustimmungs-Gradient 具有无数据观测站上与渐变匹配的无数据观测器的联邦通用域 2501.14653v2 -
62 05-26 Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning Semantic-Aware Ressourcenmanagement für C-V2X Platooning über Multi-Agent Verstärkungslernen 通过多机构强化学习进行 C-V2X 等离子处理的语义软件资源管理 2411.04672v2 -
63 05-26 Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications Multi-Agenten-Verstärkung Lernen in Cybersicherheit: Von Grundlagen zu Anwendungen 网络安全多机构强化多机构网络安全学习:从基础到应用 2505.19837v1 -
64 05-26 Fast and Robust Flocking of Protesters on Street Networks Schnelles und robustes Auspeitschen von Protestierenden auf Straßennetzen 街头网络上抗争者快速和强力封锁 2406.01101v3 -
65 05-26 Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning Adaptive Anpassung der Episodenlänge für das Multi-Agenten-Verstärkungs-Lernen 多试剂强化学习的适应性分单元长度调整 2505.19637v1 -
66 05-26 Multi-Agent Collaboration via Evolving Orchestration Multi-Agenten-Zusammenarbeit über Evolving Orchestration 通过不断演变的管弦化多机构协作 2505.19591v1 -
67 05-26 LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer LLM-Agent-Controller: Ein universelles Multi-Agent-Großsprachmodellsystem als Steuerungsingenieur LLM-代理主计长:作为控制工程师的通用多代理大型语文示范系统 2505.19567v1 -
68 05-26 DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients DoctorRAG: Medizinische RAG Durch Textabstufungen Wissen mit Patient Analogie fusionieren 医生RAG:通过文字梯度将医学RAG知识与病人分析知识与病人分析相融合 2505.19538v1 -
69 05-26 VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning VLMLight: Verkehrssignalsteuerung über Vision-Language Meta-Control und Dual-Branch-Reasoning VLMLight:通过视觉语言、超控制和双层理由解释控制交通信号控制 2505.19486v1 -
70 05-26 Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs Gewinnen Sie schnell oder verlieren Sie langsam: Ausgleichende Geschwindigkeit und Genauigkeit in Latenz-Sensitive Entscheidungen von LLMs 慢赢或慢输:LLMs的延缓敏感决定中平衡速度和准确性 2505.19481v1 -
71 05-25 (7) Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning Teambildung und Beeinflussung von Agenten: Entscheidungsbäume effizient koordinieren für interpretierbares Mehr-Agenten-Verstärkungs-Lernen 建立团队和对代理人产生影响的代理:高效协调可解释的多机构强化学习决策树 2505.19316v1 -
72 05-25 Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes Agentische Informationstheorie: Ergodikität und Intrinsische Semantik von Informationsprozessen 代理信息理论:信息过程的分化和内在的语义 2505.19275v1 -
73 05-25 GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling GUARDIAN: LLM-Multiagent-Kollaborationen mit zeitlicher Graphenmodellierung sichern GUARDIAN: 保护LLM 多机构协作与时间图建模 2505.19234v1 -
74 05-25 Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v1 -
75 05-25 Collaborative Agentic AI Needs Interoperability Across Ecosystems Kollaborative Agentische KI braucht Interoperabilität über Ökosysteme hinweg AI 需要跨生态系统的互操作性 2505.21550v1 -
76 05-25 Interacting Large Language Model Agents. Interpretable Models and Social Learning Interagieren von Large Language Model Agents. Interpretierbare Modelle und soziales Lernen 跨大语言示范工具、可解释模型和社会学习 2411.01271v2 -
77 05-25 Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management Adversarial Bandit über Bandits: Hierarchische Bandits für Online-Konfigurationsmanagement 反强盗强盗: 用于在线配置管理的等级强盗 2505.19061v1 -
78 05-25 Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments Adaptive Schlussfolgerung durch Bayesische und Inverse Bayesische Schlussfolgerung mit Symmetrie-Bias in nichtstationären Umgebungen 在非静止环境中,通过贝耶斯和反贝耶斯和反贝耶斯的同对称-比亚推理,进行适应性推理 2505.12796v3 -
79 05-25 SANNet: A Semantic-Aware Agentic AI Networking Framework for Multi-Agent Cross-Layer Coordination SANNet: Ein Semantic-Aware Agentic AI Networking Framework für die multi-agente Cross-Layer-Koordination SANNet: 多代理人跨行业协调的语义学-敏感物义学AI联网框架 2505.18946v1 -
80 05-24 (6) Distributed Set-membership Filtering Frameworks For Multi-agent Systems With Absolute and Relative Measurements Distributed Set-Membership Filtering Frameworks für Multi-Agent-Systeme mit absoluten und relativen Messungen 具有绝对和相对计量的多试剂系统分布式成员筛选框架 2305.15797v2 -
81 05-24 Coordinated guidance and control for multiple parafoil system landing Koordinierte Führung und Steuerung für die Landung mehrerer Parafoil-Systeme 协调制导和管制多个抛油系统着陆的协调制导和控制 2505.18691v1 -
82 05-24 Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi Erweiterung des Aktionsraums mit Konventionen zur Verbesserung der Multi-Agenten-Kooperation in Hanabi 与公约扩大行动空间,以改进哈纳比多剂合作 2412.06333v3 -
83 05-24 DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation DDO: Dual-Decision-Optimierung durch Multi-Agent-Kollaboration für LLM-basierte medizinische Beratung DDO:通过多方机构协作,优化基于LLM的医疗咨询的双重决定 2505.18630v1 -
84 05-24 An Identity Based Agent Model for Value Alignment Ein identitätsbasiertes Agentenmodell für die Wertausrichtung 基于身份的保值调整代理模型 2401.12159v4 -
85 05-24 MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations MisoDICE: Multi-Agent-Imitation aus nicht gekennzeichneten Mixed-Quality-Demonstrationen MisoDICE:从未贴标签的混合质量示范中多机构吸收 2505.18595v1 -
86 05-24 MASTER: Multi-Agent Security Through Exploration of Roles and Topological Structures – A Comprehensive Framework MASTER: Multi-Agent Sicherheit durch Erforschung von Rollen und topologischen Strukturen – Ein umfassender Rahmen 通过探索作用和地形结构实现多机构安全 – – 综合框架 2505.18572v1 -
87 05-24 MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs MRGAgents: Multi-Agenten-Rahmen für verbesserte medizinische Report-Generation mit Med-LVLMs MRGGGGss: 采用医疗低水平医疗报告制改进医疗报告制的多机构框架 2505.18530v1 -
88 05-24 Group Trip Planning Query Problem with Multimodal Journey Gruppenreiseplanungs-Abfrage-Problem mit multimodaler Reise 具有多模式旅程的问询问题 2502.03144v2 -
89 05-24 TextArena TextArena TextArenna 文本 2504.11442v2 -
90 05-24 EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks EdgeAgentX: Ein neuartiges Framework für Agentische KI am Rand in militärischen Kommunikationsnetzwerken EdgeAgengengenderX:军事通信网络边缘地带AAA剂性AI新框架 2505.18457v1 -
91 05-24 Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methoden für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen 分散式多机构强化学习的深神经立体-集中式多机构强化学习方法中全球最佳程度趋同 2505.18433v1
Article 0
Title@2025-05-29 (4): ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Title: ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork | ROTATE: Bedauern-getriebenes Open-End-Training für Ad-Hoc-Teamwork | 对特设团队工作不限成员名额培训的遗憾驱动的不限名额培训 2505.23686v1 |
Authors: Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone
Developing AI agents capable of collaborating with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches typically adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammate pools with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the distribution of training teammates. In this paper, we present a unified framework for AHT by reformulating the problem as an open-ended learning process between an ad hoc agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Extensive experiments across diverse AHT environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
在多试剂学习(称为特设团队工作(AHT)中,发展能够与先前的隐蔽伙伴合作的AI代理机构是一项基本的概括性挑战。 现有的AHT方法通常采用两阶段管道,第一,固定的队友组成,其想法是他们应当代表部署时将看到的队友,第二,AHT代理机构受过培训,能够与人口中的代理人进行良好合作。到目前为止,研究界一直侧重于为每个阶段设计不同的算法。这种分离导致了一种算法,这种算法产生了团队间集合,对可能的行为的覆盖面有限,而且忽视了所形成的队友是否容易向AHT代理机构学习。此外,培训AHT代理机构的算法通常将一组培训队友视为静态,从而试图向以前不为人所知的伙伴代理人推广,而没有对培训队的分布实行任何控制。在本文中,我们提出了一个AHT的统一框架,作为特设代理和对抗性团队之间开放式学习过程。我们引入了对AHATT机构进行简单性评估的方法,因此,对AOT公司进行一个公开的实验室,对ABATT团队进行多样化的升级。
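The alternation described in the abstract (improve the AHT agent, then generate teammates that probe its deficiencies) can be illustrated with a toy loop. This is only a sketch under strong assumptions, not the authors' algorithm: teammates are plain integer ids, the agent's competence is a single number per teammate, `BEST_RETURN`, the `step` size, and the growing teammate pool are all hypothetical, and "training" is replaced by closing a fixed fraction of the regret gap.

```python
# Toy sketch of a regret-driven open-ended loop (illustrative only,
# not the ROTATE implementation). All names and constants are assumptions.
from typing import Dict, List, Tuple

BEST_RETURN = 1.0  # assumed return of a best response to any teammate


def regret(skill: Dict[int, float], teammate: int) -> float:
    """Regret = best-response return minus the agent's current return."""
    return BEST_RETURN - skill.get(teammate, 0.0)


def generate_teammate(skill: Dict[int, float], candidates: List[int]) -> int:
    """Adversarial generator: pick the candidate the agent handles worst."""
    return max(candidates, key=lambda t: regret(skill, t))


def improve_agent(skill: Dict[int, float], pool: List[int], step: float = 0.5) -> None:
    """Stand-in for training: close part of the regret gap on each pooled teammate."""
    for t in pool:
        skill[t] = skill.get(t, 0.0) + step * regret(skill, t)


def open_ended_training(candidates: List[int], iterations: int) -> Tuple[Dict[int, float], List[int]]:
    """Alternate between generating a high-regret teammate and improving the agent."""
    skill: Dict[int, float] = {}
    pool: List[int] = []
    for _ in range(iterations):
        t = generate_teammate(skill, candidates)
        if t not in pool:
            pool.append(t)
        improve_agent(skill, pool)
    return skill, pool
```

Calling `open_ended_training([0, 1, 2], 3)` adds one new teammate per iteration, because after each improvement step the untrained candidates carry the highest regret.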
Article 1
Title@2025-05-29 (4): Collaborative Last-Mile Delivery: A Multi-Platform Vehicle Routing Problem With En-route Charging
Title: Collaborative Last-Mile Delivery: A Multi-Platform Vehicle Routing Problem With En-route Charging | Collaborative Last-Mile Lieferung: Ein Multi-Platform Fahrzeug Routing Problem mit en-route Laden | 合作性最后一式交付:多平台车辆运行问题与连路充电 2505.23584v1 |
Authors: Sumbal Malik, Majid Khonji, Khaled Elbassioni, Jorge Dias
The rapid growth of e-commerce and the increasing demand for timely, cost-effective last-mile delivery have increased interest in collaborative logistics. This research introduces a novel collaborative synchronized multi-platform vehicle routing problem with drones and robots (VRP-DR), where a fleet of $\mathcal{M}$ trucks, $\mathcal{N}$ drones and $\mathcal{K}$ robots cooperatively delivers parcels. Trucks serve as mobile platforms, enabling the launching, retrieving, and en-route charging of drones and robots, thereby addressing critical limitations such as restricted payload capacities, limited range, and battery constraints. The VRP-DR incorporates five realistic features: (1) multi-visit service per trip; (2) multi-trip operations; (3) flexible docking, allowing returns to the same or different trucks; (4) cyclic and acyclic operations, enabling return to the same or different nodes; and (5) en-route charging, enabling drones and robots to recharge while being transported on the truck, maximizing operational efficiency by utilizing idle transit time. The VRP-DR is formulated as a mixed-integer linear program (MILP) to minimize both operational costs and makespan. To overcome the computational challenges of solving large-scale instances, a scalable heuristic algorithm, FINDER (Flexible INtegrated Delivery with Energy Recharge), is developed to provide efficient, near-optimal solutions. Numerical experiments across various instance sizes evaluate the performance of the MILP and heuristic approaches in terms of solution quality and computation time. The results demonstrate significant time savings of the combined delivery mode over the truck-only mode and substantial cost reductions from enabling multi-visits. The study also provides insights into the effects of en-route charging, docking flexibility, drone count, speed, and payload capacity on system performance.
电子商务的迅速增长和对及时、具有成本效益的最后一英里交货的需求不断增加,增加了对合作物流的兴趣。这一研究引入了新型的协同性多平台机动车辆航线问题,无人机和机器人(VRP-DR)都存在这种新型的多平台车辆航线问题。 在这样的车队中,由一辆卡车、一辆卡车和一辆汽车组成的多路卡车、$mathcal{N}美元和$mathcal{K}机器人,它们合作运送包裹。卡车作为移动平台,能够启动、回收和绕行收取无人机和机器人的费用,从而解决诸如有限有效载荷能力、有限射程和电池限制等关键限制。 VRP-DR包含五个现实的特征:(1) 每趟多路服务,(2)多路业务,(3)灵活的对接,允许返回相同或不同的卡车(4) 自行车和自行车作业,能够返回相同或不同的节点;以及(5) 定期收费,使无人机和机器人能够进行再补给,同时在卡车上运输,通过近距离的流运时间实现最大操作效率。
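Feature (5), en-route charging, can be made concrete with a small calculation. The sketch below is an assumption-laden illustration and not part of the paper's MILP or of FINDER: it assumes a constant charging power, a linear energy cost per kilometre, and hypothetical function and parameter names.

```python
# Minimal sketch of en-route charging arithmetic (hypothetical model:
# constant charge power, linear energy use per km).

def charge_en_route(level_wh: float, capacity_wh: float,
                    charge_rate_w: float, ride_minutes: float) -> float:
    """Battery level after riding on the truck, capped at capacity."""
    gained_wh = charge_rate_w * ride_minutes / 60.0
    return min(capacity_wh, level_wh + gained_wh)


def can_fly(level_wh: float, sortie_km: float, wh_per_km: float) -> bool:
    """A sortie is feasible if the battery covers its energy demand."""
    return level_wh >= sortie_km * wh_per_km
```

Under these assumptions, a drone that docks at 20 Wh and rides for 30 minutes at 120 W leaves the truck at 80 Wh, which turns an infeasible 5 km sortie at 10 Wh/km into a feasible one. This is the sense in which idle transit time is put to use.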
Article 2
Title@2025-05-29 (4): MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs
Title: MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs | MegaAgent: Ein autonomes LLM-basiertes Multi-Agent-System ohne vordefinierte SOPs | 大型机构:一个以大型自治LLM为基础的没有预先界定的SOP的多机构系统 2408.09955v3 |
Authors: Qian Wang, Tianyu Wang, Zhenheng Tang, Qinbin Li, Nuo Chen, Jingsheng Liang, Bingsheng He
LLM-based multi-agent systems (MAS) have shown promise in tackling complex tasks. However, existing solutions often suffer from limited agent coordination and heavy reliance on predefined Standard Operating Procedures (SOPs), which demand extensive human input. To address these limitations, we propose MegaAgent, a large-scale autonomous LLM-based multi-agent system. MegaAgent generates agents based on task complexity and enables dynamic task decomposition, parallel execution, efficient communication, and comprehensive system monitoring of agents. In evaluations, MegaAgent demonstrates exceptional performance, successfully developing a Gobang game within 800 seconds and scaling up to 590 agents in a national policy simulation to generate multi-domain policies. It significantly outperforms existing systems, such as MetaGPT, in both task completion efficiency and scalability. By eliminating the need for predefined SOPs, MegaAgent demonstrates exceptional scalability and autonomy, setting a foundation for advancing true autonomy in MAS. Our code is available at https://github.com/Xtra-Computing/MegaAgent .
以LLM为主的多试剂系统(MAS)在应对复杂任务方面表现出了希望,然而,现有解决办法往往因代理人协调有限和严重依赖预先确定的标准作业程序(SOPs)而受到影响,这些程序需要大量的人力投入。为解决这些限制,我们建议MegaAgency,这是一个大型自主LMM多试剂系统。MegaAgency根据任务的复杂性产生代理人,使任务能够动态分解、平行执行、高效通信和对代理人的全面系统监测。在评价中,MegaAgency展示了杰出的业绩,在800秒内成功开发了Gobang游戏,并在国家政策模拟中推广到590个代理人,以产生多域政策。它在任务完成效率和可扩缩性方面大大优于MetaGPT等现有系统。通过消除对预先确定的SOPs的需要,MegaAgency展示了异常的可扩展性和自主性,为推进MAS的真正自治奠定了基础。我们的代码可在 https://github.com/Xtra-Computing/MegaAgent 查阅。
Article 3
Title@2025-05-29 (4): Agentic Knowledgeable Self-awareness
Title: Agentic Knowledgeable Self-awareness | Agentisch sachkundiges Selbstbewußtsein | A. 动态知识自觉意识 2504.03553v2 |
Authors: Shuofei Qiao, Zhisong Qiu, Baochang Ren, Xiaobin Wang, Xiangyuan Ru, Ningyu Zhang, Xiang Chen, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a “flood irrigation” methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness during decision-making: the ability to dynamically assess situational demands and strategically employ resources. To address this gap, we propose agentic knowledgeable self-awareness, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that equips agents with human-like knowledgeable self-awareness. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent’s self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.
大型语言模型(LLMS)在各种代理规划任务中取得了相当大的成绩,然而,传统代理规划方法采用了一种“洪水灌溉”方法,不加区别地将金轨、外部反馈和领域知识注入代理模型中,这种做法忽略了在决策过程中对情况自我认识的基本人类认知原则,即动态地评估形势需求和在决策中战略性地利用资源的能力。我们提出了一种具有代理知识的自我意识来解决这一差距的新模式,使以LLM为基础的代理能够自主地规范知识的利用。具体地说,我们提出了一种以数据为中心的方法,将具有了解情况的自我认识的代理人应用到像人类那样有知识的自我意识的代理人。具体地说,我们设计了一种超常状况判断标准,以标志该代理人收集培训数据的自我探索轨迹的特殊标志。通过两阶段的培训过程,该代理模型可以在不同情况之间转换,产生特定的特殊标志,以最低的成本实现最佳的规划效果。我们的实验表明,“了解自我”可以超越不同任务和模型上的各种强的基线,而很少使用外部知识。《准则》可在 https://gimb/commus.
Article 4
Title@2025-05-29 (4): The challenge of hidden gifts in multi-agent reinforcement learning
Title: The challenge of hidden gifts in multi-agent reinforcement learning | Die Herausforderung der versteckten Gaben in Multi-Agenten-Verstärkung Lernen | 多试剂强化学习中隐藏礼品的挑战 2505.20579v2 |
Authors: Dane Malenfant, Blake A. Richards
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key; thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness in independent agents can benefit these settings.
有时我们从其他人的行动中受益,即使我们不知道他们采取了这些行动。例如,如果邻居选择不在其家中时不在其家门前停泊,即使不知道他们采取了这一行动,也可以受益。这些“隐藏的礼物”代表了多试剂强化学习(MARL)的一个有趣的挑战,因为当其他人的有益行动被隐藏起来时,就分配信用是非三角的。在这里,我们研究隐藏的礼品的影响,任务很简单,MARL的任务非常简单。在这个任务中,网格世界环境中的代理商有单独的门可以打开,以获得个人报酬。同样,如果所有代理商都打开了他们的家门,他们也可以得到更大的集体奖赏。然而,所有这些“隐藏的礼物”只是当代理人在其他人的有益行动被隐藏起来的时候,集体奖赏才能得到。 值得注意的是,没有什么可以告诉代理商其他代理商已经放下了钥匙,因此,放弃他人的钥匙的行为就是“隐藏的礼物”。我们用不同的门打开门打开了自己的门来获得个人奖赏。同样,如果所有的代理商都打开他们的门门, 包括MAL 算算,那么,我们就能在他们自己学习了一个真正的历史任务中,我们如何在学习这些任务中,我们如何在学习这些任务中,我们是如何学习了。
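The reward structure of the key-and-doors task can be sketched without the grid world. The following is a simplified illustration, not the authors' environment: agents act in a fixed order, the reward magnitudes `INDIVIDUAL_REWARD` and `COLLECTIVE_REWARD` are assumed values, and each agent's behaviour is reduced to one boolean saying whether it drops the key after use.

```python
# Simplified sketch of the single-key, many-doors reward structure
# (assumed reward values and turn order; not the paper's environment).
from typing import List, Tuple

INDIVIDUAL_REWARD = 1.0   # assumed reward for unlocking one's own door
COLLECTIVE_REWARD = 10.0  # assumed bonus when every door is unlocked


def run_episode(drops_key: List[bool]) -> Tuple[List[float], int]:
    """Agents act in turn; agent i passes the key on only if drops_key[i]."""
    n = len(drops_key)
    rewards = [0.0] * n
    key_available = True
    opened = 0
    for i in range(n):
        if not key_available:
            break  # an earlier agent kept the key, so no later door opens
        rewards[i] += INDIVIDUAL_REWARD
        opened += 1
        key_available = drops_key[i]
    if opened == n:
        # The collective reward is paid to everyone only if all doors opened.
        rewards = [r + COLLECTIVE_REWARD for r in rewards]
    return rewards, opened
```

In this sketch the first agent earns its individual reward whether or not it drops the key, and later agents never observe the drop itself. The benefit of the gift shows up only through the delayed collective reward, which is what makes credit assignment hard.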
Article 5
Title@2025-05-29 (4): Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems
Title: Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems | Verständnis der Informationsverbreitungseffekte von Kommunikationstopologien in LLM-basierten Multi-Agent-Systemen | 了解基于LLM的多智能体系统中通信拓扑对信息传播的影响 2505.23352v1 |
Authors: Xu Shen, Yixin Liu, Yiwei Dai, Yili Wang, Rui Miao, Yue Tan, Shirui Pan, Xin Wang
The communication topology in large language model-based multi-agent systems fundamentally governs inter-agent collaboration patterns, critically shaping both the efficiency and effectiveness of collective decision-making. While recent studies on the automated design of communication topologies tend to construct sparse structures for efficiency, they often overlook why and when sparse and dense topologies help or hinder collaboration. In this paper, we present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity. Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance. Guided by this insight, we propose a novel topology design approach, EIB-Learner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs. Extensive experiments show the superior effectiveness, communication cost, and robustness of EIB-Learner.
基于大型语言模型的多智能体系统中的通信拓扑从根本上制约了智能体间的协作模式,对集体决策的效率和成效都产生了重要影响。虽然最近关于通信拓扑自动化设计的研究往往为提高效率而构建稀疏结构,但它们往往忽略了稀疏和稠密拓扑为什么以及何时有助于或阻碍协作。在本文中,我们提出了一个因果框架,分析智能体的输出(无论正确还是错误)如何在不同稀疏度的拓扑下传播。我们的实证研究表明,中度稀疏的拓扑在保留有益信息扩散的同时有效抑制错误传播,通常能够取得最佳的任务性能。根据这一洞察,我们提出了一种新的拓扑设计方法EIB-Learner,它通过融合稠密图和稀疏图的连接模式来平衡错误抑制和有益信息传播。大量实验显示了EIB-Learner在有效性、通信成本和鲁棒性方面的优势。
Article 6
Title@2025-05-29 (4): Emergent social conventions and collective bias in LLM populations
Title: Emergent social conventions and collective bias in LLM populations | Emergente soziale Konventionen und kollektive Voreingenommenheit in LLM-Populationen | 新出现的社会习俗和LLM人口的集体偏见 2410.08948v2 |
Authors: Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli
Social conventions are the backbone of social coordination, shaping how individuals form a group. As growing populations of artificial intelligence (AI) agents communicate through natural language, a fundamental question is whether they can bootstrap the foundations of a society. Here, we present experimental results that demonstrate the spontaneous emergence of universally adopted social conventions in decentralized populations of large language model (LLM) agents. We then show how strong collective biases can emerge during this process, even when agents exhibit no bias individually. Last, we examine how committed minority groups of adversarial LLM agents can drive social change by imposing alternative social conventions on the larger population. Our results show that AI systems can autonomously develop social conventions without explicit programming and have implications for designing AI systems that align, and remain aligned, with human values and societal goals.
社会公约是社会协调的支柱,塑造个人如何组成一个群体。随着越来越多的人工智能(AI)人员通过自然语言进行交流,一个根本问题是他们是否能够奠定一个社会的基础。在这里,我们提出实验结果,表明在分散的大型语言模式(LLM)人员群体中,普遍通过的社会公约是自发产生的。然后,我们表明在这个过程中如何产生强烈的集体偏见,即使代理人没有表现出个别的偏见。最后,我们研究有决心的对抗性LLM人员少数群体如何通过将其他社会公约强加给更多的人口来推动社会变革。我们的结果表明,AI系统可以在没有明确规划的情况下自主地制定社会公约,并对设计与人类价值观和社会目标一致的AI系统具有影响。
Article 7
Title@2025-05-29 (4): Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Title: Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game | Sprachagenten mit Verstärkung Lernen für strategisches Spiel im Werwolf Spiel | 在狼人游戏中进行战略游戏强化学习的语文代理 2310.18940v4 |
Authors: Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu
Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model’s training data and results in suboptimal performance. To develop strategic language agents, i.e., agents that generate flexible language actions and possess strong decision-making abilities, we propose a novel framework that powers LLM-based agents with reinforcement learning (RL). We consider Werewolf, a popular social deduction game, as a challenging testbed that emphasizes versatile communication and strategic gameplay. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Then an RL policy trained to optimize the decision-making ability chooses an action from the candidates to play in the game. Extensive experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game. We also conduct human-agent experiments and find that our agents achieve human-level performance and demonstrate strong strategic play.
以大型语言模型(LLM)构建的代理商已在广泛的领域中显示出巨大潜力。然而,在复杂的决策任务中,纯粹的LLM代理商往往在选择行动时表现出内在的偏见,这种偏见是从该模型的训练数据中继承的,并导致次优的表现。为了发展战略语言代理商,即产生灵活语言行动和拥有强大决策能力的代理商,我们提议了一个以强化学习(RL)赋能LLM代理商的新框架。我们把Werewolf这一流行的社会推理游戏视为一个具有挑战性的试验平台,它强调多样化的沟通和战略博弈。为了减轻语言行动的内在偏见,我们的代理商利用LLM进行演绎推理并产生一组多样化的行动候选。然后,为优化决策能力而训练的RL策略从候选中选择一个行动在游戏中执行。广泛的实验表明,我们的代理商克服了内在的偏见,并超越了Werewolf游戏中现有的LLM代理商。我们还进行了人机实验,发现我们的代理商达到了人类水平的表现并展示了强有力的战略博弈。
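The two-stage decision described in this abstract (an LLM proposes diverse action candidates, then an RL policy picks one) can be sketched as follows. This is a rough sketch under stated assumptions: `propose_candidates` is a hypothetical stand-in for the LLM call, `score_fn` stands in for a learned policy head, and softmax sampling is one common choice rather than the paper's confirmed method.

```python
import math
import random


def propose_candidates(observation):
    """Hypothetical stand-in for the LLM step; a real system would prompt a model here."""
    return [
        f"accuse player 2 ({observation})",
        f"defend player 3 ({observation})",
        "stay silent",
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(candidates, score_fn, rng):
    """Sample one candidate in proportion to the policy's softmax scores."""
    probs = softmax([score_fn(c) for c in candidates])
    action = rng.choices(candidates, weights=probs, k=1)[0]
    return action, probs
```

Separating generation from selection means the policy only has to learn a distribution over a handful of candidates instead of over free-form text.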
Article 8
Title@2025-05-29 (4): Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration
Title: Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration | Erfahrungsübergreifendes Lernen auf LLM-basierter Multi-Agent-Kollaboration | 关于基于LLM的多机构合作的跨任务跨任务经验学习 2505.23187v1 |
Authors: Yilong Li, Chen Qian, Yu Xia, Ruijie Shi, Yufan Dang, Zihao Xie, Ziming You, Weize Chen, Cheng Yang, Weichuan Liu, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun
Large Language Model-based multi-agent systems (MAS) have shown remarkable progress in solving complex tasks through collaborative reasoning and inter-agent critique. However, existing approaches typically treat each task in isolation, resulting in redundant computations and limited generalization across structurally similar tasks. To address this, we introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. We model the task-solving workflow on a graph-structured multi-agent collaboration network, where agents propagate information and coordinate via explicit connectivity. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards along with the corresponding inputs and outputs in each agent’s individual experience pool. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step, thereby enabling more accurate and efficient multi-agent collaboration. Experimental results on diverse datasets demonstrate that MAEL empowers agents to learn from prior task experiences effectively, achieving faster convergence and producing higher-quality solutions on current tasks.
语言模型型大型多试剂系统(MAS)在通过协作推理和机构间评析解决复杂任务方面取得了显著进展,然而,现有办法一般都是孤立地处理每项任务,导致重复计算和对结构相似的任务进行有限的概括化。为了解决这个问题,我们引入了多试剂跨任务体验学习(MAEL),这是一个新颖的框架,使LLM驱动的代理商具有明确的跨任务学习和经验积累能力。我们把任务解决工作流程建在一个图形结构多试剂合作网络上,使代理商通过明确的连通性传播信息和协调。在经验学习阶段,我们量化任务解决工作流程中每个步骤的质量,并将由此产生的奖励与相应的投入和产出一起储存到每个代理商的个人经验库中。在推断过程中,代理商检索高回报、任务相关的经验,作为少见的例子,以提高每个推理步骤的有效性,从而能够更准确和高效地进行多试剂合作。关于各种数据集的实验结果显示MAEL使代理商能够从以往的任务经验中学习如何有效实现更快的趋同和提出更高质量的解决办法。
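The per-agent experience pool described in this abstract (store inputs, outputs and rewards; retrieve high-reward, task-relevant entries as few-shot examples) might look roughly like the sketch below. The word-overlap similarity and the `similarity * reward` ranking are illustrative assumptions; the abstract does not specify the retrieval rule.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    task_input: str
    output: str
    reward: float


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; an assumed stand-in for a learned retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class ExperiencePool:
    """One pool per agent, filled during the experiential learning phase."""

    def __init__(self):
        self.items = []

    def add(self, task_input, output, reward):
        self.items.append(Experience(task_input, output, reward))

    def retrieve(self, query, k=2, min_similarity=0.1):
        """Return up to k experiences ranked by similarity * reward (assumed rule)."""
        scored = []
        for exp in self.items:
            sim = jaccard(query, exp.task_input)
            if sim >= min_similarity:
                scored.append((sim * exp.reward, exp))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [exp for _, exp in scored[:k]]
```

The retrieved entries would then be formatted into the prompt as few-shot examples for the current reasoning step.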
Article 9
Title@2025-05-29 (4): Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Title: Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems | Topologisches Strukturlernen sollte eine Forschungspriorität für LLM-basierte Multi-Agent-Systeme sein | 地形结构学习应成为以LLM为基础的多种机构系统的研究重点 2505.22467v2 |
Authors: Jiaxi Yang, Mengqi Zhang, Yiqiao Jin, Hao Chen, Qingsong Wen, Lu Lin, Yi He, Weijie Xu, James Evans, Jindong Wang
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. Nevertheless, the question of how agents should be structurally organized for optimal cooperation remains largely unexplored. In this position paper, we aim to gently redirect the focus of the MAS research community toward this critical dimension: develop topology-aware MASs for specific tasks. Specifically, the system consists of three core components - agents, communication links, and communication patterns - that collectively shape its coordination performance and efficiency. To this end, we introduce a systematic, three-stage framework: agent selection, structure profiling, and topology synthesis. Each stage would trigger new research opportunities in areas such as language models, reinforcement learning, graph learning, and generative modeling; together, they could unleash the full potential of MASs in complicated real-world applications. Then, we discuss the potential challenges and opportunities in the evaluation of multiple systems. We hope our perspective and framework can offer critical new insights in the era of agentic AI.
大型语言模型多行为者系统(MAS)已成为通过协作情报处理复杂任务的有力范例,然而,关于应如何从结构上组织代理人以实现最佳合作的问题基本上尚未探讨。在本立场文件中,我们的目标是将MAS研究界的重点轻轻地转向这一关键方面:为具体任务开发具有地貌意识的MAS。具体地说,该系统由三个核心组成部分组成:代理、通信联系和通信模式,共同决定其协调性能和效率。为此,我们引入了一个系统化的、三阶段的框架:代理选择、结构特征分析和地形综合。每个阶段都将在语言模型、强化学习、图表学习和基因模型等领域触发新的研究机会;它们一起可以充分发挥MAS在复杂的现实世界应用中的潜力。然后,我们讨论在评估多种系统方面的潜在挑战和机遇。我们希望我们的观点和框架能够在代理性AI时代提供重要的新见解。
Article 10
Title@2025-05-29 (4): MedRAX: Medical Reasoning Agent for Chest X-ray
Title: MedRAX: Medical Reasoning Agent for Chest X-ray | MedRAX: Medizinischer Reasoning Agent für Bruströntgen | MedRAX: 胸部X光医学推理智能体 2502.02673v2 |
Authors: Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code are publicly available at https://github.com/bowang-lab/MedRAX
切斯特X光片(CXRs)在推动疾病管理和病人护理的关键决定方面发挥着不可或缺的作用。虽然最近的创新导致了各种CXR解释任务的专门模式,但这些解决方案往往孤立地运作,限制了其在临床实践中的实用性。我们介绍MedRAX,这是第一个将最先进的CXR分析工具和多式大型语言模型无缝地纳入统一框架的全方位AI代理。MedRAX积极利用这些模型处理复杂的医疗问题,而不需要额外的培训。为了严格评估其能力,我们引入了ChestAgentBench,这是一个包含7个不同类别2 500个复杂医疗查询的综合基准。我们的实验表明,MedRAX取得了与开放源和专利模型相比的最新的最新业绩,这是向实际部署自动CXR解释系统迈出的重要一步。数据和代码已公布在https://github.com/bowang-lab/MedRAX上。
Article 11
Title@2025-05-29 (4): Learning Recommender Mechanisms for Bayesian Stochastic Games
Title: Learning Recommender Mechanisms for Bayesian Stochastic Games | Lern-Empfänger-Mechanismen für Bayesian Stochastic Games | 贝耶斯沙沙运动会学习建议机制 2505.22979v1 |
Authors: Bengisu Guresti, Chongjie Zhang, Yevgeniy Vorobeychik
An important challenge in non-cooperative game theory is coordinating on a single (approximate) equilibrium from many possibilities - a challenge that becomes even more complex when players hold private information. Recommender mechanisms tackle this problem by recommending strategies to players based on their reported type profiles. A key consideration in such mechanisms is to ensure that players are incentivized to participate, report their private information truthfully, and follow the recommendations. While previous work has focused on designing recommender mechanisms for one-shot and extensive-form games, these approaches cannot be effectively applied to stochastic games, particularly if we constrain recommendations to be Markov stationary policies. To bridge this gap, we introduce a novel bi-level reinforcement learning approach for automatically designing recommender mechanisms in Bayesian stochastic games. Our method produces a mechanism represented by a parametric function (such as a neural network), and is therefore highly efficient at execution time. Experimental results on two repeated and two stochastic games demonstrate that our approach achieves social welfare levels competitive with cooperative multi-agent reinforcement learning baselines, while also providing significantly improved incentive properties.
在不合作的游戏理论中,一个重要挑战是从多种可能性中协调单一(近似)平衡 – – 当玩家持有私人信息时,这一挑战就变得更加复杂。建议机制解决这一问题,根据所报告类型向玩家推荐战略。这种机制中的一个关键考虑是确保玩家受到激励,能够参与,真实地报告其私人信息,并遵循建议。虽然以前的工作重点是为一次性和广泛形式的游戏设计推荐机制,但这些方法无法有效地适用于随机游戏,特别是如果我们限制建议成为Markov固定政策,那么就更复杂了。为了弥合这一差距,我们采用了一种新的双级强化学习方法,在Bayesian 类游戏中自动设计推荐机制。我们的方法产生了一种以参数函数(例如神经网络)为代表的机制,因此在执行时效率很高。两次重复的和两次随机游戏的实验结果表明,我们的方法达到了社会福利水平,具有竞争性,合作性多剂强化学习基线,同时提供显著改进的奖励性特性。
Article 12
Title@2025-05-29 (4): MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
Title: MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming | MermaidFlow: Neudefinition der agentischen Workflow-Generierung durch sicherheitsbeschränkte evolutionäre Programmierung | 美人鱼:通过受安全限制的进化方案拟订,重新确定干燥性工作流的产生 2505.22967v1 |
Authors: Chengqi Zheng, Jianda Chen, Yueming Lyu, Wen Zheng Terence Ng, Haopeng Zhang, Yew-Soon Ong, Ivor Tsang, Haiyan Yin
Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represents workflows as a verifiable intermediate representation using Mermaid, a structured and human-interpretable graph language. We formulate domain-aware evolutionary operators, i.e., crossover, mutation, insertion, and deletion, to preserve semantic correctness while promoting structural diversity, enabling efficient exploration of a high-quality, statically verifiable workflow space. Without modifying task settings or evaluation protocols, MermaidFlow achieves consistent improvements in success rates and faster convergence to executable plans on the agent reasoning benchmark. The experimental results demonstrate that safety-constrained graph evolution offers a scalable, modular foundation for robust and interpretable agentic reasoning systems.
尽管有自主代理推理的希望,但现有的工作流程生成方法往往由于不受限制的LLM驱动的建设而产生脆弱、无法执行的计划。我们引入了美人鱼Flow,这是一个通过安全限制的图形演进重新定义代理搜索空间的框架。在本质上,美人鱼Flow代表工作流程作为可核实的中间代表,使用美人鱼这一结构化和人文解释的图表语言。我们设计了有域觉的演进操作器,即交叉、突变、插入和删除,以便在促进结构多样性的同时保持语义正确性,从而能够有效地探索一个高质量、静态可核查的工作流程空间。在不修改任务设置或评估协议的情况下,美人鱼Flow在成功率方面实现了一致,并更快地与代理推理基准的可执行计划接轨。实验结果表明,受安全限制的图形演进为稳健和可解释的代理推理系统提供了一个可扩展的模块基础。
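The idea in this abstract of evolving workflows as statically checkable Mermaid graphs can be illustrated with a small sketch. It assumes a tiny Mermaid subset (a `graph TD` header followed by `a --> b` lines), uses acyclicity as the only static check, and implements only an insertion operator; the function names are hypothetical and the actual framework's checks are certainly richer.

```python
import re

# Matches lines such as "    plan --> code" in a minimal Mermaid flowchart.
EDGE = re.compile(r"^\s*(\w+)\s*-->\s*(\w+)\s*$")


def parse_edges(mermaid):
    """Parse the assumed Mermaid subset into a list of (source, target) pairs."""
    edges = []
    for line in mermaid.strip().splitlines()[1:]:  # skip the "graph TD" header
        match = EDGE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        edges.append((match.group(1), match.group(2)))
    return edges


def is_acyclic(edges):
    """Static check via Kahn's algorithm: a workflow with a cycle is rejected."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return visited == len(nodes)


def insert_node(edges, edge, new_node):
    """Insertion operator: split one edge by routing it through a new node."""
    src, dst = edge
    kept = [e for e in edges if e != edge]
    return kept + [(src, new_node), (new_node, dst)]


def to_mermaid(edges):
    """Serialize edges back into the same Mermaid subset."""
    return "graph TD\n" + "\n".join(f"    {s} --> {d}" for s, d in edges)
```

Because every operator maps a parsed graph to another parsed graph, each candidate can be validated before any LLM call or execution is spent on it.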
Article 13
Title@2025-05-28 (3): A Large Language Model-Enabled Control Architecture for Dynamic Resource Capability Exploration in Multi-Agent Manufacturing Systems
Title: A Large Language Model-Enabled Control Architecture for Dynamic Resource Capability Exploration in Multi-Agent Manufacturing Systems | Eine großsprachige modellfähige Steuerungsarchitektur für dynamische Ressourcenkapazitäts-Exploration in Multi-Agent-Produktionssystemen | 多机构制造系统动态资源能力探索大语言模型化控制结构 2505.22814v1 |
Authors: Jonghan Lim, Ilya Kovalenko
Manufacturing environments are becoming more complex and unpredictable due to factors such as demand variations and shorter product lifespans. This complexity requires real-time decision-making and adaptation to disruptions. Traditional control approaches highlight the need for advanced control strategies capable of overcoming unforeseen challenges, as they demonstrate limitations in responsiveness within dynamic industrial settings. Multi-agent systems address these challenges through decentralization of decision-making, enabling systems to respond dynamically to operational changes. However, current multi-agent systems encounter challenges related to real-time adaptation, context-aware decision-making, and the dynamic exploration of resource capabilities. Large language models provide the possibility to overcome these limitations through context-aware decision-making capabilities. This paper introduces a large language model-enabled control architecture for multi-agent manufacturing systems to dynamically explore resource capabilities in response to real-time disruptions. A simulation-based case study demonstrates that the proposed architecture improves system resilience and flexibility. The case study findings show improved throughput and efficient resource utilization compared to existing approaches.
由于需求变化和产品寿命缩短等因素,制造环境变得更加复杂和不可预测。这种复杂性要求实时决策和适应干扰。传统控制方法强调,需要制定能够克服意外挑战的先进控制战略,因为这些战略表明在动态工业环境中反应能力有限。多试剂系统通过下放决策权来应对这些挑战,使系统能够对业务变化作出动态反应。然而,目前的多试剂系统遇到与实时适应、环境意识决策和动态开发资源能力有关的挑战。大型语言模型提供了通过具备环境意识的决策能力克服这些限制的可能性。本文为多剂制造系统引入了大型语言模型化控制结构,以动态地探索资源能力以应对实时干扰。模拟案例研究表明,拟议的结构提高了系统的复原力和灵活性。案例研究结果表明,与现有方法相比,吞吐量和高效利用资源的情况有所改善。
Article 14
Title@2025-05-28 (3): Dynamic Task Adaptation for Multi-Robot Manufacturing Systems with Large Language Models
Title: Dynamic Task Adaptation for Multi-Robot Manufacturing Systems with Large Language Models | Dynamische Aufgabenanpassung für Multi-Roboter-Produktionssysteme mit großen Sprachmodellen | 具有大语言模型的多机器人制造系统动态任务适应 2505.22804v1 |
Authors: Jonghan Lim, Ilya Kovalenko
Recent manufacturing systems are increasingly adopting multi-robot collaboration to handle complex and dynamic environments. While multi-agent architectures support decentralized coordination among robot agents, they often face challenges in enabling real-time adaptability for unexpected disruptions without predefined rules. Recent advances in large language models offer new opportunities for context-aware decision-making to enable adaptive responses to unexpected changes. This paper presents an initial exploratory implementation of a large language model-enabled control framework for dynamic task reassignment in multi-robot manufacturing systems. A central controller agent leverages the large language model’s ability to interpret structured robot configuration data and generate valid reassignments in response to robot failures. Experiments in a real-world setup demonstrate high task success rates in recovering from failures, highlighting the potential of this approach to improve adaptability in multi-robot manufacturing systems.
最近制造系统越来越多地采用多机器人合作来处理复杂和动态环境。虽然多剂结构支持机器人代理机构之间分散协调,但它们往往面临挑战,无法在没有预先确定的规则的情况下实时适应意外干扰。大型语言模型最近的进展为环境意识决策提供了新的机会,以便能够对意外变化作出适应性反应。本文件介绍了在多机器人制造系统中初步探索性地实施大型语言模型化控制框架,以便在多机器人制造系统中进行动态任务重新分配。一个中央控制剂利用大型语言模型的能力来解释结构化机器人配置数据,并针对机器人的故障产生有效的重新配置。在现实世界中进行的实验表明,从失败中恢复的过程中任务成功率很高,突出了这一方法在提高多机器人制造系统中适应性方面的潜力。
Article 15
Title@2025-05-28 (3): Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields
Title: Enhancing Lifelong Multi-Agent Path-finding by Using Artificial Potential Fields | Verbesserung der lebensbegleitenden multi-agenten Path-Finding durch den Einsatz künstlicher Potenzialfelder | 利用人造潜在潜力领域加强终身多机构探索 2505.22753v1 |
Authors: Arseniy Pertzovsky, Roni Stern, Ariel Felner, Roie Zivan
We explore the use of Artificial Potential Fields (APFs) to solve Multi-Agent Path Finding (MAPF) and Lifelong MAPF (LMAPF) problems. In MAPF, a team of agents must move to their goal locations without collisions, whereas in LMAPF, new goals are generated upon arrival. We propose methods for incorporating APFs in a range of MAPF algorithms, including Prioritized Planning, MAPF-LNS2, and Priority Inheritance with Backtracking (PIBT). Experimental results show that using APF is not beneficial for MAPF but yields up to a 7-fold increase in overall system throughput for LMAPF.
我们探索利用人造潜力场解决多种代理路径(MAPF)和终身MAPF(LMAPF)问题。在MAPF中,一组代理人必须在不发生碰撞的情况下移动到目标地点,而LMAPF则在到达时产生新的目标。我们提出了将APF纳入多种MAPF算法的方法,包括优先规划、MAPF-LNS2和后跟踪优先继承权(PIBT)。实验结果表明,使用APF对MAPF没有好处,但使LMAPF的系统总吞吐量增加7倍。
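One way to picture how an artificial potential field can bias a path-finding step, as this abstract describes, is the sketch below. It is only a toy: the decay `weight / (1 + d)`, the radius, and the greedy one-step planner are assumptions chosen for brevity, not the formulation used with Prioritized Planning, MAPF-LNS2 or PIBT.

```python
def apf_cost(cell, other_positions, weight=1.0, radius=2):
    """Repulsive cost that decays with Manhattan distance to other agents (assumed form)."""
    total = 0.0
    for ox, oy in other_positions:
        distance = abs(cell[0] - ox) + abs(cell[1] - oy)
        if distance <= radius:
            total += weight / (1 + distance)
    return total


def best_step(pos, goal, other_positions, width, height, weight=1.0):
    """Greedy one-step choice: distance to goal plus the repulsive APF term."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait, right, left, up, down
    candidates = [
        (pos[0] + dx, pos[1] + dy)
        for dx, dy in moves
        if 0 <= pos[0] + dx < width and 0 <= pos[1] + dy < height
    ]

    def score(cell):
        to_goal = abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
        return to_goal + apf_cost(cell, other_positions, weight)

    return min(candidates, key=score)
```

The potential is a soft preference, not a collision constraint; in a real solver the underlying MAPF algorithm would still enforce collision-freedom.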
Article 16
Title@2025-05-28 (3): A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
Title: A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control | Ein neuartiges Null-Vertrauens-Identitäts-Framework für Agentische KI: Dezentrale Authentisierung und feinkörnige Zugriffskontrolle | 面向智能体AI的新型零信任身份框架:去中心化认证与细粒度访问控制 2505.19301v2 |
Authors: Ken Huang, Vineeth Sai Narajala, John Yeoh, Jason Ross, Ramesh Raskar, Youssef Harkati, Jerry Huang, Idan Habler, Chris Hughes
Traditional Identity and Access Management (IAM) systems, primarily designed for human users or static machine identities via protocols such as OAuth, OpenID Connect (OIDC), and SAML, prove fundamentally inadequate for the dynamic, interdependent, and often ephemeral nature of AI agents operating at scale within Multi-Agent Systems (MAS), a computational system composed of multiple interacting intelligent agents that work collectively. This paper posits the imperative for a novel Agentic AI IAM framework. We deconstruct the limitations of existing protocols when applied to MAS, illustrating with concrete examples why their coarse-grained controls, single-entity focus, and lack of context-awareness falter. We then propose a comprehensive framework built upon rich, verifiable Agent Identities (IDs), leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), that encapsulate an agent's capabilities, provenance, behavioral scope, and security posture. Our framework includes an Agent Naming Service (ANS) for secure and capability-aware discovery, dynamic fine-grained access control mechanisms, and, critically, a unified global session management and policy enforcement layer for real-time control and consistent revocation across heterogeneous agent communication protocols. We also explore how Zero-Knowledge Proofs (ZKPs) enable privacy-preserving attribute disclosure and verifiable policy compliance. We outline the architecture, operational lifecycle, innovative contributions, and security considerations of this new IAM paradigm, aiming to establish the foundational trust, accountability, and security necessary for the burgeoning field of agentic AI and the complex ecosystems they will inhabit.
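传统的身份与访问管理(IAM)系统主要通过OAuth、OpenID Connect(OIDC)和SAML等协议为人类用户或静态机器身份而设计,对于在多智能体系统(MAS,即由多个相互作用、协同工作的智能体组成的计算系统)中大规模运行的AI智能体所具有的动态、相互依赖且往往短暂的特性而言,从根本上是不够的。本文提出有必要建立一个新型的智能体AI IAM框架。我们剖析了现有协议应用于MAS时的局限性,并以具体实例说明其粗粒度控制、单一实体视角和缺乏上下文感知为何难以奏效。随后,我们提出一个建立在丰富且可验证的智能体身份(ID)之上的综合框架,利用去中心化标识符(DID)和可验证凭证(VC)来封装智能体的能力、来源、行为范围和安全态势。我们的框架包括用于安全且具备能力感知的发现的智能体命名服务(ANS)、动态细粒度访问控制机制,以及至关重要的统一全局会话管理与策略执行层,以便在异构的智能体通信协议之间实现实时控制和一致的撤销。我们还探讨了零知识证明(ZKP)如何实现保护隐私的属性披露和可验证的策略合规。我们概述了这一新IAM范式的架构、运行生命周期、创新贡献和安全考量,旨在为蓬勃发展的智能体AI领域及其将要栖身的复杂生态系统建立所需的基础信任、问责和安全。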
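The combination of capability-scoped agent identities with a global revocation layer, as outlined in this abstract, can be pictured with the toy sketch below. Everything in it is assumed for illustration: the class names, the capability strings, and the `did:example:` placeholder identifier. It performs no cryptographic verification, which a real DID/VC deployment would require.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class AgentIdentity:
    """Illustrative identity record; a real one would carry signed credentials."""
    did: str                      # placeholder such as "did:example:agent-1"
    capabilities: FrozenSet[str]  # fine-grained actions this agent may perform


class SessionRegistry:
    """Toy stand-in for a global session and policy enforcement layer."""

    def __init__(self) -> None:
        self._revoked: Set[str] = set()

    def revoke(self, did: str) -> None:
        """Revocation takes effect for every later authorization check."""
        self._revoked.add(did)

    def authorize(self, identity: AgentIdentity, action: str) -> bool:
        """Deny revoked agents first, then check the requested capability."""
        if identity.did in self._revoked:
            return False
        return action in identity.capabilities
```

Checking revocation on every request, instead of trusting a long-lived token, is the zero-trust element this sketch tries to capture.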
Article 17
Title@2025-05-28 (3): HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym
Title: HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | HDDLGym: Ein Tool zum Studieren multi-agenter Hierarchischer Probleme, definiert in HDDL mit OpenAI Gym | HDDLGym: 与 OpenAI Gym 一起研究在HDDL 中界定的多代理等级问题的工具 2505.22597v1 |
Authors: Ngoc La, Ruaridh Mon-Williams, Julie A. Shah
In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, no existing tool enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To provide this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym’s design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface, and applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.
近年来,强化学习(RL)方法已借助OpenAI Gym等工具得到广泛测试,而这些环境中的许多任务也可受益于层次规划。然而,目前缺乏能够将层次规划与RL无缝集成的工具。经典规划中使用的层次领域定义语言(HDDL)引入了一种结构化方法,非常适合基于模型的RL来弥补这一空白。为实现这种集成,我们提出了HDDLGym,这是一个基于Python的工具,可从HDDL领域和问题自动生成OpenAI Gym环境。HDDLGym充当RL与层次规划之间的纽带,支持多智能体场景,并使智能体之间能够进行协作规划。本文概述了HDDLGym的设计与实现,重点介绍了将HDDL与Gym接口集成以及应用RL策略支持层次规划所涉及的挑战和设计选择。我们还提供了使用HDDLGym框架的详细说明和演示,包括如何使用国际规划竞赛中已有的HDDL领域和问题,并以Transport领域为例。此外,我们就如何为多智能体场景创建新的HDDL领域提供了指导,并在Overcooked领域中演示了HDDLGym的实际用法。通过结合HDDL和Gym的优势,HDDLGym旨在成为研究层次规划中RL(尤其是多智能体情境下)的有价值工具。
Article 18
Title@2025-05-28 (3): SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Title: SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | SynWorld: 用于改进智能体行动知识的虚拟情景合成 2504.03561v2 |
Authors: Runnan Fang, Xiaobin Wang, Yuan Liang, Shuofei Qiao, Jialong Wu, Zekun Xi, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
在代理商及其环境之间的互动中,代理商通过规划和执行行动扩大其能力;然而,LLM代理商在部署于新环境或需要导航非常规行动空间时面临巨大挑战;为增强代理商自主探索环境、优化工作流程和增进对行动的理解的能力,我们提议SynWorld,这是一个允许代理商在行动空间内以多步行动方式综合可能情景的框架,并进行蒙特卡洛树搜索(MCTS)探索,以有效完善其在目前环境中的行动知识。我们的实验证明SynWorld是在新环境中学习行动知识的有效和一般方法。代码可在https://github.com/zjunlp/SynWorld上查阅。
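The abstract mentions Monte Carlo Tree Search exploration without naming a selection rule. As a hedged illustration, the sketch below shows the standard UCB1 rule commonly used in MCTS (often called UCT); treating it as SynWorld's rule is an assumption, and the `children` layout is hypothetical.

```python
import math


def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average value plus an exploration bonus for rarely tried children."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits):
    """children maps a name to (total_value, visits); return the best name by UCB1."""
    return max(children, key=lambda name: ucb1(*children[name], parent_visits))
```

With `children = {"a": (3.0, 4), "b": (1.0, 1)}` and five parent visits, the less-visited child `"b"` is selected because its exploration bonus outweighs the gap in average value.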
Article 19
Title@2025-05-28 (3): From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation
Title: From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation | Von Fremdlingen zu Assistenten: Schnelles Wunsch-Ausrichtung für eingedickte Agent-User-Anpassung | 从陌生人到助理:对装装配剂用户适应的快速理想调整 2505.22503v1 |
Authors: Yuanfei Wang, Xinju Huang, Fangwei Zhong, Yaodong Yang, Yizhou Wang, Yuanpei Chen, Hao Dong
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven human user agent exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user’s latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
虽然在从事复杂的物理任务方面已取得了显著的进展,但现实世界应用要求的不仅仅是纯粹的任务执行。代理必须同目标往往模糊和隐含的不熟悉的代理人和人类用户合作。在这种环境下,解释模糊的指示和发现基本愿望对于有效援助至关重要。因此,快速和准确的愿望调整对于体现的代理人来说是一个关键的能力。在这项工作中,我们首先开发一个家庭援助模拟环境HA-Desire,将一个由LLLM驱动的、展示现实价值驱动的目标选择和交流的人类用户代理人融为一体。自我代理必须同这个代理用户互动,以推断和适应用户的潜在愿望。为了达到这个目的,我们提出了一个新的框架,即快速愿望调整FAMER,引入一个基于愿望的精神推理机制,以确定用户的意图和过滤与愿望无关的行动。我们进一步设计一个基于反思的通信模块,减少多余的调查,并纳入与目标相关的信息提取与记忆的耐久性,以改进信息再利用和减少不必要的探索。广泛的实验表明,我们的框架必须大大加强任务执行和通信效率,使装成的代理人能够在复杂的建筑环境中迅速适应用户的具体愿望。
Article 20
Title@2025-05-28 (3): OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization
Title: OptiMindTune: A Multi-Agent Framework for Intelligent Hyperparameter Optimization | OptiMindTune: Multi-Agenten-Framework für intelligente Hyperparameter-Optimierung | OptiMindTune: 智能超参数优化的多智能体框架 2505.19205v2 |
Authors: Meher Bhaskar Madiraju, Meher Sai Preetam Madiraju
Hyperparameter optimization (HPO) is a critical yet challenging aspect of machine learning model development, significantly impacting model performance and generalization. Traditional HPO methods often struggle with high dimensionality, complex interdependencies, and computational expense. This paper introduces OptiMindTune, a novel multi-agent framework designed to intelligently and efficiently optimize hyperparameters. OptiMindTune leverages the collaborative intelligence of three specialized AI agents – a Recommender Agent, an Evaluator Agent, and a Decision Agent – each powered by Google’s Gemini models. These agents address distinct facets of the HPO problem, from model selection and hyperparameter suggestion to robust evaluation and strategic decision-making. By fostering dynamic interactions and knowledge sharing, OptiMindTune aims to converge to optimal hyperparameter configurations more rapidly and robustly than existing single-agent or monolithic approaches. Our framework integrates principles from advanced large language models and adaptive search to achieve scalable and intelligent AutoML. We posit that this multi-agent paradigm offers a promising avenue for tackling the increasing complexity of modern machine learning model tuning.
超光度优化(HPO)是机器学习模型开发中一个关键但具有挑战性的方面,它极大地影响模型性能和一般化。传统的HPO方法往往与高维度、复杂的相互依存关系和计算费用相抗衡。本文介绍了“OptiMindTune”,这是一个创新的多试剂框架,旨在智能和高效地优化超光度计。OptiMindTune利用了三个专门的AI代理商 – – 一个建议代理商、一个评价代理商和一个决策代理商 – – 的协作情报,每个代理商都由谷歌的双子模型提供动力。这些代理商处理HPO问题的不同方面,从模型选择和超光量参数建议到强有力的评价和战略决策。通过促进动态互动和知识共享,OptiMindTune的目标是与现有的单一代理商或单体方法相比,更快和更强有力地趋同最佳超光度超度配置。我们的框架整合了先进大型语言模型的原则和适应性搜索,以实现可缩和智能自动ML。我们认为,这一多试剂模式为处理日益复杂的现代机器学习模型调整提供了有希望的渠道。
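The division of labour among a recommender, an evaluator and a decision role described in this abstract can be sketched as a plain loop. This is a toy under heavy assumptions: the three functions are simple stubs with no LLM behind them, the single `lr` hyperparameter and the objective peaked at `lr = 0.01` are invented for illustration, and the stopping rule is one arbitrary choice.

```python
import math
import random


def recommend(history, rng):
    """Recommender stub: perturb the best configuration seen so far."""
    if not history:
        return {"lr": 0.1}
    best_config, _ = max(history, key=lambda h: h[1])
    return {"lr": best_config["lr"] * rng.choice([0.1, 1.0, 10.0])}


def evaluate(config):
    """Evaluator stub: toy score that peaks at lr = 0.01 (assumed objective)."""
    return -abs(math.log10(config["lr"]) + 2.0)


def should_stop(history, patience=5):
    """Decision stub: stop when the last `patience` rounds brought no improvement."""
    if len(history) <= patience:
        return False
    best_before = max(score for _, score in history[:-patience])
    best_recent = max(score for _, score in history[-patience:])
    return best_recent <= best_before


def tune(max_rounds=50, seed=0):
    """Run the recommend, evaluate, decide loop and return (best entry, full history)."""
    rng = random.Random(seed)
    history = []
    for _ in range(max_rounds):
        config = recommend(history, rng)
        history.append((config, evaluate(config)))
        if should_stop(history):
            break
    return max(history, key=lambda h: h[1]), history
```

Keeping the three roles as separate functions that share only `history` mirrors the idea of agents exchanging knowledge through a common record.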
Article 21
Title@2025-05-28 (3): The Complexity of Pure Strategy Relevant Equilibria in Concurrent Games
Title: The Complexity of Pure Strategy Relevant Equilibria in Concurrent Games | Die Komplexität der reinen Strategie Relevante Equilibria in Parallelspielen | 同时运动会中纯粹战略相关平衡的复杂性 2505.07501v2 |
Authors: Purandar Bhaduri
We study rational synthesis problems for concurrent games with $\omega$-regular objectives. Our model of rationality considers only pure strategy Nash equilibria that satisfy either a social welfare or Pareto optimality condition with respect to an $\omega$-regular objective for each agent. This extends earlier work on equilibria in concurrent games, which did not consider the quality of the equilibria. Our results show that the existence of Nash equilibria satisfying social welfare conditions can be computed as efficiently as the constrained Nash equilibrium existence problem. On the other hand, the existence of Nash equilibria satisfying the Pareto optimality condition possibly involves a higher upper bound, except in the case of Büchi and Muller games, for which all three problems are in the classes P and PSPACE-complete, respectively.
我们研究的是同时游戏的合理合成问题,同时使用美元-美元-经常目标。我们的理性模式只考虑满足社会福利或Pareto最佳条件的纯纳什平衡战略,每个代理商通常以美元-经常目标满足社会福利或Pareto最佳条件。这扩大了先前在同时游戏中平衡工作的范围,而没有考虑其质量。我们的结果表明,满足社会福利条件的纳什平衡的存在可以与受限制的纳什均衡存在问题一样有效地计算。另一方面,满足Pareto最佳条件的纳什平衡存在可能涉及更高的上限,但B\“uchi和Muller游戏除外,因为B"uchi和Muller游戏的所有三个问题都分别属于P和PSPACE类的完成阶段。
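As a rough illustration of what "pure strategy Nash equilibria that satisfy a social welfare condition" means, the sketch below enumerates pure equilibria of a one-shot two-player matrix game and picks the one with the highest total payoff. This is a deliberately simplified stand-in: the paper concerns concurrent games with $\omega$-regular objectives, not matrix games, and the function names and the stag-hunt payoffs are assumptions made for this example only.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure Nash equilibria of a two-player one-shot game.

    payoffs maps an action profile (a0, a1) to a payoff pair (u0, u1).
    """
    actions0 = sorted({a for a, _ in payoffs})
    actions1 = sorted({b for _, b in payoffs})
    equilibria = []
    for a, b in product(actions0, actions1):
        u0, u1 = payoffs[(a, b)]
        # Neither player may gain by deviating unilaterally.
        stable0 = all(payoffs[(x, b)][0] <= u0 for x in actions0)
        stable1 = all(payoffs[(a, y)][1] <= u1 for y in actions1)
        if stable0 and stable1:
            equilibria.append((a, b))
    return equilibria


def welfare_maximal(payoffs, profiles):
    """Pick the profile with the largest sum of payoffs (social welfare)."""
    return max(profiles, key=lambda profile: sum(payoffs[profile]))


# Hypothetical stag-hunt payoffs, chosen only for illustration.
STAG_HUNT = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

equilibria = pure_nash_equilibria(STAG_HUNT)
best = welfare_maximal(STAG_HUNT, equilibria)
```

The game has two pure equilibria, and the welfare criterion is what separates them; this is the kind of quality condition the abstract adds on top of plain equilibrium existence.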
Article 22
Title@2025-05-28 (3): Voice CMS: updating the knowledge base of a digital assistant through conversation
Title: Voice CMS: updating the knowledge base of a digital assistant through conversation | Voice CMS: Aktualisierung der Wissensbasis eines digitalen Assistenten durch Konversation | 语音CMS:通过对话更新数字助理的知识库 2505.22303v1 |
Authors: Grzegorz Wolny, Michał Szczerbak
In this study, we propose a solution based on a multi-agent LLM architecture and a voice user interface (VUI) designed to update the knowledge base of a digital assistant. Its usability is evaluated in comparison to a more traditional graphical content management system (CMS), with a focus on understanding the relationship between user preferences and the complexity of the information being provided. The findings demonstrate that, while the overall usability of the VUI is rated lower than that of the graphical interface, it is already preferred by users for less complex tasks. Furthermore, the quality of content entered through the VUI is comparable to that achieved with the graphical interface, even for highly complex tasks. The qualitative results suggest that a hybrid interface combining the strengths of both approaches could address the key challenges identified during the experiment, such as reducing cognitive load through graphical feedback while maintaining the intuitive nature of voice-based interactions. This work highlights the potential of conversational interfaces as a viable and effective method for knowledge management in specific business contexts.
在这项研究中,我们提出了一个基于多试剂LLM架构和语音用户界面(VUI)的解决方案,目的是更新数字助理的知识基础。与较传统的图形内容管理系统(CMS)相比,我们对其可用性进行了评估,重点是了解用户偏好与所提供信息的复杂性之间的关系。研究结果表明,虽然VUI的总体可用性比图形界面低,但用户已经倾向于较不复杂的任务。此外,通过VUI输入的内容的质量与图形界面所达到的质量相当,甚至与非常复杂的任务相当。获得的质量结果表明,结合两种方法的优势的混合界面可以应对试验期间发现的关键挑战,例如通过图形反馈减少认知负荷,同时保持语音互动的直观性质。这项工作强调了对话界面作为具体商业环境中知识管理的一种可行和有效的方法的潜力。
Article 23
Title@2025-05-28 (3): Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
Title: Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | 利用语言代理框架中的双重进程理论促进实时同时人类-AI合作 2502.11882v5 |
Authors: Shao Zhang, Xihuai Wang, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, Ying Wen
Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent’s System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent’s System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and make reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. The code for DPT-Agent is available at https://github.com/sjtu-marl/DPT-Agent.
基于大型语言模型(LLM)的智能体在回合制人机协作中表现出色,但在需要实时交互的同步任务中却表现不佳。延迟问题以及推断多变的人类策略的难度,妨碍了它们在没有明确指令的情况下自主决策的能力。通过对现有独立的系统1和系统2方法进行实验,我们验证了在实时任务中使用双过程理论(DPT)的必要性。我们提出DPT-Agent,这是一个新的语言智能体框架,它整合系统1和系统2,以实现高效的实时同步人机协作。DPT-Agent的系统1使用有限状态机(FSM)和代码即策略(code-as-policy),实现快速、直观且可控的决策。DPT-Agent的系统2结合心智理论(ToM)和异步反思,以推断人类意图并做出基于推理的自主决策。我们通过与基于规则的智能体和人类协作者的进一步实验证明了DPT-Agent的有效性,其相对于主流的基于LLM的框架有显著改进。DPT-Agent能够有效帮助LLM将正确的慢思考和推理转化为可执行的动作,从而提升性能。据我们所知,DPT-Agent是第一个能够自主实现成功的实时同步人机协作的语言智能体框架。DPT-Agent的代码见 https://github.com/sjtu-marl/DPT-Agent。
Article 24
Title@2025-05-28 (3): Efficient Leave-one-out Approximation in LLM Multi-agent Debate Based on Introspection
Title: Efficient Leave-one-out Approximation in LLM Multi-agent Debate Based on Introspection | Effiziente Ein-Aus-Annäherung in der LLM-Multiagenten-Debatte auf der Grundlage von Introspektion | 以内审为基础的多机构辩论 2505.22192v1 |
Authors: Yue Cui, Liuyi Yao, Zitao Li, Yaliang Li, Bolin Ding, Xiaofang Zhou
Multi-agent systems based on large language models (LLMs) advance automatic task completion in various fields, where debate is a common form of cooperation in which agents solve complicated problems through reasoning and cross-review to solidify answers. Assessing the individual contributions of agents within these debates is crucial for system refinement and outcome reliability. The traditional leave-one-out (LOO) method offers a clear framework for evaluating each agent’s role but faces challenges in LLM-based systems due to high computational costs and the associated financial implications. This paper presents introspective-leave-one-out (IntrospecLOO), a simple yet effective prompting approach for approximating LOO in LLM-powered multi-agent debates. IntrospecLOO introduces an additional querying round after standard debates, prompting agents to update their answers while ignoring responses from a designated agent. This strategy effectively isolates and gauges each participant’s influence at a reduced query complexity compared to the original LOO approaches. Validation through experiments on three benchmark datasets confirms the effectiveness of IntrospecLOO.
基于大语言模型(LLMS)的多试剂系统在各领域推动自动完成任务,在各个领域,辩论是一种共同的合作形式,使代理商能够通过推理和交叉审查解决复杂的问题,从而巩固答案。评估这些辩论中代理商的个别贡献对于系统完善和结果可靠性至关重要。传统的一出一出制(LOO)方法为评价每个代理商的作用提供了一个明确的框架,但由于计算成本高和相关的财务影响,在基于LLM的系统中面临挑战。本文件介绍了内向一出一出一出(IntrospectLOOO),这是在LLM驱动的多剂辩论中简单而有效的近似LO(InterspectLOO)的一个简单而有效的提示。IntrospectLOO在标准辩论之后又增加了一轮询问,促使代理商更新其答复,同时忽略指定代理商的答复。这一战略有效地孤立并衡量了每个参与者在与原始LOOU方法相比查询复杂性较低的影响。通过三个基准数据集的试验验证了IntrospectLO的有效性。
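To make the leave-one-out idea concrete, here is a minimal sketch of a LOO contribution score over the final answers of a debate. It is not IntrospecLOO itself. It assumes majority voting as the aggregation rule and holds the remaining agents' answers fixed when one agent is removed, whereas exact LOO would re-run the debate without that agent (the cost the abstract is concerned with). All names are hypothetical.

```python
from collections import Counter


def majority(answers):
    """Return the strict plurality answer, or None on a tie or empty input."""
    top = Counter(answers).most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # no strict winner
    return top[0][0]


def loo_contributions(answers, truth):
    """Score each agent as: outcome with everyone minus outcome without it.

    answers maps an agent name to its final answer. The outcome is 1 if the
    majority answer equals the ground truth and 0 otherwise.
    """
    def outcome(subset):
        return int(majority(list(subset.values())) == truth)

    full = outcome(answers)
    return {
        agent: full - outcome({k: v for k, v in answers.items() if k != agent})
        for agent in answers
    }


# Hypothetical final answers from a three-agent debate.
final_answers = {"agent_1": "A", "agent_2": "A", "agent_3": "B"}
scores = loo_contributions(final_answers, truth="A")
```

In this toy case the two agents that answer "A" are each pivotal, while removing the dissenting agent changes nothing.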
Article 25
Title@2025-05-28 (3): Online Fair Division for Personalized $2$-Value Instances
Title: Online Fair Division for Personalized $2$-Value Instances | Online Fair Division für Personalisierte $2$-Value Instances | 个人个人价值2美元-价值实例在线网上交易会司 2505.22174v1 |
Authors: Georgios Amanatidis, Alexandros Lolos, Evangelos Markakis, Victor Turmel
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents, each of whom has an additive valuation function over the goods. Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents. It is known that without any assumptions about the values being severely restricted or coming from a distribution, very strong impossibility results hold in this setting. To bypass the latter, we turn our attention to instances where the valuation functions are restricted. In particular, we study personalized $2$-value instances, where there are only two possible values each agent may have for each good, possibly different across agents, and we show how to obtain worst case guarantees with respect to well-known fairness notions, such as maximin share fairness and envy-freeness up to one (or two) good(s). We suggest a deterministic algorithm that maintains a $1/(2n-1)$-MMS allocation at every time step and show that this is the best possible any deterministic algorithm can achieve if one cares about every single time step; nevertheless, eventually the allocation constructed by our algorithm becomes a $1/4$-MMS allocation. To achieve this, the algorithm implicitly maintains a fragile system of priority levels for all agents. Further, we show that, by allowing some limited access to future information, it is possible to have stronger results with less involved approaches. By knowing the values of goods for $n-1$ time steps into the future, we design a matching-based algorithm that achieves an EF$1$ allocation every $n$ time steps, while always maintaining an EF$2$ allocation. Finally, we show that our results allow us to get the first nontrivial guarantees for additive instances in which the ratio of the maximum over the minimum value an agent has for a good is bounded.
我们研究的是在线公平分工设置, 即货物一次到达一个单位, 并且有固定的一套美元代理商, 其中每个代理商对每件货物都有附加价值。 一旦货物出现, 每个代理商的价值就会暴露出来, 并且必须将其不可撤销地立即分配给其中一家代理商。 众所周知, 在没有对价值受到严格限制或分配结果有任何假设的情况下, 极强的不可能在这种设置中维持一种稳定的算法。 为了绕过后者, 我们把注意力转向估值功能受到限制的事例。 特别是, 我们研究的是个人化的2美元价值案例, 其中每个代理商对每件货物只有两种可能的值, 可能是不同的代理商的附加价值。 一旦货物出现良好价值, 每个代理商可能只有两种可能的值。 我们展示的是如何获得最坏的信用保证, 最后, 以最起码的公平性和无嫉妒性的方式, 将我们最起码的算法的算法, 将我们每个分期的每个分期值都保持在1美元上, 并且我们每个分期的分期中, 我们的分期, 将每个分期的每个分期, 以最保守的算法, 我们的分期的分期, 我们的分期, 我们的分期的分期的每个分期, 将比的每个分期的每个分期, 我们的每个分期的分的每个分的分一个比的每个分期, 我们的算法, 我们的分的分的分的分的分的分一个最优先的分, 我们的次的分, 我们的次的分的分,
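The fairness notions named in the abstract can be checked mechanically for a given allocation. The sketch below tests envy-freeness up to k goods (EF1 for k = 1, EF2 for k = 2) under additive, non-negative valuations. It only verifies an allocation and is not the paper's online algorithm; the function names and the example instance are assumptions for illustration.

```python
def bundle_value(valuation, bundle):
    """Additive value of a bundle under one agent's valuation."""
    return sum(valuation[good] for good in bundle)


def is_ef_up_to_k(valuations, allocation, k):
    """Check envy-freeness up to k goods.

    valuations[i][g] is agent i's value for good g (assumed non-negative);
    allocation[i] is the list of goods held by agent i.
    Agent i must not envy agent j once the k goods of j's bundle that
    i values most have been removed.
    """
    n = len(valuations)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j:
                continue
            seen_by_i = sorted(
                (valuations[i][g] for g in allocation[j]), reverse=True
            )
            if own < sum(seen_by_i[k:]):
                return False
    return True


# Hypothetical personalized 2-value instance:
# agent 0 only uses values {1, 3}, agent 1 only uses values {2, 5}.
valuations = [
    {"g0": 1, "g1": 3, "g2": 1},
    {"g0": 2, "g1": 2, "g2": 5},
]
allocation = [["g0"], ["g1", "g2"]]
```

Here agent 0 envies agent 1 outright, but the envy disappears once agent 0's most valued good in agent 1's bundle is set aside, so the allocation is EF1 without being envy-free.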
Article 26
Title@2025-05-28 (3): Sentiment Simulation using Generative AI Agents
Title: Sentiment Simulation using Generative AI Agents | Sentiment-Simulation mit generativen KI-Agenten | 使用 “ 产生AI “ 制剂模拟情感 2505.22125v1 |
Authors: Melrose Tia, Jezreel Sophia Lanuzo, Lei Rigi Baltazar, Marie Joy Lopez-Relente, Diwa Malaya Quiñones, Jason Albia
Traditional sentiment analysis relies on surface-level linguistic patterns and retrospective data, limiting its ability to capture the psychological and contextual drivers of human sentiment. These limitations constrain its effectiveness in applications that require predictive insight, such as policy testing, narrative framing, and behavioral forecasting. We present a robust framework for sentiment simulation using generative AI agents embedded with psychologically rich profiles. Agents are instantiated from a nationally representative survey of 2,485 Filipino respondents, combining sociodemographic information with validated constructs of personality traits, values, beliefs, and socio-political attitudes. The framework includes three stages: (1) agent embodiment via categorical or contextualized encodings, (2) exposure to real-world political and economic scenarios, and (3) generation of sentiment ratings accompanied by explanatory rationales. Using Quadratic Weighted Accuracy (QWA), we evaluated alignment between agent-generated and human responses. Contextualized encoding achieved 92% alignment in replicating original survey responses. In sentiment simulation tasks, agents reached 81%–86% accuracy against ground truth sentiment, with contextualized profile encodings significantly outperforming categorical ones (p < 0.0001, Cohen’s d = 0.70). Simulation results remained consistent across repeated trials (±0.2–0.5% SD) and resilient to variation in scenario framing (p = 0.9676, Cohen’s d = 0.02). Our findings establish a scalable framework for sentiment modeling through psychographically grounded AI agents. This work signals a paradigm shift in sentiment analysis from retrospective classification to prospective and dynamic simulation grounded in the psychology of sentiment formation.
传统情感分析依赖表层语言模式和回顾性数据,限制了其捕捉人类情感的心理和情境驱动因素的能力。这些局限制约了它在需要预测性洞察的应用(如政策测试、叙事框架和行为预测)中的有效性。我们提出了一个稳健的情感模拟框架,使用嵌入了丰富心理特征的生成式AI智能体。智能体依据一项对2,485名菲律宾受访者的全国代表性调查进行实例化,将社会人口信息与经过验证的人格特质、价值观、信念和社会政治态度等构念相结合。该框架包括三个阶段:(1)通过类别编码或情境化编码实现智能体具身化;(2)让智能体接触真实世界的政治和经济情景;(3)生成情感评分并附带解释性理由。我们使用二次加权准确率(QWA)评估智能体生成的回答与人类回答之间的一致性。情境化编码在复现原始调查回答方面达到了92%的一致性。在情感模拟任务中,智能体相对于真实情感达到了81%至86%的准确率,其中情境化特征编码显著优于类别编码(p < 0.0001,Cohen’s d = 0.70)。模拟结果在重复试验中保持一致(标准差为±0.2%至0.5%),并且对情景表述方式的变化具有稳健性(p = 0.9676,Cohen’s d = 0.02)。我们的研究结果为通过以心理特征为基础的AI智能体进行情感建模建立了一个可扩展的框架。这项工作标志着情感分析的范式转变:从回顾性分类转向以情感形成心理学为基础的前瞻性动态模拟。
Article 27
Title@2025-05-28 (3): Benchmarking LLMs’ Swarm intelligence
Title: Benchmarking LLMs’ Swarm intelligence | Benchmarking der Swarm-Intelligenz der LLM | 基准确定LLLMs的Swarm情报 2505.04364v3 |
Authors: Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints (limited local perception and communication) remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit, built on a customizable physical system, that provides environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
大型语言模型(LLM)展现出复杂推理的潜力,但当它们在多智能体系统(MAS)中受到严格的类群体约束(局部感知和通信受限)时,其涌现协调的能力在很大程度上仍未得到探索。当智能体在时空信息不完整的情况下运行时,现有基准往往无法充分反映去中心化协调的独特挑战。为弥补这一空白,我们提出SwarmBench,这是一个新的基准,旨在系统地评估LLM作为去中心化智能体时的群体智能能力。SwarmBench在可配置的二维网格环境中设置了五项基础的MAS协调任务(Pursuit、Synchronization、Foraging、Flocking、Transport),迫使智能体仅依赖局部感知输入($k\times k$ 视野)和局部通信。我们提出了衡量协调有效性的指标,并分析了涌现的群体动态。对主流LLM(如 deepseek-v3、o4-mini)的零样本评估显示,其性能随任务不同而有显著差异。虽然观察到了一些初步的协调,但我们的结果表明,在这些去中心化场景固有的不确定性下,当前的LLM在稳健的长程规划和自适应策略形成方面存在明显困难。在这种类群体约束下评估LLM,对于理解它们在未来去中心化智能系统中的效用至关重要。我们将SwarmBench作为一个开放、可扩展的工具包发布,它建立在可定制的物理系统之上,提供环境、提示词、评估脚本和完整的数据集。其目的是促进对基于LLM的MAS协调以及严重信息去中心化条件下涌现集体行为的理论基础的可复现研究。我们的代码仓库位于 https://github.com/x66ccff/swarmbench。
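The local $k\times k$ view mentioned in the abstract can be pictured with a small sketch: an agent on a 2D grid sees only the window centred on its own cell. This is not SwarmBench's actual code. The grid encoding as a list of strings, the padding symbol for cells outside the map, and the function name are all assumptions made for illustration.

```python
def local_view(grid, row, col, k, pad="#"):
    """Return the k x k window of `grid` centred on (row, col).

    grid is a list of equal-length strings; cells outside the grid are
    filled with `pad`. k must be odd so the agent sits at the centre.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the agent sits at the centre")
    radius = k // 2
    height, width = len(grid), len(grid[0])
    view = []
    for i in range(row - radius, row + radius + 1):
        cells = []
        for j in range(col - radius, col + radius + 1):
            inside = 0 <= i < height and 0 <= j < width
            cells.append(grid[i][j] if inside else pad)
        view.append("".join(cells))
    return view


# Hypothetical 4x4 world with two agents, "A" and "B".
world = [
    "....",
    ".A..",
    "..B.",
    "....",
]
view_of_a = local_view(world, 1, 1, 3)
view_at_corner = local_view(world, 0, 0, 3)
```

With k = 3, agent "A" can see "B" diagonally but nothing beyond its immediate neighbourhood, which is what makes long-range planning depend on communication.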
Article 28
Title@2025-05-28 (3): AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
Title: AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation | AudioGenie: Ein trainingsfreies Multi-Agent-Framework für die vielfältige Multimodalität-zu-Multiaudio-Generierung | AudioGenie:多元化多式联运到多民族一代的无培训多机会多机会框架 2505.22053v1 |
Authors: Yan Rong, Jinting Wang, Shan Yang, Guangzhi Lei, Li Liu
Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent systems have shown great potential in tackling the above issues. However, directly applying them to the MM2MA task presents three critical challenges: (1) inadequate fine-grained understanding of multimodal inputs (especially for video), (2) the inability of single models to handle diverse audio events, and (3) the absence of self-correction mechanisms for reliable outputs. To this end, we propose AudioGenie, a novel training-free multi-agent system featuring a dual-layer architecture with a generation team and a supervisor team. For the generation team, a fine-grained task decomposition and an adaptive Mixture-of-Experts (MoE) collaborative entity are designed for dynamic model selection, and a trial-and-error iterative refinement module is designed for self-correction. The supervisor team ensures temporal-spatial consistency and verifies outputs through feedback loops. Moreover, we build MA-Bench, the first benchmark for MM2MA tasks, comprising 198 annotated videos with multi-type audio. Experiments demonstrate that AudioGenie outperforms state-of-the-art (SOTA) methods across 9 metrics in 8 tasks. A user study further validates the effectiveness of the proposed method in terms of quality, accuracy, alignment, and aesthetics. The anonymous project website with samples can be found at https://audiogenie.github.io/.
由于缺少高质量的配对数据集和缺乏强有力的多任务学习框架,多试剂系统在从多式到多式(MM2MA)的生成方面面临重大挑战。然而,将它直接适用于MMM2MA的任务提出了三大挑战:(1) 对多式联运投入(特别是视频投入)的精细理解不足,(2) 单一模型无法处理不同的音频事件,(3) 缺乏用于可靠产出的自我校正机制(例如,视频、文本、图像),因为缺少高质量的配对数据集,而且缺乏强有力的多任务学习框架。最近,多试剂系统在解决上述问题方面表现出巨大的潜力。然而,将它直接应用于MMM2MA的任务提出了三大挑战:(1) 对多式联运投入(特别是视频投入)的精细理解不足,(2) 单一模型无法处理不同的音效、语音、音乐和歌曲。
Article 29
Title@2025-05-28 (3): Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning
Title: Reward-Independent Messaging for Decentralized Multi-Agent Reinforcement Learning | Reward-independent Messaging für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen | 权力下放多机构加强学习分权式多机构加强学习的回报独立通信 2505.21985v1 |
Authors: Naoto Yoshida, Tadahiro Taniguchi
In multi-agent reinforcement learning (MARL), effective communication improves agent performance, particularly under partial observability. We propose MARL-CPC, a framework that enables communication among fully decentralized, independent agents without parameter sharing. MARL-CPC incorporates a message learning model based on collective predictive coding (CPC) from emergent communication research. Unlike conventional methods that treat messages as part of the action space and assume cooperation, MARL-CPC links messages to state inference, supporting communication in non-cooperative, reward-independent settings. We introduce two algorithms, Bandit-CPC and IPPO-CPC, and evaluate them in non-cooperative MARL tasks. Benchmarks show that both outperform standard message-as-action approaches, establishing effective communication even when messages offer no direct benefit to the sender. These results highlight MARL-CPC’s potential for enabling coordination in complex, decentralized environments.
在多试剂强化学习(MARL)中,有效的通信可以提高代理商的绩效,特别是在部分可观测性下。我们提议MARL-CPC,这是一个使完全分散的独立代理商之间能够进行交流而不共享参数的框架。MARL-CPC纳入了基于新兴通信研究集体预测编码(CPC)的信息学习模式。与将信息作为行动空间的一部分处理并承担合作的传统方法不同,MARL-CPC将信息连接到国家推断,支持在不合作、不依赖报酬的环境中进行通信。我们引入了两种算法――Bandit-CPC和IPPO-CPC-CPC-,并在不合作的MARL任务中对其进行评估。基准表明,这两种方法都超越了标准信息作为行动的方法,即使在信息没有给发件人带来直接利益的情况下,也建立了有效的通信。这些结果突出了MARL-CPC在复杂、分散的环境中促进协调的潜力。
Article 30
Title@2025-05-28 (3): Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies
Title: Preference-CFR: Beyond Nash Equilibrium for Better Game Strategies | Präferenz-CFR: Jenseits von Nash Equilibrium für bessere Spielstrategien | Preference-CFR:超越纳什均衡,实现更好的博弈策略 2411.01217v2 |
Authors: Qi Ju, Thomas Tellier, Meng Sun, Zhemei Fang, Yunfeng Luo
Artificial intelligence (AI) has surpassed top human players in a variety of games. In imperfect information games, these achievements have primarily been driven by Counterfactual Regret Minimization (CFR) and its variants for computing Nash equilibrium. However, most existing research has focused on maximizing payoff, while largely neglecting the importance of strategic diversity and the need for varied play styles, thereby limiting AI’s adaptability to different user preferences. To address this gap, we propose Preference-CFR (Pref-CFR), a novel method that incorporates two key parameters: preference degree and vulnerability degree. These parameters enable the AI to adjust its strategic distribution within an acceptable performance loss threshold, thereby enhancing its adaptability to a wider range of strategic demands. In our experiments with Texas Hold’em, Pref-CFR successfully trained Aggressive and Loose Passive styles that not only match original CFR-based strategies in performance but also display clearly distinct behavioral patterns. Notably, for certain hand scenarios, Pref-CFR produces strategies that diverge significantly from both conventional expert heuristics and original CFR outputs, potentially offering novel insights for professional players.
人工智能(AI)在各种游戏中超越了顶尖的人类玩家。在不完善的信息游戏中,这些成就主要是由反事实后悔最小化(CFR)及其用于计算纳什均衡的变体驱动的。然而,大多数现有研究都侧重于最大限度地实现收益,同时在很大程度上忽视了战略多样性的重要性和对不同游戏风格的需要,从而限制了AI适应不同用户的偏好。为了解决这一差距,我们建议了Pref-CFR(Pref-CFR)这一包含两个关键参数的新方法:偏好度和脆弱性度。这些参数使AI能够调整其战略分布,使其适应更广泛的战略需求。在我们与德克萨斯·霍尔德姆的实验中,Pref-CFR成功地培训了侵略性和宽松的风格,这些风格不仅与基于CFRFR的原始业绩战略相匹配,而且明显表现出了不同的行为模式。值得注意的是,Pref-CFR(Pref-CFR)提出了与常规专家的超常态和原CFR产出明显不同的战略。
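For readers unfamiliar with CFR, its per-decision update rests on regret matching: play each action with probability proportional to its positive cumulative regret. The sketch below shows only that standard rule. It does not implement Pref-CFR's preference degree or vulnerability degree, whose exact form is not given in the abstract, and the function name is an assumption.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (standard regret matching).

    Actions with positive regret are played in proportion to that regret;
    if no action has positive regret, the strategy is uniform.
    """
    positive = [max(regret, 0.0) for regret in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [value / total for value in positive]
    count = len(cumulative_regrets)
    return [1.0 / count] * count


# Hypothetical regrets for three actions, e.g. fold / call / raise.
strategy = regret_matching([2.0, -1.0, 2.0])
fallback = regret_matching([-1.0, -2.0])
```

Because the rule depends only on accumulated regrets, any method that reshapes a play style within a performance budget has to intervene in how those regrets or the resulting probabilities are formed.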
Article 31
Title@2025-05-28 (3): Properties of zero-determinant strategies in multichannel games
Title: Properties of zero-determinant strategies in multichannel games | Eigenschaften von Zero-Determinant-Strategien in Multichannel-Spielen | 多频道游戏零决定策略属性 2505.21952v1 |
Authors: Masahiko Ueda
Controlling payoffs in repeated games is one of the important topics in the control theory of multi-agent systems. Recently proposed zero-determinant strategies enable players to unilaterally enforce linear relations between payoffs. Furthermore, based on the mathematics of zero-determinant strategies, regional payoff control, in which payoffs are enforced into some feasible regions, has been discovered in social dilemma situations. More recently, the theory of payoff control was extended to multichannel games, where players interact with each other in parallel across multiple channels. However, the properties of zero-determinant strategies specific to multichannel games are still not clear. In this paper, we elucidate the properties of zero-determinant strategies in multichannel games. First, we relate the existence condition of zero-determinant strategies in multichannel games to that of zero-determinant strategies in each channel. We then show that the existence of zero-determinant strategies in multichannel games requires the existence of zero-determinant strategies in some channels. This result implies that the existence of zero-determinant strategies in multichannel games is tightly restricted by the structure of the games played in each channel.
控制重复游戏中的报酬是多试剂系统控制理论的重要议题之一。最近提出的零确定性战略使玩家能够单方面执行付款之间的线性关系。此外,根据零确定性战略的数学,在社会两难状况中发现一些可行的区域支付控制,将付款强制到一些可行的区域。最近,支付控制理论扩大到多渠道游戏,玩家在多个渠道中平行互动。然而,多渠道游戏特有的零确定性战略的特性仍然不明确。在本文件中,我们阐明了多渠道游戏中零确定性战略的特性。首先,我们把多渠道游戏中的零确定性战略的存在条件与每个渠道的零确定性战略的特性联系起来。我们然后表明,多渠道游戏中的零确定性战略的存在要求在某些渠道中存在零确定性战略。这说明多渠道游戏中的零确定性战略的存在受到每个频道游戏结构的严格限制。
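The claim that a zero-determinant strategy unilaterally enforces a linear payoff relation can be checked numerically in the ordinary single-channel case. The sketch below uses the iterated prisoner's dilemma with the conventional payoffs (R, S, T, P) = (3, 0, 5, 1) and the well-known extortionate memory-one strategy p = (11/13, 1/2, 7/26, 0) from Press and Dyson, which enforces s_X - P = 3 (s_Y - P). It does not cover the multichannel setting studied in the paper. The opponent strategies and the power-iteration helper are assumptions for illustration.

```python
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# States are ordered CC, CD, DC, DD with player X's move written first.
PAYOFF_X = [R, S, T, P]
PAYOFF_Y = [R, T, S, P]


def transition_matrix(p, q):
    """Markov chain over states induced by memory-one strategies p and q.

    p[s] is X's cooperation probability after state s (X's view);
    q is indexed from Y's view, so states CD and DC are swapped for Y.
    """
    q_in_x_order = [q[0], q[2], q[1], q[3]]
    rows = []
    for px, qy in zip(p, q_in_x_order):
        rows.append(
            [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
        )
    return rows


def stationary(matrix, steps=5000):
    """Approximate the stationary distribution by power iteration."""
    dist = [0.25] * 4
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return dist


def long_run_payoffs(p, q):
    """Average per-round payoffs (s_X, s_Y) in the stationary regime."""
    dist = stationary(transition_matrix(p, q))
    s_x = sum(d * u for d, u in zip(dist, PAYOFF_X))
    s_y = sum(d * u for d, u in zip(dist, PAYOFF_Y))
    return s_x, s_y


EXTORTION = [11 / 13, 1 / 2, 7 / 26, 0.0]  # extortion factor 3
# Two arbitrary, hypothetical opponents.
OPPONENTS = [[0.7, 0.2, 0.6, 0.3], [0.9, 0.5, 0.1, 0.4]]
results = [long_run_payoffs(EXTORTION, q) for q in OPPONENTS]
```

Whatever memory-one strategy the opponent uses, the pair (s_X, s_Y) lands on the same line, which is the sense in which the relation is enforced unilaterally.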
Article 32
Title@2025-05-28 (3): Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development
Title: Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development | Co-Saving: Ressourcenschonende Multi-Agenten-Kollaboration für Software-Entwicklung | 共同节省:为开发软件进行有意识的资源、多机构协作 2505.21898v1 |
Authors: Rennai Qiu, Chen Qian, Ran Li, Yufan Dang, Weize Chen, Cheng Yang, Yingli Zhang, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun
Recent advancements in Large Language Models (LLMs) and autonomous agents have demonstrated remarkable capabilities across various domains. However, standalone agents frequently encounter limitations when handling complex tasks that demand extensive interactions and substantial computational resources. Although Multi-Agent Systems (MAS) alleviate some of these limitations through collaborative mechanisms like task decomposition, iterative communication, and role specialization, they typically remain resource-unaware, incurring significant inefficiencies due to high token consumption and excessive execution time. To address these limitations, we propose a resource-aware multi-agent system – Co-Saving (meaning that multiple agents collaboratively engage in resource-saving activities), which leverages experiential knowledge to enhance operational efficiency and solution quality. Our key innovation is the introduction of “shortcuts” – instructional transitions learned from historically successful trajectories – which allow the system to bypass redundant reasoning agents and expedite the collective problem-solving process. Experiments on software development tasks demonstrate significant advantages over existing methods. Specifically, compared to the state-of-the-art MAS ChatDev, our method achieves an average reduction of 50.85% in token usage and improves the overall code quality by 10.06%.
大语言模型(LLMS)和自主代理最近的进展在各个领域都表现出了非凡的能力。然而,独立代理商在处理需要广泛互动和大量计算资源的复杂任务时经常遇到限制。尽管多个代理商通过任务分解、迭代通信和角色专业化等合作机制缓解了其中一些限制,但它们通常仍是资源缺乏软件,由于高象征性消费和超时的执行时间而导致严重效率低下。为了解决这些限制,我们提议建立一个资源认知多试剂系统 – – 共同保存(意味着多个代理商共同参与资源节约活动),利用实验性知识提高业务效率和解决方案质量。我们的主要创新是引入“短期”(从历史上成功的轨迹学得的指令性过渡),从而绕过多余的推理剂,加快集体解决问题的进程。软件开发任务实验显示现有方法的重大优势。具体地说,与最新版MAS ChatDev相比,我们的方法平均减少了50.85%的象征性使用率,并通过06提高总体代码质量。
Article 33
Title@2025-05-28 (3): Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation
Title: Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation | Einschließlich LLMs für großräumige Urban Complex Mobility Simulation | 大型城市综合流动模拟项目LLMs 2505.21880v1 |
Authors: Yu-Lun Song, Chung-En Tsern, Che-Cheng Wu, Yu-Ming Chang, Syuan-Bo Huang, Wei-Chu Chen, Michael Chia-Liang Lin, Yu-Ta Lin
This study presents an innovative approach to urban mobility simulation by integrating a Large Language Model (LLM) with Agent-Based Modeling (ABM). Unlike traditional rule-based ABM, the proposed framework leverages the LLM to enhance agent diversity and realism by generating synthetic population profiles, allocating routine and occasional locations, and simulating personalized routes. Using real-world data, the simulation models individual behaviors and large-scale mobility patterns in Taipei City. Key insights, such as route heat maps and mode-specific indicators, provide urban planners with actionable information for policy-making. Future work will focus on establishing robust validation frameworks to ensure accuracy and reliability in urban planning applications.
与传统的基于规则的反弹道导弹框架不同,拟议框架利用LLM,通过制作合成人口概况、分配常规和偶发地点以及模拟个人化路线,加强代理人多样性和现实主义。 利用台北市的现实世界数据、模拟模型个人行为和大规模流动模式,重要见解,如路线热图和模式特定指标,为城市规划者提供了可供决策使用的信息。未来工作的重点是建立强有力的验证框架,以确保城市规划应用的准确性和可靠性。
Article 34
Title@2025-05-27 (2): Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation
Title: Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation | Optimale Output-Feedback-Lernsteuerung für diskrete Zeit lineare quadratische Regulierung | 用于分立时线性二次曲线调控的最佳输出反馈学习控制 2503.06226v3 |
Authors: Kedi Xie, Martin Guay, Shimin Wang, Fang Deng, Maobin Lu
This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to state feedback, the optimality of dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property that does not require convergence of the state estimated by the state observer, and it plays a key role in establishing the off-policy learning control approaches. Using value iteration and policy iteration schemes, adaptive dynamic programming-based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.
本文通过动态输出反馈学习控制,研究未知离散时间系统的线性二次调节(LQR)问题; 与州反馈相反,动态输出反馈控制对解决LQR问题的优化要求国家观察员的趋同隐含条件; 此外,由于系统矩阵未知和存在观察者错误,很难分析大多数现有产出反馈基于学习的控制方法的趋同和稳定性; 为解决这些问题,我们提议采用通用的动态产出反馈学习控制方法,保证一致性、稳定性和最佳性能,以解决未知离散时间线性系统LQR问题; 与州反馈相比,动态产出反馈控制器的设计要等同于州反馈控制器; 这种等同关系是一个固有属性,不需要国家观察员对估计的国家进行趋同,而国家观察员在建立离政策学习控制方法方面发挥着关键作用; 通过增值和政策循环计划,我们制定了适应性动态规划学习控制方法,以估计最佳反馈控制收益。 此外,通过寻找非离散时间线性线系统,提供了无型稳定标准,通过找到非离散时间线系统线性反馈控制器。 这种对应关系是一种固有的属性属性属性属性属性属性属性属性属性关系,通过两个分析,从而建立最优化的排序分析。
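As background for the value iteration scheme mentioned in the abstract, the sketch below runs the classical model-based Riccati value iteration for a scalar discrete-time LQR problem, x_{t+1} = a x_t + b u_t with stage cost q x^2 + r u^2. It assumes the system parameters are known and the state is measured, so it is not the paper's model-free output-feedback method. The function name and the numerical values are assumptions for illustration.

```python
def lqr_value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Scalar discrete-time LQR via Riccati value iteration.

    Iterates p <- q + a^2 p - (a b p)^2 / (r + b^2 p) from p = 0 and
    returns (p, gain), where the control law is u = -gain * x.
    """
    p = 0.0
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    gain = a * b * p / (r + b * b * p)
    return p, gain


# Hypothetical unstable plant (|a| > 1) with unit weights.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_star, gain = lqr_value_iteration(a, b, q, r)
closed_loop_pole = a - b * gain
```

For these values the fixed point satisfies p^2 - 1.44 p - 1 = 0, and the resulting gain moves the closed-loop pole inside the unit circle. Learning-based schemes aim to reach the same gain from data rather than from a, b.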
Article 35
Title@2025-05-27 (2): Empowering Scientific Workflows with Federated Agents
Title: Empowering Scientific Workflows with Federated Agents | Stärkung wissenschaftlicher Workflows mit Federated Agents | 赋予联邦药剂部门科学工作流程权能 2505.05428v2 |
Authors: J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Mansi Sakarvadia, Kyle Chard, Ian Foster
Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, the agentic frameworks used to build these systems have not previously enabled use with research cyberinfrastructure. Here we introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, Academy supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control. We present microbenchmark results that demonstrate high performance and scalability in HPC environments. To demonstrate the breadth of applications that can be supported by agentic workflow designs, we also present case studies in materials discovery, decentralized learning, and information extraction in which agents are deployed across diverse HPC systems.
各种代理人合作解决具有挑战性的问题的代理系统在AI社区中正在兴起。然而,用于建立这些系统的代理框架以前还没有能够用于研究网络基础设施。在这里,我们介绍了《教程》,这是一个模块和可扩展的中间软件,旨在在整个联合研究生态系统中部署自主代理人,包括HPC系统、实验设施和数据储存库。为了满足科学计算的需求,学院支持不同步地执行、多种资源、高通量数据流和动态资源可用性。它为表达国家代理人、管理机构间协调以及将计算与实验控制相结合提供了抽象信息。我们提出了显出高性能和可伸缩性的微小标准结果。为了展示能够得到代理工作流程设计支持的应用的广度,我们还介绍了材料发现、分散学习和信息提取方面的案例研究,其中将各种代理人部署在不同的HPC系统。
Article 36
Title@2025-05-27 (2): AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models
Title: AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models | AI-unterstützte Plattform für Systemüberwachung und Entscheidungsfindung in der Entsorgung nuklearer Abfälle mit großen Sprachmodellen | AI-支持的具有大语言模式的核废物管理系统监测和决策平台 2505.21741v1 |
Authors: Dongjune Chang, Sola Kim, Young Soo Park
Nuclear waste management requires rigorous regulatory compliance assessment, demanding advanced decision-support systems capable of addressing complex legal, environmental, and safety considerations. This paper presents a multi-agent Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) with document retrieval mechanisms to enhance decision accuracy through structured agent collaboration. Through a structured 10-round discussion model, agents collaborate to assess regulatory compliance and safety requirements while maintaining document-grounded responses. Implemented on consumer-grade hardware, the system leverages Llama 3.2 and mxbai-embed-large-v1 embeddings for efficient retrieval and semantic representation. A case study of a proposed temporary nuclear waste storage site near Winslow, Arizona, demonstrates the framework’s effectiveness. Results show the Regulatory Agent achieves consistently higher relevance scores in maintaining alignment with legal frameworks, while the Safety Agent effectively manages complex risk assessments requiring multifaceted analysis. The system demonstrates progressive improvement in agreement rates between agents across discussion rounds while semantic drift decreases, indicating enhanced decision-making consistency and response coherence. The system ensures regulatory decisions remain factually grounded, dynamically adapting to evolving regulatory frameworks through real-time document retrieval. By balancing automated assessment with human oversight, this framework offers a scalable and transparent approach to regulatory governance. These findings underscore the potential of AI-driven, multi-agent systems in advancing evidence-based, accountable, and adaptive decision-making for high-stakes environmental management scenarios.
核废料管理需要严格的监管合规评估,这要求先进的决策支持系统能够处理复杂的法律、环境和安全考量。本文提出一个多智能体检索增强生成(RAG)系统,将大语言模型(LLM)与文档检索机制相结合,通过结构化的智能体协作提高决策准确性。通过结构化的10轮讨论模式,各智能体协作评估监管合规与安全要求,同时保持回答以文档为依据。该系统在消费级硬件上实现,利用Llama 3.2和mxbai-embed-large-v1嵌入进行高效检索和语义表示。针对亚利桑那州温斯洛附近拟建临时核废料储存场的案例研究展示了该框架的有效性。结果显示,监管智能体在与法律框架保持一致方面始终取得更高的相关性得分,而安全智能体则能有效处理需要多方面分析的复杂风险评估。随着讨论轮次的推进,智能体之间的一致率逐步提高,语义漂移则不断减小,表明决策一致性和回答连贯性得到增强。该系统通过实时文档检索动态适应不断演变的监管框架,确保监管决策始终以事实为依据。通过在自动化评估与人工监督之间取得平衡,该框架为监管治理提供了一种可扩展且透明的方法。这些发现凸显了人工智能驱动的多智能体系统在推动高风险环境管理场景中循证、可问责和自适应决策方面的潜力。
Article 37
Title@2025-05-27 (2): Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks
Title: Communication- and Computation-Efficient Distributed Submodular Optimization in Robot Mesh Networks | Kommunikation- und Computation-Effizient verteilte Submodulare Optimierung in Robot Mesh-Netzwerken | 机器人网网中的通信和计算-有效分布式子模块优化 2407.10382v3 |
Authors: Zirui Xu, Sandilya Sai Garimella, Vasileios Tzoumas
We provide a communication- and computation-efficient method for distributed submodular optimization in robot mesh networks. Submodularity is a property of diminishing returns that arises in active information gathering such as mapping, surveillance, and target tracking. Our method, Resource-Aware distributed Greedy (RAG), introduces a new distributed optimization paradigm that enables scalable and near-optimal action coordination. To this end, RAG requires each robot to make decisions based only on information received from and about their neighbors. In contrast, the current paradigms allow the relay of information about all robots across the network. As a result, RAG’s decision-time scales linearly with the network size, while state-of-the-art near-optimal submodular optimization algorithms scale cubically. We also characterize how the designed mesh-network topology affects RAG’s approximation performance. Our analysis implies that sparser networks favor scalability without proportionally compromising approximation performance: while RAG’s decision time scales linearly with network size, the gain in approximation performance scales sublinearly. We demonstrate RAG’s performance in simulated scenarios of area detection with up to 45 robots, simulating realistic robot-to-robot (r2r) communication speeds such as the 0.25 Mbps speed of the Digi XBee 3 Zigbee 3.0. In the simulations, RAG enables real-time planning, up to three orders of magnitude faster than competitive near-optimal algorithms, while also achieving superior mean coverage performance. To enable the simulations, we extend the high-fidelity and photo-realistic simulator AirSim by integrating a scalable collaborative autonomy pipeline to tens of robots and simulating r2r communication delays. Our code is available at https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim.
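我们为机器人网状网络中的分布式子模优化提供了一种通信和计算高效的方法。子模性是一种收益递减性质,出现在建图、监视和目标跟踪等主动信息采集任务中。我们的方法,即资源感知分布式贪心算法(Resource-Aware distributed Greedy,RAG),引入了一种新的分布式优化范式,能够实现可扩展且近似最优的动作协调。为此,RAG要求每个机器人仅根据从其邻居收到的以及关于其邻居的信息做出决策。相比之下,现有范式允许在整个网络中转发关于所有机器人的信息。因此,RAG的决策时间随网络规模线性增长,而最先进的近似最优子模优化算法则呈三次方增长。我们还刻画了所设计的网状网络拓扑如何影响RAG的近似性能。我们的分析表明,更稀疏的网络有利于可扩展性,而不会成比例地损害近似性能:RAG的决策时间随网络规模线性增长,而近似性能的增益呈次线性增长。我们在多达45个机器人的区域检测模拟场景中展示了RAG的性能,并模拟了真实的机器人间(r2r)通信速度,例如Digi XBee 3 Zigbee 3.0的0.25 Mbps。在模拟中,RAG实现了实时规划,比有竞争力的近似最优算法快多达三个数量级,同时还取得了更优的平均覆盖性能。为了支持这些模拟,我们扩展了高保真、照片级真实感的模拟器AirSim,集成了可扩展到数十个机器人的协同自主流水线,并模拟了r2r通信延迟。我们的代码可在 https://github.com/UM-iRaL/Resource-Aware-Coordination-AirSim 获取。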
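The neighbor-only decision rule described in the abstract above can be illustrated with a toy sketch. This is not the authors' RAG algorithm: it is a plain sequential greedy pass over a set-coverage objective (a standard submodular function), in which each robot sees only the choices of neighbors that have already decided. The function names, the data layout, and the fixed decision order are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def neighbor_greedy(
    actions: Dict[int, List[Set[int]]],
    neighbors: Dict[int, Set[int]],
) -> Dict[int, Set[int]]:
    """Hypothetical helper: robots decide in index order, each picking the
    action with the largest marginal coverage gain relative to the cells
    already covered by its own neighbors (not by the whole network)."""
    chosen: Dict[int, Set[int]] = {}
    for robot in sorted(actions):
        # Collect cells covered by neighbors that have already decided.
        seen: Set[int] = set()
        for other in neighbors.get(robot, set()):
            if other in chosen:
                seen |= chosen[other]
        # For coverage, the marginal gain is the number of newly covered cells.
        chosen[robot] = max(actions[robot], key=lambda cells: len(cells - seen))
    return chosen


def coverage(chosen: Dict[int, Set[int]]) -> int:
    """Total number of distinct cells covered by all chosen actions."""
    return len(set().union(*chosen.values())) if chosen else 0


# Three robots on a line graph 0 - 1 - 2; each action is a set of covered cells.
example_actions = {0: [{1, 2}, {3}], 1: [{1, 2}, {4, 5}], 2: [{4, 5}, {6}]}
example_neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
example_choice = neighbor_greedy(example_actions, example_neighbors)
```

Because each robot consults only its neighbors, the work per decision depends on the neighborhood size and not on the network size, which is the intuition behind the scalability claim; the approximation guarantees discussed in the abstract are not reproduced by this sketch.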
Article 38
Title@2025-05-27 (2): Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Title: Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers | Paper2Poster: Auf dem Weg zur multimodalen Plakatautomatisierung aus wissenschaftlichen Papieren | Paper2Poster:从科学论文中走向多式海报自动化 2505.21497v1 |
Authors: Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr
Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge; and notably (iv) PaperQuiz: the poster’s ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g. based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. It transforms a 22-page paper into a finalized yet editable .pptx poster, all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.
学术海报生成是科学交流中一项至关重要但颇具挑战性的任务,需要将长上下文的图文交错文档压缩成单个视觉连贯的页面。为应对这一挑战,我们提出了首个海报生成基准和指标套件,它将近期的会议论文与作者设计的海报配对,并从以下方面评估输出:(i)视觉质量:与人工海报的语义对齐;(ii)文本连贯性:语言流畅度;(iii)整体评估:由VLM作为评判者打分的六项细粒度美学和信息标准;以及尤其重要的(iv)PaperQuiz:通过VLM回答自动生成的测验来衡量海报传达论文核心内容的能力。在此基准之上,我们提出PosterAgent,一个自顶向下、视觉在环的多智能体流水线:(a)Parser将论文提炼为结构化的素材库;(b)Planner将文本与视觉素材配对并排布成保持阅读顺序和空间平衡的二叉树布局;(c)Painter-Commenter循环通过执行渲染代码并利用VLM反馈来细化每个面板,以消除溢出并确保对齐。在全面评估中,我们发现GPT-4o的输出虽然乍看之下视觉上吸引人,但往往文字杂乱、PaperQuiz得分较差;我们还发现读者参与度是主要的美学瓶颈,因为人工设计的海报主要依靠视觉语义来传达含义。我们完全开源的变体(例如基于Qwen-2.5系列)在几乎所有指标上都优于现有的由4o驱动的多智能体系统,同时少用87%的token。它能将一篇22页的论文转换为已定稿但仍可编辑的.pptx海报,成本仅为0.005美元。这些发现为下一代全自动海报生成模型指明了清晰的方向。代码和数据集可在 https://github.com/Paper2Poster/Paper2Poster 获取。
Article 39
Title@2025-05-27 (2): Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge
Title: Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Agentisches medizinisches Wissen Grafiken verbessern medizinische Frageantworten: Die Lücke zwischen LLMs und sich entwickelndem medizinischem Wissen überbrücken | 药用知识图加强医疗问题的回答:缩小LLMM与不断发展的医学知识之间的差距 2502.13010v2 |
Authors: Mohammad Reza Rezaei, Reza Saadati Fard, Rahul G. Krishnan, Milad Lankarany
Large Language Models (LLMs) have significantly advanced medical question-answering by leveraging extensive clinical data and medical literature. However, the rapid evolution of medical knowledge and the labor-intensive process of manually updating domain-specific resources pose challenges to the reliability of these systems. To address this, we introduce Agentic Medical Graph-RAG (AMG-RAG), a comprehensive framework that automates the construction and continuous updating of medical knowledge graphs, integrates reasoning, and retrieves current external evidence, such as PubMed and WikiSearch. By dynamically linking new findings and complex medical concepts, AMG-RAG not only improves accuracy but also enhances interpretability in medical queries. Evaluations on the MEDQA and MEDMCQA benchmarks demonstrate the effectiveness of AMG-RAG, achieving an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger. Notably, these improvements are achieved without increasing computational overhead, highlighting the critical role of automated knowledge graph generation and external evidence retrieval in delivering up-to-date, trustworthy medical insights.
大语言模型(LLM)通过利用大量临床数据和医学文献,显著推进了医学问答的发展。然而,医学知识的快速演进以及人工更新领域专用资源的劳动密集型过程,给这些系统的可靠性带来了挑战。为此,我们提出Agentic Medical Graph-RAG(AMG-RAG),这是一个综合框架,能够自动构建并持续更新医学知识图谱,整合推理,并检索PubMed和WikiSearch等最新的外部证据。通过动态关联新发现与复杂的医学概念,AMG-RAG不仅提高了准确性,还增强了医学查询的可解释性。在MEDQA和MEDMCQA基准上的评估证明了AMG-RAG的有效性:它在MEDQA上取得了74.1%的F1分数,在MEDMCQA上取得了66.34%的准确率,优于同等规模的模型以及规模大10到100倍的模型。值得注意的是,这些改进是在不增加计算开销的情况下实现的,凸显了自动化知识图谱生成和外部证据检索在提供最新、可信的医学见解方面的关键作用。
Article 40
Title@2025-05-27 (2): Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks
Title: Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks | Individuelles Verhalten in agentenbasierten Modellen mit Graph Diffusionsnetzwerken lernen | 具有图表传播网络的基于代理模型的学习个人行为 2505.21426v1 |
Authors: Francesco Cozzi, Marco Pangallo, Alan Perotti, André Panisson, Corrado Monti
Agent-Based Models (ABMs) are powerful tools for studying emergent properties in complex systems. In ABMs, agent behaviors are governed by local interactions and stochastic rules. However, these rules are, in general, non-differentiable, limiting the use of gradient-based methods for optimization, and thus integration with real-world data. We propose a novel framework to learn a differentiable surrogate of any ABM by observing its generated data. Our method combines diffusion models to capture behavioral stochasticity and graph neural networks to model agent interactions. Distinct from prior surrogate approaches, our method introduces a fundamental shift: rather than approximating system-level outputs, it models individual agent behavior directly, preserving the decentralized, bottom-up dynamics that define ABMs. We validate our approach on two ABMs (Schelling’s segregation model and a Predator-Prey ecosystem) showing that it replicates individual-level patterns and accurately forecasts emergent dynamics beyond training. Our results demonstrate the potential of combining diffusion models and graph learning for data-driven ABM simulation.
基于智能体的模型(ABM)是研究复杂系统涌现性质的有力工具。在ABM中,智能体的行为受局部交互和随机规则支配。然而,这些规则通常是不可微的,限制了基于梯度的优化方法的使用,也因此限制了与真实世界数据的结合。我们提出一个新框架,通过观察任意ABM生成的数据来学习其可微替代模型。我们的方法将用于刻画行为随机性的扩散模型与用于建模智能体交互的图神经网络相结合。与以往的替代模型方法不同,我们的方法带来了根本性的转变:它不是近似系统层面的输出,而是直接对单个智能体的行为建模,保留了定义ABM的去中心化、自下而上的动态。我们在两个ABM(谢林隔离模型和捕食者-猎物生态系统)上验证了我们的方法,表明它能够复现个体层面的模式,并准确预测训练范围之外的涌现动态。我们的结果展示了将扩散模型与图学习相结合用于数据驱动的ABM模拟的潜力。
Article 41
Title@2025-05-27 (2): Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery
Title: Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery | Autonome Multi-Modal LLM-Agenten für die Behandlungsplanung in fokussierter Ultraschallablationschirurgie | 重点超声速超声振动外科手术治疗规划代理 2505.21418v1 |
Authors: Lina Zhao, Jiaxing Bai, Zihao Bian, Qingyue Chen, Yafang Li, Guangbo Li, Min He, Huaiyuan Yao, Zongjiu Zhang
Focused Ultrasound Ablation Surgery (FUAS) has emerged as a promising non-invasive therapeutic modality, valued for its safety and precision. Nevertheless, its clinical implementation entails intricate tasks such as multimodal image interpretation, personalized dose planning, and real-time intraoperative decision-making, processes that demand intelligent assistance to improve efficiency and reliability. We introduce FUAS-Agents, an autonomous agent system that leverages the multimodal understanding and tool-using capabilities of large language models (LLMs). By integrating patient profiles and MRI data, FUAS-Agents orchestrates a suite of specialized medical AI tools, including segmentation, treatment dose prediction, and clinical guideline retrieval, to generate personalized treatment plans comprising MRI images, dose parameters, and therapeutic strategies. We evaluate the system in a uterine fibroid treatment scenario. Human assessment by four senior FUAS experts indicates that 82.5%, 82.5%, 87.5%, and 97.5% of the generated plans were rated 4 or above (on a 5-point scale) in terms of completeness, accuracy, fluency, and clinical compliance, respectively. These results demonstrate the potential of LLM-driven agents in enhancing decision-making across complex clinical workflows, and exemplify a translational paradigm that combines general-purpose models with specialized expert systems to solve practical challenges in vertical healthcare domains.
聚焦超声消融手术(FUAS)已成为一种有前景的非侵入性治疗方式,因其安全性和精确性而受到重视。然而,其临床实施涉及多模态影像解读、个性化剂量规划和实时术中决策等复杂任务,这些过程需要智能辅助来提高效率和可靠性。我们提出FUAS-Agents,这是一个自主智能体系统,利用大语言模型(LLM)的多模态理解和工具使用能力。通过整合患者资料和MRI数据,FUAS-Agents协调一套专门的医疗AI工具,包括分割、治疗剂量预测和临床指南检索,以生成包含MRI图像、剂量参数和治疗策略的个性化治疗方案。我们在子宫肌瘤治疗场景中对该系统进行了评估。四位资深FUAS专家的人工评估表明,在完整性、准确性、流畅性和临床合规性方面,分别有82.5%、82.5%、87.5%和97.5%的生成方案被评为4分或以上(5分制)。这些结果展示了LLM驱动的智能体在增强复杂临床工作流程中的决策方面的潜力,并体现了一种将通用模型与专业专家系统相结合、以解决垂直医疗领域实际挑战的转化范式。
Article 42
Title@2025-05-27 (2): Sequential Resource Trading Using Comparison-Based Gradient Estimation
Title: Sequential Resource Trading Using Comparison-Based Gradient Estimation | Sequentieller Ressourcenhandel mit Vergleichsbasis-Gradientenschätzung | 使用基于比较的逐步梯度估计法进行按顺序进行的资源贸易 2408.11186v3 |
Authors: Surya Murthy, Mustafa O. Karabag, Ufuk Topcu
Autonomous agents interact with other autonomous agents and humans of unknown preferences to share resources in their environment. We explore sequential trading for resource allocation in a setting where two greedily rational agents sequentially trade resources from a finite set of categories. Each agent has a utility function that depends on the amount of resources it possesses in each category. The offering agent makes trade offers to improve its utility without knowing the responding agent’s utility function, and the responding agent only accepts offers that improve its utility. To facilitate cooperation between an autonomous agent and another autonomous agent or a human, we present an algorithm for the offering agent to estimate the responding agent’s gradient (preferences) and make offers based on previous acceptance or rejection responses. The algorithm’s goal is to reach a Pareto-optimal resource allocation state while ensuring that the utilities of both agents improve after every accepted trade. The algorithm estimates the responding agent’s gradient by leveraging the rejected offers and the greedy rationality assumption, to prune the space of potential gradients. We show that, after the algorithm makes a finite number of rejected offers, the algorithm either finds a mutually beneficial trade or certifies that the current state is epsilon-weakly Pareto optimal. We compare the proposed algorithm against various baselines in continuous and discrete trading scenarios and show that it improves the societal benefit with fewer offers. Additionally, we validate these findings in a user study with human participants, where the algorithm achieves high performance in scenarios with high resource conflict due to aligned agent goals.
自主智能体需要与其他自主智能体以及偏好未知的人类交互,以共享环境中的资源。我们研究一种用于资源分配的序贯交易,其中两个贪心理性的智能体依次交易来自有限类别集合的资源。每个智能体都有一个效用函数,取决于它在每个类别中拥有的资源数量。出价方在不知道应答方效用函数的情况下提出交易报价以提高自身效用,而应答方只接受能提高其效用的报价。为了促进自主智能体与另一自主智能体或人类之间的合作,我们提出一种算法,使出价方能够估计应答方的梯度(偏好),并根据此前的接受或拒绝反馈提出报价。该算法的目标是达到帕累托最优的资源分配状态,同时确保每次交易被接受后双方的效用都得到提高。该算法利用被拒绝的报价和贪心理性假设来估计应答方的梯度,从而对潜在梯度的空间进行剪枝。我们证明,在算法提出有限次被拒绝的报价之后,它要么找到一个互利的交易,要么证明当前状态是epsilon弱帕累托最优的。我们在连续和离散交易场景中将所提算法与多种基线进行比较,结果表明它能以更少的报价提高社会效益。此外,我们在一项有人类参与者的用户研究中验证了这些发现,在因智能体目标一致而资源冲突激烈的场景中,该算法取得了很高的性能。
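The pruning step mentioned in the abstract above can be pictured with a small sketch. It is an illustrative simplification, not the paper's algorithm: it assumes a first-order model in which a greedily rational responder accepts an offer `d` (written as the change in the responder's holdings) only if `g · d > 0` for its unknown gradient `g`, so every rejected offer rules out the half-space of gradients with a positive inner product. The sampled candidate set and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sample_unit_vectors(n: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw n candidate gradient directions uniformly from the unit sphere."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        out.append([x / norm for x in v])
    return out


def prune(candidates: List[Vector], rejected_offers: List[Vector]) -> List[Vector]:
    """Keep only candidates consistent with every rejection.

    Assumption (first-order model): a rejected offer d implies g . d <= 0.
    """
    return [
        g for g in candidates
        if all(dot(g, d) <= 0.0 for d in rejected_offers)
    ]


# A rejected offer along the first axis removes every candidate pointing that way.
explicit = prune([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
sampled = prune(sample_unit_vectors(200, 2), [[1.0, 0.0]])
```

Each additional rejected offer intersects one more half-space with the surviving set, which is one way to read the abstract's statement that rejections shrink the space of potential gradients.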
Article 43
Title@2025-05-27 (2): PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Title: PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning | PeerGuard: Verteidigen von Multi-Agenten-Systemen gegen Hintertürangriffe durch gegenseitige Vernunft | 同伴保护:捍卫多机构系统,防止通过相互理由进行后门攻击 2505.11642v2 |
Authors: Falong Fan, Xi Li
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We believe this work provides insights into multi-agent system safety and contributes to the development of robust, trustworthy AI interactions.
多试剂系统利用先进的AI模型,作为互动、合作或竞争的自主代理,完成机器人和交通管理等各种应用的复杂任务。尽管这些模型日益重要,但多试剂系统的安全性在很大程度上仍未得到充分探讨,大部分研究侧重于单一的AI模型而不是互动代理。这项工作调查多试剂系统中的后门脆弱性,并提议基于代理的防御机制。通过利用推理能力,每个代理机构评估他人的反应,以发现不合逻辑的推理过程,表明有毒剂。LLLM的多试剂系统实验,包括ChatGPT系列和Llama3,显示了拟议方法的有效性,在识别有毒剂方面实现了高度准确性,同时尽量减少了在清洁剂上的假阳性。我们认为,这项工作为多试剂系统安全提供了深刻的洞察力,并有助于发展稳健可靠的AI互动。
Article 44
Title@2025-05-27 (2): Large Language Models Miss the Multi-Agent Mark
Title: Large Language Models Miss the Multi-Agent Mark | Große Sprachmodelle vermissen das Multi-Agent Mark | 大语言模型 2505.21298v1 |
Authors: Emanuele La Malfa, Gabriele La Malfa, Samuele Marro, Jie M. Zhang, Elizabeth Black, Micheal Luck, Philip Torr, Michael Wooldridge
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLM implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
最近对多种大语言模型(MAS LLMS)多重代理系统(MAS LLMS)的关心导致利用多种大LMS处理复杂任务的框架增加,然而,许多文献在不与基本原则接触的情况下,将MAS术语适用于MAS术语;在本立场文件中,我们强调MAS理论与目前执行MASLMSLM标准之间的重大差异,侧重于四个关键领域:机构的社会方面、环境设计、协调和通信协议,以及衡量新出现的行为。我们的立场是,许多MASLMS缺乏多种工具特征,如自主、社会互动和结构化环境,而且往往依赖过于简单化的、以LLMM为中心的结构。通过重新审视MAS文献已经解决的问题,外地可能会放慢速度,失去牵引力。因此,我们系统地分析这一问题,并概述相关的研究机会;我们主张更好地整合已确立的MAS概念和更加精确的术语,以避免错误化和错失机会。
Article 45
Title@2025-05-27 (2): Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies
Title: Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Breaking the Performance Ceiling in komplexen Verstärkungs-Lernen erfordert Inferenz-Strategien | 综合加强学习中业绩上限的打破需要推断战略 2505.21236v1 |
Authors: Felix Chalumeau, Daniel Rajaonarivonivelomanantsoa, Ruan de Kock, Claude Formanek, Sasha Abramowitz, Oumayma Mahjoub, Wiem Khlifi, Simon Du Toit, Louay Ben Nessir, Refiloe Shabe, Arnol Fokam, Siddarth Singh, Ulrich Mbou Sob, Arnu Pretorius
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple of seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inf-marl.
强化学习(RL)系统有无数的应用,从能源电网管理到蛋白质设计。然而,这类真实世界场景往往极其困难,具有组合性质,并且需要多个智能体之间的复杂协调。这种复杂程度甚至会使训练至收敛的最先进RL系统遇到性能上限,而它们无法通过零样本推理突破这一上限。与此同时,许多数字化或基于模拟的应用允许设置一个推理阶段,利用特定的时间和计算预算在输出最终解之前探索多次尝试。在这项工作中,我们表明,在执行时采用这样的推理阶段以及选择相应的推理策略,是突破复杂多智能体RL问题中性能上限的关键。我们的主要结果十分显著:在17项任务上,仅在执行期间多花几秒钟的挂钟时间,我们就能相对于此前的最先进水平取得最高126%、平均45%的提升。我们还展示了有前景的计算扩展特性,并有超过6万次实验的支持,使其成为迄今为止关于复杂RL推理策略的最大规模研究。我们的实验数据和代码可在 https://sites.google.com/view/inf-marl 获取。
Article 46
Title@2025-05-27 (2): Voting or Consensus? Decision-Making in Multi-Agent Debate
Title: Voting or Consensus? Decision-Making in Multi-Agent Debate | Abstimmung oder Konsens? Entscheidungsfindung in Multi-Agent-Debatte | 表决还是协商一致?多机构辩论中的决策 2502.19130v2 |
Authors: Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp
Much of the success of multi-agent debates depends on carefully choosing the right parameters. The decision-making protocol stands out as it can highly impact final model answers, depending on how decisions are reached. Systematic comparison of decision protocols is difficult because many studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making influences different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time - the decision protocol - to analyze how different methods affect the collaboration between agents and measure differences in knowledge and reasoning tasks. Our results show that voting protocols improve performance by 13.2% in reasoning tasks and consensus protocols by 2.8% in knowledge tasks compared to other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods, All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
多代理人辩论的成功很大程度上取决于如何仔细选择正确的参数。决策协议的突出之处在于它能够对最终模式的答案产生很大影响,取决于如何作出决定。对决定协议的系统比较是困难的,因为许多研究改变了议定书之外的许多讨论参数。到目前为止,人们基本上不知道决策如何影响不同的任务。这项工作系统地评估了七个决定协议的影响(例如多数投票、全体一致协商一致)。我们一次只改变一个变量——决定协议——分析不同方法如何影响代理人之间的合作并衡量知识和推理任务的差异。我们的结果表明,投票协议提高了13.2%的推理任务和协商一致协议的绩效,比其他决定协议提高了2.8 %的推理任务和协商一致协议的绩效。增加代理人的数量提高了绩效,而更多的投票前讨论回合减少了绩效。为了通过增加答案的多样性来改进决策,我们提出了两种新方法,即 “ 所有人起草(AAD) “ 和 “ 集体改进 “ (CI)。我们的方法提高了任务绩效,与AAD的比例提高到3.3%,与CI的比例提高到7.4 %。这项工作表明,在超出规模的多代理人辩论中,必须进行决策。
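The two protocol families named in the abstract above, majority voting and unanimity consensus, differ only in the rule that turns the agents' final answers into a decision. A minimal sketch of both rules follows; the function names and the choice to return `None` when no decision is reached are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[str]) -> Optional[str]:
    """Return the answer backed by more than half of the agents, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


def unanimity_consensus(answers: List[str]) -> Optional[str]:
    """Return the shared answer only if every agent gave the same one."""
    if answers and len(set(answers)) == 1:
        return answers[0]
    return None


# Three agents, two of which agree: voting decides, unanimity does not.
round_answers = ["A", "A", "B"]
voted = majority_vote(round_answers)
agreed = unanimity_consensus(round_answers)
```

In a debate loop, a `None` result would typically trigger another discussion round, which is where the two families diverge in cost: a strict consensus rule can require many more rounds than a vote.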
Article 47
Title@2025-05-27 (2): GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender Simulation
Title: GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender Simulation | GGBond: Wachsende Graphen-basierte KI-Agenten-Gesellschaft für sozial-aware-Empfänger-Simulation | GGBond: 不断增长的基于图表的AI-Agent Society 社会软件建议模拟模拟软件 2505.21154v1 |
Authors: Hailin Zhong, Hanlin Wang, Yujun Ye, Meiyi Zhang, Shengxin Zhu
Current personalized recommender systems predominantly rely on static offline data for algorithm design and evaluation, significantly limiting their ability to capture long-term user preference evolution and social influence dynamics in real-world scenarios. To address this fundamental challenge, we propose a high-fidelity social simulation platform integrating human-like cognitive agents and dynamic social interactions to realistically simulate user behavior evolution under recommendation interventions. Specifically, the system comprises a population of Sim-User Agents, each equipped with a five-layer cognitive architecture that encapsulates key psychological mechanisms, including episodic memory, affective state transitions, adaptive preference learning, and dynamic trust-risk assessments. In particular, we innovatively introduce the Intimacy–Curiosity–Reciprocity–Risk (ICR2) motivational engine grounded in psychological and sociological theories, enabling more realistic user decision-making processes. Furthermore, we construct a multilayer heterogeneous social graph (GGBond Graph) supporting dynamic relational evolution, effectively modeling users’ evolving social ties and trust dynamics based on interest similarity, personality alignment, and structural homophily. During system operation, agents autonomously respond to recommendations generated by typical recommender algorithms (e.g., Matrix Factorization, MultVAE, LightGCN), deciding whether to consume, rate, and share content while dynamically updating their internal states and social connections, thereby forming a stable, multi-round feedback loop. This innovative design transcends the limitations of traditional static datasets, providing a controlled, observable environment for evaluating long-term recommender effects.
目前的个人化推荐系统主要依赖静态离线数据进行算法设计和评估,大大限制了它们捕捉长期用户偏好变化和现实世界情景中社会影响动态的能力。为了应对这一根本挑战,我们提议建立一个高不忠的社会模拟平台,将人性化认知媒介和动态社会互动结合起来,以现实地模拟用户行为演变,根据建议干预措施,具体地说,该系统由一组Sim-用户代理系统组成,每个系统配备五层认知结构,包含关键心理机制,包括偶发记忆、感性国家过渡、适应性偏好学习和动态信任风险评估。特别是,我们创新引入了基于心理和社会学理论的亲近性-团结-回报-回报-风险(ICR2)动态社会模拟平台,使用户的多层次社会图(GGGBond图)支持动态关系演变,有效地模拟用户基于兴趣相似性、个性调整、适应性偏向性偏向的社会联系和结构同性风险评估的社会关系和信任动态变化。在系统运行期间,代理者自主地应对以典型的超度建议性建议性――稳定型设计、稳定型G型设计、稳定型数据序列,从而决定其动态内消费比例。
Article 48
Title@2025-05-27 (2): Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games
Title: Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games | Stoppen von Kriterien für die Wert-Iteration bei gleichzeitigen stochastischen Erreichbarkeits- und Sicherheitsspielen | 停止同时举行存储可达性和安全运动会的价值迭代标准 2505.21087v1 |
Authors: Marta Grobelna, Jan Křetínský, Maximilian Weininger
We consider two-player zero-sum concurrent stochastic games (CSGs) played on graphs with reachability and safety objectives. These include degenerate classes such as Markov decision processes or turn-based stochastic games, which can be solved by linear or quadratic programming; however, in practice, value iteration (VI) outperforms the other approaches and is the most implemented method. Similarly, for CSGs, this practical performance makes VI an attractive alternative to the standard theoretical solution via the existential theory of reals. VI starts with an under-approximation of the sought values for each state and iteratively updates them, traditionally terminating once two consecutive approximations are $\epsilon$-close. However, this stopping criterion lacks guarantees on the precision of the approximation, which is the goal of this work. We provide bounded (a.k.a. interval) VI for CSGs: it complements standard VI with a converging sequence of over-approximations and terminates once the over- and under-approximations are $\epsilon$-close.
我们认为,在具有可达性和安全目标的图表上,有两种玩家同时玩零和随机游戏(CSGs)的游戏(CSGs)具有可达性和安全性,其中包括诸如Markov决策程序或转接型随机游戏等堕落的类别,可以通过线性或二次编程解决;然而,在实践中,价值迭代(VI)优于其他方法,是采用得最多的方法。对于CSGs来说,这种实际表现通过真实存在理论,使VI成为标准理论解决办法的有吸引力的替代物。 VI首先,对每个州所寻求的数值的利用不足,并反复更新这些数值,传统上将连续两次近似终止一次的近似值为$\ epslon$- close。然而,这一停止标准缺乏对近似性的精确性的保证,而这正是这项工作的目标。我们为CSGs提供了(a.k.a.a.间隔)六的束缚性:一旦超常和超常次使用后,它就补充了标准的第六号标准六号的超合的顺序并终止。
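The stopping criterion described in the abstract above can be sketched on the simplest degenerate class it mentions, a Markov decision process with a maximizing reachability objective. This is a sketch under strong assumptions, not the paper's method for concurrent games: the toy model has no end components other than the absorbing target and sink states (otherwise the upper bound need not converge), and the function name and model encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# state -> list of actions; each action is a list of (probability, successor).
MDP = Dict[str, List[List[Tuple[float, str]]]]


def bounded_value_iteration(
    mdp: MDP, target: str, sink: str, eps: float = 1e-6
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iterate a lower and an upper bound on the maximal reachability
    probability until they are eps-close in every state."""
    lower = {s: 1.0 if s == target else 0.0 for s in mdp}
    upper = {s: 0.0 if s == sink else 1.0 for s in mdp}
    while max(upper[s] - lower[s] for s in mdp) > eps:
        for bound in (lower, upper):
            new = {}
            for s, acts in mdp.items():
                if s in (target, sink):
                    new[s] = bound[s]  # absorbing states keep their value
                else:
                    # Bellman update: best action under the current bound.
                    new[s] = max(sum(p * bound[t] for p, t in act) for act in acts)
            bound.update(new)
    return lower, upper


# From s0, action a gives 0.5; action b retries and is worth 0.3 / (1 - 0.5) = 0.6.
toy: MDP = {
    "s0": [
        [(0.5, "goal"), (0.5, "sink")],
        [(0.3, "goal"), (0.2, "sink"), (0.5, "s0")],
    ],
    "goal": [],
    "sink": [],
}
lo_bound, up_bound = bounded_value_iteration(toy, "goal", "sink")
```

Unlike the classical rule of stopping when two consecutive under-approximations are close, the returned interval encloses the true value, so its width is a genuine error bound.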
Article 49
Title@2025-05-27 (2): Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems
Title: Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems | Herdverhalten: Untersuchung des Peer-Einflusses in LLM-basierte Multi-Agent-Systeme | 牧民行为:调查基于LLM的多机构机构系统中的同侪影响 2505.21588v1 |
Authors: Young-Min Cho, Sharath Chandra Guntuku, Lyle Ungar
Recent advancements in Large Language Models (LLMs) have enabled the emergence of multi-agent systems where LLMs interact, collaborate, and make decisions in shared environments. While individual model behavior has been extensively studied, the dynamics of peer influence in such systems remain underexplored. In this paper, we investigate herd behavior, the tendency of agents to align their outputs with those of their peers, within LLM-based multi-agent interactions. We present a series of controlled experiments that reveal how herd behaviors are shaped by multiple factors. First, we show that the gap between self-confidence and perceived confidence in peers significantly impacts an agent’s likelihood to conform. Second, we find that the format in which peer information is presented plays a critical role in modulating the strength of herd behavior. Finally, we demonstrate that the degree of herd behavior can be systematically controlled, and that appropriately calibrated herd tendencies can enhance collaborative outcomes. These findings offer new insights into the social dynamics of LLM-based systems and open pathways for designing more effective and adaptive multi-agent collaboration frameworks.
大型语言模型(LLMs)最近的进展使得多试剂系统得以出现,LLMs在共享环境中互动、协作和作出决定。虽然对个人模型行为进行了广泛研究,但这种系统中同侪影响动态仍未得到充分探讨。在本文中,我们调查了同侪行为,即代理人在以LLM为基础的多试剂互动中将其产出与其同侪产出相协调的趋势。我们提出了一系列有控制的实验,揭示了同侪行为如何受多种因素影响。首先,我们表明自信心和对同侪的认知信心之间的差距严重影响了同侪遵守的可能性。第二,我们发现提供同侪信息的格式在改变同侪行为强度方面发挥着关键作用。最后,我们证明可以系统控制同侪行为的程度,适当校正的同系趋势可以增进协作成果。这些研究结果为LM系统的社会动态和设计更有效、更适应性多代理人合作框架的公开途径提供了新的见解。
Article 50
Title@2025-05-27 (2): Improving flocking behaviors in street networks with vision
Title: Improving flocking behaviors in street networks with vision | Verbesserung des Beflockungsverhaltens in Straßennetzen mit Vision | 改善街头网络中有远见的群众行为 2505.21585v1 |
Authors: Guillaume Moinard, Matthieu Latapy
We improve a flocking model on street networks introduced in a previous paper. We expand the field of vision of walkers, making the model more realistic. Under such conditions, we obtain groups of walkers whose gathering times and robustness to break-ups are better than in previous results. We explain these improvements by two effects: the alignment rule with vision guarantees that walkers no longer split into divergent directions at intersections, and the attraction rule with vision gathers distant groups. This paves the way to a better understanding of events where walkers have collective decentralized goals, such as protests.
我们改进了在前一篇论文中引入的街头网络群策群力模式。 我们扩大了行人视野的范围,使模式更加现实。 在这样的条件下,我们得到了行人群体,他们的聚会时间和断裂的坚韧性优于以往的结果。 我们解释这些改进,因为与愿景保温行人相一致的规则不再在交叉点分裂为不同的方向,还因为有愿景的吸引规则聚集了远方群体。 这为更好地了解行人具有集体分散目标的事件铺平了道路,比如抗议活动。
Article 51
Title@2025-05-27 (2): Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Title: Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective | Multi-Agenten-Weltmodellierung aus einer diffusionsinspirierten Perspektive Revue passieren | 从传播启发的视角重新审视多股权世界建模 2505.20922v1 |
Authors: Yang Zhang, Xinran Li, Jianing Ye, Delin Qu, Shuang Qiu, Chongjie Zhang, Xiu Li, Chenjia Bai
World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents’ actions in a multi-agent system aligns with the reverse process in diffusion models, a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, including MAMuJoCo and Bi-DexHands, significantly outperforming prior world models in terms of final return and sample efficiency. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research.
世界模型因能够提高策略学习的样本效率,近来在多智能体强化学习(MARL)中受到越来越多的关注。然而,由于多智能体系统固有的指数级庞大的联合动作空间和高度不确定的动态,在MARL中准确地对环境建模颇具挑战性。为此,我们通过顺序智能体建模,把对整个状态-动作转移动态的联合建模转变为在每个时间步只关注状态空间,从而降低建模复杂度。具体而言,我们的方法使模型能够逐步消除不确定性,同时捕捉智能体之间的结构化依赖关系,更准确地刻画智能体如何影响状态。有趣的是,多智能体系统中这种对智能体动作的顺序揭示与扩散模型的逆向过程相吻合;扩散模型是一类强大的生成模型,与自回归模型或潜变量模型相比,以表达能力强和训练稳定著称。基于这一洞见,我们利用扩散模型为MARL开发了一个灵活而稳健的世界模型。我们的方法,即扩散启发的多智能体世界模型(DIMA),在包括MAMuJoCo和Bi-DexHands在内的多个多智能体控制基准上取得了最先进的性能,在最终回报和样本效率方面显著优于以往的世界模型。DIMA为构建多智能体世界模型确立了新的范式,推动了MARL研究的前沿。
Article 52
Title@2025-05-27 (2): Generalized Coordination of Partially Cooperative Urban Traffic
Title: Generalized Coordination of Partially Cooperative Urban Traffic | Generalisierte Koordinierung des teilweise kooperativen Stadtverkehrs | 部分合作城市交通协调 2505.20879v1 |
Authors: Max Bastian Mertens, Michael Buchholz
Vehicle-to-anything connectivity, especially for autonomous vehicles, promises to increase passenger comfort and the safety of road traffic, for example, by sharing perception and driving intention. Cooperative maneuver planning uses connectivity to enhance traffic efficiency, which has, so far, been mainly considered for automated intersection management. In this article, we present a novel cooperative maneuver planning approach that is generalized to various situations found in urban traffic. Our framework handles challenging mixed traffic, that is, traffic comprising both cooperative connected vehicles and other vehicles in any proportion. Our solution is based on an optimization approach accompanied by an efficient heuristic method for high-load scenarios. We extensively evaluate the proposed planner in a distinctly realistic simulation framework and show significant efficiency gains already at a cooperation rate of 40%. Traffic throughput increases, while the average waiting time and the number of stopped vehicles are reduced, without impacting traffic safety.
车辆与任何物品的连通性,特别是自治车辆的连通性,有望提高乘客的舒适度和道路交通安全,例如,通过共享感知和驾驶意向,提高交通安全。合作演习规划利用连通性提高交通效率,迄今为止,主要考虑进行自动化交叉管理。在本篇文章中,我们提出了一种新的合作机动性规划方法,广泛适用于城市交通中发现的各种情况。我们的框架处理挑战性混合交通,即由合作型连通车辆和其他任何分销车辆组成的交通。我们的解决办法是以优化方法为基础,同时采用高效的超速方法应对高载情景。我们在一个明显现实的模拟框架内对拟议规划者进行了广泛评估,并显示已以40%的合作率大幅提高了效率。交通流量增加,同时减少平均等候时间和被拦截车辆的数量,同时不影响交通安全。
Article 53
Title@2025-05-27 (2): MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems
Title: MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems | MedSentry: Sicherheitsrisiken in medizinischen LLM-Multiagentensystemen verstehen und mindern | MedSentry:了解和减轻医疗LLM多机构系统中的安全风险 2505.20824v1 |
Authors: Kai Chen, Taihang Zhen, Hewei Wang, Kailai Liu, Xinfeng Li, Jing Huo, Tianpei Yang, Jinfeng Xu, Wei Dong, Yang Gao
As large language models (LLMs) are increasingly deployed in healthcare, ensuring their safety, particularly within collaborative multi-agent configurations, is paramount. In this paper we introduce MedSentry, a benchmark comprising 5 000 adversarial medical prompts spanning 25 threat categories with 100 subthemes. Coupled with this dataset, we develop an end-to-end attack-defense evaluation pipeline to systematically analyze how four representative multi-agent topologies (Layers, SharedPool, Centralized, and Decentralized) withstand attacks from ‘dark-personality’ agents. Our findings reveal critical differences in how these architectures handle information contamination and maintain robust decision-making, exposing their underlying vulnerability mechanisms. For instance, SharedPool’s open information sharing makes it highly susceptible, whereas Decentralized architectures exhibit greater resilience thanks to inherent redundancy and isolation. To mitigate these risks, we propose a personality-scale detection and correction mechanism that identifies and rehabilitates malicious agents, restoring system safety to near-baseline levels. MedSentry thus furnishes both a rigorous evaluation framework and practical defense strategies that guide the design of safer LLM-based multi-agent systems in medical domains.
由于大型语言模型(LLMS)越来越多地用于医疗保健,确保它们的安全,特别是在多试剂协作配置中。在本文件中,我们引入了MedSentry,这是一个由5 000个对抗性医疗提示组成的基准,涉及25个威胁类别,共100个次主题。结合这一数据集,我们开发了一个端到端攻击-防御评价管道,以便系统分析四种具有代表性的多试剂(Layers、共享Pool、中央化和分散化)如何抵御“暗人”剂的袭击。我们的发现揭示出这些结构如何处理信息污染和保持稳健的决策,暴露其基本的脆弱性机制方面存在重大差异。例如,共享Pool的开放信息共享使其非常容易受感染,而分散式结构则由于固有的冗余和孤立而表现出更大的复原力。为了减轻这些风险,我们提议了个性规模的检测和矫正机制,用以识别和修复恶意剂,将系统安全恢复到近基线水平。MedSentry因此提供了严格的评估框架和实用防御战略,用以指导医疗领域的更安全的LM多试系统的设计。
Article 54
Title@2025-05-27 (2): Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
Title: Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | Viele Köpfe sind besser als eins: Verbesserte wissenschaftliche Idee-Generation durch ein LLM-basiertes Multi-Agent-System | 许多领导人比一个领导人好得多:由以LLM为基础的多种机构系统改进科学思想的一代 2410.09403v4 |
Authors: Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, Philip Torr, Bowen Zhou, Nanqing Dong
Rapid scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address these limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.
科学进步的迅速发展需要创新工具来加速知识的发现。尽管最近的人工智能方法,特别是大型语言模型(LLMs)在假设生成和实验设计等任务中表现出了希望,但它们没有能够复制现实世界科学实践的协作性质,在现实世界科学实践中,不同专家共同合作解决复杂问题。为了解决这些局限性,我们提议建立一个基于LLM的多试剂系统,即虚拟科学家(VirSci),旨在模仿科学研究中固有的团队精神。VirSci组织一个代理团队,协作生成、评估和完善研究理念。通过全面实验,我们证明这种多试剂方法在产生新科学理念方面优于最先进的方法。我们进一步调查有助于其产生新颖思想的趋势的合作机制,为指导未来研究提供宝贵的见解,并为建立强有力的自主科学发现系统指明道路。该代码见https://github.com/open-sciallab/Virtual-Scicientists。
Article 55
Title@2025-05-27 (2): ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Title: ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | ReMA: Meta-Denken lernen für LLMs mit Multi-Agenten-Verstärkungs-Lernen | ReMA:学习多机构强化学习的LLMLM的元思维 2503.09501v3 |
Authors: Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen
Recent research on the reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking, which enables models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Additionally, we extend ReMA to multi-turn interaction settings, leveraging turn-level ratio and parameter sharing to improve efficiency. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs. Our code can be found at https://github.com/ziyuwan/ReMA-public.
近期关于大语言模型(LLM)推理的研究试图通过引入元思维来进一步提升其性能,即让模型能够监控、评估和控制自身的推理过程,从而更灵活、更有效地解决问题。然而,现有的单智能体工作缺乏获取元思维的专门设计,导致效果不佳。为应对这一挑战,我们提出了强化元思维智能体(ReMA),这是一个利用多智能体强化学习(MARL)激发元思维行为、鼓励LLM对思考本身进行思考的新框架。ReMA将推理过程解耦为两个层级的智能体:负责生成战略性监督和计划的高层元思维智能体,以及负责具体执行的低层推理智能体。通过目标一致的迭代强化学习,这些智能体探索并学会协作,从而提升了泛化能力和稳健性。单轮实验的实证结果表明,ReMA在复杂推理任务上优于单智能体强化学习基线,这些任务包括竞赛级数学基准和LLM-as-a-Judge基准。此外,我们将ReMA扩展到多轮交互场景,利用轮次级比率和参数共享来提高效率。全面的消融研究进一步展示了各个智能体不断演变的动态,为元思维推理过程如何增强LLM的推理能力提供了有价值的见解。我们的代码见 https://github.com/ziyuwan/ReMA-public
Article 56
Title@2025-05-27 (2): JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes
Title: JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | JaxRobotarium: Schulung und Einsatz von Multi-Roboter-Politik in 10 Minuten | JaxRobotior:10分钟内培训和部署多机器人政策 2505.06771v2 |
Authors: Shalin Anand Jain, Jiazhen Liu, Siva Kailas, Harish Ravichandar
Multi-agent reinforcement learning (MARL) has emerged as a promising solution for learning complex and scalable coordination behaviors in multi-robot systems. However, established MARL platforms (e.g., SMAC and MPE) lack robotics relevance and hardware deployment, leaving multi-robot learning researchers to develop bespoke environments and hardware testbeds dedicated to the development and evaluation of their individual contributions. The Multi-Agent RL Benchmark and Learning Environment for the Robotarium (MARBLER) is an exciting recent step in providing a standardized robotics-relevant platform for MARL, by bridging the Robotarium testbed with existing MARL software infrastructure. However, MARBLER lacks support for parallelization and GPU/TPU execution, making the platform prohibitively slow compared to modern MARL environments and hindering adoption. We contribute JaxRobotarium, a Jax-powered end-to-end simulation, learning, deployment, and benchmarking platform for the Robotarium. JaxRobotarium enables rapid training and deployment of multi-robot RL (MRRL) policies with realistic robot dynamics and safety constraints, supporting parallelization and hardware acceleration. Our generalizable learning interface integrates easily with SOTA MARL libraries (e.g., JaxMARL). In addition, JaxRobotarium includes eight standardized coordination scenarios, including four novel scenarios that bring established MARL benchmark tasks (e.g., RWARE and Level-Based Foraging) to a robotics setting. We demonstrate that JaxRobotarium retains high simulation fidelity while achieving dramatic speedups over baseline (20x in training and 150x in simulation), and provides an open-access sim-to-real evaluation pipeline through the Robotarium testbed, accelerating and democratizing access to multi-robot learning research and evaluation. Our code is available at https://github.com/GT-STAR-Lab/JaxRobotarium.
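多智能体强化学习(MARL)已成为在多机器人系统中学习复杂且可扩展的协调行为的一种有前景的方案。然而,现有的MARL平台(如SMAC和MPE)缺乏与机器人学的相关性,也不支持硬件部署,多机器人学习的研究人员只能各自开发定制的环境和硬件测试平台来开发和评估自己的工作。面向Robotarium的多智能体强化学习基准与学习环境(MARBLER)将Robotarium测试平台与现有的MARL软件基础设施连接起来,是为MARL提供标准化、与机器人学相关的平台的一项令人振奋的最新进展。然而,MARBLER不支持并行化和GPU/TPU执行,这使得该平台与现代MARL环境相比慢得令人难以接受,阻碍了其推广。我们贡献了JaxRobotarium,这是一个基于Jax、面向Robotarium的端到端仿真、学习、部署和基准测试平台。JaxRobotarium支持在真实的机器人动力学和安全约束下快速训练和部署多机器人强化学习(MRRL)策略,并支持并行化和硬件加速。我们的通用学习接口可以方便地与最先进的MARL库(如JaxMARL)集成。此外,JaxRobotarium包含八个标准化的协调场景,其中四个新场景把成熟的MARL基准任务(如RWARE和Level-Based Foraging)引入机器人环境。我们证明,JaxRobotarium在保持高仿真保真度的同时,相对基线实现了大幅加速(训练快20倍,仿真快150倍),并通过Robotarium测试平台提供开放获取的仿真到现实评估流程,从而加速并普及多机器人学习的研究与评估。我们的代码见 https://github.com/GT-STAR-Lab/JaxRobotarium。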
Article 57
Title@2025-05-26 (1): xChemAgents: Agentic AI for Explainable Quantum Chemistry
Title: xChemAgents: Agentic AI for Explainable Quantum Chemistry | xChemAgenten: Agentische KI für erklärbare Quantenchemie | xchemAgents: 可解释量子化学的AAA剂 2505.20574v1 |
Authors: Can Polat, Mehmet Tuncel, Hasan Kurban, Erchin Serpedin, Mustafa Kurban
Recent progress in multimodal graph neural networks has demonstrated that augmenting atomic XYZ geometries with textual chemical descriptors can enhance predictive accuracy across a range of electronic and thermodynamic properties. However, naively appending large sets of heterogeneous descriptors often degrades performance on tasks sensitive to molecular shape or symmetry, and undermines interpretability. xChemAgents proposes a cooperative agent framework that injects physics-aware reasoning into multimodal property prediction. xChemAgents comprises two language-model-based agents: a Selector, which adaptively identifies a sparse, weighted subset of descriptors relevant to each target, and provides a natural language rationale; and a Validator, which enforces physical constraints such as unit consistency and scaling laws through iterative dialogue. On standard benchmark datasets, xChemAgents achieves up to a 22% reduction in mean absolute error over strong baselines, while producing faithful, human-interpretable explanations. Experiment results highlight the potential of cooperative, self-verifying agents to enhance both accuracy and transparency in foundation-model-driven materials science. The implementation and accompanying dataset are available anonymously at https://github.com/KurbanIntelligenceLab/xChemAgents.
多模态图神经网络的最新进展表明,用文本化学描述符增强原子XYZ几何结构,可以提高一系列电子和热力学性质的预测精度。然而,简单地附加大量异构描述符,往往会降低对分子形状或对称性敏感的任务的性能,并损害可解释性。xChemAgents提出了一个协作式智能体框架,将物理感知的推理注入多模态性质预测。xChemAgents由两个基于语言模型的智能体组成:选择器(Selector)自适应地为每个目标确定一个稀疏的、带权重的相关描述符子集,并给出自然语言形式的理由;验证器(Validator)通过迭代对话强制执行单位一致性和标度律等物理约束。在标准基准数据集上,xChemAgents相对强基线将平均绝对误差最多降低22%,同时给出忠实且人类可理解的解释。实验结果凸显了协作式、自我验证的智能体在提高基础模型驱动的材料科学的准确性和透明度方面的潜力。实现代码及配套数据集已匿名发布于 https://github.com/KurbanIntelligenceLab/xChemAgents。
Article 58
Title@2025-05-26 (1): Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework
Title: Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework | Straffung des Resilients Kubernetes Autoscaling mit Multi-Agent Systemen über ein automatisiertes Online-Design-Framework | 通过自动在线设计框架与多机构系统自动调整 2505.21559v1 |
Authors: Julien Soulé, Jean-Paul Jamont, Michel Occello, Louis-Marie Traonouez, Paul Théron
In cloud-native systems, Kubernetes clusters with interdependent services often face challenges to their operational resilience due to workload management issues such as resource blocking, bottlenecks, or continuous pod crashes. These vulnerabilities are further amplified in adversarial scenarios, such as Distributed Denial-of-Service (DDoS) attacks. Conventional Horizontal Pod Autoscaling (HPA) approaches struggle to address such dynamic conditions, while reinforcement learning-based methods, though more adaptable, typically optimize single goals like latency or resource usage, neglecting broader failure scenarios. We propose decomposing the overarching goal of maintaining operational resilience into failure-specific sub-goals delegated to collaborative agents, collectively forming an HPA Multi-Agent System (MAS). We introduce an automated, four-phase online framework for HPA MAS design: 1) modeling a digital twin built from cluster traces; 2) training agents in simulation using roles and missions tailored to failure contexts; 3) analyzing agent behaviors for explainability; and 4) transferring learned policies to the real cluster. Experimental results demonstrate that the generated HPA MASs outperform three state-of-the-art HPA systems in sustaining operational resilience under various adversarial conditions in a proposed complex cluster.
在云型系统中,具有相互依存服务的Kubernets集群往往由于诸如资源阻塞、瓶颈或连续撞车等工作量管理不足的问题而面临业务复原力挑战。这些脆弱性在对抗性情景中进一步扩大,如分布式拒绝服务攻击(DDoS)。传统横向平台自动扩缩(HPA)方法力求解决这种动态条件,同时强化基于学习的方法,尽管这种方法更适应性更强,通常最优化单一目标,如延时或资源使用,忽视更广泛的故障假设。我们提议将维持业务复原力这一总体目标转变为授权合作机构执行的故障特定次级目标,集体形成HPA多要素系统(MAS),我们为HPAMAS设计引入了一个自动、四阶段在线框架:1个基于集群跟踪建立的数字组合;2个培训人员利用适合故障背景的角色和任务进行模拟;3个分析代理人解释行为;以及4个将学到的政策转移到真正的集群。实验结果表明,产生的HPAMAMAS超越了在各种拟议的复杂对抗性组合下维持业务复原力的3个状态。
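As background for the "conventional HPA" baseline mentioned in the abstract above, the following is a minimal sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up, with no change inside a tolerance band. The function name, the default bounds, and the single-metric simplification are assumptions made for illustration; this is not the multi-agent method proposed in the paper.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the documented single-metric HPA rule.

    The bounds and the helper itself are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, wanted))
```

Because this rule reacts to one metric ratio at a time, a sudden adversarial load such as a DDoS burst simply drives the count toward `max_replicas`, which is the kind of behavior the abstract argues is too coarse for failure-specific resilience goals.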
Article 59
Title@2025-05-26 (1): Reconceptualizing Smart Microscopy: From Data Collection to Knowledge Creation by Multi-Agent Integration
Title: Reconceptualizing Smart Microscopy: From Data Collection to Knowledge Creation by Multi-Agent Integration | Intelligente Mikroskopie neu konzipieren: Von der Datenerhebung zur Wissenserstellung durch Multi-Agent-Integration | 重新概念化智能微镜:从数据收集到通过多机构整合创造知识 2505.20466v1 |
Authors: P. S. Kesavan, Pontus Nordenfelt
Smart microscopy represents a paradigm shift in biological imaging, moving from passive observation tools to active collaborators in scientific inquiry. Enabled by advances in automation, computational power, and artificial intelligence, these systems are now capable of adaptive decision-making and real-time experimental control. Here, we introduce a theoretical framework that reconceptualizes smart microscopy as a partner in scientific investigation. Central to our framework is the concept of the ‘epistemic-empirical divide’ in cellular investigation: the gap between what is observable (empirical domain) and what must be understood (epistemic domain). We propose six core design principles: epistemic-empirical awareness, hierarchical context integration, an evolution from detection to perception, adaptive measurement frameworks, narrative synthesis capabilities, and cross-contextual reasoning. Together, these principles guide a multi-agent architecture designed to align empirical observation with the goals of scientific understanding. Our framework provides a roadmap for building microscopy systems that go beyond automation to actively support hypothesis generation, insight discovery, and theory development, redefining the role of scientific instruments in the process of knowledge creation.
智能显微镜代表生物成像的范式转变,从被动观察工具向科学调查的积极合作者转变。通过自动化、计算力和人工智能的进步,这些系统现在能够适应决策和实时实验控制。在这里,我们引入了将智能显微镜作为科学调查伙伴的理论框架。我们框架的核心是细胞调查中的“宇宙-经验分化”概念——观察(经验领域)和必须理解(流行病领域)之间的差距。我们提出了六项核心设计原则:内分泌-经验意识、等级背景整合、从探测到感知的演进、适应性计量框架、叙述合成能力和跨理论推理。这些原则共同指导一个多工具结构,旨在将经验观察与科学理解目标相协调。我们的框架为建立微生物系统提供了路线图,这些系统超越自动化,积极支持假设的产生、洞察发现和理论发展,重新界定科学仪器在知识创造过程中的作用。
Article 60
Title@2025-05-26 (1): Sable: a Performant, Efficient and Scalable Sequence Model for MARL
Title: Sable: a Performant, Efficient and Scalable Sequence Model for MARL | Sable: ein leistungsfähiges, effizientes und skalierbares Sequenzmodell für MARL | 电缆:MARL的性能、高效和可缩放序列模型 2410.01706v5 |
Authors: Omayma Mahjoub, Sasha Abramowitz, Ruan de Kock, Wiem Khlifi, Simon du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Claude Formanek, Liam Clark, Arnu Pretorius
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modeling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable’s performance gains and confirm its efficient computational memory usage.
随着多试剂加固学习(MARL)在解决更大、更复杂的问题方面取得进展,越来越重要的是,算法要表现出(1) 强性、(2) 内存效率以及(3) 缩放性等关键特性。在这项工作中,我们向MARL引入了Sable、性能强、内存效率和可缩放性序列建模方法。Sable工作通过在Retention Networks(Sun等人,2023年)中调整留存机制,实现对多试剂观测的计算机高效处理,并用长期内存来进行时间推理。通过在六个不同的环境中进行广泛的评价,我们证明Sable在大量不同的任务中(45项测试中的34项)能够大大超过现有的最新方法。此外,Sable工作在规模上保持了性能,处理超过一千种物剂的环境,同时显示记忆用量的线性增长。最后,我们进行对比研究,以分离Sable的性能收益的来源,并证实其有效的计算性能。
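The abstract above builds on the retention mechanism of Retentive Networks. The sketch below shows plain single-head retention in its two equivalent forms, assuming the simplified version without normalization, gating, or positional rotation: a recurrent form that carries a fixed-size state, and a parallel form that sums decayed query-key scores over earlier positions. It illustrates the underlying mechanism only; how Sable adapts it across agents and timesteps is not reproduced here, and all function names are hypothetical.

```python
def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size memory
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k_n[i] * v_n[j]
        outputs.append([sum(q_n[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


def retention_parallel(q, k, v, gamma):
    """Parallel form: o_n = sum over m <= n of gamma^(n-m) (q_n . k_m) v_m."""
    d_v = len(v[0])
    outputs = []
    for n, q_n in enumerate(q):
        out = [0.0] * d_v
        for m in range(n + 1):
            score = gamma ** (n - m) * sum(a * b for a, b in zip(q_n, k[m]))
            for j in range(d_v):
                out[j] += score * v[m][j]
        outputs.append(out)
    return outputs
```

The recurrent form keeps only a `d_k` by `d_v` state regardless of sequence length, which is the property that makes retention attractive when long context memory is needed at inference time.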
Article 61
Title@2025-05-26 (1): Federated Domain Generalization with Data-free On-server Matching Gradient
Title: Federated Domain Generalization with Data-free On-server Matching Gradient | Föderierte Domain-Verallgemeinerung mit datenfreiem On-Server-Zustimmungs-Gradient | 具有无数据观测站上与渐变匹配的无数据观测器的联邦通用域 2501.14653v2 |
Authors: Trong-Binh Nguyen, Minh-Duong Nguyen, Jinsun Park, Quoc-Viet Pham, Won Joo Hwang
Domain Generalization (DG) aims to learn from multiple known source domains a model that can generalize well to unknown target domains. One of the key approaches in DG is training an encoder which generates domain-invariant representations. However, this approach is not applicable in Federated Domain Generalization (FDG), where data from various domains are distributed across different clients. In this paper, we introduce a novel approach, dubbed Federated Learning via On-server Matching Gradient (FedOMG), which can efficiently leverage domain information from distributed domains. Specifically, we utilize the local gradients as information about the distributed models to find an invariant gradient direction across all domains through gradient inner product maximization. The advantages are two-fold: 1) FedOMG can aggregate the characteristics of distributed models on the centralized server without incurring any additional communication cost, and 2) FedOMG is orthogonal to many existing FL/FDG methods, allowing for additional performance improvements by being seamlessly integrated with them. Extensive experimental evaluations on various settings demonstrate the robustness of FedOMG compared to other FL/FDG baselines. Our method outperforms recent SOTA baselines on four FL benchmark datasets (MNIST, EMNIST, CIFAR-10, and CIFAR-100), and three FDG benchmark datasets (PACS, VLCS, and OfficeHome).
领域泛化(DG)旨在从多个已知的源域中学习一个能够很好地泛化到未知目标域的模型。DG的关键方法之一是训练一个能生成域不变表示的编码器。然而,这种方法不适用于联邦领域泛化(FDG),因为来自各个域的数据分布在不同的客户端上。在本文中,我们提出了一种新方法,称为基于服务器端梯度匹配的联邦学习(FedOMG),它能够高效地利用来自分布式域的领域信息。具体而言,我们把本地梯度作为关于分布式模型的信息,通过梯度内积最大化来寻找在所有域上保持不变的梯度方向。其优势有两点:1)FedOMG可以在中心服务器上聚合分布式模型的特征,而不产生任何额外的通信开销;2)FedOMG与许多现有的FL/FDG方法正交,可以与它们无缝集成以进一步提升性能。在多种设置下的大量实验评估表明,与其他FL/FDG基线相比,FedOMG具有稳健性。我们的方法在四个FL基准数据集(MNIST、EMNIST、CIFAR-10和CIFAR-100)和三个FDG基准数据集(PACS、VLCS和OfficeHome)上优于近期的最先进基线。
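To make the idea of an "invariant gradient direction" found from local gradients more concrete, here is a small sketch of one generic way a server could pick a direction that agrees with every client gradient. It assumes a max-min formulation (maximize the smallest inner product with any client gradient), solved by exponentiated-gradient updates on convex combination weights. This formulation, the function names, and the hyperparameters are assumptions for illustration and should not be read as FedOMG's actual optimization.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def combine(weights, grads):
    """Weighted sum of client gradients."""
    dim = len(grads[0])
    return [sum(w * g[j] for w, g in zip(weights, grads)) for j in range(dim)]


def invariant_direction(client_grads, steps=500, lr=0.1):
    """Hypothetical server-side search for a direction all clients agree on.

    Minimizes the norm of the weighted gradient sum over the simplex, whose
    minimizer equalizes the inner products with the active client gradients.
    """
    n = len(client_grads)
    weights = [1.0 / n] * n
    for _ in range(steps):
        direction = combine(weights, client_grads)
        # Clients already well aligned with the direction lose weight;
        # poorly aligned clients gain weight.
        scaled = [w * math.exp(-lr * dot(direction, g))
                  for w, g in zip(weights, client_grads)]
        total = sum(scaled)
        weights = [s / total for s in scaled]
    return combine(weights, client_grads), weights
```

Only the client gradients are needed on the server, so a scheme of this shape adds no communication beyond what standard federated averaging already sends, which matches the first advantage claimed in the abstract.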
Article 62
Title@2025-05-26 (1): Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning
Title: Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning | Semantic-Aware Ressourcenmanagement für C-V2X Platooning über Multi-Agent Verstärkungslernen | 通过多机构强化学习进行 C-V2X 等离子处理的语义软件资源管理 2411.04672v2 |
Authors: Wenjun Zhang, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Wen Chen, Khaled B. Letaief
Semantic communication transmits the extracted features of information rather than raw data, significantly reducing redundancy, which is crucial for addressing spectrum and energy challenges in 6G networks. In this paper, we introduce semantic communication into a cellular vehicle-to-everything (C-V2X)-based autonomous vehicle platoon system for the first time, aiming to achieve efficient management of communication resources in a dynamic environment. First, we construct a mathematical model for semantic communication in platoon systems, in which the DeepSC model and MU-DeepSC model are used to semantically encode and decode unimodal and multi-modal data, respectively. Then, we propose a quality of experience (QoE) metric based on semantic similarity and semantic rate. Meanwhile, we consider the success rate of semantic information transmission (SRS) as a metric to ensure the fairness of channel resource allocation. Next, the optimization problem is posed with the aim of maximizing the QoE in vehicle-to-vehicle (V2V) links while improving SRS. To solve this mixed integer nonlinear programming (MINLP) problem and adapt to time-varying channel conditions, the paper proposes a distributed semantic-aware multi-modal resource allocation (SAMRA) algorithm based on multi-agent reinforcement learning (MARL), referred to as SAMRAMARL. The algorithm can dynamically allocate channels and power and determine semantic symbol length based on the contextual importance of the transmitted information, ensuring efficient resource utilization. Finally, extensive simulations demonstrate that SAMRAMARL outperforms existing methods, achieving significant gains in QoE, SRS, and communication delay in C-V2X platooning scenarios.
语义通信传输的是从信息中提取的特征而非原始数据,可显著减少冗余,这对于应对6G网络中的频谱和能量挑战至关重要。在本文中,我们首次将语义通信引入基于蜂窝车联网(C-V2X)的自动驾驶车辆编队系统,旨在在动态环境中实现通信资源的高效管理。首先,我们为编队系统中的语义通信建立了数学模型,其中分别使用DeepSC模型和MU-DeepSC模型对单模态和多模态数据进行语义编码和解码。然后,我们提出了基于语义相似度和语义速率的体验质量(QoE)指标。同时,我们考虑语义信息传输成功率(SRS)指标,以保证信道资源分配的公平性。接着,我们提出了一个优化问题,目标是在提高SRS的同时最大化车对车(V2V)链路的QoE。为了求解这一混合整数非线性规划(MINLP)问题并适应时变的信道条件,本文提出了一种基于多智能体强化学习(MARL)的分布式语义感知多模态资源分配(SAMRA)算法,称为SAMRAMARL。该算法可以根据所传输信息的上下文重要性,动态分配信道和功率并确定语义符号长度,从而保证资源的高效利用。最后,大量仿真表明,在C-V2X编队场景中,SAMRAMARL优于现有方法,在QoE、SRS和通信时延方面均取得了显著改善。
Article 63
Title@2025-05-26 (1): Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications
Title: Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications | Multi-Agenten-Verstärkung Lernen in Cybersicherheit: Von Grundlagen zu Anwendungen | 网络安全多机构强化多机构网络安全学习:从基础到应用 2505.19837v1 |
Authors: Christoph R. Landolt, Christoph Würsch, Roland Meier, Alain Mermoud, Julian Jang-Jaccard
Multi-Agent Reinforcement Learning (MARL) has shown great potential as an adaptive solution for addressing modern cybersecurity challenges. MARL enables decentralized, adaptive, and collaborative defense strategies and provides an automated mechanism to combat dynamic, coordinated, and sophisticated threats. This survey investigates the current state of research in MARL applications for automated cyber defense (ACD), focusing on intruder detection and lateral movement containment. Additionally, it examines the role of Autonomous Intelligent Cyber-defense Agents (AICA) and Cyber Gyms in training and validating MARL agents. Finally, the paper outlines existing challenges, such as scalability and adversarial robustness, and proposes future research directions. The survey also discusses how MARL integrates into AICA to provide adaptive, scalable, and dynamic solutions to counter the increasingly sophisticated landscape of cyber threats. It highlights the transformative potential of MARL in areas like intrusion detection and lateral movement containment, and underscores the value of Cyber Gyms for training and validation of AICA.
作为应对现代网络安全挑战的适应性解决办法,多机构强化学习(MARL)已显示出巨大的潜力。MARL使分散、适应性和协作性防御战略成为分散、适应性和协作性防御战略,并提供了应对动态、协调和复杂威胁的自动化机制。这项调查调查调查了MARL自动化网络防御应用程序(ACD)研究现状,重点是入侵探测和横向移动封隔。此外,它还审查了自主智能网络防御代理(AICA)和网络健身在培训和验证MARL代理方面的作用。最后,该文件概述了现有挑战,如可扩缩性和对抗性强力,并提出了未来的研究方向。它还讨论了MARL如何融入AICA,以提供适应性、可扩缩和动态的解决方案,应对日益复杂的网络威胁。它强调了MARL在入侵探测和横向移动封隔等领域的变革潜力,并强调网络健身对培训和验证AICA的价值观。
Article 64
Title@2025-05-26 (1): Fast and Robust Flocking of Protesters on Street Networks
Title: Fast and Robust Flocking of Protesters on Street Networks | Schnelles und robustes Auspeitschen von Protestierenden auf Straßennetzen | 街头网络上抗争者快速和强力封锁 2406.01101v3 |
Authors: Guillaume Moinard, Matthieu Latapy
We propose a simple model of protesters scattered throughout a city who want to gather into large and mobile groups. This model relies on random walkers on a street network that follow tactics built from a set of basic rules. Our goal is to identify the most important rules for fast and robust flocking of walkers. We explore a wide set of tactics and show the central importance of a specific rule based on alignment. Other rules alone perform poorly, but our experiments show that combining alignment with them enhances flocking, and that the resulting groups are then remarkably robust.
我们提出一个分散在城市各地的抗议者简单模式,他们想聚集在大型机动团体中。这个模式依靠街头网络的随机行走者,他们遵循一套基本规则的策略。我们的目标是确定快速和强力驱赶行走者的最重要规则。我们探索了一套广泛的策略,并展示了基于一致的特定规则的核心重要性。其他规则本身表现不佳,但我们的实验显示,与他们联合起来会增加群聚,而获得的团体则非常强大。
Article 65
Title@2025-05-26 (1): Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning
Title: Adaptive Episode Length Adjustment for Multi-agent Reinforcement Learning | Adaptive Anpassung der Episodenlänge für das Multi-Agenten-Verstärkungs-Lernen | 多试剂强化学习的适应性分单元长度调整 2505.19637v1 |
Authors: Byunghyun Yoo, Younghwan Shin, Hyunwoo Kim, Euisok Chung, Jeongmin Yang
In standard reinforcement learning, an episode is defined as a sequence of interactions between agents and the environment, which terminates upon reaching a terminal state or a pre-defined episode length. Setting a shorter episode length enables the generation of multiple episodes with the same number of data samples, thereby facilitating an exploration of diverse states. While shorter episodes may limit the collection of long-term interactions, they may offer significant advantages when properly managed. For example, trajectory truncation in single-agent reinforcement learning has shown how the benefits of shorter episodes can be leveraged despite the trade-off of reduced long-term interaction experiences. However, this approach remains underexplored in MARL. This paper proposes a novel MARL approach, Adaptive Episode Length Adjustment (AELA), where the episode length is initially limited and gradually increased based on an entropy-based assessment of learning progress. By starting with shorter episodes, agents can focus on learning effective strategies for initial states and minimize time spent in dead-end states. The use of entropy as an assessment metric prevents premature convergence to suboptimal policies and ensures balanced training over varying episode lengths. We validate our approach using the StarCraft Multi-agent Challenge (SMAC) and a modified predator-prey environment, demonstrating significant improvements in both convergence speed and overall performance compared to existing methods. To the best of our knowledge, this is the first study to adaptively adjust episode length in MARL based on learning progress.
在标准强化学习中,回合(episode)被定义为智能体与环境之间的一系列交互,在到达终止状态或达到预先设定的回合长度时结束。设置较短的回合长度,可以用相同数量的数据样本生成多个回合,从而有助于探索多样的状态。较短的回合虽然可能限制长期交互的收集,但如果管理得当,也可能带来显著优势。例如,单智能体强化学习中的轨迹截断表明,尽管要以减少长期交互经验为代价,较短回合的好处仍然可以加以利用。然而,这种方法在MARL中仍未得到充分探索。本文提出了一种新的MARL方法,即自适应回合长度调整(AELA):回合长度起初受到限制,随后根据基于熵的学习进度评估逐步增加。从较短的回合开始,智能体可以专注于学习针对初始状态的有效策略,并尽量减少在死胡同状态上花费的时间。使用熵作为评估指标,可以防止过早收敛到次优策略,并确保在不同回合长度上的训练保持均衡。我们在星际争霸多智能体挑战(SMAC)和一个改进的捕食者-猎物环境中验证了我们的方法,结果表明其收敛速度和整体性能均较现有方法有显著提升。据我们所知,这是第一项在MARL中根据学习进度自适应调整回合长度的研究。
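The schedule described in the abstract above, a short initial episode length that grows according to an entropy-based progress signal, can be sketched as follows. The abstract does not state the exact rule, so this sketch assumes one plausible reading: the cap is raised by a fixed step whenever the mean policy entropy drops below a threshold, meaning the agents have become confident at the current horizon. The class name, the threshold, and the step size are hypothetical.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EpisodeLengthScheduler:
    """Hypothetical entropy-gated schedule for the episode length cap."""

    def __init__(self, initial_length, max_length, step, entropy_threshold):
        self.length = initial_length
        self.max_length = max_length
        self.step = step
        self.entropy_threshold = entropy_threshold

    def update(self, mean_entropy):
        # Assumed rule: low entropy means the policy has settled at the
        # current horizon, so longer episodes are allowed next.
        if mean_entropy < self.entropy_threshold:
            self.length = min(self.max_length, self.length + self.step)
        return self.length
```

In a training loop, `update` would be called once per evaluation interval with the entropy averaged over agents and sampled states, and the returned value would be used as the truncation length for the next batch of rollouts.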
Article 66
Title@2025-05-26 (1): Multi-Agent Collaboration via Evolving Orchestration
Title: Multi-Agent Collaboration via Evolving Orchestration | Multi-Agenten-Zusammenarbeit über Evolving Orchestration | 通过不断演变的管弦化多机构协作 2505.19591v1 |
Authors: Yufan Dang, Chen Qian, Xueheng Luo, Jingru Fan, Zihao Xie, Ruijie Shi, Weize Chen, Cheng Yang, Xiaoyin Che, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator (“puppeteer”) dynamically directs agents (“puppets”) in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator’s evolution.
大型语言模型(LLMs)在各种下游任务中取得了显著成果,但其单一性质限制了复杂问题的解决的可扩展性和效率。 尽管最近的研究探索了LLMs之间的多剂合作,但大多数方法都依赖于随着任务复杂性和代理数量增长而难以适应的静态组织结构,从而导致协调间接费用和低效率。为此,我们为基于LLM的多剂合作提出了一个木偶式模式,即中央管弦(“布偶”)动态引导代理(“傀儡 ”)应对不断变化的任务状态。该管弦乐队通过强化适应性序列和优先排序代理的学习而接受培训,使灵活和可演化的集体推理成为可能。关于封闭和开放场情景的实验表明,这种方法在降低计算成本的情况下取得了优异的业绩。我们进一步的分析表明,关键改进始终来自于在管弦乐队进化过程中出现的更为紧凑的、循环推理结构。
Article 67
Title@2025-05-26 (1): LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer
Title: LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer | LLM-Agent-Controller: Ein universelles Multi-Agent-Großsprachmodellsystem als Steuerungsingenieur | LLM-代理主计长:作为控制工程师的通用多代理大型语文示范系统 2505.19567v1 |
Authors: Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri
This study presents the LLM-Agent-Controller, a multi-agent large language model (LLM) system developed to address a wide range of problems in control engineering (Control Theory). The system integrates a central controller agent with multiple specialized auxiliary agents, responsible for tasks such as controller design, model representation, control analysis, time-domain response, and simulation. A supervisor oversees high-level decision-making and workflow coordination, enhancing the system’s reliability and efficiency. The LLM-Agent-Controller incorporates advanced capabilities, including Retrieval-Augmented Generation (RAG), Chain-of-Thought reasoning, self-criticism and correction, efficient memory handling, and user-friendly natural language communication. It is designed to function without requiring users to have prior knowledge of Control Theory, enabling them to input problems in plain language and receive complete, real-time solutions. To evaluate the system, we propose new performance metrics assessing both individual agents and the system as a whole. We test five categories of Control Theory problems and benchmark performance across three advanced LLMs. Additionally, we conduct a comprehensive qualitative conversational analysis covering all key services. Results show that the LLM-Agent-Controller successfully solved 83% of general tasks, with individual agents achieving an average success rate of 87%. Performance improved with more advanced LLMs. This research demonstrates the potential of multi-agent LLM architectures to solve complex, domain-specific problems. By integrating specialized agents, supervisory control, and advanced reasoning, the LLM-Agent-Controller offers a scalable, robust, and accessible solution framework that can be extended to various technical domains.
本研究报告介绍了LLM-Agent-Central(LLM)系统,这是一个多试剂大型语言模型(LLM)系统,是为解决控制工程(控制理论)方面一系列广泛问题而开发的。该系统将中央控制剂与多个专业辅助剂合并,负责控制设计、模式演示、控制分析、时间-领域反应和模拟等任务。一名主管监督高层决策和工作流程协调,提高系统的可靠性和效率。LLM-Agenter(LLM)系统包含先进能力,包括检索-启动型(RAG)系统(RET-AUG)、连锁推理、自我检查和校正校校、高效的记忆处理和方便用户的自然语言交流。该系统的设计是,在运作上不要求用户事先了解控制理论,使他们能够用普通语言输入问题并获得完整、实时解决方案。为了评估系统,我们提出了新的业绩衡量标准,我们测试了三种先进的LA-CM(RG)系统的五类可操作问题和基准性业绩。此外,我们进行了全面的高质量对话分析,包括了所有核心领域(LM)的高级逻辑、高等级的高级分析,提高了高级分析。结果显示,实现了整个内部的升级的升级的升级的升级的系统,实现了。
Article 68
Title@2025-05-26 (1): DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Title: DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients | DoctorRAG: Medizinische RAG Durch Textabstufungen Wissen mit Patient Analogie fusionieren | 医生RAG:通过文字梯度将医学RAG知识与病人分析知识与病人分析相融合 2505.19538v1 |
Authors: Yuxing Lu, Gecheng Fu, Wei Wu, Xukai Zhao, Sin Yee Goi, Jinzhuo Wang
Existing medical RAG systems mainly leverage knowledge from medical knowledge bases, neglecting the crucial role of experiential knowledge derived from similar patient cases, a key component of human clinical reasoning. To bridge this gap, we propose DoctorRAG, a RAG framework that emulates doctor-like reasoning by integrating both explicit clinical knowledge and implicit case-based experience. DoctorRAG enhances retrieval precision by first allocating conceptual tags to queries and knowledge sources, and then applying a hybrid retrieval mechanism that draws on both relevant knowledge and similar patient cases. In addition, a Med-TextGrad module using multi-agent textual gradients is integrated to ensure that the final output adheres to the retrieved knowledge and the patient query. Comprehensive experiments on multilingual, multitask datasets demonstrate that DoctorRAG significantly outperforms strong baseline RAG models and gains improvements from iterative refinements. Our approach generates more accurate, relevant, and comprehensive responses, taking a step towards more doctor-like medical reasoning systems.
现有RAG医疗系统主要利用医疗知识库的知识,忽视了类似病人病例的经验性知识的关键作用 – – 人类临床推理的一个关键组成部分。为弥合这一差距,我们提议DrotRAG,这是一个仿照医生推理的框架,它既包括明确的临床知识,也包括隐含的个案经验。DrotRAG首先为查询和知识来源分配概念标签,同时从相关知识和病人中建立混合检索机制,从而提高检索的准确性。此外,一个使用多试剂文本梯度的Med-TextGrad模块被整合到一起,以确保最终产出符合已检索到的知识和病人查询。关于多语种、多任务数据集的全面实验表明,DrectraG大大超越了强有力的基线RAG模型,并从迭接改进中获得了改进。我们的方法产生了更准确、相关和全面的反应,朝着更像医生一样的医疗推理系统迈出了一步。
Article 69
Title@2025-05-26 (1): VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Title: VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | VLMLight: Verkehrssignalsteuerung über Vision-Language Meta-Control und Dual-Branch-Reasoning | VLMLight:通过视觉语言、超控制和双层理由解释控制交通信号控制 2505.19486v1 |
Authors: Maonan Wang, Yirong Chen, Aoyu Pang, Yuxin Cai, Chung Shue Chen, Yuheng Kan, Man-On Pun
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
城市交通信号控制(TSC)是城市流动性的核心挑战,因为实时决定必须平衡效率和安全。现有方法 — 从基于规则的超光速到强化学习(RL) — — 往往难以推广到复杂、动态和安全临界情景。我们引入了VLMLight,这是一个全新的TSC框架,将视觉语言的元控制与双部门推理结合起来。VLMLight的核心是第一个基于图像的交通模拟器,它使多视路交汇的视觉感知能够让多视路交汇,允许政策对车辆类型、运动和空间密度等丰富线索进行解释。一个大型语言模型(LLM)作为安全优先化的元控制器,在常规交通快速RL政策与关键案例结构化推理部门之间进行选择。在后一种情况下,多个LMLM代理商合作评估交通阶段,优先关注紧急车辆,并核查规则遵守情况。实验显示VLMLight将紧急车辆的等候时间减少至65 % 超过RL专用系统,同时保持标准条件下的实时性性性性性性性性性性性性性性表现,而低于1 % 信号控制状态。
Article 70
Title@2025-05-26 (1): Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Title: Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs | Gewinnen Sie schnell oder verlieren Sie langsam: Ausgleichende Geschwindigkeit und Genauigkeit in Latenz-Sensitive Entscheidungen von LLMs | 慢赢或慢输:LLMs的延缓敏感决定中平衡速度和准确性 2505.19481v1 |
Authors: Hao Kang, Qingru Zhang, Han Cai, Weiyuan Xu, Tushar Krishna, Yilun Du, Tsachy Weissman
Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks, and are increasingly deployed as agents in dynamic environments such as code generation and recommendation systems. However, many real-world applications, such as high-frequency trading and real-time competitive gaming, require decisions under strict latency constraints, where faster responses directly translate into higher rewards. Despite the importance of this latency-quality trade-off, it remains underexplored in the context of LLM-based agents. In this work, we present the first systematic study of this trade-off in real-time decision-making tasks. To support our investigation, we introduce two new benchmarks: HFTBench, a high-frequency trading simulation, and StreetFighter, a competitive gaming platform. Our analysis reveals that the optimal latency-quality balance varies by task, and that sacrificing quality for lower latency can significantly enhance downstream performance. To address this, we propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real-time demands. Our method achieves the best performance on both benchmarks, improving win rate by up to 80% in Street Fighter and boosting daily yield by up to 26.52% in trading, underscoring the need for latency-aware evaluation and deployment strategies for real-world LLM-based agents. Our benchmarks are available at Latency Sensitive Benchmarks.
大型语言模型(LLMS)在各种推理和代代任务中表现出了显著的绩效,并越来越多地作为代号生成和建议系统等动态环境中的代理商被部署。然而,许多现实世界应用软件,如高频交易和实时竞争性竞技游戏等,都需要在严格的长期限制下作出决定,而更快的反应则直接转化为更高的奖励。尽管这种长期质量交易十分重要,但在基于LLLM的代理商中,它仍然没有得到充分探讨。在这项工作中,我们首次以实时决策实时任务的形式对这种交易进行了系统研究。为了支持我们的调查,我们引入了两个新的基准:HFTBench,一个高频交易模拟,以及StreetFighter,一个竞争性的组合平台。我们的分析表明,最佳的延时质量平衡因任务而异,而降低低长期质量可以大大提高下游业绩。为了解决这个问题,我们建议FPX,一个根据实时需求动态选择模型大小和四分级水平的适应性框架。我们的方法在两个基准上都取得了最佳的业绩,在街头战斗中以80%的速度递增速率率率率,在战略中提升到80%,在战略中,通过展示我们稳定的部署基准,在战略的升级的部署中,这些基准需要。这些基准以显示我们的安全度的部署战略的升级。
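One simple reading of "dynamically selects model size and quantization level based on real-time demands" is a budgeted selection: pick the best-scoring configuration whose latency fits the current budget. The sketch below assumes exactly that. The `ModelConfig` fields, the latency and quality numbers a caller would supply, and the fallback rule are all hypothetical and do not describe FPX's actual selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str          # hypothetical label, e.g. "7B-int4"
    bits: int          # quantization bit width
    latency_ms: float  # assumed measured response latency
    quality: float     # assumed offline quality score (higher is better)


def select_config(configs, latency_budget_ms):
    """Return the highest-quality config that meets the latency budget.

    If no config fits, fall back to the fastest one so that a decision
    is still produced (an illustrative choice, not a claim about FPX).
    """
    feasible = [c for c in configs if c.latency_ms <= latency_budget_ms]
    if not feasible:
        return min(configs, key=lambda c: c.latency_ms)
    return max(feasible, key=lambda c: c.quality)
```

Because the abstract reports that the best latency-quality balance varies by task, the budget passed to `select_config` would itself be task-dependent.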
Article 71
Title@2025-05-25 (7): Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning
Title: Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning | Teambildung und Beeinflussung von Agenten: Entscheidungsbäume effizient koordinieren für interpretierbares Mehr-Agenten-Verstärkungs-Lernen | 建立团队和对代理人产生影响的代理:高效协调可解释的多机构强化学习决策树 2505.19316v1 |
Authors: Rex Chen, Stephanie Milani, Zhicheng Zhang, Norman Sadeh, Fei Fang
Poor interpretability hinders the practical applicability of multi-agent reinforcement learning (MARL) policies. Deploying interpretable surrogates of uninterpretable policies enhances the safety and verifiability of MARL for real-world applications. However, if these surrogates are to interact directly with the environment within human supervisory frameworks, they must be both performant and computationally efficient. Prior work on interpretable MARL has either sacrificed performance for computational efficiency or computational efficiency for performance. To address this issue, we propose HYDRAVIPER, a decision tree-based interpretable MARL algorithm. HYDRAVIPER coordinates training between agents based on expected team performance, and adaptively allocates budgets for environment interaction to improve computational efficiency. Experiments on standard benchmark environments for multi-agent coordination and traffic signal control show that HYDRAVIPER matches the performance of state-of-the-art methods using a fraction of the runtime, and that it maintains a Pareto frontier of performance for different interaction budgets.
多剂强化学习(MARL)政策的可解释性妨碍了多剂强化学习(MARL)政策的实际适用。采用不可解释的替代政策可解释性可解释性可加强MARL在现实世界应用中的安全和可核查性。然而,如果这些代用者要在人类监督框架内与环境直接互动,这些代用者必须既具有性能,又具有计算效率。以前关于可解释性MARL的工作要么牺牲了计算效率的性能,要么牺牲了计算效率或计算性能的计算效率。为了解决这一问题,我们提议采用基于决定的、基于树的可解释MARL算法HyDRAVIPER。 HyDRAVIPER根据预期的团队业绩协调各代理者之间的培训,并适应性地分配环境互动预算以提高计算效率。关于多剂协调和交通信号控制的标准基准环境的实验表明,HYDRAVIPER利用运行时间的一小部分与最先进方法的性能相匹配,并且为不同互动预算保持业绩的PAreto边界。
Article 72
Title@2025-05-25 (7): Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes
Title: Agentic Information Theory: Ergodicity and Intrinsic Semantics of Information Processes | Agentische Informationstheorie: Ergodikität und Intrinsische Semantik von Informationsprozessen | 代理信息理论:信息过程的分化和内在的语义 2505.19275v1 |
Authors: James P. Crutchfield, Alexandra Jurgens
We develop information theory for the temporal behavior of memoryful agents moving through complex (structured, stochastic) environments. We introduce information processes: stochastic processes produced by cognitive agents in real time as they interact with and interpret incoming stimuli. We provide basic results on the ergodicity and semantics of the resulting time series of Shannon information measures that monitor an agent’s adapting view of uncertainty and structural correlation in its environment.
我们为在复杂的 – – 结构化的、随机的 – – 环境中移动的记忆性物剂的时间行为发展信息理论。我们引入了信息过程 – – 认知性物剂在与进取刺激进行互动和解释时实时生成的随机过程。我们提供了由此产生的香农信息措施的时间序列的灵敏性和语义学基本结果,监测一个物剂对其环境中的不确定性和结构相关性的适应观点。
Article 73
Title@2025-05-25 (7): GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Title: GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling | GUARDIAN: LLM-Multiagent-Kollaborationen mit zeitlicher Graphenmodellierung sichern | GUARDIAN: 保护LLM 多机构协作与时间图建模 2505.19234v1 |
Authors: Jialong Zhou, Lichao Wang, Xiao Yang
The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture, which incorporates an incremental training paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN’s effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization.
大型语言模型(LLMS)的出现有助于开发能够参与复杂和多方向对话的智能剂(LLMS),然而,多剂协作面临严重的安全挑战,如幻觉放大、错误注入和传播等。本文介绍了GUARDIAN,这是在GUARDINGIN Intelligent Agent CollaboratioNs中发现和减轻多种安全问题的统一方法。通过将多剂协作进程建模成一个离散时间时间时间时间分解图,GUARDIAN明确捕捉了幻象和错误的传播动态。无监督的编码解码结构包含一个渐进式培训模式,学会从潜在嵌入中重建节点属性和图形结构,从而能够以前所未有的精确度识别异常节点和边缘。此外,我们引入了一个基于信息波特纳克理论的图形抽象机制,它既压缩时间互动图,又保留基本模式。广泛的实验表明GURDIAN在保护LM多剂合作防止多种安全脆弱性方面的有效性,从而实现以高效的资源利用状态的精确性。
Article 74
Title@2025-05-25 (7): Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding
Title: Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | Where Paths Collide: Eine umfassende Untersuchung der klassischen und lernbasierten multi-agenten Pathfinding | 路径相撞之处:对经典和以学习为基础的多方代理调查的全面调查 2505.19219v1 |
Authors: Shiyue Wang, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, Wenhao Li
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
多机构路径定位(MAPF)是人工智能和机器人研究中一个根本问题,需要计算从最初地点到指定目标的多种代理商的无碰撞路径。随着自动系统在仓库、城市交通和其他复杂环境中日益盛行,MAPF已经从理论挑战演变为现实世界多机器人协调的关键推动者。这一全面调查弥合了传统算法方法与MAPF研究中新出现的基于学习的方法之间的长期差距。我们提出了一个统一的框架,其中包括基于搜索的方法(包括基于冲突的搜索、基于优先权的搜索和大型邻里搜索)、日益基于汇编的方法(SAT、SMT、CSP、ASP和MIP的制定)以及数据驱动技术(加强学习、监管的学习和混合战略)。通过系统分析200+文件的实验做法,我们发现评价方法存在重大差异,典型方法通常在更大规模的解决方案中测试(超过200个网络,有1 000+的参考工具),而基于学习的方法(主要为10100个内行者)、基于汇编的方法(SAT、SMER、SAP和MIMF的深度模型选择,我们作为未来标准化的常规排序选择方法,我们最终的模型选择,包括标准化的标准化的模型,我们作为基础的模型选择。
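"Collision-free" in MAPF is usually checked against two conflict types: a vertex conflict (two agents occupy the same cell at the same timestep) and an edge conflict (two agents swap cells between consecutive timesteps). The checker below is a minimal sketch of that test. The path representation (a list of grid cells per agent, with agents waiting at their final cell) is an assumption made for illustration and is not taken from the survey.

```python
def find_conflicts(paths):
    """Return all pairwise conflicts among agent paths.

    `paths` is a non-empty list with one path per agent; each path is a
    non-empty list of hashable cells such as (row, col), indexed by
    timestep. Agents are assumed to wait at their last cell once their
    path ends. Each conflict is reported as (kind, agent_i, agent_j, t).
    """
    horizon = max(len(path) for path in paths)

    def cell_at(path, t):
        # Clamp the index so finished agents stay at their goal.
        return path[min(t, len(path) - 1)]

    conflicts = []
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a, b = cell_at(paths[i], t), cell_at(paths[j], t)
                if a == b:
                    conflicts.append(("vertex", i, j, t))
                elif t + 1 < horizon:
                    a_next = cell_at(paths[i], t + 1)
                    b_next = cell_at(paths[j], t + 1)
                    if a_next == b and b_next == a:
                        # Agents traverse the same edge in opposite directions.
                        conflicts.append(("edge", i, j, t))
    return conflicts
```

Search-based solvers such as Conflict-Based Search repeatedly run a check of this kind and branch on the first conflict found.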
Article 75
Title@2025-05-25 (7): Collaborative Agentic AI Needs Interoperability Across Ecosystems
Title: Collaborative Agentic AI Needs Interoperability Across Ecosystems | Kollaborative Agentische KI braucht Interoperabilität über Ökosysteme hinweg | AI 需要跨生态系统的互操作性 2505.21550v1 |
Authors: Rishi Sharma, Martijn de Vos, Pradyumna Chari, Ramesh Raskar, Anne-Marie Kermarrec
Collaborative agentic AI is projected to transform entire industries by enabling AI-powered agents to autonomously perceive, plan, and act within digital environments. Yet, current solutions in this field are all built in isolation, and we are rapidly heading toward a landscape of fragmented, incompatible ecosystems. In this position paper, we argue that interoperability, achieved by the adoption of minimal standards, is essential to ensure open, secure, web-scale, and widely-adopted agentic ecosystems. To this end, we devise a minimal architectural foundation for collaborative agentic AI, named Web of Agents, which is composed of four components: agent-to-agent messaging, interaction interoperability, state management, and agent discovery. Web of Agents adopts existing standards and reuses existing infrastructure where possible. With Web of Agents, we take the first but critical step toward interoperable agentic systems and offer a pragmatic path forward before ecosystem fragmentation becomes the norm.
合作代理AI预计将通过使AI动力代理商能够自主地看待、规划和在数字环境中行动来改变整个产业。然而,目前这一领域的解决方案都是孤立地构建的,我们正在迅速走向支离破碎、互不相容的生态系统的景观。在本立场文件中,我们争辩说,通过采用最低标准实现互操作性对于确保开放、安全、网络规模和广泛接受的代理生态系统至关重要。为此,我们为合作代理AI设计了一个最起码的建筑基础,名为“代理商网络 ” , 由四个组成部分组成:代理商对代理商信息传递、互动互操作性、国家管理和代理商发现。 代理商网络采用现有标准,并尽可能重新使用现有基础设施。有了代理商网络,我们迈出了迈向互操作性代理系统的第一个但至关重要的步骤,并为生态系统碎裂成为常规提供了一条务实的前进道路。
Article 76
Title@2025-05-25 (7): Interacting Large Language Model Agents. Interpretable Models and Social Learning
Title: Interacting Large Language Model Agents. Interpretable Models and Social Learning | Interagieren von Large Language Model Agents. Interpretierbare Modelle und soziales Lernen | 跨大语言示范工具、可解释模型和社会学习 2411.01271v2 |
Authors: Adit Jain, Vikram Krishnamurthy
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, using Bayesian revealed preferences from microeconomics, we show that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and that, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
本文讨论使用统计信号处理和微观经济学方法互动大型语言模型代理商(LLMAs)的理论和算法。 虽然这两个领域已经成熟,但它们在涉及互动LMAs的决策中的应用仍未被探索。 在Bayesian情绪分析的推动下,我们构建了可解释的模式和算法,使LMMAs能够互动和进行Bayesian推理。因为互动LMAs能够从先前的决定和外部投入中学习,它们可以表现出偏向和放牧行为。因此,开发可解释的Bayesian的货币模型和随机控制算法对于理解和减轻这些行为至关重要。本文有三个主要结果。首先,我们用Bayesian的揭示的微观经济学的偏好之处,让个人LMA能够满足合理强化(有限制的合理合理性)Bayesmas效用最大化的模型和(观察,LIMA)选择一种能最大限度地实现正规化用途的行动。 其次,我们利用Bayesians 公开的社会学学习模型来构建可解释的LMAs解释模型, 在进行Bayeses-almainaldealdealalal 分析时, 正在展示一个真实的模型。 我们的自我分析, 正在演示的常规分析, 正在展示的模型, 演示的自我分析。
Article 77
Title@2025-05-25 (7): Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management
Title: Adversarial Bandit over Bandits: Hierarchical Bandits for Online Configuration Management | Adversarial Bandit über Bandits: Hierarchische Bandits für Online-Konfigurationsmanagement | 反强盗强盗: 用于在线配置管理的等级强盗 2505.19061v1 |
Authors: Chen Avin, Zvi Lotker, Shie Mannor, Gil Shabat, Hanan Shteingart, Roey Yadgar
Motivated by dynamic parameter optimization in finite but large action (configuration) spaces, this work studies the nonstochastic multi-armed bandit (MAB) problem in metric action spaces with oblivious Lipschitz adversaries. We propose ABoB, a hierarchical Adversarial Bandit over Bandits algorithm that can use state-of-the-art existing “flat” algorithms, but additionally clusters similar configurations to exploit local structures and adapt to changing environments. We prove that in the worst-case scenario, such a clustering approach cannot hurt too much and ABoB guarantees a standard worst-case regret bound of $O\left(k^{\frac{1}{2}}T^{\frac{1}{2}}\right)$, where $T$ is the number of rounds and $k$ is the number of arms, matching the traditional flat approach. However, under favorable conditions related to the algorithm properties, cluster properties, and certain Lipschitz conditions, the regret bound can be improved to $O\left(k^{\frac{1}{4}}T^{\frac{1}{2}}\right)$. Simulations and experiments on a real storage system demonstrate that ABoB, using standard algorithms like EXP3 and Tsallis-INF, achieves lower regret and faster convergence than the flat method, with up to 50% improvement in previously known setups, both nonstochastic and stochastic, as well as in our settings.
受有限但规模庞大的动作(配置)空间中动态参数优化的启发,本文研究在具有遗忘型(oblivious)Lipschitz对手的度量动作空间中的非随机多臂老虎机(MAB)问题。我们提出ABoB,一种分层的“老虎机之上的对抗老虎机”(Adversarial Bandit over Bandits)算法,它可以使用现有最先进的“扁平”算法,同时对相似配置进行聚类,以利用局部结构并适应不断变化的环境。我们证明,在最坏情况下,这种聚类方法不会造成太大损失,ABoB保证标准的最坏情况遗憾界为 $O\left(k^{\frac{1}{2}}T^{\frac{1}{2}}\right)$,其中 $T$ 为轮数,$k$ 为臂数,与传统的扁平方法一致。然而,在与算法性质、簇的性质以及某些Lipschitz条件相关的有利条件下,遗憾界可以改进为 $O\left(k^{\frac{1}{4}}T^{\frac{1}{2}}\right)$。在真实存储系统上的仿真和实验表明,ABoB在使用EXP3和Tsallis-INF等标准算法时,比扁平方法取得了更低的遗憾和更快的收敛,在已知的既有设置(非随机和随机)以及我们的设置中最多可获得50%的改进。
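The "bandit over bandits" idea can be sketched as a two-level selection: a top-level EXP3 learner picks a cluster, and a per-cluster EXP3 learner picks a configuration inside it. The code below is a minimal illustration under stated assumptions: the clusters are given in advance, rewards lie in [0, 1], and both levels are updated with the same reward. The class names and the update wiring are hypothetical and are not ABoB's actual procedure; only the EXP3 weight update itself is the standard one.

```python
import math
import random


class Exp3:
    """Standard EXP3 learner for rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.n = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def draw(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate for the arm that was played.
        self.weights[arm] *= math.exp(self.gamma * (reward / prob) / self.n)
        # Rescale to avoid overflow; probabilities are scale-invariant.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


class TwoLevelExp3:
    """Hypothetical hierarchy: EXP3 over clusters, EXP3 inside each cluster."""

    def __init__(self, clusters, gamma=0.1, seed=0):
        rng = random.Random(seed)
        self.clusters = clusters
        self.top = Exp3(len(clusters), gamma, rng)
        self.low = [Exp3(len(c), gamma, rng) for c in clusters]

    def select(self):
        c, pc = self.top.draw()
        a, pa = self.low[c].draw()
        return (c, pc, a, pa), self.clusters[c][a]

    def update(self, choice, reward):
        c, pc, a, pa = choice
        self.top.update(c, pc, reward)
        self.low[c].update(a, pa, reward)
```

In use, each round calls `select()`, applies the returned configuration, observes a reward in [0, 1], and passes it to `update()` together with the returned choice tuple.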
Article 78
Title@2025-05-25 (7): Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments
Title: Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments | Adaptive Schlussfolgerung durch Bayesische und Inverse Bayesische Schlussfolgerung mit Symmetrie-Bias in nichtstationären Umgebungen | 在非静止环境中,通过贝耶斯和反贝耶斯和反贝耶斯的同对称-比亚推理,进行适应性推理 2505.12796v3 |
Authors: Shuji Shinohara, Daiki Morita, Hayato Hirai, Ryosuke Kuribayashi, Nobuhito Manome, Toru Moriyama, Yoshihiro Nakajima, Yukio-Pegio Gunji, Ung-il Chung
This study introduces a novel inference framework, designated as Bayesian and inverse Bayesian (BIB) inference, which concurrently performs both conventional and inverse Bayesian updates by integrating symmetry bias into Bayesian inference. The effectiveness of the model was evaluated through a sequential estimation task involving observations sampled from a Gaussian distribution with a stochastically time-varying mean. Conventional Bayesian inference entails a fundamental trade-off between adaptability to abrupt environmental shifts and estimation accuracy during stable intervals. The BIB framework addresses this limitation by dynamically modulating the learning rate through inverse Bayesian updates, thereby enhancing adaptive flexibility. The BIB model generated spontaneous bursts in the learning rate during sudden environmental transitions, transiently entering a high-sensitivity state to accommodate incoming data. This intermittent burst-relaxation pattern functions as a dynamic mechanism that balances adaptability and accuracy. Further analysis of burst interval distributions demonstrated that the BIB model consistently produced power-law distributions under diverse conditions. Such robust scaling behavior, absent in conventional Bayesian inference, appears to emerge from a self-regulatory mechanism driven by inverse Bayesian updates. These results present a novel computational perspective on scale-free phenomena in natural systems and offer implications for designing adaptive inference systems in nonstationary environments.
这项研究引入了一个新的推论框架,称为贝耶斯和反贝伊西亚(BIB)的推论,它通过将对称偏差纳入巴伊西亚的推论,同时进行常规和反巴伊西亚的更新,同时进行常规和反巴伊西亚的更新,将对称偏差纳入巴伊西亚的推论;模型的有效性是通过连续的估算任务进行评估的,其中包括从高萨分布中抽取的、抽查时间变化平均值的观测结果;常规巴伊西亚的推论意味着在适应性变化和稳定间隔期间对突发环境变化的适应性和估计准确性进行根本的权衡。BIBIB框架通过动态调节学习率,从而增强适应性灵活性来解决这一局限性。BIB模型在突然的环境转型期间产生了学习率的自发爆发,为适应性地进入了适应性高敏感状态以适应数据。这种间歇性防爆松动模式功能是平衡适应性和准确性的动态机制。对断断层分布进行进一步分析表明,BIB模型在多种条件下持续产生权力法律的分布。这种稳健的缩行为,在传统的拜伊西亚的更新过程中,从目前对自控系统进行自我调整的系统进行自我调整的系统所产生的结果。
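The trade-off between adaptability and accuracy mentioned in this abstract can be seen in a fixed-learning-rate mean tracker. The sketch below is not the BIB model: it is a plain exponential-smoothing estimator, chosen as an assumption to show why a single fixed learning rate `alpha` cannot be both fast after a jump and quiet during stable periods. All function names are hypothetical.

```python
def track(observations, alpha, initial=0.0):
    """Fixed-gain estimate of a drifting mean: est <- est + alpha * (x - est)."""
    est = initial
    estimates = []
    for x in observations:
        est += alpha * (x - est)
        estimates.append(est)
    return estimates


def steps_to_adapt(alpha, tol=0.05, max_steps=10_000):
    """Steps needed to get within `tol` of a unit jump in the mean (no noise)."""
    for t, est in enumerate(track([1.0] * max_steps, alpha), start=1):
        if abs(1.0 - est) <= tol:
            return t
    return None


def steady_state_noise_gain(alpha):
    """Ratio of estimate variance to observation variance for i.i.d. noise.

    For exponential smoothing this ratio is alpha / (2 - alpha) in steady state.
    """
    return alpha / (2.0 - alpha)
```

A larger `alpha` reduces `steps_to_adapt` but raises `steady_state_noise_gain`, which is the tension that a dynamically modulated learning rate is meant to relieve.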
Article 79
Title@2025-05-25 (7): SANNet: A Semantic-Aware Agentic AI Networking Framework for Multi-Agent Cross-Layer Coordination
Title: SANNet: A Semantic-Aware Agentic AI Networking Framework for Multi-Agent Cross-Layer Coordination | SANNet: Ein Semantic-Aware Agentic AI Networking Framework für die multi-agente Cross-Layer-Koordination | SANNet: 多代理人跨行业协调的语义学-敏感物义学AI联网框架 2505.18946v1 |
Authors: Yong Xiao, Haoran Zhou, Xubo Li, Yayu Gao, Guangming Shi, Ping Zhang
Agentic AI networking (AgentNet) is a novel AI-native networking paradigm that relies on a large number of specialized AI agents to collaborate and coordinate for autonomous decision-making, dynamic environmental adaptation, and complex goal achievement. It has the potential to facilitate real-time network management alongside capabilities for self-configuration, self-optimization, and self-adaptation across diverse and complex networking environments, laying the foundation for fully autonomous networking systems in the future. Despite its promise, AgentNet is still in the early stage of development, and an effective networking framework to support automatic goal discovery, multi-agent self-orchestration, and task assignment is still lacking. This paper proposes SANNet, a novel semantic-aware agentic AI networking architecture that can infer the semantic goal of the user and automatically assign agents associated with different layers of a mobile system to fulfill the inferred goal. Motivated by the fact that one of the major challenges in AgentNet is that different agents may have different and even conflicting objectives when collaborating for certain goals, we introduce a dynamic weighting-based conflict-resolving mechanism to address this issue. We prove that SANNet can provide theoretical guarantees on both conflict-resolving and model generalization performance for multi-agent collaboration in dynamic environments. We develop a hardware prototype of SANNet based on the open RAN and 5GS core platform. Our experimental results show that SANNet can significantly improve the performance of multi-agent networking systems, even when agents with conflicting objectives are selected to collaborate for the same goal.
AI网络(Agentic AINet)是一个全新的AI-National网络模式,它依靠大量专门的AI代理机构进行合作与协调,进行自主决策、动态环境适应和复杂的目标实现,具有促进实时网络管理的潜力,同时具备各种复杂网络环境的自我配置、自我优化和自我适应能力,为未来完全自主的网络系统奠定基础。尽管它有希望,AgentNet仍然处于早期发展阶段,而且仍然缺乏一个有效的网络框架来支持自动目标发现和多代理人自我定位和任务任务任务任务任务。本文提议SANNet,这是一个新型的SANNet,它是一个精辟的Smantic-aware代理AI网络网络网络结构,可以推断用户的语义化目标,并自动指定与不同层次移动系统相关的代理机构来实现推断的目标。由于AgentNet的主要挑战之一是,不同代理人在为某些目标开展协作时,甚至可能存在相互矛盾的目标,我们引入一个动态的基于冲突加权的冲突解决机制,我们证明SAN-AN网络能够大大改进基于动态的SAN目标的SAN目标,从而大大改进我们以动态的硬件五号网络的模型化环境合作。
Article 80
Title@2025-05-24 (6): Distributed Set-membership Filtering Frameworks For Multi-agent Systems With Absolute and Relative Measurements
Title: Distributed Set-membership Filtering Frameworks For Multi-agent Systems With Absolute and Relative Measurements | Distributed Set-Membership Filtering Frameworks für Multi-Agent-Systeme mit absoluten und relativen Messungen | 具有绝对和相对计量的多试剂系统分布式成员筛选框架 2305.15797v2 |
Authors: Yu Ding, Yirui Cong, Xiangke Wang
In this paper, we focus on the distributed set-membership filtering (SMFing) problem for a multi-agent system with absolute (taken from agents themselves) and relative (taken from neighbors) measurements. In the literature, the relative measurements are difficult to deal with, and the SMFs rely heavily on specific set descriptions. As a result, establishing a general distributed SMFing framework with relative measurements is still an open problem. To solve this problem, first, we provide the set description based on uncertain variables determined by the relative measurements between two agents as the foundation. Surprisingly, the accurate description requires only a single calculation step rather than multiple iterations, which can effectively reduce computational complexity. Based on the derived set description, called the uncertain range, we propose two distributed SMFing frameworks: one calculates the joint uncertain range of the agent itself and its neighbors, while the other only computes the marginal uncertain range of each local system. Furthermore, we compare the performance of our two proposed distributed SMFing frameworks with the benchmark, a centralized SMFing framework. A rigorous set analysis reveals that the distributed SMF can essentially be regarded as computing a marginal uncertain range that outer-bounds the projection, onto the corresponding subspace, of the uncertain range obtained by the centralized SMF. Simulation results corroborate the effectiveness of our proposed distributed frameworks and verify our theoretical analysis.
在本文中,我们侧重于对一个具有绝对(从代理人本身获得)和相对(从邻居获得)测量的多试剂系统进行分布式成员过滤(SMF)的问题。在文献中,相对测量很难处理,而SMF高度依赖特定描述。因此,建立具有相对测量的分布式成员过滤(SMF)框架仍然是一个尚未解决的问题。为了解决这个问题,首先,我们根据两个代理人之间相对测量所决定的不确定变数提供一套描述。令人惊讶的是,准确描述只需要一个单一的计算步骤,而不是多个迭代,才能有效减少计算的复杂性。根据衍生的既定描述,称为不确定范围,我们建议两个分布式成员高度依赖特定描述框架:一个计算代理人本身及其邻居的共同不确定性范围,而另一个仅计算每个地方系统的边际不确定范围。此外,我们比较了我们拟议的两个分布式的SMFF框架和基准 – – 集中的SMF框架 – – 令人惊讶地分析显示,分布式SMF的分布式SMF,基本上可以考虑通过我们拟议的不确定的边际空间预测的中央范围,我们拟议的SMF的边际分析,而将SMF的边际预测的边际范围视为我们提议的SMF的边际预测。
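To make the roles of absolute and relative measurements concrete, the sketch below runs one set-membership update for a scalar state using intervals. It assumes bounded noise with known bounds, an absolute measurement y = x_self + v with |v| <= r, and a relative measurement z = x_neighbor - x_self + w with |w| <= s. The interval representation and the function names are illustrative assumptions; the paper's uncertain-range description is more general than this one-dimensional case.

```python
def intersect(a, b):
    """Intersection of two closed intervals given as (lo, hi) tuples."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty intersection: the bounds are inconsistent")
    return (lo, hi)


def from_absolute(y, r):
    """Set of states consistent with y = x_self + v, |v| <= r."""
    return (y - r, y + r)


def from_relative(neighbor_set, z, s):
    """Set of own states consistent with z = x_neighbor - x_self + w, |w| <= s.

    Rearranging gives x_self = x_neighbor - z + w, so the neighbor's
    interval is shifted by -z and widened by s on each side.
    """
    lo, hi = neighbor_set
    return (lo - z - s, hi - z + s)


def update(prior, y, r, neighbor_set, z, s):
    """Fuse the prior set with one absolute and one relative measurement."""
    posterior = intersect(prior, from_absolute(y, r))
    return intersect(posterior, from_relative(neighbor_set, z, s))
```

For example, with prior `(0.0, 10.0)`, `y = 2.25`, `r = 0.5`, neighbor set `(4.75, 5.25)`, `z = 3.0`, and `s = 0.25`, the absolute measurement gives `(1.75, 2.75)`, the relative one gives `(1.5, 2.5)`, and their intersection is `(1.75, 2.5)`.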
Article 81
Title@2025-05-24 (6): Coordinated guidance and control for multiple parafoil system landing
Title: Coordinated guidance and control for multiple parafoil system landing | Koordinierte Führung und Steuerung für die Landung mehrerer Parafoil-Systeme | 协调制导和管制多个抛油系统着陆的协调制导和控制 2505.18691v1 |
Authors: Zhenyu Wei, Zhijiang Shao, Lorenz T. Biegler
Multiple parafoil landing is an enabling technology for massive supply delivery missions. However, designing a collision-free, computationally efficient guidance and control method for unpowered parafoils remains an open problem. To address this issue, this paper proposes a coordinated guidance and control method for multiple parafoil landing. First, the multiple parafoil landing process is formulated as a trajectory optimization problem. Then, a landing point allocation algorithm is designed to assign a landing point to each parafoil. To guarantee flight safety, a collision-free trajectory replanning algorithm is designed. On this basis, a nonlinear model predictive control algorithm is adapted to leverage the nonlinear dynamics model for trajectory tracking. Finally, the parafoil kinematic model is used to reduce the computational burden of trajectory calculation, and the kinematic model is updated by a moving horizon correction algorithm to improve trajectory accuracy. Simulation results demonstrate the effectiveness and computational efficiency of the proposed coordinated guidance and control method for multiple parafoil landing.
多重抛油着陆是大规模供应交付任务的一项赋能技术,然而,设计无碰撞、计算高效的无动力抛油制导和控制方法仍是一个未决问题。为解决这一问题,本文件提议了多种抛油着陆的协调制导和控制方法。首先,多个抛油着陆程序被设计成轨迹优化问题。然后,着陆点分配算法旨在为每个抛油指定着陆点。为了保证飞行安全,设计了无碰撞轨迹重新规划算法。在此基础上,非线性模型预测控制算法被调整为利用非线性动态模型进行轨迹跟踪。最后,利用了抛油动力模型来减少轨迹计算计算中的计算负担,并通过移动地平线修正算法更新了运动模型,以提高轨迹准确性。模拟结果显示了拟议多重抛油着陆的协调制导和控制方法的有效性和计算效率。
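The landing point allocation step can be pictured as an assignment problem. The abstract does not state the objective or the algorithm the authors use, so the sketch below is only an assumption: it assigns each parafoil a distinct landing point by exhaustively minimizing total horizontal distance. The function name `allocate_landing_points` and the 2-D coordinates are hypothetical.

```python
from itertools import permutations
from math import hypot


def allocate_landing_points(parafoils, landing_points):
    """Assign each parafoil a distinct landing point.

    parafoils, landing_points: lists of (x, y) tuples.
    Returns (assignment, total_distance), where assignment[i] is the index
    of the landing point given to parafoil i. Brute force over all
    assignments, so only suitable for a handful of parafoils.
    """
    if len(parafoils) > len(landing_points):
        raise ValueError("need at least one landing point per parafoil")

    best_assignment, best_cost = None, float("inf")
    for perm in permutations(range(len(landing_points)), len(parafoils)):
        # Assumed cost: sum of straight-line horizontal distances.
        cost = sum(
            hypot(px - landing_points[j][0], py - landing_points[j][1])
            for (px, py), j in zip(parafoils, perm)
        )
        if cost < best_cost:
            best_assignment, best_cost = perm, cost
    return best_assignment, best_cost


assignment, total = allocate_landing_points(
    parafoils=[(0.0, 0.0), (10.0, 0.0)],
    landing_points=[(9.0, 0.0), (1.0, 0.0)],
)
```

Here each parafoil is paired with the landing point nearest to it. A real allocation would presumably account for wind, glide ratio, and collision risk, none of which this sketch models.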
Article 82
Title@2025-05-24 (6): Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
Title: Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi | Erweiterung des Aktionsraums mit Konventionen zur Verbesserung der Multi-Agenten-Kooperation in Hanabi | 与公约扩大行动空间,以改进哈纳比多剂合作 2412.06333v3 |
Authors: F. Bredell, H. A. Engelbrecht, J. C. Schoeman
The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies with high computational cost that require large amounts of training data. For humans to solve the Hanabi game effectively, they require the use of conventions, which provide a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of “rules” or principles. Multi-agent problems containing partial observability, especially when limited communication is present, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent’s action space using conventions, which act as a sequence of special cooperative actions that span multiple time steps and multiple agents, requiring agents to actively opt in for a convention to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement in the performance of existing techniques for self-play and cross-play for various numbers of cooperators within Hanabi.
纸牌游戏Hanabi因其合作性、部分可观测性、有限通信和显著的复杂性,被认为是测试和开发多智能体强化学习(MARL)算法的有力平台。以往的研究探索了MARL算法在Hanabi中的能力,主要侧重于先进的架构设计和算法改造,以在不同数量的合作者下取得最先进的性能。然而,这往往导致求解策略复杂、计算成本高,并需要大量训练数据。人类要有效地玩好Hanabi,需要使用约定,约定提供了一种手段,基于预先定义且相互认可的一套“规则”或原则来隐式地传达想法或知识。具有部分可观测性的多智能体问题,尤其是在通信受限时,可以从隐式知识共享中大大受益。本文提出一种利用约定来扩充智能体动作空间的新方法,约定是一系列特殊的合作动作,跨越多个时间步和多个智能体,需要各智能体主动选择加入才能完成。这些约定基于现有的人类约定,在Hanabi中不同数量合作者的自我对弈和交叉对弈上,显著提升了现有技术的性能。
Article 83
Title@2025-05-24 (6): DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation
Title: DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation | DDO: Dual-Decision-Optimierung durch Multi-Agent-Kollaboration für LLM-basierte medizinische Beratung | DDO:通过多方机构协作,优化基于LLM的医疗咨询的双重决定 2505.18630v1 |
Authors: Zhihao Jia, Mingyi Jia, Junwen Duan, Jianxin Wang
Large Language Models (LLMs) demonstrate strong generalization and reasoning abilities, making them well-suited for complex decision-making tasks such as medical consultation (MC). However, existing LLM-based methods often fail to capture the dual nature of MC, which entails two distinct sub-tasks: symptom inquiry, a sequential decision-making process, and disease diagnosis, a classification problem. This mismatch often results in ineffective symptom inquiry and unreliable disease diagnosis. To address this, we propose DDO, a novel LLM-based framework that performs Dual-Decision Optimization by decoupling and independently optimizing the two sub-tasks through a collaborative multi-agent workflow. Experiments on three real-world MC datasets show that DDO consistently outperforms existing LLM-based approaches and achieves competitive performance with state-of-the-art generation-based methods, demonstrating its effectiveness in the MC task.
大型语言模型(LLM)展现出强大的泛化和推理能力,非常适合医疗问诊(MC)等复杂决策任务。然而,现有基于LLM的方法往往未能把握MC的双重性质,它包含两个不同的子任务:症状问询(一个序贯决策过程)和疾病诊断(一个分类问题)。这种不匹配往往导致症状问询低效、疾病诊断不可靠。为此,我们提出DDO,一个新的基于LLM的框架,通过协作式多智能体工作流对两个子任务进行解耦并分别优化,从而实现双重决策优化(Dual-Decision Optimization)。在三个真实世界MC数据集上的实验表明,DDO始终优于现有基于LLM的方法,并取得了与最先进的基于生成的方法相当的性能,证明了其在MC任务中的有效性。
Article 84
Title@2025-05-24 (6): An Identity Based Agent Model for Value Alignment
Title: An Identity Based Agent Model for Value Alignment | Ein identitätsbasiertes Agentenmodell für die Wertausrichtung | 基于身份的保值调整代理模型 2401.12159v4 |
Authors: Karthik Sama, Janvi Chhabra, Arpitha Srivatsha Malavalli, Jayati Deshmukh, Srinath Srinivasa
Social identities play an important role in the dynamics of human societies, and it can be argued that some sense of identification with a larger cause or idea plays a critical role in making humans act responsibly. Often social activists strive to get populations to identify with some cause or notion, such as green energy or diversity, in order to bring about desired social changes. We explore the problem of designing computational models for social identities in the context of autonomous AI agents. For this, we propose an agent model that enables agents to identify with certain notions and show how this affects collective outcomes. We also contrast associations of identity with rational preferences. The proposed model is simulated in an application context of urban mobility, where we show how changes in social identity affect mobility patterns and collective outcomes.
社会认同在人类社会动态中起着重要作用,可以说,某种认同感与更大的原因或思想的认同感在使人类行为负责任方面起着关键作用。社会活跃分子往往努力让民众认同某种原因或概念,如绿色能源、多样性等,以便带来理想的社会变革。我们探讨了在自主的AI代理机构背景下设计社会认同计算模型的问题。为此,我们提出了一个代理模型,使代理商能够认同某些概念,并表明这如何影响集体结果。我们还在认同与合理偏好之间的对比。拟议的模型在城市流动性应用中模拟,我们展示社会认同的变化如何影响流动性模式和集体结果。
Article 85
Title@2025-05-24 (6): MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations
Title: MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations | MisoDICE: Multi-Agent-Imitation aus nicht gekennzeichneten Mixed-Quality-Demonstrationen | MisoDICE:从未贴标签的混合质量示范中多机构吸收 2505.18595v1 |
Authors: The Viet Bui, Tien Mai, Hong Thanh Nguyen
We study offline imitation learning (IL) in cooperative multi-agent settings, where demonstrations are of mixed, unlabeled quality, containing both expert and suboptimal trajectories. Our proposed solution is structured in two stages: trajectory labeling and multi-agent imitation learning, designed jointly to enable effective learning from heterogeneous, unlabeled data. In the first stage, we combine advances in large language models and preference-based reinforcement learning to construct a progressive labeling pipeline that distinguishes expert-quality trajectories. In the second stage, we introduce MisoDICE, a novel multi-agent IL algorithm that leverages these labels to learn robust policies while addressing the computational complexity of large joint state-action spaces. By extending the popular single-agent DICE framework to multi-agent settings with a new value decomposition and mixing architecture, our method yields a convex policy optimization objective and ensures consistency between global and local policies. We evaluate MisoDICE on multiple standard multi-agent RL benchmarks and demonstrate superior performance, especially when expert data is scarce.
在合作性多试剂环境下,我们研究脱线模仿学习(IL),在这种环境中,示范品具有无标签的混合质量,既包括专家,也包括次最佳轨迹。我们提议的解决方案分为两个阶段:轨迹标签和多试剂模仿学习,共同设计,以便从多种无标签的数据中有效学习。在第一阶段,我们把大语言模型的进展和基于优惠的强化学习结合起来,以建立一个逐步标记的管道,区分专家质量的轨迹。在第二阶段,我们引入了MisoDICE,这是一个新的多试剂IL算法,利用这些标签学习稳健的政策,同时解决大型联合州-行动空间的计算复杂性。通过将流行的单剂 DICE 框架扩大到具有新价值分解和混合结构的多剂环境,我们的方法产生一个方位政策优化目标,确保全球和地方政策的一致性。我们评估多标准多试剂RL基准的MisoDICE,并展示高性业绩,特别是在专家数据稀缺的情况下。
Article 86
Title@2025-05-24 (6): MASTER: Multi-Agent Security Through Exploration of Roles and Topological Structures – A Comprehensive Framework
Title: MASTER: Multi-Agent Security Through Exploration of Roles and Topological Structures – A Comprehensive Framework | MASTER: Multi-Agent Sicherheit durch Erforschung von Rollen und topologischen Strukturen – Ein umfassender Rahmen | 通过探索作用和地形结构实现多机构安全 – – 综合框架 2505.18572v1 |
Authors: Yifan Zhu, Chao Zhang, Xin Shi, Xueqiao Zhang, Yi Yang, Yawei Luo
Large Language Model (LLM)-based Multi-Agent Systems (MAS) exhibit remarkable problem-solving and task planning capabilities across diverse domains due to their specialized agentic roles and collaborative interactions. However, this also amplifies the severity of security risks under MAS attacks. To address this, we introduce MASTER, a novel security research framework for MAS, focusing on diverse Role configurations and Topological structures across various scenarios. MASTER offers an automated construction process for different MAS setups and an information-flow-based interaction paradigm. To tackle MAS security challenges in varied scenarios, we design a scenario-adaptive, extensible attack strategy utilizing role and topological information, which dynamically allocates targeted, domain-specific attack tasks for collaborative agent execution. Our experiments demonstrate that such an attack, leveraging role and topological information, exhibits significant destructive potential across most models. Additionally, we propose corresponding defense strategies, substantially enhancing MAS resilience across diverse scenarios. We anticipate that our framework and findings will provide valuable insights for future research into MAS security challenges.
大型语言模型(LLMS)基于多种行为者系统的大型语言模型(LLMS)由于其专业代理作用和协作互动,在各个领域都表现出了显著的解决问题和任务规划能力。然而,这也扩大了MAS攻击下的安全风险的严重性。为了解决这个问题,我们介绍了MASTER,这是MAS的新的安全研究框架,侧重于各种不同的角色配置和地形结构。MASER为不同的MAS设置和基于信息流的互动模式提供了一个自动化的建设过程。为了应对不同情景中的MAS安全挑战,我们设计了一个情景适应、可扩展的攻击战略,利用角色和地形信息,积极分配有针对性的、针对特定领域的攻击任务用于合作代理执行。我们的实验表明,这种攻击、作用和地形信息在大多数模式中都具有巨大的破坏性潜力。此外,我们提出了相应的国防战略,大大增强MAS在不同情景中的复原力。我们预计,我们的框架和研究结果将为MAS安全挑战的未来研究提供宝贵的见解。
Article 87
Title@2025-05-24 (6): MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
Title: MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs | MRGAgents: Multi-Agenten-Rahmen für verbesserte medizinische Report-Generation mit Med-LVLMs | MRGGGGss: 采用医疗低水平医疗报告制改进医疗报告制的多机构框架 2505.18530v1 |
Authors: Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim
Medical Large Vision-Language Models (Med-LVLMs) have been widely adopted for medical report generation. Despite Med-LVLMs producing state-of-the-art performance, they exhibit a bias toward predicting all findings as normal, leading to reports that overlook critical abnormalities. Furthermore, these models often fail to provide comprehensive descriptions of radiologically relevant regions necessary for accurate diagnosis. To address these challenges, we propose Medical Report Generation Agents (MRGAgents), a novel multi-agent framework that fine-tunes specialized agents for different disease categories. By curating subsets of the IU X-ray and MIMIC-CXR datasets to train disease-specific agents, MRGAgents generates reports that more effectively balance normal and abnormal findings while ensuring a comprehensive description of clinically relevant regions. Our experiments demonstrate that MRGAgents outperformed the state-of-the-art, improving both report comprehensiveness and diagnostic utility.
医用大型视力-语言模型(Med-LVLMs)已被广泛采用,用于编写医疗报告。尽管Med-LVLMs生产了最先进的性能,但它们表现出一种偏向于将所有发现都预测为正常的倾向,导致报告忽略了关键的异常现象。此外,这些模型往往不能全面描述准确诊断所需的与放射有关的区域。为了应对这些挑战,我们提议了医疗报告生成代理(Med-LVLMs),这是一个微调不同疾病类别专用剂的新多试剂框架。通过将IU X射线和MIMIC-CXR数据集的子集用于培训特定疾病制剂,MRGGGents生成的报告能够更有效地平衡正常和异常的调查结果,同时确保全面描述与临床有关的区域。我们的实验表明,MRGGGents超越了最新技术,改进了报告的全面性和诊断效用。
Article 88
Title@2025-05-24 (6): Group Trip Planning Query Problem with Multimodal Journey
Title: Group Trip Planning Query Problem with Multimodal Journey | Gruppenreiseplanungs-Abfrage-Problem mit multimodaler Reise | 具有多模式旅程的问询问题 2502.03144v2 |
Authors: Dildar Ali, Suman Banerjee, Yamuna Prasad
In the Group Trip Planning (GTP) Query Problem, we are given a city road network where a number of Points of Interest (PoI) have been marked with their respective categories (e.g., Cafeteria, Park, Movie Theater, etc.). A group of agents want to visit one PoI from every category from their respective starting locations and, once finished, they want to reach their respective destinations. This problem asks which PoI from every category should be chosen so that the aggregated travel cost of the group is minimized. This problem has been studied extensively in the last decade, and several solution approaches have been proposed. However, to the best of our knowledge, none of the existing studies have considered the different modalities of the journey, which makes the problem more practical. To bridge this gap, we introduce and study the GTP Query Problem with Multimodal Journey in this paper. Along with the other inputs of the GTP Query Problem, we are also given the different modalities of the journey that are available and their respective costs. Now, the problem is not only to select the PoIs from respective categories but also to select the modality of the journey. For this problem, we have proposed an efficient solution approach, which has been analyzed to understand its time and space requirements. A large number of experiments have been conducted using real-life datasets and the results have been reported. From the results, we observe that the PoIs and modality of journey recommended by the proposed solution approach lead to much less time and cost than the baseline methods.
在群体行程规划(GTP)查询问题中,给定一个城市道路网络,其中若干兴趣点(PoI)已标注各自的类别(如自助餐厅、公园、电影院等)。一组智能体希望从各自的起点出发,访问每个类别中的一个PoI,完成后到达各自的目的地。该问题要求确定应从每个类别中选择哪个PoI,使群体的总出行成本最小。过去十年中,该问题已得到广泛研究,并提出了若干求解方法。然而,据我们所知,现有研究均未考虑行程的不同出行方式,而考虑出行方式会使问题更贴近实际。为弥补这一空白,本文提出并研究具有多模式行程的GTP查询问题。除GTP查询问题的其他输入外,还给定可用的不同出行方式及其各自的成本。此时,问题不仅要从各类别中选择PoI,还要选择出行方式。针对该问题,我们提出了一种高效的求解方法,并分析了其时间和空间需求。我们使用真实数据集进行了大量实验并报告了结果。结果表明,与基线方法相比,所提方法推荐的PoI和出行方式所需的时间和成本要少得多。
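The problem statement above can be made concrete with a small brute-force baseline. This is a sketch under simplifying assumptions, not the efficient approach the paper proposes: locations are points in the plane rather than nodes of a road network, categories are visited in the order they are listed, each agent uses one modality for its whole trip, and a modality costs a fixed fare plus a per-unit-distance rate. The name `solve_gtp_multimodal` and all data are hypothetical.

```python
from itertools import product
from math import dist


def path_length(points):
    """Total Euclidean length of a path through the given points."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def solve_gtp_multimodal(agents, pois_by_category, modes):
    """Exhaustively pick one PoI per category and one modality per agent.

    agents: list of (start, destination) point pairs.
    pois_by_category: dict mapping category name -> list of candidate points.
    modes: dict mapping modality name -> (fixed_cost, cost_per_unit_distance).
    Returns (total_cost, chosen_pois, chosen_modes).
    """
    best = None
    for choice in product(*pois_by_category.values()):
        total, agent_modes = 0.0, []
        for start, destination in agents:
            length = path_length([start, *choice, destination])
            # Each agent independently takes its cheapest modality for this route.
            mode, cost = min(
                ((name, fixed + rate * length) for name, (fixed, rate) in modes.items()),
                key=lambda pair: pair[1],
            )
            total += cost
            agent_modes.append(mode)
        if best is None or total < best[0]:
            best = (total, dict(zip(pois_by_category, choice)), agent_modes)
    return best


agents = [((0, 0), (10, 0)), ((0, 2), (10, 2))]
pois_by_category = {"cafe": [(3, 0), (3, 50)], "park": [(7, 0)]}
modes = {"walk": (0.0, 1.0), "bus": (4.0, 0.5)}
total_cost, chosen_pois, chosen_modes = solve_gtp_multimodal(agents, pois_by_category, modes)
```

The search enumerates every combination of PoIs, so its cost grows as the product of the category sizes. That blow-up is the reason an efficient method is needed on real networks.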
Article 89
Title@2025-05-24 (6): TextArena
Title: TextArena | TextArena | TextArenna 文本 2504.11442v2 |
Authors: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan
TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of environments, games, leaderboard, and examples are available on https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
TextArena是用于培训和评价大语言模型中代理人行为的竞争性文字游戏的公开来源汇编,涵盖57+独特环境(包括单玩家、双玩家和多玩家设置),便于通过在线游戏系统(针对人类和其他提交的模型)对模型能力进行评估,并配有实时TeurSkill分数。传统基准很少评估动态社会技能,如谈判、思想理论和欺骗,造成TextArena地址的差距。TextArena用研究、社区和可扩展性设计,TextArena强调增加新游戏、调整框架、测试模型、对照模型进行游戏和培训模型的方便性。关于环境、游戏、领导板和实例的详细文件可在https://github./com/LeonGuertler/TextArena和https://www.textarena.ai/上查阅。
Article 90
Title@2025-05-24 (6): EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks
Title: EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks | EdgeAgentX: Ein neuartiges Framework für Agentische KI am Rand in militärischen Kommunikationsnetzwerken | EdgeAgengengenderX:军事通信网络边缘地带AAA剂性AI新框架 2505.18457v1 |
Authors: Abir Ray
This paper introduces EdgeAgentX, a novel framework integrating federated learning (FL), multi-agent reinforcement learning (MARL), and adversarial defense mechanisms, tailored for military communication networks. EdgeAgentX significantly improves autonomous decision-making, reduces latency, enhances throughput, and robustly withstands adversarial disruptions, as evidenced by comprehensive simulations.
本文介绍了EdgeAgentix(EdgeAgentix ) , 这是一个整合联邦学习(FL ) 、 多剂强化学习(MARL ) 和对抗性防御机制的新框架,专门为军事通信网络设计。 全面模拟证明,EdgeAgentiX(EdgeAgentix)极大地改进了自主决策,减少了潜伏,提高了吞吐量,并有力地抵御了对抗性干扰。
Article 91
Title@2025-05-24 (6): Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning
Title: Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning | Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methoden für dezentralisiertes Mehr-Agenten-Verstärkungs-Lernen | 分散式多机构强化学习的深神经立体-集中式多机构强化学习方法中全球最佳程度趋同 2505.18433v1 |
Authors: Zhiyao Zhang, Myeung Suk Oh, FNU Hairi, Ziyue Luo, Alvaro Velasquez, Jia Liu
Actor-critic methods for decentralized multi-agent reinforcement learning (MARL) facilitate collaborative optimal decision making without centralized coordination, thus enabling a wide range of applications in practice. To date, however, most theoretical convergence studies for existing actor-critic decentralized MARL methods are limited to the guarantee of a stationary solution under linear function approximation. This leaves a significant gap between the highly successful use of deep neural actor-critic methods for decentralized MARL in practice and the current theoretical understanding. To bridge this gap, in this paper, we make the first attempt to develop a deep neural actor-critic method for decentralized MARL, where both the actor and critic components are inherently non-linear. We show that our proposed method enjoys a global optimality guarantee with a finite-time convergence rate of O(1/T), where T is the total number of iterations. This marks the first global convergence result for deep neural actor-critic methods in the MARL literature. We also conduct extensive numerical experiments, which verify our theoretical results.
分散多剂加固强化学习(MARL)的操作-批评方法(MARL)有助于在没有集中协调的情况下进行协作的最佳决策,从而在实践上能够实现广泛的应用。但是,迄今为止,对现有行为者-批评者分散的MARL方法的大多数理论趋同研究都限于在线性功能近似下保证固定的解决办法。这在实际中非常成功地使用深神经行为者-批评方法将MARL分散化与目前的理论理解之间留下了巨大差距。为了缩小这一差距,我们本文件首次试图为分散的MARL开发一种深神经行为者-批评方法,在这种方法中,行为者和评论家的成分本来就是非线性。我们表明,我们所提议的方法具有全球最佳性保证,有O(1/T)的有限时间趋同率,而T是总循环时间。这标志着MAR文献中深海神经行为者-批评方法的第一个全球趋同结果。我们还进行广泛的数字实验,以核实我们的理论结果。
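A finite-time rate of O(1/T) is conventionally read as a bound of the following form. The symbols $J^{*}$ (optimal objective value), $\bar{J}_{T}$ (objective value attained after $T$ iterations) and the constant $C$ are notation assumed here for illustration. The abstract does not say which optimality measure the authors bound, so the exact quantity in the paper may differ.

```latex
J^{*} - \bar{J}_{T} \;\le\; \frac{C}{T}
\quad\Longrightarrow\quad
J^{*} - \bar{J}_{T} \le \varepsilon
\ \text{ whenever } \
T \ge \frac{C}{\varepsilon}.
```

In other words, under such a bound, halving the target gap $\varepsilon$ roughly doubles the number of iterations required.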