cs.MA @ 2025-07-25: 059
-
00 07-24 (4) Moving Out: Physically-grounded Human-AI Collaboration Ausstieg: physikalisch begründete Mensch-AI-Kollaboration 搬出:基于物理的人类-AI协作 2507.18623v1 -
01 07-24 Remembering the Markov Property in Cooperative MARL Erinnerung an das Markov-Grundstück in der Genossenschaft MARL 记得马尔科夫在MARL合作社中的财产 2507.18333v1 -
02 07-24 Designing Value-Aligned Traffic Agents through Conflict Sensitivity Gestaltung wertorientierter Verkehrsagenten durch Konfliktsensitivität 通过冲突敏感性设计符合价值的交通代理 2507.18284v1 -
03 07-24 Compositional Coordination for Multi-Robot Teams with Large Language Models Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen 具有大语言模式的多机器人小组的组成协调 2507.16068v2 -
04 07-24 Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation Assemble Your Crew: Automatisches Multi-Agenten-Kommunikationstopologie-Design über autoregressive Graphen-Generierung 组建你的团队:通过自回归图生成自动设计多智能体通信拓扑 2507.18224v1 -
05 07-24 A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms Eine differenzierte Prämienmethode für verstärktes Lernen auf der Grundlage von Multi-Fahrzeug-Kooperativen-Entscheidungs-Making-Algorithmen 基于多维合作社决策的强化学习有区别的奖励方法 2502.00352v2 -
06 07-24 Multi-Agent Guided Policy Optimization Multi-Agent gesteuerte Politikoptimierung 多边机构引导政策优化政策 2507.18059v1 -
07 07-23 (3) Rapid Modeling Architecture for Lightweight Simulator to Accelerate and Improve Decision Making for Industrial Systems Schnelle Modellierungsarchitektur für leichte Simulatoren zur Beschleunigung und Verbesserung der Entscheidungsfindung für industrielle Systeme 加快和改进工业系统决策的轻型模拟器快速建模架构 2507.17990v1 -
08 07-23 Learning in Conjectural Stackelberg Games Lernen in Conjectural Stackelberg Spiele 在Conjectural Stackelberg博弈中学习 2501.13686v3 -
09 07-23 Fair Compromises in Participatory Budgeting: a Multi-Agent Deep Reinforcement Learning Approach Faire Kompromisse bei der partizipativen Budgetierung: ein multi-agent-basierter Lernansatz zur Vertiefung der Stärkung 参与性预算编制的公平折衷:多机构机构深入强化学习方法 2507.17433v1 -
10 07-23 Agent Identity Evals: Measuring Agentic Identity Agent Identity Evals: Messung Agentischer Identität Evals: 测量制剂身份 2507.17257v1 -
11 07-23 Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to Equilibria Entdauern Minimierung in Population Network Games: Verschwundene Heterogenität und Konvergenz zu Equilibria 人口网络运动会的遗憾最小化:消除异异质性和融合到平衡 2507.17183v1 -
12 07-23 Adaptive Graph Pruning for Multi-Agent Communication Adaptives Graph Pruning für Multi-Agent Kommunikation 多机构通信调节图 2506.02951v3 -
13 07-23 Resilient Multi-Agent Negotiation for Medical Supply Chains:Integrating LLMs and Blockchain for Transparent Coordination Resiliente Multi-Agent-Verhandlung für medizinische Lieferketten:Integration von LLMs und Blockchain für transparente Koordination 关于医疗供应链的具有弹性的多机构谈判:整合LLMM和透明协调的链锁 2507.17134v1 -
14 07-23 Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning Gemeinsame Fußgänger- und Fahrzeugverkehrsoptimierung in städtischen Umgebungen mittels Verstärkungslernen 利用强化学习在城市环境中联合优化步行和车辆交通 2504.05018v2 -
15 07-22 (2) Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems Parallelität trifft auf Anpassungsfähigkeit: Skalierbare Dokumente verstehen in Multi-Agent LLM-Systemen 适应性:多机构LLM系统中可缩放文件理解 2507.17061v1 -
16 07-22 AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation AURA: Multi-Modal Medical Agent für Verständnis, Vernunft und Annotation AURA:一个多模式医疗代理,用于理解、说明理由和说明 2507.16940v1 -
17 07-22 From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks Von der Homöostase bis zur Ressourcenteilung: Biologisch und wirtschaftlich ausgerichtete multi-objektive Multi-Agenten-KI-Sicherheits-Benchmarks 从原生状态到资源共享:生物和经济上一致的多目标多试剂AI安全基准 2410.00081v4 -
18 07-22 Smooth Games of Configuration in the Linear-Quadratic Setting Glatte Spiele der Konfiguration in der linearen-Quadrat-Einstellung 线性二次曲线设置中的配置平滑游戏 2507.16611v1 -
19 07-22 Low complexity convergence rate bounds for push-sum algorithms with homogeneous correlation structure Grenzen der Konvergenzrate geringer Komplexität für Push-Summe-Algorithmen mit homogener Korrelationsstruktur 低复杂合并率的低复杂合并率约束值,用于具有同质相关结构的推算算法-总算算法 2507.16601v1 -
20 07-22 Budget Allocation Policies for Real-Time Multi-Agent Path Finding Budgetzuweisungsrichtlinien für die Echtzeit-Multi-Agent-Pfadsuche 实时多机构道路寻找的预算拨款政策 2507.16874v1 -
21 07-22 COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network COMPASS: Kooperatives Multi-Agenten-Persistenz-Monitoring mit Spatio-Temporal Attention Network COMPASS:利用斯帕蒂奥-时地注意网络进行多主动合作性持久性监测 2507.16306v1 -
22 07-22 Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping Multi-Agenten-Verstärkungs-Lernen für stichprobeneffiziente Tiefen-Neural-Netzwerk-Mapping 用于抽样有效深神经网络绘图的多机构强化学习 2507.16249v1 -
23 07-22 Unbeatable imitation of a friend Unschlagbare Nachahmung eines Freundes 对朋友的无敌模仿 2507.16221v1 -
24 07-22 Heterogeneous Mixed Traffic Control and Coordination Heterogene gemischte Verkehrssteuerung und -koordinierung 异异混合混合交通控制和协调 2409.12330v2 -
25 07-22 Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v3 -
26 07-21 (1) AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds KI-getriebene Orchestrierung im Maßstab: Bewertung von Service-Metriken auf national-breiten Testbeds AI驱动的规模化编排:在全国性测试平台上估算服务指标 2507.16077v1 -
27 07-21 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra LLM Economist: Große Bevölkerungsmodelle und Mechanism Design in Multi-Agent Generative Simulacra LLM 经济学家:多机构生成模拟中大型人口模型和机制设计 2507.15815v1 -
28 07-21 Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems Wettbewerbsfähige Algorithmen für kooperative Multi-Agenten-Ski-Mietprobleme 合作性多机构天空-天空问题的竞争价值 2507.15727v1 -
29 07-21 Asynchronous Collective Tree Exploration: a Distributed Algorithm, and a new Lower Bound Asynchronous Collective Tree Exploration: ein verteilter Algorithmus und ein neuer Lower Bound 无同步集体树木勘探:分配的数值和新的下层圆环 2507.15658v1 -
30 07-21 Preventing Rogue Agents Improves Multi-Agent Collaboration Verhindern von Rogue-Agenten verbessert Multi-Agenten-Kollaboration 防止流氓代理,改进多机构协作 2502.05986v2 -
31 07-21 HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics HAMLET: Hyperadaptive agentenbasierte Modellierung für lebend-embod Theatrics HAMLET:基于超适应性制剂的活体编织戏剧模型模型 2507.15518v1 -
32 07-24 (4) Recognizing and Eliciting Weakly Single Crossing Profiles on Trees Erkennen und Elizitieren von schwachen einzelnen Kreuzungsprofilen auf Bäumen 承认树树和树的脆弱单一交叉概况 1611.04175v4 -
33 07-21 (1) MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation MobileUse: Ein GUI-Agent mit Hierarchischer Reflexion für autonomen mobilen Betrieb 移动用途: 自主移动行动等级反射的图形界面代理 2507.16853v1 -
34 07-21 One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms Ein Schritt ist genug: Multi-Agenten-Verstärkung-Lernen basierend auf One-Step-Politikoptimierung für Order Dispatch auf Ride-Sharing-Plattformen 第一步就足够了:以单步政策优化为基础,开展多机构强化学习,以发出分流平台命令 2507.15351v1 -
35 07-21 IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry IM-Chat: Ein LLM-basierter Rahmen für den Wissenstransfer in der Spritzgießindustrie IM-Chat:一个基于多剂LLM的注射诱导业知识转让框架 2507.15268v1 -
36 07-21 Advancing Responsible Innovation in Agentic AI: A study of Ethical Frameworks for Household Automation Advancing Responsible Innovation in Agentic AI: Eine Studie über ethische Rahmenbedingungen für die Haushaltsautomatisierung 推进AI:家庭自动化道德框架研究 2507.15901v1 -
37 07-20 (7) STL-GO: Spatio-Temporal Logic with Graph Operators for Distributed Systems with Multiple Network Topologies STL-GO: Spatio-Temporale Logik mit Graph Operatoren für verteilte Systeme mit mehreren Netzwerktopologien STL-GO: 与具有多网络地形分布式分布式系统的图表操作员一起的时空空间逻辑 2507.15147v1 -
38 07-20 Can We Move Freely in NEOM’s The Line? An Agent-Based Simulation of Human Mobility in a Futuristic Smart City Können wir uns in der Linie von NEOM frei bewegen? Eine agentenbasierte Simulation menschlicher Mobilität in einer futuristischen Smart City 我们可以在近地物体M的线上自由移动吗? 2507.15143v1 -
39 07-20 EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems EduThink4AI: Übersetzen des pädagogisch-kritischen Denkens in multi-agente LLM-Systeme EduThink4AI:将教育关键思想转换成多机构LLM系统 2507.15015v1 -
40 07-20 LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading LLM-erweitertes Multi-Agenten-Verstärkungs-Lernen mit Experten-Workflow für Echtzeit-P2P-Energiehandel 与实时P2P能源贸易专家工作流程一起加强多机构强化学习 2507.14995v1 -
41 07-20 AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction AutoGen Driven Multi Agent Framework für iterative Kriminalität Datenanalyse und Vorhersage 循环犯罪数据分析和预测自动驱动器多剂框架 2506.11475v2 -
42 07-19 (6) Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence Lernen zur Kommunikation im Mehr-Agenten-Verstärkungs-Lernen für die autonome Cyber-Verteidigung 学习多机构强化学习,以交流多机构强化学习,促进自动网络防御 2507.14658v1 -
43 07-19 Strategyproofness and Monotone Allocation of Auction in Social Networks Strategyproofness und Monotone Allokation von Auktionen in sozialen Netzwerken 社会网络拍卖的策略防战略和单调分配 2507.14472v1 -
44 07-19 Approximate Revenue Maximization for Diffusion Auctions Ungefähre Umsatzmaximierung für Diffusionsauktionen 传播拍卖收入的接近最大化 2507.14470v1 -
45 07-19 Learning in Strategic Queuing Systems with Small Buffers Lernen in strategischen Queuing-Systemen mit kleinen Puffern 战略排队系统与小缓冲的学习 2502.08898v2 -
46 07-19 DHLight: Multi-agent Policy-based Directed Hypergraph Learning for Traffic Signal Control DHLight: Multi-Agent Policy-based Directed Hypergraph Learning for Traffic Signal Control DHLight:多代理人基于政策的指导电报学习用于交通信号控制 2409.05037v2 -
47 07-18 (5) Technical Implementation of Tippy: Multi-Agent Architecture and System Design for Drug Discovery Laboratory Automation Technische Umsetzung von Tippy: Multi-Agent Architektur und Systemdesign für Drug Discovery Laborautomation Tippy:药物发现实验室自动化多机构建筑和系统设计技术实施 2507.17852v1 -
48 07-18 Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v2 -
49 07-18 A Minimalist Controller for Autonomously Self-Aggregating Robotic Swarms: Enabling Compact Formations in Multitasking Scenarios Minimalistische Steuerung für autonom selbstaggregierende Roboterschwärme: Ermöglichung kompakter Formationen in Multitasking-Szenarien 自主自我集中的机械式摇篮:多任务情景中有利的契约形式 2507.13969v1 -
50 07-18 Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts Beyond DNS: Entsperren des Internets von KI-Agenten über den NANDA Index und verifizierte AgentFacts 超越DNS:通过NANDA指数和经核实的代理活动解锁AI代理商的互联网 2507.14263v1 -
51 07-18 Towards Regulated Deep Learning Auf dem Weg zu reguliertem Deep Learning 走向监管的深学习 1912.13122v8 -
52 07-18 Scalable Submodular Policy Optimization via Pruned Submodularity Graph Skalierbare submodulare Optimierung der Politik über Pruned Submodularity Graph 通过审慎次模块图实现可缩放子模块政策优化 2507.13834v1 -
53 07-18 CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education CodeEdu: Eine Multi-Agenten-Kollaborative Plattform für personalisierte Coding-Bildung CodeEdu:个人化编码教育多机构合作平台 2507.13814v1 -
54 07-18 Acceleration of Gossip Algorithms through the Euler-Poisson-Darboux Equation Beschleunigung der Gossip-Algorithmen durch die Euler-Poisson-Darboux-Gleichung 通过Euler-Poisson-Darboux赤道加速戈斯普算法 2202.10742v2 -
55 07-18 From Firms to Computation: AI Governance and the Evolution of Institutions Von Unternehmen zur Berechnung: KI-Governance und die Entwicklung von Institutionen 从公司到计算:AI 治理和机构演变 2507.13616v1 -
56 07-17 (4) Nash equilibrium seeking for a class of quadratic-bilinear Wasserstein distributionally robust games Nash Gleichgewicht Suche nach einer Klasse von quadratisch-bilinearen Wasserstein Verteilung robusten Spielen Nash 均衡, 寻求类二次- 贝里尼奥尔 瓦西斯坦分配强强的游戏 2411.09636v2 -
57 07-17 Coral Protocol: Open Infrastructure Connecting The Internet of Agents Coral Protocol: Open Infrastructure Connecting Das Internet der Agenten 珊瑚议定书:开放基础设施连接代理物互联网 2505.00749v2 -
58 07-17 Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning Nachahmen von Fehlern in einem Learning Companion KI Agent für Online Peer Learning 模拟学习伙伴AI在线同行学习代理的错误 2507.12801v1
Article 0
Title@2025-07-24 (4): Moving Out: Physically-grounded Human-AI Collaboration
Title: Moving Out: Physically-grounded Human-AI Collaboration | Ausstieg: physikalisch begründete Mensch-AI-Kollaboration | 搬出:基于物理的人类-AI协作 2507.18623v1 |
Authors (5): Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo
The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
适应环境中的物理动作和限制的能力,对于内装物剂(如机器人)有效与人类合作至关重要。这种基于物理的人类-AI合作必须说明持续的国家-行动空间和因物理限制造成的受限动态的日益复杂性。在本文中,我们引入了一个新的人类-AI合作基准,类似于受物理属性和限制影响的广泛合作模式,例如将重物品一起移动,并保持一致的行动,以移动一个大项目。我们利用“搬出去”设计了两项任务,并收集了人与人的互动数据,以评价模型适应不同人类行为和看不见的物理特征的能力。为了应对物理环境中的挑战,我们提出了一种新颖的方法,即“BASS”(Behavior 增强、模拟和选择),以加强代理人的多样性和他们对行动结果的理解。我们的实验显示,BASS在AI-AI和人类-AI的协作中超越了“艺术”的状态模式。项目页面可在以下网址查阅:https://live-robotics-uva.github.io/movingout_ai/
Article 1
Title@2025-07-24 (4): Remembering the Markov Property in Cooperative MARL
Title: Remembering the Markov Property in Cooperative MARL | Erinnerung an das Markov-Grundstück in der Genossenschaft MARL | 记得马尔科夫在MARL合作社中的财产 2507.18333v1 |
Authors (5): Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey
Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents’ behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.
合作性多试剂强化学习(MARL)通常被正规化为分散化部分可观测的马尔科夫决定程序(Dec-POMDP),代理商必须了解环境和其他代理商的行为。实际上,目前的无模型的MARL算法使用简单的经常性功能相近器来应对对使用部分信息的其他人进行推理的挑战。在本立场文件中,我们争辩说,这些方法的成功经验不是由于有效的Markov信号恢复,而是因为学习绕过环境观测和记忆的简单公约。我们通过有针对性的案例研究,表明共同适应的代理商可以学习易碎的公约,而当与非适应剂合作时,这些公约就会失败。至关重要的是,在任务设计需要时,同样的模型可以学习基于基础的政策,表明这个问题不是学习模式的基本限制,而是基准设计失败。我们的分析还表明,现代的MARL环境可能无法充分测试Dec-POMDPs的核心假设。我们因此倡导基于两个核心原则的新的合作环境:(1)基于观察的行为和(2)基于记忆的推理,而不是基于其他脆弱代理商的真正成功的技能。
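For readers less familiar with the formalism named in this abstract, a Dec-POMDP is commonly written as the tuple below. The notation follows the usual textbook convention and is an assumption for illustration; it is not taken from the paper itself.

```latex
\[
\mathcal{M} \;=\; \big\langle \mathcal{N},\; \mathcal{S},\; \{\mathcal{A}_i\}_{i\in\mathcal{N}},\; P,\; R,\; \{\Omega_i\}_{i\in\mathcal{N}},\; O,\; \gamma \big\rangle
\]
% N: set of agents;  S: states;  A_i: actions of agent i;  Omega_i: observations of agent i
% P(s' | s, a): state transition under the joint action a = (a_1, ..., a_n)
% R(s, a): a single reward shared by all agents (the cooperative setting)
% O(o | s', a): probability of the joint observation o = (o_1, ..., o_n)
% gamma: discount factor
\[
\pi_i(a_i \mid h_i), \qquad h_i = (o_i^{0}, a_i^{0}, \ldots, o_i^{t})
\]
% Each agent acts on its own action-observation history h_i, not on the state s.
```

Under this reading, "Markov signal recovery" would mean that an agent's memory of h_i recovers enough about s to act well, which is the property the authors argue current benchmarks do not actually require.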
Article 2
Title@2025-07-24 (4): Designing Value-Aligned Traffic Agents through Conflict Sensitivity
Title: Designing Value-Aligned Traffic Agents through Conflict Sensitivity | Gestaltung wertorientierter Verkehrsagenten durch Konfliktsensitivität | 通过冲突敏感性设计符合价值的交通代理 2507.18284v1 |
Authors (5): Astrid Rakow, Joe Collenette, Maike Schwammberger, Marija Slavkovik, Gleifer Vs Alves
Autonomous traffic agents (ATAs) are expected to act in ways that are not only safe, but also aligned with stakeholder values across legal, social, and moral dimensions. In this paper, we adopt an established formal model of conflict from epistemic game theory to support the development of such agents. We focus on value conflicts (situations in which agents face competing goals rooted in value-laden situations) and show how conflict analysis can inform key phases of the design process. This includes value elicitation, capability specification, explanation, and adaptive system refinement. We elaborate and apply the concept of Value-Aligned Operational Design Domains (VODDs) to structure autonomy in accordance with contextual value priorities. Our approach shifts the emphasis from solving moral dilemmas at runtime to anticipating and structuring value-sensitive behaviour during development.
自主交通代理商(ATAs)的行事方式不仅安全,而且符合法律、社会和道德方面利益攸关方的价值观; 在本文件中,我们采用了一个既定的正式冲突模式,从迷你游戏理论到支持此类代理商的发展; 我们侧重于价值冲突情况,即代理商面临源于价值拉累情况的相互竞争的目标,并表明冲突分析如何为设计过程的关键阶段提供信息;这包括价值采集、能力规格、解释和适应性系统完善; 我们制定并应用价值统一操作设计域的概念,以根据背景价值优先事项构建自主性; 我们的方法将重点从在运行时解决道德困境转向在发展过程中预测和构建对价值敏感的行为。
Article 3
Title@2025-07-24 (4): Compositional Coordination for Multi-Robot Teams with Large Language Models
Title: Compositional Coordination for Multi-Robot Teams with Large Language Models | Kompositionskoordination für Multi-Roboter-Teams mit großen Sprachmodellen | 具有大语言模式的多机器人小组的组成协调 2507.16068v2 |
Authors (5): Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
多机器人协调历来依赖一个特派团专用和专家驱动的管道,其中自然语言任务说明由域专家人工转换成数学配制、算法设计和可执行代码。这一常规过程是劳动密集型的,非专家无法使用,无法灵活地适应任务要求的变化。在这里,我们提议使用LAN2CB(集体行为语言至集体行为),这是一个利用大型语言模式简化和普及多机器人协调管道的新框架。 LAN2CB将自然语言(NL)任务说明转换成多机器人系统可执行的 Python 代码,通过两个核心模块:(1) 任务分析,将任务描述划为行为树,和(2) 代码生成,利用行为树和结构知识库生成机器人控制代码。我们进一步引入一套自然语言任务说明数据集,以支持发展和基准化。在模拟和现实世界环境中进行的实验表明,LAN2CB使多机器人系统系统的描述能够从自然语言中实现可靠和灵活的多机器人协调,大大减少了手工工程努力,并支持了不同类型任务的一般化。网站:https://sites.google.com/view/lan-cb
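The abstract says mission descriptions are parsed into behavior trees before code is generated. As a rough illustration of what such a tree looks like, here is a minimal Python sketch. The node classes, the skill functions (go_to_target, search, grasp) and the world dictionary are all hypothetical and are not LAN2CB's actual representation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Action:
    """Leaf node wrapping a (hypothetical) robot skill that returns True/False."""
    name: str
    skill: Callable[[dict], bool]

    def tick(self, world: dict) -> Status:
        return Status.SUCCESS if self.skill(world) else Status.FAILURE


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""
    children: List = field(default_factory=list)

    def tick(self, world: dict) -> Status:
        for child in self.children:
            if child.tick(world) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Succeeds as soon as one child succeeds, evaluated left to right."""
    children: List = field(default_factory=list)

    def tick(self, world: dict) -> Status:
        for child in self.children:
            if child.tick(world) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


# Hypothetical skills operating on a toy world state.
def go_to_target(world: dict) -> bool:
    world["at_target"] = True
    return True


def search(world: dict) -> bool:
    world["object_visible"] = True
    return True


def grasp(world: dict) -> bool:
    if world.get("at_target") and world.get("object_visible"):
        world["holding"] = True
        return True
    return False


# "Go to the target, then pick up the object; if it is not visible, search first."
mission = Sequence([
    Action("go_to_target", go_to_target),
    Fallback([
        Action("grasp", grasp),
        Sequence([Action("search", search), Action("grasp", grasp)]),
    ]),
])

world = {"object_visible": False}
result = mission.tick(world)
```

Representing the mission as a tree of this kind keeps the control flow explicit, which is presumably what makes it a convenient intermediate form between free text and generated code.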
Article 4
Title@2025-07-24 (4): Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation
Title: Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation | Assemble Your Crew: Automatisches Multi-Agenten-Kommunikationstopologie-Design über autoregressive Graphen-Generierung | 组建你的团队:通过自回归图生成自动设计多智能体通信拓扑 2507.18224v1 |
Authors (5): Shiyuan Li, Yixin Liu, Qingsong Wen, Chengqi Zhang, Shirui Pan
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. The effectiveness of MAS is critically dependent on its collaboration topology, which has become a focal point for automated design research. However, existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures, significantly limiting their adaptability to task-specific requirements. To address these limitations, we reframe MAS design as a conditional autoregressive graph generation task, where both the system composition and structure are designed jointly. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch. Conditioned on a natural language task query, ARG-Designer sequentially and dynamically determines the required number of agents, selects their appropriate roles from an extensible pool, and establishes the optimal communication links between them. This generative approach creates a customized topology in a flexible and extensible manner, precisely tailored to the unique demands of different tasks. Extensive experiments across six diverse benchmarks demonstrate that ARG-Designer not only achieves state-of-the-art performance but also enjoys significantly greater token efficiency and enhanced extensibility. The source code of ARG-Designer is available at https://github.com/Shiy-Li/ARG-Designer.
以大型语言模型为基础的多试剂系统(MAS)已经成为处理不同领域复杂问题的有力解决办法,而MAS的有效性则主要取决于其协作型态,而后者已成为自动化设计研究的协调中心,然而,现有办法基本上受到制约,因为它们依赖一个模板图修改模式,其中含有一套预先定义的代理和硬编码的互动结构,大大限制其适应特定任务要求的能力。为了解决这些限制,我们重新将MAS设计作为有条件的自动递增图形生成任务,其中系统构成和结构是联合设计的。我们建议了ARG-Deleter,这是一种新的自动递增模式,从零开始构建合作图案,使这一模式运作起来。在自然语言任务查询、ARG-Deder按顺序和动态确定所需数量,从一个可扩展的集合中选择其适当作用,并在它们之间建立最佳的通信联系。这种归正式方法以灵活和可扩展的方式创建了一种定制的表层,不完全适应不同任务的独特要求。在六种不同任务上进行广泛的跨比例实验,在六个不同层次上也具有更高的业绩基准。
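The generation loop described in this abstract (pick the next agent role, then decide its links to agents already in the graph, and stop when enough agents exist) can be sketched as follows. This is only a structural illustration: the role pool, the stop token and the random choices are hypothetical stand-ins for a learned model, and the stand-in ignores the task query that the real method conditions on.

```python
import random
from typing import List, Tuple

ROLE_POOL = ["planner", "coder", "critic", "summarizer"]  # hypothetical roles
STOP = "<stop>"  # hypothetical end-of-graph token


def generate_topology(task: str, max_agents: int = 4, seed: int = 0
                      ) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Build a communication graph one node at a time.

    A learned model would score roles and edges given `task` and the partial
    graph; here a seeded random generator stands in for those scores.
    """
    rng = random.Random(seed)
    nodes: List[str] = []
    edges: List[Tuple[int, int]] = []
    while len(nodes) < max_agents:
        # Node step: choose the next role, or stop once at least one agent exists.
        choices = ROLE_POOL + ([STOP] if nodes else [])
        role = rng.choice(choices)
        if role == STOP:
            break
        new_idx = len(nodes)
        nodes.append(role)
        # Edge step: decide, for each earlier agent, whether it sends to the new one.
        for prev in range(new_idx):
            if rng.random() < 0.5:
                edges.append((prev, new_idx))
    return nodes, edges


nodes, edges = generate_topology("summarize a research paper", seed=1)
```

Because every edge points from an earlier node to a later one, the result is always a directed acyclic graph, so messages can be passed in generation order.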
Article 5
Title@2025-07-24 (4): A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms
Title: A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms | Eine differenzierte Prämienmethode für verstärktes Lernen auf der Grundlage von Multi-Fahrzeug-Kooperativen-Entscheidungs-Making-Algorithmen | 基于多维合作社决策的强化学习有区别的奖励方法 2502.00352v2 |
Authors (4): Ye Han, Lijun Zhang, Dejian Meng, Zhuang Zhang
Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop, but it still faces challenges such as low sample efficiency. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state transition gradient information into the reward design by analyzing traffic flow characteristics, aiming to optimize action selection and policy learning in multi-vehicle cooperative decision-making. The performance of the proposed method is validated in RL algorithms such as MAPPO, MADQN, and QMIX under varying autonomous vehicle penetration. The results show that the differentiated reward method significantly accelerates training convergence and outperforms centering reward and others in terms of traffic efficiency, safety, and action rationality. Additionally, the method demonstrates strong scalability and environmental adaptability, providing a novel approach for multi-agent cooperative decision-making in complex traffic scenarios.
强化学习(RL)显示出通过州-行动回报反馈循环优化多车辆合作驱动战略的巨大潜力,但仍然面临样本效率低等挑战。本文件提出基于稳定状态过渡制度的有区别奖励方法,通过分析交通流量特点将国家过渡梯度信息纳入奖励设计,目的是在多车辆合作决策中优化行动选择和政策学习。拟议方法的绩效在州-行动回报反馈循环(MAPO、MADQN和QMIX)等RL算法中被验证,在不同的机动车辆自主渗透下得到验证。结果显示,有区别的奖励方法大大加快了培训的趋同,在交通效率、安全和行动合理性方面超越了奖励的核心。此外,该方法展示了强大的可伸缩性和环境适应性,为复杂交通情况中的多剂合作决策提供了新的方法。
Article 6
Title@2025-07-24 (4): Multi-Agent Guided Policy Optimization
Title: Multi-Agent Guided Policy Optimization | Multi-Agent gesteuerte Politikoptimierung | 多边机构引导政策优化政策 2507.18059v1 |
Authors (3): Yueheng Li, Guangming Xie, Zongqing Lu
Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL). However, existing CTDE methods often underutilize centralized training or lack theoretical guarantees. We propose Multi-Agent Guided Policy Optimization (MAGPO), a novel framework that better leverages centralized training by integrating centralized guidance with decentralized execution. MAGPO uses an auto-regressive joint policy for scalable, coordinated exploration and explicitly aligns it with decentralized policies to ensure deployability under partial observability. We provide theoretical guarantees of monotonic policy improvement and empirically evaluate MAGPO on 43 tasks across 6 diverse environments. Results show that MAGPO consistently outperforms strong CTDE baselines and matches or surpasses fully centralized approaches, offering a principled and practical solution for decentralized multi-agent learning. Our code and experimental data can be found in https://github.com/liyheng/MAGPO.
由于一些实际制约因素,如部分可观性和有限的沟通,集中化执行培训已成为多机构强化学习合作(MARL)的主要模式,然而,现有的中央化培训方法往往没有充分利用集中化培训,或缺乏理论保障。我们建议多机构引导政策优化(MAGPO)这个新的框架,通过将集中化指导与分散执行相结合,更好地利用集中化培训。MAGPO采用自动回归式联合政策,进行可扩展、协调的探索,并明确将其与分散化政策保持一致,以确保在局部易懂性下部署性。我们从理论上保证单方政策改进,并实证地评价在六个不同环境中43项任务的宏观化政策。结果显示,MAGPO一贯地超越强大的CTDE基线,并匹配或超过完全集中化的方法,为分散化多机构学习提供了原则性和实用的解决方案。我们的代码和实验数据见https://github.com/liyheng/MAGPO。
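The "auto-regressive joint policy" in this abstract can be read against the standard chain-rule factorization below. The notation (n agents, joint action a = (a_1, ..., a_n), state s, local observation o_i) is an assumption for illustration; the abstract does not state MAGPO's exact objective, so only the two policy classes being aligned are shown.

```latex
% Centralized, auto-regressive joint policy: agent i conditions on the state
% and on the actions already chosen by agents 1, ..., i-1.
\[
\pi_{\mathrm{AR}}(a \mid s) \;=\; \prod_{i=1}^{n} \pi_{\mathrm{AR}}^{\,i}\!\left(a_i \mid s,\, a_{1:i-1}\right)
\]
% Decentralized policies used at execution time: each agent sees only its own
% observation, so the joint policy is a product of independent factors.
\[
\pi_{\mathrm{dec}}(a \mid o_1, \ldots, o_n) \;=\; \prod_{i=1}^{n} \pi_i\!\left(a_i \mid o_i\right)
\]
```

The first form can represent coordinated joint behaviour that the second cannot, which is why an explicit alignment step is needed before the learned behaviour can be deployed under partial observability.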
Article 7
Title@2025-07-23 (3): Rapid Modeling Architecture for Lightweight Simulator to Accelerate and Improve Decision Making for Industrial Systems
Title: Rapid Modeling Architecture for Lightweight Simulator to Accelerate and Improve Decision Making for Industrial Systems | Schnelle Modellierungsarchitektur für leichte Simulatoren zur Beschleunigung und Verbesserung der Entscheidungsfindung für industrielle Systeme | 加快和改进工业系统决策的轻型模拟器快速建模架构 2507.17990v1 |
Authors (2): Takumi Kato, Zhi Li Hu
Designing industrial systems, such as building, improving, and automating distribution centers and manufacturing plants, involves critical decision-making with limited information in the early phases. The lack of information leads to less accurate designs of the systems, which are often difficult to resolve later. It is effective to use simulators to model the designed system and find out the issues early. However, the modeling time required by conventional simulators is too long to allow for rapid model creation to meet decision-making demands. In this paper, we propose a Rapid Modeling Architecture (RMA) for a lightweight industrial simulator that mitigates the modeling burden while maintaining the essential details in order to accelerate and improve decision-making. We have prototyped a simulator based on the RMA and applied it to the actual factory layout design problem. We also compared the modeling time of our simulator to that of an existing simulator, and as a result, our simulator achieved a 78.3% reduction in modeling time compared to conventional simulators.
设计工业系统,如建筑、改进和自动化分销中心和制造厂等设计工业系统,需要在早期阶段以有限的信息进行关键决策;缺乏信息导致系统设计不准确,往往难以在以后解决;使用模拟器模拟设计设计系统并及早发现问题十分有效;然而,常规模拟器所需的模拟时间太长,无法迅速创建模型以满足决策需求;在本文件中,我们提议为轻量级工业模拟器建立一个快速建模结构(RMA),以减轻建模负担,同时保持基本细节,以加快和改进决策;我们根据RMA制作了模拟器,并将其应用于实际的工厂布局设计问题;我们还将模拟器的建模时间与现有模拟器的建模时间作了比较,结果我们的模拟器比常规模拟器减少了78.3%的建模时间。
Article 8
Title@2025-07-23 (3): Learning in Conjectural Stackelberg Games
Title: Learning in Conjectural Stackelberg Games | Lernen in Conjectural Stackelberg Spiele | 在Conjectural Stackelberg博弈中学习 2501.13686v3 |
Authors (3): Francesco Morri, Hélène Le Cadre, Luce Brotcorne
We extend the formalism of Conjectural Variations games to Stackelberg games involving multiple leaders and a single follower. To solve these nonconvex games, a common assumption is that the leaders compute their strategies having perfect knowledge of the follower's best response. However, in practice, the leaders may have little to no knowledge about the other players' reactions. To deal with this lack of knowledge, we assume that each leader can form conjectures about the other players' best responses, and update its strategy relying on these conjectures. Our contributions are twofold: (i) On the theoretical side, we introduce the concept of Conjectural Stackelberg Equilibrium, keeping our formalism conjecture-agnostic, with Stackelberg Equilibrium being a refinement of it. (ii) On the algorithmic side, we introduce a two-stage algorithm with guarantees of convergence, which allows the leaders to first learn conjectures on a training data set, and then update their strategies. Theoretical results are illustrated numerically.
我们把猜想变异游戏的形式主义扩大到涉及多个领导人和一位追随者的斯大克尔贝格游戏。 为了解决这些非confelx游戏,一个共同的假设是,领导人计算其战略时完全了解追随者的最佳反应。 但是,在实践中,领导人可能对其他玩家的反应知之甚少。为了解决这种缺乏知识的问题,我们假设每个领导人都可以对其他玩家的最佳反应进行猜想,并根据这些猜想更新其战略。我们的贡献有两个方面:(一) 在理论方面,我们引入了Conjector Stackelberg Equibilrium的概念 – – 保持我们正式的猜想,而Stakkelberg Equilibrium则是对它的一种改进。 (二) 在算法方面,我们引入了两阶段的算法,保证汇合,使领导人能够首先从一套培训数据中学习猜想,然后更新其战略。理论结果用数字说明。
Article 9
Title@2025-07-23 (3): Fair Compromises in Participatory Budgeting: a Multi-Agent Deep Reinforcement Learning Approach
Title: Fair Compromises in Participatory Budgeting: a Multi-Agent Deep Reinforcement Learning Approach | Faire Kompromisse bei der partizipativen Budgetierung: ein multi-agent-basierter Lernansatz zur Vertiefung der Stärkung | 参与性预算编制的公平折衷:多机构机构深入强化学习方法 2507.17433v1 |
Authors (3): Hugh Adams, Srijoni Majumdar, Evangelos Pournaras
Participatory budgeting is a method of collectively understanding and addressing spending priorities in which citizens vote on how a budget is spent; it is regularly run to improve the fairness of the distribution of public funds. Participatory budgeting requires voters to make decisions on projects, which can lead to "choice overload". A multi-agent reinforcement learning approach to decision support can make decision making easier for voters by identifying voting strategies that increase the winning proportion of their vote. This novel approach can also support policymakers by highlighting aspects of election design that enable fair compromise on projects. This paper presents a novel, ethically aligned approach to decision support using multi-agent deep reinforcement learning modelling. This paper introduces a novel use of a branching neural network architecture to overcome scalability challenges of multi-agent reinforcement learning in a decentralized way. Fair compromises are found through optimising voter actions towards greater representation of voter preferences in the winning set. Experimental evaluation with real-world participatory budgeting data reveals a pattern in fair compromise: that it is achievable through projects with smaller cost.
参与性预算编制是一种集体理解和解决支出优先事项的方法,公民对如何使用预算进行投票,定期进行这种投票是为了提高公共资金分配的公平性;参与性预算编制要求选民就可能导致 “ 选择超负荷 “ 的项目作出决定;多剂强化决策支持学习方法通过确定提高选民得票比例的投票战略,可以使选民的决策更容易进行;这种新颖的方法也可以通过强调选举设计的各个方面,为决策者提供支持,从而在项目上达成公平的妥协;本文件介绍了利用多剂深度强化学习模型支持决策的新颖的、符合道德的方法;本文介绍了一种分支神经网络结构的新用途,以分散方式克服多剂强化学习的可扩展性挑战;通过优化选民行动,在胜票组合中更多地体现选民的偏好,可以找到公平的妥协办法;用现实世界参与性预算编制数据进行的实验性评价揭示了一种公平的妥协模式:通过成本较低的项目可以实现。
Article 10
Title@2025-07-23 (3): Agent Identity Evals: Measuring Agentic Identity
Title: Agent Identity Evals: Measuring Agentic Identity | Agent Identity Evals: Messung Agentischer Identität | Evals: 测量制剂身份 2507.17257v1 |
Authors (2): Elija Perrier, Michael Timothy Bennett
Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs) (statelessness, stochasticity, sensitivity to prompts and linguistic intermediation) which can undermine their identifiability, continuity, persistence and consistency. This attrition of identity can erode their reliability, trustworthiness and utility by interfering with their agentic capabilities such as reasoning, planning and action. To address these challenges, we introduce agent identity evals (AIE), a rigorous, statistically driven, empirical framework for measuring the degree to which an LMA system exhibits and maintains its agentic identity over time, including its capabilities, properties and ability to recover from state perturbations. AIE comprises a set of novel metrics which can integrate with other measures of performance, capability and agentic robustness to assist in the design of optimal LMA infrastructure and scaffolding such as memory and tools. We set out formal definitions and methods that can be applied at each stage of the LMA life-cycle, and worked examples of how to apply them.
语言模型代理人(LMAs)的代理能力和可信度的核心是它们长期保持稳定、可靠的身份特性的程度;然而,LMAs继承了大型语言模型(LLMs)的病理(无状态性、随机性、对提示的敏感性和语言中介性),这些病理会损害其可识别性、连续性、持久性和一致性。这种身份特征的消减会通过干扰其推理、规划和行动等代理能力而损害其可靠性、可信赖性和实用性。为了应对这些挑战,我们引入了agent identity evals(AIE),这是一个严格的、统计驱动的、经验性的框架,用以衡量LMA系统长期展示并保持其代理身份的程度,包括它们的能力、特性和从状态扰动中恢复的能力。AIE包含一套新的衡量标准,可以与其他性能、能力和代理稳健性衡量标准相结合,协助设计最佳的LMA基础设施和脚手架,例如记忆和工具。我们制定了可在LMA生命周期每个阶段应用的正式定义和方法,并给出了如何应用它们的实例。
Article 11
Title@2025-07-23 (3): Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to Equilibria
Title: Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to Equilibria | Entdauern Minimierung in Population Network Games: Verschwundene Heterogenität und Konvergenz zu Equilibria | 人口网络运动会的遗憾最小化:消除异异质性和融合到平衡 2507.17183v1 |
Authors (7): Die Hu, Shuyue Hu, Chunjiang Mu, Shiqi Fan, Chen Chu, Jinzhuo Liu, Zhen Wang
Understanding and predicting the behavior of large-scale multi-agents in games remains a fundamental challenge in multi-agent systems. This paper examines the role of heterogeneity in equilibrium formation by analyzing how smooth regret-matching drives a large number of heterogeneous agents with diverse initial policies toward unified behavior. By modeling the system state as a probability distribution of regrets and analyzing its evolution through the continuity equation, we uncover a key phenomenon in diverse multi-agent settings: the variance of the regret distribution diminishes over time, leading to the disappearance of heterogeneity and the emergence of consensus among agents. This universal result enables us to prove convergence to quantal response equilibria in both competitive and cooperative multi-agent settings. Our work advances the theoretical understanding of multi-agent learning and offers a novel perspective on equilibrium selection in diverse game-theoretic scenarios.
理解和预测游戏中大型多试剂的行为仍然是多试剂系统中的一项根本挑战。本文通过分析如何顺利的悔过配对使大量不同代理人以不同的初始政策推动统一行为的方式,审视了异质性在均衡形成中的作用。通过模拟系统作为遗憾的概率分布,分析其通过连续性方程式的演进,我们发现了多种试剂环境中的一个关键现象:遗憾分布的差异随着时间的流逝而减少,导致异质的消失,以及代理人之间形成共识。这一普遍结果使我们能够证明在竞争性和合作性多试剂环境下,都与孔式反应的平衡趋于一致。我们的工作推进了多试剂学习的理论理解,并为不同游戏理论情景中的均衡选择提供了新视角。
Article 12
Title@2025-07-23 (3): Adaptive Graph Pruning for Multi-Agent Communication
Title: Adaptive Graph Pruning for Multi-Agent Communication | Adaptives Graph Pruning für Multi-Agent Kommunikation | 多机构通信调节图 2506.02951v3 |
Authors (4): Boyi Li, Zhonghan Zhao, Der-Horng Lee, Gaoang Wang
Large Language Model (LLM) based multi-agent systems have shown remarkable performance in various tasks, especially when enhanced through collaborative communication. However, current methods often rely on a fixed number of agents and static communication structures, limiting their ability to adapt to varying task complexities. In this paper, we propose Adaptive Graph Pruning (AGP), a novel task-adaptive multi-agent collaboration framework that jointly optimizes agent quantity (hard-pruning) and communication topology (soft-pruning). Specifically, our method employs a two-stage training strategy: first, independently training soft-pruning networks for different agent quantities to determine optimal agent-quantity-specific complete graphs and positional masks across specific tasks; and then jointly optimizing hard-pruning and soft-pruning within a maximum complete graph to dynamically configure the number of agents and their communication topologies per task. Extensive experiments demonstrate that our approach is: (1) High-performing, achieving state-of-the-art results across six benchmarks and generalizing consistently across multiple mainstream LLM architectures, with an increase in performance of $2.58\%\sim 9.84\%$; (2) Task-adaptive, dynamically constructing optimized communication topologies tailored to specific tasks, with extremely high performance in all three task categories (general reasoning, mathematical reasoning, and code generation); (3) Token-economical, requiring fewer training steps and less token consumption at the same time, with a decrease in token consumption of $90\%+$; and (4) Training-efficient, achieving high performance with very few training steps compared with other methods. Performance surpasses the existing baselines after about ten training steps on six benchmarks.
以大型语言模型(LLM)为基础的多试剂系统在各种任务中表现显著,特别是在通过协作通信而得到加强的情况下。然而,目前的方法往往依赖固定数量的代理商和静态通信结构,从而限制了它们适应不同任务复杂性的能力。在本文件中,我们建议采用适应性图形剪枝(Adaptive Graph Pruning,AGP),这是一个新颖的任务适应性多试剂协作框架,共同优化代理商数量(硬剪枝)和通信拓扑(软剪枝)。具体地说,我们的方法采用一个两阶段培训战略:首先,针对不同的代理商数量独立培训软剪枝网络,以确定各具体任务中针对特定代理商数量的最佳完整图和位置掩码;然后,在最大完整图中联合优化硬剪枝和软剪枝,以便按任务动态地配置代理商数量及其通信拓扑。广泛的实验表明,我们的方法是:(1) 高绩效,在六个基准中达到最先进的结果,并在多个主流LLM架构中保持一致的泛化能力,性能提高$2.58\%\sim 9.84\%$;(2) 任务适应性,动态构建针对具体任务的优化通信拓扑,在所有三类任务(一般推理、数学推理和代码生成)中均表现极佳;(3) 节省令牌,同时减少训练步数和令牌消耗,令牌消耗减少$90\%+$;(4) 训练高效,与其他方法相比只需很少的训练步数即可实现高性能。在六个基准下,经过约十步训练后,其性能即超过现有基线。
Article 13
Title@2025-07-23 (3): Resilient Multi-Agent Negotiation for Medical Supply Chains:Integrating LLMs and Blockchain for Transparent Coordination
Title: Resilient Multi-Agent Negotiation for Medical Supply Chains:Integrating LLMs and Blockchain for Transparent Coordination | Resiliente Multi-Agent-Verhandlung für medizinische Lieferketten:Integration von LLMs und Blockchain für transparente Koordination | 关于医疗供应链的具有弹性的多机构谈判:整合LLMM和透明协调的链锁 2507.17134v1 |
Authors (2): Mariam ALMutairi, Hyungmin Kim
Global health emergencies, such as the COVID-19 pandemic, have exposed critical weaknesses in traditional medical supply chains, including inefficiencies in resource allocation, lack of transparency, and poor adaptability to dynamic disruptions. This paper presents a novel hybrid framework that integrates blockchain technology with a decentralized, large language model (LLM) powered multi-agent negotiation system to enhance the resilience and accountability of medical supply chains during crises. In this system, autonomous agents (representing manufacturers, distributors, and healthcare institutions) engage in structured, context-aware negotiation and decision-making processes facilitated by LLMs, enabling rapid and ethical allocation of scarce medical resources. The off-chain agent layer supports adaptive reasoning and local decision-making, while the on-chain blockchain layer ensures immutable, transparent, and auditable enforcement of decisions via smart contracts. The framework also incorporates a formal cross-layer communication protocol to bridge decentralized negotiation with institutional enforcement. A simulation environment emulating pandemic scenarios evaluates the system’s performance, demonstrating improvements in negotiation efficiency, fairness of allocation, supply chain responsiveness, and auditability. This research contributes an innovative approach that synergizes blockchain trust guarantees with the adaptive intelligence of LLM-driven agents, providing a robust and scalable solution for critical supply chain coordination under uncertainty.
本文介绍了一个新的混合框架,将链式技术与分散、大语言模式(LLM)的多机构谈判系统结合起来,以在危机期间加强医疗供应链的复原力和问责制;在这一系统中,自主代理代表医疗供应链的制造商、经销商和保健机构参与由LLM推动的结构性、符合环境需要的谈判和决策进程,从而能够迅速和合乎道德地分配稀缺的医疗资源; 离链剂层支持适应性推理和地方决策,而链式链式链层则通过智能合同确保不易变、透明和可审计地执行决定; 该框架还纳入了正式的跨层通信协议,以便将分散谈判与机构执法联系起来; 模拟环境模拟模型模拟环境情景评估了系统的业绩,展示了谈判效率的提高、分配的公平性、供应链的响应性和可审计性; 这一研究有助于采用创新方法,使链式链式链式链式信任保证与稳定供应链的可靠可靠可靠协调。
Article 14
Title@2025-07-23 (3): Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning
Title: Joint Pedestrian and Vehicle Traffic Optimization in Urban Environments using Reinforcement Learning | Gemeinsame Fußgänger- und Fahrzeugverkehrsoptimierung in städtischen Umgebungen mittels Verstärkungslernen | 利用强化学习在城市环境中联合优化步行和车辆交通 2504.05018v2 |
Authors (5): Bibek Poudel, Xuan Wang, Weizi Li, Lei Zhu, Kevin Heaslip
Reinforcement learning (RL) holds significant promise for adaptive traffic signal control. While existing RL-based methods demonstrate effectiveness in reducing vehicular congestion, their predominant focus on vehicle-centric optimization leaves pedestrian mobility needs and safety challenges unaddressed. In this paper, we present a deep RL framework for adaptive control of eight traffic signals along a real-world urban corridor, jointly optimizing both pedestrian and vehicular efficiency. Our single-agent policy is trained using real-world pedestrian and vehicle demand data derived from Wi-Fi logs and video analysis. The results demonstrate significant performance improvements over traditional fixed-time signals, reducing average wait times per pedestrian and per vehicle by up to 67% and 52% respectively, while simultaneously decreasing total wait times for both groups by up to 67% and 53%. Additionally, our results demonstrate generalization capabilities across varying traffic demands, including conditions entirely unseen during training, validating RL’s potential for developing transportation systems that serve all road users.
强化学习(RL)对适应性交通信号控制有着重大希望。现有基于RL的方法在减少车辆拥堵方面显示了实效,但主要侧重于车辆中心优化使行人流动需求和安全挑战得不到解决。在本文中,我们提出了一个深度RL框架,用于在现实世界城市走廊沿线对八条交通信号进行适应性控制,共同优化行人和车辆效率。我们的单一试剂政策使用来自Wi-Fi日志和视频分析的真实世界行人和车辆需求数据进行了培训。结果显示传统固定时间信号的性能显著改善,将行人和车辆的平均等候时间分别减少67%和52%,同时将两个群体的总体等待时间分别减少高达67%和53%。此外,我们的结果显示,在各种交通需求中,包括培训期间完全看不见的条件,普遍具备了RL开发为所有道路使用者服务的运输系统的潜力。
Article 15
Title@2025-07-22 (2): Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems
Title: Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems | Parallelität trifft auf Anpassungsfähigkeit: Skalierbare Dokumente verstehen in Multi-Agent LLM-Systemen | 适应性:多机构LLM系统中可缩放文件理解 2507.17061v1 |
Authors (4): Chengxuan Xia, Qianye Wu, Sixuan Tian, Yilun Hao
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their effectiveness in open-ended, high-complexity domains. This paper proposes a coordination framework that enables adaptiveness through three core mechanisms: dynamic task routing, bidirectional feedback, and parallel agent evaluation. The framework allows agents to reallocate tasks based on confidence and workload, exchange structured critiques to iteratively improve outputs, and, crucially, compete on high-ambiguity subtasks, with an evaluator selecting the most suitable result. We instantiate these principles in a modular architecture and demonstrate substantial improvements in factual coverage, coherence, and efficiency over static and partially adaptive baselines. Our findings highlight the benefits of incorporating both adaptiveness and structured competition in multi-agent LLM systems.
大型语文模式(LLM)代理机构对完成协作任务的前景越来越大,但是,现有的多试剂框架往往依赖静态工作流程、固定角色和有限的机构间沟通,降低了它们在开放的、高复杂度领域的效力,本文件提出一个协调框架,通过三个核心机制使适应能力得以实现:动态任务路线、双向反馈和平行代理机构评价;该框架使代理机构能够根据信心和工作量重新分配任务,通过结构化批评来迭接改进产出,以及关键地在高度矛盾的子任务上与评价人员驱动的最适当结果的选择进行竞争;我们将这些原则纳入模块架构,并表明相对于静态和部分适应性基线的实际覆盖面、一致性和效率方面的重大改进;我们的调查结果强调将适应性和结构竞争纳入多试剂LM系统的好处。
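The routing and competition mechanisms in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `Agent` record, the linear score (confidence minus a `load_penalty` per assigned task), and the `select_best` helper that stands in for evaluator-driven selection.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    confidence: dict  # hypothetical: task type -> self-reported confidence in [0, 1]
    workload: int = 0  # number of tasks currently assigned


def route(task_type, agents, load_penalty=0.15):
    """Assign a task to the agent with the best confidence-minus-load score."""
    def score(agent):
        return agent.confidence.get(task_type, 0.0) - load_penalty * agent.workload

    best = max(agents, key=score)
    best.workload += 1  # the chosen agent becomes busier for the next routing call
    return best


def select_best(candidates, evaluator):
    """Parallel evaluation: keep the candidate output the evaluator scores highest."""
    return max(candidates, key=evaluator)


agents = [
    Agent("summarizer", {"summary": 0.9, "qa": 0.4}),
    Agent("analyst", {"summary": 0.7, "qa": 0.8}),
]
assignments = [route("summary", agents).name for _ in range(4)]
winner = select_best(["short", "a longer draft"], evaluator=len)
```

In this toy run the third summary request is routed to the analyst, because the summarizer's accumulated workload outweighs its higher confidence. A real system would replace `len` with a learned or rubric-based evaluator.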
Article 16
Title@2025-07-22 (2): AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation
Title: AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation | AURA: Multi-Modal Medical Agent für Verständnis, Vernunft und Annotation | AURA:一个多模式医疗代理,用于理解、说明理由和说明 2507.16940v1 |
Authors (3): Nima Fathi, Amar Kumar, Tal Arbel
Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
大语言模型(LLMS)最近的进展催化了范式的转变,从静态预测系统向能进行推理、与工具互动和适应复杂任务的代理AI代理商的范式转变,LLM的代理系统在许多领域显示出希望,但它们对医学成像的应用仍处于初级阶段。我们在此工作中引入了AURA,这是专门为全面分析、解释和评价医学图像而设计的首个视觉语言解析剂。通过促成动态互动、背景解释和假设测试,AURA代表了向更透明、适应性和临床一致的AI系统的重大进步。我们强调ATI在将医学图像分析从静态预测转变为互动决策支持方面的希望。LMM的架构是LWen-32B。AUR整合了一个模块工具箱,其中包括:(一) 带有分阶段地基、病理学分解和解分解的分解模块,以将具有临床意义的区域本地化;(二) 一个反现实的图像生成模块,支持通过图像层次解释进行推理;以及(三) 一套评价工具,包括可视性诊断性分析、可判和州分析的高级分析。
Article 17
Title@2025-07-22 (2): From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks
Title: From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent AI safety benchmarks | Von der Homöostase bis zur Ressourcenteilung: Biologisch und wirtschaftlich ausgerichtete multi-objektive Multi-Agenten-KI-Sicherheits-Benchmarks | 从原生状态到资源共享:生物和经济上一致的多目标多试剂AI安全基准 2410.00081v4 |
Authors (2): Roland Pihlakas, Joel Pyykkö
Developing safe, aligned agentic AI systems requires comprehensive empirical testing, yet many existing benchmarks neglect crucial themes aligned with biology and economics, both time-tested fundamental sciences describing our needs and preferences. To address this gap, the present work focuses on introducing biologically and economically motivated themes that have been neglected in current mainstream discussions on AI safety, namely a set of multi-objective, multi-agent alignment benchmarks that emphasize homeostasis for bounded and biological objectives, diminishing returns for unbounded, instrumental, and business objectives, the sustainability principle, and resource sharing. We implemented eight main benchmark environments on the above themes to illustrate key pitfalls and challenges in agentic AI systems, such as unboundedly maximizing a homeostatic objective, over-optimizing one objective at the expense of others, neglecting safety constraints, or depleting shared resources.
为了弥补这一差距,目前的工作重点是引入在目前关于AI安全的主流讨论中被忽视的生物和经济动机主题,即一套多目标、多剂调整基准,强调对约束性和生物目标的自闭和自闭性,减少无约束、工具性和商业目标的回报,可持续性原则,以及资源共享。 我们在上述主题上实施了八个主要基准环境,以说明在代理性AI方面的主要缺陷和挑战,例如无限制地最大限度地实现自闭性目标,过度优化一个目标而牺牲其他目标,忽视安全限制,或耗尽共享资源。
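The difference between a homeostatic objective and an unbounded objective with diminishing returns can be sketched in a few lines of Python. The functional forms below (a quadratic penalty around a setpoint and a logarithmic utility) are assumptions chosen for illustration; the abstract does not specify how the benchmarks score these objectives.

```python
import math


def homeostatic_reward(level, setpoint, tolerance=1.0):
    """Bounded objective: best at the setpoint, penalized for too little AND too much."""
    return -((level - setpoint) / tolerance) ** 2


def diminishing_returns_reward(amount):
    """Unbounded objective with concave utility: each extra unit is worth less."""
    return math.log1p(max(amount, 0.0))


# Overshooting the setpoint is as bad as undershooting it by the same margin.
at_setpoint = homeostatic_reward(5, setpoint=5)
overshoot = homeostatic_reward(8, setpoint=5)
undershoot = homeostatic_reward(2, setpoint=5)

# The first unit of a resource is worth more than the eleventh.
first_unit = diminishing_returns_reward(1) - diminishing_returns_reward(0)
eleventh_unit = diminishing_returns_reward(11) - diminishing_returns_reward(10)
```

An agent that treats the homeostatic term as "more is better" would keep raising `level` past the setpoint and be penalized for it, which is the "unboundedly maximizing a homeostatic objective" pitfall named in the abstract.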
Article 18
Title@2025-07-22 (2): Smooth Games of Configuration in the Linear-Quadratic Setting
Title: Smooth Games of Configuration in the Linear-Quadratic Setting | Glatte Spiele der Konfiguration in der linearen-Quadrat-Einstellung | 线性二次曲线设置中的配置平滑游戏 2507.16611v1 |
Authors (3): Jesse Milzman, Jeffrey Mao, Giuseppe Loianno
Dynamic game theory offers a toolbox for formalizing and solving for both cooperative and non-cooperative strategies in multi-agent scenarios. However, the optimal configuration of such games remains largely unexplored. While there is existing literature on the parametrization of dynamic games, little research examines this parametrization from a strategic perspective where each agent’s configuration choice is influenced by the decisions of others. In this work, we introduce the concept of a game of configuration, providing a framework for the strategic fine-tuning of differential games. We define a game of configuration as a two-stage game within the setting of finite-horizon, affine-quadratic (AQ) differential games. In the first stage, each player chooses their corresponding configuration parameter, which will impact their dynamics and costs in the second stage. We provide the subgame perfect solution concept and a method for computing first-stage cost gradients over the configuration space. This allows us to formulate a gradient-based method for searching for local solutions to the configuration game, as well as to provide necessary conditions for equilibrium configurations over their downstream (second-stage) trajectories. We conclude by demonstrating the effectiveness of our approach in example AQ systems, both zero-sum and general-sum.
动态游戏理论为多试剂情景中合作和非合作策略的正规化和解决提供了一个工具箱。 但是,这种游戏的最佳配置仍然基本上尚未探索。 虽然已有关于动态游戏美化的文献, 但几乎没有研究从战略角度审视这种配对化, 使每个代理器的配置选择受到他人决定的影响。 在这项工作中, 我们引入了配置游戏的概念, 为不同游戏的战略微调提供了一个框架。 我们定义了一个配置游戏, 在定点正弦、 偏角、 AQ、 差异游戏的设置中, 是一个两阶段的游戏。 在第一阶段, 每个玩家选择相应的配置参数, 这将在第二阶段影响他们的动态和成本。 我们提供了子组合完美解决方案概念和在配置空间上计算第一阶段成本梯度的方法。 这样我们就可以制定一个基于梯度的方法, 搜索本地配置游戏的解决方案, 并为下游( 第二阶段) 轨迹、 AQ 、 差异游戏的平衡配置提供必要的条件。 我们通过展示我们下游( 第二阶段) 和 零点系统的有效性来结束。
Article 19
Title@2025-07-22 (2): Low complexity convergence rate bounds for push-sum algorithms with homogeneous correlation structure
Title: Low complexity convergence rate bounds for push-sum algorithms with homogeneous correlation structure | Grenzen der Konvergenzrate geringer Komplexität für Push-Summe-Algorithmen mit homogener Korrelationsstruktur | 低复杂合并率的低复杂合并率约束值,用于具有同质相关结构的推算算法-总算算法 2507.16601v1 |
Authors (2): Balázs Gerencsér, Miklós Kornyik
The objective of this work is to establish an upper bound for the almost sure convergence rate for a class of push-sum algorithms. The current work extends the methods and results of the authors on a similar low-complexity bound on push-sum algorithms with some particular synchronous message passing schemes and complements the general approach of Gerencsér and Gerencsér from 2022, which provides an exact but often less accessible description. Furthermore, a parametric analysis is presented on the "weight" of the messages, which is found to be convex with an explicit expression for the gradient. This allows the fine-tuning of the algorithm used for improved efficiency. Numerical results confirm the speedup in evaluating the computable bounds without deteriorating their performance; for a graph on 120 vertices, the runtime drops by more than 4 orders of magnitude.
这项工作的目的是为一类推和算法的几乎肯定的趋同率确定一个上限。 目前的工作扩展了作者采用类似的低复杂度的方法和结果,这些方法与某些特定同步电文传递办法的推和算法紧密相连,并补充了Gerencs'er和Gerencs'er从2022年开始提供准确但往往不易获取描述的一般方法。此外,对电文的“重量”进行了参数分析,发现该“重量”与梯度的清晰表达相连接。这样可以对用于提高效率的算法进行微调。数字结果证实了在不降低性能的情况下评价可计算界限的速度加快,在120个垂直值上将运行时间下降超过4个数量级。
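For readers unfamiliar with push-sum, the following Python sketch shows the basic synchronous variant on a small directed graph. It is a textbook-style illustration under stated assumptions, not the class of algorithms analyzed in the paper: every node keeps an equal share for itself and sends equal shares to its out-neighbors, whereas the paper studies the message "weight" as a tunable parameter. The graph and function name are hypothetical.

```python
def push_sum(values, out_neighbors, rounds):
    """Synchronous push-sum averaging on a directed graph.

    Each node i holds a pair (s_i, w_i), starting from (value_i, 1). In every
    round it splits both numbers equally among itself and its out-neighbors.
    The ratio s_i / w_i is node i's running estimate of the global average.
    """
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = [i] + [j for j in out_neighbors[i] if j != i]
            share = 1.0 / len(targets)  # assumed equal split, including a self-loop
            for j in targets:
                new_s[j] += s[i] * share
                new_w[j] += w[i] * share
        s, w = new_s, new_w
    return [s[i] / w[i] for i in range(n)]


values = [1.0, 2.0, 3.0, 10.0]
# Hypothetical strongly connected digraph: a directed ring plus one extra edge.
graph = {0: [1], 1: [2], 2: [3], 3: [0, 1]}
estimates = push_sum(values, graph, rounds=200)
true_average = sum(values) / len(values)
```

Because each round conserves the totals of `s` and `w`, the ratios approach the true average even though the graph is directed and nodes have different out-degrees. How fast they approach it is the convergence rate that the abstract's bounds address.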
Article 20
Title@2025-07-22 (2): Budget Allocation Policies for Real-Time Multi-Agent Path Finding
Title: Budget Allocation Policies for Real-Time Multi-Agent Path Finding | Budgetzuweisungsrichtlinien für die Echtzeit-Multi-Agent-Pfadsuche | 实时多机构道路寻找的预算拨款政策 2507.16874v1 |
Authors (2): Raz Beck, Roni Stern
Multi-Agent Pathfinding (MAPF) is the problem of finding paths for a set of agents such that each agent reaches its desired destination while avoiding collisions with the other agents. Many MAPF solvers are designed to run offline, that is, first generate paths for all agents and then execute them. Real-Time MAPF (RT-MAPF) embodies a realistic MAPF setup in which one cannot wait until a complete path for each agent has been found before they start to move. Instead, planning and execution are interleaved, where the agents must commit to a fixed number of steps in a constant amount of computation time, referred to as the planning budget. Existing solutions to RT-MAPF iteratively call windowed versions of MAPF algorithms in every planning period, without explicitly considering the size of the planning budget. We address this gap and explore different policies for allocating the planning budget in windowed versions of standard MAPF algorithms, namely Prioritized Planning (PrP) and MAPF-LNS2. Our exploration shows that the baseline approach in which all agents draw from a shared planning budget pool is ineffective in over-constrained situations. Instead, policies that distribute the planning budget over the agents are able to solve more problems with a smaller makespan.
多代理路由调查(MAPF)是一个现实的MAPF(RT-MAPF)设置,在这个设置中,人们不能等到每个代理商在开始移动之前找到完整的路径。相反,规划和执行是相互交错的,代理商必须承诺在固定的计算时间内采取固定数量的步骤,称为规划预算。RT-MAPF迭代调用窗口版的MAPF算法的现有解决方案在每个规划期间都是无效的,没有明确考虑规划预算的大小。我们处理这一缺口,探索不同的政策,将规划预算分配到标准MAPF算法的窗口版本,即优先规划(PrP)和MAPF-LNS2。我们的探索表明,所有代理商从共同规划库提取的基线方法在过多的计算时间里(称为规划预算预算预算预算)中是无效的。相反,将预算规划方法分配给更能解决的问题的代理商。
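One plausible intuition for why a shared budget pool can fail in over-constrained situations is sketched below in Python. This is a deliberately simplified toy, not the policies evaluated in the paper: `demands` is a hypothetical list of how many search expansions each agent would need (unknown in advance in a real planner), agents plan in priority order, and an agent counts as planned only if it receives its full demand.

```python
def plan_with_shared_pool(demands, budget):
    """Agents plan in priority order, each drawing what it needs from one pool."""
    remaining = budget
    planned = []
    for agent, need in enumerate(demands):
        used = min(need, remaining)
        remaining -= used
        if used == need:  # the agent's search finished within what was left
            planned.append(agent)
    return planned


def plan_with_equal_split(demands, budget):
    """Each agent receives the same fixed share of the planning budget."""
    share = budget // len(demands)
    return [agent for agent, need in enumerate(demands) if need <= share]


# Hypothetical demands: the first agent is heavily constrained and expensive to plan for.
demands = [100, 10, 10, 10]
shared = plan_with_shared_pool(demands, budget=100)
split = plan_with_equal_split(demands, budget=100)
```

In this toy instance the shared pool is exhausted by the first, hardest agent and the remaining three get nothing, while the equal split lets the three cheap agents obtain plans at the cost of the hard one. The paper's actual policies and their trade-offs may differ.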
Article 21
Title@2025-07-22 (2): COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Title: COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network | COMPASS: Kooperatives Multi-Agenten-Persistenz-Monitoring mit Spatio-Temporal Attention Network | COMASS:利用斯帕蒂奥-时地注意网络进行多主动合作性持久性监测 2507.16306v1 |
Authors (3): Xingjian Zhang, Yizhuo Wang, Guillaume Sartoretti
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
在应对灾害、环境遥感和野生生物保护等现实应用中,对动态目标进行持续监测至关重要,移动剂必须在这种应用中不断收集不确定的信息。我们提议建立多剂强化学习(MARL)框架,使分散剂能够持续高效监测多重移动目标。我们将环境建为图表,其中节点代表空间位置和边缘代表地貌接近,使物剂能够根据需要对结构化布局进行思考并重新审视信息区域。每个物剂独立选择基于共同的时空关注网络的行动,我们设计这个网络来整合历史观测和空间环境。我们用高山进程(GPs)作为动态模型,支持有原则的信念更新,并促成有意识的不确定性规划。我们用集中价值估计和分散政策执行的图案进行培训。我们的广泛实验表明,COMASS在减少不确定性、目标覆盖和协调动态多目标情景方面始终比强的基线。
Article 22
Title@2025-07-22 (2): Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping
Title: Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping | Multi-Agenten-Verstärkungs-Lernen für stichprobeneffiziente Tiefen-Neural-Netzwerk-Mapping | 用于抽样有效深神经网络绘图的多机构强化学习 2507.16249v1 |
Authors (7): Srivatsan Krishnan, Jason Jabbour, Dan Zhang, Natasha Jaques, Aleksandra Faust, Shayegan Omidshafiei, Vijay Janapa Reddi
Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space, reinforcement learning (RL) has emerged as a promising approach, but its effectiveness is often limited by sample inefficiency. We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome this challenge. By distributing the search across multiple agents, our framework accelerates exploration. To avoid inefficiencies from training multiple agents in parallel, we introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. This enables a decentralized, parallelized learning process that significantly improves sample efficiency. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL, achieving up to 32.61x latency reduction and 16.45x energy-delay product (EDP) reduction under iso-sample conditions.
深神经网络(DNNS)对硬件进行测绘对于优化延迟度、能源消耗和资源利用至关重要,使其成为高性能加速器设计的基石。由于测绘空间广阔而复杂,强化学习(RL)已成为有希望的方法,但其效力往往受到抽样效率低下的限制。我们提出了一个分散的多剂强化学习(MARL)框架,旨在克服这一挑战。通过在多个代理机构之间分配搜索,我们的框架加快了探索。为避免培训多个代理机构的效率低下,我们引入了一种代理机构集群算法,根据相关分析将类似的绘图参数分配给相同的代理机构。这使得一个分散的平行学习过程能够大大提高样本效率。实验结果表明,我们的MARL方法提高了样本效率,比标准的单一代理机构RL提高了30-300x,达到32.61x拉特的减少和16.45x的能源脱落产(EDP)在等模条件下的减少量。
Article 23
Title@2025-07-22 (2): Unbeatable imitation of a friend
Title: Unbeatable imitation of a friend | Unschlagbare Nachahmung eines Freundes | 对朋友的无敌模仿 2507.16221v1 |
Authors (1): Masahiko Ueda
Imitation sometimes achieves success in multi-agent situations even though it is very simple. In game theory, success of imitation has been characterized by unbeatability against other agents. Previous studies specified conditions under which imitation is unbeatable in repeated games, and clarified that the existence of unbeatable imitation is strongly related to the existence of payoff-controlling strategies, called zero-determinant strategies. However, the previous studies mainly focused on "imitation of opponents". It was pointed out that imitation of other players in the same group and imitation of other players in the same role in other groups generally result in different outcomes. Here, we investigate the existence condition of unbeatable imitation in the latter "imitation of friends" situations. We find that it is stronger than the existence condition of unbeatable zero-determinant strategies, whereas both are very limited. Our findings suggest a strong relation between them even in the "imitation of friends" situations.
在游戏理论中,模仿成功的特点就是对其他代理人的不打赢。 先前的研究指出,在重复游戏中,模仿是无法打赢的,并且澄清,不可打赢的模仿的存在与是否存在“付息控制”战略密切相关,称为“零决定战略”。然而,先前的研究主要侧重于“对对手的压制 ” 。据指出,模仿同一团体中的其他角色和模仿其他团体中担任相同角色的其他角色通常会产生不同的结果。在这里,我们调查在后一种游戏中“对朋友情况的“限制”中存在无法打赢的模仿情况。我们发现,它比无法打赢的“零决定战略”的存在条件更强大,但两者都非常有限。我们的研究结果表明,即使在“朋友的模仿”局势中,它们之间也有很强的关系。
Article 24
Title@2025-07-22 (2): Heterogeneous Mixed Traffic Control and Coordination
Title: Heterogeneous Mixed Traffic Control and Coordination | Heterogene gemischte Verkehrssteuerung und -koordinierung | 异异混合混合交通控制和协调 2409.12330v2 |
Authors (5): Iftekharul Islam, Weizi Li, Xuan Wang, Shuai Li, Kevin Heaslip
Urban intersections with diverse vehicle types, from small cars to large semi-trailers, pose significant challenges for traffic control. This study explores how robot vehicles (RVs) can enhance heterogeneous traffic flow, particularly at unsignalized intersections where traditional methods fail during power outages. Using reinforcement learning (RL) and real-world data, we simulate mixed traffic at complex intersections with RV penetration rates ranging from 10% to 90%. Results show that average waiting times drop by up to 86% and 91% compared to signalized and unsignalized intersections, respectively. We observe a “rarity advantage,” where less frequent vehicles benefit the most (up to 87%). Although CO2 emissions and fuel consumption increase with RV penetration, they remain well below those of traditional signalized traffic. Decreased space headways also indicate more efficient road usage. These findings highlight RVs’ potential to improve traffic efficiency and reduce environmental impact in complex, heterogeneous settings.
从小汽车到大型半拖车等不同类型车辆的城市交汇点对交通控制构成重大挑战。本研究探讨了机器人车辆(RVs)如何能增强不同交通流量,特别是在传统方法在停电期间失败的无信号化交叉点。利用强化学习(RL)和现实世界数据,我们模拟了复杂交叉点的混合交通,从小汽车到大型半拖车的渗透率为10%至90%。结果显示,平均等候时间分别下降86%和91%,而信号化和未信号化的交叉点则不同。我们观察到“强势优势 ” , 最不常见的车辆从中获益(高达87% ) 。虽然二氧化碳排放和燃料消耗随着RV的渗透而增加,但仍远远低于传统信号化交通。空间进步的减少也表明道路使用效率更高。这些发现显示RVs在复杂、多变环境提高交通效率和减少环境影响方面的潜力。
Article 25
Title@2025-07-22 (2): Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations
Title: Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations | Aitomia: Ihr intelligenter Assistent für KI-getriebene Atomistische und Quantum Chemical Simulationen | Aitomia:您对AI-Driven原子学和量子化学模拟的智能助理 2505.08195v3 |
Authors (6): Jinming Hu, Hassan Nawaz, Yuting Rui, Lijie Chi, Arif Ullah, Pavlo O. Dral
We have developed Aitomia, a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations, monitoring their computational status, analyzing simulation results, and summarizing them for the user in both textual and graphical forms. We achieve these goals by exploiting large language models that leverage the versatility of our MLatom ecosystem, supporting AI-enhanced computational chemistry tasks ranging from ground-state to excited-state calculations, including geometry optimizations, thermochemistry, and spectral calculations. The multi-agent implementation enables autonomous execution of complex computational workflows, such as the computation of reaction enthalpies. Aitomia is the first intelligent assistant publicly accessible online on a cloud computing platform for atomistic simulations of broad scope (Aitomistic Hub at https://aitomistic.xyz). It may also be deployed locally as described at http://mlatom.com/aitomia. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.
我们开发了Aitomia,这是一个由AI驱动的平台,用于协助进行AI驱动的原子学和量子化学(QC)模拟。这个不断发展的智能助理平台配备了聊天机器人和AI代理,帮助专家并指导非专家建立和运行原子模拟,监测其计算状态,分析模拟结果,并以文字和图形形式为用户进行总结。我们通过利用大型语言模型来实现这些目标,这些模型借助我们MLatom生态系统的多功能性,支持从基态到激发态计算的AI增强计算化学任务,包括几何优化、热化学和光谱计算。多代理的实现使复杂的计算工作流程(如反应焓的计算)能够自主执行。Aitomia是第一个在云计算平台上公开在线访问、用于广泛原子模拟的智能助手(Aitomistic Hub,https://aitomistic.xyz)。它也可以按照 http://mlatom.com/aitomia 的说明在本地部署。预计Aitomia将降低进行原子模拟的门槛,从而使模拟大众化,并加速相关领域的研究和开发。
Article 26
Title@2025-07-21 (1): AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds
Title: AI-driven Orchestration at Scale: Estimating Service Metrics on National-Wide Testbeds | KI-getriebene Orchestrierung im Maßstab: Bewertung von Service-Metriken auf national-breiten Testbeds | AI 驱动的大规模编排:在全国范围试验台上估算服务指标 2507.16077v1 |
Authors (5): Rodrigo Moreira, Rafael Pasquini, Joberto S. B. Martins, Tereza C. Carvalho, Flávio de Oliveira Silva
Network Slicing (NS) realization requires AI-native orchestration architectures to efficiently and intelligently handle heterogeneous user requirements. To achieve this, network slicing is evolving towards a more user-centric digital transformation, focusing on architectures that incorporate native intelligence to enable self-managed connectivity in an integrated and isolated manner. However, these initiatives face the challenge of validating their results in production environments, particularly those utilizing ML-enabled orchestration, as they are often tested in local networks or laboratory simulations. This paper proposes a large-scale validation method using a network slicing prediction model to forecast latency using Deep Neural Networks (DNNs) and basic ML algorithms embedded within an NS architecture, evaluated in real large-scale production testbeds. It measures and compares the performance of different DNNs and ML algorithms, considering a distributed database application deployed as a network slice over two large-scale production testbeds. The investigation highlights how AI-based prediction models can enhance network slicing orchestration architectures and presents a seamless, production-ready validation method as an alternative to fully controlled simulations or laboratory setups.
网络切除(NS)实现需要全方位的调控结构,以便高效和明智地处理不同用户的要求。为此,网络切除正在向更以用户为中心的数字转换演变,重点是将本地情报纳入其中的架构,以便能够以综合和孤立的方式实现自我管理的连通性。然而,这些举措面临着在生产环境中验证其结果的挑战,特别是利用由ML带动的调控,因为通常在当地网络或实验室模拟中进行测试。本文件提出了使用网络切除预测模型的大规模验证方法,以利用深神经网络和嵌入NS结构的基本ML算法预测长期性,在实际大规模生产试验台进行评估。它衡量和比较了不同的DNN和ML算法的性能,考虑将分布式数据库应用作为网络切片部署在两个大型生产试验台。调查突出表明,基于AI的预测模型如何加强网络切换管结构,并提供一个无缝合的、生产准备式验证方法,作为完全控制的模拟或实验室设置的替代方法。
Article 27
Title@2025-07-21 (1): LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
Title: LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | LLM Economist: Große Bevölkerungsmodelle und Mechanism Design in Multi-Agent Generative Simulacra | LLM 经济学家:多机构生成模拟中大型人口模型和机制设计 2507.15815v1 |
Authors (6): Seth Karten, Wenzhe Li, Zihan Ding, Samuel Kleiner, Yu Bai, Chi Jin
We present the LLM Economist, a novel framework that uses agent-based modeling to design and assess economic policies in strategic environments with hierarchical decision-making. At the lower level, bounded rational worker agents – instantiated as persona-conditioned prompts sampled from U.S. Census-calibrated income and demographic statistics – choose labor supply to maximize text-based utility functions learned in-context. At the upper level, a planner agent employs in-context reinforcement learning to propose piecewise-linear marginal tax schedules anchored to the current U.S. federal brackets. This construction endows economic simulacra with three capabilities requisite for credible fiscal experimentation: (i) optimization of heterogeneous utilities, (ii) principled generation of large, demographically realistic agent populations, and (iii) mechanism design – the ultimate nudging problem – expressed entirely in natural language. Experiments with populations of up to one hundred interacting agents show that the planner converges near Stackelberg equilibria that improve aggregate social welfare relative to Saez solutions, while a periodic, persona-level voting procedure furthers these gains under decentralized governance. These results demonstrate that large language model-based agents can jointly model, simulate, and govern complex economic systems, providing a tractable test bed for policy evaluation at the societal scale to help build better civilizations.
我们提出 LLM Economist,这是一个新颖的框架,利用基于智能体的建模,在具有层级决策的战略环境中设计和评估经济政策。在下层,有限理性的劳动者智能体(以人设条件化提示实例化,人设采样自按美国人口普查校准的收入和人口统计数据)选择劳动供给,以最大化在上下文中学到的基于文本的效用函数。在上层,规划者智能体采用上下文强化学习,提出以现行美国联邦税级为锚的分段线性边际税率表。这种构造赋予经济拟像可信财政实验所必需的三种能力:(i)异质效用的优化,(ii)有原则地生成大规模、符合人口统计实际的智能体群体,以及(iii)完全以自然语言表达的机制设计,即终极的助推问题。在多达一百个交互智能体的群体上进行的实验表明,规划者收敛到 Stackelberg 均衡附近,相对于 Saez 解提高了总体社会福利,而周期性的、人设层面的投票程序在去中心化治理下进一步扩大了这些收益。这些结果表明,基于大语言模型的智能体能够共同建模、模拟和治理复杂的经济系统,为社会尺度的政策评估提供了一个易于处理的试验平台,以帮助建设更好的文明。
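To make the planner's action space concrete, the sketch below computes the tax owed under a marginal bracket schedule, which makes total tax a piecewise-linear function of income. The bracket thresholds and rates are hypothetical placeholders, not the U.S. federal brackets and not values from the paper, and `tax_owed` is an illustrative name rather than part of the authors' code.

```python
from typing import List, Tuple

# Hypothetical schedule: (lower bound of bracket, marginal rate).
# Illustrative numbers only; these are NOT actual U.S. federal brackets.
BRACKETS: List[Tuple[float, float]] = [
    (0.0, 0.10),
    (10_000.0, 0.20),
    (50_000.0, 0.30),
]


def tax_owed(income: float, brackets: List[Tuple[float, float]] = BRACKETS) -> float:
    """Total tax under a marginal schedule.

    Each rate applies only to the slice of income that falls inside its
    bracket, so the total is piecewise linear in income.
    """
    total = 0.0
    for i, (lower, rate) in enumerate(brackets):
        # The bracket ends where the next one starts (or never, for the last).
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income <= lower:
            break
        total += (min(income, upper) - lower) * rate
    return total
```

Under these placeholder brackets, an income of 30,000 pays 10% on the first 10,000 and 20% on the next 20,000. A planner in this setting would adjust the rate entries rather than the function itself.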
Article 28
Title@2025-07-21 (1): Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems
Title: Competitive Algorithms for Cooperative Multi-Agent Ski-Rental Problems | Wettbewerbsfähige Algorithmen für kooperative Multi-Agenten-Ski-Mietprobleme | 合作式多智能体滑雪租赁问题的竞争算法 2507.15727v1 |
Authors (6): Xuchuang Wang, Bo Sun, Hedyeh Beyhaghi, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman
This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents’ active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
本文提出了一个新的多智能体滑雪租赁问题,将经典的滑雪租赁困境推广到群体情形,其中智能体既承担个体成本也承担共享成本。在我们的模型中,每个智能体既可以按固定的每日费用租用,也可以按个体成本购买通行证,此外还有第三种选择,即面向所有人的折扣团体通行证。我们考虑智能体活跃天数不同的情形,随着智能体退出决策过程,状态会动态变化。为了从不同角度处理这一问题,我们定义了三种不同的竞争比:整体竞争比、状态相关竞争比和个体理性竞争比。针对每个目标,我们设计并分析了最优的确定性策略和随机策略。我们的确定性策略采用能适应动态状态的状态感知阈值函数,而随机策略则从定制的状态感知分布中对阈值进行采样和重采样。分析表明,所有智能体使用相同阈值的对称策略优于非对称策略。我们的结果给出了竞争比的上界和下界,并将经典滑雪租赁的见解推广到多智能体情形,突出了不确定性下群体决策的理论和实践意义。
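For readers unfamiliar with the classical single-agent problem that this abstract generalizes, the sketch below implements the textbook break-even rule and measures its worst-case competitive ratio. It assumes a rental price of 1 per day and an integer purchase price; it does not model the group pass or the state-aware thresholds from the paper, and all function names are illustrative.

```python
def break_even_cost(days_skied: int, buy_price: int) -> int:
    """Cost of the break-even rule with rent = 1 per day.

    Rent for the first buy_price - 1 days, then buy on day buy_price.
    """
    if days_skied < buy_price:
        return days_skied  # the season ended before we ever bought
    return (buy_price - 1) + buy_price  # rent paid so far plus the purchase


def offline_optimum(days_skied: int, buy_price: int) -> int:
    """Cost when the number of ski days is known in advance."""
    return min(days_skied, buy_price)


def worst_case_ratio(buy_price: int, horizon: int) -> float:
    """Largest online/offline cost ratio over seasons of 1..horizon days."""
    return max(
        break_even_cost(d, buy_price) / offline_optimum(d, buy_price)
        for d in range(1, horizon + 1)
    )
```

With a purchase price of B, the worst case occurs when the season ends right after buying, giving a ratio of 2 - 1/B. This is the classical deterministic guarantee that the multi-agent competitive ratios extend.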
Article 29
Title@2025-07-21 (1): Asynchronous Collective Tree Exploration: a Distributed Algorithm, and a new Lower Bound
Title: Asynchronous Collective Tree Exploration: a Distributed Algorithm, and a new Lower Bound | Asynchronous Collective Tree Exploration: ein verteilter Algorithmus und ein neuer Lower Bound | 异步集体树探索:一种分布式算法和一个新的下界 2507.15658v1 |
Authors (2): Romain Cosson, Laurent Massoulié
We study the problem of collective tree exploration in which a team of $k$ mobile agents must collectively visit all nodes of an unknown tree in as few moves as possible. The agents all start from the root and discover adjacent edges as they progress in the tree. Communication is distributed in the sense that agents share information by reading and writing on whiteboards located at all nodes. Movements are asynchronous, in the sense that the speeds of all agents are controlled by an adversary at all times. All previous competitive guarantees for collective tree exploration are either distributed but synchronous, or asynchronous but centralized. In contrast, we present a distributed asynchronous algorithm that explores any tree of $n$ nodes and depth $D$ in at most $2n+O(k^2 2^kD)$ moves, i.e., with a regret that is linear in $D$, and a variant algorithm with a guarantee in $O(k/\log k)(n+kD)$, i.e., with a competitive ratio in $O(k/\log k)$. We note that our regret guarantee is asymptotically optimal (i.e., $1$-competitive) from the perspective of average-case complexity. We then present a new general lower bound on the competitive ratio of asynchronous collective tree exploration, in $\Omega(\log^2 k)$. This lower bound applies to both the distributed and centralized settings, and improves upon the previous lower bound in $\Omega(\log k)$.
我们研究集体树探索问题:一个由 $k$ 个移动智能体组成的团队必须以尽可能少的移动次数共同访问一棵未知树的所有节点。所有智能体都从根节点出发,并在树中前进时发现相邻的边。通信是分布式的,即智能体通过读写位于各节点的白板来共享信息。移动是异步的,即所有智能体的速度始终由对手控制。此前关于集体树探索的所有竞争性保证要么是分布式但同步的,要么是异步但集中式的。相比之下,我们提出了一种分布式异步算法,它能以至多 $2n+O(k^2 2^kD)$ 次移动探索任何具有 $n$ 个节点、深度为 $D$ 的树,即遗憾关于 $D$ 是线性的;我们还提出了一种变体算法,其保证为 $O(k/\log k)(n+kD)$,即竞争比为 $O(k/\log k)$。我们注意到,从平均情况复杂度的角度看,我们的遗憾保证是渐近最优的(即 $1$-竞争的)。随后,我们给出了异步集体树探索竞争比的一个新的一般下界 $\Omega(\log^2 k)$。该下界同时适用于分布式和集中式情形,并改进了此前的下界 $\Omega(\log k)$。
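As a point of reference for the $2n$ term in the guarantee above, the sketch below counts the moves of a single agent performing depth-first exploration and returning to the root, which traverses every edge exactly twice. This is only the single-agent baseline, not the distributed asynchronous algorithm from the paper; the tree representation (a dict from node to list of children) and the name `dfs_moves` are assumptions made for illustration.

```python
from typing import Dict, List


def dfs_moves(tree: Dict[int, List[int]], root: int = 0) -> int:
    """Count edge traversals of one agent doing depth-first exploration.

    `tree` maps each node to its list of children. The agent starts at the
    root, visits every node, and walks back to the root at the end.
    """
    moves = 0
    stack = [(root, iter(tree.get(root, [])))]
    while stack:
        _node, children = stack[-1]
        child = next(children, None)
        if child is None:
            stack.pop()
            if stack:
                moves += 1  # walk back up to the parent
        else:
            moves += 1  # walk down to an unexplored child
            stack.append((child, iter(tree.get(child, []))))
    return moves
```

On a tree with $n$ nodes this returns $2(n-1)$, since each of the $n-1$ edges is crossed once going down and once coming back.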
Article 30
Title@2025-07-21 (1): Preventing Rogue Agents Improves Multi-Agent Collaboration
Title: Preventing Rogue Agents Improves Multi-Agent Collaboration | Verhindern von Rogue-Agenten verbessert Multi-Agenten-Kollaboration | 防止流氓智能体可改进多智能体协作 2502.05986v2 |
Authors (3): Ohav Barbi, Ori Yoran, Mor Geva
Multi-agent systems, where specialized agents collaborate to solve a shared task, hold great potential, from increased modularity to simulating complex environments. However, they also have a major caveat – a single agent can cause the entire system to fail. Consider a simple game where the knowledge to solve the task is distributed between agents, which share information in a communication channel. At each round, any of the agents can terminate the game and make the final prediction, even if they are uncertain about the outcome of their action. Detection of such rogue agents before they act may prevent the system’s failure. In this work, we propose to monitor agents during action prediction and intervene when a future error is likely to occur. To test our approach, we introduce WhoDunitEnv, a multi-agent collaboration environment that allows modular control over task complexity and communication structure. Experiments on WhoDunitEnv, code generation tasks and the GovSim environment for resource sustainability show that our approach leads to substantial performance gains of up to 17.4%, 2.5% and 20%, respectively. Thorough analysis shows that our monitors successfully identify critical points of agent confusion and our interventions effectively stop agent errors from propagating.
多试剂系统,其中专门代理商合作解决一项共同任务,具有巨大的潜力,从增加模块化到模拟复杂环境。然而,它们也有一个重大警告 – – 一个单一代理商可能导致整个系统失败。 考虑一个简单的游戏, 解决任务的知识在代理商之间分配, 代理商在通信频道中共享信息。 在每一回合中, 任何代理商都可以终止游戏并作出最后预测, 即使他们对其行动结果不确定。 在他们采取行动之前发现这种无赖代理商可能会防止系统失败。 在这项工作中, 我们提议在行动预测中监测代理商, 并在可能发生未来错误时进行干预。 为了测试我们的方法, 我们引入了WhoDunitEnv, 一个多代理商合作环境, 允许对任务复杂性和通信结构进行模块化控制。 关于谁Env, 代码生成任务和 GovSim 环境促进资源可持续性的实验显示,我们的方法可以使业绩大幅提高至17.4%, 2.5% 和 20%。 索罗夫分析显示, 我们的监测商成功地确定了代理人混淆的关键点, 以及我们的有效干预防止传播错误。
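One simple way to picture "monitor during action prediction, intervene when an error is likely" is an uncertainty threshold on the agent's action distribution. The sketch below is an assumption made for illustration: it uses Shannon entropy as the uncertainty signal and a hand-picked threshold, and it is not claimed to be the monitor the authors trained. The names `entropy` and `should_intervene` are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_intervene(action_probs: Sequence[float], threshold: float = 1.0) -> bool:
    """Flag the agent before it acts if its action distribution is too flat.

    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    return entropy(action_probs) > threshold
```

A confident agent, with most probability mass on one action, passes through. A near-uniform distribution over four actions has entropy of about 1.39 nats and would trigger an intervention at this threshold.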
Article 31
Title@2025-07-21 (1): HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
Title: HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics | HAMLET: Hyperadaptive agentenbasierte Modellierung für lebend-embod Theatrics | HAMLET:基于超适应性制剂的活体编织戏剧模型模型 2507.15518v1 |
Authors (5): Sizhou Chen, Shufan Jiang, Chi Zhang, Xiao-Lei Zhang, Xuelong Li
Creating an immersive and interactive theatrical experience is a long-term goal in the field of interactive narrative. The emergence of large language models (LLMs) is providing a new path to achieve this goal. However, existing LLM-based drama generation methods often result in AI agents that lack initiative and cannot interact with the physical environment. Furthermore, these methods typically require detailed user input to drive the drama. These limitations reduce the interactivity and immersion of online real-time performance. To address the above challenges, we propose HAMLET, a multi-agent framework focused on drama creation and online performance. Given a simple topic, the framework generates a narrative blueprint, guiding the subsequent improvisational performance. During the online performance, each actor is given an autonomous mind. This means that actors can make independent decisions based on their own background, goals, and emotional state. In addition to conversations with other actors, their decisions can also change the state of scene props through actions such as opening a letter or picking up a weapon. The change is then broadcast to other related actors, updating what they know and care about, which in turn influences their next action. To evaluate the quality of drama performance, we designed an evaluation method to assess three primary aspects: character performance, narrative quality, and interaction experience. The experimental evaluation shows that HAMLET can create expressive and coherent theatrical experiences. Our code, dataset and models are available at https://github.com/HAMLET-2025/HAMLET.
创造沉浸式、交互式的戏剧体验是交互叙事领域的长期目标。大语言模型(LLM)的出现为实现这一目标提供了新途径。然而,现有基于 LLM 的戏剧生成方法往往使 AI 智能体缺乏主动性,也无法与物理环境交互。此外,这些方法通常需要详细的用户输入来推动剧情。这些局限降低了在线实时演出的交互性和沉浸感。为应对上述挑战,我们提出 HAMLET,这是一个专注于戏剧创作和在线演出的多智能体框架。给定一个简单的主题,该框架会生成叙事蓝图,用以指导随后的即兴演出。在在线演出过程中,每个演员都被赋予自主的心智,这意味着演员可以根据自身的背景、目标和情绪状态做出独立决定。除了与其他演员对话之外,他们的决定还可以通过拆开信件或拿起武器等动作改变场景道具的状态。这一变化随后会广播给其他相关演员,更新他们所知道和关心的内容,进而影响他们的下一步行动。为了评价戏剧演出的质量,我们设计了一种评估方法,考察角色表现、叙事质量和交互体验三个主要方面。实验评估表明,HAMLET 能够创造富有表现力且连贯的戏剧体验。我们的代码、数据集和模型可在 https://github.com/HAMLET-2025/HAMLET 获取。
Article 32
Title@2025-07-24 (4): Recognizing and Eliciting Weakly Single Crossing Profiles on Trees
Title: Recognizing and Eliciting Weakly Single Crossing Profiles on Trees | Erkennen und Elizitieren von schwachen einzelnen Kreuzungsprofilen auf Bäumen | 承认树树和树的脆弱单一交叉概况 1611.04175v4 |
Authors (1): Palash Dey
We introduce and study the weakly single-crossing domain on trees, which is a generalization of the well-studied single-crossing domain in social choice theory. We design a polynomial-time algorithm for recognizing preference profiles which belong to this domain. We then develop an efficient elicitation algorithm for this domain which works even if the preferences can be accessed only sequentially and the underlying single-crossing tree structure is not known beforehand. We also prove a matching lower bound on the query complexity of our elicitation algorithm when the number of voters is large compared to the number of candidates. We also prove a lower bound of $\Omega(m^2\log n)$ on the number of queries that any algorithm needs to ask to elicit a single-crossing profile when random queries are allowed. This resolves an open question in an earlier paper and proves the optimality of their preference elicitation algorithm when random queries are allowed.
我们引入并研究树木上薄弱的单跨域,这是社会选择理论中研究周密的单跨域的概括性。 我们设计了一种多元时间算法,用于确认属于此域的优惠概况。 然后我们为此域开发一种有效的引算法, 即使只能按顺序获得偏好, 且其基础的单跨树结构事先并不为人知。 我们还证明, 当选民人数与候选人人数相比较大时, 我们的引算法的查询复杂性比我们更低。 我们还证明, 在允许随机查询时, 任何算法需要查询的单交叉剖度的查询数量上, 也比 $\ Omega(m%2\log n) 的比值要低。 这在早先的文件中解决了一个开放问题, 并证明在允许随机查询时, 他们的优先引算法最优化 。
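To illustrate the domain being generalized, the sketch below checks the classical single-crossing property for voters arranged on a line in a given order: for every pair of candidates, the voters' relative preference may flip at most once along the order. This is the simpler, well-studied special case, not the paper's recognition algorithm for weakly single-crossing profiles on trees, and it assumes the voter order is supplied and the profile is non-empty.

```python
from itertools import combinations
from typing import Sequence


def is_single_crossing(profile: Sequence[Sequence[str]]) -> bool:
    """Check single-crossingness with respect to the given voter order.

    profile[i] is voter i's strict ranking, most preferred candidate first.
    Assumes a non-empty profile in which every voter ranks the same candidates.
    """
    # Position of each candidate in each voter's ranking (lower is better).
    ranks = [{c: pos for pos, c in enumerate(ranking)} for ranking in profile]
    candidates = list(profile[0])
    for a, b in combinations(candidates, 2):
        prefers_a = [r[a] < r[b] for r in ranks]
        # Count how often the preference between a and b flips along the order.
        switches = sum(1 for x, y in zip(prefers_a, prefers_a[1:]) if x != y)
        if switches > 1:
            return False
    return True
```

For instance, three voters ranking two candidates as a>b, b>a, a>b fail the check in that order, because the pair flips twice.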
Article 33
Title@2025-07-21 (1): MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation
Title: MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation | MobileUse: Ein GUI-Agent mit Hierarchischer Reflexion für autonomen mobilen Betrieb | 移动用途: 自主移动行动等级反射的图形界面代理 2507.16853v1 |
Authors (10): Ning Li, Xiangmou Qu, Jiamu Zhou, Jun Wang, Muning Wen, Kounianhua Du, Xingyu Lou, Qiuying Peng, Jun Wang, Weinan Zhang
Recent advances in Multimodal Large Language Models (MLLMs) have enabled the development of mobile agents that can understand visual inputs and follow user instructions, unlocking new possibilities for automating complex tasks on mobile devices. However, applying these models to real-world mobile scenarios remains a significant challenge due to the long-horizon task execution, difficulty in error recovery, and the cold-start problem in unfamiliar environments. To address these challenges, we propose MobileUse, a GUI agent designed for robust and adaptive mobile task execution. To improve resilience in long-horizon tasks and dynamic environments, we introduce a hierarchical reflection architecture that enables the agent to self-monitor, detect, and recover from errors across multiple temporal scales, ranging from individual actions to overall task completion, while maintaining efficiency through a reflection-on-demand strategy. To tackle cold-start issues, we further introduce a proactive exploration module, which enriches the agent’s understanding of the environment through self-planned exploration. Evaluations on AndroidWorld and AndroidLab benchmarks demonstrate that MobileUse establishes new state-of-the-art performance, achieving success rates of 62.9% and 44.2%, respectively. To facilitate real-world applications, we release an out-of-the-box toolkit for automated task execution on physical mobile devices, which is available at https://github.com/MadeAgents/mobile-use.
多模态大语言模型(MLLM)的最新进展推动了移动智能体的发展,这类智能体能够理解视觉输入并遵循用户指令,为移动设备上复杂任务的自动化开辟了新的可能。然而,由于任务执行周期长、错误恢复困难以及陌生环境中的冷启动问题,将这些模型应用于真实的移动场景仍是一项重大挑战。为应对这些挑战,我们提出 MobileUse,这是一个为稳健、自适应的移动任务执行而设计的 GUI 智能体。为了提高在长周期任务和动态环境中的韧性,我们引入了分层反思架构,使智能体能够在从单个动作到整体任务完成的多个时间尺度上自我监测、检测错误并从中恢复,同时通过按需反思策略保持效率。为解决冷启动问题,我们进一步引入了主动探索模块,通过自主规划的探索来丰富智能体对环境的理解。在 AndroidWorld 和 AndroidLab 基准上的评估表明,MobileUse 取得了新的最先进性能,成功率分别达到 62.9% 和 44.2%。为便于实际应用,我们发布了一个开箱即用的工具包,用于在实体移动设备上自动执行任务,可在 https://github.com/MadeAgents/mobile-use 获取。
Article 34
Title@2025-07-21 (1): One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
Title: One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms | Ein Schritt ist genug: Multi-Agenten-Verstärkung-Lernen basierend auf One-Step-Politikoptimierung für Order Dispatch auf Ride-Sharing-Plattformen | 第一步就足够了:以单步政策优化为基础,开展多机构强化学习,以发出分流平台命令 2507.15351v1 |
Authors (2): Zijian Zhao, Sen Li
On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty. Recently, MARL has emerged as a promising solution for this problem, leveraging decentralized learning to address the curse of dimensionality caused by the large number of agents in the ride-hailing market and the resulting expansive state and action spaces. However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments. Specifically, most of these approaches adopt an independent paradigm, exacerbating this issue, as each agent treats others as part of the environment, leading to unstable training and substantial estimation bias in value functions. To address these challenges, we propose two novel alternative methods that bypass value function estimation. First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias. Second, inspired by GRPO’s full utilization of group reward information, we customize the PPO framework for ride-sharing platforms and show that, under a homogeneous fleet, the optimal policy can be trained using only one-step rewards - a method we term One-Step Policy Optimization (OSPO). Experiments on a real-world Manhattan ride-hailing dataset demonstrate that both GRPO and OSPO achieve superior performance across most scenarios, efficiently optimizing pickup times and the number of served orders using simple MLP networks.
按需拼车平台面临一项根本性挑战:在高度不确定的条件下,实时地将起点和终点各异的乘客动态组合并与车辆匹配。近来,MARL 已成为解决这一问题的一种有前景的方案,它利用去中心化学习来应对网约车市场中大量智能体及其带来的庞大状态空间和动作空间所导致的维度灾难。然而,传统的基于 MARL 的拼车方法严重依赖对 Q 值或 V 值的准确估计,这在大规模、高度不确定的环境中会成为问题。具体而言,这些方法大多采用独立学习范式,使问题更加严重,因为每个智能体都把其他智能体视为环境的一部分,导致训练不稳定以及价值函数出现较大的估计偏差。为应对这些挑战,我们提出了两种绕过价值函数估计的新方法。第一,我们将 GRPO 应用于拼车,用组平均奖励取代 PPO 基线,以消除评论家估计误差并减少训练偏差。第二,受 GRPO 充分利用组奖励信息的启发,我们为拼车平台定制了 PPO 框架,并证明在同质车队下,仅使用一步奖励即可训练出最优策略,我们将这一方法称为一步策略优化(OSPO)。在真实的曼哈顿网约车数据集上的实验表明,GRPO 和 OSPO 在大多数场景中都取得了更优的性能,使用简单的 MLP 网络即可高效地优化接客时间和服务订单数量。
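The idea of replacing a learned critic baseline with the group average reward can be sketched in a few lines: each agent's advantage is its reward minus the mean reward of its group, so no value function needs to be estimated. The snippet below is a minimal sketch under that reading of the abstract. Commonly described GRPO variants also rescale by the group's standard deviation, which is omitted here, and the function name is illustrative.

```python
from statistics import mean
from typing import List


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each sample relative to the group average reward.

    The group mean plays the role that a critic's value estimate plays in
    standard PPO, so no value network is required.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]
```

By construction the advantages within a group sum to zero, so samples above the group average are reinforced and those below it are discouraged.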
Article 35
Title@2025-07-21 (1): IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry
Title: IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry | IM-Chat: Ein LLM-basierter Rahmen für den Wissenstransfer in der Spritzgießindustrie | IM-Chat:一个基于多剂LLM的注射诱导业知识转让框架 2507.15268v1 |
Authors (5): Junhyeong Lee, Joon-Young Kim, Heekyu Kim, Inhyo Lee, Seunghwa Ryu
The injection molding industry faces critical challenges in preserving and transferring field knowledge, particularly as experienced workers retire and multilingual barriers hinder effective communication. This study introduces IM-Chat, a multi-agent framework based on large language models (LLMs), designed to facilitate knowledge transfer in injection molding. IM-Chat integrates both limited documented knowledge (e.g., troubleshooting tables, manuals) and extensive field data modeled through a data-driven process condition generator that infers optimal manufacturing settings from environmental inputs such as temperature and humidity, enabling robust and context-aware task resolution. By adopting a retrieval-augmented generation (RAG) strategy and tool-calling agents within a modular architecture, IM-Chat ensures adaptability without the need for fine-tuning. Performance was assessed across 100 single-tool and 60 hybrid tasks for GPT-4o, GPT-4o-mini, and GPT-3.5-turbo by domain experts using a 10-point rubric focused on relevance and correctness, and was further supplemented by automated evaluation using GPT-4o guided by a domain-adapted instruction prompt. The evaluation results indicate that more capable models tend to achieve higher accuracy, particularly in complex, tool-integrated scenarios. Overall, these findings demonstrate the viability of multi-agent LLM systems for industrial knowledge workflows and establish IM-Chat as a scalable and generalizable approach to AI-assisted decision support in manufacturing.
注塑成型行业在保存和传递现场知识方面面临严峻挑战,尤其是在有经验的工人退休、多语言障碍妨碍有效沟通的情况下。本研究提出 IM-Chat,这是一个基于大语言模型(LLM)的多智能体框架,旨在促进注塑成型领域的知识传递。IM-Chat 既整合了有限的文档化知识(例如故障排除表、手册),也整合了大量现场数据,后者通过数据驱动的工艺条件生成器建模,该生成器根据温度和湿度等环境输入推断最佳制造参数,从而实现稳健且具备情境感知的任务处理。通过在模块化架构中采用检索增强生成(RAG)策略和工具调用智能体,IM-Chat 无需微调即可保证适应性。领域专家使用侧重于相关性和正确性的 10 分制评分标准,针对 GPT-4o、GPT-4o-mini 和 GPT-3.5-turbo 在 100 个单工具任务和 60 个混合任务上评估了性能,并辅以由领域适配指令提示引导的 GPT-4o 自动评估。评估结果表明,能力更强的模型往往能取得更高的准确率,在复杂的、集成工具的场景中尤为明显。总体而言,这些发现证明了多智能体 LLM 系统用于工业知识工作流的可行性,并表明 IM-Chat 是制造业中 AI 辅助决策支持的一种可扩展、可推广的方法。
Article 36
Title@2025-07-21 (1): Advancing Responsible Innovation in Agentic AI: A study of Ethical Frameworks for Household Automation
Title: Advancing Responsible Innovation in Agentic AI: A study of Ethical Frameworks for Household Automation | Advancing Responsible Innovation in Agentic AI: Eine Studie über ethische Rahmenbedingungen für die Haushaltsautomatisierung | 推进智能体 AI 的负责任创新:家庭自动化伦理框架研究 2507.15901v1 |
Authors (2): Joydeep Chandra, Satyam Kumar Navneet
The implementation of Artificial Intelligence (AI) in household environments, especially in the form of proactive autonomous agents, brings possibilities of comfort and attention, but it also comes with intra- or extramural ethical challenges. This article analyzes agentic AI and its applications, focusing on its move from reactive to proactive autonomy, privacy, fairness and user control. We review responsible innovation frameworks, human-centered design principles, and governance practices to distill practical guidance for ethical smart home systems. Vulnerable user groups, such as elderly individuals, children, and neurodivergent users, who face higher risks of surveillance, bias, and privacy violations, were studied in detail in the context of agentic AI. Design imperatives are highlighted, such as tailored explainability, granular consent mechanisms, and robust override controls, supported by participatory and inclusive methodologies. We also explored how data-driven insights, including social media analysis via Natural Language Processing (NLP), can inform specific user needs and ethical concerns. This survey aims to provide both a conceptual foundation and suggestions for developing transparent, inclusive, and trustworthy agentic AI in household automation.
在家庭环境中实施人工智能(AI),特别是以主动自主代理人的形式,带来了舒适和关注的可能性,也带来了内部和外部道德挑战; 本条分析了人工智能及其应用,重点是从被动自主转向主动自主、隐私、公平和用户控制; 我们审查了负责任的创新框架、以人为中心的设计原则和治理做法,为符合道德的智能家庭系统提供实用指导; 详细研究了面临较高监视、偏见和隐私风险的老年人、儿童和神经多样性群体等弱势用户群体,如老年人、儿童和神经多样性群体; 结合 “ 人工智能 “ 进行了详细研究; 强调设计的必要性,如有针对性的解释、微量同意机制和强有力的超控制,并辅之以参与性和包容性方法的支持; 还探讨了数据驱动的见解,包括通过自然语言处理(NLP)进行的社会媒体分析,如何为具体的用户需求和道德问题提供信息; 调查的目的是为在家庭自动化中发展透明、包容和可靠的代理AI提供概念基础和建议。
Article 37
Title@2025-07-20 (7): STL-GO: Spatio-Temporal Logic with Graph Operators for Distributed Systems with Multiple Network Topologies
Title: STL-GO: Spatio-Temporal Logic with Graph Operators for Distributed Systems with Multiple Network Topologies | STL-GO: Spatio-Temporale Logik mit Graph Operatoren für verteilte Systeme mit mehreren Netzwerktopologien | STL-GO: 与具有多网络地形分布式分布式系统的图表操作员一起的时空空间逻辑 2507.15147v1 |
Authors (6): Yiqi Zhao, Xinyi Yu, Bardh Hoxha, Georgios Fainekos, Jyotirmoy V. Deshmukh, Lars Lindemann
Multi-agent systems (MASs) consisting of a number of autonomous agents that communicate, coordinate, and jointly sense the environment to achieve complex missions can be found in a variety of applications such as robotics, smart cities, and internet-of-things applications. Modeling and monitoring MAS requirements to guarantee overall mission objectives, safety, and reliability is an important problem. Such requirements implicitly require reasoning about diverse sensing and communication modalities between agents, analysis of the dependencies between agent tasks, and the spatial or virtual distance between agents. To capture such rich MAS requirements, we model agent interactions via multiple directed graphs, and introduce a new logic – Spatio-Temporal Logic with Graph Operators (STL-GO). The key innovation in STL-GO is its graph operators, which enable us to reason about the number of agents along either the incoming or outgoing edges of the underlying interaction graph that satisfy a given property of interest; for example, the requirement that an agent should sense at least two neighboring agents whose task graphs indicate the ability to collaborate. We then propose novel distributed monitoring conditions for individual agents that use only local information to determine whether or not an STL-GO specification is satisfied. We compare the expressivity of STL-GO against existing spatio-temporal logic formalisms, and demonstrate the utility of STL-GO and our distributed monitors in a bike-sharing and a multi-drone case study.
多智能体系统(MAS)由若干自主智能体组成,它们相互通信、协调并共同感知环境以完成复杂任务,广泛见于机器人、智慧城市和物联网等应用。对 MAS 需求进行建模和监测,以保证整体任务目标、安全性和可靠性,是一个重要问题。这类需求隐含地要求对智能体之间多样的感知和通信方式进行推理,分析智能体任务之间的依赖关系,并考虑智能体之间的空间或虚拟距离。为了刻画如此丰富的 MAS 需求,我们用多个有向图对智能体交互建模,并引入一种新的逻辑:带图算子的时空逻辑(STL-GO)。STL-GO 的关键创新是图算子,它使我们能够对底层交互图中沿入边或出边、满足给定性质的智能体数量进行推理;例如,要求一个智能体至少感知到两个相邻智能体,且这些智能体的任务图表明其具备协作能力。随后,我们为单个智能体提出了新的分布式监测条件,仅利用局部信息即可判断 STL-GO 规范是否得到满足。我们将 STL-GO 的表达能力与现有的时空逻辑形式体系进行了比较,并在共享单车和多无人机案例研究中展示了 STL-GO 和我们的分布式监测器的实用性。
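The example requirement in the abstract ("an agent should sense at least two neighboring agents that can collaborate") amounts to counting neighbors along the edges of one interaction graph that satisfy a property. The sketch below evaluates such a counting condition at a single time instant on a plain adjacency-set graph. It is an informal illustration of the idea only, not STL-GO's syntax or semantics, and the names `count_out_neighbors`, `senses_at_least`, and the sample data are hypothetical.

```python
from typing import Callable, Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # agent -> agents on its outgoing edges


def count_out_neighbors(
    graph: Graph, agent: Hashable, prop: Callable[[Hashable], bool]
) -> int:
    """Number of agents on outgoing edges of `agent` that satisfy `prop`."""
    return sum(1 for nb in graph.get(agent, set()) if prop(nb))


def senses_at_least(
    graph: Graph, agent: Hashable, prop: Callable[[Hashable], bool], k: int
) -> bool:
    """True if `agent` has at least k outgoing neighbors satisfying `prop`."""
    return count_out_neighbors(graph, agent, prop) >= k


# Hypothetical sensing graph and collaboration capability, for illustration.
sensing: Graph = {"a": {"b", "c", "d"}, "b": {"a"}, "c": set(), "d": {"a"}}
can_collaborate = {"b": True, "c": False, "d": True}
```

Here agent "a" senses three neighbors, two of which can collaborate, so the "at least two" requirement holds for "a" but not for "b". Because the check reads only an agent's own outgoing edges, it uses local information in the spirit of the distributed monitors described above.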
Article 38
Title@2025-07-20 (7): Can We Move Freely in NEOM’s The Line? An Agent-Based Simulation of Human Mobility in a Futuristic Smart City
Title: Can We Move Freely in NEOM’s The Line? An Agent-Based Simulation of Human Mobility in a Futuristic Smart City | Können wir uns in der Linie von NEOM frei bewegen? Eine agentenbasierte Simulation menschlicher Mobilität in einer futuristischen Smart City | 我们能在 NEOM 的 The Line 中自由移动吗?基于智能体的未来智慧城市人类出行模拟 2507.15143v1 |
Authors (2): Abderaouf Bahi, Amel Ourici
This paper investigates the feasibility of human mobility in The Line, a proposed 170-kilometer linear smart city in NEOM, Saudi Arabia. To assess whether citizens can move freely within this unprecedented urban topology, we develop a hybrid simulation framework that integrates agent-based modeling, reinforcement learning, supervised learning, and graph neural networks. The simulation captures multi-modal transportation behaviors across 50 vertical levels and varying density scenarios using both synthetic data and real-world traces from high-density cities. Our experiments reveal that with the full AI-integrated architecture, agents achieved an average commute time of 7.8 to 8.4 minutes, a satisfaction rate exceeding 89 percent, and a reachability index of over 91 percent, even during peak congestion periods. Ablation studies confirmed that the removal of intelligent modules such as reinforcement learning or graph neural networks significantly degrades performance, with commute times increasing by up to 85 percent and reachability falling below 70 percent. Environmental modeling further demonstrated low energy consumption and minimal CO2 emissions when electric modes are prioritized. The findings suggest that freedom of movement is not only conceptually achievable in The Line, but also operationally realistic if supported by adaptive AI systems, sustainable infrastructure, and real-time feedback loops.
本文调查了“线线”中人类流动的可行性,“线线”是沙特阿拉伯近地物体M中一个拟议的170公里线性智能城市。为了评估公民能否在这一前所未有的城市地形中自由移动,我们开发了一个混合模拟框架,将基于代理模型的建模、强化学习、监管学习、神经网络和图形神经网络结合起来。模拟利用合成数据和高密度城市真实世界痕迹,记录了50个垂直水平的多模式交通行为和不同密度情景。我们的实验表明,随着完整AI综合结构,代理商的平均通勤时间为7.8至8.4分钟,满意度超过89%,可达率超过91%,即使在高峰的拥挤时期也是如此。 减缩研究证实,删除智能模块,如强化学习或图形神经网络,会大大降低性能,通勤时间增加85%,可达70%以下。环境建模进一步表明,当电子模式排列为优先时,能源消耗量低,二氧化碳排放量最小。研究结果表明,移动自由不仅在概念上可以实现,而且如果得到适应性AI系统、可持续基础设施以及实时反馈的支持,在实际操作上也是现实的。
Article 39
Title@2025-07-20 (7): EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems
Title: EduThink4AI: Translating Educational Critical Thinking into Multi-Agent LLM Systems | EduThink4AI: Übersetzen des pädagogisch-kritischen Denkens in multi-agente LLM-Systeme | EduThindink4AI:将教育关键思想转换成多机构LLM系统 2507.15015v1 |
Authors (5): Xinmeng Hou, Zhouquan Lu, Wenli Chen, Hai Hu, Qing Guo
Large language models (LLMs) have demonstrated significant potential as educational tutoring agents, capable of tailoring hints, orchestrating lessons, and grading with near-human finesse across various academic domains. However, current LLM-based educational systems exhibit critical limitations in promoting genuine critical thinking, failing on over one-third of multi-hop questions with counterfactual premises, and remaining vulnerable to adversarial prompts that trigger biased or factually incorrect responses. To address these gaps, we propose EDU-Prompting, a novel multi-agent framework that bridges established educational critical thinking theories with LLM agent design to generate critical, bias-aware explanations while fostering diverse perspectives. Our systematic evaluation across theoretical benchmarks and practical college-level critical writing scenarios demonstrates that EDU-Prompting significantly enhances both content truthfulness and logical soundness in AI-generated educational responses. The framework’s modular design enables seamless integration into existing prompting frameworks and educational applications, allowing practitioners to directly incorporate critical thinking catalysts that promote analytical reasoning and introduce multiple perspectives without requiring extensive system modifications.
大型语言模式(LLMS)作为教育辅导人员,在各种学术领域都表现出巨大的潜力,能够裁剪提示、安排课程和以近乎人类的技巧进行分级;然而,目前以LLM为基础的教育系统在促进真正的批判性思维方面表现出严重的局限性,在超过三分之一的多点问题上,与反事实的前提脱节,仍然容易受到引发偏见或事实不正确的反应的对抗性反应的影响。为弥补这些差距,我们提议EDU-Prompting,这是一个创新的多试剂框架,将教育批判性思维理论与LLM代理设计建立桥梁,以产生批判性、有偏见的解释,同时促进不同的观点。我们对理论基准和大学一级实用关键写作设想的系统评估表明,EDU的生成极大地增强了AI产生的教育对策的内容真实性和逻辑健全性。该框架的模块设计使得能够顺利地融入现有的快速框架和教育应用,使从业人员能够直接纳入促进分析推理和引入多种观点而无需广泛系统修改的关键思想催化剂。
Article 40
Title@2025-07-20 (7): LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading
Title: LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading | LLM-erweitertes Multi-Agenten-Verstärkungs-Lernen mit Experten-Workflow für Echtzeit-P2P-Energiehandel | 与实时P2P能源贸易专家工作流程一起加强多机构强化学习 2507.14995v1 |
Authors (6): Chengwei Lou, Zekai Jin, Wei Tang, Guangfei Geng, Jin Yang, Lu Zhang
Real-time peer-to-peer (P2P) electricity markets dynamically adapt to fluctuations in renewable energy and variations in demand, maximizing economic benefits through instantaneous price responses while enhancing grid flexibility. However, scaling expert guidance for massive personalized prosumers poses critical challenges, including diverse decision-making demands and a lack of customized modeling frameworks. This paper proposes an integrated large language model-multi-agent reinforcement learning (LLM-MARL) framework for real-time P2P energy trading to address challenges such as the limited technical capability of prosumers, the lack of expert experience, and security issues of distribution networks. LLMs are introduced as experts to generate personalized strategies, guiding MARL under the centralized training with decentralized execution (CTDE) paradigm through imitation learning. A differential attention-based critic network is designed to enhance convergence performance. Experimental results demonstrate that LLM-generated strategies effectively substitute for human experts. The proposed multi-agent imitation learning algorithms achieve significantly lower economic costs and voltage violation rates on test sets compared to baseline algorithms, while maintaining robust stability. This work provides an effective solution for real-time P2P electricity market decision-making by bridging expert knowledge with agent learning.
P2P电力市场动态地适应可再生能源的波动和需求的变化,通过即时价格反应使经济效益最大化,同时提高电网的灵活性;然而,扩大对大规模个性化计价人的专家指导提出了重大挑战,包括各种决策要求和缺乏定制的模型框架;本文提议为P2P实时能源贸易建立一个大型大型语言模型-多剂强化学习(LLM-MARL)框架,以应对诸如购买者的技术能力有限、缺乏专家经验和分销网络的安全问题等挑战。LLMS作为专家被引入,以制定个性化战略,通过模仿学习,在集中培训中指导MARL,采用分散执行模式指导MARL。基于不同关注的批评网络旨在增强趋同性业绩。实验结果表明LLM生成的战略有效替代了人类专家。拟议的多剂模拟学习算法与基线算法相比,在测试机组上实现了大幅降低经济成本和电压违规率,同时保持了稳健的稳定。这项工作为实时P2P电力市场决策提供了有效的解决办法,通过连接专家学习代理人进行实时P2P电路。
Article 41
Title@2025-07-20 (7): AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction
Title: AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction | AutoGen Driven Multi Agent Framework für iterative Kriminalität Datenanalyse und Vorhersage | 循环犯罪数据分析和预测自动驱动器多剂框架 2506.11475v2 |
Authors (4): Syeda Kisaa Fatima, Tehreem Zubair, Noman Ahmed, Asifullah Khan
This paper introduces LUCID-MA (Learning and Understanding Crime through Dialogue of Multiple Agents), an innovative AI-powered framework where multiple AI agents collaboratively analyze and understand crime data. Our system consists of three core components: an analysis assistant that highlights spatiotemporal crime patterns; a feedback component that reviews and refines analytical results; and a prediction component that forecasts future crime trends. With a well-designed prompt and the LLaMA-2-13B-Chat-GPTQ model, it runs completely offline and allows the agents to undergo self-improvement through 100 rounds of communication with less human interaction. A scoring function is incorporated to evaluate agent performance, providing visual plots to track learning progress. This work demonstrates the potential of AutoGen-style agents for autonomous, scalable, and iterative analysis in social science domains, maintaining data privacy through offline execution. It also showcases a computational model with emergent intelligence, where the system’s global behavior emerges from the interactions of its agents. This emergent behavior manifests as enhanced individual agent performance, driven by collaborative dialogue between the LLM-based agents.
本文介绍了LUCID-MA(通过多代理人对话学习和理解犯罪)这一创新的AI授权框架,其中多个大赦国际代理人合作分析和理解犯罪数据。我们的系统由三个核心部分组成:一个分析助理,突出时空犯罪模式;一个反馈组成部分,审查和完善分析结果;一个预测组成部分,预测未来犯罪趋势。一个设计完善的迅速和LALAMA-213B-Chat-GPTQ模型,它完全脱机运行,通过100轮交流,使代理人能够自我改进,而人际互动较少。一个评分功能被纳入评估代理人的绩效,提供视觉图象来跟踪学习进展。这项工作展示了AutoGen式代理人在社会科学领域进行自主、可扩展和迭接分析的潜力,通过离线执行来维护数据隐私。它还展示了一种计算模型,即系统的全球行为源自其代理人的互动。这一新兴行为表现是个人代理人在以LM为主的代理人之间协作对话所驱动的增强表现。
Article 42
Title@2025-07-19 (6): Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence
Title: Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence | Lernen zur Kommunikation im Mehr-Agenten-Verstärkungs-Lernen für die autonome Cyber-Verteidigung | 学习多机构强化学习,以交流多机构强化学习,促进自动网络防御 2507.14658v1 |
Authors (3): Faizan Contractor, Li Li, Ranwa Al Mallah
Popular methods in cooperative Multi-Agent Reinforcement Learning with partially observable environments typically allow agents to act independently during execution, which may limit the coordinated effect of the trained policies. However, by sharing information such as known or suspected ongoing threats, effective communication can lead to improved decision-making in the cyber battle space. We propose a game design where defender agents learn to communicate and defend against imminent cyber threats by playing training games in the Cyber Operations Research Gym, using the Differentiable Inter Agent Learning algorithm adapted to the cyber operational environment. The tactical policies learned by these autonomous agents are akin to those of human experts during incident responses to avert cyber threats. In addition, the agents simultaneously learn minimal cost communication messages while learning their defence tactical policies.
在具有部分可观测环境的多机构强化合作学习中,普及方法通常允许代理商在执行过程中独立行动,这可能会限制经过培训的政策的协调效果;然而,通过共享已知或疑似持续威胁等信息,有效的沟通可以改善网络战空间的决策;我们提出一个游戏设计,让辩护商通过在网络操作研究健身场上玩训练游戏,利用适应网络操作环境的可区别的跨代理商学习算法,学会沟通和防范迫在眉睫的网络威胁;这些自主代理商所学的战术政策类似于在应对事件时人类专家为避免网络威胁而采取的战术政策;此外,代理商在学习其防御战术政策的同时,学习最低成本的通信信息。
Article 43
Title@2025-07-19 (6): Strategyproofness and Monotone Allocation of Auction in Social Networks
Title: Strategyproofness and Monotone Allocation of Auction in Social Networks | Strategyproofness und Monotone Allokation von Auktionen in sozialen Netzwerken | 社会网络拍卖的策略防战略和单调分配 2507.14472v1 |
Authors (5): Yuhang Guo, Dong Hao, Bin Li, Mingyu Xiao, Bakh Khoussainov
Strategyproofness in network auctions requires that bidders not only report their valuations truthfully, but also do their best to invite neighbours from the social network. In contrast to canonical auctions, where the value-monotone allocation in Myerson’s Lemma is a cornerstone, a general principle of allocation rules for strategyproof network auctions is still missing. We show that, due to the absence of such a principle, even extensions to multi-unit network auctions with single-unit demand present unexpected difficulties, and all pioneering studies fail to be strategyproof. For the first time in this field, we identify two categories of monotone allocation rules on networks: Invitation-Depressed Monotonicity (ID-MON) and Invitation-Promoted Monotonicity (IP-MON). They encompass all existing allocation rules of network auctions as specific instances. For any given ID-MON or IP-MON allocation rule, we characterize the existence of, and sufficient conditions for, strategyproof payment rules, and show that among all such payment rules, the revenue-maximizing one exists and is computationally feasible. With these results, the obstacle of combinatorial network auctions with single-minded bidders is now resolved.
网络拍卖中的战略防守性要求投标人不仅真实地报告其估值,而且尽其所能邀请社会网络的邻居。与卡通式拍卖相反,Myerson’s Lemma中的价值-分子分配是一个基石,因此仍然缺少战略防网络拍卖分配规则的一般性原则。我们表明,由于缺乏这样一项原则,甚至以单一单位需求扩大多单位网络拍卖也存在意外困难,而且所有开创性研究都未能做到战略防患于未然。我们在这一领域首次确定了两种单一的网络分配规则:邀请-压抑垄断(ID-MON)和邀请-促进垄断(IP-MON)。它们包括了网络拍卖的所有现有分配规则,作为具体实例。对于任何指定的ID-MON或IP-MON分配规则,我们给出了战略防战略付款规则的存在和充分条件,并表明在所有这些付款规则中,收入最大化规则的存在和计算上是可行的。有了这些结果,现在与单一方向拍卖商的组合网络障碍已经解决了。
Article 44
Title@2025-07-19 (6): Approximate Revenue Maximization for Diffusion Auctions
Title: Approximate Revenue Maximization for Diffusion Auctions | Ungefähre Umsatzmaximierung für Diffusionsauktionen | 传播拍卖收入的接近最大化 2507.14470v1 |
Authors (5): Yifan Huang, Dong Hao, Zhiyi Fan, Yuhang Guo, Bin Li
Reserve prices are widely used in practice. The problem of designing revenue-optimal auctions based on reserve price has drawn much attention in the auction design community. Although they have been extensively studied, most developments rely on the significant assumption that the target audience of the sale is directly reachable by the auctioneer, while a large portion of bidders in the economic network unaware of the sale are omitted. This work follows the diffusion auction design, which aims to extend the target audience of optimal auction theory to all entities in economic networks. We investigate the design of simple and provably near-optimal network auctions via reserve price. Using Bayesian approximation analysis, we provide a simple and explicit form of the reserve price function tailored to the most representative network auction. We aim to balance setting a sufficiently high reserve price to induce high revenue in a successful sale, and attracting more buyers from the network to increase the probability of a successful sale. This reserve price function preserves incentive compatibility for network auctions, allowing the seller to extract additional revenue beyond that achieved by the Myerson optimal auction. Specifically, if the seller has $\rho$ direct neighbours in a network of size $n$, this reserve price guarantees a $1-{1 \over \rho}$ approximation to the theoretical upper bound, i.e., the maximum possible revenue from any network of size $n$. This result holds for any size and any structure of the networked market.
在实际中广泛使用储备价格。设计基于储备价格的收入最佳拍卖的问题引起了拍卖设计界的极大关注。虽然已经进行了广泛的研究,但大多数发展动态都依赖于一个重要假设,即拍卖商可以直接获得销售的目标受众,而经济网络中很大一部分投标人不知道销售成功,因此忽略了这项工作。这项工作是在推广拍卖设计之后进行的,目的是将最佳拍卖理论的目标受众扩大到经济网络中的所有实体。我们调查了通过储备价格进行的简单和可察觉的接近最佳网络拍卖的设计。我们利用Bayesian近似分析,提供了适合最具代表性的网络拍卖的储备价格功能的简单和明确的形式。我们的目标是平衡地确定足够高的储备价格,在成功的销售中吸引更多的投标人,从网络中吸引更多的买主增加成功销售的概率。这一保留价格功能保持了网络拍卖的兼容性,使卖方能够从Myerson最佳拍卖所取得的更多收入之外。具体地说,如果销售商在以美元为单位的网络中拥有直接邻居的美元,那么,那么,这种价格最高额将保证最高额的网络。
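To make the stated guarantee concrete, the following is a minimal Python sketch that only evaluates the bound $1 - 1/\rho$ quoted in the abstract for a few values of $\rho$. The function name `approximation_ratio` is an illustrative assumption; the reserve price function itself is not given in the abstract and is not reproduced here.

```python
def approximation_ratio(rho: int) -> float:
    """Return the guarantee 1 - 1/rho quoted in the abstract.

    rho is the number of direct neighbours of the seller. The name of this
    helper is hypothetical; it only evaluates the stated bound.
    """
    if rho < 1:
        raise ValueError("rho must be at least 1")
    return 1.0 - 1.0 / rho


if __name__ == "__main__":
    # The bound tightens towards 1 as the seller has more direct neighbours.
    for rho in (1, 2, 5, 10, 100):
        print(f"rho={rho:>3}  guarantee={approximation_ratio(rho):.3f}")
```

With a single direct neighbour the bound is vacuous (0), while a seller with ten direct neighbours is guaranteed at least 90% of the theoretical upper bound.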
Article 45
Title@2025-07-19 (6): Learning in Strategic Queuing Systems with Small Buffers
Title: Learning in Strategic Queuing Systems with Small Buffers | Lernen in strategischen Queuing-Systemen mit kleinen Puffern | 战略排队系统与小缓冲的学习 2502.08898v2 |
Authors (5): Ariana Abel, Yoav Kolumbus, Jeronimo Martin Duque, Cristian Palma Foster, Eva Tardos
We consider learning outcomes in games with carryover effects between rounds: when outcomes in the present round affect the game in the future. An important example of such systems is routers in networking, as they use simple learning algorithms to find the best way to deliver packets to their desired destination. This simple, myopic, and distributed decision process makes large queuing systems easy to operate, but at the same time, the system needs more capacity than would be required if all traffic were centrally coordinated. Gaitonde and Tardos (EC 2020 and JACM 2023) initiated the study of such systems, modeling them as an infinitely repeated game in which routers compete for servers and the system maintains a state (the number of packets held at each queue) that results from outcomes of previous rounds. However, their model assumes that servers have no buffers at all, so routers have to resend all packets that were not served successfully, which makes their system model unrealistic. They show that in their model, even with hugely increased server capacity relative to what is needed in the centrally coordinated case, ensuring that the system is stable requires the use of timestamps and priority for older packets. We consider a system with two important changes, which make the model more realistic and allow for much higher traffic rates: first, we add a very small buffer to each server, allowing the server to hold on to a single packet to be served later (if it fails to serve it immediately), and second, we do not require timestamps or priority to older packets. Using theoretical analysis and simulations, we show that when queues are learning, a small constant-factor increase in server capacity, compared to what would be needed if centrally coordinating, suffices to keep the system stable, even if servers select randomly among packets arriving simultaneously.
我们考虑在具有各回合之间结转效应的游戏中学习结果:当当前回合的结果影响未来游戏时。这种系统的一个重要例子是网络中的路由器,因为它们使用简单的学习算法找到将包交付到其理想目的地的最佳方法。这个简单、短视和分布式的决策过程使得大型排队系统易于操作,但与此同时,这个系统所需要的容量要大于所有交通协调集中时所需要的容量。Gaitonde和Tardos(EC 2020和JACM 2023)开始研究这些系统,把它们建为无限重复的游戏,在这种游戏中,路由器会竞争服务器,而系统会保持一种状态(每个队列所保存的包数量),而这种状态(每个队列所保存的包数量是最佳的。)然而,它们的模型假设服务器根本没有缓冲,因此路由器必须重新发送所有未成功服务的包,这使其系统模型变得不现实。它们会在其模型中选择,即使与中央协调的案例中需要的服务器容量大得多,服务器的能力也会大大提高, 并且确保系统需要两个系统稳定地重复重复运行, 当我们需要一个稳定的使用一个稳定的版本的服务器, 当我们需要一个更精确的服务器的时候, 当我们需要一个更稳定地显示一个更稳定的系统的时候, 。
Article 46
Title@2025-07-19 (6): DHLight: Multi-agent Policy-based Directed Hypergraph Learning for Traffic Signal Control
Title: DHLight: Multi-agent Policy-based Directed Hypergraph Learning for Traffic Signal Control | DHLight: Multi-Agent Policy-based Directed Hypergraph Learning for Traffic Signal Control | DHLight:多代理人基于政策的指导电报学习用于交通信号控制 2409.05037v2 |
Authors (5): Zhen Lei, Zhishu Shen, Kang Wang, Zhenwei Wang, Tiehua Zhang
Recent advancements in Deep Reinforcement Learning (DRL) and Graph Neural Networks (GNNs) have demonstrated notable promise in the realm of intelligent traffic signal control, facilitating coordination across multiple intersections. However, traditional methods that rely on standard graph structures often fail to capture the intricate higher-order spatio-temporal correlations inherent in real-world traffic dynamics. Standard graphs cannot fully represent the spatial relationships within road networks, which limits the effectiveness of graph-based approaches. In contrast, directed hypergraphs provide a more accurate representation of spatial information to model complex directed relationships among multiple nodes. In this paper, we propose DHLight, a novel multi-agent policy-based framework that synergistically integrates a directed hypergraph learning module. This framework introduces a novel dynamic directed hypergraph construction mechanism, which captures complex and evolving spatio-temporal relationships among intersections in road networks. By leveraging the directed hypergraph relational structure, DHLight empowers agents to achieve adaptive decision-making in traffic signal control. The effectiveness of DHLight is validated against state-of-the-art baselines through extensive experiments on various network datasets. We release the code to support the reproducibility of this work at https://github.com/LuckyVoasem/Traffic-Light-control
深度强化学习(DRL)和图形神经网络(GNN)最近的进展显示,在智能交通信号控制(DRL)和图形神经网络(GNNS)领域,在智能交通信号控制(GNNS)领域表现出显著的希望,促进了多个交叉点之间的协调,然而,传统方法依赖标准图形结构往往无法捕捉真实世界交通动态所固有的复杂高阶电磁时空相关关系。标准图表无法充分反映道路网络内的空间关系,从而限制了基于图形的方法的有效性。相比之下,定向高射线为多个节点之间的复杂定向关系模型提供了更准确的空间信息。在本文件中,我们提议DHLight,这是一个新型的多剂政策基础框架,协同整合了定向高射线学习模块。这个框架引入了一个新的动态高射线建设机制,它捕捉了道路网络交叉点之间复杂和不断变化的空间-时空关系。通过利用指定的超光线关系结构,DHLight授权代理器在交通信号控制方面实现适应性决策。DHLight(DLight)的有效性通过在网络数据设置的大规模实验中支持这个数据库/RPRODLA/LMs。
Article 47
Title@2025-07-18 (5): Technical Implementation of Tippy: Multi-Agent Architecture and System Design for Drug Discovery Laboratory Automation
Title: Technical Implementation of Tippy: Multi-Agent Architecture and System Design for Drug Discovery Laboratory Automation | Technische Umsetzung von Tippy: Multi-Agent Architektur und Systemdesign für Drug Discovery Laborautomation | Tippy:药物发现实验室自动化多机构建筑和系统设计技术实施 2507.17852v1 |
Authors (12): Yao Fehlis, Charles Crain, Aidan Jensen, Michael Watson, James Juhasz, Paul Mandel, Betty Liu, Shawn Mahon, Daren Wilson, Nick Lynch-Jonely, Ben Leedom, David Fuller
Building on the conceptual framework presented in our previous work on agentic AI for pharmaceutical research, this paper provides a comprehensive technical analysis of Tippy’s multi-agent system implementation for drug discovery laboratory automation. We present a distributed microservices architecture featuring five specialized agents (Supervisor, Molecule, Lab, Analysis, and Report) that coordinate through OpenAI Agents SDK orchestration and access laboratory tools via the Model Context Protocol (MCP). The system architecture encompasses agent-specific tool integration, asynchronous communication patterns, and comprehensive configuration management through Git-based tracking. Our production deployment strategy utilizes Kubernetes container orchestration with Helm charts, Docker containerization, and CI/CD pipelines for automated testing and deployment. The implementation integrates vector databases for RAG functionality and employs an Envoy reverse proxy for secure external access. This work demonstrates how specialized AI agents can effectively coordinate complex laboratory workflows while maintaining security, scalability, reliability, and integration with existing laboratory infrastructure through standardized protocols.
本文件以我们以前关于制药研究的代理AI工作所提出的概念框架为基础,对Tippy的药物发现实验室自动化多剂系统实施情况进行了全面技术分析,我们提出了一个分布式微服务结构,由五种专门代理(Supervisor、Molecule、Lab、分析和报告)组成,通过OpenAI代理SDK管弦和通过《示范背景议定书》进入实验室工具进行协调。系统结构包括特定代理工具的整合、无同步通信模式以及通过基于Git的跟踪进行的全面配置管理。我们的生产部署战略利用Kubernetes集装箱与Helm图的交织、Docker集装箱化和CI/CD管道进行自动测试和部署。执行这一结构将RAG功能的矢量数据库结合起来,并使用代为外部安全访问的代言人。这项工作表明,专门的AI代理能够有效地协调复杂的实验室工作流程,同时通过标准化协议维护安全、可扩展性、可靠性和与现有实验室基础设施的整合。
Article 48
Title@2025-07-18 (5): Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Title: Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation | Agentische Neuronale Netzwerke: Selbstständige Multi-Agenten-Systeme über textuelle Backpropagation | 动态神经网络:通过文字反向分析实现自我演进的多行为者系统 2506.09046v2 |
Authors (5): Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative “team” focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase: Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.
利用多种大语言模型(LLMs)已证明对处理复杂、高层面任务十分有效,但当前的做法往往依赖于静态、手工设计的多剂配置。为了克服这些制约因素,我们介绍了将多剂合作概念化为分层神经网络架构的Agentic 神经网络(ANN),这是一个将多剂合作概念化为多层神经网络架构的框架。在这一设计中,每个代理作为节点运作,每个层次形成一个合作的“团队”,侧重于特定的子任务。 Agentic Neural网络遵循一个两个阶段的开放优化战略:(1) 从神经网络前传传到前方的先期阶段-逐步开发灵感,任务被动态地分解成子任务,而合作剂团队采用适当的聚合方法,通过层构建。 (2) 后向的阶段-移动后阶段-回调再调,我们通过反复反馈改进全球和地方合作,使代理机构能够自行改变自己的作用、迅速和协调。 这种神经-共振方针使ANNE整个代理团队能够创建新的或专门的后期培训团队,在精确和可调适度方面取得显著的进展。
Article 49
Title@2025-07-18 (5): A Minimalist Controller for Autonomously Self-Aggregating Robotic Swarms: Enabling Compact Formations in Multitasking Scenarios
Title: A Minimalist Controller for Autonomously Self-Aggregating Robotic Swarms: Enabling Compact Formations in Multitasking Scenarios | Minimalistische Steuerung für autonom selbstaggregierende Roboterschwärme: Ermöglichung kompakter Formationen in Multitasking-Szenarien | 自主自我集中的机械式摇篮:多任务情景中有利的契约形式 2507.13969v1 |
Authors (4): Maria Eduarda Silva de Macedo, Ana Paula Chiarelli de Souza, Roberto Silvio Ubertino Rosso Jr., Yuri Kaszubowski Lopes
The deployment of simple emergent behaviors in swarm robotics has been well-rehearsed in the literature. A recent study has shown how self-aggregation is possible in a multitask approach – where multiple self-aggregation task instances occur concurrently in the same environment. The multitask approach poses new challenges, in particular how the dynamics of each group impact the performance of the others. So far, multitask self-aggregation of groups of robots either produces a circular formation that is not fully compact or is not fully autonomous. In this paper, we present a multitask self-aggregation behavior in which groups of homogeneous robots sort themselves into different compact clusters, relying solely on a line-of-sight sensor. Our multitask self-aggregation behavior was able to scale well and achieve a compact formation. We report scalability results from a series of simulation trials with different configurations of the number of groups and the number of robots per group. We were able to improve the performance of multitask self-aggregation in terms of the compactness of the clusters, while maintaining the proportion of clustered robots reported in other studies.
文献中已经很好地记录了在群温机器人中部署简单的突发行为。最近的一项研究显示,在多任务方法中,自我聚合是可能的 – – 在同一环境中同时发生多重自我聚合任务事件。多任务方法提出了新的挑战,特别是每个组群的动态如何影响他人的性能。迄今为止,多任务自聚合机器人组群受到循环形成的影响 – – 并不完全紧凑 – – 或并非完全自主。在本文中,我们展示了多任务自我聚合,即同质机器人组群本身分为不同的紧凑组群,只依靠直观传感器。我们的多任务自聚合行为能够很好地缩放,并实现一个紧凑的形成。我们报告了一系列模拟试验的可缩放性结果,这些模拟试验的组合组群和每个组组群中机器人的数目不同。我们得以根据群集的紧凑性改进多任务自聚合行为表现,将组群集的机器人比例保留在其他研究中。
Article 50
Title@2025-07-18 (5): Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts
Title: Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts | Beyond DNS: Entsperren des Internets von KI-Agenten über den NANDA Index und verifizierte AgentFacts | 超越DNS:通过NANDA指数和经核实的代理活动解锁AI代理商的互联网 2507.14263v1 |
Authors (18): Ramesh Raskar, Pradyumna Chari, John Zinky, Mahesh Lambe, Jared James Grogan, Sichao Wang, Rajesh Ranjan, Rekha Singhal, Shailja Gupta, Robert Lincourt, Raghu Bala, Aditi Joshi, Abhishek Singh, Ayush Chopra, Dimitris Stripelis, Bhuwan B, Sumit Kumar, Maria Gorskikh
The Internet is poised to host billions to trillions of autonomous AI agents that negotiate, delegate, and migrate in milliseconds, along with workloads that will strain DNS-centred identity and discovery. In this paper, we describe the NANDA index architecture, which we envision as a means for discoverability, identifiability and authentication in the internet of AI agents. We present an architecture where a minimal lean index resolves to dynamic, cryptographically verifiable AgentFacts that support multi-endpoint routing, load balancing, privacy-preserving access, and credentialed capability assertions. Our architecture design delivers five concrete guarantees: (1) a quilt-like index proposal under which both NANDA-native agents and third-party agents are discoverable via the index, (2) rapid global resolution for newly spawned AI agents, (3) sub-second revocation and key rotation, (4) schema-validated capability assertions, and (5) privacy-preserving discovery across organisational boundaries via verifiable, least-disclosure queries. We formalize the AgentFacts schema, specify a CRDT-based update protocol, and prototype adaptive resolvers. The result is a lightweight, horizontally scalable foundation that unlocks secure, trust-aware collaboration for the next generation of the Internet of AI agents, without abandoning existing web infrastructure.
互联网准备容纳数十亿至数万亿个自主的AI代理商,这些代理商以毫秒和工作量进行谈判、委托和迁移,从而给以DNS为中心的身份和发现带来压力。在本文中,我们描述了NANDA指数结构,我们设想它作为在AI代理商的互联网上发现、识别和认证的一种手段。我们提出了一个结构,在这个结构中,一个微小的精干指数能够解决动态的、加密的、可核实的、可加密的代理商问题,以及支持多端点路由、负载平衡、隐私保护访问和认证能力声明。我们的建筑设计提供了五种具体保证:(1) 一个类似于QIDT的索引建议,支持NANDA型代理商和第三方代理商通过该索引被发现、(2) 为新生成的AI代理商迅速制定全球解决方案,(3) 二次撤销和关键轮换,(4) 系统化、可验证的能力主张,以及(5) 通过可核查、最不公开的查询,保护隐私的发现跨组织边界的发现。我们正式了AgentFacts 计划,指定了基于CDD的更新协议,以及原型适应式的互联网解决方案式的互联网解决方案,这是一个不具有高度安全的互联网基础。
Article 51
Title@2025-07-18 (5): Towards Regulated Deep Learning
Title: Towards Regulated Deep Learning | Auf dem Weg zu reguliertem Deep Learning | 走向监管的深学习 1912.13122v8 |
Authors (1): Andrés García-Camino
Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) has been a multidisciplinary research topic of the past decade, involving (Physical and Software) Agents and Law from the beginning, and since 2016 it has evolved towards the news-claimed Robot Lawyer. One of the first proposals for restricting the behaviour of Software Agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) Community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of Artificial Neural Networks as Agent-based Training of a special type of regulated Artificial Neural Network that we call Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides $I^*$, a language previously used to model declaratively and extend Electronic Institutions, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs).
多边监管系统(MAS)和标准电子机构(DEI)是过去十年来涉及(物理和软件)代理和法律的多学科研究课题,自始至终一直涉及(物理和软件)代理和法律,但自2016年以来,最近演变为以新闻为名的机器人律师;这些最初限制软件代理行为的建议之一是电子机构;然而,最近将人工神经网络(ANN)改名为深层学习(DL),安全、隐私、伦理和法律问题,这引起了人工智能共同体的关切;由于《MAS条例》几乎得到了正确处理,我们提议将人工神经网络监管作为一种特殊类型的规范的人工神经网络的代理培训,我们称之为机构神经网络(INN)。 本文的主要目的是提请注意人工教学(AT),并给出一个初步答案,表明受校校的深层学习(RDL)的验证执行情况。本文介绍了前概念,并提供了“$I$”,这是以前用来规范示范性教学和扩展机构(Artimais artifical artimal Network)的一种语言,用于示范性教学和扩展机构。
Article 52
Title@2025-07-18 (5): Scalable Submodular Policy Optimization via Pruned Submodularity Graph
Title: Scalable Submodular Policy Optimization via Pruned Submodularity Graph | Skalierbare submodulare Optimierung der Politik über Pruned Submodularity Graph | 通过审慎次模块图实现可缩放子模块政策优化 2507.13834v1 |
Authors (3): Aditi Anand, Suman Banerjee, Dildar Ali
In Reinforcement Learning (abbreviated as RL), an agent interacts with the environment via a set of possible actions, and a reward is generated from some unknown distribution. The task here is to find an optimal set of actions such that the reward after a certain time step is maximized. In a traditional setup, the reward function in an RL Problem is considered additive. However, in reality, there exist many problems, including path planning and coverage control, where the reward function exhibits diminishing returns, which can be modeled as a submodular function. In this paper, we study a variant of the RL Problem where the reward function is submodular, and our objective is to find an optimal policy such that this reward function is maximized. We have proposed a pruned submodularity graph-based approach that provides a provably approximate solution in a feasible computation time. The proposed approach has been analyzed to understand its time and space requirements as well as its performance guarantee. We have experimented with a benchmark agent-environment setup, which has been used in similar previous studies, and the results are reported. From the results, we observe that the policy obtained by our proposed approach leads to more reward than the baseline methods.
在强化学习(以RL为缩放)中,一个代理商通过一系列可能的行动与环境互动,并从一些未知的分布中产生奖赏。这里的任务是找到一套最优的行动,以便在某一时间步骤后获得最大程度的奖赏。在传统的设置中,RL问题的奖赏功能被认为是累加性的。然而,在现实中,奖赏功能随着递减回报而出现许多问题,包括路径规划、覆盖控制等,奖赏功能可以作为子模式功能的模型。在本文中,我们研究了RL问题的变式,奖励功能是子模式的,我们的目标是找到一种最佳的政策,使这一奖赏功能最大化。我们提出了一个纯化的子模式图式方法,在可行的计算时间里提供一种优美的近似近似解决方案。对拟议方法进行了分析,以了解其时间和空间要求以及绩效保证。我们实验了一种基准的代理商-环境设置,用于类似的前研究,结果被报告。我们从结果中看到,我们所拟议的奖赏方法比基线更接近。
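The diminishing-returns property mentioned in the abstract can be illustrated with a small coverage example. The sketch below uses the textbook greedy heuristic on a set-coverage reward, which is a standard submodular function; it is not the pruned submodularity graph method of the paper, and the names `REGIONS`, `coverage`, and `greedy` are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy instance: each action covers a set of grid cells.
REGIONS: Dict[str, Set[int]] = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}


def coverage(chosen: List[str], regions: Dict[str, Set[int]]) -> int:
    """Submodular reward: number of distinct cells covered by the chosen actions."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= regions[name]
    return len(covered)


def greedy(regions: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` actions, each time taking the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(budget):
        best, best_gain = None, 0
        for name in sorted(regions):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = coverage(chosen + [name], regions) - coverage(chosen, regions)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining action adds any reward
            break
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    picked = greedy(REGIONS, budget=2)
    print(picked, coverage(picked, REGIONS))
    # Diminishing returns: "b" is worth less once "a" has been chosen.
    print(coverage(["b"], REGIONS), coverage(["a", "b"], REGIONS) - coverage(["a"], REGIONS))
```

Action "b" adds three cells on its own but only one cell after "a" is selected, which is exactly the behaviour an additive reward cannot express.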
Article 53
Title@2025-07-18 (5): CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education
Title: CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education | CodeEdu: Eine Multi-Agenten-Kollaborative Plattform für personalisierte Coding-Bildung | CodeEdu:个人化编码教育多机构合作平台 2507.13814v1 |
Authors (8): Jianing Zhao, Peng Gao, Jiannong Cao, Zhiyuan Wen, Chen Chen, Jianing Yin, Ruosong Yang, Bo Yuan
Large Language Models (LLMs) have demonstrated considerable potential in improving coding education by providing support for code writing, explanation, and debugging. However, existing LLM-based approaches generally fail to assess students’ abilities, design learning plans, provide personalized material aligned with individual learning goals, and enable interactive learning. Current work mostly uses single LLM agents, which limits their ability to understand complex code repositories and schedule step-by-step tutoring. Recent research has shown that multi-agent LLMs can collaborate to solve complicated problems in various domains like software engineering, but their potential in the field of education remains unexplored. In this work, we introduce CodeEdu, an innovative multi-agent collaborative platform that combines LLMs with tool use to provide proactive and personalized education in coding. Unlike static pipelines, CodeEdu dynamically allocates agents and tasks to meet student needs. Various agents in CodeEdu undertake specific functions, including task planning, personalized material generation, real-time QA, step-by-step tutoring, code execution, debugging, and learning report generation, supported by extensive external tools to improve task efficiency. Automated evaluations reveal that CodeEdu substantially enhances students’ coding performance.
大型语言模型(LLMS)通过为代码写作、解释和调试提供支持,在改进编码教育方面显示出相当大的潜力;然而,现有的LLM方法通常未能评估学生的能力,设计学习计划,提供符合个人学习目标的个性化材料,并促成互动学习;目前的工作大多使用单一LLM代理,这限制了他们理解复杂的代码库和逐步安排辅导的能力;最近的研究表明,多试办LMS可以合作解决软件工程等各个领域的复杂问题,但它们在教育领域的潜力仍未开发;在这项工作中,我们引入了CodeEdu,这是一个创新的多机构协作平台,将LMS与提供主动和个性化教育的工具结合起来用于编码。与静态管道不同,CodEdu动态地分配代理和任务以满足学生的需要。DCEdu的各种代理具体承担了某些职能,包括任务规划、个性化材料生成、实时QA、逐步辅导、代码执行、调试和学习报告生成等,以广泛的外部工具大大改进工作效率。
Article 54
Title@2025-07-18 (5): Acceleration of Gossip Algorithms through the Euler-Poisson-Darboux Equation
Title: Acceleration of Gossip Algorithms through the Euler-Poisson-Darboux Equation | Beschleunigung der Gossip-Algorithmen durch die Euler-Poisson-Darboux-Gleichung | 通过Euler-Poisson-Darboux赤道加速戈斯普算法 2202.10742v2 |
Authors (2): Raphaël Berthier, Mufan Bill Li
Gossip algorithms and their accelerated versions have been studied exclusively in discrete time on graphs. In this work, we take a different approach, and consider the scaling limit of gossip algorithms in both large graphs and large number of iterations. These limits lead to well-known partial differential equations (PDEs) with insightful properties. On lattices, we prove that the non-accelerated gossip algorithm of Boyd et al. [2006] converges to the heat equation, and the accelerated Jacobi polynomial iteration of Berthier et al. [2020] converges to the Euler-Poisson-Darboux (EPD) equation - a damped wave equation. Remarkably, with appropriate parameters, the fundamental solution of the EPD equation has the ideal gossip behaviour: a uniform density over an ellipsoid, whose radius increases at a rate proportional to t - the fastest possible rate for locally communicating gossip algorithms. This is in contrast with the heat equation where the density spreads on a typical scale of $\sqrt{t}$. Additionally, we provide simulations demonstrating that the gossip algorithms are accurately approximated by their limiting PDEs.
Gossip算法及其加速版本此前仅在图上的离散时间框架下得到研究。在这项工作中,我们采取不同的方法,考虑Gossip算法在图规模和迭代次数同时趋于很大时的尺度极限。这些极限导出了性质富有启发性的著名偏微分方程(PDE)。在格点上,我们证明了Boyd等人[2006]的非加速Gossip算法收敛到热方程,而Berthier等人[2020]的加速Jacobi多项式迭代收敛到Euler-Poisson-Darboux(EPD)方程,即一种阻尼波动方程。值得注意的是,在适当的参数下,EPD方程的基本解具有理想的Gossip行为:它是椭球上的均匀密度,其半径以与$t$成正比的速率增长,这是局部通信的Gossip算法所能达到的最快速率。相比之下,热方程中的密度以$\sqrt{t}$的典型尺度扩散。此外,我们还提供了仿真结果,表明Gossip算法可以由其极限PDE精确近似。
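The $\sqrt{t}$ spreading attributed to the heat-equation limit can be illustrated with a small simulation. The following is a minimal sketch, assuming a synchronous "lazy" averaging rule on a cycle as a stand-in for non-accelerated gossip; it is not the randomized pairwise scheme of Boyd et al. nor the simulations reported in the paper, and the function names `gossip_step`, `spread`, and `run` are hypothetical.

```python
def gossip_step(x):
    """One synchronous averaging round on a cycle: each node keeps half of
    its value and receives a quarter from each of its two neighbours."""
    n = len(x)
    return [0.5 * x[i] + 0.25 * (x[(i - 1) % n] + x[(i + 1) % n])
            for i in range(n)]


def spread(x, centre):
    """Standard deviation of node position, treating x as a mass profile."""
    total = sum(x)
    mean = sum((i - centre) * v for i, v in enumerate(x)) / total
    var = sum(((i - centre) - mean) ** 2 * v for i, v in enumerate(x)) / total
    return var ** 0.5


def run(n=401, steps=100):
    """Start with all mass on the centre node and record the spread per round.
    n is chosen large enough that the mass never wraps around the cycle."""
    centre = n // 2
    x = [0.0] * n
    x[centre] = 1.0
    spreads = []
    for _ in range(steps):
        x = gossip_step(x)
        spreads.append(spread(x, centre))
    return x, spreads


x, spreads = run()
# Quadrupling the number of rounds only doubles the spread (diffusive scaling).
print(spreads[24], spreads[99])
```

Under this particular rule each round adds 1/2 to the variance of the mass profile, so the spread after $t$ rounds is $\sqrt{t/2}$: information travels diffusively, in contrast with the linear-in-$t$ radius that the abstract describes for the EPD limit.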
Article 55
Title@2025-07-18 (5): From Firms to Computation: AI Governance and the Evolution of Institutions
Title: From Firms to Computation: AI Governance and the Evolution of Institutions | Von Unternehmen zur Berechnung: KI-Governance und die Entwicklung von Institutionen | 从公司到计算:AI 治理和机构演变 2507.13616v1 |
Authors (1): Michael S. Harre
The integration of agential artificial intelligence into socioeconomic systems requires us to reexamine the evolutionary processes that describe changes in our economic institutions. This article synthesizes three frameworks: multi-level selection theory, Aoki’s view of firms as computational processes, and Ostrom’s design principles for robust institutions. We develop a framework where selection operates concurrently across organizational levels, firms implement distributed inference via game-theoretic architectures, and Ostrom-style rules evolve as alignment mechanisms that address AI-related risks. This synthesis yields a multi-level Price equation expressed over nested games, providing quantitative metrics for how selection and governance co-determine economic outcomes. We examine connections to Acemoglu’s work on inclusive institutions, analyze how institutional structures shape AI deployment, and demonstrate the framework’s explanatory power via case studies. We then propose a set of design principles that operationalize alignment between humans and AI across institutional layers, enabling scalable, adaptive, and inclusive governance of agential AI systems. We conclude with practical policy recommendations and directions for further research to extend these principles into real-world implementation.
将代理人工智能纳入社会经济体系要求我们重新审查描述我们经济体制变化的进化过程。本篇文章综合了三个框架:多层次选择理论、Aoki对企业作为计算过程的看法以及Ostrom的健全机构设计原则。我们制定了一个选择同时在组织层面运作的框架,企业通过游戏理论结构实施分布式推论,而奥斯特龙式规则则随着应对与AI相关的风险的调整机制而演变。这一合成产生了一个多层次的价格方程式,在嵌套式游戏中表达,为选择和治理共同决定经济结果提供了量化指标。我们研究了与Acemoglu关于包容性机构工作的联系,分析了体制结构如何塑造AI的部署,并通过案例研究展示了框架的解释力。我们最后提出了一套设计原则,使人类和AI之间在体制层面之间实现协调,从而能够对代理AI系统进行可扩展、适应性和包容性的治理。我们最后提出了切实可行的政策建议和进一步的研究,将这些原则扩展到现实世界的实施。
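To make the "multi-level Price equation" concrete, the sketch below uses the textbook two-level form without a transmission term, $\bar{w}\,\Delta\bar{z} = \mathrm{Cov}(W_k, Z_k) + \mathrm{E}_k[\mathrm{Cov}_k(w, z)]$, where $k$ indexes groups. This is an illustration under stated assumptions only: it is not the paper's formulation over nested games, the grouping of individuals into "firms" of equal size is assumed, and the data and the name `two_level_price` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])


def two_level_price(groups):
    """groups: equally sized lists of (fitness, trait) pairs, one per firm.

    Returns (between, within, direct): the between-group and within-group
    selection terms, and the change in mean trait computed directly,
    assuming traits are transmitted without bias.
    """
    ws = [[w for w, _ in g] for g in groups]
    zs = [[z for _, z in g] for g in groups]
    flat_w = [w for g in ws for w in g]
    flat_z = [z for g in zs for z in g]
    w_bar = mean(flat_w)
    # Selection among firms: covariance of firm-level mean fitness and trait.
    between = cov([mean(g) for g in ws], [mean(g) for g in zs]) / w_bar
    # Selection inside firms: average of the within-firm covariances.
    within = mean([cov(gw, gz) for gw, gz in zip(ws, zs)]) / w_bar
    # Direct calculation: fitness-weighted mean trait minus the plain mean.
    direct = (sum(w * z for w, z in zip(flat_w, flat_z)) / sum(flat_w)
              - mean(flat_z))
    return between, within, direct


# Made-up illustrative data: two firms with three members each.
groups = [
    [(1.0, 0.2), (2.0, 0.6), (1.5, 0.4)],
    [(0.5, 0.1), (1.0, 0.9), (3.0, 0.7)],
]
between, within, direct = two_level_price(groups)
print(between, within, direct)
```

The two selection terms sum to the directly computed change in the mean trait, which is the sense in which such a decomposition attributes an outcome to selection acting at different organizational levels.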
Article 56
Title@2025-07-17 (4): Nash equilibrium seeking for a class of quadratic-bilinear Wasserstein distributionally robust games
Title: Nash equilibrium seeking for a class of quadratic-bilinear Wasserstein distributionally robust games | Nash-Gleichgewichtssuche für eine Klasse quadratisch-bilinearer, Wasserstein-verteilungsrobuster Spiele | 一类二次-双线性Wasserstein分布鲁棒博弈的纳什均衡求解 2411.09636v2 |
Authors (3): Georgios Pantazis, Reza Rahimi Baghbadorani, Sergio Grammatico
We consider a class of Wasserstein distributionally robust Nash equilibrium problems, where agents construct heterogeneous data-driven Wasserstein ambiguity sets using private samples and radii, in line with their individual risk-averse behaviour. By leveraging relevant properties of this class of games, we show that equilibria of the original seemingly infinite-dimensional problem can be obtained as a solution to a finite-dimensional Nash equilibrium problem. We then reformulate the problem as a finite-dimensional variational inequality and establish the connection between the corresponding solution sets. Our reformulation has scalable behaviour with respect to the data size and maintains a fixed number of constraints, independently of the number of samples. To compute a solution, we leverage two algorithms, based on the golden ratio algorithm. The efficiency of both algorithmic schemes is corroborated through extensive simulation studies on an illustrative example and a stochastic portfolio allocation game, where behavioural coupling among investors is modeled.
我们考虑一类Wasserstein分布鲁棒纳什均衡问题,其中各智能体根据各自的风险规避行为,利用私有样本和半径构建异质的、数据驱动的Wasserstein模糊集。通过利用这类博弈的相关性质,我们证明了原本看似无限维的问题的均衡可以作为一个有限维纳什均衡问题的解而得到。随后,我们将该问题重新表述为有限维变分不等式,并建立了相应解集之间的联系。我们的重构在数据规模方面具有良好的可扩展性,并且约束数量固定,与样本数量无关。为了计算解,我们采用了两种基于黄金比例算法的算法。我们通过对一个示例和一个随机投资组合分配博弈的大量仿真研究验证了两种算法方案的效率,其中对投资者之间的行为耦合进行了建模。
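The abstract states that the equilibrium problem is recast as a finite-dimensional variational inequality and solved with schemes based on the golden ratio algorithm. As a rough illustration, here is a minimal sketch of the basic fixed-step golden ratio iteration for a monotone variational inequality over a box. It is not either of the paper's two algorithms; the toy two-player quadratic-bilinear game, the box bounds, and the names `golden_ratio_vi`, `project_box`, and `F` are assumptions made for the example.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def project_box(z, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(v, lo), hi) for v in z]


def golden_ratio_vi(F, z0, step, lo, hi, iters=300):
    """Fixed-step golden ratio iteration:
    z_bar_k = ((PHI - 1) * z_k + z_bar_{k-1}) / PHI
    z_{k+1} = P(z_bar_k - step * F(z_k))
    """
    z = list(z0)
    z_bar = list(z0)
    for _ in range(iters):
        z_bar = [((PHI - 1) * zi + zb) / PHI for zi, zb in zip(z, z_bar)]
        g = F(z)
        z = project_box([zb - step * gi for zb, gi in zip(z_bar, g)], lo, hi)
    return z


def F(z):
    """Pseudo-gradient of a toy game (hypothetical, for illustration):
    player 1 minimises 0.5*x**2 + x*y - x over x,
    player 2 minimises 0.5*y**2 - x*y - y over y."""
    x, y = z
    return [x + y - 1.0, -x + y - 1.0]


L = math.sqrt(2)  # Lipschitz constant of F for this toy game
solution = golden_ratio_vi(F, [0.0, 0.0], PHI / (2 * L), -5.0, 5.0)
print(solution)  # close to the equilibrium (0, 1), where F vanishes
```

Each iteration needs one evaluation of the operator and one projection, and the step size here is set from the Lipschitz constant of the toy operator.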
Article 57
Title@2025-07-17 (4): Coral Protocol: Open Infrastructure Connecting The Internet of Agents
Title: Coral Protocol: Open Infrastructure Connecting The Internet of Agents | Coral Protocol: Open Infrastructure Connecting Das Internet der Agenten | 珊瑚议定书:开放基础设施连接代理物互联网 2505.00749v2 |
Authors (6): Roman J. Georgio, Caelum Forder, Suman Deb, Andri Rahimov, Peter Carroll, Önder Gürcan
Coral Protocol is an open and decentralized collaboration infrastructure that enables communication, coordination, trust and payments for The Internet of Agents. It addresses the growing need for interoperability in a world where organizations are deploying multiple specialized AI agents that must work together across domains and vendors. As a foundational platform for multi-agent AI ecosystems, Coral establishes a common language and coordination framework allowing any agent to participate in complex workflows with others. Its design emphasizes broad compatibility, security, and vendor neutrality, ensuring that agent interactions are efficient and trustworthy. In particular, Coral introduces standardized messaging formats for agent communication, a modular coordination mechanism for orchestrating multi-agent tasks, and secure team formation capabilities for dynamically assembling trusted groups of agents. Together, these innovations position Coral Protocol as a cornerstone of the emerging “Internet of Agents,” unlocking new levels of automation, collective intelligence, and business value through open agent collaboration.
《珊瑚议定书》是一个开放和分散的协作基础设施,它使代理商的互联网能够进行通信、协调、信任和支付,它解决了在这样一个世界中各组织正在部署多个专门的AI代理商的世界中日益需要互操作性的问题,这些代理商必须在各个领域和供应商之间开展合作。作为多试剂AI生态系统的基础平台,珊瑚岛建立了一个共同的语言和协调框架,允许任何代理商与他人参与复杂的工作流程。它的设计强调广泛的兼容性、安全和供应商中立性,确保代理商的互动是高效和可信赖的。特别是,珊瑚岛为代理商的通信引入了标准化的信息传递格式,这是协调多试剂任务的模块化协调机制,也是动态地集合受信任的代理商团体的团队组建能力。这些创新共同将《珊瑚议定书》定位为新兴的“代理商互联网”的基石,通过开放代理商协作释放新的自动化、集体情报和商业价值。
Article 58
Title@2025-07-17 (4): Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning
Title: Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning | Nachahmen von Fehlern in einem Learning Companion KI Agent für Online Peer Learning | 模拟学习伙伴AI在线同行学习代理的错误 2507.12801v1 |
Authors (2): Sosui Moribe, Taketoshi Ushiama
In recent years, peer learning has gained attention as a method that promotes spontaneous thinking among learners, and its effectiveness has been confirmed by numerous studies. This study aims to develop an AI agent as a learning companion that enables peer learning anytime and anywhere. Peer learning between humans has various limitations, however, and is not always effective: it requires companions at the same proficiency level. In this study, we assume that peers at the same proficiency level as a learner make the same mistakes as that learner, and we use English composition as a specific example to validate this approach.
近年来,同侪学习作为一种促进学习者自发思维的方法,得到了人们的注意,其有效性也得到了许多研究的证实。这项研究的目的是发展一个AI代理作为学习伙伴,使同侪能够随时随地相互学习。然而,人与人之间的同侪学习有各种限制,而且并不总是有效的。有效的同侪学习需要具有相同熟练水平的同伴。在这个研究中,我们认为,与学习者同样的熟练水平的同侪会犯与学习者一样的错误,并侧重于英语组成,作为验证这一方法的具体例子。