• 00 07-24 (4) Probabilistic Concurrent Reasoning in Outcome Logic: Independence, Conditioning, and Invariants Probabilistische Concurrent Reasoning in Outcome Logic: Unabhängigkeit, Konditionierung und Invarianten 结果逻辑的概率并存理由:独立、条件和不稳定 2411.11662v2
  • 01 07-24 Higher-Order Behavioural Conformances via Fibrations Behavioural Conformances höherer Ordnung durch Fibrationen 通过纤维纤维达到较高等级的行为合规 2507.18509v1
  • 02 07-24 Language-Integrated Recursive Queries Sprachintegrierte rekursive Abfragen 语言综合递归查询 2504.02443v2
  • 03 07-24 Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++ Aufbau einer beschleunigten OpenFOAM Proof-of-Concept-Anwendung mit modernem C++ 利用现代C++建立加速的开放有机金融AM系统概念校验应用 2507.18268v1
  • 04 07-23 (3) Multi-Relational Algebra for Multi-Granular Data Analytics Multi-Relationale Algebra für Multi-Granular Data Analytics 多组合数据分析法多关系代数 2311.04824v6
  • 05 07-23 CASCADE: LLM-Powered JavaScript Deobfuscator at Google CASCADE: LLM-Powered JavaScript Deobfuscator bei Google CASCADE: 谷歌的LLM Powered JavaScript Deobfuscator 谷歌的LLM Powered JavaScript Deobfuscator 2507.17691v1
  • 06 07-23 Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees Effiziente Neuralnetzverifizierung durch Order Leading Exploration von Zweig-und-Bound-Bäumen 通过分树和环形树的有序主要勘探进行高效神经网络核查 2507.17453v1
  • 07 07-23 A Diagrammatic Calculus for a Functional Model of Natural Language Semantics Ein diagrammatischer Kalkulus für ein funktionelles Modell der natürlichen Sprachsemantik 自然语言语义学功能模型的图表计算 2507.00782v2
  • 08 07-23 Hiord: An Approach to the Specification and Verification of Higher-Order (C)LP Programs Hiord: Ein Ansatz zur Spezifikation und Überprüfung von Higher-Order (C)LP-Programmen Hiord: 高级命令(C)LP方案规格和核查办法 2507.17233v1
  • 09 07-23 Querying Graph-Relational Data Abfrage von Graph-Relationalen Daten 查询图表关系数据 2507.16089v2
  • 10 07-23 Mapple: A Domain-Specific Language for Mapping Distributed Heterogeneous Parallel Programs Mapple: Eine Domain-spezifische Sprache für Mapping Verteilte Heterogene Parallelprogramme Mapple: 用于测绘分布式异基因平行方案的一种特定域语言 2507.17087v1
  • 11 07-22 (2) Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms Hydra: Virtualisierte Mehrsprachen-Laufzeit für hochdichte serverlose Plattformen Hydal: 高密度无服务器平台虚拟化多语言运行时间 2212.10131v3
  • 12 07-22 Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs Steigerung der Effizienz der Kompileroptimierung durch grammatische Zersetzungen von Control-Flow Graphen 通过控制花形图的外表分解,提高编译者优化效率 2507.16660v1
  • 13 07-22 Hear Your Code Fail, Voice-Assisted Debugging for Python Hören Sie Ihren Code fehlschlagen, Voice-Assisted Debugging für Python 听到您的代码失效, 语音协助调试 Python 的调试 2507.15007v2
  • 14 07-21 (1) Bialgebraic Reasoning on Stateful Languages Bialgebraische Vernunft auf Stateful Languages 国语语比数理由 2503.10955v2
  • 15 07-21 GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities GitChameleon 2.0: Bewertung der KI-Codegenerierung gegen Python Library Version Inkompatibilitäten GitChameleon 2.0:评估AI 与 Python 图书馆版本的不兼容性 2507.12367v2
  • 16 07-21 Understanding Haskell-style Overloading via Open Data and Open Functions Haskell-ähnliches Überladen über offene Daten und offene Funktionen verstehen 通过 Open Data 和 Open 函数超载理解 Haskell 模式 2507.16086v1
  • 17 07-21 Abstracting Extensible Recursive Functions Abstracting Extensible Rekursive Funktionen 抽象化扩展递归函数 2410.11742v2
  • 18 07-21 RightTyper: Effective and Efficient Type Annotation for Python RightTyper: Effektive und effiziente Typ-Annotation für Python RightTyper: Python 有效、高效型号注解 2507.16051v1
  • 19 07-21 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v3
  • 20 07-21 What does it take to certify a conversion checker? Was braucht es, um einen Conversion Checker zu bescheinigen? 需要用什么来证明转换核对器? 2502.15500v2
  • 21 07-21 Closure Conversion, Flat Environments, and the Complexity of Abstract Machines Verschlussumwandlung, flache Umgebungen und die Komplexität abstrakter Maschinen 关闭转换、平坦环境和简易机器的复杂性 2507.15843v1
  • 22 07-21 Formal Analysis of Networked PLC Controllers Interacting with Physical Environments Formale Analyse von vernetzten SPS-Controllern, die mit physikalischen Umgebungen interagieren 对联网PLC主计长与自然环境互动的正式分析 2507.15596v1
  • 23 07-21 Bayesian Separation Logic Bayesische Trennungslogik Bayesian 分离逻辑 2507.15530v1
  • 24 07-21 Quantum Programming in Polylogarithmic Time Quantenprogrammierung in polylogarithmischer Zeit 聚合时间的量子规划 2507.15415v1
  • 25 07-21 Don’t exhaust, don’t waste Nicht ausschöpfen, nicht verschwenden 不要用尽,不要浪费,不要浪费 2507.13792v2
  • 26 07-21 A Few Fit Most: Improving Performance Portability of SGEMM on GPUs using Multi-Versioning Ein paar Fit Most: Verbesserung der Performance-Portabilität von SGEMM auf GPUs mit Multi-Versioning 少数几个最适合:利用多变方法提高SGEMM在GPU上的性能便捷性 2507.15277v1
  • 27 07-20 (7) Invariant Generation for Floating-Point Programs via Constraint Solving Invariante Generierung für Floating-Point-Programme über Constraint Solving 通过约束性溶解为浮动点程序生成变量 2507.15017v1
  • 28 07-19 (6) Timetide: A programming model for logically synchronous distributed systems Timetide: Ein Programmiermodell für logisch synchron verteilte Systeme 时针:逻辑同步分布系统编程模型 2507.14471v1
  • 29 07-18 (5) NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers NPUEval: Optimierung von NPU-Kerneln mit LLMs und Open Source Compilern NPUEval:利用LLMS和开放源码汇编器优化NPU核心内核 2507.14403v1
  • 30 07-18 Towards Regulated Deep Learning Auf dem Weg zu reguliertem Deep Learning 走向监管的深学习 1912.13122v8
  • 31 07-18 AdapTT: Functoriality for Dependent Type Casts AdapTT: Funktorialität für abhängige Typ Casts AdapTT: 依附性种类的杂交性 2507.13774v1
  • 32 07-18 Frex: dependently-typed algebraic simplification Frex: abhängig typisierte algebraische Vereinfachung Frex: 依赖型代数简化 2306.15375v2
  • 33 07-18 Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models Modellierung der Open-World-Kognition als On-Demand-Synthese probabilistischer Modelle 将开放世界的认知建模作为概率模型的 “ 现场合成 “ 模型 2507.12547v2
  • 34 07-18 SAQR-QC: A Logic for Scalable but Approximate Quantitative Reasoning about Quantum Circuits SAQR-QC: Eine Logik für skalierbare, aber annähernd quantitative Begründung von Quantenkreisen SAQR-QC:可缩放但近似量控量用于量子电路的逻辑 2507.13635v1
  • 35 07-18 Extensional and Non-extensional Functions as Processes Erweiterungs- und Nichterweiterungsfunktionen als Prozesse 作为处理过程的扩展和非扩展函数 2405.03536v3
  • 36 07-17 (4) Increasing the Expressiveness of a Gradual Verifier Steigerung der Expressivität eines Gradualen Prüfers 提高逐步验证者的表达性 2507.13533v1
  • 37 07-17 AI-Assisted Fixes to Code Review Comments at Scale AI-Assisted Fixes to Code Review Kommentare auf Scale AI 协助制定标准标准代码审查评论 2507.13499v1
  • 38 07-17 Random Variate Generation with Formal Guarantees Zufallsvariate Generation mit formalen Garantien 具有正式保障的随机流动养殖 2507.13494v1
  • 39 07-17 GPU Performance Portability needs Autotuning GPU Performance Portability benötigt Autotuning GPU 性能表现 便捷性需要自动调节 2505.03780v3
  • 40 07-17 Towards Formal Verification of LLM-Generated Code from Natural Language Prompts Auf dem Weg zu einer formalen Überprüfung des LLM-generierten Codes aus natürlichen Sprachprompts 争取正式核查自然语言提示法LLM产生的守则 2507.13290v1
  • 41 07-17 Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, and COSE Sichere Parsing und Serialisierung mit Separation Logic auf CBOR, CDDL und COSE angewendet 安全分解和序列化,与分离逻辑结合应用到CBOR、CDDL和COSE 2505.17335v2
  • 42 07-17 Formal Verification for JavaScript Regular Expressions: a Proven Semantics and its Applications Formale Überprüfung für JavaScript Reguläre Ausdrücke: eine bewährte Semantik und ihre Anwendungen JavaScript 常规表达式:经验证的语义及其应用的正式验证 2507.13091v1
  • 43 07-17 Syntax Repair as Language Intersection Syntax-Reparatur als Sprach-Intersektion 语法修理作为语言交汇处 2507.11873v2

Article 0

Title@2025-07-24 (4): Probabilistic Concurrent Reasoning in Outcome Logic: Independence, Conditioning, and Invariants

Title: Probabilistic Concurrent Reasoning in Outcome Logic: Independence, Conditioning, and Invariants Probabilistische Concurrent Reasoning in Outcome Logic: Unabhängigkeit, Konditionierung und Invarianten 结果逻辑的概率并存理由:独立、条件和不稳定 2411.11662v2

Authors (3): Noam Zilberstein, Alexandra Silva, Joseph Tassarotti

Although randomization has long been used in distributed computing, formal methods for reasoning about probabilistic concurrent programs have lagged behind. No existing program logics can express specifications about the full distributions of outcomes resulting from programs that are both probabilistic and concurrent. To address this, we introduce Probabilistic Concurrent Outcome Logic (pcOL), which incorporates ideas from concurrent and probabilistic separation logics into Outcome Logic to introduce new compositional reasoning principles. At its core, pcOL reinterprets the rules of Concurrent Separation Logic in a setting where separation models probabilistic independence, so as to compositionally describe joint distributions over variables in concurrent threads. Reasoning about outcomes also proves crucial, as case analysis is often necessary to derive precise information about threads that rely on randomized shared state. We demonstrate pcOL on a variety of examples, including to prove almost sure termination for unbounded loops.

尽管在分布式计算中长期以来一直使用随机化,但关于概率并行程序的正式推理方法却落后于以往。任何现有的程序逻辑都无法对概率性和同时性两种程序所产生的结果的全面分布作出具体说明。为了解决这个问题,我们引入了概率性同时结果逻辑(pcOL),将同时性和概率分离逻辑的理念纳入结果逻辑,以引入新的构成性推理原则。在核心,PcOL在分离模型概率独立的环境中重新解释同时分离逻辑规则,从而在结构上描述对同时线变量的联合分布。结果的理由也证明至关重要,因为案例分析往往有必要获得依赖随机共有状态的线索的准确信息。我们用多种实例展示PcOL,包括证明几乎可以确定无界循环的终止。


Article 1

Title@2025-07-24 (4): Higher-Order Behavioural Conformances via Fibrations

Title: Higher-Order Behavioural Conformances via Fibrations Behavioural Conformances höherer Ordnung durch Fibrationen 通过纤维纤维达到较高等级的行为合规 2507.18509v1

Authors (1): Henning Urbat

Coinduction is a widely used technique for establishing behavioural equivalence of programs in higher-order languages. In recent years, the rise of languages with quantitative (e.g.~probabilistic) features has led to extensions of coinductive methods to more refined types of behavioural conformances, most notably notions of behavioural distance. To guarantee soundness of coinductive reasoning, one needs to show that the behavioural conformance at hand forms a program congruence, i.e. it is suitably compatible with the operations of the language. This is usually achieved by a complex proof technique known as \emph{Howe’s method}, which needs to be carefully adapted to both the specific language and the targeted notion of behavioural conformance. We develop a uniform categorical approach to Howe’s method that features two orthogonal dimensions of abstraction: (1) the underlying higher-order language is modelled by an \emph{abstract higher-order specification} (AHOS), a novel and very general categorical account of operational semantics, and (2) notions of behavioural conformance (such as relations or metrics) are modelled via fibrations over the base category of an AHOS. Our main result is a fundamental congruence theorem at this level of generality: Under natural conditions on the categorical ingredients and the operational rules of a language modelled by an AHOS, the greatest behavioural (bi)conformance on its operational model forms a congruence. We illustrate our theory by deriving congruence of bisimilarity and behavioural pseudometrics for probabilistic higher-order languages.

  1. 最近几年,具有定量(例如~概率)特征的语言的上升导致创造性方法扩展至更精细的行为一致性类型,最明显的是行为距离概念。为了保证硬度推理的正确性,人们需要表明手头的行为一致性是一种与语言操作相容的方案一致性,即它与语言运行相适应。这通常是通过一种称为 emph{Howe’s 方法} 的复杂证据技术实现的。 该技术需要谨慎地适应特定语言和行为一致性的定向概念。我们开发了一种统一的直截了当的方法,其特征是抽象的两个或两个不同的层面:(1) 基本高阶语言以\emph{tract 更高层级的规格为模范(AHOS),一个新型和非常直观的操作语性描述,以及(2) 行为一致性(例如关系或度)的概念,是用最直观的运行性规则的直径直径比,是基础的直线度。

Article 2

Title@2025-07-24 (4): Language-Integrated Recursive Queries

Title: Language-Integrated Recursive Queries Sprachintegrierte rekursive Abfragen 语言综合递归查询 2504.02443v2

Authors (4): Anna Herlihy, Amir Shaikhha, Anastasia Ailamaki, Martin Odersky

Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors - database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using Named-Tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.

SQL本身缺乏固定的语义,因为SQL规格是用自然语言编写的,数据库供应商以不同方式解决的非模糊性。因此,关于递归性 SQL 程序是否准确的推论必须依靠孤立的数学特性来进行查询,而不是通过让计算方法更接近于数据而将统一的正式模型从一种臭名昭著的语义中挤出。为了应对这些挑战,我们建议一种计算法,从嵌入的反复查询中自动获得数学特性,并视数据库的后端情况,拒绝可能导致递归性精度为三类的SQL 标准、数据库供应商以不同方式解决的非模糊性。结果是,关于递归性 SQL 程序是否准确性的理由必须依靠孤立的查询数学属性,而不是通过一个与臭名不相容的语义不一致的语言拼出一个统一的正式模型。为了应对这些挑战,我们建议一种计算法,从嵌入的反复性查询中自动获得数学特性,并取决于数据库的后端点,拒绝可能导致循环性查询质量等级的三个类别,而SQL ,在实际的S-Sral-real-real trueal rodition Scal rodual roud 上显示一个不正确的Silal 。


Article 3

Title@2025-07-24 (4): Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++

Title: Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++ Aufbau einer beschleunigten OpenFOAM Proof-of-Concept-Anwendung mit modernem C++ 利用现代C++建立加速的开放有机金融AM系统概念校验应用 2507.18268v1

Authors (5): Giulio Malenza, Giovanni Stabile, Filippo Spiga, Robert Birke, Marco Aldinucci

The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading manufacturers such as NVIDIA, Intel, and AMD are constantly advancing these architectures, augmenting them with features such as mixed precision, enhanced memory hierarchies, and specialised accelerator silicon blocks (e.g., Tensor Cores on GPU or AMX/SME engines on CPU) to enhance compute performance. At the same time, significant efforts in software development are aimed at optimizing the use of these innovations, seeking to improve usability and accessibility. This work contributes to the state-of-the-art of OpenFOAM development by presenting a working Proof-Of-Concept application built using modern ISO C++ parallel constructs. This approach, combined with an appropriate compiler runtime stack, like the one provided by the NVIDIA HPC SDK, makes it possible to accelerate well-defined kernels, allowing multi-core execution and GPU offloading using a single codebase. The study demonstrates that it is possible to increase the performance of the OpenFOAM laplacianFoam application by offloading the computations on NVIDIA GPUs using the C++ parallel construct.

高性能计算(HPC)的现代趋势涉及与中央处理股(CPU)一起使用图形处理股(GPUs)等加速器来加速各种应用中的数字操作。像NVIDIA、英特尔和AMD等主要制造商一直在不断推进这些结构,以混合精度、增强记忆等级和专用加速器硅合块(例如,GPU或AMX/SME引擎的Tensor Corps)等特征来增强计算性能。与此同时,软件开发方面的重大努力旨在优化这些创新的使用,力求提高可用性和无障碍性。这项工作通过展示使用现代ISO C++ 平行结构构建的工作校验设备应用程序,促进OpFAM开发的状态。这个方法与适当的编译器运行时间堆一起,如由NVIDA HPC SDK提供的编程堆,使得能够利用定义良好的C-BOF内核应用, 显示OFAM的多级性能,从而通过GAM的单个计算工具,通过ODRA的升级来提高运行。


Article 4

Title@2025-07-23 (3): Multi-Relational Algebra for Multi-Granular Data Analytics

Title: Multi-Relational Algebra for Multi-Granular Data Analytics Multi-Relationale Algebra für Multi-Granular Data Analytics 多组合数据分析法多关系代数 2311.04824v6

Authors (5): Xi Wu, Eugene Wu, Zichen Zhu, Fengan Li, Jeffrey F. Naughton

In modern data analytics, analysts frequently face the challenge of searching for desirable entities by evaluating, for each entity, a collection of its feature relations to derive key analytical properties. This search is challenging because the definitions of both entities and their feature relations may span multiple, varying granularities. Existing constructs such as GROUP BY CUBE, GROUP BY GROUPING SETS, ARRAY AGGREGATE, WINDOW functions, OLAP cube, and various data explanation paradigms aim to facilitate such analyses, but all exhibit limitations in terms of composability, clear specifications, and performance. To address these challenges, we introduce Multi-Relational Algebra (MRA), which generalizes relational algebra with two core data abstractions: RelationSpace, for managing collections of relations, and SliceRelation, which structures data around entities with corresponding relation-valued features. MRA introduces a rich set of operators for transforming data between these representations, enabling complex multi-granular analysis in a modular and declarative way. An early version of MRA is in production at Google, supporting diverse data insight applications. This paper describes the motivation for MRA, its formalism, implementation, and future opportunities.

在现代数据分析学中,分析家经常面临寻找适当实体的挑战,方法是对每个实体进行特征关系的收集,以获得关键分析特性。这种搜索具有挑战性,因为两个实体的定义及其特征关系可能涉及多种不同的颗粒。现有的结构,如CUBE集团、Group Brouping SETS、ARRAY AGGGGGETE、WINDOW功能、OLAP 立方体和各种数据解释范式,旨在便利这种分析,但所有在可比较性、明确规格和性能方面都显示出局限性。为了应对这些挑战,我们采用了多关系代数(MRA),我们采用了多关系代数(MRA),它概括了关系代数与两个核心数据抽象概念:RelationSpace,用于管理关系集合,SliceRelation, 和SliceRelation, 将数据围绕具有相应关系价值特征的实体进行。MRA引入了一套丰富的操作者,以模块化和宣示方式进行复杂的多层次分析。MRA的早期版本正在谷里制作,支持不同的数据洞察应用。本文件描述了MRA的动机、正式执行和将来的机会。


Article 5

Title@2025-07-23 (3): CASCADE: LLM-Powered JavaScript Deobfuscator at Google

Title: CASCADE: LLM-Powered JavaScript Deobfuscator at Google CASCADE: LLM-Powered JavaScript Deobfuscator bei Google CASCADE: 谷歌的LLM Powered JavaScript Deobfuscator 谷歌的LLM Powered JavaScript Deobfuscator 2507.17691v1

Authors (4): Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan

Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google’s production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.

软件模糊化,特别是在爪哇史克里普特,阻碍代码理解和分析,对软件测试、静态分析和恶意检测构成重大挑战。本文介绍CASCADE,这是一种新型混合方法,将Gemini的先进编码能力与编译器中级代表(IR),特别是JavaScript IR(JSIR)的确定性转化能力相结合。通过使用Gemini来识别关键前端功能,即最流行的模糊化技术的基本组成部分,利用JSIR进行随后的代码转换,CASCADE有效地回收了原始字符串和API名称等语义元素,并揭示了原始程序行为。这种方法克服了现有静态和动态脱色技术的局限性,消除了数百至数千条硬编码规则,同时实现了可靠性和灵活性。CASCADE已经部署在谷歌的生产环境中,展示了JavaScript deobfuscation效率的大幅改进,并减少了逆向工程努力。


Article 6

Title@2025-07-23 (3): Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees

Title: Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees Effiziente Neuralnetzverifizierung durch Order Leading Exploration von Zweig-und-Bound-Bäumen 通过分树和环形树的有序主要勘探进行高效神经网络核查 2507.17453v1

Authors (7): Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H. M. N. Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui

The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch and bound (BaB) is a “divide-and-conquer” strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that are necessary to be split, it explores the space of these sub-problems in a naive “first-come-first-serve” manner, thereby suffering from an issue of inefficiency to reach a verification conclusion. To bridge this gap, we introduce an order over different sub-problems produced by BaB, concerning with their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to find counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problem and so will not lead to a performance degradation. Specifically, Oliva has two variants, including $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to find counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning over 5 models with datasets MNIST and CIFAR10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25X in MNIST, and up to 80X in CIFAR10.

神经网络对对抗性扰动的脆弱性要求正式的核查技术来严格验证神经网络的质量。 由于神经网络的状态、 分支和约束( BAB) 是一种“ 分解和解析” 战略, 将现成的核查器应用到它们表现较好的子问题。 虽然BAB 可以找出需要分解的子问题, 但是它会探索这些小问题的空间, 以天真的“ 先到先得” 方式解决这些小问题, 从而影响神经网络质量问题, 从而导致无法达成核查结论。 为了弥合这一差距, 我们对BAB 产生的不同子问题, 将现成的校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外。 我们提议一个新的校外校外校外校外校外校外校外校外的校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外, , , , , , 等校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校内的校内的校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外校外的


Article 7

Title@2025-07-23 (3): A Diagrammatic Calculus for a Functional Model of Natural Language Semantics

Title: A Diagrammatic Calculus for a Functional Model of Natural Language Semantics Ein diagrammatischer Kalkulus für ein funktionelles Modell der natürlichen Sprachsemantik 自然语言语义学功能模型的图表计算 2507.00782v2

Authors (1): Matthieu Pierre Boyer

In this paper, we study a functional programming approach to natural language semantics, allowing us to increase the expressiveness of a more traditional denotation style. We will formalize a category based type and effect system to represent the semantic difference between syntactically equivalent expressions. We then construct a diagrammatic calculus to model parsing and handling of effects, providing a method to efficiently compute the denotations for sentences.

在本文中,我们研究自然语言语义学的功能性编程方法,从而使我们能够提高更传统的批注风格的表达性。我们将正式确定一种基于分类的类型和效果系统,以代表语义等同的表达方式之间的语义差异。然后我们构建一个图解计算法,以模拟对语义的解析和处理,从而提供一种有效计算判决批注的方法。


Article 8

Title@2025-07-23 (3): Hiord: An Approach to the Specification and Verification of Higher-Order (C)LP Programs

Title: Hiord: An Approach to the Specification and Verification of Higher-Order (C)LP Programs Hiord: Ein Ansatz zur Spezifikation und Überprüfung von Higher-Order (C)LP-Programmen Hiord: 高级命令(C)LP方案规格和核查办法 2507.17233v1

Authors (5): Marco Ciccalè, Daniel Jurjo-Rivas, Jose F. Morales, Pedro López-García, Manuel V. Hermenegildo

Higher-order constructs enable more expressive and concise code by allowing procedures to be parameterized by other procedures. Assertions allow expressing partial program specifications, which can be verified either at compile time (statically) or run time (dynamically). In higher-order programs, assertions can also describe higher-order arguments. While in the context of (C)LP, run-time verification of higher-order assertions has received some attention, compile-time verification remains relatively unexplored. We propose a novel approach for statically verifying higher-order (C)LP programs with higher-order assertions. Although we use the Ciao assertion language for illustration, our approach is quite general and we believe is applicable to similar contexts. Higher-order arguments are described using predicate properties – a special kind of property which exploits the (Ciao) assertion language. We refine the syntax and semantics of these properties and introduce an abstract criterion to determine conformance to a predicate property at compile time, based on a semantic order relation comparing the predicate property with the predicate assertions. We then show how to handle these properties using an abstract interpretation-based static analyzer for programs with first-order assertions by reducing predicate properties to first-order properties. Finally, we report on a prototype implementation and evaluate it through various examples within the Ciao system.

高级命令的构造允许通过其他程序参数化程序,使得更清晰、更简洁的代码能够使程序得到更清晰、更简洁的代码。 语句允许表达部分程序规格,这些规格可以在编译时(按时计)或运行时(按动态计)加以验证。 在较高顺序的程序中,主张也可以描述较高顺序的参数。 虽然在(C)LP的背景下,对较高顺序的主张的运行时间核查得到一定的注意,但编译时间核查仍然相对不易探索。 我们提出了一个静态地核查较高顺序(C)LP程序并具有较高排序的参数的新办法。 虽然我们使用Ciao的主张语言来说明,但我们的方法相当笼统,而且我们认为适用于类似的情况。 更高级命令参数的描述使用上游特性 – – 一种特殊的属性,利用(Ciao)的主张语言。 我们改进这些属性的语法和语义的定时核查,并引入抽象的标准,以确定在编译时是否符合上游属性与上游产权的比较关系。 我们随后展示如何使用基于基于抽象的定序的定式分析程序处理这些属性,然后通过先通过Sipeal-st分析程序对各种原型数据进行评价。


Article 9

Title@2025-07-23 (3): Querying Graph-Relational Data

Title: Querying Graph-Relational Data Abfrage von Graph-Relationalen Daten 查询图表关系数据 2507.16089v2

Authors (7): Michael J. Sullivan, Zhibo Chen, Elvis Pranskevichus, Robert J. Simmons, Victor Petrovykh, Aljaž Mur Eržen, Yury Selivanov

For applications that store structured data in relational databases, there is an impedance mismatch between the flat representations encouraged by relational data models and the deeply nested information that applications expect to receive. In this work, we present the graph-relational database model, which provides a flexible, compositional, and strongly-typed solution to this “object-relational mismatch.” We formally define the graph-relational database model and present a static and dynamic semantics for queries. In addition, we discuss the realization of the graph-relational database model in EdgeQL, a general-purpose SQL-style query language, and the Gel system, which compiles EdgeQL schemas and queries into PostgreSQL queries. Gel facilitates the kind of object-shaped data manipulation that is frequently provided inefficiently by object-relational mapping (ORM) technologies, while achieving most of the efficiency that comes from writing complex PostgreSQL queries directly.

对于将结构化数据储存到相关数据库中的应用程序,在关系数据模型所鼓励的平方表示式与应用程序预期会收到的深巢信息之间存在着阻力不匹配。 在这项工作中,我们展示了图形关系数据库模型,该模型为“对象关系不匹配”提供了灵活、构成和强型解决方案。我们正式定义了图形关系数据库模型,并为查询提供了静态和动态的语义。此外,我们讨论了在EdgeQL(通用的 SQL 风格查询语言)和Gel 系统中实现图形关系数据库模型模型的问题,Gel 系统汇编了 EgeQL schemas 和查询到 PostgreSQL 查询中。 Gel 方便了对象形数据操作的种类,这种操作常常由对象关系绘图技术低效地提供,同时实现了直接撰写复杂的 PostgreSQL 查询所带来的大部分效率。


Article 10

Title@2025-07-23 (3): Mapple: A Domain-Specific Language for Mapping Distributed Heterogeneous Parallel Programs

Title: Mapple: A Domain-Specific Language for Mapping Distributed Heterogeneous Parallel Programs Mapple: Eine Domain-spezifische Sprache für Mapping Verteilte Heterogene Parallelprogramme Mapple: 用于测绘分布式异基因平行方案的一种特定域语言 2507.17087v1

Authors (6): Anjiang Wei, Rohan Yadav, Hang Song, Wonchan Lee, Ke Wang, Alex Aiken

Optimizing parallel programs for distributed heterogeneous systems remains a complex task, often requiring significant code modifications. Task-based programming systems improve modularity by separating performance decisions from core application logic, but their mapping interfaces are often too low-level. In this work, we introduce Mapple, a high-level, declarative programming interface for mapping distributed applications. Mapple provides transformation primitives to resolve dimensionality mismatches between iteration and processor spaces, including a key primitive, decompose, that helps minimize communication volume. We implement Mapple on top of the Legion runtime by translating Mapple mappers into its low-level C++ interface. Across nine applications, including six matrix multiplication algorithms and three scientific computing workloads, Mapple reduces mapper code size by 14X and enables performance improvements of up to 1.34X over expert-written C++ mappers. In addition, the decompose primitive achieves up to 1.83X improvement over existing dimensionality-resolution heuristics. These results demonstrate that Mapple simplifies the development of high-performance mappers for distributed applications.

优化分布式混杂系统的平行程序仍然是一项复杂的任务,往往需要大量修改代码。基于任务的程序设计系统通过将性能决定与核心应用逻辑区分开来改善模块性,但其绘图界面往往太低。 在这项工作中,我们引入了用于分布式应用程序绘图的高层次、宣示性编程界面Mapple。Mapple提供了解决迭代与处理空间之间维度不匹配的转化原始材料,包括有助于最大限度地减少通信量的关键原始分解材料。我们通过将 Mapple 映像师转换为低水平 C++界面,在Legion 运行时间的顶部安装了 Mapple 。在9个应用程序中,包括6个矩阵倍增算法和3个科学计算工作量中, Mapple 将映像器代码的尺寸减少14X,并使得与专家撰写的 C++ 映像器相比,其性能提高到1.34X。此外,将原始分解使现有维度分辨率超度图像达到1.83X的改进程度。这些结果表明,Mapple 简化了用于分布式应用的高性绘图器的开发。


Article 11

Title@2025-07-22 (2): Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms

Title: Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms Hydra: Virtualisierte Mehrsprachen-Laufzeit für hochdichte serverlose Plattformen Hydal: 高密度无服务器平台虚拟化多语言运行时间 2212.10131v3

Authors (5): Serhii Ivanenko, Vasyl Lanko, Rudi Horn, Vojin Jovanovic, Rodrigo Bruno

Serverless is an attractive computing model that offers seamless scalability and elasticity; it takes the infrastructure management burden away from users and enables a pay-as-you-use billing model. As a result, serverless is becoming increasingly popular to support highly elastic and bursty workloads. However, existing platforms are supported by bloated virtualization stacks, which, combined with bursty and irregular invocations, lead to high memory and latency overheads. To reduce the virtualization stack bloat, we propose Hydra, a virtualized multi-language runtime and platform capable of hosting multiple sandboxes running concurrently. To fully leverage Hydra’s virtualized runtime, we revisit the existing serverless platform design to make it colocation-aware across owners and functions, and to feature a caching layer of pre-allocated Hydra instances that can be used by different functions written in different languages to reduce cold starts. We also propose a snapshotting mechanism to checkpoint and restore individual sandboxes. By consolidating multiple serverless function invocations through Hydra, we improve the overall function density (ops/GB-sec) by 2.41x on average compared to OpenWhisk runtimes, the state-of-the-art single-language runtimes used in most serverless platforms, and by 1.43x on average compared to Knative runtimes supporting invocation colocation within the same function. When reproducing the Azure Functions trace, our serverless platform operating Hydra instances reduces the overall memory footprint by 21.3-43.9% compared to operating OpenWhisk instances and by 14.5-30% compared to operating Knative instances. Hydra eliminates cold starts thanks to the pool of pre-warmed runtime instances, reducing p99 latency by 45.3-375.5x compared to OpenWhisk and by 1.9-51.4x compared to Knative.

无服务器是一种有吸引力的计算机模型,它提供无缝缩放和弹性;它从用户那里带走基础设施管理负担,并能够同时托管多个沙箱。因此,无服务器越来越受欢迎,以支持高度弹性和突发的工作量。然而,现有平台得到一个浮肿的虚拟化堆堆块的支持,这些堆堆堆与破碎和不规则的调试相结合,导致高记忆和延缓性管理。为了减少虚拟化堆叠浮肿,我们提议海德拉,一个虚拟化的多语言运行时间和平台,能够同时托管多个沙箱。为了充分利用海德拉的虚拟化运行时间,我们重新审视现有的无服务器平台设计,使其在拥有者和功能之间合用弹性和爆发性的工作。 3要充分利用海德拉的运行时间,3个虚拟化的多语言运行时间和多功能运行时间,3 将整个功能密度(Opps/GBsec) 降低到运行期间的运行时间平均轨道。


Article 12

Title@2025-07-22 (2): Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs

Title: Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs Steigerung der Effizienz der Kompileroptimierung durch grammatische Zersetzungen von Control-Flow Graphen 通过控制花形图的外表分解,提高编译者优化效率 2507.16660v1

Authors (1): Xuran Cai

This thesis addresses the complexities of compiler optimizations, such as register allocation and Lifetime-optimal Speculative Partial Redundancy Elimination (LOSPRE), which are often handled using tree decomposition algorithms. However, these methods frequently overlook important sparsity aspects of Control Flow Graphs (CFGs) and result in high computational costs. We introduce the SPL (Series-Parallel-Loop) decomposition, a novel framework that offers optimal solutions to these challenges. A key contribution is the formulation of a general solution for Partial Constraint Satisfaction Problems (PCSPs) within graph structures, applied to three optimization problems. First, SPL decomposition enhances register allocation by accurately modeling variable interference graphs, leading to efficient register assignments and improved performance across benchmarks. Second, it optimizes LOSPRE by effectively identifying and eliminating redundancies in program execution. Finally, the thesis focuses on optimizing the placement of bank selection instructions to enhance data retrieval efficiency and reduce latency. Extensive experimentation demonstrates significant performance improvements over existing methods, establishing SPL decomposition as a powerful tool for complex compiler optimizations, including register allocation, LOSPRE, and bank selection.

本文论述汇编器优化的复杂性,例如登记册分配和一生最优化的投机性部分冗余消除(LOSPRE),这些优化往往是用树分解算法处理的,然而,这些方法往往忽略控制流程图的重要聚变方面,导致计算成本高。我们引入了SPL(系列-Parallel-Loop)分解,这是一个为应对这些挑战提供最佳解决办法的新框架。一个关键贡献是制定了图表结构中部分抑制性满意问题的一般性解决方案,应用于三个优化问题。首先,SPL分解通过精确的可变干扰图模型加强登记册分配,导致高效率登记任务和改进各项基准的绩效。第二,通过有效查明和消除方案执行中的冗余,优化LOSPRE。最后,该理论侧重于优化银行选择指令的定位,以提高数据检索效率和减少拉力。广泛实验表明,在现有方法上取得了显著的绩效改进,将SPL解析作为复杂的汇编器选择的有力工具,包括登记册分配。


Article 13

Title@2025-07-22 (2): Hear Your Code Fail, Voice-Assisted Debugging for Python

Title: Hear Your Code Fail, Voice-Assisted Debugging for Python Hören Sie Ihren Code fehlschlagen, Voice-Assisted Debugging für Python 听到您的代码失效, 语音协助调试 Python 的调试 2507.15007v2

Authors (7): Sayed Mahbub Hasan Amiri, Md. Mainul Islam, Mohammad Shakhawat Hossen, Sayed Majhab Hasan Amiri, Mohammad Shawkat Ali Mamun, Sk. Humaun Kabir, Naznin Akter

This research introduces an innovative voice-assisted debugging plugin for Python that transforms silent runtime errors into actionable audible diagnostics. By implementing a global exception hook architecture with pyttsx3 text-to-speech conversion and Tkinter-based GUI visualization, the solution delivers multimodal error feedback through parallel auditory and visual channels. Empirical evaluation demonstrates 37% reduced cognitive load (p<0.01, n=50) compared to traditional stack-trace debugging, while enabling 78% faster error identification through vocalized exception classification and contextualization. The system achieves sub-1.2 second voice latency with under 18% CPU overhead during exception handling, vocalizing error types and consequences while displaying interactive tracebacks with documentation deep links. Criteria validate compatibility across Python 3.7+ environments on Windows, macOS, and Linux platforms. Needing only two lines of integration code, the plugin significantly boosts availability for aesthetically impaired designers and supports multitasking workflows through hands-free error medical diagnosis. Educational applications show particular promise, with pilot studies indicating 45% faster debugging skill acquisition among novice programmers. Future development will incorporate GPT-based repair suggestions and real-time multilingual translation to further advance auditory debugging paradigms. The solution represents a fundamental shift toward human-centric error diagnostics, bridging critical gaps in programming accessibility while establishing new standards for cognitive efficiency in software development workflows.

此项研究为 Python 引入了一个创新的语音辅助调试插件, 将静态运行时间错误转换成可感知的诊断。 通过实施带有 Pyttsx3 文本对语音转换和 Tkinter 图形界面图像化的全球例外钩形结构, 解决方案通过平行的听觉和视觉频道提供多式错误反馈。 经验评估显示, 与传统的书桌调试相比, 认知负载( p< 0.01, n=50) 减少了37% , 同时, 通过声频化例外分类和背景化, 能够更快地识别78%的错误。 该系统在例外处理、 发声错误类型和后果显示互动式追踪结构时, 使用 Python 3. 7+ 环境在 Windows, macOS 和 Linux 平台上, 标准验证兼容性。 只需要两行整合代码, 插件能极大地促进美容受损设计师的可用性, 通过无手错诊断, 支持多重任务流程。 教育应用程序显示了特别的希望, 试点研究显示, 将45% 快速的上下级的上下级读性理解性理解标准, 将快速转换到基础的逻辑化选择性格式化选择中, 格式化的流程中将显示, 格式化选择的流程中将快速转换为快速的逻辑转换到智能选择。


Article 14

Title@2025-07-21 (1): Bialgebraic Reasoning on Stateful Languages

Title: Bialgebraic Reasoning on Stateful Languages Bialgebraische Vernunft auf Stateful Languages 国语语比数理由 2503.10955v2

Authors (5): Sergey Goncharov, Stefan Milius, Lutz Schröder, Stelios Tsampas, Henning Urbat

Reasoning about program equivalence in imperative languages is notoriously challenging, as the presence of states (in the form of variable stores) fundamentally increases the observational power of program terms. The key desideratum for any notion of equivalence is compositionality, guaranteeing that subprograms can be safely replaced by equivalent subprograms regardless of the context. To facilitate compositionality proofs and avoid boilerplate work, one would hope to employ the abstract bialgebraic methods provided by Turi and Plotkin’s powerful theory of mathematical operational semantics (a.k.a. abstract GSOS) or its recent extension by Goncharov et al. to higher-order languages. However, multiple attempts to apply abstract GSOS to stateful languages have thus failed. We propose a novel approach to the operational semantics of stateful languages based on the formal distinction between readers (terms that expect an initial input store before being executed), and writers (running terms that have already been provided with a store). In contrast to earlier work, this style of semantics is fully compatible with abstract GSOS, and we can thus leverage the existing theory to obtain coinductive reasoning techniques. We demonstrate that our approach generates non-trivial compositionality results for stateful languages with first-order and higher-order store and that it flexibly applies to program equivalences at different levels of granularity, such as trace, cost, and natural equivalence.

由于国家的存在(以可变储存的形式)从根本上提高了对程序术语的观察力,因此,以紧迫语言进行方案等同的理论是众所周知的挑战。 任何等同概念的关键偏差在于构成性,保证子方案可以安全地被等同子子方案取代,而不论背景如何。为了便利组成性证明和避免锅炉工作,人们希望采用图里和普洛特金的数学操作语义学(a.k.a.抽象GSOS)的强大理论提供的抽象双地理比值方法,或贡察罗夫等人最近将其扩展至更高等级语言。然而,将抽象的GSS应用到等同语言的多次尝试都失败了。 我们提出了一种新颖的方法,根据正式区分读者(在实施之前预计有一个初始投入存储处的术语)和作家(已经提供的一种持续术语)和作者(与早期工作相比,这种语义学风格与抽象的GSOS完全兼容,因此,我们可以利用现有的理论将抽象的GOS应用到更高等级的自然等同性语言,从而将现有的理论应用到硬度的逻辑结构的自然等等同性思维。


Article 15

Title@2025-07-21 (1): GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Title: GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities GitChameleon 2.0: Bewertung der KI-Codegenerierung gegen Python Library Version Inkompatibilitäten GitChameleon 2.0:评估AI 与 Python 图书馆版本的不兼容性 2507.12367v2

Authors (12): Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task; enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.

软件图书馆的迅速演变为代码生成带来了相当大的障碍,需要不断适应经常更新版本,同时保持后向兼容性。虽然现有的代码演变基准提供了宝贵的洞察力,但它们通常缺乏基于执行的评价,以生成符合特定图书馆版本的代码。为此,我们引入了GitChameleon 2.0,这是一个由328个Python代码完成问题组成的创新和精心整理的数据集,每个系统都以特定图书馆版本为条件,并伴之以可执行的单位测试。GitChameleon 2.0 严格评估当代大型语言模型(LLLMS)、LLM-动力代理、代码助理和RAG系统的能力,以进行基于版本的代码生成,通过执行来显示功能的准确性。我们的广泛评估表明,最先进的系统在这项任务中遇到重大挑战;企业模型在48-51%的代码范围内实现基线成功率,凸显了问题的严重性。通过提供基于执行的基准,强调代码库的动态性质,GitChameleon 2.0 能够更清楚地了解这一挑战,并帮助指导开发更可调整和可信赖的AI代码生成方法。我们在 http://Gsembasimbas/ 公开提供数据设置。


Article 16

Title@2025-07-21 (1): Understanding Haskell-style Overloading via Open Data and Open Functions

Title: Understanding Haskell-style Overloading via Open Data and Open Functions Haskell-ähnliches Überladen über offene Daten und offene Funktionen verstehen 通过 Open Data 和 Open 函数超载理解 Haskell 模式 2507.16086v1

Authors (3): Andrew Marmaduke, Apoorv Ingle, J. Garrett Morris

We present a new, uniform semantics for Haskell-style overloading. We realize our approach in a new core language, System F$\mathrm{D}$, whose metatheory we mechanize in the Lean4 interactive theorem prover. System F$\mathrm{D}$ is distinguished by its open data types and open functions, each given by a collection of instances rather than by a single definition. We show that System F$_\mathrm{D}$ can encode advanced features of Haskell’s of type class systems, more expressively than current semantics of these features, and without assuming additional type equality axioms.

我们为Haskell式超载提供了一种新的统一语义。 我们用一种新的核心语言,即F$-mathrm{D}$系统来认识我们的方法。 我们用它的新的核心语言,即F$-mathrm{D}$系统,在Lean4互动理论验证器中机械化了它的元神话。 系统F$-mathrm{D}$的区别在于它的开放数据类型和开放功能,每个功能都有一系列实例,而不是单一的定义。 我们显示,系统F$-mathrm{D}$可以将Haskell的类系统先进特征编码起来,比这些特征的当前语义更加清晰,并且不假定其他类型的平等轴。


Article 17

Title@2025-07-21 (1): Abstracting Extensible Recursive Functions

Title: Abstracting Extensible Recursive Functions Abstracting Extensible Rekursive Funktionen 抽象化扩展递归函数 2410.11742v2

Authors (4): Alex Hubers, Apoorv Ingle, Andrew Marmaduke, J. Garrett Morris

We explore recursive programming with extensible data types. Row types make the structure of data types first class, and can express a variety of type system features including record subtyping and combination of case branches. Our goal is the modular combination of recursive types and of recursive functions over them. The most significant challenge is in recursive function calls, which may need to account for new cases in a combined type. We introduce extensible histomorphisms, Mendler-style descriptions of recursive functions in which recursive calls can happen at larger types, and show that they provide expressive recursion over extensible data types. We formalize our approach in R$\omega\mu$, a row type theory with support for recursive terms and types.

我们用可扩展的数据类型探索循环编程。 行型类型使数据类型结构成为头等, 并可以表达各种类型系统特征, 包括记录分型和案件分支的组合。 我们的目标是将循环类型和这些类型上的递归功能组合成模块组合。 最重要的挑战在于循环功能呼叫, 可能需要以组合类型对新案例进行核算。 我们引入了可扩展的原型、 门德勒式的循环函数描述, 可以以较大类型进行循环调用, 并显示它们为可扩展的数据类型提供直观的重现。 我们用R$\ omega\mu$正式确定了我们的方法, 这是一种行型理论, 支持循环术语和类型 。


Article 18

Title@2025-07-21 (1): RightTyper: Effective and Efficient Type Annotation for Python

Title: RightTyper: Effective and Efficient Type Annotation for Python RightTyper: Effektive und effiziente Typ-Annotation für Python RightTyper: Python 有效、高效型号注解 2507.16051v1

Authors (2): Juan Altmayer Pizzorno, Emery D. Berger

Python type annotations bring the benefits of static type checking to the language. However, manually writing annotations can be time-consuming and tedious. The result is that most real-world Python code remains largely untyped. Past approaches to annotating types in Python code fall short in a number of ways. Static approaches struggle with dynamic features and infer overly broad types. AI-based methods are inherently unsound and can miss rare or user-defined types. Dynamic methods can impose extreme runtime overheads, degrading performance by up to 270x, abort execution as they exhaust resources, and even infer incorrect types that lead to runtime errors. Crucially, all prior work assumes implicitly that the code to be annotated is already correct. This assumption is generally unwarranted, especially for large codebases that have been untyped. This paper presents RightTyper, a novel approach for Python that overcomes these disadvantages. RightTyper not only generates precise type annotations based on actual program behavior, improving recall in type checking relative to prior approaches. It also turns type checking into anomaly detection, allowing the type checker to identify corner cases that the programmer can audit for unintended behavior. RightTyper is also fast and space-efficient, imposing just 30% performance overhead on average. RightTyper achieves these characteristics by a principled yet pervasive use of sampling–guided by self-profiling–along with statistical filtering and careful resolution and aggregation of type information.

Python 类型描述将静态类型检查的好处带给语言。 但是, 手动写说明可能会耗费时间和繁琐。 结果是, 大多数真实世界 Python 代码仍然基本上没有类型化。 过去对 Python 代码批注类型的方法在许多方面都存在缺陷。 静态方法与动态特性和推断过于宽泛的类型进行斗争。 基于 AI 的方法本质上是不健康的, 可能忽略稀有或用户定义的类型。 动态方法可以强制设置极短运行时间管理、 最高为270x 的性能降低性能, 甚至在耗尽资源时中止执行, 甚至推断出导致运行错误的类型。 很显然, 所有先前的工作都暗含着要附加注释的代码已经是正确的。 这一假设通常没有道理, 特别是对于非类型代码化的大型代码基础。 本文展示了RightTyper 一种克服这些缺点的新型方法。 RightT 不仅根据实际的筛选程序行为产生精确的字型描述, 并且比先前的方法改进了对类型检查的回顾。 它还将类型检查转换成异常性检测, 允许右键式的直截式的直截式的直截式直截式的直径性操作, 将使得直截为直截式的直截式的直截式操作性操作操作操作操作操作, 直截到直截到直截截截截截截截截截下的直截截断式的直截截截断式的操作式的操作式的操作式的直截截截式操作性操作性操作性操作性操作性操作性操作性操作性操作性操作性操作性操作。


Article 19

Title@2025-07-21 (1): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v3

Authors (4): Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao

Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.

在当今的计算环境中,硬件加速器,特别是那些设计用于高压处理的硬件加速器已经变得无处不在。然而,即便在建设数据编纂器方面做出了巨大努力,程序制作这些高压加速器仍然具有挑战性,使得其潜在潜力得不到充分利用。最近,在大量代码方面受过培训的大型语言模型(LLMS)在代码生成和优化任务中表现出了巨大的希望,但在生成像专门的高压加速器代码这样的低资源语言方面仍是一个重大挑战。我们用Autocomp 来应对这一挑战。Autcomp(Autocomp)赋予加速器程序程序程序员通过自动LLOM驱动搜索将域知识和硬件反馈用于优化代码。我们通过下列方法完成这项工作:1)将每个优化的通过结构化的两阶段快速、分为规划和代码生成阶段;2)在规划过程中通过简洁和调整的优化菜单插入域知识,3)将硬件的准确性和性度指标整合成为每次搜索的反馈。在三类代表性工作量和两个不同的内部加速器中,我们展示了自动整合的代码在5.6x(GEMMLA-D-A-D-D-D-D-CAR-D-D-C-C-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-C-C-D-D-C-C-C-C-C-C-C-C-C-C-C-C-D-D-D-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-S-S-S-S-S-S-S-C-C-S-C-C-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S


Article 20

Title@2025-07-21 (1): What does it take to certify a conversion checker?

Title: What does it take to certify a conversion checker? Was braucht es, um einen Conversion Checker zu bescheinigen? 需要用什么来证明转换核对器? 2502.15500v2

Authors (1): Meven Lennon-Bertrand

We report on a detailed exploration of the properties of conversion (definitional equality) in dependent type theory, with the goal of certifying decision procedures for it. While in that context the property of normalisation has attracted the most light, we instead emphasize the importance of injectivity properties, showing that they alone are both crucial and sufficient to certify most desirable properties of conversion checkers. We also explore the certification of a fully untyped conversion checker, with respect to a typed specification, and show that the story is mostly unchanged, although the exact injectivity properties needed are subtly different.

我们在报告中详细探讨了从属类型理论中的转换(定义平等)特性(定义平等),目的是验证其决定程序。 虽然在此情况下,正常化特性吸引了最多的光线,但我们却强调注射特性的重要性,表明只有注射特性既重要,又足以证明转换检查器最可取的特性。 我们还探讨了对完全不型型转换检查器的认证,涉及打印规格,并表明故事基本没有变化,尽管所需的精确注射特性非常不同。


Article 21

Title@2025-07-21 (1): Closure Conversion, Flat Environments, and the Complexity of Abstract Machines

Title: Closure Conversion, Flat Environments, and the Complexity of Abstract Machines Verschlussumwandlung, flache Umgebungen und die Komplexität abstrakter Maschinen 关闭转换、平坦环境和简易机器的复杂性 2507.15843v1

Authors (5): Beniamino Accattoli, Dan Ghica, Giulio Guerrieri, Cláudio Belo Lourenço, Claudio Sacerdoti Coen

Closure conversion is a program transformation at work in compilers for functional languages to turn inner functions into global ones, by building closures pairing the transformed functions with the environment of their free variables. Abstract machines rely on similar and yet different concepts of closures and environments. In this paper, we study the relationship between the two approaches. We adopt a very simple {\lambda}-calculus with tuples as source language and study abstract machines for both the source language and the target of closure conversion. Moreover, we focus on the simple case of flat closures/environments, that is, with no sharing of environments. We provide three contributions. Firstly, a new simple proof technique for the correctness of closure conversion, inspired by abstract machines. Secondly, we show how the closure invariants of the target language allow us to design a new way of handling environments in abstract machines, not suffering the shortcomings of other styles. Thirdly, we study the machines from the point of view of time complexity, adapting analyses by Accattoli and co-authors. We show that closure conversion decreases various dynamic costs while increasing the size of the initial code. Despite these changes, the overall complexity of the machines before and after closure conversion turns out to be the same.

在功能语言编译器中,关闭转换是一种程序转换,使功能语言将内部功能转化为全球功能,方法是建立封闭功能与自由变量的环境相配。抽象机器依靠类似但又不同的封闭和环境概念。在本文件中,我们研究了这两种方法之间的关系。我们采用了一种非常简单的 lambda}计算法,将图例作为源语言,并研究源语言和关闭转换目标的抽象机器。此外,我们侧重于平板封闭/环境的简单案例,即不共享环境。我们提供了三种贡献。首先,一种由抽象机器启发的封闭转换正确性新的简单验证技术。第二,我们展示了目标语言的封闭性如何使我们设计出一种在抽象机器中处理环境的新方法,而不受其他风格缺陷的影响。第三,我们从时间复杂性的角度研究机器,调整Accattoli和共同作者的分析。我们显示关闭转换会降低各种动态成本,同时增加初始代码的大小。尽管这些变化是机器在关闭前的复杂程度,但在关闭之前,但机器的总体复杂性是相同的。


Article 22

Title@2025-07-21 (1): Formal Analysis of Networked PLC Controllers Interacting with Physical Environments

Title: Formal Analysis of Networked PLC Controllers Interacting with Physical Environments Formale Analyse von vernetzten SPS-Controllern, die mit physikalischen Umgebungen interagieren 对联网PLC主计长与自然环境互动的正式分析 2507.15596v1

Authors (2): Jaeseo Lee, Kyungmin Bae

Programmable Logic Controllers (PLCs) are widely used in industrial automation to control physical systems. As PLC applications become increasingly complex, ensuring their correctness is crucial. Existing formal verification techniques focus on individual PLC programs in isolation, often neglecting interactions with physical environments and network communication between controllers. This limitation poses significant challenges in analyzing real-world industrial systems, where continuous dynamics and communication delays play a critical role. In this paper, we present a unified formal framework that integrates discrete PLC semantics, networked communication, and continuous physical behaviors. To mitigate state explosion, we apply partial order reduction, significantly reducing the number of explored states while maintaining correctness. Our framework enables precise analysis of PLC-driven systems with continuous dynamics and networked communication.

在工业自动化中,可编程逻辑控制器(PLC)被广泛用于控制物理系统。随着PLC应用日益复杂,确保其正确性至关重要。现有的正式核查技术侧重于孤立的单个PLC程序,往往忽视与物理环境的相互作用以及控制器之间的网络通信。这一限制在分析现实世界工业系统方面提出了重大挑战,因为在这个系统中,连续的动态和通信延迟起着关键作用。在本文件中,我们提出了一个统一的正式框架,将离散的PLC语义、联网通信和连续的物理行为结合起来。为了减轻国家爆炸,我们采用了部分订单削减,大大减少了被探索的州的数量,同时保持了正确性。我们的框架使得能够精确分析具有连续动态和网络通信的PLC驱动的系统。


Article 23

Title@2025-07-21 (1): Bayesian Separation Logic

Title: Bayesian Separation Logic Bayesische Trennungslogik Bayesian 分离逻辑 2507.15530v1

Authors (3): Shing Hin Ho, Nicolas Wu, Azalea Raad

Bayesian probabilistic programming languages (BPPLs) let users denote statistical models as code while the interpreter infers the posterior distribution. The semantics of BPPLs are usually mathematically complex and unable to reason about desirable properties such as expected values and independence of random variables. To reason about these properties in a non-Bayesian setting, probabilistic separation logics such as PSL and Lilac interpret separating conjunction as probabilistic independence of random variables. However, no existing separation logic can handle Bayesian updating, which is the key distinguishing feature of BPPLs. To close this gap, we introduce Bayesian separation logic (BaSL), a probabilistic separation logic that gives semantics to BPPL. We prove an internal version of Bayes’ theorem using a result in measure theory known as the Rokhlin-Simmons disintegration theorem. Consequently, BaSL can model probabilistic programming concepts such as Bayesian updating, unnormalised distribution, conditional distribution, soft constraint, conjugate prior and improper prior while maintaining modularity via the frame rule. The model of BaSL is based on a novel instantiation of Kripke resource monoid via $\sigma$-finite measure spaces over the Hilbert cube, and the semantics of Hoare triple is compatible with an existing denotational semantics of BPPL based on the category of $s$-finite kernels. Using BaSL, we then prove properties of statistical models such as the expected value of Bayesian coin flip, correlation of random variables in the collider Bayesian network, and the posterior distributions of the burglar alarm model, a parameter estimation algorithm, and the Gaussian mixture model.

BPPL的语义通常在数学上很复杂,无法解释预期值和随机变量的独立性等适当属性。为了在非巴伊西亚环境下解释这些属性,PSL和Lilac等概率分解逻辑将这些属性解释为随机变量的概率独立性。然而,任何现有的分离逻辑都无法将Bayesian 的参数更新作为代码,而这是BPPL的关键特征。为了缩小这一差距,我们引入了BPPLs的语义分解逻辑(BASL),这是BPLs的数学分解逻辑(BASL),这是给BPPLs随机变量的预期值和独立性等合理性属性。我们证明Bayes 的内在版本是测量理论,如Rokhlin-Simmons 空间的概率分解。因此,BSLSL可以将Bayes Ral-ral-ral-lational-lational-loral-loral-loral-loral-lation 概念概念概念概念概念概念概念概念概念模型建模模型模型, 包括BSil-ral-ral-ral-ral-ral-lation-lation-lation-lation-lation-lation-lation-lation-lation-lation-lational-lational-lational-lational-lational-lation-lational-lational-lation-lational-lational-lational-l)的模型, 和Slation-lation-lation-lation-lation-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-l-l-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-ld-l-l-ld-ld-ld-l-ld-ld-ld-ld-l-l-ld-ld-ld-ld-l-l-l-l-l-l-l


Article 24

Title@2025-07-21 (1): Quantum Programming in Polylogarithmic Time

Title: Quantum Programming in Polylogarithmic Time Quantenprogrammierung in polylogarithmischer Zeit 聚合时间的量子规划 2507.15415v1

Authors (4): Florent Ferrari, Emmanuel Hainry, Romain Péchoux, Mário Silva

Polylogarithmic time delineates a relevant notion of feasibility on several classical computational models such as Boolean circuits or parallel random access machines. As far as the quantum paradigm is concerned, this notion yields the complexity class FBQPOLYLOG of functions approximable in polylogarithmic time with a quantum random-access Turing machine. We introduce a quantum programming language with first-order recursive procedures, which provides the first programming-language-based characterization of FBQPOLYLOG. Each program computes a function in FBQPOLYLOG (soundness) and, conversely, each function of this complexity class is computed by a program (completeness). We also provide a compilation strategy from programs to uniform families of quantum circuits of polylogarithmic depth and polynomial size, whose set of computed functions is known as QNC, and recover the well-known separation result FBQPOLYLOG $\subsetneq$ QNC.

多式时间划定了几个古典计算模型(如布林电路或平行随机存取机器)的相关可行性概念。 就量子范式而言,这个概念生成了在量子随机存取图灵机的多式时间中相近功能的复杂级FBQPOLYLOG。我们引入了一种量子编程语言,配有第一级递归程序,为FBQPOLYLOG提供了第一个基于编程语言的特征描述。每个程序在FBQPOLYLOG(音频)中计算了一个函数,反之,这一复杂类的每个函数都由一个程序(完整性)计算。我们还提供了一套汇编战略,从程序到多元深度和多元体积大小的量子电流的统一组,其计算功能的集名为QNC,并回收众所周知的分离结果FBQPOLYLO$\subsetneq QNC。


Article 25

Title@2025-07-21 (1): Don’t exhaust, don’t waste

Title: Don’t exhaust, don’t waste Nicht ausschöpfen, nicht verschwenden 不要用尽,不要浪费,不要浪费 2507.13792v2

Authors (4): Riccardo Bianchini, Francesco Dagnino, Paola Giannini, Elena Zucca

We extend the semantics and type system of a lambda calculus equipped with common constructs to be resource-aware. That is, the semantics keep tracks of the usage of resources, and is stuck, besides in case of type errors, if either a needed resource is exhausted, or a provided resource would be wasted. In such way, the type system guarantees, besides standard soundness, that for well-typed programs there is a computation where no resource gets either exhausted or wasted. The no-waste extension is parametric on an arbitrary grade algebra, modeling an arbitrary assortment of possible usages, and does not require ad-hoc changes to the underlying language. To this end, the semantics needs to be formalized in big-step style; as a consequence, expressing and proving (resource-aware) soundness is challenging, and is achieved by applying recent techniques based on coinductive reasoning.

我们扩展了配有通用结构的羊羔微积分的语义和类型系统, 以成为资源意识。 也就是说, 语义保留着资源使用情况的跟踪, 并且被卡住, 除了类型错误, 如果需要的资源已经用尽, 或者提供的资源将会被浪费。 这样, 类型系统可以保证, 除了标准的稳健性, 对于有型程序来说, 没有资源被耗尽或浪费的计算方法。 无废物扩展是任意的等级代数的参数, 以任意的等级代数为模型, 并且不要求对基本语言进行任意的分类, 并且不需要对基本语言进行临时的改动 。 为此, 语义需要以大步方式正式化; 其结果是, 表达和证明( 资源认知) 健全性具有挑战性, 并且通过应用基于硬体推理的最新技术来实现。


Article 26

Title@2025-07-21 (1): A Few Fit Most: Improving Performance Portability of SGEMM on GPUs using Multi-Versioning

Title: A Few Fit Most: Improving Performance Portability of SGEMM on GPUs using Multi-Versioning Ein paar Fit Most: Verbesserung der Performance-Portabilität von SGEMM auf GPUs mit Multi-Versioning 少数几个最适合:利用多变方法提高SGEMM在GPU上的性能便捷性 2507.15277v1

Authors (2): Robert Hochgraf, Sreepathi Pai

Hand-optimizing linear algebra kernels for different GPU devices and applications is complex and labor-intensive. Instead, many developers use automatic performance tuning (autotuning) to achieve high performance on a variety of devices. However, autotuning “overfits”, and must be redone if any part of the environment changes, such as if the device or input characteristics change. In most non-trivial cases, a single compute kernel cannot maintain near-optimal performance across all environments. Changing the kernel to specialize it to the current execution environment is possible, but on GPUs, runtime tuning and compilation can be expensive. In this work, we use multi-versioning – producing several variants of the same code – as a way to generate performance portable code. We describe a framework called portability tuning that can automatically generate multi-versioned code whose performance is portable, requiring no retuning. We evaluate our framework on a dataset of execution times for GEMM kernels from the CLBlast linear algebra library. We find our portability tuning techniques outperform CLBlast’s default kernels – often approaching within 10% of the theoretical maximum performance – despite CLBlast using autotuning techniques. Further, we find that our generated programs generalize well to new and unseen devices, matching the performance of autotuning without ever portability tuning for those devices.

对不同的 GPU 设备和应用程序进行手操作操作的线性代数内核内核,是复杂和劳动密集型的。相反,许多开发者使用自动性能调调(自动调试)来实现各种装置的高性能。然而,如果环境变化的任何部分,例如设备或输入特性的变化,自动调“功能”必须重新显示,如果环境变化的任何部分,例如设备或输入特性的变化,则必须重新显示。在大多数非三角案例中,单个计算式内核无法在所有环境中保持接近最佳的性能。将内核专门改为当前执行环境是可能的,但在GPU、运行时间调试和编译方面费用可能很高。在这项工作中,我们使用多功能转换 – – 产生若干变异的同一代码,作为生成性能便携式代码的一种方法。我们描述一个称为可移植调制的框架,可以自动生成多版本的性能,不需要再调整。我们用GEMM 不断调试的内核调的内核设备执行时间的架构。我们发现,在GLBLT 线性平调新升升调库库库库库库库库库中,我们发现我们的可移植性调技术可能超越了CLLLB 。我们内部的可移植性技术,尽管CLLLLLLLLLLLLLLLLLD 默认性能生成的性能生成的性能生成的性能生成了10的自动操作程序。


Article 27

Title@2025-07-20 (7): Invariant Generation for Floating-Point Programs via Constraint Solving

Title: Invariant Generation for Floating-Point Programs via Constraint Solving Invariante Generierung für Floating-Point-Programme über Constraint Solving 通过约束性溶解为浮动点程序生成变量 2507.15017v1

Authors (3): Xuran Cai, Liqian Chen, Hongfei Fu

In numeric-intensive computations, it is well known that the execution of floating-point programs is imprecise as floating point arithmetics (e.g., addition, subtraction, multiplication, division, etc.) incurs rounding errors. Albeit the rounding error is small for every single floating-point operation, the aggregation of such error in multiple operations may be dramatic and cause catastrophic program failures. Therefore, to ensure the correctness of floating-point programs, the effect of floating point error needs to be carefully taken into account. In this work, we consider the invariant generation for floating point programs, whose aim is to generate tight invariants under the perturbation of floating point errors. Our main contribution is a theoretical framework on how to apply constraint solving methods to address the invariant generation problem. In our framework, we propose a novel combination between the first-order differential characterization by FPTaylor (TOPLAS 2018) and constraint solving methods, aiming to reduce the computational burden of constraint solving. Moreover, we devise two polynomial invariant generation algorithms to instantiate the framework. The first algorithm is applicable to a wide range of floating-point operations but requires an initial (coarse) invariant as external input, while the second does not require an initial invariant but is limited to polynomial programs. Furthermore, we show how conditional branches, a difficult issue in floating-point analysis, can be handled in our framework. Experimental results show that our algorithms outperform SOTA approaches in both the time efficiency and the precision of the generated invariants over a variety of benchmarks.

在数字密集计算中,众所周知,浮动点程序的执行不精确,因为浮动点计算(例如,添加、减、倍增、分等)会产生四舍五入的错误。尽管四舍五入对于每个浮点操作来说都是很小的,但在多个操作中,这种差错的汇总可能是戏剧性的,并造成灾难性的程序失败。因此,为了确保浮动点程序正确性,需要仔细考虑浮动点错误的影响。在这项工作中,我们认为浮动点程序产生的变异性生成的目的是在浮动点误差的扭曲值下产生紧凑的变异性。我们的主要贡献是一个理论框架,说明如何运用限制解算方法来解决波动点生成的问题。在我们的框架中,我们建议将FPTASaylor(TOPL2018)的第一阶差分和限制解算方法结合起来,目的是减少计算性解缩压。此外,我们为浮点点的浮点程序设计了两种多变量生成算法,目的是在浮点点误差值框架的初始值下产生紧凑紧的变数。我们的第一个算法是用来在初始框架中显示一个伸缩的伸缩操作,同时显示一个伸缩的伸缩的伸缩的伸缩程序,同时需要显示一个伸缩的伸缩的伸缩的伸缩的伸缩的伸缩的伸缩的伸缩。


Article 28

Title@2025-07-19 (6): Timetide: A programming model for logically synchronous distributed systems

Title: Timetide: A programming model for logically synchronous distributed systems Timetide: Ein Programmiermodell für logisch synchron verteilte Systeme 时针:逻辑同步分布系统编程模型 2507.14471v1

Authors (5): Logan Kenwright, Partha Roop, Nathan Allen, Călin Caşcaval, Avinash Malik

Massive strides in deterministic models have been made using synchronous languages. They are mainly focused on centralised applications, as the traditional approach is to compile away the concurrency. Time triggered languages such as Giotto and Lingua Franca are suitable for distribution albeit that they rely on expensive physical clock synchronisation, which is both expensive and may suffer from scalability. Hence, deterministic programming of distributed systems remains challenging. We address the challenges of deterministic distribution by developing a novel multiclock semantics of synchronous programs. The developed semantics is amenable to seamless distribution. Moreover, our programming model, Timetide, alleviates the need for physical clock synchronisation by building on the recently proposed logical synchrony model for distributed systems. We discuss the important aspects of distributing computation, such as network communication delays, and explore the formal verification of Timetide programs. To the best of our knowledge, Timetide is the first multiclock synchronous language that is both amenable to distribution and formal verification without the need for physical clock synchronisation or clock gating.

使用同步语言在确定型模式上取得了巨大进步。 它们主要集中于集中化应用, 因为传统方法是将货币编译。 时间触发语言, 如 Giotto 和 Lingua Franca 适合发行, 尽管它们依赖昂贵的物理时钟同步化, 这既昂贵又可能受到可缩放的影响。 因此, 分布式系统的确定型编程仍然具有挑战性。 我们通过开发同步程序的新颖的多小时语调来应对确定型分布的挑战。 发达的语义可以无缝地分布。 此外, 我们的编程模型, Timetide , 通过建立最近提议的分布式系统逻辑同步模式, 来缓解物理时钟同步化的需要。 我们讨论计算分配的重要方面, 例如网络通信延迟, 并探索时间化程序的正式验证 。 根据我们的知识, Timede 是第一种多小时同步语言, 既可以分发, 也不需要物理时钟同步或时钟调节, 也无需正式核实。


Article 29

Title@2025-07-18 (5): NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

Title: NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers NPUEval: Optimierung von NPU-Kerneln mit LLMs und Open Source Compilern NPUEval:利用LLMS和开放源码汇编器优化NPU核心内核 2507.14403v1

Authors (2): Sarunas Kalade, Graham Schelle

Neural processing units (NPUs) are gaining prominence in power-sensitive devices like client devices, with AI PCs being defined by their inclusion of these specialized processors. Running AI workloads efficiently on these devices requires libraries of optimized kernels. Creating efficient kernels demands expertise in domain-specific C++ with vector intrinsics and in-depth knowledge of the target architecture. Unlike GPU programming, which has had years to mature, NPU programming is new, with smaller and more fragmented developer communities across hardware platforms. This fragmentation poses a challenge when utilizing LLMs to assist in writing NPU kernels, as domain-specific optimized code examples are underrepresented in LLM pre-training data. In this paper we introduce NPUEval – a benchmark for writing and evaluating NPU kernels, consisting of 102 common operators for machine learning workloads. We evaluate LLM generated code on actual hardware based on both functional correctness and vectorization efficiency using open source compiler tools targeting the AMD NPU. We evaluate a range of state-of-the-art LLMs with a mix of proprietary and open-weight models. Latest reasoning models like DeepSeek R1, show promising results achieving out-of-the-box 50%+ vectorization on select kernels. However, the average score across the entire dataset remains roughly 10% even with compiler feedback and vectorized kernel examples – showing that this is a challenging dataset even for frontier models. The dataset and evaluation code will be released with a permissive open source license, providing an essential benchmark for advancing research in code generation and NPU kernel optimization.

神经处理器( NPU) 正在客户端设备等对电力敏感的设备中占据显要地位, 客户端设备为AI PC 定义了这些专门处理器。 高效运行 AI 工作量需要优化内核库。 创建高效的内核需要特定域 C+ 的专业知识, 包括矢量固有和对目标结构的深入了解。 与GPU 程序编制程序不同, 它已经成熟多年, NPU 编程是新的, 其硬件平台的开发者群体规模较小、 更加分散。 当利用LLMS 协助编写 NPU 内核内核时, 是一个挑战, 因为特定域的优化代码实例在 LLM 培训前的数据中代表不足。 在此文件中我们引入 NPUUEval – – 用于撰写和评价特定CUP 内 C++, 由102个共同操作者组成, 用于机器学习工作量的通用操作者。 我们用开放源码编程和矢量效率来评价实际硬件的编码。 我们评估了一系列最先进的LMM , , 和公开和开放的离心室的离心室的离心的流的离心的流数据流模型将显示一个有预的升级的升级的升级的升级的升级的计算结果。


Article 30

Title@2025-07-18 (5): Towards Regulated Deep Learning

Title: Towards Regulated Deep Learning Auf dem Weg zu reguliertem Deep Learning 走向监管的深学习 1912.13122v8

Authors (1): Andrés García-Camino

Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade involving (Physical and Software) Agents and Law since the beginning, but recently evolved towards News-claimed Robot Lawyer since 2016. One of these first proposals of restricting the behaviour of Software Agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), Security, Privacy,Ethical and Legal issues regarding the use of DL has raised concerns in the Artificial Intelligence (AI) Community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of Artificial Neural Networks as Agent-based Training of a special type of regulated Artificial Neural Network that we call Institutional Neural Network (INN).The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provide $I^*$, a language previously used to model declaratively and extend Electronic Institutions, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs)

多边监管系统(MAS)和标准电子机构(DEI)是过去十年来涉及(物理和软件)代理和法律的多学科研究课题,自始至终一直涉及(物理和软件)代理和法律,但自2016年以来,最近演变为以新闻为名的机器人律师;这些最初限制软件代理行为的建议之一是电子机构;然而,最近将人工神经网络(ANN)改名为深层学习(DL),安全、隐私、伦理和法律问题,这引起了人工智能共同体的关切;由于《MAS条例》几乎得到了正确处理,我们提议将人工神经网络监管作为一种特殊类型的规范的人工神经网络的代理培训,我们称之为机构神经网络(INN)。 本文的主要目的是提请注意人工教学(AT),并给出一个初步答案,表明受校校的深层学习(RDL)的验证执行情况。本文介绍了前概念,并提供了“$I$”,这是以前用来规范示范性教学和扩展机构(Artimais artifical artimal Network)的一种语言,用于示范性教学和扩展机构。


Article 31

Title@2025-07-18 (5): AdapTT: Functoriality for Dependent Type Casts

Title: AdapTT: Functoriality for Dependent Type Casts AdapTT: Funktorialität für abhängige Typ Casts AdapTT: 依附性种类的杂交性 2507.13774v1

Authors (4): Arthur Adjedj, Meven Lennon-Bertrand, Thibaut Benjamin, Kenji Maillard

The ability to cast values between related types is a leitmotiv of many flavors of dependent type theory, such as observational type theories, subtyping, or cast calculi for gradual typing. These casts all exhibit a common structural behavior that boils down to the pervasive functoriality of type formers. We propose and extensively study a type theory, called AdapTT, which makes systematic and precise this idea of functorial type formers, with respect to an abstract notion of adapters relating types. Leveraging descriptions for functorial inductive types in AdapTT, we derive structural laws for type casts on general inductive type formers.

在相关类型之间抛出价值的能力是多种依赖类型理论口味的精髓,例如观察型理论、亚型或为逐步打字而投放计算器。 这些都展示了一种共同的结构行为,归结为典型前型的普及调料。 我们提出并广泛研究一种类型理论,叫做AdapTT, 它将调料型前型的概念系统而精确地纳入一个与类型有关的抽象调料器概念。 我们利用AdapTT中调料导型的描述,我们为普通导料型前型的排料制定结构性法律。


Article 32

Title@2025-07-18 (5): Frex: dependently-typed algebraic simplification

Title: Frex: dependently-typed algebraic simplification Frex: abhängig typisierte algebraische Vereinfachung Frex: 依赖型代数简化 2306.15375v2

Authors (5): Guillaume Allais, Edwin Brady, Nathan Corbyn, Ohad Kammar, Jeremy Yallop

We present a new design for an algebraic simplification library structured around concepts from universal algebra: theories, models, homomorphisms, and universal properties of free algebras and free extensions of algebras. The library’s dependently typed interface guarantees that both built-in and user-defined simplification modules are terminating, sound, and complete with respect to a well-specified class of equations. We have implemented the design in the Idris 2 and Agda dependently typed programming languages and shown that it supports modular extension to new theories, proof extraction and certification, goal extraction via reflection, and interactive development.

我们提出一个新的代数简化图书馆设计,它围绕来自通用代数的概念:理论、模型、同质法以及自由代数和自由代数扩展的通用特性。 图书馆的自发输入界面保证了内置和用户定义的简化模块在明确指定的方程式类别方面都正在终止、健全和完整。 我们已经在伊德里斯2和阿格达格式化编程语言中实施了设计,并表明它支持将模块扩展至新理论、证据提取和认证、通过反射目标提取和互动开发。


Article 33

Title@2025-07-18 (5): Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models

Title: Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models Modellierung der Open-World-Kognition als On-Demand-Synthese probabilistischer Modelle 将开放世界的认知建模作为概率模型的 “ 现场合成 “ 模型 2507.12547v2

Authors (11): Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy O’Donnell, Alexander K. Lew, Jacob D. Andreas, Joshua B. Tenenbaum, Tyler Brooke-Wilson

When faced with novel situations, people are able to marshal relevant considerations from a wide range of background knowledge and put these to use in inferences and predictions. What permits us to draw in globally relevant information and reason over it coherently? Here, we explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea – a ``Model Synthesis Architecture’’ (MSA) – using language models to implement global relevance-based retrieval and model synthesis and probabilistic programs to implement bespoke, coherent world models. We evaluate our MSA as a model of human judgments on a novel reasoning dataset. The dataset – built around a Model Olympics domain of sports vignettes – tests models’ capacity for human-like, open-ended reasoning by requiring (i) judgments about novel causal structures described in language; (ii) drawing on large bodies of background knowledge; and (iii) doing both in light of observations that introduce arbitrary novel variables. Our MSA approach captures human judgments better than language model-only baselines, under both direct and chain-of-thought generations from the LM that supports model synthesis. These results suggest that MSAs can be implemented in a way that mirrors people’s ability to deliver locally coherent reasoning over globally relevant variables, offering a path to understanding and replicating human reasoning in open-ended domains.

在面临新情况时,人们能够从广泛的背景知识中汇集相关考虑,并将这些因素用于推论和预测。什么能使我们在全球范围内以一致的方式获取相关信息和理性?在这里,我们探讨一种假设,即人们使用分布式和象征式的表达方式相结合来构建符合新情况的精神模型;我们建议对这一概念进行计算实施 – – “ 模型综合综合架构 “ (MSA) – – 使用语言模型来实施基于全球相关性的检索和模型合成,以及概率化方案,以实施直观、一致的世界模型。我们评估我们的特派任务生活津贴,将其作为人类对新推理数据集的判断模型。数据集 – – 围绕“奥林匹克运动”运动“体育名流域 – – 测试模型” 来为人性、开放型的推理能力而构建。我们建议(一) 判断语言中描述的新因果结构;(二) 借鉴大量的背景知识;以及(三) 结合引入任意的新变数的观察,既能反映人类的人类的判断,又能更好理解语言模型的基线,在直接和连锁推理学能力下,能够从全球推理推算出人类的推理结果。


Article 34

Title@2025-07-18 (5): SAQR-QC: A Logic for Scalable but Approximate Quantitative Reasoning about Quantum Circuits

Title: SAQR-QC: A Logic for Scalable but Approximate Quantitative Reasoning about Quantum Circuits SAQR-QC: Eine Logik für skalierbare, aber annähernd quantitative Begründung von Quantenkreisen SAQR-QC:可缩放但近似量控量用于量子电路的逻辑 2507.13635v1

Authors (3): Nengkun Yu, Jens Palsberg, Thomas Reps

Reasoning about quantum programs remains a fundamental challenge, regardless of the programming model or computational paradigm. Despite extensive research, existing verification techniques are insufficient–even for quantum circuits, a deliberately restricted model that lacks classical control, but still underpins many current quantum algorithms. Many existing formal methods require exponential time and space to represent and manipulate (representations of) assertions and judgments, making them impractical for quantum circuits with many qubits. This paper presents a logic for reasoning in such settings, called SAQR-QC. The logic supports Scalable but Approximate Quantitative Reasoning about Quantum Circuits, whence the name. SAQR-QC has three characteristics: (i) some (deliberate) loss of precision is built into it; (ii) it has a mechanism to help the accumulated loss of precision during a sequence of reasoning steps remain small; and (iii) most importantly, to make reasoning scalable, all reasoning steps are local–i.e., they each involve just a small number of qubits. We demonstrate the effectiveness of SAQR-QC via two case studies: the verification of GHZ circuits involving non-Clifford gates, and the analysis of quantum phase estimation–a core subroutine in Shor’s factoring algorithm.

尽管进行了广泛的研究,但现有的核查技术对于量子电路来说甚至不够充分,而量子电路是一种蓄意限制的模式,缺乏古典的控制,但仍支撑着目前许多量子算法。许多现有的正规方法需要指数化的时间和空间来代表并操纵(代表)断言和判断,使量子电路在许多方位上不切实际。本文提出了在这种环境下进行推理的逻辑,称为SAQR-QC。逻辑支持可缩放但近似量化的量子电路,其名称是何方。SAQR-QC有三个特点:(一) 某些(故意)精确度的丧失已经嵌入其中;(二) 它有一个机制来帮助在一系列推理步骤中累积的精确度损失仍然很小;(三) 最重要的是,使推理具有可缩放的逻辑,所有推理步骤都是本地的。我们通过两个案例研究展示了SAQR-QC的有效性:SQR-Qrus Qalveals 包括SQral-z的次级分析。


Article 35

Title@2025-07-18 (5): Extensional and Non-extensional Functions as Processes

Title: Extensional and Non-extensional Functions as Processes Erweiterungs- und Nichterweiterungsfunktionen als Prozesse 作为处理过程的扩展和非扩展函数 2405.03536v3

Authors (2): Ken Sakayori, Davide Sangiorgi

Following Milner’s seminal paper, the representation of functions as processes has received considerable attention. For pure $\lambda$-calculus, the process representations yield (at best) non-extensional $\lambda $-theories (i.e., $\beta$ rule holds, whereas $\eta$ does not). In the paper, we study how to obtain extensional representations, and how to move between extensional and non-extensional representations. Using Internal $\pi$, $\mathrm{I}\pi$ (a subset of the $\pi$-calculus in which all outputs are bound), we develop a refinement of Milner’s original encoding of functions as processes that is parametric on certain abstract components called wires. These are, intuitively, processes whose task is to connect two end-point channels. We show that when a few algebraic properties of wires hold, the encoding yields a $\lambda$-theory. Exploiting the symmetries and dualities of $\mathrm{I}\pi$, we isolate three main classes of wires. The first two have a sequential behaviour and are dual of each other; the third has a parallel behaviour and is the dual of itself. We show the adoption of the parallel wires yields an extensional $\lambda$-theory; in fact, it yields an equality that coincides with that of B"ohm trees with infinite $\eta$. In contrast, the other two classes of wires yield non-extensional $\lambda$-theories whose equalities are those of the L'evy-Longo and B"ohm trees.

遵循 Milner 的原始文件, 函数作为进程表达得到相当的注意 。 对于纯 $\ lambda $- calcululs 的计算, 进程表达方式产生( 最多) 非扩展 $\ lambda $ 美元理论( 即 $\ beta$ 规则持有, 而不是 $\ a $ 美元 ) 。 在文件中, 我们研究如何获得扩展表达方式, 以及如何在扩展和非扩展表达方式之间移动 。 使用 $\ pi, $\ materm { Ipi $ ( $- 美元- calcallulus 的分类 ) 。 我们开发了 Milner 最初的元编码( $- lambda $ 美元 ) ( 美元 美元- 美元 ) ( 美元- 美元 ) 的计算方式, 以 美元- 美元 美元 的双线表达方式 。


Article 36

Title@2025-07-17 (4): Increasing the Expressiveness of a Gradual Verifier

Title: Increasing the Expressiveness of a Gradual Verifier Steigerung der Expressivität eines Gradualen Prüfers 提高逐步验证者的表达性 2507.13533v1

Authors (1): Priyam Gupta

Static verification provides strong correctness guarantees for code; however, fully specifying programs for static verification is a complex, burdensome process for users. Gradual verification was introduced to make this process easier by supporting the verification of partially specified programs. The only currently working gradual verifier, Gradual C0, successfully verifies heap manipulating programs, but lacks expressiveness in its specification language. This paper describes the design and implementation of an extension to Gradual C0 that supports unfolding expressions, which allow more intuitive specifications of recursive heap data structures.

静态核查为代码提供了强有力的正确性保障; 然而, 完全指定静态核查程序对于用户来说是一个复杂、 繁琐的过程。 引入了渐进性核查是为了通过支持部分指定的程序的核查使这一过程更加容易。 目前唯一正在运行的渐进式核查器“ 渐进式 C0 ” 成功验证了堆积操纵程序, 但其规格语言缺乏清晰度。 本文描述了用于支持正在展开的表达方式的渐进式 C0 扩展的设计和实施, 允许对循环式堆积数据结构进行更直观的规范 。


Article 37

Title@2025-07-17 (4): AI-Assisted Fixes to Code Review Comments at Scale

Title: AI-Assisted Fixes to Code Review Comments at Scale AI-Assisted Fixes to Code Review Kommentare auf Scale AI 协助制定标准标准代码审查评论 2507.13499v1

Authors (10): Chandra Maddila, Negar Ghorbani, James Saindon, Parth Thakkar, Vijayaraghavan Murali, Rui Abreu, Jingyue Shen, Brian Zhou, Nachiappan Nagappan, Peter C. Rigby

Aim. There are 10s of thousands of code review comments each week at Meta. We developed Metamate for Code Review (MetaMateCR) that provides AI-assisted fixes for reviewer comments in production at scale. Method. We developed an internal benchmark of 64k <review comment, patch> data points to fine-tune Llama models. Once our models achieve reasonable offline results, we roll them into production. To ensure that our AI-assisted fixes do not negatively impact the time it takes to do code reviews, we conduct randomized controlled safety trials as well as full production experiments. Offline Results. As a baseline, we compare GPT-4o to our small and large Llama models. In offline results, our LargeLSFT model creates an exact match patch 68% of the time outperforming GPT-4o by 9 percentage points (pp). The internal models also use more modern Hack functions when compared to the PHP functions suggested by GPT-4o. Safety Trial. When we roll MetaMateCR into production in a safety trial that compares no AI patches with AI patch suggestions, we see a large regression with reviewers taking over 5% longer to conduct reviews. After investigation, we modify the UX to only show authors the AI patches, and see no regressions in the time for reviews. Production. When we roll LargeLSFT into production, we see an ActionableToApplied rate of 19.7%, which is a 9.2pp improvement over GPT-4o. Our results illustrate the importance of safety trials in ensuring that AI does not inadvertently slow down engineers, and a successful review comment to AI patch product running at scale.

目标 : 每星期在梅塔有10 000个代码审查评论。 我们开发了代码审查的Metamate(MetamateCR) , 提供了在规模生产中进行审评评论的AI协助修正。 方法 。 我们开发了64k < 审评评论, 修补了数据点以微调Llama模型。 一旦我们的模型实现了合理的离线结果, 我们就会将其输入到生产中。 为了确保我们的AI协助修正不会对进行代码审查所需的时间产生消极影响, 我们随机地进行了控制安全测试和全面生产实验。 离线结果 。 作为基线, 我们把GPT-4o 与我们的小型和大型Llama模型进行比较。 在离线结果中, 我们的GLOTFT模型创造了精确匹配68%的时间比GPT-4的9个百分点(pppp) 。 一旦我们的模型实现了合理的离线结果, 我们就会把它们输入到更现代的HPHP 功能。 安全试验。 当我们把MetmateCRCR到一个安全试验中, 而不是AI 校正补建议, 我们看到一个巨大的回缩, 。


Article 38

Title@2025-07-17 (4): Random Variate Generation with Formal Guarantees

Title: Random Variate Generation with Formal Guarantees Zufallsvariate Generation mit formalen Garantien 具有正式保障的随机流动养殖 2507.13494v1

Authors (2): Feras A. Saad, Wonyeol Lee

This article introduces a new approach to principled and practical random variate generation with formal guarantees. The key idea is to first specify the desired probability distribution in terms of a finite-precision numerical program that defines its cumulative distribution function (CDF), and then generate exact random variates according to this CDF. We present a universal and fully automated method to synthesize exact random variate generators given any numerical CDF implemented in any binary number format, such as floating-point, fixed-point, and posits. The method is guaranteed to operate with the same precision used to specify the CDF, does not overflow, avoids expensive arbitrary-precision arithmetic, and exposes a consistent API. The method rests on a novel space-time optimal implementation for the class of generators that attain the information-theoretically optimal Knuth and Yao entropy rate, consuming the least possible number of input random bits per output variate. We develop a random variate generation library using our method in C and evaluate it on a diverse set of continuous'' anddiscrete’’ distributions, showing competitive runtime with the state-of-the-art GNU Scientific Library while delivering higher accuracy, entropy efficiency, and automation.

本条引入了一种新的方法, 用于有正式保证的原则性和实用性随机随机变换生成。 关键的想法是首先用一个限定精度数字程序来指定理想的概率分布, 以定义其累积分布函数( CDF) , 然后根据此 CDF 生成精确随机变换。 我们提出了一个通用和完全自动化的方法, 以任何二进制格式, 如浮动点、 固定点和假设, 来合成精确的随机变换生成器。 我们用我们C 中的方法来随机变换生成图书馆, 并用不同的“ 连续的” 和“ 扭曲的” 方法来操作, 避免昂贵的任意精确算术, 并暴露出一个一致的 API 。 这种方法取决于对能够达到信息- 理论上最佳 Knuth 和 Yao enpypypy 率的各类生成器的新的空间- 时间最佳执行方式。 我们用我们C 中的方法来开发一个随机变换生成的图书馆图书馆图书馆图书馆图书馆图书馆图书馆, 运行时要显示高的运行效率, 。


Article 39

Title@2025-07-17 (4): GPU Performance Portability needs Autotuning

Title: GPU Performance Portability needs Autotuning GPU Performance Portability benötigt Autotuning GPU 性能表现 便捷性需要自动调节 2505.03780v3

Authors (3): Burkhard Ringlein, Thomas Parnell, Radu Stoica

As LLMs grow in complexity, achieving state-of-the-art performance requires tight co-design across algorithms, software, and hardware. Today’s reliance on a single dominant platform limits portability, creates vendor lock-in, and raises barriers for new AI hardware. In this work, we make the case for combining just-in-time (JIT) compilation with comprehensive kernel parameter autotuning to enable portable LLM inference with state-of-the-art performance without code changes. Focusing on performance-critical LLM kernels, we demonstrate that this approach explores up to 15x more kernel parameter configurations, produces significantly more diverse code across multiple dimensions, and even outperforms vendor-optimized implementations by up to 230%, all while reducing kernel code size by 70x and eliminating manual code optimizations. Our results highlight autotuning as a promising path to unlocking model portability across GPU vendors.

随着LLMS的复杂程度不断增长,实现最先进的LLM性能要求严格地共同设计各种算法、软件和硬件。今天,对单一主导平台的依赖限制了可移动性,创建了供应商锁定,并为新的AI硬件制造了障碍。在这项工作中,我们有理由将即时(JIT)汇编与全面的内核参数自动调整结合起来,以便允许便携式LLM在不改变代码的情况下对最新性能进行推论。聚焦于性能临界的LLM内核,我们证明这一方法可以探索多达15x更多的内核参数配置,在多个维度上产生更多样化得多的代码,甚至比供应商优化执行率高出230 % , 而同时将内核代码的尺寸减少70x,并消除手动代码优化。我们的结果突出表明,自动化是释放GPU供应商模型可移植性能的一条有希望的道路。


Article 40

Title@2025-07-17 (4): Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Title: Towards Formal Verification of LLM-Generated Code from Natural Language Prompts Auf dem Weg zu einer formalen Überprüfung des LLM-generierten Codes aus natürlichen Sprachprompts 争取正式核查自然语言提示法LLM产生的守则 2507.13290v1

Authors (7): Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve

In the past few years LLMs have emerged as a tool that can aid programmers by taking natural language descriptions and generating code based on it. However, LLMs often generate incorrect code that users need to fix and the literature suggests users often struggle to detect these errors. In this work we seek to offer formal guarantees of correctness to LLM generated code; such guarantees could improve the experience of using AI Code Assistants and potentially enable natural language programming for users with little or no programming knowledge. To address this challenge we propose to incorporate a formal query language that can represent a user’s intent in a formally defined but natural language-like manner that a user can confirm matches their intent. Then, using such a query we propose to verify LLM generated code to ensure it matches the user’s intent. We implement these ideas in our system, Astrogator, for the Ansible programming language which includes such a formal query language, a calculus for representing the behavior of Ansible programs, and a symbolic interpreter which is used for the verification. On a benchmark suite of 21 code-generation tasks, our verifier is able to verify correct code in 83% of cases and identify incorrect code in 92%.

过去几年来,LLMS已经成为帮助程序员的工具,通过采用自然语言描述并据此生成代码来帮助程序员。然而,LLMS经常产生用户需要修正的错误代码,文献则表明用户往往会为发现这些错误而挣扎。在这项工作中,我们力求为LLM生成的代码提供正式的正确性保证;这些保障可以改善使用AI Code助理的经验,并有可能为几乎或完全没有编程知识的用户提供自然语言编程。为了应对这一挑战,我们提议纳入一种正式的查询语言,该语言可以以正式确定但自然语言相似的方式代表用户的意图,使用户能够确认其意图。然后,利用这样的询问,我们提议核查LLM生成的代码,以确保与用户的意图相符。我们在我们的系统中实施了这些想法,Astrogator,用于Asisable编程语言,其中包括一种正式的查询语言,一种代表 ANSE 程序的行为的缩略图,以及一种用于核查的象征性翻译。在21个代码生成任务的基准套中,我们的校准者能够核实83%的案例的代码,并查明92 %的代码。


Article 41

Title@2025-07-17 (4): Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, and COSE

Title: Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, and COSE Sichere Parsing und Serialisierung mit Separation Logic auf CBOR, CDDL und COSE angewendet 安全分解和序列化,与分离逻辑结合应用到CBOR、CDDL和COSE 2505.17335v2

Authors (4): Tahina Ramananandro, Gabriel Ebner, Guido Martínez, Nikhil Swamy

Incorrect handling of security-critical data formats, particularly in low-level languages, are the root cause of many security vulnerabilities. Provably correct parsing and serialization tools that target languages like C can help. Towards this end, we present PulseParse, a library of verified parser and serializer combinators for non-malleable binary formats. Specifications and proofs in PulseParse are in separation logic, offering a more abstract and compositional interface, with full support for data validation, parsing, and serialization. PulseParse also supports a class of recursive formats – with a focus on security and handling adversarial inputs, we show how to parse such formats with only a constant amount of stack space. We use PulseParse at scale by providing the first formalization of CBOR, a recursive, binary data format standard, with growing adoption in various industrial standards. We prove that the deterministic fragment of CBOR is non-malleable and provide EverCBOR, a verified library in both C and Rust to validate, parse, and serialize CBOR objects implemented using PulseParse. Next, we provide the first formalization of CDDL, a schema definition language for CBOR. We identify well-formedness conditions on CDDL definitions that ensure that they yield unambiguous, non-malleable formats, and implement EverCDDL, a tool that checks that a CDDL definition is well-formed, and then produces verified parsers and serializers for it. To evaluate our work, we use EverCDDL to generate verified parsers and serializers for various security-critical applications. Notably, we build a formally verified implementation of COSE signing, a standard for cryptographically signed objects. We also use our toolchain to generate verified code for other standards specified in CDDL, including DICE Protection Environment, a secure boot protocol standard.

安全关键数据格式的处理不正确, 特别是低级别语言的处理不正确, 是许多安全脆弱性的根源。 PulseParse 也支持一系列循环格式, 重点是安全和处理对抗性投入。 为此, 我们展示了 PulseParse。 我们使用 PulseParse, 是一个由可核实的剖析器和连序器组合组合器组成的库, 用于不可变二进制的二进制格式。 PulmalParse 中的规格和证明符合分离逻辑, 提供了更加抽象和构成的界面, 并全力支持数据验证、 解析和连序化。 PulseParse 也支持了一组循环格式, 重点是安全和处理对抗性投入, 我们展示了如何以固定的堆叠版空间来解析这些格式。 我们使用 PulseParseParse ParsePrass 标准, 运行了一种常规的 CDLL 标准, 用于在各种工业标准中不断更新的 CDL 。 我们证明, 输入了一种标准 Dral Dral 的硬化的硬文件, 。


Article 42

Title@2025-07-17 (4): Formal Verification for JavaScript Regular Expressions: a Proven Semantics and its Applications

Title: Formal Verification for JavaScript Regular Expressions: a Proven Semantics and its Applications Formale Überprüfung für JavaScript Reguläre Ausdrücke: eine bewährte Semantik und ihre Anwendungen JavaScript 常规表达式:经验证的语义及其应用的正式验证 2507.13091v1

Authors (3): Aurèle Barrière, Victor Deng, Clément Pit-Claudel

We present the first mechanized, succinct, practical, complete, and proven-faithful semantics for a modern regular expression language with backtracking semantics. We ensure its faithfulness by proving it equivalent to a preexisting line-by-line embedding of the official ECMAScript specification of JavaScript regular expressions. We demonstrate its practicality by presenting two real-world applications. First, a new notion of contextual equivalence for modern regular expressions, which we use to prove or disprove rewrites drawn from previous work. Second, the first formal proof of the PikeVM algorithm used in many real-world engines. In contrast with the specification and other formalization work, our semantics captures not only the top-priority match, but a full backtracking tree recording all possible matches and their respective priority. All our definitions and results have been mechanized in the Rocq proof assistant.

我们为现代正文表达语言提供了第一个机械化、简洁、实用、完整和经过实践证明真实的语义,并附有回溯性语义。我们通过证明它与先前存在的ECMAScript正式的JavaScript常规表达方式的逐行嵌入相当,确保了它的忠实性。我们展示了两种现实世界应用,显示了它的实用性。首先,现代正文表达方式的上下文等同性的新概念,我们用它来证明或否定以往工作中的重写。第二,许多真实世界引擎使用的PikevM算法的第一个正式证据。与规格和其他正规化工作不同的是,我们的语义学不仅捕捉了最优先的匹配,而且捕捉了记录所有可能的匹配和各自优先事项的全反跟踪树。我们的所有定义和结果在Rocq验证助理中都被机械化了。


Article 43

Title@2025-07-17 (4): Syntax Repair as Language Intersection

Title: Syntax Repair as Language Intersection Syntax-Reparatur als Sprach-Intersektion 语法修理作为语言交汇处 2507.11873v2

Authors (1): Breandan Considine

We introduce a new technique for repairing syntax errors in arbitrary context-free languages. This technique models syntax repair as a language intersection problem by defining a finite language that provably generates every syntactically valid repair within a given edit distance. Leveraging a theoretical connection between the Bar-Hillel construction from formal language theory and CFL reachability from program analysis, we show that repairability in a finite number of typographic edits is polylogarithmic parallel time decidable and provide an enumeration algorithm based on the Brzozowski derivative. Finally, we evaluate this algorithm and its implementation, demonstrating state-of-the-art results on a Python syntax repair benchmark.

我们引入了一种在任意的无背景语言中修复语法错误的新技术。 这个技术模型将语法修复作为一种语言交叉问题, 定义一种有限的语言, 可以在一定的编辑距离内生成每一种在同步上有效的修复。 我们从正式的语言理论中将Bar- Hillel 建筑的理论联系与 CFL 的可达性从程序分析中加以利用, 我们显示, 数量有限的印刷编辑的可修复性是多词性平行时间的分解, 并提供基于 Brzozowski 衍生物的查点算法。 最后, 我们评估了这一算法及其实施, 展示了 Python 语法修复基准的最新结果 。