cs.PL @ 2025-06-01: 029
-
00 05-29 (4) Extensional and Non-extensional Functions as Processes Erweiterungs- und Nichterweiterungsfunktionen als Prozesse 作为处理过程的扩展和非扩展函数 2405.03536v2 -
01 05-29 Quantitative Verification with Neural Networks Quantitative Überprüfung mit neuralen Netzen 与神经网络进行定量核查 2301.06136v5 -
02 05-29 Is spreadsheet syntax better than numeric indexing for cell selection? Ist die Tabellensyntax besser als die numerische Indexierung für die Zellenauswahl? 选择单元格的电子表格语法是否比数字索引更好? 2505.23296v1 -
03 05-29 VERINA: Benchmarking Verifiable Code Generation VERINA: Benchmarking der überprüfbaren Code-Generierung VERINA:可核实代码生成基准 2505.23135v1 -
04 05-29 Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference Kann LLMs Grund über Programm Semantik? Eine umfassende Bewertung von LLMs auf formale Spezifikation Inferenz CLLMs 方案语义学理由:全面评价关于正式具体推断的LLMs 2503.04779v4 -
05 05-29 CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark CASS: Nvidia zu AMD Transpilation mit Daten, Modellen und Benchmark CASS: Nvidia 到AMD 传输数据、模型和基准 2505.16968v3 -
06 05-29 DINGO: Constrained Inference for Diffusion LLMs DINGO: Beschränkte Schlussfolgerung für Diffusion LLMs DINGO: 扩散长效LMM的连续推论 2505.23061v1 -
07 05-28 (3) HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding HiLDe: Intentionale Code-Generierung über Human-in-the-Loop-Dekodierung HILDe:通过 “ 人类在圈套解码 “ 进行有意代码生成 2505.22906v1 -
08 05-28 $Δ$-Nets: Interaction-Based System for Optimal Parallel $λ$-Reduction $Δ$-Nets: Interaktionsbasiertes System für eine optimale parallele $λ$-Reduktion \(-净额:最佳平行互动系统\)$美元-削减 2505.20314v2 -
09 05-28 TPDE: A Fast Adaptable Compiler Back-End Framework TPDE: Ein schnell anpassbares Compiler Back-End Framework TPDE:一个快速可调适的汇编者后端框架 2505.22610v1 -
10 05-28 Custom Representations of Inductive Families Benutzerdefinierte Darstellungen von induktiven Familien 感养家庭习俗代表 2505.21225v2 -
11 05-28 An instance of FreeCHR with refined operational semantics Eine Instanz von FreeCHR mit raffinierter operativer Semantik ” 自由人权中心 “ 的例子,其中精炼了操作语义 2505.22155v1 -
12 05-27 (2) KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads KPerfIR: Auf dem Weg zu einem offenen und kompilerzentrierten Ökosystem für GPU-Kernel Performance Tooling auf modernen KI-Workloads KPerfIR:努力建立一个开放的、以编纂者为中心的生态系统,用于在现代AI 工作负荷上使用 GPU 内核性能工具 2505.21661v1 -
13 05-27 Full LTL Synthesis over Infinite-state Arenas Vollständige LTL-Synthese über Infinite-State-Arenen 无限状态阿雷纳斯地区全面LTL综合综合 2307.09776v3 -
14 05-27 Local Type Inference for Context-Free Session Types Lokale Typ-Schlussfolgerung für kontextfreie Sitzungstypen 无背景会话类型本地类型推推推 2505.20855v1 -
15 05-27 An Efficient Implementation of Guard-Based Synchronization for an Object-Oriented Programming Language Effiziente Implementierung von Guard-Based Synchronization für eine objektorientierte Programmiersprache 高效率地实施以警卫为基础的同步,以用于以目标为导向的方案编制语言 2505.20850v1 -
16 05-27 Thread and Memory-Safe Programming with CLASS Thread- und Memory-sichere Programmierung mit CLASS 使用 CLASS 的线索和内存安全编程 2505.20848v1 -
17 05-27 Choreographies as Macros Choreographien als Makros 作为宏的舞蹈 2505.20845v1 -
18 05-27 INTERLEAVE: A Faster Symbolic Algorithm for Maximal End Component Decomposition INTERLEAVE: Ein schnellerer symbolischer Algorithmus für maximale Endkomponentenzersetzung 最大末端组件分解的更快的符号性算法 2505.20748v1 -
19 05-26 (1) GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency GPUMC: Ein staatenloser Modellprüfer für GPU-Schwachspeicherkonkurrenz GPUMC: GPU 弱内存调制货币的无国籍模式检查器 2505.20207v1 -
20 05-26 The CodeInverter Suite: Control-Flow and Data-Mapping Augmented Binary Decompilation with LLMs Die CodeInverter Suite: Control-Flow und Data-Mapping Augmented Binary Decompilation mit LLMs 代码输入器套件:控制-光和数据-制表增强的二进制解析与LLMS 2503.07215v2 -
21 05-26 Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design Abstractions-of-Thought: Mittlere Darstellungen für LLM-Reasoning im Hardware-Design 设计硬件设计方面LLM理由的中间代表 2505.15873v2 -
22 05-26 LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability LEGO-Compiler: Neurale Kompilierung durch Übersetzungskompatibilität verbessern LEGO-Compuper:通过翻译集成加强神经汇编 2505.20356v1 -
23 05-26 Fully Randomized Pointers Vollkommen Randomisierte Zeiger 完全随机化指针 2405.12513v2 -
24 05-25 (7) CLEVER: A Curated Benchmark for Formally Verified Code Generation CLEVER: Ein kuratierter Benchmark für die formal verifizierte Codegenerierung 正式核实的代码生成基准 2505.13938v3 -
25 05-25 Proceedings 16th International Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software 16. Internationaler Workshop zur Programmierung von Sprachkonzepten für Concurrency und Communication-cEntric Software 第16次关于以方案语言处理货币和通信软件的国际讲习班 2505.19078v1 -
26 05-24 (6) Algorithmic Language Models with Neurally Compiled Libraries Algorithmische Sprachmodelle mit neurally compiled Bibliotheken 具有神经编译图书馆的算法语言模型 2407.04899v2 -
27 05-24 Automated Verification of Monotonic Data Structure Traversals in C Automatisierte Überprüfung der monotonen Datenstruktur Traversals in C 自动核查C的单声量数据结构轨迹 2505.18818v1 -
28 05-24 Autocomp: LLM-Driven Code Optimization for Tensor Accelerators Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v1
Article 0
Title@2025-05-29 (4): Extensional and Non-extensional Functions as Processes
Title: Extensional and Non-extensional Functions as Processes | Erweiterungs- und Nichterweiterungsfunktionen als Prozesse | 作为处理过程的扩展和非扩展函数 2405.03536v2 |
Authors: Ken Sakayori, Davide Sangiorgi
Following Milner’s seminal paper, the representation of functions as processes has received considerable attention. For pure $\lambda$-calculus, the process representations yield (at best) non-extensional $\lambda $-theories (i.e., $\beta$ rule holds, whereas $\eta$ does not). In the paper, we study how to obtain extensional representations, and how to move between extensional and non-extensional representations. Using Internal $\pi$, $\mathrm{I}\pi$ (a subset of the $\pi$-calculus in which all outputs are bound), we develop a refinement of Milner’s original encoding of functions as processes that is parametric on certain abstract components called wires. These are, intuitively, processes whose task is to connect two end-point channels. We show that when a few algebraic properties of wires hold, the encoding yields a $\lambda$-theory. Exploiting the symmetries and dualities of $\mathrm{I}\pi$, we isolate three main classes of wires. The first two have a sequential behaviour and are dual of each other; the third has a parallel behaviour and is the dual of itself. We show the adoption of the parallel wires yields an extensional $\lambda$-theory; in fact, it yields an equality that coincides with that of B"ohm trees with infinite $\eta$. In contrast, the other two classes of wires yield non-extensional $\lambda$-theories whose equalities are those of the L'evy-Longo and B"ohm trees.
遵循 Milner 的原始文件, 函数作为进程表达得到相当的注意 。 对于纯 $\ lambda $- calcululs 的计算, 进程表达方式产生( 最多) 非扩展 $\ lambda $ 美元理论( 即 $\ beta$ 规则持有, 而不是 $\ a $ 美元 ) 。 在文件中, 我们研究如何获得扩展表达方式, 以及如何在扩展和非扩展表达方式之间移动 。 使用 $\ pi, $\ materm { Ipi $ ( $- 美元- calcallulus 的分类 ) 。 我们开发了 Milner 最初的元编码( $- lambda $ 美元 ) ( 美元 美元- 美元 ) ( 美元- 美元 ) 的计算方式, 以 美元- 美元 美元 的双线表达方式 。
Article 1
Title@2025-05-29 (4): Quantitative Verification with Neural Networks
Title: Quantitative Verification with Neural Networks | Quantitative Überprüfung mit neuralen Netzen | 与神经网络进行定量核查 2301.06136v5 |
Authors: Alessandro Abate, Alec Edwards, Mirco Giacobbe, Hashan Punchihewa, Diptarko Roy
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate’s validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.
我们提出了一种数据驱动的方法,用于对概率程序进行定量核查,以及随机动态模型。我们的方法利用神经网络对神经网络进行精确和稳健的界限进行计算,以确定随机过程在有限时间内击中目标条件的可能性。这个问题包含各种定量核查问题,从离散随机动态模型的可及性和安全分析,到对概率程序侵犯和终止性分析的研究。我们依靠神经网络来代表产生这种概率界限的超测证书,我们利用反比制导导导导导导导导引导合成环路进行计算:我们培训神经证书,同时使用随机优选法收紧对州空间样本的概率,然后我们利用可互观性模拟模型的可及性分析,正式检查证书对每个可能州的有效性;如果我们得到反比反比分析,我们把它添加到我们的样本中,并在确认有效性之前重复循环。我们用一套不同的基准来进行计算,由于神经测试方法的显性能力,因此在模拟模型的绝对成功性模型上,我们现有的方法压得更小。
Article 2
Title@2025-05-29 (4): Is spreadsheet syntax better than numeric indexing for cell selection?
Title: Is spreadsheet syntax better than numeric indexing for cell selection? | Ist die Tabellensyntax besser als die numerische Indexierung für die Zellenauswahl? | 选择单元格的电子表格语法是否比数字索引更好? 2505.23296v1 |
Authors: Philip Heltweg, Dirk Riehle, Georg-Daniel Schwarz
Selecting a subset of cells is a common task in data engineering, for example, to remove errors or select only specific parts of a table. Multiple approaches to express this selection exist. One option is numeric indexing, commonly found in general programming languages, where a tuple of numbers identifies the cell. Alternatively, the separate dimensions can be referred to using different enumeration schemes like “A1” for the first cell, commonly found in software such as spreadsheet systems. In a large-scale controlled experiment with student participants as proxy for data practitioners, we compare the two options with respect to speed and correctness of reading and writing code. The results show that, when reading code, participants make less mistakes using spreadsheet-style syntax. Additionally, when writing code, they make fewer mistakes and are faster when using spreadsheet syntax compared to numeric syntax. From this, a domain-specific syntax, such as spreadsheet syntax for data engineering, appears to be a promising alternative to explore in future tools to support practitioners without a software engineering background.
选择一组单元格是数据工程中的一项共同任务,例如,清除错误或只选择表格的特定部分。存在表达此选择的多种方法。一个选项是数字索引化,通常在一般编程语言中找到,其中数字图示可以识别单元格。或者,可以将不同的维度提到使用不同的查点方案,例如“A1”等第一个单元格的查点方案,通常在电子表格系统等软件中找到。在与学生参与者作为数据实践者代理的大规模控制实验中,我们比较了关于读写代码速度和正确性的两个选项。结果显示,在阅读代码时,参与者使用电子表格风格语法的误差较少。此外,在写入代码时,他们会减少错误,使用电子表格语法比数字语法更快。从这一点看,一个特定域的语法,例如数据工程电子表格语法,似乎是今后在没有软件工程背景的情况下探索支持从业者的工具的一个很有希望的替代方法。
Article 3
Title@2025-05-29 (4): VERINA: Benchmarking Verifiable Code Generation
Title: VERINA: Benchmarking Verifiable Code Generation | VERINA: Benchmarking der überprüfbaren Code-Generierung | VERINA:可核实代码生成基准 2505.23135v1 |
Authors: Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, Dawn Song
Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation – jointly generating code, specifications, and proofs of code-specification alignment – offers a promising path to address this limitation and further unleash LLMs’ benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often lack support for end-to-end verifiable code generation. In this paper, we introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. Verina consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o4-mini, generates only 61.4% correct code, 51.0% sound and complete specifications, and 3.6% successful proofs, with one trial per task. We hope Verina will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. We release our dataset on https://huggingface.co/datasets/sunblaze-ucb/verina and our evaluation code on https://github.com/sunblaze-ucb/verina.
大型语言模型(LLMS)日益融入软件开发,但确保LLM生成的代码的正确性仍具有挑战性,而且往往需要花费昂贵的人工审查。可验证代码的生成 – – 共同生成代码、规格和具体编码协调的证明 – – 为解决这一限制和进一步释放LLMS的编码好处提供了一条充满希望的道路。然而,在评价方面存在着巨大的差距:目前的基准往往缺乏对端至端可核查代码生成的支持。在本文件中,我们引入了一个高质量基准,从而能够对代码、规格和证据生成及其构成进行全面和模块化评价。Verina由189个手工拼凑的编码任务组成,其中有详细的问题描述、参考执行、正式规格和广泛的测试套件。我们对目前最先进的LLMSM(可验证代码生成的DLMSUCSDS/Arencrearetures)的生成存在重大挑战。我们的最佳模型(OO4minirea)只能生成61.4%的代码,51.0%的硬度和3.6%的精确度的代码,我们将提供我们精确度数据生成的进度和3.6%的数据。
Article 4
Title@2025-05-29 (4): Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Title: Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference | Kann LLMs Grund über Programm Semantik? Eine umfassende Bewertung von LLMs auf formale Spezifikation Inferenz | CLLMs 方案语义学理由:全面评价关于正式具体推断的LLMs 2503.04779v4 |
Authors: Thanh Le-Cong, Bach Le, Toby Murray
Large Language Models (LLMs) are increasingly being used to automate programming tasks. Yet, LLMs’ capabilities in reasoning about program semantics are still inadequately studied, leaving significant potential for further exploration. This paper introduces FormalBench, a comprehensive benchmark designed to evaluate LLMs’ reasoning abilities on program semantics, particularly via the task of synthesizing formal program specifications to assist verifying program correctness. This task requires both comprehensive reasoning over all possible program executions and the generation of precise, syntactically correct expressions that adhere to formal syntax and semantics. Using this benchmark, we evaluated the ability of LLMs in synthesizing consistent and complete specifications. Our findings show that LLMs perform well with simple control flows but struggle with more complex structures, especially loops, even with advanced prompting. Additionally, LLMs exhibit limited robustness against semantic-preserving transformations. We also highlight common failure patterns and design self-repair prompts, improving success rates by 25%.
大型语言模型(LLMs)正越来越多地被用于使程序设计任务自动化。然而,LLMs在程序语义学的推理能力仍然没有得到充分的研究,从而留下了进一步探索的巨大潜力。本文介绍了旨在评估LLMs在程序语义学上的推理能力的全面基准“正式Bench”,尤其是通过综合正式程序规格协助核实程序正确性的任务。这项任务要求对所有可能的方案执行进行综合推理,并生成精确、统一和正确的表达方式,以坚持正式的语义学和语义学。我们利用这一基准,评估LLMs在综合一致和完整的规格方面的能力。我们的研究结果显示LMs在简单控制流程方面表现良好,但与更复杂的结构,特别是圆环进行斗争,即使有先进的提示。此外,LLMs在反对语义-保留转换方面表现出有限的强健性。我们还强调了常见的失败模式和设计自我修复的提示,使成功率提高了25%。
Article 5
Title@2025-05-29 (4): CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
Title: CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark | CASS: Nvidia zu AMD Transpilation mit Daten, Modellen und Benchmark | CASS: Nvidia 到AMD 传输数据、模型和基准 2505.16968v3 |
Authors: Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud
We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA <–> HIP) and assembly-level (Nvidia SASS <–> AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, achieving 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation.
我们引入了CASS, 这是首个用于跨建筑化 GPU 代码转换的大型数据集和模型套件, 针对源级( CUDA < - > HIP) 和组装级( Nvidia SASSS < - > AMD RDNA3) 翻译。 该数据集由70k经核实的对数组成, 跨越主机和装置, 解决低级别 GPU 代码可移植性的重大差距。 利用此资源, 我们培训 CASS 群域域语言模型, 实现95% 源翻译准确性和37.5% 组装翻译准确性, 大大超过 GPT-4o、 Claude 和 Hipifify等商业基线。 我们生成的代码匹配了85%以上测试案例的本地性能, 保存运行时间和记忆行为。 为了支持严格的评估, 我们引入了 CASS- Bench, 一个覆盖16 GPU 域域的曲线基准, 并带有地标执行。 所有的数据、 模型和评价工具都作为公开来源发布, 以促进 GPUPU 工具的编译、 和 LLM 制硬件翻译的进展 。
Article 6
Title@2025-05-29 (4): DINGO: Constrained Inference for Diffusion LLMs
Title: DINGO: Constrained Inference for Diffusion LLMs | DINGO: Beschränkte Schlussfolgerung für Diffusion LLMs | DINGO: 扩散长效LMM的连续推论 2505.23061v1 |
Authors: Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models lack the ability to provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models that generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, which are designed for sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model’s predicted distribution, while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves up to a 68 percentage point improvement over unconstrained inference
与传统自动递减的LMS相比,LMS已成为一种大有希望的替代传统自动递减的LMS,它为提高运行时间效率提供了巨大的潜力;然而,现有的推广模式缺乏能力,无法对用户指定的正式限制,例如常规表达方式,使其不适于执行需要结构化产出的任务,例如固定的JSON 生成。与自动递减模式不同,扩散LMS同时预测一系列象征性。这种平行使得传统的受限制解码算法(这些算法是为按顺序进行象征性预测而设计的,在保存真正的产出分布方面无效)。为了应对这一限制,我们建议DINGO,这是一个动态的、基于程序化的受限解码战略,既高效又可可移动的分布保存。DINGO能够根据模型预测的分布,在严格满足用户指定的任何常规表达方式时,以最高概率取样产出字符。关于标准的象征性数学和JSONS生成基准,DINGO在未受限制的推算之外,实现了68个百分点的改进。
Article 7
Title@2025-05-28 (3): HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding
Title: HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding | HiLDe: Intentionale Code-Generierung über Human-in-the-Loop-Dekodierung | HILDe:通过 “ 人类在圈套解码 “ 进行有意代码生成 2505.22906v1 |
Authors: Emmanuel Anaya González, Raven Rothkopf, Sorin Lerner, Nadia Polikarpova
While AI programming tools hold the promise of increasing programmers’ capabilities and productivity to a remarkable degree, they often exclude users from essential decision-making processes, causing many to effectively “turn off their brains” and over-rely on solutions provided by these systems. These behaviors can have severe consequences in critical domains, like software security. We propose Human-in-the-loop Decoding, a novel interaction technique that allows users to observe and directly influence LLM decisions during code generation, in order to align the model’s output with their personal requirements. We implement this technique in HiLDe, a code completion assistant that highlights critical decisions made by the LLM and provides local alternatives for the user to explore. In a within-subjects study (N=18) on security-related tasks, we found that HiLDe led participants to generate significantly fewer vulnerabilities and better align code generation with their goals compared to a traditional code completion assistant.
虽然AI编程工具有望提高程序员的能力和生产力,但往往将用户排除在基本决策进程之外,导致许多用户有效“转变大脑”和过度依赖这些系统提供的解决办法。这些行为可能在软件安全等关键领域产生严重后果。我们提出“人与人互换脱钩”这一新型互动技术,使用户能够在代码生成过程中观察并直接影响LLM决定,以便使模型的产出与其个人需求相一致。我们在HiLDE实施这一技术,这是一个代码完成助理,它突出LLM做出的关键决定,并为用户提供可供探索的本地替代方法。在一项关于与安全有关的任务的内题研究(N=18)中,我们发现HILDE引导参与者比传统的代码完成助理少得多的脆弱性,更好地将代码生成与其目标相匹配。
Article 8
Title@2025-05-28 (3): $Δ$-Nets: Interaction-Based System for Optimal Parallel $λ$-Reduction
Title: $Δ$-Nets: Interaction-Based System for Optimal Parallel $λ$-Reduction | $Δ$-Nets: Interaktionsbasiertes System für eine optimale parallele $λ$-Reduktion | \(-净额:最佳平行互动系统\)$美元-削减 2505.20314v2 |
Authors: Daniel Augusto Rizzi Salvadori
I present a model of universal parallel computation called $\Delta$-Nets, and a method to translate $\lambda$-terms into $\Delta$-nets and back. Together, the model and the method constitute an algorithm for optimal parallel $\lambda$-reduction, solving the longstanding enigma with groundbreaking clarity. I show that the $\lambda$-calculus can be understood as a projection of $\Delta$-Nets – one that severely restricts the structure of sharing, among other drawbacks. Unhindered by these restrictions, the $\Delta$-Nets model opens the door to new highly parallel programming language implementations and computer architectures that are more efficient and performant than previously possible.
我提出了一个称为$Delta$-Net的通用平行计算模型,以及一种将$lambda$-terms 转换成$Delta$-nets和回转的方法。模型和方法共同构成一个优化平行$\lambda$降值的算法,以突破性的清晰度解决长期谜题。我显示,$Lumbda$-计算法可以被理解为一个$Delta$-Nets的预测,这一预测严格限制了共享结构,以及其他缺陷。由于这些限制,$Delta$-Nets模型被打破,为与以前相比效率更高、更能发挥作用的新的高度平行的编程语言实施和计算机结构打开了大门。
Article 9
Title@2025-05-28 (3): TPDE: A Fast Adaptable Compiler Back-End Framework
Title: TPDE: A Fast Adaptable Compiler Back-End Framework | TPDE: Ein schnell anpassbares Compiler Back-End Framework | TPDE:一个快速可调适的汇编者后端框架 2505.22610v1 |
Authors: Tobias Schwarz, Tobias Kamm, Alexis Engelke
Fast machine code generation is especially important for fast start-up just-in-time compilation, where the compilation time is part of the end-to-end latency. However, widely used compiler frameworks like LLVM do not prioritize fast compilation and require an extra IR translation step increasing latency even further; and rolling a custom code generator is a substantial engineering effort, especially when targeting multiple architectures. Therefore, in this paper, we present TPDE, a compiler back-end framework that adapts to existing code representations in SSA form. Using an IR-specific adapter providing canonical access to IR data structures and a specification of the IR semantics, the framework performs one analysis pass and then performs the compilation in just a single pass, combining instruction selection, register allocation, and instruction encoding. The generated target instructions are primarily derived code written in high-level language through LLVM’s Machine IR, easing portability to different architectures while enabling optimizations during code generation. To show the generality of our framework, we build a new back-end for LLVM from scratch targeting x86-64 and AArch64. Performance results on SPECint 2017 show that we can compile LLVM-IR 8–24x faster than LLVM -O0 while being on-par in terms of run-time performance. We also demonstrate the benefits of adapting to domain-specific IRs in JIT contexts, particularly WebAssembly and database query compilation, where avoiding the extra IR translation further reduces compilation latency.
快速机器代码生成对于快速启动即时编译特别重要,因为编辑时间是端到端的延迟时间的一部分。 然而,LLLLLDM等广泛使用的编译框架并不优先考虑快速编译,而要求额外的IR翻译步骤,甚至进一步增加延缓度;滚动自定义代码生成器是一项巨大的工程工作,特别是针对多个架构。因此,在本文件中,我们介绍了一个编译器后端框架TPDE,即一个编译器后端框架,可适应SSA格式中的现有代码表述。使用一个IR特定的调适器,提供IR数据结构的卡通访问和IR域码汇编的规格,框架执行一个分析通行证,然后仅用一个单关卡进行汇编,将选择指令、登记分配和指令编码合并在一起。 生成的目标指令主要是通过LLLLVM的机器 IR, 方便对不同架构的可移植性能进行优化。 为了显示我们的框架的普遍性,我们为LLLVM-86-64和Arch-MLA-LA-LA-LA-LA-LA-LA-LS-LA-LA-LS-LA-LA-LA-LA-LA-LA-LA-LV-LV-LV-LA-LA-LA-LA-LS-LS-LS-LS-LS-LS-LD-LD-LD-LA-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-LD-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-LD-LD-LD-LD-LD-LD-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L-L
Article 10
Title@2025-05-28 (3): Custom Representations of Inductive Families
Title: Custom Representations of Inductive Families | Benutzerdefinierte Darstellungen von induktiven Familien | 感养家庭习俗代表 2505.21225v2 |
Authors: Constantine Theocharis, Edwin Brady
Inductive families provide a convenient way of programming with dependent types. Yet, when it comes to compilation, their default linked-tree runtime representations, as well as the need to convert between different indexed views of the same data, can lead to unsatisfactory runtime performance. In this paper, we introduce a language with dependent types, and inductive families with customisable representations. Representations are a version of Wadler’s views, refined to inductive families like in Epigram, but with compilation guarantees: a represented inductive family will not leave any runtime traces behind, without relying on heuristics such as deforestation. This way, we can build a library of convenient inductive families based on a minimal set of primitives, whose re-indexing and conversion functions are erased during compilation. We show how we can express optimisation techniques such as representing Nat-like types as GMP-style big integers, without special casing in the compiler. With dependent types, reasoning about data representations is also possible through a provided modality. This yields computationally irrelevant isomorphisms between the original and represented data.
诱导家庭提供了一种方便的依附类型模式的编程方式。然而,在编程方面,它们默认的链接树木运行时间表,以及需要将同一数据的不同索引观点转换为不同索引,可能导致运行性能不尽如人意。在本文中,我们引入了一种有依附类型和有自定义代表性的感应家庭的语言。表解是Wadler观点的版本,像Epigram那样对感应家庭进行了精细化,但有编程保证:一个有代表性的传导家庭不会留下任何运行时间的痕迹,而不依赖毁林等超自然现象。这样,我们可以建立一个方便的感应家庭图书馆,以一套最起码的原始数据为基础,其再索引和转换功能在编程中被删除。我们展示了如何表达优化技术,例如代表像Nat一样的种类,如GMP式的大整形,在编程中不设特殊的外壳。有依附类型,通过提供的方式,也可以对数据表述进行推理。这种结果在计算上与原始数据和代表数据之间没有关联性。
Article 11
Title@2025-05-28 (3): An instance of FreeCHR with refined operational semantics
Title: An instance of FreeCHR with refined operational semantics | Eine Instanz von FreeCHR mit raffinierter operativer Semantik | ” 自由人权中心 “ 的例子,其中精炼了操作语义 2505.22155v1 |
Authors: Sascha Rechenberger, Thom Frühwirth
Constraint Handling Rules (CHR) is a rule-based programming language which is typically embedded into a general-purpose language. There exists a plethora of implementations of CHR for numerous host languages. However, the existing implementations often reinvent the way to embed CHR, which impedes maintenance and weakens assertions of correctness. To formalize and thereby unify the embedding of CHR into arbitrary host languages, we introduced the framework FreeCHR and proved it to be a valid representation of classical CHR. Until now, this framework only includes a translation of the very abstract operational semantics of CHR which, due to its abstract nature, introduces several practical issues. In this paper, we introduce an execution algorithm for FreeCHR. We derive it from the refined operational semantics of CHR, which resolve the issues introduced by the very abstract semantics. We also prove soundness of the algorithm with respect to the very abstract semantics of FreeCHR. Hereby we provide a unified and an easy to implement guideline for new CHR implementations, as well as an algorithmic definition of the refined operational semantics.
约束性处理规则(CHR)是一种基于规则的编程语言,通常被嵌入通用语言中。对于许多主用语言,CHR有许多执行方式。然而,现有的执行方式往往重新发明了嵌入CHR的方法,这妨碍了维护,削弱了正确性的说法。为了正式确定并从而统一将CHR纳入任意主用语言,我们引入了FreeHR框架,并证明它是传统CHR的有效表述。到目前为止,这个框架只包括翻译了CHR非常抽象的操作语义,由于其抽象性质,提出了若干实际问题。我们在本文件中为FreeCHR采用了一种执行算法。我们从SHR的精细操作语义中得出,它解决了非常抽象的语义引起的问题。我们还证明了对FreeCHR非常抽象的语义的算法的正确性。我们提供了一种统一和易于执行CHR新执行指南,以及精细操作语义的算法定义。
Article 12
Title@2025-05-27 (2): KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads
Title: KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads | KPerfIR: Auf dem Weg zu einem offenen und kompilerzentrierten Ökosystem für GPU-Kernel Performance Tooling auf modernen KI-Workloads | KPerfIR:努力建立一个开放的、以编纂者为中心的生态系统,用于在现代AI 工作负荷上使用 GPU 内核性能工具 2505.21661v1 |
Authors: Yue Guan, Yuanwei Fang, Keren Zhou, Corbin Robeck, Manman Ren, Zhongkai Yu, Yufei Ding, Adnan Aziz
In this work, we propose KPerfIR, a novel multilevel compiler-centric infrastructure to enable the development of customizable, extendable, and portable profiling tools tailored for modern artificial intelligence (AI) workloads on modern GPUs. Our approach integrates profiling capabilities directly into the compiler workflow, allowing profiling functionalities to be implemented as compiler passes, offering a programmable and reusable framework for performance analysis. This design bridges the gap between compilers and profilers, enabling fine-grained insights into complex optimization challenges such as overlapping the execution of fine-grained function units on GPUs. KPerfIR is integrated into the Triton infrastructure to highlight the power of a compiler-centric approach to advance performance analysis and optimization in the ever-evolving landscape of AI compilers. Our evaluation shows that our tool incurs low overhead (8.2%), provides accurate measurements (2% relative error), and delivers actionable insights into complicated GPU intra-kernel optimizations.
在这项工作中,我们提出了KPerfIR,这是一个新的多级编译器中心基础设施,用于开发适合现代GPU的现代人工智能工作量的可定制、可扩展和便携式剖析工具。我们的方法将剖析能力直接纳入汇编者工作流程,允许将剖析功能作为编译者通行证加以实施,为绩效分析提供一个可编程和可重复使用的框架。这个设计可以弥合编译者和剖面设计器之间的差距,从而能够对复杂的优化挑战进行精细的洞察,例如,在GPUs上执行精细裁的功能单位重叠。 KPerfIR被整合到特里顿基础设施中,以突出以编纂者为中心的方法在不断演变的AI汇编者环境中推进绩效分析和优化的能力。我们的评估表明,我们的工具拥有低的间接费用(8.2%),提供准确的测量(2%相对错误),并提供了复杂的GPU内核优化的可操作的洞察力。
Article 13
Title@2025-05-27 (2): Full LTL Synthesis over Infinite-state Arenas
Title: Full LTL Synthesis over Infinite-state Arenas | Vollständige LTL-Synthese über Infinite-State-Arenen | 无限状态阿雷纳斯地区全面LTL综合综合 2307.09776v3 |
Authors: Shaun Azzopardi, Luca Di Stefano, Nir Piterman, Gerardo Schneider
Recently, interest has increased in applying reactive synthesis to richer-than-Boolean domains. A major (undecidable) challenge in this area is to establish when certain repeating behaviour terminates in a desired state when the number of steps is unbounded. Existing approaches struggle with this problem, or can handle at most deterministic games with B"uchi goals. This work goes beyond by contributing the first effectual approach to synthesis with full LTL objectives, based on Boolean abstractions that encode both safety and liveness properties of the underlying infinite arena. We take a CEGAR approach: attempting synthesis on the Boolean abstraction, checking spuriousness of abstract counterstrategies through invariant checking, and refining the abstraction based on counterexamples. We reduce the complexity, when restricted to predicates, of abstracting and synthesising by an exponential through an efficient binary encoding. This also allows us to eagerly identify useful fairness properties. Our discrete synthesis tool outperforms the state-of-the-art on linear integer arithmetic (LIA) benchmarks from literature, solving almost double as many syntesis problems as the current state-of-the-art. It also solves slightly more problems than the second-best realisability checker, in one-third of the time. We also introduce benchmarks with richer objectives that other approaches cannot handle, and evaluate our tool on them.
最近,人们对将被动合成应用于较富的 Boolean 域的兴趣有所增加。 这一领域的一个主要(不可估量的)挑战是确定某些重复行为何时在不限制步骤数量的情况下以预期状态终止。 现有方法在解决这个问题上挣扎, 或能够在大多数决定性游戏中与 B\'uchi 目标打交道。 这项工作还远不止于促进第一个效果方法, 将全面LTL目标结合起来, 其依据是博莱语抽象学, 将基础无限域的安全和活性特性编码起来。 我们采取了CEGAR 方法: 尝试在布尔语抽象抽象学上进行综合, 通过反动性检查检查来检查抽象反战略的虚假性, 并根据反动抽样改进抽象学。 我们限制在假设性游戏中, 通过高效的二进制编码来减少抽象化和合成的复杂性。 这还使我们能够急切地找出有用的公平性。 我们的离散合成工具超越了文献中直线分数计算(LIA) 基准的状态, 试图综合方法几乎是双倍地解决多种抽象反向反向反制战略问题, 也无法作为目前最可靠的基础。
Article 14
Title@2025-05-27 (2): Local Type Inference for Context-Free Session Types
Title: Local Type Inference for Context-Free Session Types | Lokale Typ-Schlussfolgerung für kontextfreie Sitzungstypen | 无背景会话类型本地类型推推推 2505.20855v1 |
Authors: Bernardo Almeida, Andreia Mordido, Vasco T. Vasconcelos
We address the problem of local type inference for a language based on System F with context-free session types. We present an algorithm that leverages the bidirectional type checking approach to propagate type information, enabling first class polymorphism while addressing the intricacies brought about by the sequential composition operator and type equivalence. The algorithm improves the language’s usability by eliminating the need for type annotations at type application sites.
我们处理基于F系统、无上下文会话类型的语言的本地类型推论问题,我们提出一种算法,利用双向类型检查方法来传播类型信息,在解决顺序组成操作员和类型等同引起的复杂问题的同时,使头等级多形态主义成为可能。算法通过消除类型应用网站对类型说明的需求,提高了语言的可用性。
Article 15
Title@2025-05-27 (2): An Efficient Implementation of Guard-Based Synchronization for an Object-Oriented Programming Language
Title: An Efficient Implementation of Guard-Based Synchronization for an Object-Oriented Programming Language | Effiziente Implementierung von Guard-Based Synchronization für eine objektorientierte Programmiersprache | 高效率地实施以警卫为基础的同步,以用于以目标为导向的方案编制语言 2505.20850v1 |
Authors: Shucai Yao, Emil Sekerinski
In the shared variable model of concurrency, guarded atomic actions restrict the possible interference between processes by regions of atomic execution. The guard specifies the condition for entering an atomic region. That is a convenient model for the specification and verification of concurrent programs, but has eschewed efficient execution so far. This article shows how guarded atomic actions, when attached to objects, can be implemented highly efficiently using a combination of coroutines, operating-system worker threads, and dedicated management of object queues and stacks. The efficiency of an experimental language, Lime, is shown to compare favourably with that of C/Pthreads, Go, Erlang, Java, and Haskell on synthetic benchmarks.
在共同的可变货币模型中,有戒备的原子行动限制了原子执行区域之间可能发生的干扰。 卫兵规定了进入原子区域的条件。 这是同时进行程序规格和核查的方便模式,但迄今避免了高效执行。 本条表明,在附在物体上的有戒备的原子行动如何能够高效地实施,同时使用共流、操作系统工人线和专门管理对象队列和堆叠。 实验语言Lime的效率与C/Pthreads、Go、Erlang、Java和Haskell在合成基准上的效率相当。
Article 16
Title@2025-05-27 (2): Thread and Memory-Safe Programming with CLASS
Title: Thread and Memory-Safe Programming with CLASS | Thread- und Memory-sichere Programmierung mit CLASS | 使用 CLASS 的线索和内存安全编程 2505.20848v1 |
Authors: Luís Caires
CLASS is a proof-of-concept general purpose linear programming language, flexibly supporting realistic concurrent programming idioms, and featuring an expressive linear type system ensuring that programs (1) never misuse or leak stateful resources or memory, (2) never deadlock, and (3) always terminate. The design of CLASS and the strong static guarantees of its type system originates in its Linear Logic and proposition-as-types foundations. However, instead of focusing on its theoretical foundations, this paper briefly illustrates, in a tutorial form, an identifiable CLASS session-based programming style where strong correctness properties are automatically ensured by type-checking. Our more challenging examples include concurrent thread and memory-safe mutable ADTs, lazy stream programming, and manipulation of linear digital assets as used in smart contracts.
CLASS是一种概念性通用线性编程语言的证明,灵活地支持现实的并行编程风格,并有一个直线型系统,确保方案(1) 永不误用或泄漏有状态的资源或记忆,(2) 永不陷入僵局,(3) 永远终止;CLASS的设计及其类型系统的强有力的静态保障源于其线性逻辑基础和“提议一式”基础;然而,本文没有注重理论基础,而是以教益形式简要地说明CLASS会议可识别的基于编程的编程风格,在这种风格下,通过打字检查可自动确保强有力的正确性。我们更具有挑战性的例子包括:同时线性和记忆安全的ADT、懒惰的流式编程以及智能合同中使用的线性数字资产的操纵。
Article 17
Title@2025-05-27 (2): Choreographies as Macros
Title: Choreographies as Macros | Choreographien als Makros | 作为宏的舞蹈 2505.20845v1 |
Authors: Alexander Bohosian, Andrew K. Hirsch
Concurrent programming often entails meticulous pairing of sends and receives between participants to avoid deadlock. Choreographic programming alleviates this burden by specifying the system as a single program. However, there are more applications than implementations of choreographies, and developing new implementations takes a lot of time and effort. Our work uses Racket to expedite building a new choreographic language called Choret. Racket has a powerful macro system which allows Choret to reuse much of its infrastructure for greater functionality and correctness.
与此同时,编程往往需要参与者对发送和接收进行细致的配对以避免僵局。编程通过将系统指定为单一程序来减轻这一负担。然而,应用量比舞蹈编程多,开发新的实施需要大量的时间和精力。我们的工作利用拉克茨加快建立一个名为Choret的新编程语言。拉克茨拥有强大的宏观系统,使乔雷特能够重新利用大部分基础设施来提高功能和正确性。
Article 18
Title@2025-05-27 (2): INTERLEAVE: A Faster Symbolic Algorithm for Maximal End Component Decomposition
Title: INTERLEAVE: A Faster Symbolic Algorithm for Maximal End Component Decomposition | INTERLEAVE: Ein schnellerer symbolischer Algorithmus für maximale Endkomponentenzersetzung | 最大末端组件分解的更快的符号性算法 2505.20748v1 |
Authors: Suguman Bansal, Ramneet Singh
This paper presents a novel symbolic algorithm for the Maximal End Component (MEC) decomposition of a Markov Decision Process (MDP). The key idea behind our algorithm INTERLEAVE is to interleave the computation of Strongly Connected Components (SCCs) with eager elimination of redundant state-action pairs, rather than performing these computations sequentially as done by existing state-of-the-art algorithms. Even though our approach has the same complexity as prior works, an empirical evaluation of INTERLEAVE on the standardized Quantitative Verification Benchmark Set demonstrates that it solves 19 more benchmarks (out of 379) than the closest previous algorithm. On the 149 benchmarks that prior approaches can solve, we demonstrate a 3.81x average speedup in runtime.
本文为马克夫决定程序(MDP)最大端组件分解提供了一个新型的象征性算法。 我们的INTELEAVE算法背后的关键思想是,将“连接强的部件”的计算与热切地消除多余的州-州对对,而不是像现有最新算法那样按顺序进行这些计算。 尽管我们的方法与先前的工程一样复杂,但对INTERLEAVE在标准化量化核查基准集上的实证评估表明,它解决的基准(在379项中)比以前的最接近的算法多19项。 关于以前方法可以解决的149项基准,我们展示了运行时平均速度3.81x。
Article 19
Title@2025-05-26 (1): GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency
Title: GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency | GPUMC: Ein staatenloser Modellprüfer für GPU-Schwachspeicherkonkurrenz | GPUMC: GPU 弱内存调制货币的无国籍模式检查器 2505.20207v1 |
Authors: Soham Chakraborty, S. Krishna, Andreas Pavlogiannis, Omkar Tuppe
GPU computing is embracing weak memory concurrency for performance improvement. However, compared to CPUs, modern GPUs provide more fine-grained concurrency features such as scopes, have additional properties like divergence, and thereby follow different weak memory consistency models. These features and properties make concurrent programming on GPUs more complex and error-prone. To this end, we present GPUMC, a stateless model checker to check the correctness of GPU shared-memory concurrent programs under scoped-RC11 weak memory concurrency model. GPUMC explores all possible executions in GPU programs to reveal various errors - races, barrier divergence, and assertion violations. In addition, GPUMC also automatically repairs these errors in the appropriate cases. We evaluate GPUMC with benchmarks and real-life GPU programs. GPUMC is efficient both in time and memory in verifying large GPU programs where state-of-the-art tools are timed out. In addition, GPUMC identifies all known errors in these benchmarks compared to the state-of-the-art tools.
然而,与CPU相比,现代GPU提供了更多精细的同值货币特征,如范围等,具有其他的特性,如差异等,从而遵循不同的微弱记忆一致性模型。这些特性和特性使得GPU的同步编程更为复杂和容易出错。为此,我们提出一个无国籍的模型检查器GPUMC,以检查GPU在范围为RC11的微弱记忆同值货币模型下共享的共线程序的正确性。GPUMC探索了在GPU方案中所有可能的处决,以揭示各种错误――种族、屏障差异和侵犯权利行为。此外,GPUMC还自动在适当的情况下纠正这些错误。我们用基准和真实的GPU程序对GPUMC进行了评估。GPUMC在时间和记忆上都有效,在那些最先进的工具被淘汰的大型 GPUPU程序进行核查。此外,GPUMC还查明了这些基准中所有已知的错误,与最先进的工具相比。
Article 20
Title@2025-05-26 (1): The CodeInverter Suite: Control-Flow and Data-Mapping Augmented Binary Decompilation with LLMs
Title: The CodeInverter Suite: Control-Flow and Data-Mapping Augmented Binary Decompilation with LLMs | Die CodeInverter Suite: Control-Flow und Data-Mapping Augmented Binary Decompilation mit LLMs | 代码输入器套件:控制-光和数据-制表增强的二进制解析与LLMS 2503.07215v2 |
Authors: Peipei Liu, Jian Sun, Rongkang Sun, Li Chen, Zhaoteng Yan, Peizheng Zhang, Dapeng Sun, Dawei Wang, Xiaoling Zhang, Dan Li
Binary decompilation plays a vital role in various cybersecurity and software engineering tasks. Recently, end-to-end decompilation methods powered by large language models (LLMs) have garnered significant attention due to their ability to generate highly readable source code with minimal human intervention. However, existing LLM-based approaches face several critical challenges, including limited capability in reconstructing code structure and logic, low accuracy in data recovery, concerns over data security and privacy, and high computational resource requirements. To address these issues, we develop the CodeInverter Suite, making three contributions: (1) the CodeInverter Workflow (CIW) is a novel prompt engineering workflow that incorporates control flow graphs (CFG) and explicit data mappings to improve LLM-based decompilation. (2) Using CIW on well-known source code datasets, we curate the CodeInverter Dataset (CID), a domain-specific dataset containing 8.69 million samples that contains CFGs and data mapping tables. (3) We train the CoderInverter Models (CIMs) on CID, generating two lightweight LLMs (with 1.3B and 6.7B parameters) intended for efficient inference in privacy-sensitive or resource-constrained environments. Extensive experiments on two benchmarks demonstrate that the CIW substantially enhances the performance of various LLMs across multiple metrics. Our CIM-6.7B can achieve state-of-the-art decompilation performance, outperforming existing LLMs even with over 100x more parameters in decompilation tasks, an average improvement of 11.03% in re-executability, 6.27% in edit similarity.
最近,由大型语言模型(LLMS)驱动的端到端的分解方法因其在最小人力干预下生成高可读源代码的能力而引起极大关注。然而,基于LLM的现有方法面临若干重大挑战,包括重建代码结构和逻辑的能力有限,数据恢复的准确性低,对数据安全和隐私的关切,以及高计算资源要求。为了解决这些问题,我们开发了代码Inverter套件,作出了三项贡献:(1)代码Inververer Working(CIW)是一个新型的快速工程参数,其中纳入了控制流程图(CFG)和明确的数据映射,以改进基于LLMM的分解。 (2) 使用基于LLMM的源代码数据集的CIW,我们整理了代码数据集(CIDInverer Data)。 27 具体领域数据集包含869万个样本,其中包含CFGs和数据绘图表。(3) 我们为CEDRInverer Inverer Inforlation 模型(CIMS)提供了三项贡献:(1) CICMS-CS-CS-CS-CS-CS-S-SB Sqreablental SupyLMislation Supulate decessional decessional decessional decessional decessional decessional decessional deal deal deal degisl)系统,在两部(1.3和六B 和六B 和六B 的精基级的精基底图,在高的精基底图性能性能性能性能性能性能性能性能性实验性能性能性测试中,在二BLMSBLMSDRILMSBDRIDRILMSBDRIDRBDRBDRBDRBLMSDRBSBLMSBLMSBLMSBSBLMSBDRisal)。
Article 21
Title@2025-05-26 (1): Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design
Title: Abstractions-of-Thought: Intermediate Representations for LLM Reasoning in Hardware Design | Abstractions-of-Thought: Mittlere Darstellungen für LLM-Reasoning im Hardware-Design | 设计硬件设计方面LLM理由的中间代表 2505.15873v2 |
Authors: Matthew DeLorenzo, Kevin Tieu, Prithwish Jana, Piyush Jha, Dileep Kalathil, Vijay Ganesh, Jeyavijayan Rajendran
Large language models (LLMs) have achieved impressive proficiency on logic and programming tasks, often rivaling expert-level performance. However, generating functionally correct hardware description language (HDL) code from natural language specifications remains challenging, primarily in data-scarce domains. Therefore, we present Abstractions-of-Thought (AoT) - a training-free, inference-only prompting framework to mitigate misinterpretations and reasoning pitfalls of LLMs through a series of task-based abstractions within the prompting procedure, assisting in the transition from high-level to low-level representations of hardware. Furthermore, AoT consists of the following stages: (1) an LLM-based classification of hardware design patterns, (2) a structured intermediate representation (IR) to separate functional decomposition from code syntax, and (3) a line-by-line pseudocode solution enabling a more direct mapping to the final Verilog implementation. Experimental results on the VerilogEval benchmark depict that AoT demonstrates improvements in functionality when applied to large non-reasoning models (such as GPT-4o, outperforming all baseline techniques (including 1-shot, Chain-of-Thought, and Tree-of-Thought) while significantly reducing the generated tokens by 1.8-5.2x compared to popular Tree-of-Thought prompting.
大型语言模型(LLMS)在逻辑和编程任务方面达到了令人印象深刻的熟练程度,往往与专家一级的业绩相匹敌,然而,从自然语言规格中生成功能正确的硬件描述语言(HDL)代码仍然具有挑战性,主要是在数据偏差域。因此,我们介绍了“推理摘要(AoT)”——一个无培训的、仅推论的促动框架,通过在加速程序内的一系列基于任务的抽取,减轻LLMS的误解和推理缺陷,协助从高层次向低层次的硬件展示过渡。此外,AoT由以下阶段组成:(1)基于LMM的硬件设计模式分类,(2)结构化的中间代表(IR),将功能分解与代码合成法分开,(3)一行一行的假码解决方案,以便能够更直接地测绘最终的VerilogEval基准显示,AoT在应用大型非理性模型(如GPT-4o,在将所有基线技术(包括一图一图、一楼和一模直径图)在大幅降低一模的树的树形图)时,实验性)显示,AOI-Thal-T-T-T-T-Th-T-T-T-T-T-T-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-hal-h-h-h-h-h-h-h-h-h-h-s-hal-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-hal-hal-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h-h
Article 22
Title@2025-05-26 (1): LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability
Title: LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability | LEGO-Compiler: Neurale Kompilierung durch Übersetzungskompatibilität verbessern | LEGO-Compuper:通过翻译集成加强神经汇编 2505.20356v1 |
Authors: Shuoming Zhang, Jiacheng Zhao, Chunwei Xia, Zheng Wang, Yunji Chen, Xiaobing Feng, Huimin Cui
Large language models (LLMs) have the potential to revolutionize how we design and implement compilers and code translation tools. However, existing LLMs struggle to handle long and complex programs. We introduce LEGO-Compiler, a novel neural compilation system that leverages LLMs to translate high-level languages into assembly code. Our approach centers on three key innovations: LEGO translation, which decomposes the input program into manageable blocks; breaking down the complex compilation process into smaller, simpler verifiable steps by organizing it as a verifiable LLM workflow by external tests; and a feedback mechanism for self-correction. Supported by formal proofs of translation composability, LEGO-Compiler demonstrates high accuracy on multiple datasets, including over 99% on ExeBench and 97.9% on industrial-grade AnsiBench. Additionally, LEGO-Compiler has also acheived near one order-of-magnitude improvement on compilable code size scalability. This work opens new avenues for applying LLMs to system-level tasks, complementing traditional compiler technologies.
大型语言模型(LLMS)有可能革命我们如何设计和实施汇编者和代码翻译工具。然而,现有的LLMS努力处理长期和复杂的程序。我们引入了LEGO-Compiler,这是一个新的神经编译系统,利用LLLMS将高层次语言翻译成组装代码。我们的方法以三大创新为中心:LEGO翻译,将输入程序分解成可控区块;通过外部测试将复杂的汇编进程组织成可核查的LLM工作流程,从而将复杂的汇编进程分为更小、更简单的可核查的步骤;以及自我校正回馈机制。在翻译可复性的正式证明的支持下,LEGO-Compiler在多个数据集上表现出高度的准确性,包括ExeBench99%以上的数据,AnsiBench工业级的97.9%的数据。此外,LEGO-Compiler在可比较的编码大小缩放性上也接近一个顺序的磁性改进。这项工作开辟了新的途径,将LLMSMs应用于系统层面的任务,补充传统的编译技术。
Article 23
Title@2025-05-26 (1): Fully Randomized Pointers
Title: Fully Randomized Pointers | Vollkommen Randomisierte Zeiger | 完全随机化指针 2405.12513v2 |
Authors: Sai Dhawal Phaye, Gregory J. Duck, Roland H. C. Yap, Trevor E. Carlson
Memory errors continue to be a critical concern for programs written in low-level programming languages such as C and C++. Many different memory error defenses have been proposed, each with varying trade-offs in terms of overhead, compatibility, and attack resistance. Some defenses are highly compatible but only provide minimal protection, and can be easily bypassed by knowledgeable attackers. On the other end of the spectrum, capability systems offer very strong (unforgeable) protection, but require novel software and hardware implementations that are incompatible by definition. The challenge is to achieve both very strong protection and high compatibility. In this paper, we propose {\em Fully Randomized Pointers} FRP as a strong memory error defense that also maintains compatibility with existing binary software. The key idea behind FRP is to design a new pointer encoding scheme that allows for the full randomization of most pointer bits, rendering even brute force attacks impractical. We design a FRP encoding that is: (1) compatible with existing binary code (recompilation not needed); and (2) decoupled from the underlying object layout. FRP is prototyped as: (i) a software implementation (BlueFat) to test security and compatibility; and (ii) a proof-of-concept hardware implementation (GreenFat) to evaluate performance. We show FRP is secure, practical, and compatible at the binary level, while our hardware implementation achieves low performance overheads (< 4%).
对于以C和C+++等低级别编程语言编写的程序来说,内存错误仍然是一个至关重要的问题。许多不同的内存错误防护已经提出了许多不同的防线,每个防线在管理费、兼容性和攻击抗力方面都有不同的取舍。有些防线高度兼容,但只能提供最低限度的保护,而且很容易被知情攻击者绕过。在频谱的另一端,能力系统提供非常强(不可忘怀)的保护,但需要新颖的软件和硬件执行,但根据定义不相容。挑战在于实现非常强的保护和高度兼容性。在本文中,我们提出“全调调调制点”FRP作为强大的内存错误防线,同时保持与现有二进制软件的兼容性。FRP的关键思想是设计一个新的指针编码计划,使大多数点位完全随机化,甚至粗力攻击不切实际。我们设计的FRP编码是:(1) 与现有的二进制代码兼容性(不需要反调);(2) 与基本对象布局脱钩。FRP的原为原型:(i)一个软件执行水平(BBFFFFFFFD)的兼容性、测试的兼容性和性)到测试性。
Article 24
Title@2025-05-25 (7): CLEVER: A Curated Benchmark for Formally Verified Code Generation
Title: CLEVER: A Curated Benchmark for Formally Verified Code Generation | CLEVER: Ein kuratierter Benchmark für die formal verifizierte Codegenerierung | 正式核实的代码生成基准 2505.13938v3 |
Authors: Amitayush Thakur, Jasper Lee, George Tsoukalas, Meghana Sistla, Matthew Zhao, Stefan Zetzsche, Greg Durrett, Yisong Yue, Swarat Chaudhuri
We introduce ${\rm C{\small LEVER}}$, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks, ${\rm C{\small LEVER}}$ avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean’s type checker to ensure machine-checkable correctness. We use ${\rm C{\small LEVER}}$ to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning. Our benchmark can be found on GitHub(https://github.com/trishullab/clever) as well as HuggingFace(https://huggingface.co/datasets/amitayusht/clever). All our evaluation code is also available online(https://github.com/trishullab/clever-prover).
我们引入了一个高质量的、成熟的161个问题基准,用于在利昂最终到最终核实代码生成。 每个问题包括:(1) 制定符合固定地面真相规格的规格,(2) 制定符合该规格的精干实施任务。 不同于以前的基准, $ C小LEVL$避免测试- 情况监督, LLM 生成说明, 以及泄漏执行逻辑或允许空洞解决方案的规格。 所有产出都使用 Lean 的字型检查器进行后热检测,以确保机器检查的正确性。 我们使用$ C小LEO$来评估几张图片和基于最新语言模型的试探性方法。 这些方法都为实现全面核查而奋斗,将它确定为具有挑战性的方案合成和正式推理基准。 我们的基准可以在 GitHub(https://github.com/trishallballb/cleggest) 上找到(https://hulishclasclibs/combestregress)。
Article 25
Title@2025-05-25 (7): Proceedings 16th International Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software
Title: Proceedings 16th International Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software | 16. Internationaler Workshop zur Programmierung von Sprachkonzepten für Concurrency und Communication-cEntric Software | 第16次关于以方案语言处理货币和通信软件的国际讲习班 2505.19078v1 |
Authors: Farzaneh Derakhshan, Jan Hoffmann
This volume contains the proceedings of PLACES 2025, the 16th edition of the Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software. The workshop is scheduled to take place in Hamilton, Canada, on May 4, 2025, as a satellite event of ETAPS, the European Joint Conferences on Theory and Practice of Software. PLACES offers a forum for exchanging new ideas on how to address the challenges of concurrent and distributed programming and how to improve the foundations of modern and future computer applications. PLACES welcomes researchers from various fields, and its topics include the design of new programming languages, models for concurrent and distributed systems, type systems, program verification, and applications in various areas (e.g., microservices, sensor networks, blockchains, event processing, business process management).
该卷载有2025年PLACES会议记录,即第16版《对货币和通信-计算机软件采用方案编制语言方法讲习班》,该讲习班定于2025年5月4日在加拿大汉密尔顿举行,作为欧洲软件理论和实践问题欧洲联合会议的卫星活动,该讲习班提供了一个论坛,以交流关于如何应对同时和分散编程的挑战以及如何改进现代和今后计算机应用基础的新想法;该讲习班欢迎来自各领域的研究人员,其专题包括设计新的编程语言、同时和分布系统的模式、类型系统、程序核查以及各个领域的应用(例如微服务、传感器网络、块链、事件处理、业务流程管理)。
Article 26
Title@2025-05-24 (6): Algorithmic Language Models with Neurally Compiled Libraries
Title: Algorithmic Language Models with Neurally Compiled Libraries | Algorithmische Sprachmodelle mit neurally compiled Bibliotheken | 具有神经编译图书馆的算法语言模型 2407.04899v2 |
Authors: Lucas Saldyt, Subbarao Kambhampati
Important tasks such as reasoning and planning are fundamentally algorithmic, meaning that solving them robustly requires acquiring true reasoning or planning algorithms, rather than shortcuts. Large Language Models lack true algorithmic ability primarily because of the limitations of neural network optimization algorithms, their optimization data and optimization objective, but also due to architectural inexpressivity. To solve this, our paper proposes augmenting LLMs with a library of fundamental operations and sophisticated differentiable programs, so that common algorithms do not need to be learned from scratch. We add memory, registers, basic operations, and adaptive recurrence to a transformer architecture built on LLaMA3. Then, we define a method for directly compiling algorithms into a differentiable starting library, which is used natively and propagates gradients for optimization. In this preliminary study, we explore the feasability of augmenting LLaMA3 with a differentiable computer, for instance by fine-tuning small transformers on simple algorithmic tasks with variable computational depth.
推理和规划等重要任务根本上是算法性的,这意味着解决这些问题需要获得真正的推理或规划算法,而不是捷径。 大型语言模型缺乏真正的算法能力,主要是因为神经网络优化算法、优化数据和优化目标的局限性,但也因为建筑缺乏表达性。 为了解决这个问题,我们的论文提议用一个基本操作和复杂的不同程序库来增加LLMS, 从而不必从零开始学习通用算法。 我们把记忆、 登记册、 基本操作和适应性重现添加到以 LLLama3 为基础的变压器结构中。 然后, 我们定义了一种直接将算法编集成不同启动图书馆的方法, 后者是本地使用的, 并推广梯度, 用于优化。 在这项初步研究中, 我们探索用一种不同计算机来增加LLAMA3 的可行性, 例如微调小变压器, 其简单算法工作具有可变计算深度。
Article 27
Title@2025-05-24 (6): Automated Verification of Monotonic Data Structure Traversals in C
Title: Automated Verification of Monotonic Data Structure Traversals in C | Automatisierte Überprüfung der monotonen Datenstruktur Traversals in C | 自动核查C的单声量数据结构轨迹 2505.18818v1 |
Authors: Matthew Sotoudeh
Bespoke data structure operations are common in real-world C code. We identify one common subclass, monotonic data structure traversals (MDSTs), that iterate monotonically through the structure. For example, strlen iterates from start to end of a character array until a null byte is found, and a binary search tree insert iterates from the tree root towards a leaf. We describe a new automated verification tool, Shrinker, to verify MDSTs written in C. Shrinker uses a new program analysis strategy called scapegoating size descent, which is designed to take advantage of the fact that many MDSTs produce very similar traces when executed on an input (e.g., some large list) as when executed on a ‘shrunk’ version of the input (e.g., the same list but with its first element deleted). We introduce a new benchmark set containing over one hundred instances proving correctness, equivalence, and memory safety properties of dozens of MDSTs found in major C codebases including Linux, NetBSD, OpenBSD, QEMU, Git, and Musl. Shrinker significantly increases the number of monotonic string and list traversals that can be verified vs. a portfolio of state-of-the-art tools.
声明数据结构操作在现实世界 C 代码中很常见 。 我们发现一个共同的子类, 单声道数据结构跨行( MDSTs) , 以单声道方式在结构中循环。 例如, 从字符阵的开始到结束, 直到找到一个空字字节, 一个二进式搜索树将树根的迭代插入到叶子上。 我们描述一个新的自动核查工具, Shrinkerer , 以校验以 C 书写的 MDST 。 Shrinker 使用一个新的程序分析战略, 称为替罪状大小下降, 旨在利用许多 MDSTs 在输入( 例如, 某些大列表) 执行时产生非常相似的痕迹, 在输入的“ hrunk” 版本( 例如, 相同的列表但第一个元素被删除 ) 执行时 。 我们引入一个新的基准集, 包含超过100个证明主要 C 代码库( 包括 Linux、 NetBSDSD、 OpenBSD、 QEMUMU、 GIMMI- 和 Muslink 工具的版本数 。
Article 28
Title@2025-05-24 (6): Autocomp: LLM-Driven Code Optimization for Tensor Accelerators
Title: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | Autocomp: LLM-gesteuerte Code-Optimierung für Tensor-Beschleuniger | 自动comp: LLM- Driven 代码对 Tensor 加速器的优化 2505.18574v1 |
Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages like specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
在当今的计算环境中,硬件加速器,特别是那些设计用于高压处理的硬件加速器已经变得无处不在。然而,即便在建设数据编纂器方面做出了巨大努力,程序制作这些高压加速器仍然具有挑战性,使得其潜在潜力得不到充分利用。最近,在大量代码方面受过培训的大型语言模型(LLMS)在代码生成和优化任务中表现出了巨大的希望,但在生成像专门的高压加速器代码这样的低资源语言方面仍是一个重大挑战。我们用Autocomp 来应对这一挑战。Autcomp(Autocomp)赋予加速器程序程序程序员通过自动LLOM驱动搜索将域知识和硬件反馈用于优化代码。我们通过下列方法完成这项工作:1)将每个优化的通过结构化的两阶段快速、分为规划和代码生成阶段;2)在规划过程中通过简洁和调整的优化菜单插入域知识,3)将硬件的准确性和性度指标整合成为每次搜索的反馈。在三类代表性工作量和两个不同的内部加速器中,我们展示了自动整合的代码在5.6x(GEMMLA-D-A-D-D-D-D-CAR-D-D-C-C-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-D-C-C-D-D-C-C-C-C-C-C-C-C-C-C-C-C-D-D-D-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-C-S-S-S-S-S-S-S-C-C-S-C-C-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S-S