Date: Mon, 28 Mar 2022 10:52:54 +0100
From: Mark Rutland
To: Segher Boessenkool
Cc: Peter Zijlstra, Nick Desaulniers, Borislav Petkov, Nathan Chancellor,
    x86-ml, lkml, llvm@lists.linux.dev, Josh Poimboeuf,
    linux-toolchains@vger.kernel.org
Subject: Re: clang memcpy calls
In-Reply-To: <20220325151238.GB614@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 25, 2022 at 10:12:38AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Mar 25, 2022 at 03:13:36PM +0100, Peter Zijlstra wrote:
> >
> > +linux-toolchains
> >
> > On Fri, Mar 25, 2022 at 12:15:28PM +0000, Mark Rutland wrote:
> > > a) The compiler expects the out-of-line implementations of functions
> > >    ARE NOT instrumented by address-sanitizer.
> > >
> > >    If this is the case, then it's legitimate for the compiler to call
> > >    these functions anywhere, and we should NOT instrument the kernel
> > >    implementations of these. If the compiler wants those instrumented it
> > >    needs to add the instrumentation in the caller.
>
> The compiler isn't assuming anything about asan. The compiler generates
> its code without any consideration of what asan will or will not do.
> The burden of making things work is on asan.

I think we're talking past each other here, so let me be more precise. :)

The key thing is that when the user passes `-fsanitize=address`,
instrumentation is added by (a part of) the compiler. That instrumentation is
added under some assumptions as to how the compiler as a whole will behave.

With that in mind, the question is: how is __attribute__((no_sanitize_address))
intended to work when considering all the usual expectations around how the
compiler can play with memcpy and similar?
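To make that question concrete, here is a minimal, hypothetical C sketch. The names `struct big`, `copy_big` and `demo_copy_big` are invented for illustration and do not come from the kernel. The body of `copy_big` is excluded from ASan instrumentation. However, the aggregate copy `*dst = *src` is the kind of operation a compiler may lower to a call to an out-of-line memcpy(), depending on the compiler, target, copy size and options. Whether that memcpy() is itself instrumented is not something the attribute controls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: large enough that some compilers/targets may emit a
 * memcpy() call for a struct copy instead of inline moves (not guaranteed). */
struct big {
    unsigned char bytes[512];
};

/* Hypothetical function. With -fsanitize=address, this attribute suppresses
 * the ASan checks the compiler would otherwise insert into this body; without
 * that flag it has no effect. The copy below may still become a call to an
 * out-of-line memcpy(), and the attribute says nothing about whether that
 * memcpy() implementation is instrumented. */
__attribute__((no_sanitize_address))
void copy_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}

/* Usage sketch: fill a source buffer with a pattern, copy it, and check
 * that the copy is exact. Returns 1 on success, 0 on mismatch. */
int demo_copy_big(void)
{
    static struct big a, b;

    for (size_t i = 0; i < sizeof a.bytes; i++)
        a.bytes[i] = (unsigned char)i;

    copy_big(&b, &a);

    assert(b.bytes[511] == 255); /* (unsigned char)511 == 255 */
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Building this with and without `-fsanitize=address` and inspecting the output of `-S` would show whether a memcpy() call appears inside copy_big() for a given compiler and target. The result is not guaranteed either way.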
I think the answer to that is "this hasn't been thought about in great
detail", which leads to the question of "how could/should this be made to
work?", which is what I'm on about below.

> It is legitimate to call (or not call!) memcpy anywhere. memcpy always
> is __builtin_memcpy, which either or not does a function call.
>
> > > AFAICT The two options for the compiler here are:
> > >
> > > 1) Always inline an uninstrumented form of the function in this case
> > >
> > > 2) Have distinct instrumented/uninstrumented out-of-line
> > >    implementations, and call the uninstrumented form in this case.
>
> The compiler should not do anything differently here if it uses asan.
> The address sanitizer and the memcpy function implementation perhaps
> have to cooperate somehow, or asan needs more smarts. This needs to
> happen no matter what, to support other things calling memcpy, say,
> assembler code.

I appreciate where you're coming from here, but I think you're approaching
the problem sideways.

> > > So from those examples it seems GCC falls into bucket (a), and assumes the
> > > blessed functions ARE NOT instrumented.
>
> No, it doesn't show GCC assumes anything. No testing of this kind can
> show anything alike.

I appreciate that; hence "it seems".

What I'm getting at is that the *instrumentation* is added under some
assumptions (those of whoever wrote the instrumentation code), and those
assumptions might not match the behaviour of the compiler, or the behaviour
we expect for __attribute__((no_sanitize_address)).

We need to define *what the semantics are* so that we can actually solve the
problem, e.g. is a memcpy implementation expected to be instrumented or not?

> > > I think something has to change on the compiler side here (e.g. as per
> > > options above), and we should align GCC and clang on the same
> > > approach...
>
> GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> standard-specified semantics).
> This means that it will have the same semantics as __builtin_memcpy
> always, and either or not be a call to an external function. It can
> also create calls to it out of thin air.

I understand all of that.

Given the standard doesn't say *anything* about instrumentation, what does GCC
*require* instrumentation-wise of the memcpy implementation?

What happens *in practice* today? For example, is the userspace implementation
of memcpy() instrumented for AddressSanitizer, or not?

Thanks,
Mark.