Date: Mon, 28 Mar 2022 15:58:10 +0100
From: Mark Rutland
To: Segher Boessenkool
Cc: Peter Zijlstra, Nick Desaulniers, Borislav Petkov, Nathan Chancellor, x86-ml, lkml, llvm@lists.linux.dev, Josh Poimboeuf, linux-toolchains@vger.kernel.org
Subject: Re: clang memcpy calls
In-Reply-To: <20220328142220.GI614@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 28, 2022 at 09:22:20AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> > On Fri, Mar 25, 2022 at 10:12:38AM -0500, Segher Boessenkool wrote:
> > > The compiler isn't assuming anything about asan.  The compiler generates
> > > its code without any consideration of what asan will or will not do.
> > > The burden of making things work is on asan.
> >
> > I think we're talking past each other here, so let me be more precise. :)
> >
> > The key thing is that when the user passes `-fsanitize=address`, instrumentation
> > is added by (a part of) the compiler. That instrumentation is added under some
> > assumptions as to how the compiler as a whole will behave.
> >
> > With that in mind, the question is how is __attribute__((no_sanitize_address))
> > intended to work when considering all the usual expectations around how the
> > compiler can play with memcpy and similar?
>
> The attribute is about how the *current* function is instrumented, not
> about anything called by this function.  This is clearly documented:
>   'no_sanitize_address'
>   'no_address_safety_analysis'
>        The 'no_sanitize_address' attribute on functions is used to inform
>        the compiler that it should not instrument memory accesses in the
>        function when compiling with the '-fsanitize=address' option.
>        The 'no_address_safety_analysis' is a deprecated alias of the
>        'no_sanitize_address' attribute, new code should use
>        'no_sanitize_address'.

I understand this, and I have read the documentation. I'm not claiming any
*individual* semantic is wrong, just that in combination this doesn't provide
what people *need* (even if it strictly matches what is documented).

My argument is: if the compiler is permitted to implicitly and arbitrarily add
calls to instrumented functions within a function marked with
`no_sanitize_address`, the `no_sanitize_address` attribute is effectively
useless, and therefore *something* needs to change.

> > > The compiler should not do anything differently here if it uses asan.
> > > The address sanitizer and the memcpy function implementation perhaps
> > > have to cooperate somehow, or asan needs more smarts.  This needs to
> > > happen no matter what, to support other things calling memcpy, say,
> > > assembler code.
> >
> > I appreciate where you're coming from here, but I think you're approaching the
> > problem sideways.
>
> I am stating facts, I am not trying to solve your problem there.  It
> seemed to me (and still does) that you didn't grasp all facts here.

Sorry, but I think you're reading my replies uncharitably if you think that.

> > We need to define *what the semantics are* so that we can actually solve the
> > problem, e.g. is a memcpy implementation expected to be instrumented or not?
>
> That is up to the memcpy implementation itself, of course.

Sorry, but that doesn't make sense to me. When the compiler instruments a
function with AddressSanitizer, it must have *some* assumption about whether
memcpy() itself will be instrumented, such that it won't miss some necessary
instrumentation (and ideally, for performance reasons, doesn't have redundant
instrumentation).

If the story is "memcpy may or may not be instrumented", then the only way to
guarantee necessary instrumentation is for the compiler to *always* place it in
the caller (unless forbidden by `no_sanitize_address`). If that were the case,
the kernel could make things work by simply not instrumenting memcpy and
friends.

IIUC today those assumptions are not documented. Is the behaviour consistent?

> > > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > > standard-specified semantics).  This means that it will have the same
> > > semantics as __builtin_memcpy always, and either or not be a call to an
> > > external function.  It can also create calls to it out of thin air.
> >
> > I understand all of that.
>
> And still you want us to do something that is impossible under those
> existing constraints :-(

If that's truly impossible, that's very unfortunate. FWIW, I can believe this
would require tremendous effort to change, even if it's not truly impossible.

If that is the case, it means that on the kernel side we must never instrument
our implementation of memcpy and friends, for correctness reasons, which has
the unfortunate property of losing coverage in the cases where we *would* like
to use an instrumented memcpy.

> If you want the external memcpy called by modules A, B, C to not be
> instrumented, you have to link A, B, and C against an uninstrumented
> memcpy.  This is something the kernel will have to do, the compiler has
> no say in how the kernel is linked together.

Unfortunately that option doesn't really fix the `no_sanitize_address`
semantic, and forces us to move *all* uninstrumentable code out into separate
compilation units, etc. This isn't *impossible*, but is *very* painful.

Thanks,
Mark.
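
For illustration, here is a minimal sketch of the situation discussed above. It is not code from the kernel or from this thread, and `struct big`, `copy_big()` and `copy_big_roundtrip()` are hypothetical names. The struct assignment contains no explicit call, but the compiler is permitted to lower it to an out-of-line memcpy() call; whether it does so depends on the compiler, target and options.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical type, chosen to be large enough that a compiler may lower a
 * plain assignment to an out-of-line memcpy() call rather than inline moves.
 */
struct big {
	unsigned char bytes[256];
};

/*
 * The attribute asks the compiler not to instrument the memory accesses in
 * this function when building with -fsanitize=address. It says nothing about
 * functions this one calls. If the assignment below becomes a call to
 * memcpy(), and the memcpy() linked into the image is itself instrumented,
 * the accesses may end up being checked anyway.
 */
__attribute__((no_sanitize_address))
void copy_big(struct big *dst, const struct big *src)
{
	*dst = *src;	/* no explicit call here; the compiler may add one */
}

/* Returns 1 if copy_big() reproduced the source bytes, 0 otherwise. */
int copy_big_roundtrip(void)
{
	struct big a, b;
	size_t i;

	for (i = 0; i < sizeof(a.bytes); i++)
		a.bytes[i] = (unsigned char)i;
	memset(&b, 0, sizeof(b));
	copy_big(&b, &a);
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Building this with `-fsanitize=address` and inspecting the code generated for copy_big() is one way to see whether a memcpy() call was emitted for a given compiler and configuration; the result is not guaranteed to be the same across compilers or versions.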