Date: Mon, 28 Mar 2022 10:59:57 -0500
From: Segher Boessenkool
To: Mark Rutland
Cc: Peter Zijlstra, Nick Desaulniers, Borislav Petkov, Nathan Chancellor, x86-ml, lkml, llvm@lists.linux.dev, Josh Poimboeuf, linux-toolchains@vger.kernel.org
Subject: Re: clang memcpy calls
Message-ID:
<20220328155957.GK614@gate.crashing.org>
References: <20220325151238.GB614@gate.crashing.org> <20220328142220.GI614@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 28, 2022 at 03:58:10PM +0100, Mark Rutland wrote:
> On Mon, Mar 28, 2022 at 09:22:20AM -0500, Segher Boessenkool wrote:
> > The attribute is about how the *current* function is instrumented, not
> > about anything called by this function.  This is clearly documented:
> > 'no_sanitize_address'
> > 'no_address_safety_analysis'
> >      The 'no_sanitize_address' attribute on functions is used to inform
> >      the compiler that it should not instrument memory accesses in the
> >      function when compiling with the '-fsanitize=address' option.  The
> >      'no_address_safety_analysis' is a deprecated alias of the
> >      'no_sanitize_address' attribute, new code should use
> >      'no_sanitize_address'.
>
> I understand this, and I have read the documentation.
>
> I'm not claiming any *individual* semantic is wrong, just that in
> combination this doesn't provide what people *need* (even if it strictly
> matches what is documented).
>
> My argument is: if the compiler is permitted to implicitly and
> arbitrarily add calls to instrumented functions within a function marked
> with `no_sanitize_address`, the `no_sanitize_address` attribute is
> effectively useless, and therefore *something* needs to change.

I do not see how that follows.  Maybe that is obvious from how you look
at your use case, but it is not from the viewpoint of people who just
want to do sanitization.
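
For readers following along, here is a minimal sketch of the situation
under dispute.  It is not taken from the thread: the struct and the
function names are hypothetical, chosen only for illustration.  It
assumes a compiler that accepts the GNU attribute syntax (GCC or clang).
The attribute covers the accesses emitted for one function body; the
struct assignment in that body is the kind of code a compiler may lower
to a call to an external memcpy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical register-save area, large enough that a compiler may
 * choose to copy it with a call to memcpy() instead of inline moves. */
struct saved_regs {
    unsigned long gpr[32];
};

/* The attribute asks the compiler not to instrument the memory accesses
 * it emits for *this* function body when building with
 * -fsanitize=address.  It says nothing about functions called from here. */
__attribute__((no_sanitize_address))
void save_regs(struct saved_regs *dst, const struct saved_regs *src)
{
    /* No memcpy() appears in the source, but the compiler is still
     * allowed to emit a call to the external memcpy for this copy. */
    *dst = *src;
}

/* Hypothetical self-check: the copy must produce an identical image. */
void check_save_regs(void)
{
    struct saved_regs a, b;

    for (int i = 0; i < 32; i++)
        a.gpr[i] = (unsigned long)i * 3 + 1;
    memset(&b, 0, sizeof b);

    save_regs(&b, &a);
    assert(memcmp(&a, &b, sizeof a) == 0);
}
```

Whether a memcpy call emitted for that assignment lands in an
instrumented or an uninstrumented memcpy is decided by what the program
is linked against, not by the attribute on save_regs().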
So what is the goal here?  Why do you need to prevent sanitization on
anything called from this function, at all costs?

> > > I appreciate where you're coming from here, but I think you're approaching the
> > > problem sideways.
> >
> > I am stating facts, I am not trying to solve your problem there.  It
> > seemed to me (and still does) that you didn't grasp all facts here.
>
> Sorry, but I think you're reading my replies uncharitably if you think
> that.

Not at all.  I just don't see what your problem is, and what you are
trying to achieve.  I do know what you say you want, but that is clearly
impossible to do: the compiler cannot put restrictions on what some
external function will or won't do!

> > > We need to define *what the semantics are* so that we can actually solve the
> > > problem, e.g. is a memcpy implementation expected to be instrumented or not?
> >
> > That is up to the memcpy implementation itself, of course.
>
> Sorry, but that doesn't make sense to me. When the compiler instruments
> a function with AddressSanitizer, it must have *some* assumption about
> whether memcpy() itself will be instrumented, such that it won't miss
> some necessary instrumentation (and ideally, for performance reasons
> doesn't have redundant instrumentation).

Yes.  It sets things up with an external memcpy that is sanitized.  But
that happens at the linking stage: it is fine in general for user space,
but for kasan you need to do something similar manually.

> If the story is "memcpy may or may not be instrumented", then the only
> way to guarantee necessary instrumentation is for the compiler to
> *always* place it in the caller (unless forbidden by
> `no_sanitize_address`). If that were the case, the kernel can make
> things work by simply not instrumenting memcpy and friends.

It is *impossible* (in general) to put this in the caller, and it is not
how this stuff is designed either (of course).

> IIUC today those assumptions are not documented. Is the behaviour
> consistent?

The documentation I quoted above is simple and clear enough I hope.
Consistent?  Consistent with what?

> > > > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > > > standard-specified semantics).  This means that it will have the same
> > > > semantics as __builtin_memcpy always, and may or may not be a call to an
> > > > external function.  It can also create calls to it out of thin air.
> > >
> > > I understand all of that.
> >
> > And still you want us to do something that is impossible under those
> > existing constraints :-(
>
> If that's truly impossible, that's very unfortunate.
>
> FWIW, I can believe this would require tremendous effort to change, even
> if it's not truly impossible.

No.  Truly and trivially impossible.  You must want something other than
what you say, but I cannot figure out what.

To implement what you ask for we will have to build every function
twice, once with and once without instrumentation, and emit both
somewhere, somehow.  This requires either some ABI extensions, or file
format extensions ("fat objects"), or at the minimum some cooperation
between the app (the kernel here), the compiler, and the linker (and
more tools in general, but, kernel :-) )

> If that is the case, it means that kernel side we have to never
> instrument our implementation of memcpy and friends for correctness
> reasons, which has the unfortunate property of losing coverage in the
> cases we *would* like to use an instrumented memcpy.

What correctness reasons are those?

> > If you want the external memcpy called by modules A, B, C to not be
> > instrumented, you have to link A, B, and C against an uninstrumented
> > memcpy.  This is something the kernel will have to do, the compiler has
> > no say in how the kernel is linked together.
>
> Unfortunately that option doesn't really fix the `no_sanitize_address`
> semantic, and forces us to move *all* uninstrumentable code out into
> separate compilation units, etc.

Yes.
Which follows directly from you not wanting to call anything
instrumented from anything marked with such an attribute: you have to
divide the world into two parts, if you want the world to be divided
into two parts.

> This isn't *impossible*, but is *very* painful.

Yes.  It is doable if there is just a handful of things that you do not
want instrumented.  Like low-level interrupt handlers.  Luckily those
things are generally written in assembler language for other reasons.


Segher