Date: Mon, 28 Mar 2022 13:55:38 +0100
From: Mark Rutland
To: Jakub Jelinek
Cc: Segher Boessenkool, Peter Zijlstra, Nick Desaulniers, Borislav Petkov,
    Nathan Chancellor, x86-ml, lkml, llvm@lists.linux.dev, Josh Poimboeuf,
    linux-toolchains@vger.kernel.org
Subject: Re: clang memcpy calls
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 28, 2022 at 12:20:39PM +0200, Jakub Jelinek wrote:
> On Mon, Mar 28, 2022 at 10:52:54AM +0100, Mark Rutland wrote:
> > I think we're talking past each other here, so let me be more precise. :)
> >
> > The key thing is that when the user passes `-fsanitize=address`, instrumentation
> > is added by (a part of) the compiler. That instrumentation is added under some
> > assumptions as to how the compiler as a whole will behave.
> >
> > With that in mind, the question is how __attribute__((no_sanitize_address))
> > is intended to work when considering all the usual expectations around how the
> > compiler can play with memcpy and similar.
>
> no_sanitize_address or lack thereof determines whether the current function
> should or shouldn't be ASan instrumented, not whether other functions
> it calls are instrumented.

I understand this. :)

> memcpy/memmove/memset are a slightly special case because the
> compiler can add them on its own even if they aren't present in the
> source (there are a few others the compiler can pattern match too), and
> various builtins can, on the other hand, be expanded inline instead of
> called, so one then gets the sanitization status of the function in
> which it is used rather than that of the out-of-line implementation of
> the function.

Yep, and this is the point I'm getting at.

If the compiler *implicitly* generates a call to one of these from a function
which was marked with __attribute__((no_sanitize_address)), then either:

1) This renders __attribute__((no_sanitize_address)) useless, since
   instrumentation is being added in a way the code author cannot reliably
   prevent.
   If so, I'd argue that this is a compiler bug, and that the transformation
   of inserting the call is unsound.

2) There's an expectation that those out-of-line implementations are *NOT*
   instrumented, and inserting the call is fine so long as they really are
   not instrumented.

I appreciate that this isn't necessarily something which was considered when
the feature was originally designed, and maybe this is fine for userspace, but
for kernel usage we need to be able to reliably prevent instrumentation.

> If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> memcpy calls, but instead the compiler would redirect calls to memcpy in
> instrumented functions to, say, __asan_memcpy, which would then be
> instrumented.

FWIW, I think that approach would be fine for kernel usage.

> > Given that the standard doesn't say *anything* about instrumentation, what
> > does GCC *require* instrumentation-wise of the memcpy implementation? What
> > happens *in practice* today?
> >
> > For example, is the userspace implementation of memcpy() instrumented for
> > AddressSanitizer, or not?
>
> It is. For all functions, whether compiled with -fsanitize=address or not:
> if the user app is linked with -fsanitize=address, libasan is linked in and
> overrides the libc memcpy with its instrumented version.

Thanks for confirming!

Just to check, how does libasan prevent recursing within itself on implicitly
generated calls to memcpy and friends? Is anything special done to build the
libasan code, is that all asm, or something else?

> Note that the default shadow memory value is 0 which means accessible,
> so even when memcpy is instrumented, when called from non-instrumented code
> often it will not do anything beyond normal behavior

That might be true in userspace, but it is not the case within the kernel, as
the shadow might not be mapped (which is one reason we need to inhibit
instrumentation in the first place).
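To make the "shadow might not be mapped" case concrete, here is a minimal
userspace toy model (not kernel code, and not how KASAN or libasan are actually
implemented) of an instrumented and an uninstrumented copy routine. All names
(`toy_shadow_mapped`, `toy_shadow_faults`, `toy_plain_memcpy`,
`toy_checked_memcpy`) are hypothetical, and the fatal fault is simulated with a
counter. The fragment only defines the helpers; the point is that the
instrumented copy consults the shadow before copying, so calling it before the
shadow exists fails, while the uninstrumented copy never looks at the shadow:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: set once the (toy) shadow region has been "mapped". */
static bool toy_shadow_mapped = false;
/* Hypothetical: counts shadow accesses that would have faulted. */
static unsigned toy_shadow_faults = 0;

/* Stand-in for a shadow lookup. In a real kernel this is a load from the
 * shadow region, which faults if that region has not been mapped yet. */
static bool toy_shadow_says_ok(const void *p, size_t len)
{
	(void)p;
	(void)len;
	if (!toy_shadow_mapped) {
		toy_shadow_faults++;	/* real hardware: a fatal fault */
		return false;
	}
	return true;	/* toy: every mapped shadow byte reads as 0 (accessible) */
}

/* Uninstrumented copy: never touches the shadow. Note that a compiler may
 * recognise this loop and emit a call to memcpy() instead, which is exactly
 * the kind of implicitly generated call discussed above. */
static void *toy_plain_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}

/* "Instrumented" copy: checks both ranges against the shadow first. */
static void *toy_checked_memcpy(void *dst, const void *src, size_t len)
{
	if (!toy_shadow_says_ok(src, len) || !toy_shadow_says_ok(dst, len))
		return NULL;	/* toy: refuse instead of reporting or crashing */
	return toy_plain_memcpy(dst, src, len);
}
```

Before `toy_shadow_mapped` is set, `toy_checked_memcpy()` returns NULL and bumps
`toy_shadow_faults`, while `toy_plain_memcpy()` copies normally; once the shadow
is "mapped", both succeed. Keeping the two as distinct entry points, along the
lines of the __asan_memcpy redirection suggested above, is what would let
uninstrumented code keep calling the plain one.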
For example, consider the code which *creates* the page tables for the shadow
memory. Until that has run, accessing the shadow will result in a (fatal)
fault, and while the compilation unit containing that code is compiled
*without* `-fsanitize=address`, it may call into the common implementation of
memcpy() and friends shared by all kernel code.

Thanks,
Mark.

> , as non-instrumented
> functions don't poison the padding between variables (there doesn't even
> have to be any) etc. But e.g. malloc/operator new etc. are also overridden,
> so buffer overflows/underflows on memory allocated that way from
> uninstrumented code using memcpy etc. will be diagnosed.
>
> 	Jakub
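For reference, the shadow encoding Jakub describes (a zero shadow value meaning
"accessible", with allocator redzones poisoned) can be modelled in a few lines.
This is a simplified toy, assuming ASan's usual scheme of one shadow byte per
8-byte granule, where 0 means all 8 bytes are addressable, k in 1..7 means only
the first k bytes are, and a negative value means the granule is poisoned. The
arena, the helper names (`toy_poison_alloc`, `toy_asan_checked_copy`, ...) and
the redzone constant are all made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_GRANULE	8		/* bytes covered by one shadow byte */
#define TOY_ARENA_SIZE	32
#define TOY_REDZONE	((int8_t)-6)	/* arbitrary negative "poisoned" marker */

static uint8_t toy_arena[TOY_ARENA_SIZE];
/* Zero-initialised, so every granule starts out fully accessible. */
static int8_t toy_shadow[TOY_ARENA_SIZE / TOY_GRANULE];

/* Is the single byte at arena offset `off` addressable? */
static bool toy_byte_ok(size_t off)
{
	int8_t k = toy_shadow[off / TOY_GRANULE];

	return k == 0 || (int)(off % TOY_GRANULE) < k;
}

/* Mark [0, size) as one allocation and the rest of the arena as redzone. */
static void toy_poison_alloc(size_t size)
{
	for (size_t g = 0; g < TOY_ARENA_SIZE / TOY_GRANULE; g++) {
		size_t start = g * TOY_GRANULE;

		if (start + TOY_GRANULE <= size)
			toy_shadow[g] = 0;			/* fully addressable */
		else if (start < size)
			toy_shadow[g] = (int8_t)(size - start);	/* partial granule */
		else
			toy_shadow[g] = TOY_REDZONE;		/* poisoned */
	}
}

/* Copy into the arena only if every destination byte is addressable.
 * (The source is outside the arena and is not checked in this toy.) */
static bool toy_asan_checked_copy(size_t dst_off, const void *src, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (dst_off + i >= TOY_ARENA_SIZE || !toy_byte_ok(dst_off + i))
			return false;	/* a real runtime would emit a report here */
	}
	memcpy(toy_arena + dst_off, src, len);
	return true;
}
```

With the shadow still all-zero, a 14-byte copy into the arena is accepted, which
is the "does nothing beyond normal behavior" case. After `toy_poison_alloc(13)`,
a 13-byte copy still passes, but a 14-byte copy is rejected because byte 13
falls past the 5 addressable bytes of the second granule.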