Date: Mon, 28 Mar 2022 14:44:46 +0100
From: Mark Rutland
To: Jakub Jelinek
Cc: Segher Boessenkool, Peter Zijlstra, Nick Desaulniers, Borislav Petkov,
 Nathan Chancellor, x86-ml, lkml, llvm@lists.linux.dev, Josh Poimboeuf,
 linux-toolchains@vger.kernel.org
Subject: Re: clang memcpy calls
References: <20220325151238.GB614@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 28, 2022 at 03:12:14PM +0200, Jakub Jelinek wrote:
> On Mon, Mar 28, 2022 at 01:55:38PM +0100, Mark Rutland wrote:
> > > If coexistence of instrumented and non-instrumented memcpy etc. was the goal
> > > (it clearly wasn't), then the sanitizer libraries wouldn't be overriding
> > > memcpy calls, but instead the compiler would redirect calls to memcpy in
> > > instrumented functions to say __asan_memcpy which then would be
> > > instrumented.
> >
> > FWIW, I think that approach would be fine for kernel usage.
> >
> > > > Given the standard doesn't say *anything* about instrumentation, what does GCC
> > > > *require* instrumentation-wise of the memcpy implementation? What happens *in
> > > > practice* today?
> > > >
> > > > For example, is the userspace implementation of memcpy() instrumented for
> > > > AddressSanitizer, or not?
> > >
> > > It is, for all functions, whether compiled with -fsanitize=address or not,
> > > if user app is linked with -fsanitize=address, libasan is linked in and
> > > overrides the libc memcpy with its instrumented version.
> >
> > Thanks for confirming! Just to check, how does libasan prevent recursing
> > within itself on implicitly generated calls to memcpy and friends? Is
> > anything special done to build the libasan code, is that all asm, or
> > something else?
>
> Generally, most of the libasan wrappers look like
>   do something
>   call the original libc function (address from dlsym/dlvsym)
>   do something
> and the "do something" code isn't that complicated.

I see; thanks!

> The compiler doesn't add calls to memcpy/memset etc. just to screw up
> users, they are added for a reason, such as copying or clearing very
> large aggregates (including for passing them by value), without -Os it
> will rarely use calls for smaller sizes and will instead expand them
> inline.

Sure; I understand that and (from my side at least) I'm not arguing that
there's malice or so on, just that I don't think we currently have the tools
for the kernel to be able to do the right thing reliably and robustly.

Thanks for helping! :)

> For malloc and the like wrappers I think it uses some TLS recursion
> counters so that say malloc called from dlsym doesn't cause problems.
>
> Now, one way for the kernel to make kasan work (more) reliably even with
> existing compilers without special tweaks for this might be if those
> calls to no_sanitize_address code aren't mixed with sanitized code all the
> time might be set some per-thread flag when starting a "no sanitized" code
> execution and clear it at the end of the region (or vice versa) and test
> those flags in the kernel's memcpy/memset etc. implementation to decide if
> they should be sanitized or not.

Unfortunately, I don't think setting a flag is workable, since e.g. we need
to ensure the flag is set before any implicitly-generated calls, and I don't
think we have a reliable way to do that from C. There are also a number of
portions of uninstrumentable code, so from a maintainability and robustness
PoV this option isn't great.

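For illustration only, the per-thread flag scheme could look roughly like the
userspace C sketch below. All names (in_nosanitize_region, check_range,
flagged_memcpy, copy_unchecked) are hypothetical, and the counter merely
stands in for real shadow-memory checks; none of this is existing kernel or
KASAN code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread flag: true while uninstrumentable code runs. */
static _Thread_local bool in_nosanitize_region;

/* Hypothetical counter standing in for real shadow-memory checks. */
static unsigned long range_checks;

static void check_range(const void *p, size_t n)
{
	(void)p;
	(void)n;
	range_checks++;	/* a real sanitizer would validate [p, p + n) */
}

/* Out-of-line copy that tests the flag to decide whether to check. */
static void *flagged_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (!in_nosanitize_region) {
		check_range(src, n);
		check_range(dst, n);
	}
	while (n--)
		*d++ = *s++;
	return dst;
}

/* A "no sanitized" region: set the flag on entry, restore it on exit. */
static void copy_unchecked(void *dst, const void *src, size_t n)
{
	bool saved = in_nosanitize_region;

	in_nosanitize_region = true;
	flagged_memcpy(dst, src, n);
	in_nosanitize_region = saved;
}
```

Nothing in this sketch stops the compiler from emitting an implicit
memcpy/memset call ahead of the store to the flag in copy_unchecked(), which
is the ordering problem described above.
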
For `noinstr` code specifically (which gets placed into a distinct section
and can be identified by virtual address) we could have the out-of-line
functions look at their return address, but that's not going to cover the
general case of `__attribute__((no_sanitize_address))` or compilation units
built without `-fsanitize=address`.

From my PoV, distinguishing instrumentable/uninstrumentable calls at compile
time would be ideal. That, or placing the instrumentation into the caller
(omitting it when instrumentation is disabled for that caller), and expecting
that the out-of-line forms are never instrumented. I appreciate that the
latter option may not be workable due to potential size bloat, though.

The same will apply to TSAN, MSAN, etc, and I'm not sure whether those are
mutually exclusive from the compiler's PoV.

> As KASAN is (hopefully) just a kernel debugging aid and not something meant
> for production (in the userspace at least GCC strongly recommends against
> using the sanitizers in production), perhaps allocating one per-thread bool
> or int and changing it in a few spots and testing in the library functions
> might be acceptable.

I'm not sure whether folk are using KASAN in production, but IIRC there was a
desire to do so.

Thanks,
Mark.