Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp3423585rwb; Fri, 16 Dec 2022 14:56:07 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Zko0agkqe9weKchDdx9c8Avvo/cCWoQYIVtJvtyatbbgEeTry56aXbLoUiCQBj87VXSlW X-Received: by 2002:a05:6a20:3c93:b0:a4:63fd:ef1e with SMTP id b19-20020a056a203c9300b000a463fdef1emr49525819pzj.3.1671231367233; Fri, 16 Dec 2022 14:56:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671231367; cv=none; d=google.com; s=arc-20160816; b=AQ6MBQ2V5uDeILLOP3AouNU279ohATgJ0Wf47kcPv9xH7Efc+dsSQilK+Fk72OGksJ TBur4c2MgCQzhSNH4vYElTMP6Qoj2PZZzjTwCCF3txdAn53LwmU/51d/pqPq2hwmSpi8 q5CFuqjMc3PACdSbaKBmDG+3LVwZqLU3gzKrL9sfNzVzNxP6dym+wGRhLLs/SOJ0XU2I m9xo4wVZYz5GRPu+l+ap9SOQt5QlAXhOtR1HgTISpbDRt08vawM0X4sODyEnSdt+EBh7 N8jc+oAQr//CO71JMU6IzgEVqFHEjQqwY6jx0amVG9vwsZp7e55Pts4WjVsXwtQoPkqC tLlA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=T1sWzmWH45qh/pr3yJYA5mK6XRRop2e+j5E2G9NZhYg=; b=uRLzy7yAgst0x1ANs7uI7TVVorS5ovpAcZozLsu5X+QMCenb3rEj5zciO/uciTI+Bk 2T80tExY5XrJ8guZvtZx6l3TLJRdjCZQi+Qzy7e1xXcEVrco3V6gEfyqieiySbpd23cq 5NZo7eZcab0aDO7JY5YmFBY9H3iyBT/FqnzJ2A7rEAqjGTanrRzhiRzDTAn3b9giQ7BO Z4RPSUErdc2RWFqRZX0dihiM7zuYVrSswVqOiHT0MiuURgM1arv2I9I5Z/ByBxYheSQn 7tnCoMjOUVl7ZbPlUTTufDWfGLJIn2C4x2uvE3z1fWixOr6Yigc6/Lw2E8hhX/5wL5IG 2lNA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=n6U8DMpf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w189-20020a6382c6000000b00478972c18c4si3870119pgd.865.2022.12.16.14.55.56; Fri, 16 Dec 2022 14:56:07 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.dev header.s=key1 header.b=n6U8DMpf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.dev Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229971AbiLPWRJ (ORCPT + 68 others); Fri, 16 Dec 2022 17:17:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229453AbiLPWRH (ORCPT ); Fri, 16 Dec 2022 17:17:07 -0500 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9107275C4 for ; Fri, 16 Dec 2022 14:17:05 -0800 (PST) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1671229023; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=T1sWzmWH45qh/pr3yJYA5mK6XRRop2e+j5E2G9NZhYg=; b=n6U8DMpf0eb/5vQgFVgFS1beacvmcP0Ydrc2JAB8Ojx8ebmiLKO+7TSL0fxBOxDGdq/dNa ZVvUdeKgNGhyL4ys97RMCOmPtb6Y0hhe5u+HPH1OUzVjrTxS4on3lTp5H8h7V1pihmDnae w474f+4FHAmWALc4/NU+3C/4pM8vLZQ= From: andrey.konovalov@linux.dev To: Marco Elver , Andrew Morton Cc: Andrey Konovalov , Alexander Potapenko , Dmitry Vyukov , Andrey Ryabinin , kasan-dev@googlegroups.com, Peter Collingbourne , Evgenii Stepanov , Florian Mayer , Jann Horn , Mark Brand , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrey Konovalov Subject: [PATCH v3] kasan: allow sampling page_alloc allocations for HW_TAGS Date: Fri, 16 Dec 2022 23:16:54 +0100 Message-Id: <323d51d422d497b3783dacb130af245f67d77671.1671228324.git.andreyknvl@google.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Andrey Konovalov As Hardware Tag-Based KASAN is intended to be used in production, its performance impact is crucial. As page_alloc allocations tend to be big, tagging and checking all such allocations can introduce a significant slowdown. Add two new boot parameters that allow to alleviate that slowdown: - kasan.page_alloc.sample, which makes Hardware Tag-Based KASAN tag only every Nth page_alloc allocation with the order configured by the second added parameter (default: tag every such allocation). - kasan.page_alloc.sample.order, which makes sampling enabled by the first parameter only affect page_alloc allocations with the order equal or greater than the specified value (default: 3, see below). The exact performance improvement caused by using the new parameters depends on their values and the applied workload. The chosen default value for kasan.page_alloc.sample.order is 3, which matches both PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER. This is done for two reasons: 1. PAGE_ALLOC_COSTLY_ORDER is "the order at which allocations are deemed costly to service", which corresponds to the idea that only large and thus costly allocations are supposed to sampled. 2. One of the workloads targeted by this patch is a benchmark that sends a large amount of data over a local loopback connection. Most multi-page data allocations in the networking subsystem have the order of SKB_FRAG_PAGE_ORDER (or PAGE_ALLOC_COSTLY_ORDER). When running a local loopback test on a testing MTE-enabled device in sync mode, enabling Hardware Tag-Based KASAN introduces a ~50% slowdown. Applying this patch and setting kasan.page_alloc.sampling to a value higher than 1 allows to lower the slowdown. The performance improvement saturates around the sampling interval value of 10 with the default sampling page order of 3. This lowers the slowdown to ~20%. The slowdown in real scenarios involving the network will likely be better. Enabling page_alloc sampling has a downside: KASAN misses bad accesses to a page_alloc allocation that has not been tagged. This lowers the value of KASAN as a security mitigation. However, based on measuring the number of page_alloc allocations of different orders during boot in a test build, sampling with the default kasan.page_alloc.sample.order value affects only ~7% of allocations. The rest ~93% of allocations are still checked deterministically. Signed-off-by: Andrey Konovalov --- Changes v2-v3: - Drop __GFP_KASAN_SAMPLE flag. - Add kasan.page_alloc.sample.order. - Add fast-path for disabled sampling to kasan_sample_page_alloc. Changes v1->v2: - Only sample allocations when __GFP_KASAN_SAMPLE is provided to alloc_pages(). - Fix build when KASAN is disabled. - Add more information about the flag to documentation. - Use optimized preemption-safe approach for sampling suggested by Marco. --- Documentation/dev-tools/kasan.rst | 16 +++++++++ include/linux/kasan.h | 14 +++++--- mm/kasan/common.c | 9 +++-- mm/kasan/hw_tags.c | 60 +++++++++++++++++++++++++++++++ mm/kasan/kasan.h | 27 ++++++++++++++ mm/page_alloc.c | 43 ++++++++++++++-------- 6 files changed, 148 insertions(+), 21 deletions(-) diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 5c93ab915049..d983e4fcee7c 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -140,6 +140,22 @@ disabling KASAN altogether or controlling its features: - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc allocations (default: ``on``). +- ``kasan.page_alloc.sample=`` makes KASAN tag only every + Nth page_alloc allocation with the order equal or greater than + ``kasan.page_alloc.sample.order``, where N is the value of the ``sample`` + parameter (default: ``1``, or tag every such allocation). + This parameter is intended to mitigate the performance overhead introduced + by KASAN. + Note that enabling this parameter makes Hardware Tag-Based KASAN skip checks + of allocations chosen by sampling and thus miss bad accesses to these + allocations. Use the default value for accurate bug detection. + +- ``kasan.page_alloc.sample.order=`` specifies the minimum + order of allocations that are affected by sampling (default: ``3``). + Only applies when ``kasan.page_alloc.sample`` is set to a non-default value. + This parameter is intended to allow sampling only large page_alloc + allocations, which is the biggest source of the performace overhead. + Error reports ~~~~~~~~~~~~~ diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 96c9d56e5510..5ebbaf672009 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -120,12 +120,13 @@ static __always_inline void kasan_poison_pages(struct page *page, __kasan_poison_pages(page, order, init); } -void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init); -static __always_inline void kasan_unpoison_pages(struct page *page, +bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init); +static __always_inline bool kasan_unpoison_pages(struct page *page, unsigned int order, bool init) { if (kasan_enabled()) - __kasan_unpoison_pages(page, order, init); + return __kasan_unpoison_pages(page, order, init); + return false; } void __kasan_cache_create_kmalloc(struct kmem_cache *cache); @@ -249,8 +250,11 @@ static __always_inline bool kasan_check_byte(const void *addr) static inline void kasan_unpoison_range(const void *address, size_t size) {} static inline void kasan_poison_pages(struct page *page, unsigned int order, bool init) {} -static inline void kasan_unpoison_pages(struct page *page, unsigned int order, - bool init) {} +static inline bool kasan_unpoison_pages(struct page *page, unsigned int order, + bool init) +{ + return false; +} static inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) {} static inline void kasan_poison_slab(struct slab *slab) {} static inline void kasan_unpoison_object_data(struct kmem_cache *cache, diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 833bf2cfd2a3..1d0008e1c420 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -95,19 +95,24 @@ asmlinkage void kasan_unpoison_task_stack_below(const void *watermark) } #endif /* CONFIG_KASAN_STACK */ -void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init) +bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init) { u8 tag; unsigned long i; if (unlikely(PageHighMem(page))) - return; + return false; + + if (!kasan_sample_page_alloc(order)) + return false; tag = kasan_random_tag(); kasan_unpoison(set_tag(page_address(page), tag), PAGE_SIZE << order, init); for (i = 0; i < (1 << order); i++) page_kasan_tag_set(page + i, tag); + + return true; } void __kasan_poison_pages(struct page *page, unsigned int order, bool init) diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c index b22c4f461cb0..d1bcb0205327 100644 --- a/mm/kasan/hw_tags.c +++ b/mm/kasan/hw_tags.c @@ -59,6 +59,24 @@ EXPORT_SYMBOL_GPL(kasan_mode); /* Whether to enable vmalloc tagging. */ DEFINE_STATIC_KEY_TRUE(kasan_flag_vmalloc); +#define PAGE_ALLOC_SAMPLE_DEFAULT 1 +#define PAGE_ALLOC_SAMPLE_ORDER_DEFAULT 3 + +/* + * Sampling interval of page_alloc allocation (un)poisoning. + * Defaults to no sampling. + */ +unsigned long kasan_page_alloc_sample = PAGE_ALLOC_SAMPLE_DEFAULT; + +/* + * Minimum order of page_alloc allocations to be affected by sampling. + * The default value is chosen to match both + * PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER. + */ +unsigned int kasan_page_alloc_sample_order = PAGE_ALLOC_SAMPLE_ORDER_DEFAULT; + +DEFINE_PER_CPU(long, kasan_page_alloc_skip); + /* kasan=off/on */ static int __init early_kasan_flag(char *arg) { @@ -122,6 +140,48 @@ static inline const char *kasan_mode_info(void) return "sync"; } +/* kasan.page_alloc.sample= */ +static int __init early_kasan_flag_page_alloc_sample(char *arg) +{ + int rv; + + if (!arg) + return -EINVAL; + + rv = kstrtoul(arg, 0, &kasan_page_alloc_sample); + if (rv) + return rv; + + if (!kasan_page_alloc_sample || kasan_page_alloc_sample > LONG_MAX) { + kasan_page_alloc_sample = PAGE_ALLOC_SAMPLE_DEFAULT; + return -EINVAL; + } + + return 0; +} +early_param("kasan.page_alloc.sample", early_kasan_flag_page_alloc_sample); + +/* kasan.page_alloc.sample.order= */ +static int __init early_kasan_flag_page_alloc_sample_order(char *arg) +{ + int rv; + + if (!arg) + return -EINVAL; + + rv = kstrtouint(arg, 0, &kasan_page_alloc_sample_order); + if (rv) + return rv; + + if (kasan_page_alloc_sample_order > INT_MAX) { + kasan_page_alloc_sample_order = PAGE_ALLOC_SAMPLE_ORDER_DEFAULT; + return -EINVAL; + } + + return 0; +} +early_param("kasan.page_alloc.sample.order", early_kasan_flag_page_alloc_sample_order); + /* * kasan_init_hw_tags_cpu() is called for each CPU. * Not marked as __init as a CPU can be hot-plugged after boot. diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h index ea8cf1310b1e..32413f22aa82 100644 --- a/mm/kasan/kasan.h +++ b/mm/kasan/kasan.h @@ -42,6 +42,10 @@ enum kasan_mode { extern enum kasan_mode kasan_mode __ro_after_init; +extern unsigned long kasan_page_alloc_sample; +extern unsigned int kasan_page_alloc_sample_order; +DECLARE_PER_CPU(long, kasan_page_alloc_skip); + static inline bool kasan_vmalloc_enabled(void) { return static_branch_likely(&kasan_flag_vmalloc); @@ -57,6 +61,24 @@ static inline bool kasan_sync_fault_possible(void) return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM; } +static inline bool kasan_sample_page_alloc(unsigned int order) +{ + /* Fast-path for when sampling is disabled. */ + if (kasan_page_alloc_sample == 1) + return true; + + if (order < kasan_page_alloc_sample_order) + return true; + + if (this_cpu_dec_return(kasan_page_alloc_skip) < 0) { + this_cpu_write(kasan_page_alloc_skip, + kasan_page_alloc_sample - 1); + return true; + } + + return false; +} + #else /* CONFIG_KASAN_HW_TAGS */ static inline bool kasan_async_fault_possible(void) @@ -69,6 +91,11 @@ static inline bool kasan_sync_fault_possible(void) return true; } +static inline bool kasan_sample_page_alloc(unsigned int order) +{ + return true; +} + #endif /* CONFIG_KASAN_HW_TAGS */ #ifdef CONFIG_KASAN_GENERIC diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0745aedebb37..7d980dc0000e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1356,6 +1356,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page) * see the comment next to it. * 3. Skipping poisoning is requested via __GFP_SKIP_KASAN_POISON, * see the comment next to it. + * 4. The allocation is excluded from being checked due to sampling, + * see the call to kasan_unpoison_pages. * * Poisoning pages during deferred memory init will greatly lengthen the * process and cause problem in large memory systems as the deferred pages @@ -2468,7 +2470,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order, { bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && !should_skip_init(gfp_flags); - bool init_tags = init && (gfp_flags & __GFP_ZEROTAGS); + bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); + bool reset_tags = !zero_tags; int i; set_page_private(page, 0); @@ -2491,30 +2494,42 @@ inline void post_alloc_hook(struct page *page, unsigned int order, */ /* - * If memory tags should be zeroed (which happens only when memory - * should be initialized as well). + * If memory tags should be zeroed + * (which happens only when memory should be initialized as well). */ - if (init_tags) { + if (zero_tags) { /* Initialize both memory and tags. */ for (i = 0; i != 1 << order; ++i) tag_clear_highpage(page + i); - /* Note that memory is already initialized by the loop above. */ + /* Take note that memory was initialized by the loop above. */ init = false; } if (!should_skip_kasan_unpoison(gfp_flags)) { - /* Unpoison shadow memory or set memory tags. */ - kasan_unpoison_pages(page, order, init); - - /* Note that memory is already initialized by KASAN. */ - if (kasan_has_integrated_init()) - init = false; - } else { - /* Ensure page_address() dereferencing does not fault. */ + /* Try unpoisoning (or setting tags) and initializing memory. */ + if (kasan_unpoison_pages(page, order, init)) { + /* Take note that memory was initialized by KASAN. */ + if (kasan_has_integrated_init()) + init = false; + /* Take note that memory tags were set by KASAN. */ + reset_tags = false; + } else { + /* + * KASAN decided to exclude this allocation from being + * poisoned due to sampling. Skip poisoning as well. + */ + SetPageSkipKASanPoison(page); + } + } + /* + * If memory tags have not been set, reset the page tags to ensure + * page_address() dereferencing does not fault. + */ + if (reset_tags) { for (i = 0; i != 1 << order; ++i) page_kasan_tag_reset(page + i); } - /* If memory is still not initialized, do it now. */ + /* If memory is still not initialized, initialize it now. */ if (init) kernel_init_pages(page, 1 << order); /* Propagate __GFP_SKIP_KASAN_POISON to page flags. */ -- 2.25.1