Date: Mon, 3 Apr 2023 19:10:40 +0800
Subject: Re: [PATCH] mm: kfence: Improve the performance of __kfence_alloc() and __kfence_free()
From: Peng Zhang
To: Marco Elver
Cc: glider@google.com, dvyukov@google.com,
 akpm@linux-foundation.org, kasan-dev@googlegroups.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20230403062757.74057-1-zhangpeng.00@bytedance.com>

On 2023/4/3 17:21, Marco Elver wrote:
> On Mon, 3 Apr 2023 at 08:28, Peng Zhang wrote:
>> In __kfence_alloc() and __kfence_free(), we will set and check canary.
>> Assuming that the size of the object is close to 0, nearly 4k memory
>> accesses are required because setting and checking canary is executed
>> byte by byte.
>>
>> canary is now defined like this:
>> KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
>>
>> Observe that canary is only related to the lower three bits of the
>> address, so every 8 bytes of canary are the same. We can access 8-byte
>> canary each time instead of byte-by-byte, thereby optimizing nearly 4k
>> memory accesses to 4k/8 times.
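As a sanity check of that observation: the packed u64 constant is just the
eight byte-wise patterns of an 8-byte-aligned address laid out in memory.
The small standalone userspace program below (not part of the patch; it
assumes a little-endian host, as the packed constant does) verifies the
equivalence:

/*
 * canary_pattern_check.c: standalone sketch, not part of the patch.
 * Verifies that the proposed KFENCE_CANARY_PATTERN_U64 constant equals
 * eight consecutive KFENCE_CANARY_PATTERN_U8 values packed into one word.
 * Assumes a little-endian host, as the packed constant does.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CANARY_PATTERN_U8(addr) \
	((uint8_t)0xaa ^ (uint8_t)((uintptr_t)(addr) & 0x7))
#define CANARY_PATTERN_U64 \
	((uint64_t)0xaaaaaaaaaaaaaaaa ^ (uint64_t)0x0706050403020100)

int main(void)
{
	uint64_t packed = CANARY_PATTERN_U64;
	uint8_t bytes[8];
	int i;

	/* The byte at offset i of an 8-byte-aligned address has pattern 0xaa ^ i. */
	for (i = 0; i < 8; i++)
		bytes[i] = CANARY_PATTERN_U8(i);

	/* On little-endian, the packed word lays out bytes 0..7 in order. */
	assert(!memcmp(bytes, &packed, sizeof(packed)));
	printf("u64 pattern == 8 byte-wise patterns\n");
	return 0;
}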
>>
>> Use the bcc tool funclatency to measure the latency of __kfence_alloc()
>> and __kfence_free(), the numbers (deleted the distribution of latency)
>> is posted below. Though different object sizes will have an impact on the
>> measurement, we ignore it for now and assume the average object size is
>> roughly equal.
>>
>> Before playing patch:
>> __kfence_alloc:
>> avg = 5055 nsecs, total: 5515252 nsecs, count: 1091
>> __kfence_free:
>> avg = 5319 nsecs, total: 9735130 nsecs, count: 1830
>>
>> After playing patch:
>> __kfence_alloc:
>> avg = 3597 nsecs, total: 6428491 nsecs, count: 1787
>> __kfence_free:
>> avg = 3046 nsecs, total: 3415390 nsecs, count: 1121
>
> Seems like a nice improvement!
>
>> The numbers indicate that there is ~30% - ~40% performance improvement.
>>
>> Signed-off-by: Peng Zhang
>> ---
>>  mm/kfence/core.c   | 71 +++++++++++++++++++++++++++++++++-------------
>>  mm/kfence/kfence.h | 10 ++++++-
>>  mm/kfence/report.c |  2 +-
>>  3 files changed, 62 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
>> index 79c94ee55f97..0b1b1298c738 100644
>> --- a/mm/kfence/core.c
>> +++ b/mm/kfence/core.c
>> @@ -297,20 +297,13 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex
>>          WRITE_ONCE(meta->state, next);
>>  }
>>
>> -/* Write canary byte to @addr. */
>> -static inline bool set_canary_byte(u8 *addr)
>> -{
>> -        *addr = KFENCE_CANARY_PATTERN(addr);
>> -        return true;
>> -}
>> -
>>  /* Check canary byte at @addr. */
>>  static inline bool check_canary_byte(u8 *addr)
>>  {
>>          struct kfence_metadata *meta;
>>          unsigned long flags;
>>
>> -        if (likely(*addr == KFENCE_CANARY_PATTERN(addr)))
>> +        if (likely(*addr == KFENCE_CANARY_PATTERN_U8(addr)))
>>                  return true;
>>
>>          atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
>> @@ -323,11 +316,27 @@ static inline bool check_canary_byte(u8 *addr)
>>          return false;
>>  }
>>
>> -/* __always_inline this to ensure we won't do an indirect call to fn. */
>> -static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *))
>> +static inline void set_canary(const struct kfence_metadata *meta)
>>  {
>>          const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
>> -        unsigned long addr;
>> +        unsigned long addr = pageaddr;
>> +
>> +        /*
>> +         * The canary may be written to part of the object memory, but it does
>> +         * not affect it. The user should initialize the object before using it.
>> +         */
>> +        for (; addr < meta->addr; addr += sizeof(u64))
>> +                *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64;
>> +
>> +        addr = ALIGN_DOWN(meta->addr + meta->size, sizeof(u64));
>> +        for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64))
>> +                *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64;
>> +}
>> +
>> +static inline void check_canary(const struct kfence_metadata *meta)
>> +{
>> +        const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
>> +        unsigned long addr = pageaddr;
>>
>>          /*
>>           * We'll iterate over each canary byte per-side until fn() returns
>
> This comment is now out-of-date ("fn" no longer exists).
>
>> @@ -339,14 +348,38 @@ static __always_inline void for_each_canary(const struct kfence_metadata *meta,
>>           */
>>
>>          /* Apply to left of object. */
>> -        for (addr = pageaddr; addr < meta->addr; addr++) {
>> -                if (!fn((u8 *)addr))
>> +        for (; meta->addr - addr >= sizeof(u64); addr += sizeof(u64)) {
>> +                if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64))
>>                          break;
>>          }
>>
>> -        /* Apply to right of object. */
>> -        for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) {
>> -                if (!fn((u8 *)addr))
>> +        /*
>> +         * If the canary is damaged in a certain 64 bytes, or the canay memory
>
> "damaged" -> "corrupted"
> "canay" -> "canary"
>
>> +         * cannot be completely covered by multiple consecutive 64 bytes, it
>> +         * needs to be checked one by one.
>> +         */
>> +        for (; addr < meta->addr; addr++) {
>> +                if (unlikely(!check_canary_byte((u8 *)addr)))
>> +                        break;
>> +        }
>> +
>> +        /*
>> +         * Apply to right of object.
>> +         * For easier implementation, check from high address to low address.
>> +         */
>> +        addr = pageaddr + PAGE_SIZE - sizeof(u64);
>> +        for (; addr >= meta->addr + meta->size ; addr -= sizeof(u64)) {
>> +                if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64))
>> +                        break;
>> +        }
>> +
>> +        /*
>> +         * Same as above, checking byte by byte, but here is the reverse of
>> +         * the above.
>> +         */
>> +        addr = addr + sizeof(u64) - 1;
>> +        for (; addr >= meta->addr + meta->size; addr--) {
>
> The re-checking should forward-check i.e. not in reverse, otherwise
> the report might not include some corrupted bytes that had in the
> previous version been included. I think you need to check from low to
> high address to start with above.

Yes, it's better to forward-check so that we don't lose the corrupted
bytes used in the report. I will include all your suggestions in the
next version of the patch. Thanks.

>
>> +                if (unlikely(!check_canary_byte((u8 *)addr)))
>>                          break;
>>          }
>>  }
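A sketch of the forward-checking variant (not the posted patch, and whether
v2 takes this exact shape is an assumption; helper names simply follow the
patch above): scan low to high, and on a u64 mismatch fall back to byte-wise
checking from that word onward, so the lowest corrupted byte is still the
first one reported.

static inline void check_canary_right(const struct kfence_metadata *meta)
{
	const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
	unsigned long addr;

	/* Unaligned head: bytes up to the next u64 boundary, one by one. */
	for (addr = meta->addr + meta->size; addr % sizeof(u64); addr++) {
		if (unlikely(!check_canary_byte((u8 *)addr)))
			return;
	}

	/* Aligned body: one u64 load covers 8 canary bytes. */
	for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) {
		if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) {
			/* Corruption in this word: pinpoint it scanning forward. */
			for (; addr - pageaddr < PAGE_SIZE; addr++) {
				if (unlikely(!check_canary_byte((u8 *)addr)))
					return;
			}
		}
	}
}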
>> @@ -434,7 +467,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
>>  #endif
>>
>>          /* Memory initialization. */
>> -        for_each_canary(meta, set_canary_byte);
>> +        set_canary(meta);
>>
>>          /*
>>           * We check slab_want_init_on_alloc() ourselves, rather than letting
>> @@ -495,7 +528,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z
>>          alloc_covered_add(meta->alloc_stack_hash, -1);
>>
>>          /* Check canary bytes for memory corruption. */
>> -        for_each_canary(meta, check_canary_byte);
>> +        check_canary(meta);
>>
>>          /*
>>           * Clear memory if init-on-free is set. While we protect the page, the
>> @@ -751,7 +784,7 @@ static void kfence_check_all_canary(void)
>>                  struct kfence_metadata *meta = &kfence_metadata[i];
>>
>>                  if (meta->state == KFENCE_OBJECT_ALLOCATED)
>> -                        for_each_canary(meta, check_canary_byte);
>> +                        check_canary(meta);
>>          }
>>  }
>>
>> diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
>> index 600f2e2431d6..2aafc46a4aaf 100644
>> --- a/mm/kfence/kfence.h
>> +++ b/mm/kfence/kfence.h
>> @@ -21,7 +21,15 @@
>>   * lower 3 bits of the address, to detect memory corruptions with higher
>>   * probability, where similar constants are used.
>>   */
>> -#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
>> +#define KFENCE_CANARY_PATTERN_U8(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
>> +
>> +/*
>> + * Define a continuous 8-byte canary starting from a multiple of 8. The canary
>> + * of each byte is only related to the lowest three bits of its address, so the
>> + * canary of every 8 bytes is the same. 64-bit memory can be filled and checked
>> + * at a time instead of byte by byte to improve performance.
>> + */
>> +#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(0x0706050403020100))
>>
>>  /* Maximum stack depth for reports. */
>>  #define KFENCE_STACK_DEPTH 64
>>
>> diff --git a/mm/kfence/report.c b/mm/kfence/report.c
>> index 60205f1257ef..197430a5be4a 100644
>> --- a/mm/kfence/report.c
>> +++ b/mm/kfence/report.c
>> @@ -168,7 +168,7 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show,
>>
>>          pr_cont("[");
>>          for (cur = (const u8 *)address; cur < end; cur++) {
>> -                if (*cur == KFENCE_CANARY_PATTERN(cur))
>> +                if (*cur == KFENCE_CANARY_PATTERN_U8(cur))
>>                          pr_cont(" .");
>>                  else if (no_hash_pointers)
>>                          pr_cont(" 0x%02x", *cur);
>> --
>> 2.20.1
>>
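For completeness, the whole fill-and-verify scheme can be modeled in
userspace (a standalone demo, not kernel code and not part of the patch;
all names are local to the demo and a little-endian host is assumed). It
fills the canary regions on both sides of a simulated object inside one
4 KiB "page" with u64 stores, including the word that straddles the
object's end, then verifies every canary byte against the original
byte-wise pattern:

/* canary_fill_demo.c: hypothetical userspace model of the patch's scheme. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL
#define PATTERN_U8(addr) ((uint8_t)0xaa ^ (uint8_t)((uintptr_t)(addr) & 0x7))
#define PATTERN_U64 ((uint64_t)0xaaaaaaaaaaaaaaaa ^ (uint64_t)0x0706050403020100)

int main(void)
{
	/* A page-aligned buffer standing in for a KFENCE object page. */
	uint8_t *page = aligned_alloc(DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
	uintptr_t pageaddr = (uintptr_t)page;
	uintptr_t obj = pageaddr + 1024;  /* object offset inside the page */
	size_t size = 100;                /* odd size: right side unaligned */
	uintptr_t addr;

	/* Left of object: one u64 store per 8 canary bytes. */
	for (addr = pageaddr; addr < obj; addr += sizeof(uint64_t))
		*(uint64_t *)addr = PATTERN_U64;

	/* Right of object: round down to 8; the straddling word also covers
	 * the object's last bytes, which the patch tolerates (the user must
	 * initialize the object before use). */
	for (addr = (obj + size) & ~(uintptr_t)7;
	     addr < pageaddr + DEMO_PAGE_SIZE; addr += sizeof(uint64_t))
		*(uint64_t *)addr = PATTERN_U64;

	/* Byte-wise verification against the original per-byte pattern. */
	for (addr = pageaddr; addr < obj; addr++)
		assert(*(uint8_t *)addr == PATTERN_U8(addr));
	for (addr = obj + size; addr < pageaddr + DEMO_PAGE_SIZE; addr++)
		assert(*(uint8_t *)addr == PATTERN_U8(addr));

	printf("u64 fill matches byte-wise canary pattern\n");
	free(page);
	return 0;
}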