Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp38058885rwd; Wed, 12 Jul 2023 02:20:51 -0700 (PDT) X-Google-Smtp-Source: APBJJlEDyPEfhM1jxM840AUsYWwFPNCfec6LhcNu6+MHxj5NeSjOrMdSnudjv5zEFmcx+QV6OVWR X-Received: by 2002:a92:cf4e:0:b0:346:53cf:418d with SMTP id c14-20020a92cf4e000000b0034653cf418dmr11638706ilr.15.1689153651171; Wed, 12 Jul 2023 02:20:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689153651; cv=none; d=google.com; s=arc-20160816; b=0VJt//mI0NFZjZY45LbkKZtBg03LCAJ5qhv12E5MlF7TNP/j/29Oft+pAdB8vT7glJ P5WwIq1FPk0HQ8BgbcnD8F1y0tsxGDdGThX8+Bp3qLlXogtYG7txhX7XSV1m02MtvaN+ zXdvKRXXcpsdBNONSqxbKqkMJR8Qe1JB2kwl2hhQtwniLtDgu96I8J9D46qtzIoeQdxd 1Wq04ETkaXPyOMgTHuYKqEMWkjuQs9lsAVxIiVjosruYErl/9BDDaVek291t2w4CAV1P tU0GzINuDKWt50xitsJA7D2pzQ5nHx36+/wIVQkfXDopmOQZA8jOiGXnm8xQt9qmQypd yf7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id :dkim-signature; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; fh=K2x+m+hFjceLF2wk8ADStJsgy5QAj4/zR9EywkGKr8s=; b=jrHEvqEeql1fhA+QMqpsB+DZJc+7h36IslHmC3Hhr/fGUe5fvOwHXP2djKXG5EGtRp FNR0fd5xffGoIrNw8e/VaccjU1mmCmu8qFYDQ+TNWZMBmhXuXuo3nM/v6KIRKM/5L5Su Qa6ZmpSques/gVIMZVl2HW629hNUXy8QsnsZFOPN+UY5S+567VnnR9JLxkXA4ksTl1Jd msS37ax9O8mgOLp2J/5xIvpRqof/Ipe2Yclqa8i61pZG3PX57RnUYj2LFfVLWfyqvpiq BBuADV7cnJE1GFI5c54KM18snWVhorKGUqbjkp5RBxr8BNnym6Vh+sJj5W7OUz3zdgcs 8iLQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=VqzN9T6Q; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v185-20020a6389c2000000b0053ef0ac79c8si2866393pgd.263.2023.07.12.02.20.39; Wed, 12 Jul 2023 02:20:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=VqzN9T6Q; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231690AbjGLIjF (ORCPT + 99 others); Wed, 12 Jul 2023 04:39:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231210AbjGLIi3 (ORCPT ); Wed, 12 Jul 2023 04:38:29 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A74E2685 for ; Wed, 12 Jul 2023 01:37:16 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-66869feb7d1so3683876b3a.3 for ; Wed, 12 Jul 2023 01:37:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1689151036; x=1691743036; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; b=VqzN9T6Qk0SmNiMem+kRtijR440z3E/ALzGAi5DDK6EoHaDzEG/nn3Ed2Qv6pH6Iyw VHphXyz81s1pWvpd9DYqoqyVUxrs9jGYhkmJrFtpjhHBEwogtxcBINhKqQHqVblVqUO8 /KeWzOyK6V4UDu0itEHFvgsvtfZKNRJXn+mbCx2r8H8oYSb/p0K+H5e7vx4a0/h3Zpm1 twR1Zo4NCjJTuFY8b1HAl+Q6DJYwJjOe4MC1k0S6I7CrK3mu0QXpEHzbyOFcxsuTgfNz JplpzbKV72uyfouPy9r3K5hfutrfi3qrrOPCi5u3iYE8E9zF8IVLRtOX9LffJXqcWJSw D5qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689151036; x=1691743036; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; b=EXYHF+yRzbWHOoRQWnMVG6pG2VohZ0GbPnp4BN4BUz+/HvrI1wic9M8xMBJsndva7r q5HM6mLbO80EMUMiWx3hHqFl3riOP7PAZ/Qrgap38AxLxnMnpNcZ/X9SXD8V1fbCWbaH E/I48vOcoRnTRq5M6omW6sLpCsWXi0nkMZsU9KapdhZQgUMhgja1dbvLys6jY0Xa3NK3 MQVyUMMoawMtGlIe/hB34/+wlmq+YYBYcXoHXb4ZucyZzcFBpi57pBMhQOhgnWyO2C/r c0p66f+AMxU9NdnGm4fO6pEVBArwOhIH84FI4XCHvmGW76Q2CRtrovATmdhrIOnIKkMr IUbA== X-Gm-Message-State: ABy/qLY4PDHPhT4bLloeykCMz3QfjAMJxytZOM8w9G4k1ptvJcdT99iJ zbEPZh776EXH2bc9cY+OkVccVw== X-Received: by 2002:a05:6a00:1409:b0:673:5d1e:6654 with SMTP id l9-20020a056a00140900b006735d1e6654mr16234608pfu.33.1689151036022; Wed, 12 Jul 2023 01:37:16 -0700 (PDT) Received: from [10.254.22.102] ([139.177.225.243]) by smtp.gmail.com with ESMTPSA id c19-20020aa78e13000000b00682b2fbd20fsm3078868pfr.31.2023.07.12.01.37.12 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 12 Jul 2023 01:37:15 -0700 (PDT) Message-ID: Date: Wed, 12 Jul 2023 16:37:10 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH] mm: kfence: allocate kfence_metadata at runtime To: Peng Zhang Cc: glider@google.com, dvyukov@google.com, akpm@linux-foundation.org, kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Marco Elver References: <20230710032714.26200-1-zhangpeng.00@bytedance.com> <2a16a76c-506c-f325-6792-4fb58e8da531@bytedance.com> From: Peng Zhang In-Reply-To: <2a16a76c-506c-f325-6792-4fb58e8da531@bytedance.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2023/7/12 16:28, Peng Zhang 写道: > > > 在 2023/7/10 18:19, Marco Elver 写道: >> On Mon, 10 Jul 2023 at 05:27, 'Peng Zhang' via kasan-dev >> wrote: >>> >>> kfence_metadata is currently a static array. For the purpose of >>> allocating scalable __kfence_pool, we first change it to runtime >>> allocation of metadata. Since the size of an object of kfence_metadata >>> is 1160 bytes, we can save at least 72 pages (with default 256 objects) >>> without enabling kfence. >>> >>> Below is the numbers obtained in qemu (with default 256 objects). >>> before: Memory: 8134692K/8388080K available (3668K bss) >>> after: Memory: 8136740K/8388080K available (1620K bss) >>> More than expected, it saves 2MB memory. >>> >>> Signed-off-by: Peng Zhang >> >> Seems like a reasonable optimization, but see comments below. >> >> Also with this patch applied on top of v6.5-rc1, KFENCE just doesn't >> init at all anymore (early init). Please fix. > I'm very sorry because I made a slight modification before sending the > patch but it has not been tested, which caused it to not work properly. > I fixed some of the issues you mentioned in v2[1]. > > [1] > https://lore.kernel.org/lkml/20230712081616.45177-1-zhangpeng.00@bytedance.com/ > >> >>> --- >>>   mm/kfence/core.c   | 102 ++++++++++++++++++++++++++++++++------------- >>>   mm/kfence/kfence.h |   5 ++- >>>   2 files changed, 78 insertions(+), 29 deletions(-) >>> >>> diff --git a/mm/kfence/core.c b/mm/kfence/core.c >>> index dad3c0eb70a0..b9fec1c46e3d 100644 >>> --- a/mm/kfence/core.c >>> +++ b/mm/kfence/core.c >>> @@ -116,7 +116,7 @@ EXPORT_SYMBOL(__kfence_pool); /* Export for test >>> modules. */ >>>    * backing pages (in __kfence_pool). >>>    */ >>>   static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0); >>> -struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS]; >>> +struct kfence_metadata *kfence_metadata; >>> >>>   /* Freelist with available objects. */ >>>   static struct list_head kfence_freelist = >>> LIST_HEAD_INIT(kfence_freelist); >>> @@ -643,13 +643,56 @@ static unsigned long kfence_init_pool(void) >>>          return addr; >>>   } >>> >>> +static int kfence_alloc_metadata(void) >>> +{ >>> +       unsigned long nr_pages = KFENCE_METADATA_SIZE / PAGE_SIZE; >>> + >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       struct page *pages; >>> + >>> +       pages = alloc_contig_pages(nr_pages, GFP_KERNEL, >>> first_online_node, >>> +                                  NULL); >>> +       if (pages) >>> +               kfence_metadata = page_to_virt(pages); >>> +#else >>> +       if (nr_pages > MAX_ORDER_NR_PAGES) { >>> +               pr_warn("KFENCE_NUM_OBJECTS too large for buddy >>> allocator\n"); >> >> Does this mean that KFENCE won't work at all if we can't allocate the >> metadata? I.e. it won't work either in early nor late init modes? >> >> I know we already have this limitation for _late init_ of the KFENCE >> pool. >> >> So I have one major question: when doing _early init_, what is the >> maximum size of the KFENCE pool (#objects) with this change? > It will be limited to 2^10/sizeof(struct kfence_metadata) by buddy Sorry, 2^10*PAGE_SIZE/sizeof(struct kfence_metadata) > system, so I used memblock to allocate kfence_metadata in v2. >> >>> +               return -EINVAL; >>> +       } >>> +       kfence_metadata = alloc_pages_exact(KFENCE_METADATA_SIZE, >>> +                                           GFP_KERNEL); >>> +#endif >>> + >>> +       if (!kfence_metadata) >>> +               return -ENOMEM; >>> + >>> +       memset(kfence_metadata, 0, KFENCE_METADATA_SIZE); >> >> memzero_explicit, or pass __GFP_ZERO to alloc_pages? > Unfortunately, __GFP_ZERO does not work successfully in > alloc_contig_pages(), so I used memzero_explicit() in v2. > Even though I don't know if memzero_explicit() is necessary > (it just uses the barrier). >> >>> +       return 0; >>> +} >>> + >>> +static void kfence_free_metadata(void) >>> +{ >>> +       if (WARN_ON(!kfence_metadata)) >>> +               return; >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       free_contig_range(page_to_pfn(virt_to_page((void >>> *)kfence_metadata)), >>> +                         KFENCE_METADATA_SIZE / PAGE_SIZE); >>> +#else >>> +       free_pages_exact((void *)kfence_metadata, KFENCE_METADATA_SIZE); >>> +#endif >>> +       kfence_metadata = NULL; >>> +} >>> + >>>   static bool __init kfence_init_pool_early(void) >>>   { >>> -       unsigned long addr; >>> +       unsigned long addr = (unsigned long)__kfence_pool; >>> >>>          if (!__kfence_pool) >>>                  return false; >>> >>> +       if (!kfence_alloc_metadata()) >>> +               goto free_pool; >>> + >>>          addr = kfence_init_pool(); >>> >>>          if (!addr) { >>> @@ -663,6 +706,7 @@ static bool __init kfence_init_pool_early(void) >>>                  return true; >>>          } >>> >>> +       kfence_free_metadata(); >>>          /* >>>           * Only release unprotected pages, and do not try to go back >>> and change >>>           * page attributes due to risk of failing to do so as well. >>> If changing >>> @@ -670,31 +714,12 @@ static bool __init kfence_init_pool_early(void) >>>           * fails for the first page, and therefore expect >>> addr==__kfence_pool in >>>           * most failure cases. >>>           */ >>> +free_pool: >>>          memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - >>> (unsigned long)__kfence_pool)); >>>          __kfence_pool = NULL; >>>          return false; >>>   } >>> >>> -static bool kfence_init_pool_late(void) >>> -{ >>> -       unsigned long addr, free_size; >>> - >>> -       addr = kfence_init_pool(); >>> - >>> -       if (!addr) >>> -               return true; >>> - >>> -       /* Same as above. */ >>> -       free_size = KFENCE_POOL_SIZE - (addr - (unsigned >>> long)__kfence_pool); >>> -#ifdef CONFIG_CONTIG_ALLOC >>> -       free_contig_range(page_to_pfn(virt_to_page((void *)addr)), >>> free_size / PAGE_SIZE); >>> -#else >>> -       free_pages_exact((void *)addr, free_size); >>> -#endif >>> -       __kfence_pool = NULL; >>> -       return false; >>> -} >>> - >>>   /* === DebugFS Interface >>> ==================================================== */ >>> >>>   static int stats_show(struct seq_file *seq, void *v) >>> @@ -896,6 +921,10 @@ void __init kfence_init(void) >>>   static int kfence_init_late(void) >>>   { >>>          const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE; >>> +       unsigned long addr = (unsigned long)__kfence_pool; >>> +       unsigned long free_size = KFENCE_POOL_SIZE; >>> +       int ret; >>> + >>>   #ifdef CONFIG_CONTIG_ALLOC >>>          struct page *pages; >>> >>> @@ -913,15 +942,29 @@ static int kfence_init_late(void) >>>                  return -ENOMEM; >>>   #endif >>> >>> -       if (!kfence_init_pool_late()) { >>> -               pr_err("%s failed\n", __func__); >>> -               return -EBUSY; >>> +       ret = kfence_alloc_metadata(); >>> +       if (!ret) >>> +               goto free_pool; >>> + >>> +       addr = kfence_init_pool(); >>> +       if (!addr) { >>> +               kfence_init_enable(); >>> +               kfence_debugfs_init(); >>> +               return 0; >>>          } >>> >>> -       kfence_init_enable(); >>> -       kfence_debugfs_init(); >>> +       pr_err("%s failed\n", __func__); >>> +       kfence_free_metadata(); >>> +       free_size = KFENCE_POOL_SIZE - (addr - (unsigned >>> long)__kfence_pool); >>> +       ret = -EBUSY; >>> >>> -       return 0; >>> +free_pool: >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       free_contig_range(page_to_pfn(virt_to_page((void *)addr)), >>> free_size / PAGE_SIZE); >>> +#else >>> +       free_pages_exact((void *)addr, free_size); >>> +#endif >> >> You moved this from kfence_init_pool_late - that did "__kfence_pool = >> NULL" which is missing now. > Thanks for spotting this, I added it in v2. >