From: Tianchen Ding
To: Marco Elver , Alexander Potapenko
Cc: Dmitry Vyukov , Andrew Morton , kasan-dev , Linux Memory Management List , LKML
Subject: Re: [RFC PATCH 0/2] Alloc kfence_pool after system startup
Date: Fri, 4 Mar 2022 10:24:55 +0800
Message-ID: <7c14bb40-1e7b-9819-1634-e9e9051726fa@linux.alibaba.com>
References: <20220303031505.28495-1-dtcccc@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks for your replies.
I do see that setting a large sample_interval means almost disabling KFENCE.
In fact, my point is to provide a more “flexible” way, since some ops people may prefer something like an on/off switch to a 10000 ms interval. :-)

On 2022/3/3 17:30, Marco Elver wrote:
> On Thu, 3 Mar 2022 at 10:05, Alexander Potapenko wrote:
>
> I share Alex's concerns.
>
>> On Thu, Mar 3, 2022 at 4:15 AM Tianchen Ding wrote:
>>>
>>> KFENCE aims at production environments, but it does not allow enabling
>>> after system startup because kfence_pool only alloc pages from memblock.
>>> Consider the following production scene:
>>> At first, for performance considerations, production machines do not
>>> enable KFENCE.
>>
>> What are the performance considerations you have in mind? Are you running KFENCE with a very aggressive sampling rate?
>
> Indeed, what is wrong with simply starting up KFENCE with a sample
> interval of 10000? However, I very much doubt that you'll notice any
> performance issues above 500ms.
>
> Do let us know what performance issues you have seen. It may be
> related to an earlier version of KFENCE but has since been fixed (see
> log).
>
>>> However, after running for a while, the kernel is suspected to have
>>> memory errors. (e.g., a sibling machine crashed.)
>>
>> I have doubts regarding this setup.
>> It might be faster (although one can tune KFENCE to have nearly zero performance impact), but is harder to maintain.
>> It will also catch fewer errors than if you just had KFENCE on from the very beginning:
>>  - sibling machines may behave differently, and a certain bug may only occur once - in that case the secondary instances won't notice it, even with KFENCE;
>>  - KFENCE also catches non-lethal corruptions (e.g. OOB reads), which may stay under radar for a very long time.
>>
>>>
>>> So other production machines need to enable KFENCE, but it's hard for
>>> them to reboot.
>>>
>>> The 1st patch allows re-enabling KFENCE if the pool is already
>>> allocated from memblock.
>
> Patch 1/2 might be ok by itself, but I still don't see the point
> because you should just leave KFENCE enabled. There should be no
> reason to have to turn it off. If anything, you can increase the
> sample interval to something very large if needed.
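
For readers following the thread, the sample-interval knob being debated can be inspected from user space. The following is only a minimal sketch, assuming a kernel that exposes the KFENCE module parameter at /sys/module/kfence/parameters/sample_interval (value in milliseconds, with 0 meaning KFENCE is disabled). The helper names describe_interval and read_interval are hypothetical and are not part of the patch series or of any kernel tooling.

```python
from pathlib import Path

# Module-parameter file for KFENCE's sample interval (milliseconds).
# Assumption: the running kernel was built with CONFIG_KFENCE.
KFENCE_INTERVAL = Path("/sys/module/kfence/parameters/sample_interval")


def describe_interval(raw: str) -> str:
    """Interpret the text read from sample_interval (hypothetical helper)."""
    ms = int(raw.strip())
    if ms == 0:
        # An interval of 0 is the documented "off" state.
        return "KFENCE disabled"
    return f"KFENCE sample interval is {ms} ms"


def read_interval(path: Path = KFENCE_INTERVAL) -> str:
    """Describe the current setting, or report that the parameter is absent."""
    if not path.exists():
        return "sample_interval not found (KFENCE is probably not built in)"
    return describe_interval(path.read_text())


if __name__ == "__main__":
    print(read_interval())
```

Seen this way, the two positions in the thread correspond to the two branches of describe_interval: a very large value such as 10000 keeps KFENCE nominally on with rare sampling, while 0 is the plain "off" state that an on/off switch would toggle.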