From: Yang Shi
Date: Thu, 20 Oct 2022 11:42:36 -0700
Subject: Re: [PATCH] hugetlbfs: don't delete error page from pagecache
To: Mike Kravetz
Cc: James Houghton, Muchun Song, Naoya Horiguchi, Miaohe Lin, Andrew Morton, Axel Rasmussen, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221018200125.848471-1-jthoughton@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 19, 2022 at 11:55 AM Mike Kravetz wrote:
>
> On 10/19/22 11:31, Yang Shi wrote:
> > On Tue, Oct 18, 2022 at 1:01 PM James Houghton wrote:
> > >
> > > This change is very similar to the change that was made for shmem [1],
> > > and it solves the same problem but for HugeTLBFS instead.
> > >
> > > Currently, when poison is found in a HugeTLB page, the page is removed
> > > from the page cache. That means that attempting to map or read that
> > > hugepage in the future will result in a new hugepage being allocated
> > > instead of notifying the user that the page was poisoned. As [1] states,
> > > this is effectively memory corruption.
> > >
> > > The fix is to leave the page in the page cache. If the user attempts to
> > > use a poisoned HugeTLB page with a syscall, the syscall will fail with
> > > EIO, the same error code that shmem uses. For attempts to map the page,
> > > the thread will get a BUS_MCEERR_AR SIGBUS.
> > >
> > > [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > >
> > > Signed-off-by: James Houghton
> >
> > Thanks for the patch. Yes, we should do the same thing for hugetlbfs.
> > When I was working on shmem I did look into hugetlbfs too. But the
> > problem is we actually make the whole hugetlb page unavailable even
> > though just one 4K sub page is hwpoisoned. It may be fine for a 2M
> > hugetlb page, but it may waste a lot of memory for a 1G hugetlb
> > page, particularly in the page fault path.
>
> One thing that complicated this a bit is the vmemmap optimizations for
> hugetlb. However, I believe Naoya may have addressed this recently.
>
> > So I discussed this with Mike offline last year, and I was told Google
> > was working on PTE mapped hugetlb page. That should be able to solve
> > the problem. And we'd like to have the high-granularity hugetlb
> > mapping support as the predecessor.
>
> Yes, I went back in my notes and noticed it had been one year. No offense
> intended to James and his great work on HGM. However, in hindsight we should
> have fixed this in some way without waiting for an HGM-based solution.
>
> > There were some other details, but I can't remember all of them; I
> > have to refresh my memory by rereading the email discussions...
>
> I think the complicating factor was vmemmap optimization. As mentioned
> above, this may have already been addressed by Naoya in patches to
> indicate which sub-page(s) had the actual error.
>
> As Yang Shi notes, this patch makes the entire hugetlb page inaccessible.
> With some work, we could allow reads to everything but the sub-page with
> error. However, this should be much easier with HGM. And, we could
> potentially even do page faults everywhere but the sub-page with error.
>
> I still think it may be better to wait for HGM instead of trying to do
> read access to all but sub-page with error now. But, entirely open to
> other opinions.

I have no strong preference about which goes first.

>
> I plan to do a review of this patch a little later.
> --
> Mike Kravetz
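
To put rough numbers on the granularity concern discussed in this thread (one
hwpoisoned 4K sub page making a whole hugetlb page unavailable), here is a
back-of-envelope sketch. It is plain arithmetic, not kernel code: the sizes are
the ones named above (4K, 2M, 1G), and the helper name healthy_bytes_lost is
made up for illustration.

```python
# Back-of-envelope sketch, not kernel code: how much error-free memory
# becomes unavailable when a whole hugetlb page is made inaccessible
# because some of its base-page-sized sub pages are hwpoisoned.
# Sizes come from the thread (4K sub page, 2M and 1G hugetlb pages);
# the helper name and its parameters are hypothetical.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB


def healthy_bytes_lost(huge_page_size: int, poisoned_subpages: int = 1,
                       base_page_size: int = BASE_PAGE) -> int:
    """Return the bytes of error-free memory lost along with the huge page."""
    if huge_page_size % base_page_size != 0:
        raise ValueError("huge page size must be a multiple of the base page size")
    total_subpages = huge_page_size // base_page_size
    if not 0 <= poisoned_subpages <= total_subpages:
        raise ValueError("poisoned_subpages out of range")
    # Every sub page that is not poisoned is healthy but still unusable.
    return (total_subpages - poisoned_subpages) * base_page_size


if __name__ == "__main__":
    for label, size in (("2M", 2 * MIB), ("1G", GIB)):
        lost = healthy_bytes_lost(size)
        print(f"{label}: {size // BASE_PAGE} sub pages, "
              f"{lost // KIB} KiB of healthy memory unavailable")
```

With a single poisoned sub page, a 2M page gives up 511 healthy 4K sub pages,
while a 1G page gives up 262,143 of them, which is why the 1G case is the one
that motivates sub-page-granular handling.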