Date: Fri, 25 Mar 2022 16:14:28 -0400
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, kernel-team@fb.com, Oscar Salvador, Miaohe Lin,
    Naoya Horiguchi, Mel Gorman, Johannes Weiner, Andrew Morton,
    stable@vger.kernel.org
Subject: [PATCH] mm,hwpoison: unmap poisoned page before invalidation
Message-ID: <20220325161428.5068d97e@imladris.surriel.com>

In some cases it appears the invalidation of a hwpoisoned page
fails because the page is still mapped in another process. This
can cause a program to be continuously restarted and die when
it page faults on the page that was not invalidated. Avoid that
problem by unmapping the hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault,
if the code tries to do something with the now-NULL vmf->page.
I did not hit this error when submitting the previous patch because
there are several opportunities for alloc_set_pte to bail out before
accessing vmf->page, and that apparently happened on those systems,
and most of the time on other systems, too.

However, across several million systems that error does occur a
handful of times a day. It can be avoided by returning VM_FAULT_NOPAGE
which will cause do_read_fault to return before calling finish_fault.

Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Cc: Oscar Salvador
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Andrew Morton
Cc: stable@vger.kernel.org
---
 mm/memory.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index be44d0b36b18..76e3af9639d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}
--
2.35.1
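
For readers following the return-code reasoning in the changelog, here is a
minimal userspace sketch of why returning VM_FAULT_NOPAGE keeps finish_fault
from seeing a NULL vmf->page. It is not kernel code. Every model_* name and
every MODEL_FAULT_* value is a hypothetical stand-in invented for this
illustration, and the model marks every NULL-page call as a problem, whereas
the changelog notes the real path usually bails out before touching
vmf->page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
enum {
	MODEL_FAULT_HWPOISON = 1 << 0,
	MODEL_FAULT_NOPAGE   = 1 << 1,
	MODEL_FAULT_RETRY    = 1 << 2,
};

struct model_fault {
	const char *page;	/* stands in for vmf->page */
	bool null_page_used;	/* set where the real code could oops */
};

/*
 * Models the hwpoison branch of __do_fault() after a successful
 * invalidation: the page reference is dropped, the pointer is cleared,
 * and on_invalidate is returned (0 before the patch, NOPAGE after).
 */
int model_do_fault(struct model_fault *f, int on_invalidate)
{
	f->page = NULL;
	return on_invalidate;
}

/* Models finish_fault(): records when it is handed a NULL page. */
int model_finish_fault(struct model_fault *f)
{
	if (f->page == NULL)
		f->null_page_used = true;
	return 0;
}

/* Models do_read_fault(): error-like codes return before finish_fault. */
int model_do_read_fault(struct model_fault *f, int on_invalidate)
{
	int ret = model_do_fault(f, on_invalidate);

	if (ret & (MODEL_FAULT_HWPOISON | MODEL_FAULT_NOPAGE | MODEL_FAULT_RETRY))
		return ret;
	return ret | model_finish_fault(f);
}

/* Checks both behaviours: old return value 0, new return value NOPAGE. */
void model_self_check(void)
{
	struct model_fault before = { "page", false };
	struct model_fault after = { "page", false };

	/* Old behaviour: 0 falls through to finish_fault with a NULL page. */
	model_do_read_fault(&before, 0);
	assert(before.null_page_used);

	/* Patched behaviour: NOPAGE returns early, finish_fault is skipped. */
	assert(model_do_read_fault(&after, MODEL_FAULT_NOPAGE) == MODEL_FAULT_NOPAGE);
	assert(!after.null_page_used);
}
```

Calling model_self_check() from a main() exercises both cases. The only
difference between them is the value returned after a successful
invalidation, which mirrors the one-line change from 0 to VM_FAULT_NOPAGE in
the hunk above.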