Subject: Re: [PATCH v1 1/4] mm: handle poisoning of pfn without struct pages
References: <20230920140210.12663-1-ankita@nvidia.com>
 <20230920140210.12663-2-ankita@nvidia.com>
From: Miaohe Lin
Message-ID: <878264ae-f6f6-04d9-2d52-fb7ae29dca85@huawei.com>
Date: Sat, 23 Sep 2023 11:20:19 +0800
In-Reply-To: <20230920140210.12663-2-ankita@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/9/20 22:02, ankita@nvidia.com wrote:
> From: Ankit Agrawal
>
> The kernel MM currently does not handle ECC errors / poison on a memory
> region that is not backed by struct pages. If a memory region is mapped
> using remap_pfn_range(), but not added to the kernel, MM will not have
> associated struct pages. Add a new mechanism to handle memory failure
> on such memory.
>
> Make kernel MM expose a function to allow modules managing the device
> memory to register a failure function and the physical address space
> associated with the device memory. MM maintains this information as
> interval tree. The registered memory failure function is used by MM to
> notify the kernel module managing the PFN, so that the module may take
> any required action. The module for example may use the information
> to track the poisoned pages.
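As an aside for readers following the thread: the registration scheme the
commit message describes might look roughly like the userspace sketch below.
This is only an illustration, not the patch's actual API; the structure,
function names, and the fixed array standing in for the interval tree are
all invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape of the registration described above: a module
 * registers a pfn range plus a failure callback; on poison, MM looks
 * up the range covering the bad pfn and invokes the callback.  (The
 * real patch uses an interval tree; a fixed array keeps this short.) */
struct pfn_failure_region {
	unsigned long start_pfn;                /* first pfn of the region */
	unsigned long nr_pages;                 /* number of pfns covered  */
	void (*handler)(unsigned long pfn);     /* module's failure callback */
};

#define MAX_REGIONS 8
static struct pfn_failure_region regions[MAX_REGIONS];
static int nr_regions;

static int register_pfn_failure_region(unsigned long start_pfn,
				       unsigned long nr_pages,
				       void (*handler)(unsigned long))
{
	if (nr_regions >= MAX_REGIONS)
		return -1;
	regions[nr_regions++] = (struct pfn_failure_region){
		start_pfn, nr_pages, handler };
	return 0;
}

/* Called from the memory-failure path for a pfn with no struct page. */
static int notify_pfn_failure(unsigned long pfn)
{
	for (int i = 0; i < nr_regions; i++) {
		struct pfn_failure_region *r = &regions[i];

		if (pfn >= r->start_pfn && pfn < r->start_pfn + r->nr_pages) {
			r->handler(pfn);
			return 0;
		}
	}
	return -1;	/* no module claimed this pfn */
}

/* Example module callback: remember the last poisoned pfn. */
static unsigned long last_poisoned_pfn;
static void record_poisoned_pfn(unsigned long pfn)
{
	last_poisoned_pfn = pfn;
}
```

The point is only the lookup-and-notify shape: the module owning the pfn
range, not MM, decides what to do with the poisoned page.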
>
> In this implementation, kernel MM follows the following sequence similar
> (mostly) to the memory_failure() handler for struct page backed memory:
> 1. memory_failure() is triggered on reception of a poison error. An
>    absence of struct page is detected and consequently memory_failure_pfn()
>    is executed.
> 2. memory_failure_pfn() calls the newly introduced failure handler exposed
>    by the module managing the poisoned memory to notify it of the
>    problematic PFN.
> 3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
> 4. memory_failure_pfn() collects the processes mapped to the PFN.
> 5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes
>    mapping the faulty PFN using kill_procs().
> 6. An access to the faulty PFN by an operation in VM at a later point
>    is trapped and user_mem_abort() is called.
> 7. The vma ops fault function gets called due to the absence of Stage-2
>    mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
> 8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which causes
>    the poison with SIGBUS (BUS_MCEERR_AR) to be sent to the QEMU process
>    through kvm_send_hwpoison_signal().
>
> Signed-off-by: Ankit Agrawal

Thanks for your patch.

> /*
>  * Return values:
>  * 1: the page is dissolved (if needed) and taken off from buddy,

> @@ -422,15 +428,15 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
>  * Schedule a process for later kill.
>  * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
>  *
> - * Note: @fsdax_pgoff is used only when @p is a fsdax page and a
> - * filesystem with a memory failure handler has claimed the
> - * memory_failure event. In all other cases, page->index and
> - * page->mapping are sufficient for mapping the page back to its
> + * Notice: @pgoff is used either when @p is a fsdax page or a PFN is not
> + * backed by struct page and a filesystem with a memory failure handler
> + * has claimed the memory_failure event. In all other cases, page->index
> + * and page->mapping are sufficient for mapping the page back to its
>  * corresponding user virtual address.
>  */
> static void __add_to_kill(struct task_struct *tsk, struct page *p,
> 			  struct vm_area_struct *vma, struct list_head *to_kill,
> -			  unsigned long ksm_addr, pgoff_t fsdax_pgoff)
> +			  unsigned long ksm_addr, pgoff_t pgoff)
> {
> 	struct to_kill *tk;
>
> @@ -440,13 +446,18 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p,
> 		return;
> 	}
>
> -	tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> -	if (is_zone_device_page(p)) {
> -		if (fsdax_pgoff != FSDAX_INVALID_PGOFF)
> -			tk->addr = vma_pgoff_address(fsdax_pgoff, 1, vma);
> -		tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> -	} else
> -		tk->size_shift = page_shift(compound_head(p));
> +	if (vma->vm_flags | PFN_MAP) {

if (vma->vm_flags | PFN_MAP)? A bitwise OR with a non-zero constant can
never be zero, so this branch is always selected?

> +		tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +		tk->size_shift = PAGE_SHIFT;
> +	} else {
> +		tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> +		if (is_zone_device_page(p)) {
> +			if (pgoff != FSDAX_INVALID_PGOFF)
> +				tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +			tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> +		} else
> +			tk->size_shift = page_shift(compound_head(p));
> +	}
>

IIUC, the page passed to __add_to_kill() is NULL in this case. So when
tk->addr == -EFAULT, we will have a problem doing the page_to_pfn(p) in
the following pr_info:

	if (tk->addr == -EFAULT) {
		pr_info("Unable to find user space address %lx in %s\n",
			page_to_pfn(p), tsk->comm);

> 	/*
> 	 * Send SIGKILL if "tk->addr == -EFAULT". Also, as
> @@ -666,8 +677,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
> 	i_mmap_unlock_read(mapping);
> }
>
> /**
>  * memory_failure - Handle memory failure of a page.
>  * @pfn: Page Number of the corrupted page

> @@ -2183,6 +2271,11 @@ int memory_failure(unsigned long pfn, int flags)
> 	if (!(flags & MF_SW_SIMULATED))
> 		hw_memory_failure = true;
>
> +	if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {

Could it be better to add a helper here to detect the pfns without a
struct page?

> +		res = memory_failure_pfn(pfn, flags);
> +		goto unlock_mutex;
> +	}
> +
> 	p = pfn_to_online_page(pfn);
> 	if (!p) {
> 		res = arch_memory_failure(pfn, flags);
>

Thanks.
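P.S. For anyone reproducing the vm_flags concern above in isolation: the
difference between `|` and `&` in a flag test is easy to demonstrate in plain
userspace C. The helper names below are invented for this sketch; only the
OR-versus-AND behavior is the point.

```c
#include <assert.h>

#define VM_PFNMAP 0x00000400UL	/* illustrative flag bit for the sketch */

/* The pattern used in the patch: OR-ing a non-zero constant into
 * vm_flags can never yield zero, so this "is it a PFN map?" test
 * is true for every vma. */
static int pfnmap_check_or(unsigned long vm_flags)
{
	return (vm_flags | VM_PFNMAP) != 0;
}

/* Flag membership is tested with bitwise AND instead: non-zero
 * only when the bit is actually set in vm_flags. */
static int pfnmap_check_and(unsigned long vm_flags)
{
	return (vm_flags & VM_PFNMAP) != 0;
}
```

With the OR form, a vma with no flags set at all still passes the check,
which is why the new branch would always be selected.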