Subject: Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping
To: Ruan Shiyang, Jan Kara
Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org,
 darrick.wong@oracle.com, dan.j.williams@intel.com, david@fromorbit.com, hch@lst.de,
 song@kernel.org, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com
References: <20201230165601.845024-1-ruansy.fnst@cn.fujitsu.com>
 <20201230165601.845024-5-ruansy.fnst@cn.fujitsu.com>
 <20210106154132.GC29271@quack2.suse.cz>
 <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com>
From: zhong jiang <zhongjiang-ali@linux.alibaba.com>
Message-ID:
<781f276b-afdd-091c-3dba-048e415431ab@linux.alibaba.com>
Date: Wed, 13 Jan 2021 18:04:46 +0800
In-Reply-To: <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/1/12 10:55 AM, Ruan Shiyang wrote:
>
>
> On 2021/1/6 11:41 PM, Jan Kara wrote:
>> On Thu 31-12-20 00:55:55, Shiyang Ruan wrote:
>>> The current memory_failure_dev_pagemap() can only handle single-mapped
>>> dax page for fsdax mode.  The dax page could be mapped by multiple files
>>> and offsets if we let reflink feature & fsdax mode work together.  So,
>>> we refactor current implementation to support handle memory failure on
>>> each file and offset.
>>>
>>> Signed-off-by: Shiyang Ruan
>>
>> Overall this looks OK to me, a few comments below.
>>
>>> ---
>>>   fs/dax.c            | 21 +++++++++++
>>>   include/linux/dax.h |  1 +
>>>   include/linux/mm.h  |  9 +++++
>>>   mm/memory-failure.c | 91 ++++++++++++++++++++++++++++++++++-----------
>>>   4 files changed, 100 insertions(+), 22 deletions(-)
>
> ...
>
>>>   @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
>>>       }
>>>
>>>       tk->addr = page_address_in_vma(p, vma);
>>> -    if (is_zone_device_page(p))
>>> -        tk->size_shift = dev_pagemap_mapping_shift(p, vma);
>>> -    else
>>> +    if (is_zone_device_page(p)) {
>>> +        if (is_device_fsdax_page(p))
>>> +            tk->addr = vma->vm_start +
>>> +                    ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>>
>> It seems strange to use 'pgoff' for dax pages and not for any other page.
>> Why? I'd rather pass correct pgoff from all callers of add_to_kill() and
>> avoid this special casing...
>
> Because one fsdax page can be shared by multiple pgoffs.  I have to
> pass each pgoff in each iteration to calculate the address in vma (for
> tk->addr).  Other kinds of pages don't need this. They can get their
> unique address by calling "page_address_in_vma()".
>

IMO, an fsdax page can be shared by multiple files rather than by multiple
pgoffs, if the filesystem supports reflink.  Because a page belongs to only
one mapping (page->mapping is exclusive), it has only one pgoff or index
pointing at the node.  Or am I missing something about the feature?  Thanks,

> So, I added this fsdax case here.  This patchset only implemented the
> fsdax case, other cases also need to be added here if to be implemented.
>
>
> --
> Thanks,
> Ruan Shiyang.
>
>>
>>> +        tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr);
>>> +    } else
>>>           tk->size_shift = page_shift(compound_head(p));
>>>
>>>       /*
>>> @@ -495,7 +501,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>>>               if (!page_mapped_in_vma(page, vma))
>>>                   continue;
>>>               if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, NULL, 0, vma, to_kill);
>>>           }
>>>       }
>>>       read_unlock(&tasklist_lock);
>>> @@ -505,24 +511,19 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>>>
>>>   /*
>>>    * Collect processes when the error hit a file mapped page.
>>>    */
>>> -static void collect_procs_file(struct page *page, struct list_head *to_kill,
>>> -                int force_early)
>>> +static void collect_procs_file(struct page *page, struct address_space *mapping,
>>> +        pgoff_t pgoff, struct list_head *to_kill, int force_early)
>>>   {
>>>       struct vm_area_struct *vma;
>>>       struct task_struct *tsk;
>>> -    struct address_space *mapping = page->mapping;
>>> -    pgoff_t pgoff;
>>>
>>>       i_mmap_lock_read(mapping);
>>>       read_lock(&tasklist_lock);
>>> -    pgoff = page_to_pgoff(page);
>>>       for_each_process(tsk) {
>>>           struct task_struct *t = task_early_kill(tsk, force_early);
>>> -
>>>           if (!t)
>>>               continue;
>>> -        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
>>> -                      pgoff) {
>>> +        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>>>               /*
>>>                * Send early kill signal to tasks where a vma covers
>>>                * the page but the corrupted page is not necessarily
>>> @@ -531,7 +532,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>>>                * to be informed of all such data corruptions.
>>>                */
>>>               if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, mapping, pgoff, vma, to_kill);
>>>           }
>>>       }
>>>       read_unlock(&tasklist_lock);
>>> @@ -550,7 +551,8 @@ static void collect_procs(struct page *page, struct list_head *tokill,
>>>       if (PageAnon(page))
>>>           collect_procs_anon(page, tokill, force_early);
>>>       else
>>> -        collect_procs_file(page, tokill, force_early);
>>> +        collect_procs_file(page, page->mapping, page_to_pgoff(page),
>>
>> Why not use page_mapping() helper here? It would be safer for THPs if they
>> ever get here...
>>
>>                                Honza
>>
>