Subject: Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink
From: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
To: Dave Chinner
Date: Wed, 2 Dec 2020 15:12:20 +0800
In-Reply-To: <20201129224723.GG2842436@dread.disaster.area>
References: <20201123004116.2453-1-ruansy.fnst@cn.fujitsu.com>
 <20201129224723.GG2842436@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Dave,

On 2020/11/30 6:47 AM, Dave Chinner wrote:
> On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote:
>>
>> The call trace is like this:
>>  memory_failure()
>>   pgmap->ops->memory_failure()   => pmem_pgmap_memory_failure()
>>    gendisk->fops->block_lost()   => pmem_block_lost() or
>>                                     md_blk_block_lost()
>>     sb->s_ops->storage_lost()    => xfs_fs_storage_lost()
>>      xfs_rmap_query_range()
>>       xfs_storage_lost_helper()
>>        mf_recover_controller->recover_fn => \
>>                          memory_failure_dev_pagemap_kill_procs()
>>
>> The collect_procs() and kill_procs() are moved into a callback which
>> is passed from memory_failure() to xfs_storage_lost_helper(), so we
>> can call it when an associated file is found, instead of creating a
>> file list and iterating over it.
>>
>> The fsdax & reflink support for XFS is not included in this patchset.
>
> This looks promising - the overall architecture is a lot more
> generic and less dependent on knowing about memory, dax or memory
> failures.  A few comments that I think would further improve
> understanding the patchset and the implementation:

Thanks for your kind comments.  They give me confidence.

>
> - the order of the patches is inverted.  It should start with a
>   single patch introducing the mf_recover_controller structure for
>   callbacks, then introduce pgmap->ops->memory_failure, then
>   ->block_lost, then the pmem and md implementations of
>   ->block_lost, then ->storage_lost and the XFS implementations of
>   ->storage_lost.

Yes, it will be easier to understand the patchset in this order.

But I am unsure about one thing: if, for example, I introduce
->memory_failure() first, its pmem implementation needs to call
->block_lost(), which is only introduced in the next patch.  So I am
not sure what the pmem implementation of ->memory_failure() should
contain at that point.  To avoid this situation, I committed the
patches in the inverted order: the lowest level first, then its
caller, and then the caller's caller.

I am trying to sort out the order.  How about this:

  Patch i.   Introduce ->memory_failure()
             - just introduce the interface, without implementation
  Patch i++. Introduce ->block_lost()
             - introduce the interface and implement ->memory_failure()
               in pmem, so that it can call ->block_lost()
  Patch i++. (similar to the above, skip...)

>
> - I think the names "block_lost" and "storage_lost" are misleading.
>   It's more like a "media failure" or a general "data corruption"
>   event at a specific physical location.  The data may not be "lost"
>   but only damaged, so we might be able to recover from it without
>   "losing" anything.  Hence I think they could be better named,
>   perhaps just "->corrupt_range"

'corrupt' sounds better.  (I'm not good at naming functions...)

>
> - need to pass a {offset,len} pair through the chain, not just a
>   single offset.  This will allow other types of devices to report
>   different ranges of failures, from a single sector to an entire
>   device.

Yes, it's better to add the length.  I had wrongly assumed that a
memory failure on pmem would only ever affect a single page at a time.
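To make sure I understand, here is a rough sketch of how the renamed
callbacks could carry an {offset, len} pair.  Everything below is
tentative: the member names, the argument lists and the opaque data
pointer only follow this discussion and are not existing interfaces.

/*
 * Rough sketch only: these members, their argument lists and the idea
 * of passing {offset, len} all come from this discussion; none of
 * them are existing interfaces yet.
 */
struct dev_pagemap_ops {
	/* ... existing members ... */
	/* called by memory_failure() when poison is found in device memory */
	int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
			      unsigned long nr_pfns, int flags);
};

struct block_device_operations {
	/* ... existing members ... */
	/* pmem/md report a corrupted byte range to the layer above */
	int (*corrupt_range)(struct gendisk *disk, loff_t offset,
			     size_t len, void *data);
};

struct super_operations {
	/* ... existing members ... */
	/* the filesystem maps the bad range back to files via rmap */
	int (*corrupt_range)(struct super_block *sb, struct block_device *bdev,
			     loff_t offset, size_t len, void *data);
};

In this sketch the opaque data pointer stands in for whatever the top
of the chain needs for recovery (the mf_recover_controller in the
current patchset), so each layer only forwards it downwards.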
>
> - I'm not sure that passing the mf_recover_controller structure
>   through the corruption event chain is the right thing to do here.
>   A block device could generate this storage failure callback if it
>   detects an unrecoverable error (e.g. during a MD media scrub or
>   rebuild/resilver failure) and in that case we don't have PFNs or
>   memory device failure functions to perform.
>
>   IOWs, I think the action that is taken needs to be independent of
>   the source that generated the error.  Even for a pmem device, we
>   can be using the page cache, so it may be possible to recover the
>   pmem error by writing the cached page (if it exists) back over the
>   pmem.
>
>   Hence I think that the recover function probably needs to be moved
>   to the address space ops, because what we do to recover from the
>   error is going to be dependent on the type of mapping the
>   filesystem is using.  If it's a DAX mapping, we call back into a
>   generic DAX function that does the vma walk and process kill
>   functions.  If it is a page cache mapping, then if the page is
>   cached we can try to re-write it to disk to fix the bad data,
>   otherwise we treat it like a writeback error and report it on the
>   next write/fsync/close operation done on that file.
>
>   This gets rid of the mf_recover_controller altogether and allows
>   the interface to be used by any sort of block device for any sort
>   of bottom-up reporting of media/device failures.

Moving the recover function to the address_space ops looks like a
better idea.

However, I think the error handling for page cache mappings is already
done well in memory-failure, and that code is also reused to handle
anonymous pages.  If we move the recover function into the
address_space ops, I think we would also need to refactor the existing
page cache handler, which may affect anonymous page handling.  This is
what confuses me...

I rewrote the call trace:

 memory_failure()
  * dax mapping case
  pgmap->ops->memory_failure()          => pmem_pgmap_memory_failure()
   gendisk->fops->block_corrupt_range() =>
                            - pmem_block_corrupt_range()
                            - md_blk_block_corrupt_range()
    sb->s_ops->storage_corrupt_range()  => xfs_fs_storage_corrupt_range()
     xfs_rmap_query_range()
      xfs_storage_lost_helper()
       mapping->a_ops->corrupt_range()  => xfs_dax_aops.xfs_dax_corrupt_range
        memory_failure_dev_pagemap_kill_procs()

  * page cache mapping case
  mapping->a_ops->corrupt_range()       => xfs_address_space_operations.xfs_xxx
   memory_failure_generic_kill_procs()

It's rough and not complete yet.  I hope for your comments.  (A rough
sketch of the per-mapping hook, following this call trace, is appended
at the end of this mail.)

--
Thanks,
Ruan Shiyang.

>
> Cheers,
>
> Dave.
>
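Here is the sketch mentioned above.  Everything in it is hypothetical
and only mirrors the rewritten call trace: the ->corrupt_range() member
of address_space_operations, its arguments, the two example
implementations (the DAX one standing in for xfs_dax_corrupt_range and
the page cache one standing in for the xfs_xxx placeholder), and the
signatures of the two memory_failure_*_kill_procs() helpers are all
assumed for illustration, not existing interfaces.

struct address_space_operations {
	/* ... existing members ... */
	/* recover/report a corrupted byte range of this mapping */
	int (*corrupt_range)(struct address_space *mapping, loff_t offset,
			     size_t len);
};

/* dax mapping case: walk the vmas and kill the processes using the range */
static int xfs_dax_corrupt_range(struct address_space *mapping,
				 loff_t offset, size_t len)
{
	return memory_failure_dev_pagemap_kill_procs(mapping, offset, len);
}

/* page cache mapping case: fall back to the generic kill path */
static int xfs_corrupt_range(struct address_space *mapping,
			     loff_t offset, size_t len)
{
	return memory_failure_generic_kill_procs(mapping, offset, len);
}

Splitting the two cases at the address_space level is what would let
the same bottom-up ->corrupt_range() chain be used whether the mapping
is DAX or page cache.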