From: Dan Williams
Date: Thu, 17 Jun 2021 00:04:14 -0700
Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
To: "ruansy.fnst@fujitsu.com"
Cc: Linux Kernel Mailing List, linux-xfs, Linux MM, linux-fsdevel, device-mapper development, "Darrick J. Wong", david, Christoph Hellwig, Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues, Linux NVDIMM
References: <20210604011844.1756145-1-ruansy.fnst@fujitsu.com> <20210604011844.1756145-4-ruansy.fnst@fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 16, 2021 at 11:51 PM ruansy.fnst@fujitsu.com wrote:
>
> > -----Original Message-----
> > From: Dan Williams
> > Subject: Re: [PATCH v4 03/10] fs: Introduce ->corrupted_range() for superblock
> >
> > [ drop old linux-nvdimm@lists.01.org, add nvdimm@lists.linux.dev ]
> >
> > On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan wrote:
> > >
> > > Memory failures that occur in fsdax mode will finally be handled in the
> > > filesystem.
> > > We introduce this interface to find out files or metadata
> > > affected by the corrupted range, and try to recover the corrupted data
> > > if possible.
> > >
> > > Signed-off-by: Shiyang Ruan
> > > ---
> > >  include/linux/fs.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index c3c88fdb9b2a..92af36c4225f 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -2176,6 +2176,8 @@ struct super_operations {
> > >                                   struct shrink_control *);
> > >         long (*free_cached_objects)(struct super_block *,
> > >                                     struct shrink_control *);
> > > +       int (*corrupted_range)(struct super_block *sb, struct block_device *bdev,
> > > +                              loff_t offset, size_t len, void *data);
> >
> > Why does the superblock need a new operation? Wouldn't whatever function is
> > specified here just be specified to the dax_dev as the
> > ->notify_failure() holder callback?
>
> Because we need to find out which file is affected by the given poisoned
> page so that the memory-failure code can do the collect_procs() and
> kill_procs() jobs. That requires the filesystem to use its rmap feature
> to look up the file from a given offset. So we need this to be
> implemented by the specific filesystem and called by the dax_device's
> holder.
>
> This is the call trace I described in the cover letter:
> memory_failure()
>  * fsdax case
>  pgmap->ops->memory_failure() => pmem_pgmap_memory_failure()
>   dax_device->holder_ops->corrupted_range() =>
>       - fs_dax_corrupted_range()
>       - md_dax_corrupted_range()
>    sb->s_ops->corrupted_range() => xfs_fs_corrupted_range()  <== **HERE**
>     xfs_rmap_query_range()
>      xfs_currupt_helper()
>       * corrupted on metadata
>           try to recover data, call xfs_force_shutdown()
>       * corrupted on file data
>           try to recover data, call mf_dax_kill_procs()
>  * normal case
>  mf_generic_kill_procs()
>
> As you can see, this newly added operation is important for the whole
> process.

I don't think you need either fs_dax_corrupted_range() or
sb->s_ops->corrupted_range(). In fact, fs_dax_corrupted_range()
looks broken because the filesystem may not even be mounted on the
device associated with the error. The holder_data and holder_ops should
be sufficient for communicating the stack of notifications:

pgmap->notify_memory_failure() => pmem_pgmap_notify_failure()
pmem_dax_dev->holder_ops->notify_failure(pmem_dax_dev) =>
    md_dax_notify_failure()
md_dax_dev->holder_ops->notify_failure() => xfs_notify_failure()

I.e. the entire chain just walks dax_dev holder ops.
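
To make the shape of that chain concrete, here is a minimal user-space sketch. It is only a model written for illustration: the `*_model` types and functions, the `visited[]` trace, and `demo_walk()` are hypothetical names, not kernel definitions, and the md layer passes the offset through unchanged where a real target would have to remap it. The one idea it tries to show is that each layer forwards the failure to the holder of its own dax device, so no superblock operation is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* User-space model only; none of these types are the kernel's definitions. */
struct dax_dev_model;

struct holder_ops_model {
	/* Called on the holder when the device it claimed reports bad memory. */
	int (*notify_failure)(struct dax_dev_model *dev,
			      unsigned long long offset, size_t len);
};

struct dax_dev_model {
	const char *name;
	const struct holder_ops_model *holder_ops; /* NULL when unclaimed */
	void *holder_data;                         /* holder's private context */
};

enum layer { LAYER_PMEM = 1, LAYER_MD, LAYER_XFS };

static enum layer visited[8]; /* order in which layers saw the failure */
static int nr_visited;

static void record(enum layer l)
{
	if (nr_visited < 8)
		visited[nr_visited++] = l;
}

/* Generic step: hand the failure to whoever holds this device. */
static int dax_notify_holder(struct dax_dev_model *dev,
			     unsigned long long offset, size_t len)
{
	assert(dev != NULL);
	if (!dev->holder_ops || !dev->holder_ops->notify_failure)
		return -1; /* unclaimed: caller falls back to generic handling */
	return dev->holder_ops->notify_failure(dev, offset, len);
}

/* Filesystem holder: end of the chain. holder_data names the mount. */
static int xfs_notify_failure_model(struct dax_dev_model *dev,
				    unsigned long long offset, size_t len)
{
	record(LAYER_XFS);
	printf("%s: failure at %llu+%zu reported via %s\n",
	       (const char *)dev->holder_data, offset, len, dev->name);
	return 0;
}

/* md holder of the pmem device: holder_data is the stacked md dax device. */
static int md_dax_notify_failure_model(struct dax_dev_model *dev,
				       unsigned long long offset, size_t len)
{
	struct dax_dev_model *md_dev = dev->holder_data;

	record(LAYER_MD);
	/* A real target would remap the offset; the model passes it through. */
	return dax_notify_holder(md_dev, offset, len);
}

/* Driver entry point, standing in for pmem_pgmap_notify_failure(). */
static int pmem_pgmap_notify_failure_model(struct dax_dev_model *pmem_dev,
					   unsigned long long offset, size_t len)
{
	record(LAYER_PMEM);
	return dax_notify_holder(pmem_dev, offset, len);
}

static const struct holder_ops_model xfs_holder_ops = {
	.notify_failure = xfs_notify_failure_model,
};
static const struct holder_ops_model md_holder_ops = {
	.notify_failure = md_dax_notify_failure_model,
};

/* Build pmem -> md -> xfs, inject one failure, return the handler's status. */
int demo_walk(void)
{
	static char mount[] = "xfs@/mnt";
	struct dax_dev_model md_dev = { "md_dax", &xfs_holder_ops, mount };
	struct dax_dev_model pmem_dev = { "pmem_dax", &md_holder_ops, &md_dev };

	nr_visited = 0;
	return pmem_pgmap_notify_failure_model(&pmem_dev, 4096, 4096);
}
```

Calling demo_walk() from a main() walks pmem, then md, then the filesystem, in that order. A device with no holder makes dax_notify_holder() return -1, which is where the model assumes the caller would fall back to the generic (non-fsdax) handling.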