From: Dan Williams
Date: Wed, 24 Mar 2021 09:37:01 -0700
Subject: Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
To: Christoph Hellwig
Cc: "ruansy.fnst@fujitsu.com", Linux Kernel Mailing List, linux-xfs, linux-nvdimm, Linux MM, linux-fsdevel, device-mapper development, "Darrick J. Wong", david, Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues, "qi.fuli@fujitsu.com", "y-goto@fujitsu.com"
References: <20210208105530.3072869-1-ruansy.fnst@cn.fujitsu.com> <20210208105530.3072869-2-ruansy.fnst@cn.fujitsu.com> <20210324074751.GA1630@lst.de>
In-Reply-To: <20210324074751.GA1630@lst.de>

On Wed, Mar 24, 2021 at 12:48 AM Christoph Hellwig wrote:
>
> On Tue, Mar 23, 2021 at 07:19:28PM -0700, Dan Williams wrote:
> > So I think the path forward is:
> >
> > - teach memory_failure() to allow for ranged failures
> >
> > - let interested drivers register for memory failure events via a
> >   blocking_notifier_head
>
> Eww.  As I said I think the right way is that the file system (or
> other consumer) can register a set of callbacks for opening the device.

How does that solve the problem of the driver being notified of all pfn
failure events? Today pmem only finds out about the ones that are
delivered by native x86 machine check error handling through a notifier
(yes, "firmware-first" error handling fails to do the right thing for
the pmem driver), or the ones that are eventually reported via address
range scrub, but only for the nvdimms that implement range scrubbing.
memory_failure() seems a reasonable catch-all point to route pfn
failure events, in an arch-independent way, to interested drivers.

I'm fine swapping out dax_device blocking_notifier chains for your
proposal, but that does not address all the proposed reworks in my
list, which are:

- delete "drivers/acpi/nfit/mce.c"

- teach memory_failure() to be able to communicate range failure

- enable memory_failure() to defer to a filesystem that can say
  "critical metadata is impacted, no point in trying to do file-by-file
  isolation, bring the whole fs down".

> I have a series I need to finish and send out to do that for block
> devices.  We probably also need the concept of a holder for the dax
> device to make it work nicely, as otherwise we're going to have a bit
> of a mess.

Ok, I'll take a look at adding a holder.

> > This obviously does not solve Dave's desire to get this type of error
> > reporting on block_devices, but I think there's nothing stopping a
> > parallel notifier chain from being created for block-devices, but
> > that's orthogonal to requirements and capabilities provided by
> > dax-devices.
>
> FYI, my series could easily accommodate that if we ever get a block
> driver that actually could report such errors.

Sure, whatever we land for a dax_device could easily be adopted for a
block device.
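
For illustration only, here is a rough userspace model of the flow in
the list above: a ranged failure is routed to every registered listener,
and a filesystem listener can answer "critical metadata is impacted,
bring the whole fs down" instead of isolating file by file. Every name
in it (mf_listener, mf_register, mf_report_range, demo_fs, the verdict
enum) is hypothetical and made up for this sketch. It is not the kernel
notifier API and not how memory_failure() is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

/* Ordered from weakest to strongest response. */
enum mf_verdict {
	MF_IGNORED,	/* range does not belong to this listener */
	MF_ISOLATED,	/* listener can handle it file by file */
	MF_SHUTDOWN,	/* critical metadata hit: bring the whole fs down */
};

/* A failure is reported as a pfn range, not a single pfn. */
struct mf_range {
	unsigned long start_pfn;
	unsigned long nr_pfns;
};

struct mf_listener {
	enum mf_verdict (*notify)(struct mf_listener *self,
				  const struct mf_range *range);
	struct mf_listener *next;
};

/* Singly linked list of interested listeners (assumed, unlocked). */
static struct mf_listener *mf_chain;

void mf_register(struct mf_listener *l)
{
	l->next = mf_chain;
	mf_chain = l;
}

/* Route one ranged failure to all listeners, return the strongest verdict. */
enum mf_verdict mf_report_range(unsigned long start_pfn, unsigned long nr_pfns)
{
	struct mf_range range = { start_pfn, nr_pfns };
	enum mf_verdict worst = MF_IGNORED;
	struct mf_listener *l;

	assert(nr_pfns > 0);
	for (l = mf_chain; l; l = l->next) {
		enum mf_verdict v = l->notify(l, &range);

		if (v > worst)
			worst = v;
	}
	return worst;
}

/* Toy filesystem that knows which pfns back its data and metadata. */
struct demo_fs {
	struct mf_listener listener;	/* first member, so the cast below is valid */
	struct mf_range data;
	struct mf_range metadata;
	int shut_down;
};

int ranges_overlap(const struct mf_range *a, const struct mf_range *b)
{
	return a->start_pfn < b->start_pfn + b->nr_pfns &&
	       b->start_pfn < a->start_pfn + a->nr_pfns;
}

enum mf_verdict demo_fs_notify(struct mf_listener *self,
			       const struct mf_range *range)
{
	struct demo_fs *fs = (struct demo_fs *)self;

	if (ranges_overlap(range, &fs->metadata)) {
		fs->shut_down = 1;	/* no point in file-by-file isolation */
		return MF_SHUTDOWN;
	}
	if (ranges_overlap(range, &fs->data))
		return MF_ISOLATED;
	return MF_IGNORED;
}
```

A caller would fill in a struct demo_fs with demo_fs_notify as its
callback, pass &fs.listener to mf_register(), and then call
mf_report_range() for each failed range. The model leaves out locking,
unregistration, and the holder concept mentioned above.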