From: Dan Williams
Date: Fri, 17 Sep 2021 13:21:25 -0700
Subject: Re: [PATCH 0/3] dax: clear poison on the fly along pwrite
To: "Darrick J. Wong"
Cc: Christoph Hellwig, Jane Chu, Vishal L Verma, Dave Jiang, "Weiny, Ira",
    Al Viro, Matthew Wilcox, Jan Kara, Linux NVDIMM,
    Linux Kernel Mailing List, linux-fsdevel
References: <20210914233132.3680546-1-jane.chu@oracle.com>
    <516ecedc-38b9-1ae3-a784-289a30e5f6df@oracle.com>
    <20210915161510.GA34830@magnolia>
    <20210917152744.GA10250@magnolia>
In-Reply-To: <20210917152744.GA10250@magnolia>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 17, 2021 at 8:27 AM Darrick J. Wong wrote:
>
> On Fri, Sep 17, 2021 at 01:53:33PM +0100, Christoph Hellwig wrote:
> > On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > > > That was my gut feeling.  If everyone feels 100% comfortable with
> > > > zeroing as the mechanism to clear poisoning I'll cave in.  The most
> > > > important bit is that we do that through a dedicated DAX path instead
> > > > of abusing the block layer even more.
> > >
> > > ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> > > Where reset == "zero + clear-poison"?
> >
> > I'd say that naming is more confusing than overloading zero.
>
> How about dax_zeroinit_range() ?

Works for me.

> To go with its fallocate flag (yeah I've been too busy sorting out -rc1
> regressions to repost this) FALLOC_FL_ZEROINIT_RANGE that will reset the
> hardware (whatever that means) and set the contents to the known value
> zero.
>
> Userspace usage model:
>
> void handle_media_error(int fd, loff_t pos, size_t len)
> {
>         /* yell about this for posterior's sake */
>
>         ret = fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len);
>
>         /* yay our disk drive / pmem / stone table engraver is online */

The fallocate mode can still be error-aware though, right? When the FS
has knowledge of the error locations, the fallocate mode could be
fallocate(fd, FALLOC_FL_OVERWRITE_ERRORS, pos, len), with the semantics
of attempting to zero out any known poison extents in the given file
range? At the risk of going overboard on new fallocate modes, there
could also be (or instead be) FALLOC_FL_PUNCH_ERRORS, to skip trying to
clear them and just ask the FS to throw the error extents away.

> }
>
> > > > I'm really worried about both partitions on DAX and DM passing through
> > > > DAX because they deeply bind DAX to the block layer, which is just a bad
> > > > idea.  I think we also need to sort that whole story out before removing
> > > > the EXPERIMENTAL tags.
> > >
> > > I do think it was a mistake to allow for DAX on partitions of a pmemX
> > > block-device.
> > >
> > > DAX-reflink support may be the opportunity to start deprecating that
> > > support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> > > without partitions (later add dax-device direct mounting),
> >
> > I think we need to fully or almost fully sort this out.
> >
> > Here are my bold suggestions:
> >
> >  1) drop no drop the EXPERIMENTAL on the current block layer overload
> >     at all
>
> I don't understand this.
>
> >  2) add direct mounting of the nvdimm namespaces ASAP.  Because all
> >     the filesystems currently also need the /dev/pmem0 device add a way
> >     to open the block device by the dax_device instead of our current
> >     way of doing the reverse
> >  3) deprecate DAX support through block layer mounts with a say 2 year
> >     deprecation period
> >  4) add DAX remapping devices as needed
>
> What devices are needed?  linear for lvm, and maybe error so we can
> actually test all this stuff?

The proposal would be zero lvm support. The nvdimm namespace
definition would need to grow support for concatenation + striping.
Soft error injection could be achieved by writing to the badblocks
interface.
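
For reference, a rough sketch of what the userspace side of "zero the
range to recover from a media error" might look like with interfaces
that exist today. FALLOC_FL_ZEROINIT_RANGE, FALLOC_FL_OVERWRITE_ERRORS
and FALLOC_FL_PUNCH_ERRORS are only proposals in this thread, so the
sketch substitutes the existing FALLOC_FL_ZERO_RANGE and falls back to
pwrite() of zeroes where the filesystem rejects that mode. Whether
either path actually clears poison on a given DAX setup is exactly what
is under discussion, so treat that as an open assumption. The helper
name zero_file_range() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread: overwrite
 * [pos, pos + len) with zeroes.  Returns 0 on success, -1 with errno
 * set on failure.
 */
static int zero_file_range(int fd, off_t pos, size_t len)
{
	static const char zeroes[4096];	/* zero-filled static buffer */

	assert(fd >= 0 && pos >= 0);

	/*
	 * FALLOC_FL_ZERO_RANGE is an existing mode used here as a
	 * stand-in for the proposed FALLOC_FL_ZEROINIT_RANGE.
	 */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, pos, (off_t)len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;

	/* Mode not supported here: fall back to writing zeroes. */
	while (len > 0) {
		size_t chunk = len < sizeof(zeroes) ? len : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, pos);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EIO;
			return -1;
		}
		pos += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller that has been told about a bad range (for example via SIGBUS
or a failed read) would invoke zero_file_range(fd, pos, len) and then
rewrite the lost data from its own redundancy, if it has any.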