Date: Fri, 17 Sep 2021 17:07:00 -0700
From: "Darrick J. Wong"
To: Dan Williams
Cc: Christoph Hellwig, Jane Chu, Vishal L Verma, Dave Jiang, "Weiny, Ira", Al Viro, Matthew Wilcox, Jan Kara, Linux NVDIMM, Linux Kernel Mailing List, linux-fsdevel
Subject: Re: [PATCH 0/3] dax: clear poison on the fly along pwrite
Message-ID: <20210918000700.GA10182@magnolia>
References: <20210914233132.3680546-1-jane.chu@oracle.com> <516ecedc-38b9-1ae3-a784-289a30e5f6df@oracle.com> <20210915161510.GA34830@magnolia> <20210917152744.GA10250@magnolia>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 17, 2021 at 01:21:25PM -0700, Dan Williams wrote:
> On Fri, Sep 17, 2021 at 8:27 AM Darrick J. Wong wrote:
> >
> > On Fri, Sep 17, 2021 at 01:53:33PM +0100, Christoph Hellwig wrote:
> > > On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > > > > That was my gut feeling. If everyone feels 100% comfortable with
> > > > > zeroing as the mechanism to clear poisoning I'll cave in. The most
> > > > > important bit is that we do that through a dedicated DAX path instead
> > > > > of abusing the block layer even more.
> > > >
> > > > ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> > > > Where reset == "zero + clear-poison"?
> > >
> > > I'd say that naming is more confusing than overloading zero.
> >
> > How about dax_zeroinit_range() ?
>
> Works for me.
>
> >
> > To go with its fallocate flag (yeah I've been too busy sorting out -rc1
> > regressions to repost this) FALLOC_FL_ZEROINIT_RANGE that will reset the
> > hardware (whatever that means) and set the contents to the known value
> > zero.
> >
> > Userspace usage model:
> >
> > void handle_media_error(int fd, loff_t pos, size_t len)
> > {
> >         /* yell about this for posterior's sake */
> >
> >         ret = fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len);
> >
> >         /* yay our disk drive / pmem / stone table engraver is online */
>
> The fallocate mode can still be error-aware though, right? When the FS
> has knowledge of the error locations the fallocate mode could be
> fallocate(fd, FALLOC_FL_OVERWRITE_ERRORS, pos, len) with the semantics
> of attempting to zero out any known poison extents in the given file
> range? At the risk of going overboard on new fallocate modes there
> could also (or instead of) be FALLOC_FL_PUNCH_ERRORS to skip trying to
> clear them and just ask the FS to throw error extents away.

It /could/ be, but for now I've stuck to what you see is what you get --
if you tell it to 'zero initialize' 1MB of pmem, it'll write zeroes and
clear the poison on all 1MB, regardless of the old contents.

IOWs, you can use it from a poison handler on just the range that it
told you about, or you could use it to bulk-clear a lot of space all at
once.

A dorky thing here is that the dax_zero_page_range function returns EIO
if you tell it to do more than one page...

>
> > }
> >
> > > > > I'm really worried about both partitions on DAX and DM passing through
> > > > > DAX because they deeply bind DAX to the block layer, which is just a bad
> > > > > idea. I think we also need to sort that whole story out before removing
> > > > > the EXPERIMENTAL tags.
> > > >
> > > > I do think it was a mistake to allow for DAX on partitions of a pmemX
> > > > block-device.
> > > >
> > > > DAX-reflink support may be the opportunity to start deprecating that
> > > > support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> > > > without partitions (later add dax-device direct mounting),
> > >
> > > I think we need to fully or almost fully sort this out.
> > >
> > > Here is my bold suggestions:
> > >
> > > 1) drop no drop the EXPERIMENTAL on the current block layer overload
> > >    at all
> >
> > I don't understand this.
> >
> > > 2) add direct mounting of the nvdimm namespaces ASAP. Because all
> > >    the filesystem currently also need the /dev/pmem0 device add a way
> > >    to open the block device by the dax_device instead of our current
> > >    way of doing the reverse
> > > 3) deprecate DAX support through block layer mounts with a say 2 year
> > >    deprecation period
> > > 4) add DAX remapping devices as needed
> >
> > What devices are needed? linear for lvm, and maybe error so we can
> > actually test all this stuff?
>
> The proposal would be zero lvm support. The nvdimm namespace
> definition would need to grow support for concatenation + striping.

Ah, ok.

> Soft error injection could be achieved by writing to the badblocks
> interface.

I'll send out an RFC of what I have currently.

--D
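
For anyone following along, the userspace usage model quoted above can be
fleshed out into something that compiles. This is only a rough sketch under
some loud assumptions: FALLOC_FL_ZEROINIT_RANGE is merely proposed in this
thread, so the code uses it only if the installed headers happen to define
it, and otherwise falls back to FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
which zeroes the range but promises nothing about clearing poison.
MEDIA_ERROR_MODE is a name made up for illustration.

```c
#define _GNU_SOURCE             /* needed for fallocate() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <linux/falloc.h>

/*
 * ASSUMPTION: FALLOC_FL_ZEROINIT_RANGE is only proposed in this thread.
 * If the headers do not define it, fall back to plain range zeroing,
 * which does NOT guarantee that poison is cleared.
 * MEDIA_ERROR_MODE is a hypothetical name used only in this sketch.
 */
#ifdef FALLOC_FL_ZEROINIT_RANGE
#define MEDIA_ERROR_MODE FALLOC_FL_ZEROINIT_RANGE
#else
#define MEDIA_ERROR_MODE (FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE)
#endif

/* Returns 0 on success or a negative errno value on failure. */
static int handle_media_error(int fd, off_t pos, off_t len)
{
        /* A poison handler should only ever be handed a real range. */
        assert(pos >= 0 && len > 0);

        /* Log the bad range before touching it. */
        fprintf(stderr, "media error at offset %lld, length %lld\n",
                (long long)pos, (long long)len);

        if (fallocate(fd, MEDIA_ERROR_MODE, pos, len) < 0) {
                int err = errno;        /* save before fprintf can clobber it */

                fprintf(stderr, "zeroing range failed: %s\n", strerror(err));
                return -err;
        }

        /* The whole range now reads back as zeroes. */
        return 0;
}
```

A caller would pass exactly the range reported by the poison handler, or a
much larger range to bulk-clear space in one go. Not every filesystem
implements the fallback mode, so a failure such as -EOPNOTSUPP has to be
handled by the caller.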