Subject: Re: [PATCH 0/5] Enable per-file/directory DAX operations
To: Dave Chinner, Boaz Harrosh
Cc: ira.weiny@intel.com, linux-kernel@vger.kernel.org, Alexander Viro, "Darrick J. Wong", Dan Williams, Christoph Hellwig, "Theodore Y. Ts'o", Jan Kara, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
References: <20191020155935.12297-1-ira.weiny@intel.com> <20191023221332.GE2044@dread.disaster.area> <20191024073446.GA4614@dread.disaster.area>
From: Boaz Harrosh
Date: Thu, 24 Oct 2019 17:05:45 +0300
In-Reply-To: <20191024073446.GA4614@dread.disaster.area>
X-Mailing-List: linux-ext4@vger.kernel.org

On 24/10/2019 10:34, Dave Chinner wrote:
> On Thu, Oct 24, 2019 at 05:31:13AM +0300, Boaz Harrosh wrote:
<>
>
> The on disk DAX flag is inherited from the parent directory at
> create time.
> Hence an admin only need to set it on the data
> directory of the application when first configuring it, and
> everything the app creates will be configured for DAX access
> automatically.
>

Yes, I said that as well. But again, I am concerned that this is the opposite
of our intention. As you said, the WRITEs are slow and do not scale, so what we
would like, and why we have this whole problem, is to WRITE *non*-DAX. And if
so, then how do we turn the bit ON later for the fast READs?

> Or, alternatively, mkfs sets the flag on the root dir so that
> everything in the filesystem uses DAX by default (through
> inheritance) unless the admin turns off the flag on a directory
> before it starts to be used
> or on a set of files after they have
> been created (because DAX causes problems)...
>

Yes, exactly, this cannot be done currently.

> So, yeah, there's another problem with the basic assertion that we
> only need to allow the on disk flag to be changed on zero length
> files: we actually want to be able to -clear- the DAX flag when the
> file has data attached to it, not just when it is an empty file...
>

Exactly, this is my concern. And the case that I see as most useful is the
opposite one, where I want to turn it ON, for DAX fast READs.

>> What if, say in XFS when setting the DAX-bit we take all the three write-locks
>> same as a truncate. Then we check that there are no active page-cache mappings
>> ie. a single opener. Then allow to set the bit. Else return EBUSY. (file is in use)
>
> DAX doesn't have page cache mappings, so anything that relies on
> checking page cache state isn't going to work reliably.

I meant the opposite case, where the flag was OFF and I want it ON for fast
READs. In that case, if I have any users, there are pages in the xarray.
BTW the opposite is also true: if we have active DAX users, we will have DAX
entries in the xarray. What we want is that there are *no* active users while
we change the file-DAX-mode. Else we fail the change.
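
The rule stated above (refuse the mode change while the mapping still holds
entries, whether page-cache pages or DAX entries) can be sketched as a toy
model. This is only an illustration in Python, not kernel code: `ToyInode`,
`fault_in`, `drop_all` and `set_dax_mode` are hypothetical names, the entry
counter merely stands in for the xarray, and the sketch ignores the locking
that a real check would have to run under.

```python
import errno


class ToyInode:
    """Hypothetical stand-in for an inode; not a kernel structure."""

    def __init__(self, dax=False):
        self.dax = dax
        # Models entries in the mapping's xarray: page-cache pages when
        # dax is False, DAX entries when dax is True.
        self.entries = 0

    def fault_in(self):
        # An active user touched the file, so one more entry is present.
        self.entries += 1

    def drop_all(self):
        # All users are gone and the mapping has been emptied.
        self.entries = 0


def set_dax_mode(inode, want_dax):
    """Return 0 on success, -EBUSY if the file still has active entries."""
    if inode.entries > 0:
        return -errno.EBUSY
    inode.dax = want_dax
    return 0
```

In this model a file with any entry present keeps its current mode and the
caller gets -EBUSY; once `drop_all()` has run, the same call succeeds.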
> I also seem
> to recall that there was a need to take some vm level lock to really
> prevent page fault races, and that we can't safely take that in a
> safe combination with all the filesystem locks we need.
>

We do not really care about page fault races in the kernel as long as I protect
the xarray access, and these are well protected if we take truncate locking.
But we have a bigger problem, which you pointed out, with the change of the
operations vector pointer.

I was thinking about this last night. One way to do this is with a
file-exclusive-lock. Correct me if I'm wrong:
file-exclusive-readwrite-lock means any other openers will fail, and if
there are openers already the lock will fail. Which is what we want, no?
To make sure we are the exclusive user of the file while we change the op
vector.

Now the question is whether we force the application to take the lock and the
kernel only checks that we are locked, or the kernel takes the lock within the
IOCTL.

Let's touch base. As I understand it, the protocol we want to establish with
the administration tool is:

- File is created, written, read...
- ALL file handles are closed, there are no active users
- File is opened by a single opener for the purpose of changing the DAX-mode
- lock out all other opens
- change the DAX-mode, op vectors
- unlock exclusiveness
- File activity can resume...

That's easy to say, but how can we enforce this protocol?

> Cheers,
> Dave.
>

Thanks Dave
Boaz
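
The protocol listed above can be sketched from the administration tool's side.
This is a minimal userspace illustration in Python, assuming the lock in
question is an advisory flock(2) lock, which is only one possible reading of
"file-exclusive-lock". `change_mode_exclusively` and its `change` callback
are hypothetical, and the callback merely stands in for the DAX-mode switch.

```python
import fcntl
import os
import tempfile


def change_mode_exclusively(path, change):
    """Open path, take a non-blocking exclusive flock, run change(fd), unlock.

    Returns True if change ran, False if another opener holds the lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            change(fd)  # hypothetical: the DAX-mode / op-vector switch goes here
            return True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def demo():
    results = {}
    with tempfile.NamedTemporaryFile() as tmp:

        def change(_fd):
            # A second opener that follows the protocol is refused the lock...
            results["second_locker"] = change_mode_exclusively(
                tmp.name, lambda _inner_fd: None
            )
            # ...but a plain open() that ignores the lock still succeeds.
            other = os.open(tmp.name, os.O_RDONLY)
            results["plain_open_ok"] = True
            os.close(other)

        results["first_locker"] = change_mode_exclusively(tmp.name, change)
    return results
```

With flock(2), at least, the lock only excludes other lockers. An opener that
never asks for the lock is not stopped, which is exactly the enforcement
question raised above.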