From: Jeff Moyer <jmoyer@redhat.com>
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, Linux ACPI, linux-kernel@vger.kernel.org,
	Christoph Hellwig
Subject: Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
Date: Mon, 24 Apr 2017 13:58:12 -0400
In-Reply-To: (Dan Williams's message of "Mon, 24 Apr 2017 10:43:41 -0700")
References: <149281853758.22910.2919981036906495309.stgit@dwillia2-desk3.amr.corp.intel.com>

Dan Williams writes:

> [ adding Christoph ]
>
> On Mon, Apr 24, 2017 at 9:43 AM, Jeff Moyer wrote:
>> Dan Williams writes:
>>
>>> On Mon, Apr 24, 2017 at 9:26 AM, Jeff Moyer wrote:
>>>> Dan Williams writes:
>>>>
>>>>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>>>>> (asynchronous-DRAM-refresh) failure. The ADR mechanism handles
>>>>> flushing platform WPQ (write-pending-queue) buffers when power is
>>>>> removed. The nvdimm_flush() mechanism performs that same function
>>>>> on demand.
>>>>>
>>>>> When a pmem namespace is associated with a block device,
>>>>> nvdimm_flush() is triggered by every block-layer REQ_FUA or
>>>>> REQ_FLUSH request. However, when a namespace is in device-dax
>>>>> mode, or namespaces are disabled, userspace needs another path.
>>>>>
>>>>> The new 'flush' attribute is visible whenever it can be determined
>>>>> whether the interleave set's DIMMs do or do not expose WPQ-flush
>>>>> addresses, "flush hints" in ACPI NFIT terminology. Reading it
>>>>> returns "1" and flushes the DIMMs, or returns "0" if the flush
>>>>> operation is a platform nop.
>>>>>
>>>>> Signed-off-by: Dan Williams
>>>>
>>>> NACK. This should function the same way it does for a pmem device.
>>>> Wire up sync.
>>>
>>> We don't have dirty-page tracking for device-dax; without that, I
>>> don't think we should wire up the current sync calls.
>>
>> Why not? Device-dax is meant for the "flush from userspace" paradigm.
>> There's enough special-casing around device-dax that I think you can
>> get away with implementing *sync as a call to nvdimm_flush().
>
> I think it's an abuse of fsync() and gets in the way of where we might
> take userspace pmem flushing with new sync primitives, as proposed
> here [1].

I agree that it's an abuse, and I'm happy not to go that route. I am
still against using a sysfs file to do this WPQ flush, however.
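(As an aside, if the attribute does go in as proposed, I assume the
usage from userspace would look roughly like the sketch below. The
exact sysfs path is my guess; the patch description only gives the
region-level 'flush' attribute name and the read semantics.)

#include <stdio.h>
#include <string.h>

int main(void)
{
	/*
	 * Assumed path for an interleave-set region; the patch only
	 * says the region gains a 'flush' attribute.
	 */
	const char *path = "/sys/bus/nd/devices/region0/flush";
	char buf[4];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/*
	 * Per the patch description: reading returns "1" and flushes
	 * the DIMMs, or returns "0" when the flush is a platform nop.
	 */
	if (fgets(buf, sizeof(buf), f))
		printf("deep flush %s\n",
		       !strncmp(buf, "1", 1) ? "performed" : "is a nop");
	fclose(f);
	return 0;
}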
> I'm also conscious of the shade that hch threw the last time I tried
> to abuse an existing syscall for device-dax [2].
>
>>> I do think we eventually need a more sophisticated sync syscall
>>> interface that can select which level of flushing is being performed
>>> (page cache vs. CPU cache vs. platform write buffers).
>>
>> I don't. I think this whole notion of "flush, and flush harder" is
>> brain-dead. How do you explain to applications when they should use
>> each one?
>
> You never need to use this mechanism to guarantee persistence, which
> is counter to what fsync() is defined to provide. This mechanism is
> only there to backstop against potential ADR failures.

You haven't answered my question. Why should applications even need to
consider this? Do you expect ADR to have a high failure rate? If so,
shouldn't an application call this deep flush any time it wants to make
its state persistent?

>>> Until then, I think this sideband interface makes sense, and sysfs
>>> is more usable than an ioctl.
>>
>> Well, if you're totally against wiring up sync, then I say we forget
>> about the deep flush completely. What's your use case?
>
> The use case is device-dax users who want to reduce the impact of an
> ADR failure, which also assumes that the platform has mechanisms to
> communicate ADR failure. This is not an interface I expect to be used
> by general-purpose applications. All of those should depend solely on
> ADR semantics.

What applications? I remain unconvinced of the utility of a WPQ flush
separate from msync/fsync. Either you always do the WPQ flush, or you
never do it. I don't see the use case for doing it only sometimes, and
no one I've asked has managed to come up with a concrete use case.

Cheers,
Jeff
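
P.S. For anyone following along, this is the "flush from userspace"
persistence model that device-dax already assumes, as a minimal sketch.
It is x86-specific; CLWB support (and building with -mclwb) is an
assumption here, and real code would check CPUID and fall back to
CLFLUSHOPT or CLFLUSH as appropriate:

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

#define CACHELINE 64

/*
 * Persist [addr, addr + len): write back every cache line covering the
 * range, then fence so the write-backs complete before we proceed. ADR
 * is then responsible for draining the platform write-pending queues
 * on power loss; the proposed 'flush' attribute is only a backstop for
 * ADR failure, not part of this ordinary flush path.
 */
static void pmem_persist(const void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clwb((void *)p);
	_mm_sfence();
}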