Date: Tue, 15 May 2018 19:52:18 -0700
From: Matthew Wilcox <willy@infradead.org>
To: Huaisheng HS1 Ye
Cc: Jeff Moyer, Dan Williams, Michal Hocko, linux-nvdimm, Tetsuo Handa,
	NingTing Cheng, Dave Hansen, Linux Kernel Mailing List,
	"pasha.tatashin@oracle.com", Linux MM, "colyli@suse.de",
	Johannes Weiner, Andrew Morton, Sasha Levin, Mel Gorman,
	Vlastimil Babka, Ocean HY1 He, Vishal Verma
Subject: Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
Message-ID: <20180516025218.GA17352@bombadil.infradead.org>
References: <20180507184622.GB12361@bombadil.infradead.org>
	<20180508030959.GB16338@bombadil.infradead.org>
	<20180510162742.GA30442@bombadil.infradead.org>
	<20180515162003.GA26489@bombadil.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.9.2 (2017-12-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 16, 2018 at 02:05:05AM +0000, Huaisheng HS1 Ye wrote:
> > From: Matthew Wilcox [mailto:willy@infradead.org]
> > Sent: Wednesday, May 16, 2018 12:20 AM
>
> > > > > > Then there's the problem of reconnecting the page cache (which is
> > > > > > pointed to by ephemeral data structures like inodes and dentries) to
> > > > > > the new inodes.
> > > > > Yes, it is not easy.
> > > >
> > > > Right ... and until we have that ability, there's no point in this patch.
> > > We are focusing on realizing this ability.
> >
> > But is it the right approach?  So far we have (I think) two parallel
> > activities.  The first is for local storage, using DAX to store files
> > directly on the pmem.  The second is a physical block cache for network
> > filesystems (both NAS and SAN).  You seem to want to supplant the
> > second effort, but I think it's much harder to reconnect the logical cache
> > (ie the page cache) than it is the physical cache (ie the block cache).
>
> Dear Matthew,
>
> Thanks for correcting my idea about cache lines.
> But I have a question about that.  Assuming the NVDIMM works in pmem mode,
> even if we use it as a physical block cache, like dm-cache, there is still
> a potential risk from this cache-line issue, because NVDIMMs are
> byte-addressable storage, right?
> If the system crashes while data pointed to by bio_vec.bv_page is being
> copied to the NVDIMM, the CPU may not have had the opportunity to flush
> all of the dirty data from its cache lines to the NVDIMM.
> I know there is BTT, which is used to guarantee sector atomicity in block
> mode, but in pmem mode such a crash will likely leave a mix of new and old
> data in one page of the NVDIMM.
> Correct me if anything is wrong.

Right, we do have BTT.  I'm not sure how it's being used with the block
cache ... but the principle is the same: write the new data to a new page
and then update the metadata to point to the new page (a rough sketch of
this copy-then-swap idea is appended at the end of this mail).

> Another question: if we use NVDIMMs as a physical block cache for network
> filesystems, does the industry have an existing implementation that
> bypasses the page cache in a DAX-like way, that is to say, storing data
> to the NVDIMMs directly from userspace rather than copying it from
> kernel-space memory to the NVDIMMs?

The important part about DAX is that the kernel gets entirely out of
the way and userspace takes care of handling flushing and synchronisation
(see the MAP_SYNC sketch below).  I'm not sure how that works with the
block cache; for a network filesystem, the filesystem needs to be in charge
of deciding when and how to write the buffered data back to the storage.

Dan, Vishal, perhaps you could jump in here; I'm not really sure where
this effort has got to.
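
To make the cache-line concern above concrete, here's a minimal userspace
sketch -- not kernel code -- of why a plain memcpy() to pmem is not
crash-safe on its own.  It assumes a CPU with CLWB (compile with -mclwb;
older CPUs would use CLFLUSHOPT/CLFLUSH instead) and that 'dst' already
points into a DAX mapping of the NVDIMM set up elsewhere:

#include <string.h>
#include <immintrin.h>

#define CACHELINE	64

static void pmem_copy_page(void *dst, const void *src, size_t len)
{
	char *d = dst;

	memcpy(dst, src, len);		/* data may still sit in the CPU caches */

	/*
	 * A crash here can leave some cache lines of the page on the
	 * NVDIMM and others not -- the mix of old and new data you
	 * describe.  Persistence needs an explicit write-back + fence.
	 */
	for (size_t off = 0; off < len; off += CACHELINE)
		_mm_clwb(d + off);	/* write each dirty line back to media */

	_mm_sfence();			/* wait for the write-backs to complete */
}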
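
And here's a toy illustration of the copy-then-swap principle I described
for BTT -- not the real BTT on-media layout, just the idea that a live data
block is never updated in place.  The struct and names are made up; both
'map' and 'blocks' are assumed to live on the NVDIMM:

#include <stdint.h>
#include <string.h>
#include <immintrin.h>

#define TOY_BLOCK_SIZE	4096
#define CACHELINE	64

struct toy_btt {
	uint32_t *map;				/* lba -> physical block index */
	uint8_t (*blocks)[TOY_BLOCK_SIZE];	/* data area */
};

static void flush_range(const void *addr, size_t len)
{
	const char *p = addr;

	for (size_t off = 0; off < len; off += CACHELINE)
		_mm_clwb(p + off);
	_mm_sfence();
}

/* Replace the data behind 'lba' without ever touching the live block. */
static void toy_btt_write(struct toy_btt *btt, uint32_t lba,
			  const void *buf, uint32_t free_blk)
{
	/* 1. Write the new data into an unused block and make it durable. */
	memcpy(btt->blocks[free_blk], buf, TOY_BLOCK_SIZE);
	flush_range(btt->blocks[free_blk], TOY_BLOCK_SIZE);

	/* 2. Flip the map entry -- a single aligned 4-byte store. */
	btt->map[lba] = free_blk;
	flush_range(&btt->map[lba], sizeof(btt->map[lba]));

	/*
	 * A crash at any point leaves readers with either the complete
	 * old block or the complete new block, never a mixture.
	 */
}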
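
Finally, a sketch of the DAX model where the kernel gets out of the way:
map a file on a DAX-capable filesystem (ext4/xfs mounted with -o dax, 4.15
or later) with MAP_SYNC, store directly into the mapping, and make the
stores durable from userspace with CPU flush instructions -- no write() or
msync() round trip.  The path below is just an example:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <immintrin.h>

#ifndef MAP_SYNC
#define MAP_SYNC		0x80000	/* older libc headers may lack these */
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE	0x03
#endif

int main(void)
{
	size_t len = 4096;
	int fd = open("/mnt/pmem/example", O_RDWR);	/* example path */

	if (fd < 0)
		return 1;

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED)
		return 1;	/* e.g. the file cannot be mapped with DAX */

	/* Store directly into persistent memory; no page cache copy. */
	strcpy(p, "hello, pmem");

	/* Userspace, not the kernel, makes the store durable. */
	_mm_clwb(p);
	_mm_sfence();

	munmap(p, len);
	close(fd);
	return 0;
}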