From: Dan Williams
Date: Tue, 15 May 2018 21:10:11 -0700
Subject: Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
To: Matthew Wilcox
Cc: Huaisheng HS1 Ye, Jeff Moyer, Michal Hocko, linux-nvdimm, Tetsuo Handa,
    NingTing Cheng, Dave Hansen, Linux Kernel Mailing List,
    "pasha.tatashin@oracle.com", Linux MM, "colyli@suse.de",
    Johannes Weiner, Andrew Morton, Sasha Levin, Mel Gorman,
    Vlastimil Babka, Ocean HY1 He, Vishal Verma
In-Reply-To: <20180516025218.GA17352@bombadil.infradead.org>
References: <20180507184622.GB12361@bombadil.infradead.org>
            <20180508030959.GB16338@bombadil.infradead.org>
            <20180510162742.GA30442@bombadil.infradead.org>
            <20180515162003.GA26489@bombadil.infradead.org>
            <20180516025218.GA17352@bombadil.infradead.org>
List-ID: linux-kernel@vger.kernel.org
On Tue, May 15, 2018 at 7:52 PM, Matthew Wilcox wrote:
> On Wed, May 16, 2018 at 02:05:05AM +0000, Huaisheng HS1 Ye wrote:
>> > From: Matthew Wilcox [mailto:willy@infradead.org]
>> > Sent: Wednesday, May 16, 2018 12:20 AM
>> >
>> > > > > > Then there's the problem of reconnecting the page cache (which is
>> > > > > > pointed to by ephemeral data structures like inodes and dentries) to
>> > > > > > the new inodes.
>> > > > > Yes, it is not easy.
>> > > >
>> > > > Right ... and until we have that ability, there's no point in this patch.
>> > > We are focusing on realizing this ability.
>> >
>> > But is it the right approach? So far we have (I think) two parallel
>> > activities. The first is for local storage, using DAX to store files
>> > directly on the pmem. The second is a physical block cache for network
>> > filesystems (both NAS and SAN). You seem to be wanting to supplant the
>> > second effort, but I think it's much harder to reconnect the logical cache
>> > (ie the page cache) than it is the physical cache (ie the block cache).
>>
>> Dear Matthew,
>>
>> Thanks for the correction regarding cache lines.
>> But I have a question about that: assuming the NVDIMM works in pmem mode,
>> even if we used it as a physical block cache, like dm-cache, isn't there
>> still a potential risk from this cache line issue, since NVDIMMs are
>> byte-addressable storage?
>> If a system crash happens while the data pointed to by bio_vec.bv_page is
>> being copied to the NVDIMM, the CPU has no opportunity to flush all the
>> dirty data from its cache lines to the NVDIMM.
>> I know there is BTT, which is used to guarantee sector atomicity in block
>> mode, but in pmem mode a crash will likely leave a mix of new and old data
>> in one page of the NVDIMM.
>> Correct me if anything is wrong.
>
> Right, we do have BTT. I'm not sure how it's being used with the block
> cache ... but the principle is the same: write the new data to a new
> page and then update the metadata to point to the new page.
>
>> Another question: if we used NVDIMMs as a physical block cache for network
>> filesystems, does the industry have an existing implementation that
>> bypasses the page cache in a DAX-like way, that is to say, storing data
>> to the NVDIMMs directly from userspace rather than copying it from kernel
>> memory to the NVDIMMs?
>
> The important part about DAX is that the kernel gets entirely out of the
> way and userspace takes care of handling flushing and synchronisation.
> I'm not sure how that works with the block cache; for a network
> filesystem, the filesystem needs to be in charge of deciding when and
> how to write the buffered data back to the storage.
>
> Dan, Vishal, perhaps you could jump in here; I'm not really sure where
> this effort has got to.

Which effort? I think we're saying that there is no such thing as a
DAX-capable block cache, and it is not clear one makes sense. We can
certainly teach existing block caches some optimizations in the presence
of pmem, and perhaps that is sufficient. A few sketches below illustrate
the cache-line-flushing, BTT, and MAP_SYNC points discussed above.
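
To make the torn-write concern above concrete, here is a minimal userspace
sketch, assuming PMDK's libpmem is available (link with -lpmem); the path
/mnt/pmem0/block is hypothetical. Note that pmem_memcpy_persist() makes the
copy durable, but it does not make a 4K copy atomic; only an aligned 8-byte
store is failure-atomic, which is exactly why a crash mid-copy can leave a
mix of old and new data in one page.

/* A plain memcpy() leaves the new bytes in the CPU cache; a crash before
 * write-back loses them.  pmem_memcpy_persist() copies and then flushes
 * and fences, so the data is durable (though still not atomic) when the
 * call returns.
 */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	size_t mapped_len;
	int is_pmem;
	char *dst = pmem_map_file("/mnt/pmem0/block", 4096,
				  PMEM_FILE_CREATE, 0600,
				  &mapped_len, &is_pmem);
	if (dst == NULL) {
		perror("pmem_map_file");
		return 1;
	}

	char src[4096];
	memset(src, 0xab, sizeof(src));

	if (is_pmem) {
		/* copy + cache-line write-back + sfence in one call */
		pmem_memcpy_persist(dst, src, sizeof(src));
	} else {
		/* not real pmem: fall back to msync-based persistence */
		memcpy(dst, src, sizeof(src));
		pmem_msync(dst, sizeof(src));
	}

	pmem_unmap(dst, mapped_len);
	return 0;
}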
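
The BTT principle Matthew describes (write elsewhere, then flip a pointer)
can be sketched in a few lines. This is not the kernel's BTT code; the
arena/map layout and the caller-supplied free page are illustrative
stand-ins for its arena and free-block tracking.

/* Never overwrite a live page in place.  Write the new data to a free
 * pmem page, make it durable, and only then switch the map entry -- an
 * aligned 8-byte store, which x86 guarantees to be failure-atomic -- to
 * point at the new page.  A crash at any step leaves either the old or
 * the new page visible, never a mix.
 */
#include <libpmem.h>
#include <stdint.h>

#define BLK_SIZE 4096

struct arena {
	char     *pages;	/* base of the pmem data pages */
	uint64_t *map;		/* pmem-resident lba -> page-number table */
};

void btt_style_write(struct arena *a, uint64_t lba,
		     const void *buf, uint64_t free_page)
{
	/* 1. write the new data to an unused page and make it durable */
	pmem_memcpy_persist(a->pages + free_page * BLK_SIZE, buf, BLK_SIZE);

	/* 2. atomically repoint the (naturally aligned) map entry, then
	 *    make that durable too; readers see the old page until this
	 *    single 8-byte store lands */
	a->map[lba] = free_page;
	pmem_persist(&a->map[lba], sizeof(a->map[lba]));
}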
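
And for the DAX model where the kernel gets out of the way and userspace
owns flushing, a sketch using MAP_SYNC (Linux 4.15+, on a filesystem
mounted with -o dax). The file path is hypothetical, the CLWB intrinsic
needs a CPU with CLWB support and -mclwb, and older glibc may require
<linux/mman.h> for the MAP_SYNC definition.

/* MAP_SYNC guarantees the file's block allocations are durable at fault
 * time, so a CLWB + SFENCE from userspace is enough to persist the data:
 * no msync/fsync round trip into the kernel.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void flush_range(void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~63UL;	/* align down to cache line */
	for (; p < (uintptr_t)addr + len; p += 64)
		_mm_clwb((void *)p);		/* write back, keep cached */
	_mm_sfence();				/* order the write-backs */
}

int main(void)
{
	int fd = open("/mnt/pmem0/data", O_RDWR);
	char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (buf == MAP_FAILED)
		return 1;	/* fs or kernel lacks MAP_SYNC support */

	memcpy(buf, "hello", 5);
	flush_range(buf, 5);	/* durable without any syscall */

	munmap(buf, 4096);
	close(fd);
	return 0;
}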