From: Dan Williams
Date: Tue, 15 May 2018 19:48:57 -0700
Subject: Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
To: Huaisheng HS1 Ye
Cc: Matthew Wilcox, Jeff Moyer, Michal Hocko, linux-nvdimm, Tetsuo Handa, NingTing Cheng, Dave Hansen, Linux Kernel Mailing List, "pasha.tatashin@oracle.com", Linux MM, "colyli@suse.de", Johannes Weiner, Andrew Morton, Sasha Levin, Mel Gorman, Vlastimil Babka, Ocean HY1 He
List-ID: linux-kernel@vger.kernel.org
On Tue, May 15, 2018 at 7:05 PM, Huaisheng HS1 Ye wrote:
>> From: Matthew Wilcox [mailto:willy@infradead.org]
>> Sent: Wednesday, May 16, 2018 12:20 AM
>>
>> > > > > Then there's the problem of reconnecting the page cache (which is
>> > > > > pointed to by ephemeral data structures like inodes and dentries) to
>> > > > > the new inodes.
>> > > > Yes, it is not easy.
>> > >
>> > > Right ... and until we have that ability, there's no point in this patch.
>> > We are focusing on realizing this ability.
>>
>> But is it the right approach? So far we have (I think) two parallel
>> activities. The first is for local storage, using DAX to store files
>> directly on the pmem. The second is a physical block cache for network
>> filesystems (both NAS and SAN). You seem to be wanting to supplant the
>> second effort, but I think it's much harder to reconnect the logical cache
>> (ie the page cache) than it is the physical cache (ie the block cache).
>
> Dear Matthew,
>
> Thanks for correcting my idea about cache lines.
> But I have a question about that: assuming the NVDIMM works in pmem mode,
> even if we use it as a physical block cache, like dm-cache, isn't there a
> potential risk from this cache-line issue, since NVDIMMs are
> byte-addressable storage?

No, there is no risk if the cache is designed properly. The pmem
driver will not report that the I/O is complete until the entire
payload of the data write has made it to persistent memory. The cache
driver will not report that the write succeeded until the pmem driver
completes the I/O. There is no risk in losing power while the pmem
driver is operating, because the cache will recover to its last
acknowledged stable state, i.e. it will roll back / undo the
incomplete write.

> If a system crash happens, that means the CPU doesn't have the opportunity
> to flush all the dirty data from its cache lines to the NVDIMM while
> copying the data pointed to by bio_vec.bv_page to the NVDIMM.
> I know there is btt, which is used to guarantee sector atomicity in block
> mode, but in pmem mode that will likely cause a mix of new and old data in
> one page of the NVDIMM.
> Correct me if anything is wrong.

dm-cache performs metadata management similar to the btt driver's to
ensure safe forward progress of the cache state relative to power
loss or system crash.

> Another question: if we used NVDIMMs as a physical block cache for network
> filesystems, does industry have an existing implementation that bypasses
> the page cache, similar to the DAX way? That is to say, directly storing
> data to NVDIMMs from userspace, rather than copying data from kernel-space
> memory to NVDIMMs.

Any caching solution with associated metadata requires coordination
with the kernel, so it is not possible for the kernel to stay
completely out of the way. Especially when we're talking about a
cache in front of the network, there is not much room for DAX to offer
improved performance, because we need the kernel to take over on all
write-persist operations to update the cache metadata. So, I'm still
struggling to see why dm-cache is not a suitable solution for this
case. It seems suitable if it is updated to allow direct DMA access
to the pmem cache pages from the backing-device storage / networking
driver.
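
To make that ordering concrete, here is a condensed sketch of a
pmem-style write path. It is loosely modeled on the kernel's
drivers/nvdimm/pmem.c but heavily simplified: the struct layout and
the pmem->region field are assumed names for this sketch, and the
real driver issues nvdimm_flush() only for REQ_PREFLUSH/REQ_FUA
requests rather than on every write. It illustrates the guarantee
(persist, then acknowledge), not the exact driver code:

#include <linux/blkdev.h>
#include <linux/highmem.h>
#include <linux/libnvdimm.h>
#include <linux/string.h>

struct pmem_device {
	void *virt_addr;		/* kernel mapping of the pmem range */
	struct nd_region *region;	/* assumed field name for this sketch */
};

/* Copy one bio segment with non-temporal / flushing stores so the
 * payload cannot be stranded in the CPU cache across a power loss. */
static void write_pmem(void *pmem_addr, struct page *page,
		       unsigned int off, unsigned int len)
{
	void *mem = kmap_atomic(page);

	memcpy_flushcache(pmem_addr, mem + off, len);
	kunmap_atomic(mem);
}

static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
{
	struct pmem_device *pmem = q->queuedata;
	struct bio_vec bvec;
	struct bvec_iter iter;

	bio_for_each_segment(bvec, bio, iter) {
		void *pmem_addr = pmem->virt_addr +
			((phys_addr_t)iter.bi_sector << SECTOR_SHIFT);

		write_pmem(pmem_addr, bvec.bv_page, bvec.bv_offset,
			   bvec.bv_len);
	}

	/* Flush the NVDIMM write-pending queues so every byte of the
	 * payload is durable on media... */
	nvdimm_flush(pmem->region);

	/* ...and only then acknowledge the I/O. The caller (e.g.
	 * dm-cache) sees success strictly after persistence, so a crash
	 * can only lose writes that were never acknowledged. */
	bio_endio(bio);
	return BLK_QC_T_NONE;
}

Under that ordering, dm-cache can commit its mapping metadata only
after bio_endio() fires, which is exactly what lets it recover to the
last acknowledged stable state described above.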