Date: Tue, 15 Sep 2020 12:58:46 -0400 (EDT)
From: Mikulas Patocka
To: Dan Williams
cc: Linus Torvalds, Alexander Viro, Andrew Morton, Vishal Verma, Dave Jiang, Ira Weiny, Matthew Wilcox, Jan Kara, Eric Sandeen, Dave Chinner, "Kani, Toshi", "Norton, Scott J", "Tadakamadla, Rajesh (DCIG/CDI/HPS Perf)", Linux Kernel Mailing List, linux-fsdevel, linux-nvdimm
Subject: Re: [RFC] nvfs: a filesystem for persistent memory

On Tue, 15 Sep 2020, Dan Williams wrote:

> > - when the fsck.nvfs tool mmaps the device /dev/pmem0, the kernel uses
> > the buffer cache for the mapping. The buffer cache slows down fsck by a
> > factor of 5 to 10. Could it be possible to change the kernel so that it
> > maps DAX based block devices directly?
>
> We've been down this path before.
>
> 5a023cdba50c block: enable dax for raw block devices
> 9f4736fe7ca8 block: revert runtime dax control of the raw block device
> acc93d30d7d4 Revert "block: enable dax for raw block devices"

It says "The functionality is superseded by the new 'Device DAX' facility". But the fsck tool can't change a fsdax device into a devdax device just for checking. Or can it?

> EXT2/4 metadata buffer management depends on the page cache and we
> eliminated a class of bugs by removing that support. The problems are
> likely tractable, but there was not a straightforward fix visible at
> the time.

Thinking about it, it isn't as easy as it looks...

Suppose that the user mounts an ext2 filesystem and then runs the tune2fs tool on the mounted block device. tune2fs reads and writes the mounted superblock directly, so read/write must be coherent with the buffer cache (otherwise the kernel would not see the changes written by tune2fs). And mmap must be coherent with read/write.

So, if we want to map the pmem device directly, we could add a new flag, MAP_DAX. Or we could test whether the fd has the O_DIRECT flag and map it directly in that case. But the default must be to map it coherently, so as not to break existing programs.

> > - __copy_from_user_inatomic_nocache doesn't flush cache for leading and
> > trailing bytes.
>
> You want copy_user_flushcache(). See how fs/dax.c arranges for
> dax_copy_from_iter() to route to pmem_copy_from_iter().

Is that something new in kernel 5.10? I see only __copy_user_flushcache, which is implemented just for x86 and arm64. There is __copy_from_user_flushcache, implemented for x86, arm64 and power. It is used in lib/iov_iter.c under #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE, so should I use this?

Mikulas
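The coherence argument (tune2fs writing the mounted superblock while the kernel and an mmap user must all see the same bytes) and the proposed opt-in can be illustrated with a small user-space sketch. It is an illustration under stated assumptions, not a kernel patch: the enum, flags_request_direct(), choose_map_mode(), device_size(), map_whole() and write_visible_in_mapping() are hypothetical names; MAP_DAX is only a proposal and is not used; and the "direct" mode is merely a decision value, since actually mapping pmem pages directly would need the kernel change being discussed. On a regular file, the Linux page cache makes a pwrite() visible through a MAP_SHARED mapping, which is the behaviour the default must keep.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>		/* mkstemp(), for trying the sketch on a temp file */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>		/* BLKGETSIZE64 */

/* Hypothetical policy values; the kernel has no such enum. */
enum pmem_map_mode {
	PMEM_MAP_COHERENT,	/* current behaviour: mapping goes through the cache */
	PMEM_MAP_DIRECT		/* proposed opt-in: map pmem pages directly */
};

/* Only an explicit O_DIRECT open would opt in to a direct mapping. */
static bool flags_request_direct(int open_flags)
{
	return (open_flags & O_DIRECT) != 0;
}

/* Decide the mapping mode for an open fd; errors fall back to coherent. */
static enum pmem_map_mode choose_map_mode(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return PMEM_MAP_COHERENT;
	return flags_request_direct(fl) ? PMEM_MAP_DIRECT : PMEM_MAP_COHERENT;
}

/* Size in bytes of a block device or regular file, or -1 on error. */
static int64_t device_size(int fd)
{
	struct stat st;
	uint64_t bytes;

	if (fstat(fd, &st) < 0)
		return -1;
	if (S_ISBLK(st.st_mode)) {
		if (ioctl(fd, BLKGETSIZE64, &bytes) < 0)
			return -1;
		return (int64_t)bytes;
	}
	return (int64_t)st.st_size;
}

/* Map the whole device/file shared (fd must be open read-write). */
static void *map_whole(int fd, size_t *len_out)
{
	int64_t sz = device_size(fd);
	void *p;

	if (sz <= 0)
		return NULL;
	p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return NULL;
	*len_out = (size_t)sz;
	return p;
}

/* Write n bytes at offset 0 with pwrite(), then check that an existing
 * shared mapping sees them: the read/write vs mmap coherence tune2fs needs. */
static bool write_visible_in_mapping(int fd, const void *buf, size_t n)
{
	size_t len = 0;
	void *p = map_whole(fd, &len);
	bool ok;

	if (p == NULL)
		return false;
	ok = n <= len && pwrite(fd, buf, n, 0) == (ssize_t)n &&
	     memcmp(p, buf, n) == 0;
	munmap(p, len);
	return ok;
}
```

Run against a temporary file opened without O_DIRECT, choose_map_mode() returns the coherent default and write_visible_in_mapping() succeeds. The design choice mirrors the email: direct mapping is strictly opt-in, so existing programs keep the coherent behaviour.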
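The leading/trailing-bytes problem with __copy_from_user_inatomic_nocache can be made concrete with a simplified model. Assumptions, stated loudly: non-temporal stores cover only the 8-byte-aligned middle of a copy, cache lines are 64 bytes, and any unaligned head or tail bytes are written through the cache and therefore need an explicit flush. split_copy() and lines_to_flush() are hypothetical helpers, not kernel functions, and the kernel's real flushcache routines differ in details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granularities for this model only. */
#define NT_ALIGN	8u	/* non-temporal store width (x86-64 movnti) */
#define CACHELINE	64u	/* assumed cache line size */

struct copy_split {
	size_t head;	/* cached bytes before the first aligned address */
	size_t body;	/* bytes written with non-temporal stores */
	size_t tail;	/* cached bytes after the last aligned word */
};

/* Split a copy to [dst, dst + len) into head, aligned body and tail. */
static struct copy_split split_copy(uintptr_t dst, size_t len)
{
	struct copy_split s;
	size_t mis = (NT_ALIGN - (dst % NT_ALIGN)) % NT_ALIGN;

	s.head = mis < len ? mis : len;
	s.tail = (len - s.head) % NT_ALIGN;
	s.body = len - s.head - s.tail;
	assert(s.head + s.body + s.tail == len);
	return s;
}

/* Number of cache lines touched by [addr, addr + n); 0 if n == 0. */
static size_t lines_touched(uintptr_t addr, size_t n)
{
	if (n == 0)
		return 0;
	return (size_t)((addr + n - 1) / CACHELINE - addr / CACHELINE + 1);
}

/* Distinct cache lines holding cached (head/tail) bytes that must be
 * flushed after the copy for the data to reach persistent memory. */
static size_t lines_to_flush(uintptr_t dst, size_t len)
{
	struct copy_split s = split_copy(dst, len);
	uintptr_t tail_start = dst + s.head + s.body;
	size_t n = lines_touched(dst, s.head) + lines_touched(tail_start, s.tail);

	/* Head and tail of a short copy can share one line: count it once. */
	if (s.head && s.tail &&
	    (dst + s.head - 1) / CACHELINE == tail_start / CACHELINE)
		n--;
	return n;
}
```

For example, a 100-byte copy to address 0x1003 splits into 5 head bytes, 88 body bytes and 7 tail bytes, and the head and tail fall in different lines, so two lines need flushing; an aligned 64-byte copy to 0x1000 needs none.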