Date: Mon, 21 Sep 2020 12:20:42 -0400 (EDT)
From: Mikulas Patocka
To: Dan Williams
cc: Linus Torvalds, Alexander Viro, Andrew Morton, Vishal Verma,
    Dave Jiang, Ira Weiny, Matthew Wilcox, Jan Kara, Eric Sandeen,
    Dave Chinner, "Kani, Toshi", "Norton, Scott J",
    "Tadakamadla, Rajesh (DCIG/CDI/HPS Perf)",
    Linux Kernel Mailing List, linux-fsdevel, linux-nvdimm
Subject: NVFS XFS metadata (was: [PATCH] pmem: export the symbols
    __copy_user_flushcache and __copy_from_user_flushcache)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 16 Sep 2020, Mikulas Patocka wrote:

>
>
> On Wed, 16 Sep 2020, Dan Williams wrote:
>
> > On Wed, Sep 16, 2020 at 10:24 AM Mikulas Patocka wrote:
> > >
> > > > My first question about nvfs is how it compares to a daxfs with
> > > > executables and other binaries configured to use page cache with the
> > > > new per-file dax facility?
> > >
> > > nvfs is faster than dax-based filesystems on metadata-heavy operations
> > > because it doesn't have the overhead of the buffer cache and bios. See
> > > this: http://people.redhat.com/~mpatocka/nvfs/BENCHMARKS
> >
> > ...and that metadata problem is intractable upstream? Christoph poked
> > at bypassing the block layer for xfs metadata operations [1], I just
> > have not had time to carry that further.
> >
> > [1]: "xfs: use dax_direct_access for log writes", although it seems
> > he's dropped that branch from his xfs.git
>
> XFS is very big. I wanted to create something small.

Another difference is that XFS metadata are optimized for disks and
SSDs. On disks and SSDs, reading one byte is as costly as reading a
full block, so we must put as much information into a block as
possible. XFS uses b+trees for file block mapping and for directories -
it is a reasonable decision because b+trees minimize the number of disk
accesses.

On persistent memory, each access has its own cost, so NVFS uses
metadata structures that minimize the number of cache lines accessed
(rather than the number of blocks accessed). For block mapping, NVFS
uses the classic unix direct/indirect blocks - if a file block is
mapped by a third-level indirect block, we do just three memory
accesses and we are done. If we used b+trees, the number of accesses
would be much larger than 3 (we would have to do a binary search in the
b+tree nodes).

The same goes for directories - NVFS hashes the file name and uses a
radix tree to locate the directory page where the directory entry is
located. XFS b+trees would result in many more accesses than the radix
tree.

Regarding journaling - NVFS doesn't do it because persistent memory is
so fast that we can just check it in the case of a crash. NVFS has a
multithreaded fsck that can do 3 million inodes per second. XFS does
journaling (it was a reasonable decision for disks, where fsck took
hours) and it will cause overhead for all the filesystem operations.

Mikulas
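
To illustrate the "three memory accesses" point about indirect blocks,
here is a minimal userspace sketch. It is not NVFS code: the names
(struct indirect_block, lookup_block), the 512 pointers per block and
the in-memory pointer layout are all assumptions made for the example.
It only shows that a lookup through N levels of indirect blocks costs
exactly N pointer loads, one per level, with no searching inside a
node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 4096-byte block / 8-byte pointer = 512 slots. */
#define BITS_PER_LEVEL 9u
#define PTRS_PER_BLOCK (1u << BITS_PER_LEVEL)

/* An indirect block is just an array of pointers to the next level. */
struct indirect_block {
	void *slot[PTRS_PER_BLOCK];
};

/*
 * Find the data block backing file block `index` by walking `depth`
 * levels of indirect blocks. Each level is one array index and one
 * pointer load; *loads reports how many loads were performed.
 * Returns NULL if the walk hits an unmapped slot (a hole).
 */
static void *lookup_block(const struct indirect_block *root, unsigned depth,
			  uint64_t index, unsigned *loads)
{
	const void *node = root;

	assert(depth >= 1);
	*loads = 0;

	for (unsigned level = depth; level-- > 0; ) {
		/* Pick the 9-bit digit of `index` that belongs to this level. */
		size_t slot = (size_t)((index >> (BITS_PER_LEVEL * level)) &
				       (PTRS_PER_BLOCK - 1));

		node = ((const struct indirect_block *)node)->slot[slot];
		(*loads)++;
		if (node == NULL)
			return NULL;
	}
	return (void *)node;
}
```

With depth 3 the loop body runs three times regardless of how large
the file is, whereas a b+tree lookup additionally has to search within
each node it visits.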