Date: Sun, 10 Jan 2021 16:14:55 -0500 (EST)
From: Mikulas Patocka
To: Al Viro
Cc: Andrew Morton, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
    Matthew Wilcox, Jan Kara, Steven Whitehouse, Eric Sandeen,
    Dave Chinner, "Theodore Ts'o", Wang Jianchao, "Kani, Toshi",
    "Norton, Scott J", "Tadakamadla, Rajesh",
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-nvdimm@lists.01.org
Subject: Re: [RFC v2] nvfs: a filesystem for persistent memory
In-Reply-To: <20210110162008.GV3579531@ZenIV.linux.org.uk>
References: <20210110162008.GV3579531@ZenIV.linux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 10 Jan 2021, Al Viro wrote:

> On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
> > Hi
> >
> > I announce a new version of NVFS - a filesystem for persistent memory.
> > 	http://people.redhat.com/~mpatocka/nvfs/
>
> Utilities, AFAICS
>
> > 	git://leontynka.twibright.com/nvfs.git
>
> Seems to hang on git pull at the moment... Do you have it anywhere else?

I saw some errors 'git-daemon: fatal: the remote end hung up unexpectedly'
in syslog; I don't know what's causing them.

> > I found out that on NVFS, reading a file with the read method has 10%
> > better performance than the read_iter method. The benchmark just reads
> > the same 4k page over and over again - and the cost of creating and
> > parsing the kiocb and iov_iter structures is just that high.
>
> Apples and oranges... What happens if you take
>
> ssize_t read_iter_locked(struct file *file, struct iov_iter *to, loff_t *ppos)
> {
> 	struct inode *inode = file_inode(file);
> 	struct nvfs_memory_inode *nmi = i_to_nmi(inode);
> 	struct nvfs_superblock *nvs = inode->i_sb->s_fs_info;
> 	ssize_t total = 0;
> 	loff_t pos = *ppos;
> 	int r;
> 	int shift = nvs->log2_page_size;
> 	size_t i_size;
>
> 	i_size = inode->i_size;
> 	if (pos >= i_size)
> 		return 0;
> 	iov_iter_truncate(to, i_size - pos);
>
> 	while (iov_iter_count(to)) {
> 		void *blk, *ptr;
> 		size_t page_mask = (1UL << shift) - 1;
> 		unsigned page_offset = pos & page_mask;
> 		unsigned prealloc = (iov_iter_count(to) + page_mask) >> shift;
> 		unsigned size;
>
> 		blk = nvfs_bmap(nmi, pos >> shift, &prealloc, NULL, NULL, NULL);
> 		if (unlikely(IS_ERR(blk))) {
> 			r = PTR_ERR(blk);
> 			goto ret_r;
> 		}
> 		size = ((size_t)prealloc << shift) - page_offset;
> 		ptr = blk + page_offset;
> 		if (unlikely(!blk)) {
> 			size = min(size, (unsigned)PAGE_SIZE);
> 			ptr = empty_zero_page;
> 		}
> 		size = copy_to_iter(ptr, size, to);
> 		if (unlikely(!size)) {
> 			r = -EFAULT;
> 			goto ret_r;
> 		}
>
> 		pos += size;
> 		total += size;
> 	}
>
> 	r = 0;
>
> ret_r:
> 	*ppos = pos;
>
> 	if (file)
> 		file_accessed(file);
>
> 	return total ? total : r;
> }
>
> and use that instead of your nvfs_rw_iter_locked() in your
> ->read_iter() for DAX read case? Then the same with
> s/copy_to_iter/_copy_to_iter/, to see how much of that is
> "hardening" overhead.
>
> Incidentally, what's the point of sharing nvfs_rw_iter() for
> read and write cases? They have practically no overlap -
> count the lines common for wr and !wr cases. And if you
> do the same in nvfs_rw_iter_locked(), you'll see that the
> shared parts _there_ are bloody pointless on the read side.

That's a good point. I split nvfs_rw_iter into the separate functions
nvfs_read_iter and nvfs_write_iter, and inlined nvfs_rw_iter_locked
into both of them. It improved performance by 1.3%.

> Not that it had been more useful on the write side, really,
> but that's another story (nvfs_write_pages() handling of
> copyin is... interesting). Let's figure out what's going
> on with the read overhead first...
>
> lib/iov_iter.c primitives certainly could use massage for
> better code generation, but let's find out how much of the
> PITA is due to those and how much comes from you fighting
> the damn thing instead of using it sanely...

The results are:

read:                                       6.744s
read_iter:                                  7.417s
read_iter - separate read and write path:   7.321s
Al's read_iter:                             7.182s
Al's read_iter with _copy_to_iter:          7.181s

Mikulas
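
PS: The benchmark source isn't included in this thread, so the following
is only a guess at its shape - a minimal userspace loop that pread()s the
same 4k page over and over. The file path and iteration count are made up
for illustration:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		long i;
		/* hypothetical mount point and file */
		int fd = open("/mnt/nvfs/testfile", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* reread the same page so the block lookup and the data
		   stay hot; what's left is mostly per-call overhead */
		for (i = 0; i < 10000000; i++) {
			if (pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
				perror("pread");
				return 1;
			}
		}
		close(fd);
		return 0;
	}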
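
The s/copy_to_iter/_copy_to_iter/ experiment Al suggests amounts to a
one-line change in the copy loop above; _copy_to_iter() is the
lib/iov_iter.c variant that skips the hardened-usercopy checks:

	/* same copy as above, minus the check_copy_size() /
	   check_object_size() validation, so the delta between the
	   two runs approximates the CONFIG_HARDENED_USERCOPY cost */
	size = _copy_to_iter(ptr, size, to);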
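
And the read/write split described above might look roughly like the
sketch below. The *_locked() helper names and the shared-vs-exclusive
inode_lock pairing are assumptions for illustration, not NVFS's actual
code:

	static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t r;

		/* read-only path: no write-side bookkeeping left in it */
		inode_lock_shared(inode);
		r = nvfs_read_iter_locked(iocb->ki_filp, to, &iocb->ki_pos);
		inode_unlock_shared(inode);
		return r;
	}

	static ssize_t nvfs_write_iter(struct kiocb *iocb, struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t r;

		/* write path keeps its own specialized locked helper */
		inode_lock(inode);
		r = nvfs_write_iter_locked(iocb->ki_filp, from, &iocb->ki_pos);
		inode_unlock(inode);
		return r;
	}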