Date: Wed, 13 Mar 2024 08:14:44 -0400
From: Brian Foster
To: Hou Tao
Cc: linux-fsdevel@vger.kernel.org, Miklos Szeredi, Vivek Goyal,
	Stefan Hajnoczi, Bernd Schubert, "Michael S . Tsirkin",
	Matthew Wilcox, Benjamin Coddington, linux-kernel@vger.kernel.org,
	virtualization@lists.linux.dev, houtao1@huawei.com
Subject: Re: [PATCH v2 4/6] virtiofs: support bounce buffer backed by
	scattered pages
References: <20240228144126.2864064-1-houtao@huaweicloud.com>
	<20240228144126.2864064-5-houtao@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Mar 09, 2024 at 12:14:23PM +0800, Hou Tao wrote:
> Hi,
>
> On 2/29/2024 11:01 PM, Brian Foster wrote:
> > On Wed, Feb 28, 2024 at 10:41:24PM +0800, Hou Tao wrote:
> >> From: Hou Tao
> >>
> >> When reading a file kept in virtiofs from kernel (e.g., insmod a kernel
> >> module), if the cache of virtiofs is disabled, the read buffer will be
> >> passed to virtiofs through out_args[0].value instead of pages. Because
> >> virtiofs can't get the pages for the read buffer, virtio_fs_argbuf_new()
> >> will create a bounce buffer for the read buffer by using kmalloc() and
> >> copy the read buffer into the bounce buffer. If the read buffer is large
> >> (e.g., 1MB), the allocation will incur significant stress on the memory
> >> subsystem.
> >>
> >> So instead of allocating the bounce buffer by using kmalloc(), allocate a
> >> bounce buffer which is backed by scattered pages. The original idea was
> >> to use vmap(), but the use of GFP_ATOMIC is not possible for vmap(). To
> >> simplify the copy operations in the bounce buffer, use a bio_vec flex
> >> array to represent the argbuf. Also add an is_flat field in struct
> >> virtio_fs_argbuf to distinguish between kmalloc-ed and scattered bounce
> >> buffers.
> >>
> >> Signed-off-by: Hou Tao
> >> ---
> >>  fs/fuse/virtio_fs.c | 163 ++++++++++++++++++++++++++++++++++++++++----
> >>  1 file changed, 149 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> >> index f10fff7f23a0f..ffea684bd100d 100644
> >> --- a/fs/fuse/virtio_fs.c
> >> +++ b/fs/fuse/virtio_fs.c
> > ...
> >> @@ -408,42 +425,143 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
> >>  	}
> >>  }
> >>
> > ...
> >>  static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
> >>  					      unsigned int offset,
> >>  					      const void *src, unsigned int len)
> >>  {
> >> -	memcpy(argbuf->buf + offset, src, len);
> >> +	struct iov_iter iter;
> >> +	unsigned int copied;
> >> +
> >> +	if (argbuf->is_flat) {
> >> +		memcpy(argbuf->f.buf + offset, src, len);
> >> +		return;
> >> +	}
> >> +
> >> +	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
> >> +		      argbuf->s.nr, argbuf->s.size);
> >> +	iov_iter_advance(&iter, offset);
> >
> > Hi Hou,
> >
> > Just a random comment, but it seems a little inefficient to reinit and
> > readvance the iter like this on every copy/call. It looks like offset is
> > already incremented in the callers of the argbuf copy helpers. Perhaps
> > iov_iter could be lifted into the callers and passed down, or even just
> > include it in the argbuf structure and init it at alloc time?
>
> Sorry for the late reply. Being busy with an off-site workshop these days.
>

No worries.

> I have tried a similar idea before in which the iov_iter was saved directly
> in the argbuf struct, but it didn't work out well. The reason is that for
> copying both in_args and out_args, an iov_iter is needed, but the direction
> is different. Currently a bi-directional iov_iter is not supported, so
> the code has to initialize the iov_iter twice: once for the copy from
> in_args and again for the copy to out_args.
>

Ok, seems reasonable enough.

> For a dio read initiated from the kernel, both its in_numargs and
> out_numargs are 1, so there will be only one iov_iter_advance() in
> virtio_fs_argbuf_copy_to_out_arg() and the offset is 64, so I think the
> overhead will be fine. For a dio write initiated from the kernel, its
> in_numargs is 2 and out_numargs is 1, so there will be two invocations
> of iov_iter_advance(). The first one with offset=64, and the other one
> with offset=round_up_page_size(64 + write_size), so the latter one may
> introduce extra overhead. But compared with the overhead of the data
> copy, I still think the overhead of calling iov_iter_advance() is fine.
>

I'm not claiming the overhead is some practical problem here, but rather
that we shouldn't need to be concerned with it in the first place with
some cleaner code.

It's been a bit since I first looked at this. I was originally wondering
about defining the iov_iter in the caller and passing it down as a param,
but on another look, do the lowest level helpers really need to exist? If
you don't anticipate further users, IMO something like the diff below is
a bit cleaner (compile tested only, but no reinits and less code overall).

But that's just my .02, feel free to use it or not..
Brian

--- 8< ---

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 9ee71051c89f..9de477e9ccd5 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -544,26 +544,6 @@ static unsigned int virtio_fs_argbuf_setup_sg(struct virtio_fs_argbuf *argbuf,
 	return cur - sg;
 }
 
-static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
-					      unsigned int offset,
-					      const void *src, unsigned int len)
-{
-	struct iov_iter iter;
-	unsigned int copied;
-
-	if (argbuf->is_flat) {
-		memcpy(argbuf->f.buf + offset, src, len);
-		return;
-	}
-
-	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
-		      argbuf->s.nr, argbuf->s.size);
-	iov_iter_advance(&iter, offset);
-
-	copied = _copy_to_iter(src, len, &iter);
-	WARN_ON_ONCE(copied != len);
-}
-
 static unsigned int
 virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
 				 const struct fuse_args *args)
@@ -577,26 +557,6 @@ virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
 	return round_up(offset, PAGE_SIZE);
 }
 
-static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
-					     unsigned int offset, void *dst,
-					     unsigned int len)
-{
-	struct iov_iter iter;
-	unsigned int copied;
-
-	if (argbuf->is_flat) {
-		memcpy(dst, argbuf->f.buf + offset, len);
-		return;
-	}
-
-	iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
-		      argbuf->s.nr, argbuf->s.size);
-	iov_iter_advance(&iter, offset);
-
-	copied = _copy_from_iter(dst, len, &iter);
-	WARN_ON_ONCE(copied != len);
-}
-
 /*
  * Returns 1 if queue is full and sender should wait a bit before sending
  * next request, 0 otherwise.
@@ -684,23 +644,41 @@ static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
 static void copy_args_to_argbuf(struct fuse_req *req)
 {
 	struct fuse_args *args = req->args;
+	struct virtio_fs_argbuf *argbuf = req->argbuf;
+	struct iov_iter iter;
+	unsigned int copied;
 	unsigned int offset = 0;
 	unsigned int num_in;
 	unsigned int i;
 
+	if (!argbuf->is_flat) {
+		iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec, argbuf->s.nr,
+			      argbuf->s.size);
+	}
+
 	num_in = args->in_numargs - args->in_pages;
 	for (i = 0; i < num_in; i++) {
-		virtio_fs_argbuf_copy_from_in_arg(req->argbuf, offset,
-						  args->in_args[i].value,
-						  args->in_args[i].size);
-		offset += args->in_args[i].size;
+		const void *src = args->in_args[i].value;
+		unsigned int len = args->in_args[i].size;
+
+		if (argbuf->is_flat) {
+			memcpy(argbuf->f.buf + offset, src, len);
+			offset += len;
+			continue;
+		}
+
+		iov_iter_advance(&iter, len);
+		copied = _copy_to_iter(src, len, &iter);
+		WARN_ON_ONCE(copied != len);
 	}
 }
 
 /* Copy args out of req->argbuf */
 static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 {
-	struct virtio_fs_argbuf *argbuf;
+	struct virtio_fs_argbuf *argbuf = req->argbuf;
+	struct iov_iter iter;
+	unsigned int copied;
 	unsigned int remaining;
 	unsigned int offset;
 	unsigned int num_out;
@@ -711,10 +689,14 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 	if (!num_out)
 		goto out;
 
-	argbuf = req->argbuf;
+	if (!argbuf->is_flat)
+		iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
+			      argbuf->s.nr, argbuf->s.size);
+
 	offset = virtio_fs_argbuf_out_args_offset(argbuf, args);
 	for (i = 0; i < num_out; i++) {
 		unsigned int argsize = args->out_args[i].size;
+		void *dst = args->out_args[i].value;
 
 		if (args->out_argvar &&
 		    i == args->out_numargs - 1 &&
@@ -722,10 +704,14 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 			argsize = remaining;
 		}
 
-		virtio_fs_argbuf_copy_to_out_arg(argbuf, offset,
-						 args->out_args[i].value,
-						 argsize);
-		offset += argsize;
+		if (argbuf->is_flat) {
+			memcpy(dst, argbuf->f.buf + offset, argsize);
+			offset += argsize;
+		} else {
+			iov_iter_advance(&iter, argsize);
+			copied = _copy_from_iter(dst, argsize, &iter);
+			WARN_ON_ONCE(copied != argsize);
+		}
 
 		if (i != args->out_numargs - 1)
 			remaining -= argsize;