Date: Thu, 18 Feb 2021 14:26:40 +0200
From: Lennert Buytenhek
To: Jens Axboe, Al Viro, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, io-uring@vger.kernel.org
Cc: David Laight, Matthew Wilcox
Subject: [PATCH v3 0/2] io_uring: add support for IORING_OP_GETDENTS
Message-ID: <20210218122640.GA334506@wantstofly.org>

These patches add support for IORING_OP_GETDENTS, a new io_uring opcode
that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET) followed
by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).

A dumb test program for IORING_OP_GETDENTS is available here:

	https://krautbox.wantstofly.org/~buytenh/uringfind-v2.c

This test program does roughly what find(1) does: it recursively scans
a directory tree and prints the names of all directories and files it
encounters along the way, but it does so using io_uring.
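For reference, the synchronous sequence that a single IORING_OP_GETDENTS request stands in for can be sketched in plain userspace C. This is an illustration under stated assumptions, not code from the patch: getdents_at() is a hypothetical helper name, struct linux_dirent64 is declared by hand to mirror the kernel's record layout (glibc does not export it), and the raw syscall(SYS_getdents64, ...) is used to avoid depending on a particular libc wrapper.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout of the records getdents64(2) writes into the buffer; mirrors
 * the kernel's struct linux_dirent64, declared here by hand. */
struct linux_dirent64 {
	uint64_t d_ino;          /* inode number */
	int64_t d_off;           /* offset at which to resume after this entry */
	unsigned short d_reclen; /* total size of this record */
	unsigned char d_type;    /* file type (DT_REG, DT_DIR, ...) */
	char d_name[];           /* NUL-terminated name */
};

/*
 * Hypothetical helper: the synchronous equivalent of one
 * IORING_OP_GETDENTS request with the given fd, off, addr and len.
 * Returns the number of bytes written into buf, 0 at end of
 * directory, or -1 with errno set.
 */
ssize_t getdents_at(int fd, off_t off, void *buf, size_t len)
{
	/* Position the directory stream at the requested offset... */
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return -1;

	/* ...then read as many whole records as fit into buf. */
	return syscall(SYS_getdents64, fd, buf, len);
}
```

The io_uring version differs in that one SQE carries the fd, offset, buffer and length together, so the seek and the read are submitted as a single request instead of two separate system calls.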
The io_uring version of the test program prints the names of the files
and directories it encounters in the order in which the SQEs complete,
which is somewhat nondeterministic and likely to differ between runs.

On a directory tree with 14-odd million files in it, stored on a
six-drive btrfs RAID of spinning disks, find(1) takes:

	# echo 3 > /proc/sys/vm/drop_caches
	# time find /mnt/repo > /dev/null

	real    24m7.815s
	user    0m15.015s
	sys     0m48.340s
	#

And the io_uring version takes:

	# echo 3 > /proc/sys/vm/drop_caches
	# time ./uringfind /mnt/repo > /dev/null

	real    10m29.064s
	user    0m4.347s
	sys     0m1.677s
	#

The fully cached case also shows some speedup. find(1):

	# time find /mnt/repo > /dev/null

	real    0m5.223s
	user    0m1.926s
	sys     0m3.268s
	#

Versus the io_uring version:

	# time ./uringfind /mnt/repo > /dev/null

	real    0m3.604s
	user    0m2.417s
	sys     0m0.793s
	#

That said, the main point of this patch is not to enable lightning-fast
find(1) or du(1), but to complete the set of filesystem I/O primitives
available via io_uring, so that applications can do all of their
filesystem I/O through the same mechanism without having to manually
punt some of that work out to worker threads. Indeed, with this patch,
an object storage backend server that I wrote a while ago can run with
a pure io_uring-based event loop.

Changes since v2 RFC:

- Rebase onto io_uring-2021-02-17 plus a manually applied version of
  the mkdirat patch. The latter is needed because userland (liburing)
  has already merged the opcode for IORING_OP_MKDIRAT (in commit
  "io_uring.h: 5.12 pending kernel sync"), while that opcode is not yet
  in the kernel (as of io_uring-2021-02-17). As a result, this series
  can't be merged until IORING_OP_MKDIRAT is merged.

- Adapt to the changes made in "io_uring: replace force_nonblock with
  flags", which are in io_uring-2021-02-17.

Changes since v1 RFC:

- Drop the trailing '64' from IORING_OP_GETDENTS64 (suggested by
  Matthew Wilcox).
- Instead of requiring that sqe->off be zero, use this field to pass
  in a directory offset to start reading from. For the first
  IORING_OP_GETDENTS call on a directory, this can be set to zero, and
  for subsequent calls, it can be set to the ->d_off field of the last
  struct linux_dirent64 returned by the previous call.

Lennert Buytenhek (2):
  readdir: split the core of getdents64(2) out into vfs_getdents()
  io_uring: add support for IORING_OP_GETDENTS

 fs/io_uring.c                 | 73 ++++++++++++++++++++++++++++++++++++++++++
 fs/readdir.c                  | 25 +++++++++-----
 include/linux/fs.h            |  4 ++
 include/uapi/linux/io_uring.h |  1 +
 4 files changed, 95 insertions(+), 8 deletions(-)
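The sqe->off scheme from the v1 changelog (start at offset 0, then resume from the d_off of the last record returned by the previous call) can be illustrated with a synchronous loop. This is a minimal sketch under stated assumptions, not code from the series: count_entries() is a hypothetical helper, struct linux_dirent64 is declared by hand to mirror the kernel's layout, and each loop iteration does lseek() plus getdents64() to stand in for one IORING_OP_GETDENTS request whose sqe->off is the tracked offset.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the kernel's struct linux_dirent64 (not exported by glibc). */
struct linux_dirent64 {
	uint64_t d_ino;
	int64_t d_off;           /* offset at which to resume after this entry */
	unsigned short d_reclen; /* total size of this record */
	unsigned char d_type;
	char d_name[];
};

/*
 * Hypothetical helper: count the entries of an open directory,
 * including "." and "..", reading at most bufsize bytes per call.
 * The first call uses offset 0; each later call uses the d_off of
 * the last record returned by the previous call. Returns -1 on error.
 */
long count_entries(int fd, size_t bufsize)
{
	char *buf = malloc(bufsize);  /* malloc alignment suits the records */
	off_t off = 0;
	long count = 0;

	if (buf == NULL)
		return -1;

	for (;;) {
		/* Seek to the offset this "request" carries. */
		if (lseek(fd, off, SEEK_SET) == (off_t)-1) {
			count = -1;
			break;
		}

		long n = syscall(SYS_getdents64, fd, buf, bufsize);
		if (n <= 0) {
			if (n < 0)
				count = -1;
			break;  /* n == 0: end of directory */
		}

		/* Walk the variable-length records in this chunk. */
		for (long pos = 0; pos < n; ) {
			struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);

			count++;
			off = d->d_off;  /* after the loop: d_off of the last record */
			pos += d->d_reclen;
		}
	}

	free(buf);
	return count;
}
```

In this synchronous loop the lseek() is redundant, because getdents64() already advances the file position; it is kept only to mirror the explicit per-request offset that the patch passes in sqe->off. A small bufsize such as 256 bytes forces several calls, which exercises the offset handoff between them.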