Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp3496386ybt; Sat, 4 Jul 2020 20:13:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwC4xJUjV3JUYUmKlPucWLEpyb/noYQ+EgAa4RfVkDgsOlMcud29dti0E0jU+jDugi6asrg X-Received: by 2002:a50:afe1:: with SMTP id h88mr48510126edd.295.1593918814760; Sat, 04 Jul 2020 20:13:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593918814; cv=none; d=google.com; s=arc-20160816; b=v4+fV/jZjEAbJZc5qzuE5E6xR2HyKcP83fd6DjReNHzjaL92XBDQzwct0o4Is3qNjU GMQ3RtJImg/x10W7qtiuUsMR5mfaP9l2VKbPu8NeabQh5oPhWdf17pCdZEZuZaaLjAJk 6nJEYBVPgoOtqdsedqMvKvmYGUaJWbIHmH7XgOukkPdMw5sF/roIk6MbNzeXUZauHgrr idLIpk5+DV/HNGafpnsIOuG2KMczMz/wzCnfInWaucOTxoO+1NUJIN8f+2970bRSnp6r HW8UDm3I1MPpFppEq9i5CjTlWn1gvGFQK5Ql7aBntYeFKdrDNL0HfHHwaDBNIyTXPpm0 H6Eg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=m+BXN9Ytx6NK3kafN0BfTeKYVvshGw4EyEVUqxlddGg=; b=bLjhEPbl7pa/jkNVfah3Erm1Wweg8bHqi12IM9qjS62n6adxXrwH/jFZ8Pzj7spb+5 ZM6OBtCNGlpCYB40Ca9nL1sTRpPMayoi27XEcGDLBxdJy/qyRW6LksII45cfbfbkQyKN 0BYGSLq4vDCijJZliWo+pM4W2ERJx1erzp+SulxazLbzlQ9fdRNE46VOPmSoq54NfyxJ C1aHQjkpxP/tB1oWAkmGOordzqG3za4DLCdcx3k/T8/hzzdXchWnJmsBs/EXP1f/7EfC KoMGOYLzDNUCnwwOChbZ5NfXStYVX4y9SyUEShqYVD0TGBk9Fk3002itkXkXb5jcDGk1 vkrA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=casper.20170209 header.b="r/5xO+CD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bw19si10513925ejb.729.2020.07.04.20.13.00; Sat, 04 Jul 2020 20:13:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=casper.20170209 header.b="r/5xO+CD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728191AbgGEDMM (ORCPT + 99 others); Sat, 4 Jul 2020 23:12:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728046AbgGEDML (ORCPT ); Sat, 4 Jul 2020 23:12:11 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37E57C061794; Sat, 4 Jul 2020 20:12:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=m+BXN9Ytx6NK3kafN0BfTeKYVvshGw4EyEVUqxlddGg=; b=r/5xO+CDJwnLpUWAykeEJqPcna 7jaY7a5X/ftYwrLjwrKC71NN9CN4BO4JMYWLpqM4kHxl7w+zLD3IY06Tw6pI13SpD5/Qg76yjCfob xjRGi1v0MLZQa7EGddIbtFF/YKkauwo2f5V8nUVrxQYltqaCnQ9JSw3lVg80tn/5oKSAjmBgZ5HXK V/Q61jQzCSsB1Y78p5qL9s03jNSu5HnCsHJlpIO18yxHnlqBTJBf3vBwFzIquERMcoheWnvisWSGv Bif3ajYNVei5TcRImcla12cETM3F3gVnTCzrLG8BNC4El/vWEm5vJfWlGbJAttDrmU+jGD1aXqd2v iMbyN8XA==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1jrv4a-00021V-DJ; Sun, 05 Jul 2020 03:12:08 +0000 Date: Sun, 5 Jul 2020 04:12:08 +0100 From: Matthew Wilcox To: Jan Ziak <0xe2.0x9a.0x9b@gmail.com> Cc: gregkh@linuxfoundation.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-man@vger.kernel.org, mtk.manpages@gmail.com, shuah@kernel.org, viro@zeniv.linux.org.uk Subject: Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster Message-ID: <20200705031208.GS25523@casper.infradead.org> References: <20200705021631.GR25523@casper.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jul 05, 2020 at 04:46:04AM +0200, Jan Ziak wrote: > On Sun, Jul 5, 2020 at 4:16 AM Matthew Wilcox wrote: > > > > On Sun, Jul 05, 2020 at 04:06:22AM +0200, Jan Ziak wrote: > > > Hello > > > > > > At first, I thought that the proposed system call is capable of > > > reading *multiple* small files using a single system call - which > > > would help increase HDD/SSD queue utilization and increase IOPS (I/O > > > operations per second) - but that isn't the case and the proposed > > > system call can read just a single file. > > > > > > Without the ability to read multiple small files using a single system > > > call, it is impossible to increase IOPS (unless an application is > > > using multiple reader threads or somehow instructs the kernel to > > > prefetch multiple files into memory). > > > > What API would you use for this? > > > > ssize_t readfiles(int dfd, char **files, void **bufs, size_t *lens); > > > > I pretty much hate this interface, so I hope you have something better > > in mind. > > I am proposing the following: > > struct readfile_t { > int dirfd; > const char *pathname; > void *buf; > size_t count; > int flags; > ssize_t retval; // set by kernel > int reserved; // not used by kernel > }; > > int readfiles(struct readfile_t *requests, size_t count); > > Returns zero if all requests succeeded, otherwise the returned value > is non-zero (glibc wrapper: -1) and user-space is expected to check > which requests have succeeded and which have failed. retval in > readfile_t is set to what the single-file readfile syscall would > return if it was called with the contents of the corresponding > readfile_t struct. You should probably take a look at io_uring. That has the level of complexity of this proposal and supports open/read/close along with many other opcodes.