Date: Mon, 29 Jan 2024 11:09:23 -0800
From: Joe Damato
To: Willem de Bruijn
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
    chuck.lever@oracle.com, jlayton@kernel.org, linux-api@vger.kernel.org,
    brauner@kernel.org, edumazet@google.com, davem@davemloft.net,
    alexander.duyck@gmail.com, sridhar.samudrala@intel.com, kuba@kernel.org,
    weiwan@google.com, Alexander Viro, Andrew Waterman, Arnd Bergmann,
    Dominik Brodowski, Greg Kroah-Hartman, Jan Kara, Jiri Slaby,
    Jonathan Corbet, Julien Panis, "open list:DOCUMENTATION",
    "(open list:FILESYSTEMS \\(VFS and infrastructure\\))",
    Michael Ellerman, Nathan Lynch, Palmer Dabbelt, Steve French,
    Thomas Huth, Thomas Zimmermann
Subject: Re: [PATCH net-next v3 0/3] Per epoll context busy poll support
Message-ID: <20240129190922.GA1315@fastly.com>
References: <20240125225704.12781-1-jdamato@fastly.com>
    <65b52d6381de7_3a9e0b2943d@willemb.c.googlers.com.notmuch>
In-Reply-To: <65b52d6381de7_3a9e0b2943d@willemb.c.googlers.com.notmuch>

On Sat, Jan 27, 2024 at 11:20:51AM -0500, Willem de Bruijn wrote:
> Joe Damato wrote:
> > Greetings:
> >
> > Welcome to v3. Cover letter updated from v2 to explain why ioctl and
> > adjusted my cc_cmd to try to get the correct people in addition to folks
> > who were added in v1 & v2. Labeled as net-next because it seems networking
> > related to me even though it is fs code.
> >
> > TL;DR This builds on commit bf3b9f6372c4 ("epoll: Add busy poll support to
> > epoll with socket fds.") by allowing user applications to enable
> > epoll-based busy polling and set a busy poll packet budget on a per epoll
> > context basis.
> >
> > This makes epoll-based busy polling much more usable for user
> > applications than the current system-wide sysctl and hardcoded budget.
> >
> > To allow for this, two ioctls have been added for epoll contexts for
> > getting and setting a new struct, struct epoll_params.
> >
> > ioctl was chosen vs a new syscall after reviewing a suggestion by Willem
> > de Bruijn [1]. I am open to using a new syscall instead of an ioctl, but it
> > seemed that:
> >   - Busy poll affects all existing epoll_wait and epoll_pwait variants in
> >     the same way, so new versions of many syscalls might be needed. It
>
> There is no need to support a new feature on legacy calls. Applications have
> to be upgraded to the new ioctl, so they can also be upgraded to the latest
> epoll_wait variant.

Sure, that's a fair point. I think we could probably make reasonable
arguments in both directions about the pros/cons of each approach.
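
To make the "upgrade to the latest epoll_wait variant" point concrete, here
is a rough sketch of what that upgrade could look like for an application
that calls epoll_wait today. It is only an illustration and not part of
the patch series: the helper names ms_to_timespec and wait_via_pwait2 are
made up for this example, and it assumes glibc 2.35 or newer (which
provides the epoll_pwait2 wrapper) running on Linux 5.11 or newer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <time.h>

/* Hypothetical helper (not from the patch series): convert an
 * epoll_wait()-style millisecond timeout, which must be >= 0, into the
 * struct timespec that epoll_pwait2() takes. */
struct timespec ms_to_timespec(int timeout_ms)
{
    struct timespec ts;

    assert(timeout_ms >= 0);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical wrapper: epoll_wait() semantics expressed through
 * epoll_pwait2(). A negative millisecond timeout means "block
 * indefinitely", which epoll_pwait2() spells as a NULL timespec.
 * Passing a NULL sigmask leaves the caller's signal mask untouched. */
int wait_via_pwait2(int epfd, struct epoll_event *events,
                    int maxevents, int timeout_ms)
{
    struct timespec ts;

    if (timeout_ms < 0)
        return epoll_pwait2(epfd, events, maxevents, NULL, NULL);

    ts = ms_to_timespec(timeout_ms);
    return epoll_pwait2(epfd, events, maxevents, &ts, NULL);
}
```

An existing call such as epoll_wait(epfd, events, 64, 100) would become
wait_via_pwait2(epfd, events, 64, 100) under these assumptions; on older
kernels the underlying call fails with ENOSYS, so a real application would
still need a fallback.
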
It's still not clear to me that a new syscall is the best way to go on
this, and IMO it does not offer a clear advantage. I understand that part
of the premise of your argument is that ioctls are not recommended, but in
this particular case it seems like a good use case and there have been
new ioctls added recently (at least according to git log).

This makes me think that while their use is not recommended, they can serve
a purpose in specific use cases. To me, this use case seems very fitting.

More of a joke and I hate to mention this, but this setting is changing how
io is done and it seems fitting that this is done via an ioctl ;)

> epoll_pwait extends epoll_wait with a sigmask.
> epoll_pwait2 extends epoll_pwait with nsec resolution timespec.
> Since they are supersets, nothing is lost by limiting to the most recent API.
>
> In the discussion of epoll_pwait2 the addition of a forward looking flags
> argument was discussed, but eventually dropped. Based on the argument that
> adding a syscall is not a big task and does not warrant preemptive code.
> This decision did receive a suitably snarky comment from Jonathan Corbet [1].
>
> It is definitely more boilerplate, but essentially it is as feasible to add an
> epoll_pwait3 that takes an optional busy poll argument. In which case, I also
> believe that it makes more sense to configure the behavior of the syscall
> directly, than through another syscall and state stored in the kernel.

I definitely hear what you are saying; I think I'm still not convinced, but
I am thinking it through.

In my mind, all of the other busy poll settings are configured by setting
options on the sockets using various SO_* options, which modify some state
in the kernel. The existing system-wide busy poll sysctl also does this. It
feels strange to me to diverge from that pattern just for epoll.

In the case of epoll_pwait2 the addition of a new syscall is an approach
that I think makes a lot of sense.
The new system call is also probably better from an end-user usability
perspective, as well. For busy poll, I don't see a clear reason why a new
system call is better, but maybe I am still missing something.

> I don't think that the usec fine grain busy poll argument is all that useful.
> Documentation always suggests setting it to 50us or 100us, based on limited
> data. Main point is to set it to exceed the round-trip delay of whatever the
> process is trying to wait on. Overestimating is not costly, as the call
> returns as soon as the condition is met. An epoll_pwait3 flag EPOLL_BUSY_POLL
> with default 100us might be sufficient.
>
> [1] https://lwn.net/Articles/837816/

Perhaps I am misunderstanding what you are suggesting, but I am opposed to
hardcoding a value. If it is currently configurable system-wide and via
SO_* options for other forms of busy poll, I think it should similarly be
configurable for epoll busy poll.

I may yet be convinced by the new syscall argument, but I don't think I'd
agree on imposing a default. The value can be modified by other forms of
busy poll and the goals of my changes are to:
  - make epoll-based busy poll per context
  - allow applications to configure (within reason) how epoll-based busy
    poll behaves, like they can do now with the existing SO_* options for
    other busy poll methods.

> >     seems much simpler for users to use the correct
> >     epoll_wait/epoll_pwait for their app and add a call to ioctl to enable
> >     or disable busy poll as needed. This also probably means less work to
> >     get an existing epoll app using busy poll.
>