Date: Fri, 31 May 2019 18:02:13 +0200
From: Roman Penyaev
To: Jens Axboe
Cc: Azat Khuzhin, Andrew Morton, Al Viro, Linus Torvalds, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/13] epoll: support pollable epoll from userspace
References: <20190516085810.31077-1-rpenyaev@suse.de>
Message-ID: <1d47ee76735f25ae5e91e691195f7aa5@suse.de>

On 2019-05-31 16:48, Jens Axboe wrote:
> On 5/16/19 2:57 AM, Roman Penyaev wrote:
>> Hi all,
>>
>> This is v3 which introduces pollable epoll from userspace.
>>
>> v3:
>> - Measurements made, represented below.
>>
>> - Fix alignment for epoll_uitem structure on all 64-bit archs except
>>   x86-64. epoll_uitem should always be 16 bytes, proper BUILD_BUG_ON
>>   is added. (Linus)
>>
>> - Check pollflags explicitly on 0 inside work callback, and do nothing
>>   if 0.
>>
>> v2:
>> - No reallocations, the max number of items (thus size of the user ring)
>>   is specified by the caller.
>>
>> - Interface is simplified: -ENOSPC is returned on attempt to add a new
>>   epoll item if the number has reached the max, nothing more.
>>
>> - Alloced pages are accounted using user->locked_vm and limited to
>>   RLIMIT_MEMLOCK value.
>>
>> - EPOLLONESHOT is handled.
>>
>> This series introduces pollable epoll from userspace, i.e. user creates
>> epfd with a new EPOLL_USERPOLL flag, mmaps epoll descriptor, gets header
>> and ring pointers and then consumes ready events from a ring, avoiding
>> epoll_wait() call. When ring is empty, user has to call epoll_wait()
>> in order to wait for new events. epoll_wait() returns -ESTALE if user
>> ring has events in the ring (kind of indication, that user has to consume
>> events from the user ring first, I could not invent anything better than
>> returning -ESTALE).
>>
>> For user header and user ring allocation I used vmalloc_user(). I found
>> that it is much easier to reuse remap_vmalloc_range_partial() instead of
>> dealing with page cache (like aio.c does). What is also nice is that
>> virtual address is properly aligned on SHMLBA, thus there should not be
>> any d-cache aliasing problems on archs with vivt or vipt caches.
>
> Why aren't we just adding support to io_uring for this instead? Then we
> don't need yet another entirely new ring, that is just a little
> different from what we have.
>
> I haven't looked into the details of your implementation, just curious
> if there's anything that makes using io_uring a non-starter for this
> purpose?
AFAICT the main difference is that you do not need to recharge an fd
(submit a new poll request, in io_uring terms): once an fd has been added
to epoll with epoll_ctl(), we get events. When you have thousands of
fds, that should matter.

Another interesting question is how difficult it is to modify existing
event loops in event libraries to support recharging (EPOLLONESHOT in
epoll terms). Maybe Azat, who maintains libevent, can shed light on
this (currently I see that libevent does not support "EPOLLONESHOT"
logic).

--
Roman
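The recharging difference can be illustrated with the existing epoll API alone. The following is a minimal sketch, not code from the patch series: it uses neither the proposed EPOLL_USERPOLL flag nor io_uring, and the helper name count_ready_polls() and its parameters are hypothetical, chosen only for this illustration. It registers the read end of a pipe, makes it readable once, and counts how many non-blocking epoll_wait() calls report the fd as ready, with and without EPOLLONESHOT.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 *
 * Registers the read end of a pipe with epoll, writes one byte so the fd
 * becomes readable, then calls epoll_wait() `polls` times without ever
 * draining the pipe. Returns how many of those calls reported the fd as
 * ready, or -1 on a setup error.
 *
 * extra_flags: 0 for a persistent registration, EPOLLONESHOT for one-shot.
 * rearm:       if non-zero, re-arm the fd with EPOLL_CTL_MOD after each
 *              event (the "recharge" step a one-shot interface needs).
 */
int count_ready_polls(unsigned int extra_flags, int rearm, int polls)
{
    int pipefd[2];
    int epfd = -1;
    int ready = 0;
    int result = -1;
    struct epoll_event ev = { 0 };
    struct epoll_event got;

    assert(polls >= 0);

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto cleanup;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto cleanup;

    /* One byte makes the read end readable; it is never read back. */
    if (write(pipefd[1], "x", 1) != 1)
        goto cleanup;

    for (int i = 0; i < polls; i++) {
        /* Timeout 0: return immediately whether or not an event is ready. */
        int n = epoll_wait(epfd, &got, 1, 0);

        if (n < 0)
            goto cleanup;
        if (n == 1) {
            ready++;
            /* With EPOLLONESHOT the fd is now disabled until re-armed. */
            if (rearm && epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) < 0)
                goto cleanup;
        }
    }
    result = ready;

cleanup:
    if (epfd >= 0)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

With three polls, a persistent level-triggered registration (extra_flags = 0) reports the fd every time. With EPOLLONESHOT and no re-arm it is reported only once. With EPOLLONESHOT and re-arm it is reported every time again, at the cost of one extra epoll_ctl() call per event, which is the per-fd overhead the reply refers to.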