Date: Tue, 25 Jun 2019 13:07:02 +0200
From: Roman Penyaev
To: Eric Wong
Cc: Jason Baron, Andrew Morton, Al Viro, Linus Torvalds, Peter Zijlstra,
    Azat Khuzhin, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 00/14] epoll: support pollable epoll from userspace
In-Reply-To: <20190625002456.unhdqihvs5lqcjn6@dcvr>
References: <20190624144151.22688-1-rpenyaev@suse.de> <20190625002456.unhdqihvs5lqcjn6@dcvr>
Message-ID: <1e50e45cfc832320999f21a81790a060@suse.de>

On 2019-06-25 02:24, Eric Wong wrote:
> Roman Penyaev wrote:
>> Hi all,
>
> +cc Jason Baron
>
>> ** Limitations
>
>
>> 4. No support for EPOLLEXCLUSIVE
>>    If device does not pass pollflags to wake_up() there is no way to
>>    call poll() from the context under spinlock, thus special work is
>>    scheduled to offload polling.  In this specific case we can't
>>    support exclusive wakeups, because we do not know actual result
>>    of scheduled work and have to wake up every waiter.
>
> Lacking EPOLLEXCLUSIVE support is probably a showstopper for
> common applications using per-task epoll combined with
> non-blocking accept4() (e.g. nginx).

For the 'accept' case it seems SO_REUSEPORT can be used:

   https://lwn.net/Articles/542629/

Although I've never tried it in an O_NONBLOCK + epoll scenario.

But I've just dived into this add-wait-exclusive logic again, and it
seems possible to support EPOLLEXCLUSIVE by iterating over all "epis"
for a particular fd that has been woken up.  For now I want to leave
it as is, so as not to overcomplicate the code.

> Fwiw, I'm still a weirdo who prefers a dedicated thread doing
> blocking accept4 for distribution between tasks (so epoll never
> sees a listen socket).  But, depending on what runtime/language
> I'm using, I can't always dedicate a blocking thread, so I
> recently started using EPOLLEXCLUSIVE from Perl5 where I
> couldn't rely on threads being available.
>
>
> If I could dedicate time to improving epoll; I'd probably
> add writev() support for batching epoll_ctl modifications
> to reduce syscall traffic, or pick-up the kevent()-like interface
> started long ago:
> https://lore.kernel.org/lkml/1393206162-18151-1-git-send-email-n1ght.4nd.d4y@gmail.com/
> (but I'm not sure I want to increase the size of the syscall table).

There is also the fresh fs/io_uring.c thingy, which supports polling
and batching (among other IO things).  But polling there acts only as
a single-shot, so it might make sense to support event subscription
there instead of resurrecting kevent and co.

--
Roman
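
Illustration of the SO_REUSEPORT suggestion above, as a minimal
userspace sketch. It is not taken from the patch set and assumes Linux
(select.epoll, SO_REUSEPORT). The helper names make_reuseport_listener,
make_worker and accept_ready are hypothetical. Each worker owns a
separate non-blocking listen socket bound to the same port and watches
it with its own epoll instance, so the kernel hands each incoming
connection to one listener and no exclusive wakeup is needed.

```python
import select
import socket


def make_reuseport_listener(port, host="127.0.0.1"):
    """Hypothetical helper: one non-blocking listen socket per worker."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() on every socket that shares the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    s.setblocking(False)
    return s


def make_worker(port):
    """Hypothetical helper: a listener plus a private epoll instance."""
    listener = make_reuseport_listener(port)
    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)
    return listener, ep


def accept_ready(listener, ep, timeout=1.0):
    """Wait up to `timeout` seconds, then drain the accept queue."""
    accepted = []
    for fd, events in ep.poll(timeout):
        if fd == listener.fileno() and events & select.EPOLLIN:
            while True:
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    break  # queue is empty for now
                accepted.append(conn)
    return accepted
```

In a real server each worker would be a separate task or process
calling accept_ready() in a loop. Whether this distributes load as well
as EPOLLEXCLUSIVE on a shared listen socket depends on the workload.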