Date: Fri, 12 Jul 2019 03:19:17 +0200
From: Carlo Wood
To: Andy Lutomirski
Cc: wangyun@linux.vnet.ibm.com, palewis@adobe.com, LKML, Linux API,
    Al Viro, Andrew Morton, jbaron@redhat.com, pholland@adobe.com,
    Davide Libenzi, Michael Kerrisk, "Paul E. McKenney", Neal Cardwell,
    carlo@alinoe.com
Subject: Re: Is a new EPOLLCLOSED a solution to the problem that
    EPOLL_CTL_DISABLE tried to solve?
Message-ID: <20190712031917.4eabf240@hikaru>
References: <20190712014223.66326995@hikaru>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andy,

thank you for your quick reply.

On Thu, 11 Jul 2019 17:32:21 -0700
Andy Lutomirski wrote:

> > I propose to add a new EPOLL event EPOLLCLOSED that will cause
> > epoll_wait to return (for that event) whenever a file descriptor is
> > closed.
>
> This totally falls apart if you ever want to add a feature to your
> library to detach the handler for a given fd without closing the fd.

Another way to cause epoll_wait() to wake up for that specific fd is
okay too, of course. For example, since the new event basically means
"resources can now be deleted", the event could be called EPOLLDELETE.
All that is needed is some easy way to trigger this event.

Nevertheless, in the more exceptional case that one wants to destroy
the object/resource that data.ptr points to without closing the fd, it
is probably possible to first dup the fd and then close the old one.
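One caveat on the dup-then-close idea, as far as today's kernel goes:
according to epoll(7), closing a file descriptor removes it from an
interest list only once every descriptor referring to the same open
file description has been closed. So with the current semantics the
registration stays alive while the dup exists, and a close-triggered
event would have to fire on the close of the descriptor rather than of
the description. The following is only a minimal sketch of the current
behavior; the function name is made up for illustration and the
EPOLLCLOSED/EPOLLDELETE event itself does not appear because it does
not exist.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper (not part of any API): returns true if an epoll
// registration made through one descriptor still delivers events after
// that descriptor was closed while a dup() of it is kept open.
// Also returns false if the setup itself fails.
bool registration_survives_close_of_original()
{
  int pipe_fds[2];
  if (pipe(pipe_fds) != 0)
    return false;

  int epoll_fd = epoll_create1(0);
  assert(epoll_fd != -1);

  epoll_event event{};
  event.events = EPOLLIN;
  event.data.fd = pipe_fds[0];
  if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pipe_fds[0], &event) != 0)
    return false;

  int kept_fd = dup(pipe_fds[0]);  // Second descriptor, same open file description.
  close(pipe_fds[0]);              // Close the descriptor that was registered.

  char byte = 'x';
  bool wrote = write(pipe_fds[1], &byte, 1) == 1;  // Make the read end readable.

  epoll_event out{};
  int ready = epoll_wait(epoll_fd, &out, 1, 100);  // Wait at most 100 ms.

  close(kept_fd);
  close(pipe_fds[1]);
  close(epoll_fd);
  return wrote && ready == 1;
}
```

Calling registration_survives_close_of_original() from a small main()
is enough to try this on a given kernel.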
> > The Worker Thread then does not remove an object from the
> > interest list, but either adds (if it was removed before) or
> > modifies the event (using EPOLL_CTL_MOD) to watch that fd
> > for closing, and then closes it.
> >
> > Aka,
> >
> > Working Thread:
> >
> >   epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event);
> >   close(fd);
> >
> > where event contains the new EPOLLCLOSED (compare EPOLLOUT, EPOLLIN
> > etc).
> >
> > This must then guarantee the event EPOLLCLOSED to be reported
> > by exactly one epoll_wait(), the caller thread of which can then
> > proceed with deleting the resources.
> >
> > Note that close(fd) must cause the removal from the interest list
> > of any epoll struct before causing the event - and that the
> > EPOLLCLOSED event may only be reported after all other events
> > for that fd have already been reported (although it would be
> > ok to report them at the same time: simply handle the other
> > events first).
>
> This is a bunch of subtle semantics in the kernel to support your
> particular use case.

My particular use case? How so?

The problem I'm trying to address is the fact that "It is not
currently possible to reliably delete epoll items when using the
same epoll set from multiple threads", end quote of Paton Lewis'
email from 2012.

If there is a simple, generally accepted solution for this in
user space, then of course there is no reason to change the kernel;
but despite all my efforts to search the net for a solution,
all I can find are people with the same question and no good answers.

If there were a way to pass a special event to the thread waiting in
epoll_wait(), telling it that it is now safe to free the memory that
data.ptr points to, then the problem would evaporate into something
trivially simple.

Let's say we would not be using close(2), but instead some
epoll_destruct(epoll_fd, fd).
Then the worker thread, instead of,

  if (last reference to object has gone)
  {
    epoll_ctl(epoll_fd, EPOLL_CTL_DEL, object->fd, NULL);
    delete object;      // Unsafe
  }

could do,

  if (last reference to object has gone)
    epoll_destruct(epoll_fd, object->fd);
    // Or [optionally dup() and] close(object->fd);

Whereas the thread that waits for epoll_fd would take care of the
deletion:

  for (;;)
  {
    int ready = epoll_wait(epoll_fd, s_events, maxevents, -1);
    while (ready > 0)
    {
      epoll_event& event(s_events[--ready]);
      if ((event.events & EPOLLDELETE)) // Or EPOLLCLOSED, or
                                        // whatever the name is.
      {
        // data.ptr is a void*; Object stands for the real type.
        delete static_cast<Object*>(event.data.ptr);
        continue;       // Nothing else to do for this fd.
      }
      // Handle other events.
    }
  }

In this case, if epoll_wait() had returned just prior to the
call to epoll_destruct()/close(), the object will not be deleted.
The returned events would be handled, epoll_wait() reentered, and
only once EPOLLDELETE is returned would the object be deleted.

The bunch of subtle requirements, as you call it, is just about
how to implement this in a way that does what it is supposed
to do, and is in no way specific to my particular use case.

The requirements are, namely:

1. Only one epoll_fd may have added an epoll_event structure with
   a pointer to the resource (if more than one needs to point to it,
   you could add a pointer to a smart pointer to the object to the
   epoll_event structure instead). I'd be surprised if this is not
   ALREADY a requirement for normal implementations.

2. The call to epoll_destruct() (as introduced as an example above)
   must remove the fd from the epoll's interest list. Of course it
   doesn't HAVE to - but then the user MUST call
   epoll_ctl(epoll_fd, EPOLL_CTL_DEL, ...) right before it, so why
   not? The reason I opted for close(2) is that it ALREADY has this
   behavior, hence it seems a good candidate for epoll_destruct.

3. After an EPOLLDELETE (formerly EPOLLCLOSED) has been returned by
   an epoll_wait(), no other events may be returned for that fd.
   This is obvious, and should be easy to implement. I just added it
   for completeness ;).
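Since neither EPOLLDELETE nor epoll_destruct() exists, the loop above
cannot be run as written. For comparison only, here is a rough
user-space emulation with today's API, under the assumption that
exactly ONE thread calls epoll_wait(); it does not cover several
waiting threads. The worker removes the fd with EPOLL_CTL_DEL and then
writes the pointer into a pipe that is itself registered in the epoll
set; the waiter deletes the retired objects only after it has handled
the whole batch of returned events. All names here (Resource, retire,
wait_once, the retire pipe) are made up for this sketch.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <atomic>
#include <cassert>

// Hypothetical per-fd resource that event.data.ptr points to.
// Closing fd is left to the owner to keep the sketch short.
struct Resource
{
  static std::atomic<int> live_count;  // Only here so the sketch can be checked.
  int fd;
  explicit Resource(int fd_) : fd(fd_) { ++live_count; }
  ~Resource() { --live_count; }
};
std::atomic<int> Resource::live_count{0};

// Register the read end of the retire pipe; a null data.ptr marks it.
bool add_retire_pipe(int epoll_fd, int retire_rd)
{
  epoll_event event{};
  event.events = EPOLLIN;
  event.data.ptr = nullptr;
  return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, retire_rd, &event) == 0;
}

// Register a resource; its address travels in data.ptr.
bool add_resource(int epoll_fd, Resource* resource)
{
  epoll_event event{};
  event.events = EPOLLIN;
  event.data.ptr = resource;
  return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, resource->fd, &event) == 0;
}

// Worker side; stands in for the hypothetical epoll_destruct().
// First stop new events for the fd, then hand the pointer to the waiter.
// A pointer-sized write to a pipe is atomic (it is far below PIPE_BUF).
bool retire(int epoll_fd, int retire_wr, Resource* resource)
{
  if (epoll_ctl(epoll_fd, EPOLL_CTL_DEL, resource->fd, nullptr) != 0)
    return false;
  ssize_t written = write(retire_wr, &resource, sizeof resource);
  return written == static_cast<ssize_t>(sizeof resource);
}

// Waiter side: one pass of the event loop.
// Returns the number of resources deleted in this pass.
int wait_once(int epoll_fd, int retire_rd, int timeout_ms)
{
  constexpr int max_events = 16;
  epoll_event events[max_events];
  int ready = epoll_wait(epoll_fd, events, max_events, timeout_ms);

  bool retire_pending = false;
  for (int i = 0; i < ready; ++i)
  {
    if (events[i].data.ptr == nullptr)
      retire_pending = true;  // The retire pipe became readable.
    // else: handle the event for static_cast<Resource*>(events[i].data.ptr).
  }
  if (!retire_pending)
    return 0;

  // The whole batch has been handled, so no event returned by this
  // epoll_wait() call can still refer to a retired resource.
  Resource* retired[max_events];
  ssize_t bytes = read(retire_rd, retired, sizeof retired);
  int count = bytes > 0
      ? static_cast<int>(bytes / static_cast<ssize_t>(sizeof(Resource*)))
      : 0;
  for (int i = 0; i < count; ++i)
    delete retired[i];
  return count;
}
```

A waiter thread would call wait_once() in a loop with a timeout of -1.
Note the costs of this emulation: an extra pipe per epoll set, and a
worker that can block in write() when the pipe is full.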
> But this case is fairly straightforward with the user mode approach --
> for example, add it to the list for all threads calling epoll_wait.
> Or otherwise defer the deletion until all epoll_wait threads have
> woken.

That really seems a hell of a lot more complex (involving mutexes and
updating a queue that might grow to an unknown size, hence possibly
requiring calls to malloc) than my proposed solution, for something
that basically every application that uses epoll needs.

-- 
Carlo Wood