Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762174Ab2KASsB (ORCPT ); Thu, 1 Nov 2012 14:48:01 -0400 Received: from exprod6og114.obsmtp.com ([64.18.1.33]:48802 "EHLO exprod6og114.obsmtp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750796Ab2KASr6 (ORCPT ); Thu, 1 Nov 2012 14:47:58 -0400 Message-ID: <5092C3B9.2070909@adobe.com> Date: Thu, 1 Nov 2012 11:47:21 -0700 From: "Paton J. Lewis" User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:15.0) Gecko/20120824 Thunderbird/15.0 MIME-Version: 1.0 To: Michael Wang CC: Alexander Viro , Andrew Morton , Jason Baron , "linux-fsdevel@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Paul Holland , Davide Libenzi , Michael Kerrisk , "libc-alpha@sourceware.org" , "linux-api@vger.kernel.org" , "paulmck@linux.vnet.ibm.com" Subject: Re: [PATCH v3] epoll: Support for disabling items, and a self-test app. References: <1351210112-23238-1-git-send-email-palewis@adobe.com> <5090C5F6.6000603@linux.vnet.ibm.com> <5091749B.4060605@adobe.com> <5091C5C2.8060009@linux.vnet.ibm.com> In-Reply-To: <5091C5C2.8060009@linux.vnet.ibm.com> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 28648 Lines: 754 On 10/31/12 5:43 PM, Michael Wang wrote: > On 11/01/2012 02:57 AM, Paton J. Lewis wrote: >> On 10/30/12 11:32 PM, Michael Wang wrote: >>> On 10/26/2012 08:08 AM, Paton J. Lewis wrote: >>>> From: "Paton J. Lewis" >>>> >>>> It is not currently possible to reliably delete epoll items when >>>> using the >>>> same epoll set from multiple threads. After calling epoll_ctl with >>>> EPOLL_CTL_DEL, another thread might still be executing code related >>>> to an >>>> event for that epoll item (in response to epoll_wait). Therefore the >>>> deleting >>>> thread does not know when it is safe to delete resources pertaining >>>> to the >>>> associated epoll item because another thread might be using those >>>> resources. >>>> >>>> The deleting thread could wait an arbitrary amount of time after calling >>>> epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is >>>> inefficient and could result in the destruction of resources before >>>> another >>>> thread is done handling an event returned by epoll_wait. >>>> >>>> This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which >>>> disables an >>>> epoll item. If epoll_ctl returns -EBUSY in this case, then another >>>> thread may >>>> handling a return from epoll_wait for this item. Otherwise if epoll_ctl >>>> returns 0, then it is safe to delete the epoll item. This allows >>>> multiple >>>> threads to use a mutex to determine when it is safe to delete an >>>> epoll item >>>> and its associated resources, which allows epoll items to be deleted >>>> both >>>> efficiently and without error in a multi-threaded environment. Note that >>>> EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT, >>>> and using >>>> EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL. >>>> >>>> This patch also adds a new test_epoll self-test program to both >>>> demonstrate >>>> the need for this feature and test it. >>> >>> Hi, Paton >>> >>> I'm just think about may be we could use this way. >>> >>> Seems like currently we are depending on the epoll_ctl() to indicate the >>> start point of safe section and epoll_wait() for the end point, like: >>> >>> while () { >>> epoll_wait() -------------- >>> >>> fd event arrived safe section >>> >>> clear fd epi->event.events >>> -------------- >>> if (fd need stop) >>> continue; >>> -------------- >>> ...fd data process... >>> >>> epoll_ctl(MOD) danger section >>> >>> set fd epi->event.events -------------- >>> >>> continue; >>> } >>> >>> So we got a safe section and do delete work in this section won't cause >>> trouble since we have a stop check directly after it. >>> >>> Actually what we want is to make sure no one will touch the fd any more >>> after we DISABLE it. >>> >>> Then what about we add a ref count and a stop flag in epi, maintain it >>> like: >>> >>> epoll_wait() >>> >>> check user events and >>> dec the ref count of fd --------------------------- >>> >>> ... >>> >>> fd event arrived safe sec if ref count is 0 >>> >>> if epi stop flag set >>> do nothing >>> else >>> inc epi ref count --------------------------- >> >> The pseudecode you provide below (for "DISABLE") seems to indicate that >> this "epi ref count" must be maintained by the kernel. Therefore any >> userspace modification of a ref count associated with an epoll item will >> require a new or changed kernel API. >> >>> send event >>> >>> And what DISABLE do is: >>> >>> set epi stop flag >>> >>> if epi ref count is not 0 >>> wait until ref count be 0 >> >> Perhaps I don't fully understand what you're proposing, but I don't >> think it's reasonable for a kernel API (epoll_ctl in this case) to block >> while waiting for a userspace action (decrementing the ref count) that >> might never occur. >> >> Andrew Morton also proposed using ref counting in response to my initial >> patch submission; my reply to his proposal might also be applicable to >> your proposal. A link to that discussion thread: >> http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096 >> >> Sorry if I am misunderstanding your proposal, but I don't see how it >> solves the original problem. > > I just try to find out whether we could using DISABLE with out ONESHOT :) > > My currently understanding is: > > 1. we actually want to determine the part between each epoll_wait() in a > while(). > > 2. we can't count on epoll_wait() itself, since no info pass to kernel > to indicate whether it was invoked after another epoll_wait() in the > same while(). > > 3. so we need epoll_ctl(MOD) to tell kernel: user finished process data > after epoll_wait(), and those data belong to which epi. > > 4. since 3 we need ONESHOT to be enabled. > > > Is this the reason we rely on ONESHOT to be enabled? No, we need to use EPOLLONESHOT to ensure that only one worker thread at a time can ever be handling private data associated with a given fd. This constraint allows a deletion thread to coordinate with that worker thread via a mutex to cleanly and quickly delete the associated private data (provided that the deletion thread knows whether or not such a worker thread exists in that state, which the question that EPOLL_CTL_DISABLE answers). Pat > If it is, then if we could do some change to make epoll_wait() have the > ability: > > 1. indicate whether it is invoked after another epoll_wait() in same while() > > 2. indicate which epi has been fired by last epoll_wait() > > 3. no changes on the api for user > > Then we could mark the safe section inside epoll_wait(), and we don't > need MOD or ONESHOT any more, is that correct? > > > Regards, > Michael Wang > >> >> Pat >> >>> So after DISABLE return, we can safely delete any thing related to >>> that epi. >>> >>> One thing is that the user should not change the events info returned by >>> epoll_wait(). >>> >>> It's just a propose, but if it works, there will be no limit on ONESHOT >>> any more ;-) >>> >>> Regards, >>> Michael Wang >>> >>>> >>>> Signed-off-by: Paton J. Lewis >>>> --- >>>> fs/eventpoll.c | 40 ++- >>>> include/linux/eventpoll.h | 1 + >>>> tools/testing/selftests/Makefile | 2 +- >>>> tools/testing/selftests/epoll/Makefile | 11 + >>>> tools/testing/selftests/epoll/test_epoll.c | 364 >>>> ++++++++++++++++++++++++++++ >>>> 5 files changed, 414 insertions(+), 4 deletions(-) >>>> create mode 100644 tools/testing/selftests/epoll/Makefile >>>> create mode 100644 tools/testing/selftests/epoll/test_epoll.c >>>> >>>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c >>>> index 739b098..c718afd 100644 >>>> --- a/fs/eventpoll.c >>>> +++ b/fs/eventpoll.c >>>> @@ -339,7 +339,7 @@ static inline struct epitem >>>> *ep_item_from_epqueue(poll_table *p) >>>> /* Tells if the epoll_ctl(2) operation needs an event copy from >>>> userspace */ >>>> static inline int ep_op_has_event(int op) >>>> { >>>> - return op != EPOLL_CTL_DEL; >>>> + return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD; >>>> } >>>> >>>> /* Initialize the poll safe wake up structure */ >>>> @@ -664,6 +664,36 @@ static int ep_remove(struct eventpoll *ep, >>>> struct epitem *epi) >>>> return 0; >>>> } >>>> >>>> +/* >>>> + * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY >>>> if the item >>>> + * had no event flags set, indicating that another thread may be >>>> currently >>>> + * handling that item's events (in the case that EPOLLONESHOT was being >>>> + * used). Otherwise a zero result indicates that the item has been >>>> disabled >>>> + * from receiving events. A disabled item may be re-enabled via >>>> + * EPOLL_CTL_MOD. Must be called with "mtx" held. >>>> + */ >>>> +static int ep_disable(struct eventpoll *ep, struct epitem *epi) >>>> +{ >>>> + int result = 0; >>>> + unsigned long flags; >>>> + >>>> + spin_lock_irqsave(&ep->lock, flags); >>>> + if (epi->event.events & EPOLLONESHOT) { >>>> + if (epi->event.events & ~EP_PRIVATE_BITS) { >>>> + if (ep_is_linked(&epi->rdllink)) >>>> + list_del_init(&epi->rdllink); >>>> + /* Ensure ep_poll_callback will not add epi >>>> back onto >>>> + ready list: */ >>>> + epi->event.events &= EP_PRIVATE_BITS; >>>> + } else >>>> + result = -EBUSY; >>>> + } else >>>> + result = -EINVAL; >>>> + spin_unlock_irqrestore(&ep->lock, flags); >>>> + >>>> + return result; >>>> +} >>>> + >>>> static void ep_free(struct eventpoll *ep) >>>> { >>>> struct rb_node *rbp; >>>> @@ -996,8 +1026,6 @@ static void ep_rbtree_insert(struct eventpoll >>>> *ep, struct epitem *epi) >>>> rb_insert_color(&epi->rbn, &ep->rbr); >>>> } >>>> >>>> - >>>> - >>>> #define PATH_ARR_SIZE 5 >>>> /* >>>> * These are the number paths of length 1 to 5, that we are >>>> allowing to emanate >>>> @@ -1701,6 +1729,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, >>>> int, fd, >>>> } else >>>> error = -ENOENT; >>>> break; >>>> + case EPOLL_CTL_DISABLE: >>>> + if (epi) >>>> + error = ep_disable(ep, epi); >>>> + else >>>> + error = -ENOENT; >>>> + break; >>>> } >>>> mutex_unlock(&ep->mtx); >>>> >>>> diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h >>>> index 657ab55..e91f7e3 100644 >>>> --- a/include/linux/eventpoll.h >>>> +++ b/include/linux/eventpoll.h >>>> @@ -25,6 +25,7 @@ >>>> #define EPOLL_CTL_ADD 1 >>>> #define EPOLL_CTL_DEL 2 >>>> #define EPOLL_CTL_MOD 3 >>>> +#define EPOLL_CTL_DISABLE 4 >>>> >>>> /* Set the One Shot behaviour for the target file descriptor */ >>>> #define EPOLLONESHOT (1 << 30) >>>> diff --git a/tools/testing/selftests/Makefile >>>> b/tools/testing/selftests/Makefile >>>> index 28bc57e..4cf477f 100644 >>>> --- a/tools/testing/selftests/Makefile >>>> +++ b/tools/testing/selftests/Makefile >>>> @@ -1,4 +1,4 @@ >>>> -TARGETS = breakpoints vm >>>> +TARGETS = breakpoints epoll vm >>>> >>>> all: >>>> for TARGET in $(TARGETS); do \ >>>> diff --git a/tools/testing/selftests/epoll/Makefile >>>> b/tools/testing/selftests/epoll/Makefile >>>> new file mode 100644 >>>> index 0000000..19806ed >>>> --- /dev/null >>>> +++ b/tools/testing/selftests/epoll/Makefile >>>> @@ -0,0 +1,11 @@ >>>> +# Makefile for epoll selftests >>>> + >>>> +all: test_epoll >>>> +%: %.c >>>> + gcc -pthread -g -o $@ $^ >>>> + >>>> +run_tests: all >>>> + ./test_epoll >>>> + >>>> +clean: >>>> + $(RM) test_epoll >>>> diff --git a/tools/testing/selftests/epoll/test_epoll.c >>>> b/tools/testing/selftests/epoll/test_epoll.c >>>> new file mode 100644 >>>> index 0000000..54284eb >>>> --- /dev/null >>>> +++ b/tools/testing/selftests/epoll/test_epoll.c >>>> @@ -0,0 +1,364 @@ >>>> +/* >>>> + * tools/testing/selftests/epoll/test_epoll.c >>>> + * >>>> + * Copyright 2012 Adobe Systems Incorporated >>>> + * >>>> + * This program is free software; you can redistribute it and/or >>>> modify >>>> + * it under the terms of the GNU General Public License as >>>> published by >>>> + * the Free Software Foundation; either version 2 of the License, or >>>> + * (at your option) any later version. >>>> + * >>>> + * Paton J. Lewis >>>> + * >>>> + */ >>>> + >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> +#include >>>> + >>>> +/* >>>> + * A pointer to an epoll_item_private structure will be stored in >>>> the epoll >>>> + * item's event structure so that we can get access to the >>>> epoll_item_private >>>> + * data after calling epoll_wait: >>>> + */ >>>> +struct epoll_item_private { >>>> + int index; /* Position of this struct within the epoll_items >>>> array. */ >>>> + int fd; >>>> + uint32_t events; >>>> + pthread_mutex_t mutex; /* Guards the following variables... */ >>>> + int stop; >>>> + int status; /* Stores any error encountered while handling >>>> item. */ >>>> + /* The following variable allows us to test whether we have >>>> encountered >>>> + a problem while attempting to cancel and delete the associated >>>> + event. When the test program exits, 'deleted' should be exactly >>>> + one. If it is greater than one, then the failed test >>>> reflects a real >>>> + world situation where we would have tried to access the >>>> epoll item's >>>> + private data after deleting it: */ >>>> + int deleted; >>>> +}; >>>> + >>>> +struct epoll_item_private *epoll_items; >>>> + >>>> +/* >>>> + * Delete the specified item from the epoll set. In a real-world >>>> secneario this >>>> + * is where we would free the associated data structure, but in this >>>> testing >>>> + * environment we retain the structure so that we can test for >>>> double-deletion: >>>> + */ >>>> +void delete_item(int index) >>>> +{ >>>> + __sync_fetch_and_add(&epoll_items[index].deleted, 1); >>>> +} >>>> + >>>> +/* >>>> + * A pointer to a read_thread_data structure will be passed as the >>>> argument to >>>> + * each read thread: >>>> + */ >>>> +struct read_thread_data { >>>> + int stop; >>>> + int status; /* Indicates any error encountered by the read >>>> thread. */ >>>> + int epoll_set; >>>> +}; >>>> + >>>> +/* >>>> + * The function executed by the read threads: >>>> + */ >>>> +void *read_thread_function(void *function_data) >>>> +{ >>>> + struct read_thread_data *thread_data = >>>> + (struct read_thread_data *)function_data; >>>> + struct epoll_event event_data; >>>> + struct epoll_item_private *item_data; >>>> + char socket_data; >>>> + >>>> + /* Handle events until we encounter an error or this thread's >>>> 'stop' >>>> + condition is set: */ >>>> + while (1) { >>>> + int result = epoll_wait(thread_data->epoll_set, >>>> + &event_data, >>>> + 1, /* Number of desired >>>> events */ >>>> + 1000); /* Timeout in ms */ >>>> + if (result < 0) { >>>> + /* Breakpoints signal all threads. Ignore that >>>> while >>>> + debugging: */ >>>> + if (errno == EINTR) >>>> + continue; >>>> + thread_data->status = errno; >>>> + return 0; >>>> + } else if (thread_data->stop) >>>> + return 0; >>>> + else if (result == 0) /* Timeout */ >>>> + continue; >>>> + >>>> + /* We need the mutex here because checking for the stop >>>> + condition and re-enabling the epoll item need to be >>>> done >>>> + together as one atomic operation when >>>> EPOLL_CTL_DISABLE is >>>> + available: */ >>>> + item_data = (struct epoll_item_private >>>> *)event_data.data.ptr; >>>> + pthread_mutex_lock(&item_data->mutex); >>>> + >>>> + /* Remove the item from the epoll set if we want to stop >>>> + handling that event: */ >>>> + if (item_data->stop) >>>> + delete_item(item_data->index); >>>> + else { >>>> + /* Clear the data that was written to the other >>>> end of >>>> + our non-blocking socket: */ >>>> + do { >>>> + if (read(item_data->fd, &socket_data, >>>> 1) < 1) { >>>> + if ((errno == EAGAIN) || >>>> + (errno == EWOULDBLOCK)) >>>> + break; >>>> + else >>>> + goto error_unlock; >>>> + } >>>> + } while (item_data->events & EPOLLET); >>>> + >>>> + /* The item was one-shot, so re-enable it: */ >>>> + event_data.events = item_data->events; >>>> + if (epoll_ctl(thread_data->epoll_set, >>>> + EPOLL_CTL_MOD, >>>> + item_data->fd, >>>> + &event_data) < 0) >>>> + goto error_unlock; >>>> + } >>>> + >>>> + pthread_mutex_unlock(&item_data->mutex); >>>> + } >>>> + >>>> +error_unlock: >>>> + thread_data->status = item_data->status = errno; >>>> + pthread_mutex_unlock(&item_data->mutex); >>>> + return 0; >>>> +} >>>> + >>>> +/* >>>> + * A pointer to a write_thread_data structure will be passed as the >>>> argument to >>>> + * the write thread: >>>> + */ >>>> +struct write_thread_data { >>>> + int stop; >>>> + int status; /* Indicates any error encountered by the write >>>> thread. */ >>>> + int n_fds; >>>> + int *fds; >>>> +}; >>>> + >>>> +/* >>>> + * The function executed by the write thread. It writes a single >>>> byte to each >>>> + * socket in turn until the stop condition for this thread is set. >>>> If writing to >>>> + * a socket would block (i.e. errno was EAGAIN), we leave that >>>> socket alone for >>>> + * the moment and just move on to the next socket in the list. We >>>> don't care >>>> + * about the order in which we deliver events to the epoll set. In >>>> fact we don't >>>> + * care about the data we're writing to the pipes at all; we just >>>> want to >>>> + * trigger epoll events: >>>> + */ >>>> +void *write_thread_function(void *function_data) >>>> +{ >>>> + const char data = 'X'; >>>> + int index; >>>> + struct write_thread_data *thread_data = >>>> + (struct write_thread_data *)function_data; >>>> + while (!thread_data->stop) >>>> + for (index = 0; >>>> + !thread_data->stop && (index < thread_data->n_fds); >>>> + ++index) >>>> + if ((write(thread_data->fds[index], &data, 1) < >>>> 1) && >>>> + (errno != EAGAIN) && >>>> + (errno != EWOULDBLOCK)) { >>>> + thread_data->status = errno; >>>> + return; >>>> + } >>>> +} >>>> + >>>> +/* >>>> + * Arguments are currently ignored: >>>> + */ >>>> +int main(int argc, char **argv) >>>> +{ >>>> + const int n_read_threads = 100; >>>> + const int n_epoll_items = 500; >>>> + int index; >>>> + int epoll_set = epoll_create1(0); >>>> + struct write_thread_data write_thread_data = { >>>> + 0, 0, n_epoll_items, malloc(n_epoll_items * sizeof(int)) >>>> + }; >>>> + struct read_thread_data *read_thread_data = >>>> + malloc(n_read_threads * sizeof(struct read_thread_data)); >>>> + pthread_t *read_threads = malloc(n_read_threads * >>>> sizeof(pthread_t)); >>>> + pthread_t write_thread; >>>> + int socket_pair[2]; >>>> + struct epoll_event event_data; >>>> + >>>> + printf("-----------------\n"); >>>> + printf("Runing test_epoll\n"); >>>> + printf("-----------------\n"); >>>> + >>>> + epoll_items = malloc(n_epoll_items * sizeof(struct >>>> epoll_item_private)); >>>> + >>>> + if (epoll_set < 0 || epoll_items == 0 || write_thread_data.fds >>>> == 0 || >>>> + read_thread_data == 0 || read_threads == 0) >>>> + goto error; >>>> + >>>> + if (sysconf(_SC_NPROCESSORS_ONLN) < 2) { >>>> + printf("Error: please run this test on a multi-core >>>> system.\n"); >>>> + goto error; >>>> + } >>>> + >>>> + /* Create the socket pairs and epoll items: */ >>>> + for (index = 0; index < n_epoll_items; ++index) { >>>> + if (socketpair(AF_UNIX, >>>> + SOCK_STREAM | SOCK_NONBLOCK, >>>> + 0, >>>> + socket_pair) < 0) >>>> + goto error; >>>> + write_thread_data.fds[index] = socket_pair[0]; >>>> + epoll_items[index].index = index; >>>> + epoll_items[index].fd = socket_pair[1]; >>>> + if (pthread_mutex_init(&epoll_items[index].mutex, NULL) >>>> != 0) >>>> + goto error; >>>> + /* We always use EPOLLONESHOT because this test is >>>> currently >>>> + structured to demonstrate the need for >>>> EPOLL_CTL_DISABLE, >>>> + which only produces useful information in the >>>> EPOLLONESHOT >>>> + case (without EPOLLONESHOT, calling epoll_ctl with >>>> + EPOLL_CTL_DISABLE will never return EBUSY). If >>>> support for >>>> + testing events without EPOLLONESHOT is desired, it >>>> should >>>> + probably be implemented in a separate unit test. */ >>>> + epoll_items[index].events = EPOLLIN | EPOLLONESHOT; >>>> + if (index < n_epoll_items / 2) >>>> + epoll_items[index].events |= EPOLLET; >>>> + epoll_items[index].stop = 0; >>>> + epoll_items[index].status = 0; >>>> + epoll_items[index].deleted = 0; >>>> + event_data.events = epoll_items[index].events; >>>> + event_data.data.ptr = &epoll_items[index]; >>>> + if (epoll_ctl(epoll_set, >>>> + EPOLL_CTL_ADD, >>>> + epoll_items[index].fd, >>>> + &event_data) < 0) >>>> + goto error; >>>> + } >>>> + >>>> +#ifdef EPOLL_CTL_DISABLE >>>> + /* Test to make sure that using EPOLL_CTL_DISABLE without >>>> EPOLLONESHOT >>>> + returns a clear error: */ >>>> + if (socketpair(AF_UNIX, >>>> + SOCK_STREAM | SOCK_NONBLOCK, >>>> + 0, >>>> + socket_pair) < 0) >>>> + goto error; >>>> + event_data.events = EPOLLIN; >>>> + event_data.data.ptr = NULL; >>>> + if (epoll_ctl(epoll_set, EPOLL_CTL_ADD, >>>> + socket_pair[1], &event_data) < 0) >>>> + goto error; >>>> + if ((epoll_ctl(epoll_set, EPOLL_CTL_DISABLE, >>>> + socket_pair[1], NULL) == 0) || (errno != EINVAL)) >>>> + goto error; >>>> + if (epoll_ctl(epoll_set, EPOLL_CTL_DEL, socket_pair[1], NULL) >>>> != 0) >>>> + goto error; >>>> +#endif >>>> + >>>> + /* Create and start the read threads: */ >>>> + for (index = 0; index < n_read_threads; ++index) { >>>> + read_thread_data[index].stop = 0; >>>> + read_thread_data[index].status = 0; >>>> + read_thread_data[index].epoll_set = epoll_set; >>>> + if (pthread_create(&read_threads[index], >>>> + NULL, >>>> + read_thread_function, >>>> + &read_thread_data[index]) != 0) >>>> + goto error; >>>> + } >>>> + >>>> + if (pthread_create(&write_thread, >>>> + NULL, >>>> + write_thread_function, >>>> + &write_thread_data) != 0) >>>> + goto error; >>>> + >>>> + /* Cancel all event pollers: */ >>>> +#ifdef EPOLL_CTL_DISABLE >>>> + for (index = 0; index < n_epoll_items; ++index) { >>>> + pthread_mutex_lock(&epoll_items[index].mutex); >>>> + ++epoll_items[index].stop; >>>> + if (epoll_ctl(epoll_set, >>>> + EPOLL_CTL_DISABLE, >>>> + epoll_items[index].fd, >>>> + NULL) == 0) >>>> + delete_item(index); >>>> + else if (errno != EBUSY) { >>>> + pthread_mutex_unlock(&epoll_items[index].mutex); >>>> + goto error; >>>> + } >>>> + /* EBUSY means events were being handled; allow the >>>> other thread >>>> + to delete the item. */ >>>> + pthread_mutex_unlock(&epoll_items[index].mutex); >>>> + } >>>> +#else >>>> + for (index = 0; index < n_epoll_items; ++index) { >>>> + pthread_mutex_lock(&epoll_items[index].mutex); >>>> + ++epoll_items[index].stop; >>>> + pthread_mutex_unlock(&epoll_items[index].mutex); >>>> + /* Wait in case a thread running read_thread_function is >>>> + currently executing code between epoll_wait and >>>> + pthread_mutex_lock with this item. Note that a >>>> longer delay >>>> + would make double-deletion less likely (at the >>>> expense of >>>> + performance), but there is no guarantee that any >>>> delay would >>>> + ever be sufficient. Note also that we delete all event >>>> + pollers at once for testing purposes, but in a >>>> real-world >>>> + environment we are likely to want to be able to >>>> cancel event >>>> + pollers at arbitrary times. Therefore we can't >>>> improve this >>>> + situation by just splitting this loop into two loops >>>> + (i.e. signal 'stop' for all items, sleep, and then >>>> delete all >>>> + items). We also can't fix the problem via EPOLL_CTL_DEL >>>> + because that command can't prevent the case where >>>> some other >>>> + thread is executing read_thread_function within the >>>> region >>>> + mentioned above: */ >>>> + usleep(1); >>>> + pthread_mutex_lock(&epoll_items[index].mutex); >>>> + if (!epoll_items[index].deleted) >>>> + delete_item(index); >>>> + pthread_mutex_unlock(&epoll_items[index].mutex); >>>> + } >>>> +#endif >>>> + >>>> + /* Shut down the read threads: */ >>>> + for (index = 0; index < n_read_threads; ++index) >>>> + __sync_fetch_and_add(&read_thread_data[index].stop, 1); >>>> + for (index = 0; index < n_read_threads; ++index) { >>>> + if (pthread_join(read_threads[index], NULL) != 0) >>>> + goto error; >>>> + if (read_thread_data[index].status) >>>> + goto error; >>>> + } >>>> + >>>> + /* Shut down the write thread: */ >>>> + __sync_fetch_and_add(&write_thread_data.stop, 1); >>>> + if ((pthread_join(write_thread, NULL) != 0) || >>>> write_thread_data.status) >>>> + goto error; >>>> + >>>> + /* Check for final error conditions: */ >>>> + for (index = 0; index < n_epoll_items; ++index) { >>>> + if (epoll_items[index].status != 0) >>>> + goto error; >>>> + if (pthread_mutex_destroy(&epoll_items[index].mutex) < 0) >>>> + goto error; >>>> + } >>>> + for (index = 0; index < n_epoll_items; ++index) >>>> + if (epoll_items[index].deleted != 1) { >>>> + printf("Error: item data deleted %1d times.\n", >>>> + epoll_items[index].deleted); >>>> + goto error; >>>> + } >>>> + >>>> + printf("[PASS]\n"); >>>> + return 0; >>>> + >>>> + error: >>>> + printf("[FAIL]\n"); >>>> + return errno; >>>> +} >>>> >>> >>> . >>> >> > > . > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/