From: "Adamson, Andy" <William.Adamson@netapp.com>
To: "Adamson, Andy" <William.Adamson@netapp.com>
CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>, Andy Adamson <andros@netapp.com>, "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH Version 1 08/11] SUNRPC: add rpc_drain_queue to empty an rpc_waitq
Date: Mon, 19 Mar 2012 16:44:07 +0000
Message-ID: <69F651CF-0BAF-491F-B557-BC0607356E4F@netapp.com>
References: <1331836850-5195-1-git-send-email-andros@netapp.com> <1331836850-5195-9-git-send-email-andros@netapp.com> <1331856643.24392.3.camel@lade.trondhjem.org> <1331911317.2518.14.camel@lade.trondhjem.org> <8587FE07-0E16-4E0B-BCBE-7B295B89062C@netapp.com>
In-Reply-To: <8587FE07-0E16-4E0B-BCBE-7B295B89062C@netapp.com>

The reason rpc_wake_up() does not wake up all the tasks is that it uses
list_for_each_entry_safe, which assigns the "next" task->u.tk_wait.list
pointer in its for loop without taking into consideration the priority
queue tasks hanging off the task->u.tk_wait.links list (note "list" vs
"links"), which is modified in the for loop body by the call to
rpc_wake_up_task_queue_locked.

Say we have a wait queue with the following: one task on the list (T1)
and two priority tasks on T1's "links" list (T2, T3).

  H -> T1 -> H          the next pointers in the "list" list
                        (T1->u.tk_wait.list) hanging off the list head "H"

  T1 -> T2 -> T3 -> T1  the next pointers in the "links" list
                        (T1->u.tk_wait.links is the list head)

Here is what rpc_wake_up_task_queue_locked does (it calls
__rpc_remove_wait_queue_priority on priority queues):

  H -> T2 -> H
  T2 -> T3 -> T2

with T1 removed.
This is exactly what should happen: T1 is removed, T1's u.tk_wait.links
list is spliced onto T2's u.tk_wait.links, and T2 is moved onto H's list
of u.tk_wait.list tasks.

  #define list_for_each_entry_safe(pos, n, head, member)               \
      for (pos = list_entry((head)->next, typeof(*pos), member),       \
           n = list_entry(pos->member.next, typeof(*pos), member);     \
           &pos->member != (head);                                     \
           pos = n, n = list_entry(n->member.next, typeof(*n), member))

BUT! the for loop in list_for_each_entry_safe(task, next, head,
u.tk_wait.list) does the following on the above example:

Start:
  H -> T1 -> H
  T1 -> T2 -> T3 -> T1

for loop initializer:
  task = list_entry((head)->next, typeof(*task), u.tk_wait.list);
  next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list);
so task = T1, next = H.

for loop test: task != H. This is TRUE, so run the body.

for loop body: call rpc_wake_up_task_queue_locked:
  H -> T2 -> H
  T2 -> T3 -> T2

for loop increment step:
  task = next;
  next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list);
so task = H, next = T2.

for loop test: task != H. This is FALSE, so stop.

Note that only T1 was processed - T2 and T3 are still on the queue. So
list_for_each_entry_safe will not process any u.tk_wait.links tasks,
because the next pointer is assigned prior to the call to
rpc_wake_up_task_queue_locked, so the for loop increment (or
initializer) setting of task = next does NOT pick up the fact that (in
our example) T1 was REPLACED by T2, not just removed.

On the other hand, the rpc_drain_queue() patch uses while (!list_empty())
instead of list_for_each_entry_safe() and happily processes all of the
tasks on the queue:

  H -> T1 -> H   list is not empty, so call rpc_wake_up_task_queue_locked.
  H -> T2 -> H   list is not empty, so call rpc_wake_up_task_queue_locked.
  H -> T3 -> H   list is not empty, so call rpc_wake_up_task_queue_locked.

list is empty.
-->Andy

On Mar 16, 2012, at 11:25 AM, Adamson, Andy wrote:

>
> On Mar 16, 2012, at 11:21 AM, Myklebust, Trond wrote:
>
>> On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
>>> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond wrote:
>>>> On Thu, 2012-03-15 at 14:40 -0400, andros@netapp.com wrote:
>>>>> From: Andy Adamson
>>>>>
>>>>> Signed-off-by: Andy Adamson
>>>>> ---
>>>>>  include/linux/sunrpc/sched.h |    1 +
>>>>>  net/sunrpc/sched.c           |   27 +++++++++++++++++++++++++++
>>>>>  2 files changed, 28 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>>> index dc0c3cc..fce0873 100644
>>>>> --- a/include/linux/sunrpc/sched.h
>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>>>>  void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>>>>                              struct rpc_task *);
>>>>>  void rpc_wake_up(struct rpc_wait_queue *);
>>>>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>>>>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>>>>  struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>>>>                                     bool (*)(struct rpc_task *, void *),
>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>> index 1c570a8..11928ff 100644
>>>>> --- a/net/sunrpc/sched.c
>>>>> +++ b/net/sunrpc/sched.c
>>>>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>>>>  EXPORT_SYMBOL_GPL(rpc_wake_up);
>>>>>
>>>>>  /**
>>>>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>>>>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>>>>> + *
>>>>> + * Grabs queue->lock
>>>>> + */
>>>>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>>>>> +{
>>>>> +	struct rpc_task *task;
>>>>> +	struct list_head *head;
>>>>> +
>>>>> +	spin_lock_bh(&queue->lock);
>>>>> +	head = &queue->tasks[queue->maxpriority];
>>>>> +	for (;;) {
>>>>> +		while (!list_empty(head)) {
>>>>> +			task = list_entry(head->next, struct rpc_task,
>>>>> +					u.tk_wait.list);
>>>>> +			rpc_wake_up_task_queue_locked(queue, task);
>>>>> +		}
>>>>> +		if (head == &queue->tasks[0])
>>>>> +			break;
>>>>> +		head--;
>>>>> +	}
>>>>> +	spin_unlock_bh(&queue->lock);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>>>>> +
>>>>
>>>> Confused... How is this function any different from rpc_wake_up()?
>>>
>>> Because it actually drains the queues where rpc_wake_up does not. See
>>> the attached output where I added the same printks to both
>>> rpc_drain_queue and rpc_wake_up.
>>
>> So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
>> magnitude should have caused a lot of hangs. Can you please look into
>> what is happening.
>
> OK. Didn't know it was a bug - just thought the comment was off...
>
> -->Andy
>
>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
>> NetApp
>> Trond.Myklebust@netapp.com
>> www.netapp.com
>>
>