Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757215Ab2FYVS6 (ORCPT ); Mon, 25 Jun 2012 17:18:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:9633 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756095Ab2FYVS4 (ORCPT ); Mon, 25 Jun 2012 17:18:56 -0400 Date: Mon, 25 Jun 2012 17:18:50 -0400 From: Vivek Goyal To: Josh Hunt Cc: Jens Axboe , linux-kernel@vger.kernel.org, tj@kernel.org Subject: Re: multi-second application stall in open() Message-ID: <20120625211850.GC10916@redhat.com> References: <20120622141211.GB18409@redhat.com> <20120622204207.GA22063@redhat.com> <20120625133047.GA9394@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2803 Lines: 73 On Mon, Jun 25, 2012 at 11:22:32AM -0500, Josh Hunt wrote: > On Mon, Jun 25, 2012 at 8:30 AM, Vivek Goyal wrote: > > On Fri, Jun 22, 2012 at 04:34:07PM -0500, Josh Hunt wrote: > > > > [..] > >> Shouldn't the queue stay on the RR list until it is empty? > > > > This does look odd. cfqq should stay on service tree as long as it has > > requests. > > > > Can you attach the full log again. Also make sure that blktrace is not > > dropping any trace events. > > > > In slice_expire() we check following. > > > > ? ? ? ?if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list)) > > ? ? ? ? ? ? ? ?cfq_del_cfqq_rr(cfqd, cfqq); > > > > So for some reason RB_EMPTY_ROOT() is returning true. But we must have > > added the request and it should not have been empty. > > > > cfq_insert_request() > > ?cfq_add_rq_rb() > > ? ?elv_rb_add() > > > > So may be little more tracing after request addition will help. Just check > > that RB_EMPTY_ROOT() is not true after addition of request and also print > > number of requests queued. > > > > In slice_expired() we can probably put a BUG_ON() which checks following. > > > > BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && (cfqq->queued[0] || cfqq->queued[1])); > > > > Thanks > > Vivek > > Vivek > > First off thanks for all the time you've spent helping me on this :) > > I'm attaching the log. I will add more instrumentation based on your > mail and let you know the results. I suspect that cfq5917 are actually two queues. One sync and other async. In latest kernels, trace messages append "A" or "S" in queue name to represent async/sync queue. Why do I think so. - I see that cfq5917 has dispatched both "W" and "R" requests. "R" don't get queued in sync queues and vice a versa. - Once cfq5917 has preempted cfq5917. A queue can't preempt itself. 8,0 1 296615 4847.030318652 5917 I R 1572951 + 8 ( 2793) [syslogd] 8,0 1 0 4847.030321446 0 m N cfq5917 insert_request 8,0 1 0 4847.030323122 0 m N cfq5917 add_to_rr 8,0 1 0 4847.030325636 0 m N cfq5917 preempt 8,0 1 0 4847.030327033 0 m N cfq5917 slice expired t=1 8,0 1 0 4847.030328710 0 m N cfq5917 resid=39 8,0 1 0 4847.030331783 0 m N cfq5917 sl_used=1 disp=7 charge=1 iops=0 sect=96 Hence I think that it is just a chance that both sync and async queues have got same cfqq->pid. If you just append anohter character to represent sync and async property of queue, that can cofirm it. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/