Date: Thu, 12 Jun 2003 03:29:51 +0200
From: Andrea Arcangeli
To: Nick Piggin
Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti,
    Georg Nikodym, lkml, Matthias Mueller
Subject: Re: [PATCH] io stalls
Message-ID: <20030612012951.GG1500@dualathlon.random>
In-Reply-To: <3EE7D1AA.30701@cyberone.com.au>

On Thu, Jun 12, 2003 at 11:04:42AM +1000, Nick Piggin wrote:
> Andrea Arcangeli wrote:
> >On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote:
> >>On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote:
> >>>On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote:
> >>>>+	if (q->rq[rw].count >= q->batch_requests) {
> >>>>+		smp_mb();
> >>>>+		if (waitqueue_active(&q->wait_for_requests[rw]))
> >>>>+			wake_up(&q->wait_for_requests[rw]);
> >>>
> >>>in my tree I also changed this to:
> >>>
> >>>	wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count);
> >>>
> >>>otherwise only one waiter will eat the requests, while multiple waiters
> >>>can eat requests in parallel instead, because we freed not just one
> >>>request but many of them.
> >>
> >>I tried a few variations of this yesterday and they all led to horrible
> >>latencies, but I couldn't really explain why.  I had a bunch of other
> >
> >the I/O latency in theory shouldn't change, we're not reordering the
> >queue at all, they'll go back to sleep immediately if __get_request
> >returns NULL.
>
> And go to the end of the queue?

they stay in the queue, so they don't go to the end.

but as Chris found, since we have the get_request_wait_wakeup, even
waking free-requests/2 isn't enough: that still generates free-requests
* 1.5 wakeups, and the last free-requests/2 of them (implicitly
generated by get_request_wait_wakeup) become runnable and race with the
other tasks woken later by another request release.

I sort of fixed that by remembering an old suggestion from Andrew:

static void get_request_wait_wakeup(request_queue_t *q, int rw)
{
	/*
	 * avoid losing an unplug if a second __get_request_wait did the
	 * generic_unplug_device while our __get_request_wait was running
	 * w/o the queue_lock held and w/ our request out of the queue.
	 */
	if (waitqueue_active(&q->wait_for_requests))
		run_task_queue(&tq_disk);
}

this avoids get_request_wait_wakeup messing with the wakeups, so we can
do wake_up_nr(rq.count) safely.
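To make the combination concrete, here is a minimal sketch of the
release side with the wake_up_nr() change applied to the hunk quoted
above; the surrounding function body is only assumed context (2.4
blkdev_release_request() plus the per-rw wait queues from Chris' patch),
an illustration rather than a tested diff:

/*
 * Sketch only: release path with wake_up() replaced by wake_up_nr(), so
 * up to rq[rw].count exclusive waiters are woken per batch instead of a
 * single one.  Context assumed from 2.4 ll_rw_blk.c with the per-rw
 * wait queues of the io-stalls patch under discussion.
 */
void blkdev_release_request(struct request *req)
{
	request_queue_t *q = req->q;
	int rw = req->cmd;

	req->rq_status = RQ_INACTIVE;
	req->q = NULL;

	if (q) {
		/* put the request back on the per-rw freelist */
		list_add(&req->queue, &q->rq[rw].free);
		q->rq[rw].count++;

		if (q->rq[rw].count >= q->batch_requests) {
			smp_mb();
			if (waitqueue_active(&q->wait_for_requests[rw]))
				/* one wakeup per free request, not one total */
				wake_up_nr(&q->wait_for_requests[rw],
					   q->rq[rw].count);
		}
	}
}

with get_request_wait_wakeup() reduced to an unplug, these are then the
only wakeups competing for the freed requests.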
then there's the last issue raised by Chris: if requests get released
faster than the woken tasks can run, we can still end up with imperfect
fairness.  My solution to that is to change wake_up to have a
nr_exclusive that doesn't obey the try_to_wake_up retval (a rough sketch
of the idea is appended at the end of this mail).  That should guarantee
exact FIFO, but it's a minor issue because requests shouldn't normally
be released in a systematic flood, so I'm leaving it open for now; the
ones already addressed should be the major issues.

> >>stuff in at the time to try and improve throughput though, so I'll try
> >>it again.
> >>
> >>I think part of the problem is the cascading wakeups from
> >>get_request_wait_wakeup().  So if we wakeup 32 procs they in turn wakeup
> >>another 32, etc.
> >
> >so maybe it's enough to wakeup count / 2 to account for the double
> >wakeup? that will avoid some overscheduling indeed.
>
> Andrea, this isn't needed because when the queue falls below

actually the problem is that it isn't enough, not that it isn't needed.
I had to stop get_request_wait_wakeup from messing with the wakeups, so
now I can go back to doing /2.

> the batch limit, every retiring request will do a wake up and
> another request will be put on (as long as the waitqueue is
> active).

this was my argument for doing /2, but that leads to count * 1.5
wakeups, where the last count/2 race with further wakeups and screw the
FIFO ordering.  As said, that's completely fixed now; the last issue is
the one where requests get released faster than the woken tasks can run,
but it's hopefully a minor one (I'm not going to change how wake_up_nr
works right now, maybe later).

Andrea
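The sketch referred to above: a simplified model of a wake_up variant
whose nr_exclusive budget is consumed per exclusive waiter regardless of
the try_to_wake_up() retval.  The helper name wake_up_nr_strict() is
made up for illustration, the loop is modelled on the 2.4-era
__wake_up_common(), and wait-queue locking is deliberately left out, so
treat it as a sketch of the idea rather than a patch.

/*
 * Illustration only: charge the nr_exclusive budget for every exclusive
 * waiter we pass, even when try_to_wake_up() finds the task already
 * running.  That way at most nr_exclusive waiters from the head of the
 * queue can compete for the freed requests, keeping the sleepers FIFO.
 */
static void wake_up_nr_strict(wait_queue_head_t *q, unsigned int mode,
			      int nr_exclusive)
{
	struct list_head *tmp;

	list_for_each(tmp, &q->task_list) {
		wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
		struct task_struct *p = curr->task;

		if (!(p->state & mode))
			continue;

		/* retval deliberately ignored: the slot is consumed anyway */
		try_to_wake_up(p, 0);

		if ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
}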