Date: Thu, 12 Jun 2003 03:29:51 +0200
From: Andrea Arcangeli
To: Nick Piggin
Cc: Chris Mason, Marc-Christian Petersen, Jens Axboe, Marcelo Tosatti,
    Georg Nikodym, lkml, Matthias Mueller
Subject: Re: [PATCH] io stalls
Message-ID: <20030612012951.GG1500@dualathlon.random>
In-Reply-To: <3EE7D1AA.30701@cyberone.com.au>

On Thu, Jun 12, 2003 at 11:04:42AM +1000, Nick Piggin wrote:
> Andrea Arcangeli wrote:
> >On Wed, Jun 11, 2003 at 02:27:13PM -0400, Chris Mason wrote:
> >>On Wed, 2003-06-11 at 14:12, Andrea Arcangeli wrote:
> >>>On Wed, Jun 11, 2003 at 01:42:41PM -0400, Chris Mason wrote:
> >>>>+	if (q->rq[rw].count >= q->batch_requests) {
> >>>>+		smp_mb();
> >>>>+		if (waitqueue_active(&q->wait_for_requests[rw]))
> >>>>+			wake_up(&q->wait_for_requests[rw]);
> >>>
> >>>in my tree I also changed this to:
> >>>
> >>>	wake_up_nr(&q->wait_for_requests[rw], q->rq[rw].count);
> >>>
> >>>otherwise only one waiter will eat the requests, while multiple waiters
> >>>can eat requests in parallel instead, because we freed not just one
> >>>request but many of them.
> >>
> >>I tried a few variations of this yesterday and they all led to horrible
> >>latencies, but I couldn't really explain why.  I had a bunch of other
> >
> >the I/O latency in theory shouldn't change, we're not reordering the
> >queue at all, they'll go back to sleep immediately if __get_request
> >returns NULL.
>
> And go to the end of the queue?

they stay in the queue, so they don't go to the end.

but as Chris found, since we have the get_request_wait_wakeup, even
waking free-requests/2 isn't enough: that still generates free-requests
* 1.5 wakeups, and the last free-requests/2 of them (implicitly
generated by get_request_wait_wakeup) become runnable and race with the
other tasks woken later by another request release.

I sort of fixed that by remembering an old suggestion from Andrew:

static void get_request_wait_wakeup(request_queue_t *q, int rw)
{
	/*
	 * avoid losing an unplug if a second __get_request_wait did the
	 * generic_unplug_device while our __get_request_wait was running
	 * w/o the queue_lock held and w/ our request out of the queue.
	 */
	if (waitqueue_active(&q->wait_for_requests))
		run_task_queue(&tq_disk);
}

this avoids get_request_wait_wakeup messing with the wakeups, so we can
do wake_up_nr(rq.count) safely.
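To make the combination concrete, here is a minimal sketch of the
release side with the wake_up_nr() change applied to the hunk quoted
above; the surrounding function body is only assumed context (2.4
blkdev_release_request() plus the per-rw wait queues from Chris' patch),
an illustration rather than a tested diff:

/*
 * Sketch only: release path with wake_up() replaced by wake_up_nr(), so
 * up to rq[rw].count exclusive waiters are woken per batch instead of a
 * single one.  Context assumed from 2.4 ll_rw_blk.c with the per-rw
 * wait queues of the io-stalls patch under discussion.
 */
void blkdev_release_request(struct request *req)
{
	request_queue_t *q = req->q;
	int rw = req->cmd;

	req->rq_status = RQ_INACTIVE;
	req->q = NULL;

	if (q) {
		/* put the request back on the per-rw freelist */
		list_add(&req->queue, &q->rq[rw].free);
		q->rq[rw].count++;

		if (q->rq[rw].count >= q->batch_requests) {
			smp_mb();
			if (waitqueue_active(&q->wait_for_requests[rw]))
				/* one wakeup per free request, not one total */
				wake_up_nr(&q->wait_for_requests[rw],
					   q->rq[rw].count);
		}
	}
}

with get_request_wait_wakeup() reduced to an unplug, these are then the
only wakeups competing for the freed requests.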
then there's the last issue raised by Chris: if requests get released
faster than the woken tasks can run, we can still end up with imperfect
fairness.  My solution to that is to change wake_up to have a
nr_exclusive that doesn't obey the try_to_wake_up retval (a rough sketch
of the idea is appended at the end of this mail).  That should guarantee
exact FIFO, but it's a minor issue because requests shouldn't normally
be released in a systematic flood, so I'm leaving it open for now; the
ones already addressed should be the major issues.

> >>stuff in at the time to try and improve throughput though, so I'll try
> >>it again.
> >>
> >>I think part of the problem is the cascading wakeups from
> >>get_request_wait_wakeup().  So if we wakeup 32 procs they in turn wakeup
> >>another 32, etc.
> >
> >so maybe it's enough to wakeup count / 2 to account for the double
> >wakeup? that will avoid some overscheduling indeed.
>
> Andrea, this isn't needed because when the queue falls below

actually the problem is that it isn't enough, not that it isn't needed.
I had to stop get_request_wait_wakeup from messing with the wakeups, so
now I can go back to doing /2.

> the batch limit, every retiring request will do a wake up and
> another request will be put on (as long as the waitqueue is
> active).

this was my argument for doing /2, but that leads to count * 1.5
wakeups, where the last count/2 race with further wakeups and screw the
FIFO ordering.  As said, that's completely fixed now; the last issue is
the one where requests get released faster than the woken tasks can run,
but it's hopefully a minor one (I'm not going to change how wake_up_nr
works right now, maybe later).

Andrea
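The sketch referred to above: a simplified model of a wake_up variant
whose nr_exclusive budget is consumed per exclusive waiter regardless of
the try_to_wake_up() retval.  The helper name wake_up_nr_strict() is
made up for illustration, the loop is modelled on the 2.4-era
__wake_up_common(), and wait-queue locking is deliberately left out, so
treat it as a sketch of the idea rather than a patch.

/*
 * Illustration only: charge the nr_exclusive budget for every exclusive
 * waiter we pass, even when try_to_wake_up() finds the task already
 * running.  That way at most nr_exclusive waiters from the head of the
 * queue can compete for the freed requests, keeping the sleepers FIFO.
 */
static void wake_up_nr_strict(wait_queue_head_t *q, unsigned int mode,
			      int nr_exclusive)
{
	struct list_head *tmp;

	list_for_each(tmp, &q->task_list) {
		wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
		struct task_struct *p = curr->task;

		if (!(p->state & mode))
			continue;

		/* retval deliberately ignored: the slot is consumed anyway */
		try_to_wake_up(p, 0);

		if ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
}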