Date: Fri, 13 Nov 2015 09:15:58 +0000
From: Chris Wilson
To: Jens Axboe
Cc: Daniel Vetter, DRI Development, LKML
Subject: Re: __i915_spin_request() sucks
Message-ID: <20151113091558.GN6247@nuc-i3427.alporthouse.com>
In-Reply-To: <56451812.2050704@kernel.dk>

On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote:
> On 11/12/2015 03:19 PM, Chris Wilson wrote:
> >>>So today, I figured I'd try just killing that spin. If it fails, we'll
> >>>punt to normal completions, so easy change. And wow, MASSIVE difference.
> >>>I can now scroll in chrome and not rage! It's like the laptop is 10x
> >>>faster now.
> >>>
> >>>Ran git blame, and found:
> >>>
> >>>commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
> >>>Author: Chris Wilson
> >>>Date:   Tue Apr 7 16:20:41 2015 +0100
> >>>
> >>>    drm/i915: Optimistically spin for the request completion
> >>>
> >>>and read the commit message. Doesn't sound that impressive, especially
> >>>not for something that screws up interactive performance by a LOT.
> >>>
> >>>What's the deal? Revert?
> >
> >The tests it improved the most were the latency-sensitive ones, and
> >since my Broadwell xps13 behaves itself, I'd like to understand how it
> >results in a loss of interactivity.
> >
> >1. Maybe it is the uninterruptible nature of the polling, making X's
> >SIGIO jerky:
>
> This one still feels bad.
>
> >2. Or maybe it is increased mutex contention:
>
> And so does this one... I had to manually apply hunks 2-3, and after
> doing seat-of-the-pants testing for both variants, I confirmed with
> perf that we're still seeing a ton of time in __i915_wait_request()
> for both of them.
>
> >Or maybe it is an indirect effect, such as power balancing between the
> >CPU and GPU, or just thermal throttling, or it may be the task being
> >penalised for consuming its timeslice (for which any completion polling
> >seems susceptible).
>
> Look, polls in the 1-10ms range are just insane. Either you botched
> the commit message and really meant "~1ms at most", in which case
> I'd suspect you of smoking something good, or you hacked it up wrong
> and used jiffies when you really wanted to be using some other time
> check that really did give you 1us.

What other time check? I was under the impression that setting up an
hrtimer was expensive and jiffies was what we had available.
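Presumably something along these lines is what's meant - bound the spin
with a nanosecond budget read from local_clock(), the cheap ns-resolution
scheduler clock, rather than with jiffies, so the worst case stays in the
microsecond range regardless of HZ and no hrtimer has to be armed. A
sketch only; req_completed() and struct gpu_request are stand-ins for the
driver's own completion check and request type, not the actual i915 code:

#include <linux/sched.h>	/* need_resched(), local_clock() */
#include <asm/processor.h>	/* cpu_relax() */

struct gpu_request;				/* stand-in request type */
bool req_completed(struct gpu_request *req);	/* stand-in completion check */

/*
 * Spin on the completion check for at most budget_ns nanoseconds,
 * bailing out early if the scheduler wants the CPU back.  Returns
 * true if the request completed while spinning, false if the caller
 * should fall back to the irq-driven wait.
 */
static bool spin_for_completion(struct gpu_request *req, u64 budget_ns)
{
	u64 deadline = local_clock() + budget_ns;

	do {
		if (req_completed(req))
			return true;

		if (need_resched())
			break;

		cpu_relax();
	} while (local_clock() < deadline);

	return false;
}

Called with a budget of ~1000ns that spins for roughly a microsecond per
attempt, independent of CONFIG_HZ, which is the bound the commit message
claimed.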
> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.
>
> >>"Limit the spinning to a single jiffie (~1us) at most"
> >>
> >>is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
> >>Even if I had HZ=1000, that'd still be 1ms of spinning. That's
> >>seriously screwed up, guys.
> >
> >That's over and above the termination condition for blk_poll().
>
> ?! And this is related, how? Comparing apples and oranges. One is an
> opt-in test feature for experimentation, the other is unconditionally
> enabled for everyone. I believe the commit even says so. See the
> difference? Would I use busy-loop spinning to wait for rotating
> storage completions, which are in the 1-10ms range? No, because the
> potential wins for spinning are in the usec range.

Equally, I expect the service interval for a batch to be around 2-20us.
There are many workloads that execute 30-50k requests/s, and you can
appreciate that they are sensitive to the latency of setting up an irq
and receiving it - just as leaving that irq enabled can saturate a CPU
simply executing the irq handler.

So what mechanism do you use to guard against either a very long queue
depth or an abnormal request causing msec+ spins?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre