Date: Fri, 13 Nov 2015 08:12:04 -0700
From: Jens Axboe
To: Chris Wilson, Daniel Vetter, DRI Development, LKML
Subject: Re: __i915_spin_request() sucks
Message-ID: <20151113151204.GA8939@kernel.dk>
In-Reply-To: <20151113091558.GN6247@nuc-i3427.alporthouse.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On 11/13/2015 02:15 AM, Chris Wilson wrote:
> On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote:
>> On 11/12/2015 03:19 PM, Chris Wilson wrote:
>>>>> So today, I figured I'd try just killing that spin. If it fails, we'll
>>>>> punt to normal completions, so easy change. And wow, MASSIVE difference.
>>>>> I can now scroll in chrome and not rage! It's like the laptop is 10x
>>>>> faster now.
>>>>>
>>>>> Ran git blame, and found:
>>>>>
>>>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
>>>>> Author: Chris Wilson
>>>>> Date:   Tue Apr 7 16:20:41 2015 +0100
>>>>>
>>>>>     drm/i915: Optimistically spin for the request completion
>>>>>
>>>>> and read the commit message. Doesn't sound that impressive. Especially
>>>>> not for something that screws up interactive performance by a LOT.
>>>>>
>>>>> What's the deal? Revert?
>>>
>>> The tests that it improved the most were the latency sensitive tests, and
>>> since my Broadwell xps13 behaves itself, I'd like to understand how it
>>> culminates in an interactivity loss.
>>>
>>> 1. Maybe it is the uninterruptible nature of the polling, making X's
>>>    SIGIO jerky:
>>
>> This one still feels bad.
>>
>>> 2. Or maybe it is increased mutex contention:
>>
>> And so does this one... I had to manually apply hunks 2-3, and after
>> doing seat-of-the-pants testing for both variants, I confirmed with
>> perf that we're still seeing a ton of time in __i915_wait_request()
>> for both of them.
>>
>>> Or maybe it is an indirect effect, such as power balancing between the
>>> CPU and GPU, or just thermal throttling, or it may be the task being
>>> penalised for consuming its timeslice (for which any completion polling
>>> seems susceptible).
>>
>> Look, polls in the 1-10ms range are just insane. Either you botched
>> the commit message and really meant "~1ms at most", in which case
>> I'd suspect you of smoking something good, or you hacked it up wrong
>> and used jiffies when you really wanted to be using some other time
>> check that really did give you 1us.
>
> What other time check? I was under the impression that setting up a
> hrtimer was expensive and jiffies was available.

Looping for 10ms is a lot more expensive :-). jiffies is always there,
but it's WAY too coarse to be used for something like this. You could
use ktime_get(); there are a lot of helpers for checking time_after,
adding msecs, etc.
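(An illustrative userspace sketch of that deadline pattern, not kernel code: clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), and the completed flag stands in for i915_gem_request_completed(). The point is that the spin budget is expressed in microseconds against a fine-grained clock, so it is bounded no matter what HZ is.)

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; stands in for ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/*
 * Spin on *completed for at most timeout_us microseconds.
 * Returns true if the condition became true before the deadline;
 * false means: give up and fall back to a sleeping (irq) wait.
 */
static bool spin_for_completion(volatile bool *completed, unsigned int timeout_us)
{
	uint64_t end = now_ns() + (uint64_t)timeout_us * 1000;

	while (!*completed) {
		if (now_ns() >= end)
			return false;
	}
	return true;
}
```

With a 1 usec budget the worst case stays in the usec range regardless of HZ, which is the whole point of moving off jiffies.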
Something like the below, not tested here yet.

>> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.
>>
>>>> "Limit the spinning to a single jiffie (~1us) at most"
>>>>
>>>> is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
>>>> Even if I had HZ=1000, that'd still be 1ms of spinning. That's
>>>> seriously screwed up, guys.
>>>
>>> That's over and above the termination condition for blk_poll().
>>
>> ?! And this is related, how? Comparing apples and oranges. One is a
>> test opt-in feature for experimentation, the other is
>> unconditionally enabled for everyone. I believe the commit even says
>> so. See the difference? Would I use busy loop spinning waiting for
>> rotating storage completions, which are in the 1-10ms range? No,
>> because the potential wins for spins are in the usec range.
>
> Equally I expect the service interval for a batch to be around 2-20us.
> There are many workloads that execute 30-50k requests/s, and you can
> appreciate that they are sensitive to the latency in setting up an irq
> and receiving it - just as equally leaving that irq enabled saturates a
> CPU with simply executing the irq handler. So what mechanism do you use
> to guard against either a very long queue depth or an abnormal request
> causing msec+ spins?

Not disputing that polling can work, but it needs to be a bit more
clever. Do you know which requests are fast and which ones are not?
Could you track it? Should we make this a module option?

20 usec is too long to poll. If we look at the wins of polling, we're
talking anywhere from 1-2 usec to maybe 5 usec, depending on different
factors. So spinning between 1-3 usec should be a hard limit on most
platforms. And it's somewhat of a policy decision, since it does
involve throwing CPU at the problem. There's a crossover point below
which polling is always a win, but finding it needs a lot more work
than just optimistic spinning for everything.
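(One way to answer "could you track it?" - a hypothetical sketch, not anything in the i915 patch: keep a decaying average of recent request service times per engine, and only opt into spinning when history predicts completion inside the spin budget. Names, the 1/8 weight, and the 2 usec budget are all made up for illustration.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-engine stats: EWMA of recent request service times. */
struct req_stats {
	uint64_t ewma_ns;	/* decaying average completion latency, 0 = no history */
};

#define SPIN_BUDGET_NS	2000ull	/* ~2 usec: upper end of the plausible poll win */

/* Fold a completed request's measured latency into the average (1/8 weight). */
static void req_stats_update(struct req_stats *s, uint64_t latency_ns)
{
	if (s->ewma_ns == 0)
		s->ewma_ns = latency_ns;
	else
		s->ewma_ns = (s->ewma_ns * 7 + latency_ns) / 8;
}

/*
 * Spin only when history says the request will likely finish within the
 * budget; with no history, or after a run of slow requests, fall straight
 * back to the irq wait instead of burning CPU.
 */
static bool should_spin(const struct req_stats *s)
{
	return s->ewma_ns != 0 && s->ewma_ns <= SPIN_BUDGET_NS;
}
```

A single msec+ outlier immediately pushes the average over the budget, so abnormal requests and deep queues stop the spinning on their own.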
You also need a check for whether the task has been woken up; that's
missing as well.

As for interrupt mitigation, I'd consider that a separate problem. It's
a lot simpler than the app-induced polling that i915 is doing here. So
if overloading a core with IRQs is an issue, I'd solve that differently,
similar to NAPI or blk-iopoll (not to be confused with blk_poll(), which
you referenced and which is app-induced polling).

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5cf4a1998273..658514e899b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1148,17 +1148,18 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 
 static int __i915_spin_request(struct drm_i915_gem_request *req)
 {
-	unsigned long timeout;
+	ktime_t start, end;
 
 	if (i915_gem_request_get_ring(req)->irq_refcount)
 		return -EBUSY;
 
-	timeout = jiffies + 1;
+	start = ktime_get();
+	end = ktime_add_us(start, 1);
 	while (!need_resched()) {
 		if (i915_gem_request_completed(req, true))
 			return 0;
 
-		if (time_after_eq(jiffies, timeout))
+		if (ktime_after(ktime_get(), end))
 			break;
 
 		cpu_relax_lowlatency();

-- 
Jens Axboe