Date: Fri, 13 Nov 2015 09:15:58 +0000
From: Chris Wilson
To: Jens Axboe
Cc: Daniel Vetter, DRI Development, LKML
Subject: Re: __i915_spin_request() sucks
Message-ID: <20151113091558.GN6247@nuc-i3427.alporthouse.com>
In-Reply-To: <56451812.2050704@kernel.dk>

On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote:
> On 11/12/2015 03:19 PM, Chris Wilson wrote:
> >>>So today, I figured I'd try just killing that spin. If it fails, we'll
> >>>punt to normal completions, so easy change. And wow, MASSIVE difference.
> >>>I can now scroll in chrome and not rage! It's like the laptop is 10x
> >>>faster now.
> >>>
> >>>Ran git blame, and found:
> >>>
> >>>commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
> >>>Author: Chris Wilson
> >>>Date:   Tue Apr 7 16:20:41 2015 +0100
> >>>
> >>>    drm/i915: Optimistically spin for the request completion
> >>>
> >>>and read the commit message. Doesn't sound that impressive, especially
> >>>not for something that screws up interactive performance by a LOT.
> >>>
> >>>What's the deal? Revert?
> >
> >The tests it improved the most were the latency-sensitive ones, and
> >since my Broadwell xps13 behaves itself, I'd like to understand how it
> >results in a loss of interactivity.
> >
> >1. Maybe it is the uninterruptible nature of the polling, making X's
> >SIGIO jerky:
>
> This one still feels bad.
>
> >2. Or maybe it is increased mutex contention:
>
> And so does this one... I had to manually apply hunks 2-3, and after
> doing seat-of-the-pants testing for both variants, I confirmed with
> perf that we're still seeing a ton of time in __i915_wait_request()
> for both of them.
>
> >Or maybe it is an indirect effect, such as power balancing between the
> >CPU and GPU, or just thermal throttling, or it may be the task being
> >penalised for consuming its timeslice (for which any completion polling
> >seems susceptible).
>
> Look, polls in the 1-10ms range are just insane. Either you botched
> the commit message and really meant "~1ms at most", in which case
> I'd suspect you of smoking something good, or you hacked it up wrong
> and used jiffies when you really wanted to be using some other time
> check that really did give you 1us.

What other time check? I was under the impression that setting up an
hrtimer was expensive and jiffies was what we had available.
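Presumably something along these lines is what's meant - bound the spin
with a nanosecond budget read from local_clock(), the cheap ns-resolution
scheduler clock, rather than with jiffies, so the worst case stays in the
microsecond range regardless of HZ and no hrtimer has to be armed. A
sketch only; req_completed() and struct gpu_request are stand-ins for the
driver's own completion check and request type, not the actual i915 code:

#include <linux/sched.h>	/* need_resched(), local_clock() */
#include <asm/processor.h>	/* cpu_relax() */

struct gpu_request;				/* stand-in request type */
bool req_completed(struct gpu_request *req);	/* stand-in completion check */

/*
 * Spin on the completion check for at most budget_ns nanoseconds,
 * bailing out early if the scheduler wants the CPU back.  Returns
 * true if the request completed while spinning, false if the caller
 * should fall back to the irq-driven wait.
 */
static bool spin_for_completion(struct gpu_request *req, u64 budget_ns)
{
	u64 deadline = local_clock() + budget_ns;

	do {
		if (req_completed(req))
			return true;

		if (need_resched())
			break;

		cpu_relax();
	} while (local_clock() < deadline);

	return false;
}

Called with a budget of ~1000ns that spins for roughly a microsecond per
attempt, independent of CONFIG_HZ, which is the bound the commit message
claimed.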
> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.
>
> >>"Limit the spinning to a single jiffie (~1us) at most"
> >>
> >>is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
> >>Even if I had HZ=1000, that'd still be 1ms of spinning. That's
> >>seriously screwed up, guys.
> >
> >That's over and above the termination condition for blk_poll().
>
> ?! And this is related, how? Comparing apples and oranges. One is an
> opt-in test feature for experimentation, the other is unconditionally
> enabled for everyone. I believe the commit even says so. See the
> difference? Would I use busy-loop spinning to wait for rotating
> storage completions, which are in the 1-10ms range? No, because the
> potential wins for spinning are in the usec range.

Equally, I expect the service interval for a batch to be around 2-20us.
There are many workloads that execute 30-50k requests/s, and you can
appreciate that they are sensitive to the latency of setting up an irq
and receiving it - just as leaving that irq enabled can saturate a CPU
simply executing the irq handler.

So what mechanism do you use to guard against either a very long queue
depth or an abnormal request causing msec+ spins?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre