Subject: Re: [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!
From: Jens Axboe
To: Chris Wilson, intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org, Daniel Vetter, Tvrtko Ursulin,
    Eero Tamminen, "Rantala, Valtteri", stable@kernel.vger.org
Date: Mon, 16 Nov 2015 09:48:09 -0700
Message-ID: <564A08C9.8090508@kernel.dk>
In-Reply-To: <1447594364-4206-2-git-send-email-chris@chris-wilson.co.uk>

On 11/15/2015 06:32 AM, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements from big core, I
> have set the limit for busywaiting as 2 microseconds.
>
> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
> the system is too busy to waste cycles on spinning and we should sleep
> instead.

I tried this (1+2), and it feels better. However, I added some counters
just to track how well it's faring:

[ 491.077612] i915: invoked=7168, success=50

So out of 7168 invocations, we only avoided going to sleep 50 of those
times. As a percentage, that's 99.3% of the time we spun 2usec for no
good reason other than to burn up more of my battery. The reason there's
an improvement for me is that we're just not spinning the 10ms anymore;
however, we're still just wasting time for my use case.

I'd recommend putting this behind some option so that people can enable
it and play with it if they want, but not making it default to on until
some more clever tracking has been added to dynamically adapt when to
poll and when not to. It should not be a default-on type of thing until
it's closer to doing the right thing for a normal workload, not just
some synthetic benchmark.
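
To make the mechanism in the quoted commit message concrete, here is a rough
sketch of that kind of bounded busywait: poll local_clock() for roughly 2us
and give up early if the task was preempted and migrated to another CPU. This
is only an illustration, not the actual patch; the request type, the
completion check, and the function names below are placeholders.

    #include <linux/sched.h>        /* local_clock(), need_resched() */
    #include <linux/jiffies.h>      /* time_after() */
    #include <linux/smp.h>          /* get_cpu(), put_cpu() */
    #include <linux/compiler.h>     /* READ_ONCE() */
    #include <asm/processor.h>      /* cpu_relax() */

    /* Placeholder request type; the real driver checks its own seqno state. */
    struct example_request {
            bool done;
    };

    static bool example_request_completed(const struct example_request *req)
    {
            return READ_ONCE(req->done);
    }

    static unsigned long local_clock_us(unsigned int *cpu)
    {
            unsigned long t;

            /* local_clock() is only comparable on the CPU it was read on. */
            *cpu = get_cpu();
            t = local_clock() >> 10;        /* ns -> ~us, cheap approximation */
            put_cpu();

            return t;
    }

    static bool busywait_stop(unsigned long timeout_us, unsigned int cpu)
    {
            unsigned int this_cpu;

            if (time_after(local_clock_us(&this_cpu), timeout_us))
                    return true;            /* ~2us budget spent, go sleep */

            /* Preempted onto another CPU: the system is busy, stop spinning. */
            return this_cpu != cpu;
    }

    static bool spin_for_request(struct example_request *req)
    {
            unsigned long timeout_us;
            unsigned int cpu;

            timeout_us = local_clock_us(&cpu) + 2;  /* ~2us of busywaiting */

            do {
                    if (example_request_completed(req))
                            return true;            /* saved an irq + sleep */

                    if (need_resched())
                            break;

                    cpu_relax();
            } while (!busywait_stop(timeout_us, cpu));

            return false;   /* fall back to the interrupt + sleep path */
    }

The ">> 10" conversion trades exact microseconds for a cheaper approximation,
which is close enough when the whole budget is a couple of microseconds.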
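
The "invoked=7168, success=50" line comes from simple instrumentation of that
path. Something along these lines would produce it (again only a sketch, not
the counters that were actually run), reusing spin_for_request() from the
snippet above:

    #include <linux/atomic.h>       /* atomic_t and friends */
    #include <linux/printk.h>       /* pr_info() */

    static atomic_t spin_invoked = ATOMIC_INIT(0);
    static atomic_t spin_success = ATOMIC_INIT(0);

    static bool spin_for_request_counted(struct example_request *req)
    {
            bool hit;

            atomic_inc(&spin_invoked);

            hit = spin_for_request(req);    /* the sketch above */
            if (hit)
                    atomic_inc(&spin_success);

            /* Report the hit rate now and then, e.g. every 1024 invocations. */
            if ((atomic_read(&spin_invoked) & 1023) == 0)
                    pr_info("i915: invoked=%d, success=%d\n",
                            atomic_read(&spin_invoked),
                            atomic_read(&spin_success));

            return hit;
    }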
--
Jens Axboe