Message-ID: <5231ECDD.2050108@vmware.com>
Date: Thu, 12 Sep 2013 18:33:33 +0200
From: Thomas Hellstrom <thellstrom@vmware.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7
MIME-Version: 1.0
To: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Peter Zijlstra <peterz@infradead.org>, Dave Airlie <airlied@linux.ie>,
        Maarten Lankhorst <maarten.lankhorst@canonical.com>,
        intel-gfx <intel-gfx@lists.freedesktop.org>,
        dri-devel <dri-devel@lists.freedesktop.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
References: <20130912150645.GZ31370@twins.programming.kicks-ass.net> <CAKMK7uFF2GkYapa2_DLv4WYAd0t+POaT+Vy9HG=HQhzMmtH9nA@mail.gmail.com> <20130912154329.GB31370@twins.programming.kicks-ass.net> <CAKMK7uFTYB-E5ksVNEBqGJusJ5TKJv5X4hiPv1mno9tJKuUAdA@mail.gmail.com>
In-Reply-To: <CAKMK7uFTYB-E5ksVNEBqGJusJ5TKJv5X4hiPv1mno9tJKuUAdA@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2174
Lines: 45

On 09/12/2013 05:58 PM, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>> into it's own pagefault handler and then deadlock, the trylock just
>>> keeps lockdep quiet.

Could you describe how it could recurse into it's own pagefault handler?
IIRC the VM flags of the TTM VMAs makes get_user_pages() refrain from 
touching these VMAs,
hence I don't think this code can deadlock, but admittedly it's far from 
the optimal solution.

Never mind, more on the set_need_resched() below.


>>>   We've had that bug arise in drm/i915 due to some
>>> fun userspace did and now have testcases for them. The right solution
>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>> temp allocation and then continues. At least that's how we've fixed
>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>> Yikes.. so how common is it? If I simply rip the set_need_resched() out
>> it will 'spin' on the fault a little longer until a 'natural' preemption
>> point -- if such a thing is every going to happen.

A typical case is if a process is throwing out a buffer from the GPU or 
system memory while another
process pagefaults while writing to it. It's not a common situation, and 
it's by no means a fastpath situation.
For correctness purposes, I think set_need_resched() can be safely removed.

> It's a case of "our userspace doesn't do this", so as long as you're
> not evil and frob the drm device nodes of ttm drivers directly the
> deadlock will never happen. No idea how much contention actually
> happens on e.g. shared buffer objects - in i915 we have just one lock
> and so suffer quite a bit more from contention. So no idea how much
> removing the yield would hurt.
> -Daniel

/Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/