Date: Mon, 10 Apr 2017 12:15:34 +0200
From: Martin Kepplinger
To: Andrea Arcangeli
Cc: Thorsten Leemhuis, daniel.vetter@intel.com, Dave Airlie, Chris Wilson, intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 0/5] Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load
In-Reply-To: <20170406232347.988-1-aarcange@redhat.com>
References: <87pogtplxr.fsf@intel.com> <20170406232347.988-1-aarcange@redhat.com>
Message-ID: <3c3dd67ba55d13893c3d264e23710085@posteo.de>

On 07.04.2017 01:23, Andrea Arcangeli wrote:
> I'm also getting kernel hangs every couple of days. For me it's still
> not fixed here in 4.11-rc5. It's hard to reproduce, the best
> reproducer is to build lineageos 14.1 on host while running LTP in a
> guest to stress the guest VM.
>
> Initially I thought it was related to the fact I upgraded the xf86
> intel driver just a few weeks ago (I deferred any upgrade of the
> userland intel driver since last July because of a regression that
> never got fixed and broke xterm for me). After I found a workaround
> for the userland regression (appended at the end for reference) I
> started getting kernel hangs but they are separate issues as far as I
> can tell.
>
> It's not well tested so beware... (it survived a couple of builds and
> some VM reclaim but that's it).
>
> The first patch 1/5 is the potential fix for the i915 kernel hang. The
> rest are incremental improvements.
>
> And I've no great solution for when the shrinker was invoked with the
> struct_mutex held and recurse on the lock. I don't think we can
> possibly wait in such case (other than flush work that the second
> patch does) but then practically it shouldn't be a big deal, the big
> RAM eater is unlikely to be i915 when the system is low on memory.
>

FWIW, without having any insight here, -rc6 seems to be good. No
disturbing GPU hangs under load so far.