Date: Sat, 6 Jan 2018 16:01:51 -0500
From: Alexandru Chirvasitu
To: Chris Wilson
Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
    intel-gfx@lists.freedesktop.org, kernel list
Subject: Re: PROBLEM: i915 causes complete desktop freezes in 4.15-rc5
Message-ID: <20180106210151.iwothofxsyqbwt37@D-69-91-141-110.dhcp4.washington.edu>
In-Reply-To: <151527210085.23681.13693584447068529774@mail.alporthouse.com>

Got it. I suppose that explains why it was so unreliably reproducible.

In any case, the patch is still holding strong with plenty of activity
since, so it's looking good.

Thank you for all of this; very instructive.

On Sat, Jan 06, 2018 at 08:55:00PM +0000, Chris Wilson wrote:
> Quoting Alexandru Chirvasitu (2018-01-06 18:44:29)
> > Thanks!
> >
> > It's also a mystery to me why I never had any crashes on any of the
> > other systems running on this machine running the same (unpatched)
> > kernels.
> >
> > I'm assuming the window manager might have something to do with it:
> > all of the others are on i3 and the buggy one's openbox, so perhaps
> > tiling vs. stacking makes a difference?
>
> It just takes the right pattern of activity. The logic upon retiring a
> request does try to strip the fences from all the objects listening to
> the fence, but we can only do that so long as all the fences on the
> object have been signaled. So for starters, it needs an object being
> used by multiple timelines (different processes and/or engines) and then
> retirement has to be run at just the right frequency to not see all of
> those fences not to be completed. (Even more for this failure, it must
> have a retired exclusive/write fence paired with active shared/read
> fences.) It is that timing issue that has made it so rare, it's a
> pattern that we definitely do not expose in our testing.
> -Chris
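
To check my own understanding of the condition described above, here is
a toy model in Python. It is only a sketch and not the i915 code: the
names Fence, TrackedObject and try_strip_fences are made up for
illustration, and the real driver's data structures and locking are not
represented. It shows just the one rule from the explanation: fences can
be stripped from an object only once every fence on it has signaled, so
a retired write fence can linger while a read fence from another
timeline is still active.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fence:
    """Hypothetical stand-in for a fence; 'timeline' is just a label."""
    timeline: str
    signaled: bool = False


@dataclass
class TrackedObject:
    """Hypothetical object with one exclusive (write) fence and any
    number of shared (read) fences."""
    exclusive: Optional[Fence] = None
    shared: List[Fence] = field(default_factory=list)

    def all_fences(self) -> List[Fence]:
        fences = list(self.shared)
        if self.exclusive is not None:
            fences.append(self.exclusive)
        return fences


def try_strip_fences(obj: TrackedObject) -> bool:
    """Drop every fence from obj, but only if all of them have signaled.

    Returns True if the fences were stripped, False if at least one
    fence is still active and the object was left untouched.
    """
    if all(f.signaled for f in obj.all_fences()):
        obj.exclusive = None
        obj.shared.clear()
        return True
    return False


# The rare pattern: a retired write fence on one timeline paired with
# a still-active read fence on another timeline.
obj = TrackedObject(
    exclusive=Fence("timeline-A", signaled=True),
    shared=[Fence("timeline-B", signaled=False)],
)
stripped_early = try_strip_fences(obj)   # read fence still active
obj.shared[0].signaled = True
stripped_late = try_strip_fences(obj)    # everything has signaled now
```

In this model the first call leaves the already-retired write fence
attached to the object, which is the state the quoted explanation says
is needed for the failure; whether retirement happens to run in that
window is the timing part.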