Date: Sat, 6 Jan 2018 16:01:51 -0500
From: Alexandru Chirvasitu <achirvasub@gmail.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
        Rodrigo Vivi <rodrigo.vivi@intel.com>,
        intel-gfx@lists.freedesktop.org,
        kernel list <linux-kernel@vger.kernel.org>
Subject: Re: PROBLEM: i915 causes complete desktop freezes in 4.15-rc5
Message-ID: <20180106210151.iwothofxsyqbwt37@D-69-91-141-110.dhcp4.washington.edu>
References: <151518186129.6838.5497512563650996948@mail.alporthouse.com>
 <20180105195842.zryxccc74k7fi6gq@D-69-91-141-110.dhcp4.washington.edu>
 <151518256891.6838.7870621097092357743@mail.alporthouse.com>
 <20180105220518.cmmof6rritm4bmjh@D-69-91-141-110.dhcp4.washington.edu>
 <151523540026.6838.8552050096058843898@mail.alporthouse.com>
 <20180106132443.yzn2pkfruu7basl7@D-69-91-141-110.dhcp4.washington.edu>
 <20180106163835.jknrwjt52nhbzzlt@D-69-91-141-110.dhcp4.washington.edu>
 <151526009137.23681.11777101661125249780@mail.alporthouse.com>
 <20180106184429.GA1469@chirva-void>
 <151527210085.23681.13693584447068529774@mail.alporthouse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <151527210085.23681.13693584447068529774@mail.alporthouse.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org

Got it. I suppose that explains why it was so unreliably
reproducible.

In any case, the patch is holding strong still with plenty of activity
since, so it's looking good.

Thank you for all of this; very instructive.

On Sat, Jan 06, 2018 at 08:55:00PM +0000, Chris Wilson wrote:
> Quoting Alexandru Chirvasitu (2018-01-06 18:44:29)
> > Thanks!
> > 
> > It's also a mystery to me why I never had any crashes on any of the
> > other systems running on this machine running the same (unpatched)
> > kernels.
> > 
> > I'm assuming the window manager might have something to do with it:
> > all of the others are on i3 and the buggy one's openbox, so perhaps
> > tiling vs. stacking makes a difference?
> 
> It just takes the right pattern of activity. The logic upon retiring a
> request does try to strip the fences from all the objects listening to
> the fence, but we can only do that so long as all the fences on the
> object have been signaled. So for starters, it needs an object being
> used by multiple timelines (different processes and/or engines) and then
> retirement has to run at just the right frequency that it never sees all
> of those fences completed. (Even more for this failure, it must
> have a retired exclusive/write fence paired with active shared/read
> fences.) It is that timing issue that has made it so rare, it's a
> pattern that we definitely do not expose in our testing.
> -Chris
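
To make the pattern described above concrete, here is a toy model of it. This is only a sketch and not the i915 code: the names Fence, Obj, try_strip_fences and retire are hypothetical, and the only rule modelled is the one stated in the explanation, namely that fences can be stripped from an object only once all of them have signaled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fence:
    """Hypothetical fence: belongs to one timeline, signaled or not."""
    timeline: str
    signaled: bool = False


@dataclass
class Obj:
    """Hypothetical object with one exclusive (write) fence and
    any number of shared (read) fences."""
    exclusive: Optional[Fence] = None
    shared: List[Fence] = field(default_factory=list)

    def fences(self) -> List[Fence]:
        excl = [self.exclusive] if self.exclusive is not None else []
        return excl + self.shared


def try_strip_fences(obj: Obj) -> bool:
    """Drop every fence from obj, but only if all of them have signaled."""
    if all(f.signaled for f in obj.fences()):
        obj.exclusive = None
        obj.shared.clear()
        return True
    return False


def retire(fence: Fence, listeners: List[Obj]) -> None:
    """Mark a fence as signaled and try to strip fences from its listeners."""
    fence.signaled = True
    for obj in listeners:
        try_strip_fences(obj)


if __name__ == "__main__":
    # One object shared by two timelines: a writer on A, a reader on B.
    write, read = Fence("A"), Fence("B")
    obj = Obj(exclusive=write, shared=[read])

    retire(write, [obj])
    # The read fence is still active, so nothing could be stripped:
    # the object keeps a retired exclusive fence next to an active shared one.
    print(obj.exclusive is write, obj.exclusive.signaled, read.signaled)

    retire(read, [obj])
    # Now every fence has signaled, so the object is cleaned up.
    print(obj.exclusive, obj.shared)
```

In this model the state after the first retire() is the combination singled out in the explanation: a retired exclusive/write fence paired with an active shared/read fence. It is reached only when the object is used from more than one timeline and retirement runs in between the two completions.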