Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S938641AbcJTNg5 (ORCPT ); Thu, 20 Oct 2016 09:36:57 -0400 Received: from mx2.suse.de ([195.135.220.15]:38455 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934255AbcJTNg4 (ORCPT ); Thu, 20 Oct 2016 09:36:56 -0400 Date: Thu, 20 Oct 2016 15:36:54 +0200 Message-ID: From: Takashi Iwai To: Ville =?UTF-8?B?U3lyasOkbMOk?= Cc: dri-devel@lists.freedesktop.org, Daniel Vetter , linux-kernel@vger.kernel.org Subject: Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater In-Reply-To: <20161020132814.GT4329@intel.com> References: <20161020132055.9646-1-tiwai@suse.de> <20161020132814.GT4329@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/24.5 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2795 Lines: 76 On Thu, 20 Oct 2016 15:28:14 +0200, Ville Syrjälä wrote: > > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote: > > Since 4.7 kernel, we've seen the error messages like > > > > kernel: [TTM] Buffer eviction failed > > kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001) > > kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO > > > > on QXL when switching and accessing on VT. The culprit was the generic > > deferred_io code (qxl driver switched to it since 4.7). There is a > > race between the dirty clip update and the call of callback. > > > > In drm_fb_helper_dirty(), the dirty clip is updated in the spinlock, > > while it kicks off the update worker outside the spinlock. Meanwhile > > the update worker clears the dirty clip in the spinlock, too. Thus, > > when drm_fb_helper_dirty() is called concurrently, schedule_work() is > > called after the clip is cleared in the first worker call. > > Why does that matter? The first worker should have done all the > necessary work already, no? Before the first call, it clears the clip and passes the copied clip to the callback. Then the second call will be with the cleared and untouched clip, i.e. with x1=~0. This confuses qxl_framebuffer_dirty(). Of course, we can filter out in the callback side by checking the clip. It was actually my first version. But basically it's a race and should be covered better in the caller side. thanks, Takashi > > > > > The fix is simply moving schedule_work() inside the spinlock. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98322 > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1003298 > > Fixes: eaa434defaca ('drm/fb-helper: Add fb_deferred_io support') > > Signed-off-by: Takashi Iwai > > --- > > drivers/gpu/drm/drm_fb_helper.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c > > index 03414bde1f15..bae392dea2cc 100644 > > --- a/drivers/gpu/drm/drm_fb_helper.c > > +++ b/drivers/gpu/drm/drm_fb_helper.c > > @@ -861,9 +861,8 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y, > > clip->y1 = min_t(u32, clip->y1, y); > > clip->x2 = max_t(u32, clip->x2, x + width); > > clip->y2 = max_t(u32, clip->y2, y + height); > > - spin_unlock_irqrestore(&helper->dirty_lock, flags); > > - > > schedule_work(&helper->dirty_work); > > + spin_unlock_irqrestore(&helper->dirty_lock, flags); > > } > > > > /** > > -- > > 2.10.1 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > -- > Ville Syrjälä > Intel OTC >