Date: Thu, 20 Oct 2016 16:35:30 +0200
From: Takashi Iwai
To: Ville Syrjälä
Cc: dri-devel@lists.freedesktop.org, Daniel Vetter, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
In-Reply-To: <20161020141725.GU4329@intel.com>
References: <20161020132055.9646-1-tiwai@suse.de> <20161020132814.GT4329@intel.com> <20161020141725.GU4329@intel.com>

On Thu, 20 Oct 2016 16:17:25 +0200,
Ville Syrjälä wrote:
> 
> On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> > On Thu, 20 Oct 2016 15:28:14 +0200,
> > Ville Syrjälä wrote:
> > > 
> > > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > > Since the 4.7 kernel, we've seen error messages like
> > > > 
> > > >   kernel: [TTM] Buffer eviction failed
> > > >   kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > > >   kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > > > 
> > > > on QXL when switching to and accessing a VT.  The culprit is the
> > > > generic deferred_io code (the qxl driver switched to it in 4.7).
> > > > There is a race between the dirty clip update and the invocation
> > > > of the dirty callback.
> > > > 
> > > > In drm_fb_helper_dirty(), the dirty clip is updated inside the
> > > > spinlock, while the update worker is kicked off outside the
> > > > spinlock.  Meanwhile the update worker clears the dirty clip inside
> > > > the spinlock, too.  Thus, when drm_fb_helper_dirty() is called
> > > > concurrently, schedule_work() may be called after the clip has
> > > > already been cleared by the first worker invocation.
> > > 
> > > Why does that matter? The first worker should have done all the
> > > necessary work already, no?
> > 
> > Before the first callback invocation, the worker clears the clip and
> > passes the copied clip to the callback.  The second invocation then
> > runs with the cleared, untouched clip, i.e. with x1=~0.  This
> > confuses qxl_framebuffer_dirty().
> > 
> > Of course, we could filter this out on the callback side by checking
> > the clip.  That was actually my first version.  But it's fundamentally
> > a race and should rather be handled on the caller side.
> 
> The race is still there AFAICS. The worker may already be executing but
> not yet in the critical section, at which point drm_fb_helper_dirty()
> will expand the dirty rectangle and schedule another work. So the first
> worker will already see the expanded rectangle, and the second worker
> will get zilch.

Hrm, right, there's a slight race window there.

> I think the only good fix is to have the worker validate the dirty
> rectangle before calling the driver.

OK, let me cook it up quickly.  (It was actually the second version of
the patch I wrote, and I sent the third one :)


thanks,

Takashi
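
P.S. What I have in mind is roughly the sketch below: copy and clear the
clip under dirty_lock in the worker, then skip the driver callback when
the copy is still empty.  This is only an illustration of the idea
against the helper's worker (drm_fb_helper_dirty_work() in
drm_fb_helper.c), not the final patch:

static void drm_fb_helper_dirty_work(struct work_struct *work)
{
	struct drm_fb_helper *helper = container_of(work, struct drm_fb_helper,
						    dirty_work);
	struct drm_clip_rect *clip = &helper->dirty_clip;
	struct drm_clip_rect clip_copy;
	unsigned long flags;

	spin_lock_irqsave(&helper->dirty_lock, flags);
	clip_copy = *clip;
	clip->x1 = clip->y1 = ~0;
	clip->x2 = clip->y2 = 0;
	spin_unlock_irqrestore(&helper->dirty_lock, flags);

	/* Call the dirty callback only if the clip has really been touched;
	 * a concurrent drm_fb_helper_dirty() may have scheduled us again
	 * after the previous run already consumed the rectangle. */
	if (clip_copy.x1 < clip_copy.x2 && clip_copy.y1 < clip_copy.y2)
		helper->fb->funcs->dirty(helper->fb, NULL, 0, 0,
					 &clip_copy, 1);
}

The producer side in drm_fb_helper_dirty() keeps merging rectangles into
helper->dirty_clip as before; only the consumer side changes.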