Received: by 2002:a25:ca44:0:0:0:0:0 with SMTP id a65csp27331ybg; Mon, 27 Jul 2020 14:33:35 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxebMK16alOitzcNLrhFM2/cxCSTC4AkexA2AGKurHd8xbizPXgjRs1GZDTc1Kx2B7n5mb1 X-Received: by 2002:a05:6402:785:: with SMTP id d5mr7041701edy.370.1595885615745; Mon, 27 Jul 2020 14:33:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1595885615; cv=none; d=google.com; s=arc-20160816; b=AGImojOFDRADvPEB6nbIn77wgXgO4EOxMwmilPSdTw5NIt5AxcQh+pmluhcIv2QOEV oXQ7i20IaNOpROnQJ/Gmg6K90rEUlwkFC1eu+qWL0l9iR9/F7La4mBefFwcRa6fwNEVO vvGjN1yn68peybpVLaM29maFwgst7nKSOIoL+KJEAvkXmISHT58zZ50Or7UjWE/JvwhN wZYrCpD1pqYmJSccyjX4+Ryr4MTGiEik+hh9PcS//x2RKA1QeH5eQveqP3n8x9dyErJP KV86rDIeOVU8OtwEQUD5oSS6IGb6Ddw8WG1DdmzeDb7sEsEqlkDKVsAYhwd0HvyZm6rA lIEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=OEiABl8b8tMlt/Qjct9uHQEISrUoWRGPs6oS3G8P/Sg=; b=V79B8Wll2c93vz/aJPVTDi8KDRMMnnarKee0O2Reph8QBzjoPp9u/PnetJoEbvWG2V 7XiHZKfx1GrPgv+YLyL1D7gohxtgeZ2UL4INS/Qv4LTpJmDieJApXnQ6sYNnOHkKfB8+ OGsjh/rrwfU5eO2vFvXrwzCKFF35JjIOI+pqzaf5a9MdglarESMB6LTJqeEypxUIukWH xl3RJrnrcxl6j4Jx3rE61/u9E1AKgJVCNY37MWNDK/M4QF2TGKKOcP0xsBMLPzKCS492 5RROeHZ9xofjYBeHZBEoH7dbHImM/BmmNk+la3cq706q/h8eJL4NyeJCmnsClQ+KHbnN mbLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=VUrZPynJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bh1si6462663ejb.453.2020.07.27.14.33.13; Mon, 27 Jul 2020 14:33:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=VUrZPynJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726987AbgG0VcQ (ORCPT + 99 others); Mon, 27 Jul 2020 17:32:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726183AbgG0VcP (ORCPT ); Mon, 27 Jul 2020 17:32:15 -0400 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 774D4C061794 for ; Mon, 27 Jul 2020 14:32:15 -0700 (PDT) Received: by mail-ot1-x343.google.com with SMTP id t18so13423065otq.5 for ; Mon, 27 Jul 2020 14:32:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=OEiABl8b8tMlt/Qjct9uHQEISrUoWRGPs6oS3G8P/Sg=; b=VUrZPynJWuf9eFzr9X2aORsn/uGDuxjcZXWBOXdYq4G1h1/6Yc73v20NP/NyvHN/ny Lx9PgDSPOL9caa27MAeNHFVTdqcwU5WOn8+asUuheUEAyvS2o0YU2kcyyO4tHqXB7tCp j1lDA9ekpawR45YUdOfqCj34e+FHiq5cHqBUs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=OEiABl8b8tMlt/Qjct9uHQEISrUoWRGPs6oS3G8P/Sg=; b=BMaOpGq80tiCSmS1Z70ggR515IPdbq9wpOuinW5f5J0i4bP+mL7kj2SHgb2oPxzMtE YELu9jzJMik1f7iS2iDYs5kAJ+MwLHfJ+90gJaiyH3BlOzS9focQQQzy5TziuhbknjRP ik6hJziZlulRXZtZTT3XeMOiiwO//pIFzsrI19lwPaT74VFnkpuVhl8PYLpg1clfckGj 6OrCuLAFJDA2Tny9g9DpfvpiALy058CCj4w2SHeCML5qE5LR6gKEf7rYDkTDDnPGcHb/ UJH6u8aHcfJqvgp3rh25GiUHwedI3jrecXkUmurrIEyKSfT2NhnC2jR/VhKxwhlwFPho ANCQ== X-Gm-Message-State: AOAM531VF7ynoMFneI51r29Aa8bk1OXNwT62iPsS0IaknSj6F0aTn5Wq CD2F5BBM83RFBylKpx5MeQuESBsLyV4CBlzrOaZSbw== X-Received: by 2002:a05:6830:1613:: with SMTP id g19mr21092624otr.303.1595885534733; Mon, 27 Jul 2020 14:32:14 -0700 (PDT) MIME-Version: 1.0 References: <3b7e3e50-2ff7-eff3-2ffc-abaa4b36ce7f@amd.com> In-Reply-To: From: Daniel Vetter Date: Mon, 27 Jul 2020 23:32:03 +0200 Message-ID: Subject: Re: [PATCH] drm/amd/display: Clear dm_state for fast updates To: Mazin Rezk Cc: =?UTF-8?Q?Christian_K=C3=B6nig?= , "Kazlauskas, Nicholas" , "linux-kernel@vger.kernel.org" , "amd-gfx@lists.freedesktop.org" , "dri-devel@lists.freedesktop.org" , Paul Menzel , "anthony.ruhier@gmail.com" , Duncan <1i5t5.duncan@cox.net>, Kees Cook , "sunpeng.li@amd.com" , "regressions@leemhuis.info" , Alexander Deucher , Andrew Morton , "mphantomx@yahoo.com.br" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 27, 2020 at 11:11 PM Mazin Rezk wrote: > > On Monday, July 27, 2020 4:29 PM, Daniel Vetter wrote: > > > On Mon, Jul 27, 2020 at 9:28 PM Christian K=C3=B6nig > > wrote: > > > > > > Am 27.07.20 um 16:05 schrieb Kazlauskas, Nicholas: > > > > On 2020-07-27 9:39 a.m., Christian K=C3=B6nig wrote: > > > >> Am 27.07.20 um 07:40 schrieb Mazin Rezk: > > > >>> This patch fixes a race condition that causes a use-after-free du= ring > > > >>> amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking > > > >>> commits > > > >>> are requested and the second one finishes before the first. > > > >>> Essentially, > > > >>> this bug occurs when the following sequence of events happens: > > > >>> > > > >>> 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and i= s > > > >>> deferred to the workqueue. > > > >>> > > > >>> 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and i= s > > > >>> deferred to the workqueue. > > > >>> > > > >>> 3. Commit #2 starts before commit #1, dm_state #1 is used in the > > > >>> commit_tail and commit #2 completes, freeing dm_state #1. > > > >>> > > > >>> 4. Commit #1 starts after commit #2 completes, uses the freed dm_= state > > > >>> 1 and dereferences a freelist pointer while setting the context. > > > >> > > > >> Well I only have a one mile high view on this, but why don't you l= et > > > >> the work items execute in order? > > > >> > > > >> That would be better anyway cause this way we don't trigger a cach= e > > > >> line ping pong between CPUs. > > > >> > > > >> Christian. > > > > > > > > We use the DRM helpers for managing drm_atomic_commit_state and tho= se > > > > helpers internally push non-blocking commit work into the system > > > > unbound work queue. > > > > > > Mhm, well if you send those helper atomic commits in the order A,B an= d > > > they execute it in the order B,A I would call that a bug :) > > > > The way it works is it pushes all commits into unbound work queue, but > > then forces serialization as needed. We do _not_ want e.g. updates on > > different CRTC to be serialized, that would result in lots of judder. > > And hw is funny enough that there's all kinds of dependencies. > > > > The way you force synchronization is by adding other CRTC state > > objects. So if DC is busted and can only handle a single update per > > work item, then I guess you always need all CRTC states and everything > > will be run in order. But that also totally kills modern multi-screen > > compositors. Xorg isn't modern, just in case that's not clear :-) > > > > Lucking at the code it seems like you indeed have only a single dm > > state, so yeah global sync is what you'll need as immediate fix, and > > then maybe fix up DM to not be quite so silly ... or at least only do > > the dm state stuff when really needed. > > > > We could also sprinkle the drm_crtc_commit structure around a bit > > (it's the glue that provides the synchronization across commits), but > > since your dm state is global just grabbing all crtc states > > unconditionally as part of that is probably best. > > > > > > While we could duplicate a copy of that code with nothing but the > > > > workqueue changed that isn't something I'd really like to maintain > > > > going forward. > > > > > > I'm not talking about duplicating the code, I'm talking about fixing = the > > > helpers. I don't know that code well, but from the outside it sounds > > > like a bug there. > > > > > > And executing work items in the order they are submitted is trivial. > > > > > > Had anybody pinged Daniel or other people familiar with the helper co= de > > > about it? > > > > Yeah something is wrong here, and the fix looks horrible :-) > > > > Aside, I've also seen some recent discussion flare up about > > drm_atomic_state_get/put used to paper over some other use-after-free, > > but this time related to interrupt handlers. Maybe a few rules about > > that: > > - dont > > - especially not when it's interrupt handlers, because you can't call > > drm_atomic_state_put from interrupt handlers. > > > > Instead have an spin_lock_irq to protect the shared date with your > > interrupt handler, and _copy_ the date over. This is e.g. what > > drm_crtc_arm_vblank_event does. > > Nicholas wrote a patch that attempted to resolve the issue by adding ever= y > CRTC into the commit to use use the stall checks. [1] While this forces > synchronisation on commits, it's kind of a hacky method that may take a > toll on performance. > > Is it possible to have a DRM helper that forces synchronisation on some > commits without having to add every CRTC into the commit? > > Also, is synchronisation really necessary for fast updates in amdgpu? > I'll admit, the idea of eliminating the use-after-free bug by eliminating > the use entirely doesn't seem ideal; but is forcing synchronisation on > these updates that much better? Well clearing the dc_state pointer here and then allocating another one in atomic_commit_tail also looks fishy. The proper fix is probably a lot more involved, but yeah interim fix is to grab all crtc states iff you also grabbed the dm_atomic_state structure. Real fix is to only do this when necessary, which pretty much means the dc_state needs to be somehow split up, or there needs to be some guarantees about when it's necessary and when not. Otherwise parallel commits on different CRTC are not possible with the current dc backend code. See also my dma-fence annotation fixup patch, there dc_state also gets in the way: https://lore.kernel.org/dri-devel/20200707201229.472834-21-daniel.vetter@ff= wll.ch/ Nicholas, btw I'm still waiting for some dc feedback on that entire series, and what/if there's plans to fix these issues properly. Maybe even going back to the subclassed drm_atomic_state might be better than what we currently have. -Daniel > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=3D207383#c96 > > Thanks, > Mazin Rezk > > > > > Cheers, Daniel > > > > > > > > Regards, > > > Christian. > > > > > > > > > > > Regards, > > > > Nicholas Kazlauskas > > > > > > > >> > > > >>> > > > >>> Since this bug has only been spotted with fast commits, this patc= h > > > >>> fixes > > > >>> the bug by clearing the dm_state instead of using the old dc_stat= e for > > > >>> fast updates. In addition, since dm_state is only used for its dc= _state > > > >>> and amdgpu_dm_atomic_commit_tail will retain the dc_state if none= is > > > >>> found, > > > >>> removing the dm_state should not have any consequences in fast up= dates. > > > >>> > > > >>> This use-after-free bug has existed for a while now, but only cau= sed a > > > >>> noticeable issue starting from 5.7-rc1 due to 3202fa62f ("slub: > > > >>> relocate > > > >>> freelist pointer to middle of object") moving the freelist pointe= r from > > > >>> dm_state->base (which was unused) to dm_state->context (which is > > > >>> dereferenced). > > > >>> > > > >>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=3D207383 > > > >>> Fixes: bd200d190f45 ("drm/amd/display: Don't replace the dc_state > > > >>> for fast updates") > > > >>> Reported-by: Duncan <1i5t5.duncan@cox.net> > > > >>> Signed-off-by: Mazin Rezk > > > >>> --- > > > >>> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 > > > >>> ++++++++++++++----- > > > >>> 1 file changed, 27 insertions(+), 9 deletions(-) > > > >>> > > > >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > > >>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > > >>> index 86ffa0c2880f..710edc70e37e 100644 > > > >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > > >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > > > >>> @@ -8717,20 +8717,38 @@ static int amdgpu_dm_atomic_check(struct > > > >>> drm_device *dev, > > > >>> * the same resource. If we have a new DC context as pa= rt of > > > >>> * the DM atomic state from validation we need to free = it and > > > >>> * retain the existing one instead. > > > >>> + * > > > >>> + * Furthermore, since the DM atomic state only contains = the DC > > > >>> + * context and can safely be annulled, we can free the s= tate > > > >>> + * and clear the associated private object now to free > > > >>> + * some memory and avoid a possible use-after-free later= . > > > >>> */ > > > >>> - struct dm_atomic_state *new_dm_state, *old_dm_state; > > > >>> > > > >>> - new_dm_state =3D dm_atomic_get_new_state(state); > > > >>> - old_dm_state =3D dm_atomic_get_old_state(state); > > > >>> + for (i =3D 0; i < state->num_private_objs; i++) { > > > >>> + struct drm_private_obj *obj =3D state->private_objs[= i].ptr; > > > >>> > > > >>> - if (new_dm_state && old_dm_state) { > > > >>> - if (new_dm_state->context) > > > >>> - dc_release_state(new_dm_state->context); > > > >>> + if (obj->funcs =3D=3D adev->dm.atomic_obj.funcs) { > > > >>> + int j =3D state->num_private_objs-1; > > > >>> > > > >>> - new_dm_state->context =3D old_dm_state->context; > > > >>> + dm_atomic_destroy_state(obj, > > > >>> + state->private_objs[i].state); > > > >>> + > > > >>> + /* If i is not at the end of the array then the > > > >>> + * last element needs to be moved to where i was > > > >>> + * before the array can safely be truncated. > > > >>> + */ > > > >>> + if (i !=3D j) > > > >>> + state->private_objs[i] =3D > > > >>> + state->private_objs[j]; > > > >>> > > > >>> - if (old_dm_state->context) > > > >>> - dc_retain_state(old_dm_state->context); > > > >>> + state->private_objs[j].ptr =3D NULL; > > > >>> + state->private_objs[j].state =3D NULL; > > > >>> + state->private_objs[j].old_state =3D NULL; > > > >>> + state->private_objs[j].new_state =3D NULL; > > > >>> + > > > >>> + state->num_private_objs =3D j; > > > >>> + break; > > > >>> + } > > > >>> } > > > >>> } > > > >>> > > > >>> -- > > > >>> 2.27.0 > > > >>> > > > >> > > > > > > > > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch --=20 Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch