Subject: Re: [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free
From: Paul Menzel
To: Kees Cook, Mazin Rezk
Cc: linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, Andrew Morton, Christian König,
    Harry Wentland, Nicholas Kazlauskas, sunpeng.li@amd.com,
    Alexander Deucher, 1i5t5.duncan@cox.net, mphantomx@yahoo.com.br,
    regressions@leemhuis.info, anthony.ruhier@gmail.com
Date: Fri, 24 Jul 2020 09:45:18 +0200
In-Reply-To: <202007231524.A24720C@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

Dear Kees,


On 24.07.20 at 00:32, Kees Cook wrote:
> On Thu, Jul 23, 2020 at 09:10:15PM +0000, Mazin Rezk wrote:
>> When amdgpu_dm_atomic_commit_tail is running in the workqueue,
>> drm_atomic_state_put will get called while amdgpu_dm_atomic_commit_tail is
>> running, causing a race condition where state (and then dm_state) is
>> sometimes freed while amdgpu_dm_atomic_commit_tail is running. This bug has
>> occurred since 5.7-rc1 and is well documented among polaris11 users [1].
>>
>> Prior to 5.7, this was not a noticeable issue since the freelist pointer
>> was stored at the beginning of dm_state (base), which was unused. After
>> changing the freelist pointer to be stored in the middle of the struct, the
>> freelist pointer overwrote the context, causing dc_state to become garbage
>> data and made the call to dm_enable_per_frame_crtc_master_sync dereference
>> a freelist pointer.
>>
>> This patch fixes the aforementioned issue by calling drm_atomic_state_get
>> in amdgpu_dm_atomic_commit before drm_atomic_helper_commit is called and
>> drm_atomic_state_put after amdgpu_dm_atomic_commit_tail is complete.
>>
>> According to my testing on 5.8.0-rc6, this should fix bug 207383 on
>> Bugzilla [1].
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383
>
> Nice work tracking this down!
>
>> Fixes: 3202fa62f ("slub: relocate freelist pointer to middle of object")
>
> I do, however, object to this Fixes tag. :) The flaw appears to have
> been with amdgpu_dm's reference tracking of "state" in the nonblocking
> case. (How this reference counting is supposed to work correctly, though,
> I'm not sure.) If I look at where the drm helper was split from being
> the default callback, it looks like this was what introduced the bug:
>
> da5c47f682ab ("drm/amd/display: Remove acrtc->stream")
>
> ? 3202fa62f certainly exposed it much more quickly, but there was a race
> even without 3202fa62f where something could have realloced the memory
> and written over it.

I understand the Fixes tag is mainly a help when backporting commits.

As Linux 5.8-rc7 is going to be released this Sunday, I wonder if
commit 3202fa62f ("slub: relocate freelist pointer to middle of
object") should be reverted for now to fix the regression for users,
in line with Linux's no-regression policy. Once the AMDGPU/DRM driver
issue is fixed, it can be reapplied. I know it's not optimal, but as
the fix will need some testing, I'd argue it's the best option for
the users.


Kind regards,

Paul