Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp6296007rwp; Mon, 17 Jul 2023 19:50:03 -0700 (PDT) X-Google-Smtp-Source: APBJJlHVZ9ZkyY9Fn0a8IqIKmsdxxva5DXDm5q31bPhX57g6sUWIa/NqnZU1JsH9uwBZp89gjhge X-Received: by 2002:a17:906:4a86:b0:994:2fa9:7446 with SMTP id x6-20020a1709064a8600b009942fa97446mr10858514eju.46.1689648603283; Mon, 17 Jul 2023 19:50:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689648603; cv=none; d=google.com; s=arc-20160816; b=e6VMC00dwm3JbqHmdx9FAqQPEfjmtJQEKO2ohqAvkcT2JcZRKSEvW7096iM4X8TFPq VSVo6636iZ4oPabIu3gyu+S9InkROei9vCuHwSrF5r5ey9gFszK0CaAa1fJ8eRGjnOOI gEypfV9qE9kbNrpAh22gZ8SgckiDb00jzuYlhdHt5TKuLw4oig1RTB6rWbF4N2LXa8GB jrYaEsp2vMLVtFAqGpRVYrV3H2bccGS019jHHur2i2Nfn8jbpGDBZvKlj3aFl87KfWGy jmB30Dlj0SaSJiaqcBotn4GKNeSw4RTdpClNhAUsDIuounho6WX9nMbxtCM0hRnmRg7B UFqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=wm1P/WxL+whsiQkXCoTp6lofSEhFARADQ69n7Ixlt1Y=; fh=u3+0Am8F0I3pJZAUusn6VsenP7VZJQdqwKcTbkpOXLo=; b=a8NNWOTs3H2uE0ojBtRyaCmwEfq/82PNWuT2vemU1Xt6/eca2EPqvWdsG+88jebAco mG7YwPOWCv1L77PchNlOpTwJtpSwSz2o8m3eT66ry2N++zWtm5BusRqT2mTqG1lpdK+l nrsBlJ/hNtUFMASrYFguxdwxLIt7UqdiEBN9Cu/5fXW8ZRfZN/buWw9tYaFJbfsO7VKv 8FS9FUbrKkrnxX2ubMu3Rbinmslhs6bE/JBkFuQ7hj98H+HB78C1K4TiZ1miaBhsCRul t57+oGXKLae4HfCqtwfPTwg7SyPTw76ToiDDq/pCldj1oAml6bknCc7qIddghQjMFpua ixng== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=OJZUvXtK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z12-20020a170906668c00b00993150e5326si468185ejo.626.2023.07.17.19.49.39; Mon, 17 Jul 2023 19:50:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=OJZUvXtK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230029AbjGRCgC (ORCPT + 99 others); Mon, 17 Jul 2023 22:36:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229541AbjGRCgB (ORCPT ); Mon, 17 Jul 2023 22:36:01 -0400 Received: from mail.marcansoft.com (marcansoft.com [212.63.210.85]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2841F136; Mon, 17 Jul 2023 19:35:58 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: lina@asahilina.net) by mail.marcansoft.com (Postfix) with ESMTPSA id 6C4DC424BA; Tue, 18 Jul 2023 02:35:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=asahilina.net; s=default; t=1689647756; bh=DPckD2pcMitjOq4PnGHXF3nMKOQ/JWjAKnEZEzPdo0w=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=OJZUvXtKw3BeyCO8IbXdPGPDd9/x9usKU5vwpZBxXfSCD/0CtaahrxpTvKcvv5l63 uL0PG8zcnIfpGypsFWqp2FaOoQLqhWn+7a/ojr9ftuW9K6GVp1dfDwVnl1MIQiaaLI eJ44Qr66EqBKAlqAO+ckO0i8ndFEe4eZDOSjB6MNdLUrzhPSI0gG1OgfAZ4x7mM2RR 7EMNbFczDoqxTNf0p5Ktjs7NkpIre2dOd4Ts81y7CGzSRQI8gvM49K9ACtoPzdwEKG inuh5ZiR+S8IZquHycUGJNT8yrNv9r/+Uj63u6OHV9zDUwS2YiNDmJCLd2wi5KJ/fa wM1fVndieLO/w== Message-ID: <7b564e55-a9b7-0585-3cf1-d1f132f9a918@asahilina.net> Date: Tue, 18 Jul 2023 11:35:51 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name Content-Language: en-US To: =?UTF-8?Q?Christian_K=c3=b6nig?= , alyssa@rosenzweig.io, Luben Tuikov , David Airlie , Daniel Vetter , Sumit Semwal Cc: Faith Ekstrand , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, asahi@lists.linux.dev References: <9c0cff84-45b1-268f-bdad-38c16316dbc3@amd.com> <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net> <20230714-drm-sched-fixes-v1-2-c567249709f7@asahilina.net> <236422117088ca854a6717114de73d99b2b9ba2f@rosenzweig.io> From: Asahi Lina In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 18/07/2023 00.55, Christian König wrote: > Am 15.07.23 um 16:14 schrieb alyssa@rosenzweig.io: >> 15 July 2023 at 00:03, "Luben Tuikov" wrote: >>> On 2023-07-14 05:57, Christian König wrote: >>> >>>> Am 14.07.23 um 11:49 schrieb Asahi Lina: >>>> >>>>> On 14/07/2023 17.43, Christian König wrote: >>>>> >>>> Am 14.07.23 um 10:21 schrieb Asahi Lina: >>>> A signaled scheduler fence can outlive its scheduler, since fences are >>>> independencly reference counted. Therefore, we can't reference the >>>> scheduler in the get_timeline_name() implementation. >>>> >>>> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared >>>> dma-bufs reference fences from GPU schedulers that no longer exist. >>>> >>>> Signed-off-by: Asahi Lina >>>> --- >>>>    drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++- >>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++- >>>>    include/drm/gpu_scheduler.h              | 5 +++++ >>>>    3 files changed, 14 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >>>> b/drivers/gpu/drm/scheduler/sched_entity.c >>>> index b2bbc8a68b30..17f35b0b005a 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>> @@ -389,7 +389,12 @@ static bool >>>> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity) >>>>               /* >>>>             * Fence is from the same scheduler, only need to wait for >>>> -         * it to be scheduled >>>> +         * it to be scheduled. >>>> +         * >>>> +         * Note: s_fence->sched could have been freed and reallocated >>>> +         * as another scheduler. This false positive case is okay, >>>> as if >>>> +         * the old scheduler was freed all of its jobs must have >>>> +         * signaled their completion fences. >>>> >>>> This is outright nonsense. As long as an entity for a scheduler exists >>>> it is not allowed to free up this scheduler. >>>> >>>> So this function can't be called like this. >>>> >>>>> As I already explained, the fences can outlive their scheduler. That >>>>> means *this* entity certainly exists for *this* scheduler, but the >>>>> *dependency* fence might have come from a past scheduler that was >>>>> already destroyed along with all of its entities, and its address reused. >>>>> >>>> >>>> Well this is function is not about fences, this function is a callback >>>> for the entity. >>>> >>>> >>>>> Christian, I'm really getting tired of your tone. I don't appreciate >>>>> being told my comments are "outright nonsense" when you don't even >>>>> take the time to understand what the issue is and what I'm trying to >>>>> do/document. If you aren't interested in working with me, I'm just >>>>> going to give up on drm_sched, wait until Rust gets workqueue support, >>>>> and reimplement it in Rust. You can keep your broken fence lifetime >>>>> semantics and I'll do my own thing. >>>>> >>>> >>>> I'm certainly trying to help here, but you seem to have unrealistic >>>> expectations. >>>> >>>> I perfectly understand what you are trying to do, but you don't seem to >>>> understand that this functionality here isn't made for your use case. >>>> >>>> We can adjust the functionality to better match your requirements, but >>>> you can't say it is broken because it doesn't work when you use it not >>>> in the way it is intended to be used. >>>> >>> I believe "adjusting" functionality to fit some external requirements, >>> may have unintended consequences, requiring yet more and more "adjustments". >>> (Or may allow (new) drivers to do wild things which may lead to wild results. :-) ) >>> >>> We need to be extra careful and wary of this. >> Either drm/scheduler is common code that we should use for our driver, in which case we need to "adjust" it to fit the requirements of a safe Rust abstraction usable for AGX. > > Well this is the fundamental disagreement we have. As far as I can see > you don't need to adjust anything in the common drm/scheduler code. > > That code works with quite a bunch of different drivers, including the > Intel XE which has similar requirements to your work here. > > We can talk about gradually improving the common code, but as Luben > already wrote as well this needs to be done very carefully. > >> Or, drm/scheduler is not common code intended for drivers with our requirements, and then we need to be able to write our own scheduler. >> >> AMD has NAK'd both options, effectively NAK'ing the driver. >> >> I will ask a simple yes/no question: Should we use drm/sched? > > Well, yes. > >> >> If yes, it will need patches like these, > > No, you don't. > > First of all you need to try to adjust your driver to match the > requirements of drm/scheduler and *not* the other way around. > >> and AMD needs to be ok with that and stop NAK'ing them on sight becuase they don't match the existing requirements. >> >> If no, we will write our own scheduler in Rust, and AMD needs to be ok with that and not NAK it on sight because it's not drm/sched. >> >> Which is it? >> >> Note if we write a Rust scheduler, drm/sched and amdgpu will be unaffected. If we do that and AMD comes back and NAKs it -- as said in this thread would "probably" happen -- then it is impossible for us to upstream a driver regardless of whether we use drm/sched. >> >> Lina has been polite and accommodating while AMD calls her code "outright nonsense" and gets "outright NAK"s, and puts her into an impossible catch-22 where no matter what she does it's NAK'd. > > Well as far as I can see I'm totally polite as well. > > Pointing out that approaches doesn't seem to make sense and NAKing > patches is a perfectly normal part of the review process. > > What you need to to is to take a step back and ask yourself why this > here is facing so much rejection from our side. I have to admit that I > don't seem to be good at explaining that, cause we are obviously talking > past each other, but you don't seem to try hard to understand what I'm > pointing out either. > >> That's not ok. > > As far as I can see it is. > > As maintainer of a commonly used component my first duty is to preserve > the status quo and prevent modifications which are not well thought > through. And to be honest those changes here strongly looks like Lina is > just adjusting the code to match her requirements without looking left > and right first. > > Regards, > Christian. > > I give up. You are ignoring everything we say, and rejecting everything we suggest. We've already explained why drm_sched doesn't work for us. I'm tired of repeating the same explanation over and over again only to be ignored and told I'm wrong. I'll start working on a new, much simpler Rust-native scheduler based on the workqueue Rust abstractions which are in review. ~~ Lina