Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp2264500rwp; Fri, 14 Jul 2023 03:35:45 -0700 (PDT) X-Google-Smtp-Source: APBJJlGUMzylikaBTSg2cL1//4w7I3ZL52heh6CjdCgLxd4BYrJPxiKF1cyR6dESFhd9FbzM489u X-Received: by 2002:a05:6a00:cc4:b0:676:20f8:be41 with SMTP id b4-20020a056a000cc400b0067620f8be41mr3863374pfv.16.1689330944828; Fri, 14 Jul 2023 03:35:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689330944; cv=none; d=google.com; s=arc-20160816; b=W7gLLmYhA0YWA+yWKUC4/0UjGskEGQiECRhgbFEfq8LAqG6mB9BQ4cMx2dklgD4z9p ZdNZGYgrD3s0gFVX6ZKpM7Vym9SeuBItO0aVvgLN+amfigaK1APOC8hjxwsXl4/xqsKT UpA+BO6c0d/MxRXlq4ObG4IhShdcKxTxbI1ezudlLtnZwvKQRHMl9yKEan5MfJVkqKMI v394arnJ07qUEoh1LHoAbNKMvDNh1PWVL7Uow9SwbOHmbWTjYZu3l7+NYCK6heQjE2iH VIR7fas2ixAejuvtJ8PGMYH1WxoPe3RtOyuuYwNF90Bwghzvn8HtoVuaopr95sRdxFhe A2kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=MYhED1LhCq8hAVG0cWE10u6vwO+7/3qbcgzV4EhVpdo=; fh=55RZIZzLuhRrvB39jWdpKSgMfNLAAz4RKVVa+gvdeE8=; b=jRiGg6aU8cJCedN5qKRUUN5RV3HeX0LUxfTglrkHdgS38eOWdbIdRO+bW342G4U7Vv 3WkZah6W0VsDrSxkLqJy9NRsPiLg9DiDqa3/iWaKpoVoSVZE5gufPUPwHxsvPQO5MvQ0 5OvmwDL3zmgJK4ahNydeUfuVBT2Sw6ZXa7y7vARX4WyKkmhAWBD+Y9GMLV7amifjBYkH +nmcCqrXfbt/RC8GrJqmYine3yqgVBlTCXEmD6kfdxeRBGXKprwK0XLFYmnxWvO0RT2e TjJepKkKoiOIH/DwP7jSLleI1Ci/wl2CKqj1nHC+9K5MlFrxJb2TqrJ67TIjF4RwkRHB w2fg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=tVradgmi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bv69-20020a632e48000000b0051b423d966csi6482823pgb.280.2023.07.14.03.35.32; Fri, 14 Jul 2023 03:35:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=tVradgmi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235444AbjGNJvy (ORCPT + 99 others); Fri, 14 Jul 2023 05:51:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235969AbjGNJvl (ORCPT ); Fri, 14 Jul 2023 05:51:41 -0400 Received: from mail.marcansoft.com (marcansoft.com [212.63.210.85]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07FD230CA; Fri, 14 Jul 2023 02:51:38 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: lina@asahilina.net) by mail.marcansoft.com (Postfix) with ESMTPSA id 72F9C5BC37; Fri, 14 Jul 2023 09:51:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=asahilina.net; s=default; t=1689328297; bh=9YdiC6yP/uFp6AFL5xOcZWtsFNruI4WD3T4VKunyZuM=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=tVradgmizqZtTSMOOR1/kDVWWp9IuFmRZeHsU7ET8PCoB9Gg/lv3s9QNmXaGXXvSz U5dY2f1B7lE/TBHDRgUCjn6EjBE+UspHpBMddK3gRTbmsCjwL40Hz/AoXw+fQX3M3k RlPAGl663yz9oVtSFLLy6YXSVMK+QLcBHL/YaTXlgoB2NaZ+595tz7cJF9v+SqhE/B JRkfi4PKbAwYmlyVuN9N4KAeM12bj4QbXatuJ4RSEiduMivPqVW5dB9AixUkWLcYkq 9VUVPT7M2xR4ZohtD+P7oM6A5u7QlwqmDaPpVTDrAJCx4ikXGt7sfbLF8RjNoUGE72 Zuo61b+e6nYlQ== Message-ID: Date: Fri, 14 Jul 2023 18:51:32 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Subject: Re: [PATCH 1/3] drm/scheduler: Add more documentation Content-Language: en-US To: =?UTF-8?Q?Christian_K=c3=b6nig?= , Luben Tuikov , David Airlie , Daniel Vetter , Sumit Semwal Cc: Faith Ekstrand , Alyssa Rosenzweig , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, asahi@lists.linux.dev References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net> <20230714-drm-sched-fixes-v1-1-c567249709f7@asahilina.net> <332e031c-c04e-998c-e401-685c817ea2a1@amd.com> <6ba6827b-ab2b-1c03-2c72-eb3f6da7e719@asahilina.net> <1c9e3c2b-a2bd-e465-809c-0e57ca61f40d@amd.com> From: Asahi Lina In-Reply-To: <1c9e3c2b-a2bd-e465-809c-0e57ca61f40d@amd.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 14/07/2023 18.47, Christian König wrote: > Am 14.07.23 um 11:39 schrieb Asahi Lina: >> On 14/07/2023 17.40, Christian König wrote: >>> Am 14.07.23 um 10:21 schrieb Asahi Lina: >>>> Document the implied lifetime rules of the scheduler (or at least the >>>> intended ones), as well as the expectations of how resource acquisition >>>> should be handled. >>>> >>>> Signed-off-by: Asahi Lina >>>> --- >>>>    drivers/gpu/drm/scheduler/sched_main.c | 58 >>>> ++++++++++++++++++++++++++++++++-- >>>>    1 file changed, 55 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>>> b/drivers/gpu/drm/scheduler/sched_main.c >>>> index 7b2bfc10c1a5..1f3bc3606239 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>> @@ -43,9 +43,61 @@ >>>>     * >>>>     * The jobs in a entity are always scheduled in the order that >>>> they were pushed. >>>>     * >>>> - * Note that once a job was taken from the entities queue and >>>> pushed to the >>>> - * hardware, i.e. the pending queue, the entity must not be >>>> referenced anymore >>>> - * through the jobs entity pointer. >>>> + * Lifetime rules >>>> + * -------------- >>>> + * >>>> + * Getting object lifetimes right across the stack is critical to >>>> avoid UAF >>>> + * issues. The DRM scheduler has the following lifetime rules: >>>> + * >>>> + * - The scheduler must outlive all of its entities. >>>> + * - Jobs pushed to the scheduler are owned by it, and must only be >>>> freed >>>> + *   after the free_job() callback is called. >>>> + * - Scheduler fences are reference-counted and may outlive the >>>> scheduler. >>> >>>> + * - The scheduler *may* be destroyed while jobs are still in flight. >>> >>> That's not correct. The scheduler can only be destroyed after all the >>> entities serving it have been destroyed as well as all the jobs already >>> pushed to the hw finished. >> >> The point of this series is to change this behavior so I can actually >> use the scheduler in my use case, and that begins with formally >> documenting it as Daniel suggested. That is, I need it to be safe for >> jobs to not be yet complete before the scheduler is destroyed (the >> entities do get destroyed first, that's the first bullet point). > > Yeah, but you need to document the current situation not how you like it > to be. Daniel told me to document how I think it should be, then fix the bugs that make it not so. That's what this series does. >> We already had this discussion. Without this guarantee, I cannot build >> a reasonable safe Rust abstraction. Unless you have another >> suggestion, as far as I can tell it's either this or I give up on >> using the DRM scheduler entirely and reimplement something else on my >> own. >> >>> What might be possible to add is that the hw is still working on the >>> already pushed jobs, but so far that was rejected as undesirable. >> >> Where was this rejected? > > Years ago. Our initial driver suspend/resume design relied on that. > Turned out not to be a good idea Times change, maybe it's time to revisit that decision? ~~ Lina