From: Asahi Lina
Date: Fri, 14 Jul 2023 17:21:29 +0900
Subject: [PATCH 1/3] drm/scheduler: Add more documentation
Message-Id: <20230714-drm-sched-fixes-v1-1-c567249709f7@asahilina.net>
References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal,
 Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 asahi@lists.linux.dev, Asahi Lina

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
  *
  * The jobs in a entity are always scheduled in the order that they were pushed.
  *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.
+ * - The scheduler *may* be destroyed while jobs are still in flight.
+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job finished
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else. If these fences are used in any other
+ * context, then the driver *must* signal them, per the usual fence signaling
+ * rules.
+ *
+ * Resource management
+ * -------------------
+ *
+ * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
+ * to run a job. This must happen in the job's prepare_job() callback, not in
+ * the run_job() callback. If a resource is unavailable at prepare time, the
+ * driver must return a suitable fence that can be waited on until the
+ * resource (potentially) becomes available.
+ *
+ * In order to avoid deadlocks, drivers must always acquire resources in the
+ * same order, and release them in opposite order when a job completes or if
+ * resource acquisition fails.
  */
 
 #include
-- 
2.40.1
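
For illustration only, and not part of the patch: the ordering rule in the
"Resource management" section could be sketched in plain userspace C roughly as
below. Everything here is hypothetical (enum res_kind, struct res_pool,
struct toy_job, toy_job_prepare(), toy_job_release()); none of it is DRM
scheduler API. The sketch assumes a fixed global acquisition order given by the
enum, and it returns the index of the blocking resource where a real driver
would return a fence to wait on from its prepare step.

```c
#include <stdbool.h>

/* Hypothetical resource kinds; the enum order is the global acquisition order. */
enum res_kind { RES_VM_ID, RES_QUEUE_SLOT, RES_COUNT };

/* Toy pool: number of free units for each resource kind. */
struct res_pool {
	int free_units[RES_COUNT];
};

/* Toy job: records which resources it currently holds. */
struct toy_job {
	bool held[RES_COUNT];
};

/* Take one unit of a resource if any is free. */
static bool res_try_get(struct res_pool *pool, enum res_kind kind)
{
	if (pool->free_units[kind] == 0)
		return false;
	pool->free_units[kind]--;
	return true;
}

/* Give one unit of a resource back to the pool. */
static void res_put(struct res_pool *pool, enum res_kind kind)
{
	pool->free_units[kind]++;
}

/* Release everything the job holds, in the opposite order of acquisition. */
static void toy_job_release(struct res_pool *pool, struct toy_job *job)
{
	for (int kind = RES_COUNT - 1; kind >= 0; kind--) {
		if (job->held[kind]) {
			res_put(pool, (enum res_kind)kind);
			job->held[kind] = false;
		}
	}
}

/*
 * Stand-in for a prepare step. Acquires every resource in enum order.
 * Returns -1 once the job holds everything it needs. If a resource is
 * unavailable, rolls back what was taken so far (in reverse order) and
 * returns the index of the resource the caller should wait for.
 */
static int toy_job_prepare(struct res_pool *pool, struct toy_job *job)
{
	for (int kind = 0; kind < RES_COUNT; kind++) {
		if (!res_try_get(pool, (enum res_kind)kind)) {
			toy_job_release(pool, job);
			return kind;
		}
		job->held[kind] = true;
	}
	return -1;
}
```

Because every job takes RES_VM_ID before RES_QUEUE_SLOT and drops whatever it
holds when it has to wait, two jobs in this toy model cannot each hold one
resource while waiting for the other.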