Subject: Re: [Regression] drm/scheduler: track GPU active time per entity
From: Lucas Stach
To: Luben Tuikov, Danilo Krummrich, daniel@ffwll.ch, Dave Airlie,
    Bagas Sanjaya, andrey.grodzovsky@amd.com, Christian König,
    dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Date: Wed, 05 Apr 2023 19:44:34 +0200
In-Reply-To: <8b28151c-f2db-af3f-8dff-87dd5d57417b@amd.com>
References: <3e00d8a9-b6c4-8202-4f2d-5a659c61d094@redhat.com>
    <2a84875dde6565842aa07ddb96245b7d939cb4fd.camel@pengutronix.de>
    <8b28151c-f2db-af3f-8dff-87dd5d57417b@amd.com>

Hi Luben,

On Tuesday, 04.04.2023 at 00:31 -0400, Luben Tuikov wrote:
> On 2023-03-28 04:54, Lucas Stach wrote:
> > Hi Danilo,
> >
> > On Tuesday, 28.03.2023 at 02:57 +0200, Danilo Krummrich wrote:
> > > Hi all,
> > >
> > > Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
> > > tries to track the accumulated time that a job was active on the GPU
> > > writing it to the entity through which the job was deployed to the
> > > scheduler originally. This is done within drm_sched_get_cleanup_job()
> > > which fetches a job from the scheduler's pending_list.
> > >
> > > Doing this can result in a race condition where the entity is already
> > > freed, but the entity's newly added elapsed_ns field is still accessed
> > > once the job is fetched from the pending_list.
> > >
> > > After drm_sched_entity_destroy() being called it should be safe to free
> > > the structure that embeds the entity. However, a job originally handed
> > > over to the scheduler by this entity might still reside in the
> > > scheduler's pending_list for cleanup after drm_sched_entity_destroy()
> > > already being called and the entity being freed. Hence, we can run into
> > > a UAF.
> > >
> > Sorry about that, I clearly didn't properly consider this case.
> >
> > > In my case it happened that a job, as explained above, was just picked
> > > from the scheduler's pending_list after the entity was freed due to the
> > > client application exiting. Meanwhile this freed up memory was already
> > > allocated for a subsequent client application's job structure again.
> > > Hence, the new job's memory got corrupted. Luckily, I was able to
> > > reproduce the same corruption over and over again by just using
> > > deqp-runner to run a specific set of VK test cases in parallel.
> > >
> > > Fixing this issue doesn't seem to be very straightforward though (unless
> > > I miss something), which is why I'm writing this mail instead of sending
> > > a fix directly.
> > >
> > > Spontaneously, I see three options to fix it:
> > >
> > > 1. Rather than embedding the entity into driver specific structures
> > > (e.g. tied to file_priv) we could allocate the entity separately and
> > > reference count it, such that it's only freed up once all jobs that were
> > > deployed through this entity are fetched from the scheduler's pending list.
> > >
> > My vote is on this or something in a similar vein for the long term. I
> > have some hope to be able to add a GPU scheduling algorithm with a bit
> > more fairness than the current one sometime in the future, which
> > requires execution time tracking on the entities.
>
> Danilo,
>
> Using kref is preferable, i.e. option 1 above.
>
> Lucas, can you shed some light on,
>
> 1. In what way the current FIFO scheduling is unfair, and
> 2. shed some details on this "scheduling algorithm with a bit
> more fairness than the current one"?

I don't have a specific implementation in mind yet. However, the current
FIFO algorithm can be very unfair if you have a sparse workload competing
with one that generates a lot of jobs without any throttling aside from
the entity queue length. By tracking the actual GPU time consumed by the
entities we could implement something with a bit more fairness like
deficit round robin (don't pin me down on the specific algorithm, as I
haven't given it much thought yet).

Regards,
Lucas
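
For illustration only, here is a minimal userspace sketch of the deficit
round robin idea mentioned above, next to a FIFO baseline. Everything in
it is an assumption made for the example: struct sim_entity, sim_fifo()
and sim_drr() are hypothetical names and not drm/scheduler API, the FIFO
baseline assumes each entity submitted all of its jobs before the next
entity in the array, and job costs are assumed to be known up front,
whereas a real scheduler would only learn the GPU time once a job has
completed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a scheduler entity. */
struct sim_entity {
	const uint64_t *cost_ns; /* GPU time of each queued job (assumed known) */
	size_t njobs;            /* number of queued jobs */
	size_t next;             /* index of the next job to run */
	uint64_t deficit_ns;     /* DRR credit carried over between rounds */
	uint64_t finish_ns;      /* time at which the last executed job finished */
};

/* Run the entity's next job and advance the simulated clock. */
static void run_job(struct sim_entity *e, uint64_t *now_ns)
{
	*now_ns += e->cost_ns[e->next++];
	e->finish_ns = *now_ns;
}

/*
 * FIFO baseline. Assumes entity i submitted all of its jobs before
 * entity i + 1, so draining the entities in array order matches the
 * global submission order.
 */
uint64_t sim_fifo(struct sim_entity *ents, size_t n)
{
	uint64_t now = 0;

	for (size_t i = 0; i < n; i++)
		while (ents[i].next < ents[i].njobs)
			run_job(&ents[i], &now);
	return now;
}

/*
 * Deficit round robin. In every round each backlogged entity earns
 * quantum_ns of credit and may run jobs for as long as the next job
 * fits into the accumulated credit. Credit is dropped once the queue
 * runs empty, so an idle entity cannot hoard it.
 */
uint64_t sim_drr(struct sim_entity *ents, size_t n, uint64_t quantum_ns)
{
	uint64_t now = 0;
	size_t backlogged;

	assert(quantum_ns > 0); /* a zero quantum would never make progress */

	do {
		backlogged = 0;
		for (size_t i = 0; i < n; i++) {
			struct sim_entity *e = &ents[i];

			if (e->next == e->njobs)
				continue;

			e->deficit_ns += quantum_ns;
			while (e->next < e->njobs &&
			       e->cost_ns[e->next] <= e->deficit_ns) {
				e->deficit_ns -= e->cost_ns[e->next];
				run_job(e, &now);
			}

			if (e->next == e->njobs)
				e->deficit_ns = 0;
			else
				backlogged++;
		}
	} while (backlogged);

	return now;
}
```

With a heavy entity that queued four jobs of 10 time units ahead of a
sparse entity with a single job of 2 units, sim_fifo() completes the
sparse job at t=42, while sim_drr() with a quantum of 5 completes it at
t=2. Total runtime is 42 in both cases, so the gain for the sparse
entity comes purely from reordering.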