Date: Wed, 27 Sep 2023 13:54:41 +0200
From: Boris Brezillon
To: Danilo Krummrich
Cc: Luben Tuikov, airlied@gmail.com, daniel@ffwll.ch, matthew.brost@intel.com,
 christian.koenig@amd.com, faith.ekstrand@collabora.com,
 dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control
Message-ID: <20230927135441.064795f0@collabora.com>
In-Reply-To:
References: <20230924224555.15595-1-dakr@redhat.com>
 <312983ee-ba4c-477e-8846-072c815df862@amd.com>
 <12f19494-7fd0-055f-4135-e17581398eb5@redhat.com>
 <20230927092514.04776822@collabora.com>
Organization: Collabora

On Wed, 27 Sep 2023 13:45:37 +0200
Danilo Krummrich wrote:

> On 9/27/23 09:25, Boris Brezillon wrote:
> > On Wed, 27 Sep 2023 02:13:59 +0200
> > Danilo Krummrich wrote:
> >
> >> On
9/26/23 22:43, Luben Tuikov wrote:
> >>> Hi,
> >>>
> >>> On 2023-09-24 18:43, Danilo Krummrich wrote:
> >>>> Currently, job flow control is implemented simply by limiting the amount
> >>>> of jobs in flight. Therefore, a scheduler is initialized with a
> >>>> submission limit that corresponds to a certain amount of jobs.
> >>>
> >>> "certain"? How about this instead:
> >>> " ... that corresponds to the number of jobs which can be sent
> >>> to the hardware."?
> >>>
> >>>>
> >>>> This implies that for each job drivers need to account for the maximum
> >>>                                ^,
> >>> Please add a comma after "job".
> >>>
> >>>> job size possible in order to not overflow the ring buffer.
> >>>
> >>> Well, different hardware designs would implement this differently.
> >>> Ideally, you only want pointers into the ring buffer, and then
> >>> the hardware consumes as much as it can. But this is a moot point
> >>> and it's always a good idea to have a "job size" hint from the client.
> >>> So this is a good patch.
> >>>
> >>> Ideally, you want to say that the hardware needs to be able to
> >>> accommodate the number of jobs which can fit in the hardware
> >>> queue times the largest job. This is a waste of resources
> >>> however, and it is better to give a hint as to the size of a job,
> >>> by the client. If the hardware can peek and understand dependencies,
> >>> on top of knowing the "size of the job", it can be an extremely
> >>> efficient scheduler.
> >>>
> >>>>
> >>>> However, there are drivers, such as Nouveau, where the job size has a
> >>>> rather large range. For such drivers it can easily happen that job
> >>>> submissions not even filling the ring by 1% can block subsequent
> >>>> submissions, which, in the worst case, can lead to the ring run dry.
> >>>>
> >>>> In order to overcome this issue, allow for tracking the actual job size
> >>>> instead of the amount job jobs. Therefore, add a field to track a job's
> >>>
> >>> "the amount job jobs." --> "the number of jobs."
> >>
> >> Yeah, I somehow manage to always get this wrong, which I guess you noticed
> >> below already.
> >>
> >> That's all good points below - gonna address them.
> >>
> >> Did you see Boris' response regarding a separate callback in order to fetch
> >> the job's submission units dynamically? Since this is needed by PowerVR, I'd
> >> like to include this in V2. What's your take on that?
> >>
> >> My only concern with that would be that if I got what Boris was saying
> >> correctly calling
> >>
> >> WARN_ON(s_job->submission_units > sched->submission_limit);
> >>
> >> from drm_sched_can_queue() wouldn't work anymore, since this could indeed happen
> >> temporarily. I think this was also Christian's concern.
> >
> > Actually, I think that's fine to account for the max job size in the
> > first check, we're unlikely to have so many native fence waits that our
> > job can't fit in an empty ring buffer.
> >
>
> But it can happen, right? Hence, we can't have this check, do we?
>

In theory, yes; in practice, given the size of the ring buffers and the size
of a fence wait command, I guess we can refuse to queue a job (at the driver
level) if the maximum job size (static + maximum dynamic part of the job)
doesn't fit in the ring buffer. If that ever becomes a problem, we can
revisit it at that point.