Date: Wed, 27 Sep 2023 09:25:14 +0200
From: Boris Brezillon
To: Danilo Krummrich
Cc: Luben Tuikov, airlied@gmail.com, daniel@ffwll.ch,
 matthew.brost@intel.com, christian.koenig@amd.com,
 faith.ekstrand@collabora.com, dri-devel@lists.freedesktop.org,
 nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH drm-misc-next 1/3] drm/sched: implement dynamic job flow control
Message-ID: <20230927092514.04776822@collabora.com>
In-Reply-To: <12f19494-7fd0-055f-4135-e17581398eb5@redhat.com>
References: <20230924224555.15595-1-dakr@redhat.com>
 <312983ee-ba4c-477e-8846-072c815df862@amd.com>
 <12f19494-7fd0-055f-4135-e17581398eb5@redhat.com>
Organization: Collabora

On Wed, 27 Sep 2023 02:13:59 +0200
Danilo Krummrich wrote:

> On 9/26/23 22:43, Luben Tuikov wrote:
> > Hi,
> >
> > On 2023-09-24 18:43, Danilo Krummrich wrote:
> >> Currently, job flow control is
implemented simply by limiting the amount
> >> of jobs in flight. Therefore, a scheduler is initialized with a
> >> submission limit that corresponds to a certain amount of jobs.
> >
> > "certain"? How about this instead:
> > " ... that corresponds to the number of jobs which can be sent
> > to the hardware."?
> >
> >>
> >> This implies that for each job drivers need to account for the maximum
> >                                              ^,
> > Please add a comma after "job".
> >
> >> job size possible in order to not overflow the ring buffer.
> >
> > Well, different hardware designs would implement this differently.
> > Ideally, you only want pointers into the ring buffer, and then
> > the hardware consumes as much as it can. But this is a moot point
> > and it's always a good idea to have a "job size" hint from the client.
> > So this is a good patch.
> >
> > Ideally, you want to say that the hardware needs to be able to
> > accommodate the number of jobs which can fit in the hardware
> > queue times the largest job. This is a waste of resources
> > however, and it is better to give a hint as to the size of a job,
> > by the client. If the hardware can peek and understand dependencies,
> > on top of knowing the "size of the job", it can be an extremely
> > efficient scheduler.
> >
> >>
> >> However, there are drivers, such as Nouveau, where the job size has a
> >> rather large range. For such drivers it can easily happen that job
> >> submissions not even filling the ring by 1% can block subsequent
> >> submissions, which, in the worst case, can lead to the ring run dry.
> >>
> >> In order to overcome this issue, allow for tracking the actual job size
> >> instead of the amount job jobs. Therefore, add a field to track a job's
> >
> > "the amount job jobs." --> "the number of jobs."
>
> Yeah, I somehow manage to always get this wrong, which I guess you noticed
> below already.
>
> That's all good points below - gonna address them.
>
> Did you see Boris' response regarding a separate callback in order to fetch
> the job's submission units dynamically? Since this is needed by PowerVR, I'd
> like to include this in V2. What's your take on that?
>
> My only concern with that would be that if I got what Boris was saying
> correctly calling
>
> WARN_ON(s_job->submission_units > sched->submission_limit);
>
> from drm_sched_can_queue() wouldn't work anymore, since this could indeed
> happen temporarily. I think this was also Christian's concern.

Actually, I think that's fine to account for the max job size in the first
check, we're unlikely to have so many native fence waits that our job can't
fit in an empty ring buffer.
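[Editor's illustration, not part of the thread: the flow-control scheme under
discussion - admitting a job when its submission units fit into the remaining
ring capacity, rather than merely counting jobs - can be sketched in plain
userspace C. Only the names submission_units/submission_limit come from the
quoted snippet; every other name here is hypothetical and this is not the
actual drm_sched implementation.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of unit-based (credit-based) job flow control. */
struct toy_sched {
	unsigned int submission_limit; /* total units the ring can hold */
	unsigned int units_in_flight;  /* units consumed by queued jobs */
};

struct toy_job {
	unsigned int submission_units; /* size of this job, in ring units */
};

/* A job that exceeds the total limit can never fit, even in an empty ring. */
static bool toy_job_fits_at_all(const struct toy_sched *s,
				const struct toy_job *j)
{
	return j->submission_units <= s->submission_limit;
}

/* Rough analogue of the drm_sched_can_queue() check discussed above:
 * admit the job only when its units fit into the remaining capacity. */
static bool toy_can_queue(const struct toy_sched *s, const struct toy_job *j)
{
	return toy_job_fits_at_all(s, j) &&
	       s->units_in_flight + j->submission_units <= s->submission_limit;
}

static void toy_queue(struct toy_sched *s, const struct toy_job *j)
{
	s->units_in_flight += j->submission_units;
}

static void toy_complete(struct toy_sched *s, const struct toy_job *j)
{
	s->units_in_flight -= j->submission_units;
}
```

[This also shows why a small job can be blocked behind a large in-flight one:
admission depends on remaining units, not on the number of jobs queued.]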