From: Rob Clark
Date: Mon, 17 Apr 2023 06:42:48 -0700
Subject: Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
To: Tvrtko Ursulin
Cc: Rob Clark, Jonathan Corbet, linux-arm-msm@vger.kernel.org, "open list:DOCUMENTATION", Emil Velikov, Christopher Healy, dri-devel@lists.freedesktop.org, open list, Boris Brezillon, Thomas Zimmermann, freedreno@lists.freedesktop.org

On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin wrote:
>
> On 16/04/2023 08:48, Daniel Vetter wrote:
> > On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> >> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin wrote:
> >>>
> >>> On 13/04/2023 21:05, Daniel Vetter wrote:
> >>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
> >>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>>>>>>> From: Rob Clark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>>>>>>> v3: Do it in core
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Rob Clark
> >>>>>>>>>>>>> Reviewed-by: Emil Velikov
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>>  Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>>>>>>  drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>  include/drm/drm_file.h                |  1 +
> >>>>>>>>>>>>>  include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>>>>>>  4 files changed, 117 insertions(+)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>>>>>>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>>>>>>  indicating kibi- or mebi-bytes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +- drm-shared-memory: [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>>>>>>> +than a single handle).
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +- drm-private-memory: [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +- drm-resident-memory: [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>>>>>>> drm-memory- keys.
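
For illustration (not part of the patch): the three keys above are per-client
totals over the drm_file's GEM handles. A rough sketch of how such a printer
could be structured - drm_gem_object_is_resident() is a hypothetical helper
here, and the real drm_file.c code may differ:

  /* Sketch only; iterates the client's handle table and buckets object sizes. */
  static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
  {
          struct drm_gem_object *obj;
          u64 shared = 0, private = 0, resident = 0;
          int id;

          spin_lock(&file->table_lock);
          idr_for_each_entry(&file->object_idr, obj, id) {
                  if (obj->handle_count > 1)
                          shared += obj->size;   /* more than a single handle */
                  else
                          private += obj->size;
                  if (drm_gem_object_is_resident(obj))  /* hypothetical helper */
                          resident += obj->size;
          }
          spin_unlock(&file->table_lock);

          drm_printf(p, "drm-shared-memory:\t%llu KiB\n", shared >> 10);
          drm_printf(p, "drm-private-memory:\t%llu KiB\n", private >> 10);
          drm_printf(p, "drm-resident-memory:\t%llu KiB\n", resident >> 10);
  }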
> >>>>>>>>>>>
> >>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>>>>>>> drm-memory- keys ;-)
> >>>>>>>>>>>
> >>>>>>>>>>> I wouldn't have minded drm-memory-{active,resident,...} but it
> >>>>>>>>>>> could be mis-parsed by existing userspace, so my hands were a bit tied.
> >>>>>>>>>>>
> >>>>>>>>>>>> How about introducing the concept of a memory region from the start and
> >>>>>>>>>>>> using naming similar to what we do for engines?
> >>>>>>>>>>>>
> >>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 'size' - All reachable objects
> >>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>>>>>>> 'resident' - Objects with backing store
> >>>>>>>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>>>>>>> it right), which could be desirable for a simplified mental model.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>>>>>>> 'resident' above. In any case we can document which category the legacy
> >>>>>>>>>>>> key is equal to, and at most one of the two must be output.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>>>>>>> 'system' is to be used where the backing store is system RAM and others
> >>>>>>>>>>>> are driver defined.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>>>>>>> region they support.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not too sure how to reconcile different memory regions with this,
> >>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>>>>>>> just don't use the helper? Or??
> >>>>>>>>>>
> >>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>>>>>>> all works out reasonably consistently?
> >>>>>>>>>
> >>>>>>>>> That is basically what we have now. I could append -system to each to
> >>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>>>>>>
> >>>>>>>> What you have isn't really -system, but everything. So it doesn't really
> >>>>>>>> make sense to me to mark this -system; it's only really true for integrated
> >>>>>>>> (if they don't have stolen or something like that).
> >>>>>>>>
> >>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
> >>>>>>>
> >>>>>>> Right, so my proposal was drm-memory-$CATEGORY-$REGION, which I think aligns
> >>>>>>> with the current drm-memory-$REGION by extending it, rather than creating
> >>>>>>> confusion with a different order of key name components.
> >>>>>>
> >>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
> >>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> >>>>>> So $CATEGORY before the -memory.
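
For illustration (not part of the thread's patches): the parsing concern is
that existing consumers key off the "drm-memory-" prefix and treat whatever
follows as a region name, so a key with the category after "-memory-" gets
mis-read. A minimal userspace sketch of that failure mode, with made-up values:

  #include <stdio.h>
  #include <string.h>

  /* Mimics a legacy parser: everything after "drm-memory-" is taken as the region. */
  static void parse_line(const char *line)
  {
          const char *prefix = "drm-memory-";

          if (!strncmp(line, prefix, strlen(prefix))) {
                  const char *region = line + strlen(prefix);
                  printf("region: %.*s\n", (int)strcspn(region, ":"), region);
          }
  }

  int main(void)
  {
          parse_line("drm-memory-vram: 8192 KiB");            /* "vram", as intended */
          parse_line("drm-memory-resident-system: 4096 KiB"); /* mis-read as region "resident-system" */
          return 0;
  }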
> >>>>>>
> >>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> >>>>>> folks like :-) I don't really care much personally.
> >>>>>
> >>>>> Okay I missed the parsing problem.
> >>>>>
> >>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
> >>>>>>> the spec as mapping to category X if the category component is not present.
> >>>>>>>
> >>>>>>> Some examples:
> >>>>>>>
> >>>>>>> drm-memory-resident-system:
> >>>>>>> drm-memory-size-lmem0:
> >>>>>>> drm-memory-active-vram:
> >>>>>>>
> >>>>>>> Etc. I think it creates a consistent story.
> >>>>>>>
> >>>>>>> Other than this, my two significant opens which I think haven't been
> >>>>>>> addressed yet are:
> >>>>>>>
> >>>>>>> 1)
> >>>>>>>
> >>>>>>> Why do we want totals (not per region) when userspace can trivially
> >>>>>>> aggregate if it wants to? What is the use case?
> >>>>>>>
> >>>>>>> 2)
> >>>>>>>
> >>>>>>> The current proposal limits the value to whole objects and fixates that by
> >>>>>>> having it in the common code. If/when some driver is able to support sub-BO
> >>>>>>> granularity they will need to opt out of the common printer, at which point
> >>>>>>> it may be less churn to start with a helper rather than a mid-layer. Or maybe
> >>>>>>> some drivers already support this, I don't know. Given how important VM BIND
> >>>>>>> is I wouldn't be surprised.
> >>>>>>
> >>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
> >>>>>> the region printing in hopefully a standard way. And that could then also
> >>>>>> take care of all kinds of partial binding and funny rules (like maybe
> >>>>>> we want a standard vram region that adds up all the lmem regions on
> >>>>>> intel, so that all dgpu have a common vram bucket that generic tools
> >>>>>> understand?).
> >>>>>
> >>>>> First part yes, but for the second I would think we want to avoid any
> >>>>> aggregation in the kernel which can be done in userspace just as well. Such
> >>>>> a total vram bucket would be pretty useless on Intel anyway, since userspace
> >>>>> needs to be region aware to make use of all resources. It could even be
> >>>>> counterproductive I think - "why am I getting out of memory when half of my
> >>>>> vram is unused!?".
> >>>>
> >>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
> >>>> style userspace, which might simply have no clue or interest in what lmemX
> >>>> means, but would understand vram.
> >>>>
> >>>> Aggregating makes sense.
> >>>
> >>> Lmem vs vram is now an argument not about aggregation but about
> >>> standardizing region names.
> >>>
> >>> One detail also is a change in philosophy compared to engine stats, where
> >>> engine names are not centrally prescribed and it was expected userspace
> >>> would have to handle things generically and with some vendor specific
> >>> knowledge.
> >>>
> >>> Like in my gputop patches. It doesn't need to understand what is what,
> >>> it just finds what's there and presents it to the user.
> >>>
> >>> Come some accel driver with local memory, it wouldn't be vram any more.
> >>> Or even a headless data center GPU. So I really don't think it is good
> >>> to hardcode 'vram' in the spec, or midlayer, or helpers.
> >>>
> >>> And for aggregation.. again, userspace can do it just as well. If we do
> >>> it in the kernel then we immediately have multiple sets of keys to output
> >>> for any driver which wants to show the region view. IMO it is just
> >>> pointless work in the kernel and more code in the kernel, when userspace
> >>> can do it.
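
For illustration (not part of the thread's patches): "userspace can do it"
means a gputop-style tool can sum per-region keys into a total itself. A rough
sketch, assuming the per-region key layout under discussion
("drm-resident-memory-<region>:") and, for simplicity, values already in KiB:

  #include <stdio.h>
  #include <string.h>

  /* Sum every drm-resident-memory-<region> line of one client's fdinfo. */
  static unsigned long long total_resident_kib(FILE *fdinfo)
  {
          const char *prefix = "drm-resident-memory-";
          char line[256];
          unsigned long long total = 0, val;

          while (fgets(line, sizeof(line), fdinfo)) {
                  const char *colon = strchr(line, ':');

                  if (!strncmp(line, prefix, strlen(prefix)) && colon &&
                      sscanf(colon + 1, "%llu", &val) == 1)
                          total += val;
          }
          return total;
  }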
> >>>
> >>> Proposal A (on a discrete gpu, one category only):
> >>>
> >>> drm-resident-memory: x KiB
> >>> drm-resident-memory-system: x KiB
> >>> drm-resident-memory-vram: x KiB
> >>>
> >>> Two loops in the kernel, more parsing in userspace.
> >>
> >> why would it be more than one loop, ie.
> >>
> >>   mem.resident += size;
> >>   mem.category[cat].resident += size;
> >>
> >> At the end of the day, there is limited real estate to show a million
> >> different columns of information. Even the gputop patches I posted
> >> don't show everything that is currently there. And nvtop only shows
> >> the top-level resident stat. So I think the "everything" stat is
> >> going to be what most tools use.
> >
> > Yeah with enough finesse the double-loop isn't needed, it's just the
> > simplest possible approach.
> >
> > Also this is fdinfo, I _really_ want perf data showing that it's a
> > real-world problem when we conjecture about algorithmic complexity.
> > procutils have been algorithmically garbage for decades after all :-)
>
> Just run it. :)
>
> Algorithmic complexity is quite obvious and not a conjecture - to find
> DRM clients you have to walk _all_ pids and _all_ fds under them. So
> the amount of work can scale very quickly, and not even with the number
> of DRM clients.
>
> It's not too bad on my desktop setup but it is significantly more CPU
> intensive than top(1).
>
> It would be possible to optimise the current code some more by not
> parsing the full fdinfo (which may become more important as the number
> of keys grows), but that's only relevant when the number of drm fds is
> large. It doesn't solve the basic pids * open fds search, for which we'd
> need a way to walk the list of pids with drm fds directly.

All of which has (almost[1]) nothing to do with one loop or two (ignoring
for a moment that I already pointed out a single loop is all that is
needed).

If CPU overhead is a problem, we could perhaps come up with some sysfs
which has one file per drm_file and side-step crawling all of the
proc * fd. I'll play around with it some, but I'm pretty sure you are
trying to optimize the wrong thing.

BR,
-R

[1] generally a single process using drm has multiple fd's pointing at
the same drm_file.. which makes the current approach of having to read
fdinfo to find the client-id sub-optimal. But still the total # of
proc * fd is much larger.
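
For illustration (not part of the thread's patches) - the single-loop point
above, expanded into a sketch; the struct, enum and region-count names here
are made up, not an existing kernel API:

  #include <linux/types.h>

  #define NUM_REGIONS 4   /* hypothetical; a driver knows its own region count */

  enum mem_category {
          MEM_SHARED,
          MEM_PRIVATE,
          MEM_RESIDENT,
          MEM_NUM_CATEGORIES,
  };

  struct mem_stats {
          u64 total[MEM_NUM_CATEGORIES];                /* aggregate across regions */
          u64 region[NUM_REGIONS][MEM_NUM_CATEGORIES];  /* per-region breakdown */
  };

  /* One pass over a client's objects updates both views, so emitting the
   * aggregate keys alongside per-region keys does not need a second loop. */
  static void account_obj(struct mem_stats *stats, enum mem_category cat,
                          unsigned int region, u64 size)
  {
          stats->total[cat] += size;
          stats->region[region][cat] += size;
  }

Whether those per-region buckets are then summed in the kernel or left to
userspace is exactly the open question above.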