From: Alex Deucher
Date: Mon, 17 Apr 2023 10:04:25 -0400
Subject: Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
To: Rob Clark
Cc: Tvrtko Ursulin, Rob Clark, "open list:DOCUMENTATION", linux-arm-msm@vger.kernel.org,
    Jonathan Corbet, Emil Velikov, Christopher Healy, dri-devel@lists.freedesktop.org,
    open list, Boris Brezillon, Thomas Zimmermann, freedreno@lists.freedesktop.org
On Mon, Apr 17, 2023 at 9:43 AM Rob Clark wrote:
>
> On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin wrote:
> >
> >
> > On 16/04/2023 08:48, Daniel Vetter wrote:
> > > On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> > >> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin wrote:
> > >>>
> > >>>
> > >>> On 13/04/2023 21:05, Daniel Vetter wrote:
> > >>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> > >>>>>
> > >>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
> > >>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> > >>>>>>>
> > >>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> > >>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > >>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > >>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> > >>>>>>>>>>>>> From: Rob Clark
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> > >>>>>>>>>>>>> v3: Do it in core
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Signed-off-by: Rob Clark
> > >>>>>>>>>>>>> Reviewed-by: Emil Velikov
> > >>>>>>>>>>>>> ---
> > >>>>>>>>>>>>>  Documentation/gpu/drm-usage-stats.rst | 21 ++++
> > >>>>>>>>>>>>>  drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++++++
> > >>>>>>>>>>>>>  include/drm/drm_file.h                |  1 +
> > >>>>>>>>>>>>>  include/drm/drm_gem.h                 | 19 +++++++
> > >>>>>>>>>>>>>  4 files changed, 117 insertions(+)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> > >>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > >>>>>>>>>>>>>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > >>>>>>>>>>>>>  indicating kibi- or mebi-bytes.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> > >>>>>>>>>>>>> +than a single handle).
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think this naming maybe does not work best with the existing
> > >>>>>>>>>>>> drm-memory-<region> keys.
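As an illustration (not part of the patch; the driver name, region name and all
values are invented), a client's /proc/<pid>/fdinfo/<fd> combining one of the
pre-existing per-region keys with the proposed totals could look roughly like:

    drm-driver:          somedriver
    drm-client-id:       42
    drm-memory-system:   16384 KiB
    drm-shared-memory:   2048 KiB
    drm-private-memory:  14336 KiB
    drm-resident-memory: 8192 KiB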
> > >>>>>>>>>>>
> > >>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
> > >>>>>>>>>>> drm-memory-<region> keys ;-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> I would have preferred drm-memory-{active,resident,...} but it
> > >>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> > >>>>>>>>>>>
> > >>>>>>>>>>>> How about introduce the concept of a memory region from the start and
> > >>>>>>>>>>>> use naming similar like we do for engines?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 'size' - All reachable objects
> > >>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> > >>>>>>>>>>>> 'resident' - Objects with backing store
> > >>>>>>>>>>>> 'active' - Objects in use, subset of resident
> > >>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> > >>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > >>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
> > >>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
> > >>>>>>>>>>>> which category, and at most one of the two must be output.)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Region names we at most partially standardize. Like we could say
> > >>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> > >>>>>>>>>>>> driver defined.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> > >>>>>>>>>>>> region they support.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think this all also works for objects which can be migrated between
> > >>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> > >>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
> > >>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
> > >>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> > >>>>>>>>>>> just don't use the helper? Or??
> > >>>>>>>>>>
> > >>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > >>>>>>>>>> all works out reasonably consistently?
> > >>>>>>>>>
> > >>>>>>>>> That is basically what we have now. I could append -system to each to
> > >>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> > >>>>>>>>
> > >>>>>>>> What you have isn't really -system, but everything. So doesn't really make
> > >>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
> > >>>>>>>> they don't have stolen or something like that).
> > >>>>>>>>
> > >>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
> > >>>>>>>
> > >>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> > >>>>>>> with the current drm-memory-$REGION by extending, rather than creating
> > >>>>>>> confusion with different order of key name components.
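To make the proposed drm-memory-$CATEGORY-$REGION form concrete, a discrete GPU
with 'system' and 'vram' regions might emit one set of keys per region, roughly
like the following (hypothetical values, only the 'size' and 'resident'
categories shown):

    drm-memory-size-system:     16384 KiB
    drm-memory-resident-system:  8192 KiB
    drm-memory-size-vram:       65536 KiB
    drm-memory-resident-vram:   32768 KiB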
> > >>>>>>
> > >>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
> > >>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > >>>>>> So $CATEGORY before the -memory.
> > >>>>>>
> > >>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> > >>>>>> folks like :-) I don't really care much personally.
> > >>>>>
> > >>>>> Okay I missed the parsing problem.
> > >>>>>
> > >>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
> > >>>>>>> the spec maps to category X, if category component is not present.
> > >>>>>>>
> > >>>>>>> Some examples:
> > >>>>>>>
> > >>>>>>> drm-memory-resident-system:
> > >>>>>>> drm-memory-size-lmem0:
> > >>>>>>> drm-memory-active-vram:
> > >>>>>>>
> > >>>>>>> Etc.. I think it creates a consistent story.
> > >>>>>>>
> > >>>>>>> Other than this, my two I think significant opens which haven't been
> > >>>>>>> addressed yet are:
> > >>>>>>>
> > >>>>>>> 1)
> > >>>>>>>
> > >>>>>>> Why do we want totals (not per region) when userspace can trivially
> > >>>>>>> aggregate if they want. What is the use case?
> > >>>>>>>
> > >>>>>>> 2)
> > >>>>>>>
> > >>>>>>> Current proposal limits the value to whole objects and fixates that by
> > >>>>>>> having it in the common code. If/when some driver is able to support sub-BO
> > >>>>>>> granularity they will need to opt out of the common printer at which point
> > >>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> > >>>>>>> some drivers already support this, I don't know. Given how important VM BIND
> > >>>>>>> is I wouldn't be surprised.
> > >>>>>>
> > >>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
> > >>>>>> the region printing in hopefully a standard way. And that could then also
> > >>>>>> take care of all kinds of partial binding and funny rules (like maybe
> > >>>>>> we want a standard vram region that adds up all the lmem regions on
> > >>>>>> intel, so that all dgpu have a common vram bucket that generic tools
> > >>>>>> understand?).
> > >>>>>
> > >>>>> First part yes, but for the second I would think we want to avoid any
> > >>>>> aggregation in the kernel which can be done in userspace just as well. Such
> > >>>>> total vram bucket would be pretty useless on Intel even since userspace
> > >>>>> needs to be region aware to make use of all resources. It could even be
> > >>>>> counter productive I think - "why am I getting out of memory when half of my
> > >>>>> vram is unused!?".
> > >>>>
> > >>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
> > >>>> style userspace, which might simply have no clue or interest in what lmemX
> > >>>> means, but would understand vram.
> > >>>>
> > >>>> Aggregating makes sense.
> > >>>
> > >>> Lmem vs vram is now an argument not about aggregation but about
> > >>> standardizing region names.
> > >>>
> > >>> One detail also is a change in philosophy compared to engine stats where
> > >>> engine names are not centrally prescribed and it was expected userspace
> > >>> will have to handle things generically and with some vendor specific
> > >>> knowledge.
> > >>>
> > >>> Like in my gputop patches. It doesn't need to understand what is what,
> > >>> it just finds what's there and presents it to the user.
> > >>>
> > >>> Come some accel driver with local memory it wouldn't be vram any more.
> > >>> Or even a headless data center GPU.
> > >>> So I really don't think it is good
> > >>> to hardcode 'vram' in the spec, or midlayer, or helpers.
> > >>>
> > >>> And for aggregation.. again, userspace can do it just as well. If we do
> > >>> it in kernel then immediately we have multiple sets of keys to output
> > >>> for any driver which wants to show the region view. IMO it is just
> > >>> pointless work in the kernel and more code in the kernel, when userspace
> > >>> can do it.
> > >>>
> > >>> Proposal A (on a discrete gpu, one category only):
> > >>>
> > >>> drm-resident-memory: x KiB
> > >>> drm-resident-memory-system: x KiB
> > >>> drm-resident-memory-vram: x KiB
> > >>>
> > >>> Two loops in the kernel, more parsing in userspace.
> > >>
> > >> why would it be more than one loop, ie.
> > >>
> > >> mem.resident += size;
> > >> mem.category[cat].resident += size;
> > >>
> > >> At the end of the day, there is limited real-estate to show a million
> > >> different columns of information. Even the gputop patches I posted
> > >> don't show everything of what is currently there. And nvtop only
> > >> shows toplevel resident stat. So I think the "everything" stat is
> > >> going to be what most tools use.
> > >
> > > Yeah with enough finesse the double-loop isn't needed, it's just the
> > > simplest possible approach.
> > >
> > > Also this is fdinfo, I _really_ want perf data showing that it's a
> > > real-world problem when we conjecture about algorithmic complexity.
> > > procutils have been algorithmically garbage since decades after all :-)
> >
> > Just run it. :)
> >
> > Algorithmic complexity is quite obvious and not a conjecture - to find
> > DRM clients you have to walk _all_ pids and _all_ fds under them. So
> > amount of work can scale very quickly and even _not_ with the number of
> > DRM clients.
> >
> > It's not too bad on my desktop setup but it is significantly more CPU
> > intensive than top(1).
> >
> > It would be possible to optimise the current code some more by not
> > parsing full fdinfo (may become more important as number of keys grow),
> > but that's only relevant when number of drm fds is large. It doesn't
> > solve the basic pids * open fds search for which we'd need a way to walk
> > the list of pids with drm fds directly.
>
> All of which has (almost[1]) nothing to do with one loop or two
> (ignoring for a moment that I already pointed out a single loop is all
> that is needed). If CPU overhead is a problem, we could perhaps come
> up with some sysfs which has one file per drm_file and side-step crawling
> of all of the proc * fd. I'll play around with it some but I'm pretty
> sure you are trying to optimize the wrong thing.

Yeah, we have customers that would like a single interface (IOCTL or
sysfs) to get all of this info rather than having to walk a ton of
files and do effectively two syscalls to accumulate all of this data
for all of the processes on the system.

Alex

>
> BR,
> -R
>
> [1] generally a single process using drm has multiple fd's pointing at
> the same drm_file.. which makes the current approach of having to read
> fdinfo to find the client-id sub-optimal. But still the total # of
> proc * fd is much larger
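For illustration of the single-loop accounting argued above, here is a minimal
stand-alone sketch (all struct and field names are invented for the example and
are not the actual GEM or fdinfo implementation): a single pass over a client's
buffer objects is enough to fill both the overall totals and a per-region
breakdown, so emitting region-level keys does not require a second loop.

#include <stdio.h>
#include <stddef.h>

enum region { REGION_SYSTEM, REGION_VRAM, REGION_COUNT };

struct bo {                   /* stand-in for a GEM buffer object */
	size_t size;          /* object size in bytes */
	enum region region;   /* current placement */
	int handle_count;     /* >1 means shared with another file */
	int resident;         /* non-zero if it has backing store */
};

struct mem_stats {
	size_t size, shared, priv, resident;
};

int main(void)
{
	struct bo bos[] = {
		{ 4096,    REGION_SYSTEM, 1, 1 },
		{ 1 << 20, REGION_VRAM,   2, 1 },
		{ 8192,    REGION_SYSTEM, 1, 0 },
	};
	struct mem_stats total = {0};
	struct mem_stats per_region[REGION_COUNT] = {{0}};
	size_t i;

	/* One pass fills both the driver-wide totals and per-region buckets. */
	for (i = 0; i < sizeof(bos) / sizeof(bos[0]); i++) {
		struct bo *b = &bos[i];
		struct mem_stats *r = &per_region[b->region];

		total.size += b->size;
		r->size += b->size;
		if (b->handle_count > 1) {
			total.shared += b->size;
			r->shared += b->size;
		} else {
			total.priv += b->size;
			r->priv += b->size;
		}
		if (b->resident) {
			total.resident += b->size;
			r->resident += b->size;
		}
	}

	/* Print only the 'resident' category, matching the Proposal A example. */
	printf("drm-resident-memory:\t%zu KiB\n", total.resident / 1024);
	printf("drm-resident-memory-system:\t%zu KiB\n",
	       per_region[REGION_SYSTEM].resident / 1024);
	printf("drm-resident-memory-vram:\t%zu KiB\n",
	       per_region[REGION_VRAM].resident / 1024);
	return 0;
}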