Date: Wed, 26 Jul 2023 12:41:27 +0100
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
To: Maarten Lankhorst, Tejun Heo
Cc: Intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 Johannes Weiner, Zefan Li, Dave Airlie, Daniel Vetter, Rob Clark,
 Stéphane Marchesin, "T. J. Mercier", Kenny.Ho@amd.com,
 Christian König, Brian Welty, Tvrtko Ursulin, Eero Tamminen
Subject: Re: [PATCH 16/17] cgroup/drm: Expose memory stats
Message-ID: <89d7181c-6830-ca6e-0c39-caa49d14d474@linux.intel.com>
References: <20230712114605.519432-1-tvrtko.ursulin@linux.intel.com>
 <20230712114605.519432-17-tvrtko.ursulin@linux.intel.com>

On 26/07/2023 11:14, Maarten Lankhorst wrote:
> Hey,
>
> On 2023-07-22 00:21, Tejun Heo wrote:
>> On Wed, Jul 12, 2023 at 12:46:04PM +0100, Tvrtko Ursulin wrote:
>>>    $ cat drm.memory.stat
>>>    card0 region=system total=12898304 shared=0 active=0
>>> resident=12111872 purgeable=167936
>>>    card0 region=stolen-system total=0 shared=0 active=0 resident=0
>>> purgeable=0
>>>
>>> Data is generated on demand for simplicty of implementation ie. no
>>> running totals are kept or accounted during migrations and such. Various
>>> optimisations such as cheaper collection of data are possible but
>>> deliberately left out for now.
>>>
>>> Overall, the feature is deemed to be useful to container orchestration
>>> software (and manual management).
>>>
>>> Limits, either soft or hard, are not envisaged to be implemented on top of
>>> this approach due on demand nature of collecting the stats.
>>
>> So, yeah, if you want to add memory controls, we better think through how
>> the fd ownership migration should work.

> I've taken a look at the series, since I have been working on cgroup
> memory eviction.
>
> The scheduling stuff will work for i915, since it has a purely software
> execlist scheduler, but I don't think it will work for GuC (firmware)
> scheduling or other drivers that use the generic drm scheduler.

It actually works - I used to have a blurb in the cover letter about it
but apparently I dropped it. It just works a bit less well with many
clients, since there are fewer priority levels.

All that the design requires from the individual drivers is some way to
react to the "you are over budget by this much" signal. The rest is
driver and backend specific.

> For something like this, you would probably want it to work inside the
> drm scheduler first. Presumably, this can be done by setting a weight on
> each runqueue, and perhaps adding a callback to update one for a running
> queue. Calculating the weights hierarchically might be fun..

It does not need to work inside the drm scheduler first. In fact, drm
scheduler based drivers can plug into what I have, since it already has
the notion of scheduling priorities. They would only need to implement
one hook which allows the cgroup controller to query client GPU
utilisation and another to receive the over budget signal.
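
In sketch form, that per-driver part boils down to roughly two
callbacks. The names below are purely illustrative and not the exact
interface proposed in this series:

#include <linux/types.h>

struct drm_file;

/* Illustrative sketch only - not the exact interface in this series. */
struct drm_cgroup_ops {
	/* Accumulated GPU time consumed by this client, in nanoseconds. */
	u64 (*active_time_ns)(struct drm_file *file);

	/*
	 * The client's cgroup is over its time budget; the driver reacts
	 * in a backend specific way, for example by lowering the effective
	 * scheduling priority of the client's contexts.
	 */
	void (*signal_budget)(struct drm_file *file, u64 usage, u64 budget);
};

The hierarchical weights and time budgets themselves stay on the
controller side.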
Amdgpu and msm AFAIK could be easy candidates because they both support
per client utilisation and priorities.

Looks like I need to put all this info back into the cover letter.

Also, hierarchical weights and time budgets are all already there. What
could be done later is to make this all smarter and respect the time
budget with more precision. That would, however, in many cases including
Intel, require co-operation with the firmware. In any case it is only
work in the implementation, while the cgroup control interface remains
the same.

> I have taken a look at how the rest of cgroup controllers change
> ownership when moved to a different cgroup, and the answer was: not at
> all. If we attempt to create the scheduler controls only on the first
> time the fd is used, you could probably get rid of all the tracking.

Can you send a CPU file descriptor from process A to process B and have
CPU usage belonging to process B show up in process A's cgroup, or
vice-versa? Nope, I am not making any sense, am I? My point is that it
is not like-for-like; the model is different.

No ownership transfer would mean that in wide deployments all GPU
utilisation would be assigned to Xorg, and so there would be no point to
any of this. There would be no way to throttle a cgroup with unimportant
GPU clients, for instance.

> This can be done very easily with the drm scheduler.
>
> WRT memory, I think the consensus is to track system memory like normal
> memory. Stolen memory doesn't need to be tracked. It's kernel only
> memory, used for internal bookkeeping only.
>
> The only time userspace can directly manipulate stolen memory, is by
> mapping the pinned initial framebuffer to its own address space. The
> only allocation it can do is when a framebuffer is displayed, and
> framebuffer compression creates some stolen memory. Userspace is not
> aware of this though, and has no way to manipulate those contents.

Stolen memory is irrelevant here and not something the cgroup controller
knows about. The point is that drivers report which memory regions they
have and their utilisation.

Imagine instead of stolen it said vram0, or on Intel multi-tile it
showed local0 and local1. People working with containers are interested
in seeing this breakdown. I guess the closest parallel and use case here
is memory.numa_stat.

Regards,

Tvrtko
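
P.S. For reference, a minimal user space sketch of how a container tool
could consume the per-region breakdown, assuming the
"card0 region=<name> key=value ..." line format shown above (the cgroup
path below is hypothetical):

#include <stdio.h>

int main(void)
{
	/* Hypothetical cgroup path - substitute the cgroup being inspected. */
	FILE *f = fopen("/sys/fs/cgroup/mygroup/drm.memory.stat", "r");
	char card[32], region[32];
	unsigned long long total, shared, active, resident, purgeable;

	if (!f)
		return 1;

	/*
	 * Each line: "<card> region=<name> total=<bytes> shared=<bytes>
	 * active=<bytes> resident=<bytes> purgeable=<bytes>".
	 */
	while (fscanf(f, "%31s region=%31s total=%llu shared=%llu active=%llu"
		      " resident=%llu purgeable=%llu",
		      card, region, &total, &shared, &active, &resident,
		      &purgeable) == 7)
		printf("%s %s: resident %llu of %llu bytes, %llu purgeable\n",
		       card, region, resident, total, purgeable);

	fclose(f);
	return 0;
}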