From: Maarten Lankhorst
To: dri-devel@lists.freedesktop.org, cgroups@vger.kernel.org, intel-xe@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Tejun Heo, Zefan Li, Johannes Weiner, David Airlie, Daniel Vetter, amd-gfx@lists.freedesktop.org, Maxime Ripard, Thomas Zimmermann, Maarten Lankhorst, Tvrtko Ursulin
Subject: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.
Date: Wed, 3 May 2023 10:34:56 +0200
Message-Id: <20230503083500.645848-1-maarten.lankhorst@linux.intel.com>
X-Mailer: git-send-email 2.34.1

RFC as I'm looking for comments.

For long running compute, it can be beneficial to partition the GPU memory
between cgroups, so each cgroup can use its maximum amount of memory without
interfering with other scheduled jobs. Done properly, this can alleviate the
need for eviction, which might result in a job being terminated if the GPU
doesn't support mid-thread preemption or recoverable page faults.

This is done by adding a bunch of knobs to cgroup:
drm.capacity: Shows maximum capacity of each resource region.
drm.max: Display or limit max amount of memory.
drm.current: Current amount of memory in use.

TTM has not been made cgroup aware yet, so instead of evicting from
the current cgroup to stay within the cgroup limits, it simply returns
the error -ENOSPC to userspace.

I've used Tvrtko's cgroup controller series as a base, but it implemented
scheduling weight, not memory accounting, so I only ended up keeping the
base patch.

Xe is not upstream yet, so the driver specific patch will only apply on
https://gitlab.freedesktop.org/drm/xe/kernel

Maarten Lankhorst (3):
  drm/cgroup: Add memory accounting to DRM cgroup
  drm/ttm: Handle -EAGAIN in ttm_resource_alloc as -ENOSPC.
  drm/xe: Add support for the drm cgroup

Tvrtko Ursulin (1):
  cgroup: Add the DRM cgroup controller

 Documentation/admin-guide/cgroup-v2.rst    |  46 ++
 Documentation/gpu/drm-compute.rst          |  54 ++
 drivers/gpu/drm/ttm/ttm_bo.c               |   4 +-
 drivers/gpu/drm/xe/xe_device.c             |   4 +
 drivers/gpu/drm/xe/xe_device_types.h       |   4 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       |  21 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   5 +
 include/linux/cgroup_drm.h                 |  90 ++++
 include/linux/cgroup_subsys.h              |   4 +
 init/Kconfig                               |   7 +
 kernel/cgroup/Makefile                     |   1 +
 kernel/cgroup/drm.c                        | 557 +++++++++++++++++++++
 12 files changed, 794 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/gpu/drm-compute.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.34.1
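
For readers who want the accounting semantics spelled out, here is a toy user-space model in Python of the behaviour the cover letter describes: a per-region limit is charged on allocation, nothing is evicted, and an over-limit allocation fails with -ENOSPC. It is only a sketch. The class and function names (DrmCgroupRegion, try_charge, resource_alloc) are made up for illustration and are not taken from the series. The idea that the charge step reports -EAGAIN, which the allocation path then turns into -ENOSPC, is an assumption inferred from the patch titles. The cgroup hierarchy is ignored entirely.

```python
import errno


class DrmCgroupRegion:
    """Toy model of one resource region in one cgroup (hypothetical names)."""

    def __init__(self, capacity, limit=None):
        self.capacity = capacity  # what drm.capacity would show
        # What drm.max would show; assumed to default to the full capacity.
        self.max = capacity if limit is None else limit
        self.current = 0  # what drm.current would show

    def try_charge(self, size):
        """Charge size bytes; return 0, or -EAGAIN if the limit would be exceeded."""
        if self.current + size > self.max:
            return -errno.EAGAIN  # assumption: the charge step signals -EAGAIN
        self.current += size
        return 0

    def uncharge(self, size):
        """Release a previous charge, never dropping below zero."""
        self.current -= min(size, self.current)


def resource_alloc(region, size):
    """Model of the allocation path: no eviction, -EAGAIN is reported as -ENOSPC."""
    ret = region.try_charge(size)
    if ret == -errno.EAGAIN:
        return -errno.ENOSPC
    return ret


if __name__ == "__main__":
    vram = DrmCgroupRegion(capacity=1024, limit=256)
    print(resource_alloc(vram, 200))  # fits under the limit
    print(resource_alloc(vram, 100))  # would exceed the limit, so it fails
    print(vram.current)               # the failed allocation is not charged
```

In this model a failed allocation leaves the usage counter untouched, and freeing memory with uncharge() makes room for later allocations. A cgroup-aware TTM could instead evict buffers from the same cgroup before giving up.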