Subject: Re: [RFC 1/8] sched: Add nice value change notifier
From: Tvrtko Ursulin
To: Barry Song <21cnbao@gmail.com>
Cc: "Wanghui (John)", Intel-gfx@lists.freedesktop.org,
 dri-devel@lists.freedesktop.org, LKML, Tvrtko Ursulin, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot
References: <20211004143650.699120-1-tvrtko.ursulin@linux.intel.com>
 <20211004143650.699120-2-tvrtko.ursulin@linux.intel.com>
 <562d45e1-4a27-3252-f615-3ab1ef531f2b@huawei.com>
 <8381e87d-ef7f-4759-569b-f6dabeb02939@linux.intel.com>
 <382a4bd5-bb74-5928-be67-afbdc7aa3663@linux.intel.com>
Organization: Intel Corporation UK Plc
Message-ID: <6818e34e-d41c-67b7-85dd-76d2e47bc078@linux.intel.com>
Date: Thu, 7 Oct 2021 10:09:51 +0100
In-Reply-To: <382a4bd5-bb74-5928-be67-afbdc7aa3663@linux.intel.com>

On 07/10/2021 09:50, Tvrtko Ursulin wrote:
>
> On 06/10/2021 21:21, Barry Song wrote:
>> On Thu, Oct 7, 2021 at 2:44 AM Tvrtko Ursulin
>> wrote:
>>>
>>> Hi,
>>>
>>> On 06/10/2021 08:58, Barry Song wrote:
>>>> On Wed, Oct 6, 2021 at 5:15 PM Wanghui (John)
>>>> wrote:
>>>>>
>>>>> Hi Tvrtko
>>>>>
>>>>> On 2021/10/4 22:36, Tvrtko Ursulin wrote:
>>>>>>     void set_user_nice(struct task_struct *p, long nice)
>>>>>>     {
>>>>>>         bool queued, running;
>>>>>> -     int old_prio;
>>>>>> +     int old_prio, ret;
>>>>>>         struct rq_flags rf;
>>>>>>         struct rq *rq;
>>>>>>
>>>>>> @@ -6915,6 +6947,9 @@ void set_user_nice(struct task_struct *p, long nice)
>>>>>>
>>>>>>     out_unlock:
>>>>>>         task_rq_unlock(rq, p, &rf);
>>>>>> +
>>>>>> +     ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
>>>>>> +     WARN_ON_ONCE(ret != NOTIFY_DONE);
>>>>>>     }
>>>>>
>>>>> How about adding a new "io_nice" to task_struct, and moving the call
>>>>> chain to sched_setattr/getattr? There are two benefits:
>>>>
>>>> We already have an ionice for the block I/O scheduler, and this new
>>>> io_nice can hardly be generic to all I/O. It seems the patchset is
>>>> trying to link a process' nice with the GPU's scheduler; to some extent
>>>> that makes more sense than having a common ionice, because we have a lot
>>>> of I/O devices in the system and we don't know which I/O the ionice of
>>>> task_struct should be applied to.
>>>>
>>>> Maybe we could have an ionice dedicated to the GPU, just like the ionice
>>>> for CFQ in the bio/request scheduler.
>>>
>>> The thought crossed my mind but I couldn't see the practicality of a 3rd
>>> nice concept. I mean, even to start with I struggle a bit with the
>>> usefulness of the existing ionice vs nice - like coming up with practical
>>> examples of use cases where it makes sense to decouple the two
>>> priorities.
>>>
>>> From a different angle I did think inheriting CPU nice makes sense for
>>> GPU workloads. This is because today, and more so in the future,
>>> computations on the same data set do flow from one to the other.
>>>
>>> A simple example might be batch image processing where the CPU decodes,
>>> the GPU does a transform and then the CPU encodes. Or a different mix; it
>>> doesn't really matter, since the main point is that it is one computing
>>> pipeline from the user's point of view.
>>>
>>
>> I am on it. But I am also seeing two problems here:
>>
>> 1. nice is not global in Linux. For example, if you have two cgroups and
>> cgroup A has more quota than cgroup B, tasks in B won't win even if they
>> have a lower nice. cgroups do proportional-weight, time-based division of
>> CPU.
>>
>> 2. Historically, we had dynamic nice which was adjusted based on the
>> average sleep/running time; right now we don't have dynamic nice, but
>> virtual time still makes tasks which sleep more preempt other tasks with
>> the same or even lower nice:
>>
>>     virtual time += physical time / weight by nice
>>
>> So a static nice number doesn't always decide preemption.
>>
>> So it seems your patch only works in some simple situations, for example
>> no cgroups and tasks with similar sleep/running time.
>
> Yes, I broadly agree with your assessment. Although there are plans for
> adding cgroup support to i915 scheduling, I doubt control as fine grained,
> and with the exact semantics there are on the CPU side, will happen.
>
> Mostly because the drive seems to be towards more micro-controller managed
> scheduling, which adds further challenges in connecting the two sides
> together.
>
> But where you say it is a problem, I would characterize it more as a
> weakness, in terms of being only a subset of possible control. It is
> still richer (better?) than what currently exists, and as demonstrated
> with the benchmarks in my cover letter it can deliver improvements in user
> experience. If in the mid-term future we can extend it with cgroup
> support, then the concept should still apply and get closer to how you
> described nice working in the CPU world.
>
> The main question in my mind is whether the idea of adding the
> sched_attr/priority notifier to the kernel can be justified, because as
> mentioned before, everything apart from adjusting currently running GPU
> jobs could be done purely in userspace. Stack changes would be quite
> extensive and all, but that is not usually a good enough reason to put
> something in the kernel. That's why it is an RFC - an invitation to
> discuss.
>
> Even ionice inherits from nice (see task_nice_ioprio()) so I think an
> argument can be made for drivers as well.
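For reference, and quoting from memory so worth double-checking against
include/linux/ioprio.h, the whole of that ionice inheritance is a one-line
mapping from the task's nice value to the default I/O priority level:

  static inline int task_nice_ioprio(struct task_struct *task)
  {
          return (task_nice(task) + 20) / 5;
  }

In other words, nice -20..19 simply folds onto ioprio levels 0..7 when no
explicit I/O priority has been set.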
Now that I wrote this, I had a little bit of a light bulb moment. If I
abandon the idea of adjusting the priority of already submitted work items,
then I can do much of what I want purely from within the confines of i915.
I simply add code to inherit from the current task's nice on every new work
item submission (rough sketch of what I mean below the sign-off). This
should probably bring the majority of the benefit I measured.

Regards,

Tvrtko
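To sketch the idea - illustrative names only, this is not the actual i915
code, and it assumes the driver keeps a signed per-request priority:

  #include <linux/minmax.h>
  #include <linux/sched.h>

  struct example_request {
          int priority;   /* assumed driver priority range, -512..511 */
  };

  /* Map nice (-20..19) linearly onto the assumed driver priority range;
   * higher nice (lower CPU priority) becomes lower GPU priority. */
  static int example_priority_from_nice(void)
  {
          return clamp(-task_nice(current) * 512 / 20, -512, 511);
  }

  /* Called once per new work item submission; work which is already
   * queued or running is left alone. */
  static void example_request_init_priority(struct example_request *rq)
  {
          rq->priority = example_priority_from_nice();
  }

Untested, but that is roughly the shape of it.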