Received: by 2002:ac0:e350:0:0:0:0:0 with SMTP id g16csp440314imn; Fri, 29 Jul 2022 11:37:07 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tTFn0k+4jBCburu6fsLEaMoYVDp0LSbaVMavD2Ha2Sv/c+E06/ByPZ5D2WpPSFsj58tfR8 X-Received: by 2002:a50:ed0b:0:b0:43c:83bd:fd5d with SMTP id j11-20020a50ed0b000000b0043c83bdfd5dmr4746994eds.237.1659119826945; Fri, 29 Jul 2022 11:37:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659119826; cv=none; d=google.com; s=arc-20160816; b=mU209xa3R24+fLKxwAT0sGu6+UnGFwmJdQ2WPfyQBWZURIwxftGbqyHj8ghlW3EyF+ Y4gtcR6C/27fFatPpHc1zHq+I4uxKKyFFtuayWZE5bOQOQJHrbGY2OQPlFPWbplnCAkQ LGd02cVO98gP5iGQ4qWb7tid8Cs7jGMKxEyReDJsBB+gtCF8iOm/HnWfDvqCYOW1Ssls xJ2GPsnT96SCLO+66lVJr0je5nmxcdw/gUR8q/qAEL3rdZfEWHs+8xzfIxmwyEi0VPKa nKDP2sVppcjL+qEXhYBQHEKDtsM9kj+cPO2P7xteedFcXTAJsLJK/xL73Ojtc4kTbE9w cDNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:references :cc:to:from:content-language:subject:user-agent:mime-version:date :message-id:dkim-signature; bh=aWKVK6OoM1bKr67knlvEgc7AaGfjBxkEmDKqD2fNc5c=; b=u6RUl+OOUdqK6v96aywVjlT37/V2ptjMS90g8BU4CXtAqpZh9JsDkpty4nKveoHodX vluJDzQWs98o+q+A9FrpgKR7wk7KwddPyO7hJmJLRUijCouVQxIbMSl57FuU+11ldesT 4wObcjI8ePI6Zq9jEWgK/yxgTXnw5F30xVdwAgZ5YlOKB9q70vaANJambh0xyLPa0pvw k4bDts5tlPM7iRGDeX9C77dPUsk0vlWAZy5UuxrAF506IhALENfKxukK22NIwYnLNdOs w1yK1RmrmQLghuGj5O8G7K6iQvB2rUJp8WD2kmQc1F59EVEunIpYfMPbHCkcLuLDnX0m H8rA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="FvSs/QFX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gn22-20020a1709070d1600b0072fb44407fdsi4354097ejc.962.2022.07.29.11.36.41; Fri, 29 Jul 2022 11:37:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="FvSs/QFX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238447AbiG2SbN (ORCPT + 99 others); Fri, 29 Jul 2022 14:31:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238012AbiG2SbJ (ORCPT ); Fri, 29 Jul 2022 14:31:09 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BD6562A428 for ; Fri, 29 Jul 2022 11:31:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1659119467; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aWKVK6OoM1bKr67knlvEgc7AaGfjBxkEmDKqD2fNc5c=; b=FvSs/QFXfug/OSrscBmjFpVaM/o/U5QVY2xoPBBwccKmnTImOQt1GL3TM//HO8/Dix4ypb SxfPbLYfJyU9vB2oc2uPQXmO9N622Cd3HTvIS6ZFT0aglPg5kiLn5f58kjHuxTJGxVol47 EpyUgwKH32x31nfIARvRQPU+oHusF1I= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-647-HEfqHrfIM6a6xvhRHbdgkQ-1; Fri, 29 Jul 2022 14:31:03 -0400 X-MC-Unique: HEfqHrfIM6a6xvhRHbdgkQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D87AC185A7A4; Fri, 29 Jul 2022 18:31:02 +0000 (UTC) Received: from [10.22.34.17] (unknown [10.22.34.17]) by smtp.corp.redhat.com (Postfix) with ESMTP id AF117404D883; Fri, 29 Jul 2022 18:31:01 +0000 (UTC) Message-ID: Date: Fri, 29 Jul 2022 14:31:00 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set Content-Language: en-US From: Waiman Long To: Valentin Schneider , Tejun Heo Cc: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Zefan Li , Johannes Weiner , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Will Deacon References: <20220728005815.1715522-1-longman@redhat.com> <1ae1cc6c-dca9-4958-6b22-24a5777c5e8d@redhat.com> <606ed69e-8ad0-45d5-9de7-48739df7f48d@redhat.com> <92f68392-12d2-f64a-9bb9-1a3a15f99d02@redhat.com> In-Reply-To: <92f68392-12d2-f64a-9bb9-1a3a15f99d02@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.84 on 10.11.54.2 X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/29/22 10:50, Waiman Long wrote: > On 7/29/22 10:15, Valentin Schneider wrote: >> On 28/07/22 11:39, Tejun Heo wrote: >>> Hello, Waiman. >>> >>> On Thu, Jul 28, 2022 at 05:04:19PM -0400, Waiman Long wrote: >>>>> So, the patch you proposed is making the code remember one special >>>>> aspect of >>>>> user requested configuration - whether it configured it or not, >>>>> and trying >>>>> to preserve that particular state as cpuset state changes. It >>>>> addresses the >>>>> immediate problem but it is a very partial approach. Let's say a >>>>> task wanna >>>>> be affined to one logical thread of each core and set its mask to >>>>> 0x5555. >>>>> Now, let's say cpuset got enabled and enforced 0xff and affined >>>>> the task to >>>>> 0xff. After a while, the cgroup got more cpus allocated and its >>>>> cpuset now >>>>> has 0xfff. Ideally, what should happen is the task now having the >>>>> effective >>>>> mask of 0x555. In practice, tho, it either would get 0xf55 or 0x55 >>>>> depending >>>>> on which way we decide to misbehave. >>>> OK, I see what you want to accomplish. To fully address this issue, >>>> we will >>>> need to have a new cpumask variable in the the task structure which >>>> will be >>>> allocated if sched_setaffinity() is ever called. I can rework my >>>> patch to >>>> use this approach. >>> Yeah, we'd need to track what user requested separately from the >>> currently >>> effective cpumask. Let's make sure that the scheduler folks are on >>> board >>> before committing to the idea tho. Peter, Ingo, what do you guys think? >>> >> FWIW on a runtime overhead side of things I think it'll be OK as that >> should be just an extra mask copy  in sched_setaffinity() and a subset >> check / cpumask_and() in set_cpus_allowed_ptr(). The policy side is a >> bit >> less clear (when, if ever, do we clear the user-defined mask? Will it >> keep >> haunting us even after moving a task to a disjoint cpuset partition?). > > The runtime overhead should be minimal. It is the behavioral side that > we should be careful about. It is a change in existing behavior and we > don't want to cause surprise to the users. Currently, a task that set > its cpu affinity explicitly will have its affinity reset whenever > there is any change to the cpuset it belongs to or a hotplug event > touch any cpu in the current cpuset. The new behavior we are proposing > here is that it will try its best to keep the cpu affinity that the > user requested within the constraint of the current cpuset as well as > the cpu hotplug state. > > >> >> There's also if/how that new mask should be exposed, because attaching a >> task to a cpuset will now yield a not-necessarily-obvious affinity - >> e.g. in the thread affinity example above, if the initial affinity >> setting >> was done ages ago by some system tool, IMO the user needs a way to be >> able >> to expect/understand the result of 0x555 rather than 0xfff. > > Users can use sched_getaffinity(2) to retrieve the current cpu > affinity. It is up to users to set another one if they don't like the > current one. I don't think we need to return what the previous > requested cpu affinity is. They are suppose to know that or they can > set their own if they don't like it. \ Looking at Will's series that introduced user_cpus_ptr, I think we can overlay our proposal on top of that. So calling sched_setaffinity() will also update user_cpus_ptr. We may still need a flag to indicate whether user_cpus_ptr is set up because of sched_setaffinity() or due to a call to force_compatible_cpus_allowed_ptr() from arm64 arch code. That will make our work easier as some of the infrastructure is already there. I am looking forward for your feedback. Thanks, Longman