Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41695C05027 for ; Thu, 2 Feb 2023 16:07:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232055AbjBBQHr (ORCPT ); Thu, 2 Feb 2023 11:07:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230317AbjBBQHo (ORCPT ); Thu, 2 Feb 2023 11:07:44 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D86873C2B0 for ; Thu, 2 Feb 2023 08:07:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675354020; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=K7phcfiE83fVwz5I8bWRnbFjaiLhRCsY0h7b3g24mK8=; b=AW496yp2glKYX90OUYaHtuuGXVy+pO2vCOPu3U/eqrxBJb/1aifyPa4E2eT6Nsibj/wGAA TWO9tljCuFdD+8dgGMhVlf47CHsWUTZPvqhiTGEDGcYLqJqqnpxCw64sCDaG8pYDG3ukbP mniapK6/BXwr1Ak/skk4rJvzLu22WuU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-630-_-kt1zdcP-2gghTR1sSrxA-1; Thu, 02 Feb 2023 11:06:56 -0500 X-MC-Unique: _-kt1zdcP-2gghTR1sSrxA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9945938288A7; Thu, 2 Feb 2023 16:06:52 +0000 (UTC) Received: from [10.22.8.157] (unknown [10.22.8.157]) by smtp.corp.redhat.com (Postfix) with ESMTP id B6C1E140EBF6; Thu, 2 Feb 2023 16:06:51 +0000 (UTC) Message-ID: Date: Thu, 2 Feb 2023 11:06:51 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Subject: Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Content-Language: en-US To: Peter Zijlstra Cc: Will Deacon , linux-kernel@vger.kernel.org, kernel-team@android.com, Zefan Li , Tejun Heo , Johannes Weiner , cgroups@vger.kernel.org References: <20230131221719.3176-1-will@kernel.org> <20230131221719.3176-2-will@kernel.org> <6b068916-5e1b-a943-1aad-554964d8b746@redhat.com> From: Waiman Long In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/2/23 03:34, Peter Zijlstra wrote: > On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: > >> A major concern I have is the overhead of creating a poor man version of v2 >> cpus_allowed. This issue can be worked around even for cpuset v1 if it is >> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask >> handling. Alternatively we may be able to provide a config option to make >> this the default for v1 without the special mount option, if necessary. > It is equally broken for v2, it masks against effective_cpus. Not to > mention it explicitly starts with cpu_online_mask. After taking a close look at the patch, my understanding of what it is doing is as follows: v2: cpus_allowed will not be affected by hotplug. So the new cpuset_cpus_allowed() will return effective_cpus + offline cpus that should have been part of effective_cpus if online before masking it with allowable cpus and then go up the cpuset hierarchy if necessary. v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the current cpuset and move up the hierarchy if necessary to find a cpuset that have at least one allowable cpu. First of all, it does not take into account of the v2 partition feature that may cause it to produce incorrect result if partition is enabled somewhere. Secondly, I don't see any benefit other than having some additional offline cpu available in a task's cpumask which the scheduler will ignore anyway. v2 is able to recover a previously offlined cpu. So we don't gain any net benefit other than the going up the cpuset hierarchy part. For v1, I agree we should go up the cpuset hierarchy to find a usable cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), my current preference is to do the hierarchy climbing part in an enhanced cpuset_cpus_allowed_fallback() after an initial failure of cpuset_cpus_allowed(). That will be easier to understand than having such complexity and overhead in cpuset_cpus_allowed() alone. I will work on a patchset to do that as a counter offer. Cheers, Longman