Date: Tue, 26 Apr 2022 10:58:21 -0400
From: Waiman Long
To: Feng Tang
Cc: Tejun Heo, Zefan Li, Johannes Weiner, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Michal Hocko, Dave Hansen, ying.huang@intel.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
References: <20220425155505.1292896-1-longman@redhat.com> <20220426032337.GA84190@shbuild999.sh.intel.com>
In-Reply-To: <20220426032337.GA84190@shbuild999.sh.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/25/22 23:23, Feng Tang wrote:
> Hi Waiman,
>
> On Mon, Apr 25, 2022 at 11:55:05AM -0400, Waiman Long wrote:
>> There are 3 places where the cpu and node masks of the top cpuset can
>> be initialized in the order they are executed:
>>  1) start_kernel -> cpuset_init()
>>  2) start_kernel -> cgroup_init() -> cpuset_bind()
>>  3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp()
>>
>> The first cpuset_init() function just sets all the bits in the masks.
>> The last one executed is cpuset_init_smp() which sets up cpu and node
>> masks suitable for v1, but not v2. cpuset_bind() does the right setup
>> for both v1 and v2.
>>
>> For systems with cgroup v2 setup, cpuset_bind() is called once. For
>> systems with cgroup v1 setup, cpuset_bind() is called twice. It is
>> first called before cpuset_init_smp() in cgroup v2 mode. Then it is
>> called again when cgroup v1 filesystem is mounted in v1 mode after
>> cpuset_init_smp().
>>
>>  [ 2.609781] cpuset_bind() called - v2 = 1
>>  [ 3.079473] cpuset_init_smp() called
>>  [ 7.103710] cpuset_bind() called - v2 = 0
>
> I run some test, on a server with centOS, this did happen that
> cpuset_bind() is called twice, first as v2 during kernel boot,
> and then as v1 post-boot.
>
> However on a QEMU running with a basic debian rootfs image,
> the second call of cpuset_bind() didn't happen.

The first time cpuset_bind() is called in cgroup_init(), the kernel
doesn't know if userspace is going to mount v1 or v2 cgroup. By default,
it is assumed to be v2. However, if userspace mounts the cgroup v1
filesystem for cpuset, cpuset_bind() will be run at that point by
rebind_subsystem() to set up the cgroup v1 environment, and
cpus_allowed/mems_allowed will then be set correctly. Mounting the
cgroup v2 filesystem, however, does not cause rebind_subsystem() to run
and hence cpuset_bind() is not called again.
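
To make the ordering above easier to follow, here is a toy user-space model of the three call sites. It is only a sketch, not kernel code: the `toy_*` names, the 8-bit "possible" mask and the 4-bit "active" mask are made up for illustration, and what `toy_cpuset_bind()` copies in each mode is an assumption about cpuset_bind() that is not spelled out in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for top_cpuset: one bit per CPU. Hypothetical, not kernel code. */
struct toy_cpuset {
    uint64_t cpus_allowed;
    uint64_t effective_cpus;
};

/* Assumed example masks: 8 possible CPUs, 4 of them active at boot. */
#define TOY_POSSIBLE_MASK 0xffULL
#define TOY_ACTIVE_MASK   0x0fULL

/* 1) cpuset_init(): just sets all the bits in the masks. */
static void toy_cpuset_init(struct toy_cpuset *cs)
{
    cs->cpus_allowed = UINT64_MAX;
    cs->effective_cpus = UINT64_MAX;
}

/*
 * 2) cpuset_bind(): ASSUMPTION for this sketch: v2 mode takes the
 * possible mask, v1 mode copies the current effective mask.
 */
static void toy_cpuset_bind(struct toy_cpuset *cs, bool v2)
{
    cs->cpus_allowed = v2 ? TOY_POSSIBLE_MASK : cs->effective_cpus;
}

/*
 * 3) cpuset_init_smp(): keep_old_copy models the cpus_allowed line
 * that the patch removes; effective_cpus is set up either way.
 */
static void toy_cpuset_init_smp(struct toy_cpuset *cs, bool keep_old_copy)
{
    if (keep_old_copy)
        cs->cpus_allowed = TOY_ACTIVE_MASK;
    cs->effective_cpus = TOY_ACTIVE_MASK;
}

/* Replay the boot order and return the final cpus_allowed mask. */
static uint64_t toy_boot(bool keep_old_copy, bool mounts_v1)
{
    struct toy_cpuset top;

    toy_cpuset_init(&top);
    toy_cpuset_bind(&top, true);              /* first bind: always v2 mode */
    toy_cpuset_init_smp(&top, keep_old_copy);
    if (mounts_v1)
        toy_cpuset_bind(&top, false);         /* rebind when v1 is mounted */

    /* effective_cpus must end up as the active mask in every case. */
    assert(top.effective_cpus == TOY_ACTIVE_MASK);
    return top.cpus_allowed;
}
```

In this model, `toy_boot(true, false)` (old code, v2 or nothing mounted) leaves cpus_allowed at the boot-time active mask, while `toy_boot(false, false)` keeps what the first bind set up. With `mounts_v1` set, both variants end with the same cpus_allowed, because the second bind overwrites it.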

Is the QEMU setup not mounting any cgroup filesystem at all? If so, does
it matter whether v1 or v2 setup is used?

>> As a result, cpu and memory node hot add may fail to update the cpu and
>> node masks of the top cpuset to include the newly added cpu or node in
>> a cgroup v2 environment.
>>
>> smp_init() is called after the first two init functions. So we don't
>> have a complete list of active cpus and memory nodes until later in
>> cpuset_init_smp() which is the right time to set up effective_cpus
>> and effective_mems.
>>
>> To fix this problem, the potentially incorrect cpus_allowed &
>> mems_allowed setup in cpuset_init_smp() are removed. For cgroup v2
>> systems, the initial cpuset_bind() call will set them up correctly.
>> For cgroup v1 systems, the second call to cpuset_bind() will do the
>> right setup.
>>
>> cc: stable@vger.kernel.org
>> Signed-off-by: Waiman Long
>> ---
>>  kernel/cgroup/cpuset.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 9390bfd9f1cd..6bd8f5ef40fe 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -3390,8 +3390,9 @@ static struct notifier_block cpuset_track_online_nodes_nb = {
>>   */
>>  void __init cpuset_init_smp(void)
>>  {
>> -    cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
>> -    top_cpuset.mems_allowed = node_states[N_MEMORY];
> So can we keep line
>   cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
>
> and only remove line
>   top_cpuset.mems_allowed = node_states[N_MEMORY];
> ?

That may cause cpuset.cpus to be set incorrectly for systems using
cgroup v2. What is really important is that effective_cpus and
effective_mems are set correctly.

Cheers,
Longman