From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Waiman Long
Subject: [PATCH v2] cgroup/cpuset: Enable memory migration for cpuset v2
Date: Wed, 11 Aug 2021 15:57:07 -0400
Message-Id: <20210811195707.30851-1-longman@redhat.com>

When a user changes cpuset.cpus, each task in a v2 cpuset is moved to
one of the new cpus if it is not already running on one of them. A
task's memory, however, is not migrated to the new nodes when
cpuset.mems changes. This is an inconsistency in behavior.

In cpuset v1, the memory_migrate control file enables such behavior by
setting the CS_MEMORY_MIGRATE flag. Make that flag the default for
cpuset v2 so that cpus and memory behave consistently.

Making memory migration the default certainly has a cost, but it is a
one-time cost that shouldn't matter much as long as cpuset.mems isn't
changed frequently. Update the cgroup-v2.rst file to document the new
behavior and recommend against changing cpuset.mems frequently.

Since there can't be any concurrent access to the newly allocated
cpuset structure in cpuset_css_alloc(), we can use the cheaper
non-atomic __set_bit() instead of the more expensive atomic set_bit().
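The trade-off in the last paragraph can be illustrated with a small userspace C11 sketch. The model_* helpers are hypothetical stand-ins, not the kernel's set_bit()/__set_bit() (whose real implementations are architecture specific): the atomic variant uses atomic_fetch_or() for a single indivisible read-modify-write, while the non-atomic variant is a plain load, OR and store. The latter is only safe when no other thread can touch the word, which is the situation for a cpuset that cpuset_css_alloc() has just allocated.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace models, not the kernel's set_bit()/__set_bit(). */
#define MODEL_BITS_PER_LONG ((int)(8 * sizeof(unsigned long)))

/* Atomic variant: one indivisible read-modify-write, so concurrent
 * updates to other bits of the same word are never lost. */
static void model_set_bit_atomic(int nr, _Atomic unsigned long *addr)
{
	assert(nr >= 0 && nr < MODEL_BITS_PER_LONG);
	atomic_fetch_or(addr, 1UL << nr);
}

/* Non-atomic variant: separate load, OR and store. Cheaper, but a
 * write by another thread between the load and the store would be
 * overwritten, so use it only on data no one else can reach yet. */
static void model_set_bit_nonatomic(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < MODEL_BITS_PER_LONG);
	*addr |= 1UL << nr;
}

/* Returns whether bit nr is set in word. */
static bool model_test_bit(int nr, unsigned long word)
{
	assert(nr >= 0 && nr < MODEL_BITS_PER_LONG);
	return (word >> nr) & 1UL;
}
```

Both variants leave the same bit pattern behind; they differ only in whether a concurrent writer to the same word can be lost.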
Signed-off-by: Waiman Long
---
 Documentation/admin-guide/cgroup-v2.rst | 11 +++++++++++
 kernel/cgroup/cpuset.c                  |  6 +++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5c7377b5bd3e..babbe04c8d37 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2056,6 +2056,17 @@ Cpuset Interface Files
 	The value of "cpuset.mems" stays constant until the next update
 	and won't be affected by any memory nodes hotplug events.
 
+	Setting a non-empty value to "cpuset.mems" causes memory of
+	tasks within the cgroup to be migrated to the designated nodes if
+	they are currently using memory outside of the designated nodes.
+
+	There is a cost for this memory migration. The migration
+	may not be complete and some memory pages may be left behind.
+	So it is recommended that "cpuset.mems" should be set properly
+	before spawning new tasks into the cpuset. Even if there is
+	a need to change "cpuset.mems" with active tasks, it shouldn't
+	be done frequently.
+
   cpuset.mems.effective
 	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index adb5190c4429..d151e1de93d4 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2737,12 +2737,16 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	__set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	nodes_clear(cs->mems_allowed);
 	nodes_clear(cs->effective_mems);
 	fmeter_init(&cs->fmeter);
 	cs->relax_domain_level = -1;
 
+	/* Set CS_MEMORY_MIGRATE for default hierarchy */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
+		__set_bit(CS_MEMORY_MIGRATE, &cs->flags);
+
 	return &cs->css;
 }
 
-- 
2.18.1
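The behavior change in the cpuset.c hunk can be modelled in a few lines of userspace C. Everything below is hypothetical: struct model_cpuset, the MODEL_CS_* bit numbers and both functions are illustrative stand-ins rather than kernel code, and a nodemask is reduced to a single unsigned long (bit n = NUMA node n). model_css_alloc() mirrors the patched cpuset_css_alloc() by setting the migrate flag only on the default (v2) hierarchy; model_nodes_to_vacate() follows the documentation text by reporting which nodes a task's pages would have to leave after a non-empty "cpuset.mems" is written. It does not model the real migration path, which, as the documentation notes, may leave some pages behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model only; the real CS_* values
 * in kernel/cgroup/cpuset.c are not assumed here. */
enum model_cs_flag {
	MODEL_CS_MEMORY_MIGRATE,
	MODEL_CS_SCHED_LOAD_BALANCE,
};

/* Hypothetical, heavily reduced stand-in for struct cpuset. */
struct model_cpuset {
	unsigned long flags;
};

/* Mirrors the patched flag setup: plain (non-atomic) ORs are fine
 * because nothing else can see the new object yet. */
static void model_css_alloc(struct model_cpuset *cs, bool on_dfl)
{
	cs->flags = 0;
	cs->flags |= 1UL << MODEL_CS_SCHED_LOAD_BALANCE;
	if (on_dfl)	/* cpuset v2: migrate memory by default */
		cs->flags |= 1UL << MODEL_CS_MEMORY_MIGRATE;
}

/*
 * Nodes (bit n = node n) that a task's pages would have to leave after
 * "cpuset.mems" is set to new_mems: only a non-empty value triggers
 * migration, and only memory outside the designated nodes is affected.
 */
static unsigned long model_nodes_to_vacate(const struct model_cpuset *cs,
					   unsigned long new_mems,
					   unsigned long task_nodes)
{
	assert(cs);
	if (new_mems == 0)
		return 0;
	if (!(cs->flags & (1UL << MODEL_CS_MEMORY_MIGRATE)))
		return 0;
	return task_nodes & ~new_mems;
}
```

For example, with pages on nodes 0 and 1 (mask 0x3) and "cpuset.mems" rewritten to node 1 only (mask 0x2), the v2 model reports node 0 (mask 0x1) to vacate, while a v1 cpuset that never enabled memory_migrate reports nothing.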