From: "Joel Fernandes (Google)"
To: Nishanth Aravamudan , Julien Desfossez , Peter Zijlstra , Tim Chen , Vineeth Pillai , Aaron Lu , Aubrey Li , tglx@linutronix.de, linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, torvalds@linux-foundation.org, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Phil Auld , Valentin Schneider , Mel Gorman , Pawan Gupta , Paolo Bonzini , joel@joelfernandes.org, vineeth@bitbyteword.org, Chen Yu , Christian Brauner , Agata Gruza , Antonio Gomez Iglesias , graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com, pjt@google.com, rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com, Alexandre Chartre , James.Bottomley@hansenpartnership.com, OWeisse@umich.edu, Dhaval Giani , Junaid Shahid , jsbarnes@google.com, chris.hyser@oracle.com, Ben Segall , Josh Don , Hao Luo , Tom Lendacky , Aubrey Li , "Paul E. McKenney" , Tim Chen
Subject: [PATCH -tip 22/32] sched: Split the cookie and setup per-task cookie on fork
Date: Tue, 17 Nov 2020 18:19:52 -0500
Message-Id: <20201117232003.3580179-23-joel@joelfernandes.org>
In-Reply-To: <20201117232003.3580179-1-joel@joelfernandes.org>
References: <20201117232003.3580179-1-joel@joelfernandes.org>

In order to prevent interference and clearly support both the per-task
and CGroup APIs, split the cookie into two and allow it to be set from
either the per-task API or the CGroup API. The final cookie is the
combined value of both and is computed when the stop_machine() handler
runs during a change of cookie.

Also, for the per-task cookie, using the address of an ephemeral object
as the cookie is fragile, because that address can be reused once the
object is freed. For this reason, introduce a refcounted object whose
sole purpose is to provide a unique cookie value by way of the object's
pointer.

While at it, refactor the CGroup code a bit. Future patches will
introduce more APIs and support.
Reviewed-by: Josh Don
Tested-by: Julien Desfossez
Signed-off-by: Joel Fernandes (Google)
---
 include/linux/sched.h |   2 +
 kernel/sched/core.c   | 241 ++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/debug.c  |   4 +
 3 files changed, 236 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a60868165590..c6a3b0fa952b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -688,6 +688,8 @@ struct task_struct {
 #ifdef CONFIG_SCHED_CORE
 	struct rb_node			core_node;
 	unsigned long			core_cookie;
+	unsigned long			core_task_cookie;
+	unsigned long			core_group_cookie;
 	unsigned int			core_occupation;
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b99a7493d590..7ccca355623a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -346,11 +346,14 @@ void sched_core_put(void)
 	mutex_unlock(&sched_core_mutex);
 }
 
+static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2);
+
 #else /* !CONFIG_SCHED_CORE */
 
 static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { }
 static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { }
 static bool sched_core_enqueued(struct task_struct *task) { return false; }
+static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { return 0; }
 
 #endif /* CONFIG_SCHED_CORE */
 
@@ -4032,6 +4035,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 #endif
 #ifdef CONFIG_SCHED_CORE
 	RB_CLEAR_NODE(&p->core_node);
+
+	/*
+	 * Tag child via per-task cookie only if parent is tagged via per-task
+	 * cookie. This is independent of, but can be additive to, the CGroup tagging.
+	 */
+	if (current->core_task_cookie) {
+
+		/* If it is not CLONE_THREAD fork, assign a unique per-task tag. */
+		if (!(clone_flags & CLONE_THREAD)) {
+			return sched_core_share_tasks(p, p);
+		}
+		/* Otherwise share the parent's per-task tag. */
+		return sched_core_share_tasks(p, current);
+	}
 #endif
 	return 0;
 }
@@ -9731,6 +9748,217 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SCHED_CORE
+/*
+ * A simple wrapper around refcount. An allocated sched_core_cookie's
+ * address is used to compute the cookie of the task.
+ */
+struct sched_core_cookie {
+	refcount_t refcnt;
+};
+
+/*
+ * sched_core_tag_requeue - Common helper for all interfaces to set a cookie.
+ * @p: The task to assign a cookie to.
+ * @cookie: The cookie to assign.
+ * @group: true for the CGroup interface, false for the per-task interface.
+ *
+ * This function is typically called from a stop-machine handler.
+ */
+void sched_core_tag_requeue(struct task_struct *p, unsigned long cookie, bool group)
+{
+	if (!p)
+		return;
+
+	if (group)
+		p->core_group_cookie = cookie;
+	else
+		p->core_task_cookie = cookie;
+
+	/* Use up half of the cookie's bits for task cookie and remaining for group cookie. */
+	p->core_cookie = (p->core_task_cookie <<
+				(sizeof(unsigned long) * 4)) + p->core_group_cookie;
+
+	if (sched_core_enqueued(p)) {
+		sched_core_dequeue(task_rq(p), p);
+		if (!p->core_cookie)
+			return;
+	}
+
+	if (sched_core_enabled(task_rq(p)) &&
+	    p->core_cookie && task_on_rq_queued(p))
+		sched_core_enqueue(task_rq(p), p);
+}
+
+/* Per-task interface */
+static unsigned long sched_core_alloc_task_cookie(void)
+{
+	struct sched_core_cookie *ptr =
+		kmalloc(sizeof(struct sched_core_cookie), GFP_KERNEL);
+
+	if (!ptr)
+		return 0;
+	refcount_set(&ptr->refcnt, 1);
+
+	/*
+	 * NOTE: sched_core_put() is not done by sched_core_put_task_cookie().
+	 * Instead, it is done after the stopper runs.
+	 */
+	sched_core_get();
+	return (unsigned long)ptr;
+}
+
+static bool sched_core_get_task_cookie(unsigned long cookie)
+{
+	struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie;
+
+	/*
+	 * NOTE: sched_core_put() is not done by sched_core_put_task_cookie().
+	 * Instead, it is done after the stopper runs.
+	 */
+	sched_core_get();
+	return refcount_inc_not_zero(&ptr->refcnt);
+}
+
+static void sched_core_put_task_cookie(unsigned long cookie)
+{
+	struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie;
+
+	if (refcount_dec_and_test(&ptr->refcnt))
+		kfree(ptr);
+}
+
+struct sched_core_task_write_tag {
+	struct task_struct *tasks[2];
+	unsigned long cookies[2];
+};
+
+/*
+ * Ensure that the task has been requeued. The stopper ensures that the task cannot
+ * be migrated to a different CPU while its core scheduler queue state is being updated.
+ * It also makes sure to requeue a task if it was running actively on another CPU.
+ */
+static int sched_core_task_join_stopper(void *data)
+{
+	struct sched_core_task_write_tag *tag = (struct sched_core_task_write_tag *)data;
+	int i;
+
+	for (i = 0; i < 2; i++)
+		sched_core_tag_requeue(tag->tasks[i], tag->cookies[i], false /* !group */);
+
+	return 0;
+}
+
+static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2)
+{
+	struct sched_core_task_write_tag wr = {}; /* for stop machine. */
+	bool sched_core_put_after_stopper = false;
+	unsigned long cookie;
+	int ret = -ENOMEM;
+
+	mutex_lock(&sched_core_mutex);
+
+	/*
+	 * NOTE: sched_core_get() is done by sched_core_alloc_task_cookie() or
+	 *       sched_core_get_task_cookie(). However, sched_core_put() is done
+	 *       by this function *after* the stopper removes the tasks from the
+	 *       core queue, and not before. This is just to play it safe.
+	 */
+	if (t2 == NULL) {
+		if (t1->core_task_cookie) {
+			sched_core_put_task_cookie(t1->core_task_cookie);
+			sched_core_put_after_stopper = true;
+			wr.tasks[0] = t1; /* Keep wr.cookies[0] reset for t1. */
+		}
+	} else if (t1 == t2) {
+		/* Assign a unique per-task cookie solely for t1. */
+
+		cookie = sched_core_alloc_task_cookie();
+		if (!cookie)
+			goto out_unlock;
+
+		if (t1->core_task_cookie) {
+			sched_core_put_task_cookie(t1->core_task_cookie);
+			sched_core_put_after_stopper = true;
+		}
+		wr.tasks[0] = t1;
+		wr.cookies[0] = cookie;
+	} else
+	/*
+	 *              t1              joining         t2
+	 * CASE 1:
+	 * before       0                               0
+	 * after        new cookie                      new cookie
+	 *
+	 * CASE 2:
+	 * before       X (non-zero)                    0
+	 * after        0                               0
+	 *
+	 * CASE 3:
+	 * before       0                               X (non-zero)
+	 * after        X                               X
+	 *
+	 * CASE 4:
+	 * before       Y (non-zero)                    X (non-zero)
+	 * after        X                               X
+	 */
+	if (!t1->core_task_cookie && !t2->core_task_cookie) {
+		/* CASE 1. */
+		cookie = sched_core_alloc_task_cookie();
+		if (!cookie)
+			goto out_unlock;
+
+		/* Add another reference for the other task. */
+		if (!sched_core_get_task_cookie(cookie)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		wr.tasks[0] = t1;
+		wr.tasks[1] = t2;
+		wr.cookies[0] = wr.cookies[1] = cookie;
+
+	} else if (t1->core_task_cookie && !t2->core_task_cookie) {
+		/* CASE 2. */
+		sched_core_put_task_cookie(t1->core_task_cookie);
+		sched_core_put_after_stopper = true;
+
+		wr.tasks[0] = t1; /* Reset cookie for t1. */
+
+	} else if (!t1->core_task_cookie && t2->core_task_cookie) {
+		/* CASE 3. */
+		if (!sched_core_get_task_cookie(t2->core_task_cookie)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		wr.tasks[0] = t1;
+		wr.cookies[0] = t2->core_task_cookie;
+
+	} else {
+		/* CASE 4. */
+		if (!sched_core_get_task_cookie(t2->core_task_cookie)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		sched_core_put_task_cookie(t1->core_task_cookie);
+		sched_core_put_after_stopper = true;
+
+		wr.tasks[0] = t1;
+		wr.cookies[0] = t2->core_task_cookie;
+	}
+
+	stop_machine(sched_core_task_join_stopper, (void *)&wr, NULL);
+
+	if (sched_core_put_after_stopper)
+		sched_core_put();
+
+	ret = 0;
+out_unlock:
+	mutex_unlock(&sched_core_mutex);
+	return ret;
+}
+
+/* CGroup interface */
 static u64 cpu_core_tag_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
 {
 	struct task_group *tg = css_tg(css);
@@ -9761,18 +9989,9 @@ static int __sched_write_tag(void *data)
 	 * when we set cgroup tag to 0 when the loop is done below.
 	 */
 	while ((p = css_task_iter_next(&it))) {
-		p->core_cookie = !!val ? (unsigned long)tg : 0UL;
-
-		if (sched_core_enqueued(p)) {
-			sched_core_dequeue(task_rq(p), p);
-			if (!p->core_cookie)
-				continue;
-		}
-
-		if (sched_core_enabled(task_rq(p)) &&
-		    p->core_cookie && task_on_rq_queued(p))
-			sched_core_enqueue(task_rq(p), p);
+		unsigned long cookie = !!val ? (unsigned long)tg : 0UL;
 
+		sched_core_tag_requeue(p, cookie, true /* group */);
 	}
 	css_task_iter_end(&it);
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 60a922d3f46f..8c452b8010ad 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 		__PS("clock-delta", t1-t0);
 	}
 
+#ifdef CONFIG_SCHED_CORE
+	__PS("core_cookie", p->core_cookie);
+#endif
+
 	sched_show_numa(p, m);
 }
 
-- 
2.29.2.299.gdc1121823c-goog