From: Jan H. Schönherr <jschoenh@amazon.de>
To: Ingo Molnar, Peter Zijlstra
Cc: Jan H. Schönherr <jschoenh@amazon.de>, linux-kernel@vger.kernel.org
Subject: [RFC 59/60] cosched: Handle non-atomicity during switches to and from coscheduling
Date: Fri, 7 Sep 2018 23:40:46 +0200
Message-Id: <20180907214047.26914-60-jschoenh@amazon.de>
In-Reply-To: <20180907214047.26914-1-jschoenh@amazon.de>
References: <20180907214047.26914-1-jschoenh@amazon.de>

We cannot switch a task group from regular scheduling to coscheduling
atomically, as it would require locking the whole system. Instead, the
switch is done runqueue by runqueue via cosched_set_scheduled().

This means that other CPUs may see an intermediate state when locking a
bunch of runqueues, where the sdrq->is_root fields do not yield a
consistent picture across a task group.

Handle these cases.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 kernel/sched/fair.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 322a84ec9511..8da2033596ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -646,6 +646,15 @@ static struct cfs_rq *current_cfs(struct rq *rq)
 {
 	struct sdrq *sdrq = READ_ONCE(rq->sdrq_data.current_sdrq);
 
+	/*
+	 * We might race with concurrent is_root-changes, causing
+	 * current_sdrq to reference an sdrq which is no longer
+	 * !is_root. Counter that by ascending the tg-hierarchy
+	 * until we find an sdrq with is_root.
+	 */
+	while (sdrq->is_root && sdrq->tg_parent)
+		sdrq = sdrq->tg_parent;
+
 	return sdrq->cfs_rq;
 }
 #else
@@ -7141,6 +7150,23 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		se = pick_next_entity(cfs_rq, curr);
 		cfs_rq = pick_next_cfs(se);
+
+#ifdef CONFIG_COSCHEDULING
+		if (cfs_rq && is_sd_se(se) && cfs_rq->sdrq.is_root) {
+			WARN_ON_ONCE(1); /* Untested code path */
+			/*
+			 * Race with is_root update.
+			 *
+			 * We just moved downwards in the hierarchy via an
+			 * SD-SE, the CFS-RQ should have is_root set to zero.
+			 * However, a reconfiguration may be in progress. We
+			 * basically ignore that reconfiguration.
+			 *
+			 * Contrary to the case below, there is nothing to fix
+			 * as all the set_next_entity() calls are done later.
+			 */
+		}
+#endif
 	} while (cfs_rq);
 
 	if (is_sd_se(se))
@@ -7192,6 +7218,48 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		se = pick_next_entity(cfs_rq, NULL);
 		set_next_entity(cfs_rq, se);
 		cfs_rq = pick_next_cfs(se);
+
+#ifdef CONFIG_COSCHEDULING
+		if (cfs_rq && is_sd_se(se) && cfs_rq->sdrq.is_root) {
+			/*
+			 * Race with is_root update.
+			 *
+			 * We just moved downwards in the hierarchy via an
+			 * SD-SE, the CFS-RQ should have is_root set to zero.
+			 * However, a reconfiguration may be in progress. We
+			 * basically ignore that reconfiguration, but we need
+			 * to fix the picked path to correspond to that
+			 * reconfiguration.
+			 *
+			 * Thus, we walk the hierarchy upwards again and do two
+			 * things simultaneously:
+			 *
+			 * 1. put back picked entities which are not on the
+			 *    "correct" path,
+			 * 2. pick the entities along the correct path.
+			 *
+			 * We do this until both paths upwards converge.
+			 */
+			struct sched_entity *se2 = cfs_rq->sdrq.tg_se;
+			bool top = false;
+
+			WARN_ON_ONCE(1); /* Untested code path */
+			while (se && se != se2) {
+				if (!top) {
+					put_prev_entity(cfs_rq_of(se), se);
+					if (cfs_rq_of(se) == top_cfs_rq)
+						top = true;
+				}
+				if (top)
+					se = cfs_rq_of(se)->sdrq.tg_se;
+				else
+					se = parent_entity(se);
+				set_next_entity(cfs_rq_of(se2), se2);
+				se2 = parent_entity(se2);
+			}
+		}
+#endif
 	} while (cfs_rq);
+
 retidle: __maybe_unused;
-- 
2.9.3.1.gcba166c.dirty