Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
From: Josh Don
Date: Fri, 30 Apr 2021 01:48:32 -0700
To: Aubrey Li
Cc: Peter Zijlstra, Joel Fernandes, "Hyser,Chris", Ingo Molnar,
    Vincent Guittot, Valentin Schneider, Mel Gorman,
    Linux List Kernel Mailing, Thomas Gleixner, Don Hiatt
References: <20210422120459.447350175@infradead.org>
    <20210422123308.196692074@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 30, 2021 at 1:20 AM Aubrey Li wrote:
>
> On Fri, Apr 30, 2021 at 4:40 AM Josh Don wrote:
> >
> > On Thu, Apr 29, 2021 at 1:03 AM Aubrey Li wrote:
> > >
> > > On Thu, Apr 22, 2021 at 8:39 PM Peter Zijlstra wrote:
> > > ----snip----
> > > > @@ -199,6 +224,25 @@ void raw_spin_rq_unlock(struct rq *rq)
> > > >         raw_spin_unlock(rq_lockp(rq));
> > > >  }
> > > >
> > > > +#ifdef CONFIG_SMP
> > > > +/*
> > > > + * double_rq_lock - safely lock two runqueues
> > > > + */
> > > > +void double_rq_lock(struct rq *rq1, struct rq *rq2)
> > > > +{
> > > > +       lockdep_assert_irqs_disabled();
> > > > +
> > > > +       if (rq1->cpu > rq2->cpu)
> > >
> > > It's still a bit hard for me to digest this function. I guess using rq->cpu
> > > can't guarantee the locking order when
> > > coresched is enabled.
> > >
> > > - cpu1 and cpu7 share lockA
> > > - cpu2 and cpu8 share lockB
> > >
> > > double_rq_lock(1,8) leads to lock(A) and lock(B)
> > > double_rq_lock(7,2) leads to lock(B) and lock(A)
> > >
> > > Change to the below to avoid ABBA?
> > > +       if (__rq_lockp(rq1) > __rq_lockp(rq2))
> > >
> > > Please correct me if I was wrong.
> >
> > Great catch, Aubrey. This is possibly what is causing the lockups that
> > Don is seeing.
> >
> > The proposed usage of __rq_lockp() is prone to race with sched core
> > being enabled/disabled. It also won't order properly if we do
> > double_rq_lock(smt0, smt1) vs double_rq_lock(smt1, smt0), since these
> > would have equivalent __rq_lockp().
>
> If __rq_lockp(smt0) == __rq_lockp(smt1), rq0 and rq1 won't swap.
> Later, only one rq is locked and we just return. I'm not sure how it
> fails to order properly?

If there is a concurrent switch between sched_core enable and disable,
the value of __rq_lockp() will race.

In the version you posted directly above, where we swap rq1 and rq2 if
__rq_lockp(rq1) > __rq_lockp(rq2) rather than comparing the cpu, the
following can happen (cpu 1 and cpu 7 share a core lock when coresched
is enabled):

- schedcore enabled
- double_lock(7, 1)
- __rq_lockp() compares equal for 7 and 1; no swap is done
- schedcore disabled; now __rq_lockp() returns the per-rq lock
- lock(__rq_lockp(7)) => lock(7)
- lock(__rq_lockp(1)) => lock(1)

Then we can also have:

- schedcore disabled
- double_lock(1, 7)
- __rq_lockp(1) < __rq_lockp(7), so no swap
- lock(__rq_lockp(1)) => lock(1)
- lock(__rq_lockp(7)) => lock(7)

So we take the locks in order 7->1 in the first case and 1->7 in the
second.

> > I'd propose an alternative but similar idea: order by core, then break ties
> > by ordering on cpu.
> >
> > +#ifdef CONFIG_SCHED_CORE
> > +       if (rq1->core->cpu > rq2->core->cpu)
> > +               swap(rq1, rq2);
> > +       else if (rq1->core->cpu == rq2->core->cpu && rq1->cpu > rq2->cpu)
> > +               swap(rq1, rq2);
>
> That is, why is the "else if" branch needed?
To ensure that core siblings always take their locks in the same order
when coresched is disabled.

> > +#else
> >         if (rq1->cpu > rq2->cpu)
> >                 swap(rq1, rq2);
> > +#endif

Best,
Josh