Subject: Re: [PATCH v10 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
To: Alex Kogan
Cc: linux@armlinux.org.uk, peterz@infradead.org, mingo@redhat.com,
 will.deacon@arm.com, arnd@arndb.de, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 tglx@linutronix.de, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
 guohanjun@huawei.com, jglauber@marvell.com, steven.sistare@oracle.com,
 daniel.m.jordan@oracle.com, dave.dice@oracle.com
References: <20200403205930.1707-1-alex.kogan@oracle.com>
 <20200403205930.1707-4-alex.kogan@oracle.com>
 <08E77224-563F-49C7-9E7F-BD98E4FD121D@oracle.com>
From: Waiman Long
Organization: Red Hat
Message-ID: <84fdb30e-387d-7bc1-e1f9-fa57cd9a32c9@redhat.com>
Date: Tue, 1 Sep 2020 13:38:29 -0400
In-Reply-To: <08E77224-563F-49C7-9E7F-BD98E4FD121D@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/31/20 5:39 PM, Alex Kogan wrote:
>> On Jul 28, 2020, at 4:00 PM, Waiman Long wrote:
>>
>> On 4/3/20 4:59 PM, Alex Kogan wrote:
>>> In CNA, spinning threads are organized in two queues, a primary queue for
>>> threads running on the same node as the current lock holder, and a
>>> secondary queue for threads running on other nodes. After acquiring the
>>> MCS lock and before acquiring the spinlock, the lock holder scans the
>>> primary queue looking for a thread running on the same node (pre-scan).
>>> If one is found (call it thread T), all threads in the primary queue
>>> between the current lock holder and T are moved to the end of the
>>> secondary queue. If no such T is found, we make another scan of the
>>> primary queue when unlocking the MCS lock (post-scan), starting at the
>>> position where the pre-scan stopped. If both scans fail to find such a T,
>>> the MCS lock is passed to the first thread in the secondary queue. If the
>>> secondary queue is empty, the lock is passed to the next thread in the
>>> primary queue. For more details, see https://arxiv.org/abs/1810.05600 .
>>>
>>> Note that this variant of CNA may introduce starvation by continuously
>>> passing the lock to threads running on the same node. This issue
>>> will be addressed later in the series.
>>>
>>> Enabling CNA is controlled via a new configuration option
>>> (NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at
>>> boot time only if we run on a multi-node machine in a native environment
>>> and the new config option is enabled. (For the time being, the patching
>>> requires CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this
>>> should be resolved once static_call() is available.) This default
>>> behavior can be overridden with the new kernel boot command-line option
>>> "numa_spinlock=on/off" (default is "auto").
>>>
>>> Signed-off-by: Alex Kogan
>>> Reviewed-by: Steve Sistare
>>> Reviewed-by: Waiman Long
>>> ---
>> There is also a concern that the worst-case latency for a lock transfer
>> can be close to O(n), which can be quite large on large SMP systems. I
>> have a patch on top that modifies the current behavior to limit the
>> number of node scans after the lock is freed.
> I understand the concern.
> While your patch addresses it, I am afraid it makes
> the code somewhat more complex and duplicates some of the slow-path
> functionality (e.g., the spin loop until the lock value changes to a
> certain value).
>
> Let me suggest a different idea for gradually restructuring the main queue
> that has some similarity to the way you suggested handling prioritized
> waiters. Basically, instead of scanning the entire chain of main-queue
> waiters, we can check only the next waiter and, if it is present and runs
> on a different node, move it to the secondary queue. In addition, to
> maintain the preference for a certain NUMA node ID, we set the NUMA node
> of the next-next waiter, if present, to that of the current lock holder.
> This is the part similar to the way you suggested handling prioritized
> waiters.
>
> This way, the worst-case complexity of cna_order_queue() decreases from
> O(n) down to O(1), as we always "scan" only one waiter. And as before, we
> change the preference (and flush the secondary queue) after X handovers
> (or after Y ms, as in your other patch).
>
> I attach the patch that applies on top of your patch for prioritized
> nodes (0006), but does not include your patch 0007 (time-based threshold),
> which I will integrate into the series in the next revision.
>
> Please let me know what you think.
>
That is an interesting idea. I don't have any fundamental objection to it.
I just wonder how it will impact the kind of performance tests that you ran
before. It would be nice to see the performance impact of that change.

Cheers,
Longman