Date: Fri, 28 Aug 2020 16:46:52 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mhiramat@kernel.org, Eddy_Wu@trendmicro.com,
        x86@kernel.org, davem@davemloft.net, rostedt@goodmis.org,
        naveen.n.rao@linux.ibm.com, anil.s.keshavamurthy@intel.com,
        linux-arch@vger.kernel.org, cameron@moodycamel.com, will@kernel.org,
        paulmck@kernel.org
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist
Message-ID: <20200828144650.GF28468@redhat.com>
References: <20200827161237.889877377@infradead.org>
        <20200827161754.535381269@infradead.org>
In-Reply-To: <20200827161754.535381269@infradead.org>

On 08/27, Peter Zijlstra wrote:
>
>  1 file changed, 129 insertions(+)

129 lines! And I spent more than 2 hours trying to understand these
129 lines ;) It looks correct... However, I still can't understand the
usage of the _acquire/_release ops in this code.
> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> +	/*
> +	 * Since the refcount is zero, and nobody can increase it once it's
> +	 * zero (except us, and we run only one copy of this method per node at
> +	 * a time, i.e. the single thread case), then we know we can safely
> +	 * change the next pointer of the node; however, once the refcount is
> +	 * back above zero, then other threads could increase it (happens under
> +	 * heavy contention, when the refcount goes to zero in between a load
> +	 * and a refcount increment of a node in try_get, then back up to
> +	 * something non-zero, then the refcount increment is done by the other
> +	 * thread) -- so if the CAS to add the node to the actual list fails,
> +	 * decrease the refcount and leave the add operation to the next thread
> +	 * who puts the refcount back to zero (which could be us, hence the
> +	 * loop).
> +	 */
> +	struct freelist_node *head = READ_ONCE(list->head);
> +
> +	for (;;) {
> +		WRITE_ONCE(node->next, head);
> +		atomic_set_release(&node->refs, 1);
> +
> +		if (!try_cmpxchg_release(&list->head, &head, node)) {

OK, these two _release ops above look understandable, they pair with
atomic_try_cmpxchg_acquire/try_cmpxchg_acquire in freelist_try_get().

> +			/*
> +			 * Hmm, the add failed, but we can only try again when
> +			 * the refcount goes back to zero.
> +			 */
> +			if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> +				continue;

Do we really need _release here? Why can't atomic_fetch_add_relaxed() work?

> +static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
> +{
> +	struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
> +	unsigned int refs;
> +
> +	while (head) {
> +		prev = head;
> +		refs = atomic_read(&head->refs);
> +		if ((refs & REFS_MASK) == 0 ||
> +		    !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
> +			head = smp_load_acquire(&list->head);
> +			continue;
> +		}
> +
> +		/*
> +		 * Good, reference count has been incremented (it wasn't at
> +		 * zero), which means we can read the next and not worry about
> +		 * it changing between now and the time we do the CAS.
> +		 */
> +		next = READ_ONCE(head->next);
> +		if (try_cmpxchg_acquire(&list->head, &head, next)) {
> +			/*
> +			 * Yay, got the node. This means it was on the list,
> +			 * which means should-be-on-freelist must be false no
> +			 * matter the refcount (because nobody else knows it's
> +			 * been taken off yet, it can't have been put back on).
> +			 */
> +			WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> +
> +			/*
> +			 * Decrease refcount twice, once for our ref, and once
> +			 * for the list's ref.
> +			 */
> +			atomic_fetch_add(-2, &head->refs);

Do we need the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?

> +		/*
> +		 * OK, the head must have changed on us, but we still need to
> +		 * decrement the refcount we increased.
> +		 */
> +		refs = atomic_fetch_add(-1, &prev->refs);

Cosmetic, but why not atomic_fetch_dec()?

Oleg.
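P.S. To sanity-check the one pairing I do understand, here is a minimal
userspace C11 reduction of the publish/consume pattern that the _release
stores in __freelist_add() and the _acquire loads in freelist_try_get()
implement. The names and the C11 mapping are mine, not from the patch; it
only demonstrates that the plain write to ->next is ordered before the
release store that makes the node reachable, so the consumer's acquire
load can safely dereference ->next.

	/* mp.c: message-passing litmus test, compile with -pthread */
	#include <stdatomic.h>
	#include <pthread.h>
	#include <assert.h>
	#include <stddef.h>

	struct node {
		struct node *next;	/* plain field, like node->next */
	};

	static struct node a, b;
	static _Atomic(struct node *) head;	/* like list->head */

	/* Publisher, like __freelist_add(): the plain store to ->next
	 * cannot be reordered past the release store to head. */
	static void *add(void *arg)
	{
		a.next = &b;
		atomic_store_explicit(&head, &a, memory_order_release);
		return NULL;
	}

	/* Consumer, like freelist_try_get(): the acquire load of head
	 * orders the later plain load of ->next after it, so if we
	 * observe the node we also observe its ->next. */
	static void *try_get(void *arg)
	{
		struct node *h;

		h = atomic_load_explicit(&head, memory_order_acquire);
		if (h)
			assert(h->next == &b);	/* cannot fail */
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, add, NULL);
		pthread_create(&t2, NULL, try_get, NULL);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		return 0;
	}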