Date: Thu, 13 Dec 2018 11:13:58 +0100
From: Roman Penyaev <rpenyaev@suse.de>
To: Andrea Parri
Cc: Davidlohr Bueso, Jason Baron, Al Viro, "Paul E. McKenney",
    Linus Torvalds, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
In-Reply-To: <20181212171348.GA12786@andrea>
References: <20181212110357.25656-1-rpenyaev@suse.de>
            <20181212110357.25656-4-rpenyaev@suse.de>
            <20181212171348.GA12786@andrea>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-12-12 18:13, Andrea Parri wrote:
> On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:

[...]
>> +static inline void list_add_tail_lockless(struct list_head *new,
>> +					    struct list_head *head)
>> +{
>> +	struct list_head *prev;
>> +
>> +	new->next = head;
>> +
>> +	/*
>> +	 * Initially ->next of a new element must be updated with the head
>> +	 * (we are inserting to the tail) and only then pointers are atomically
>> +	 * exchanged.  XCHG guarantees memory ordering, thus ->next should be
>> +	 * updated before pointers are actually swapped.
>> +	 */
>> +
>> +	prev = xchg(&head->prev, new);
>> +
>> +	/*
>> +	 * It is safe to modify prev->next and new->prev, because a new element
>> +	 * is added only to the tail and new->next is updated before XCHG.
>> +	 */
>
> IIUC, you're also relying on "some" ordering between the atomic load
> of &head->prev above and the store to prev->next below: consider the
> following snippet for two concurrent list_add_tail_lockless()'s:
>
> {Initially: List := H -> A -> B}
>
>   CPU0                                  CPU1
>
>   list_add_tail_lockless(C, H):         list_add_tail_lockless(D, H):
>
>     C->next = H                           D->next = H
>     prev = xchg(&H->prev, C) // =B        prev = xchg(&H->prev, D) // =C
>     B->next = C                           C->next = D
>     C->prev = B                           D->prev = C
>
> Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
> latter reads the value of &H->prev that the former stored to that same
> location).
>
> As you noted above, the xchg() guarantees that CPU0's store to C->next
> is "ordered before" CPU0's store to &H->prev.
>
> But we also want CPU1's load from &H->prev to be ordered before CPU1's
> store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
> the address dependency between these two memory accesses).
>
> I do not see what could guarantee "C->next == D" in the end, otherwise.
>
> What am I missing?

Hi Andrea,

xchg always acts as a full memory barrier, i.e. mfence in x86 terms.
So the following statement should always be true, otherwise nothing
would work, since the same code pattern is used in many generic places:

  CPU0                 CPU1

  C->next = H
  xchg(&ptr, C)
                       C = xchg(&ptr, D)
                       C->next = D

This is the only guarantee we need. To make it simpler:

  C->next = H
  mfence
                       mfence
                       C->next = D

i.e. the guarantee that the two stores do not reorder. The pattern is
always the same: we prepare a chunk of memory on CPU0 and then exchange
pointers; CPU1 sees the chunk of memory with all stores committed by
CPU0 (regardless of whether CPU1 does loads or stores to this chunk).

I am repeating the same thing which you also noted, but I just want to
be sure that I do not say nonsense. So basically repeating to myself.
Ok, let's commit that.

Returning to your question: "I do not see what could guarantee
'C->next == D' in the end". At the end of what? The lockless insert
procedure (insert to tail) relies only on "head->prev". This is the
single "place" where we atomically exchange list elements and "somehow"
chain them. So insert needs only the current "head->prev", and xchg
provides this guarantee to us.

But there is also a user of the list, who needs to iterate over the
list or to delete elements, etc., i.e. a user who needs the list fully
committed to memory. This user takes write_lock(). So, answering your
question (if I understood it correctly): at the end, write_lock()
guarantees that the list won't be seen as corrupted, and that updates
to the last element, i.e. the "->next" and "->prev" pointers of the
last element, are committed and seen correctly.

--
Roman