Date: Thu, 13 Dec 2018 13:19:28 +0100
From: Roman Penyaev
To: Andrea Parri
Cc: Davidlohr Bueso, Jason Baron, Al Viro, "Paul E. McKenney",
    Linus Torvalds, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
In-Reply-To: <20181213111957.GA8459@andrea>
References: <20181212110357.25656-1-rpenyaev@suse.de>
 <20181212110357.25656-4-rpenyaev@suse.de> <20181212171348.GA12786@andrea>
 <20181213111957.GA8459@andrea>

On 2018-12-13 12:19, Andrea Parri wrote:
> On Thu, Dec 13, 2018 at 11:13:58AM +0100, Roman Penyaev wrote:
>> On 2018-12-12 18:13, Andrea Parri wrote:
>> > On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:
>>
>> [...]
>>
>> > > +static inline void list_add_tail_lockless(struct list_head *new,
>> > > +					      struct list_head *head)
>> > > +{
>> > > +	struct list_head *prev;
>> > > +
>> > > +	new->next = head;
>> > > +
>> > > +	/*
>> > > +	 * Initially ->next of a new element must be updated with the head
>> > > +	 * (we are inserting to the tail) and only then pointers are atomically
>> > > +	 * exchanged. XCHG guarantees memory ordering, thus ->next should be
>> > > +	 * updated before pointers are actually swapped.
>> > > +	 */
>> > > +
>> > > +	prev = xchg(&head->prev, new);
>> > > +
>> > > +	/*
>> > > +	 * It is safe to modify prev->next and new->prev, because a new element
>> > > +	 * is added only to the tail and new->next is updated before XCHG.
>> > > +	 */
>> >
>> > IIUC, you're also relying on "some" ordering between the atomic load
>> > of &head->prev above and the store to prev->next below: consider the
>> > following snippet for two concurrent list_add_tail_lockless()'s:
>> >
>> > {Initially: List := H -> A -> B}
>> >
>> >	CPU0				CPU1
>> >
>> >	list_add_tail_lockless(C, H):	list_add_tail_lockless(D, H):
>> >
>> >	  C->next = H			  D->next = H
>> >	  prev = xchg(&H->prev, C) // =B  prev = xchg(&H->prev, D) // =C
>> >	  B->next = C			  C->next = D
>> >	  C->prev = B			  D->prev = C
>> >
>> > Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
>> > latter reads the value of &H->prev that the former stored to that same
>> > location).
>> >
>> > As you noted above, the xchg() guarantees that CPU0's store to C->next
>> > is "ordered before" CPU0's store to &H->prev.
>> >
>> > But we also want CPU1's load from &H->prev to be ordered before CPU1's
>> > store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
>> > the address dependency between these two memory accesses).
>> >
>> > I do not see what could guarantee "C->next == D" in the end, otherwise.
>> >
>> > What am I missing?
>>
>> Hi Andrea,
>>
>> xchg always acts as a full memory barrier, i.e. mfence in x86 terms.
>> So the following statement should always be true, otherwise nothing
>> would work, since the same code pattern is used in many generic places:
>>
>>    CPU0				CPU1
>>
>>    C->next = H
>>    xchg(&ptr, C)
>>					C = xchg(&ptr, D)
>>					C->next = D
>>
>> This is the only guarantee we need; to make it simpler:
>>
>>    C->next = H
>>    mfence				mfence
>>					C->next = D
>>
>> i.e. the guarantee that the two stores won't reorder. The pattern is
>> always the same: we prepare a chunk of memory on CPU0 and do a pointer
>> xchg; CPU1 then sees the chunk of memory with all stores committed by
>> CPU0 (regardless of whether CPU1 does loads or stores to this chunk).
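To make the pattern above concrete, here is a minimal userspace sketch of
the same idea: C11 atomic_exchange() stands in for the kernel's xchg(),
and the structure, thread count and verification walk are invented purely
for illustration, so treat it as a sketch of the ordering argument rather
than as the epoll code itself:

/* lockless_tail_demo.c: cc -std=c11 -pthread lockless_tail_demo.c */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct node {
	struct node *next;
	_Atomic(struct node *) prev;	/* plays the role of head->prev */
};

#define NTHREADS	4
#define PER_THREAD	10000

static struct node head = { .next = &head, .prev = &head };
static struct node nodes[NTHREADS][PER_THREAD];

static void add_tail_lockless(struct node *new)
{
	struct node *prev;

	new->next = &head;			 /* prepare the element first */
	prev = atomic_exchange(&head.prev, new); /* full barrier, like xchg() */
	prev->next = new;			 /* only we own the old tail  */
	new->prev = prev;
}

static void *worker(void *arg)
{
	struct node *mine = arg;

	for (int i = 0; i < PER_THREAD; i++)
		add_tail_lockless(&mine[i]);
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct node *p;
	long count = 0;

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, nodes[i]);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	/* Every inserted element must be reachable via ->next exactly once. */
	for (p = head.next; p != &head; p = p->next)
		count++;
	assert(count == (long)NTHREADS * PER_THREAD);
	printf("walked %ld elements, list is intact\n", count);
	return 0;
}

Each thread prepares its element before the exchange and afterwards only
touches the element its own exchange handed back, so the chain relies on
nothing beyond the full-barrier semantics of the exchange; the walk over
->next at the end finds every element exactly once.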
>>
>> I am repeating the same thing which you also noted, but I just want to
>> be sure that I do not say nonsense. So basically repeating to myself.
>>
>> Ok, let's commit that. Returning to your question: "I do not see what
>> could guarantee "C->next == D" in the end"
>>
>> At the end of what? The lockless insert procedure (insert to tail)
>> relies only on "head->prev". This is the single "place" where we
>> atomically exchange list elements and "somehow" chain them. So the
>> insert needs only the actual "head->prev", and xchg provides this
>> guarantee to us.
>
> When all the operations reported in the snippet have completed (i.e.,
> executed and propagated to memory).
>
> To rephrase my remark:
>
> I am saying that we do need some ordering between the xchg() and the
> program-order _subsequent_ stores, and implicitly suggesting to write
> this down in the comment. As I wrote, this ordering _is_ provided by
> the xchg() itself or by the dependency; so, maybe, something like:
>
>	/*
>	 * [...] XCHG guarantees memory ordering, thus new->next is
>	 * updated before pointers are actually swapped and pointers
>	 * are swapped before prev->next is updated.
>	 */
>
> Adding a snippet, say in the form you reported above, would not hurt
> of course. ;-)

Sure thing. Will extend the comments.

--
Roman
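Folding Andrea's suggested wording into the helper quoted at the top of
the thread gives a rough idea of what the extended comment could look
like; the trailing prev->next/new->prev stores, trimmed from the quote
but visible in the CPU0/CPU1 snippet, are filled back in for context, and
the wording here is only a sketch, not the final patch:

static inline void list_add_tail_lockless(struct list_head *new,
					  struct list_head *head)
{
	struct list_head *prev;

	new->next = head;

	/*
	 * Initially ->next of a new element must be updated with the head
	 * (we are inserting to the tail) and only then pointers are
	 * atomically exchanged. XCHG guarantees memory ordering, thus
	 * new->next is updated before pointers are actually swapped and
	 * pointers are swapped before prev->next is updated.
	 */

	prev = xchg(&head->prev, new);

	/*
	 * It is safe to modify prev->next and new->prev, because a new
	 * element is added only to the tail and new->next is updated
	 * before XCHG.
	 */

	prev->next = new;
	new->prev = prev;
}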