Date: Fri, 31 May 2019 16:21:30 +0200
From: Roman Penyaev <rpenyaev@suse.de>
To: Peter Zijlstra
Cc: azat@libevent.org, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
    torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing
    events to uring
In-Reply-To: <20190531125636.GZ2606@hirez.programming.kicks-ass.net>
References: <20190516085810.31077-1-rpenyaev@suse.de>
    <20190516085810.31077-7-rpenyaev@suse.de>
    <20190531095607.GC17637@hirez.programming.kicks-ass.net>
    <274e29d102133f3be1f309c66cb0af36@suse.de>
    <20190531125636.GZ2606@hirez.programming.kicks-ass.net>
Message-ID: <98e74ceeefdffc9b50fb33e597d270f7@suse.de>

On 2019-05-31 14:56, Peter Zijlstra wrote:
> On Fri, May 31, 2019 at 01:15:21PM +0200, Roman Penyaev wrote:
>> On 2019-05-31 11:56, Peter Zijlstra wrote:
>> > On Thu, May 16, 2019 at 10:58:03AM +0200, Roman Penyaev wrote:
>> > > +static inline bool ep_add_event_to_uring(struct epitem *epi, __poll_t pollflags)
>> > > +{
>> > > +        struct eventpoll *ep = epi->ep;
>> > > +        struct epoll_uitem *uitem;
>> > > +        bool added = false;
>> > > +
>> > > +        if (WARN_ON(!pollflags))
>> > > +                return false;
>> > > +
>> > > +        uitem = &ep->user_header->items[epi->bit];
>> > > +        /*
>> > > +         * Can be represented as:
>> > > +         *
>> > > +         *    was_ready = uitem->ready_events;
>> > > +         *    uitem->ready_events &= ~EPOLLREMOVED;
>> > > +         *    uitem->ready_events |= pollflags;
>> > > +         *    if (!was_ready) {
>> > > +         *            // create index entry
>> > > +         *    }
>> > > +         *
>> > > +         * See the big comment inside ep_remove_user_item(), why it is
>> > > +         * important to mask EPOLLREMOVED.
>> > > +         */
>> > > +        if (!atomic_or_with_mask(&uitem->ready_events,
>> > > +                                 pollflags, EPOLLREMOVED)) {
>> > > +                unsigned int i, *item_idx, index_mask;
>> > > +
>> > > +                /*
>> > > +                 * Item was not ready before, thus we have to insert
>> > > +                 * new index to the ring.
>> > > +                 */
>> > > +
>> > > +                index_mask = ep_max_index_nr(ep) - 1;
>> > > +                i = __atomic_fetch_add(&ep->user_header->tail, 1,
>> > > +                                       __ATOMIC_ACQUIRE);
>> > > +                item_idx = &ep->user_index[i & index_mask];
>> > > +
>> > > +                /* Signal with a bit, which is > 0 */
>> > > +                *item_idx = epi->bit + 1;
>> >
>> > Did you just increment the user visible tail pointer before you filled
>> > the data? That is, can the concurrent userspace observe the increment
>> > before you put credible data in its place?
>>
>> No, the "data" is the "ready_events" mask, which was updated before,
>> using cmpxchg, atomic_or_with_mask() call.  All I need is to put an
>> index of just updated item to the uring.
>>
>> Userspace, in its turn, gets the index from the ring and then checks
>> the mask.
>
> But where do you write the index into the shared memory? That index
> should be written before you publish the new tail.

ep_add_event_to_uring() is lockless, thus I can't increase the tail
afterwards; I need to reserve the index slot to write to first.  I can
use a shadow tail, which is not seen by userspace, but then I have to
guarantee that the tail is updated from the shadow tail *after* all
callers of ep_add_event_to_uring() have left.
That is possible, please see the code below, but it adds more
complexity (the code was tested on the user side, thus it uses C11
atomics):

static inline void add_event__kernel(struct ring *ring, unsigned bit)
{
        unsigned i, cntr, commit_cntr, *item_idx, tail, old;

        i = __atomic_fetch_add(&ring->cntr, 1, __ATOMIC_ACQUIRE);
        item_idx = &ring->user_itemsindex[i % ring->nr];

        /* Update data */
        *item_idx = bit;

        commit_cntr = __atomic_add_fetch(&ring->commit_cntr, 1,
                                         __ATOMIC_RELEASE);

        tail = ring->user_header->tail;
        rmb();
        do {
                cntr = ring->cntr;
                if (cntr != commit_cntr)
                        /* Someone else will advance tail */
                        break;

                old = tail;

        } while ((tail = __sync_val_compare_and_swap(&ring->user_header->tail,
                                                      old, cntr)) != old);
}

Another way (the current solution) is to spin on the userspace side
until we get an index > 0 (a valid index is always > 0), i.e.:

        item_idx_ptr = &index[idx & indices_mask];

        /*
         * Spin here till we see a valid index
         */
        while (!(idx = __atomic_load_n(item_idx_ptr, __ATOMIC_ACQUIRE)))
                ;

So of course the tail can be updated afterwards, like you mentioned,
but then I have to introduce locks.  I want to keep the hot event path
lockless.

--
Roman
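
For illustration, the consumer loop of the current spin-on-index scheme
could look roughly like the sketch below on the user side.  Only the
tail counter, the index ring and the "bit + 1" encoding come from the
discussion above; the struct layout, the helper name and the
slot-clearing step are assumptions of this sketch, not the actual uapi
of the patch series:

/*
 * Sketch only, not the uapi of the patch series.  Assumes the index
 * ring size is a power of two and that the producer writes "bit + 1"
 * into a slot, so 0 means "not filled yet".
 */
struct uring_view {
        unsigned int *tail;       /* shared, advanced by the kernel */
        unsigned int *index;      /* shared index ring              */
        unsigned int nr;          /* ring size, power of two        */
        unsigned int head;        /* consumer-private head          */
};

static inline int next_event__user(struct uring_view *r, unsigned int *bit)
{
        unsigned int tail, idx, *item_idx_ptr;

        tail = __atomic_load_n(r->tail, __ATOMIC_ACQUIRE);
        if (r->head == tail)
                return 0;        /* nothing published yet */

        item_idx_ptr = &r->index[r->head++ & (r->nr - 1)];

        /*
         * The producer bumps the tail before it fills the slot, so
         * spin here till we see a valid (non-zero) index.
         */
        while (!(idx = __atomic_load_n(item_idx_ptr, __ATOMIC_ACQUIRE)))
                ;

        /* Assumed by this sketch: hand the slot back for the next lap */
        __atomic_store_n(item_idx_ptr, 0, __ATOMIC_RELAXED);

        *bit = idx - 1;
        return 1;
}

After that the consumer would check the ready_events mask of
items[bit], as described above.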