Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp458655ybi; Fri, 31 May 2019 04:17:49 -0700 (PDT) X-Google-Smtp-Source: APXvYqzJjk1o94jeuvPBFRIyaY9+jjBQzF7VQZdcD5lTKn3QcfYpY59MX83ooW8JDnT7lmfyh5wt X-Received: by 2002:aa7:8294:: with SMTP id s20mr1167138pfm.75.1559301468913; Fri, 31 May 2019 04:17:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559301468; cv=none; d=google.com; s=arc-20160816; b=Ap9JXXqNQdSPGv1TkSK92kb0+zD1wP6SQpKumPuUjl4ZzRQlFwrK1h4S78KpVixVF7 9HWmAbVkZqubdX6rdeubZEX2P1N1IqWzben4zY+4+cWC54TPmPmtAjsdj1bzesj4nzl6 yxcjFLYbE8JNgYp9b5IeMPu38Weg07PCWU52/rZshXnDsGSDrUvnMW9mIGra68W/ebMU gf09vG/g7E+NVvyt/INFupLe09ZPhjQcDbVyYQ5oCKUyLAuV+k5dfSZwML45rGtv9MuN r9LbS0bU0JsPnMpVCYQimkiiAN8HPa4D03W5EqQidQm5To9YXs20L/se8rNrVrSZIxIP Cn9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:message-id:references :in-reply-to:subject:cc:to:from:date:content-transfer-encoding :mime-version; bh=PjRmnYkGmc2oKnwJhInCrygztunNuKY+dw9NNw0XiIY=; b=G0GbVF/jXDQ3Csz9ZZHe36LJkpXri7DlMysTF3ZTIkFas3Fr+zjvwxP4Xtp6naP5O+ a1u3o0Pd31FN0QrQJxHRsxeZg2Uogh9xxYTQkYR//JbhflEr0F11yIdinwzGHMhT2UnC N+Q8ey4bTAgNE61I71chas4vdbZgpUhNlI+wXmxNLeO+Qy3wTd6efQYvDCf8coX+aATm 1GVUhyoU0WIvb0uAV8/XNQMZ/8UV9mN0uVAOEGUxzjJvcOszwjA6m0gR/TPZkRVpqme8 xkwNq/GA1cHvpCoYJ5LAAVSGLqEZ4wb8Bsp24o+83g1qvzku/N05u2llSrbCRU08c6eU amsw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id bh11si5469117plb.383.2019.05.31.04.17.30; Fri, 31 May 2019 04:17:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727184AbfEaLPY (ORCPT + 99 others); Fri, 31 May 2019 07:15:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:50542 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726240AbfEaLPX (ORCPT ); Fri, 31 May 2019 07:15:23 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 96E9FABE9; Fri, 31 May 2019 11:15:21 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 31 May 2019 13:15:21 +0200 From: Roman Penyaev To: Peter Zijlstra Cc: azat@libevent.org, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing events to uring In-Reply-To: <20190531095607.GC17637@hirez.programming.kicks-ass.net> References: <20190516085810.31077-1-rpenyaev@suse.de> <20190516085810.31077-7-rpenyaev@suse.de> <20190531095607.GC17637@hirez.programming.kicks-ass.net> Message-ID: <274e29d102133f3be1f309c66cb0af36@suse.de> X-Sender: rpenyaev@suse.de User-Agent: Roundcube Webmail Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2019-05-31 11:56, Peter Zijlstra wrote: > On Thu, May 16, 2019 at 10:58:03AM +0200, Roman Penyaev wrote: >> +static inline bool ep_add_event_to_uring(struct epitem *epi, __poll_t >> pollflags) >> +{ >> + struct eventpoll *ep = epi->ep; >> + struct epoll_uitem *uitem; >> + bool added = false; >> + >> + if (WARN_ON(!pollflags)) >> + return false; >> + >> + uitem = &ep->user_header->items[epi->bit]; >> + /* >> + * Can be represented as: >> + * >> + * was_ready = uitem->ready_events; >> + * uitem->ready_events &= ~EPOLLREMOVED; >> + * uitem->ready_events |= pollflags; >> + * if (!was_ready) { >> + * // create index entry >> + * } >> + * >> + * See the big comment inside ep_remove_user_item(), why it is >> + * important to mask EPOLLREMOVED. >> + */ >> + if (!atomic_or_with_mask(&uitem->ready_events, >> + pollflags, EPOLLREMOVED)) { >> + unsigned int i, *item_idx, index_mask; >> + >> + /* >> + * Item was not ready before, thus we have to insert >> + * new index to the ring. >> + */ >> + >> + index_mask = ep_max_index_nr(ep) - 1; >> + i = __atomic_fetch_add(&ep->user_header->tail, 1, >> + __ATOMIC_ACQUIRE); > > afaict __atomic_fetch_add() does not exist. That is gcc extension. I did not find any API just to increment the variable atomically without using/casting to atomic. What is a proper way to achieve that? > >> + item_idx = &ep->user_index[i & index_mask]; >> + >> + /* Signal with a bit, which is > 0 */ >> + *item_idx = epi->bit + 1; > > Did you just increment the user visible tail pointer before you filled > the data? That is, can the concurrent userspace observe the increment > before you put credible data in its place? No, the "data" is the "ready_events" mask, which was updated before, using cmpxchg, atomic_or_with_mask() call. All I need is to put an index of just updated item to the uring. Userspace, in its turn, gets the index from the ring and then checks the mask. > >> + >> + /* >> + * Want index update be flushed from CPU write buffer and >> + * immediately visible on userspace side to avoid long busy >> + * loops. >> + */ >> + smp_wmb(); > > That's still complete nonsense. Yes, true. My confusion came from the simple test, where one thread swaps pointers in a loop, another thread dereferences pointer and increments a variable: THR#0 ----------- unsigned vvv1 = 0, vvv2 = 0; unsigned *ptr; ptr = &vvv1; thr_level2 = &vvv2; while (!stop) { unsigned *tmp = *thr_level2; *thr_level2 = ptr; barrier(); <<<< ???? ptr = tmp; } THR#1 ----------- while (!stop) { ptr = thr_level2; (*ptr)++; } At the end I expect `vvv1` and `vvv2` are approximately equally incremented. But, without barrier() only one variable is incremented. Now I see that barrier() should be defined as a simple compiler barrier as asm volatile("" ::: "memory"), and there is nothing related with write buffer as I wrote in the comment. So indeed garbage and can be removed. Thanks. -- Roman