Date: Mon, 3 Feb 2020 08:48:21 -0800
From: Jakub Kicinski
To: Max Neunhoeffer
Cc: netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML, Roman Penyaev, Christopher Kohlhoff, viro@zeniv.linux.org.uk
Subject: Re: epoll_wait misses edge-triggered eventfd events: bug in Linux 5.3 and 5.4
Message-ID: <20200203084821.7a672861@cakuba.hsd1.ca.comcast.net>
In-Reply-To: <20200203151536.caf6n4b2ymvtssmh@tux>
References: <20200131135730.ezwtgxddjpuczpwy@tux> <20200201121647.62914697@cakuba.hsd1.ca.comcast.net> <20200203151536.caf6n4b2ymvtssmh@tux>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 3 Feb 2020 16:15:36 +0100, Max Neunhoeffer wrote:
> Dear Jakub and all,
>
> I have done a git bisect and found that this commit introduced the epoll
> bug:
>
> https://github.com/torvalds/linux/commit/a218cc4914209ac14476cb32769b31a556355b22
>
> I Cc the author of the commit.

Awesome, thanks a lot for doing that! Hopefully Roman can take a look
soon. Breaking boost::asio seems like a pretty serious regression.

> This makes sense, since the commit introduces a new rwlock to reduce
> contention in ep_poll_callback. I do not fully understand the details
> but this sounds all very close to this bug.
>
> I have also verified that the bug is still present in the latest master
> branch in Linus' repository.
>
> Furthermore, Chris Kohlhoff has provided yet another reproducing program
> which is no longer using edge-triggered but standard level-triggered
> events and epoll_wait. This makes the bug all the more urgent, since
> potentially more programs could run into this problem and could end up
> with sleeping barbers.
>
> I have added all the details to the bugzilla bugreport:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=205933
>
> Hopefully, we can resolve this now equipped with this amount of information.
>
> Best regards,
> Max.
>
> On 20/02/01 12:16, Jakub Kicinski wrote:
> > On Fri, 31 Jan 2020 14:57:30 +0100, Max Neunhoeffer wrote:
> > > Dear All,
> > >
> > > I believe I have found a bug in Linux 5.3 and 5.4 in epoll_wait/epoll_ctl
> > > when an eventfd together with edge-triggered or the EPOLLONESHOT policy
> > > is used. If an epoll_ctl call to rearm the eventfd happens approximately
> > > at the same time as the epoll_wait goes to sleep, the event can be lost,
> > > even though proper protection through a mutex is employed.
> > >
> > > The details together with two programs showing the problem can be found
> > > here:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=205933
> > >
> > > Older kernels seem not to have this problem, although I did not test all
> > > versions. I know that 4.15 and 5.0 do not show the problem.
> > >
> > > Note that this method of using epoll_wait/eventfd is used by
> > > boost::asio to wake up event loops in case a new completion handler
> > > is posted to an io_service, so this is probably relevant for many
> > > applications.
> > >
> > > Any help with this would be appreciated.
> >
> > Could be networking related but let's CC FS folks just in case.
> >
> > Would you be able to perform bisection to narrow down the search
> > for a buggy change?