Date: Mon, 3 Feb 2020 16:15:36 +0100
From: Max Neunhoeffer
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML, Roman Penyaev, Christopher Kohlhoff
Subject: Re: epoll_wait misses edge-triggered eventfd events: bug in Linux 5.3 and 5.4
Message-ID: <20200203151536.caf6n4b2ymvtssmh@tux>
References: <20200131135730.ezwtgxddjpuczpwy@tux> <20200201121647.62914697@cakuba.hsd1.ca.comcast.net>
In-Reply-To: <20200201121647.62914697@cakuba.hsd1.ca.comcast.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Dear Jakub and all,

I have done a git bisect and found that this commit introduced the epoll
bug:

https://github.com/torvalds/linux/commit/a218cc4914209ac14476cb32769b31a556355b22

I Cc the author of the commit.

This makes sense, since the commit introduces a new rwlock to reduce
contention in ep_poll_callback. I do not fully understand the details
but this sounds all very close to this bug.

I have also verified that the bug is still present in the latest master
branch in Linus' repository.

Furthermore, Chris Kohlhoff has provided yet another reproducing program
which is no longer using edge-triggered but standard level-triggered
events and epoll_wait. This makes the bug all the more urgent, since
potentially more programs could run into this problem and could end up
with sleeping barbers.

I have added all the details to the bugzilla bugreport:

https://bugzilla.kernel.org/show_bug.cgi?id=205933

Hopefully, we can resolve this now equipped with this amount of information.

Best regards,
  Max.

On 20/02/01 12:16, Jakub Kicinski wrote:
> On Fri, 31 Jan 2020 14:57:30 +0100, Max Neunhoeffer wrote:
> > Dear All,
> >
> > I believe I have found a bug in Linux 5.3 and 5.4 in epoll_wait/epoll_ctl
> > when an eventfd together with edge-triggered or the EPOLLONESHOT policy
> > is used. If an epoll_ctl call to rearm the eventfd happens approximately
> > at the same time as the epoll_wait goes to sleep, the event can be lost,
> > even though proper protection through a mutex is employed.
> >
> > The details together with two programs showing the problem can be found
> > here:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=205933
> >
> > Older kernels seem not to have this problem, although I did not test all
> > versions. I know that 4.15 and 5.0 do not show the problem.
> >
> > Note that this method of using epoll_wait/eventfd is used by
> > boost::asio to wake up event loops in case a new completion handler
> > is posted to an io_service, so this is probably relevant for many
> > applications.
> >
> > Any help with this would be appreciated.
>
> Could be networking related but let's CC FS folks just in case.
>
> Would you be able to perform bisection to narrow down the search
> for a buggy change?