From: "Michael Kerrisk (man-pages)"
Reply-To: mtk.manpages@gmail.com
Date: Mon, 12 Oct 2020 20:39:41 +0200
Subject: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs
To: Linus Torvalds
Cc: David Howells, Rasmus Villemoes, Greg Kroah-Hartman, Peter Zijlstra,
    Nicolas Dichtel, Ian Kent, Christian Brauner, keyrings@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, Linux API, lkml, Michael Kerrisk
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Linus,

Between Linux 5.4 and 5.5 a regression was introduced in the operation
of the epoll EPOLLET flag. From some manual bisecting, the regression
appears to have been introduced in

    commit 1b6b26ae7053e4914181eedf70f2d92c12abda8a
    Author: Linus Torvalds
    Date:   Sat Dec 7 12:14:28 2019 -0800

        pipe: fix and clarify pipe write wakeup logic

(I also built a kernel from the immediately preceding commit, and did
not observe the regression.)

The aim of ET (edge-triggered) notification is that epoll_wait() will
tell us a file descriptor is ready only if there has been new activity
on the FD since we were last informed about the FD. So, in the
following scenario where the read end of a pipe is being monitored
with EPOLLET, we see:

[Write a byte to write end of pipe]
1. Call epoll_wait() ==> tells us pipe read end is ready
2. Call epoll_wait() [again] ==> does not tell us that the read end
   of pipe is ready

(By contrast, in step 2, level-triggered notification would tell
us the read end of the pipe is ready.)

If we go further:

[Write another byte to write end of pipe]
3. Call epoll_wait() ==> tells us pipe read end is ready

The above was true until the regression. Now, step 3 does not tell us
that the pipe read end is ready, even though there is NEW input
available on the pipe. (In the analogous situation for sockets and
terminals, step 3 does (still) correctly tell us that the FD is
ready.)

I've appended a test program below. The following are the results on
kernel 5.4.0:

    $ ./pipe_epollet_test
    Writing a byte to pipe()
     1: OK: ret = 1, events = [ EPOLLIN ]
     2: OK: ret = 0
    Writing a byte to pipe()
     3: OK: ret = 1, events = [ EPOLLIN ]
    Closing write end of pipe()
     4: OK: ret = 1, events = [ EPOLLIN EPOLLHUP ]

On current kernels, the results are as follows:

    $ ./pipe_epollet_test
    Writing a byte to pipe()
     1: OK: ret = 1, events = [ EPOLLIN ]
     2: OK: ret = 0
    Writing a byte to pipe()
     3: FAIL: ret = 0; EXPECTED: ret = 1, events = [ EPOLLIN ]
    Closing write end of pipe()
     4: OK: ret = 1, events = [ EPOLLIN EPOLLHUP ]

Thanks,

Michael

=====

/* pipe_epollet_test.c

   Copyright (c) 2020, Michael Kerrisk

   Licensed under GNU GPLv2 or later.
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/epoll.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static void
printMask(int events)
{
    printf(" [ %s%s]",
            (events & EPOLLIN)  ? "EPOLLIN "  : "",
            (events & EPOLLHUP) ? "EPOLLHUP " : "");
}

static void
doEpollWait(int epfd, int timeout, int expectedRetval, int expectedEvents)
{
    struct epoll_event ev;
    static int callNum = 0;

    int retval = epoll_wait(epfd, &ev, 1, timeout);
    if (retval == -1) {
        perror("epoll_wait");
        return;
    }

    /* The test succeeded if (1) we got the expected return value and
       (2) when the return value was 1, we got the expected events mask */

    bool succeeded = retval == expectedRetval &&
            (expectedRetval == 0 || expectedEvents == ev.events);

    callNum++;
    printf(" %d: ", callNum);

    if (succeeded)
        printf("OK: ");
    else
        printf("FAIL: ");

    printf("ret = %d", retval);

    if (retval == 1) {
        printf(", events =");
        printMask(ev.events);
    }

    if (!succeeded) {
        printf("; EXPECTED: ret = %d", expectedRetval);
        if (expectedRetval == 1) {
            printf(", events =");
            printMask(expectedEvents);
        }
    }
    printf("\n");
}

int
main(int argc, char *argv[])
{
    int epfd;
    int pfd[2];

    epfd = epoll_create(1);
    if (epfd == -1)
        errExit("epoll_create");

    /* Create a pipe and add read end to epoll interest list */

    if (pipe(pfd) == -1)
        errExit("pipe");

    struct epoll_event ev;
    ev.data.fd = pfd[0];
    ev.events = EPOLLIN | EPOLLET;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        errExit("epoll_ctl");

    /* Run some tests */

    printf("Writing a byte to pipe()\n");
    write(pfd[1], "a", 1);

    doEpollWait(epfd, 0, 1, EPOLLIN);
    doEpollWait(epfd, 0, 0, 0);

    printf("Writing a byte to pipe()\n");
    write(pfd[1], "a", 1);

    doEpollWait(epfd, 0, 1, EPOLLIN);

    printf("Closing write end of pipe()\n");
    close(pfd[1]);

    doEpollWait(epfd, 0, 1, EPOLLIN | EPOLLHUP);

    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/