From: Logan Gunthorpe
To: Thomas Gleixner, Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, "Paul E. McKenney", Joel Fernandes, Steven Rostedt, Linus Torvalds, Kurt Schwemmer, Bjorn Helgaas, linux-pci@vger.kernel.org
Subject: Re: [PATCH 3/9] pci/switchtec: Don't abuse completion wait queue for poll
Date: Mon, 16 Mar 2020 15:53:47 -0600
Message-ID: <39f2bd27-1a4a-f7ad-5d54-7fe133390cd0@deltatee.com>
In-Reply-To: <87v9n4ccvp.fsf@nanos.tec.linutronix.de>
References: <20200313174701.148376-1-bigeasy@linutronix.de> <20200313174701.148376-4-bigeasy@linutronix.de> <4d3a997d-ced4-3dbe-d766-0b1e9fc35b29@deltatee.com> <87v9n4ccvp.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020-03-16 1:34 p.m., Thomas Gleixner wrote:
> Logan Gunthorpe writes:
>> On 2020-03-13 11:46 a.m., Sebastian Andrzej Siewior wrote:
>>> 1) It cannot work with EPOLLEXCLUSIVE
>>
>> Why? You don't explain this.
>
> man epoll_ctl(2)
>
> EPOLLEXCLUSIVE (since Linux 4.5)
>
> Sets an exclusive wakeup mode for the epoll file descriptor that is
> being attached to the target file descriptor, fd. When a wakeup event
> occurs and multiple epoll file descriptors are attached to the same
> target file using EPOLLEXCLUSIVE, one or more of the epoll file
> descriptors will receive an event with epoll_wait(2).
>
> As this uses complete_all() there is no distinction possible, because
> complete_all() wakes up everything.
>
>> And I don't see how this patch would change anything to do with the
>> call to poll_wait(). All you've done is open-code the completion.
>
> wake_up_interruptible(x) resolves to:
>
> __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
>
> which wakes exactly 1 exclusive waiter.
>
> Also the other way round is just working because the waker side uses
> complete_all(). Why? Because completion internally defaults to exclusive
> mode and complete() wakes exactly one exclusive waiter.
>
> There is a conceptual difference and while it works for that particular
> purpose to some extent it's not suitable as a general wait notification
> construct.

Ok, I now understand this point. That's exceedingly subtle.

I certainly would not agree that this qualifies as "seriously broken",
and I'm not even sure I'd agree that this actually violates the
semantics of poll(), seeing as the man page clearly states that with
EPOLLEXCLUSIVE set, "one or more" pollers will be woken up. So waking up
all of them is still allowed. Ensuring fewer pollers wake up is just an
optimization to avoid the thundering herd problem, which users of this
interface are very unlikely to ever have (I can confidently tell you
that none have this problem now).

If we do want to say that all poll_wait() users *must* respect
EPOLLEXCLUSIVE, we should at least have some documentation saying that
combining poll_wait() with wake_up_all() (or similar) is not allowed.
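To make the wake-up counts in the quoted explanation concrete, here is a minimal userspace sketch in C. It is a toy model and not kernel code: struct toy_waiter and toy_wake_up() are hypothetical names invented for illustration. The loop only imitates the stop condition described in the quoted text, where a wake-up with an exclusive count of 1 stops after the first exclusive waiter. A count of 0 is assumed here to stand in for the wake-everything case such as complete_all() or wake_up_all().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical type, not the kernel's wait queue entry. */
struct toy_waiter {
	bool exclusive;	/* models a waiter queued in exclusive mode */
	bool woken;	/* set once the waiter has been woken */
};

/*
 * Walk the queue in order and wake each waiter. The walk stops once
 * nr_exclusive exclusive waiters have been woken. With nr_exclusive == 0
 * the counter never reaches zero, so every waiter is woken.
 * Returns the number of waiters woken.
 */
static int toy_wake_up(struct toy_waiter *q, size_t n, int nr_exclusive)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && !--nr_exclusive)
			break;
	}
	return woken;
}
```

With three exclusive waiters queued, toy_wake_up(q, 3, 1) marks only q[0] as woken, while toy_wake_up(q, 3, 0) marks all three, which is the "wakes up everything" case from the quoted text.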
A *very* quick check finds at least a few drivers that pair poll_wait()
with a wake-all call:

drivers/char/ipmi/ipmb_dev_int.c
drivers/dma-buf/sync_file.c
drivers/gpu/vga/vgaarb.c

(That's just looking at the drivers tree, up to "G".)

Finally, since we seem to be back to more reasonable discussion, I will
make this point: it's fairly common for wait queue users to directly use
the spinlock from within wait_queue_head_t without an interface (even
completion.c does it). How are developers supposed to know when an
interface is required and when it's not? Sometimes using
"implementation" details interface-free is standard practice, but other
times it's "yuck" and will elicit ire from other developers? Is it valid
to use completion.wait.lock? Where's the line?

Logan