From: Willem de Bruijn
Date: Thu, 10 Nov 2022 11:13:38 +0100
Subject: Re: [PATCHSET v3 0/5] Add support for epoll min_wait
To: Jens Axboe
Cc: Stefan Hajnoczi, linux-kernel@vger.kernel.org, netdev@vger.kernel.org

On Tue, Nov 8, 2022 at 3:09 PM Jens Axboe wrote:
>
> On 11/8/22 7:00 AM, Stefan Hajnoczi wrote:
> > On Mon, Nov 07, 2022 at 02:38:52PM -0700, Jens Axboe wrote:
> >> On 11/7/22 1:56 PM, Stefan Hajnoczi wrote:
> >>> Hi Jens,
> >>> NICs and storage controllers have interrupt mitigation/coalescing
> >>> mechanisms that are similar.
> >>
> >> Yep
> >>
> >>> NVMe has an Aggregation Time (timeout) and an Aggregation Threshold
> >>> (counter) value. When a completion occurs, the device waits until the
> >>> timeout or until the completion counter value is reached.
> >>>
> >>> If I've read the code correctly, min_wait is computed at the beginning
> >>> of epoll_wait(2). NVMe's Aggregation Time is computed from the first
> >>> completion.
> >>>
> >>> It makes me wonder which approach is more useful for applications. With
> >>> the Aggregation Time approach applications can control how much extra
> >>> latency is added. What do you think about that approach?
> >>
> >> We only tested the current approach, which is time noted from entry, not
> >> from when the first event arrives. I suspect the nvme approach is better
> >> suited to the hw side, the epoll timeout helps ensure that we batch
> >> within xx usec rather than xx usec + whatever the delay until the first
> >> one arrives. Which is why it's handled that way currently. That gives
> >> you a fixed batch latency.
> >
> > min_wait is fine when the goal is just maximizing throughput without any
> > latency targets.
>
> That's not true at all, I think you're in different time scales than
> this would be used for.
>
> > The min_wait approach makes it hard to set a useful upper bound on
> > latency because unlucky requests that complete early experience much
> > more latency than requests that complete later.
>
> As mentioned in the cover letter or the main patch, this is most useful
> for the medium load kind of scenarios. For high load, the min_wait time
> ends up not mattering because you will hit maxevents first anyway. For
> the testing that we did, the target was 2-300 usec, and 200 usec was
> used for the actual test. Depending on what the kind of traffic the
> server is serving, that's usually not much of a concern. From your
> reply, I'm guessing you're thinking of much higher min_wait numbers. I
> don't think those would make sense. If your rate of arrival is low
> enough that min_wait needs to be high to make a difference, then the
> load is low enough anyway that it doesn't matter. Hence I'd argue that
> it is indeed NOT hard to set a useful upper bound on latency, because
> that is very much what min_wait is.
>
> I'm happy to argue merits of one approach over another, but keep in mind
> that this particular approach was not pulled out of thin air AND it has
> actually been tested and verified successfully on a production workload.
> This isn't a hypothetical benchmark kind of setup.

Following up on the interrupt mitigation analogy.

This also reminds me somewhat of SO_RCVLOWAT, which sets a lower bound
on the amount of received data before a thread is woken up.

Would it be more useful to define a minevents event count, rather than
a min_wait timeout? That might give the same preferred batch size
without adding latency when it is unnecessary, and without having to
infer a reasonable bound from the expected event rate. It would still
be bounded by the max timeout.
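
To make the comparison concrete, here is a rough userspace model of
the three wake-up policies discussed in this thread. It is only a
sketch under my own assumptions, not the kernel implementation: the
function names are hypothetical, times are in usec relative to
epoll_wait() entry, arrivals are sorted, min_wait <= timeout, and
once min_wait expires with nothing ready the call is assumed to
behave like a plain epoll_wait() with the remaining timeout.

```python
from bisect import bisect_right


def _nth(arrivals, n):
    """Arrival time of the n-th event (1-based), or None if fewer than n arrive."""
    return arrivals[n - 1] if 0 < n <= len(arrivals) else None


def _result(arrivals, wake, maxevents):
    """Return (wake time, number of events delivered, capped at maxevents)."""
    ready = bisect_right(arrivals, wake)
    return wake, min(ready, maxevents)


def wake_min_wait(arrivals, min_wait, timeout, maxevents):
    """min_wait counted from entry, cut short only when maxevents are ready."""
    full = _nth(arrivals, maxevents)
    if full is not None and full <= min_wait:
        wake = full                 # batch filled before min_wait expired
    elif arrivals and arrivals[0] <= min_wait:
        wake = min_wait             # something is ready once min_wait expires
    elif arrivals and arrivals[0] <= timeout:
        wake = arrivals[0]          # assumed fallback: wake on the first event
    else:
        wake = timeout
    return _result(arrivals, min(wake, timeout), maxevents)


def wake_aggregation_time(arrivals, agg_time, timeout, maxevents):
    """NVMe-style: the aggregation timer starts at the first event."""
    if not arrivals or arrivals[0] > timeout:
        return _result(arrivals, timeout, maxevents)
    wake = arrivals[0] + agg_time
    full = _nth(arrivals, maxevents)
    if full is not None:
        wake = min(wake, full)      # counter threshold reached first
    return _result(arrivals, min(wake, timeout), maxevents)


def wake_minevents(arrivals, minevents, timeout, maxevents):
    """Event-count threshold, bounded only by the max timeout."""
    target = _nth(arrivals, min(minevents, maxevents))
    wake = timeout if target is None else min(target, timeout)
    return _result(arrivals, wake, maxevents)


if __name__ == "__main__":
    medium = [10, 50, 120, 400]     # made-up arrival times, in usec
    print(wake_min_wait(medium, 200, 1000, 8))
    print(wake_aggregation_time(medium, 200, 1000, 8))
    print(wake_minevents(medium, 3, 1000, 8))
```

With these made-up arrivals, all three policies deliver three
events: min_wait wakes at 200 usec, the aggregation timer at 210
usec, and minevents=3 at 120 usec. With a single arrival at 600
usec the picture flips: min_wait wakes at 600, the aggregation timer
at 800, and minevents=3 only at the 1000 usec max timeout, which is
the cost of a count-based threshold when the event rate is lower
than expected.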