Date: Fri, 18 Feb 2022 10:34:46 -0500
From: Josef Bacik
To: Thorsten Leemhuis
Cc: Roman Gushchin, Valentin Schneider, peterz@infradead.org,
 vincent.guittot@linaro.org, torvalds@linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org, clm@fb.com
Subject: Re: [REGRESSION] 5-10% increase in IO latencies with nohz balance patch
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 18, 2022 at 12:00:41PM +0100, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking. Top-posting
> for once, to make this easily accessible to everyone.
>
> FWIW, this is a gentle reminder that I'm still tracking this regression.
> Afaics nothing happened in the last few weeks.
>
> If the discussion continued somewhere else, please let me know; you can
> do this directly or simply tell my regression tracking bot yourself by
> sending a reply to this mail with a paragraph containing a regzbot
> command like "#regzbot monitor
> https://lore.kernel.org/r/some_msgi@example.com/"
>
> If you think there are valid reasons to drop this regression from the
> tracking, let me know; you can do this directly or simply tell my
> regression tracking bot yourself by sending a reply to this mail with a
> paragraph containing a regzbot command like "#regzbot invalid: Some
> explanation" (without the quotes).
>
> Anyway: I'm putting it on back burner now to reduce the noise, as this
> afaics is less important than other regressions:
>
> #regzbot backburner: Culprit is hard to track down
> #regzbot poke
>
> You likely get two more mails like this after the next two merge
> windows, then I'll drop it if I don't hear anything back.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.

Roman and I sat down to mess with this some more and had some weird
observations. On our Facebook internal boxes we couldn't reproduce it.
If we disable all the normal FB-specific stuff so the box is "quiet",
5.16 performs better. However, these are all single-socket machines with
stupidly high core counts, whereas my local machine is a 2-socket,
6-core box. On my box, testing was actually pretty noisy even in
isolation.

In the end I rigged up fsperf to do 1000 runs and graph each kernel's
results on top of each other. What came out was really strange:

1. The "good" kernel had a period during the first ~100 runs where
   latencies were very low (the p50 was ~9000ns), but after those first
   100 runs it jumped up and sat right on top of 5.16. This explains why
   it shows up in my overnight tests: the box literally reboots and runs
   the tests. So there's a "warmup" period for the scheduler; once it's
   been hammered on enough, it matches 5.16 exactly, otherwise it's
   faster at the beginning.

2. The regression essentially disappears when looking at the graphs over
   1000 runs. The results are so jittery that this was the only way we
   could honestly look at them and see anything. The only place the
   "regression" shows up is in the write completion latency p99. There,
   5.15 ranges between 75000 and 85000 ns, whereas 5.16 ranges between
   80000 and 100000 ns. However, again, this is only on my machine, and
   the p50 latencies and the actual bw_bytes are the same.

Given that this test is relatively bursty anyway, that we can't
reproduce it internally, and that 5.16 actually consistently performs
better internally, we've decided to drop this. It's simply too noisy to
get a handle on to actually call it a problem.

#regzbot invalid: test too noisy and the results aren't clear cut.

Thanks,

Josef
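The two checks described in points 1 and 2 (a warmup window in the first ~100 runs, and overlapping per-run p99 ranges between kernels) can be sketched as a small script. This is a minimal sketch, not fsperf's own analysis: it assumes the per-run p50 and p99 latencies (in ns) have already been extracted from the benchmark output into lists, one value per run in run order. The helper names `warmup_split`, `value_range` and `ranges_overlap` are hypothetical.

```python
import statistics
from typing import Sequence, Tuple

# Hypothetical helpers: they assume per-run latency percentiles (in ns) were
# already extracted from the benchmark output, one value per run, in run
# order. They do not parse fsperf output themselves.


def warmup_split(per_run_p50_ns: Sequence[float],
                 warmup_runs: int = 100) -> Tuple[float, float]:
    """Median of the per-run p50 values before and after the warmup cutoff."""
    if not 0 < warmup_runs < len(per_run_p50_ns):
        raise ValueError("warmup_runs must leave runs on both sides of the cutoff")
    early = per_run_p50_ns[:warmup_runs]
    late = per_run_p50_ns[warmup_runs:]
    return statistics.median(early), statistics.median(late)


def value_range(per_run_values_ns: Sequence[float]) -> Tuple[float, float]:
    """(min, max) of a per-run metric, e.g. write completion latency p99."""
    return min(per_run_values_ns), max(per_run_values_ns)


def ranges_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two closed (lo, hi) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]
```

For the p99 bands quoted in point 2, `ranges_overlap((75_000, 85_000), (80_000, 100_000))` returns `True`: the two kernels share the 80000-85000 ns region. Comparing whole per-run ranges like this, rather than single runs, reflects the point above that individual runs are too jittery to compare directly.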