Date: Fri, 12 Aug 2022 12:05:12 -0400
From: "Chris Murphy"
To: "Josef Bacik", paolo.valente@linaro.org
Cc: "Btrfs BTRFS", Linux-RAID, linux-block@vger.kernel.org, linux-kernel
Subject: Re: stalling IO regression since linux 5.12, through 5.18
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 10, 2022, at 3:34 PM, Chris Murphy wrote:
> Booted with cgroup_disable=io, and confirmed cat
> /sys/fs/cgroup/cgroup.controllers does not list io.

The problem still reproduces with the cgroup IO controller disabled.

On a whim, I decided to switch the IO scheduler from bfq, Fedora's default for rotating drives, to mq-deadline. The problem did not reproduce for 15+ hours, which is not 100% conclusive but probably 99% conclusive. I then switched back to bfq on all eight drives, live, while the workload was running, and within 10 minutes the system cratered: all new commands just hang. Load average goes to triple digits, IO wait keeps increasing, IO pressure for the workload tasks goes to 100%, and IO completely stalls to zero. I was able to switch only two of the drive queues back to mq-deadline before I lost responsiveness in that shell and had to issue sysrq+b...

Before that I was able to extract sysrq+w and sysrq+t.
https://drive.google.com/file/d/16hdQjyBnuzzQIhiQT6fQdE0nkRQJj7EI/view?usp=sharing

I can't tell if this is a bfq bug, or if there's some negative interaction between bfq and scsi or megaraid_sas. Obviously it's rare, because otherwise people would have been falling over this much sooner. But at this point there's a strong correlation with bfq, and it looks like a kernel regression that has been present from 5.12.0 through 5.18.0. I suspect 5.19.0 is affected as well, but that the problem is partly masked there by other improvements.

-- 
Chris Murphy
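
For anyone who wants to repeat the live scheduler switch described above, here is a minimal sketch of how it could be scripted. It is not taken from the original report. It assumes the usual sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets, and /sys/block/<dev>/queue/rotational reads 1 for rotating drives. The helper names (parse_scheduler, rotational_devices, set_scheduler) are hypothetical, and writing to the real sysfs files requires root.

```python
from pathlib import Path

# Assumed default location of the block-device sysfs tree.
SYSFS_BLOCK = Path("/sys/block")


def parse_scheduler(text):
    """Parse a queue/scheduler line such as 'mq-deadline kyber [bfq] none'.

    Returns (active, available); active is None if nothing is bracketed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def rotational_devices(root=SYSFS_BLOCK):
    """Return the sorted names of devices whose queue/rotational flag is 1."""
    names = []
    for dev in sorted(Path(root).iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file() and flag.read_text().strip() == "1":
            names.append(dev.name)
    return names


def set_scheduler(dev, name, root=SYSFS_BLOCK):
    """Switch one device to scheduler `name`; return the previous scheduler.

    Raises ValueError if the kernel does not offer `name` for this device.
    """
    path = Path(root) / dev / "queue" / "scheduler"
    active, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{dev}: scheduler {name!r} not in {available}")
    if active != name:
        # On real sysfs, writing the name selects that scheduler.
        path.write_text(name)
    return active
```

As a usage sketch, looping over rotational_devices() and calling set_scheduler(dev, "mq-deadline") for each name would move every rotating drive off bfq; the root parameter exists only so the helpers can be pointed at a test directory instead of the real /sys/block.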