Subject: Re: stalling IO regression since linux 5.12, through 5.18
From: Holger Hoffstätte
Organization: Applied Asynchrony, Inc.
To: Chris Murphy, Nikolay Borisov, Jens Axboe, Jan Kara, Paolo Valente
Cc: Linux-RAID, linux-block, linux-kernel, Josef Bacik, linux-block
Date: Wed, 17 Aug 2022 11:52:54 +0200
Message-ID: <7c830487-95a6-b008-920b-8bc4a318f10a@applied-asynchrony.com>
In-Reply-To: <2b8a38fa-f15f-45e8-8caa-61c5f8cd52de@www.fastmail.com>
References: <2220d403-e443-4e60-b7c3-d149e402c13e@www.fastmail.com>
 <61e5ccda-a527-4fea-9850-91095ffa91c4@www.fastmail.com>
 <4995baed-c561-421d-ba3e-3a75d6a738a3@www.fastmail.com>
 <2b8a38fa-f15f-45e8-8caa-61c5f8cd52de@www.fastmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022-08-16 17:34, Chris Murphy wrote:
>
> On Tue, Aug 16, 2022, at 11:25 AM, Nikolay Borisov wrote:
>> How about changing the scheduler either mq-deadline or noop, just
>> to see if this is also reproducible with a different scheduler. I
>> guess noop would imply the blk cgroup controller is going to be
>> disabled
>
> I already reported on that: always happens with bfq within an hour or
> less. Doesn't happen with mq-deadline for ~25+ hours. Does happen
> with bfq with the above patches removed. Does happen with
> cgroup.disabled=io set.
>
> Sounds to me like it's something bfq depends on and is somehow
> becoming perturbed in a way that mq-deadline does not, and has
> changed between 5.11 and 5.12. I have no idea what's under bfq that
> matches this description.

Chris,

just a shot in the dark, but can you try the patch from

https://lore.kernel.org/linux-block/20220803121504.212071-1-yukuai1@huaweicloud.com/

on top of something more recent than 5.12? Ideally 5.19, where it applies
cleanly.

No guarantees, I just remembered this patch and your problem sounds like
a lost wakeup. Maybe BFQ just drives the sbitmap in a way that triggers
the symptom.

-h
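
For anyone repeating the bfq vs. mq-deadline comparison discussed above, it helps to confirm which scheduler a device is actually using before each run. The following is only a minimal sketch and not something from the thread: the function names are made up for illustration, and the device name you pass in (e.g. "sda") is an assumption about your setup. It relies on the sysfs file /sys/block/<dev>/queue/scheduler, which lists the available schedulers and marks the active one with square brackets.

```python
from pathlib import Path


def parse_scheduler_line(line: str) -> tuple[str, list[str]]:
    """Return (active, available) from a sysfs scheduler line.

    The kernel prints the available schedulers separated by spaces and
    wraps the active one in square brackets, for example:
        "mq-deadline kyber [bfq] none"
    """
    available = []
    active = None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets around the active entry
            active = token
        available.append(token)
    if active is None:
        raise ValueError(f"no active scheduler marked in {line!r}")
    return active, available


def active_scheduler(device: str) -> str:
    """Read the active scheduler for a block device (hypothetical helper).

    `device` is a name under /sys/block, such as "sda"; this is an
    assumption about the local system, not something from the thread.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    active, _ = parse_scheduler_line(path.read_text())
    return active
```

Switching schedulers for a test run is done by writing the desired name (for example mq-deadline) to that same sysfs file as root; the sketch above only reads it.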