Subject: Re: linux 4.19.19: md0_raid:1317 blocked for more than 120 seconds.
To: Wolfgang Walter
Cc: Jens Axboe, NeilBrown, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
References: <2131016.q2kFhguZXe@stwm.de> <3057098.nBgIypvgED@stwm.de> <0c832f67-de10-8872-d3db-6a9f11c97454@suse.com> <3877135.KJXZSZYZ1L@stwm.de>
From: Guoqing Jiang
Message-ID: <4703e846-d01b-ee2f-3306-958845a01d05@suse.com>
Date: Fri, 15 Feb 2019 15:57:02 +0800
In-Reply-To: <3877135.KJXZSZYZ1L@stwm.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/14/19 11:27 PM, Wolfgang Walter wrote:
> On Thursday, 14 February 2019, 10:09:56, Guoqing Jiang wrote:
>> On 2/12/19 7:20 PM, Wolfgang Walter wrote:
>>> On Tuesday, 12 February 2019, 16:20:11, Guoqing Jiang wrote:
>>>> On 2/11/19 11:12 PM, Wolfgang Walter wrote:
>>>>> With 4.19.19 we sometimes see the following issue (practically only with
>>>>> blk_mq, though):
>>>>>
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060165] INFO: task md0_raid1:317 blocked for more than 120 seconds.
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060188] Not tainted 4.19.19-debian64.all+1.1 #1
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060207] md0_raid1 D 0 317 2 0x80000000
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060211] Call Trace:
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060222] ? __schedule+0x2a2/0x8c0
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060226] ? _raw_spin_unlock_irqrestore+0x20/0x40
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060229] schedule+0x32/0x90
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060241] md_super_wait+0x69/0xa0 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060247] ? finish_wait+0x80/0x80
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060255] md_bitmap_wait_writes+0x8e/0xa0 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060263] ? md_bitmap_get_counter+0x42/0xd0 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060271] md_bitmap_daemon_work+0x1e8/0x380 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060278] ? md_rdev_init+0xb0/0xb0 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060285] md_check_recovery+0x26/0x540 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060290] raid1d+0x5c/0xf00 [raid1]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060294] ? preempt_count_add+0x79/0xb0
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060298] ? lock_timer_base+0x67/0x80
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060302] ? _raw_spin_unlock_irqrestore+0x20/0x40
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060304] ? try_to_del_timer_sync+0x4d/0x80
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060306] ? del_timer_sync+0x35/0x40
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060309] ? schedule_timeout+0x17a/0x3b0
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060312] ? preempt_count_add+0x79/0xb0
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060315] ? _raw_spin_lock_irqsave+0x25/0x50
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060321] ? md_rdev_init+0xb0/0xb0 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060327] ? md_thread+0xf9/0x160 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060330] ? r1bio_pool_alloc+0x20/0x20 [raid1]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060336] md_thread+0xf9/0x160 [md_mod]
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060340] ? finish_wait+0x80/0x80
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060344] kthread+0x112/0x130
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060346] ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> Feb 4 20:04:46 tettnang kernel: [252300.060350] ret_from_fork+0x35/0x40
>>>>>
>>>>> I saw that there was a similar problem with raid10 and an upstream patch
>>>>>
>>>>> e820d55cb99dd93ac2dc949cf486bb187e5cd70d
>>>>> md: fix raid10 hang issue caused by barrier
>>>>> by Guoqing Jiang
>>>>>
>>>>> I wonder if there is a similar fix needed for raid1?
>>>> Seems not. The call trace indicates that the previous superblock write I/O
>>>> did not finish as expected. There is a report for raid5 with a similar
>>>> md_super_wait problem in the link [1]. Maybe you can disable blk-mq to
>>>> narrow down the issue as well.
>>> I already did for 4 weeks. I didn't see this with blk-mq disabled (for
>>> scsi and md), though this may be by luck.
>> Then I guess it may be related to blk-mq. Which scheduler are you using
>> with blk-mq?
>> Maybe you can switch it to see whether it is caused by a specific
>> scheduler or not.
> mq-deadline for SCSI and none for md and dm.

Can you try the patch [1], in case the blocking was caused by flush?

[1]: https://patchwork.kernel.org/patch/10787903/

Thanks,
Guoqing
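
For the scheduler question discussed above, here is a minimal sketch of how the active blk-mq scheduler of a device could be checked before and after switching. It assumes the usual sysfs layout, /sys/block/<dev>/queue/scheduler, in which the kernel puts the active scheduler in square brackets. The function names parse_scheduler and read_scheduler and the device name "sda" are hypothetical and chosen only for illustration; they are not taken from the report.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split the contents of a queue/scheduler file into (active, available).

    The active scheduler is the bracketed entry,
    e.g. "[mq-deadline] kyber none".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active entry
            active = token
        available.append(token)
    return active, available


def read_scheduler(dev, sysfs_root="/sys/block"):
    """Return (active, available) for a block device name such as "sda".

    "sda" is only a placeholder; pass whichever device is under test.
    """
    path = Path(sysfs_root) / dev / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

Calling read_scheduler("sda") on the affected machine would report which scheduler is in use. Switching is done by writing one of the listed names to the same file as root, so the check can be repeated after each change.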