Date: Wed, 30 Sep 2020 02:53:09 +0200
From: David Madore
To: Linux Kernel mailing-list
Subject: RAID5->RAID6 reshape remains stuck at 0% (does nothing, doesn't even start)
Message-ID: <20200930005309.cl5ankdzfe6pxkgq@achernar.gro-tsen.net>

Dear list,

I'm trying to reshape a 3-disk RAID5 array into a 4-disk RAID6 array
(of the same total size and per-device size) using Linux kernel
4.9.237 on x86_64.  I understand that this reshaping operation is
supposed to be supported, but it appears perpetually stuck at 0% with
no operation taking place whatsoever (the slices are unchanged apart
from their metadata, the backup file contains only zeroes, and nothing
happens).  I wonder whether this is a known kernel bug, or what else
could explain it, and I have no idea how to debug this sort of thing.

Here are some details on exactly what I've been doing.  I'll be using
loopback devices to illustrate, but I've done this on real partitions
and there was no difference.

## Create some empty loop devices:
for i in 0 1 2 3 ; do dd if=/dev/zero of=test-${i} bs=1024k count=16 ; done
for i in 0 1 2 3 ; do losetup /dev/loop${i} test-${i} ; done

## Make a RAID array out of the first three:
mdadm --create /dev/md/test --level=raid5 --chunk=256 --name=test \
    --metadata=1.0 --raid-devices=3 /dev/loop{0,1,2}

## Populate it with some content, just to see what's going on:
for i in $(seq 0 63) ; do printf "This is chunk %d (0x%x).\n" $i $i \
    | dd of=/dev/md/test bs=256k seek=$i ; done

## Now try to reshape the array from 3-way RAID5 to 4-way RAID6:
mdadm --manage /dev/md/test --add-spare /dev/loop3
mdadm --grow /dev/md/test --level=6 --raid-devices=4 \
    --backup-file=test-reshape.backup

...and then nothing happens.  /proc/mdstat reports no progress
whatsoever:

md112 : active raid6 loop3[4] loop2[3] loop1[1] loop0[0]
      32256 blocks super 1.0 level 6, 256k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.0% (1/16128) finish=1.0min speed=244K/sec
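(Aside, not part of the transcript above: the array's md sysfs
attributes can also be read to see where the reshape thinks it is,
along these lines.)

## Inspect the reshape-related sysfs attributes of md112:
cat /sys/block/md112/md/sync_action
cat /sys/block/md112/md/sync_completed
cat /sys/block/md112/md/reshape_position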
The loop file contents are unchanged except for the metadata
superblock, the backup file is entirely empty, and no activity
whatsoever is taking place.

Actually, further investigation shows that the array is in fact
operational as a RAID6 array, but one where the Q-syndrome is stuck on
the last device: writing data to the md device (e.g., by repopulating
it with the same command as above) does cause loop3 to be updated as
expected for such a layout.  It's just the reshaping which doesn't
take place (or indeed begin).
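(A rough way to confirm that nothing is being copied; this is an
illustrative sketch rather than part of my transcript, and it assumes
copies of the loop files were saved as test-${i}.orig before running
--grow.)

## Compare each loop file against its pre-reshape copy (test-${i}.orig is
## a hypothetical snapshot taken before --grow); only the superblock area
## should differ:
for i in 0 1 2 3 ; do cmp -l test-${i} test-${i}.orig | head ; done
## Check that the backup file contains nothing but zero bytes:
cmp -n "$(stat -c %s test-reshape.backup)" test-reshape.backup /dev/zero \
    && echo "backup file is all zeroes"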
For completeness, here's what mdadm --detail /dev/md/test looks like
before the reshape, in my example:

/dev/md/test:
        Version : 1.0
  Creation Time : Wed Sep 30 02:42:30 2020
     Raid Level : raid5
     Array Size : 32256 (31.50 MiB 33.03 MB)
  Used Dev Size : 16128 (15.75 MiB 16.52 MB)
   Raid Devices : 3
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Sep 30 02:44:21 2020
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

           Name : vega.stars:test  (local to host vega.stars)
           UUID : 30f40e34:b9a52ff0:75c8b063:77234832
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync   /dev/loop0
       1       7        1        1      active sync   /dev/loop1
       3       7        2        2      active sync   /dev/loop2

       4       7        3        -      spare   /dev/loop3

- and here's what it looks like after the attempted reshape has
started (or rather, refused to start):

/dev/md/test:
        Version : 1.0
  Creation Time : Wed Sep 30 02:42:30 2020
     Raid Level : raid6
     Array Size : 32256 (31.50 MiB 33.03 MB)
  Used Dev Size : 16128 (15.75 MiB 16.52 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Sep 30 02:44:54 2020
          State : clean, degraded, reshaping
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric-6
     Chunk Size : 256K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           Name : vega.stars:test  (local to host vega.stars)
           UUID : 30f40e34:b9a52ff0:75c8b063:77234832
         Events : 22

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync   /dev/loop0
       1       7        1        1      active sync   /dev/loop1
       3       7        2        2      active sync   /dev/loop2
       4       7        3        3      spare rebuilding   /dev/loop3

I also tried writing "frozen" and then "resync" to
/sys/block/md112/md/sync_action, with no further results.

I welcome any suggestions on how to investigate, work around, or fix
this problem.

Happy hacking,

-- 
David A. Madore ( http://www.madore.org/~david/ )