2022-10-06 22:24:03

by Jonathan Derrick

Subject: [PATCH 1/2] md/bitmap: Move unplug to daemon thread

It's been observed in raid1/raid10 configurations that synchronous I/O
workloads can result in greater than 40% of I/O being bitmap updates.
This appears to be because a synchronous workload requires a bitmap
flush with every flush of the I/O list. Instead, prefer to flush the
bitmap in the daemon sleeper thread for this configuration.

Signed-off-by: Jonathan Derrick <[email protected]>
---
 drivers/md/md-bitmap.c | 1 +
 drivers/md/raid1.c     | 2 --
 drivers/md/raid10.c    | 4 ----
 3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index bf6dffadbe6f..451259b38d25 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1244,6 +1244,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
 			+ mddev->bitmap_info.daemon_sleep))
 		goto done;
 
+	md_bitmap_unplug(bitmap);
 	bitmap->daemon_lastrun = jiffies;
 	if (bitmap->allclean) {
 		mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 05d8438cfec8..42ba2d884773 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -793,8 +793,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect

 static void flush_bio_list(struct r1conf *conf, struct bio *bio)
 {
-	/* flush any pending bitmap writes to disk before proceeding w/ I/O */
-	md_bitmap_unplug(conf->mddev->bitmap);
 	wake_up(&conf->wait_barrier);
 
 	while (bio) { /* submit pending writes */
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9117fcdee1be..e43352aae3c4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -881,9 +881,6 @@ static void flush_pending_writes(struct r10conf *conf)
 		__set_current_state(TASK_RUNNING);
 
 		blk_start_plug(&plug);
-		/* flush any pending bitmap writes to disk
-		 * before proceeding w/ I/O */
-		md_bitmap_unplug(conf->mddev->bitmap);
 		wake_up(&conf->wait_barrier);
 
 		while (bio) { /* submit pending writes */
@@ -1078,7 +1075,6 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)

 	/* we aren't scheduling, so we can do the write-out directly. */
 	bio = bio_list_get(&plug->pending);
-	md_bitmap_unplug(mddev->bitmap);
 	wake_up(&conf->wait_barrier);
 
 	while (bio) { /* submit pending writes */
--
2.31.1


2022-10-10 04:56:36

by Yujie Liu

Subject: [md/bitmap] 935dbb156b: mdadm-selftests.05r1-bitmapfile.fail

Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 935dbb156b7a46615c0c4819ded5f5ef14bf9b99 ("[PATCH 1/2] md/bitmap: Move unplug to daemon thread")
url: https://github.com/intel-lab-lkp/linux/commits/Jonathan-Derrick/Bitmap-percentage-flushing/20221007-061054
base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
patch link: https://lore.kernel.org/linux-raid/[email protected]

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20220826
with following parameters:

disk: 1HDD
test_prefix: 05

on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


2022-10-10 02:47:33 mkdir -p /var/tmp
2022-10-10 02:47:33 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sda1
2022-10-10 02:48:06 mount -t ext3 /dev/sda1 /var/tmp
sed -e 's/{DEFAULT_METADATA}/1.2/g' \
-e 's,{MAP_PATH},/run/mdadm/map,g' mdadm.8.in > mdadm.8
/usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8
/usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8
/usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4
/usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5
/usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules
/usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules
/usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules
/usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules
/usr/bin/install -D -m 755 mdadm /sbin/mdadm
/usr/bin/install -D -m 755 mdmon /sbin/mdmon
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1a... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1b... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1c... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-bitmapfile... FAILED - see /var/tmp/05r1-bitmapfile.log and /var/tmp/fail05r1-bitmapfile.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-failfast... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-grow-external... FAILED - see /var/tmp/05r1-grow-external.log and /var/tmp/fail05r1-grow-external.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-grow-internal... FAILED - see /var/tmp/05r1-grow-internal.log and /var/tmp/fail05r1-grow-internal.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-grow-internal-1... FAILED - see /var/tmp/05r1-grow-internal-1.log and /var/tmp/fail05r1-grow-internal-1.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-internalbitmap... FAILED - see /var/tmp/05r1-internalbitmap.log and /var/tmp/fail05r1-internalbitmap.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-internalbitmap-v1a... FAILED - see /var/tmp/05r1-internalbitmap-v1a.log and /var/tmp/fail05r1-internalbitmap-v1a.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-internalbitmap-v1b... FAILED - see /var/tmp/05r1-internalbitmap-v1b.log and /var/tmp/fail05r1-internalbitmap-v1b.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-internalbitmap-v1c... FAILED - see /var/tmp/05r1-internalbitmap-v1c.log and /var/tmp/fail05r1-internalbitmap-v1c.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-n3-bitmapfile... FAILED - see /var/tmp/05r1-n3-bitmapfile.log and /var/tmp/fail05r1-n3-bitmapfile.log for details
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-re-add... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-re-add-nosuper... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-remove-internalbitmap... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-remove-internalbitmap-v1a... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-remove-internalbitmap-v1b... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r1-remove-internalbitmap-v1c... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r5-bitmapfile... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r5-internalbitmap... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r6-bitmapfile... succeeded
Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
/lkp/benchmarks/mdadm-selftests/tests/05r6tor0...
ERROR: dmesg prints errors when testing 05r6tor0!

FAILED - see /var/tmp/05r6tor0.log and /var/tmp/fail05r6tor0.log for details


If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and the /lkp dir to run from a clean state.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (6.40 kB)
config-6.0.0-rc2-00101-g935dbb156b7a (170.83 kB)
job-script (5.54 kB)
dmesg.xz (25.24 kB)
mdadm-selftests (4.94 kB)
job.yaml (4.79 kB)
reproduce (100.00 B)

2022-10-17 23:03:29

by Jonathan Derrick

Subject: Re: [md/bitmap] 935dbb156b: mdadm-selftests.05r1-bitmapfile.fail

I think some of these failures are invalid; see below.

On 10/9/2022 10:32 PM, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-11):
>
> commit: 935dbb156b7a46615c0c4819ded5f5ef14bf9b99 ("[PATCH 1/2] md/bitmap: Move unplug to daemon thread")
> url: https://github.com/intel-lab-lkp/linux/commits/Jonathan-Derrick/Bitmap-percentage-flushing/20221007-061054
> base: git://git.kernel.org/cgit/linux/kernel/git/song/md.git md-next
> patch link: https://lore.kernel.org/linux-raid/[email protected]
>
> in testcase: mdadm-selftests
> version: mdadm-selftests-x86_64-5f41845-1_20220826
> with following parameters:
>
> disk: 1HDD
> test_prefix: 05
>
> on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> [...]
> Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
> /lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap... succeeded
> Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
> /lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1a... succeeded
> Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
> /lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1b... succeeded
> Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
> /lkp/benchmarks/mdadm-selftests/tests/05r1-add-internalbitmap-v1c... succeeded
> Testing on linux-6.0.0-rc2-00101-g935dbb156b7a kernel
> /lkp/benchmarks/mdadm-selftests/tests/05r1-bitmapfile... FAILED - see /var/tmp/05r1-bitmapfile.log and /var/tmp/fail05r1-bitmapfile.log for details

This one in particular is designed to fail if fewer than 400 bits are dirty after testdev,
or if any bits are still dirty after the subsequent sleep.
Running this on a vanilla kernel results in the failure below:
+++ mdadm -X /var/tmp/bitmap
+++ rm -f /var/tmp/stderr
+++ sed -n -e 's/.*Bitmap.* \([0-9]*\) dirty.*/\1/p'
+++ case $* in
+++ case $* in
+++ /home/nodlab/mdadm/mdadm --quiet -X /var/tmp/bitmap
+++ rv=0
+++ case $* in
+++ cat /var/tmp/stderr
+++ return 0
++ dirty2=0
++ '[' 0 -lt 400 -o 0 -ne 0 ']'
++ echo 'ERROR bad '\''dirty'\'' counts: 0 and 0'
ERROR bad 'dirty' counts: 0 and 0


Where the exit check is:
mdadm --assemble $md0 --bitmap=$bmf $dev1 $dev2
testdev $md0 1 $mdsize1a 64
dirty1=`mdadm -X $bmf | sed -n -e 's/.*Bitmap.* \([0-9]*\) dirty.*/\1/p'`
sleep 4
dirty2=`mdadm -X $bmf | sed -n -e 's/.*Bitmap.* \([0-9]*\) dirty.*/\1/p'`

if [ $dirty1 -lt 400 -o $dirty2 -ne 0 ]
then echo >&2 "ERROR bad 'dirty' counts: $dirty1 and $dirty2"
exit 1
fi


It seems that if testdev() completes quickly enough, $dirty1 can already be less than 400,
so the check can fail even on an unpatched kernel.
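The race can be sketched offline with the same sed extraction the test uses. The sample line below is a hypothetical `mdadm -X` output, not taken from the failing run; it stands in for the case where the bitmap has already been flushed before the first read:

```shell
# Hypothetical `mdadm -X` output line; real counts vary per array.
sample='        Bitmap : 41 bits (chunks), 0 dirty (0.0%)'

# Same extraction the test performs on the real bitmap file.
dirty1=$(printf '%s\n' "$sample" | sed -n -e 's/.*Bitmap.* \([0-9]*\) dirty.*/\1/p')

# First half of the test's exit check: if the bitmap was flushed before
# this read, dirty1 is already 0 and the test fails even though nothing
# went wrong with the array.
if [ "$dirty1" -lt 400 ]; then
	echo "ERROR bad 'dirty' counts: $dirty1"
fi
```

Note this only models the timing window: the extracted dirty count is 0 here, which trips the `-lt 400` branch exactly as in the log above.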


> [...]