Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Subject: Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
To:     Logan Gunthorpe <logang@deltatee.com>, Xiao Ni <xni@redhat.com>
Cc:     open list <linux-kernel@vger.kernel.org>,
        linux-raid <linux-raid@vger.kernel.org>,
        Song Liu <song@kernel.org>,
        Christoph Hellwig <hch@infradead.org>,
        Stephen Bates <sbates@raithlin.com>,
        Martin Oliveira <Martin.Oliveira@eideticom.com>,
        David Sloan <David.Sloan@eideticom.com>
References: <20220420195425.34911-1-logang@deltatee.com>
 <CALTww28fwNpm0O_jc7-2Xr0JSX9i6F1kgoUQ8m_k6ZgPa1XxXw@mail.gmail.com>
 <c14c0103-9cbd-7d0f-486b-344dd33725ab@deltatee.com>
 <4094aed9-d22d-d14f-07a7-5abe599beeab@linux.dev>
 <8d8fbf24-51b5-a076-b7ad-fcbb7d5c275e@deltatee.com>
 <CALTww28SuvhzCL6p4L9y9ZH5Mmgss-tTm_QzbEo60hZOXAUS0A@mail.gmail.com>
 <4f0b44aa-77a4-9896-b780-eb52241954ae@deltatee.com>
From:   Guoqing Jiang <guoqing.jiang@linux.dev>
Message-ID: <cba5f13e-0481-9dc9-36a4-ed29bf34220f@linux.dev>
Date:   Fri, 29 Apr 2022 08:49:03 +0800
MIME-Version: 1.0
In-Reply-To: <4f0b44aa-77a4-9896-b780-eb52241954ae@deltatee.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Precedence: bulk


On 4/29/22 5:22 AM, Logan Gunthorpe wrote:
>
> On 2022-04-25 10:12, Xiao Ni wrote:
>>> I do know that lkp-tests has run it on this series as I did get an error
>>> from it. But while I'm pretty sure that error has been resolved, I was
>>> never able to figure out how to run them locally.
>>>
>> Hi Logan
>>
>> You can clone the mdadm repo at
>> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
>> Then you can find there is a script test under the directory. It's not
>> under the tests directory.
>> The test cases are under tests directory.
> So I've been fighting with this and it seems there are just a ton of
> failures in these tests without my changes. Running on the latest master
> (52c67fcdd6dad) with stock v5.17.5 I see major brokenness. About 17 out
> of 44 tests that run failed. I had to run with --disable-integrity
> because those tests seem to hang on an infinite loop waiting for the md
> array to go into the U state (even though it appears idle).
>
> Even though I ran the tests with '--keep-going', the testing stopped
> after the 07revert-grow reported errors in dmesg -- even though the only
> errors printed to dmesg were that of mdadm segfaulting.
>
> Running on md/md-next seems to get a bit further (to
> 10ddf-create-fail-rebuild) and stops with the same segfaulting issue (or
> perhaps the 07 test only randomly fails first -- I haven't run it that
> many times). Though most of the tests between these points fail anyway.
>
> My upcoming v3 patches cause no failures that are different from the
> md/md-next branch. But it seems these tests have rotted to the point
> that they aren't all that useful; or maybe there are a ton of
> regressions in the kernel already and nobody was paying much attention.

I can't agree with you anymore. I would say some patches were submitted
without run enough tests, then after one by one kernel release, the thing
becomes worse.

This is also the reason that I recommend run mdadm tests since md raid
is a complex subsystem, perhaps a simple change could cause regression.
And considering there are really limited developers and reviewers in the
community, the chance to cause regression get bigger.

> I have also tried to test certain cases that appear broken in recent
> kernels anyway (like reducing the number of disks in a raid5 array hangs
> on the first stripe to reshape).
>
> In any case I have a very rough ad-hoc test suite I've been expanding
> that is targeted at testing my specific changes. Testing these changes
> has definitely been challenging. In any case, I've published my tests here:
>
> https://github.com/Eideticom/raid5-tests

If I may, is it possible to submit your tests to mdadm as well? So we can
have one common place to contain enough tests.

Thanks,
Guoqing