2023-05-12 03:39:31

by Masahiro Yamada

Subject: [RFC] [kbuild test robot] random-order parallel building

Hello, maintainers of the kbuild test robot.

I have a proposal for the 0day tests.


GNU Make traditionally processes prerequisites from left to right.

For example, given a rule like this:

     all: foo bar baz

GNU Make builds foo, bar, and baz, in that order.


Some projects that are not capable of parallel builds
rely on that behavior implicitly.

Kbuild, however, is intended to work well in parallel.
(As the maintainer, I really care about it.)


From time to time, people add code that "just worked for me"
but actually lacks proper dependencies.
Reproducing such parallel build issues sometimes requires
an expensive machine with many CPU cores.


For example, see this report:
  https://lkml.org/lkml/2016/11/30/587

The report says 'make -j112' reproduces the broken parallel build.
Most people do not have a build machine with 112 cores,
so such issues are difficult to reproduce (or even notice).

(Some time later, it was root-caused; see commit 07a422bb213a.)



GNU Make 4.4 added this option:

  --shuffle[={SEED|random|reverse|none}]
       Perform shuffle of prerequisites and goals.



'make --shuffle=reverse' builds prerequisites in reverse order;
in the example above, that is baz, bar, foo.

'make --shuffle' randomizes the build order.


If a dependency among foo, bar, and baz is missing,
a shuffled build order can expose it as a build failure.
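To illustrate, here is a toy model in Python (a sketch for illustration, not how GNU Make is implemented) of a serial build of 'all: foo bar baz'. It assumes a hypothetical missing dependency: bar's recipe reads foo's output, but foo is not listed as a prerequisite of bar. Only the 'none' and 'reverse' modes are modeled, and the names HIDDEN_INPUTS, build_order() and simulate() are made up for this example.

```python
# Toy model (not GNU Make's implementation) of a serial build of
#   all: foo bar baz
# where bar's recipe secretly reads foo's output without listing foo
# as a prerequisite (a hypothetical missing dependency).

PREREQS = ["foo", "bar", "baz"]
HIDDEN_INPUTS = {"bar": {"foo"}}  # undeclared inputs (hypothetical)


def build_order(prereqs, shuffle="none"):
    """Return the order in which a serial make would build the prerequisites."""
    if shuffle == "none":
        return list(prereqs)
    if shuffle == "reverse":
        return list(reversed(prereqs))
    raise ValueError(f"unsupported mode in this model: {shuffle}")


def simulate(shuffle="none"):
    """Build targets one by one; fail if a recipe needs a file not yet built."""
    built = set()
    for target in build_order(PREREQS, shuffle):
        missing = HIDDEN_INPUTS.get(target, set()) - built
        if missing:
            return f"FAIL: {target} needs {sorted(missing)}"
        built.add(target)
    return "OK"


print(simulate("none"))     # OK: left-to-right order happens to build foo first
print(simulate("reverse"))  # FAIL: bar needs ['foo'], since bar runs before foo
```

The default left-to-right order hides the missing dependency; reversing the order is enough to expose it, without needing many cores.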



We already run randconfig builds on a daily basis,
so random-order parallel building is a similar idea.

Perhaps it makes sense to add the "--shuffle=SEED" option,
but it requires GNU Make 4.4 (or GNU Make 4.4.1).
Is this too new?



--
Best Regards
Masahiro Yamada


2023-05-12 07:28:53

by Li, Philip

Subject: Re: [RFC] [kbuild test robot] random-order parallel building

On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> Hello, maintainers of the kbuild test robot.
>
> I have a proposal for the 0day tests.

Thanks a lot for the proposal to use shuffled make. We will do some
investigation into this random-order parallel build. The GNU Make
we currently use is 4.3; we will try 4.4 to see if there are any problems.

As for the timeline, we may provide an update later this month.

> Perhaps, it makes sense to add the "--shuffle=SEED" option
> but it requires GNU Make 4.4. (or GNU Make 4.4.1)
> Is this too new?

Our production environment uses 4.3 right now. Upgrading the environment
will take extra time, but it's doable for us.


2023-06-09 09:03:58

by Yujie Liu

Subject: Re: [RFC] [kbuild test robot] random-order parallel building

Hi Masahiro,

On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > Hello, maintainers of the kbuild test robot.
> >
> > I have a proposal for the 0day tests.
>
> Thanks a lot for the proposal for the shuffle make, we will do some
> investigation to try this random order parallel build. The gnu make
> we currently use is 4.3, we will try the 4.4 to see any problem.
>
> For the timeline, we may provide update later this month.

We've upgraded to make v4.4.1 in the kernel test robot and enabled
random-order parallel compiling in our randconfig build tests. The shuffle
seed is generated by hashing the randconfig, so it changes over time and
can cover various random orders. We are still doing some internal
testing and will put it online once everything is done.
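A minimal sketch of deriving a reproducible seed from a config follows. It assumes SHA-256 and a 32-bit seed; the bot's actual hashing scheme is not described here, and shuffle_seed_from_config() and make_command() are hypothetical helpers.

```python
import hashlib


def shuffle_seed_from_config(config_text: str) -> int:
    """Derive a deterministic --shuffle seed from the contents of a .config.

    Assumption: SHA-256 truncated to 32 bits; any stable hash would do.
    """
    digest = hashlib.sha256(config_text.encode("utf-8")).digest()
    # Keep the seed to 32 bits (assumed to be a safe range for make's SEED).
    return int.from_bytes(digest[:4], "big")


def make_command(config_text: str, jobs: int) -> list[str]:
    """Build a hypothetical make invocation with a config-derived seed."""
    seed = shuffle_seed_from_config(config_text)
    return ["make", f"-j{jobs}", f"--shuffle={seed}"]


print(" ".join(make_command("CONFIG_FOO=y\n", 32)))
```

Because the seed is a pure function of the config, a failure can be replayed with the same .config and seed, while different randconfigs spread the builds over different orders.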

> > For example, see this report,
> >   https://lkml.org/lkml/2016/11/30/587
> >
> > The report says 'make -j112' reproduces the broken parallel build.
> > Most people do not have such a build machine that comes with 112
> > cores.
> > It is difficult to reproduce it (or even notice it).
> >
> > (Some time later, it was root-caused by 07a422bb213a)

Thanks a lot for sharing this case. We tried to reproduce it, but it
looks like it dates back to v4.9-rc7 and throws some other errors when
compiling in our kbuild env, so we have not been able to reproduce it yet.
We are not sure whether it is related to the toolchain/compiler version
or the kernel config.

This case mentioned that 'make -j112' can reproduce the breakage. We
assume this was with the traditional (unshuffled) build order. Does it
imply that far fewer parallel jobs are likely to be needed to reproduce
the breakage when shuffle is set, say 'make --shuffle=SEED -j32', so
that developers are able to reproduce it on an ordinary CPU with fewer
cores?

We are not sure whether there are other known cases of parallel build
breakage (especially in recent kernels). If there are, it would be very
kind of you to share them too. We can first try reproducing them in the
bot to confirm that our test flow works well.

Another question is about bisection. Say the bot catches a breakage on
commit1 whose root cause is an earlier commit2. If we keep the options
"--shuffle=<seed> -j<jobs>" the same during the whole bisection, will
the breakage show up reliably on every commit between commit2 and
commit1, or might it be reproducible on only some of those commits?

Thanks a lot for this parallel building proposal; we will keep you
updated on the status.

--
Best Regards,
Yujie Liu


2023-06-09 15:33:48

by Randy Dunlap

Subject: Re: [RFC] [kbuild test robot] random-order parallel building



On 6/9/23 01:41, Liu, Yujie wrote:
> Hi Masahiro,
>
> On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
>> On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
>>> Hello, maintainers of the kbuild test robot.
>>>
>>> I have a proposal for the 0day tests.
>> Thanks a lot for the proposal for the shuffle make, we will do some
>> investigation to try this random order parallel build. The gnu make
>> we currently use is 4.3, we will try the 4.4 to see any problem.
>>
>> For the timeline, we may provide update later this month.
> We've upgraded to make v4.4.1 in kernel test robot and enabled random-
> order parallel compiling in our randconfig build tests. The shuffle
> seed is generated by hashing the randconfig, so it changes overtime and
> can cover various random orders. We are still doing some internal
> testing and will put it online once everything is done.
>

I have also been using --shuffle since this proposal was submitted.
I haven't seen any issues with it.

thanks.
--
~Randy

2023-06-09 16:17:11

by Masahiro Yamada

Subject: Re: [RFC] [kbuild test robot] random-order parallel building

On Fri, Jun 9, 2023 at 5:41 PM Liu, Yujie wrote:
>
> Hi Masahiro,
>
> On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> > On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > > Hello, maintainers of the kbuild test robot.
> > >
> > > I have a proposal for the 0day tests.
> >
> > Thanks a lot for the proposal for the shuffle make, we will do some
> > investigation to try this random order parallel build. The gnu make
> > we currently use is 4.3, we will try the 4.4 to see any problem.
> >
> > For the timeline, we may provide update later this month.
>
> We've upgraded to make v4.4.1 in kernel test robot and enabled random-
> order parallel compiling in our randconfig build tests. The shuffle
> seed is generated by hashing the randconfig, so it changes overtime and
> can cover various random orders. We are still doing some internal
> testing and will put it online once everything is done.
>
> > > For example, see this report,
> > > https://lkml.org/lkml/2016/11/30/587
> > >
> > > The report says 'make -j112' reproduces the broken parallel build.
> > > Most people do not have such a build machine that comes with 112
> > > cores.
> > > It is difficult to reproduce it (or even notice it).
> > >
> > > (Some time later, it was root-caused by 07a422bb213a)
>
> Thanks a lot for sharing this case. We tried to reproduce it, but looks
> it dates back to v4.9-rc7 and throws some other errors when compiling
> in our kbuild env, so we are not able to reproduce it yet. Not sure if
> it is related with toolchain/compiler version or the kernel config.
>
> This case mentioned that 'make -j112' can reproduce the breakage. We
> assume this is under traditional serial order build. Does it imply that
> it is likely to take much less parallel jobs to reproduce the breakage
> when shuffle is set, say 'make --shuffle=SEED -j32', so developers are
> able to reproduce it on an ordinary CPU with less cores?


I think --shuffle will help a build machine with fewer cores
catch issues, but it does not fully randomize the build order.

As I understand it, --shuffle still traverses the dependencies depth-first.


Consider this example.


all: foo bar

foo: foo-sub

bar: bar-sub


In a serial build, only [1] or [2] can happen:

[1] foo-sub -> foo -> bar-sub -> bar -> all
[2] bar-sub -> bar -> foo-sub -> foo -> all


The order

foo-sub -> bar-sub -> bar -> foo -> all

is also valid, but --shuffle never schedules the build like that.
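The following toy Python model (an assumption for illustration, not GNU Make's code) enumerates every order that a serial, depth-first traversal can produce when each target's prerequisite list is permuted. It ignores parallel jobs, where make may start bar-sub while foo-sub is still running; serial_order() and all_shuffled_orders() are hypothetical names.

```python
import itertools

# The example above: all: foo bar / foo: foo-sub / bar: bar-sub
GRAPH = {
    "all": ["foo", "bar"],
    "foo": ["foo-sub"],
    "bar": ["bar-sub"],
    "foo-sub": [],
    "bar-sub": [],
}


def serial_order(graph, goal, pick):
    """Depth-first, post-order build of `goal` in a serial (-j1) build.

    `pick(target, prereqs)` returns the (possibly shuffled) prerequisite
    order for one target, standing in for the per-target shuffle.
    """
    done, order = set(), []

    def visit(target):
        if target in done:
            return
        for dep in pick(target, graph[target]):
            visit(dep)
        done.add(target)
        order.append(target)

    visit(goal)
    return order


def all_shuffled_orders(graph, goal):
    """Collect every build order reachable by permuting each prerequisite list."""
    targets = list(graph)
    choices = [list(itertools.permutations(graph[t])) for t in targets]
    results = set()
    for combo in itertools.product(*choices):
        chosen = dict(zip(targets, combo))
        order = serial_order(graph, goal, lambda t, _deps: chosen[t])
        results.add(" -> ".join(order))
    return results


for order in sorted(all_shuffled_orders(GRAPH, "all")):
    print(order)
```

Permuting prerequisite lists only changes which subtree is visited first; each subtree is still finished before the next one starts, so the interleaved order never appears in this model.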






> Not sure if there are other known cases of parallel build breakage
> (especially in recent kernels). If any, it would be very kind if you
> could also share them. We can first try reproducing them in the bot to
> confirm our test flow works well.

I do not remember any other real breakage.

>
> Another question is about bisection. Say the bot catches a breakage on
> commit1 which root-caused to a previous commit2. If we keep the options
> "--shuffle=<seed> -j<jobs>" consistent during the whole process of
> bisection, will the breakage 100% show up on all the commits between
> commit2 and commit1, or it is kind of possible to reproduce the
> breakage, but not 100% reproducible on every commit during bisection?


I am not sure, but I _guess_ git-bisect may not point to commit2
if a Makefile is changed in between.



commit2 (root cause)
-> commitA (add Makefile change)
-> commit1 (0 day bot noticed an issue here)


Even if the same --shuffle=SEED is given, the issue may not be
reproducible on the commits from commit2 up to commitA's parent
if commitA changes a Makefile, since the same seed can then produce
a different build order.
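A rough illustration follows, assuming the seed drives a pseudo-random generator that permutes each prerequisite list. Python's random module stands in for GNU Make's own generator, so the concrete orders differ from what make would produce. The e.o prerequisite that the hypothetical commitA adds, and the helper names, are made up for this example.

```python
import random


def shuffled(prereqs, seed):
    """Seeded shuffle of one prerequisite list (Python's RNG, not make's)."""
    rng = random.Random(seed)
    order = list(prereqs)
    rng.shuffle(order)
    return order


def relative_order(order, keep):
    """Drop entries not in `keep`, preserving the order of the rest."""
    return [x for x in order if x in keep]


# Hypothetical prerequisite lists before and after commitA adds e.o.
BEFORE = ["a.o", "b.o", "c.o", "d.o"]
AFTER = BEFORE + ["e.o"]


def seeds_with_changed_order(n_seeds=1000):
    """Count seeds whose order of a.o..d.o differs once e.o is added."""
    changed = 0
    for seed in range(n_seeds):
        old = shuffled(BEFORE, seed)
        new = relative_order(shuffled(AFTER, seed), set(BEFORE))
        if old != new:
            changed += 1
    return changed


print(seeds_with_changed_order(), "of 1000 seeds reorder a.o..d.o")
```

A given seed is reproducible as long as the Makefile is unchanged, but not across a Makefile change, which is why bisection may blame commitA instead of commit2.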


Thanks for considering this.






--
Best Regards
Masahiro Yamada