2020-02-08 08:36:42

by Josh Triplett

[permalink] [raw]
Subject: Applying pipe fix this merge window?

I've been hammering on your pipe fix patch (switching to exclusive wait
queues) for a month or so, on several different systems, and I've run
into no issues with it. The patch *substantially* improves parallel
build times on large (~100 CPU) systems, both with parallel make and
with other things that use make's pipe-based jobserver.

All current distributions (including stable and long-term stable
distributions) have versions of GNU make that no longer have the
jobserver bug.

Based on that, would you consider merging the pipe fix patch in the
current merge window, giving people plenty of time to hammer on it
further?

- Josh Triplett


2020-02-08 18:46:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: Applying pipe fix this merge window?

On Sat, Feb 8, 2020 at 12:36 AM Josh Triplett <[email protected]> wrote:
>
> Based on that, would you consider merging the pipe fix patch in the
> current merge window, giving people plenty of time to hammer on it
> further?

Sure, I'll apply it and see if anybody hollers.

Worst case, we'll revert if there are too many people who have the
buggy version of make and can't easily upgrade. But even then it might
not be that noticeable, since the make jobserver problem ends up
benign fairly timing-specific too. I may have just been very unlucky
in hitting it back when..

Linus

2020-02-08 21:54:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: Applying pipe fix this merge window?

On Sat, Feb 8, 2020 at 10:45 AM Linus Torvalds
<[email protected]> wrote:
>
> Sure, I'll apply it and see if anybody hollers.

Btw, can you verify that commit 0ddad21d3e99 ("pipe: use exclusive
waits when reading or writing") matches what you've been testing.

I did put a "tested-by" for you in it, but I had a few different
versions of that patch because of other changes in fs/pipe.c and
because I did it while doing the other changes, so maybe I should have
double-checked with you first.

Linus

2020-02-08 22:05:23

by Josh Triplett

[permalink] [raw]
Subject: Re: Applying pipe fix this merge window?

On Sat, Feb 08, 2020 at 01:53:54PM -0800, Linus Torvalds wrote:
> On Sat, Feb 8, 2020 at 10:45 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > Sure, I'll apply it and see if anybody hollers.
>
> Btw, can you verify that commit 0ddad21d3e99 ("pipe: use exclusive
> waits when reading or writing") matches what you've been testing.

Byte-for-byte identical.

> I did put a "tested-by" for you in it, but I had a few different
> versions of that patch because of other changes in fs/pipe.c and
> because I did it while doing the other changes, so maybe I should have
> double-checked with you first.

The commit message for 0ddad21d3e99 looks completely reasonable to me.

Thank you!

- Josh Triplett

2020-02-12 20:05:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: Applying pipe fix this merge window?

On Sat, Feb 8, 2020 at 12:36 AM Josh Triplett <[email protected]> wrote:
>
> I've been hammering on your pipe fix patch (switching to exclusive wait
> queues) for a month or so, on several different systems, and I've run
> into no issues with it. The patch *substantially* improves parallel
> build times on large (~100 CPU) systems, both with parallel make and
> with other things that use make's pipe-based jobserver.

Hmm. I just applied the doc fix that Randy sent, and that made me
revisit this commit and the commit message.

And I realized that I find it surprising that it makes your build
times noticeably better.

Yes, I have that silly example program to show the issue in the commit
message, and yes, the exclusive directed write->read wakeups should
most definitely improve by that commit.

But the make jobserver code ends up using "poll()/pselect()" and
non-blocking reads, because of how it handles the child death signals.

Which means that none of the nice exclusive directed write->read
wakeups should even trigger in the first place, because the readers
never block, and he poll/pselect code doesn't use exclusive wakeups
(because it can't - it doesn't actually consume the data).

So I was looking at it, and going "it should actually not help GNU
jobserver at all" in the fixed jobserver case.

So humor me, Josh - can you try to figure out why your numbers changed
for this commit?

Linus

2020-02-14 06:10:27

by Josh Triplett

[permalink] [raw]
Subject: Re: Applying pipe fix this merge window?

On Wed, Feb 12, 2020 at 12:03:26PM -0800, Linus Torvalds wrote:
> And I realized that I find it surprising that it makes your build
> times noticeably better.
>
> Yes, I have that silly example program to show the issue in the commit
> message, and yes, the exclusive directed write->read wakeups should
> most definitely improve by that commit.
>
> But the make jobserver code ends up using "poll()/pselect()" and
> non-blocking reads, because of how it handles the child death signals.
>
> Which means that none of the nice exclusive directed write->read
> wakeups should even trigger in the first place, because the readers
> never block, and he poll/pselect code doesn't use exclusive wakeups
> (because it can't - it doesn't actually consume the data).
>
> So I was looking at it, and going "it should actually not help GNU
> jobserver at all" in the fixed jobserver case.

I dug into this a little further yesterday and today:

- With hindsight, I realized that the performance improvements I
observed for GNU make didn't measure the pipe fix in isolation; they
measured 5.4 versus ~5.5-rc4 plus the pipe fix, which would include
all the other pipe work in 5.5 and potentially other optimizations.
*That* showed substantial performance improvements in GNU make, on the
order of a couple of seconds in a 30-60 second kernel build. ("5.5-rc4
plus pipe fix" is what I hammered on for a month on various systems.)

- Measuring the pipe fix patch in isolation
(0bf999f9c5e74c7ecf9dafb527146601e5c848b9, with and without the pipe
fix reverted, with nothing else changed), GNU make performance indeed
doesn't show any difference.

- Other things that use the GNU make jobserver (with pipe fds in
blocking mode) benefit much more heavily, in wall-clock time and in
total CPU time. I saw jobs that involved just a minute or two of
wall-clock time, where the total CPU time went down by *minutes*.

Hope that helps,
Josh Triplett