2016-12-29 02:31:35

by Eric Anholt

[permalink] [raw]
Subject: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

For CMA allocations, we expect to occasionally hit this error path, at
which point CMA will retry. Given that, we shouldn't be spamming
dmesg about it.

The Raspberry Pi graphics driver does frequent CMA allocations, and
during regression testing this printk was sometimes occurring 100s of
times per second.

Signed-off-by: Eric Anholt <[email protected]>
Cc: linux-stable <[email protected]>
---
mm/page_alloc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440e3ae2..bea7204c14a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7289,8 +7289,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,

/* Make sure the range is really isolated. */
if (test_pages_isolated(outer_start, end, false)) {
- pr_info("%s: [%lx, %lx) PFNs busy\n",
- __func__, outer_start, end);
ret = -EBUSY;
goto done;
}
--
2.11.0


2016-12-29 09:13:06

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

This has been already brought up
http://lkml.kernel.org/r/[email protected] and there
was a proposed patch for that which ratelimited the output
http://lkml.kernel.org/r/[email protected] resp.
http://lkml.kernel.org/r/[email protected]

then the email thread just died out because the issue turned out to be a
configuration issue. Michal indicated that the message might be useful
so dropping it completely seems like a bad idea. I do agree that
something has to be done about that though. Can we reconsider the
ratelimit thing?

On Wed 28-12-16 18:31:31, Eric Anholt wrote:
> For CMA allocations, we expect to occasionally hit this error path, at
> which point CMA will retry. Given that, we shouldn't be spamming
> dmesg about it.
>
> The Raspberry Pi graphics driver does frequent CMA allocations, and
> during regression testing this printk was sometimes occurring 100s of
> times per second.
>
> Signed-off-by: Eric Anholt <[email protected]>
> Cc: linux-stable <[email protected]>
> ---
> mm/page_alloc.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6de9440e3ae2..bea7204c14a5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7289,8 +7289,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>
> /* Make sure the range is really isolated. */
> if (test_pages_isolated(outer_start, end, false)) {
> - pr_info("%s: [%lx, %lx) PFNs busy\n",
> - __func__, outer_start, end);
> ret = -EBUSY;
> goto done;
> }
> --
> 2.11.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs

2016-12-29 17:27:48

by Eric Anholt

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

Michal Hocko <[email protected]> writes:

> This has been already brought up
> http://lkml.kernel.org/r/[email protected] and there
> was a proposed patch for that which ratelimited the output
> http://lkml.kernel.org/r/[email protected] resp.
> http://lkml.kernel.org/r/[email protected]
>
> then the email thread just died out because the issue turned out to be a
> configuration issue. Michal indicated that the message might be useful
> so dropping it completely seems like a bad idea. I do agree that
> something has to be done about that though. Can we reconsider the
> ratelimit thing?

I agree that the rate of the message has gone up during 4.9 -- it used
to be a few per second. However, if this is an expected path during
normal operation, we shouldn't be clogging dmesg with it at all. So,
I'd rather we go with this patch, that is unless the KERN_DEBUG in your
ratelimit patch would keep it out of journald as well (un-ratelimited,
journald was eating 10% of a CPU processing the message, and I'd rather
it not be getting logged at all).


Attachments:
signature.asc (832.00 B)

2016-12-29 22:22:33

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On Thu, Dec 29 2016, Eric Anholt wrote:
> Michal Hocko <[email protected]> writes:
>
>> This has been already brought up
>> http://lkml.kernel.org/r/[email protected] and there
>> was a proposed patch for that which ratelimited the output
>> http://lkml.kernel.org/r/[email protected] resp.
>> http://lkml.kernel.org/r/[email protected]
>>
>> then the email thread just died out because the issue turned out to be a
>> configuration issue. Michal indicated that the message might be useful
>> so dropping it completely seems like a bad idea. I do agree that
>> something has to be done about that though. Can we reconsider the
>> ratelimit thing?
>
> I agree that the rate of the message has gone up during 4.9 -- it used
> to be a few per second.

Sounds like a regression which should be fixed.

This is why I don’t think removing the message is a good idea. If you
suddenly see a lot of those messages, something changed for the worse.
If you remove this message, you will never know.

> However, if this is an expected path during normal operation,

This depends on your definition of ‘expected’ and ‘normal’.

In general, I would argue that the fact those ever happen is a bug
somewhere in the kernel – if memory is allocated as movable, it should
be movable damn it!

> we shouldn't be clogging dmesg with it at all. So, I'd rather we go
> with this patch, that is unless the KERN_DEBUG in your ratelimit patch
> would keep it out of journald as well (un-ratelimited, journald was
> eating 10% of a CPU processing the message, and I'd rather it not be
> getting logged at all).

--
Best regards
ミハウ “????????????????86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

2016-12-29 23:17:28

by Eric Anholt

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

Michal Nazarewicz <[email protected]> writes:

> On Thu, Dec 29 2016, Eric Anholt wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> This has been already brought up
>>> http://lkml.kernel.org/r/[email protected] and there
>>> was a proposed patch for that which ratelimited the output
>>> http://lkml.kernel.org/r/[email protected] resp.
>>> http://lkml.kernel.org/r/[email protected]
>>>
>>> then the email thread just died out because the issue turned out to be a
>>> configuration issue. Michal indicated that the message might be useful
>>> so dropping it completely seems like a bad idea. I do agree that
>>> something has to be done about that though. Can we reconsider the
>>> ratelimit thing?
>>
>> I agree that the rate of the message has gone up during 4.9 -- it used
>> to be a few per second.
>
> Sounds like a regression which should be fixed.
>
> This is why I don’t think removing the message is a good idea. If you
> suddenly see a lot of those messages, something changed for the worse.
> If you remove this message, you will never know.
>
>> However, if this is an expected path during normal operation,
>
> This depends on your definition of ‘expected’ and ‘normal’.
>
> In general, I would argue that the fact those ever happen is a bug
> somewhere in the kernel – if memory is allocated as movable, it should
> be movable damn it!

I was taking "expected" from dae803e165a11bc88ca8dbc07a11077caf97bbcb --
if this is a actually a bug, how do we go about debugging it?

I've had Raspbian carrying a patch downstream to remove the error
message for 2 years now, and I either need to get this fixed or get this
patch merged to Fedora and Debian as well, now that they're shipping
some support for Raspberry Pi.


Attachments:
signature.asc (832.00 B)

2016-12-30 07:11:20

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On Thu, Dec 29 2016, Eric Anholt wrote:
> Michal Nazarewicz <[email protected]> writes:
>
>> On Thu, Dec 29 2016, Eric Anholt wrote:
>>> Michal Hocko <[email protected]> writes:
>>>
>>>> This has been already brought up
>>>> http://lkml.kernel.org/r/[email protected] and there
>>>> was a proposed patch for that which ratelimited the output
>>>> http://lkml.kernel.org/r/[email protected] resp.
>>>> http://lkml.kernel.org/r/[email protected]
>>>>
>>>> then the email thread just died out because the issue turned out to be a
>>>> configuration issue. Michal indicated that the message might be useful
>>>> so dropping it completely seems like a bad idea. I do agree that
>>>> something has to be done about that though. Can we reconsider the
>>>> ratelimit thing?
>>>
>>> I agree that the rate of the message has gone up during 4.9 -- it used
>>> to be a few per second.
>>
>> Sounds like a regression which should be fixed.
>>
>> This is why I don’t think removing the message is a good idea. If you
>> suddenly see a lot of those messages, something changed for the worse.
>> If you remove this message, you will never know.
>>
>>> However, if this is an expected path during normal operation,
>>
>> This depends on your definition of ‘expected’ and ‘normal’.
>>
>> In general, I would argue that the fact those ever happen is a bug
>> somewhere in the kernel – if memory is allocated as movable, it should
>> be movable damn it!
>
> I was taking "expected" from dae803e165a11bc88ca8dbc07a11077caf97bbcb --
> if this is a actually a bug, how do we go about debugging it?

That’s why I’ve pointed out that this depends on the definition. In my
opinion it’s a design bug which is now nearly impossible to fix in
efficient way.

The most likely issues is that some subsystem is allocating movable
memory but then either does not provide a way to actually move it
(that’s an obvious bug in the code IMO) or pins the memory while some
transaction is performed and at the same time CMA tries to move it.

The latter case is really unavoidable at this point which is why this
message is ‘expected’.

But if suddenly, the rate of the messages increases dramatically, you
have yourself a performance regression.

> I've had Raspbian carrying a patch downstream to remove the error
> message for 2 years now, and I either need to get this fixed or get this
> patch merged to Fedora and Debian as well, now that they're shipping
> some support for Raspberry Pi.

--
Best regards
ミハウ “????????????????86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

2016-12-30 10:52:08

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On Thu 29-12-16 23:22:20, Michal Nazarewicz wrote:
> On Thu, Dec 29 2016, Eric Anholt wrote:
> > Michal Hocko <[email protected]> writes:
> >
> >> This has been already brought up
> >> http://lkml.kernel.org/r/[email protected] and there
> >> was a proposed patch for that which ratelimited the output
> >> http://lkml.kernel.org/r/[email protected] resp.
> >> http://lkml.kernel.org/r/[email protected]
> >>
> >> then the email thread just died out because the issue turned out to be a
> >> configuration issue. Michal indicated that the message might be useful
> >> so dropping it completely seems like a bad idea. I do agree that
> >> something has to be done about that though. Can we reconsider the
> >> ratelimit thing?
> >
> > I agree that the rate of the message has gone up during 4.9 -- it used
> > to be a few per second.
>
> Sounds like a regression which should be fixed.
>
> This is why I don’t think removing the message is a good idea. If you
> suddenly see a lot of those messages, something changed for the worse.
> If you remove this message, you will never know.

I agree, that removing the message completely is not going to help to
find out regressions. Swamping logs with zillions of messages is,
however, not acceptable. It just causes even more problems. See the
previous report.

> > However, if this is an expected path during normal operation,
>
> This depends on your definition of ‘expected’ and ‘normal’.
>
> In general, I would argue that the fact those ever happen is a bug
> somewhere in the kernel – if memory is allocated as movable, it should
> be movable damn it!

Yes, it should be movable but there is no guarantee it is movable
immediately. Those pages might be pinned for some time. This is
unavoidable AFAICS.

So while this might be a regression which should be investigated there
should be another fix to prevent from swamping the logs as well.

--
Michal Hocko
SUSE Labs

2016-12-30 22:15:22

by Eric Anholt

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

Michal Nazarewicz <[email protected]> writes:

> On Thu, Dec 29 2016, Eric Anholt wrote:
>> Michal Nazarewicz <[email protected]> writes:
>>
>>> On Thu, Dec 29 2016, Eric Anholt wrote:
>>>> Michal Hocko <[email protected]> writes:
>>>>
>>>>> This has been already brought up
>>>>> http://lkml.kernel.org/r/[email protected] and there
>>>>> was a proposed patch for that which ratelimited the output
>>>>> http://lkml.kernel.org/r/[email protected] resp.
>>>>> http://lkml.kernel.org/r/[email protected]
>>>>>
>>>>> then the email thread just died out because the issue turned out to be a
>>>>> configuration issue. Michal indicated that the message might be useful
>>>>> so dropping it completely seems like a bad idea. I do agree that
>>>>> something has to be done about that though. Can we reconsider the
>>>>> ratelimit thing?
>>>>
>>>> I agree that the rate of the message has gone up during 4.9 -- it used
>>>> to be a few per second.
>>>
>>> Sounds like a regression which should be fixed.
>>>
>>> This is why I don’t think removing the message is a good idea. If you
>>> suddenly see a lot of those messages, something changed for the worse.
>>> If you remove this message, you will never know.
>>>
>>>> However, if this is an expected path during normal operation,
>>>
>>> This depends on your definition of ‘expected’ and ‘normal’.
>>>
>>> In general, I would argue that the fact those ever happen is a bug
>>> somewhere in the kernel – if memory is allocated as movable, it should
>>> be movable damn it!
>>
>> I was taking "expected" from dae803e165a11bc88ca8dbc07a11077caf97bbcb --
>> if this is a actually a bug, how do we go about debugging it?
>
> That’s why I’ve pointed out that this depends on the definition. In my
> opinion it’s a design bug which is now nearly impossible to fix in
> efficient way.

OK, so the design is bad. When you said bug, I definitely thought you
were saying that the message shouldn't happen in the design.

Given CMA's current design, should everyone using CMA see their logs
slowly growing with this message that is an secret code for "CMA's
design hasn't yet changed"? If you want to have people be able to track
how often this is happening, let's make a perf event for it or something
instead.


Attachments:
signature.asc (832.00 B)

2016-12-31 05:08:53

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On Fri, Dec 30 2016, Eric Anholt wrote:
> OK, so the design is bad. When you said bug, I definitely thought you
> were saying that the message shouldn't happen in the design.
>
> Given CMA's current design, should everyone using CMA see their logs
> slowly growing with this message that is an secret code for "CMA's
> design hasn't yet changed"?

Just to be clear, it’s not CMA’s design; it’s movable page’s design that
is bad (at least in the context of this discussion).

But yes, because of the way movable pages are, everyone is likely to see
the message.

> If you want to have people be able to track how often this is
> happening, let's make a perf event for it or something instead.

Sure.

--
Best regards
ミハウ “????????????????86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»