LinuxLists.cc - [PATCH] show message when exceeded rlimit of pending signals

2009-10-23 10:29:29

Subject: [PATCH] show message when exceeded rlimit of pending signals

Hi Andrew,

I was glad to talk to you in Japan Linux Symposium.
I'm writing about it.

I'm working to support kernel.
Recently, I got a inquiry about unexpected system behavior.
I analyzed application of our customer includeing kernel.

Eventually, there was no bug in application or kernel.
I found the cause was the limit of pending signals.
I ran following command. and system behaved expectedly.
# ulimit -i unlimited

When system behaved unexpectedly, the timer_create() in application
had returned -EAGAIN value.
But we can't imagine the -EAGAIN means that it exceeded limit of
pending signals at all.

Then I thought kernel should at least show some message about it.
And I tried to create a patch.

I'm sure that system engineeres will not have to have the same experience as I did.
How do you think about this idea ?

Thank you
Naohiro Ooiwa.

Signed-off-by: Naohiro Ooiwa <[email protected]>
---
kernel/signal.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6705320..0bc4934 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -188,6 +188,9 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
return sig;
}

+#define MAX_RLIMIT_CAUTION 5
+static int rlimit_caution_count = 0;
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
@@ -211,6 +214,16 @@ static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
atomic_read(&user->sigpending) <=
t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
q = kmem_cache_alloc(sigqueue_cachep, flags);
+ else {
+ if (rlimit_caution_count <= MAX_RLIMIT_CAUTION ){
+ printk(KERN_WARNING "reached the limit of pending signalis on pid %d\n", current->pid);
+ /* Last time, show the advice */
+ if (rlimit_caution_count == MAX_RLIMIT_CAUTION)
+ printk(KERN_WARNING "If unexpected your system behavior, you can try ulimit -i unlimited\n");
+ rlimit_caution_count++;
+ }
+ }
+
if (unlikely(q == NULL)) {
atomic_dec(&user->sigpending);
free_uid(user);
--
1.5.4.1

2009-10-23 11:46:26

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

* Naohiro Ooiwa <[email protected]> wrote:

> Hi Andrew,
>
> I was glad to talk to you in Japan Linux Symposium.
> I'm writing about it.
>
>
> I'm working to support kernel.
> Recently, I got a inquiry about unexpected system behavior.
> I analyzed application of our customer includeing kernel.
>
> Eventually, there was no bug in application or kernel.
> I found the cause was the limit of pending signals.
> I ran following command. and system behaved expectedly.
> # ulimit -i unlimited
>
> When system behaved unexpectedly, the timer_create() in application
> had returned -EAGAIN value.
> But we can't imagine the -EAGAIN means that it exceeded limit of
> pending signals at all.
>
> Then I thought kernel should at least show some message about it.
> And I tried to create a patch.
>
> I'm sure that system engineeres will not have to have the same experience as I did.
> How do you think about this idea ?
>
> Thank you
> Naohiro Ooiwa.
>
> Signed-off-by: Naohiro Ooiwa <[email protected]>
> ---
> kernel/signal.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 6705320..0bc4934 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -188,6 +188,9 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
> return sig;
> }
>
> +#define MAX_RLIMIT_CAUTION 5
> +static int rlimit_caution_count = 0;
> +
> /*
> * allocate a new signal queue record
> * - this may be called without locks if and only if t == current, otherwise an
> @@ -211,6 +214,16 @@ static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
> atomic_read(&user->sigpending) <=
> t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
> q = kmem_cache_alloc(sigqueue_cachep, flags);
> + else {
> + if (rlimit_caution_count <= MAX_RLIMIT_CAUTION ){
> + printk(KERN_WARNING "reached the limit of pending signalis on pid %d\n", current->pid);
> + /* Last time, show the advice */
> + if (rlimit_caution_count == MAX_RLIMIT_CAUTION)
> + printk(KERN_WARNING "If unexpected your system behavior, you can try ulimit -i unlimited\n");
> + rlimit_caution_count++;
> + }
> + }
> +
> if (unlikely(q == NULL)) {
> atomic_dec(&user->sigpending);
> free_uid(user);

This new warning looks quite useful, i've seen several apps get into
trouble silently due to that, again and again.

The memory overhead of the signal queue was a problem 15 years ago ...
not so much today and people (and apps) dont expect to get in trouble
here. So the limit and its defaults are somewhat arcane, and the
behavior is catastrophic and hard to debug (because it's a dynamic
failure).

Regarding the patch, i've got a few (very) small suggestions.

Firstly, please update the if / else sequence from:

if (...)
...
else {
...
}

to:

if (...) {
...
} else {
...
}

as we strive for curly brace symmetries.

also, a small typo: s/signalis/signals

Plus, instead of using a pre-cooked global limit print_ratelimit() could
be used as well. That makes it useful for long-lived systems that run
into this limit occasionally. We wont spam the log - nor will we lose
(potentially essential) messages in the process.

Thanks,

Ingo

2009-10-23 21:08:18

by Roland McGrath

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

I have nothing in particular against the logging. (However, to me it seems
a little odd to use system-wide logging for normal well-defined error cases
of individual programs.) This seems to me primarily like a failure of
documentation.

If you'd asked me off hand what EAGAIN from timer_create could mean, I
would have told you right off that you have too many timers or too many
aggregate queued signals. I'm a person who would happen to know, of
course. But also, if you look in POSIX.1 for the timer_create definition,
under ERRORS it says:

[EAGAIN] The system lacks sufficient signal queuing resources to
honor the request.
[EAGAIN] The calling process has already created all of the timers it
is allowed by this implementation.

Now that is a little vague about it potentially relating to the
RLIMIT_SIGPENDING limit (which is not a POSIX.1 feature, though exactly the
sort of thing permitted by the "is allowed by this implementation" clause).
But it certainly points you in some reasonable directions so this doesn't
seem like it would be such a mystery.

But it's certainly unfortunate that man-pages-3.19 for timer_create has only:

-EAGAIN
The system could not process the request.

That description is basically content-free, it applies equally to any
potential error from any call.

Thanks,
Roland

2009-10-24 07:02:54

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Ingo

Thank you so much for early quick reply.
and I'm happy you agree with my proposal.

> Regarding the patch, i've got a few (very) small suggestions.

Thank you for pointing out.
Please wait a moment. I will resend a patch.

Of course, I will plan to use print_ratelimit().
Actually, I received with same opinion from OGAWA-san.

Thank you
Naohiro Ooiwa.

Ingo Molnar wrote:
> * Naohiro Ooiwa <[email protected]> wrote:
>
>> Hi Andrew,
>>
>> I was glad to talk to you in Japan Linux Symposium.
>> I'm writing about it.
>>
>>
>> I'm working to support kernel.
>> Recently, I got a inquiry about unexpected system behavior.
>> I analyzed application of our customer includeing kernel.
>>
>> Eventually, there was no bug in application or kernel.
>> I found the cause was the limit of pending signals.
>> I ran following command. and system behaved expectedly.
>> # ulimit -i unlimited
>>
>> When system behaved unexpectedly, the timer_create() in application
>> had returned -EAGAIN value.
>> But we can't imagine the -EAGAIN means that it exceeded limit of
>> pending signals at all.
>>
>> Then I thought kernel should at least show some message about it.
>> And I tried to create a patch.
>>
>> I'm sure that system engineeres will not have to have the same experience as I did.
>> How do you think about this idea ?
>>
>> Thank you
>> Naohiro Ooiwa.
>>
>> Signed-off-by: Naohiro Ooiwa <[email protected]>
>> ---
>> kernel/signal.c | 13 +++++++++++++
>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 6705320..0bc4934 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -188,6 +188,9 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
>> return sig;
>> }
>>
>> +#define MAX_RLIMIT_CAUTION 5
>> +static int rlimit_caution_count = 0;
>> +
>> /*
>> * allocate a new signal queue record
>> * - this may be called without locks if and only if t == current, otherwise an
>> @@ -211,6 +214,16 @@ static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
>> atomic_read(&user->sigpending) <=
>> t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
>> q = kmem_cache_alloc(sigqueue_cachep, flags);
>> + else {
>> + if (rlimit_caution_count <= MAX_RLIMIT_CAUTION ){
>> + printk(KERN_WARNING "reached the limit of pending signalis on pid %d\n", current->pid);
>> + /* Last time, show the advice */
>> + if (rlimit_caution_count == MAX_RLIMIT_CAUTION)
>> + printk(KERN_WARNING "If unexpected your system behavior, you can try ulimit -i unlimited\n");
>> + rlimit_caution_count++;
>> + }
>> + }
>> +
>> if (unlikely(q == NULL)) {
>> atomic_dec(&user->sigpending);
>> free_uid(user);
>
> This new warning looks quite useful, i've seen several apps get into
> trouble silently due to that, again and again.
>
> The memory overhead of the signal queue was a problem 15 years ago ...
> not so much today and people (and apps) dont expect to get in trouble
> here. So the limit and its defaults are somewhat arcane, and the
> behavior is catastrophic and hard to debug (because it's a dynamic
> failure).
>
> Regarding the patch, i've got a few (very) small suggestions.
>
> Firstly, please update the if / else sequence from:
>
> if (...)
> ...
> else {
> ...
> }
>
> to:
>
> if (...) {
> ...
> } else {
> ...
> }
>
> as we strive for curly brace symmetries.
>
> also, a small typo: s/signalis/signals
>
> Plus, instead of using a pre-cooked global limit print_ratelimit() could
> be used as well. That makes it useful for long-lived systems that run
> into this limit occasionally. We wont spam the log - nor will we lose
> (potentially essential) messages in the process.
>
> Thanks,
>
> Ingo

2009-10-24 08:27:08

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Roland,

Thank you for your reply.

> This seems to me primarily like a failure of
> documentation.

You just said it. At first, I thought it.

> That description is basically content-free, it applies equally to any
> potential error from any call.

The reality is, the man-pages has been summary.

> If you'd asked me off hand what EAGAIN from timer_create could mean, I
> would have told you right off that you have too many timers or too many
> aggregate queued signals.

This idea is for system engineeres, not kernel developers.
In this case, I found this cause soon, because I could reproduce
this phenomenon.
But when it run into this limit occasionally, we can't obtain
any solid physical evidence. On the contrary, It's OK.

If application don't see error value or nobody debugging by strace,
we just no way. We get yelled at by customer.

So I thought this logging.

PS,
Now I have one idea.
When the TCP socket is not called close(), sometimes it countinue to
stay in kernel as FIN_WAIT2 state. I'm understanding why it's happened.
But I think it is same problem.

Thank you
Naohiro Ooiwa.

Roland McGrath wrote:
> I have nothing in particular against the logging. (However, to me it seems
> a little odd to use system-wide logging for normal well-defined error cases
> of individual programs.) This seems to me primarily like a failure of
> documentation.
>
> If you'd asked me off hand what EAGAIN from timer_create could mean, I
> would have told you right off that you have too many timers or too many
> aggregate queued signals. I'm a person who would happen to know, of
> course. But also, if you look in POSIX.1 for the timer_create definition,
> under ERRORS it says:
>
> [EAGAIN] The system lacks sufficient signal queuing resources to
> honor the request.
> [EAGAIN] The calling process has already created all of the timers it
> is allowed by this implementation.
>
> Now that is a little vague about it potentially relating to the
> RLIMIT_SIGPENDING limit (which is not a POSIX.1 feature, though exactly the
> sort of thing permitted by the "is allowed by this implementation" clause).
> But it certainly points you in some reasonable directions so this doesn't
> seem like it would be such a mystery.
>
> But it's certainly unfortunate that man-pages-3.19 for timer_create has only:
>
> -EAGAIN
> The system could not process the request.
>
> That description is basically content-free, it applies equally to any
> potential error from any call.
>
>
> Thanks,
> Roland

2009-10-24 08:56:46

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Ingo, Roland,

Now, I received a nice comment from OGAWA-san.
How is this impriment like a print_faital_signal().

I think it's very nice.

Thank you
Naohiro Ooiwa.

Naohiro Ooiwa wrote:
> Hi Ingo
>
> Thank you so much for early quick reply.
> and I'm happy you agree with my proposal.
>
>> Regarding the patch, i've got a few (very) small suggestions.
>
> Thank you for pointing out.
> Please wait a moment. I will resend a patch.
>
> Of course, I will plan to use print_ratelimit().
> Actually, I received with same opinion from OGAWA-san.
>
>
> Thank you
> Naohiro Ooiwa.
>
>
> Ingo Molnar wrote:
>> * Naohiro Ooiwa <[email protected]> wrote:
>>
>>> Hi Andrew,
>>>
>>> I was glad to talk to you in Japan Linux Symposium.
>>> I'm writing about it.
>>>
>>>
>>> I'm working to support kernel.
>>> Recently, I got a inquiry about unexpected system behavior.
>>> I analyzed application of our customer includeing kernel.
>>>
>>> Eventually, there was no bug in application or kernel.
>>> I found the cause was the limit of pending signals.
>>> I ran following command. and system behaved expectedly.
>>> # ulimit -i unlimited
>>>
>>> When system behaved unexpectedly, the timer_create() in application
>>> had returned -EAGAIN value.
>>> But we can't imagine the -EAGAIN means that it exceeded limit of
>>> pending signals at all.
>>>
>>> Then I thought kernel should at least show some message about it.
>>> And I tried to create a patch.
>>>
>>> I'm sure that system engineeres will not have to have the same
>>> experience as I did.
>>> How do you think about this idea ?
>>>
>>> Thank you
>>> Naohiro Ooiwa.
>>>
>>> Signed-off-by: Naohiro Ooiwa <[email protected]>
>>> ---
>>> kernel/signal.c | 13 +++++++++++++
>>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 6705320..0bc4934 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -188,6 +188,9 @@ int next_signal(struct sigpending *pending,
>>> sigset_t *mask)
>>> return sig;
>>> }
>>>
>>> +#define MAX_RLIMIT_CAUTION 5
>>> +static int rlimit_caution_count = 0;
>>> +
>>> /*
>>> * allocate a new signal queue record
>>> * - this may be called without locks if and only if t == current,
>>> otherwise an
>>> @@ -211,6 +214,16 @@ static struct sigqueue *__sigqueue_alloc(struct
>>> task_struct *t, gfp_t flags,
>>> atomic_read(&user->sigpending) <=
>>> t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
>>> q = kmem_cache_alloc(sigqueue_cachep, flags);
>>> + else {
>>> + if (rlimit_caution_count <= MAX_RLIMIT_CAUTION ){
>>> + printk(KERN_WARNING "reached the limit of pending
>>> signalis on pid %d\n", current->pid);
>>> + /* Last time, show the advice */
>>> + if (rlimit_caution_count == MAX_RLIMIT_CAUTION)
>>> + printk(KERN_WARNING "If unexpected your system
>>> behavior, you can try ulimit -i unlimited\n");
>>> + rlimit_caution_count++;
>>> + }
>>> + }
>>> +
>>> if (unlikely(q == NULL)) {
>>> atomic_dec(&user->sigpending);
>>> free_uid(user);
>>
>> This new warning looks quite useful, i've seen several apps get into
>> trouble silently due to that, again and again.
>>
>> The memory overhead of the signal queue was a problem 15 years ago ...
>> not so much today and people (and apps) dont expect to get in trouble
>> here. So the limit and its defaults are somewhat arcane, and the
>> behavior is catastrophic and hard to debug (because it's a dynamic
>> failure).
>>
>> Regarding the patch, i've got a few (very) small suggestions.
>>
>> Firstly, please update the if / else sequence from:
>>
>> if (...)
>> ...
>> else {
>> ...
>> }
>>
>> to:
>>
>> if (...) {
>> ...
>> } else {
>> ...
>> }
>>
>> as we strive for curly brace symmetries.
>>
>> also, a small typo: s/signalis/signals
>>
>> Plus, instead of using a pre-cooked global limit print_ratelimit()
>> could be used as well. That makes it useful for long-lived systems
>> that run into this limit occasionally. We wont spam the log - nor will
>> we lose (potentially essential) messages in the process.
>>
>> Thanks,
>>
>> Ingo
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-10-24 08:59:16

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

* Naohiro Ooiwa <[email protected]> wrote:

> Hi Ingo, Roland,
>
> Now, I received a nice comment from OGAWA-san.
> How is this impriment like a print_faital_signal().
>
> I think it's very nice.

Agreed - i just wanted to suggest that too. print_fatal_signals is
already a switch that turns on warnings about 'weird looking signal
behavior'. print-on-overflow seems like a good match.

Ingo

2009-10-26 10:17:39

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Ingo

I remade a patch.

I already tested it. The result was good for me.
Could you please check it.

Thanks you.
Naohiro Ooiwa

Signed-off-by: Naohiro Ooiwa <[email protected]>
---
Documentation/kernel-parameters.txt | 14 ++++++++++++++
kernel/signal.c | 24 +++++++++++++++++++++++-
kernel/sysctl.c | 9 +++++++++
3 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9107b38..37104b1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2036,6 +2036,20 @@ and is between 256 and 4096 characters. It is defined in the file
the kernel console.
default: off.

+ print-reach-rlimit-sigpending=
+ [KNL] debug: print caution that reached the limit of
+ pending signals.
+ If your working system may have too many POSIX.1 timers
+ or during the system test, you may as well to enable
+ this parameter.
+ print-reach-rlimit-sigpending=0: disable this print
+ print-reach-rlimit-sigpending=1: print message that
+ reached the limit of pending signals to the kernel
+ console.
+ When this message is printed, maybe you should try to
+ "ulimit -i unlimited".
+ default: off.
+
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6705320..9943e71 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -188,6 +188,24 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
return sig;
}

+int print_reach_rlimit_sigpending;
+
+static void show_reach_rlimit_sigpending(void)
+{
+ if (printk_ratelimit())
+ printk(KERN_WARNING "%s/%d: reached the limit of
+ pending signals.\n", current->comm, current->pid);
+}
+
+static int __init setup_print_reach_rlimit_sigpending(char *str)
+{
+ get_option(&str, &print_reach_rlimit_sigpending);
+
+ return 1;
+}
+
+__setup("print-reach-rlimit-sigpending=", setup_print_reach_rlimit_sigpending);
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
@@ -209,8 +227,12 @@ static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
atomic_inc(&user->sigpending);
if (override_rlimit ||
atomic_read(&user->sigpending) <=
- t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
+ t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) {
q = kmem_cache_alloc(sigqueue_cachep, flags);
+ } else {
+ if (print_reach_rlimit_sigpending)
+ show_reach_rlimit_sigpending();
+ }
if (unlikely(q == NULL)) {
atomic_dec(&user->sigpending);
free_uid(user);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0d949c5..93b2760 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,7 @@ static int deprecated_sysctl_warning(struct __sysctl_args *args);
/* External variables not in a header file. */
extern int C_A_D;
extern int print_fatal_signals;
+extern int print_reach_rlimit_sigpending;
extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
extern int sysctl_panic_on_oom;
@@ -467,6 +468,14 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "print-reach-rlimit-sigpending",
+ .data = &print_reach_rlimit_sigpending,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#ifdef CONFIG_SPARC
{
.ctl_name = KERN_SPARC_REBOOT,
-- 1.5.4.1

Ingo Molnar wrote:
> * Naohiro Ooiwa <[email protected]> wrote:
>
>> Hi Ingo, Roland,
>>
>> Now, I received a nice comment from OGAWA-san.
>> How is this impriment like a print_faital_signal().
>>
>> I think it's very nice.
>
> Agreed - i just wanted to suggest that too. print_fatal_signals is
> already a switch that turns on warnings about 'weird looking signal
> behavior'. print-on-overflow seems like a good match.
>
> Ingo

2009-10-26 11:38:28

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

* nooiwa <[email protected]> wrote:

> +++ b/kernel/sysctl.c
> @@ -67,6 +67,7 @@ static int deprecated_sysctl_warning(struct __sysctl_args *args);
> /* External variables not in a header file. */
> extern int C_A_D;
> extern int print_fatal_signals;
> +extern int print_reach_rlimit_sigpending;

Ooiwa-san, Roland, Andrew - what do you think about just making this
part of the existing print_fatal_signals flag, instead of adding a new
one?

Signal queue overflows are a 'fatal', signal-related condition as well -
we lose a signal in essence. The patch would be smaller as well.

Ingo

2009-10-26 16:38:15

by Roland McGrath

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

I have no opinion about the logging or how to enable it.

Thanks,
Roland

2009-10-26 16:39:13

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Ingo,

Now that you mention it, I think so, too.
I update my patch.

How is the following patch.
Could you please review it.

Thanks you.
Naohiro Ooiwa

Signed-off-by: Naohiro Ooiwa <[email protected]>
---
Documentation/kernel-parameters.txt | 9 ++++++++-
kernel/signal.c | 16 +++++++++++++---
2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt
b/Documentation/kernel-parameters.txt
index 9107b38..01c2723 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2032,8 +2032,15 @@ and is between 256 and 4096 characters. It is
defined in the file

print-fatal-signals=
[KNL] debug: print fatal signals
+ If you would like to know what the cause of a coredump
+ by signal number, if your working system may have
+ too many POSIX.1 timers, and when during the system
+ test,you may as well to enable this parameter.
print-fatal-signals=1: print segfault info to
- the kernel console.
+ the kernel console, and print caution that reached the
+ limit of pending signals to the kernel console.
+ When printed the caution messages, you can try
+ "ulimit -i unlimited".
default: off.

printk.time= Show timing data prefixed to each printk message line
diff --git a/kernel/signal.c b/kernel/signal.c
index 6705320..137112e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -188,6 +188,14 @@ int next_signal(struct sigpending *pending,
sigset_t *mask)
return sig;
}

+int print_fatal_signals;
+
+static void show_reach_rlimit_sigpending(void)
+{
+ if (printk_ratelimit())
+ printk(KERN_WARNING "%s/%d: reached the limit of pending signals.\n",
current->comm, current->pid);
+}
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current,
otherwise an
@@ -209,8 +217,12 @@ static struct sigqueue *__sigqueue_alloc(struct
task_struct *t, gfp_t flags,
atomic_inc(&user->sigpending);
if (override_rlimit ||
atomic_read(&user->sigpending) <=
- t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
+ t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) {
q = kmem_cache_alloc(sigqueue_cachep, flags);
+ } else {
+ if (print_fatal_signals)
+ show_reach_rlimit_sigpending();
+ }
if (unlikely(q == NULL)) {
atomic_dec(&user->sigpending);
free_uid(user);
@@ -925,8 +937,6 @@ static int send_signal(int sig, struct siginfo
*info, struct task_struct *t,
return __send_signal(sig, info, t, group, from_ancestor_ns);
}

-int print_fatal_signals;
-
static void print_fatal_signal(struct pt_regs *regs, int signr)
{
printk("%s/%d: potentially unexpected fatal signal %d.\n",
-- 1.5.4.1

Ingo Molnar wrote:
> * nooiwa <[email protected]> wrote:
>
>> +++ b/kernel/sysctl.c
>> @@ -67,6 +67,7 @@ static int deprecated_sysctl_warning(struct __sysctl_args *args);
>> /* External variables not in a header file. */
>> extern int C_A_D;
>> extern int print_fatal_signals;
>> +extern int print_reach_rlimit_sigpending;
>
> Ooiwa-san, Roland, Andrew - what do you think about just making this
> part of the existing print_fatal_signals flag, instead of adding a new
> one?
>
> Signal queue overflows are a 'fatal', signal-related condition as well -
> we lose a signal in essence. The patch would be smaller as well.
>
> Ingo

2009-10-26 20:28:35

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

* Naohiro Ooiwa <[email protected]> wrote:

> Hi Ingo,
>
> Now that you mention it, I think so, too.
> I update my patch.
>
> How is the following patch.
> Could you please review it.
>
> Thanks you.
> Naohiro Ooiwa
>
>
>
> Signed-off-by: Naohiro Ooiwa <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 9 ++++++++-
> kernel/signal.c | 16 +++++++++++++---
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt
> b/Documentation/kernel-parameters.txt
> index 9107b38..01c2723 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2032,8 +2032,15 @@ and is between 256 and 4096 characters. It is
> defined in the file
>
> print-fatal-signals=
> [KNL] debug: print fatal signals
> + If you would like to know what the cause of a coredump
> + by signal number, if your working system may have
> + too many POSIX.1 timers, and when during the system
> + test,you may as well to enable this parameter.
> print-fatal-signals=1: print segfault info to
> - the kernel console.
> + the kernel console, and print caution that reached the
> + limit of pending signals to the kernel console.
> + When printed the caution messages, you can try
> + "ulimit -i unlimited".
> default: off.
>

Here's a slightly improved version of the text:

print-fatal-signals=
[KNL] debug: print fatal signals

If enabled, warn about various signal handling
related application anomalies: too many signals,
too many POSIX.1 timers, fatal signals causing a
coredump - etc.

If you hit the warning due to signal overflow,
you might want to try "ulimit -i unlimited".

default: off.

> +int print_fatal_signals;

i'd suggest __read_mostly.

Plus please move variables to the top of the file. (i know this comes
from the previous code but we can improve it while we are touching it)

With these things addressed it looks good to me:

Acked-by: Ingo Molnar <[email protected]>

Ingo

2009-10-27 02:58:40

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Ingo,

> Here's a slightly improved version of the text:

Thank you for your review and collect my English.

>> +int print_fatal_signals;
>
> i'd suggest __read_mostly.

> Plus please move variables to the top of the file. (i know this comes
> from the previous code but we can improve it while we are touching it)

Of course. You're right, if we found one like it,
I want to improve the code little by little too.

How is the following patch.

Signed-off-by: Naohiro Ooiwa <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
---
Documentation/kernel-parameters.txt | 13 ++++++++++---
kernel/signal.c | 16 +++++++++++++---
2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9107b38..8492ad3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2032,9 +2032,16 @@ and is between 256 and 4096 characters. It is defined in the file

print-fatal-signals=
[KNL] debug: print fatal signals
- print-fatal-signals=1: print segfault info to
- the kernel console.
- default: off.
+
+ If enabled, warn about various signal handling
+ related application anomalies: too many signals,
+ too many POSIX.1 timers, fatal signals causing a
+ coredump - etc.
+
+ If you hit the warning due to signal overflow,
+ you might want to try "ulimit -i unlimited".
+
+ default: off.

printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
diff --git a/kernel/signal.c b/kernel/signal.c
index 6705320..c913eb7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -41,6 +41,8 @@

static struct kmem_cache *sigqueue_cachep;

+int print_fatal_signals __read_mostly;
+
static void __user *sig_handler(struct task_struct *t, int sig)
{
return t->sighand->action[sig - 1].sa.sa_handler;
@@ -188,6 +190,12 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
return sig;
}

+static void show_reach_rlimit_sigpending(void)
+{
+ if (printk_ratelimit())
+ printk(KERN_WARNING "%s/%d: reached the limit of pending signals.\n", current->comm, current->pid);
+}
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
@@ -209,8 +217,12 @@ static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
atomic_inc(&user->sigpending);
if (override_rlimit ||
atomic_read(&user->sigpending) <=
- t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
+ t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) {
q = kmem_cache_alloc(sigqueue_cachep, flags);
+ } else {
+ if (print_fatal_signals)
+ show_reach_rlimit_sigpending();
+ }
if (unlikely(q == NULL)) {
atomic_dec(&user->sigpending);
free_uid(user);
@@ -925,8 +937,6 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
return __send_signal(sig, info, t, group, from_ancestor_ns);
}

-int print_fatal_signals;
-
static void print_fatal_signal(struct pt_regs *regs, int signr)
{
printk("%s/%d: potentially unexpected fatal signal %d.\n",
-- 1.5.4.1

Ingo Molnar wrote:
> * Naohiro Ooiwa <[email protected]> wrote:
>
>> Hi Ingo,
>>
>> Now that you mention it, I think so, too.
>> I update my patch.
>>
>> How is the following patch.
>> Could you please review it.
>>
>> Thanks you.
>> Naohiro Ooiwa
>>
>>
>>
>> Signed-off-by: Naohiro Ooiwa <[email protected]>
>> ---
>> Documentation/kernel-parameters.txt | 9 ++++++++-
>> kernel/signal.c | 16 +++++++++++++---
>> 2 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/kernel-parameters.txt
>> b/Documentation/kernel-parameters.txt
>> index 9107b38..01c2723 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -2032,8 +2032,15 @@ and is between 256 and 4096 characters. It is
>> defined in the file
>>
>> print-fatal-signals=
>> [KNL] debug: print fatal signals
>> + If you would like to know what the cause of a coredump
>> + by signal number, if your working system may have
>> + too many POSIX.1 timers, and when during the system
>> + test,you may as well to enable this parameter.
>> print-fatal-signals=1: print segfault info to
>> - the kernel console.
>> + the kernel console, and print caution that reached the
>> + limit of pending signals to the kernel console.
>> + When printed the caution messages, you can try
>> + "ulimit -i unlimited".
>> default: off.
>>
>
> Here's a slightly improved version of the text:
>
> print-fatal-signals=
> [KNL] debug: print fatal signals
>
> If enabled, warn about various signal handling
> related application anomalies: too many signals,
> too many POSIX.1 timers, fatal signals causing a
> coredump - etc.
>
> If you hit the warning due to signal overflow,
> you might want to try "ulimit -i unlimited".
>
> default: off.
>
>> +int print_fatal_signals;
>
> i'd suggest __read_mostly.
>
> Plus please move variables to the top of the file. (i know this comes
> from the previous code but we can improve it while we are touching it)
>
> With these things addressed it looks good to me:
>
> Acked-by: Ingo Molnar <[email protected]>
>
> Ingo
>

2009-10-27 04:38:10

by Hiroshi Shimamoto

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Naohiro Ooiwa wrote:
> Hi Ingo,
>
>> Here's a slightly improved version of the text:
>
> Thank you for your review and collect my English.
>
>>> +int print_fatal_signals;
>> i'd suggest __read_mostly.
>
>> Plus please move variables to the top of the file. (i know this comes
>> from the previous code but we can improve it while we are touching it)
>
> Of course. You're right, if we found one like it,
> I want to improve the code little by little too.
>
> How is the following patch.

If you can provide a test case, it's helpful to confirm this improve.
And nitpicks below.

>
> Signed-off-by: Naohiro Ooiwa <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 13 ++++++++++---
> kernel/signal.c | 16 +++++++++++++---
> 2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 9107b38..8492ad3 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2032,9 +2032,16 @@ and is between 256 and 4096 characters. It is defined in the file
>
> print-fatal-signals=
> [KNL] debug: print fatal signals
> - print-fatal-signals=1: print segfault info to
> - the kernel console.
> - default: off.
> +
> + If enabled, warn about various signal handling
> + related application anomalies: too many signals,
> + too many POSIX.1 timers, fatal signals causing a
> + coredump - etc.
> +
> + If you hit the warning due to signal overflow,
> + you might want to try "ulimit -i unlimited".
> +
> + default: off.

there are white spaces before tabs.

>
> printk.time= Show timing data prefixed to each printk message line
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 6705320..c913eb7 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -41,6 +41,8 @@
>
> static struct kmem_cache *sigqueue_cachep;
>
> +int print_fatal_signals __read_mostly;
> +
> static void __user *sig_handler(struct task_struct *t, int sig)
> {
> return t->sighand->action[sig - 1].sa.sa_handler;
> @@ -188,6 +190,12 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
> return sig;
> }
>
> +static void show_reach_rlimit_sigpending(void)
> +{
> + if (printk_ratelimit())
> + printk(KERN_WARNING "%s/%d: reached the limit of pending signals.\n", current->comm, current->pid);

this line over 80 characters.

thanks,
Hiroshi

2009-10-27 08:27:35

by tip-bot for Naohiro Ooiwa

[permalink] [raw]

Subject: Re: [PATCH] show message when exceeded rlimit of pending signals

Hi Hiroshi

> And nitpicks below.

Thank you so much.
Sorry, I will run scripts/checkpatch.pl in the next time.

> there are white spaces before tabs.

oops...

>> + printk(KERN_WARNING "%s/%d: reached the limit of pending signals.\n",
current->comm, current->pid);
>
> this line over 80 characters.

I deleted "KERN_WARNING", because there is not it in print_faital_signal().

Thanks you.
Naohiro Ooiwa

Signed-off-by: Naohiro Ooiwa <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
---
Documentation/kernel-parameters.txt | 11 +++++++++--
kernel/signal.c | 17 ++++++++++++++---
2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/Documentation/kernel-parameters.txt
b/Documentation/kernel-parameters.txt
index 9107b38..6b74dd4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2032,8 +2032,15 @@ and is between 256 and 4096 characters. It is
defined in the file

print-fatal-signals=
[KNL] debug: print fatal signals
- print-fatal-signals=1: print segfault info to
- the kernel console.
+
+ If enabled, warn about various signal handling
+ related application anomalies: too many signals,
+ too many POSIX.1 timers, fatal signals causing a
+ coredump - etc.
+
+ If you hit the warning due to signal overflow,
+ you might want to try "ulimit -i unlimited".
+
default: off.

printk.time= Show timing data prefixed to each printk message line
diff --git a/kernel/signal.c b/kernel/signal.c
index 6705320..8588833 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -41,6 +41,8 @@

static struct kmem_cache *sigqueue_cachep;

+int print_fatal_signals __read_mostly;
+
static void __user *sig_handler(struct task_struct *t, int sig)
{
return t->sighand->action[sig - 1].sa.sa_handler;
@@ -188,6 +190,13 @@ int next_signal(struct sigpending *pending,
sigset_t *mask)
return sig;
}

+static void show_reach_rlimit_sigpending(void)
+{
+ if (printk_ratelimit())
+ printk("%s/%d: reached the limit of pending signals.\n",
+ current->comm, current->pid);
+}
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current,
otherwise an
@@ -209,8 +218,12 @@ static struct sigqueue *__sigqueue_alloc(struct
task_struct *t, gfp_t flags,
atomic_inc(&user->sigpending);
if (override_rlimit ||
atomic_read(&user->sigpending) <=
- t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
+ t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) {
q = kmem_cache_alloc(sigqueue_cachep, flags);
+ } else {
+ if (print_fatal_signals)
+ show_reach_rlimit_sigpending();
+ }
if (unlikely(q == NULL)) {
atomic_dec(&user->sigpending);
free_uid(user);
@@ -925,8 +938,6 @@ static int send_signal(int sig, struct siginfo
*info, struct task_struct *t,
return __send_signal(sig, info, t, group, from_ancestor_ns);
}

-int print_fatal_signals;
-
static void print_fatal_signal(struct pt_regs *regs, int signr)
{
printk("%s/%d: potentially unexpected fatal signal %d.\n",
-- 1.5.4.1

Hiroshi Shimamoto wrote:
> Naohiro Ooiwa wrote:
>> Hi Ingo,
>>
>>> Here's a slightly improved version of the text:
>> Thank you for your review and collect my English.
>>
>>>> +int print_fatal_signals;
>>> i'd suggest __read_mostly.
>>> Plus please move variables to the top of the file. (i know this comes
>>> from the previous code but we can improve it while we are touching it)
>> Of course. You're right, if we found one like it,
>> I want to improve the code little by little too.
>>
>> How is the following patch.
>
> If you can provide a test case, it's helpful to confirm this improve.
> And nitpicks below.
>
>> Signed-off-by: Naohiro Ooiwa <[email protected]>
>> Acked-by: Ingo Molnar <[email protected]>
>> ---
>> Documentation/kernel-parameters.txt | 13 ++++++++++---
>> kernel/signal.c | 16 +++++++++++++---
>> 2 files changed, 23 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>> index 9107b38..8492ad3 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -2032,9 +2032,16 @@ and is between 256 and 4096 characters. It is defined in the file
>>
>> print-fatal-signals=
>> [KNL] debug: print fatal signals
>> - print-fatal-signals=1: print segfault info to
>> - the kernel console.
>> - default: off.
>> +
>> + If enabled, warn about various signal handling
>> + related application anomalies: too many signals,
>> + too many POSIX.1 timers, fatal signals causing a
>> + coredump - etc.
>> +
>> + If you hit the warning due to signal overflow,
>> + you might want to try "ulimit -i unlimited".
>> +
>> + default: off.
>
> there are white spaces before tabs.
>
>> printk.time= Show timing data prefixed to each printk message line
>> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 6705320..c913eb7 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -41,6 +41,8 @@
>>
>> static struct kmem_cache *sigqueue_cachep;
>>
>> +int print_fatal_signals __read_mostly;
>> +
>> static void __user *sig_handler(struct task_struct *t, int sig)
>> {
>> return t->sighand->action[sig - 1].sa.sa_handler;
>> @@ -188,6 +190,12 @@ int next_signal(struct sigpending *pending, sigset_t *mask)
>> return sig;
>> }
>>
>> +static void show_reach_rlimit_sigpending(void)
>> +{
>> + if (printk_ratelimit())
>> + printk(KERN_WARNING "%s/%d: reached the limit of pending signals.\n", current->comm, current->pid);
>
> this line over 80 characters.
>
> thanks,
> Hiroshi