This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN, it is useful when debugging to know whether
or not the system has been in a softlockup state.
Signed-off-by: Josh Hunt <[email protected]>
---
Documentation/oops-tracing.txt | 2 ++
Documentation/sysctl/kernel.txt | 1 +
include/linux/kernel.h | 1 +
kernel/panic.c | 1 +
kernel/watchdog.c | 1 +
5 files changed, 6 insertions(+)
diff --git a/Documentation/oops-tracing.txt b/Documentation/oops-tracing.txt
index e315599..beefb9f 100644
--- a/Documentation/oops-tracing.txt
+++ b/Documentation/oops-tracing.txt
@@ -268,6 +268,8 @@ characters, each representing a particular tainted value.
14: 'E' if an unsigned module has been loaded in a kernel supporting
module signature.
+ 15: 'L' if a soft lockup has previously occurred on the system.
+
The primary reason for the 'Tainted: ' string is to tell kernel
debuggers if this is a clean kernel or if anything unusual has
occurred. Tainting is permanent: even if an offending module is
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9886c3d..8dfdf2f 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -788,6 +788,7 @@ can be ORed together:
4096 - An out-of-tree module has been loaded.
8192 - An unsigned module has been loaded in a kernel supporting module
signature.
+16384 - A soft lockup has previously occurred on the system.
==============================================================
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4c52907..eb7b074 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -470,6 +470,7 @@ extern enum system_states {
#define TAINT_FIRMWARE_WORKAROUND 11
#define TAINT_OOT_MODULE 12
#define TAINT_UNSIGNED_MODULE 13
+#define TAINT_SOFTLOCKUP 14
extern const char hex_asc[];
#define hex_asc_lo(x) hex_asc[((x) & 0x0f)]
diff --git a/kernel/panic.c b/kernel/panic.c
index d02fa9f..d68c5d8 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -212,6 +212,7 @@ static const struct tnt tnts[] = {
{ TAINT_FIRMWARE_WORKAROUND, 'I', ' ' },
{ TAINT_OOT_MODULE, 'O', ' ' },
{ TAINT_UNSIGNED_MODULE, 'E', ' ' },
+ { TAINT_SOFTLOCKUP, 'L', ' ' },
};
/**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 516203e..09ac67c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
if (softlockup_panic)
panic("softlockup: hung tasks");
+ add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
__this_cpu_write(soft_watchdog_warn, true);
} else
__this_cpu_write(soft_watchdog_warn, false);
--
1.7.9.5
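For completeness, here is a minimal userspace sketch (not part of the patch)
of how the new bit could be checked via /proc/sys/kernel/tainted; the bit
position 14 / value 16384 comes from the documentation hunks above.

/*
 * Hypothetical example, not from the patch: report whether the
 * softlockup taint bit (bit 14, value 16384) has been set.
 */
#include <stdio.h>

int main(void)
{
	unsigned long tainted = 0;
	FILE *f = fopen("/proc/sys/kernel/tainted", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fscanf(f, "%lu", &tainted) != 1)
		tainted = 0;
	fclose(f);

	printf("softlockup taint (L): %s\n",
	       (tainted & (1UL << 14)) ? "set" : "not set");
	return 0;
}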
On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <[email protected]> wrote:
> This taint flag will be set if the system has ever entered a softlockup
> state. Similar to TAINT_WARN it is useful to know whether or not the system
> has been in a softlockup state when debugging.
>
> ...
>
> @@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>
> if (softlockup_panic)
> panic("softlockup: hung tasks");
> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> __this_cpu_write(soft_watchdog_warn, true);
> } else
> __this_cpu_write(soft_watchdog_warn, false);
Would make more sense to have applied the taint *before* calling
panic()?
diff -puN kernel/watchdog.c~panic-add-taint_softlockup-fix kernel/watchdog.c
--- a/kernel/watchdog.c~panic-add-taint_softlockup-fix
+++ a/kernel/watchdog.c
@@ -368,9 +368,9 @@ static enum hrtimer_restart watchdog_tim
smp_mb__after_atomic();
}
+ add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
if (softlockup_panic)
panic("softlockup: hung tasks");
- add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
__this_cpu_write(soft_watchdog_warn, true);
} else
__this_cpu_write(soft_watchdog_warn, false);
On Mon, 23 Jun 2014 17:45:00 -0500 Josh Hunt <[email protected]> wrote:
> On 06/23/2014 05:11 PM, Andrew Morton wrote:
> > On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <[email protected]> wrote:
> >
> >> This taint flag will be set if the system has ever entered a softlockup
> >> state. Similar to TAINT_WARN it is useful to know whether or not the system
> >> has been in a softlockup state when debugging.
> >>
> >> ...
> >>
> >> @@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>
> >> if (softlockup_panic)
> >> panic("softlockup: hung tasks");
> >> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> >> __this_cpu_write(soft_watchdog_warn, true);
> >> } else
> >> __this_cpu_write(soft_watchdog_warn, false);
> >
> > Would make more sense to have applied the taint *before* calling
> > panic()?
>
> Andrew
>
> Yep, that's a good call. Thanks. Do you want me to send a v2 or did you
> take care of it?
I fixed it up.
> In addition to adding the softlockup taint flag, do you think it'd be
> reasonable to add another flag for page allocation failures? I think
> it'd be nice to be able to account for these conditions somehow without
> having to parse dmesg, etc. As with the softlockup flag, it's helpful to
> know if your system had encountered a page allocation failure at some
> point before the crash or whatever you're debugging.
I don't know, really. Allocation failures are often an expected thing
as drivers try to work out how much memory they can allocate. Those
things can be screened out by testing __GFP_NOWARN. GFP_ATOMIC
failures should probably be ignored, except for when they shouldn't.
But even then, allocation failures are somewhat common. And recency is
a concern: an allocation failure 10 minutes ago is unlikely to be
relevant.
But that's just me waving hands around. I'd be interested to hear from
people whose kernels crash more often than mine, and from those whose
job is to support them (ie distro people?).
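A rough sketch of the __GFP_NOWARN/GFP_ATOMIC screening described above,
assuming a hypothetical TAINT_ALLOC_FAIL bit (which does not exist);
illustration only, not code from this thread:

/*
 * Hypothetical sketch: screen out "expected" allocation failures
 * before tainting. TAINT_ALLOC_FAIL is an assumed taint bit used
 * purely for illustration; it is not defined anywhere.
 */
static void maybe_taint_alloc_failure(gfp_t gfp_mask)
{
	/* Callers passing __GFP_NOWARN expect failure; ignore them. */
	if (gfp_mask & __GFP_NOWARN)
		return;

	/* Atomic allocations (no __GFP_WAIT) fail routinely; skip too. */
	if (!(gfp_mask & __GFP_WAIT))
		return;

	add_taint(TAINT_ALLOC_FAIL, LOCKDEP_STILL_OK);
}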
On 06/23/2014 05:11 PM, Andrew Morton wrote:
> On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <[email protected]> wrote:
>
>> This taint flag will be set if the system has ever entered a softlockup
>> state. Similar to TAINT_WARN it is useful to know whether or not the system
>> has been in a softlockup state when debugging.
>>
>> ...
>>
>> @@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>
>> if (softlockup_panic)
>> panic("softlockup: hung tasks");
>> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>> __this_cpu_write(soft_watchdog_warn, true);
>> } else
>> __this_cpu_write(soft_watchdog_warn, false);
>
> Would make more sense to have applied the taint *before* calling
> panic()?
Andrew
Yep, that's a good call. Thanks. Do you want me to send a v2 or did you
take care of it?
In addition to adding the softlockup taint flag, do you think it'd be
reasonable to add another flag for page allocation failures? I think
it'd be nice to be able to account for these conditions somehow without
having to parse dmesg, etc. As with the softlockup flag, it's helpful to
know if your system had encountered a page allocation failure at some
point before the crash or whatever you're debugging.
Thanks
Josh
On 06/23/2014 05:51 PM, Andrew Morton wrote:
> On Mon, 23 Jun 2014 17:45:00 -0500 Josh Hunt <[email protected]> wrote:
>
>> On 06/23/2014 05:11 PM, Andrew Morton wrote:
>>> On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <[email protected]> wrote:
>>>
>>>> This taint flag will be set if the system has ever entered a softlockup
>>>> state. Similar to TAINT_WARN it is useful to know whether or not the system
>>>> has been in a softlockup state when debugging.
>>>>
>>>> ...
>>>>
>>>> @@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>>
>>>> if (softlockup_panic)
>>>> panic("softlockup: hung tasks");
>>>> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>>>> __this_cpu_write(soft_watchdog_warn, true);
>>>> } else
>>>> __this_cpu_write(soft_watchdog_warn, false);
>>>
>>> Would make more sense to have applied the taint *before* calling
>>> panic()?
>>
>> Andrew
>>
>> Yep, that's a good call. Thanks. Do you want me to send a v2 or did you
>> take care of it?
>
> I fixed it up.
>
>> In addition to adding the softlockup taint flag, do you think it'd be
>> reasonable to add another flag for page allocation failures? I think
>> it'd be nice to be able to account for these conditions somehow without
>> having to parse dmesg, etc. As with the softlockup flag, it's helpful to
>> know if your system had encountered a page allocation failure at some
>> point before the crash or whatever you're debugging.
>
> I don't know, really. Allocation failures are often an expected thing
> as drivers try to work out how much memory they can allocate. Those
> things can be screened out by testing __GFP_NOWARN. GFP_ATOMIC
> failures should probably be ignored, except for when they shouldn't.
> But even then, allocation failures are somewhat common. And recency is
> a concern: an allocation failure 10 minutes ago is unlikely to be
> relevant.
>
> But that's just me waving hands around. I'd be interested to hear from
> people whose kernels crash more often than mine, and from those whose
> job is to support them (ie distro people?).
>
Anyone you'd suggest adding to this thread to get other feedback about
tracking page allocation failures? I could also spin up a patch and cc them.
Thanks
Josh
On Tue, Jun 24, 2014 at 09:22:20AM -0500, Josh Hunt wrote:
> >> In addition to adding the softlockup taint flag, do you think it'd be
> >> reasonable to add another flag for page allocation failures? I think
> >> it'd be nice to be able to account for these conditions somehow without
> >> having to parse dmesg, etc. As with the softlockup flag, it's helpful to
> >> know if your system had encountered a page allocation failure at some
> >> point before the crash or whatever you're debugging.
> >
> > I don't know, really. Allocation failures are often an expected thing
> > as drivers try to work out how much memory they can allocate. Those
> > things can be screened out by testing __GFP_NOWARN. GFP_ATOMIC
> > failures should probably be ignored, except for when they shouldn't.
> > But even then, allocation failures are somewhat common. And recency is
> > a concern: an allocation failure 10 minutes ago is unlikely to be
> > relevant.
> >
> > But that's just me waving hands around. I'd be interested to hear from
> > people whose kernels crash more often than mine, and from those whose
> > job is to support them (ie distro people?).
> >
>
> Anyone you'd suggest adding to this thread to get other feedback about
> tracking page allocation failures? I could also spin up a patch and cc them.
For things like the fuzz test runs I do, I'd have to patch this out.
Things like migrate_pages() with bad arguments will trigger a page
allocation failure rather easily. Likewise set_mempolicy(), and a
handful of other vm syscalls.
There's also the case of "too fragmented to satisfy contiguous multi-page
allocation" which I walk into from time to time (when the kernel manages
to survive a fuzz run long enough, which isn't that often).
Dave
On Tue, Jun 24, 2014 at 11:06:15AM -0400, Dave Jones wrote:
>
> For things like the fuzz test runs I do, I'd have to patch this out.
>
> Things like migrate_pages() with bad arguments will trigger a page
> allocation failure rather easily. Likewise set_mempolicy(), and a
> handful of other vm syscalls.
I grepped logs for the last week. There are also a lot of non-obvious
causes of page alloc failures (possibly because free memory had been
filled with dirty huge pages), but here we can't even successfully
alloc a single order-0 page.
Traces from separate runs, on various kernels from the last week.
page allocation failure: order:0, mode:0x280da
dump_stack+0x4e/0x7a
warn_alloc_failed+0xff/0x170
__alloc_pages_nodemask+0x78e/0xc90
alloc_pages_vma+0xaf/0x1c0
handle_mm_fault+0xa31/0xc50
? default_wake_function+0x12/0x20
__do_page_fault+0x1c9/0x630
? __acct_update_integrals+0x8b/0x120
? preempt_count_sub+0xab/0x100
trace_do_page_fault+0x3d/0x130
trace_page_fault+0x22/0x30
page allocation failure: order:0, mode:0x200da
dump_stack+0x4e/0x7a
warn_alloc_failed+0xff/0x170
__alloc_pages_nodemask+0x78e/0xc90
alloc_pages_vma+0xaf/0x1c0
read_swap_cache_async+0x123/0x220
swapin_readahead+0x106/0x1d0
handle_mm_fault+0x9d5/0xc50
? default_wake_function+0x12/0x20
? autoremove_wake_function+0x2b/0x40
__do_page_fault+0x1c9/0x630
? __wake_up+0x44/0x50
? __acct_update_integrals+0x8b/0x120
? preempt_count_sub+0xab/0x100
trace_do_page_fault+0x3d/0x130
trace_page_fault+0x22/0x30
page allocation failure: order:0, mode:0x280da
dump_stack+0x4e/0x7a
warn_alloc_failed+0xff/0x170
__alloc_pages_nodemask+0x78e/0xc90
alloc_pages_vma+0xaf/0x1c0
handle_mm_fault+0xa31/0xc50
? follow_page_mask+0x1f0/0x320
__get_user_pages+0x22b/0x660
? kmem_cache_alloc+0x183/0x210
__mlock_vma_pages_range+0x9e/0xd0
__mm_populate+0xca/0x180
vm_mmap_pgoff+0xd3/0xe0
SyS_mmap_pgoff+0x116/0x2c0
? syscall_trace_enter+0x14d/0x2a0
SyS_mmap+0x22/0x30
tracesys+0xdd/0xe2
page allocation failure: order:0, mode:0x2084d0
dump_stack+0x4e/0x7a
warn_alloc_failed+0xff/0x170
__alloc_pages_nodemask+0x78e/0xc90
alloc_pages_current+0xb1/0x160
pte_alloc_one+0x17/0x90
__pte_alloc+0x27/0x150
handle_mm_fault+0x68d/0xc50
? follow_page_mask+0xcb/0x320
__get_user_pages+0x22b/0x660
? kmem_cache_alloc+0x183/0x210
__mlock_vma_pages_range+0x9e/0xd0
__mm_populate+0xca/0x180
vm_mmap_pgoff+0xd3/0xe0
SyS_mmap_pgoff+0x116/0x2c0
? syscall_trace_enter+0x14d/0x2a0
SyS_mmap+0x22/0x30
tracesys+0xdd/0xe2
On Tue, 24 Jun 2014, Josh Hunt wrote:
> Anyone you'd suggest adding to this thread to get other feedback about
> tracking page allocation failures? I could also spin up a patch and cc them.
>
Page allocation failures happen all the time, mostly because of
large-order allocations (more than PAGE_ALLOC_COSTLY_ORDER) or allocations
done with GFP_ATOMIC where it's impossible to reclaim or compact memory to
allocate. Because of this, they are fairly easy to trigger from userspace
without having to do much.
Why would this qualify for a taint? I have never debugged a kernel crash
that I traced back to an earlier page allocation failure and said "oh, if
I had only known about that page allocation failure earlier!". If one of
them is going to cause an issue, it probably is at the point of the crash
and you shouldn't have to "investigate" much.
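To illustrate the kind of allocation being described (a hypothetical
example, not code from this thread): an opportunistic high-order atomic
allocation with a fallback path fails routinely on a fragmented machine
and is not a bug.

/*
 * Hypothetical illustration: a high-order (> PAGE_ALLOC_COSTLY_ORDER)
 * GFP_ATOMIC allocation that the caller can fall back from, so its
 * failure is routine rather than something worth tainting over.
 */
static void *grab_big_buffer(unsigned int order)
{
	struct page *page;

	page = alloc_pages(GFP_ATOMIC | __GFP_NOWARN, order);
	if (!page)
		return NULL;	/* caller falls back to order-0 pages */

	return page_address(page);
}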
I second David: there is no reason for this. If there is a bug due to a
page allocation failure, it will probably show up at the point of the
kernel panic or in the trace statements at panic.
Cheers Nick
On Tue, Jun 24, 2014 at 10:19 PM, Nick Krause <[email protected]> wrote:
> I second David, this is no reason for this if there is a bug due to page
> allocation
> failure it will be probably at the point of kernel panic or in the trace
> statements
> at panic.
> Cheers Nick
>
>
> On Tue, Jun 24, 2014 at 8:45 PM, David Rientjes <[email protected]> wrote:
>>
>> On Tue, 24 Jun 2014, Josh Hunt wrote:
>>
>> > Anyone you'd suggest adding to this thread to get other feedback about
>> > tracking page allocation failures? I could also spin up a patch and cc
>> > them.
>> >
>>
>> Page allocation failures happen all the time, mostly because of
>> large-order allocations (more than PAGE_ALLOC_COSTLY_ORDER) or allocations
>> done with GFP_ATOMIC where it's impossible to reclaim or compact memory to
>> allocate. Because of this, they are fairly easy to trigger from userspace
>> without having to do much.
>>
>> Why would this qualify for a taint? I have never debugged a kernel crash
>> that I traced back to an earlier page allocation failure and said "oh, if
>> I had only known about that page allocation failure earlier!". If one of
>> them is going to cause an issue, it probably is at the point of the crash
>> and you shouldn't have to "investigate" much.
>
>
On 06/24/2014 07:45 PM, David Rientjes wrote:
> On Tue, 24 Jun 2014, Josh Hunt wrote:
>
>> Anyone you'd suggest adding to this thread to get other feedback about
>> tracking page allocation failures? I could also spin up a patch and cc them.
>>
>
> Page allocation failures happen all the time, mostly because of
> large-order allocations (more than PAGE_ALLOC_COSTLY_ORDER) or allocations
> done with GFP_ATOMIC where it's impossible to reclaim or compact memory to
> allocate. Because of this, they are fairly easy to trigger from userspace
> without having to do much.
>
> Why would this qualify for a taint? I have never debugged a kernel crash
> that I traced back to an earlier page allocation failure and said "oh, if
> I had only known about that page allocation failure earlier!". If one of
> them is going to cause an issue, it probably is at the point of the crash
> and you shouldn't have to "investigate" much.
>
I guess I was thinking more of the case where all you have is the
trace/dump and for whatever reason the last bits which may contain the
page allocation failure info didn't get flushed to disk. In that case
it'd be nice to know what led up to the crash. However, I do agree with
your point and Andrew's about the frequency and ease of triggering them
which would make taint the wrong place to account for them.
Thanks
Josh
If you want to flush the RAM issues back to disk,
that may be a good idea; otherwise I would just
close this discussion.
Cheers Nick
On Tue, Jun 24, 2014 at 11:24 PM, Josh Hunt <[email protected]> wrote:
> On 06/24/2014 07:45 PM, David Rientjes wrote:
>>
>> On Tue, 24 Jun 2014, Josh Hunt wrote:
>>
>>> Anyone you'd suggest adding to this thread to get other feedback about
>>> tracking page allocation failures? I could also spin up a patch and cc
>>> them.
>>>
>>
>> Page allocation failures happen all the time, mostly because of
>> large-order allocations (more than PAGE_ALLOC_COSTLY_ORDER) or allocations
>> done with GFP_ATOMIC where it's impossible to reclaim or compact memory to
>> allocate. Because of this, they are fairly easy to trigger from userspace
>> without having to do much.
>>
>> Why would this qualify for a taint? I have never debugged a kernel crash
>> that I traced back to an earlier page allocation failure and said "oh, if
>> I had only known about that page allocation failure earlier!". If one of
>> them is going to cause an issue, it probably is at the point of the crash
>> and you shouldn't have to "investigate" much.
>>
>
> I guess I was thinking more of the case where all you have is the trace/dump
> and for whatever reason the last bits which may contain the page allocation
> failure info didn't get flushed to disk. In that case it'd be nice to know
> what lead up to the crash. However, I do agree with your point and Andrew's
> about the frequency and ease of triggering them which would make taint the
> wrong place to account for them.
>
> Thanks
> Josh
>
>
>