LinuxLists.cc - Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.

2008-02-07 00:56:32

Subject: Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.

On Wed, 6 Feb 2008 16:33:20 -0800 (PST)
[email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9906
>
> Summary: Weird hang with NPTL and SIGPROF.
> Product: Process Management
> Version: 2.5
> KernelVersion: 2.6.24-rc4
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Scheduler
> AssignedTo: [email protected]
> ReportedBy: [email protected]
>
>
> Latest working kernel version: None
> Earliest failing kernel version: 2.6.18
> Distribution: Ubuntu
> Hardware Environment: Any
> Problem Description:
> I have a testcase that demonstrates a strange hang of the latest kernel
> (as well as previous ones). In the process of investigating the NPTL,
> we wrote a test that just creates a bunch of threads, then does a
> barrier wait to synchronize them all, after which everybody exits.
> That's all it does.
>
> This works fine under most circumstances. Unfortunately, we also want
> to do profiling, so we catch SIGPROF and turn on ITIMER_PROF. In this
> case, at somewhere between 4000 and 4500 threads, and using the NPTL,
> the system hangs. It's not a hard hang, interrupts are still working
> and clocks are ticking, but nothing is making progress. It becomes
> noticeable when the softlockup_tick() warning goes off after the
> watchdog has been starved long enough.
>
> Sometimes the system recovers and gets going again. Other times it
> doesn't. I've examined the state of things several times with kdb and
> there's certainly nothing obvious going on. Something, perhaps having
> to do with the scheduler, is certainly getting into a bad state, but I
> haven't yet been able to figure out what that is. I've even run it with
> KFT and have seen nothing obvious there, either, except for the fact
> that when it hangs it becomes obvious that it stops making progress and
> it begins to fill up with smp_apic_timer_interrupt() and do_softirq()
> entries. I've also seen smp_apic_timer_interrupt() appear twice or more
> on the stack, as if the previous run(s) didn't finish before the next
> tick happened.
>
> Steps to reproduce:
>
> I'll attach a testcase shortly.
>

It's probably better to handle this one via email, so please send that
testcase via reply-to-all to this email, thanks.

2008-02-07 01:01:26

Roland, I'm very much having to read between the lines of what you've
written. And, obviously, getting it wrong at least half the time. :-)

So you've cleared part of my understanding with your latest email.
Here's what I've gotten from it:

struct task_cputime {
	cputime_t utime;			/* User time. */
	cputime_t stime;			/* System time. */
	unsigned long long sched_runtime;	/* Scheduler time. */
};

This is for both SMP and UP, defined before signal_struct in sched.h
(since that structure refers to this one). Following that:

struct thread_group_cputime;

Which is a forward reference to the real definition later in the file.
The inline functions depend on signal_struct and task_struct, so they
have to come after:

#ifdef SMP

struct thread_group_cputime {
	struct task_cputime *totals;
};

< ... inline functions ... >

#else /* SMP */

struct thread_group_cputime {
	struct task_cputime totals;
};

< ... inline functions ... >

#endif

The SMP version is percpu, the UP version is just a substructure. In
signal_struct itself, delete utime & stime, add
struct thread_group_cputime cputime;

The inline functions include the ones you defined for UP plus equivalent
ones for SMP. The SMP inlines check the percpu pointer
(sig->cputime.totals) and don't update if it's NULL. One small
correction to one of your inlines, in thread_group_cputime:
*cputime = sig->cputime;
should be
*cputime = sig->cputime.totals;
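
The reason for that correction is a type mismatch: the field in signal_struct is the wrapper struct thread_group_cputime, while the caller's buffer is a struct task_cputime. A reduced user-space sketch of the UP layout shows it. Here cputime_t is a stand-in typedef, and signal_model and thread_group_cputime_up() are hypothetical names used only for this illustration.

```c
#include <stdint.h>

typedef uint64_t cputime_t; /* stand-in for the kernel type */

struct task_cputime {
	cputime_t utime;
	cputime_t stime;
	unsigned long long sched_runtime;
};

/* UP layout: the totals are embedded directly in the wrapper. */
struct thread_group_cputime {
	struct task_cputime totals;
};

/* Hypothetical cut-down stand-in for signal_struct. */
struct signal_model {
	struct thread_group_cputime cputime;
};

/*
 * UP read path: copy the embedded totals out to the caller.
 * "*cputime = sig->cputime;" would not compile here, because the
 * left side is a task_cputime and the right side is the wrapper.
 */
static inline void thread_group_cputime_up(const struct signal_model *sig,
					   struct task_cputime *cputime)
{
	*cputime = sig->cputime.totals;
}
```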

A representative inline for SMP is:

static inline void account_group_system_time(struct task_struct *task,
					     cputime_t cputime)
{
	struct signal_struct *sig = task->signal;
	struct task_cputime *times;

	/* No per-cpu totals allocated yet: nothing to update. */
	if (!sig->cputime.totals)
		return;
	times = per_cpu_ptr(sig->cputime.totals, get_cpu());
	times->stime = cputime_add(times->stime, cputime);
	put_cpu_no_resched();
}

To deal with the need for bookkeeping with multiple threads in the SMP
case (where there isn't a per-cpu structure until it's needed), I'll
allocate the per-cpu structure in __exit_signal() where the relevant
fields are updated. I'll also allocate it where I do now, in
do_setitimer(), when needed. The allocation will be a "return 0" for UP
and a call to "thread_group_times_alloc_smp()" (which lives in sched.c)
for SMP.
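
As a rough illustration of the SMP scheme described above (totals that stay NULL until something needs them, accounting that is skipped while they are NULL, and a reader that sums the per-cpu slots), here is a user-space model. It is not kernel code: a plain calloc()'d array stands in for the percpu allocation, and MODEL_NR_CPUS, model_times_alloc(), model_account_stime() and model_sum() are made-up names.

```c
#include <stdlib.h>

#define MODEL_NR_CPUS 4 /* assumed CPU count for the model */

typedef unsigned long long cputime_t; /* stand-in for the kernel type */

struct task_cputime {
	cputime_t utime;
	cputime_t stime;
	unsigned long long sched_runtime;
};

/* SMP layout: a pointer that stays NULL until the totals are needed. */
struct thread_group_cputime {
	struct task_cputime *totals;
};

/* Allocate one slot per CPU on first use; returns 0 on success. */
int model_times_alloc(struct thread_group_cputime *tg)
{
	if (tg->totals)
		return 0;
	tg->totals = calloc(MODEL_NR_CPUS, sizeof(*tg->totals));
	return tg->totals ? 0 : -1;
}

/* Charge system time to the slot of the CPU the caller is "running" on. */
void model_account_stime(struct thread_group_cputime *tg, int cpu,
			 cputime_t delta)
{
	if (!tg->totals) /* nobody has asked for group totals yet */
		return;
	tg->totals[cpu].stime += delta;
}

/* Sum all per-cpu slots into one thread-group total. */
struct task_cputime model_sum(const struct thread_group_cputime *tg)
{
	struct task_cputime sum = { 0, 0, 0 };
	int cpu;

	if (!tg->totals)
		return sum;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		sum.utime += tg->totals[cpu].utime;
		sum.stime += tg->totals[cpu].stime;
		sum.sched_runtime += tg->totals[cpu].sched_runtime;
	}
	return sum;
}
```

In this model, time charged before model_times_alloc() is simply dropped, which is why the allocation has to happen at every point where the group totals start to matter.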

I'll also optimize run_posix_cpu_timers() as you suggest, and eliminate
rlim_expires.

Expect a new patch fairly soon.
--
Frank Mayhar <[email protected]>
Google, Inc.