2012-02-02 08:36:54

by Michael Tokarev

Subject: Re: 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)

On 31.01.2012 23:04, Professor Berkley Shands wrote:
> Very strange. Now boxes with NO OFED, no Intel 10 GigE and no special drivers are locking up.
> If your kernel is Red Hat compatible, could you please send me a copy of the .config so I can try to
> isolate this more?

Why are you writing to me personally? I just
tried your code and can't find the problem you
see, and I replied to the list. Cc'ing to the
list now.

I don't understand what "Red Hat compatible" means.
I use a kernel from kernel.org, currently at version
3.0.18.

Did you try my small "reproducer" - does it lock your
machines too? I provided complete code which is
compilable and runnable, unlike your version which
lacked some context.

Thanks,

/mjt


2012-02-02 22:09:42

by Professor Berkley Shands

Subject: Re: 3.0.18 tcsetattr on fd 0 when detached freezes system (RCU timeouts) (Centos 6.1 x86_64)

I built my .config from the Red Hat .config provided with 2.6.32-131 using
make oldconfig. That failed miserably. I then used one based on 2.6.39.4,
which actually booted, but I get these lockup errors, RCU timeouts, ...

The system died right away on the tcsetattr() (which also did not
return any error), and my simple test case crashed every time. That looked
rather suspicious... Now, after a week, *ALL* my 3.0.18 boxes lock up (unless
they sit IDLE, any load eventually causes the system to stop scheduling).
That covers 32-core 6282s, 3.46 GHz Nehalems, 2.3 GHz 2374s...
I have to assume the tcsetattr() is an artifact at this point.
Without building all the kernels between 2.6.32.55 and 3.0.18, I
needed a starting point for a .config that works. Usually it is something
unnoticed that needed to be updated and that make oldconfig didn't point out.
Things do change that I can't keep current on. :-)
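
For readers who want to check how tcsetattr() behaves on a given descriptor, here is a minimal sketch. It is not the reproducer or the test case discussed in this thread, and it makes no attempt to detach from the terminal. It uses Python's termios module, which wraps the C tcgetattr()/tcsetattr() calls; the helper name reapply_termios is made up for illustration. On a descriptor that is not a terminal, the underlying call is expected to fail with ENOTTY rather than succeed silently.

```python
import os
import termios


def reapply_termios(fd=0):
    """Re-apply the current terminal settings on fd and report any failure.

    Hypothetical helper: returns None on success, or a short error string
    if fd is not a terminal or one of the termios calls fails.
    """
    if not os.isatty(fd):
        return "fd %d is not a terminal" % fd
    try:
        attrs = termios.tcgetattr(fd)                  # wraps tcgetattr(3)
        termios.tcsetattr(fd, termios.TCSANOW, attrs)  # wraps tcsetattr(3)
    except termios.error as exc:
        return "termios call failed: %s" % exc
    return None


if __name__ == "__main__":
    print(reapply_termios(0) or "tcsetattr on fd 0 succeeded")
```

Running it once from an interactive shell and once with stdin redirected (for example from /dev/null) shows the two outcomes side by side.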

So it appears that it has to be my configuration. Hence the request for
a .config I can compare against to see what is wrong / misconfigured /
not configured etc.
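
A comparison like that can be sketched in a few lines. The following is only an illustration, assuming the usual .config layout of CONFIG_FOO=value lines and "# CONFIG_FOO is not set" comments; the function names parse_kconfig and diff_kconfig are invented here. The kernel source tree also ships scripts/diffconfig, which serves the same purpose.

```python
import sys


def parse_kconfig(text):
    """Map each CONFIG_ symbol in .config text to its value ("n" if unset)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO explicitly disabled
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(old_text, new_text):
    """Return sorted (symbol, old, new) tuples for differing options.

    A value of None means the symbol does not appear in that file at all.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    return [(name, old.get(name), new.get(name))
            for name in sorted(set(old) | set(new))
            if old.get(name) != new.get(name)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diffcfg.py known-good.config suspect.config
    with open(sys.argv[1]) as good, open(sys.argv[2]) as suspect:
        for name, old, new in diff_kconfig(good.read(), suspect.read()):
            print(name, old, "->", new)
```

Symbols that exist in only one file show up with None on the other side, which is often where options silently skipped by make oldconfig become visible.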

Berkley