2002-12-11 21:42:01

by Matt Simonsen

Subject: PS/Top broken - /proc entry bad

I had a box where ps and top quit working after hundreds of days of uptime.
After running strace on ps I found that one directory in /proc was hanging
it up: a directory named with a 5-digit number, which I believe
corresponded to the process with that PID.
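
A minimal sketch of how the stuck entry could be located without reading
strace output, assuming the hang shows up when reading /proc/<pid>/cmdline
(one of the files ps reads). The function names and the 2-second timeout are
hypothetical choices for illustration. Each read runs in a separate cat
helper so that the scanner itself does not block on the bad entry.

```python
import os
import subprocess
import time


def read_completes(path, timeout=2.0):
    """Return True if `cat path` exits within `timeout` seconds."""
    proc = subprocess.Popen(
        ["cat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            # The helper exited (successfully or not), so the read did not hang.
            return True
        time.sleep(0.05)
    # Ask the helper to die but do not wait for it: a reader stuck in
    # uninterruptible sleep may ignore SIGKILL until the kernel lets it go.
    proc.kill()
    return False


def find_stuck_pids(proc_root="/proc", timeout=2.0):
    """Return the PIDs whose cmdline file could not be read in time."""
    stuck = []
    for name in sorted(os.listdir(proc_root)):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        if not read_completes(os.path.join(proc_root, name, "cmdline"), timeout):
            stuck.append(int(name))
    return stuck
```

Calling find_stuck_pids() should return a list of the PIDs that hang a
reader. Note that every helper that gets stuck stays stuck as well, so this
is something to run once, not in a loop.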

I tried doing a kill -9 on the process; it returned fine but the process
was still there. Reboot hung my session too, so I had to use reboot -f to
get the machine healthy again.

Is there any way to "fix" /proc other than what I did? I suppose maybe
going into a lower runlevel and then back to 3 might have worked. It's a
remote machine, though, so a reboot seemed like the better solution at
the time.

Any comments/suggestions on what to do in this situation?

Thanks
Matt



2002-12-16 17:35:54

by Benjamin LaHaise

Subject: Re: PS/Top broken - /proc entry bad

Use SysRq-T to get a backtrace of the processes. Most likely one of
them hung while still holding the mm semaphore, thereby preventing ps
and top from proceeding. Check your log for oopsen.
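
On a remote box without console access, the same task dump can usually be
requested through /proc/sysrq-trigger instead of the keyboard. A minimal
sketch, assuming the kernel was built with magic SysRq support, that it
provides /proc/sysrq-trigger (not every kernel of this era does), and that
the script runs as root. The function name is hypothetical.

```python
def trigger_sysrq(key="t", trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to the trigger file."""
    if len(key) != 1:
        raise ValueError("SysRq command must be a single character")
    with open(trigger_path, "w") as f:
        # "t" asks the kernel to dump the state of every task to its log.
        f.write(key)
```

After trigger_sysrq("t"), the dump lands in the kernel log (dmesg or
syslog). The task to look at is the one shown in state D, and its backtrace
should indicate where it is blocked.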

-ben

On Wed, Dec 11, 2002 at 01:49:51PM -0800, Matt Simonsen wrote:
> I had a box where ps and top quit working after hundreds of days uptime.
> After doing an strace ps I found that one directory in /proc was hanging
> it up, a directory named a 5 digit number which I believe was
> associated with a process of the same name.
>
> I tried doing a kill -9 on the process, it returned fine but the process
> was still there. Reboot hung my session, too, I had to use reboot -f to
> get the machine healthy again.
>
> Is there any way to "fix" /proc other than what I did? I suppose maybe
> going into a lower init level and then back to 3 may have worked. It's a
> remote machine, though, so reboot seemed at the time like a better
> solution.
>
> Any comments/suggestions on what to do in this situation?
>
> Thanks
> Matt

--
"Do you seek knowledge in time travel?"