From: "Frantisek Rysanek"
To: linux-kernel@vger.kernel.org
Date: Fri, 14 Sep 2007 17:00:09 +0200
Subject: Re: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU
Message-ID: <46EABE19.27660.2EE1940D@localhost>
In-reply-to: <200709131030.55247.nickpiggin@yahoo.com.au>
References: <46E96951.10344.29AE6546@localhost>

Dear Mr. Piggin,

thanks for your response in the first place :-)

On 13 Sep 2007 at 2:30, Nick Piggin wrote:
>
> Can you see if it is looping in userspace or kernel? Can you kill -9
> the process?
>
I can't run any command. Any command hangs or coredumps.

> Are you able to test with the latest 2.6.23-rc kernel? If not (or if it
> still has the same problem), then can you get the output of sysrq+T
> and three sysrq+P calls, please? (this might help work out where in
> kernel it is spinning).
>
I've compiled 2.6.23-rc6, enabled serial console and captured the
output of sysrq+P (on the affected virtual VGA console) and sysrq+T.
http://www.fccps.cz/download/adv/frr/bonnie/2.6.23-rc6.txt

The interesting bit of information, related to the erratic "bash"
processes, is always a single line, such as:

bash R running 0 2358 1

I've also taken a photo of `top` running on another virtual console.
I can't get any data out of the affected box, as I can't run any
shell commands...
http://www.fccps.cz/download/adv/frr/bonnie/top.jpg

Note that there are rather few processes running in the user space.
Can't say if that makes any difference from a full-blown distro.
Maybe I could set up the bootable CD for download somewhere
(gzipped ISO of maybe 50 Megs).

In this scenario, Linux 2.6.16.18 once reported a soft lockup.
http://www.fccps.cz/download/adv/frr/bonnie/soft-lockup1.txt
Never again.

I also managed to catch the misbehavior in strace once, didn't get a
capture, but essentially it was stuck at a single open syscall,
I believe it was "waitpid(1, ". (Never managed that again, always got
segfaults instead of the loopy bash when trying to watch bash by
strace -p).

Exactly where does the context switch from user to kernel take place?
I know that I can call ioctl() from user space, and I can write
ioctl() handlers in kernel space as part of device drivers (the
handlers take place entirely in kernel space). The waitpid() thing is
a syscall, being entered only once from user space - and the bash
process seems to keep looping inside it.

Does the single "running" line in Alt+SysRq+T mean that the process
is looping in user space? Take a look at the CPU consumption %
numbers though...

Note that there's no OOM killer. (Seen that one before, under
different circumstances - when OCFS2 didn't like machines with less
than 1 GB RAM.)

My impression is that the erratic behavior could be a secondary
symptom of a kernel-space memory leak taking place somewhere else
than in the loopy code itself. Can't say if the leak takes place in
memory management or EXT3 for instance...
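
One possible way to approach the user-space-versus-kernel question above, as a rough sketch only: Linux accounts user-mode and kernel-mode CPU time per process separately, in the utime and stime fields (fields 14 and 15) of /proc/<pid>/stat. Sampling those two counters twice and comparing the growth hints at which side the process is spinning on. The function names below (parse_cpu_ticks, where_is_it_spinning, sample) are made up for illustration, and the whole thing assumes a watcher can still run on the box, which is not the case once the hang has set in; it would have to be started beforehand.

```python
import time


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the LAST ')' before tokenising the remainder.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def where_is_it_spinning(before, after):
    """Compare two (utime, stime) samples and name the dominant side."""
    user_delta = after[0] - before[0]
    kernel_delta = after[1] - before[1]
    if user_delta == 0 and kernel_delta == 0:
        return "idle"
    return "user" if user_delta >= kernel_delta else "kernel"


def sample(pid, interval=1.0):
    """Read /proc/<pid>/stat twice, `interval` seconds apart (Linux only)."""
    def read():
        with open(f"/proc/{pid}/stat") as f:
            return parse_cpu_ticks(f.read())

    first = read()
    time.sleep(interval)
    return where_is_it_spinning(first, read())
```

Calling sample(2358) against the PID of a looping bash would return "kernel" if nearly all of the CPU time is being charged inside syscalls, and "user" if the loop is around the syscall rather than inside it. This only reads accounting data; it does not say where in the kernel the time is spent, which is what the sysrq+P output is for.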
Frank Rysanek