Subject: Re: Sleeping BUG in khugepaged for i586
To: Vlastimil Babka, Andrew Morton
Cc: LKML, linux-mm@kvack.org
References: <968ae9a9-5345-18ca-c7ce-d9beaf9f43b6@lwfinger.net> <20170605144401.5a7e62887b476f0732560fa0@linux-foundation.org>
From: Larry Finger
Message-ID: <1e883924-9766-4d2a-936c-7a49b337f9e2@lwfinger.net>
Date: Tue, 6 Jun 2017 10:01:12 -0500

On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
> On 06/05/2017 11:44 PM, Andrew Morton wrote:
>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger wrote:
>>
>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
>>> requires a 32-bit kernel. In the log I found the following:
>>>
>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>>> 1 lock held by khugepaged/20:
>>> #0: (&mm->mmap_sem){++++++}, at: []
>>> collapse_huge_page.isra.47+0x439/0x1240
>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G W
>
> W means there was a WARN earlier. Could be related... Got logs?

When I grabbed a splat, I got the last one in my log. The first one shows
"Not tainted".

>
>>> 4.12.0-rc1-wl-12125-g952a068 #80
>
> What is "wl-12125-g952a068"? What patches on top of mainline?

I found this while chasing a problem with one of the wireless drivers. For that
reason I use Kalle Valo's wireless-testing-next, which happens to be the only
kernel tree I have on this laptop. I'm reasonably certain that the extra updates
are not the cause of the problem as the first one appears before any of the
wireless drivers are loaded, but I will pull a clean copy of mainline to test
that assumption.

>>> Hardware name: Dell Computer Corporation Latitude D600
>>> /03U652, BIOS A05 05/29/2003
>>> Call Trace:
>>> dump_stack+0x76/0xb2
>>> ___might_sleep+0x174/0x230
>>> collapse_huge_page.isra.47+0xacf/0x1240
>>> khugepaged_scan_mm_slot+0x41e/0xc00
>>> ? _raw_spin_lock+0x46/0x50
>>> khugepaged+0x277/0x4f0
>>> ? prepare_to_wait_event+0xe0/0xe0
>>> kthread+0xeb/0x120
>>> ? khugepaged_scan_mm_slot+0xc00/0xc00
>>> ? kthread_create_on_node+0x30/0x30
>>> ret_from_fork+0x21/0x30
>>>
>>> I have no idea when this problem was introduced. Of course, I will test any
>>> proposed fixes.
>>>
>>
>> Odd. There's nothing wrong with cond_resched() while holding mmap_sem.
>> It looks like khugepaged forgot to do a spin_unlock somewhere and we
>> leaked a preempt_count.
>
> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> the debugging "locks held" message?

My bisection of the problem is about half done. My latest good version is commit
7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Larry