Subject: Re: Sleeping BUG in khugepaged for i586
From: Larry Finger
To: David Rientjes, Vlastimil Babka
Cc: Andrew Morton, LKML, linux-mm@kvack.org
Date: Thu, 8 Jun 2017 10:29:38 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/07/2017 03:56 PM, David Rientjes wrote:
> On Wed, 7 Jun 2017, Vlastimil Babka wrote:
>
>>>> Hmm I'd expect such spin lock to be reported together with mmap_sem in
>>>> the debugging "locks held" message?
>>>
>>> My bisection of the problem is about half done. My latest good version is commit
>>> 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.
>>
>> Hmm, your bisection will most likely just find commit 338a16ba15495
>> which added the cond_resched() at mm/khugepaged.c:655. CCing David who
>> added it.
>>
>
> I agree it's probably going to bisect to 338a16ba15495 since it's the
> cond_resched() at the line number reported, but I think there must be
> something else going on. I think the list of locks held by khugepaged is
> correct because it matches with the implementation. The preempt_count(),
> as suggested by Andrew, does not. If this is reproducible, I'd like to
> know what preempt_count() is.
>

The BUG output is reproducible. By the time the box finishes booting, there are at least 2 of them logged. My bisection shows that commit 338a16ba15495 is the bad one.

I added a pr_info() to output the value of preempt_count() just before the cond_resched() statement. The count was always 1 whether the BUG was triggered or not.

If there are other things you would like logged at that point, or any other diagnostics, please let me know.

Larry