Date: Sun, 27 Jan 2019 14:09:36 +0000
From: Mel Gorman
To: valdis.kletnieks@vt.edu
Cc: Pavel Machek, kernel list, Andrew Morton, vbabka@suse.cz, aarcange@redhat.com, rientjes@google.com, mhocko@kernel.org, zi.yan@cs.rutgers.edu, hannes@cmpxchg.org, Jan Kara
Subject: Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?
Message-ID: <20190127133132.GA9565@techsingularity.net>
References: <20190126200005.GB27513@amd> <12171.1548557813@turing-police.cc.vt.edu>
In-Reply-To: <12171.1548557813@turing-police.cc.vt.edu>

Adding Jan Kara to cc because the lockup appears to be within
buffer_migrate_page_norefs, which changed recently.

On Sat, Jan 26, 2019 at 09:56:53PM -0500, valdis.kletnieks@vt.edu wrote:
> On Sat, 26 Jan 2019 21:00:05 +0100, Pavel Machek said:
>
> > top - 13:38:51 up 1:42, 16 users, load average: 1.41, 1.93, 1.62
> > Tasks: 182 total, 3 running, 138 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 2.3 us, 57.8 sy, 0.0 ni, 39.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > KiB Mem: 3020044 total, 2429420 used, 590624 free, 27468 buffers
> > KiB Swap: 2097148 total, 0 used, 2097148 free. 1924268 cached Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >   608 root      20   0       0      0      0 R  99.6  0.0  11:34.38 kcompactd0
> >  9782 root      20   0       0      0      0 I   7.9  0.0   0:59.02 kworker/0:
> >  2971 root      20   0   46624  23076  13576 S   4.3  0.8   2:50.22 Xorg
>
> I've noticed this as well on earlier kernels (next-20181224 to 20190115)
>
> Some more info:
>
> 1) echo 3 > /proc/sys/vm/drop_caches unwedges kcompactd in 1-3 seconds.
>
> 2) Typical kcompactd traceback:
>
> cat /proc/27/stack
> [<0>] retint_kernel+0x1b/0x2d
> [<0>] lock_is_held_type+0x1b/0x50
> [<0>] ___might_sleep+0xad/0x220
> [<0>] __might_sleep+0x113/0x130
> [<0>] on_each_cpu_cond_mask+0x12a/0x140
> [<0>] on_each_cpu_cond+0x18/0x20
> [<0>] invalidate_bh_lrus+0x29/0x30
> [<0>] __buffer_migrate_page+0x154/0x340
> [<0>] buffer_migrate_page_norefs+0x14/0x20
> [<0>] move_to_new_page+0x8e/0x360
> [<0>] migrate_pages+0x3cc/0xfd8
> [<0>] compact_zone+0xb70/0x1380
> [<0>] kcompactd_do_work+0x15b/0x500
> [<0>] kcompactd+0x74/0x340
> [<0>] kthread+0x158/0x170
> [<0>] ret_from_fork+0x3a/0x50
> [<0>] 0xffffffffffffffff
>
> I've also seen khugepaged hung up:
>
> cat /proc/29/stack
> [<0>] ___preempt_schedule+0x16/0x18
> [<0>] page_vma_mapped_walk+0x60/0x840
> [<0>] remove_migration_pte+0x67/0x390
> [<0>] rmap_walk_file+0x186/0x380
> [<0>] rmap_walk+0xa3/0xd0
> [<0>] remove_migration_ptes+0x69/0x70
> [<0>] migrate_pages+0xb6d/0xfd8
> [<0>] compact_zone+0xb70/0x1370
> [<0>] compact_zone_order+0xd8/0x120
> [<0>] try_to_compact_pages+0xe5/0x550
> [<0>] __alloc_pages_direct_compact+0x6d/0x1a0
> [<0>] __alloc_pages_slowpath+0x6c9/0x1640
> [<0>] __alloc_pages_nodemask+0x558/0x5b0
> [<0>] khugepaged+0x499/0x810
> [<0>] kthread+0x158/0x170
> [<0>] ret_from_fork+0x3a/0x50
> [<0>] 0xffffffffffffffff
>
> Looks like something has gone astray with compact_zone.
>

-- 
Mel Gorman
SUSE Labs
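
For anyone comparing the two quoted dumps, here is a rough sketch (not part of the original report) that extracts the function names from each /proc/<pid>/stack listing and prints the frames they share. The helper names frame_names and shared_frames and the regular expression are illustrative assumptions, not an existing tool. The stack text is copied from the traces above.

```python
import re

# Frames copied verbatim from the two /proc/<pid>/stack dumps quoted above.
KCOMPACTD_STACK = """\
[<0>] retint_kernel+0x1b/0x2d
[<0>] lock_is_held_type+0x1b/0x50
[<0>] ___might_sleep+0xad/0x220
[<0>] __might_sleep+0x113/0x130
[<0>] on_each_cpu_cond_mask+0x12a/0x140
[<0>] on_each_cpu_cond+0x18/0x20
[<0>] invalidate_bh_lrus+0x29/0x30
[<0>] __buffer_migrate_page+0x154/0x340
[<0>] buffer_migrate_page_norefs+0x14/0x20
[<0>] move_to_new_page+0x8e/0x360
[<0>] migrate_pages+0x3cc/0xfd8
[<0>] compact_zone+0xb70/0x1380
[<0>] kcompactd_do_work+0x15b/0x500
[<0>] kcompactd+0x74/0x340
[<0>] kthread+0x158/0x170
[<0>] ret_from_fork+0x3a/0x50
[<0>] 0xffffffffffffffff
"""

KHUGEPAGED_STACK = """\
[<0>] ___preempt_schedule+0x16/0x18
[<0>] page_vma_mapped_walk+0x60/0x840
[<0>] remove_migration_pte+0x67/0x390
[<0>] rmap_walk_file+0x186/0x380
[<0>] rmap_walk+0xa3/0xd0
[<0>] remove_migration_ptes+0x69/0x70
[<0>] migrate_pages+0xb6d/0xfd8
[<0>] compact_zone+0xb70/0x1370
[<0>] compact_zone_order+0xd8/0x120
[<0>] try_to_compact_pages+0xe5/0x550
[<0>] __alloc_pages_direct_compact+0x6d/0x1a0
[<0>] __alloc_pages_slowpath+0x6c9/0x1640
[<0>] __alloc_pages_nodemask+0x558/0x5b0
[<0>] khugepaged+0x499/0x810
[<0>] kthread+0x158/0x170
[<0>] ret_from_fork+0x3a/0x50
[<0>] 0xffffffffffffffff
"""

# Matches "[<0>] symbol+0xOFFSET/0xSIZE"; the bare-address terminator line
# has no symbol and is skipped.
FRAME_RE = re.compile(r"\[<0>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_names(dump):
    """Return the function names in a stack dump, innermost frame first."""
    return FRAME_RE.findall(dump)


def shared_frames(dump_a, dump_b):
    """Return the names that appear in both dumps, in dump_a's order."""
    names_b = set(frame_names(dump_b))
    return [name for name in frame_names(dump_a) if name in names_b]


if __name__ == "__main__":
    for name in shared_frames(KCOMPACTD_STACK, KHUGEPAGED_STACK):
        print(name)
```

Apart from the generic kthread and ret_from_fork entry frames, the only functions common to both traces are migrate_pages and compact_zone. That matches the observation that compaction is the shared path, though it does not by itself say which callee is looping.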