Date: Mon, 28 Jan 2019 10:16:27 +0100
From: Jan Kara
To: valdis.kletnieks@vt.edu
Cc: Pavel Machek, Mel Gorman, kernel list, Andrew Morton, vbabka@suse.cz,
        aarcange@redhat.com, rientjes@google.com, mhocko@kernel.org,
        zi.yan@cs.rutgers.edu, hannes@cmpxchg.org, jack@suse.cz
Subject: Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?
Message-ID: <20190128091627.GA27972@quack2.suse.cz>
References: <20190126200005.GB27513@amd>
 <12171.1548557813@turing-police.cc.vt.edu>
 <20190127141556.GB9565@techsingularity.net>
 <20190127160027.GA9340@amd>
 <13417.1548624994@turing-police.cc.vt.edu>
In-Reply-To: <13417.1548624994@turing-police.cc.vt.edu>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf"
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun 27-01-19 16:36:34, valdis.kletnieks@vt.edu wrote:
> On Sun, 27 Jan 2019 17:00:27 +0100, Pavel Machek said:
> > > > I've noticed this as well on earlier kernels (next-20181224 to 20190115)
> > > >
> > > > Some more info:
> > > >
> > > > 1) echo 3 > /proc/sys/vm/drop_caches unwedges kcompactd in 1-3 seconds.
> > >
> > > This aspect is curious as it indicates that kcompactd could potentially
> > > be infinite looping, but it's not something I've experienced myself. By
> > > any chance is there a predictable reproduction case for this?
> >
> > I've seen it exactly once, so I'm not sure how reproducible this is. x86-32
> > machine, running the chromium browser, so yes, there was some swapping
> > involved.
>
> I don't have a surefire replicator, but my laptop (x86_64, so it's not a
> 32-bit-only issue) triggers it fairly often, up to multiple times a day.
> It doesn't seem to be just the Chrome browser that triggers it - usually
> I'm doing other stuff as well, like a compile or similar. The fact that
> 'drop_caches' clears it makes me wonder if we're hitting a corner case
> where cache data isn't being automatically cleared and is clogging
> something up.

So my buffer_migrate_page_norefs() is certainly buggy in its current
incarnation (as a result, block device page cache is not migratable at
all). I sent Andrew a patch over a week ago but so far it has been
ignored. The patch is attached; can you give it a try and see whether it
changes anything for you? Thanks!

								Honza
-- 
Jan Kara
SUSE Labs, CR

--J2SCkAp4GZ/dPZZf
Content-Type: text/x-patch; charset=us-ascii
Content-Disposition: attachment;
	filename="0001-mm-migrate-Make-buffer_migrate_page_norefs-actually-.patch"

From 59ab3a8504c35e2215af6c251bdb2a8a1caca1dd Mon Sep 17 00:00:00 2001
From: Jan Kara
Date: Wed, 16 Jan 2019 11:02:48 +0100
Subject: [PATCH] mm: migrate: Make buffer_migrate_page_norefs() actually
 succeed

Currently, buffer_migrate_page_norefs() is constantly failing because
buffer_migrate_lock_buffers() grabs a reference on each buffer. In fact,
there is no reason for buffer_migrate_lock_buffers() to grab any buffer
references, as the page is locked during the whole operation and thus
nobody can reclaim buffers from the page. So remove the grabbing of
buffer references, which also makes buffer_migrate_page_norefs()
succeed.

Fixes: 89cb0888ca14 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara
---
 mm/migrate.c | 5 -----
 1 file changed, 5 deletions(-)

Andrew, can you please merge this patch? Sadly, my previous testing only
verified that page migration in general didn't get broken; I forgot to
test whether the new migrate page callback actually results in more
successful migrations for block device pages. So the bug was only
revealed by customer testing. I have now reproduced the workload
internally and verified that the patch indeed fixes the issue.
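[Editor's note: before the diff itself, a condensed sketch of the failure
mode the changelog describes. This is not the in-tree code; the helper
names sketch_lock_buffers() and sketch_buffers_busy() are invented for
illustration and only mirror the interplay between
buffer_migrate_lock_buffers() and the buffer reference check that
__buffer_migrate_page() performs for the norefs variant.]

/*
 * Condensed illustration only, not the kernel code. Helper names are
 * made up; the real logic lives in buffer_migrate_lock_buffers() and
 * __buffer_migrate_page().
 */
#include <linux/buffer_head.h>

/* What the locking helper effectively did before this patch. */
static bool sketch_lock_buffers(struct buffer_head *head)
{
        struct buffer_head *bh = head;

        do {
                get_bh(bh);             /* bumps bh->b_count on every buffer */
                lock_buffer(bh);
                bh = bh->b_this_page;
        } while (bh != head);

        return true;
}

/* What the "norefs" reference check then sees. */
static bool sketch_buffers_busy(struct buffer_head *head)
{
        struct buffer_head *bh = head;

        do {
                /*
                 * An elevated b_count means someone may still be using the
                 * buffer, so migration must be refused. But the references
                 * taken by the locking helper above already guarantee a
                 * non-zero count here, so every block device page looks
                 * busy, migration keeps failing, and kcompactd keeps
                 * retrying.
                 */
                if (atomic_read(&bh->b_count))
                        return true;
                bh = bh->b_this_page;
        } while (bh != head);

        return false;
}

[With the get_bh()/put_bh() pairs removed by the diff below, the check only
sees references held by actual users of the buffers; as the changelog notes,
the page lock alone keeps the buffers from being reclaimed during the
migration.]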
diff --git a/mm/migrate.c b/mm/migrate.c
index a16b15090df3..712b231a7376 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -709,7 +709,6 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 	/* Simple case, sync compaction */
 	if (mode != MIGRATE_ASYNC) {
 		do {
-			get_bh(bh);
 			lock_buffer(bh);
 			bh = bh->b_this_page;
 
@@ -720,18 +719,15 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 
 	/* async case, we cannot block on lock_buffer so use trylock_buffer */
 	do {
-		get_bh(bh);
 		if (!trylock_buffer(bh)) {
 			/*
 			 * We failed to lock the buffer and cannot stall in
 			 * async migration. Release the taken locks
 			 */
 			struct buffer_head *failed_bh = bh;
-			put_bh(failed_bh);
 			bh = head;
 			while (bh != failed_bh) {
 				unlock_buffer(bh);
-				put_bh(bh);
 				bh = bh->b_this_page;
 			}
 			return false;
@@ -818,7 +814,6 @@ static int __buffer_migrate_page(struct address_space *mapping,
 	bh = head;
 	do {
 		unlock_buffer(bh);
-		put_bh(bh);
 		bh = bh->b_this_page;
 	} while (bh != head);
 
-- 
2.16.4

--J2SCkAp4GZ/dPZZf--