Date: Mon, 19 Nov 2018 13:40:33 +0100
From: Michal Hocko
To: Baoquan He
Cc: David Hildenbrand, linux-mm@kvack.org, pifang@redhat.com, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, aarcange@redhat.com, Mel Gorman, Vlastimil Babka, Hugh Dickins
Subject: Re: Memory hotplug softlock issue
Message-ID: <20181119124033.GJ22247@dhcp22.suse.cz>
In-Reply-To: <20181119105202.GE18471@MiWiFi-R3L-srv>

On Mon 19-11-18 18:52:02, Baoquan He wrote:
[...]

There are a few stacks directly in the offline path but those should be OK.
The real culprit seems to be the swap-in code

> [ +1.734416] CPU: 255 PID: 5558 Comm: stress Tainted: G L 4.20.0-rc2+ #7
> [ +0.007927] Hardware name: 9008/IT91SMUB, BIOS BLXSV512 03/22/2018
> [ +0.006297] Call Trace:
> [ +0.002537] dump_stack+0x46/0x60
> [ +0.003386] __migration_entry_wait.cold.65+0x5/0x14
> [ +0.005043] do_swap_page+0x84e/0x960
> [ +0.003727] ? arch_tlb_finish_mmu+0x29/0xc0
> [ +0.006412] __handle_mm_fault+0x933/0x1330
> [ +0.004265] handle_mm_fault+0xc4/0x250
> [ +0.003915] __do_page_fault+0x2b7/0x510
> [ +0.003990] do_page_fault+0x2c/0x110
> [ +0.003729] ? page_fault+0x8/0x30
> [ +0.003462] page_fault+0x1e/0x30

There are many traces to this path. We are

	/*
	 * Once page cache replacement of page migration started, page_count
	 * *must* be zero. And, we don't want to call wait_on_page_locked()
	 * against a page without get_page().
	 * So, we use get_page_unless_zero(), here. Even failed, page fault
	 * will occur again.
	 */
	if (!get_page_unless_zero(page))
		goto out;
	pte_unmap_unlock(ptep, ptl);
	wait_on_page_locked(page);

taking a reference to the page under migration. I have to think about
this much more but I suspect this is just calling for a problem.

Cc migration experts. For your background information: we are seeing
memory offline not being able to converge because a few heavily used pages
fail to migrate away - e.g.
http://lkml.kernel.org/r/20181116012433.GU2653@MiWiFi-R3L-srv
A debugging patch to dump the stack for these pages
http://lkml.kernel.org/r/20181116091409.GD14706@dhcp22.suse.cz
shows that references are taken from the swap-in code (above).
How are we supposed to converge when the swap-in code waits for the
migration to finish with the reference count elevated?
-- 
Michal Hocko
SUSE Labs
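
To make the concern above concrete, here is a minimal user-space sketch. It is a toy model under stated assumptions, not kernel code. Every toy_* name is hypothetical, the "expected" count stands in for whatever references migration legitimately accounts for, and it assumes the worst case where waiters pin the page during every single attempt. It only illustrates why an attempt cannot pass its refcount check while a waiter holds a reference that it drops only after the page is unlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only what this sketch needs (hypothetical). */
struct toy_page {
	int refcount;	/* models page_count() */
	bool locked;	/* models the page lock held by the migration path */
};

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
bool toy_get_page_unless_zero(struct toy_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

/* Models put_page(): drop one reference. */
void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * One migration attempt. `waiters` is the number of faulting tasks that hit
 * the migration entry and pin the page while the attempt is in flight
 * (assumption: they always arrive before the refcount check).
 */
bool toy_migration_attempt(struct toy_page *p, int expected, int waiters)
{
	int pinned = 0;
	bool ok;

	p->locked = true;			/* migration holds the page lock */
	for (int i = 0; i < waiters; i++)
		if (toy_get_page_unless_zero(p))
			pinned++;		/* waiter pins, then sleeps on the lock */

	ok = (p->refcount == expected);		/* check sees the extra pins */

	p->locked = false;			/* unlock wakes the waiters ... */
	while (pinned-- > 0)
		toy_put_page(p);		/* ... which only now drop their pins */
	return ok;
}

/* Retry up to max_tries times; returns the attempt that succeeded, or -1. */
int toy_offline(struct toy_page *p, int expected, int waiters, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++)
		if (toy_migration_attempt(p, expected, waiters))
			return attempt;
	return -1;
}
```

In this model, a page with no waiters migrates on the first attempt, while a single waiter per attempt is enough to exhaust any retry budget. The real behaviour is timing dependent, so this shows the worst case rather than a guaranteed outcome.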