Date: Mon, 19 Nov 2018 12:34:09 -0800 (PST)
From: Hugh Dickins
To: Michal Hocko
Cc: Baoquan He, David Hildenbrand, linux-mm@kvack.org, pifang@redhat.com, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, aarcange@redhat.com, Mel Gorman, Vlastimil Babka, Hugh Dickins
Subject: Re: Memory hotplug softlock issue
In-Reply-To: <20181119173312.GV22247@dhcp22.suse.cz>

On Mon, 19 Nov 2018, Michal Hocko wrote:
> On Mon 19-11-18 15:10:16, Michal Hocko wrote:
> [...]
> > In other words. Why cannot we do the following?
>
> Baoquan, this is certainly not the right fix but I would be really
> curious whether it makes the problem go away.
>
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f7e4bfdc13b7..7ccab29bcf9a 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -324,19 +324,9 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> >  		goto out;
> >
> >  	page = migration_entry_to_page(entry);
> > -
> > -	/*
> > -	 * Once page cache replacement of page migration started, page_count
> > -	 * *must* be zero. And, we don't want to call wait_on_page_locked()
> > -	 * against a page without get_page().
> > -	 * So, we use get_page_unless_zero(), here. Even failed, page fault
> > -	 * will occur again.
> > -	 */
> > -	if (!get_page_unless_zero(page))
> > -		goto out;
> >  	pte_unmap_unlock(ptep, ptl);
> > -	wait_on_page_locked(page);
> > -	put_page(page);
> > +	page_lock(page);
> > +	page_unlock(page);
> >  	return;
> >  out:
> >  	pte_unmap_unlock(ptep, ptl);

Thanks for Cc'ing me.  I did mention precisely this issue two or three
times at LSF/MM this year, and claimed then that I would post the fix.

I'm glad that I delayed: what I had then (migration_waitqueue instead
of using page_waitqueue) was not wrong, but what I've been using the
last couple of months is rather better (and can be put to use to solve
similar problems in collapsing pages on huge tmpfs, but we don't need
to get into that at this time): put_and_wait_on_page_locked().

What I have not yet done is verify it on the latest kernel, research
the interested Cc list (Linus and Tim Chen come immediately to mind),
and write the commit comment.  I have some testing to do on the latest
kernel today, so I'll throw put_and_wait_on_page_locked() in too,
and post tomorrow, I hope.

Hugh