Date: Sat, 17 Nov 2018 12:22:08 +0800
From: Baoquan He
To: Michal Hocko
Cc: David Hildenbrand, linux-mm@kvack.org, pifang@redhat.com,
    linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
    aarcange@redhat.com
Subject: Re: Memory hotplug softlock issue
Message-ID: <20181117042208.GB18471@MiWiFi-R3L-srv>
References: <20181115051034.GK2653@MiWiFi-R3L-srv>
 <20181115073052.GA23831@dhcp22.suse.cz>
 <20181115075349.GL2653@MiWiFi-R3L-srv>
 <20181115083055.GD23831@dhcp22.suse.cz>
 <20181115131211.GP2653@MiWiFi-R3L-srv>
 <20181115131927.GT23831@dhcp22.suse.cz>
 <20181115133840.GR2653@MiWiFi-R3L-srv>
 <20181115143204.GV23831@dhcp22.suse.cz>
 <20181116012433.GU2653@MiWiFi-R3L-srv>
 <20181116091409.GD14706@dhcp22.suse.cz>
In-Reply-To: <20181116091409.GD14706@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/16/18 at 10:14am, Michal Hocko wrote:
> Could you try to apply this debugging patch on top please? It will dump
> stack trace for each reference count elevation for one page that fails
> to migrate after multiple passes.

Thanks. I applied it and fixed two issues in the code: page_ref.h now
only declares page_to_track as extern (the variable is defined in
mm/migrate.c), and the symbol is exported with EXPORT_SYMBOL_GPL so that
modules, which also use the page_ref helpers, can link against it.

I have sent you the dmesg privately, please check it. The dmesg buffer
overflowed, so if you need the earlier messages, I will retest.

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index b64ebf253381..f76e2c498f31 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -72,7 +72,7 @@ static inline int page_count(struct page *page)
 	return atomic_read(&compound_head(page)->_refcount);
 }
 
-struct page *page_to_track;
+extern struct page *page_to_track;
 static inline void set_page_count(struct page *page, int v)
 {
 	atomic_set(&page->_refcount, v);
diff --git a/mm/migrate.c b/mm/migrate.c
index 9b2e395a3d68..42c7499c43b9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1339,6 +1339,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 }
 
 struct page *page_to_track;
+EXPORT_SYMBOL_GPL(page_to_track);
 
 /*
  * migrate_pages - migrate the pages specified in a list, to the free pages

> 
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 14d14beb1f7f..b64ebf253381 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -72,9 +72,12 @@ static inline int page_count(struct page *page)
>  	return atomic_read(&compound_head(page)->_refcount);
>  }
>  
> +struct page *page_to_track;
>  static inline void set_page_count(struct page *page, int v)
>  {
>  	atomic_set(&page->_refcount, v);
> +	if (page == page_to_track)
> +		dump_stack();
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_set))
>  		__page_ref_set(page, v);
>  }
> @@ -91,6 +94,8 @@ static inline void init_page_count(struct page *page)
>  static inline void page_ref_add(struct page *page, int nr)
>  {
>  	atomic_add(nr, &page->_refcount);
> +	if (page == page_to_track)
> +		dump_stack();
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
>  		__page_ref_mod(page, nr);
>  }
> @@ -105,6 +110,8 @@ static inline void page_ref_sub(struct page *page, int nr)
>  static inline void page_ref_inc(struct page *page)
>  {
>  	atomic_inc(&page->_refcount);
> +	if (page == page_to_track)
> +		dump_stack();
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
>  		__page_ref_mod(page, 1);
>  }
> @@ -129,6 +136,8 @@ static inline int page_ref_inc_return(struct page *page)
>  {
>  	int ret = atomic_inc_return(&page->_refcount);
>  
> +	if (page == page_to_track)
> +		dump_stack();
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_return))
>  		__page_ref_mod_and_return(page, 1, ret);
>  	return ret;
> @@ -156,6 +165,8 @@ static inline int page_ref_add_unless(struct page *page, int nr, int u)
>  {
>  	int ret = atomic_add_unless(&page->_refcount, nr, u);
>  
> +	if (page == page_to_track)
> +		dump_stack();
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_unless))
>  		__page_ref_mod_unless(page, nr, ret);
>  	return ret;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f7e4bfdc13b7..9b2e395a3d68 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1338,6 +1338,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	return rc;
>  }
>  
> +struct page *page_to_track;
> +
>  /*
>   * migrate_pages - migrate the pages specified in a list, to the free pages
>   *		   supplied as the target for the page migration
> @@ -1375,6 +1377,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  	if (!swapwrite)
>  		current->flags |= PF_SWAPWRITE;
>  
> +	page_to_track = NULL;
>  	for(pass = 0; pass < 10 && retry; pass++) {
>  		retry = 0;
>  
> @@ -1417,6 +1420,8 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				goto out;
>  			case -EAGAIN:
>  				retry++;
> +				if (pass > 1 && !page_to_track)
> +					page_to_track = page;
>  				break;
>  			case MIGRATEPAGE_SUCCESS:
>  				nr_succeeded++;
> -- 
> Michal Hocko
> SUSE Labs
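For anyone who wants to play with the idea behind this debugging patch outside the kernel, here is a minimal user-space C sketch of the same technique under simplifying assumptions. The names struct fake_page, fake_ref_inc(), fake_ref_dec(), fake_try_migrate() and fake_migrate_pages() are hypothetical; an fprintf() plus a counter stands in for dump_stack(); and the rule that any reference beyond one makes migration fail with -EAGAIN is an assumption, not the kernel's real migration logic. It only keeps the two ideas from the patch: a global page_to_track pointer checked in the reference-elevation helpers, and picking the first page that still returns -EAGAIN from the third pass on (pass > 1).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the "pass < 10" limit in migrate_pages(). */
#define MAX_PASSES 10

/* Hypothetical stand-in for struct page: only what this sketch needs. */
struct fake_page {
	int refcount;
	int migrated;	/* set once fake_try_migrate() succeeds */
};

/* Page whose reference elevations are reported; NULL disables tracking. */
static struct fake_page *page_to_track;

/* Counts reports so the behaviour can be checked; stands in for dump_stack(). */
static int tracked_events;

static void report_if_tracked(const struct fake_page *page, const char *op)
{
	if (page_to_track != NULL && page == page_to_track) {
		tracked_events++;
		fprintf(stderr, "%s on tracked page %p, refcount now %d\n",
			op, (const void *)page, page->refcount);
	}
}

/* Elevation: reported, like page_ref_inc() in the debugging patch. */
static void fake_ref_inc(struct fake_page *page)
{
	page->refcount++;
	report_if_tracked(page, "ref_inc");
}

/* Drop: not reported, since the patch only hooks elevations and set_page_count(). */
static void fake_ref_dec(struct fake_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Hypothetical migration rule (an assumption, not the kernel's logic):
 * any reference beyond the single expected one fails with -EAGAIN.
 */
static int fake_try_migrate(struct fake_page *page)
{
	if (page->refcount != 1)
		return -EAGAIN;
	page->migrated = 1;
	return 0;
}

/*
 * Retry loop modelled on migrate_pages(): up to MAX_PASSES passes, and
 * from the third pass on (pass > 1) the first page that still fails
 * becomes page_to_track. Returns how many pages failed in the last pass.
 */
static int fake_migrate_pages(struct fake_page **pages, size_t n)
{
	int retry = 1;

	page_to_track = NULL;	/* reset per call, as in the patch */
	for (int pass = 0; pass < MAX_PASSES && retry; pass++) {
		retry = 0;
		for (size_t i = 0; i < n; i++) {
			if (pages[i]->migrated)
				continue;
			if (fake_try_migrate(pages[i]) == -EAGAIN) {
				retry++;
				if (pass > 1 && page_to_track == NULL)
					page_to_track = pages[i];
			}
		}
	}
	return retry;
}
```

As in the patch, page_to_track is cleared at the start of each call but left set on return, so later elevations of the stuck page's count (for example by whoever holds the extra reference) keep being reported, while drops stay silent.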