From: "Liang, Kan"
To: Mel Gorman, Linus Torvalds
Cc: Mel Gorman, "Kirill A. Shutemov", Tim Chen, Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Mon, 21 Aug 2017 18:56:20 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07753788B58@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <20170821183234.kzennaaw2zt2rbwz@techsingularity.net>
> > Because that code sequence doesn't actually depend on
> > "wait_on_page_lock()" for _correctness_ anyway, afaik. Anybody who
> > does "migration_entry_wait()" _has_ to retry anyway, since the page
> > table contents may have changed by waiting.
> >
> > So I'm not proud of the attached patch, and I don't think it's really
> > acceptable as-is, but maybe it's worth testing? And maybe it's
> > arguably no worse than what we have now?
> >
> > Comments?
> >
>
> The transhuge migration path for numa balancing doesn't go through the
> migration_entry_wait() path, despite similarly named functions that
> suggest it does, so this may have the most effect when THP is disabled.
> It's worth trying anyway.

I just finished testing the yield patch (functionality only, not
performance). Yes, it works well with THP disabled.

With THP enabled, I observed one LOCKUP caused by a long queue wait.
Here is the call stack with THP enabled.
# 100.00%  (ffffffff9e1aefca)
|
---wait_on_page_bit
   do_huge_pmd_numa_page
   __handle_mm_fault
   handle_mm_fault
   __do_page_fault
   do_page_fault
   page_fault
   |
   |--60.39%--0x2b7b7
   |          |
   |          |--34.26%--0x127d8
   |          |          start_thread
   |          |
   |           --25.95%--0x127a2
   |                     start_thread
   |
    --39.25%--0x2b788
               |
                --38.81%--0x127a2
                          start_thread

>
> Covering both paths would be something like the patch below which spins
> until the page is unlocked or it should reschedule. It's not even boot
> tested as I spent what time I had on the test case that I hoped would be
> able to prove it really works.

I will give it a try.

Thanks,
Kan

>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 79b36f57c3ba..31cda1288176 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -517,6 +517,13 @@ static inline void wait_on_page_locked(struct page *page)
>  	wait_on_page_bit(compound_head(page), PG_locked);
>  }
>
> +void __spinwait_on_page_locked(struct page *page);
> +static inline void spinwait_on_page_locked(struct page *page)
> +{
> +	if (PageLocked(page))
> +		__spinwait_on_page_locked(page);
> +}
> +
>  static inline int wait_on_page_locked_killable(struct page *page)
>  {
>  	if (!PageLocked(page))
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a49702445ce0..c9d6f49614bc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1210,6 +1210,15 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>  	}
>  }
>
> +void __spinwait_on_page_locked(struct page *page)
> +{
> +	do {
> +		cpu_relax();
> +	} while (PageLocked(page) && !cond_resched());
> +
> +	wait_on_page_locked(page);
> +}
> +
>  /**
>   * page_cache_next_hole - find the next hole (not-present entry)
>   * @mapping: mapping
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 90731e3b7e58..c7025c806420 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1443,7 +1443,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
>  		if (!get_page_unless_zero(page))
>  			goto out_unlock;
>  		spin_unlock(vmf->ptl);
> -		wait_on_page_locked(page);
> +		spinwait_on_page_locked(page);
>  		put_page(page);
>  		goto out;
>  	}
> @@ -1480,7 +1480,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
>  		if (!get_page_unless_zero(page))
>  			goto out_unlock;
>  		spin_unlock(vmf->ptl);
> -		wait_on_page_locked(page);
> +		spinwait_on_page_locked(page);
>  		put_page(page);
>  		goto out;
>  	}
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e84eeb4e4356..9b6c3fc5beac 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -308,7 +308,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  		if (!get_page_unless_zero(page))
>  			goto out;
>  	pte_unmap_unlock(ptep, ptl);
> -	wait_on_page_locked(page);
> +	spinwait_on_page_locked(page);
>  	put_page(page);
>  	return;
>  out:
>