From: "Liang, Kan"
To: Linus Torvalds, Mel Gorman, "Kirill A. Shutemov"
CC: Tim Chen, Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton,
    "Johannes Weiner", Jan Kara, linux-mm, Linux Kernel Mailing List
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk
Date: Fri, 18 Aug 2017 13:06:04 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07753787920@SHSMSX103.ccr.corp.intel.com>
References: <84c7f26182b7f4723c0fe3b34ba912a9de92b8b7.1502758114.git.tim.c.chen@linux.intel.com>
    <37D7C6CF3E00A74B8858931C1DB2F07753786CE9@SHSMSX103.ccr.corp.intel.com>
    <37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote:
> >
> > Here is the call stack of wait_on_page_bit_common when the queue is
> > long (entries >1000).
> >
> > # Overhead  Trace output
> > # ........  ..................
> > #
> >    100.00%  (ffffffff931aefca)
> >             |
> >             ---wait_on_page_bit
> >                __migration_entry_wait
> >                migration_entry_wait
> >                do_swap_page
> >                __handle_mm_fault
> >                handle_mm_fault
> >                __do_page_fault
> >                do_page_fault
> >                page_fault
>
> Hmm. Ok, so it does seem to very much be related to migration. Your
> wake_up_page_bit() profile made me suspect that, but this one seems to
> pretty much confirm it.
>
> So it looks like that wait_on_page_locked() thing in __migration_entry_wait(),
> and what probably happens is that your load ends up triggering a lot of
> migration (or just migration of a very hot page), and then *every* thread
> ends up waiting for whatever page that ended up getting migrated.
>
> And so the wait queue for that page grows hugely long.
>
> Looking at the other profile, the thing that is locking the page (that everybody
> then ends up waiting on) would seem to be
> migrate_misplaced_transhuge_page(), so this is _presumably_ due to NUMA
> balancing.
>
> Does the problem go away if you disable the NUMA balancing code?
>

Yes, the problem goes away when NUMA balancing is disabled.

Thanks,
Kan
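
For anyone who wants to repeat the check: the message above does not say how NUMA balancing was disabled, so the following is only a sketch of one common way to do it, assuming the runtime knob /proc/sys/kernel/numa_balancing (sysctl kernel.numa_balancing) is present on the kernel under test. The helper names numa_balancing_enabled() and set_numa_balancing() are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# Assumption: the kernel exposes automatic NUMA balancing through this
# procfs knob (the same setting as sysctl kernel.numa_balancing).
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(knob: Path = NUMA_BALANCING) -> bool:
    """Return True if the knob currently holds a non-zero value."""
    return int(knob.read_text().strip()) != 0


def set_numa_balancing(enabled: bool, knob: Path = NUMA_BALANCING) -> None:
    """Write 1 or 0 to the knob; on a real system this requires root."""
    knob.write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    if NUMA_BALANCING.exists():
        print("NUMA balancing enabled:", numa_balancing_enabled())
    else:
        print("kernel.numa_balancing is not available on this kernel")
```

Calling set_numa_balancing(False) as root before rerunning the workload, and set_numa_balancing(True) afterwards, would give the with/without comparison asked for in the quoted question. The setting does not persist across reboots.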