In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
References: <84c7f26182b7f4723c0fe3b34ba912a9de92b8b7.1502758114.git.tim.c.chen@linux.intel.com>
 <37D7C6CF3E00A74B8858931C1DB2F07753786CE9@SHSMSX103.ccr.corp.intel.com>
 <37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
From: Linus Torvalds
Date: Thu, 17 Aug 2017 13:44:40 -0700
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
To: "Liang, Kan", Mel Gorman, "Kirill A. Shutemov"
Cc: Tim Chen, Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton,
 Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List

On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote:
>
> Here is the call stack of wait_on_page_bit_common
> when the queue is long (entries >1000).
>
> # Overhead  Trace output
> # ........  ..................
> #
>    100.00%  (ffffffff931aefca)
>             |
>             ---wait_on_page_bit
>                __migration_entry_wait
>                migration_entry_wait
>                do_swap_page
>                __handle_mm_fault
>                handle_mm_fault
>                __do_page_fault
>                do_page_fault
>                page_fault

Hmm. Ok, so it does seem to very much be related to migration. Your
wake_up_page_bit() profile made me suspect that, but this one seems to
pretty much confirm it.

So it looks like it's that wait_on_page_locked() thing in
__migration_entry_wait(), and what probably happens is that your load
ends up triggering a lot of migration (or just migration of a very hot
page), and then *every* thread ends up waiting for whatever page ended
up getting migrated.

And so the wait queue for that page grows hugely long.

Looking at the other profile, the thing that is locking the page (that
everybody then ends up waiting on) would seem to be
migrate_misplaced_transhuge_page(), so this is _presumably_ due to NUMA
balancing.

Does the problem go away if you disable the NUMA balancing code?

Adding Mel and Kirill to the participants, just to make them aware of
the issue, and just because their names show up when I look at blame.

              Linus
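
PS: to make the failure mode concrete, below is a small userspace toy
model (plain pthreads, not kernel code; toy_page, toy_waiter and the
thread names are invented for this sketch) of what the profile shows:
one page is held locked for migration, every faulting thread parks
itself on that single page's wait list, and the unlock path then has to
walk the whole list in one go with the lock held - which is the long
wake list walk the patch subject is about. Presumably breaking that
walk up into bounded chunks (and dropping the lock in between) is what
helps once the list gets into the thousands of entries.

/*
 * Toy userspace model (NOT kernel code) of the pile-up: one "page" is
 * locked for migration, every "faulting" thread joins that single
 * page's wait list, and the unlock walks the whole list to wake them.
 * All names here are made up for the sketch.
 *
 * Build with: gcc -O2 -pthread toy-wakewalk.c -o toy-wakewalk
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWAITERS 1000			/* the ">1000 entries" case above */

struct toy_waiter {
	pthread_cond_t cond;
	int woken;
	struct toy_waiter *next;
};

struct toy_page {
	pthread_mutex_t lock;		/* protects 'locked' and the list */
	int locked;			/* analogue of PG_locked */
	struct toy_waiter *waiters;	/* analogue of the page wait queue */
};

static struct toy_page page = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.locked = 1,			/* "migration in progress" at start */
};

/* Roughly the role of wait_on_page_locked(): sleep until unlocked. */
static void *faulting_thread(void *arg)
{
	struct toy_waiter self;

	(void)arg;
	pthread_cond_init(&self.cond, NULL);
	self.woken = 0;

	pthread_mutex_lock(&page.lock);
	if (page.locked) {
		self.next = page.waiters;	/* join the single wait list */
		page.waiters = &self;
		while (!self.woken)
			pthread_cond_wait(&self.cond, &page.lock);
	}
	pthread_mutex_unlock(&page.lock);

	pthread_cond_destroy(&self.cond);
	return NULL;
}

/* The "migration" side: unlock the page and wake every waiter. */
static void *migration_thread(void *arg)
{
	struct toy_waiter *w;
	int walked = 0;

	(void)arg;
	usleep(100 * 1000);		/* "copying" - let the faulters pile up */

	pthread_mutex_lock(&page.lock);
	page.locked = 0;
	/* The long wake-list walk: every waiter is on this one list. */
	for (w = page.waiters; w; w = w->next) {
		w->woken = 1;
		pthread_cond_signal(&w->cond);
		walked++;
	}
	page.waiters = NULL;
	pthread_mutex_unlock(&page.lock);

	printf("woke %d waiters in a single walk\n", walked);
	return NULL;
}

int main(void)
{
	pthread_t waiters[NWAITERS], migrator;
	int i;

	for (i = 0; i < NWAITERS; i++)
		pthread_create(&waiters[i], NULL, faulting_thread, NULL);
	pthread_create(&migrator, NULL, migration_thread, NULL);

	for (i = 0; i < NWAITERS; i++)
		pthread_join(waiters[i], NULL);
	pthread_join(migrator, NULL);
	return 0;
}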
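
PPS: on "disable the NUMA balancing code" - assuming the kernel was
built with CONFIG_NUMA_BALANCING, the quick knob is the
kernel.numa_balancing sysctl (a plain "echo 0 >
/proc/sys/kernel/numa_balancing" as root, or booting with
numa_balancing=disable). The trivial helper below just reads and
optionally flips that sysctl; it only illustrates the knob and is not
part of the patch series.

/*
 * Read and optionally set kernel.numa_balancing.  Needs root to write,
 * and a kernel built with CONFIG_NUMA_BALANCING.
 *
 * Usage: ./numab       (show current value)
 *        ./numab 0     (disable NUMA balancing, then show the value)
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	const char *path = "/proc/sys/kernel/numa_balancing";
	FILE *f;
	int val;

	if (argc > 1) {
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			return 1;
		}
		fprintf(f, "%d\n", atoi(argv[1]));
		fclose(f);
	}

	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%d", &val) == 1)
		printf("kernel.numa_balancing = %d\n", val);
	fclose(f);
	return 0;
}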