From: Linus Torvalds
Date: Fri, 18 Aug 2017 10:48:23 -0700
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
To: "Liang, Kan"
Cc: Mel Gorman, Mel Gorman, "Kirill A. Shutemov", Tim Chen,
    Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton,
    Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F07753787AE4@SHSMSX103.ccr.corp.intel.com>
References: <84c7f26182b7f4723c0fe3b34ba912a9de92b8b7.1502758114.git.tim.c.chen@linux.intel.com>
    <37D7C6CF3E00A74B8858931C1DB2F07753786CE9@SHSMSX103.ccr.corp.intel.com>
    <37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
    <20170818122339.24grcbzyhnzmr4qw@techsingularity.net>
    <37D7C6CF3E00A74B8858931C1DB2F077537879BB@SHSMSX103.ccr.corp.intel.com>
    <20170818144622.oabozle26hasg5yo@techsingularity.net>
    <37D7C6CF3E00A74B8858931C1DB2F07753787AE4@SHSMSX103.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 18, 2017 at 9:53 AM, Liang, Kan wrote:
>
>> On Fri, Aug 18, 2017 Mel Gorman wrote:
>>
>> That indicates that it may be a hot page and it's possible that the page is
>> locked for a short time but waiters accumulate. What happens if you leave
>> NUMA balancing enabled but disable THP?
>
> No, disabling THP doesn't help the case.

Interesting. That particular code sequence should only be active for THP.
What does the profile look like with THP disabled but with NUMA
balancing still enabled?

Just asking because maybe that different call chain could give us some
other ideas of what the commonality here is that triggers our
behavioral problem.

I was really hoping that we'd root-cause this and have a solution (and
then apply Tim's patch as a "belt and suspenders" kind of thing), but
it's starting to smell like we may have to apply Tim's patch as a
band-aid, and try to figure out what the trigger is longer-term.

Linus