Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Mel Gorman, "Liang, Kan"
Cc: Linus Torvalds, Mel Gorman, "Kirill A. Shutemov", Peter Zijlstra,
    Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner, Jan Kara,
    linux-mm, Linux Kernel Mailing List
Date: Fri, 18 Aug 2017 09:36:10 -0700
In-Reply-To: <20170818144622.oabozle26hasg5yo@techsingularity.net>
References: <84c7f26182b7f4723c0fe3b34ba912a9de92b8b7.1502758114.git.tim.c.chen@linux.intel.com>
 <37D7C6CF3E00A74B8858931C1DB2F07753786CE9@SHSMSX103.ccr.corp.intel.com>
 <37D7C6CF3E00A74B8858931C1DB2F0775378761B@SHSMSX103.ccr.corp.intel.com>
 <20170818122339.24grcbzyhnzmr4qw@techsingularity.net>
 <37D7C6CF3E00A74B8858931C1DB2F077537879BB@SHSMSX103.ccr.corp.intel.com>
 <20170818144622.oabozle26hasg5yo@techsingularity.net>
List-ID: linux-kernel@vger.kernel.org

On 08/18/2017 07:46 AM, Mel Gorman wrote:
> On Fri, Aug 18, 2017 at 02:20:38PM +0000, Liang, Kan wrote:
>>> Nothing fancy other than needing a comment if it works.
>>>
>>
>> No, the patch doesn't work.
>>
>
> That indicates that it may be a hot page and it's possible that the page is
> locked for a short time but waiters accumulate.
> What happens if you leave
> NUMA balancing enabled but disable THP? Waiting on migration entries also
> uses wait_on_page_locked, so it would be interesting to know if the problem
> is specific to THP.
>
> Can you tell me what this workload is doing? I want to see if it's something
> like many threads pounding on a limited number of pages very quickly. If

It is a customer workload, so we have limited visibility. But we believe
there are some pages that are frequently accessed by all threads.

> it's many threads working on private data, it would also be important to
> know how each thread's buffers are aligned, particularly if the buffers
> are smaller than a THP or base page size. For example, if each thread is
> operating on a base-page-sized buffer, then disabling THP would side-step
> the problem, but with THP there would be false sharing between multiple
> threads.
>

Still, I don't think this problem is THP specific. If there is a hot
regular page getting migrated, we'll also see many threads get queued
up quickly. THP may have made the problem worse, as migrating it takes
longer, meaning more threads could get queued up.

Thanks.

Tim