From: KOSAKI Motohiro
To: Larry Woodman
Cc: kosaki.motohiro@jp.fujitsu.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, Hugh Dickins, KAMEZAWA Hiroyuki, Rik van Riel, Andrea Arcangeli
Subject: Re: [RFC] high system time & lock contention running large mixed workload
Date: Tue, 1 Dec 2009 21:23:23 +0900 (JST)

(cc to some related person)

> The cause was determined to be the unconditional call to
> page_referenced() for every mapped page encountered in
> shrink_active_list(). page_referenced() takes the anon_vma->lock and
> calls page_referenced_one() for each vma. page_referenced_one() then
> calls page_check_address() which takes the pte_lockptr spinlock. If
> several CPUs are doing this at the same time there is a lot of
> pte_lockptr spinlock contention with the anon_vma->lock held. This
> causes contention on the anon_vma->lock, stalling in the fo and very
> high system time.
>
> Before the splitLRU patch shrink_active_list() would only call
> page_referenced() when reclaim_mapped got set. reclaim_mapped only got
> set when the priority worked its way from 12 all the way to 7. This
> prevented page_referenced() from being called from shrink_active_list()
> until the system was really struggling to reclaim memory.
>
> On way to prevent this is to change page_check_address() to execute a
> spin_trylock(ptl) when it was called by shrink_active_list() and simply
> fail if it could not get the pte_lockptr spinlock. This will make
> shrink_active_list() consider the page not referenced and allow the
> anon_vma->lock to be dropped much quicker.
>
> The attached patch does just that, thoughts???

At first look,
 - We certainly have to fix this issue.
 - But your patch is a bit risky.

Your patch treats a trylock(pte-lock) failure as "not accessed". But lock
contention generally implies a contending peer; iow, the page typically
has its reference bit set. The page then gets deactivated, and the next
shrink_inactive_list() moves it back to the active list. That's a
suboptimal result.

However, we can't simply treat lock contention as page-is-referenced
either. If we do, the system easily goes into OOM.

So, is something like

	if (priority < DEF_PRIORITY - 2)
		page_referenced();
	else
		page_referenced_trylock();

better?

On typical workloads, vmscan almost always runs at DEF_PRIORITY. So if
reclaim at priority == DEF_PRIORITY doesn't cause heavy lock contention,
the system doesn't need to care about the contention. Anyway, we can't
avoid contention when the system is under heavy memory pressure.

btw, the current shrink_active_list() has an unnecessary
page_mapping_inuse() call. It prevents the reference bit from being
dropped on unmapped cache pages, which means we protect unmapped cache
pages more than mapped pages. That is strange.

Unfortunately, I don't have enough development time today. I'll work on
it tomorrow.