Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756891AbZGMSZN (ORCPT ); Mon, 13 Jul 2009 14:25:13 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756805AbZGMSZN (ORCPT ); Mon, 13 Jul 2009 14:25:13 -0400 Received: from mx2.redhat.com ([66.187.237.31]:58474 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756579AbZGMSZL (ORCPT ); Mon, 13 Jul 2009 14:25:11 -0400 Message-ID: <4A5B7CC5.4030908@redhat.com> Date: Mon, 13 Jul 2009 21:28:21 +0300 From: Izik Eidus User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103) MIME-Version: 1.0 To: Hugh Dickins CC: Andrea Arcangeli , Rik van Riel , Chris Wright , Nick Piggin , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: KSM: current madvise rollup References: <4A49E051.1080400@redhat.com> <4A4A5C56.5000109@redhat.com> <4A4B317F.4050100@redhat.com> <4A57C3D1.7000407@redhat.com> <20090712002219.502540d2@woof.woof> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5275 Lines: 140 Hugh Dickins wrote: > On Sun, 12 Jul 2009, Izik Eidus wrote: > >> On Sat, 11 Jul 2009 20:22:11 +0100 (BST) >> Hugh Dickins wrote: >> >>> We may want to do that anyway. It concerned me a lot when I was >>> first testing (and often saw kernel_pages_allocated greater than >>> pages_shared - probably because of the original KSM's eagerness to >>> merge forked pages, though I think there may have been more to it >>> than that). But seems much less of an issue now (that ratio is much >>> healthier), and even less of an issue once KSM pages can be swapped. >>> So I'm not bothering about it at the moment, but it may make sense. >>> > > I realized since writing that with the current statistics you really > cannot tell how big an issue the orphaned (count 1) KSM pages are - > good sharing of a few will completely hide non-sharing of many. > > But I've hacked in more stats (not something I'd care to share yet!), > and those confirm that for my loads at least, the orphaned KSM pages > are few compared with the shared ones. > > >> We could add patch like the below, but I think we should leave it as it >> is now, >> > > I agree we should leave it as is for now. My guess is that we'll > prefer to leave them around, until approaching max_kernel_pages_alloc, > pruning them only at that stage (rather as we free swap more aggressively > when it's 50% full). There may be benefit in not removing them too soon, > there may be benefit in holding on to stable pages for longer (holding a > reference in the stable tree for a while). Or maybe not, just an idea. > Well, I personaly dont see any real soultion to this problem without real swapping support, So for now I dont mind not to deal with it at all (as long as the admin can decide how many unswappable pages he allow) But, If you have a stronger opnion than me for that case, I really dont care to change it. > >> and solve it all (like you have said) with the ksm pages >> swapping support in next kernel release. >> (Right now ksm can limit itself with max_kernel_pages_alloc) >> >> diff --git a/mm/ksm.c b/mm/ksm.c >> index a0fbdb2..ee80861 100644 >> --- a/mm/ksm.c >> +++ b/mm/ksm.c >> @@ -1261,8 +1261,13 @@ static void ksm_do_scan(unsigned int scan_npages) >> rmap_item = scan_get_next_rmap_item(&page); >> if (!rmap_item) >> return; >> - if (!PageKsm(page) || !in_stable_tree(rmap_item)) >> + if (!PageKsm(page) || !in_stable_tree(rmap_item)) { >> cmp_and_merge_page(page, rmap_item); >> + } else if (page_mapcount(page) == 0) { >> > > If we did that (but we agree not for now), shouldn't it be > page_mapcount(page) == 1 > ? The mapcount 0 ones already got freed by the zap/unmap code. > Yea, I dont know what i thought when i compare it to zero..., 1 is the right value here indeed. > >> + break_cow(rmap_item->mm, >> + rmap_item->address & PAGE_MASK); >> > > Just a note on that " & PAGE_MASK": it's unnecessary there and > almost everywhere else. One of the pleasures of putting flags into > the bottom bits of the address, in code concerned with faulting, is > that the faulting address can be anywhere within the page, so we > don't have to bother to mask off the flags. > > >> + remove_rmap_item_from_tree(rmap_item); >> + } >> put_page(page); >> } >> } >> >> >>> Oh, something that might be making it higher, that I didn't highlight >>> (and can revert if you like, it was just more straightforward this >>> way): with scan_get_next_rmap skipping the non-present ptes, >>> pages_to_scan is currently a limit on the _present_ pages scanned in >>> one batch. >>> >> You mean that now when you say: pages_to_scan = 512, it wont count the >> none present ptes as part of the counter, so if we have 500 not present >> ptes in the begining and then 512 ptes later, before it used to call >> cmp_and_merge_page() only for 12 pages while now it will get called on >> 512 pages? >> > > If I understand you right, yes, before it would do those 500 absent then > 512 present in two batches, first 512 (of which only 12 present) then 500; > whereas now it'll skip the 500 absent without counting them, and handle > the 512 present in that same one batch. > > >> If yes, then I liked this change, it is more logical from cpu >> consumption point of view, >> > > Yes, although it does spend a little time on the absent ones, it should > be much less time than it spends comparing or checksumming on present ones. > Yea, compare to the other stuff it is minor... > >> and in addition we have that cond_reched() >> so I dont see a problem with this. >> > > Right, that cond_resched() is vital in this case. > > By the way, something else I didn't highlight, a significant benefit > from avoiding get_user_pages(): that was doing a mark_page_accessed() > on every present pte that it found, interfering with pageout decisions. > That is a great value, i didn't even thought about that... > Hugh > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/