Date: Sat, 11 Jul 2009 20:22:11 +0100 (BST)
From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: Izik Eidus
cc: Andrea Arcangeli, Rik van Riel, Chris Wright, Nick Piggin,
    Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: KSM: current madvise rollup
In-Reply-To: <4A57C3D1.7000407@redhat.com>
References: <4A49E051.1080400@redhat.com> <4A4A5C56.5000109@redhat.com>
 <4A4B317F.4050100@redhat.com> <4A57C3D1.7000407@redhat.com>

On Sat, 11 Jul 2009, Izik Eidus wrote:
>...
> Doesn't that mean that we "stop using the stable tree help"?
> It looks like every item that goes into the stable tree will get
> flushed from it in the second run; that will greatly increase ksmd's
> cpu usage and make it find fewer pages...  Was this what you wanted
> to do, or have I missed something?

You sorted this one out for yourself before I got around to it.

> Besides this, one more thing I noticed while checking this code:
> because the new "KSM shared page" is not a file-backed page, it isn't
> counted in top as a shared page, and I couldn't find a way to see how
> many pages are shared for each application...

Hah! I can't quite call that a neat trick, but it is amusing.

Yes, checking up on where top gets SHR from, it does originate from
file_rss, and your previous definition of PageKsm was such that those
pages had to get counted as file_rss.

If I thought for a moment that these pages really are like file pages,
I'd immediately revert my PageKsm change, and instead make
page->mapping point to a fictional address_space (rather like
swapper_space), so that those pages could still be definitively
identified.

But they are not file pages, they are anon pages: anon pages which are
shared (in this case shared via KSM rather than shared via fork); and
I was fixing the accounting by making them look like anon pages again.
And it'll be more important for them to look like anon pages when we
get to swapping them next time around.

Bundling them in with file_rss may have made some numbers stand out
more obviously to you; but it was a masquerade, they weren't really
the numbers you were wanting.
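(Just to make that distinction concrete: the following is only a
sketch, not the actual ksm.c code, and the helper name is invented.
The point is that a merged page stays an anon page, just one with
nothing but the anon bit in page->mapping, so it can still be picked
out without being dressed up as a file page.)

#include <linux/mm.h>

/*
 * Sketch only, not the real PageKsm: an ordinary anon page keeps an
 * anon_vma pointer with PAGE_MAPPING_ANON set in the low bit of
 * page->mapping; a KSM-merged anon page could be recognized by
 * carrying that bit and nothing else.
 */
static inline int ksm_page_sketch(struct page *page)
{
	return PageAnon(page) &&
	       (unsigned long)page->mapping == PAGE_MAPPING_ANON;
}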
> This is important for management tools, such as a tool that wants to
> know which Virtual Machines to migrate from one host to another,
> based on the memory sharing on that specific host (meaning how much
> ram they really take on that specific host).

Okay, I can see that you may well want such info.

> So I started to prepare a patch that will show the merged pages count
> inside /proc/pid/mergedpages, but then I realized these statistics
> can lie: if we have two applications, application A and application
> B, that share the same page, how should it look?
>
> cat /proc/pid_of_A/merged_pages -> 1
> cat /proc/pid_of_B/merged_pages -> 1
>
> or:
>
> cat /proc/pid_of_A/merged_pages -> 0 (because this one was shared
> with the page of B)
> cat /proc/pid_of_B/merged_pages -> 1

I happen to think that the second method, plausible though it starts
out, ends up leading to more grief than the first.  But there are two
more important things to say.

One, I'm the wrong person to be asking about this: I've little
experience to draw on here, and my interest wanes when it comes to the
number-gathering end of this.

Two, I don't think you can do it with a count like that at all.  If
you're thinking of migrating A away from B, or A and B together away
from the rest, don't you need to know how much they're sharing with
each other, and how much they're sharing with the rest?  If A and B
are different instances of the same app, they're likely to be sharing
much more with each other than with the rest as a whole: and that'll
make a huge difference to your decisions on migration.

A single number (probably of that first kind) may be a nice kind of
reassurance that things are working, and worth providing.  But for
detailed migration/provisioning decisions, I'd have thought you'd need
the kernel to provide a list of "id"s of KSM-shared pages for each
process, which your management tools could then crunch upon (observing
the different sharings of ids) to try out different splits (there's a
rough sketch of that crunching further down); or else, doing it the
other way around, a representation of the stable_tree itself, with
pids at the nodes.

Though once you got into that detail, I wonder if you'd find that you
need such info, not just about the KSM pages, but about the rest as
well (how much are the anon pages being shared across fork, for
example?  which of the file pages are shmem/tmpfs pages needing swap?
how much swap is being used?).  I think it becomes quite a big
subject, and you may be able to excite other people with it.

> To make the second method work as reliably as we can, we would want
> to break KsmPages that have just one mapping into them...

We may want to do that anyway.  It concerned me a lot when I was first
testing (and often saw kernel_pages_allocated greater than
pages_shared - probably because of the original KSM's eagerness to
merge forked pages, though I think there may have been more to it than
that).  But it seems much less of an issue now (that ratio is much
healthier), and even less of an issue once KSM pages can be swapped.
So I'm not bothering about it at the moment, but it may make sense.

> What do you think about that?  Which direction should we take?

If nobody else volunteers an opinion on that, I could perhaps make up
an incriminating list of mm people who have an interest in such
things!
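Going back to the "id"s idea for a moment, here is the kind of
tool-side crunching I have in mind (purely hypothetical: the kernel
exports no such id lists, and the function below is made up).  Given a
sorted list of KSM page ids for each of two processes, how much they
share with each other is just an intersection count, which a
management tool could run over every candidate pair or grouping before
deciding what to migrate.

#include <stddef.h>

/*
 * Userspace sketch, nothing to do with ksm.c itself: count how many
 * KSM page ids two processes have in common, assuming each process's
 * id list has been sorted in ascending order.
 */
static size_t count_shared_ids(const unsigned long *a, size_t na,
			       const unsigned long *b, size_t nb)
{
	size_t i = 0, j = 0, shared = 0;

	while (i < na && j < nb) {
		if (a[i] == b[j]) {
			shared++;
			i++;
			j++;
		} else if (a[i] < b[j]) {
			i++;
		} else {
			j++;
		}
	}
	return shared;
}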
> (Other than this stuff, everything is running happy and nice,

Glad to hear it, yes, same at my end (I did have a hang in the cow
breaking the night before I sent out the rollup, but included the fix
in that, and it has stood up since).

> but I think cpu is a little bit too high, because of the
> removal-from-the-stable_tree issue)

I think you've resolved that as a non-issue, but is cpu still looking
too high to you?  It looks high to me, but then I realize that I've
tuned it to be high anyway.  Do you have any comparison against the
/dev/ksm KSM, or your first madvise version?

Oh, something that might be making it higher, that I didn't highlight
(and can revert if you like, it was just more straightforward this
way): with scan_get_next_rmap skipping the non-present ptes,
pages_to_scan is currently a limit on the _present_ pages scanned in
one batch (see the P.S. below for a sketch of what I mean).

Hugh
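P.S. In case that last point wasn't clear, here is roughly the shape
of the loop I mean (a sketch only, with approximate names, not the
code as it stands): the batch counter is consumed only when a present
page actually comes back, so the non-present ptes skipped inside
scan_get_next_rmap cost nothing against pages_to_scan.

/*
 * Sketch of the batching described above (names and signatures are
 * approximate, not the real ksm.c): scan_npages would come from
 * pages_to_scan, and only iterations that yield a present page are
 * charged against it.
 */
static void ksm_scan_batch_sketch(unsigned int scan_npages)
{
	struct rmap_item *rmap_item;
	struct page *page;

	while (scan_npages--) {
		cond_resched();
		page = scan_get_next_rmap(&rmap_item);	/* skips non-present ptes */
		if (!page)
			break;	/* completed a full pass over the mm list */
		cmp_and_merge_page(page, rmap_item);
		put_page(page);	/* assume the helper took a reference */
	}
}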