Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763357AbZJOXyR (ORCPT ); Thu, 15 Oct 2009 19:54:17 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1763170AbZJOXyR (ORCPT ); Thu, 15 Oct 2009 19:54:17 -0400 Received: from mk-filter-2-a-1.mail.uk.tiscali.com ([212.74.100.53]:23900 "EHLO mk-filter-2-a-1.mail.uk.tiscali.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758826AbZJOXyQ (ORCPT ); Thu, 15 Oct 2009 19:54:16 -0400 X-Trace: 274570744/mk-filter-2.mail.uk.tiscali.com/B2C/$b2c-THROTTLED-DYNAMIC/b2c-CUSTOMER-DYNAMIC-IP/80.41.50.30/None/hugh.dickins@tiscali.co.uk X-SBRS: None X-RemoteIP: 80.41.50.30 X-IP-MAIL-FROM: hugh.dickins@tiscali.co.uk X-SMTP-AUTH: X-MUA: X-IP-BHB: Once X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AswEAPdS10pQKTIe/2dsb2JhbACBUtcmhDAE X-IronPort-AV: E=Sophos;i="4.44,568,1249254000"; d="scan'208";a="274570744" Date: Fri, 16 Oct 2009 00:53:36 +0100 (BST) From: Hugh Dickins X-X-Sender: hugh@sister.anvils To: KAMEZAWA Hiroyuki cc: Andrew Morton , Nitin Gupta , hongshin@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 7/9] swap_info: swap count continuations In-Reply-To: <20091015123024.21ca3ef7.kamezawa.hiroyu@jp.fujitsu.com> Message-ID: References: <20091015123024.21ca3ef7.kamezawa.hiroyu@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4340 Lines: 114 On Thu, 15 Oct 2009, KAMEZAWA Hiroyuki wrote: > On Thu, 15 Oct 2009 01:56:01 +0100 (BST) > Hugh Dickins wrote: > > > This patch implements swap count continuations: when the count overflows, > > a continuation page is allocated and linked to the original vmalloc'ed > > map page, and this used to hold the continuation counts for that entry > > and its neighbours. These continuation pages are seldom referenced: > > the common paths all work on the original swap_map, only referring to > > a continuation page when the low "digit" of a count is incremented or > > decremented through SWAP_MAP_MAX. > > Hmm...maybe I don't understand the benefit of this style of data structure. I can see that what I have there is not entirely transparent! > > Do we need fine grain chain ? > Is array of "unsigned long" counter is bad ? (too big?) I'll admit that that design just happens to be what first sprang to my mind. It was only later, while implementing it, that I wondered, hey, wouldn't it be a lot simpler just to have an extension array of full counts? It seemed to me (I'm not certain) that the char arrays I was implementing were better suited to (use less memory in) a "normal" workload in which the basic swap_map counts might overflow (but I wonder how normal is any workload in which they overflow). Whereas the array of full counts would be better suited to an "aberrant" workload in which a mischievous user is actually trying to maximize those counts. I decided to carry on with the better solution for the (more) normal workload, the solution less likely to gobble up more memory there than we've used before. While I agree that the full count implementation would be simpler and more obviously correct, I thought it was still going to involve a linked list of pages (but "parallel" rather than "serial": each of the pages assigned to one range of the base page). Looking at what you propose below, maybe I'm not getting the details right, but it looks as if you're having to do an order 2 or order 3 page allocation? Attempted with GFP_ATOMIC? I'd much rather stick with order 0 pages, even if we do have to chain them to the base. (Order 3 on 64-bit? A side issue which deterred me from the full count approach, was the argumentation we'd get into over how big a full count needs to be. I think, for so long as we have atomic_t page count and page mapcount, an int is big enough for swap count. But switching them to atomic_long_t may already be overdue. Anyway, I liked how the char continuations avoided that issue.) I'm reluctant to depart from what I have, now that it's tested; but yes, we could perfectly well replace it by a different design, it is very self-contained. The demands on this code are unusually simple: it only has to manage counting up and counting down; so it is very easily tested. (The part I found difficult was getting rid of the __GFP_ZERO I was allocating with originally.) Hugh > > == > #define EXTENTION_OFFSET_INDEX(offset) (((offset) & PAGE_MASK) > #define EXTENTION_OFFSET_MASK (~(PAGE_SIZE/sizeof(long) - 1)) > struct swapcount_extention_array { > unsigned long *map[EXTEND_MAP_SIZE]; > }; > > At adding continuation. > > int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask) > { > struct page *page; > unsigned long *newmap, *map; > struct swapcount_extention_array *array; > > newmap = __get_free_page(mask); > si = swap_info_get(entry); > array = kmalloc(sizeof(swapcount_extention_array); > > .... > (If overflow) > page = vmalloc_to_page(si->swap_map + offset); > if (!PagePrivate(page)) { > page->praivate = array; > } else > kfree(array); > > index = EXTENTION_OFFSET_INDEX(offset); > pos = EXTENTION_OFFSET_MASK(offset); > > array = page->private; > if (!array->map[index]) { > array->map[index] = newmap; > } else > free_page(newmap); > map = array->map[index]; > map[pos] += 1; > mappage = vaddr_to_page(map); > get_page(mappage); # increment page->count of array. > == > > Hmm? maybe I just don't like chain... > > Regards, > -Kame -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/