Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753026Ab0AaP5X (ORCPT ); Sun, 31 Jan 2010 10:57:23 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752485Ab0AaP5X (ORCPT ); Sun, 31 Jan 2010 10:57:23 -0500 Received: from mk-filter-3-a-1.mail.uk.tiscali.com ([212.74.100.54]:42744 "EHLO mk-filter-3-a-1.mail.uk.tiscali.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752364Ab0AaP5W (ORCPT ); Sun, 31 Jan 2010 10:57:22 -0500 X-Trace: 333857953/mk-filter-3.mail.uk.tiscali.com/B2C/$b2c-THROTTLED-DYNAMIC/b2c-CUSTOMER-DYNAMIC-IP/79.69.93.110/None/hugh.dickins@tiscali.co.uk X-SBRS: None X-RemoteIP: 79.69.93.110 X-IP-MAIL-FROM: hugh.dickins@tiscali.co.uk X-SMTP-AUTH: X-Originating-Country: GB/UNITED KINGDOM X-MUA: Alpine 2.00 (LSU 1167 2008-08-23) X-IP-BHB: Once X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlcBAJ03ZUtPRV1u/2dsb2JhbAAI1kGERQQ X-IronPort-AV: E=Sophos;i="4.49,378,1262563200"; d="scan'208";a="333857953" Date: Sun, 31 Jan 2010 15:57:19 +0000 (GMT) From: Hugh Dickins X-X-Sender: hugh@sister.anvils To: Jarkko Lavinen cc: Andrew Morton , Nitin Gupta , Jakub Jelinek , Karel Zak , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm: Fix nr_good_pages calculation In-Reply-To: <20100120151912.GA27747@angel.research.nokia.com> Message-ID: References: <20100120151912.GA27747@angel.research.nokia.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5477 Lines: 152 On Wed, 20 Jan 2010, Jarkko Lavinen wrote: > Swapon wastes one page of swap space to no effect. > > Mkswap stores the value 'pages - 1' into last_page field, where pages is the > partition size in pages. When nr_good_pages is calculated, last_page + 1 > should be used for the number of all the pages header included. > > Signed-off-by: Jarkko Lavinen > --- > mm/swapfile.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 6c0585b..50d90ca 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1961,7 +1961,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap; > > - nr_good_pages = swap_header->info.last_page - > + nr_good_pages = swap_header->info.last_page + 1 - > swap_header->info.nr_badpages - > 1 /* header page */; > > -- Sorry to be so slow to respond: I needed to do a little research, and find time to cool myself down near absolute zero to consider offs-by-one. You're right that there's an off-by-one there: Nitin and I also noticed it last year; but I'm afraid I have to Nack your patch, just as I did Nitin's similar patch last August. Fixing up nr_good_pages like that makes the numbers displayed look right; but you're then inconsistent with p->max, counting a page which can never get to be used. Here's what I believe is the right patch: is this something you and Nitin can give your Acks to, or do you see further issues with it? Thanks, Hugh [PATCH] mm: fix swapon size off-by-one There's an off-by-one disagreement between mkswap and swapon about the meaning of swap_header last_page: mkswap (in all versions I've looked at: util-linux-ng and BusyBox and old util-linux; probably as far back as 1999) consistently means the offset (in page units) of the last page of the swap area, whereas kernel sys_swapon (as far back as 2.2 and 2.3) strangely takes it to mean the size (in page units) of the swap area. This disagreement is the safe way round; but it's worrying people, and loses us one page of swap. The fix is not just to add one to nr_good_pages: we need to get maxpages (the size of the swap_map array) right before that; and though that is an unsigned long, be careful not to overflow the unsigned int p->max which later holds it (probably why header uses __u32 last_page instead of size). Why did we subtract one from the maximum swp_offset to calculate maxpages? Though it was probably me who made that change in 2.4.10, I don't get it: and now we should be adding one (without risk of overflow in this case). Fix the handling of swap_header badpages: it could have overrun the swap_map when very large swap area used on a more limited architecture. Remove pre-initializations of swap_header, nr_good_pages and maxpages: those date from when sys_swapon was supporting other versions of header. Reported-by: Nitin Gupta Reported-by: Jarkko Lavinen Signed-off-by: Hugh Dickins --- mm/swapfile.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) --- 2.6.33-rc6/mm/swapfile.c 2009-12-18 11:42:55.000000000 +0000 +++ linux/mm/swapfile.c 2010-01-31 13:15:58.000000000 +0000 @@ -1759,11 +1759,11 @@ SYSCALL_DEFINE2(swapon, const char __use unsigned int type; int i, prev; int error; - union swap_header *swap_header = NULL; - unsigned int nr_good_pages = 0; + union swap_header *swap_header; + unsigned int nr_good_pages; int nr_extents = 0; sector_t span; - unsigned long maxpages = 1; + unsigned long maxpages; unsigned long swapfilepages; unsigned char *swap_map = NULL; struct page *page = NULL; @@ -1922,9 +1922,13 @@ SYSCALL_DEFINE2(swapon, const char __use * swap pte. */ maxpages = swp_offset(pte_to_swp_entry( - swp_entry_to_pte(swp_entry(0, ~0UL)))) - 1; - if (maxpages > swap_header->info.last_page) - maxpages = swap_header->info.last_page; + swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; + if (maxpages > swap_header->info.last_page) { + maxpages = swap_header->info.last_page + 1; + /* p->max is an unsigned int: don't overflow it */ + if ((unsigned int)maxpages == 0) + maxpages = UINT_MAX; + } p->highest_bit = maxpages - 1; error = -EINVAL; @@ -1948,23 +1952,24 @@ SYSCALL_DEFINE2(swapon, const char __use } memset(swap_map, 0, maxpages); + nr_good_pages = maxpages - 1; /* omit header page */ + for (i = 0; i < swap_header->info.nr_badpages; i++) { - int page_nr = swap_header->info.badpages[i]; - if (page_nr <= 0 || page_nr >= swap_header->info.last_page) { + unsigned int page_nr = swap_header->info.badpages[i]; + if (page_nr == 0 || page_nr > swap_header->info.last_page) { error = -EINVAL; goto bad_swap; } - swap_map[page_nr] = SWAP_MAP_BAD; + if (page_nr < maxpages) { + swap_map[page_nr] = SWAP_MAP_BAD; + nr_good_pages--; + } } error = swap_cgroup_swapon(type, maxpages); if (error) goto bad_swap; - nr_good_pages = swap_header->info.last_page - - swap_header->info.nr_badpages - - 1 /* header page */; - if (nr_good_pages) { swap_map[0] = SWAP_MAP_BAD; p->max = maxpages; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/