Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261531AbVBIJJO (ORCPT ); Wed, 9 Feb 2005 04:09:14 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261450AbVBIJJN (ORCPT ); Wed, 9 Feb 2005 04:09:13 -0500 Received: from sv1.valinux.co.jp ([210.128.90.2]:41417 "EHLO sv1.valinux.co.jp") by vger.kernel.org with ESMTP id S261779AbVBIJIZ (ORCPT ); Wed, 9 Feb 2005 04:08:25 -0500 Date: Wed, 09 Feb 2005 18:08:23 +0900 From: IWAMOTO Toshihiro To: Hugh Dickins Cc: IWAMOTO Toshihiro , linux-kernel@vger.kernel.org, lhms-devel@lists.sourceforge.net Subject: Re: [RFC] Changing COW detection to be memory hotplug friendly In-Reply-To: References: <20050203035605.C981A7046E@sv1.valinux.co.jp> User-Agent: Wanderlust/2.11.30 (Wonderwall) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/21.3 (i386-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Message-Id: <20050209090824.F084470468@sv1.valinux.co.jp> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5411 Lines: 136 At Mon, 7 Feb 2005 21:24:59 +0000 (GMT), Hugh Dickins wrote: > > On Thu, 3 Feb 2005, IWAMOTO Toshihiro wrote: > > The current implementation of memory hotremoval relies on that pages > > can be unmapped from process spaces. After successful unmapping, > > subsequent accesses to the pages are blocked and don't interfere > > the hotremoval operation. > > > > However, this code > > > > if (PageSwapCache(page) && > > page_count(page) != page_mapcount(page) + 2) { > > ret = SWAP_FAIL; > > goto out_unmap; > > } > > Yes, that is odd code. It would be nice to have a solution without it. > > > in try_to_unmap_one() prevents unmapping pages that are referenced via > > get_user_pages(), and such references can be held for a long time if > > they are due to such as direct IO. > > I've made a test program that issues multiple direct IO read requests > > against a single read buffer, and pages that belong to the buffer > > cannot be hotremoved because they aren't unmapped. > > I haven't looked at the rest of your hotremoval, so it's not obvious > to me how a change here would help you - obviously you wouldn't want > to be migrating pages while direct IO to them was in progress. > > I presume your patch works for you by letting the page count fall > to a point where migration moves it automatically as soon as the > got_user_pages are put, where without your patch the count is held > too high, and you keep doing scans which tend to miss the window > in which those pages are put? Yes. And my test program has no such time window because IO requests are issued without waiting for completion of older requests. I think issuing IO requests in such a manner is nonsense, but a misbehaving process shouldn't be able to prevent memory hotremoval. If the hotremoval code can unmap a page from a process space, the process will be blocked when it causes a page fault against the page. So, a process cannot continuously issue direct IO requests to keep page counts high. (get_user_pages() causes a (simulated) page fault.) > > - The can_share_swap_page() call in do_swap_page() always returns > > false. It is inefficient but should be harmless. Incrementing > > page_mapcount before calling that function should fix the problem, > > but it may cause bad side effects. > > Odd that your patch moves it if it now doesn't even work! > But I think some more movement should be able to solve that. Moving swap_free() is necessary to avoid can_share_swap_page() returning true when it shouldn't. And, write access cases are handled later with do_wp_page() call, anyway. > > - Another obvious solution to this issue is to find the "offending" > > process from a un-unmappable page and suspend it until the page is > > unmapped. I'm afraid the implementation would be much more complicated. > > Agreed, let's not get into that. Nice to hear that. > > --- mm/memory.c.orig 2005-01-17 14:47:11.000000000 +0900 > > +++ mm/memory.c 2005-01-17 14:55:51.000000000 +0900 > > @@ -1786,10 +1786,6 @@ static int do_swap_page(struct mm_struct > > } > > > > /* The page isn't present yet, go ahead with the fault. */ > > - > > - swap_free(entry); > > - if (vm_swap_full()) > > - remove_exclusive_swap_page(page); > > > > mm->rss++; > > acct_update_integrals(); > > @@ -1800,6 +1796,10 @@ static int do_swap_page(struct mm_struct > > pte = maybe_mkwrite(pte_mkdirty(pte), vma); > > write_access = 0; > > } > > + > > + swap_free(entry); > > + if (vm_swap_full()) > > + remove_exclusive_swap_page(page); > > unlock_page(page); > > > > flush_icache_page(vma, page); > > --- mm/rmap.c.orig 2005-01-17 14:40:08.000000000 +0900 > > +++ mm/rmap.c 2005-01-21 12:34:06.000000000 +0900 > > @@ -569,8 +569,11 @@ static int try_to_unmap_one(struct page > > */ > > if (PageSwapCache(page) && > > page_count(page) != page_mapcount(page) + 2) { > > - ret = SWAP_FAIL; > > - goto out_unmap; > > + if (page_mapcount(page) > 1) { /* happens when COW is in progress */ > > + ret = SWAP_FAIL; > > + goto out_unmap; > > + } > > + printk("unmapping GUPed page\n"); > > } > > > > /* Nuke the page table entry. */ > > I'm disappointed to see you making this more complicated, it's > even harder to understand than before. I believe that if we're > going to make good use of page_mapcount in can_share_swap_page, > it should be possible to delete this block from try_to_unmap_one > altogether. Or did you try that, and find it goes wrong? I just wanted to be conservative to get a working patch. I think this block can be deleted. > > --- mm/swapfile.c.orig 2005-01-17 14:47:11.000000000 +0900 > > +++ mm/swapfile.c 2005-01-31 16:59:11.000000000 +0900 > This can_share_swap_page looks messier than I'd want. > > I was hoping to append my own patch to this response, but I haven't > got it working right yet (swap seems too full). I hope tomorrow, > but thought I'd better not delay these comments any longer. I saw your patch in the other mail, and it looks better and should fix my problem. I'll try and report. -- IWAMOTO Toshihiro - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/