Date: Tue, 5 Jan 2010 14:30:46 +0900
From: KAMEZAWA Hiroyuki
To: Linus Torvalds
Cc: Minchan Kim, Peter Zijlstra, "Paul E. McKenney", Peter Zijlstra,
    "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
    cl@linux-foundation.org, "hugh.dickins", Nick Piggin, Ingo Molnar
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()
Message-Id: <20100105143046.73938ea2.kamezawa.hiroyu@jp.fujitsu.com>
References: <20100104182429.833180340@chello.nl>
    <20100104182813.753545361@chello.nl>
    <20100105092559.1de8b613.kamezawa.hiroyu@jp.fujitsu.com>
    <28c262361001042029w4b95f226lf54a3ed6a4291a3b@mail.gmail.com>
    <20100105134357.4bfb4951.kamezawa.hiroyu@jp.fujitsu.com>
Organization: FUJITSU Co. LTD.

On Mon, 4 Jan 2010 21:10:29 -0800 (PST)
Linus Torvalds wrote:

>
>
> On Tue, 5 Jan 2010, KAMEZAWA Hiroyuki wrote:
> >
> > Then, my patch dropped speculative trial of page fault and did synchronous
> > job here. I'm still considering how to insert some barrier to delay calling
> > remove_vma() until all page fault goes. One idea was reference count but
> > it was said not-enough crazy.
>
> What lock would you use to protect the vma lookup (in order to then
> increase the refcount)? A sequence lock with RCU lookup of the vma?
>
Ah, I just used a reference counter to show "how many threads are in
page fault on this vma now". Below is from my post.

==
+			rb_node = rcu_dereference(rb_node->rb_left);
+		} else
+			rb_node = rcu_dereference(rb_node->rb_right);
+	}
+	if (vma) {
+		if ((vma->vm_start <= addr) && (addr < vma->vm_end)) {
+			if (!atomic_inc_not_zero(&vma->refcnt))
+				vma = NULL;
+		} else
+			vma = NULL;
+	}
+	rcu_read_unlock();
...
+void vma_put(struct vm_area_struct *vma)
+{
+	if ((atomic_dec_return(&vma->refcnt) == 1) &&
+	    waitqueue_active(&vma->wait_queue))
+		wake_up(&vma->wait_queue);
+	return;
+}
==

And wait for this reference count to reach zero before calling remove_vma():

==
+/* called when vma is unlinked and wait for all racy access.*/
+static void invalidate_vma_before_free(struct vm_area_struct *vma)
+{
+	atomic_dec(&vma->refcnt);
+	wait_event(vma->wait_queue, !atomic_read(&vma->refcnt));
+}
+
....
 	 * us to remove next before dropping the locks.
 	 */
 	__vma_unlink(mm, next, vma);
+	invalidate_vma_before_free(next);
 	if (file)
 		__remove_shared_vm_struct(next, file, mapping);
etc....
==

The code above is a bit heavy (and buggy). I have some fixes.

> Sounds doable. But it also sounds way more expensive than the current VM
> fault handling, which is pretty close to optimal for single-threaded
> cases.. That RCU lookup might be cheap, but just the refcount is generally
> going to be as expensive as a lock.
>
For single-threaded apps, my patch will have no benefit (but will not make
anything worse). I'll add a CONFIG option, and I wonder whether I can enable
speculative_vma_lookup only after the mm_struct is shared (but the patch may
get messy...).

> Are there some particular mappings that people care about more than
> others? If we limit the speculative lookup purely to anonymous memory,
> that might simplify the problem space?
>
I think that, for ordinary users who don't write highly optimized programs,
one small benefit of skipping mmap_sem is reducing mmap_sem ping-pong after
fork()->exec(). That ping-pong can cause some jitter in the application.
So I'd be glad if I could help file-backed vmas, too.

> [ From past experiences, I suspect DB people would be upset and really
> want it for the general file mapping case.. But maybe the main usage
> scenario is something else this time? ]
>
I'd like to hear the use cases of really heavy users, too. Christoph?

Thanks,
-Kame
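
For readers following the thread, the "count the users, then wait for them
to leave before freeing" idea in the excerpts above can be sketched in user
space. This is only a minimal sketch under stated assumptions, not the posted
patch: C11 atomics stand in for atomic_t, a pthread condition variable stands
in for the kernel wait queue, there is no RCU-protected lookup, and every name
(tracked_obj, obj_tryget, obj_put, obj_invalidate_and_wait) is hypothetical.
The sketch wakes the waiter when the count reaches zero, which is a choice
made here for the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vma: one reference is held while the
 * object is linked, plus one per thread currently using it. */
struct tracked_obj {
	atomic_int refcnt;
	pthread_mutex_t lock;      /* protects sleeping on 'drained' */
	pthread_cond_t drained;    /* signalled when refcnt reaches zero */
};

static void obj_init(struct tracked_obj *o)
{
	atomic_init(&o->refcnt, 1);    /* the "linked" reference */
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->drained, NULL);
}

/* Take a reference unless the count already dropped to zero
 * (the same contract as the kernel's atomic_inc_not_zero()). */
static bool obj_tryget(struct tracked_obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded, so zero is re-checked. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one wakes the waiter. */
static void obj_put(struct tracked_obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		pthread_mutex_lock(&o->lock);
		pthread_cond_broadcast(&o->drained);
		pthread_mutex_unlock(&o->lock);
	}
}

/* Called after the object is unlinked: drop the "linked" reference,
 * then sleep until every remaining user has called obj_put().
 * Caveat: the last obj_put() may still be inside the mutex when this
 * returns, so real code must not free the object immediately. */
static void obj_invalidate_and_wait(struct tracked_obj *o)
{
	obj_put(o);
	pthread_mutex_lock(&o->lock);
	while (atomic_load(&o->refcnt) != 0)
		pthread_cond_wait(&o->drained, &o->lock);
	pthread_mutex_unlock(&o->lock);
}
```

A fault-like path in this sketch would call obj_tryget(), do its work, and
call obj_put(); the unmap-like path would call obj_invalidate_and_wait()
before tearing the object down. Once the count has reached zero,
obj_tryget() fails, so no new user can slip in.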