Date: Wed, 27 Jan 2010 16:35:55 +0530
From: Ananth N Mavinakayanahalli
To: Peter Zijlstra
Cc: Linus Torvalds, Stephen Rothwell, Kyle Moffett, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Oleg Nesterov, Steven Rostedt, LKML, Tom Tromey, "Frank Ch. Eigler", linux-next@vger.kernel.org, "H. Peter Anvin", utrace-devel@redhat.com, Thomas Gleixner, avi@redhat.com
Subject: Re: linux-next: add utrace tree
Message-ID: <20100127110555.GB1842@in.ibm.com>
Reply-To: ananth@in.ibm.com
References: <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop>
In-Reply-To: <1264589716.4283.2006.camel@laptop>

On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote:
> On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
> >
> > On Wed, 27 Jan 2010, Peter Zijlstra wrote:
> > >
> > > Right, so you're going to love uprobes, which does exactly that. The
> > > current proposal is overwriting the target instruction with an INT3 and
> > > injecting an extra vma into the target process's address space
> > > containing the original instruction(s) and possible jumps back to the
> > > old code stream.
> >
> > Just out of interest, how does it handle the threading issue?
> >
> > Last I saw, at least some CPU people were _very_ nervous about overwriting
> > instructions if another CPU might be just about to execute them.
> >
> > Even the "overwrite only the first byte with 'int3'" made them go "umm, I
> > need to talk to some core CPU people to see if that's ok". They mumble
> > about possible CPU errata, I$ coherency, instruction retry etc.
> >
> > I realize kprobes does this very thing, but kprobes is esoteric stuff and
> > doesn't have much choice. In user space, you _could_ do the modification
> > on a different physical page and then just switch the page table entry
> > instead, and not get into the whole D$/I$ coherency thing at all.
>
> Right, so there's two aspects:
>
>  1) concurrency when inserting the probe
>  2) concurrency when hitting the probe
>
> 1) used to be dealt with by using utrace to stop all threads in the
> process and then writing the instruction. I suggested to CoW the page,
> modify the instruction, set the pagetable and flush tlbs at full speed
> -- the very thing you suggest here.
>
> 2) so traditionally (and the intel arch manual describes this) is to
> replace the instruction, single step it, and write the probe back. This
> is racy for multi-threading. The current uprobes stuff solves this by
> doing single-step-out-of-line (XOL).
>
> XOL injects a new vma into the target process and puts the old
> instruction there, then it single steps on the new location, leaving the
> original site with INT3.
>
> This doesn't work for things like RIP relative instructions, so uprobes
> considers them un-probable.

Probing RIP-relative instructions works just fine; there are fixups that
take care of it.

> Also, I myself really object to inserting a vma in a running process,
> its like a land-lord, sure he has the key but he won't come in an poke
> through your things.
>
> The alternative is to place the instruction in TLS or stack space, since
> each thread can only have a single trap at a time, you only need space
> for 1 instruction (plus a possible jump out to the original site). There
> is the 'problem' of marking the TLS/stack executable when being probed.
>
> Then there is the whole emulation angle, the uprobes people basically
> say its too much effort to write a x86 emulator.

We don't need to write one. I don't know how easy it is to make the kvm
emulator less kvm-centric (vcpus, kvm_context, etc). Avi?

Ananth
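
The reply above says only that fixups handle RIP-relative instructions; it
does not describe them. As a rough illustration of why such an instruction
needs a fixup when it is single-stepped out of line, here is a minimal
sketch of one conceivable approach: re-biasing the 32-bit displacement so
the copy in the XOL slot still addresses the original target. The helper
name, its parameters, and the approach itself are assumptions made for this
sketch, not the uprobes code, and it assumes a little-endian host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): adjust the disp32 of a
 * RIP-relative instruction that has been copied from orig_addr to
 * xol_addr so that it still refers to the same absolute target.
 *
 *   target = orig_addr + insn_len + disp
 *          = xol_addr  + insn_len + new_disp
 *   =>  new_disp = disp + (orig_addr - xol_addr)
 *
 * Returns false, leaving insn untouched, if the new displacement does
 * not fit in 32 bits (the slot is too far from the original site).
 */
static bool xol_rebias_riprel(uint8_t *insn, size_t insn_len, size_t disp_off,
                              uint64_t orig_addr, uint64_t xol_addr)
{
    int32_t disp;
    int64_t new_disp;

    /* The 4-byte displacement must lie inside the instruction. */
    assert(disp_off + sizeof(disp) <= insn_len);

    /* Read the displacement as stored (little-endian, as on x86). */
    memcpy(&disp, insn + disp_off, sizeof(disp));

    new_disp = (int64_t)disp + ((int64_t)orig_addr - (int64_t)xol_addr);
    if (new_disp < INT32_MIN || new_disp > INT32_MAX)
        return false;

    disp = (int32_t)new_disp;
    memcpy(insn + disp_off, &disp, sizeof(disp));
    return true;
}
```

For example, `mov 0x10(%rip),%rax` (bytes 48 8b 05 10 00 00 00, displacement
at offset 3) copied from 0x400000 to a slot at 0x400100 would get its
displacement changed from 0x10 to -0xf0. The range check matters because an
XOL slot far away from the probed text cannot be reached with a 32-bit
displacement, in which case some other fixup would be needed.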