Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754629AbaLDSKR (ORCPT ); Thu, 4 Dec 2014 13:10:17 -0500 Received: from mail-lb0-f174.google.com ([209.85.217.174]:51477 "EHLO mail-lb0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751651AbaLDSKP (ORCPT ); Thu, 4 Dec 2014 13:10:15 -0500 MIME-Version: 1.0 In-Reply-To: <65CD3FC07F3BF942ABE211646D72D770356ED34A@IRSMSX110.ger.corp.intel.com> References: <65CD3FC07F3BF942ABE211646D72D770356EACA5@IRSMSX110.ger.corp.intel.com> <1417099205-13309-1-git-send-email-emmanuel.berthier@intel.com> <65CD3FC07F3BF942ABE211646D72D770356EB2B2@IRSMSX110.ger.corp.intel.com> <65CD3FC07F3BF942ABE211646D72D770356EC658@IRSMSX110.ger.corp.intel.com> <65CD3FC07F3BF942ABE211646D72D770356ECDFD@IRSMSX110.ger.corp.intel.com> <65CD3FC07F3BF942ABE211646D72D770356ED34A@IRSMSX110.ger.corp.intel.com> From: Andy Lutomirski Date: Thu, 4 Dec 2014 10:09:53 -0800 Message-ID: Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception To: "Berthier, Emmanuel" Cc: Thomas Gleixner , "H. Peter Anvin" , X86 ML , "Jarzmik, Robert" , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 4, 2014 at 8:01 AM, Berthier, Emmanuel wrote: >> From: Andy Lutomirski [mailto:luto@amacapital.net] >> Sent: Wednesday, December 3, 2014 8:30 PM >> To: Berthier, Emmanuel >> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML >> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception >> > The final patch will bypass the new code in case of UserSpace page fault, so >> performance impact will be very low. >> > LBRs copy takes much more time than LBR stop/start. >> > >> > The simple is the better: >> > >> > .macro STOP_LBR >> > #ifdef CONFIG_LBR_DUMP_ON_EXCEPTION >> > testl $3,CS(%rsp) /* Kernel Space? */ >> > jnz 1f >> > testl $3, PER_CPU_VAR(lbr_dump_state) /* Disabled? */ >> > jnz 1f >> >> But that just wasted two of your LBR slots. > > No: false test does not generate Branch record, ex: > > Last Branch Records: > to: [] page_fault+0x0/0x90 > from: [] sysrq_handle_crash+0x16/0x20 > to: [] sysrq_handle_crash+0x0/0x20 > from: [] __handle_sysrq+0x9c/0x170 > to: [] __handle_sysrq+0x92/0x170 > >> > push %rax >> > push %rcx >> > push %rdx >> > movl $MSR_IA32_DEBUGCTLMSR, %ecx >> > rdmsr >> > and $~1, %eax /* Disable LBR recording */ >> > wrmsr >> > pop %rdx >> > pop %rcx >> > pop %rax >> >> And the general problem with this approach (even ignoring the performance >> hit, and kernel faults on user addresses really do happen in real workloads) is >> that you're not saving and restoring MSR_IA32_DEBUGCTL. >> It may be that >> the rest of your patch does whatever magic is needed to make this work, but >> from just this code it's not at all obvious that this is correct. > > The algorithm is quite simple: > When I enter in Exception handler, I stop LBR recording, and dump its content later if needed. > When I leave Exception Handler, I restart LBR recording. > So, after the first exception, LBR in On. > In case of nested Exceptions and crash, you're right, LBR will probably not be relevant. > But your proposal does not solve this issue: If we save registers during 1rst exception, and then overwrite them during 2nd level, > we will lose relevant info if crash is due to the 1rst exception. > >> Hence my suggestion for rdmsr -- if you're willing to enable this and take the >> performance hit, you can simplify it a lot and save some branch slots by >> unconditionally doing the rdmsrs if you've enabled the LBR tracing IDT entry. >> The simplification from using rdmsr isn't that the save code is simplified -- it's >> that there's no state change on exception entry, so you don't need to worry >> about restoring state correctly on the way out or during a context switch. >> And you can enable/disable the whole thing just by writing to the IDT, so >> there's no performance hit at all in the disabled case. > > Concerning performances: if it's really matter, the better is to disable the CONFIG switch. > But if we enable it, it's for using it I guess, and in that case, bypassing UserSpace page faults is better. > You're proposal of "unconditionally doing the rdmsrs" is not good in that case. > The only small gain is when CONFIG is enable and feature is disabled by cmdline: > - with my proposal, we get 1 test and 1 jmp more (if I switch Kernel test with LBR state test): for an exception treatment, does it really matter? > > We can mix our proposals: keep my STOP/START code, and replace the dynamic disabling test by IDT change. > I hope the code will stay readable. > Do we really want to save 2 instructions? I don't really care about the number of instructions. But there are still all the nasty cases: - Context switch during exception processing (both in the C handler and in the retint code). - PMI during exception processing. - Exception while perf is poking at LBR msrs. Where are you planning on saving the start/stop previous state? --Andy > > Thanks, > > Emmanuel. -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/