Date: Thu, 5 Mar 2009 18:01:36 +0100
From: Andreas Herrmann
To: Ingo Molnar
CC: Jaswinder Singh Rajput, "H. Peter Anvin", x86 maintainers, LKML
Subject: Re: [git-pull -tip] x86: msr architecture debug code
Message-ID: <20090305170136.GD7347@alberich.amd.com>
References: <1236008575.3332.2.camel@localhost.localdomain>
 <20090302205437.GB14471@elte.hu>
 <20090305135444.GB7347@alberich.amd.com>
 <20090305140809.GA27962@elte.hu>
In-Reply-To: <20090305140809.GA27962@elte.hu>

On Thu, Mar 05, 2009 at 03:08:09PM +0100, Ingo Molnar wrote:
>
> * Andreas Herrmann wrote:
>
> > Having this stuff in the kernel unnecessarily bloats up kernel code.
>
> it should be a default-off Kconfig option and it is in debugfs
> so there's no real bloat issue here.

I attached parts of an autogenerated file which contains MSR
definitions for AMD family 10h in a condensed format. I stripped
some lines -- the full file has 487 lines and is about 30k.

Would you really like to have similar stuff for all x86 CPUs in-kernel?

> > What the kernel needs to provide is a reliable interface to
> > access MSRs -- to pass the data to userspace. This interface
> > is already there.
> >
> > IMHO all kind of parsing and grouping of that data belongs in
> > user space.
> >
> > One exception are MSRs that need to be checked early during
> > boot (e.g. MTRRs). For debugging purposes you might want to
> > dump certain MSRs early. But then you will use printk and not
> > debugfs.
>
> Well it's really nice to know the _kernel's_ enumeration of MSRs
> and its knowledge about the structure of those MSRs.
>
> Sure, we can and do export the flat MSR space to user-space, but
> the kernel also enumerates them internally, in various places.
> The debugfs interface shows them in one way - and as such also
> acts as a central force to keep these things tidy.
>
> a VFS namespace is also pretty educative. You can see which MSRs
> matter to the lapic for example, you can see their symbolic
> names, their current state, etc. etc.
>
> > > Maybe a symlink pointing it back to the topic directory
> > > would be useful as well. For example:
> > >
> > >  /debug/x86/cpu/msr/raw/0x372/topic_dir -> /debug/x86/cpu/msr/pmu/pmc_0/
> > >
> > > Other "topic directories" are possible too: a
> > > /debug/x86/cpu/msr/apic/ layout would be very useful and
> > > informative as well, and so are some of the other MSRs we
> > > tweak during bootup.
> >
> > All nice suggestions but why in-kernel?
> >
> > Just hack some script to do this. This is much more
> > maintainable. You don't need a kernel update to add support
> > for new CPUs or to fix bugs in this code itself -- you just
> > have to tweak your script.
>
> the kernel tends to know a lot about these MSRs already so we
> just provide that information in a more structured form as well.
>
> Such more structured form, beyond the debugging and
> education/development advantages, also acts as a counter-force
> back to the MSR enumeration code of the kernel and makes them
> more structured. It will no doubt also extend the kernel's
> knowledge of MSRs - read-only MSRs we dont normally read.

If we don't read them, we don't need them -- in kernel code.

Knowledge of MSRs is usually required by specific code, drivers or
subsystems. I think we should only add MSR information if it is
needed for real kernel functionality. Some examples are

 - MCA MSRs for mce
 - Pstate and FIDVID MSRs for powernow-k8
 - MTRRs for cpu/mtrr code

We don't have interfaces for PCI devices that show all their config
space register values in decoded form. The kernel provides the
interface to retrieve that information from userspace, and usually
you call lspci to decode some standard information and to dump all
the rest.

For MSRs we have an interface, too. What is missing is a standard
tool to do the decoding. (As a start you can use lsmsr.)

> There's also a few other things like the IRR readout in the APIC
> code or the perfcounters status dump can also be done cleanly
> via /debug/x86/cpu/msr/.
>
> Eventually i'd like /debug/x86/ to become a full CPU state dump:
> the kernel pagetable dumping code could go there, we could show
> control registers, we could show the GDT and IDT settings and
> contents, etc. etc.

Yes, we could do a lot in the kernel. But should we?

I agree that dumping and decoding MSRs (and also CPU config space
registers for AMD CPUs) is sometimes needed for debugging.
But doing all of this in-kernel -- I think that's not cool.


Regards,

Andreas

--
/*
 * Licensed under the terms of the GNU GENERAL PUBLIC LICENSE version 2.
 * See file COPYING for details.
 */

#ifndef fam10h_h
#define fam10h_h

#include "../msr.h"

_RANGE(fam10h_LSMCAaddr,48,16,0);
_NAMES(fam10h_LSMCAaddr,"ADDR",0);
_RANGE(fam10h_LSMCAstatus,16,4,25,1,1,8,2,1,1,1,1,1,1,1,0);
_NAMES(fam10h_LSMCAstatus,"ErrorCode","ErrorCodeExt",0,"UECC","CECC","SYND",0,"PCC","ADDRV","MISCV","EN","UC","OVER","VAL");
_RANGE(fam10h_TSC,64,0);
_NAMES(fam10h_TSC,"TSC");
_RANGE(fam10h_APIC_BASE,8,1,2,1,36,16,0);
_NAMES(fam10h_APIC_BASE,0,"BSC",0,"ApicEn","ApicBar",0);
_RANGE(fam10h_EBL_CR_POWERON,16,2,46,0);
_NAMES(fam10h_EBL_CR_POWERON,0,"ClusterID",0);
_RANGE(fam10h_PATCH_LEVEL,32,32,0);
_NAMES(fam10h_PATCH_LEVEL,"PATCH_LEVEL",0);
_RANGE(fam10h_MTRRcap,8,1,1,1,53,0);
_NAMES(fam10h_MTRRcap,"MtrrCapVCnt","MtrrCapFix",0,"MtrrCapWc",0);
_RANGE(fam10h_SYSENTER_CS,16,48,0);
_NAMES(fam10h_SYSENTER_CS,"SYSENTER_CS",0);
_RANGE(fam10h_SYSENTER_ESP,32,32,0);
_NAMES(fam10h_SYSENTER_ESP,"SYSENTER_ESP",0);
_RANGE(fam10h_SYSENTER_EIP,32,32,0);
_NAMES(fam10h_SYSENTER_EIP,"SYSENTER_EIP",0);
_RANGE(fam10h_MCG_CAP,8,1,55,0);
_NAMES(fam10h_MCG_CAP,"Count","MCG_CTL_P",0);
_RANGE(fam10h_MCG_STAT,1,1,1,61,0);
_NAMES(fam10h_MCG_STAT,"RIPV","EIPV","MCIP",0);
_RANGE(fam10h_MCG_CTL,1,1,1,1,1,1,58,0);
_NAMES(fam10h_MCG_CTL,"DCE","ICE","BUE","LSE","NBE","FRE",0);
_RANGE(fam10h_DBG_CTL_MSR,1,1,1,1,1,1,58,0);
_NAMES(fam10h_DBG_CTL_MSR,"LBR","BTF","PB0","PB1","PB2","PB3",0);
_RANGE(fam10h_BR_FROM,64,0);
_NAMES(fam10h_BR_FROM,"LastBranchFromIP");
...
_RANGE(fam10h_MC5_CTL,1,63,0);
_NAMES(fam10h_MC5_CTL,"CPUWDT",0);
_RANGE(fam10h_MC5_STATUS,16,4,4,8,8,1,4,1,1,8,2,1,1,1,1,1,1,1,0);
_NAMES(fam10h_MC5_STATUS,"ErrorCode","ErrorCodeExt",0,"Syndrome",0,"Scrub",0,"UECC","CECC","Syndrome",0,"PCC","AddrV","MiscV","En","UC","OVER","VAL");
_RANGE(fam10h_MC5_ADDR,48,16,0);
_NAMES(fam10h_MC5_ADDR,"ADDR",0);
_RANGE(fam10h_MC5_MISC,12,52,0);
_NAMES(fam10h_MC5_MISC,"State",0);
_RANGE(fam10h_EFER,1,7,1,1,1,1,1,1,1,49,0);
_NAMES(fam10h_EFER,"SYSCALL",0,"LME",0,"LMA","NXE","SVME","LMSLE","FFXSE",0);
_RANGE(fam10h_STAR,32,16,16,0);
_NAMES(fam10h_STAR,"Target","SysCallSel","SysRetSel");
_RANGE(fam10h_STAR64,64,0);
_NAMES(fam10h_STAR64,"LSTAR");
_RANGE(fam10h_STARCOMPAT,64,0);
_NAMES(fam10h_STARCOMPAT,"CSTAR");
_RANGE(fam10h_SYSCALL_FLAG_MASK,32,32,0);
_NAMES(fam10h_SYSCALL_FLAG_MASK,"MASK",0);
_RANGE(fam10h_FS_BASE,64,0);
_NAMES(fam10h_FS_BASE,"FS_BASE");
_RANGE(fam10h_GS_BASE,64,0);
_NAMES(fam10h_GS_BASE,"GS_BASE");
_RANGE(fam10h_KernelGSbase,64,0);
_NAMES(fam10h_KernelGSbase,"KernelGSBase");
_RANGE(fam10h_TSC_AUX,32,32,0);
_NAMES(fam10h_TSC_AUX,"TscAux",0);
_RANGE(fam10h_MC4_MISC1,24,8,12,4,1,2,1,4,5,1,1,1,0);
_NAMES(fam10h_MC4_MISC1,0,"BlkPtr","ErrCnt",0,"Ovrflw","IntType","CntEn","LvtOffset",0,"Locked","CntP","Valid");
_RANGE(fam10h_MC4_MISC2,24,8,12,4,1,2,1,4,5,1,1,1,0);
_NAMES(fam10h_MC4_MISC2,0,"BlkPtr","ErrCnt",0,"Ovrflw","IntType","CntEn","LvtOffset",0,"Locked","CntP","Valid");
_RANGE(fam10h_MC4_MISC3,24,8,32,0);
_NAMES(fam10h_MC4_MISC3,0,"BlkPtr",0);
_RANGE(fam10h_PERF_CTL0,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL0,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL1,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL1,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL2,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL2,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL3,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL3,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTR0,48,16,0);
_NAMES(fam10h_PERF_CTR0,"CTR",0);
...
_RANGE(fam10h_IbsFetchCtl,16,16,16,1,1,1,1,1,2,1,1,1,6,0);
_NAMES(fam10h_IbsFetchCtl,"IbsFetchMaxCnt","IbsFetchCnt","IbsFetchLat","IbsFetchEn","IbsFetchVal","IbsFetchComp","IbsIcMiss","IbsPhyAddrValid","IbsL1TlbPgSz","IbsL1TlbMiss","IbsL2TlbMiss","IbsRandEn",0);
_RANGE(fam10h_IbsFetchLinAd,64,0);
_NAMES(fam10h_IbsFetchLinAd,"IbsFetchLinAd");
_RANGE(fam10h_IbsFetchPhysAd,64,0);
_NAMES(fam10h_IbsFetchPhysAd,"IbsFetchPhysAd");
_RANGE(fam10h_IbsOpCtl,16,1,1,1,45,0);
_NAMES(fam10h_IbsOpCtl,"IbsOpMaxCnt",0,"IbsOpEn","IbsOpVal",0);
_RANGE(fam10h_IbsOpRip,64,0);
_NAMES(fam10h_IbsOpRip,"IbsOpRip");
_RANGE(fam10h_IbsOpData,16,16,1,1,1,1,1,1,26,0);
_NAMES(fam10h_IbsOpData,"IbsCompToRetCtr","IbsTagToRetCtr","IbsOpBrnResync","IbsOpMispReturn","IbsOpReturn","IbsOpBrnTaken","IbsOpBrnMisp","IbsOpBrnRet",0);
_RANGE(fam10h_IbsOpData2,3,1,1,1,58,0);
_NAMES(fam10h_IbsOpData2,"NbIbsReqSrc",0,"NbIbsReqDstProc","NbIbsReqCacheHitSt",0);
_RANGE(fam10h_IbsOpData3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,13,16,16,0);
_NAMES(fam10h_IbsOpData3,"IbsLdOp","IbsStOp","IbsDcL1tlbMiss","IbsDcL2tlbMiss","IbsDcL1tlbHit2M","IbsDcL1tlbHit1G","IbsDcL2tlbHit2M","IbsDcMiss","IbsDcMisAcc","IbsDcLdBnkCon","IbsDcStBnkCon","IbsDcStToLdFwd","IbsDcStToLdCan","IbsDcUcMemAcc","IbsDcWcMemAcc","IbsDcLockedOp","IbsDcMabHit","IbsDcLinAddrValid","IbsDcPhyAddrValid",0,"IbsDcMissLat",0);
_RANGE(fam10h_IbsDcLinAd,64,0);
_NAMES(fam10h_IbsDcLinAd,"IbsDcLinAd");
_RANGE(fam10h_IbsDcPhysAd,64,0);
_NAMES(fam10h_IbsDcPhysAd,"IbsDcPhysAd");
_RANGE(fam10h_IbsControl,4,4,1,55,0);
_NAMES(fam10h_IbsControl,"LvtOffset",0,"LvtOffsetVal",0);

struct reg_spec fam10h_spec [] = {
        _SPEC(0x0000, LSMCAaddr, "load-store MCA address", fam10h_),
        _SPEC(0x0001, LSMCAstatus, "load-store MCE status", fam10h_),
        _SPEC(0x0010, TSC, "time-stamp counter", fam10h_),
        _SPEC(0x001b, APIC_BASE, "APIC base address", fam10h_),
        _SPEC(0x002a, EBL_CR_POWERON, "cluster ID", fam10h_),
        _SPEC(0x008b, PATCH_LEVEL, "microcode patch level", fam10h_),
        _SPEC(0x00fe, MTRRcap, "MTRR capabilities", fam10h_),
        _SPEC(0x0174, SYSENTER_CS, "SYSENTER/SYSEXIT code segment selector", fam10h_),
        _SPEC(0x0175, SYSENTER_ESP, "SYSENTER/SYSEXIT stack pointer", fam10h_),
        _SPEC(0x0176, SYSENTER_EIP, "SYSENTER/SYSEXIT instruction pointer", fam10h_),
        _SPEC(0x0179, MCG_CAP, "global MC capabilities", fam10h_),
        _SPEC(0x017a, MCG_STAT, "global MC status", fam10h_),
        _SPEC(0x017b, MCG_CTL, "global MC control", fam10h_),
        _SPEC(0x01d9, DBG_CTL_MSR, "debug control", fam10h_),
        _SPEC(0x01db, BR_FROM, "last branch from IP", fam10h_),
        _SPEC(0x01dc, BR_TO, "last branch to IP", fam10h_),
        _SPEC(0x01dd, LastExceptionFromIP, "last exception from IP", fam10h_),
        _SPEC(0x01de, LastExceptionToIP, "last exception to IP", fam10h_),
        _SPEC(0x0200, MTRRphysBase0, "base of variable-size MTRR (0)", fam10h_),
        _SPEC(0x0201, MTRRphysMask0, "mask of variable-size MTRR (0)", fam10h_),
...
        _SPEC(0xc0011023, BU_CFG, "bus unit configuration", fam10h_),
        _SPEC(0xc001102A, BU_CFG2, "bus unit configuration 2", fam10h_),
        _SPEC(0xc0011030, IbsFetchCtl, "IBS fetch control", fam10h_),
        _SPEC(0xc0011031, IbsFetchLinAd, "IBS fetch linear address", fam10h_),
        _SPEC(0xc0011032, IbsFetchPhysAd, "IBS fetch physical address", fam10h_),
        _SPEC(0xc0011033, IbsOpCtl, "IBS execution control", fam10h_),
        _SPEC(0xc0011034, IbsOpRip, "IBS Op logical address", fam10h_),
        _SPEC(0xc0011035, IbsOpData, "IBS Op data", fam10h_),
        _SPEC(0xc0011036, IbsOpData2, "IBS Op data 2", fam10h_),
        _SPEC(0xc0011037, IbsOpData3, "IBS Op data 3", fam10h_),
        _SPEC(0xc0011038, IbsDcLinAd, "IBS DC linear address", fam10h_),
        _SPEC(0xc0011039, IbsDcPhysAd, "IBS DC physical address", fam10h_),
        _SPEC(0xc001103a, IbsControl, "IBS control", fam10h_),
        {0, NULL, NULL, NULL, NULL},
};

#endif /* fam10h_h */
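
For illustration only, here is a rough sketch of how a userspace tool might
decode a raw MSR value with tables like the ones in the attached header. The
_RANGE/_NAMES/_SPEC macros live in "../msr.h", which is not shown in this
mail, so the sketch does not use them. The struct msr_layout type and the
msr_get() and msr_print() helpers are hypothetical names invented for this
example. It assumes that the width list starts at bit 0, is terminated by 0,
and that a 0 (NULL) name marks a reserved field; the APIC_BASE and EFER
entries above are consistent with that reading. The raw value itself would
come from the existing MSR interface (for example /dev/cpu/N/msr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout table: field widths from bit 0 upward, terminated
 * by 0; names[i] is NULL for a reserved field. */
struct msr_layout {
        const char *reg;
        const unsigned char *widths;
        const char *const *names;
};

/* Widths and names copied from the fam10h_EFER and fam10h_APIC_BASE lines. */
static const unsigned char efer_widths[] = {1, 7, 1, 1, 1, 1, 1, 1, 1, 49, 0};
static const char *const efer_names[] = {
        "SYSCALL", NULL, "LME", NULL, "LMA", "NXE", "SVME", "LMSLE", "FFXSE", NULL
};
const struct msr_layout efer_layout = {"EFER", efer_widths, efer_names};

static const unsigned char apic_base_widths[] = {8, 1, 2, 1, 36, 16, 0};
static const char *const apic_base_names[] = {
        NULL, "BSC", NULL, "ApicEn", "ApicBar", NULL
};
const struct msr_layout apic_base_layout = {
        "APIC_BASE", apic_base_widths, apic_base_names
};

/* Mask with the low 'width' bits set; avoids an undefined 64-bit shift. */
static uint64_t field_mask(unsigned width)
{
        return width >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* Look up one named field; returns 0 on success, -1 if the name is unknown. */
int msr_get(const struct msr_layout *l, uint64_t value,
            const char *name, uint64_t *out)
{
        unsigned shift = 0;

        assert(l && name && out);
        for (size_t i = 0; l->widths[i] != 0 && shift < 64; i++) {
                if (l->names[i] && strcmp(l->names[i], name) == 0) {
                        *out = (value >> shift) & field_mask(l->widths[i]);
                        return 0;
                }
                shift += l->widths[i];
        }
        return -1;
}

/* Print every named (non-reserved) field with its bit range. */
void msr_print(const struct msr_layout *l, uint64_t value, FILE *fp)
{
        unsigned shift = 0;

        for (size_t i = 0; l->widths[i] != 0 && shift < 64; i++) {
                unsigned w = l->widths[i];

                if (l->names[i])
                        fprintf(fp, "%s.%s[%u:%u] = 0x%llx\n", l->reg,
                                l->names[i], shift + w - 1, shift,
                                (unsigned long long)((value >> shift) & field_mask(w)));
                shift += w;
        }
}
```

Calling msr_print(&efer_layout, value, stdout) with a value read from
userspace would list the named EFER fields one per line. Supporting a new CPU
family would then only mean adding tables to the tool, not updating the kernel.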