Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965607AbbLOXAR (ORCPT ); Tue, 15 Dec 2015 18:00:17 -0500 Received: from terminus.zytor.com ([198.137.202.10]:49427 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752144AbbLOXAO (ORCPT ); Tue, 15 Dec 2015 18:00:14 -0500 User-Agent: K-9 Mail for Android In-Reply-To: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com> References: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Subject: Re: [RFC v1 0/8] x86/init: Linux linker tables From: "H. Peter Anvin" Date: Tue, 15 Dec 2015 14:59:07 -0800 To: "Luis R. Rodriguez" , tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, konrad.wilk@oracle.com CC: rusty@rustcorp.com.au, luto@amacapital.net, boris.ostrovsky@oracle.com, mcb30@ipxe.org, jgross@suse.com, JBeulich@suse.com, joro@8bytes.org, ryabinin.a.a@gmail.com, andreyknvl@google.com, long.wanglong@huawei.com, qiuxishi@huawei.com, aryabinin@virtuozzo.com, mchehab@osg.samsung.com, valentinrothberg@gmail.com, peter.senna@gmail.com, mcgrof@suse.com, x86@kernel.org, xen-devel@lists.xensource.com, linux-kernel@vger.kernel.org Message-ID: <75BC76AE-2D6D-40E0-917C-CF47E90156FA@zytor.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 15623 Lines: 360 On December 15, 2015 2:16:29 PM PST, "Luis R. Rodriguez" wrote: >From: "Luis R. Rodriguez" > > A long time ago in a galaxy far, > far away... > >Konrad Rzeszutek Wilk posted patches which eventually got merged to >help >with modularizing the IOMMUs we have on x86 [0]. This work was done due >to >the complex relationship that exists on IOMMUs and the requirements on >careful execution. The solution also provided a mechanism which >jettisoned >unused IOMMUs during run-time. > >During review, even though the code was merged, hpa did note that we >tend >to encounter this type of problem "often enough that we should >implement a >generic facility for it" [1], hpa acknowledged that it obviously has to >be >based on sections and even noted that perhaps we might be able in the >future to >automate its creation. He noted that the gPXE folks had done just this >with >linker tables and suggested that "presumably we'd need a few different >flavors >for init tables and so on, but this would make it a generic mechanism." > >The IOMMU code got merged and this was left on someone's mental >backburner. >I've had an itch to scratch recently to try to avoid issues which are >possible >if one does not jettison other code carefully due to the large >complexity of >implicit dependencies of certain code on x86 in particular with >possible dead >code on x86 due to paravirtualization, and the IOMMU jettison strategy >turned >out to be my favorite solution so far. I've taken on hpa's suggestions >from >back in the day to review gPXE's solution to see if we could embrace it >on >Linux for a generic section solution to help jettison code carefully. > >What this patch set does exactly: > >This RFC patch set attempts to add support for such a generic solution. >In the end, it turns out that the best solution possible was the best >of >both worlds: a combination of what Konrad had implemented in addition >to >what Michael Brown had implemented on the gXPE front. The IOMMU >solution >enables simple semantic annotations for dependency relationships, this >however requires a run time sort. The gPXE solution grants the option >to >simply sort at build time. One of gPXE's solution primary goals however >was >also to help avoid bit-rot on code that's possible from #ifdef'ery. The >Linux >linker table solution enables developers to pick and choose what they >need, with linker tables being the simplest solution. Contrary to gPXE >which strives to force compilation of all linker table solutions we >let developers pick *when* they want this as part of their solution. >As can be seen from the suggested x86 init specific use of linker >tables >proposed you can also take advantage of both, linker sorting, optional >compilation when needed (at developer's discretion), and even careful >semantics annotation for dependency / relationship annotations. >Although >the x86 init solution here is heavily inspired by the IOMMU solution it >diverges with strong semantics, and a new optional subarchitecture >annotation. Sorting of init sequences is structure specific, as such >each subsystem must defing their own solution unless semantics could >be shared. I considered sharing semantics but in the end this proved >pointless so this keeps things separate. A series of changes were made >to the x86 init sequence in contrast to the IOMMU solution to be >*extremely >pedantic* on semantics, review of this changes can be studied on the >table-init tree [2]. > >Quick review of gPXE's solution and prospects on further changes: > >In my review from gPXE's solution it was not clear what hpa meant by >gXPE folks having automated this process, they actually use linker >tables >all around, forcing compilation of *everything* and just do linking of >enabled features at link time. You still need to build linker tables on >your own. What I do see more potential for in the future is enabling to >evolve stronger semantics over time, and this would also be subsystem >specific. >This will be evident in this patch set on the x86 init use of linker >tables. >I also see potential in strenghtening semantics for linker sorting, any >of >these types of features however would impose requiring newer binutils. >For >instance, gPXE's linker solution currently relies on SORT(), that >defaults to >SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two >digits >to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric >priority. Since one is in control of order-level numbers one can >provide >guarantee that this sort should work as intended, however binutils also >now has >a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits. >SORT_BY_INIT_PRIORITY() was designed specifically for init_array >sections >though. Refer to the userspace mockup solution table-init git tree [2] >commit >6deba47ee1ad461e90 for more details on this. One thing I can envision >to help >here further are prospects for future linker enhancements, these >however must >be carefully considered in light of possible requirements of newer >binutils >and also of compatibility with other toolchains. For now we resort to >the good >'ol SORT() which even Linux has made use of for ages already. In that >regards >here, nothing new here. gPXE folks however did find some fun ICC >compatiblity >issues, and have also developed fixes for them, these were obviously >carried >over but should be reviewed carefully. Lastly, I should point out that >essentially what we're developing are different forms loose and strong >semantics -- in the most complex form what we really are after are >feature >graphs. Mauro explained to me that for media driver they needed to >build a >feature graphs, IMHO that could be a next level of strong semantics we >might >want to consider. For now though the combination of gPXE's linker-table >solution based on linker order-level sort, and our old IOMMU init >solution >and its simple dependency map (and possible extensions, see >subarchitecture) >should likely suffice as a light weight solution for where we need >semantics >for at least on the x86 init path. > >Motivation and possible enhancements: > >To understand what made me tick to work on this feel free to read the >dead >code concerns with paravirtualization posts I've made, which also go >into >and detail Xen's alternative entry point [4] [5]. That's not the end of >possibilities to help address to possible "dead code" on Linux, I have >other >ideas but I have a bit more work to do before publishing anything about >it. >If anything -- later work should likely supplement this solution >further. > >Currently the earliest I was able to make use of boot_params on x86-64 >for >linker tables was after load_idt(), so I've decided to stick with the >earliest >init call for x86 init sequences starting on >x86_64_start_reservation(). >However, if we're able to make use of boot_params earlier (help >appreciated), >in particular just boot_params.hdr.hardware_subarch we should be able >to make >clearer and careful annotations on early x86-64 init code. Using the >boot_params.hdr.hardware_subarch and requiring clear subarchitecture >support >annotations on early init code should provide a *proactive* means to >avoid issues >such as the cr4 shadow oversight [6] which later caused crashes on Xen. >The >latest similar type of issue is when KAsan was introduced, after it was >integrated I suspected KAsan would probably break if enabled on Xen, I >reported >this in March 2015 [7] and Andrey confirmed this might be the case but >since he >wanted it enabled for allyesconfig and allmodconfig he could not think >of a >clean way to disable it or address support for it then [8]. During the >development of this patch set I confirmed KAsan crashes Xen dom0 and >therefore >should obviously crash Xen PV guests as well. Using the same kernel >KAsan >still worked on KVM and bare-metal. Provided we had early access to >boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could >annotate >kasan_early_init() and friends not supported on Xen, however since we >don't >have a way to disable KAsan at run time at this time we wouldn't have >any other >option but to crash early. Because of this though we should perhaps >just consider >disabling KAsan for Xen configs and KAsan folks might just have to bite >the >bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen >is >enabled and if its easier to add support to disable KAsan at run time >than >it is to add KAsan support for Xen one alternative might be to use >linker >tables to annotate this and disable KAsan at run time for Xen. > >One of the points of the use of the x86 linker table solution and >specific >semantics there in is to *force* developers to think about requirements >on x86 >carefully. So for instance by requiring itemizing supported >subarchitectures >*early* on in development we should be able to avoid proactively issues >such as >the crash due to the cr4 shadow changes not added on the Xen entry >path, and >also missing the requirement to develop a solution for Kasan for Xen. >The >proposed x86 linker table solution is not the first to use >boot_params.hdr.hardware_subarch, its first use was actually on i386, >and for >lguest. We take advantage of it to avoid further extending pv_ops, and >*actually* to see if we could simplify pv_ops over time further. This >patch set >also includes a renaming of paravirt_enabled() to paravirt_legacy() > >An unexpected long term side goal which *might* be possible due to >linker >tables on x86 is to help unify the different C entry points for Linux >x86-64. >I've taken a stab at it after these patches [9] but it fails on Xen, >likely >because the stack is not set up right due to the different calls / >argument >requirements. If this is desirable folks could take a look and perhaps >help >on that front. > >Where to get code alternatively and testing: > >All of these patches are RFCs, if anything the only patches worth >consdering >merging sooner rather than later might be "paravirt: rename >paravirt_enabled >to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After >review >I can send those separately in patch form if agreeable. Since these are >all >just RFCs I've based these patches on top of Linus' v4.4-rc5. If you >want all >patches in one file you can get them here [10], and a respective >linux-next >version which applies on top of next-20151215 here [11]. I also have >two >git trees set up with branches for this code, one based on Linus' tree >[12] >and another based on linux-next next-20151215 [13]. I'll see what >issues >zero-day bot testing finds out. I've just run time tested this on >x86-64 >bare metal, KVM, and Xen dom0 so far. > >[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2 >[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2 >[2] https://github.com/mcgrof/table-init/ >[3] >https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3 >[4] >http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html >[5] >http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html >[6] >http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html >[7] >http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@mail.gmail.com >[8] http://lkml.kernel.org/r/54F5B3DA.70203@samsung.com >[9] >http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch >[10] >http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch >[11] >http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch >[12] >https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables >[13] >https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables > >Luis R. Rodriguez (8): > paravirt: rename paravirt_enabled to paravirt_legacy > tables.h: add linker table support > x86/boot: add BIT() to boot/bitops.h > x86/init: add linker table support > x86/init: move ebda reservations into linker table > x86/init: use linker table for i386 early setup > x86/init: user linker table for ce4100 early setup > x86/init: use linker table for mid early setup > > Documentation/kbuild/makefiles.txt | 19 ++ > arch/x86/Kconfig.debug | 47 ++++ > arch/x86/boot/bitops.h | 2 + > arch/x86/boot/boot.h | 2 +- > arch/x86/include/asm/bios_ebda.h | 2 - > arch/x86/include/asm/paravirt.h | 4 +- > arch/x86/include/asm/paravirt_types.h | 11 +- > arch/x86/include/asm/processor.h | 2 +- > arch/x86/include/asm/setup.h | 12 - > arch/x86/include/asm/x86_init.h | 1 + > arch/x86/include/asm/x86_init_fn.h | 267 ++++++++++++++++++++ > arch/x86/kernel/Makefile | 4 +- > arch/x86/kernel/apm_32.c | 2 +- > arch/x86/kernel/asm-offsets.c | 2 +- > arch/x86/kernel/cpu/intel.c | 2 +- > arch/x86/kernel/cpu/microcode/core.c | 2 +- > arch/x86/kernel/dbg-tables/Makefile | 18 ++ > arch/x86/kernel/dbg-tables/alpha.c | 10 + > arch/x86/kernel/dbg-tables/beta.c | 18 ++ > arch/x86/kernel/dbg-tables/delta.c | 10 + > arch/x86/kernel/dbg-tables/gamma.c | 18 ++ > arch/x86/kernel/dbg-tables/gamma.h | 3 + > arch/x86/kernel/{head.c => ebda.c} | 6 +- > arch/x86/kernel/head32.c | 22 +- > arch/x86/kernel/head64.c | 4 +- > arch/x86/kernel/init.c | 55 +++++ > arch/x86/kernel/kvm.c | 9 +- > arch/x86/kernel/paravirt.c | 2 +- > arch/x86/kernel/sort-init.c | 114 +++++++++ > arch/x86/kernel/tboot.c | 2 +- > arch/x86/kernel/vmlinux.lds.S | 25 ++ > arch/x86/lguest/boot.c | 2 +- > arch/x86/mm/dump_pagetables.c | 2 +- > arch/x86/platform/ce4100/ce4100.c | 4 +- > arch/x86/platform/intel-mid/intel-mid.c | 4 +- > arch/x86/tools/relocs.c | 1 + > arch/x86/xen/enlighten.c | 2 +- > drivers/pnp/pnpbios/core.c | 2 +- >include/linux/tables.h | 421 >++++++++++++++++++++++++++++++++ > scripts/Makefile.build | 4 +- > scripts/Makefile.clean | 1 + > scripts/Makefile.lib | 12 + > 42 files changed, 1091 insertions(+), 61 deletions(-) > create mode 100644 arch/x86/include/asm/x86_init_fn.h > create mode 100644 arch/x86/kernel/dbg-tables/Makefile > create mode 100644 arch/x86/kernel/dbg-tables/alpha.c > create mode 100644 arch/x86/kernel/dbg-tables/beta.c > create mode 100644 arch/x86/kernel/dbg-tables/delta.c > create mode 100644 arch/x86/kernel/dbg-tables/gamma.c > create mode 100644 arch/x86/kernel/dbg-tables/gamma.h > rename arch/x86/kernel/{head.c => ebda.c} (94%) > create mode 100644 arch/x86/kernel/init.c > create mode 100644 arch/x86/kernel/sort-init.c > create mode 100644 include/linux/tables.h By the way, please refer to iPXE as gPXE is a dead project. -- Sent from my Android device with K-9 Mail. Please excuse brevity and formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/