Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965530AbbLOWQq (ORCPT ); Tue, 15 Dec 2015 17:16:46 -0500 Received: from mail-pf0-f173.google.com ([209.85.192.173]:36538 "EHLO mail-pf0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932820AbbLOWQn (ORCPT ); Tue, 15 Dec 2015 17:16:43 -0500 From: "Luis R. Rodriguez" To: hpa@zytor.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, konrad.wilk@oracle.com Cc: rusty@rustcorp.com.au, luto@amacapital.net, boris.ostrovsky@oracle.com, mcb30@ipxe.org, jgross@suse.com, JBeulich@suse.com, joro@8bytes.org, ryabinin.a.a@gmail.com, andreyknvl@google.com, long.wanglong@huawei.com, qiuxishi@huawei.com, aryabinin@virtuozzo.com, mchehab@osg.samsung.com, valentinrothberg@gmail.com, peter.senna@gmail.com, mcgrof@suse.com, x86@kernel.org, xen-devel@lists.xensource.com, linux-kernel@vger.kernel.org Subject: [RFC v1 0/8] x86/init: Linux linker tables Date: Tue, 15 Dec 2015 14:16:29 -0800 Message-Id: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com> X-Mailer: git-send-email 2.6.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 15050 Lines: 250 From: "Luis R. Rodriguez" A long time ago in a galaxy far, far away... Konrad Rzeszutek Wilk posted patches which eventually got merged to help with modularizing the IOMMUs we have on x86 [0]. This work was done due to the complex relationship that exists on IOMMUs and the requirements on careful execution. The solution also provided a mechanism which jettisoned unused IOMMUs during run-time. During review, even though the code was merged, hpa did note that we tend to encounter this type of problem "often enough that we should implement a generic facility for it" [1], hpa acknowledged that it obviously has to be based on sections and even noted that perhaps we might be able in the future to automate its creation. He noted that the gPXE folks had done just this with linker tables and suggested that "presumably we'd need a few different flavors for init tables and so on, but this would make it a generic mechanism." The IOMMU code got merged and this was left on someone's mental backburner. I've had an itch to scratch recently to try to avoid issues which are possible if one does not jettison other code carefully due to the large complexity of implicit dependencies of certain code on x86 in particular with possible dead code on x86 due to paravirtualization, and the IOMMU jettison strategy turned out to be my favorite solution so far. I've taken on hpa's suggestions from back in the day to review gPXE's solution to see if we could embrace it on Linux for a generic section solution to help jettison code carefully. What this patch set does exactly: This RFC patch set attempts to add support for such a generic solution. In the end, it turns out that the best solution possible was the best of both worlds: a combination of what Konrad had implemented in addition to what Michael Brown had implemented on the gXPE front. The IOMMU solution enables simple semantic annotations for dependency relationships, this however requires a run time sort. The gPXE solution grants the option to simply sort at build time. One of gPXE's solution primary goals however was also to help avoid bit-rot on code that's possible from #ifdef'ery. The Linux linker table solution enables developers to pick and choose what they need, with linker tables being the simplest solution. Contrary to gPXE which strives to force compilation of all linker table solutions we let developers pick *when* they want this as part of their solution. As can be seen from the suggested x86 init specific use of linker tables proposed you can also take advantage of both, linker sorting, optional compilation when needed (at developer's discretion), and even careful semantics annotation for dependency / relationship annotations. Although the x86 init solution here is heavily inspired by the IOMMU solution it diverges with strong semantics, and a new optional subarchitecture annotation. Sorting of init sequences is structure specific, as such each subsystem must defing their own solution unless semantics could be shared. I considered sharing semantics but in the end this proved pointless so this keeps things separate. A series of changes were made to the x86 init sequence in contrast to the IOMMU solution to be *extremely pedantic* on semantics, review of this changes can be studied on the table-init tree [2]. Quick review of gPXE's solution and prospects on further changes: In my review from gPXE's solution it was not clear what hpa meant by gXPE folks having automated this process, they actually use linker tables all around, forcing compilation of *everything* and just do linking of enabled features at link time. You still need to build linker tables on your own. What I do see more potential for in the future is enabling to evolve stronger semantics over time, and this would also be subsystem specific. This will be evident in this patch set on the x86 init use of linker tables. I also see potential in strenghtening semantics for linker sorting, any of these types of features however would impose requiring newer binutils. For instance, gPXE's linker solution currently relies on SORT(), that defaults to SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two digits to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric priority. Since one is in control of order-level numbers one can provide guarantee that this sort should work as intended, however binutils also now has a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits. SORT_BY_INIT_PRIORITY() was designed specifically for init_array sections though. Refer to the userspace mockup solution table-init git tree [2] commit 6deba47ee1ad461e90 for more details on this. One thing I can envision to help here further are prospects for future linker enhancements, these however must be carefully considered in light of possible requirements of newer binutils and also of compatibility with other toolchains. For now we resort to the good 'ol SORT() which even Linux has made use of for ages already. In that regards here, nothing new here. gPXE folks however did find some fun ICC compatiblity issues, and have also developed fixes for them, these were obviously carried over but should be reviewed carefully. Lastly, I should point out that essentially what we're developing are different forms loose and strong semantics -- in the most complex form what we really are after are feature graphs. Mauro explained to me that for media driver they needed to build a feature graphs, IMHO that could be a next level of strong semantics we might want to consider. For now though the combination of gPXE's linker-table solution based on linker order-level sort, and our old IOMMU init solution and its simple dependency map (and possible extensions, see subarchitecture) should likely suffice as a light weight solution for where we need semantics for at least on the x86 init path. Motivation and possible enhancements: To understand what made me tick to work on this feel free to read the dead code concerns with paravirtualization posts I've made, which also go into and detail Xen's alternative entry point [4] [5]. That's not the end of possibilities to help address to possible "dead code" on Linux, I have other ideas but I have a bit more work to do before publishing anything about it. If anything -- later work should likely supplement this solution further. Currently the earliest I was able to make use of boot_params on x86-64 for linker tables was after load_idt(), so I've decided to stick with the earliest init call for x86 init sequences starting on x86_64_start_reservation(). However, if we're able to make use of boot_params earlier (help appreciated), in particular just boot_params.hdr.hardware_subarch we should be able to make clearer and careful annotations on early x86-64 init code. Using the boot_params.hdr.hardware_subarch and requiring clear subarchitecture support annotations on early init code should provide a *proactive* means to avoid issues such as the cr4 shadow oversight [6] which later caused crashes on Xen. The latest similar type of issue is when KAsan was introduced, after it was integrated I suspected KAsan would probably break if enabled on Xen, I reported this in March 2015 [7] and Andrey confirmed this might be the case but since he wanted it enabled for allyesconfig and allmodconfig he could not think of a clean way to disable it or address support for it then [8]. During the development of this patch set I confirmed KAsan crashes Xen dom0 and therefore should obviously crash Xen PV guests as well. Using the same kernel KAsan still worked on KVM and bare-metal. Provided we had early access to boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could annotate kasan_early_init() and friends not supported on Xen, however since we don't have a way to disable KAsan at run time at this time we wouldn't have any other option but to crash early. Because of this though we should perhaps just consider disabling KAsan for Xen configs and KAsan folks might just have to bite the bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen is enabled and if its easier to add support to disable KAsan at run time than it is to add KAsan support for Xen one alternative might be to use linker tables to annotate this and disable KAsan at run time for Xen. One of the points of the use of the x86 linker table solution and specific semantics there in is to *force* developers to think about requirements on x86 carefully. So for instance by requiring itemizing supported subarchitectures *early* on in development we should be able to avoid proactively issues such as the crash due to the cr4 shadow changes not added on the Xen entry path, and also missing the requirement to develop a solution for Kasan for Xen. The proposed x86 linker table solution is not the first to use boot_params.hdr.hardware_subarch, its first use was actually on i386, and for lguest. We take advantage of it to avoid further extending pv_ops, and *actually* to see if we could simplify pv_ops over time further. This patch set also includes a renaming of paravirt_enabled() to paravirt_legacy() An unexpected long term side goal which *might* be possible due to linker tables on x86 is to help unify the different C entry points for Linux x86-64. I've taken a stab at it after these patches [9] but it fails on Xen, likely because the stack is not set up right due to the different calls / argument requirements. If this is desirable folks could take a look and perhaps help on that front. Where to get code alternatively and testing: All of these patches are RFCs, if anything the only patches worth consdering merging sooner rather than later might be "paravirt: rename paravirt_enabled to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After review I can send those separately in patch form if agreeable. Since these are all just RFCs I've based these patches on top of Linus' v4.4-rc5. If you want all patches in one file you can get them here [10], and a respective linux-next version which applies on top of next-20151215 here [11]. I also have two git trees set up with branches for this code, one based on Linus' tree [12] and another based on linux-next next-20151215 [13]. I'll see what issues zero-day bot testing finds out. I've just run time tested this on x86-64 bare metal, KVM, and Xen dom0 so far. [0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2 [1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2 [2] https://github.com/mcgrof/table-init/ [3] https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3 [4] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html [5] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html [6] http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html [7] http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@mail.gmail.com [8] http://lkml.kernel.org/r/54F5B3DA.70203@samsung.com [9] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch [10] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch [11] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch [12] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables [13] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables Luis R. Rodriguez (8): paravirt: rename paravirt_enabled to paravirt_legacy tables.h: add linker table support x86/boot: add BIT() to boot/bitops.h x86/init: add linker table support x86/init: move ebda reservations into linker table x86/init: use linker table for i386 early setup x86/init: user linker table for ce4100 early setup x86/init: use linker table for mid early setup Documentation/kbuild/makefiles.txt | 19 ++ arch/x86/Kconfig.debug | 47 ++++ arch/x86/boot/bitops.h | 2 + arch/x86/boot/boot.h | 2 +- arch/x86/include/asm/bios_ebda.h | 2 - arch/x86/include/asm/paravirt.h | 4 +- arch/x86/include/asm/paravirt_types.h | 11 +- arch/x86/include/asm/processor.h | 2 +- arch/x86/include/asm/setup.h | 12 - arch/x86/include/asm/x86_init.h | 1 + arch/x86/include/asm/x86_init_fn.h | 267 ++++++++++++++++++++ arch/x86/kernel/Makefile | 4 +- arch/x86/kernel/apm_32.c | 2 +- arch/x86/kernel/asm-offsets.c | 2 +- arch/x86/kernel/cpu/intel.c | 2 +- arch/x86/kernel/cpu/microcode/core.c | 2 +- arch/x86/kernel/dbg-tables/Makefile | 18 ++ arch/x86/kernel/dbg-tables/alpha.c | 10 + arch/x86/kernel/dbg-tables/beta.c | 18 ++ arch/x86/kernel/dbg-tables/delta.c | 10 + arch/x86/kernel/dbg-tables/gamma.c | 18 ++ arch/x86/kernel/dbg-tables/gamma.h | 3 + arch/x86/kernel/{head.c => ebda.c} | 6 +- arch/x86/kernel/head32.c | 22 +- arch/x86/kernel/head64.c | 4 +- arch/x86/kernel/init.c | 55 +++++ arch/x86/kernel/kvm.c | 9 +- arch/x86/kernel/paravirt.c | 2 +- arch/x86/kernel/sort-init.c | 114 +++++++++ arch/x86/kernel/tboot.c | 2 +- arch/x86/kernel/vmlinux.lds.S | 25 ++ arch/x86/lguest/boot.c | 2 +- arch/x86/mm/dump_pagetables.c | 2 +- arch/x86/platform/ce4100/ce4100.c | 4 +- arch/x86/platform/intel-mid/intel-mid.c | 4 +- arch/x86/tools/relocs.c | 1 + arch/x86/xen/enlighten.c | 2 +- drivers/pnp/pnpbios/core.c | 2 +- include/linux/tables.h | 421 ++++++++++++++++++++++++++++++++ scripts/Makefile.build | 4 +- scripts/Makefile.clean | 1 + scripts/Makefile.lib | 12 + 42 files changed, 1091 insertions(+), 61 deletions(-) create mode 100644 arch/x86/include/asm/x86_init_fn.h create mode 100644 arch/x86/kernel/dbg-tables/Makefile create mode 100644 arch/x86/kernel/dbg-tables/alpha.c create mode 100644 arch/x86/kernel/dbg-tables/beta.c create mode 100644 arch/x86/kernel/dbg-tables/delta.c create mode 100644 arch/x86/kernel/dbg-tables/gamma.c create mode 100644 arch/x86/kernel/dbg-tables/gamma.h rename arch/x86/kernel/{head.c => ebda.c} (94%) create mode 100644 arch/x86/kernel/init.c create mode 100644 arch/x86/kernel/sort-init.c create mode 100644 include/linux/tables.h -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/