Subject: Re: [RFC v1 0/8] x86/init: Linux linker tables
From: "H. Peter Anvin"
To: "Luis R. Rodriguez"
Cc: "Luis R. Rodriguez", tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    konrad.wilk@oracle.com, rusty@rustcorp.com.au, luto@amacapital.net,
    boris.ostrovsky@oracle.com, mcb30@ipxe.org, jgross@suse.com,
    JBeulich@suse.com, joro@8bytes.org, ryabinin.a.a@gmail.com,
    andreyknvl@google.com, long.wanglong@huawei.com, qiuxishi@huawei.com,
    aryabinin@virtuozzo.com, mchehab@osg.samsung.com,
    valentinrothberg@gmail.com, peter.senna@gmail.com, x86@kernel.org,
    Michal Marek, xen-devel@lists.xensource.com, Michael Matz,
    linux-kernel@vger.kernel.org
Date: Thu, 17 Dec 2015 20:25:19 -0800
Message-ID: <56738AAF.2080601@zytor.com>
In-Reply-To: <20151217234625.GM20409@wotan.suse.de>
References: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com>
    <56731D32.4040900@zytor.com> <20151217234625.GM20409@wotan.suse.de>

On 12/17/15 15:46, Luis R.
Rodriguez wrote:
>
> I explain why I do that there but the gist of it is that on Linux we may also
> want stronger semantics for specific linker table solutions, and solutions such
> as those devised on the IOMMU init stuff do memmove() for sorting depending on
> semantics defined (in the simplest case here so far dependency between init
> sequences), this makes each set of sequences very subsystem specific. An issue
> with *one* subsystem could make things really bad for others. I thought about
> this quite a bit and figured it's best left to the subsystem maintainers to
> decide.
>

A table that needs sorting or other runtime handling is just a
read-write table for the purpose of the linker table construct.  It
presents to C as an array of initialized data.

> Perhaps a new sections.h file (you tell me) which documents the different
> section components:
>
> /* document this *really* well */
> #define SECTION_RODATA			".rodata"
> #define SECTION_INIT			".init"
> #define SECTION_INIT_RODATA		".init_rodata"
> #define SECTION_READ_MOSTLY		".read_mostly"
>
> Then on tables.h we add the section components support:

Yes, something like that.  How to macroize it cleanly is another matter;
we may want to use slightly different conventions than iPXE to match our
own codebase.

> #define __table(component, type, name) (component, type, name)
>
> #define __table_component(table) __table_extract_component table
> #define __table_extract_component(component, type, name) component
>
> #define __table_type(table) __table_extract_type table
> #define __table_extract_type(component, type, name) type
>
> #define __table_name(table) __table_extract_name table
> #define __table_extract_name(component, type, name) name
>
> #define __table_str(x) #x
>
> #define __table_section(table, idx) \
> 	"." __table_component (table) ".tbl." __table_name (table) "." \
> 	__table_str (idx)
>
> #define __table_entry(table, idx) \
> 	__attribute__ ((__section__(__table_section(table, idx)), \
> 			__aligned__(__table_alignment(table))))
>
> A user could then be something as follows:
>
> #define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns")
> #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)

Yes, but in particular the common case of function initialization tables
should be generic.

I'm kind of thinking a syntax like this:

DECLARE_LINKTABLE_RO(struct foo, tablename);
DEFINE_LINKTABLE_RO(struct foo, tablename);
LINKTABLE_RO(tablename,level) = /* contents */;
LINKTABLE_SIZE(tablename)

... which would turn into something like this once it goes through all
the preprocessing phases

/* DECLARE_LINKTABLE_RO */
extern const struct foo tablename[], tablename__end[];

/* DEFINE_LINKTABLE_RO */
DECLARE_LINKTABLE_RO(struct foo, tablename);

const struct foo __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];

const struct foo __attribute__((used,section(".rodata.tbl.tablename.999"))) tablename__end[0];

/* LINKTABLE_RO */
static const __typeof__(tablename)
__attribute__((used,section(".rodata.tbl.tablename.50")))
__tbl_tablename_12345

/* LINKTABLE_SIZE */
((tablename__end) - (tablename))

... and so on for all the possible sections where we may want tables.

Note: I used 0 and 999 above since they sort before and after all
possible 2-digit decimal numbers, but that's just cosmetic.

> If that's what you mean?
>
> I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
> if we're happy to do that perhaps a simple first step might be to see if 0-day
> bot would be happy with just the sort without any consequences to any
> architecture. Thoughts?

I don't see what is dangerous about it.  The section names are such that
a lexicographical sort will do the right thing, and we can simply use
SORT(.rodata.tbl.*) in the linker script, for example.
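
To make the ordering argument concrete, here is a minimal sketch that
sorts the section names from the example above. It assumes the linker's
SORT() orders input sections by comparing their names byte-wise, which
is modeled here with strcmp(). The helpers cmp_section_names() and
sort_section_names() are hypothetical names made up for illustration;
they are not part of the kernel, iPXE, or the linker.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: qsort() comparator over an array of C strings.
 * Assumption: a name-based SORT() in the linker script behaves like a
 * byte-wise strcmp() on the section names.
 */
int cmp_section_names(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/*
 * Hypothetical helper: sort an array of section names in place, in the
 * order the linker is assumed to emit the matching input sections.
 */
void sort_section_names(const char **names, size_t count)
{
	assert(names != NULL || count == 0);
	qsort(names, count, sizeof(names[0]), cmp_section_names);
}
```

Under that assumption, ".rodata.tbl.tablename.0" comes out first and
".rodata.tbl.tablename.999" last, with every 2-digit level in between.
A single-digit level such as 5 would sort after 10, so levels would
need to be zero-padded to two digits for this to work.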
>> The other thing is to take a
>> clue from the implementation in iPXE, which uses priority levels 00 and
>> 99 (or we could use non-integers which sort appropriately instead of
>> using "real" levels) to contain the start and end symbols, which
>> eliminates any need for linker script modifications to add new tables.
>
> This solution uses that as well. The only need for adding custom sections
> is when they have a requirement for a custom run time sort, and also to
> ensure they don't cause regressions on other subsystems if they have a buggy
> sort. The run time sorting is all subsystem specific and up to their own
> semantics.

Again, from a linker table POV this is nothing other than a read-write
table; there is a runtime function that then operates on that read-write
table.

	-hpa