Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754494AbbLQXq3 (ORCPT ); Thu, 17 Dec 2015 18:46:29 -0500 Received: from mx2.suse.de ([195.135.220.15]:49684 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751815AbbLQXq1 (ORCPT ); Thu, 17 Dec 2015 18:46:27 -0500 Date: Fri, 18 Dec 2015 00:46:25 +0100 From: "Luis R. Rodriguez" To: "H. Peter Anvin" Cc: "Luis R. Rodriguez" , tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, konrad.wilk@oracle.com, rusty@rustcorp.com.au, luto@amacapital.net, boris.ostrovsky@oracle.com, mcb30@ipxe.org, jgross@suse.com, JBeulich@suse.com, joro@8bytes.org, ryabinin.a.a@gmail.com, andreyknvl@google.com, long.wanglong@huawei.com, qiuxishi@huawei.com, aryabinin@virtuozzo.com, mchehab@osg.samsung.com, valentinrothberg@gmail.com, peter.senna@gmail.com, x86@kernel.org, Michal Marek , xen-devel@lists.xensource.com, Michael Matz , linux-kernel@vger.kernel.org Subject: Re: [RFC v1 0/8] x86/init: Linux linker tables Message-ID: <20151217234625.GM20409@wotan.suse.de> References: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com> <56731D32.4040900@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56731D32.4040900@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9018 Lines: 182 On Thu, Dec 17, 2015 at 12:38:10PM -0800, H. Peter Anvin wrote: > I think we can make this even more generic. In particular, I would love > to see a solution for link tables that: > > a) can be used for any kind of data structures, not just function > pointers (the latter is a specialization of the former); Sure, I can give this a shot. I'll sync back to the userspace table-init.git tree what I had in this series and then build first there a solution to make it agnostic to the use case. > b) doesn't need any changes to the linker scripts once the initial > enabling is done for any one architecture. I'm glad you brought this up. I considered this as well, and sadly I forgot to document the resolution. So here it is: iPXE uses *one* entry on their linker script, that's it. All else is just C code. That works well as all code in iPXE is happy to work with one linker script, and the goal and use case for the linker tables are *all* for initialization. iPXE also does *not* make use of any further semantics for dependency annotations. For Linux we have different requirements and goals, and we also wanted to take advantage of other possible semantics when desirable. Consider the proposed use case of a linker table with struct x86_init_fn, for its init sequences. We need at the very least ***one linker table entry* to catch and peg these sections, just as iPXE has. If we wanted to we could live with that but for Linux on x86 that means vmlinux.lds.S, that's already architecture specific. Do we want a more generic place ? If so can anyone suggest one ? So to be clear, the solution in place suggested here already enables this, but its current form does use vmlinux.lds.S. Although the current patch made this change: .tbl : { __tbl_x86_start_init_fns = .; *(SORT(.tbl.x86_init_fns.*)) __tbl_x86_end_init_fns = .; /* ... etc ... */ /* You would add other tables here as needed */ } This was on purpose, a generic form could have simply been used as well: __tbl = .; *(SORT(.tbl.*)) __tbl_end = .; I made this change on the mockup table-init through two commits here: https://github.com/mcgrof/table-init/commit/1a330c1dadbf7fa255eadcbaf214452f7e9ea29c https://github.com/mcgrof/table-init/commit/ad11b4010ab2a5ffae7be39cce0ed4d052fe18fd I explain why I do that there but the gist of it is that on Linux we may also want stronger semantics for specific linker table solutions, and solutions such as those devised on the IOMMU init stuff do memmove() for sorting depending on semantics defined (in the simplest case here so far dependency between init sequences), this makes each set of sequences very subsystem specific. An issue with *one* subsystem could make things really bad for others. I thought about this quite a bit and figured its best left to the subsystem maintainers to decide. So we have the freedom to do we wish still -- its just we need to understand the implications of using something like a run time sort, and also when we want to have the option to use different sized set of routines for different sections. Linker sorting should fortunately give us the freedom to sort without concern or regard to size of the structs / data used, but *iff* different sized data is used in that section then one has to be careful that the sort routine be very subsystem specific. So as I see it we'd need linker script annotations for start and end on subsystems that want to use run time sort. This would compartmentalize their sorting and bugs. If run time sorting is not needed a catch-it-all script section as the above simple one should catch it. To try to avoid issues between these two types of cases we could just document the requirement. Now, it'd nice if we had a way to provide the section start/end through C code, and not modify anything on linker scripts even if you had run time sort requirements, I just can't think of a way to do that right now. Ideas welcomed. Maybe some Kconfig magic to custom linker script section ? Then the main linker script can just include this generated linker script section ? > Key to this is to be able to define tables by name only, which is really > why SORT_BY_NAME() is used: the name sorts before the priority simply by > putting the name before the class. Right, tbl.init_fns.01 for example would be one section if your scruct was .init_fns and 01 for the first "priority". That still puts a requirement if you want to use a run time sort so you can identify the start/end of the range. > Instead of .tbl.* naming of sections I think we should have the first > component be the type of section (.rodata, .data, .init_rodata, > .read_mostly etc.) which makes it easier to write a linker script that > properly sorts it into the right section. Sure, we can peg that as an argument to the __table(), so perhaps: Perhaps a new sections.h file (you tell me) which documents the different section components: /* document this *really* well */ #define SECTION_RODATA ".rodata" #define SECTION_INIT ".init" #define SECTION_INIT_RODATA ".init_rodata" #define SECTION_READ_MOSTLY ".read_mostly" Then on tables.h we add the section components support: #define __table(component, type, name) (component, type, name) #define __table_component(table) __table_extract_component table #define __table_extract_component(component, type, name) component #define __table_type(table) __table_extract_type table #define __table_extract_type(component, type, name) type #define __table_name(table) __table_extract_name table #define __table_extract_name(component, type, name) name #define __table_str(x) #x #define __table_section(table, idx) \ "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx) #define __table_entry(table, idx) \ __attribute__ ((__section__(__table_section(table, idx)), \ __aligned__(__table_alignment(table)))) A user could then be something as follows: #define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns") #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level) If that's what you mean? I'm a bit wary about having the linker sort any of the above SECTION_*'s, but if we're happy to do that perhaps a simple first step might be to see if 0-day but would be happy with just the sort without any consequences to any architecture. Thoughts? > The other thing is to take a > clue from the implementation in iPXE, which uses priority levels 00 and > 99 (or we could use non-integers which sort appropriately instead of > using "real" levels) to contain the start and end symbols, which > eliminates any need for linker script modifications to add new tables. This solution uses that as well. The only need for adding custom sections is when they have a requirement for a custom run time sort, and also to ensure they don't cause regressions on other subsystems if they have a buggy sort. The run time sorting is all subsystem specific and up to their own semantics. > Making this a generic facility we could eventually eliminate a bunch of > ad hoc hacks we currently have. Towards the end of my development I did look into seeing if I could use this sort of stuff also for the other section sorting hacks we have in place, I ran into an issue with the type stuff, if we make it agnostic to data structure, then perhaps indeed it might be possible to generalize this even to those scary cob-web corner infested places in the kernel. > Oh, and the link table feature should NOT be x86-specific. As far as I can tell tables.h is not architecture specific at this point, the way arch/x86/include/asm/x86_init_fn.h uses it is, but that's fine. Even with the above changes it would still be architecture agnostic. If there's some discrepancies on how some architecture linkers work -- now that's something to consider now. Its also why I've been mocking this up in userspace, we should be able to test that much easier than a full blown kernel. Now would be a good time to get linker folks involved and eyeballing all this. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/