Date: Fri, 18 Dec 2015 00:46:25 +0100
From: "Luis R. Rodriguez" <mcgrof@suse.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>, tglx@linutronix.de,
        mingo@redhat.com, bp@alien8.de, konrad.wilk@oracle.com,
        rusty@rustcorp.com.au, luto@amacapital.net, boris.ostrovsky@oracle.com,
        mcb30@ipxe.org, jgross@suse.com, JBeulich@suse.com, joro@8bytes.org,
        ryabinin.a.a@gmail.com, andreyknvl@google.com,
        long.wanglong@huawei.com, qiuxishi@huawei.com, aryabinin@virtuozzo.com,
        mchehab@osg.samsung.com, valentinrothberg@gmail.com,
        peter.senna@gmail.com, x86@kernel.org, Michal Marek <mmarek@suse.com>,
        xen-devel@lists.xensource.com, Michael Matz <matz@suse.de>,
        linux-kernel@vger.kernel.org
Subject: Re: [RFC v1 0/8] x86/init: Linux linker tables
Message-ID: <20151217234625.GM20409@wotan.suse.de>
References: <1450217797-19295-1-git-send-email-mcgrof@do-not-panic.com>
 <56731D32.4040900@zytor.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56731D32.4040900@zytor.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 9018
Lines: 182

On Thu, Dec 17, 2015 at 12:38:10PM -0800, H. Peter Anvin wrote:
> I think we can make this even more generic.  In particular, I would love
> to see a solution for link tables that:
> 
> a) can be used for any kind of data structures, not just function
> pointers (the latter is a specialization of the former);

Sure, I can give this a shot. I'll sync back to the userspace
table-init.git tree what I had in this series and then build
first there a solution to make it agnostic to the use case.

> b) doesn't need any changes to the linker scripts once the initial
> enabling is done for any one architecture.

I'm glad you brought this up. I considered this as well, and sadly I
forgot to document the resolution. So here it is:

iPXE uses *one* entry on their linker script, that's it. All else is
just C code. That works well as all code in iPXE is happy to work
with one linker script, and the goal and use case for the linker
tables are *all* for initialization. iPXE also does *not* make use
of any further semantics for dependency annotations.

For Linux we have different requirements and goals, and we also wanted to take
advantage of other possible semantics when desirable. Consider the proposed use
case of a linker table with struct x86_init_fn, for its init sequences. We
need at the very least ***one linker table entry* to catch and peg these
sections, just as iPXE has. If we wanted to we could live with that but for
Linux on x86 that means vmlinux.lds.S, that's already architecture specific.
Do we want a more generic place ? If so can anyone suggest one ?

So to be clear, the solution in place suggested here already enables this, but
its current form does use vmlinux.lds.S. Although the current patch made this
change:

        .tbl            : {                                                     
                __tbl_x86_start_init_fns = .;                                   
                *(SORT(.tbl.x86_init_fns.*))                                    
                __tbl_x86_end_init_fns = .;                                     
                                                                                
                /* ... etc ... */                                               
                                                                                
                /* You would add other tables here as needed */                 
        }  

This was on purpose, a generic form could have simply been used as well:


    __tbl = .;
    *(SORT(.tbl.*))
    __tbl_end = .;


I made this change on the mockup table-init through two commits here:

https://github.com/mcgrof/table-init/commit/1a330c1dadbf7fa255eadcbaf214452f7e9ea29c
https://github.com/mcgrof/table-init/commit/ad11b4010ab2a5ffae7be39cce0ed4d052fe18fd

I explain why I do that there but the gist of it is that on Linux we may also
want stronger semantics for specific linker table solutions, and solutions such
as those devised on the IOMMU init stuff do memmove() for sorting depending on
semantics defined (in the simplest case here so far dependency between init
sequences), this makes each set of sequences very subsystem specific. An issue
with *one* subsystem could make things really bad for others. I thought about
this quite a bit and figured its best left to the subsystem maintainers to
decide.

So we have the freedom to do we wish still -- its just we need to understand
the implications of using something like a run time sort, and also when we want
to have the option to use different sized set of routines for different sections.
Linker sorting should fortunately give us the freedom to sort without concern
or regard to size of the structs / data used, but *iff* different sized data
is used in that section then one has to be careful that the sort routine be
very subsystem specific. So as I see it we'd need linker script annotations
for start and end on subsystems that want to use run time sort. This would
compartmentalize their sorting and bugs. If run time sorting is not needed
a catch-it-all script section as the above simple one should catch it.

To try to avoid issues between these two types of cases we could just document
the requirement. Now, it'd nice if we had a way to provide the section start/end
through C code, and not modify anything on linker scripts even if you had run
time sort requirements, I just can't think of a way to do that right now. Ideas
welcomed. Maybe some Kconfig magic to custom linker script section ? Then the
main linker script can just include this generated linker script section ?

> Key to this is to be able to define tables by name only, which is really
> why SORT_BY_NAME() is used: the name sorts before the priority simply by
> putting the name before the class.

Right, tbl.init_fns.01 for example would be one section if your scruct
was .init_fns and 01 for the first "priority". That still puts a requirement
if you want to use a run time sort so you can identify the start/end of
the range.

> Instead of .tbl.* naming of sections I think we should have the first
> component be the type of section (.rodata, .data, .init_rodata,
> .read_mostly etc.) which makes it easier to write a linker script that
> properly sorts it into the right section.

Sure, we can peg that as an argument to the __table(), so perhaps:

Perhaps a new sections.h file (you tell me) which documents the different
section components:

/* document this *really* well */
#define SECTION_RODATA		".rodata"
#define SECTION_INIT		".init"
#define SECTION_INIT_RODATA	".init_rodata"
#define SECTION_READ_MOSTLY	".read_mostly"

Then on tables.h we add the section components support:

#define __table(component, type, name) (component, type, name) 

#define __table_component(table) __table_extract_component table                          
#define __table_extract_component(component, type, name) component

#define __table_type(table) __table_extract_type table                          
#define __table_extract_type(component, type, name) type

#define __table_name(table) __table_extract_name table                          
#define __table_extract_name(component, type, name) name 

#define __table_str(x) #x 

#define __table_section(table, idx) \                                           
        "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx)                      

#define __table_entry(table, idx)                                       \       
        __attribute__ ((__section__(__table_section(table, idx)),       \       
                        __aligned__(__table_alignment(table))))

A user could then be something as follows:

#define X86_INIT_FNS __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns") 
#define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)

If that's what you mean?

I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
if we're happy to do that perhaps a simple first step might be to see if 0-day
but would be happy with just the sort without any consequences to any
architecture. Thoughts?

> The other thing is to take a
> clue from the implementation in iPXE, which uses priority levels 00 and
> 99 (or we could use non-integers which sort appropriately instead of
> using "real" levels) to contain the start and end symbols, which
> eliminates any need for linker script modifications to add new tables.

This solution uses that as well. The only need for adding custom sections
is when they have a requirement for a custom run time sort, and also to
ensure they don't cause regressions on other subsystems if they have a buggy
sort. The run time sorting is all subsystem specific and up to their own
semantics.

> Making this a generic facility we could eventually eliminate a bunch of
> ad hoc hacks we currently have.

Towards the end of my development I did look into seeing if I could use this
sort of stuff also for the other section sorting hacks we have in place, I ran
into an issue with the type stuff, if we make it agnostic to data structure,
then perhaps indeed it might be possible to generalize this even to those
scary cob-web corner infested places in the kernel.

> Oh, and the link table feature should NOT be x86-specific.

As far as I can tell tables.h is not architecture specific at this point, the
way arch/x86/include/asm/x86_init_fn.h uses it is, but that's fine. Even with
the above changes it would still be architecture agnostic. If there's some
discrepancies on how some architecture linkers work -- now that's something
to consider now. Its also why I've been mocking this up in userspace, we
should be able to test that much easier than a full blown kernel.

Now would be a good time to get linker folks involved and eyeballing all this.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/