Date: Wed, 30 Nov 2016 18:38:16 +0100
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Nicholas Piggin <npiggin@gmail.com>, "H. Peter Anvin" <hpa@zytor.com>,
        Michael Matz <matz@suse.de>, Arnd Bergmann <arnd@arndb.de>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
        Guenter Roeck <linux@roeck-us.net>,
        Masami Hiramatsu <mhiramat@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Fengguang Wu <fengguang.wu@intel.com>,
        Adrian Hunter <adrian.hunter@intel.com>,
        David Ahern <dsahern@gmail.com>, Jiri Olsa <jolsa@kernel.org>,
        Namhyung Kim <namhyung@kernel.org>, Wang Nan <wangnan0@huawei.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Borislav Petkov <bp@alien8.de>, Joerg Roedel <joro@8bytes.org>
Subject: Re: linker-tables v5 testing
Message-ID: <20161130173816.GP1402@wotan.suse.de>
References: <CAB=NE6Whw=q-7waoqE7PmJZ+ZVGN9QqsKg26819gvStfPk_+8A@mail.gmail.com>
 <f9e3abc8-fc5f-df73-4ddf-6f7a7b4a9112@roeck-us.net>
 <20161130013349.GN1402@wotan.suse.de>
 <20161130140947.3ea27d9e@roar.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161130140947.3ea27d9e@roar.ozlabs.ibm.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 8194
Lines: 172

On Wed, Nov 30, 2016 at 02:09:47PM +1100, Nicholas Piggin wrote:
> On Wed, 30 Nov 2016 02:33:49 +0100
> "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> 
> > On Thu, Nov 24, 2016 at 08:18:40AM -0800, Guenter Roeck wrote:
> > > Hi Luis,
> > > 
> > > On 11/23/2016 08:11 PM, Luis R. Rodriguez wrote:  
> > > > Guenter,
> > > > 
> > > > I think I'm ready to start pushing a new patch set out for review.
> > > > Before I do that -- can I trouble you for letting your test
> > > > infrastructure hammer it? I'll only push out the patches for review
> > > > based on this new set of changes once all tests come back OK for all
> > > > architectures.
> > > > 
> > > > https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20161117-linker-tables-v5
> > > > 
> > > > Fenguang & Guenter,
> > > > 
> > > > Any chance I can trouble you to enable the new driver:
> > > > CONFIG_TEST_LINKTABLES=y on each kernel configuration as it will run a
> > > > test driver which will WARN_ON() if it finds any errors.
> > > >   
> > > 
> > > I see a number of compile failures as well as some crashes in your test driver.
> > > Please have a look. http://kerneltests.org/builders, column 'testing'.
> > >   
> > 
> > Thanks! I believe I have addressed most issues.. Any chance I can have you re-try with
> > this new branch:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20161117-linker-tables-v5-try2
> 
> Few minor things:
> 
> - Can module-common.lds be generated with standard cpp_lds_S?

I actually no longer have a need to have module-common.lds generated given
I no longer am using macro wrappers or helpers. IMHO it would be still be
useful to have this generated but I think we can do this separately so
I can drop these patches and we can address this after?

> - Breaking up the input section descriptions like that in the linker
>   script is not what we want AFAIKS. It forces the linker to put them
>   in different locations.

What is wrong with that ? Separating linker table and section ranges is
not just cosmetic, it also helps us split out domain specific sections
which reduces errors when mismatch errors happen. For instance if you
end up trying to refer to a section range with a linker table helper you'd
get a section mismatch complaint now. Likewise for the inverse.
We also restrict section ranges much more than linker tables, we don't
for instance provide any APIs to iterate over them for instance.

Lastly I wanted SORT() to be used on these sections and while some have
expressed being OK in even sorting our big .text section I think its not
prudent to do so right away to ensure we avoid any possible regressions
because of it. Because of this its best to keep then separate sections
for these special use APIs.

Now it does have cost of modifying the linker script but that just needs
to happen *once* per each new section we want to add support for. This
series only adds section range support and linker table support for a
few basic sections:

  o .data
  o .rodata
  o .text
  o .init.data
  o .init.text

This should suffice for *most* Linux kernel code hacks.

> Broader issues:
> 
> - I still think calling these "sections" and "de facto Linux sections"
>   etc. is a bit confusing, especially because assembler sections are
>   also involved. Region, array, blob, anything. Then you get "section
>   ranges" and "linker tables". Fundamentally all it is is a linear memory
>   allocator for static data which is decentralized in the source code
>   (as opposed to centralized which would just be `u8 mydata[size]`).

You are right, sorry I had not updated that part of the documentation yet.
I had a big nose dive into the ELF specifications after I wrote that
documentation and as per ELF specs all we have here are sections using
SHT_PROGBITS (@progbits on most architectures, on ARM this is %progbits)
and then we have "Special sections" [0] (one is .init for example). The
SHT_PROGBITS is a section type which can be used on programs to specify
a section is custom and its use is defined by the program itself. Now,
some of the "Special sections" though also have SHT_PROGBITS -- and because
of this it implicates programs can further customize these sections. This
is precisely why we can customize .init further on Linux for instance.

I think just calling them Linux sections suffices then. Thoughts?

BTW please refer to a draft paper [1] I have on section ranges / linker tables
in light of this nose dive of the ELF specs [2]. It summarizes also limitations
of ELF sections which are important for these APIs, for instance it should
the answer about limits on 'order level' size, (the limit of say the digit
postfix say 002 in .data.tbl.foo.002). I had not yet updated the documentation
to reflect all this yet but I can do so now if you feel it would help.
Lastly my slides from Plumbers might be useful too [3].

[0] https://refspecs.linuxbase.org/elf/gabi4+/ch4.sheader.html#special_sections
[1] http://drvbp1.linux-foundation.org/~mcgrof/papers/2016/11/21/linker-tables-20161121.pdf
[2] https://refspecs.linuxbase.org/elf/gabi4+/contents.html
[3] http://linuxplumbersconf.com/2016/ocw//system/presentations/3621/original/SUSE-ELF-plumbers.odp
> 
> - It would be nice to just clearly separate the memory allocator function
>   from the syntatic sugar on top.

Sorry I do not follow. What exactly do you mean by this?

>   Actually I think if the memory allocation
>   and access functions are clean enough, you don't need anything more.

Sorry I still do not follow.

>   If we have an array of pointers and size, it's trivial C code to iterate over
>   it. If it needs to have a set of LINKTABLE accessors over the top of it
>   for this use case, then that would seem to be a failure of the underlying
>   API, no?

Still did not get it.

> - Is it really important to be able to add new allocators without modifying
>   a central file for the linker script? Yes it's a benefit, but is it enough
>   to justify the complexity?

If by allocators you mean the ability to add new entries into sections easily
without having to modify the linker script -- then my answer is:

What if its not just about API wrapper to make this easier but also there might
be more benefits on top of this ? What are the benefits I see as we merge this?
They are:

First the obvious gains:

  o Allows us to just use plain C code to add new custom ELF sections
  o Makes it less error prone to add new sections

Then the not so obvious immediate gains:

  o Helps simplify initialization code
  o Helps optionally avoid code bit-rot (see commit "kbuild: enable option to
    force compile force-obj-y and force-lib-y)
  o Enables link time ordering which can help make an extremely lightweight
    dependency annotations explicit (for this refer to the tools/linker-tables/
    example where I've demo'd simplfying Xen init code, which has caused tons of
    regressions over time due to the lack of explicit dependency annotations due
    to the sloppy Xen PV entry path; this is not going away any time soon)

Not so obvious long term gains:

  o Run time ordering customization are still possible (this is really long
    term but we already have one precedent in the kernel that does memmove()
    on entries on a section per some dependency heuristic.)
  o May help prevent use of dead code by either free'ing / nop'ing, no exec
    or whatever sections which we know should definitely not run.
  o Can easily increase the levels of initialization in the kernel without
    much changes (refer to tools/linker-tables/ for an example init with
    linker tables). I've seen a few cases where we've already run out of space
    to get init right in a few subsystems. This not only allows this but also
    allows subsystems to define their own custom levels of init if they wanted
    for their own things.
  o Synthetic functions: enables custom C / asm functions which could help
    add new functionality (see synth_init_or() on tools/linker-tables/drivers/synth/main.c)
    I've been hinted this can help with RAID6 support
  o Self modifying consts (see ps_shr() on tools/linker-tables/drivers/synth/main.c)

  Luis