2019-10-04 10:50:22

by Dmitry Goldin

Subject: [PATCH v2] kheaders: making headers archive reproducible

From: Dmitry Goldin <[email protected]>

In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
extending kernel easier") a new mechanism was introduced, for kernels
>=5.2, which embeds the kernel headers in the kernel image or a module
and exposes them in procfs for use by userland tools.

The archive containing the header files has nondeterminism caused by
header files metadata. This patch normalizes the metadata and utilizes
KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
default behaviour.

In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
modified to use sysfs and the script for generation of the archive was
renamed to what is being patched.

Signed-off-by: Dmitry Goldin <[email protected]>

---

v1: Initial fix

v2: Added a bit of info about kheaders to the reproducible builds
documentation and used the opportunity to fix a few typos in the
original patch.

---
Documentation/kbuild/reproducible-builds.rst | 13 +++++++++----
kernel/gen_kheaders.sh | 5 ++++-
2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/Documentation/kbuild/reproducible-builds.rst b/Documentation/kbuild/reproducible-builds.rst
index ab92e98c89c8..ce6a408b3303 100644
--- a/Documentation/kbuild/reproducible-builds.rst
+++ b/Documentation/kbuild/reproducible-builds.rst
@@ -16,16 +16,21 @@ the kernel may be unreproducible, and how to avoid them.
Timestamps
----------

-The kernel embeds a timestamp in two places:
+The kernel embeds timestamps in three places:

* The version string exposed by ``uname()`` and included in
``/proc/version``

* File timestamps in the embedded initramfs

-By default the timestamp is the current time. This must be overridden
-using the `KBUILD_BUILD_TIMESTAMP`_ variable. If you are building
-from a git commit, you could use its commit date.
+* If enabled via ``CONFIG_IKHEADERS``, file timestamps of kernel
+ headers embedded in the kernel or respective module,
+ exposed via ``/sys/kernel/kheaders.tar.xz``
+
+By default the timestamp is the current time and in the case of
+``kheaders`` the various files' modification times. This must
+be overridden using the `KBUILD_BUILD_TIMESTAMP`_ variable.
+If you are building from a git commit, you could use its commit date.

The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros,
and enables warnings if they are used. If you incorporate external
diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
index 9ff449888d9c..aff79e461fc9 100755
--- a/kernel/gen_kheaders.sh
+++ b/kernel/gen_kheaders.sh
@@ -71,7 +71,10 @@ done | cpio --quiet -pd $cpio_dir >/dev/null 2>&1
find $cpio_dir -type f -print0 |
xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'

-tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null
+# Create archive and try to normalize metadata for reproducibility
+tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
+ --owner=0 --group=0 --sort=name --numeric-owner \
+ -Jcf $tarfile -C $cpio_dir/ . > /dev/null

echo "$src_files_md5" > kernel/kheaders.md5
echo "$obj_files_md5" >> kernel/kheaders.md5
--
2.23.0
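
One way to check the effect of the normalization is to archive the same tree twice with different on-disk modification times and compare the results. The following is only a minimal sketch and not part of the patch: the `make_archive` helper, the temporary tree and the example timestamp are made up for illustration, `-J` is left out so that xz is not needed, and it assumes GNU tar 1.28 or later (for `--sort=name`) with its default archive format.

```shell
#!/bin/sh
# Hypothetical check, not part of the patch: archive the same tree twice
# with different on-disk mtimes and compare the two archives.
set -e

# make_archive OUT DIR: same metadata flags as the patch, minus -J.
make_archive() {
	tar ${KBUILD_BUILD_TIMESTAMP:+"--mtime=$KBUILD_BUILD_TIMESTAMP"} \
		--owner=0 --group=0 --sort=name --numeric-owner \
		-cf "$1" -C "$2" .
}

work=$(mktemp -d)
mkdir -p "$work/tree/include"
echo '#define A 1' > "$work/tree/include/a.h"
echo '#define B 2' > "$work/tree/include/b.h"

# Example value only; any fixed date understood by tar --mtime will do.
KBUILD_BUILD_TIMESTAMP='2019-10-04 10:40:07 UTC'

make_archive "$work/first.tar" "$work/tree"
# Change only metadata: reset every mtime, file contents stay the same.
find "$work/tree" -exec touch -t 200101010000 {} +
make_archive "$work/second.tar" "$work/tree"

if cmp -s "$work/first.tar" "$work/second.tar"; then
	result=identical
else
	result=different
fi
echo "$result"
rm -rf "$work"
```

If KBUILD_BUILD_TIMESTAMP is left unset, the `--mtime` option is dropped entirely and the two archives would be expected to differ, which matches the "falls back to the default behaviour" wording in the commit message.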


2019-10-04 15:19:29

by Greg Kroah-Hartman

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Fri, Oct 04, 2019 at 10:40:07AM +0000, Dmitry Goldin wrote:
> From: Dmitry Goldin <[email protected]>
>
> In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
> extending kernel easier") a new mechanism was introduced, for kernels
> >=5.2, which embeds the kernel headers in the kernel image or a module
> and exposes them in procfs for use by userland tools.
>
> The archive containing the header files has nondeterminism caused by
> header files metadata. This patch normalizes the metadata and utilizes
> KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
> default behaviour.
>
> In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
> modified to use sysfs and the script for generation of the archive was
> renamed to what is being patched.
>
> Signed-off-by: Dmitry Goldin <[email protected]>


Reviewed-by: Greg Kroah-Hartman <[email protected]>

2019-10-04 18:19:18

by Joel Fernandes

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Fri, Oct 04, 2019 at 10:40:07AM +0000, Dmitry Goldin wrote:
> From: Dmitry Goldin <[email protected]>
>
> In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
> extending kernel easier") a new mechanism was introduced, for kernels
> >=5.2, which embeds the kernel headers in the kernel image or a module
> and exposes them in procfs for use by userland tools.
>
> The archive containing the header files has nondeterminism caused by
> header files metadata. This patch normalizes the metadata and utilizes
> KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
> default behaviour.
>
> In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
> modified to use sysfs and the script for generation of the archive was
> renamed to what is being patched.
>
> Signed-off-by: Dmitry Goldin <[email protected]>

Reviewed-by: Joel Fernandes (Google) <[email protected]>

>
> ---
>
> v1: Initial fix
>
> v2: Added a bit of info about kheaders to the reproducible builds
> documentation and used the opportunity to fix a few typos in the
> original patch.
>
> ---
> Documentation/kbuild/reproducible-builds.rst | 13 +++++++++----
> kernel/gen_kheaders.sh | 5 ++++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kbuild/reproducible-builds.rst b/Documentation/kbuild/reproducible-builds.rst
> index ab92e98c89c8..ce6a408b3303 100644
> --- a/Documentation/kbuild/reproducible-builds.rst
> +++ b/Documentation/kbuild/reproducible-builds.rst
> @@ -16,16 +16,21 @@ the kernel may be unreproducible, and how to avoid them.
> Timestamps
> ----------
>
> -The kernel embeds a timestamp in two places:
> +The kernel embeds timestamps in three places:
>
> * The version string exposed by ``uname()`` and included in
> ``/proc/version``
>
> * File timestamps in the embedded initramfs
>
> -By default the timestamp is the current time. This must be overridden
> -using the `KBUILD_BUILD_TIMESTAMP`_ variable. If you are building
> -from a git commit, you could use its commit date.
> +* If enabled via ``CONFIG_IKHEADERS``, file timestamps of kernel
> + headers embedded in the kernel or respective module,
> + exposed via ``/sys/kernel/kheaders.tar.xz``
> +
> +By default the timestamp is the current time and in the case of
> +``kheaders`` the various files' modification times. This must
> +be overridden using the `KBUILD_BUILD_TIMESTAMP`_ variable.
> +If you are building from a git commit, you could use its commit date.
>
> The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros,
> and enables warnings if they are used. If you incorporate external
> diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
> index 9ff449888d9c..aff79e461fc9 100755
> --- a/kernel/gen_kheaders.sh
> +++ b/kernel/gen_kheaders.sh
> @@ -71,7 +71,10 @@ done | cpio --quiet -pd $cpio_dir >/dev/null 2>&1
> find $cpio_dir -type f -print0 |
> xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'
>
> -tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null
> +# Create archive and try to normalize metadata for reproducibility
> +tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
> + --owner=0 --group=0 --sort=name --numeric-owner \
> + -Jcf $tarfile -C $cpio_dir/ . > /dev/null
>
> echo "$src_files_md5" > kernel/kheaders.md5
> echo "$obj_files_md5" >> kernel/kheaders.md5
> --
> 2.23.0

2019-10-05 03:33:12

by Masahiro Yamada

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Fri, Oct 4, 2019 at 7:40 PM Dmitry Goldin <[email protected]> wrote:
>
> From: Dmitry Goldin <[email protected]>
>
> In commit 43d8ce9d65a5 ("Provide in-kernel headers to make
> extending kernel easier") a new mechanism was introduced, for kernels
> >=5.2, which embeds the kernel headers in the kernel image or a module
> and exposes them in procfs for use by userland tools.
>
> The archive containing the header files has nondeterminism caused by
> header files metadata. This patch normalizes the metadata and utilizes
> KBUILD_BUILD_TIMESTAMP if provided and otherwise falls back to the
> default behaviour.
>
> In commit f7b101d33046 ("kheaders: Move from proc to sysfs") it was
> modified to use sysfs and the script for generation of the archive was
> renamed to what is being patched.
>
> Signed-off-by: Dmitry Goldin <[email protected]>
>
> ---

Applied to linux-kbuild. Thanks.



>
> v1: Initial fix
>
> v2: Added a bit of info about kheaders to the reproducible builds
> documentation and used the opportunity to fix a few typos in the
> original patch.
>
> ---
> Documentation/kbuild/reproducible-builds.rst | 13 +++++++++----
> kernel/gen_kheaders.sh | 5 ++++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kbuild/reproducible-builds.rst b/Documentation/kbuild/reproducible-builds.rst
> index ab92e98c89c8..ce6a408b3303 100644
> --- a/Documentation/kbuild/reproducible-builds.rst
> +++ b/Documentation/kbuild/reproducible-builds.rst
> @@ -16,16 +16,21 @@ the kernel may be unreproducible, and how to avoid them.
> Timestamps
> ----------
>
> -The kernel embeds a timestamp in two places:
> +The kernel embeds timestamps in three places:
>
> * The version string exposed by ``uname()`` and included in
> ``/proc/version``
>
> * File timestamps in the embedded initramfs
>
> -By default the timestamp is the current time. This must be overridden
> -using the `KBUILD_BUILD_TIMESTAMP`_ variable. If you are building
> -from a git commit, you could use its commit date.
> +* If enabled via ``CONFIG_IKHEADERS``, file timestamps of kernel
> + headers embedded in the kernel or respective module,
> + exposed via ``/sys/kernel/kheaders.tar.xz``
> +
> +By default the timestamp is the current time and in the case of
> +``kheaders`` the various files' modification times. This must
> +be overridden using the `KBUILD_BUILD_TIMESTAMP`_ variable.
> +If you are building from a git commit, you could use its commit date.
>
> The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros,
> and enables warnings if they are used. If you incorporate external
> diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
> index 9ff449888d9c..aff79e461fc9 100755
> --- a/kernel/gen_kheaders.sh
> +++ b/kernel/gen_kheaders.sh
> @@ -71,7 +71,10 @@ done | cpio --quiet -pd $cpio_dir >/dev/null 2>&1
> find $cpio_dir -type f -print0 |
> xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;'
>
> -tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null
> +# Create archive and try to normalize metadata for reproducibility
> +tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
> + --owner=0 --group=0 --sort=name --numeric-owner \
> + -Jcf $tarfile -C $cpio_dir/ . > /dev/null
>
> echo "$src_files_md5" > kernel/kheaders.md5
> echo "$obj_files_md5" >> kernel/kheaders.md5
> --
> 2.23.0



--
Best Regards
Masahiro Yamada

2019-10-07 11:52:21

by Andreas Schwab

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

GEN kernel/kheaders_data.tar.xz
tar: unrecognized option '--sort=name'
Try `tar --help' or `tar --usage' for more information.
make[2]: *** [kernel/kheaders_data.tar.xz] Error 64
make[1]: *** [kernel] Error 2
make: *** [sub-make] Error 2
$ tar --version
tar (GNU tar) 1.26
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2019-10-07 11:53:18

by Greg Kroah-Hartman

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Mon, Oct 07, 2019 at 01:49:47PM +0200, Andreas Schwab wrote:
> GEN kernel/kheaders_data.tar.xz
> tar: unrecognized option '--sort=name'
> Try `tar --help' or `tar --usage' for more information.
> make[2]: *** [kernel/kheaders_data.tar.xz] Error 64
> make[1]: *** [kernel] Error 2
> make: *** [sub-make] Error 2
> $ tar --version
> tar (GNU tar) 1.26
> Copyright (C) 2011 Free Software Foundation, Inc.

Wow that's an old version of tar. 2011? What happens if you use a more
modern one?

thanks,

greg k-h

2019-10-07 12:30:10

by Andreas Schwab

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Oct 07 2019, Greg KH <[email protected]> wrote:

> On Mon, Oct 07, 2019 at 01:49:47PM +0200, Andreas Schwab wrote:
>> GEN kernel/kheaders_data.tar.xz
>> tar: unrecognized option '--sort=name'
>> Try `tar --help' or `tar --usage' for more information.
>> make[2]: *** [kernel/kheaders_data.tar.xz] Error 64
>> make[1]: *** [kernel] Error 2
>> make: *** [sub-make] Error 2
>> $ tar --version
>> tar (GNU tar) 1.26
>> Copyright (C) 2011 Free Software Foundation, Inc.
>
> Wow that's an old version of tar. 2011? What happens if you use a more
> modern one?

That's the most modern I have available on that machine.

Andreas.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2019-10-08 08:07:52

by Dmitry Goldin

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 7, 2019 2:26 PM, Andreas Schwab <[email protected]> wrote:

> On Oct 07 2019, Greg KH <[email protected]> wrote:
>
> > On Mon, Oct 07, 2019 at 01:49:47PM +0200, Andreas Schwab wrote:
> >
> > > GEN kernel/kheaders_data.tar.xz
> > > tar: unrecognized option '--sort=name'
> > > Try `tar --help' or `tar --usage' for more information.
> > > make[2]: *** [kernel/kheaders_data.tar.xz] Error 64
> > > make[1]: *** [kernel] Error 2
> > > make: *** [sub-make] Error 2
> > > $ tar --version
> > > tar (GNU tar) 1.26
> > > Copyright (C) 2011 Free Software Foundation, Inc.
> >
> > Wow that's an old version of tar. 2011? What happens if you use a more
> > modern one?
>
> That's the most modern I have available on that machine.

Hmm. --sort was introduced in 1.28 in 2014. Do you think it would warrant some sort of version check and fallback or is this something we can expect the user to handle if their distribution happens to not ship anything more recent? A few sensible workarounds come to mind.

In any case, it would likely make sense to at least update https://github.com/torvalds/linux/blob/master/Documentation/process/changes.rst with the minimal version we decide on.


Dmitry
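
For reference, the "version check and fallback" option could also be done as a feature probe instead of parsing `tar --version`. This is only a rough sketch under my own assumptions: the `tar_supports_sort` helper and the `sort_opt` variable are hypothetical names, and it relies on GNU tar accepting an empty file list via `-T /dev/null`.

```shell
#!/bin/sh
# Hypothetical helper, not from the patch: probe whether the installed
# tar accepts --sort=name by writing an empty archive to /dev/null.
tar_supports_sort() {
	tar --sort=name -cf /dev/null -T /dev/null 2>/dev/null
}

if tar_supports_sort; then
	sort_opt=--sort=name
else
	# Older tar (e.g. 1.26): drop the flag and accept that the member
	# order may then depend on directory iteration order.
	sort_opt=
	echo 'warning: tar lacks --sort=name; archive may not be reproducible' >&2
fi

echo "using: tar $sort_opt"
```

The downside of this approach is the one already implied above: on an old tar the build succeeds but silently loses the ordering guarantee, so the archive may no longer be reproducible.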

2019-10-08 08:16:17

by Masahiro Yamada

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Tue, Oct 8, 2019 at 5:07 PM Dmitry Goldin <[email protected]> wrote:
>
> Hi,
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, October 7, 2019 2:26 PM, Andreas Schwab <[email protected]> wrote:
>
> > On Oct 07 2019, Greg KH <[email protected]> wrote:
> >
> > > On Mon, Oct 07, 2019 at 01:49:47PM +0200, Andreas Schwab wrote:
> > >
> > > > GEN kernel/kheaders_data.tar.xz
> > > > tar: unrecognized option '--sort=name'
> > > > Try `tar --help' or `tar --usage' for more information.
> > > > make[2]: *** [kernel/kheaders_data.tar.xz] Error 64
> > > > make[1]: *** [kernel] Error 2
> > > > make: *** [sub-make] Error 2
> > > > $ tar --version
> > > > tar (GNU tar) 1.26
> > > > Copyright (C) 2011 Free Software Foundation, Inc.
> > >
> > > Wow that's an old version of tar. 2011? What happens if you use a more
> > > modern one?
> >
> > That's the most modern I have available on that machine.
>
> Hmm. --sort was introduced in 1.28 in 2014. Do you think it would warrant some sort of version check and fallback or is this something we can expect the user to handle if their distribution happens to not ship anything more recent? A few sensible workarounds come to mind.

I think the former.

A release from 2014 is still quite new, so
we cannot always expect it on users' systems.




>
> In any case, it would likely make sense to at least update https://github.com/torvalds/linux/blob/master/Documentation/process/changes.rst with the minimal version we decide on.
>
>
> Dmitry



--
Best Regards
Masahiro Yamada

2019-10-08 09:55:29

by Dmitry Goldin

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, October 8, 2019 10:14 AM, Masahiro Yamada <[email protected]> wrote:

> On Tue, Oct 8, 2019 at 5:07 PM Dmitry Goldin [email protected] wrote:
>
> > Hmm. --sort was introduced in 1.28 in 2014. Do you think it would warrant some sort of version check and fallback or is this something we can expect the user to handle if their distribution happens to not ship anything more recent? A few sensible workarounds come to mind.
>
> I think the former.

After pondering it briefly, maybe substituting the option is a bit less hassle than checking for
the version and then degrading to a possibly non-reproducible archive.

Maybe we could go with something like the sketch below to replace --sort=name. That is, if
that's the only problematic flag.

find $cpio_dir -printf "%P\n" | LC_ALL=C sort | \
tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
--owner=0 --group=0 --numeric-owner \
-Jcf $tarfile -C $cpio_dir/ -T - > /dev/null

I will look at this a bit more closely and give it a test-run later today or early tomorrow. Then we can decide if it's sufficient before submitting another patch. Other suggestions and pointers are welcome, of course.

--
Best regards,
Dmitry
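
To make the idea easier to try out, here is a self-contained variant of the sketch above. It is a hedged illustration rather than the final patch: the `make_sorted_archive` wrapper and the temporary tree are hypothetical, `-J` is omitted so that xz is not required, and GNU find (for `-printf`) and GNU tar are assumed. Two details differ from the sketch and are my own assumptions about what a working version needs: `--no-recursion`, because the list already names every file and directory and tar would otherwise descend into each listed directory a second time, and `./%P`, so that the top-level directory gets a non-empty name.

```shell
#!/bin/sh
set -e

# make_sorted_archive OUT DIR (hypothetical wrapper): feed tar an
# explicitly sorted file list instead of relying on --sort=name.
make_sorted_archive() {
	find "$2" -printf './%P\n' | LC_ALL=C sort |
	tar ${KBUILD_BUILD_TIMESTAMP:+"--mtime=$KBUILD_BUILD_TIMESTAMP"} \
		--owner=0 --group=0 --numeric-owner --no-recursion \
		-cf "$1" -C "$2" -T -
}

work=$(mktemp -d)
mkdir -p "$work/tree/include/linux"
: > "$work/tree/include/linux/b.h"
: > "$work/tree/include/linux/a.h"

make_sorted_archive "$work/out.tar" "$work/tree"

# List the members in archive order.
listing=$(tar -tf "$work/out.tar")
echo "$listing"
rm -rf "$work"
```

Sorting under `LC_ALL=C` keeps the order independent of the builder's locale, which is the property `--sort=name` was meant to provide.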

2019-10-08 10:29:39

by Masahiro Yamada

Subject: Re: [PATCH v2] kheaders: making headers archive reproducible

On Tue, Oct 8, 2019 at 6:54 PM Dmitry Goldin <[email protected]> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, October 8, 2019 10:14 AM, Masahiro Yamada <[email protected]> wrote:
>
> > On Tue, Oct 8, 2019 at 5:07 PM Dmitry Goldin [email protected] wrote:
> >
> > > Hmm. --sort was introduced in 1.28 in 2014. Do you think it would warrant some sort of version check and fallback or is this something we can expect the user to handle if their distribution happens to not ship anything more recent? A few sensible workarounds come to mind.
> >
> > I think the former.
>
> After pondering it briefly, maybe substituting the option is a bit less hassle than checking for
> the version and then degrading to a possibly non-reproducible archive.
>
> Maybe we could go with something like the sketch below to replace --sort=name. That is, if
> that's the only problematic flag.
>
> find $cpio_dir -printf "%P\n" | LC_ALL=C sort | \
> tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
> --owner=0 --group=0 --numeric-owner \
> -Jcf $tarfile -C $cpio_dir/ -T - > /dev/null
>
> I will look at this a bit more closely and give it a test-run later today or early tomorrow. Then we can decide if it's sufficient before submitting another patch. Other suggestions and pointers are welcome, of course.


I am fine with this solution too.

Thanks!

--
Best Regards
Masahiro Yamada