On Mon, Sep 08, 2014 at 02:37:53PM -0300, Henrique de Moraes Holschuh wrote:
> The full requirements for the memory area which holds the microcode
> update binary data can be found in the Intel SDM, vol 3A, section
> 9.11.6, page 9-34. They basically boil down to: 16-byte alignment, and
> the data area must be entirely mapped if paging is already enabled.
>
> The regular microcode update driver doesn't have to do anything special
> to meet these requirements. For peace of mind, add a check to
> WARN_ONCE() when the alignment is (unexpectedly) incorrect, and abort
> the microcode update.
>
> However, the early microcode update driver can only expect 4-byte
> alignment out of the early initramfs file format (which is actually good
> enough for many Intel processors, but unless Intel officially documents
> this, we cannot make use of that fact). The microcode update data will
> not be aligned to a 16-byte boundary unless userspace takes special
> steps to ensure it.
>
> Change the early microcode driver to make a temporary copy of a portion
> of the microcode header, and move the microcode data backwards
> (overwriting the header) to a suitably aligned position, right before
> issuing the microcode update WRMSR.
>
> Unfortunately, to keep things simple, this has to be undone right after
> the processor finishes the WRMSR. Therefore, the alignment process will
> have to be redone *for every processor core*. This might end up not
> being really noticeable, as the microcode update operation itself is
> already extremely expensive in processor cycles.
>
> If the microcode update data is already aligned in the first place, the
> alignment process is skipped. Users of large systems are encouraged to
> use updated userspace that ensures 16-byte alignment of the microcode
> data file contents inside the early initramfs image.
>
> Add the relevant details to Documentation/x86/early-microcode.txt.
>
> Signed-off-by: Henrique de Moraes Holschuh <[email protected]>
> ---
> Documentation/x86/early-microcode.txt | 10 +++++++++
> arch/x86/kernel/cpu/microcode/intel.c | 5 +++++
> arch/x86/kernel/cpu/microcode/intel_early.c | 30 +++++++++++++++++++++++++--
> 3 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
> index d62bea6..c4f2ebd 100644
> --- a/Documentation/x86/early-microcode.txt
> +++ b/Documentation/x86/early-microcode.txt
> @@ -14,6 +14,16 @@ during boot time. The microcode file in cpio name space is:
> on Intel: kernel/x86/microcode/GenuineIntel.bin
> on AMD : kernel/x86/microcode/AuthenticAMD.bin
>
> +For Intel processors, the microcode load process will be faster when special
faster??
> +care is taken to ensure that the kernel/x86/microcode/GenuineIntel.bin file
> +*data* inside the cpio archive is aligned to a paragraph (16-byte boundary).
> +Standard pax/cpio can be coaxed into doing this by adding a padding file, e.g.
> +"kernel/x86/microcode/.padding" with the appropriate size *right before* the
> +kernel/x86/microcode/GenuineIntel.bin file. Beware the required size for the
> +padding file as it depends on the behavior of the tool used to create the cpio
> +archive. It is also possible to use a specific tool that appends enough NULs
> +_to_ the file name (not _after_ the file name!) to align the file data.
> +
> During BSP boot (before SMP starts), if the kernel finds the microcode file in
> the initrd file, it parses the microcode and saves matching microcode in memory.
> If matching microcode is found, it will be uploaded in BSP and later on in all
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 2182cec..40caef1 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -157,6 +157,11 @@ static int apply_microcode_intel(int cpu)
> if (mc_intel == NULL)
> return 0;
>
> + /* Intel SDM vol 3A section 9.11.6, page 9-34 */
> + if (WARN_ONCE((unsigned long)(mc_intel->bits) % 16,
> + "microcode data incorrectly aligned"))
I wonder how many people would start complaining when this goes out the
door? Have you checked actually how the majority of the tools do layout
the microcode in the initrd?
> + return -1;
> +
> /*
> * Microcode on this CPU might be already up-to-date. Only apply
> * the microcode patch in mc_intel when it is newer than the one
> diff --git a/arch/x86/kernel/cpu/microcode/intel_early.c b/arch/x86/kernel/cpu/microcode/intel_early.c
> index 92629a8..994c59b 100644
> --- a/arch/x86/kernel/cpu/microcode/intel_early.c
> +++ b/arch/x86/kernel/cpu/microcode/intel_early.c
> @@ -662,14 +662,40 @@ static int apply_microcode_early(struct mc_saved_data *mc_saved_data,
> struct microcode_intel *mc_intel;
> unsigned int val[2];
>
> + char savedbuf[16];
> + void *mcu_data;
> + void *aligned_mcu_data;
> + unsigned int mcu_size = 0;
> +
> mc_intel = uci->mc;
> if (mc_intel == NULL)
> return 0;
>
> + mcu_data = mc_intel->bits;
> + aligned_mcu_data = mc_intel->bits;
> +
> + /* Intel SDM vol 3A section 9.11.6, page 9-34: */
> + /* WRMSR MSR_IA32_UCODE_WRITE requires 16-byte alignment */
Kernel comment style:
/*
* Blabla.
* More bla.
*/
> + if ((unsigned long)(mcu_data) % 16) {
> + /* We have more than 16 bytes worth of microcode header
> + * just before mc_intel->bits on a version 1 header */
> + BUILD_BUG_ON(offsetof(struct microcode_intel, bits) < 16);
That's not really needed - I don't see struct microcode_header_intel
changing anytime soon.
> +
> + aligned_mcu_data = (void *)((unsigned long)(mcu_data) & ~15UL);
> + mcu_size = get_datasize(&mc_intel->hdr);
> + memcpy(savedbuf, aligned_mcu_data, sizeof(savedbuf));
> + memmove(aligned_mcu_data, mcu_data, mcu_size);
> + }
> +
> /* write microcode via MSR 0x79 */
> native_wrmsr(MSR_IA32_UCODE_WRITE,
> - (unsigned long) mc_intel->bits,
> - (unsigned long) mc_intel->bits >> 16 >> 16);
> + lower_32_bits((unsigned long)aligned_mcu_data),
> + upper_32_bits((unsigned long)aligned_mcu_data));
> +
> + if (mcu_size) {
> + memmove(mcu_data, aligned_mcu_data, mcu_size);
> + memcpy(aligned_mcu_data, savedbuf, sizeof(savedbuf));
> + }
>
> /* get the current revision from MSR 0x8B */
> native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
> --
> 1.7.10.4
>
>
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Fri, 07 Nov 2014, Borislav Petkov wrote:
> > --- a/Documentation/x86/early-microcode.txt
> > +++ b/Documentation/x86/early-microcode.txt
> > @@ -14,6 +14,16 @@ during boot time. The microcode file in cpio name space is:
> > on Intel: kernel/x86/microcode/GenuineIntel.bin
> > on AMD : kernel/x86/microcode/AuthenticAMD.bin
> >
> > +For Intel processors, the microcode load process will be faster when special
>
> faster??
Well, it might well be lost in the noise given how slow a microcode update
is.
What I mean is that the early microcode driver "won't waste cpu time moving
data that is already aligned".
However, if the data is _not_ aligned (and it will *not* be aligned if you
use standard cpio without any tricks), the early driver will have to move it
around so that it respects the architectural requirement of 16-byte
alignment for the microcode update WRMSR.
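The save/move/restore dance described above can be sketched in plain userspace C. This is only an illustrative model of the technique, not the driver code; the names (align_ucode, undo_align_ucode, UCODE_ALIGN) are made up here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UCODE_ALIGN 16UL

/*
 * If @data is not 16-byte aligned, slide it down onto the preceding
 * 16-byte boundary (which lies inside the >16-byte microcode header),
 * saving the clobbered header bytes first.  Returns the address the
 * update WRMSR would then be given.
 */
static void *align_ucode(void *data, size_t size, char saved[UCODE_ALIGN])
{
	uintptr_t p = (uintptr_t)data;
	void *aligned;

	if (!(p % UCODE_ALIGN))
		return data;		/* already aligned, nothing to do */

	aligned = (void *)(p & ~(UCODE_ALIGN - 1));
	memcpy(saved, aligned, UCODE_ALIGN);	/* stash header bytes */
	memmove(aligned, data, size);		/* slide data down */
	return aligned;
}

/* Undo the move so the next core can redo it from pristine data. */
static void undo_align_ucode(void *data, void *aligned, size_t size,
			     const char saved[UCODE_ALIGN])
{
	if (aligned == data)
		return;
	memmove(data, aligned, size);		/* put the data back */
	memcpy(aligned, saved, UCODE_ALIGN);	/* restore header bytes */
}
```

The undo step restores both the moved data and the clobbered header bytes, which is why the whole dance has to be repeated for every core.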
> > index 2182cec..40caef1 100644
> > --- a/arch/x86/kernel/cpu/microcode/intel.c
> > +++ b/arch/x86/kernel/cpu/microcode/intel.c
> > @@ -157,6 +157,11 @@ static int apply_microcode_intel(int cpu)
> > if (mc_intel == NULL)
> > return 0;
> >
> > + /* Intel SDM vol 3A section 9.11.6, page 9-34 */
> > + if (WARN_ONCE((unsigned long)(mc_intel->bits) % 16,
> > + "microcode data incorrectly aligned"))
>
> I wonder how many people would start complaining when this goes out the
> door? Have you checked actually how the majority of the tools do layout
> the microcode in the initrd?
That WARN_ONCE() should only trigger if a bug shows its ugly face. If it is
triggering in any other case, this patch is broken.
mc_intel->bits in the late driver really shouldn't depend at all on any
alignment from initramfs or whatever: that path is supposed to run on
microcode that came from the firmware loader, or the deprecated microcode
device, or which was memcpy'd from the initramfs to a kmalloc()'d area by
save_microcode() in the early driver. All of these will always be 16-byte
aligned AFAIK.
So, the early driver gets alignment code as it will usually have to deal
with unaligned microcode data, and the late driver (which should never see
unaligned microcode data) gets a WARN_ONCE to alert us if something broke in
the kernel code.
Should I change the WARN_ONCE message to:
WARN_ONCE(... "kernel bug: microcode data incorrectly aligned") ?
> > + return -1;
> > +
> > /*
> > * Microcode on this CPU might be already up-to-date. Only apply
> > * the microcode patch in mc_intel when it is newer than the one
> > diff --git a/arch/x86/kernel/cpu/microcode/intel_early.c b/arch/x86/kernel/cpu/microcode/intel_early.c
> > index 92629a8..994c59b 100644
> > --- a/arch/x86/kernel/cpu/microcode/intel_early.c
> > +++ b/arch/x86/kernel/cpu/microcode/intel_early.c
> > @@ -662,14 +662,40 @@ static int apply_microcode_early(struct mc_saved_data *mc_saved_data,
> > struct microcode_intel *mc_intel;
> > unsigned int val[2];
> >
> > + char savedbuf[16];
> > + void *mcu_data;
> > + void *aligned_mcu_data;
> > + unsigned int mcu_size = 0;
> > +
> > mc_intel = uci->mc;
> > if (mc_intel == NULL)
> > return 0;
> >
> > + mcu_data = mc_intel->bits;
> > + aligned_mcu_data = mc_intel->bits;
> > +
> > + /* Intel SDM vol 3A section 9.11.6, page 9-34: */
> > + /* WRMSR MSR_IA32_UCODE_WRITE requires 16-byte alignment */
>
> Kernel comment style:
>
> /*
> * Blabla.
> * More bla.
> */
Will fix.
> > + if ((unsigned long)(mcu_data) % 16) {
> > + /* We have more than 16 bytes worth of microcode header
> > + * just before mc_intel->bits on a version 1 header */
> > + BUILD_BUG_ON(offsetof(struct microcode_intel, bits) < 16);
>
> That's not really needed - I don't see struct microcode_header_intel
> changing anytime soon.
Neither do I. But this has zero footprint on the resulting kernel, and
might save someone 10 years from now from trying to debug initramfs data
corruption.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
On Fri, Nov 07, 2014 at 08:54:25PM -0200, Henrique de Moraes Holschuh wrote:
> Well, it might well be lost in the noise given how slow a microcode update
> is.
>
> What I mean is that the early microcode driver "won't waste cpu time moving
> data that is already aligned".
Yes, don't say "faster" but explain that we can save us the shuffling
of microcode data back and forth in the early loader if the microcode
update is 16-byte aligned in the initrd.
> That WARN_ONCE() should only trigger if a bug shows its ugly face. If it is
> triggering in any other case, this patch is broken.
>
> mc_intel->bits in the late driver really shouldn't depend at all on any
> alignment from initramfs or whatever: that path is supposed to run on
> microcode that came from the firmware loader, or the deprecated microcode
Right, the early path is apply_microcode_early() - I misread that part,
sorry.
> device, or which was memcpy'd from the initramfs to a kmalloc()'d area by
> save_microcode() in the early driver. All of these will always be 16-byte
> aligned AFAIK.
Why will it always be 16-byte aligned? And if it is going to be always
16-byte aligned, why do we need the WARN_ONCE() at all?
> So, the early driver gets alignment code as it will usually have to deal
> with unaligned microcode data, and the late driver (which should never see
> unaligned microcode data) gets a WARN_ONCE to alert us if something broke in
> the kernel code.
>
> Should I change the WARN_ONCE message to:
>
> WARN_ONCE(... "kernel bug: microcode data incorrectly aligned") ?
No, you should either give a possible scenario where an unaligned
pointer can really happen or, alternatively, if you can prove that the
condition will never happen, drop the WARN_ONCE completely.
And please no improbable
maybe-it-could-once-in-a-blue-moon-when-the-stars-align-happen cases.
Oh, and even if the WARN_ONCE triggers, what can we do about it? If we
can't do anything about it, then we have a problem. If we can, then we
better do it and change the WARN_ONCE to an if-check which, if positive,
executes code that fixes the situation.
IOW, what I'm trying to say is, can we make the kmalloc'ed memory
16-byte aligned and if so, then copy the microcode data there and be
happy. And I think we can... :-)
> Neither do I. But this has zero footprint on the resulting kernel, and
> might save someone 10 years from now from trying to debug initramfs data
> corruption.
... and someone might waste a lot of time 10 years from now trying
to fathom why the hell that thing was added, only to realize that
someone was being unreasonably overzealous. No, we don't add code for
hypothetical cases which are absolutely improbable.
--
On Sat, 08 Nov 2014, Borislav Petkov wrote:
> On Fri, Nov 07, 2014 at 08:54:25PM -0200, Henrique de Moraes Holschuh wrote:
> > Well, it might well be lost in the noise given how slow a microcode update
> > is.
> >
> > What I mean is that the early microcode driver "won't waste cpu time moving
> > data that is already aligned".
>
> Yes, don't say "faster" but explain that we can save us the shuffling
> of microcode data back and forth in the early loader if the microcode
> update is 16-byte aligned in the initrd.
Will change.
> > That WARN_ONCE() should only trigger if a bug shows its ugly face. If it is
> > triggering in any other case, this patch is broken.
> >
> > mc_intel->bits in the late driver really shouldn't depend at all on any
> > alignment from initramfs or whatever: that path is supposed to run on
> > microcode that came from the firmware loader, or the deprecated microcode
>
> Right, the early path is apply_microcode_early() - I misread that part,
> sorry.
>
> > device, or which was memcpy'd from the initramfs to a kmalloc()'d area by
> > save_microcode() in the early driver. All of these will always be 16-byte
> > aligned AFAIK.
>
> Why will it always be 16-byte aligned? And if it is going to be always
> 16-byte aligned, why do we need the WARN_ONCE() at all?
...
> > So, the early driver gets alignment code as it will usually have to deal
> > with unaligned microcode data, and the late driver (which should never see
> > unaligned microcode data) gets a WARN_ONCE to alert us if something broke in
> > the kernel code.
> >
> > Should I change the WARN_ONCE message to:
> >
> > WARN_ONCE(... "kernel bug: microcode data incorrectly aligned") ?
>
> No, you should either give a possible scenario where an unaligned
> pointer can really happen or, alternatively, if you can prove that the
> condition will never happen, drop the WARN_ONCE completely.
If I know of a non-kernel-bug scenario where it could happen, I'd have added
code that fixes the alignment like I did in the early driver...
AFAIK, that WARN_ONCE is only going to trigger if kmalloc changes and starts
returning less strictly aligned data, or if something is corrupting memory.
I will remove the WARN_ONCE, and place a comment in its place:
/*
* the memory area holding the microcode update data must be 16-byte
* aligned. This is supposed to be guaranteed by kmalloc().
*/
> Oh, and even if the WARN_ONCE triggers, what can we do about it? If we
> can't do anything about it, then we have a problem. If we can, then we
> better do it and change the WARN_ONCE to an if-check which, if positive,
> executes code that fixes the situation.
All we could do is abort the update. The real reason behind the
misalignment would have to be tracked in order to know what should be done
about it.
> IOW, what I'm trying to say is, can we make the kmalloc'ed memory
> 16-byte aligned and if so, then copy the microcode data there and be
> happy. And I think we can... :-)
Sure, we can. But AFAIK kmalloc() is supposed to already give us a 16-byte
aligned data area.
> > Neither do I. But this has zero footprint on the resulting kernel, and
> > might save someone 10 years from now from trying to debug initramfs data
> > corruption.
>
> ... and someone might waste a lot of time 10 years from now trying
> to fathom why the hell that thing was added, only to realize that
> someone was being unreasonably overzealous. No, we don't add code for
> hypothetical cases which are absolutely improbable.
I will drop it.
--
On Sat, Nov 08, 2014 at 07:57:49PM -0200, Henrique de Moraes Holschuh wrote:
> I will remove the WARN_ONCE, and place a comment in its place:
>
> /*
> * the memory area holding the microcode update data must be 16-byte
> * aligned. This is supposed to be guaranteed by kmalloc().
> */
So this makes this comment pretty useless as it doesn't do anything
about the case where 16-byte alignment gets violated. Actually I was
expecting something else:
* you either write down *why* kmalloc guarantees alignment. From a quick
look it might but it might not, hint
#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
* or you actually go and fix this by making sure all memory in the intel
loader is 16-byte aligned. Maybe a loader-specific kmalloc wrapper,
something which allocates a bit more and then aligns it properly, and so
on...
But simply adding a comment which doesn't do anything to solve the
situation doesn't make a lot of sense. And more importantly, doesn't
solve the situation.
--
On Tue, 11 Nov 2014, Borislav Petkov wrote:
> On Sat, Nov 08, 2014 at 07:57:49PM -0200, Henrique de Moraes Holschuh wrote:
> * you either write down *why* kmalloc guarantees alignment. From a quick
> look it might but it might not, hint
>
> #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
Meh, I don't know where I came up with the wrong information that kmalloc
aligned to 16 bytes instead of 8 bytes.
I do wonder why I didn't hit this while testing, though. Maybe an artifact
of slub, or just my luck that I never got a memory block that was not
aligned to 16 bytes.
I will fix this.
--
On Tue, Nov 11, 2014 at 02:57:31PM -0200, Henrique de Moraes Holschuh wrote:
> Meh, I don't know where I came up with the wrong information that kmalloc
> aligned to 16 bytes instead of 8 bytes.
>
> I do wonder why I didn't hit this while testing, though. Maybe an artifact
> of slub, or just my luck that I never got a memory block that was not
> aligned to 16 bytes.
The ARCH_KMALLOC_MINALIGN is conditioned on ARCH_DMA_MINALIGN and a
bunch of other things. All I'm saying is, this needs a careful study
of when, if at all, kmalloc will not give a 16-byte aligned buffer.
--
On Tue, 11 Nov 2014, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 02:57:31PM -0200, Henrique de Moraes Holschuh wrote:
> > Meh, I don't know where I came up with the wrong information that kmalloc
> > aligned to 16 bytes instead of 8 bytes.
> >
> > I do wonder why I didn't hit this while testing, though. Maybe an artifact
> > of slub, or just my luck that I never got a memory block that was not
> > aligned to 16 bytes.
>
> The ARCH_KMALLOC_MINALIGN is conditioned on ARCH_DMA_MINALIGN and a
> bunch of other things. All I'm saying is, this needs a careful study
> of when, if at all, kmalloc will not give a 16-byte aligned buffer.
I tried to do that in order to answer my own question. After a quick look,
it looks like it ends up being an implementation detail of SLUB/SLAB/SLOB,
and it depends on the size of the object being allocated (i.e. if it is
bigger than KMALLOC_MAX_CACHE_SIZE).
I was always getting page-aligned memory blocks out of kmalloc() in my
testing because of that.
It won't be an ugly fix anyway, something like this (conceptual code):
/* as required by Intel SDM blahblahblah */
#define INTEL_MICROCODE_MINALIGN 16
#define INTEL_UCODE_PTR(x) PTR_ALIGN(x, INTEL_MICROCODE_MINALIGN)

void *intel_ucode_kmalloc(size_t size)
{
	void *p = kmalloc(size, GFP_KERNEL);

	if (likely(p == INTEL_UCODE_PTR(p)))
		return p;

	kfree(p);
	return kmalloc(size + INTEL_MICROCODE_MINALIGN - 1, GFP_KERNEL);
}
For most users, that "p == INTEL_UCODE_PTR(p)" will always be true (SLUB
allocator, and microcode size >= 4096).
This way, we don't go around wasting pages when the microcode size is large
enough for the allocation to get kicked directly to the page allocator and
it is also a multiple of PAGE_SIZE (which it often is).
I will need to add some defense against (size + 15) overflow in that custom
kmalloc, or ensure that we refuse ridiculously large microcode early (say,
larger than 4MiB). Not a big deal, we probably already do this, and if we
don't, it will be a worthwhile enhancement by itself. The largest
microcode I have been made aware of is ~64KiB.
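The conceptual allocator above can be mocked in userspace, with malloc standing in for kmalloc, to make the intent concrete. UCODE_PTR and ucode_alloc are illustrative names for this sketch only, not driver code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define INTEL_MICROCODE_MINALIGN 16UL
/* Round a pointer up to the next 16-byte boundary (PTR_ALIGN stand-in). */
#define UCODE_PTR(p) \
	((void *)(((uintptr_t)(p) + INTEL_MICROCODE_MINALIGN - 1) & \
		  ~(INTEL_MICROCODE_MINALIGN - 1)))

/*
 * Keep the cheap allocation when it is already 16-byte aligned,
 * otherwise retry with 15 spare bytes.  Callers then read the data
 * through UCODE_PTR() but still free() the raw pointer as-is.
 */
static void *ucode_alloc(size_t size)
{
	void *p = malloc(size);

	if (!p || p == UCODE_PTR(p))
		return p;

	free(p);
	return malloc(size + INTEL_MICROCODE_MINALIGN - 1);
}
```

Note that by construction UCODE_PTR(p) always lands within the first 15 bytes of the second allocation, so the padded buffer still holds the full microcode image.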
--
On Tue, Nov 11, 2014 at 05:54:00PM -0200, Henrique de Moraes Holschuh wrote:
> void *intel_ucode_kmalloc(size_t size)
> {
> void *p = kmalloc(size, GFP_KERNEL);
Actually I was thinking of this:
void *p = kmalloc(size + 16, GFP_KERNEL);
if (!p)
return -ENOMEM;
if (unlikely((unsigned long)p & 0xf))
p_a = ALIGN(p, 16);
You'd need to stash the original *p somewhere for freeing later, of
course.
--
On Wed, 12 Nov 2014, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 05:54:00PM -0200, Henrique de Moraes Holschuh wrote:
> > void *intel_ucode_kmalloc(size_t size)
> > {
> > void *p = kmalloc(size, GFP_KERNEL);
>
> Actually I was thinking of this:
>
> void *p = kmalloc(size + 16, GFP_KERNEL);
> if (!p)
> return -ENOMEM;
>
> if (unlikely((unsigned long)p & 0xf))
> p_a = ALIGN(p, 16);
>
> You'd need to stash the original *p somewhere for freeing later, of
> course.
Well, it is a trade-off: your version always adds 16 bytes. Intel microcode
is always a multiple of 1KiB, so these extra 16 bytes will often result in
allocating an extra page.
The detail is that: since most Intel microcodes are bigger than the kmalloc
cache, most of the time kmalloc will return page-aligned addresses, which
don't need any alignment.
Your version also needs to keep the original pointer around for kfree, which
is going to be annoying.
My version has the drawback that it requires the use of INTEL_UCODE_PTR(p)
to get to the microcode data, but you can just kfree(p), and it will only
add the 16 bytes when absolutely required. This is nice, because it means
we won't waste an extra page in the most common case, and we don't have to
find a place to store any extra pointers.
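For comparison, the single-allocation variant Boris suggested can be sketched in userspace C as well: over-allocate once, align up, and stash the raw pointer just in front of the aligned block so a matching free helper can recover it. All names here are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UCODE_MINALIGN 16UL

/* Over-allocate once: room for a pointer stash plus alignment slack. */
static void *ucode_alloc_aligned(size_t size)
{
	void *raw = malloc(size + sizeof(void *) + UCODE_MINALIGN - 1);
	uintptr_t a;

	if (!raw)
		return NULL;

	a = ((uintptr_t)raw + sizeof(void *) + UCODE_MINALIGN - 1)
	    & ~(UCODE_MINALIGN - 1);
	((void **)a)[-1] = raw;		/* stash raw pointer for freeing */
	return (void *)a;
}

static void ucode_free_aligned(void *p)
{
	if (p)
		free(((void **)p)[-1]);
}
```

This keeps it to one allocation, at the cost of always paying the extra bytes, which is exactly the trade-off under discussion.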
--
On Wed, Nov 12, 2014 at 10:18:47PM -0200, Henrique de Moraes Holschuh wrote:
> The detail is that: since most Intel microcodes are bigger than the kmalloc
> cache, most of the time kmalloc will return page-aligned addresses, which
> don't need any alignment.
Yeah, you keep saying that. Do you have an actual proof too?
Because if this turns out wrong, we'll end up doing two allocations
instead of one, which is dumb. My suggestion was to do one allocation
only.
> Your version also needs to keep the original pointer around for kfree, which
> is going to be annoying.
>
> My version has the drawback that it requires the use of INTEL_UCODE_PTR(p)
Yeah, just drop that macro - use simply PTR_ALIGN and
INTEL_MICROCODE_MINALIGN.
> to get to the microcode data, but you can just kfree(p), and it will only
> add the 16 bytes when absolutely required. This is nice, because it means
> we won't waste an extra page in the most common case, and we don't have to
> find a place to store any extra pointers.
Yeah yeah, proof please.
--
On Thu, 13 Nov 2014, Borislav Petkov wrote:
> On Wed, Nov 12, 2014 at 10:18:47PM -0200, Henrique de Moraes Holschuh wrote:
> > The detail is that: since most Intel microcodes are bigger than the kmalloc
> > cache, most of the time kmalloc will return page-aligned addresses, which
> > don't need any alignment.
>
> Yeah, you keep saying that. Do you have an actual proof too?
I believe so, from documentation gathered through google... but I could be
wrong about this.
And since I have learned my lesson, I took your comment to mean "look
deeper". I took the time to both try to understand somewhat the mm/ code
AND write a kernel module to do empiric testing before I wrote this reply,
instead of trusting the documentation.
And it turns out I failed at proving myself wrong, either because I am not
wrong, or by sheer unluckiness.
Important detail: intel microcodes have 1KiB size granularity, are never
smaller than 2048 bytes, and so far the largest ones are close to 64KiB.
For SLUB:
kmalloc() will do kmalloc_large() for anything > 8192 bytes and that should
return page-aligned data since it calls alloc_kmem_pages()/alloc_pages().
For 2048 bytes to 8192 bytes, we will get memory from one of the following
slabs: kmalloc-2048, kmalloc-4096 or kmalloc-8192.
As far as I could understand (and here I could easily be wrong as the mm/
slab cache code is not trivial to grok), those are going to be object-size
aligned, because the allocation size will be rounded up to the slab's object
size (i.e. 2048/4096/8192 bytes). The objects are allocated from the start
of the slab (which is page-aligned), back-to-back.
Thus SLUB would always return 16-byte aligned memory for the size range of
Intel microcode. In fact, it will return 2048-byte aligned memory in the
worst case (2048-byte microcode).
Empiric testing in x86-64 resulted in data compatible with the above
reasoning.
For SLAB:
SLAB is nasty to grok with a first look due to the whole complexity re. its
queues and cpu caches, and I don't think I understood the code properly.
intel microcode sized allocations end up in slabs with large objects that
are all of the same 2^n size (i.e. 2048 bytes, 4096 bytes, etc). I could
not grok the code enough to assert what real alignment I could expect for
2KiB to 64KiB objects.
Empiric testing in x86-64 SLAB, done by allocating 63 objects of the same
size in a row, for all possible microcode sizes, did not result in a single
kmalloc() that was not sufficiently aligned, and in fact all of them were
object-size aligned, even for the smallest microcodes (2048 bytes). I even
tried it with CONFIG_DEBUG_SLAB turned on and off to see if it changed
anything (it didn't).
So, while I didn't understand the code enough to prove that we'd mostly get
good microcode alignment out of SLAB, I couldn't get it to return pointers
that would require extra alignment either. The worst case was 2048-byte
alignment, for microcodes with 2048 bytes.
For SLOB:
SLOB will call the page allocator for anything bigger than 4095 bytes, so
all microcodes from 4096 bytes and above will be page-aligned.
Only 2048-byte and 3072-byte microcode wouldn't go directly to the page
allocator. This is microcode for ancient processors: Pentium M and earlier,
and the first NetBurst processors.
I didn't bother to test, but from the code, I expect that 2048-byte and
3072-byte sized microcode would *not* be aligned to 16 bytes. However, we
have very few users of these ancient processors left. Calling kmalloc(s);
kfree(); kmalloc(s+15); for these rare processors doesn't look like too much
of an issue IMHO.
SLOB was the only allocator that could result in microcode that needs
further alignment in my testing, and only for the small microcode (2KiB and
3KiB) of ancient processors.
> Because if this turns out wrong, we'll end up doing two allocations
> instead of one, which is dumb. My suggestion was to do one allocation
> only.
See above. As far as I could verify, we wouldn't be doing two allocations
in the common cases.
I really don't like the idea of increasing the allocation order by one for
4KiB, 8KiB, 16KiB, 32KiB and 64KiB microcodes just to allocate 15 bytes of
extra padding that, at least as far as I could check, seem to be most often
not needed.
Note that I did not do any empiric testing on 32 bits.
> > Your version also needs to keep the original pointer around for kfree, which
> > is going to be annoying.
> >
> > My version has the drawback that it requires the use of INTEL_UCODE_PTR(p)
>
> Yeah, just drop that macro - use simply PTR_ALIGN and
> INTEL_MICROCODE_MINALIGN.
I will do that.
--
On Sat, Nov 15, 2014 at 09:10:44PM -0200, Henrique de Moraes Holschuh wrote:
> And since I have learned my lesson, I took your comment to mean "look
> deeper". I took the time to both try to understand somewhat the mm/ code
> AND write a kernel module to do empiric testing before I wrote this reply,
> instead of trusting the documentation.
Good! Thanks. :-)
> Important detail: intel microcodes have 1KiB size granularity, are never
> smaller than 2048 bytes, and so far the largest ones are close to 64KiB.
Right, we should be able to manage a 16-page allocation on most systems.
> For SLUB:
>
> kmalloc() will do kmalloc_large() for anything > 8192 bytes and that should
> return page-aligned data since it calls alloc_kmem_pages()/alloc_pages().
Yep.
> For 2048 bytes to 8192 bytes, we will get memory from one of the following
> slabs: kmalloc-2048, kmalloc-4096 or kmalloc-8192.
>
> As far as I could understand (and here I could easily be wrong as the mm/
> slab cache code is not trivial to grok), those are going to be object-size
> aligned, because the allocation size will be rounded up to the slab's object
> size (i.e. 2048/4096/8192 bytes). The objects are allocated from the start
> of the slab (which is page-aligned), back-to-back.
>
> Thus SLUB would always return 16-byte aligned memory for the size range of
> Intel microcode. In fact, it will return 2048-byte aligned memory in the
> worst case (2048-byte microcode).
I'm wondering if those slab objects would have some sort of
metadata after them so that the nice alignment of 2048, 4096 and 8192
gets b0rked. Because there are those members there in slub_def.h:
/*
* Slab cache management.
*/
struct kmem_cache {
...
int size; /* The size of an object including meta data */
int object_size; /* The size of an object without meta data */
but dumping the values from sysfs show me that there's no metadata:
/sys/kernel/slab/kmalloc-2048/object_size:2048
/sys/kernel/slab/kmalloc-2048/slab_size:2048
/sys/kernel/slab/kmalloc-4096/object_size:4096
/sys/kernel/slab/kmalloc-4096/slab_size:4096
/sys/kernel/slab/kmalloc-8192/object_size:8192
/sys/kernel/slab/kmalloc-8192/slab_size:8192
so we should be fine.
> For SLAB:
>
> SLAB is nasty to grok with a first look due to the whole complexity re. its
> queues and cpu caches, and I don't think I understood the code properly.
>
> intel microcode sized allocations end up in slabs with large objects that
> are all of the same 2^n size (i.e. 2048 bytes, 4096 bytes, etc). I could
> not grok the code enough to assert what real alignment I could expect for
> 2KiB to 64KiB objects.
Well, 2KiB alignment certainly satisfies 16 bytes alignment.
> Empirical testing in x86-64 SLAB, done by allocating 63 objects of the same
> size in a row, for all possible microcode sizes, did not result in a single
> kmalloc() that was not sufficiently aligned, and in fact all of them were
> object-size aligned, even for the smallest microcodes (2048 bytes). I even
> tried it with CONFIG_DEBUG_SLAB turned on and off to see if it changed
> anything (it didn't).
Ok.
> So, while I didn't understand the code enough to prove that we'd mostly get
> good microcode alignment out of SLAB, I couldn't get it to return pointers
> that would require extra alignment either. The worst case was 2048-byte
> alignment, for microcodes with 2048 bytes.
Well, looking at calculate_alignment() in mm/slab_common.c
and since we're being called with SLAB_HWCACHE_ALIGN, i.e.
create_kmalloc_caches(ARCH_KMALLOC_FLAGS) in kmem_cache_init(), then the
loop in calculate_alignment() will definitely give higher alignment than
16 for the larger caches. This is, of course AFAICT.
> For SLOB:
>
> SLOB will call the page allocator for anything bigger than 4095 bytes, so
> all microcodes from 4096 bytes and above will be page-aligned.
>
> Only 2048-byte and 3072-byte microcode wouldn't go directly to the page
> allocator. This is microcode for ancient processors: Pentium M and earlier,
> and the first NetBurst processors.
>
> I didn't bother to test, but from the code, I expect that 2048-byte and
> 3072-byte sized microcode would *not* be aligned to 16 bytes. However, we
> have very few users of these ancient processors left. Calling kmalloc(s);
> kfree(); kmalloc(s+15); for these rare processors doesn't look like too much
> of an issue IMHO.
>
> SLOB was the only allocator that could result in microcode that needs
> further alignment in my testing, and only for the small microcode (2KiB and
> 3KiB) of ancient processors.
Right, it looks like slob could give objects aligned to < 16.
Ok, please add the gist of this to the commit message without going into
too much unnecessary detail.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Mon, 24 Nov 2014, Borislav Petkov wrote:
> On Sat, Nov 15, 2014 at 09:10:44PM -0200, Henrique de Moraes Holschuh wrote:
> > For SLAB:
> >
> > SLAB is nasty to grok at first glance due to the whole complexity re. its
> > queues and cpu caches, and I don't think I understood the code properly.
> >
> > Intel-microcode-sized allocations end up in slabs with large objects that
> > are all of the same 2^n size (i.e. 2048 bytes, 4096 bytes, etc). I could
> > not grok the code enough to assert what real alignment I could expect for
> > 2KiB to 64KiB objects.
>
> Well, 2KiB alignment certainly satisfies 16 bytes alignment.
Yeah, however I was not sure SLAB wouldn't spoil the fun by adding a header
before the slab and screwing up MIN(page-size, object-size) alignment inside
the slab.
Unlike SLUB, SLAB is documented to add an enemy-of-large-alignment
head-of-slab header in an attempt to make the world a sadder place where
even SIMD instructions (that want cache-line alignment) would suffer...
But my empiric testing didn't show any of that sadness for these larger
allocation sizes (2KiB and bigger). They're all either page-aligned or
object-size aligned, regardless of whether SLAB debug mode was compiled in
or not.
> > Empirical testing in x86-64 SLAB, done by allocating 63 objects of the same
> > size in a row, for all possible microcode sizes, did not result in a single
> > kmalloc() that was not sufficiently aligned, and in fact all of them were
> > object-size aligned, even for the smallest microcodes (2048 bytes). I even
> > tried it with CONFIG_DEBUG_SLAB turned on and off to see if it changed
> > anything (it didn't).
>
> Ok.
>
> > So, while I didn't understand the code enough to prove that we'd mostly get
> > good microcode alignment out of SLAB, I couldn't get it to return pointers
> > that would require extra alignment either. The worst case was 2048-byte
> > alignment, for microcodes with 2048 bytes.
>
> Well, looking at calculate_alignment() in mm/slab_common.c
> and since we're being called with SLAB_HWCACHE_ALIGN, i.e.
> create_kmalloc_caches(ARCH_KMALLOC_FLAGS) in kmem_cache_init(), then the
> loop in calculate_alignment() will definitely give higher alignment than
> 16 for the larger caches. This is, of course AFAICT.
Ah, that explains it. Thanks. I missed the
create_kmalloc_caches(ARCH_KMALLOC_FLAGS) detail, which indeed ensures there
is no slab info at the head of the slab, so all intel microcode objects will
end up aligned to MIN(page-size, object-size).
> > For SLOB:
> >
> > SLOB will call the page allocator for anything bigger than 4095 bytes, so
> > all microcodes from 4096 bytes and above will be page-aligned.
> >
> > Only 2048-byte and 3072-byte microcode wouldn't go directly to the page
> > allocator. This is microcode for ancient processors: Pentium M and earlier,
> > and the first NetBurst processors.
> >
> > I didn't bother to test, but from the code, I expect that 2048-byte and
> > 3072-byte sized microcode would *not* be aligned to 16 bytes. However, we
> > have very few users of these ancient processors left. Calling kmalloc(s);
> > kfree(); kmalloc(s+15); for these rare processors doesn't look like too much
> > of an issue IMHO.
> >
> > SLOB was the only allocator that could result in microcode that needs
> > further alignment in my testing, and only for the small microcode (2KiB and
> > 3KiB) of ancient processors.
>
> Right, it looks like slob could give objects aligned to < 16.
>
> Ok, please add the gist of this to the commit message without going into
> too much unnecessary detail.
Will do, thanks!
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh