2013-07-09 16:44:22

by Johannes Hirte

Subject: early microcode on amd is broken when no initramfs provided

When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is provided in the
bootmanager (grub2), the system hangs here:

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-06005-gd2b4a64 (puck@acer) (gcc version 4.8.1 (Gentoo 4.8.1 p1.0, pie-0.5.6) ) #69 SMP PREEMPT Tue Jul 9 18:22:09 CEST 2013
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-06005-gd2b4a64 root=/dev/sda1 ro pcie_aspm=force radeon.dpm=1
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000de555fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000de556000-0x00000000de755fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000de756000-0x00000000dfd3efff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000dfd3f000-0x00000000dfdbefff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000dfdbf000-0x00000000dfebefff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000dfebf000-0x00000000dfef6fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000dfef7000-0x00000000dfefffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000dff00000-0x00000000dfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f7000000-0x00000000f7ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ffe00000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000011fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.6 present.
[ 0.000000] DMI: Packard Bell EasyNote TK81/SJV52_DN, BIOS V2.14 07/27/2011
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] No AGP bridge found
[ 0.000000] e820: last_pfn = 0x120000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-through
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000000 mask FFFF80000000 write-back
[ 0.000000] 1 base 000080000000 mask FFFFC0000000 write-back
[ 0.000000] 2 base 0000C0000000 mask FFFFE0000000 write-back
[ 0.000000] 3 base 0000FFE00000 mask FFFFFFE00000 write-protect
[ 0.000000] 4 base 000100000000 mask FFFFE0000000 write-back
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] TOM2: 0000000120000000 aka 4608M
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] e820: last_pfn = 0xdff00 max_arch_pfn = 0x400000000
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Base memory trampoline at [ffff880000098000] 98000 size 28672
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x01c08000, 0x01c08fff] PGTABLE
[ 0.000000] BRK [0x01c09000, 0x01c09fff] PGTABLE
[ 0.000000] BRK [0x01c0a000, 0x01c0afff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x11fe00000-0x11fffffff]
[ 0.000000] [mem 0x11fe00000-0x11fffffff] page 2M
[ 0.000000] BRK [0x01c0b000, 0x01c0bfff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x11c000000-0x11fdfffff]
[ 0.000000] [mem 0x11c000000-0x11fdfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x100000000-0x11bffffff]
[ 0.000000] [mem 0x100000000-0x11bffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x00100000-0xde555fff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x3fffffff] page 2M
[ 0.000000] [mem 0x40000000-0xbfffffff] page 1G
[ 0.000000] [mem 0xc0000000-0xde3fffff] page 2M
[ 0.000000] [mem 0xde400000-0xde555fff] page 4k
[ 0.000000] init_memory_mapping: [mem 0xde756000-0xdfd3efff]
[ 0.000000] [mem 0xde756000-0xde7fffff] page 4k
[ 0.000000] [mem 0xde800000-0xdfbfffff] page 2M
[ 0.000000] [mem 0xdfc00000-0xdfd3efff] page 4k
[ 0.000000] BRK [0x01c0c000, 0x01c0cfff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0xdfef7000-0xdfefffff]
[ 0.000000] [mem 0xdfef7000-0xdfefffff] page 4k
[ 0.000000] ACPI: RSDP 00000000000fe020 00024 (v02 ACRSYS)
[ 0.000000] ACPI: XSDT 00000000dfef6120 0005C (v01 ACRSYS ACRPRDCT 00000003 01000013)
[ 0.000000] ACPI: FACP 00000000dfef5000 000F4 (v04 ACRSYS ACRPRDCT 00000003 1025 01000013)
[ 0.000000] ACPI: DSDT 00000000dfee6000 0B9EF (v01 ACRSYS ACRPRDCT F0000000 1025 01000013)
[ 0.000000] ACPI: FACS 00000000dfe99000 00040
[ 0.000000] ACPI: HPET 00000000dfef4000 00038 (v01 ACRSYS ACRPRDCT 00000001 1025 01000013)
[ 0.000000] ACPI: APIC 00000000dfef3000 00084 (v02 ACRSYS ACRPRDCT 00000001 1025 01000013)
[ 0.000000] ACPI: MCFG 00000000dfef2000 0003C (v01 ACRSYS ACRPRDCT 00000001 1025 01000013)
[ 0.000000] ACPI: BOOT 00000000dfee5000 00028 (v01 ACRSYS ACRPRDCT 00000001 1025 01000013)
[ 0.000000] ACPI: SLIC 00000000dfee4000 00176 (v01 ACRSYS ACRPRDCT 00000001 1025 01000013)
[ 0.000000] ACPI: SSDT 00000000dfee3000 00386 (v01 AMD POWERNOW 00000001 AMD 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000011fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x11fffffff]
[ 0.000000] NODE_DATA [mem 0x11fffa000-0x11fffbfff]
[ 0.000000] [ffffea0000000000-ffffea00047fffff] PMD -> [ffff88011b600000-ffff88011f5fffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x11fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009efff]
[ 0.000000] node 0: [mem 0x00100000-0xde555fff]
[ 0.000000] node 0: [mem 0xde756000-0xdfd3efff]
[ 0.000000] node 0: [mem 0xdfef7000-0xdfefffff]
[ 0.000000] node 0: [mem 0x100000000-0x11fffffff]
[ 0.000000] On node 0 totalpages: 1047270
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 22 pages reserved
[ 0.000000] DMA zone: 3998 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14254 pages used for memmap
[ 0.000000] DMA32 zone: 912200 pages, LIFO batch:31
[ 0.000000] Normal zone: 2048 pages used for memmap
[ 0.000000] Normal zone: 131072 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x1002a201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 4 CPUs, 2 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xde556000-0xde755fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xdfd3f000-0xdfdbefff]
[ 0.000000] PM: Registered nosave memory: [mem 0xdfdbf000-0xdfebefff]
[ 0.000000] PM: Registered nosave memory: [mem 0xdfebf000-0xdfef6fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xdff00000-0xdfffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xe0000000-0xf6ffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xf7000000-0xf7ffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xf8000000-0xfebfffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec00000-0xfec00fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec01000-0xfec0ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec10000-0xfec10fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec11000-0xfedfffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfee00000-0xfee00fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfee01000-0xffdfffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xffe00000-0xffffffff]
[ 0.000000] e820: [mem 0xe0000000-0xf6ffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:4 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff88011fc00000 s74880 r8192 d23424 u524288
[ 0.000000] pcpu-alloc: s74880 r8192 d23424 u524288 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 3
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1030882
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-06005-gd2b4a64 root=/dev/sda1 ro pcie_aspm=force radeon.dpm=1
[ 0.000000] PCIe ASPM is forcibly enabled
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Node 0: aperture @ d4000000 size 32 MB
[ 0.000000] Aperture pointing to e820 RAM. Ignoring.
[ 0.000000] Your BIOS doesn't leave a aperture memory hole
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 0.000000] This costs you 64 MB of RAM
[ 0.000000] Mapping aperture over 65536 KB of RAM @ d4000000
[ 0.000000] PM: Registered nosave memory: [mem 0xd4000000-0xd7ffffff]
[ 0.000000] Memory: 3979016K/4189080K available (4684K kernel code, 507K rwdata, 2176K rodata, 792K init, 756K bss, 210064K reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] Dump stacks of tasks blocking RCU-preempt GP.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[ 0.000000] NR_IRQS:4352 nr_irqs:712 16
[ 0.000000] spurious 8259A interrupt: IRQ7.
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.001000] tsc: Detected 2094.751 MHz processor
[ 0.000003] Calibrating delay loop (skipped), value calculated using timer frequency.. 4189.50 BogoMIPS (lpj=2094751)
[ 0.000298] pid_max: default: 32768 minimum: 301
[ 0.000470] Security Framework initialized
[ 0.000915] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.002711] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.003617] Mount-cache hash table entries: 256
[ 0.003982] Initializing cgroup subsys devices
[ 0.004136] Initializing cgroup subsys freezer
[ 0.004285] Initializing cgroup subsys blkio
[ 0.004451] tseg: 00dff00000
[ 0.004454] CPU: Physical Processor ID: 0
[ 0.004613] CPU: Processor Core ID: 0
[ 0.004761] mce: CPU supports 6 MCE banks
[ 0.004914] LVT offset 0 assigned for vector 0xf9
[ 0.005061] process: using AMD E400 aware idle routine
[ 0.005208] Last level iTLB entries: 4KB 512, 2MB 16, 4MB 8
Last level dTLB entries: 4KB 512, 2MB 128, 4MB 64
tlb_flushall_shift: 4
[ 0.005653] Freeing SMP alternatives memory: 12K (ffffffff81b46000 - ffffffff81b49000)
[ 0.005807] ACPI: Core revision 20130517
[ 0.012354] ACPI: All ACPI Tables successfully acquired
[ 0.617782] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.627941] smpboot: CPU0: AMD Athlon(tm) II P320 Dual-Core Processor (fam: 10, model: 06, stepping: 03)
[ 0.729753] Performance Events: AMD PMU driver.
[ 0.729943] ... version: 0
[ 0.730092] ... bit width: 48
[ 0.730240] ... generic registers: 4
[ 0.730388] ... value mask: 0000ffffffffffff
[ 0.730536] ... max period: 00007fffffffffff
[ 0.730684] ... fixed-purpose events: 0
[ 0.730835] ... event mask: 000000000000000f
[ 0.731018] process: System has AMD C1E enabled
[ 0.731173] process: Switch to broadcast mode on CPU0
[ 0.739310] smpboot: Booting Node 0, Processors #1
[ 0.752552] Brought up 2 CPUs
[ 0.752572] process: Switch to broadcast mode on CPU1
[ 0.752992] smpboot: Total of 2 processors activated (8379.00 BogoMIPS)
[ 0.753731] devtmpfs: initialized
[ 0.754198] PM: Registering ACPI NVS region [mem 0xde556000-0xde755fff] (2097152 bytes)
[ 0.754387] PM: Registering ACPI NVS region [mem 0xdfdbf000-0xdfebefff] (1048576 bytes)
[ 0.754719] xor: measuring software checksum speed
[ 0.764221] prefetch64-sse: 6688.000 MB/sec
[ 0.774223] generic_sse: 6424.000 MB/sec
[ 0.774367] xor: using function: prefetch64-sse (6688.000 MB/sec)
[ 0.774564] NET: Registered protocol family 16
[ 0.775357] node 0 link 0: io port [0, ffffff]
[ 0.775361] TOM: 00000000e0000000 aka 3584M
[ 0.775513] Fam 10h mmconf [mem 0xf7000000-0xf7ffffff]
[ 0.775515] node 0 link 0: mmio [a0000, bffff]
[ 0.775518] node 0 link 0: mmio [e0000000, f6ffffff]
[ 0.775520] node 0 link 0: mmio [f7000000, f7ffffff] ==> none
[ 0.775522] node 0 link 0: mmio [f8000000, ffdfffff]
[ 0.775524] TOM2: 0000000120000000 aka 4608M
[ 0.775674] bus: [bus 00-1f] on node 0 link 0
[ 0.775675] bus: 00 [io 0x0000-0xffff]
[ 0.775677] bus: 00 [mem 0x000a0000-0x000bffff]
[ 0.775677] bus: 00 [mem 0xe0000000-0xf6ffffff]
[ 0.775678] bus: 00 [mem 0xf8000000-0xffffffff]
[ 0.775679] bus: 00 [mem 0x120000000-0xfcffffffff]
[ 0.775736] ACPI: bus type PCI registered
[ 0.776126] PCI: MMCONFIG for domain 0000 [bus 00-0f] at [mem 0xf7000000-0xf7ffffff] (base 0xf7000000)
[ 0.776395] PCI: MMCONFIG at [mem 0xf7000000-0xf7ffffff] reserved in E820
[ 0.777637] PCI: Using configuration type 1 for base access
[ 0.778134] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 0.778327] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.778473] mtrr: probably your BIOS does not setup all CPUs.
[ 0.778621] mtrr: corrected configuration.
[ 0.784751] bio: create slab <bio-0> at 0
[ 0.801284] raid6: sse2x1 2601 MB/s
[ 0.818261] raid6: sse2x2 3246 MB/s
[ 0.835260] raid6: sse2x4 3511 MB/s
[ 0.835409] raid6: using algorithm sse2x4 (3511 MB/s)
[ 0.835558] raid6: using intx1 recovery algorithm
[ 0.835839] ACPI: Added _OSI(Module Device)
[ 0.835990] ACPI: Added _OSI(Processor Device)
[ 0.836138] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.836303] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.838013] ACPI: EC: Look up EC in DSDT
[ 0.839711] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.844083] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

This was copied out of dmesg from a running kernel with CONFIG_MICROCODE_EARLY disabled.

regards,
Johannes


2013-07-09 18:47:42

by Borislav Petkov

Subject: Re: early microcode on amd is broken when no initramfs provided

On Tue, Jul 09, 2013 at 06:36:01PM +0200, Johannes Hirte wrote:
> When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is provided in the
> bootmanager (grub2), the system hangs here:

I'll take a look soonish if Jacob doesn't beat me to it.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-10 03:59:48

by Borislav Petkov

Subject: Re: early microcode on amd is broken when no initramfs provided

On Tue, Jul 09, 2013 at 10:53:31PM -0500, Jacob Shin wrote:
> I won't have access to a box for a while, Boris or Suravee, could you
> please try and reproduce it and get the stack trace when you get the
> chance?
>
> So sorry,

No worries, Jacob, I'm on it. Take your time. :)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-10 07:31:06

by Borislav Petkov

Subject: Re: early microcode on amd is broken when no initramfs provided

On Tue, Jul 09, 2013 at 06:36:01PM +0200, Johannes Hirte wrote:
> When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is provided in the
> bootmanager (grub2), the system hangs here:

Hmm, I can't reproduce it here.

grub2 entry is:

menuentry 'Debian GNU/Linux, with Linux 3.10.0+' --class debian --class gnu-linux --class gnu --class os {
load_video
insmod gzio
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set=root adbbd17b-6e04-4458-814f-9a2b75a4d91e
echo 'Loading Linux 3.10.0+ ...'
linux /boot/vmlinuz-3.10.0+ root=/dev/sda1 ro resume=/dev/sda2 ignore_loglevel
}

Kernel is: v3.10-8587-g496322b

.config settings are:

$ zgrep -E "(INITRD|MICROCODE)" /proc/config.gz
CONFIG_BLK_DEV_INITRD=y
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_MICROCODE_INTEL_LIB=y
CONFIG_MICROCODE_INTEL_EARLY=y
CONFIG_MICROCODE_AMD_EARLY=y
CONFIG_MICROCODE_EARLY=y
# CONFIG_ACPI_INITRD_TABLE_OVERRIDE is not set

Can you send me your .config and your grub entry please?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-11 21:05:28

by Johannes Hirte

Subject: Re: early microcode on amd is broken when no initramfs provided

On Wed, 10 Jul 2013 09:30:49 +0200
Borislav Petkov wrote:

> On Tue, Jul 09, 2013 at 06:36:01PM +0200, Johannes Hirte wrote:
> > When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is
> > provided in the bootmanager (grub2), the system hangs here:
>
> Hmm, I can't reproduce it here.
>
> grub2 entry is:
>
> menuentry 'Debian GNU/Linux, with Linux 3.10.0+' --class debian --class gnu-linux --class gnu --class os {
> load_video
> insmod gzio
> insmod part_msdos
> insmod ext2
> set root='(hd0,msdos1)'
> search --no-floppy --fs-uuid --set=root adbbd17b-6e04-4458-814f-9a2b75a4d91e
> echo 'Loading Linux 3.10.0+ ...'
> linux /boot/vmlinuz-3.10.0+ root=/dev/sda1 ro resume=/dev/sda2 ignore_loglevel
> }
>
> Kernel is: v3.10-8587-g496322b
>
> .config settings are:
>
> $ zgrep -E "(INITRD|MICROCODE)" /proc/config.gz
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_MICROCODE=y
> CONFIG_MICROCODE_INTEL=y
> CONFIG_MICROCODE_AMD=y
> CONFIG_MICROCODE_OLD_INTERFACE=y
> CONFIG_MICROCODE_INTEL_LIB=y
> CONFIG_MICROCODE_INTEL_EARLY=y
> CONFIG_MICROCODE_AMD_EARLY=y
> CONFIG_MICROCODE_EARLY=y
> # CONFIG_ACPI_INITRD_TABLE_OVERRIDE is not set
>
> Can you send me your .config and your grub entry please?
>
> Thanks.
>

grub entry:

menuentry 'Gentoo GNU/Linux 3.10.0-09080-g19d2f8e' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-d044ac73-1dd2-4250-b864-5cb25fd67192' {
load_video
insmod gzio
insmod part_msdos
insmod btrfs
set root='hd0,msdos3'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos3 --hint-efi=hd0,msdos3 --hint-baremetal=ahci0,msdos3 c684c3ff-5bac-4ba8-8f63-c9036c2ad233
else
search --no-floppy --fs-uuid --set=root c684c3ff-5bac-4ba8-8f63-c9036c2ad233
fi
echo 'Linux 3.10.0-09080-g19d2f8e wird geladen …'
linux /vmlinuz-3.10.0-09080-g19d2f8e root=/dev/sda1 ro pcie_aspm=force radeon.dpm=1
}

config is attached


Attachments:
(No filename) (1.99 kB)
config.bz2 (17.02 kB)

2013-07-16 17:01:22

by Borislav Petkov

Subject: Re: early microcode on amd is broken when no initramfs provided

On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
> config is attached

Ok, I can reproduce the hang with your config but even with:

$ grep MICROCODE .config
# CONFIG_MICROCODE is not set
# CONFIG_MICROCODE_INTEL_EARLY is not set
# CONFIG_MICROCODE_AMD_EARLY is not set

which means, it cannot be microcode-related.

And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
the boot would probably continue. And if so, this is that 60 sec delay
where the kernel tries to find firmware.

Hmm...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-20 19:01:35

by Torsten Kaiser

Subject: Re: early microcode on amd is broken when no initramfs provided

On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov wrote:
> On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> config is attached
>
> Ok, I can reproduce the hang with your config but even with:
>
> $ grep MICROCODE .config
> # CONFIG_MICROCODE is not set
> # CONFIG_MICROCODE_INTEL_EARLY is not set
> # CONFIG_MICROCODE_AMD_EARLY is not set
>
> which means, it cannot be microcode-related.
>
> And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
> the boot would probably continue. And if so, this is that 60 sec delay
> where the kernel tries to find firmware.
>
> Hmm...

I have the same problem: Booting 3.11-rc1 hangs after the line:
ACPI: Executed 3 blocks of module-level executable AML code

I bisected it down to the early microcode changes:
757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
fixup) completely fail to boot (no output beyond "Booting kernel");
from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
find_ucode_in_initrd() __init") onwards I'm seeing this hang.

Just turning CONFIG_MICROCODE_EARLY off solves the problem: the system
now successfully boots 3.11-rc1.

Trying to debug this, I found that the following hack also solves the boot problem:
removing the following two lines from collect_cpu_info_amd_early()
in arch/x86/kernel/microcode_amd_early.c:
c->microcode = rev;
c->x86 = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);

But I can't make sense of that. And if I try to trace who updates
->x86, it gets even more confusing.
Normally only cpu_detect() seems to update cpuinfo_x86.x86, but now it
seems to fight with collect_cpu_info_amd_early().
On my system this happens:
(Output is always address of the struct cpuinfo_x86 -> value that gets
written into it)

Very early boot:
cpu_detect ffffffff81c8ba40 -> 16

BSP == CPU0 calls load_ucode_ap() via cpu_init():
collect_cpu_info_amd_early ffff880337c10fc0 -> 16
(That is the place I patched out to get the system to boot)

BSP == CPU0 via identify_boot_cpu():
cpu_detect ffffffff81c8ba40 -> 16

BSP == CPU0 stores boot_cpu_data in its per-cpu structure via
smp_store_boot_cpu_info():
smpboot: BSP: store ffffffff81c8ba40 in ffff880337c10fc0

smpboot starts activating the secondary CPUs: each one, in
start_secondary(), first calls load_ucode_ap() via cpu_init() and then
identify_secondary_cpu() via smp_callin():
collect_cpu_info_amd_early ffff880337c50fc0
smpboot: identify_sec_cpu:1/ffff880337c50fc0
cpu_detect ffff880337c50fc0 -> 16

collect_cpu_info_amd_early ffff880337c90fc0
smpboot: identify_sec_cpu:2/ffff880337c90fc0
cpu_detect ffff880337c90fc0 -> 16

collect_cpu_info_amd_early ffff880337cd0fc0
smpboot: identify_sec_cpu:3/ffff880337cd0fc0
cpu_detect ffff880337cd0fc0 -> 16

collect_cpu_info_amd_early ffff880337d10fc0
smpboot: identify_sec_cpu:4/ffff880337d10fc0
cpu_detect ffff880337d10fc0 -> 16

collect_cpu_info_amd_early ffff880337d50fc0
smpboot: identify_sec_cpu:5/ffff880337d50fc0
cpu_detect ffff880337d50fc0 -> 16


It seems the code for updating 'struct cpuinfo_x86 *c' in
collect_cpu_info_amd_early() is useless, because it will be
overwritten first by smp_store_cpu_info() and then again by
identify_secondary_cpu(c), and wrong, because at that point the per-cpu
structure should not be used yet, as smp_store_cpu_info() has not run
yet.
But something else seems to be using the per-cpu structure of the BSP
between its cpu_init() and smp_store_boot_cpu_info().

And it is cpu_has_amd_erratum(): it uses cpuinfo_x86.x86 to decide whether
it needs to fall back to boot_cpu_data. Because
collect_cpu_info_amd_early() has filled that field but not
.x86_vendor (which is still 0 == X86_VENDOR_INTEL), the errata are not
applied to the BSP and then something in ACPI gets stuck.

Does this diagnosis make sense / should I send a patch?

Torsten

2013-07-20 22:59:22

by Borislav Petkov

Subject: Re: early microcode on amd is broken when no initramfs provided

On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
> On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov wrote:
> > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
> >> config is attached
> >
> > Ok, I can reproduce the hang with your config but even with:
> >
> > $ grep MICROCODE .config
> > # CONFIG_MICROCODE is not set
> > # CONFIG_MICROCODE_INTEL_EARLY is not set
> > # CONFIG_MICROCODE_AMD_EARLY is not set
> >
> > which means, it cannot be microcode-related.
> >
> > And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
> > the boot would probably continue. And if so, this is that 60 sec delay
> > where the kernel tries to find firmware.
> >
> > Hmm...
>
> I have the same problem: Booting 3.11-rc1 hangs after the line:
> ACPI: Executed 3 blocks of module-level executable AML code
>
> I bisected it down to the early microcode changes:
> 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
> implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
> fixup) completely fail to boot (No output beyond "Booting kernel") ,
> from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
> find_ucode_in_initrd() __init") I'm seeing this hang.
>
> Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
> now successfully boots 3.11-rc1.

Ok, I need to be able to reproduce that first - I wasn't that successful
with Johannes' setup.

So, can you please send .config and how you're loading your microcode?
Is it in the initrd or are you doing that later, how? Grub entry please.

Also, is it just plain v3.11-rc1 or with patches on top?

Also, /proc/cpuinfo please.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-21 04:01:20

by Torsten Kaiser

Subject: Re: early microcode on amd is broken when no initramfs provided

On Sun, Jul 21, 2013 at 12:59 AM, Borislav Petkov wrote:
> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
>> On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov wrote:
>> > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> >> config is attached
>> >
>> > Ok, I can reproduce the hang with your config but even with:
>> >
>> > $ grep MICROCODE .config
>> > # CONFIG_MICROCODE is not set
>> > # CONFIG_MICROCODE_INTEL_EARLY is not set
>> > # CONFIG_MICROCODE_AMD_EARLY is not set
>> >
>> > which means, it cannot be microcode-related.
>> >
>> > And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
>> > the boot would probably continue. And if so, this is that 60 sec delay
>> > where the kernel tries to find firmware.
>> >
>> > Hmm...
>>
>> I have the same problem: Booting 3.11-rc1 hangs after the line:
>> ACPI: Executed 3 blocks of module-level executable AML code
>>
>> I bisected it down to the early microcode changes:
>> 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
>> implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
>> fixup) completely fail to boot (No output beyond "Booting kernel") ,
>> from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
>> find_ucode_in_initrd() __init") I'm seeing this hang.
>>
>> Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
>> now successfully boots 3.11-rc1.
>
> Ok, I need to be able to reproduce that first - I wasn't that successful
> with Johannes' setup.
>
> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry please.
>
> Also, is it just plain v3.11-rc1 or with patches on top?
>
> Also, /proc/cpuinfo please.

.config and cpuinfo attached.
Microcode does not seem to be loaded at all: for MICROCODE_EARLY I did not
attach the needed file / cpio, and the normal update mechanism does not seem
to have a newer microcode than what the BIOS is providing.
I'm using a custom initrd, but that can't be used for MICROCODE_EARLY
because it's compressed and does not contain an AuthenticAMD.bin. It
also does not contain microcode_amd.bin, because I'm supplying that via
CONFIG_EXTRA_FIRMWARE.
Grub entry:
title 3.11.0-rc1-crypt
root (hd0,0)
kernel (hd0,0)/boot/kernel-3.11.0-rc1 fastboot crypt_root=/dev/md6 video=1280x1024 radeon.dpm=1
initrd (hd0,0)/boot/ramfs-2011.gz
savedefault

I was using plain 3.11-rc1 except the changes I made to debug this.

What I think you need: a system that is fatally affected by AMD
Erratum 400 and a 64-bit kernel.

From my debugging I found that the following sequence of events occurs on my system:
The BSP will call load_ucode_ap().
That will call collect_cpu_info_amd_early(), which will fill the
cpuinfo_x86.x86 and cpuinfo_x86.microcode fields of the
per-cpu cpu_info structure that has not yet been set up. Because this
code is only used with MICROCODE_EARLY, disabling this option
makes my system boot. OTOH this function is called regardless of whether
AuthenticAMD.bin is available or not; that's why I'm hitting it even
without the special cpio.
Then the BSP will call init_amd() to apply the errata fixes. That uses
cpu_has_amd_erratum(), but that function does not use the cpuinfo_x86
that was supplied to init_amd() (and used for the following
set_cpu_bug() if the erratum was found!); instead it guesses for
itself whether it should use the per-cpu data or boot_cpu_data. And it uses
the not yet initialized per-cpu data for that guess. That normally
works fine, because that data will all be zeroed out, but
collect_cpu_info_amd_early() has filled ->x86, and so
cpu_has_amd_erratum() will use the partly filled per-cpu data instead
of the correct boot_cpu_data. But because collect_cpu_info_amd_early()
did not fill ->x86_vendor, that field is still 0 == X86_VENDOR_INTEL
and cpu_has_amd_erratum() will wrongly report that no erratum is present.
So the C1E workaround is not applied, and as soon as ACPI enables this
the boot hangs.

Something like the following (whitespace mangled by Gmail; if it looks
OK to you, I will send it as a clean patch) fixes
cpu_has_amd_erratum() for me, but I did not look at how the early
microcode loading should work if AuthenticAMD.bin is available, so I
can't offer a fix for the premature accesses to per-cpu cpu_info.

--- 3.11-rc1/arch/x86/kernel/cpu/amd.c.orig 2013-07-21 05:42:42.130346496 +0200
+++ 3.11-rc1/arch/x86/kernel/cpu/amd.c 2013-07-21 05:45:09.420345843 +0200
@@ -512,7 +512,7 @@

static const int amd_erratum_383[];
static const int amd_erratum_400[];
-static bool cpu_has_amd_erratum(const int *erratum);
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum);

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
@@ -729,11 +729,11 @@
value &= ~(1ULL << 24);
wrmsrl_safe(MSR_AMD64_BU_CFG2, value);

- if (cpu_has_amd_erratum(amd_erratum_383))
+ if (cpu_has_amd_erratum(c, amd_erratum_383))
set_cpu_bug(c, X86_BUG_AMD_TLB_MMATCH);
}

- if (cpu_has_amd_erratum(amd_erratum_400))
+ if (cpu_has_amd_erratum(c, amd_erratum_400))
set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);

rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -879,22 +879,14 @@
static const int amd_erratum_383[] =
AMD_OSVW_ERRATUM(3, AMD_MODEL_RANGE(0x10, 0, 0, 0xff, 0xf));

-static bool cpu_has_amd_erratum(const int *erratum)
+static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
{
- struct cpuinfo_x86 *cpu = __this_cpu_ptr(&cpu_info);
int osvw_id = *erratum++;
u32 range;
u32 ms;

- /*
- * If called early enough that current_cpu_data hasn't been initialized
- * yet, fall back to boot_cpu_data.
- */
- if (cpu->x86 == 0)
- cpu = &boot_cpu_data;
-
- if (cpu->x86_vendor != X86_VENDOR_AMD)
- return false;
+ /* Should never be called on Non-AMD-CPUs */
+ BUG_ON(cpu->x86_vendor != X86_VENDOR_AMD);

if (osvw_id >= 0 && osvw_id < 65536 &&
cpu_has(cpu, X86_FEATURE_OSVW)) {


Attachments:
config.txt (21.96 kB)
cpuinfo.txt (5.47 kB)

2013-07-21 11:52:49

by Johannes Hirte

Subject: Re: early microcode on amd is broken when no initramfs provided

On Sun, 21 Jul 2013 00:59:11 +0200
Borislav Petkov wrote:

> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
> > On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov wrote:
> > > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
> > >> config is attached
> > >
> > > Ok, I can reproduce the hang with your config but even with:
> > >
> > > $ grep MICROCODE .config
> > > # CONFIG_MICROCODE is not set
> > > # CONFIG_MICROCODE_INTEL_EARLY is not set
> > > # CONFIG_MICROCODE_AMD_EARLY is not set
> > >
> > > which means, it cannot be microcode-related.
> > >
> > > And I'd bet if you wait a minute (yep, it should be exactly 60
> > > seconds) the boot would probably continue. And if so, this is
> > > that 60 sec delay where the kernel tries to find firmware.
> > >
> > > Hmm...
> >
> > I have the same problem: Booting 3.11-rc1 hangs after the line:
> > ACPI: Executed 3 blocks of module-level executable AML code
> >
> > I bisected it down to the early microcode changes:
> > 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
> > implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
> > fixup) completely fail to boot (no output beyond "Booting kernel");
> > from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
> > find_ucode_in_initrd() __init") I'm seeing this hang.
> >
> > Just turning CONFIG_MICROCODE_EARLY off solves the problem: The
> > system now successfully boots 3.11-rc1.
>
> Ok, I need to be able to reproduce that first - I wasn't that
> successful with Johannes' setup.

Strange, I've bisected to the same commit with the config I've sent you.

> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry
> please.
>
> Also, is it just plain v3.11-rc1 or with patches on top?
>
> Also, /proc/cpuinfo please.
>
> Thanks.

/proc/cpuinfo:

processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Athlon(tm) II P320 Dual-Core Processor
stepping : 3
microcode : 0x10000b6
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm
extapic cr8_legacy abm sse4a 3dnowprefetch osvw ibs skinit wdt
nodeid_msr hw_pstate npt lbrv svm_lock nrip_save
bogomips : 4189.33
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Athlon(tm) II P320 Dual-Core Processor
stepping : 3
microcode : 0x10000b6
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm
extapic cr8_legacy abm sse4a 3dnowprefetch osvw ibs skinit wdt
nodeid_msr hw_pstate npt lbrv svm_lock nrip_save
bogomips : 4189.33
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate