Date: Tue, 7 Mar 2017 13:46:10 +0100
From: Tobias Klauser
To: Guenter Roeck
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>, Sandra Loosemore,
	Arnd Bergmann, Andrew Morton, linux-kernel@vger.kernel.org,
	Ley Foon Tan, nios2-dev@lists.rocketboards.org
Subject: Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
Message-ID: <20170307124609.GF27998@distanz.ch>
In-Reply-To: <1ad19c21-6f6e-4516-7df5-d3536df9f4ee@roeck-us.net>

On 2017-03-03 at 04:04:41 +0100, Guenter Roeck wrote:
> On 03/02/2017 08:38 AM, Tobias Klauser wrote:
> >On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote:
> >>On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
> >>>Hi Guenter, Tobias and Sandra,
> >>>
> >>>thanks for your effort here.
> >>>
> >>>On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
> >>>>On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
> >>>>>On 02/28/2017 08:53 AM, Tobias Klauser wrote:
> >>>>>>(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
> >>>>>>for nios2)
> >>>>>>
> >>>>>>On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> >>>>>>>Hi Sven,
> >>>>>>>
> >>>>>>>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> >>>>>>>update LZ4 compressor module"). The test hangs early during boot before
> >>>>>>>any console output is seen. Reverting the offending patch as well as the
> >>>>>>>subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> >>>>>>>and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
> >>>>>>>at the top of the LZ4 decompression code). For reference, bisect log
> >>>>>>>is attached.
> >>>>>>>
> >>>>>>>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> >>>>>>>and binutils 2.26.1. Scripts used to run the tests are available at
> >>>>>>>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> >>>>>>>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
> >>>>>>
> >>>>>>Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
> >>>>>>binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can
> >>>>>>get a kernel booting on latest master branch. AFAICT, none of the
> >>>>>>LZ4_decompress_* functions are called during boot.
> >>>>>>
> >>>
> >>>It seems a bit strange that code which is not actually called causes problems like that.
> >>>
> >>Yes, it is, though it is always possible. The code isn't exactly easy to
> >>understand; there may be some hidden caveats such as global variables. It may
> >>also be that some jump target exceeds its range (though why that would only
> >>be seen with the LZ4 code is another question), or that the compiler gets
> >>confused by the forced inlines (disabling that didn't make a difference,
> >>though, nor did disabling -O3).
> >>
> >>>Please let me know if and how I may help you figure out what's happening, especially
> >>>regarding the differences between the previous LZ4 and the current implementation.
> >>>
> >>
> >>For my part I am all but clueless. Unless someone has an idea, we may have to
> >>disable LZ4 support for nios2 for the time being. Does anyone have thoughts
> >>on that? Of course, that would not help if the problem also affects
> >>recent gcc/binutils versions on other architectures.
> >
> >After some further investigations, I'd say this isn't "caused" by LZ4
> >specifically but by a more general problem with one of the nios2 arch
> >specific tools involved.
> >
> >I manually enabled random additional CONFIG_* options and in some cases
> >I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
> >-EINVAL in place) while in others I didn't. So I'd rather suspect this
> >problem to be connected to the size or structure of the generated vmlinux
> >image.
> >
> >Or could this even be a problem with qemu? Did anyone already verify
> >this on the 10m50 devboard? (Unfortunately I don't have any nios2
> >devboard available right now, otherwise I would have done this...)
> >
>
> That is of course always possible.
>
> >Other than that I'm also becoming all but clueless... One option I
> >thought of was using the QEMU monitor to dump the CPU state after the
> >hang but so far I didn't manage to get it to work (hints appreciated ;)
> >
>
> Something like
>
> qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
> 	-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
> 	--append "rdinit=/sbin/init" -initrd busybox-nios2.cpio
>
> gives you a qemu monitor window. Use "info registers" to see registers.
> Looks like it is stuck in init_bootmem_core, or at least that is what it
> shows for me.

Thanks a lot for the hint, this worked perfectly. I'm not all that
familiar with qemu :-/

Using the qemu gdbserver I can indeed confirm that it seems to be stuck
in init_bootmem_core:

  (gdb) file vmlinux
  Reading symbols from vmlinux...done.
  (gdb) target remote localhost:1234
  Remote debugging using localhost:1234
  link_bootmem (bdata=) at mm/bootmem.c:80
  80		if (bdata->node_min_pfn < ent->node_min_pfn) {

This looks like a very weird place for it to get stuck...

So I followed a different path and implemented early printk support for
the 8250/16650 serial console on nios2, so I could get debug output
earlier on (patch below, I'll also officially submit this later on).
Now I get the following output on boot:

  Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1 20170226 (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
  bootconsole [early0] enabled
  Early console on uart16650 initialized at 0xf8001600
  OF: fdt: Error -11 processing FDT
  Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!

  ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!

Looks like the in-memory device tree somehow gets corrupted. Not sure
yet why and how this is linked to the Kconfig options selected, but at
least we now have a way to get debug messages earlier on.
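
One way to narrow down where a device tree blob gets damaged is to check
its header at a few points during early boot. The following is only a
minimal host-side sketch of such a check, based on the flattened device
tree header layout (ten big-endian 32-bit fields, magic 0xd00dfeed).
fdt_header_looks_sane() and be32_at() are hypothetical names made up for
this illustration; they are not kernel or libfdt functions. A clean
header does not rule out damage further into the blob, so this can only
show that corruption reached the first 40 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC	0xd00dfeedu	/* magic value of a flattened device tree */
#define FDT_HEADER_SIZE	40u		/* ten big-endian 32-bit header fields */

/* Read a big-endian 32-bit value at byte offset off. */
static uint32_t be32_at(const uint8_t *p, size_t off)
{
	return ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
	       ((uint32_t)p[off + 2] << 8) | (uint32_t)p[off + 3];
}

/*
 * Hypothetical helper (illustration only, not a kernel or libfdt API).
 * Returns 0 if the header fields are mutually consistent, otherwise a
 * negative value naming the first check that failed.
 */
static int fdt_header_looks_sane(const uint8_t *blob, size_t avail)
{
	uint32_t totalsize, off_struct, off_strings, size_strings, size_struct;

	assert(blob != NULL);

	if (avail < FDT_HEADER_SIZE)
		return -1;			/* not even a full header */
	if (be32_at(blob, 0) != FDT_MAGIC)
		return -2;			/* magic overwritten */

	totalsize    = be32_at(blob, 4);
	off_struct   = be32_at(blob, 8);
	off_strings  = be32_at(blob, 12);
	size_strings = be32_at(blob, 32);
	size_struct  = be32_at(blob, 36);

	if (totalsize < FDT_HEADER_SIZE || totalsize > avail)
		return -3;			/* implausible total size */
	if (off_struct > totalsize || size_struct > totalsize - off_struct)
		return -4;			/* structure block out of bounds */
	if (off_strings > totalsize || size_strings > totalsize - off_strings)
		return -5;			/* strings block out of bounds */

	return 0;
}
```

In a kernel the same comparisons could be printed from early code once
the early console is up; where exactly to place such calls is left open
here.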
---%<---%<---

Patch for 8250/16650 early printk support on nios2 (make sure to select
CONFIG_EARLY_PRINTK):

diff --git a/arch/nios2/Kconfig.debug b/arch/nios2/Kconfig.debug
index 2fd08cbfdddb..35b5dd67b15a 100644
--- a/arch/nios2/Kconfig.debug
+++ b/arch/nios2/Kconfig.debug
@@ -18,7 +18,7 @@ config EARLY_PRINTK
 	bool "Activate early kernel debugging"
 	default y
 	select SERIAL_CORE_CONSOLE
-	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE
+	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE || SERIAL_8250_CONSOLE
 	help
 	  Enable early printk on console
 	  This is useful for kernel debugging when your machine crashes very
diff --git a/arch/nios2/kernel/early_printk.c b/arch/nios2/kernel/early_printk.c
index c08e4c1486fc..24b4506f4969 100644
--- a/arch/nios2/kernel/early_printk.c
+++ b/arch/nios2/kernel/early_printk.c
@@ -22,6 +22,8 @@ static unsigned long base_addr;
 
 #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE)
 
+#define UART_NAME "altera_jtaguart"
+
 #define ALTERA_JTAGUART_DATA_REG		0
 #define ALTERA_JTAGUART_CONTROL_REG		4
 #define ALTERA_JTAGUART_CONTROL_WSPACE_MSK	0xFFFF0000
@@ -53,6 +55,8 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
 
 #elif defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
 
+#define UART_NAME "altera_uart"
+
 #define ALTERA_UART_TXDATA_REG	4
 #define ALTERA_UART_STATUS_REG	8
 #define ALTERA_UART_STATUS_TRDY	0x0040
@@ -80,9 +84,40 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
 	}
 }
 
+#elif defined(CONFIG_SERIAL_8250_CONSOLE)
+
+#define UART_NAME "uart16650"
+
+#define UART_LSR_TEMT	0x40	/* Transmitter empty */
+#define UART_LSR_THRE	0x20	/* Transmit-hold-register empty */
+#define BOTH_EMPTY	(UART_LSR_TEMT | UART_LSR_THRE)
+
+#define UART_GET_SR() \
+	__builtin_ldwio((void *)(base_addr + 0x14))
+#define UART_SET_TX(v) \
+	__builtin_stwio((void *)(base_addr), v)
+
+static void early_console_putc(char c)
+{
+	while (!((UART_GET_SR() & BOTH_EMPTY) == BOTH_EMPTY))
+		;
+
+	UART_SET_TX(c & 0xff);
+}
+
+static void early_console_write(struct console *con, const char *s, unsigned n)
+{
+	while (n-- && *s) {
+		early_console_putc(*s);
+		if (*s == '\n')
+			early_console_putc('\r');
+		s++;
+	}
+}
+
 #else
-# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE nor SERIAL_ALTERA_UART_CONSOLE \
-selected
+# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE, SERIAL_ALTERA_UART_CONSOLE, \
+	nor SERIAL_8250_CONSOLE selected
 #endif
 
 static struct console early_console_prom = {
@@ -95,7 +130,8 @@ static struct console early_console_prom = {
 void __init setup_early_printk(void)
 {
 #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE) || \
-	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
+	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE) || \
+	defined(CONFIG_SERIAL_8250_CONSOLE)
 	base_addr = of_early_console();
 #else
 	base_addr = 0;
@@ -114,5 +150,5 @@ void __init setup_early_printk(void)
 
 	early_console = &early_console_prom;
 	register_console(early_console);
-	pr_info("early_console initialized at 0x%08lx\n", base_addr);
+	pr_info("Early console on %s initialized at 0x%08lx\n", UART_NAME, base_addr);
 }
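
For readers who want to try the polling logic off-target, here is a
small host-side sketch of the transmit path with the UART replaced by a
mock. Everything prefixed mock_ and the helper uart_reg_offset() are
hypothetical names for this illustration only. It assumes the 0x14
offset in the patch is the 16550 line status register (index 5) with a
register shift of 2; that shift is an assumption and has not been
checked against the 10m50 device tree. Unlike the patch, the sketch
sends '\r' before '\n', which is the order uart_console_write() uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_LSR_INDEX	5u	/* line status register index on a 16550 */
#define UART_LSR_TEMT	0x40u	/* transmitter empty */
#define UART_LSR_THRE	0x20u	/* transmit-hold-register empty */
#define BOTH_EMPTY	(UART_LSR_TEMT | UART_LSR_THRE)

/* Byte offset of a register for a given register shift (assumed: 2). */
static unsigned long uart_reg_offset(unsigned int index, unsigned int reg_shift)
{
	return (unsigned long)index << reg_shift;
}

/* Mock device: stays "busy" for a number of status reads, then drains. */
struct mock_uart {
	unsigned int busy_polls;	/* status reads that still report busy */
	char out[64];			/* captured transmit data */
	size_t len;
};

static uint32_t mock_read_lsr(struct mock_uart *u)
{
	if (u->busy_polls > 0) {
		u->busy_polls--;
		return UART_LSR_THRE;	/* holding register empty, shifter busy */
	}
	return BOTH_EMPTY;
}

static void mock_putc(struct mock_uart *u, char c)
{
	/* Spin until both "empty" bits are set, as in the patch. */
	while ((mock_read_lsr(u) & BOTH_EMPTY) != BOTH_EMPTY)
		;

	if (u->len < sizeof(u->out) - 1)
		u->out[u->len++] = c;
	u->out[u->len] = '\0';
}

static void mock_write(struct mock_uart *u, const char *s, size_t n)
{
	assert(u != NULL && s != NULL);

	while (n-- && *s) {
		if (*s == '\n')
			mock_putc(u, '\r');	/* CR first, then LF */
		mock_putc(u, *s);
		s++;
	}
}
```

Waiting for both TEMT and THRE is the conservative choice: THRE alone
would allow the next byte to be queued sooner, at the cost of not
knowing whether the previous one has left the shifter.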