Subject: Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
To: Tobias Klauser
References: <20170226210338.GA19476@roeck-us.net>
 <20170228155331.GC27998@distanz.ch> <58B5B934.5040807@codesourcery.com>
 <20170228181413.GC13455@roeck-us.net>
 <20170301185817.GA13543@bierbaron.springfield.local>
 <20170301194520.GA20160@roeck-us.net> <20170302163813.GE27998@distanz.ch>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>, Sandra Loosemore,
 Arnd Bergmann, Andrew Morton, linux-kernel@vger.kernel.org, Ley Foon Tan,
 nios2-dev@lists.rocketboards.org
From: Guenter Roeck
Message-ID: <1ad19c21-6f6e-4516-7df5-d3536df9f4ee@roeck-us.net>
Date: Thu, 2 Mar 2017 19:04:41 -0800
In-Reply-To: <20170302163813.GE27998@distanz.ch>

On 03/02/2017 08:38 AM, Tobias Klauser wrote:
> On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote:
>> On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
>>> Hi Guenter, Tobias and Sandra,
>>>
>>> thanks for your effort here.
>>>
>>> On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
>>>> On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
>>>>> On 02/28/2017 08:53 AM, Tobias Klauser wrote:
>>>>>> (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
>>>>>> for nios2)
>>>>>>
>>>>>> On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
>>>>>>> Hi Sven,
>>>>>>>
>>>>>>> my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
>>>>>>> update LZ4 compressor module"). The test hangs early during boot before
>>>>>>> any console output is seen. Reverting the offending patch as well as the
>>>>>>> subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
>>>>>>> and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
>>>>>>> at the top of the LZ4 decompression code). For reference, bisect log
>>>>>>> is attached.
>>>>>>>
>>>>>>> I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
>>>>>>> and binutils 2.26.1. Scripts used to run the tests are available at
>>>>>>> https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
>>>>>>> Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
>>>>>>
>>>>>> Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
>>>>>> binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can
>>>>>> get a kernel booting on latest master branch. AFAICT, none of the
>>>>>> LZ4_decompress_* functions are called during boot.
>>>>>>
>>>
>>> It seems a bit strange that code which is not actually called causes problems like that.
>>>
>> Yes, it is, though it is always possible. The code isn't exactly easy to
>> understand; there may be some hidden caveats such as global variables. It may
>> also be that some jump target exceeds its range (though why that would only
>> be seen with the LZ4 code is another question), or that the compiler gets
>> confused by the forced inlines (disabling that didn't make a difference,
>> though, nor did disabling -O3).
>>
>>> Please let me know if and how I may help you figure out what's happening, especially
>>> regarding the differences between the previous LZ4 and the current implementation.
>>>
>>
>> For my part I am all but clueless. Unless someone has an idea, we may have to
>> disable LZ4 support for nios2 for the time being. Does anyone have thoughts
>> on that? Of course, that would not help if the problem also affects
>> recent gcc/binutils versions on other architectures.
>
> After some further investigations, I'd say this isn't "caused" by LZ4
> specifically but by a more general problem with one of the nios2 arch
> specific tools involved.
>
> I manually enabled random additional CONFIG_* options and in some cases
> I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
> -EINVAL in place) while in others I didn't. So I'd rather suspect this
> problem to be connected to the size or structure of the generated vmlinux
> image.
>
> Or could this even be a problem with qemu? Did anyone already verify
> this on the 10m50 devboard? (Unfortunately I don't have any nios2
> devboard available right now, otherwise I would have done this...)
>

That is of course always possible.

> Other than that I'm also becoming all but clueless... One option I
> thought of was using the QEMU monitor to dump the CPU state after the
> hang but so far I didn't manage to get it to work (hints appreciated ;)
>

Something like

qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
    -dtb arch/nios2/boot/dts/10m50_devboard.dtb \
    --append "rdinit=/sbin/init" -initrd busybox-nios2.cpio

gives you a qemu monitor window. Use "info registers" to see registers.
Looks like it is stuck in init_bootmem_core, or at least that is what
it shows for me.

Guenter
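
For anyone repeating this, one way to get from the program counter shown by
"info registers" to a function name such as init_bootmem_core is to look the
address up in the System.map produced by the kernel build. The following is
only a rough sketch of that lookup, not part of the thread's tooling: it
assumes the PC is read off the monitor by hand, the helper names
(load_symbols, resolve) are made up for illustration, and the addresses in
SAMPLE_MAP are invented, not taken from a real nios2 build.

```python
import bisect

# Invented System.map excerpt for illustration only; real addresses will differ.
SAMPLE_MAP = """\
c0000000 T _start
c0001000 T start_kernel
c0002000 t init_bootmem_core
c0002400 T setup_arch
"""


def load_symbols(lines):
    """Parse System.map-style lines ("<hex addr> <type> <name>") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addrs = [addr for addr, _ in symbols]
    idx = bisect.bisect_right(addrs, pc) - 1
    if idx < 0:
        return None  # pc lies below the first known symbol
    addr, name = symbols[idx]
    return name, pc - addr


if __name__ == "__main__":
    syms = load_symbols(SAMPLE_MAP.splitlines())
    hit = resolve(syms, 0xC0002010)  # hypothetical PC value read from the monitor
    if hit is not None:
        print("%s+0x%x" % hit)
```

With a real build you would feed it the lines of System.map from the build
directory instead of SAMPLE_MAP. A large offset from the nearest symbol is a
hint that the PC is not actually inside that function.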