References: <1459894127-17698-1-git-send-email-ynorov@caviumnetworks.com>
    <20160405224412.GA18300@yury-N73SV> <571AEDF9.6030701@huawei.com>
Date: Wed, 27 Apr 2016 14:15:30 -0700
Subject: Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results
From: Andrew Pinski
To: "Zhangjian (Bamvor)"
Cc: Yury Norov, Arnd Bergmann, Catalin Marinas,
    "linux-arm-kernel@lists.infradead.org", LKML, Martin Schwidefsky,
    Heiko Carstens, "Kapoor, Prasun", Andreas Schwab, Nathan Lynch,
    Alexander Graf, Alexey Klimov, Mark Brown, "Joseph S. Myers",
    christoph.muellner@theobroma-systems.com, linux-doc@vger.kernel.org,
    Linux-Arch, linux-s390, Hanjun Guo, GCC Mailing List
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 27, 2016 at 12:30 AM, Andrew Pinski wrote:
> On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor)
> wrote:
>> Hi, Yury
>>
>>
>> On 2016/4/6 6:44, Yury Norov wrote:
>>>
>>> There are about 20 failing tests out of 782 in the lite scenario:
>>> float_bessel
>>> float_exp_log
>>> float_iperb
>>> float_power
>>> float_trigo
>>> pipeio_1
>>> pipeio_3
>>> pipeio_5
>>> pipeio_8
>>> abort01
>>> clone02
>>> kill11
>>> mmap16
>>> open12
>>> pause01
>>> rename11
>>> rmdir02
>>> umount2_01
>>> umount2_02
>>> umount2_03
>>> utime06
>>> mtest06
>>>
>>> The list is rough because some tests fail only intermittently.
>>>
>>> Tests abort01 and kill11 fail for lp64 too, so there may be
>>> a reason unrelated to ilp32 itself.
>>>
>>> float_xxx tests fail because they call unwind() from signal context,
>>> and GCC for ilp32 has a problem with it, as Andrew said.
>>
>> Is there any progress on this issue? When we talk about unwind
>> functions, do you mean the functions in libgcc?
>>
>> We encountered another issue (an abort, not a segfault) which also
>> involves pthread_cancel(). The test code is in the attachment. Here is
>> the backtrace:
>
> Yes, this was an issue I already knew about. I have a GCC patch to fix
> this. Basically, REG_VALUE_IN_UNWIND_CONTEXT needs to be defined while
> building libgcc to support the correct unwind information.
> I will be posting a GCC patch to fix this tomorrow. This was a bug
> even in the original set of ilp32 patches. I was only finally able to
> sit down and fix it today.

Here is the link to the GCC patch which I said I was going to submit today:
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01726.html

Thanks,
Andrew

>
>
> Thanks,
> Andrew
>
>>
>> ```
>> Program received signal SIGABRT, Aborted.
>> [Switching to Thread 0xf77ee330 (LWP 2958)]
>> 0x000000000040f5bc in raise (sig=sig@entry=6)
>>     at ../sysdeps/unix/sysv/linux/raise.c:55
>> 55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x000000000040f5bc in raise (sig=sig@entry=6)
>>     at ../sysdeps/unix/sysv/linux/raise.c:55
>> #1  0x000000000040f884 in abort () at abort.c:89
>> #2  0x00000000004073b4 in uw_update_context_1 (
>>     context=context@entry=0xf77ec820, fs=fs@entry=0xf77ebec8)
>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1430
>> #3  0x00000000004078c0 in uw_update_context (context=context@entry=0xf77ec820,
>>     fs=fs@entry=0xf77ebec8)
>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1506
>> #4  0x0000000000407a9c in uw_advance_context (fs=0xf77ebec8,
>>     context=0xf77ec820)
>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1529
>> #5  _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xf77ee580,
>>     context=context@entry=0xf77ec820)
>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:185
>> #6  0x0000000000408228 in _Unwind_ForcedUnwind (exc=0xf77ee580,
>>     stop=stop@entry=0x405440 , stop_argument=0xf77eddd8)
>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:207
>> #7  0x00000000004055c4 in __pthread_unwind (buf=)
>>     at unwind.c:126
>> #8  0x00000000004050b4 in __do_cancel () at ./pthreadP.h:283
>> #9  sigcancel_handler (sig=, si=,
>>     ctx=) at nptl-init.c:225
>> #10
>> #11 0x0000000000000000 in ?? ()
>> #12 0x0000000000423084 in __select (nfds=-66661, readfds=,
>>     writefds=, exceptfds=, timeout=0x0)
>>     at ../sysdeps/unix/sysv/linux/generic/select.c:45
>> #13 0x0000000000400604 in TEST_TaskDelay (
>>     uiMillSecs=)
>>     at test-cancel.c:18
>> #14 0x0000000000400680 in printids (
>>     s=)
>>     at test-cancel.c:38
>> #15 0x00000000004006d0 in thr_fn (
>>     arg=)
>>     at test-cancel.c:49
>> #16 0x0000000000401b28 in start_thread (arg=0x4a3000) at
>>     pthread_create.c:335
>> #17 0x0000000000401b28 in start_thread (arg=0x4a3000) at
>>     pthread_create.c:335
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> ```
>>
>> The abort is raised by the following code:
>> ```
>> static void
>> uw_update_context_1 (struct _Unwind_Context *context, _Unwind_FrameState *fs)
>> {
>>   //...
>>   /* Compute this frame's CFA. */
>>   switch (fs->regs.cfa_how)
>>     {
>>     case CFA_REG_OFFSET:
>>       cfa = _Unwind_GetPtr (&orig_context, fs->regs.cfa_reg);
>>       cfa += fs->regs.cfa_offset;
>>       break;
>>
>>     case CFA_EXP:
>>       {
>>         const unsigned char *exp = fs->regs.cfa_exp;
>>         _uleb128_t len;
>>
>>         exp = read_uleb128 (exp, &len);
>>         cfa = (void *) (_Unwind_Ptr)
>>           execute_stack_op (exp, exp + len, &orig_context, 0);
>>         break;
>>       }
>>
>>     default:
>>       gcc_unreachable ();
>>     }
>>   context->cfa = cfa;
>>   //...
>> }
>> ```
>>
>> Any suggestions are appreciated.
>>
>> CC'ing the gcc mailing list. Sorry if this is off topic.
>>
>> Regards
>>
>> Bamvor
>>
>>
>>
>>
>>> pipeio_x tests are very unstable and may fail randomly. I strongly
>>> suspect race conditions, as they all work like a charm if pinned to
>>> a single CPU with taskset. A race is probably also the cause of the
>>> clone02 failure, though I'm not sure whether the race is in the kernel,
>>> glibc or the test itself.
>>>
>>> But I know for sure that pause01 fails because of the test design:
>>>     if (setitimer(ITIMER_REAL, &it, NULL)) // For 1000us
>>>         tst_brkm(TBROK | TERRNO, NULL, "setitimer() failed");
>>>
>>>     TEST(pause());
>>>
>>> As the setitimer() and pause() calls are not atomic, the alarm may arrive
>>> before pause() is called and be silently consumed by the handler. The next
>>> pause() call then hangs the test forever. I have already reported this to
>>> the LTP list.
>>>
>>> open12, rename11, rmdir02, mmap16, mtest06: all call the mkfs tool, and it
>>> returns an error code. I haven't investigated this much yet.
>>>
>>> umount2_0x, utime06: I cannot reproduce these failures outside the
>>> scenario; even when run in an infinite loop, they work fine.
>>>
>>> Full test log is attached.
>>>
>>> Yury
>>>
>>
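A common way to close the window between setitimer() and pause() described for pause01 is to block SIGALRM before arming the timer. The process then waits with sigsuspend(), which installs a mask without SIGALRM and sleeps in one atomic step. An early timer expiry therefore stays pending instead of being handled before the wait starts.

This is only a sketch under that assumption. It is not the change actually made to LTP, and wait_for_alarm() and on_alarm() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler; volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Hypothetical helper (not an LTP API): arm a one-shot ITIMER_REAL timer for
 * `usec` microseconds (must be > 0) and wait for SIGALRM without a
 * lost-wakeup window. Returns 1 after the handler has run, -1 on failure. */
static int wait_for_alarm(long usec)
{
    struct sigaction sa;
    struct itimerval it;
    sigset_t block, old, wait_mask;

    if (usec <= 0)
        return -1; /* a zero it_value would disarm the timer */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    /* Block SIGALRM before arming the timer: an early expiry now stays
     * pending instead of running the handler before we start waiting. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    alarm_fired = 0;
    memset(&it, 0, sizeof(it));
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    if (setitimer(ITIMER_REAL, &it, NULL) == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* sigsuspend() swaps in a mask without SIGALRM and sleeps atomically;
     * it always returns -1 (EINTR) after a signal handler has run. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return 1;
}
```

For example, wait_for_alarm(1000) arms the same 1000 us timer as pause01 and returns 1 once SIGALRM has been handled. Because the signal stays blocked until sigsuspend() is waiting, it cannot hang the way the setitimer()/pause() sequence can.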
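The attached test-cancel.c is not reproduced in this thread. As a rough, hypothetical illustration of the scenario in Bamvor's backtrace, the program below cancels a thread that is blocked in select().

In that backtrace, glibc acts on the cancellation from its SIGCANCEL handler (sigcancel_handler, __do_cancel, __pthread_unwind). This starts _Unwind_ForcedUnwind from signal context. Whether a particular glibc version takes exactly that path depends on its cancellation implementation and on timing. worker() and cancel_blocked_thread() are made-up names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical worker: loops forever in select(), which is a cancellation point. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
        select(0, NULL, NULL, NULL, &tv);
    }
    return NULL; /* not reached */
}

/* Starts the worker, gives it time to block in select(), cancels it and joins.
 * Returns 1 if the thread exited with PTHREAD_CANCELED, 0 if it exited some
 * other way, -1 if a pthread call failed. */
static int cancel_blocked_thread(void)
{
    pthread_t tid;
    void *status = NULL;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000L }; /* 100 ms */

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    nanosleep(&delay, NULL); /* let the worker enter select() */
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    return status == PTHREAD_CANCELED ? 1 : 0;
}
```

With a correctly working unwinder, cancel_blocked_thread() returns 1, because a cancelled thread's exit status is PTHREAD_CANCELED. On glibc versions older than 2.34, build with -pthread.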