Subject: Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results
To: Andrew Pinski
References: <1459894127-17698-1-git-send-email-ynorov@caviumnetworks.com>
 <20160405224412.GA18300@yury-N73SV> <571AEDF9.6030701@huawei.com>
CC: Yury Norov, Arnd Bergmann, Catalin Marinas,
 "linux-arm-kernel@lists.infradead.org", LKML, Martin Schwidefsky,
 Heiko Carstens, "Kapoor, Prasun", Andreas Schwab, "Nathan Lynch",
 Alexander Graf, "Alexey Klimov", Mark Brown, "Joseph S. Myers",
 Linux-Arch, linux-s390, Hanjun Guo, GCC Mailing List,
 "Zhangjian (Bamvor)"
From: "Zhangjian (Bamvor)"
Message-ID: <5721FF35.6090602@huawei.com>
Date: Thu, 28 Apr 2016 20:16:53 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Andrew

On 2016/4/28 5:15, Andrew Pinski wrote:
> On Wed, Apr 27, 2016 at 12:30 AM, Andrew Pinski wrote:
>> On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor)
>> wrote:
>>> Hi, Yury
>>>
>>>
>>> On 2016/4/6 6:44, Yury Norov wrote:
>>>>
>>>> There are about 20 failing tests of 782 in lite scenario.
>>>> float_bessel
>>>> float_exp_log
>>>> float_iperb
>>>> float_power
>>>> float_trigo
>>>> pipeio_1
>>>> pipeio_3
>>>> pipeio_5
>>>> pipeio_8
>>>> abort01
>>>> clone02
>>>> kill11
>>>> mmap16
>>>> open12
>>>> pause01
>>>> rename11
>>>> rmdir02
>>>> umount2_01
>>>> umount2_02
>>>> umount2_03
>>>> utime06
>>>> mtest06
>>>>
>>>> The list is rough because some tests do not fail every time.
>>>>
>>>> Tests abort01 and kill11 fail for lp64 too, so maybe there's
>>>> a reason unrelated to ilp32 itself.
>>>>
>>>> float_xxx tests fail because they call unwind() from signal context,
>>>> and GCC for ilp32 has a problem with it, as Andrew said.
>>>
>>> Is there any progress on this issue? When we talk about unwind
>>> functions, do you mean the functions in libgcc?
>>>
>>> We encountered another issue (an abort, not a segfault) which also
>>> involves pthread_cancel(). The test code is in the attachment. Here is
>>> the backtrace:
>>
>> Yes, this is a known issue. I have a GCC patch to fix
>> this. Basically REG_VALUE_IN_UNWIND_CONTEXT needs to be defined while
>> building libgcc to support the correct unwind information.
>> I will be posting a GCC patch to fix this tomorrow. This was a bug
>> even in the original set of ilp32 patches. I only finally was able to
>> sit down and fix it today.
>
> Here is the link to the GCC patch which I said I was going to submit today:
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01726.html

It works for me. Both the float_xxx tests in LTP and my pthread_cancel
testcase pass.

Regards

Bamvor

>
> Thanks,
> Andrew
>
>>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> ```
>>> Program received signal SIGABRT, Aborted.
>>> [Switching to Thread 0xf77ee330 (LWP 2958)]
>>> 0x000000000040f5bc in raise (sig=sig@entry=6)
>>>     at ../sysdeps/unix/sysv/linux/raise.c:55
>>> 55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x000000000040f5bc in raise (sig=sig@entry=6)
>>>     at ../sysdeps/unix/sysv/linux/raise.c:55
>>> #1  0x000000000040f884 in abort () at abort.c:89
>>>
>>> #2  0x00000000004073b4 in uw_update_context_1 (
>>>     context=context@entry=0xf77ec820, fs=fs@entry=0xf77ebec8)
>>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1430
>>>
>>> #3  0x00000000004078c0 in uw_update_context
>>>     (context=context@entry=0xf77ec820,
>>>     fs=fs@entry=0xf77ebec8)
>>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1506
>>> #4  0x0000000000407a9c in uw_advance_context (fs=0xf77ebec8,
>>>     context=0xf77ec820)
>>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1529
>>> #5  _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xf77ee580,
>>>     context=context@entry=0xf77ec820)
>>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:185
>>> #6  0x0000000000408228 in _Unwind_ForcedUnwind (exc=0xf77ee580,
>>>     stop=stop@entry=0x405440 , stop_argument=0xf77eddd8)
>>>     at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:207
>>> #7  0x00000000004055c4 in __pthread_unwind (buf=)
>>>     at unwind.c:126
>>> #8  0x00000000004050b4 in __do_cancel () at ./pthreadP.h:283
>>> #9  sigcancel_handler (sig=, si=,
>>>     ctx=) at nptl-init.c:225
>>> ---Type to continue, or q to quit---
>>> #10
>>>
>>> #11 0x0000000000000000 in ??
>>>     ()
>>>
>>> #12 0x0000000000423084 in __select (nfds=-66661, readfds=,
>>>     writefds=, exceptfds=, timeout=0x0)
>>>     at ../sysdeps/unix/sysv/linux/generic/select.c:45
>>> #13 0x0000000000400604 in TEST_TaskDelay (
>>>     uiMillSecs=)
>>>     at test-cancel.c:18
>>> #14 0x0000000000400680 in printids (
>>>     s=)
>>>     at test-cancel.c:38
>>> #15 0x00000000004006d0 in thr_fn (
>>>     arg=)
>>>     at test-cancel.c:49
>>> #16 0x0000000000401b28 in start_thread (arg=0x4a3000)
>>>     at pthread_create.c:335
>>> #17 0x0000000000401b28 in start_thread (arg=0x4a3000)
>>>     at pthread_create.c:335
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> ```
>>>
>>> The abort is raised by the following code:
>>> ```
>>> static void
>>> uw_update_context_1 (struct _Unwind_Context *context, _Unwind_FrameState *fs)
>>> {
>>>   //...
>>>   /* Compute this frame's CFA.  */
>>>   switch (fs->regs.cfa_how)
>>>     {
>>>     case CFA_REG_OFFSET:
>>>       cfa = _Unwind_GetPtr (&orig_context, fs->regs.cfa_reg);
>>>       cfa += fs->regs.cfa_offset;
>>>       break;
>>>
>>>     case CFA_EXP:
>>>       {
>>>         const unsigned char *exp = fs->regs.cfa_exp;
>>>         _uleb128_t len;
>>>
>>>         exp = read_uleb128 (exp, &len);
>>>         cfa = (void *) (_Unwind_Ptr)
>>>           execute_stack_op (exp, exp + len, &orig_context, 0);
>>>         break;
>>>       }
>>>
>>>     default:
>>>       gcc_unreachable ();
>>>     }
>>>   context->cfa = cfa;
>>>   //...
>>> }
>>> ```
>>>
>>> Any suggestion is appreciated.
>>>
>>> CC gcc mailing list. Sorry if it is off topic.
>>>
>>> Regards
>>>
>>> Bamvor
>>>
>>>
>>>
>>>
>>>> pipeio_x tests are very unstable and may fail randomly. I strongly
>>>> suspect race conditions, as they all work like a charm if pinned to
>>>> a single CPU with taskset. A race is probably the reason for the
>>>> clone02 failure too, though I'm not sure whether the race is in the
>>>> kernel, glibc or the test itself.
>>>>
>>>> But I know for sure that pause01 fails due to test design:
>>>>         if (setitimer(ITIMER_REAL, &it, NULL)) // For 1000us
>>>>                 tst_brkm(TBROK | TERRNO, NULL, "setitimer() failed");
>>>>
>>>>         TEST(pause());
>>>>
>>>> As the setitimer() and pause() calls are not atomic, the alarm may come
>>>> before pause() is called, and be silently dropped by the handler. The
>>>> following pause() call then hangs the test forever. I have already
>>>> reported this to the LTP list.
>>>>
>>>> open12, rename11, rmdir02, mmap16, mtest06 - all call the mkfs tool, and
>>>> it returns an error code. I haven't investigated it much yet.
>>>>
>>>> umount2_0x, utime06 - cannot reproduce outside the scenario; even when
>>>> run in an infinite loop they work fine.
>>>>
>>>> Full test log is attached.
>>>>
>>>> Yury
>>>>
>>>
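
For reference, the pause01 race quoted above (timer armed, SIGALRM delivered before pause() is entered) is commonly avoided by blocking SIGALRM before arming the timer and then waiting with sigsuspend(), which swaps the signal mask and sleeps in one atomic step. The following is only a minimal sketch of that idea, not the actual LTP fix; wait_for_alarm_us(), on_alarm() and alarm_seen are hypothetical names that do not appear in LTP or in this thread.

```c
#define _XOPEN_SOURCE 700 /* for sigaction(), sigsuspend(), setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical flag set by the handler; not part of LTP. */
static volatile sig_atomic_t alarm_seen;

static void on_alarm(int sig)
{
	(void)sig;
	alarm_seen = 1;
}

/*
 * Hypothetical helper: arm a one-shot ITIMER_REAL for `usec` microseconds
 * (usec must be > 0) and wait for SIGALRM without a lost-wakeup window.
 * Returns 0 once the alarm has been handled, -1 on error.
 */
int wait_for_alarm_us(long usec)
{
	struct sigaction sa;
	struct itimerval it;
	sigset_t block, old, wait_mask;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL))
		return -1;

	/* Block SIGALRM first, so it stays pending if the timer fires early. */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &block, &old))
		return -1;

	alarm_seen = 0;
	memset(&it, 0, sizeof(it));
	it.it_value.tv_sec = usec / 1000000;
	it.it_value.tv_usec = usec % 1000000;
	if (setitimer(ITIMER_REAL, &it, NULL)) {
		sigprocmask(SIG_SETMASK, &old, NULL);
		return -1;
	}

	/* Wait with SIGALRM unblocked; unblocking and sleeping are atomic. */
	wait_mask = old;
	sigdelset(&wait_mask, SIGALRM);
	while (!alarm_seen)
		sigsuspend(&wait_mask);

	/* Restore the caller's original signal mask. */
	return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

A test written this way would call wait_for_alarm_us(1000) where the quoted snippet calls setitimer() followed by pause(). Whether that suits pause01, which exists to exercise pause() itself, is a separate question for the LTP list.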