From: huang ying
Date: Sun, 4 Feb 2018 22:21:29 +0800
Subject: Re: bisected bd4c82c22c367e is the first bad commit (was [Bug 198617] New: zswap causing random applications to crash)
To: Sergey Senozhatsky
Cc: Andrew Morton, "Huang, Ying", Michal Hocko, Vlastimil Babka, Sergey Senozhatsky, Minchan Kim, Seth Jennings, Dan Streetman, "Kirill A . Shutemov", linux-mm@kvack.org, LKML
In-Reply-To: <20180203013455.GA739@jagdpanzerIV>
References: <20180130114841.aa2d3bd99526c03c6a5b5810@linux-foundation.org> <20180203013455.GA739@jagdpanzerIV>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Sergey,

Thanks for reporting!

On Sat, Feb 3, 2018 at 9:34 AM, Sergey Senozhatsky wrote:
> Hello,
>
> On (01/30/18 11:48), Andrew Morton wrote:
>> Subject: [Bug 198617] New: zswap causing random applications to crash
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=198617
>>
>> Bug ID: 198617
>> Summary: zswap causing random applications to crash
>> Product: Memory Management
>> Version: 2.5
>> Kernel Version: 4.14.15
>> Hardware: x86-64
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Page Allocator
>> Assignee: akpm@linux-foundation.org
>> Reporter: kernel_org@dlk.pl
>> Regression: No
>>
>> https://bugs.freedesktop.org/show_bug.cgi?id=104709
>> https://bugs.kde.org/show_bug.cgi?id=389542
>>
>> I did have zswap enabled for a long while, and a lot of wine games,
>> plasmashell, xorg, kwin_x11 (and other) did crash randomly when reached 100% of
>> physical ram and swap was like almost never used.
>>
>> I could easily open a lot of browser tabs and the browser or xorg would fail
>> every time.
>>
>> After disabling zswap no crashes at all.
>>
>> /etc/systemd/swap.conf
>> zswap_enabled=1
>> zswap_compressor=lz4 # lzo lz4
>> zswap_max_pool_percent=25 # 1-99
>> zswap_zpool=zbud # zbud z3fold
>
>
> So I did a number of tests and I confirm that under memory pressure
> with frontswap enabled I do see segfaults and memory corruptions in
> random user space applications.
>
> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
> #0 0x00007fc08889ae0d _int_malloc (libc.so.6)
> #1 0x00007fc08889c2f3 malloc (libc.so.6)
> #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
> #3 0x0000560e6005e75c n/a (urxvt)
> #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
> #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
> #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
> #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
> #8 0x0000560e6005cb55 ev_run (urxvt)
> #9 0x0000560e6003b9b9 main (urxvt)
> #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
> #11 0x0000560e6003f9da _start (urxvt)
>
> kernel: urxvt[343]: segfault at 10 ip 00007fa56bd7d52b sp 00007ffc09783a40 error 4 in libc-2.26.so[7fa56bcfd000+1ae000]
> #0 0x00007fa56bd7d52b _int_malloc (libc.so.6)
> #1 0x00007fa56bd7f2f3 malloc (libc.so.6)
> #2 0x00007fa56b3d6097 n/a (libxcb.so.1)
> #3 0x00007fa56b3d64d8 n/a (libxcb.so.1)
> #4 0x00007fa56c921b79 n/a (libX11.so.6)
> #5 0x00007fa56c921ceb n/a (libX11.so.6)
> #6 0x00007fa56c921fdd _XEventsQueued (libX11.so.6)
> #7 0x00007fa56c913c49 XEventsQueued (libX11.so.6)
> #8 0x000055b35cfc3262 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt)
> #9 0x000055b35cfc910f _Z17ev_invoke_pendingv (urxvt)
> #10 0x000055b35cfc9c02 ev_run (urxvt)
> #11 0x000055b35cfa89b9 main (urxvt)
> #12 0x00007fa56bd1df4a __libc_start_main (libc.so.6)
> #13 0x000055b35cfac9da _start (urxvt)
>
> Stack trace of thread 351:
> #0 0x00007f5baaee7860 raise (libc.so.6)
> #1 0x00007f5baaee8ec9 abort (libc.so.6)
> #2 0x00007f5baaf30849 __malloc_assert (libc.so.6)
> #3 0x00007f5baaf34011 _int_malloc (libc.so.6)
> #4 0x00007f5baaf352f3 malloc (libc.so.6)
> #5 0x00007f5baaf71cad __alloc_dir (libc.so.6)
> #6 0x00007f5baaf71dbd opendir_tail (libc.so.6)
> #7 0x00007f5bab5bbac4 Perl_pp_open_dir (libperl.so)
> #8 0x00007f5bab55fec6 Perl_runops_standard (libperl.so)
> #9 0x00007f5bab4d9390 Perl_call_sv (libperl.so)
> #10 0x00005611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
> #11 0x00005611f0947acb _ZN9rxvt_term14init_resourcesEiPKPKc (urxvt)
> #12 0x00005611f0948da8 _ZN9rxvt_term5init2EiPKPKc (urxvt)
> #13 0x00005611f097a0af n/a (urxvt)
> #14 0x00007f5bab568259 Perl_pp_entersub (libperl.so)
> #15 0x00007f5bab55fec6 Perl_runops_standard (libperl.so)
> #16 0x00007f5bab4d9390 Perl_call_sv (libperl.so)
> #17 0x00005611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
> #18 0x00005611f0939a77 _ZN9rxvt_term9key_pressER9XKeyEvent (urxvt)
> #19 0x00005611f093d77a _ZN9rxvt_term4x_cbER7_XEvent (urxvt)
> #20 0x00005611f09572e8 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt)
> #21 0x00005611f095d10f _Z17ev_invoke_pendingv (urxvt)
> #22 0x00005611f095dc02 ev_run (urxvt)
> #23 0x00005611f093c9b9 main (urxvt)
> #24 0x00007f5baaed3f4a __libc_start_main (libc.so.6)
> #25 0x00005611f09409da _start (urxvt)
>
> and so on.
>
>
> However, the problem is not specific to 4.14.15 or 4.14.11.
>
> I managed to track it down to 4.14 merge window, so we are basically
> looking at 4.14-rc0+
>
> The bisect log looks as follows:
>
> git bisect start
> # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
> git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # good: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect good aae3dbb4776e7916b6cd442d00159bea27a695c1
> # bad: [2f173d2688559a6f85643d38a2ad6f45eb420c42] KVM: x86: Fix immediate_exit handling for uninitialized AP
> git bisect bad 2f173d2688559a6f85643d38a2ad6f45eb420c42
> # bad: [d969443064abf2f51510559a5b01325eaabfcb1d] Merge tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
> git bisect bad d969443064abf2f51510559a5b01325eaabfcb1d
> # bad: [a0725ab0c7536076d5477264420ef420ebb64501] Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
> git bisect bad a0725ab0c7536076d5477264420ef420ebb64501
> # bad: [f92e3da18b7d5941468040af962c201235148301] Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad f92e3da18b7d5941468040af962c201235148301
> # good: [1c9fe4409ce3e9c78b1ed96ee8ed699d4f03bf33] x86/mm: Document how CR4.PCIDE restore works
> git bisect good 1c9fe4409ce3e9c78b1ed96ee8ed699d4f03bf33
> # bad: [da99ecf117fce6570bd3989263d68ee0007e1249] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
> git bisect bad da99ecf117fce6570bd3989263d68ee0007e1249
> # good: [f7b68046873724129798c405e1a4e326b409c08f] mm: use find_get_pages_range() in filemap_range_has_page()
> git bisect good f7b68046873724129798c405e1a4e326b409c08f
> # bad: [824f973904a1108806fa0fbe15dc93ee9ecd9e0a] userfaultfd: selftest: enable testing of UFFDIO_ZEROPAGE for shmem
> git bisect bad 824f973904a1108806fa0fbe15dc93ee9ecd9e0a
> # good: [98cc093cba1e925eb34963dedb5f1684f1bdb2f4] block, THP: make block_device_operations.rw_page support THP
> git bisect good 98cc093cba1e925eb34963dedb5f1684f1bdb2f4
> # bad: [fe490cc0fe9e6ee48cc48bb5dc463bc5f0f1428f] mm, THP, swap: add THP swapping out fallback counting
> git bisect bad fe490cc0fe9e6ee48cc48bb5dc463bc5f0f1428f
> # good: [3e14a57b2416b7c94189b95baffd673cf5e0d0a3] memcg, THP, swap: support move mem cgroup charge for THP swapped out
> git bisect good 3e14a57b2416b7c94189b95baffd673cf5e0d0a3
> # good: [d6810d730022016d9c0f389452b86b035dba1492] memcg, THP, swap: make mem_cgroup_swapout() support THP
> git bisect good d6810d730022016d9c0f389452b86b035dba1492
> # bad: [bd4c82c22c367e068acb1ec9ec02be2fac3e09e2] mm, THP, swap: delay splitting THP after swapped out
> git bisect bad bd4c82c22c367e068acb1ec9ec02be2fac3e09e2
> # first bad commit: [bd4c82c22c367e068acb1ec9ec02be2fac3e09e2] mm, THP, swap: delay splitting THP after swapped out
>
>
> The suspected first bad commit is:
>
> bd4c82c22c367e068acb1ec9ec02be2fac3e09e2 is the first bad commit
> commit bd4c82c22c367e068acb1ec9ec02be2fac3e09e2
> Author: Huang Ying
> Date: Wed Sep 6 16:22:49 2017 -0700
>
>     mm, THP, swap: delay splitting THP after swapped out
>
>     In this patch, splitting transparent huge page (THP) during swapping out
>     is delayed from after adding the THP into the swap cache to after
>     swapping out finishes. After the patch, more operations for the
>     anonymous THP reclaiming, such as writing the THP to the swap device,
>     removing the THP from the swap cache could be batched. So that the
>     performance of anonymous THP swapping out could be improved.
>
>     This is the second step for the THP swap support. The plan is to delay
>     splitting the THP step by step and avoid splitting the THP finally.
>
>     With the patchset, the swap out throughput improves 42% (from about
>     5.81GB/s to about 8.25GB/s) in the vm-scalability swap-w-seq test case
>     with 16 processes. At the same time, the IPI (reflect TLB flushing)
>     reduced about 78.9%. The test is done on a Xeon E5 v3 system. The swap
>     device used is a RAM simulated PMEM (persistent memory) device. To test
>     the sequential swapping out, the test case creates 8 processes, which
>     sequentially allocate and write to the anonymous pages until the RAM and
>     part of the swap device is used up.
>
>     Link: http://lkml.kernel.org/r/20170724051840.2309-12-ying.huang@intel.com

Can you give me some detailed steps to reproduce this? Like the kernel
configuration file, swap configuration, etc. Any kernel WARNING during
testing? Can you reproduce this with a real swap device instead of
zswap?

Best Regards,
Huang, Ying
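
A minimal sketch of how the details requested above might be collected on the affected machine. The helper names `show_file` and `collect_report` are hypothetical and not from this thread. The sysfs and procfs paths are the usual locations for the zswap parameters, the THP mode, and the active swap devices; any of them may be absent depending on the kernel configuration, so the sketch reports missing files instead of failing.

```sh
#!/bin/sh
# Hypothetical helper script; names and layout are illustrative only.

# Print "<path>: <contents>", or note that the file is not available.
show_file() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: not available\n' "$1"
    fi
}

collect_report() {
    # Running kernel version.
    uname -r

    # zswap settings as seen by the running kernel.
    for p in enabled compressor zpool max_pool_percent; do
        show_file "/sys/module/zswap/parameters/$p"
    done

    # THP mode, since the suspected commit changes THP swap-out.
    show_file /sys/kernel/mm/transparent_hugepage/enabled

    # Active swap devices (real device vs. none).
    show_file /proc/swaps

    # Any kernel WARNING or BUG lines logged during the test run.
    dmesg 2>/dev/null | grep -i -E 'WARNING|BUG' \
        || echo "no WARNING/BUG lines found (or dmesg not readable)"
}

collect_report
```

Running this once with zswap enabled and once with a plain swap device would give two reports to compare; the kernel configuration file itself still has to be attached separately.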