Message-ID: <1504011463.8323.45.camel@gmx.de>
Subject: Re: kvm splat in mmu_spte_clear_track_bits
From: Mike Galbraith
To: Nadav Amit, Bernhard Held
Cc: Adam Borowski, Paolo Bonzini, Wanpeng Li, Radim Krčmář, kvm, "linux-kernel@vger.kernel.org", "Kirill A. Shutemov"
Date: Tue, 29 Aug 2017 14:57:43 +0200
In-Reply-To: <79BC5306-4ED4-41E4-B2C1-12197D9D1709@gmail.com>

On Mon, 2017-08-28 at 09:56 -0700, Nadav Amit wrote:
> Bernhard Held wrote:
>
> > On 08/27/2017 at 02:35 PM, Adam Borowski wrote:
> >> 4.13-rc5 retested fails
> >> Crashed only after two hours or so of testing.
> >> 4.13-rc4 apparently works
> >> It survived several hours of varied tests (like 5 debian-installer runs, a
> >> win10 point release upgrade, some hurd package building, openbsd, etc),
> >> all while the host was likewise busy.
> >> Thus: to the best of my knowledge, the problem is between 4.13-rc4 and 4.13-rc5
> >> but I wouldn't bet my life on it.
> >
> > I get crashes with Win10 in kvm with 4.13-rc5. 4.13-rc4 works for me. THP seems to accelerate the crash, but that's not 100% sure.
> >
> > There's still no crash after reverting merge 27df70 on 4.13-rc7.
> > There are 21 commits in this merge, 10 are mm-related:
> >
> > $ git log 4e082e9ba7cd..e86b298bebf7 --pretty=oneline --abbrev-commit
> > e86b298bebf7 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
> > f357e345eef7 zram: rework copy of compressor name in comp_algorithm_store()
> > aac2fea94f7a rmap: do not call mmu_notifier_invalidate_page() under ptl
> > d041353dc98a mm: fix list corruptions on shmem shrinklist
> > af54aed94bf3 mm/balloon_compaction.c: don't zero ballooned pages
> > c0a6a5ae6b5d MAINTAINERS: copy virtio on balloon_compaction.c
> > b3a81d0841a9 mm: fix KSM data corruption
> > 99baac21e458 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
> > 0a2dd266dd6b mm: make tlb_flush_pending global
> > 56236a59556c mm: refactor TLB gathering API
> > a9b802500ebb Revert "mm: numa: defer TLB flush for THP migration as long as possible"
> > 0a2c40487f3e mm: migrate: fix barriers around tlb_flush_pending
> > 16af97dc5a89 mm: migrate: prevent racy access to tlb_flush_pending
> > 9eeb52ae712e fault-inject: fix wrong should_fail() decision in task context
> > 4e98ebe5f435 test_kmod: fix small memory leak on filesystem tests
> > 9c56771316ef test_kmod: fix the lock in register_test_dev_kmod()
> > 434b06ae23ba test_kmod: fix bug which allows negative values on two config options
> > a4afe8cdec16 test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
> > 5af10dfd0afc userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
> > 75dddef32514 mm: ratelimit PFNs busy info message
> > d507e2ebd2c7 mm: fix global NR_SLAB_.*CLAIMABLE counter reads
>
> Don’t blame me for the TLB stuff... My money is on aac2fea94f7a.

You may be onto something. FWIW, with an RT host/guest, I reproduced
the problem yesterday in fairly short order, but today, with that
commit reverted, and pushing markedly harder, nada.
(hohum, intermittent bugs tend to do that, they're particularly fond of
showing up about 10 seconds after you report them dead...9..8..7;)

A colleague suggested going back to mmu_notifier_invalidate_page(),
which I'm going to try shortly (hopefully noticing absolutely nothing
the least bit 'interesting'), but first, I'm gonna CC the author of
that _maybe_ culprit patch.

	-Mike