From: Bernhard Held
Subject: Re: kvm splat in mmu_spte_clear_track_bits
To: Nadav Amit
Cc: Adam Borowski, Paolo Bonzini, Wanpeng Li, Radim Krčmář, kvm, "linux-kernel@vger.kernel.org"
Date: Tue, 29 Aug 2017 11:19:13 +0200

On 08/28/2017 at 06:56 PM, Nadav Amit wrote:
> Bernhard Held wrote:
>
>> On 08/27/2017 at 02:35 PM, Adam Borowski wrote:
>>> 4.13-rc5 retested fails
>>> Crashed only after two hours or so of testing.
>>> 4.13-rc4 apparently works
>>> It survived several hours of varied tests (like 5 debian-installer runs, a
>>> win10 point release upgrade, some hurd package building, openbsd, etc),
>>> all while the host was likewise busy.
>>> Thus: to the best of my knowledge, the problem is between 4.13-rc4 and 4.13-rc5
>>> but I wouldn't bet my life on it.
>>
>> I get crashes with Win10 in kvm with 4.13-rc5. 4.13-rc4 works for me. THP seems to accelerate the crash, but that's not 100% sure.
>>
>> There's still no crash after reverting merge 27df70 on 4.13-rc7. There are 21 commits in this merge, 10 are mm-related:
>>
>> $ git log 4e082e9ba7cd..e86b298bebf7 --pretty=oneline --abbrev-commit
>> e86b298bebf7 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
>> f357e345eef7 zram: rework copy of compressor name in comp_algorithm_store()
>> aac2fea94f7a rmap: do not call mmu_notifier_invalidate_page() under ptl
>> d041353dc98a mm: fix list corruptions on shmem shrinklist
>> af54aed94bf3 mm/balloon_compaction.c: don't zero ballooned pages
>> c0a6a5ae6b5d MAINTAINERS: copy virtio on balloon_compaction.c
>> b3a81d0841a9 mm: fix KSM data corruption
>> 99baac21e458 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
>> 0a2dd266dd6b mm: make tlb_flush_pending global
>> 56236a59556c mm: refactor TLB gathering API
>> a9b802500ebb Revert "mm: numa: defer TLB flush for THP migration as long as possible"
>> 0a2c40487f3e mm: migrate: fix barriers around tlb_flush_pending
>> 16af97dc5a89 mm: migrate: prevent racy access to tlb_flush_pending
>> 9eeb52ae712e fault-inject: fix wrong should_fail() decision in task context
>> 4e98ebe5f435 test_kmod: fix small memory leak on filesystem tests
>> 9c56771316ef test_kmod: fix the lock in register_test_dev_kmod()
>> 434b06ae23ba test_kmod: fix bug which allows negative values on two config options
>> a4afe8cdec16 test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
>> 5af10dfd0afc userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
>> 75dddef32514 mm: ratelimit PFNs busy info message
>> d507e2ebd2c7 mm: fix global NR_SLAB_.*CLAIMABLE counter reads
>
> Don’t blame me for the TLB stuff... My money is on aac2fea94f7a .

Amit, thanks for your courage to expose your patch!

I'm more and more confident that aac2fea94f7a is the culprit. Maybe it just accelerates the triggering of the splat. To be more sure, the kernel needs to be tested for a couple of days. It would be great if others could assist in testing aac2fea94f7a.

Have fun,
Bernhard