From: Wupeng Ma
Subject: [Question] CoW on VM_PFNMAP vma during write fault
Date: Tue, 27 Feb 2024 20:28:14 +0800
Message-ID: <20240227122814.3781907-1-mawupeng1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

We see a warning during our tests; the detailed log is shown at the end.
The core problem behind this warning is that the first pfn of a pfnmap vma
is cleared during memory-failure.

Digging into the source, we find that the problem can be triggered as
follows:

// mmap with MAP_PRIVATE and a specific fd whose driver hooks mmap
mmap(MAP_PRIVATE, fd)
  __mmap_region
    remap_pfn_range
    // set vma with pfnmap and the prot of pte is read only

// memset this memory to trigger a fault
handle_mm_fault
  __handle_mm_fault
    handle_pte_fault
      // write fault and !pte_write(entry)
      do_wp_page
        wp_page_copy // this will alloc a new page with valid page struct
                     // for this pfnmap vma

// inject a hwpoison to the first page of this vma
madvise_inject_error
  memory_failure
    hwpoison_user_mappings
      try_to_unmap_one
        // mark this pte as invalid (hwpoison)
        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
                                address, range.end);

// during unmap vma, the first pfn of this pfnmap vma is invalid
vm_mmap_pgoff
  do_mmap
    __do_mmap_mm
      __mmap_region
        __do_munmap
          unmap_region
            unmap_vmas
              unmap_single_vma
                untrack_pfn
                  follow_phys // pte is already invalidated, WARN_ON here

CoW that installs a valid page in a pfnmap vma looks weird to us. Can we
use remap_pfn_range for a private (read-only) vma? Once CoW happens on a
pfnmap vma during a write fault, the new page looks normal (its page flags
are valid) to most mm subsystems, such as memory-failure in this case, and
extra work is needed to handle this special page.

During unmap, if the vma is pfnmap, the page should not be unmapped, since
pages of a pfnmap vma are not supposed to be touched.

But the root question is: can we insert a valid page into a pfnmap vma at
all?

Any thoughts on how to fix this warning?

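To make the question about private pfnmap vmas concrete, here is a small userspace model of the check that decides whether a mapping is copy-on-write. The predicate mirrors the kernel's is_cow_mapping() helper in include/linux/mm.h, and the flag values are copied from the same header so that the sketch builds outside the kernel. The names is_cow_mapping_model() and write_fault_would_cow() are made up for illustration, and the second one is a deliberate simplification of what do_wp_page() really checks.

```c
#include <stdbool.h>

/* Flag values copied from include/linux/mm.h so this builds in userspace. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL
#define VM_PFNMAP   0x00000400UL

/*
 * Same test as the kernel's is_cow_mapping(): the mapping is private
 * (VM_SHARED clear) and may become writable (VM_MAYWRITE set).
 * Note that VM_PFNMAP plays no part in the decision.
 */
bool is_cow_mapping_model(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Hypothetical helper, simplified: a write fault on a read-only pte is
 * resolved by copying a page when the vma is writable and CoW.
 */
bool write_fault_would_cow(unsigned long vm_flags)
{
	return (vm_flags & VM_WRITE) && is_cow_mapping_model(vm_flags);
}
```

With these definitions, a vma carrying VM_WRITE | VM_MAYWRITE | VM_PFNMAP (the MAP_PRIVATE case described above) is reported as CoW, while adding VM_SHARED turns the answer to false. That matches the path into wp_page_copy in the call chain, under the simplification stated above.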
------------[ cut here ]------------
WARNING: CPU: 0 PID: 503 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xed/0x100
Modules linked in: remap_pfn(OE)
CPU: 0 PID: 503 Comm: remap_pfn Tainted: G OE 6.8.0-rc6-dirty #436
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:untrack_pfn+0xed/0x100
Code: cc cc cc cc 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 ca 48 8b 7b 30 e8 81 de cf 00 89 6b 28 48 8b 7b 30 e8 05 cc b7 e8 ba c3 ce 00 66 2e 0f 1f 84 00 00 00 00 00 90 90 90
RSP: 0018:ffffb5f683eafc78 EFLAGS: 00010282
RAX: 00000000ffffffea RBX: ffff960b18537658 RCX: 0000000000000043
RDX: bfffffffdcb13e00 RSI: 0000000000000043 RDI: ffff960e45b7a140
RBP: 0000000000000000 R08: 00007f7df7a00000 R09: ffff960a00000fb8
R10: ffff960a00000000 R11: 000fffffffffffff R12: 00007f7df7a08000
R13: 0000000000000000 R14: ffffb5f683eafdc8 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff960e2fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7df7aeb110 CR3: 0000000118b66003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 ? __warn+0x84/0x140
 ? untrack_pfn+0xed/0x100
 ? report_bug+0x1bd/0x1d0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? untrack_pfn+0xed/0x100
 ? untrack_pfn+0x5c/0x100
 unmap_single_vma+0xa6/0xe0
 unmap_vmas+0xb2/0x190
 exit_mmap+0xee/0x3c0
 mmput+0x68/0x120
 do_exit+0x2ec/0xb80
 do_group_exit+0x31/0x80
 __x64_sys_exit_group+0x18/0x20
 do_syscall_64+0x66/0x180
 entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7f7df7aeb146
Code: Unable to access opcode bytes at 0x7f7df7aeb11c.
RSP: 002b:00007ffe571100a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f7df7bf08a0 RCX: 00007f7df7aeb146
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
R10: 0000000000000002 R11: 0000000000000246 R12: 00007f7df7bf08a0
R13: 0000000000000001 R14: 00007f7df7bf92e8 R15: 0000000000000000
---[ end trace 0000000000000000 ]---
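
For reference, the userspace side of the sequence described in this mail can be sketched roughly as below. This is only a sketch under assumptions, not the actual test program: the device path passed in is hypothetical (any char device whose mmap hook calls remap_pfn_range() on a read-only prot would do), run_repro() is a made-up name, and the MADV_HWPOISON step needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE. It poisons a real page, so it should only be run in a disposable VM.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100 /* value from the Linux uapi mman headers */
#endif

/*
 * dev_path is an assumption: a device whose mmap hook uses remap_pfn_range().
 * Returns 0 if every step ran, -1 if any step failed.
 */
int run_repro(const char *dev_path, size_t len)
{
	int ret = 0;
	int fd = open(dev_path, O_RDWR);

	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Step 1: private, writable mapping of the pfnmap region. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}

	/* Step 2: write fault on the read-only ptes, which goes through CoW. */
	memset(p, 0xaa, len);

	/* Step 3: inject hwpoison into the first page of the vma. */
	if (madvise(p, (size_t)sysconf(_SC_PAGESIZE), MADV_HWPOISON) != 0) {
		perror("madvise(MADV_HWPOISON)");
		ret = -1;
	}

	/* Step 4: unmap; in the log above the warning fires on this path at exit. */
	munmap(p, len);
	close(fd);
	return ret;
}
```

A caller would invoke it as run_repro("/dev/remap_pfn", 16 * 4096), where both the path and the length are placeholders for whatever the test module exports.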