From: David Hildenbrand
Date: Mon, 15 Aug 2022 20:03:20 +0200
Subject: Re: [PATCH v2 2/2] mm/hugetlb: support write-faults in shared mappings
To: Gerald Schaefer
Cc: Mike Kravetz, Linux Kernel Mailing List, Linux MM, stable, linux-s390
In-Reply-To: <20220815175929.303774fd@thinkpad>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 15, 2022 at 5:59 PM Gerald Schaefer wrote:
>
> On Mon, 15 Aug 2022 17:07:32 +0200
> David Hildenbrand wrote:
>
> > On Mon, Aug 15, 2022 at 3:36 PM Gerald Schaefer
> > wrote:
> > >
> > > On Thu, 11 Aug 2022 11:59:09 -0700
> > > Mike Kravetz wrote:
> > >
> > > > On 08/11/22 12:34, David Hildenbrand wrote:
> > > > > If we ever get a write-fault on a write-protected page in a shared mapping,
> > > > > we'd be in trouble (again). Instead, we can simply map the page writable.
> > > > >
> > > > > Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
> > > > > unregistering and consequently keeps the PTE writeprotected. Reason for
> > > > > this is to avoid the additional overhead when unregistering. Note
> > > > > that this is the case also for !hugetlb and that we will end up with
> > > > > writable PTEs that still have the uffd-wp PTE bit set once we return
> > > > > from hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it
> > > > > seems to be a generic thing -- wp_page_reuse() also doesn't clear it.
> > > > >
> > > > > VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
> > > > > indicates that MAP_SHARED handling was at least envisioned, but could never
> > > > > have worked as expected.
> > > > >
> > > > > While at it, make sure that we never end up in hugetlb_wp() on write
> > > > > faults without VM_WRITE, because we don't support maybe_mkwrite()
> > > > > semantics as commonly used in the !hugetlb case -- for example, in
> > > > > wp_page_reuse().
> > > >
> > > > Nit,
> > > > to me 'make sure that we never end up in hugetlb_wp()' implies that
> > > > we would check for condition in callers as opposed to first thing in
> > > > hugetlb_wp(). However, I am OK with description as it.
> > >
> >
> > Hi Gerald,
> >
> > > Is that new WARN_ON_ONCE() in hugetlb_wp() meant to indicate a real bug?
> >
> > Most probably, unless I am missing something important.
> >
> > Something triggers FAULT_FLAG_WRITE on a VMA without VM_WRITE and
> > hugetlb_wp() would map the pte writable.
> > Consequently, we'd have a writable pte inside a VMA that does not have
> > write permissions, which is dubious. My check prevents that and bails
> > out.
> >
> > Ordinary (!hugetlb) faults have maybe_mkwrite() (e.g., for FOLL_FORCE
> > or breaking COW) semantics such that we won't be mapping PTEs writable
> > if the VMA does not have write permissions.
> >
> > I suspect that either
> >
> > a) Some write fault misses a protection check and ends up triggering a
> > FAULT_FLAG_WRITE where we should actually fail early.
> >
> > b) The write fault is valid and some VMA misses proper flags (VM_WRITE).
> >
> > c) The write fault is valid (e.g., for breaking COW or FOLL_FORCE) and
> > we'd actually want maybe_mkwrite semantics.
> >
> > > It is triggered by libhugetlbfs testcase "HUGETLB_ELFMAP=R linkhuge_rw"
> > > (at least on s390), and crashes our CI, because it runs with panic_on_warn
> > > enabled.
> > >
> > > Not sure if this means that we have bug elsewhere, allowing us to
> > > get to the WARN in hugetlb_wp().
> >
> > That's what I suspect. Do you have a backtrace?
>
> Sure, forgot to send it with initial reply...
>
> [   82.574749] ------------[ cut here ]------------
> [   82.574751] WARNING: CPU: 9 PID: 1674 at mm/hugetlb.c:5264 hugetlb_wp+0x3be/0x818
> [   82.574759] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc uvdevice s390_trng vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [   82.574785] CPU: 9 PID: 1674 Comm: linkhuge_rw Kdump: loaded Not tainted 5.19.0-next-20220815 #36
> [   82.574787] Hardware name: IBM 3931 A01 704 (LPAR)
> [   82.574788] Krnl PSW : 0704c00180000000 00000006c9d4bc6a (hugetlb_wp+0x3c2/0x818)
> [   82.574791]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   82.574794] Krnl GPRS: 000000000227c000 0000000008640071 0000000000000000 0000000001200000
> [   82.574796]            0000000001200000 00000000b5a98090 0000000000000255 00000000adb2c898
> [   82.574797]            0000000000000000 00000000adb2c898 0000000001200000 00000000b5a98090
> [   82.574799]            000000008c408000 0000000092fd7300 000003800339bc10 000003800339baf8
> [   82.574803] Krnl Code: 00000006c9d4bc5c: f160000407fe        mvo     4(7,%r0),2046(1,%r0)
>                           00000006c9d4bc62: 47000700            bc      0,1792
>                          #00000006c9d4bc66: af000000            mc      0,0
>                          >00000006c9d4bc6a: a7a80040            lhi     %r10,64
>                           00000006c9d4bc6e: b916002a            llgfr   %r2,%r10
>                           00000006c9d4bc72: eb6ff1600004        lmg     %r6,%r15,352(%r15)
>                           00000006c9d4bc78: 07fe                bcr     15,%r14
>                           00000006c9d4bc7a: 47000700            bc      0,1792
> [   82.574814] Call Trace:
> [   82.574842]  [<00000006c9d4bc6a>] hugetlb_wp+0x3c2/0x818
> [   82.574846]  [<00000006c9d4c62e>] hugetlb_no_page+0x56e/0x5a8
> [   82.574848]  [<00000006c9d4cac2>] hugetlb_fault+0x45a/0x590
> [   82.574850]  [<00000006c9d06d4a>] handle_mm_fault+0x182/0x220
> [   82.574855]  [<00000006c9a9d70e>] do_exception+0x19e/0x470
> [   82.574858]  [<00000006c9a9dff2>] do_dat_exception+0x2a/0x50
> [   82.574861]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574866]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170
> [   82.574870] Last Breaking-Event-Address:
> [   82.574871]  [<00000006c9d4b926>] hugetlb_wp+0x7e/0x818
> [   82.574873] Kernel panic - not syncing: panic_on_warn set ...
> [   82.574875] CPU: 9 PID: 1674 Comm: linkhuge_rw Kdump: loaded Not tainted 5.19.0-next-20220815 #36
> [   82.574877] Hardware name: IBM 3931 A01 704 (LPAR)
> [   82.574878] Call Trace:
> [   82.574879]  [<00000006ca664f22>] dump_stack_lvl+0x62/0x80
> [   82.574881]  [<00000006ca657af8>] panic+0x118/0x300
> [   82.574884]  [<00000006c9ac3da6>] __warn+0xb6/0x160
> [   82.574887]  [<00000006ca29b1ea>] report_bug+0xba/0x140
> [   82.574890]  [<00000006c9a75194>] monitor_event_exception+0x44/0x80
> [   82.574892]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574894]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170
> [   82.574897]  [<00000006c9d4bc6a>] hugetlb_wp+0x3c2/0x818
> [   82.574899]  [<00000006c9d4c62e>] hugetlb_no_page+0x56e/0x5a8
> [   82.574901]  [<00000006c9d4cac2>] hugetlb_fault+0x45a/0x590
> [   82.574903]  [<00000006c9d06d4a>] handle_mm_fault+0x182/0x220
> [   82.574906]  [<00000006c9a9d70e>] do_exception+0x19e/0x470
> [   82.574907]  [<00000006c9a9dff2>] do_dat_exception+0x2a/0x50
> [   82.574909]  [<00000006ca668a18>] __do_pgm_check+0xf0/0x1b0
> [   82.574912]  [<00000006ca677b3c>] pgm_check_handler+0x11c/0x170

do_dat_exception() sets

    access = VM_ACCESS_FLAGS;

do_exception() sets

    is_write = (trans_exc_code & store_indication) == 0x400;

and FAULT_FLAG_WRITE

    if (access == VM_WRITE || is_write)
            flags |= FAULT_FLAG_WRITE;

however, for VMA permission checks it only checks

    if (unlikely(!(vma->vm_flags & access)))
            goto out_up;

as VM_ACCESS_FLAGS includes VM_WRITE | VM_READ ...

We end up triggering a write fault (FAULT_FLAG_WRITE), even though the VMA does not allow for writes. I assume that's what happens and that it's a bug in s390x code.
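
To make the suspected gap concrete, here is a minimal userspace sketch in C of the two checks quoted from do_exception(). It is a model only, not the s390 code. The flag values are the ones I believe include/linux/mm.h uses, and struct fault_result, model_do_exception() and write_fault_on_readonly_vma() are hypothetical names introduced purely for illustration.

```c
#include <stdbool.h>

/* Userspace model only. Flag values assumed to match include/linux/mm.h. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

#define FAULT_FLAG_WRITE 0x1U

/* Hypothetical container for what the modelled fault path decided. */
struct fault_result {
	bool vma_check_passed;  /* did the VMA permission check let us through? */
	unsigned int flags;     /* fault flags that would reach handle_mm_fault() */
};

/* Models the flag setup and the VMA permission check quoted above. */
static struct fault_result model_do_exception(unsigned long vm_flags,
					      unsigned long access,
					      bool is_write)
{
	struct fault_result res = { false, 0 };

	if (access == VM_WRITE || is_write)
		res.flags |= FAULT_FLAG_WRITE;

	/* Any overlapping bit passes; VM_WRITE is not required for a write. */
	if (!(vm_flags & access))
		return res;

	res.vma_check_passed = true;
	return res;
}

/* Hypothetical helper: true if a write fault gets through to a VMA without VM_WRITE. */
static bool write_fault_on_readonly_vma(unsigned long vm_flags,
					unsigned long access, bool is_write)
{
	struct fault_result res = model_do_exception(vm_flags, access, is_write);

	return res.vma_check_passed &&
	       (res.flags & FAULT_FLAG_WRITE) &&
	       !(vm_flags & VM_WRITE);
}
```

In this model, write_fault_on_readonly_vma(VM_READ, VM_ACCESS_FLAGS, true) returns true: the store sets FAULT_FLAG_WRITE, and the check passes because VM_READ overlaps VM_ACCESS_FLAGS. The same call with access == VM_WRITE is rejected by the check. This only shows the logic of the quoted lines; whether the real path behaves the same still needs confirming against the s390 fault code.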