Date: Sun, 25 Sep 2022 20:11:09 -0400
From: Peter Xu
To: Mike Kravetz
Cc: Hugh Dickins, Axel Rasmussen, Yang Shi, Matthew Wilcox, syzbot,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, llvm@lists.linux.dev, nathan@kernel.org,
	ndesaulniers@google.com, songmuchun@bytedance.com,
	syzkaller-bugs@googlegroups.com, trix@redhat.com
Subject: Re: [syzbot] general protection fault in PageHeadHuge
References: <0000000000006c300705e95a59db@google.com>
	<7693a84-bdc2-27b5-2695-d0fe8566571f@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Sep 24, 2022 at 12:01:16PM -0700, Mike Kravetz wrote:
> On 09/24/22 11:06, Peter Xu wrote:
> >
> > Sorry I forgot to reply on this one.
> >
> > I didn't try linux-next, but I can easily reproduce this with mm-unstable
> > already, and I verified that Hugh's patch fixes the problem for shmem.
> >
> > When I was testing I found hugetlb selftest is broken too but with some
> > other errors:
> >
> > $ sudo ./userfaultfd hugetlb 100 10
> > ...
> > bounces: 6, mode: racing ver read, ERROR: unexpected write fault (errno=0, line=779)
> >
> > The failing check was making sure all MISSING events are not triggered by
> > writes, but frankly I don't really know why it's required, and that check
> > existed since the 1st commit when test was introduced.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c47174fc362a089b1125174258e53ef4a69ce6b8
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/vm/userfaultfd.c?id=c47174fc362a089b1125174258e53ef4a69ce6b8#n291
> >
> > And obviously some recent hugetlb-related change caused that to happen.
> >
> > Dropping that check can definitely work, but I'll have a closer look soon
> > too to make sure I didn't miss something.  Mike, please also let me know if
> > you are aware of this problem.
> >
>
> Peter, I am not aware of this problem.  I really should make running ALL
> hugetlb tests part of my regular routine.
>
> If you do not beat me to it, I will take a look in the next few days.

Just to update - my bisection points to 00cdec99f3eb ("hugetlbfs: revert
use i_mmap_rwsem to address page fault/truncate race", 2022-09-21).

I don't understand how they are related so far, though.  It should be a
timing thing, because the failure cannot be reproduced on a VM but only on
the host, and even on the host the test can occasionally pass.

Logically all the uffd messages in the stress test should be generated by
the locking thread, upon:

	pthread_mutex_lock(area_mutex(area_dst, page_nr));

I thought a common scheme for the lock() fast path should already be a
userspace cmpxchg(), and that should be a write fault already.
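To make that assumption concrete, here is a minimal sketch of what an
uncontended lock fast path might look like.  It is not glibc's
pthread_mutex_lock(); struct sketch_lock, sketch_trylock() and
sketch_unlock() are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
struct sketch_lock {
	int word;
};

/*
 * Fast path only: a single compare-and-swap on the lock word.
 * A successful swap stores 1 into the page that holds the lock,
 * so the very first access to that page is a write.
 */
static bool sketch_trylock(struct sketch_lock *l)
{
	int expected = 0;

	return __atomic_compare_exchange_n(&l->word, &expected, 1, false,
					   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release the lock by storing 0 back into the lock word. */
static void sketch_unlock(struct sketch_lock *l)
{
	__atomic_store_n(&l->word, 0, __ATOMIC_RELEASE);
}
```

If the first touch of a missing page is a compare-and-swap like this one,
I would expect the resulting MISSING event to be reported as a write,
which is what the selftest check rejects.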

For example, I did a quick hack on the test, and I can easily trigger the
write check failure with anonymous memory by adding an explicit cmpxchg at
byte offset 128:

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 74babdbc02e5..a7d6938d4553 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -637,6 +637,10 @@ static void *locking_thread(void *arg)
 		} else
 			page_nr += 1;
 		page_nr %= nr_pages;
+		char *ptr = area_dst + (page_nr * page_size) + 128;
+		char _old = 0, new = 1;
+		(void)__atomic_compare_exchange_n(ptr, &_old, new, false,
+				__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
 		pthread_mutex_lock(area_mutex(area_dst, page_nr));
 		count = *area_count(area_dst, page_nr);
 		if (count != count_verify[page_nr])

I'll need some more time to think about it before I send a patch to drop
the write check.

Thanks,

-- 
Peter Xu