Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp2861643rwd; Fri, 9 Jun 2023 19:18:16 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4bJiU1fQ8UcKoyGwQ2jlWMRxsZAGSH//qGs20Y4cSOrUJ9+Kba8J9TO8+tS9RCi/qUMXHL X-Received: by 2002:a17:906:eece:b0:959:6fb2:1c3b with SMTP id wu14-20020a170906eece00b009596fb21c3bmr3260706ejb.39.1686363496044; Fri, 09 Jun 2023 19:18:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686363496; cv=none; d=google.com; s=arc-20160816; b=yCmQhXXvLP0b8m7aUEAvWyLz8rFe1vCf59EZM4nzYxAdVb3HeOLAWYO2J0SMQFWJ3q R62xN8KLSSAGznA/7h2c3/rFel/p/bYywu4sg1Zwlv6E69vCgx7U5pHoAYEUOuzhYyL/ HnoBdwItH+YqyzmdaR79sYrak6/7SISUOK0XjSPyEAmxhOVweIYAbpbvGYkxI+CrHFYZ PpNla4hs7xPX1Bcl4ByGV0CZES9aQEPNWl1sNHJCITFRcD3KcgbDWQE2ufURN3IT5QGQ ERMHQwxpzKVusRq6fF3H2oOMvhGozPjrpRJo1uMPPl4cwFMRfecZWfHz4IcBC6VGL10p X4qw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=Y+PqyIB5ZqDwlmZ1up6rkpqNAc9gKklwSUdrIkGBt/E=; b=CPq8ZsfbUZSP6lFTqjtdwd/NvZMYqKNFCrPUK3UEyZtM4bMXZweDWUrid5pRvDPt1s J7Blcotv2ApeRZkGjypFBfW0kXT6OWfcDN3yQpFtg1spy+zadO2lMKpl7f7WMGu0r1fm fjTnwnkFsaJIdZNtGrYF8HnDmY9ZgrDwAxbmagSg/8V3wLxCCv3nX8v16y9kr9eeL+th FPX5SmQm5mVxSGV4OySRbgsD+MD85Hw2/dYrG9xE7s3NDR0cZtKjG8nP4trwu3Q6umkh H7JK4yK79FKi67zqORE043OCRk7LJw1ga/+ejvFqax6k7+3eMd7Gx4JMhM4LmbU79zw4 Bjzg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=cu0m1MUv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id rp26-20020a170906d97a00b0096a97038da2si2294281ejb.598.2023.06.09.19.17.51; Fri, 09 Jun 2023 19:18:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=cu0m1MUv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232239AbjFJBp1 (ORCPT + 99 others); Fri, 9 Jun 2023 21:45:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjFJBp0 (ORCPT ); Fri, 9 Jun 2023 21:45:26 -0400 Received: from mail-lf1-x136.google.com (mail-lf1-x136.google.com [IPv6:2a00:1450:4864:20::136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BC0E2D44 for ; Fri, 9 Jun 2023 18:45:24 -0700 (PDT) Received: by mail-lf1-x136.google.com with SMTP id 2adb3069b0e04-4f6255ad8aeso2996295e87.2 for ; Fri, 09 Jun 2023 18:45:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1686361522; x=1688953522; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Y+PqyIB5ZqDwlmZ1up6rkpqNAc9gKklwSUdrIkGBt/E=; b=cu0m1MUv/HXrUmSOm4jb525dJ3bd+mgyvDKFXtB+yAmRSQvoMPoei1e63p5TuksRUQ CIDLjtLkiqfA4HB7tGzdN9ELMJ9bAKLJbSH3om9TvSip61Ua4Kr/sCORSAicmie+lhex Ku1DndiaervlNRJ2mIs5rEmsH4te5Sq8Ej47k= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686361522; x=1688953522; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y+PqyIB5ZqDwlmZ1up6rkpqNAc9gKklwSUdrIkGBt/E=; b=GJxlbKkvAoV9TFH3ezuGS9OEkuOesXlCFpWF3K58nHkrhWuX6XmlqOnSAOlB+jXGzh 2+drO3/rsJEdaiM/cqbuTPHDZdgZhME+78rx/RSgAMjXTgr/kDVdCBs6zg1ovhoj448n T1on+Tf9dAYggPGDCoiSO5XLv2C+5dx6/En4WV5A5Y5ZurI/yDiqa0S+SEsKNvsHsWlo OMM9EXh/gpIYs4N0P23t6Oscw+wwsZaZ4BEF9FCupwrk1daZGdoKYaSKVs0NsVou1rDJ x5U3nQ7EGfTS+rFzQf8ZIaMnUNfcIbFnW3JDMp91PkPZyB/uxqS0eoKyX4WCJ0p5lwbk 1Rpw== X-Gm-Message-State: AC+VfDx27ojF172Mt1LBx/cYw0m3HrPuGudMqpZ7fco29ztozuHcDDmL RIlUxMgdaD+POBu+zgkZtxtg5X5vWeMy5WWsduVjsA== X-Received: by 2002:ac2:5dc9:0:b0:4ec:8816:f4fc with SMTP id x9-20020ac25dc9000000b004ec8816f4fcmr1842267lfq.6.1686361522539; Fri, 09 Jun 2023 18:45:22 -0700 (PDT) MIME-Version: 1.0 References: <20230607053135.2087354-1-stevensd@google.com> In-Reply-To: From: David Stevens Date: Sat, 10 Jun 2023 10:45:11 +0900 Message-ID: Subject: Re: [PATCH] mm/khugepaged: fix iteration in collapse_file To: Hugh Dickins Cc: linux-mm@kvack.org, Andrew Morton , Peter Xu , Matthew Wilcox , "Kirill A . Shutemov" , Yang Shi , David Hildenbrand , Jiaqi Yan , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 10, 2023 at 5:03=E2=80=AFAM Hugh Dickins wro= te: > > On Wed, 7 Jun 2023, David Stevens wrote: > > > > Remove an unnecessary call to xas_set(index) when iterating over the > > target range in collapse_file. The extra call to xas_set reset the xas > > cursor to the top of the tree, causing the xas_next call on the next > > iteration to walk the tree to index instead of advancing to index+1. > > This returned the same page again, which would cause collapse_file to > > fail because the page is already locked. > > > > This bug was hidden when CONFIG_DEBUG_VM was set. When that config was > > used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from > > the top of the tree to index, causing the xas_next call on the next loo= p > > iteration to advance the cursor as expected. > > > > Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag"= ) > > Signed-off-by: David Stevens > > This patch seems to be wrong, but I have not investigated why. > > It's certainly an interesting and worrying observation, > if a CONFIG_DEBUG_VM=3Dy kernel goes a significantly different way. > > I almost always do have CONFIG_DEBUG_VM=3Dy, so you won't be surprised th= at > I never saw the issue. But once I ran an mm-everything with this patch i= n, > I hit that VM_BUG_ON_PAGE(page !=3D xas_load(&xas), page) for the first t= ime > (after about 2 hours of huge tmpfs swapping load). Is the particular workload one you can share? I haven't hit that assert so far with my tests. Also, I'm a little surprised that this is the assert which is being hit. My patch series didn't make that many changes to the first loop of the function, and the changes it did make were mostly about missing pages, not present pages. > As if you have just transferred the problem from DEBUG_VM=3Dn to DEBUG_VM= =3Dy. > But I then tried a CONFIG_DEBUG_VM off 6.4-rc1 kernel (including the fixe= e > but not this fixer) under similar load, and saw no problem in 14 hours. > So I can't even reproduce the bug that is being fixed here: only hit a > bug that it introduces. The bug this fixes isn't a crash - it's the fact that khugepage becomes nearly unable to collapse pages. Specifically, it can only collapse if there is exactly one present page, and that page is at index 511. Since MADV_COLLAPSE uses the same code path, it's easy to reproduce the bug I'm trying to fix with this program: #define _GNU_SOURCE #include #include #include #include #include #define THP_SIZE (2 * 1024 * 1024) #ifndef MADV_COLLAPSE #define MADV_COLLAPSE 25 #endif int main() { int memfd =3D memfd_create("memfd", MFD_CLOEXEC); assert(memfd >=3D 0); int ret =3D ftruncate(memfd, THP_SIZE); assert(ret >=3D 0); char *addr =3D mmap(NULL, THP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, mem= fd, 0); assert(addr !=3D MAP_FAILED); addr[0] =3D 0xff; ret =3D madvise(addr, THP_SIZE, MADV_COLLAPSE); assert(ret =3D=3D 0); } If DEBUG_VM isn't set, then the test will trigger the assert. If DEBUG_VM is set or if this fix is included, then the test will pass. -David > Hugh > > > --- > > mm/khugepaged.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 6b9d39d65b73..2d0d58fb4e7f 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -2070,7 +2070,6 @@ static int collapse_file(struct mm_struct *mm, un= signed long addr, > > TTU_IGNORE_MLOCK | TTU_BATCH_FLUS= H); > > > > xas_lock_irq(&xas); > > - xas_set(&xas, index); > > > > VM_BUG_ON_PAGE(page !=3D xas_load(&xas), page); > > > > -- > > 2.41.0.rc2.161.g9c6817b8e7-goog