Date: Thu, 24 Sep 2020 02:30:38 -0400
From: Rafael Aquini
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
Message-ID: <20200924063038.GD1023012@optiplex-lnx>
References: <20200922184838.978540-1-aquini@redhat.com>
 <878sd1qllb.fsf@yhuang-dev.intel.com>
 <20200923043459.GL795820@optiplex-lnx>
 <87sgb9oz1u.fsf@yhuang-dev.intel.com>
 <20200923130138.GM795820@optiplex-lnx>
 <87blhwng5f.fsf@yhuang-dev.intel.com>
 <20200924020928.GC1023012@optiplex-lnx>
 <877dsjessq.fsf@yhuang-dev.intel.com>
In-Reply-To: <877dsjessq.fsf@yhuang-dev.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change cluster_info->flags.
>
> I don't think so.  We shouldn't run into this situation firstly.  So the
> "fix" hides the real bug instead of fixing it.  Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>

Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned it:

split_huge_page_to_list() asserts (in debug builds) that *page is locked, and
later checks whether *head bears the SwapCache flag.
deferred_split_scan(), OTOH, doesn't hand down the compound head locked,
but the 2nd page in the group instead.
This doesn't necessarily mean it's a problem, but it might help in
hitting the issue.
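
To make the difference between a debug-only assertion and an always-on check
concrete, here is a minimal userspace sketch. It is not kernel code:
struct page_model, check_locked_debug(), check_locked_runtime() and
split_model() are hypothetical names invented for this illustration, and
assert() merely plays the role that VM_BUG_ON_PAGE() plays above (both
compile away outside of debug builds).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the bits this sketch needs. */
struct page_model {
    bool locked;  /* models PageLocked() */
    bool split;   /* set once the (pretend) split has run */
};

/*
 * Debug-only check: assert() expands to nothing when NDEBUG is defined,
 * so a production build gets no protection from it.
 */
static void check_locked_debug(const struct page_model *p)
{
    assert(p->locked);
    (void)p;  /* avoid an unused-parameter warning under NDEBUG */
}

/* Runtime check: always compiled in, reports the problem to the caller. */
static int check_locked_runtime(const struct page_model *p)
{
    return p->locked ? 0 : -EBUSY;
}

/* Pretend split that uses both styles of check. */
static int split_model(struct page_model *p)
{
    int err = check_locked_runtime(p);

    if (err)
        return err;         /* reported in every build */
    check_locked_debug(p);  /* extra guard, debug builds only */
    p->split = true;        /* placeholder for the real split work */
    return 0;
}
```

Calling split_model() on an unlocked page returns -EBUSY and leaves it
untouched; on a locked page it returns 0 and marks the page as split.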

> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split scan
> > can transiently race with swapcache insertion, and the fact that when you run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL because
> > that's one possible case, and it clearly fails to do so.
>
> If there's a race, we should fix the race.  But the code path for
> swapcache insertion is,
>
>     add_to_swap()
>       get_swap_page() /* Return if fails to allocate */
>       add_to_swap_cache()
>         SetPageSwapCache()
>
> While the code path to split THP is,
>
>     split_huge_page_to_list()
>       if PageSwapCache()
>         split_swap_cluster()
>
> Both code paths are protected by the page lock.  So there should be some
> other reasons to trigger the bug.

As mentioned above, no, they do not seem to be protected (at least not by a
lock on the same page, depending on the case). While add_to_swap() ensures
the page lock is held on the compound head, split_huge_page_to_list() does not.

> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place.  If so, the bug is that the flag is set and we should fix
> the setting.
>

I fail to follow your claim here. Where is the guarantee, in the code, that
you'll never have a compound head in the swapcache?

> > Run a workload that cause multiple THP COW, and add a memory hogger to create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
>
> Do you have a script to reproduce the bug?
>

Nope; it is usually triggered by a convoluted set of internal regression
tests we have. In the wild, customers running HANNA are seeing it occasionally.

> > Regardless you find further bugs, or not, this patch is needed to correct a
> > blunt coding mistake.
>
> As above.  I don't agree with that.
>

It's OK to disagree; split_swap_cluster() still misses the cluster_info NULL
check, though.
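
For readers who want the missing check spelled out, below is a simplified
userspace model of it. This is only a sketch under stated assumptions, not
the patch itself: the structures are stripped-down stand-ins for the kernel
ones, lock_cluster() here takes a cluster index rather than a swap offset,
the spinlock is modeled by a plain flag, and the name
split_swap_cluster_checked() and its -EINVAL return value are hypothetical
choices made for the illustration.

```c
#include <errno.h>
#include <stddef.h>

#define CLUSTER_FLAG_HUGE 0x1u  /* stand-in value, not the kernel's */

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct swap_cluster_info {
    unsigned int flags;
    int locked;  /* models the per-cluster spinlock */
};

struct swap_info_struct {
    /* NULL when the swap area has no cluster array (rotational storage). */
    struct swap_cluster_info *cluster_info;
};

/* No cluster array means there is nothing to lock: return NULL. */
static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
                                              unsigned long idx)
{
    struct swap_cluster_info *ci = si->cluster_info;

    if (!ci)
        return NULL;
    ci += idx;
    ci->locked = 1;
    return ci;
}

static void unlock_cluster(struct swap_cluster_info *ci)
{
    if (ci)
        ci->locked = 0;
}

/* Guarded variant: bail out instead of dereferencing a NULL cluster. */
static int split_swap_cluster_checked(struct swap_info_struct *si,
                                      unsigned long idx)
{
    struct swap_cluster_info *ci = lock_cluster(si, idx);

    if (!ci)
        return -EINVAL;  /* hypothetical error code for this sketch */
    ci->flags &= ~CLUSTER_FLAG_HUGE;
    unlock_cluster(ci);
    return 0;
}
```

With cluster_info set to NULL the guarded function returns an error rather
than touching ci->flags; with a populated cluster array it clears the huge
flag on the selected cluster only and drops the lock again.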