From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Konstantin Khlebnikov, "Kirill A.
	Shutemov", Michal Hocko, Nicholas Piggin, Andrew Morton, Linus Torvalds, Sasha Levin
Subject: [PATCH 4.9 02/50] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
Date: Tue, 4 Dec 2018 11:49:57 +0100
Message-Id: <20181204103714.604521350@linuxfoundation.org>
X-Mailer: git-send-email 2.19.2
In-Reply-To: <20181204103714.485546262@linuxfoundation.org>
References: <20181204103714.485546262@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

commit 605ca5ede7643a01f4c4a15913f9714ac297f8a6 upstream.

THP split makes non-atomic change of tail page flags.  This is almost ok
because tail pages are locked and isolated but this breaks recent
changes in page locking: non-atomic operation could clear bit
PG_waiters.

As a result concurrent sequence get_page_unless_zero() -> lock_page()
might block forever.  Especially if this page was truncated later.

Fix is trivial: clone flags before unfreezing page reference counter.

This race exists since commit 62906027091f ("mm: add PageWaiters
indicating tasks are waiting for a page bit") while unsafe unfreeze
itself was added in commit 8df651c7059e ("thp: cleanup
split_huge_page()").

clear_compound_head() also must be called before unfreezing page
reference because after successful get_page_unless_zero() might follow
put_page() which needs correct compound_head().

And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
is made especially for that and has semantic of smp_store_release().

Link: http://lkml.kernel.org/r/151844393341.210639.13162088407980624477.stgit@buzz
Signed-off-by: Konstantin Khlebnikov
Acked-by: Kirill A.
Shutemov
Cc: Michal Hocko
Cc: Nicholas Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
---
 mm/huge_memory.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 583ad61cc2f1..c14aec110e90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1876,26 +1876,13 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	struct page *page_tail = head + tail;

 	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
-	VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);

 	/*
-	 * tail_page->_refcount is zero and not changing from under us. But
-	 * get_page_unless_zero() may be running from under us on the
-	 * tail_page. If we used atomic_set() below instead of atomic_inc() or
-	 * atomic_add(), we would then run atomic_set() concurrently with
-	 * get_page_unless_zero(), and atomic_set() is implemented in C not
-	 * using locked ops. spin_unlock on x86 sometime uses locked ops
-	 * because of PPro errata 66, 92, so unless somebody can guarantee
-	 * atomic_set() here would be safe on all archs (and not only on x86),
-	 * it's safer to use atomic_inc()/atomic_add().
+	 * Clone page flags before unfreezing refcount.
+	 *
+	 * After successful get_page_unless_zero() might follow flags change,
+	 * for exmaple lock_page() which set PG_waiters.
 	 */
-	if (PageAnon(head)) {
-		page_ref_inc(page_tail);
-	} else {
-		/* Additional pin to radix tree */
-		page_ref_add(page_tail, 2);
-	}
-
 	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	page_tail->flags |= (head->flags &
 			((1L << PG_referenced) |
@@ -1907,14 +1894,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));

-	/*
-	 * After clearing PageTail the gup refcount can be released.
-	 * Page flags also must be visible before we make the page non-compound.
-	 */
+	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();

+	/*
+	 * Clear PageTail before unfreezing page refcount.
+	 *
+	 * After successful get_page_unless_zero() might follow put_page()
+	 * which needs correct compound_head().
+	 */
 	clear_compound_head(page_tail);

+	/* Finally unfreeze refcount. Additional reference from page cache. */
+	page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+					  PageSwapCache(head)));
+
 	if (page_is_young(head))
 		set_page_young(page_tail);
 	if (page_is_idle(head))
-- 
2.17.1
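
For readers who want to experiment with the ordering described in the
changelog (clone flags first, unfreeze the refcount last, with the
unfreeze acting as a release store), here is a minimal userspace sketch
using C11 atomics. It is not kernel code: struct model_page and every
model_*() function are hypothetical stand-ins invented for this
illustration, and the assert() only models the expectation that the
page is still frozen when it is unfrozen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	atomic_int refcount;	/* 0 means the page is frozen */
	unsigned long flags;	/* plain word, written non-atomically while frozen */
};

/*
 * Model of page_ref_unfreeze(): a release store, so every write made
 * before it (here: the cloned flags) is visible to a thread that
 * afterwards observes the non-zero refcount with acquire ordering.
 */
void model_ref_unfreeze(struct model_page *p, int count)
{
	/* Assumption of this model: the caller still holds the page frozen. */
	assert(atomic_load_explicit(&p->refcount, memory_order_relaxed) == 0);
	atomic_store_explicit(&p->refcount, count, memory_order_release);
}

/* Model of get_page_unless_zero(): take a reference only if count != 0. */
bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_acquire);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&p->refcount,
				&old, old + 1,
				memory_order_acquire, memory_order_acquire))
			return true;
	}
	return false;
}

/* Model of the reordered tail setup: flags first, refcount last. */
void model_split_tail(const struct model_page *head,
		      struct model_page *tail, int refs)
{
	/* Step 1: non-atomic flag clone; nobody can hold a reference yet. */
	tail->flags = head->flags;

	/* Step 2: only now let get-unless-zero style lookups succeed. */
	model_ref_unfreeze(tail, refs);
}
```

In this model a concurrent model_get_unless_zero() either fails because
the count is still zero, or succeeds after the flags are already in
place. With the two steps swapped, a successful lookup could change the
flags word and then have that change overwritten by the non-atomic
clone, which is the situation the changelog describes for PG_waiters.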