Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3746603imu; Mon, 10 Dec 2018 07:14:24 -0800 (PST) X-Google-Smtp-Source: AFSGD/UbjuhMs72RGlIBnKq0suMhz7xXIj81agH4FiFXWIYDtTIh/kwdPt8gktHs6qGjdSx3s0TY X-Received: by 2002:a65:63d3:: with SMTP id n19mr11347489pgv.179.1544454864353; Mon, 10 Dec 2018 07:14:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544454864; cv=none; d=google.com; s=arc-20160816; b=Gvm/H+T+addCCf1fofWG0+ANP2yw8MqbS3NAnVOFYlpkKZvyWnzgF5Pz+83lTBMjvv Mdn2Gp+vU4wObM3ohGt7KnUOLtBx8oc33G38gSilUbizew0eLKWJIKGoMJA502ZVQRqu jzmrxqB7SUl61RciifK2lwioGl7kt3S8NCXseX4/gjKDBu3br4WxZFmfCFiNwMVLJdkQ IqBEbWrazbokiJ54PKBpuMNL0sF0w/GAk83W6b0pZJSoSa/LHRrAp1LRYEQNNULAWzPV c+Lmf6iPgko5IFQ8ugNr999Pt9dvUGnmV3eWYPDp6u6eDiOm1MjT8GFOFKBykEaynhrp uHtQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=4GuqSKY6uJELXXF4HdEubR423WKhWObkR4oB3tErzw4=; b=xDTkdJkbJtV+0kHFQWM30jpHSbCf8l2zrAe8BjbLbiDCwjLbhLbJAUlcobSdMUQVsb kATj2oST5F3LwmumyjeSsCRU9x1PB9H/qqeqHuG44N5mNwkn3ewU+7OMP0qaFPmtC8ct SrLaY7UC6CJi0MoXRLGvg9yxWIUYMHeryJslGNcltPXaANo+kx/A+2yjTWBAc+5HcOX2 fNL+2BdBG2Gfd8LcjeUfNPlF8ctZWNMblJLEC34Xk95NBaKYmyy/r6WB7VeVNVLzCbfd Bcbt2iM6f6QsC/774sW17/6gGIAJht0wkhBjU4kOpiwsebC1Jt6m0dgYb2YGfe/OSznB WuEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@armlinux.org.uk header.s=pandora-2014 header.b=gYUdTEfJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=armlinux.org.uk Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l123si10685799pfc.187.2018.12.10.07.14.08; Mon, 10 Dec 2018 07:14:24 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@armlinux.org.uk header.s=pandora-2014 header.b=gYUdTEfJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=armlinux.org.uk Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727950AbeLJPG1 (ORCPT + 99 others); Mon, 10 Dec 2018 10:06:27 -0500 Received: from pandora.armlinux.org.uk ([78.32.30.218]:52946 "EHLO pandora.armlinux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726978AbeLJPG0 (ORCPT ); Mon, 10 Dec 2018 10:06:26 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=armlinux.org.uk; s=pandora-2014; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=4GuqSKY6uJELXXF4HdEubR423WKhWObkR4oB3tErzw4=; b=gYUdTEfJbSUBmsBE8nBM8nCXI ehWFi+KCqX9/X+yiYcFB05r3owBJAkfefmmEcUIkZ0awi1CHwsnVjeh+Udds3OJNLz+zQX6EpRC0X o5iV250ikzdnioI2yfnPJF4qAVLc6ziphzse9y8+2uw4J9JVCLToHtUo6a+xOughG4NEk=; Received: from n2100.armlinux.org.uk ([2002:4e20:1eda:1:214:fdff:fe10:4f86]:56510) by pandora.armlinux.org.uk with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.90_1) (envelope-from ) id 1gWN8B-0000wU-3P; Mon, 10 Dec 2018 15:05:59 +0000 Received: from linux by n2100.armlinux.org.uk with local (Exim 4.90_1) (envelope-from ) id 1gWN86-0002sP-4I; Mon, 10 Dec 2018 15:05:54 +0000 Date: Mon, 10 Dec 2018 15:05:51 +0000 From: Russell King - ARM Linux To: Robin Murphy Cc: Rafael David Tinoco , Rich Felker , linux-ia64@vger.kernel.org, Sergey Senozhatsky , linux-sh@vger.kernel.org, Benjamin Herrenschmidt , Heiko Carstens , Ram Pai , linux-mips@vger.kernel.org, linux-mm@kvack.org, Khalid Aziz , Paul Mackerras , "H . Peter Anvin" , sparclinux@vger.kernel.org, linux-s390@vger.kernel.org, Yoshinori Sato , Michael Ellerman , x86@kernel.org, Ingo Molnar , Catalin Marinas , James Hogan , Anthony Yznaga , Nitin Gupta , Fenghua Yu , Joerg Roedel , Juergen Gross , Vasily Gorbik , Will Deacon , Nicholas Piggin , Borislav Petkov , Andy Lutomirski , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, Christophe Leroy , Tony Luck , Jiri Kosina , linux-kernel@vger.kernel.org, Ralf Baechle , Minchan Kim , Paul Burton , "Aneesh Kumar K . V" , Martin Schwidefsky , linuxppc-dev@lists.ozlabs.org, "David S . Miller" , "Kirill A . Shutemov" Subject: Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support Message-ID: <20181210150551.GE30658@n2100.armlinux.org.uk> References: <20181210142105.6750-1-rafael.tinoco@linaro.org> <4da655ec-a1ac-c524-1eb4-5cbd18b265ef@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4da655ec-a1ac-c524-1eb4-5cbd18b265ef@arm.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 10, 2018 at 02:35:55PM +0000, Robin Murphy wrote: > On 10/12/2018 14:21, Rafael David Tinoco wrote: > >On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the > >physical frame number might be so big that zsmalloc obj encoding (to > >location) will break, causing: > > > >BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc > >Read of size 4 at addr 00000000 by task mkfs.ext4/623 > >CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15 > >Hardware name: Generic DT based system > >[] (unwind_backtrace) from [] (show_stack+0x20/0x24) > >[] (show_stack) from [] (dump_stack+0xbc/0xe8) > >[] (dump_stack) from [] (kasan_report+0x248/0x390) > >[] (kasan_report) from [] (__asan_load4+0x78/0xb4) > >[] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) > >[] (zs_map_object) from [] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) > >[] (zram_bvec_rw.constprop.2 [zram]) from [] (zram_make_request+0x234/0x46c [zram]) > >[] (zram_make_request [zram]) from [] (generic_make_request+0x304/0x63c) > >[] (generic_make_request) from [] (submit_bio+0x4c/0x1c8) > >[] (submit_bio) from [] (submit_bh_wbc.constprop.15+0x238/0x26c) > >[] (submit_bh_wbc.constprop.15) from [] (__block_write_full_page+0x524/0x76c) > >[] (__block_write_full_page) from [] (block_write_full_page+0x1bc/0x1d4) > >[] (block_write_full_page) from [] (blkdev_writepage+0x24/0x28) > >[] (blkdev_writepage) from [] (__writepage+0x44/0x78) > >[] (__writepage) from [] (write_cache_pages+0x3b8/0x800) > >[] (write_cache_pages) from [] (generic_writepages+0x74/0xa0) > >[] (generic_writepages) from [] (blkdev_writepages+0x18/0x1c) > >[] (blkdev_writepages) from [] (do_writepages+0x68/0x134) > >[] (do_writepages) from [] (__filemap_fdatawrite_range+0xb0/0xf4) > >[] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x64/0xd0) > >[] (file_write_and_wait_range) from [] (blkdev_fsync+0x54/0x84) > >[] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) > >[] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) > >[] (do_fsync) from [] (sys_fsync+0x1c/0x20) > >[] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) > > > >when trying to decode (the pfn) and map the object. > > > >That happens because one architecture might not re-define > >MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For > >32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will > >default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, > >_PFN_BITS might be wrong: which may cause obj variable to overflow if > >frame number is in HIGHMEM and referencing a page above the 4GB > >watermark. > > > >commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if > >not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers > >and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't > >used, like in the example given above. > > > >Systems with potential for PAE exist for a long time and assuming > >BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, > >however it is NOT a constant anymore for x86. > > > >SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every > >architecture using zsmalloc, together with a sanity check for > >MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. > > > >Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 > >Signed-off-by: Rafael David Tinoco > >--- > > arch/arm/include/asm/pgtable-2level-types.h | 2 ++ > > arch/arm/include/asm/pgtable-3level-types.h | 2 ++ > > arch/arm64/include/asm/pgtable-types.h | 2 ++ > > arch/ia64/include/asm/page.h | 2 ++ > > arch/mips/include/asm/page.h | 2 ++ > > arch/powerpc/include/asm/mmu.h | 2 ++ > > arch/s390/include/asm/page.h | 2 ++ > > arch/sh/include/asm/page.h | 2 ++ > > arch/sparc/include/asm/page_32.h | 2 ++ > > arch/sparc/include/asm/page_64.h | 2 ++ > > arch/x86/include/asm/pgtable-2level_types.h | 2 ++ > > arch/x86/include/asm/pgtable-3level_types.h | 3 +- > > arch/x86/include/asm/pgtable_64_types.h | 4 +-- > > mm/zsmalloc.c | 35 +++++++++++---------- > > 14 files changed, 45 insertions(+), 19 deletions(-) > > > >diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h > >index 66cb5b0e89c5..552dba411324 100644 > >--- a/arch/arm/include/asm/pgtable-2level-types.h > >+++ b/arch/arm/include/asm/pgtable-2level-types.h > >@@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; > > #endif /* STRICT_MM_TYPECHECKS */ > >+#define MAX_POSSIBLE_PHYSMEM_BITS 32 > >+ > > #endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */ > >diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h > >index 921aa30259c4..664c39e6517c 100644 > >--- a/arch/arm/include/asm/pgtable-3level-types.h > >+++ b/arch/arm/include/asm/pgtable-3level-types.h > >@@ -67,4 +67,6 @@ typedef pteval_t pgprot_t; > > #endif /* STRICT_MM_TYPECHECKS */ > >+#define MAX_POSSIBLE_PHYSMEM_BITS 36 > > Nit: with LPAE, physical addresses go up to 40 bits, not just 36. Right, with that set at 40, we get: #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) == 28 #define OBJ_TAG_BITS 1 #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) == 3 #define ZS_MAX_ZSPAGE_ORDER 2 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) == 4 #define ZS_MIN_ALLOC_SIZE \ MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) == 4 << 12 >> 3 = 2048 or half-page allocations. Given this in Documentation/vm/zsmalloc.rst: On the other hand, if we just use single (0-order) pages, it would suffer from very high fragmentation -- any object of size PAGE_SIZE/2 or larger would occupy an entire page. This was one of the major issues with its predecessor (xvmalloc). It seems that would not be acceptable, so I have to ask whether using an unsigned long to store PFN + object ID is really acceptable. Maybe the real solution to this problem is to stop using an unsigned long to contain both the PFN and object ID? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up