Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3989174imu; Mon, 10 Dec 2018 11:07:10 -0800 (PST) X-Google-Smtp-Source: AFSGD/UR/dxfXoMXVxXrZ632f1es2LKUoUPuFIhkFy+k3H89LGdQIq8j5/HCr/wwWhIdZU521G7v X-Received: by 2002:a63:30c8:: with SMTP id w191mr12194030pgw.120.1544468830171; Mon, 10 Dec 2018 11:07:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544468830; cv=none; d=google.com; s=arc-20160816; b=pfg93mIKxtr4En1aM8bJdigulrkoCes80xoSR6jNptFUWrQEz4kkh0PbB20k1Oilfc 9DCjcwmOAHuMaCe2Rr4sWootxrm6+KGZWJrs9i7YnwPFTBpxZ3Cd+Nzt7slnW0R0euEw XeCSXvEA+VnUymwhWOWvPxLuBeX0g24lvBxvU60jq6tGf9PwO3sqdsWjZfzL1FFcbuXY rMB43aQ1nu1cwII9XMl27txGpTX83ssF3fZ/ni3UBSU0TcYSSu/bqabRPSQ6zyOEMeHs NXSL8xFPNjm5rK6agUKXIZttQmqanZSvzkfqrAAavrF5I2VSnYSR7KzmUmP9fYRUIRCT UpvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:to:subject:cc :dkim-signature; bh=4E/A4XqGwktpScqeqeYZIvgoC7FcBTRDxKDPbLw2eKY=; b=X4mO+yNUGPoiLzJTmdBptbwSlbdDxeTCAA7R2rPwU2Msr3KUSSXe4Sr+4cvJuNfZCO rKZtsTsIZEHtPTuY+iTTERBxuKdzpwOnKKodSu+iE3iGbCPVcusApXY4y2z/IriGSOpO B91QyzAGnpAbyt+W0UYtyOTZ2HQ/gciaYKsOt4bSgRmvosSdjiMmeWRoQnA/odSNPupT 2roArSb3eYjqcdkt/jamXaWXC5TJeVYSvikXNKIGhcq3GqVKOllOOTK7uZfe4EzW44JE 5VtZZpabs47bIJa+ARqovY9xGrq303eTsIFDg9I6cRuvkgLma4DAxMUdgvJGpQnt/x4a ixMg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Gmw+rb+V; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f5si10710034pfn.259.2018.12.10.11.06.54; Mon, 10 Dec 2018 11:07:10 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Gmw+rb+V; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728119AbeLJQyO (ORCPT + 99 others); Mon, 10 Dec 2018 11:54:14 -0500 Received: from mail-qk1-f194.google.com ([209.85.222.194]:35372 "EHLO mail-qk1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727054AbeLJQyN (ORCPT ); Mon, 10 Dec 2018 11:54:13 -0500 Received: by mail-qk1-f194.google.com with SMTP id w204so6888457qka.2 for ; Mon, 10 Dec 2018 08:54:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=cc:subject:to:references:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=4E/A4XqGwktpScqeqeYZIvgoC7FcBTRDxKDPbLw2eKY=; b=Gmw+rb+VDP/lXIa+o1QmGeclT5AayLA2m5FiiqZUMwT9qiPJgnvJgpGgSaOTIeUfEC iagAhuU1PBTRAIte/H/Mm+y2dRGMUMGiiGA/ORU9P3cwXkAjuuFX9ndr0aHIyjCWZSFc 8MaeMjU2Jmdhzz55JVTOvXy8V5l9AArT2pVVk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:cc:subject:to:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=4E/A4XqGwktpScqeqeYZIvgoC7FcBTRDxKDPbLw2eKY=; b=Yo3QnqSJTHlGXY1eqFxs+TNwJ/hgWFTvqNrnq8wAZqkF4rm7ZEnC8MfsBcLRJxSTSf HcmLZbIux1DbRSoe0tveHBGNnXlx1CTqCi+1jmuKkVmCsAav4W9C6x8ThL5zQUFu5wyI CNgIc04cbCep060Lk+mQBBmxch+ULby3Uts1aHA26uWNgGTVDM0mLodycLvs3wWeKoBB BZFPjjMpo4ZWDla6o/KkRJjjw5HXyHbav9ji4k1tr0Tf8J6OC+clcrmSXEBTymY2Mwnu j1J7A3O2LIVSKgKGsbVdC0AeM7oFUoK5nRCqPrLV+fkTWP6QcE6HEo/gQHrbx18UV+9y 3Z0g== X-Gm-Message-State: AA+aEWb7cWg0978tg+ue2Q7mO6spxFr+SFuo28SIhFQWwCPfqG/GI3lZ pKrSAxvA6UKcYlVCY8Wf4ccCoA== X-Received: by 2002:a37:2cc2:: with SMTP id s185mr11328178qkh.74.1544460852144; Mon, 10 Dec 2018 08:54:12 -0800 (PST) Received: from [192.168.49.18] ([138.204.25.7]) by smtp.gmail.com with ESMTPSA id u16sm8952639qkg.14.2018.12.10.08.54.01 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Dec 2018 08:54:11 -0800 (PST) Cc: Rafael David Tinoco , Rich Felker , linux-ia64@vger.kernel.org, Sergey Senozhatsky , linux-sh@vger.kernel.org, Benjamin Herrenschmidt , Heiko Carstens , Ram Pai , linux-mips@vger.kernel.org, linux-mm@kvack.org, Khalid Aziz , Paul Mackerras , "H . Peter Anvin" , sparclinux@vger.kernel.org, linux-s390@vger.kernel.org, Yoshinori Sato , Michael Ellerman , x86@kernel.org, Ingo Molnar , Catalin Marinas , James Hogan , Anthony Yznaga , Nitin Gupta , Fenghua Yu , Joerg Roedel , Juergen Gross , Vasily Gorbik , Will Deacon , Nicholas Piggin , Borislav Petkov , Andy Lutomirski , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, Christophe Leroy , Tony Luck , Jiri Kosina , linux-kernel@vger.kernel.org, Ralf Baechle , Minchan Kim , Paul Burton , "Aneesh Kumar K . V" , Martin Schwidefsky , linuxppc-dev@lists.ozlabs.org, "David S . Miller" , "Kirill A . Shutemov" Subject: Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support To: Russell King - ARM Linux , Robin Murphy References: <20181210142105.6750-1-rafael.tinoco@linaro.org> <4da655ec-a1ac-c524-1eb4-5cbd18b265ef@arm.com> <20181210150551.GE30658@n2100.armlinux.org.uk> From: Rafael David Tinoco Organization: Linaro Message-ID: <27fb9417-fc55-f01e-c575-8c18862a282e@linaro.org> Date: Mon, 10 Dec 2018 14:53:59 -0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 MIME-Version: 1.0 In-Reply-To: <20181210150551.GE30658@n2100.armlinux.org.uk> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/10/18 1:05 PM, Russell King - ARM Linux wrote: > On Mon, Dec 10, 2018 at 02:35:55PM +0000, Robin Murphy wrote: >> On 10/12/2018 14:21, Rafael David Tinoco wrote: >>> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the >>> physical frame number might be so big that zsmalloc obj encoding (to >>> location) will break, causing: >>> >>> BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc >>> Read of size 4 at addr 00000000 by task mkfs.ext4/623 >>> CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15 >>> Hardware name: Generic DT based system >>> [] (unwind_backtrace) from [] (show_stack+0x20/0x24) >>> [] (show_stack) from [] (dump_stack+0xbc/0xe8) >>> [] (dump_stack) from [] (kasan_report+0x248/0x390) >>> [] (kasan_report) from [] (__asan_load4+0x78/0xb4) >>> [] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) >>> [] (zs_map_object) from [] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) >>> [] (zram_bvec_rw.constprop.2 [zram]) from [] (zram_make_request+0x234/0x46c [zram]) >>> [] (zram_make_request [zram]) from [] (generic_make_request+0x304/0x63c) >>> [] (generic_make_request) from [] (submit_bio+0x4c/0x1c8) >>> [] (submit_bio) from [] (submit_bh_wbc.constprop.15+0x238/0x26c) >>> [] (submit_bh_wbc.constprop.15) from [] (__block_write_full_page+0x524/0x76c) >>> [] (__block_write_full_page) from [] (block_write_full_page+0x1bc/0x1d4) >>> [] (block_write_full_page) from [] (blkdev_writepage+0x24/0x28) >>> [] (blkdev_writepage) from [] (__writepage+0x44/0x78) >>> [] (__writepage) from [] (write_cache_pages+0x3b8/0x800) >>> [] (write_cache_pages) from [] (generic_writepages+0x74/0xa0) >>> [] (generic_writepages) from [] (blkdev_writepages+0x18/0x1c) >>> [] (blkdev_writepages) from [] (do_writepages+0x68/0x134) >>> [] (do_writepages) from [] (__filemap_fdatawrite_range+0xb0/0xf4) >>> [] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x64/0xd0) >>> [] (file_write_and_wait_range) from [] (blkdev_fsync+0x54/0x84) >>> [] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) >>> [] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) >>> [] (do_fsync) from [] (sys_fsync+0x1c/0x20) >>> [] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) >>> >>> when trying to decode (the pfn) and map the object. >>> >>> That happens because one architecture might not re-define >>> MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For >>> 32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will >>> default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, >>> _PFN_BITS might be wrong: which may cause obj variable to overflow if >>> frame number is in HIGHMEM and referencing a page above the 4GB >>> watermark. >>> >>> commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if >>> not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers >>> and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't >>> used, like in the example given above. >>> >>> Systems with potential for PAE exist for a long time and assuming >>> BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, >>> however it is NOT a constant anymore for x86. >>> >>> SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every >>> architecture using zsmalloc, together with a sanity check for >>> MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. >>> >>> Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 >>> Signed-off-by: Rafael David Tinoco >>> --- >>> arch/arm/include/asm/pgtable-2level-types.h | 2 ++ >>> arch/arm/include/asm/pgtable-3level-types.h | 2 ++ >>> arch/arm64/include/asm/pgtable-types.h | 2 ++ >>> arch/ia64/include/asm/page.h | 2 ++ >>> arch/mips/include/asm/page.h | 2 ++ >>> arch/powerpc/include/asm/mmu.h | 2 ++ >>> arch/s390/include/asm/page.h | 2 ++ >>> arch/sh/include/asm/page.h | 2 ++ >>> arch/sparc/include/asm/page_32.h | 2 ++ >>> arch/sparc/include/asm/page_64.h | 2 ++ >>> arch/x86/include/asm/pgtable-2level_types.h | 2 ++ >>> arch/x86/include/asm/pgtable-3level_types.h | 3 +- >>> arch/x86/include/asm/pgtable_64_types.h | 4 +-- >>> mm/zsmalloc.c | 35 +++++++++++---------- >>> 14 files changed, 45 insertions(+), 19 deletions(-) >>> >>> diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h >>> index 66cb5b0e89c5..552dba411324 100644 >>> --- a/arch/arm/include/asm/pgtable-2level-types.h >>> +++ b/arch/arm/include/asm/pgtable-2level-types.h >>> @@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; >>> #endif /* STRICT_MM_TYPECHECKS */ >>> +#define MAX_POSSIBLE_PHYSMEM_BITS 32 >>> + >>> #endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */ >>> diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h >>> index 921aa30259c4..664c39e6517c 100644 >>> --- a/arch/arm/include/asm/pgtable-3level-types.h >>> +++ b/arch/arm/include/asm/pgtable-3level-types.h >>> @@ -67,4 +67,6 @@ typedef pteval_t pgprot_t; >>> #endif /* STRICT_MM_TYPECHECKS */ >>> +#define MAX_POSSIBLE_PHYSMEM_BITS 36 >> >> Nit: with LPAE, physical addresses go up to 40 bits, not just 36. Hum, I got it from where it was being defined when having SPARSEMEM enabled (#define MAX_PHYSMEM_BITS 36 in arch/arm/include/asm/sparsemem.h), since without SPARSEMEM it would default to BITS_PER_LONG. > Right, with that set at 40, we get: > > #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) > > == 28 > > #define OBJ_TAG_BITS 1 > #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) > > == 3 > > #define ZS_MAX_ZSPAGE_ORDER 2 > #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) > > == 4 > > #define ZS_MIN_ALLOC_SIZE \ > MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) > > == 4 << 12 >> 3 = 2048 > > or half-page allocations. > > Given this in Documentation/vm/zsmalloc.rst: > > On the other hand, if we just use single > (0-order) pages, it would suffer from very high fragmentation -- > any object of size PAGE_SIZE/2 or larger would occupy an entire page. > This was one of the major issues with its predecessor (xvmalloc). > > It seems that would not be acceptable, so I have to ask whether > using an unsigned long to store PFN + object ID is really acceptable. > Maybe the real solution to this problem is to stop using an > unsigned long to contain both the PFN and object ID? >