2018-10-25 01:28:33

by Rafael Tinoco

[permalink] [raw]
Subject: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
physical frame number might be so big that zsmalloc obj encoding (to
location) will break IF architecture does not re-define
MAX_PHYSMEM_BITS, causing:

[ 497.097843] ==================================================================
[ 497.102365] BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
[ 497.105933] Read of size 4 at addr 00000000 by task mkfs.ext4/623
[ 497.109684]
[ 497.110722] CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15
[ 497.116098] Hardware name: Generic DT based system
[ 497.119094] [<c0418f7c>] (unwind_backtrace) from [<c0410ca4>] (show_stack+0x20/0x24)
[ 497.123819] [<c0410ca4>] (show_stack) from [<c16bd540>] (dump_stack+0xbc/0xe8)
[ 497.128299] [<c16bd540>] (dump_stack) from [<c06cab74>] (kasan_report+0x248/0x390)
[ 497.132928] [<c06cab74>] (kasan_report) from [<c06c94e8>] (__asan_load4+0x78/0xb4)
[ 497.137530] [<c06c94e8>] (__asan_load4) from [<c06ddc24>] (zs_map_object+0xa4/0x2bc)
[ 497.142335] [<c06ddc24>] (zs_map_object) from [<bf0bbbd8>] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
[ 497.148294] [<bf0bbbd8>] (zram_bvec_rw.constprop.2 [zram]) from [<bf0bc3cc>] (zram_make_request+0x234/0x46c [zram])
[ 497.154653] [<bf0bc3cc>] (zram_make_request [zram]) from [<c09aff9c>] (generic_make_request+0x304/0x63c)
[ 497.160413] [<c09aff9c>] (generic_make_request) from [<c09b0320>] (submit_bio+0x4c/0x1c8)
[ 497.165379] [<c09b0320>] (submit_bio) from [<c0743570>] (submit_bh_wbc.constprop.15+0x238/0x26c)
[ 497.170775] [<c0743570>] (submit_bh_wbc.constprop.15) from [<c0746cf8>] (__block_write_full_page+0x524/0x76c)
[ 497.176776] [<c0746cf8>] (__block_write_full_page) from [<c07472c4>] (block_write_full_page+0x1bc/0x1d4)
[ 497.182549] [<c07472c4>] (block_write_full_page) from [<c0748eb4>] (blkdev_writepage+0x24/0x28)
[ 497.187849] [<c0748eb4>] (blkdev_writepage) from [<c064a780>] (__writepage+0x44/0x78)
[ 497.192633] [<c064a780>] (__writepage) from [<c064b50c>] (write_cache_pages+0x3b8/0x800)
[ 497.197616] [<c064b50c>] (write_cache_pages) from [<c064dd78>] (generic_writepages+0x74/0xa0)
[ 497.202807] [<c064dd78>] (generic_writepages) from [<c0748e64>] (blkdev_writepages+0x18/0x1c)
[ 497.208022] [<c0748e64>] (blkdev_writepages) from [<c064e890>] (do_writepages+0x68/0x134)
[ 497.213002] [<c064e890>] (do_writepages) from [<c06368a4>] (__filemap_fdatawrite_range+0xb0/0xf4)
[ 497.218447] [<c06368a4>] (__filemap_fdatawrite_range) from [<c0636b68>] (file_write_and_wait_range+0x64/0xd0)
[ 497.224416] [<c0636b68>] (file_write_and_wait_range) from [<c0747af8>] (blkdev_fsync+0x54/0x84)
[ 497.229749] [<c0747af8>] (blkdev_fsync) from [<c0739dac>] (vfs_fsync_range+0x70/0xd4)
[ 497.234549] [<c0739dac>] (vfs_fsync_range) from [<c0739e98>] (do_fsync+0x4c/0x80)
[ 497.239159] [<c0739e98>] (do_fsync) from [<c073a26c>] (sys_fsync+0x1c/0x20)
[ 497.243407] [<c073a26c>] (sys_fsync) from [<c0401000>] (ret_fast_syscall+0x0/0x2c)

like in this ARM 32-bit (LPAE enabled), example.

That occurs because, if not set, MAX_POSSIBLE_PHYSMEM_BITS will default
to BITS_PER_LONG (32) in most cases, and, for PAE, _PFN_BITS will be
wrong: which may cause obj variable to overflow if PFN is HIGHMEM and
referencing any page above the 4GB watermark.

This commit exposes the BUG IF the architecture supports PAE AND does
not explicitly set MAX_POSSIBLE_PHYSMEM_BITS to supported value.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco <[email protected]>
---
mm/zsmalloc.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9da65552e7ca..9c3ff8c2ccbc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -119,6 +119,15 @@
#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)

+/*
+ * When using PAE, the obj encoding might overflow if arch does
+ * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
+ * This checks for a future bad page access, when de-coding obj.
+ */
+#define OBJ_OVERFLOW(_pfn) \
+ (((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
+ ((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
+
#define FULLNESS_BITS 2
#define CLASS_BITS 8
#define ISOLATED_BITS 3
@@ -871,9 +880,14 @@ static void obj_to_location(unsigned long obj, struct page **page,
*/
static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
{
- unsigned long obj;
+ unsigned long obj, pfn;
+
+ pfn = page_to_pfn(page);
+
+ if (unlikely(OBJ_OVERFLOW(pfn)))
+ BUG();

- obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+ obj = pfn << OBJ_INDEX_BITS;
obj |= obj_idx & OBJ_INDEX_MASK;
obj <<= OBJ_TAG_BITS;

--
2.19.1



2018-10-25 01:28:35

by Rafael Tinoco

[permalink] [raw]
Subject: [PATCH 2/2] mm/zsmalloc.c: fix zsmalloc ARM LPAE support

Since commit 02390b87a945 ("mm/zsmalloc: Prepare to variable
MAX_PHYSMEM_BITS"), an architecture has to define this value in order to
guarantee that zsmalloc will be able to encode and decode the obj value
properly.

Similar to that change, this one sets the value for ARM LPAE, fixing a
possible null-ptr-deref in zs_map_object() when using ARM LPAE and
HIGHMEM pages located above the 4GB watermark.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco <[email protected]>
---
arch/arm/include/asm/pgtable-3level-types.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h
index 921aa30259c4..bd4994f98700 100644
--- a/arch/arm/include/asm/pgtable-3level-types.h
+++ b/arch/arm/include/asm/pgtable-3level-types.h
@@ -67,4 +67,6 @@ typedef pteval_t pgprot_t;

#endif /* STRICT_MM_TYPECHECKS */

+#define MAX_POSSIBLE_PHYSMEM_BITS 36
+
#endif /* _ASM_PGTABLE_3LEVEL_TYPES_H */
--
2.19.1


2018-10-25 05:29:43

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

On (10/24/18 22:27), Rafael David Tinoco wrote:
> static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
> {
> - unsigned long obj;
> + unsigned long obj, pfn;
> +
> + pfn = page_to_pfn(page);
> +
> + if (unlikely(OBJ_OVERFLOW(pfn)))
> + BUG();

The trend these days is to have less BUG/BUG_ON-s in the kernel.

-ss

2018-10-25 11:05:41

by Rafael Tinoco

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

On Thu, Oct 25, 2018 at 2:29 AM, Sergey Senozhatsky
<[email protected]> wrote:
> On (10/24/18 22:27), Rafael David Tinoco wrote:
>> static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
>> {
>> - unsigned long obj;
>> + unsigned long obj, pfn;
>> +
>> + pfn = page_to_pfn(page);
>> +
>> + if (unlikely(OBJ_OVERFLOW(pfn)))
>> + BUG();
>
> The trend these days is to have less BUG/BUG_ON-s in the kernel.
>
> -ss

For this case, IMHO, it is worth.

It will avoid a investigation like:
https://bugs.linaro.org/show_bug.cgi?id=3765#c7 and and #c8, where I
had to poison slab allocation - to force both zs_handle and zspage
slabs not to be merged - and to make sure the zspage slab had a good
magic number AND to identify why the bad paging request happened.

If this happens again, for any other arch supporting PAE that does not
declare MAX_POSSIBLE_PHYSMEM_BITS or MAX_PHYSMEM_BITS appropriately,
the kernel will panic, no matter what, by the time it reaches
obj_to_location(). Things can be more complicated about declarations
for PAE if we consider ARM can declare MAX_PHYSMEM_BITS differently in
arch/arm/mach-XXX and/or, for this case, when having, or not SPARSEMEM
set (if I had SPARSEMEM set I would not face this, for example).

If this occurs, the kernel will panic, no matter what, by the time it
reaches obj_to_location()... so why not to BUG() here and let user to
know exactly where it panic-ed and why ? Other option would be to
WARN() here and let it panic naturally because of bad paging request
in a very near future... please advise.

Thanks,
Best Rgds
-Rafael

2018-10-25 12:17:50

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

On Wed, Oct 24, 2018 at 10:27:44PM -0300, Rafael David Tinoco wrote:
> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
> physical frame number might be so big that zsmalloc obj encoding (to
> location) will break IF architecture does not re-define
> MAX_PHYSMEM_BITS, causing:

I think there's a deeper problem here - a misunderstanding of what
MAX_PHYSMEM_BITS is.

MAX_PHYSMEM_BITS is a definition for sparsemem, and is only visible
when sparsemem is enabled. When sparsemem is disabled, asm/sparsemem.h
is not included (and should not be included) which means there is no
MAX_PHYSMEM_BITS definition.

I don't think zsmalloc.c should be (ab)using MAX_PHYSMEM_BITS, and
your description above makes it sound like you expect it to always be
defined.

If we want to have a definition for this, we shouldn't be playing
fragile games like:

#ifndef MAX_POSSIBLE_PHYSMEM_BITS
#ifdef MAX_PHYSMEM_BITS
#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
#else
/*
* If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
* be PAGE_SHIFT
*/
#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif

but instead insist that MAX_PHYSMEM_BITS is defined _everywhere_.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-10-25 12:39:18

by Rafael Tinoco

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

> MAX_PHYSMEM_BITS is a definition for sparsemem, and is only visible
> when sparsemem is enabled. When sparsemem is disabled, asm/sparsemem.h
> is not included (and should not be included) which means there is no
> MAX_PHYSMEM_BITS definition.

Missed that part :\, tks.

> I don't think zsmalloc.c should be (ab)using MAX_PHYSMEM_BITS, and
> your description above makes it sound like you expect it to always be
> defined.
>
> If we want to have a definition for this, we shouldn't be playing
> fragile games like:
>
> #ifndef MAX_POSSIBLE_PHYSMEM_BITS
> #ifdef MAX_PHYSMEM_BITS
> #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
> #else
> /*
> * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
> * be PAGE_SHIFT
> */
> #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
> #endif
> #endif
>
> but instead insist that MAX_PHYSMEM_BITS is defined _everywhere_.

Is it okay to propose using only MAX_PHYSMEM_BITS for zsmalloc (like
it was before commit 02390b87) instead, and make sure *at least* ARM
32/64 and x86/x64, for now, have it defined outside sparsemem headers
as well ? This way I can WARN_ONCE(), instead of BUG(), when specific
arch does not define it - enforcing behavior - showing BITS_PER_LONG
is being used instead of MAX_PHYSMEM_BITS (warning, at least once, for
the possibility of an overflow, like the issue showed in here).

2018-10-25 12:44:02

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

Hi Rafael,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux-sof-driver/master]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Rafael-David-Tinoco/mm-zsmalloc-c-check-encoded-object-value-overflow-for-PAE/20181025-110258
base: https://github.com/thesofproject/linux master
config: um-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um

All warnings (new ones prefixed by >>):

In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/module.h:9,
from mm/zsmalloc.c:33:
mm/zsmalloc.c: In function 'location_to_obj':
>> mm/zsmalloc.c:129:17: warning: left shift count >= width of type [-Wshift-count-overflow]
((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
^
include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
>> mm/zsmalloc.c:886:15: note: in expansion of macro 'OBJ_OVERFLOW'
if (unlikely(OBJ_OVERFLOW(pfn)))
^~~~~~~~~~~~
Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_read
Cyclomatic Complexity 1 include/linux/kasan-checks.h:kasan_check_write
Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:set_bit
Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:clear_bit
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:clear_bit_unlock
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:test_and_set_bit
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:test_and_set_bit_lock
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:variable_test_bit
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
Cyclomatic Complexity 1 include/linux/kernel.h:___might_sleep
Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
Cyclomatic Complexity 2 include/linux/list.h:__list_add
Cyclomatic Complexity 1 include/linux/list.h:list_add
Cyclomatic Complexity 1 include/linux/list.h:__list_del
Cyclomatic Complexity 2 include/linux/list.h:__list_del_entry
Cyclomatic Complexity 1 include/linux/list.h:list_del
Cyclomatic Complexity 1 include/linux/list.h:list_del_init
Cyclomatic Complexity 1 include/linux/list.h:list_empty
Cyclomatic Complexity 1 include/linux/list.h:__list_splice
Cyclomatic Complexity 2 include/linux/list.h:list_splice_init
Cyclomatic Complexity 1 arch/um/include/shared/mem.h:to_virt
Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
Cyclomatic Complexity 1 arch/um/include/asm/thread_info.h:current_thread_info
Cyclomatic Complexity 1 include/asm-generic/preempt.h:preempt_count
Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_read
Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_set
Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_inc
Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:arch_atomic_dec_and_test
Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_read
Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_add
Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_sub
Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_inc
Cyclomatic Complexity 1 arch/x86/include/asm/atomic64_64.h:arch_atomic64_dec
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_read
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_read
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_set
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_inc
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_inc
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_dec
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_add
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic64_sub
Cyclomatic Complexity 1 include/asm-generic/atomic-instrumented.h:atomic_dec_and_test
Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_read
Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_inc
Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_dec
Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_add
Cyclomatic Complexity 1 include/asm-generic/atomic-long.h:atomic_long_sub
Cyclomatic Complexity 1 arch/x86/um/asm/processor.h:rep_nop
Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
Cyclomatic Complexity 1 include/linux/spinlock.h:spin_lock
Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock
Cyclomatic Complexity 1 include/linux/jump_label.h:static_key_count
Cyclomatic Complexity 2 include/linux/jump_label.h:static_key_false
Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
Cyclomatic Complexity 1 include/linux/err.h:PTR_ERR
Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
Cyclomatic Complexity 1 include/linux/workqueue.h:queue_work
Cyclomatic Complexity 1 include/linux/workqueue.h:schedule_work
Cyclomatic Complexity 1 include/linux/topology.h:numa_node_id
Cyclomatic Complexity 1 include/linux/topology.h:numa_mem_id
Cyclomatic Complexity 1 include/linux/gfp.h:__alloc_pages
Cyclomatic Complexity 4 include/linux/gfp.h:__alloc_pages_node
Cyclomatic Complexity 2 include/linux/gfp.h:alloc_pages_node
Cyclomatic Complexity 4 include/linux/bit_spinlock.h:bit_spin_lock
Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_trylock
Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_unlock
Cyclomatic Complexity 2 include/linux/bit_spinlock.h:bit_spin_is_locked
Cyclomatic Complexity 1 include/linux/fs.h:mount_pseudo
Cyclomatic Complexity 2 include/linux/page-flags.h:compound_head
Cyclomatic Complexity 1 include/linux/page-flags.h:PagePoisoned
Cyclomatic Complexity 1 include/linux/page-flags.h:PageLocked
Cyclomatic Complexity 1 include/linux/page-flags.h:PagePrivate
Cyclomatic Complexity 1 include/linux/page-flags.h:SetPagePrivate
Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPagePrivate
Cyclomatic Complexity 1 include/linux/page-flags.h:PageOwnerPriv1
Cyclomatic Complexity 1 include/linux/page-flags.h:SetPageOwnerPriv1
Cyclomatic Complexity 1 include/linux/page-flags.h:ClearPageOwnerPriv1
Cyclomatic Complexity 1 include/linux/page-flags.h:PageIsolated
Cyclomatic Complexity 1 include/linux/page_ref.h:page_ref_count
Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_inc
Cyclomatic Complexity 2 include/linux/page_ref.h:page_ref_dec_and_test
Cyclomatic Complexity 1 include/linux/mm.h:put_page_testzero
Cyclomatic Complexity 1 include/linux/mm.h:page_mapcount_reset
Cyclomatic Complexity 1 include/linux/mm.h:page_zonenum
Cyclomatic Complexity 1 include/linux/mm.h:get_page
Cyclomatic Complexity 2 include/linux/mm.h:put_page
Cyclomatic Complexity 1 include/linux/mm.h:page_zone
Cyclomatic Complexity 1 include/linux/vmstat.h:__inc_zone_state
Cyclomatic Complexity 1 include/linux/vmstat.h:__dec_zone_state
Cyclomatic Complexity 1 include/linux/vmstat.h:__inc_zone_page_state
Cyclomatic Complexity 1 include/linux/vmstat.h:__dec_zone_page_state
Cyclomatic Complexity 1 include/linux/mm.h:lowmem_page_address
Cyclomatic Complexity 1 include/linux/uaccess.h:pagefault_disabled_inc
Cyclomatic Complexity 1 include/linux/uaccess.h:pagefault_disabled_dec

vim +129 mm/zsmalloc.c

32
> 33 #include <linux/module.h>
34 #include <linux/kernel.h>
35 #include <linux/sched.h>
36 #include <linux/magic.h>
37 #include <linux/bitops.h>
38 #include <linux/errno.h>
39 #include <linux/highmem.h>
40 #include <linux/string.h>
41 #include <linux/slab.h>
42 #include <asm/tlbflush.h>
43 #include <asm/pgtable.h>
44 #include <linux/cpumask.h>
45 #include <linux/cpu.h>
46 #include <linux/vmalloc.h>
47 #include <linux/preempt.h>
48 #include <linux/spinlock.h>
49 #include <linux/shrinker.h>
50 #include <linux/types.h>
51 #include <linux/debugfs.h>
52 #include <linux/zsmalloc.h>
53 #include <linux/zpool.h>
54 #include <linux/mount.h>
55 #include <linux/migrate.h>
56 #include <linux/pagemap.h>
57 #include <linux/fs.h>
58
59 #define ZSPAGE_MAGIC 0x58
60
61 /*
62 * This must be power of 2 and greater than of equal to sizeof(link_free).
63 * These two conditions ensure that any 'struct link_free' itself doesn't
64 * span more than 1 page which avoids complex case of mapping 2 pages simply
65 * to restore link_free pointer values.
66 */
67 #define ZS_ALIGN 8
68
69 /*
70 * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
71 * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
72 */
73 #define ZS_MAX_ZSPAGE_ORDER 2
74 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
75
76 #define ZS_HANDLE_SIZE (sizeof(unsigned long))
77
78 /*
79 * Object location (<PFN>, <obj_idx>) is encoded as
80 * as single (unsigned long) handle value.
81 *
82 * Note that object index <obj_idx> starts from 0.
83 *
84 * This is made more complicated by various memory models and PAE.
85 */
86
87 #ifndef MAX_POSSIBLE_PHYSMEM_BITS
88 #ifdef MAX_PHYSMEM_BITS
89 #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
90 #else
91 /*
92 * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
93 * be PAGE_SHIFT
94 */
95 #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
96 #endif
97 #endif
98
99 #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
100
101 /*
102 * Memory for allocating for handle keeps object position by
103 * encoding <page, obj_idx> and the encoded value has a room
104 * in least bit(ie, look at obj_to_location).
105 * We use the bit to synchronize between object access by
106 * user and migration.
107 */
108 #define HANDLE_PIN_BIT 0
109
110 /*
111 * Head in allocated object should have OBJ_ALLOCATED_TAG
112 * to identify the object was allocated or not.
113 * It's okay to add the status bit in the least bit because
114 * header keeps handle which is 4byte-aligned address so we
115 * have room for two bit at least.
116 */
117 #define OBJ_ALLOCATED_TAG 1
118 #define OBJ_TAG_BITS 1
119 #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
120 #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
121
122 /*
123 * When using PAE, the obj encoding might overflow if arch does
124 * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
125 * This checks for a future bad page access, when de-coding obj.
126 */
127 #define OBJ_OVERFLOW(_pfn) \
128 (((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
> 129 ((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
130

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (11.97 kB)
.config.gz (18.83 kB)
Download all attachments

2018-10-25 13:44:45

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

On Thu, Oct 25, 2018 at 09:37:59AM -0300, Rafael David Tinoco wrote:
> Is it okay to propose using only MAX_PHYSMEM_BITS for zsmalloc (like
> it was before commit 02390b87) instead, and make sure *at least* ARM
> 32/64 and x86/x64, for now, have it defined outside sparsemem headers
> as well ?

It looks to me like this has been broken on ARM for quite some time,
predating that commit. The original was:

#ifndef MAX_PHYSMEM_BITS
#ifdef CONFIG_HIGHMEM64G
#define MAX_PHYSMEM_BITS 36
#else /* !CONFIG_HIGHMEM64G */
#define MAX_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif
#define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)

On ARM, CONFIG_HIGHMEM64G is never defined (it's an x86 private symbol)
which means that the above sets MAX_PHYSMEM_BITS to 32 on non-sparsemem
ARM LPAE platforms. So commit 02390b87 hasn't really changed anything
as far as ARM LPAE is concerned - and this looks to be a bug that goes
all the way back to when zsmalloc.c was moved out of staging in 2014.

Digging further back, it seems this brokenness was introduced with:

commit 6e00ec00b1a76a199b8c0acae401757b795daf57
Author: Seth Jennings <[email protected]>
Date: Mon Mar 5 11:33:22 2012 -0600

staging: zsmalloc: calculate MAX_PHYSMEM_BITS if not defined

This patch provides a way to determine or "set a
reasonable value for" MAX_PHYSMEM_BITS in the case that
it is not defined (i.e. !SPARSEMEM)

Signed-off-by: Seth Jennings <[email protected]>
Acked-by: Nitin Gupta <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

which, at the time, realised the problem with SPARSEMEM, but decided
that in the absense of SPARSEMEM, that MAX_PHYSMEM_BITS shall be
BITS_PER_LONG which seems absurd (see below.)

> This way I can WARN_ONCE(), instead of BUG(), when specific
> arch does not define it - enforcing behavior - showing BITS_PER_LONG
> is being used instead of MAX_PHYSMEM_BITS (warning, at least once, for
> the possibility of an overflow, like the issue showed in here).

Assuming that the maximum number of physical memory bits are
BITS_PER_LONG in the absense of MAX_POSSIBLE_PHYSMEM_BITS is a nonsense
- we have had the potential for PAE systems for a long time, and to
introduce new code that makes this assumption was plainly wrong.

We know when there's the potential for PAE, and thus more than
BITS_PER_LONG bits of physical memory address, through
CONFIG_PHYS_ADDR_T_64BIT. So if we have the situation where
MAX_POSSIBLE_PHYSMEM_BITS (or the older case of MAX_PHYSMEM_BITS) not
being defined, but CONFIG_PHYS_ADDR_T_64BIT set, we should've been
erroring or something based on not knowing how many physical memory
bits are possible - it would be more than BITS_PER_LONG but less
than some unknown number of bits.

This is why I think any fallback here to BITS_PER_LONG is wrong.

What I suggested is to not fall back to BITS_PER_LONG in any case, but
always define MAX_PHYSMEM_BITS. However, I now see that won't work for
x86 because MAX_PHYSMEM_BITS is not a constant anymore.

So I suggest everything that uses zsmalloc.c should instead define
MAX_POSSIBLE_PHYSMEM_BITS.

Note that there should _also_ be some protection in zsmalloc.c against
MAX_POSSIBLE_PHYSMEM_BITS being too large:

#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_TAG_BITS 1
#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)

which means there's an implicit limitation on _PFN_BITS being less than
BITS_PER_LONG - OBJ_TAG_BITS (where, if it's equal to this, and hence
OBJ_INDEX_BITS will be zero.) This imples that MAX_POSSIBLE_PHYSMEM_BITS
must be smaller than BITS_PER_LONG + PAGE_SHIFT - OBJ_TAG_BITS, or
43 bits on a 32 bit system. If you want to guarantee a minimum number
of objects, then that limitation needs to be reduced further.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-11-21 00:12:52

by Rafael Tinoco

[permalink] [raw]
Subject: [PATCH v2] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support

On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
physical frame number might be so big that zsmalloc obj encoding (to
location) will break, causing:

BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
Read of size 4 at addr 00000000 by task mkfs.ext4/623
CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15
Hardware name: Generic DT based system
[<c0418f7c>] (unwind_backtrace) from [<c0410ca4>] (show_stack+0x20/0x24)
[<c0410ca4>] (show_stack) from [<c16bd540>] (dump_stack+0xbc/0xe8)
[<c16bd540>] (dump_stack) from [<c06cab74>] (kasan_report+0x248/0x390)
[<c06cab74>] (kasan_report) from [<c06c94e8>] (__asan_load4+0x78/0xb4)
[<c06c94e8>] (__asan_load4) from [<c06ddc24>] (zs_map_object+0xa4/0x2bc)
[<c06ddc24>] (zs_map_object) from [<bf0bbbd8>] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
[<bf0bbbd8>] (zram_bvec_rw.constprop.2 [zram]) from [<bf0bc3cc>] (zram_make_request+0x234/0x46c [zram])
[<bf0bc3cc>] (zram_make_request [zram]) from [<c09aff9c>] (generic_make_request+0x304/0x63c)
[<c09aff9c>] (generic_make_request) from [<c09b0320>] (submit_bio+0x4c/0x1c8)
[<c09b0320>] (submit_bio) from [<c0743570>] (submit_bh_wbc.constprop.15+0x238/0x26c)
[<c0743570>] (submit_bh_wbc.constprop.15) from [<c0746cf8>] (__block_write_full_page+0x524/0x76c)
[<c0746cf8>] (__block_write_full_page) from [<c07472c4>] (block_write_full_page+0x1bc/0x1d4)
[<c07472c4>] (block_write_full_page) from [<c0748eb4>] (blkdev_writepage+0x24/0x28)
[<c0748eb4>] (blkdev_writepage) from [<c064a780>] (__writepage+0x44/0x78)
[<c064a780>] (__writepage) from [<c064b50c>] (write_cache_pages+0x3b8/0x800)
[<c064b50c>] (write_cache_pages) from [<c064dd78>] (generic_writepages+0x74/0xa0)
[<c064dd78>] (generic_writepages) from [<c0748e64>] (blkdev_writepages+0x18/0x1c)
[<c0748e64>] (blkdev_writepages) from [<c064e890>] (do_writepages+0x68/0x134)
[<c064e890>] (do_writepages) from [<c06368a4>] (__filemap_fdatawrite_range+0xb0/0xf4)
[<c06368a4>] (__filemap_fdatawrite_range) from [<c0636b68>] (file_write_and_wait_range+0x64/0xd0)
[<c0636b68>] (file_write_and_wait_range) from [<c0747af8>] (blkdev_fsync+0x54/0x84)
[<c0747af8>] (blkdev_fsync) from [<c0739dac>] (vfs_fsync_range+0x70/0xd4)
[<c0739dac>] (vfs_fsync_range) from [<c0739e98>] (do_fsync+0x4c/0x80)
[<c0739e98>] (do_fsync) from [<c073a26c>] (sys_fsync+0x1c/0x20)
[<c073a26c>] (sys_fsync) from [<c0401000>] (ret_fast_syscall+0x0/0x2c)

when trying to decode (the pfn) and map the object.

That happens because one architecture might not re-define
MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For
32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will
default to BITS_PER_LONG (32) in most cases, and, with PAE enabled,
_PFN_BITS might be wrong: which may cause obj variable to overflow if
frame number is in HIGHMEM and referencing a page above the 4GB
watermark.

commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if
not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers
and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't
used, like in the example given above.

Systems with potential for PAE exist for a long time and assuming
BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better,
however it is NOT a constant anymore for x86.

SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every
architecture using zsmalloc, together with a sanity check for
MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco <[email protected]>
---
arch/arm/include/asm/pgtable-2level-types.h | 2 ++
arch/arm/include/asm/pgtable-3level-types.h | 2 ++
arch/arm64/include/asm/pgtable-types.h | 2 ++
arch/ia64/include/asm/page.h | 2 ++
arch/mips/include/asm/page.h | 2 ++
arch/powerpc/include/asm/mmu.h | 2 ++
arch/s390/include/asm/page.h | 2 ++
arch/sh/include/asm/page.h | 2 ++
arch/sparc/include/asm/page_32.h | 2 ++
arch/sparc/include/asm/page_64.h | 2 ++
arch/x86/include/asm/pgtable-2level_types.h | 2 ++
arch/x86/include/asm/pgtable-3level_types.h | 3 +-
arch/x86/include/asm/pgtable_64_types.h | 4 +--
mm/zsmalloc.c | 35 +++++++++++----------
14 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h
index 66cb5b0e89c5..552dba411324 100644
--- a/arch/arm/include/asm/pgtable-2level-types.h
+++ b/arch/arm/include/asm/pgtable-2level-types.h
@@ -64,4 +64,6 @@ typedef pteval_t pgprot_t;

#endif /* STRICT_MM_TYPECHECKS */

+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+
#endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */
diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h
index 921aa30259c4..664c39e6517c 100644
--- a/arch/arm/include/asm/pgtable-3level-types.h
+++ b/arch/arm/include/asm/pgtable-3level-types.h
@@ -67,4 +67,6 @@ typedef pteval_t pgprot_t;

#endif /* STRICT_MM_TYPECHECKS */

+#define MAX_POSSIBLE_PHYSMEM_BITS 36
+
#endif /* _ASM_PGTABLE_3LEVEL_TYPES_H */
diff --git a/arch/arm64/include/asm/pgtable-types.h b/arch/arm64/include/asm/pgtable-types.h
index 345a072b5856..45c3834eb4c8 100644
--- a/arch/arm64/include/asm/pgtable-types.h
+++ b/arch/arm64/include/asm/pgtable-types.h
@@ -64,4 +64,6 @@ typedef struct { pteval_t pgprot; } pgprot_t;
#include <asm-generic/5level-fixup.h>
#endif

+#define MAX_POSSIBLE_PHYSMEM_BITS CONFIG_ARM64_PA_BITS
+
#endif /* __ASM_PGTABLE_TYPES_H */
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index 5798bd2b462c..a3e055979e46 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -235,4 +235,6 @@ get_order (unsigned long size)

#define __HAVE_ARCH_GATE_AREA 1

+#define MAX_POSSIBLE_PHYSMEM_BITS 50
+
#endif /* _ASM_IA64_PAGE_H */
diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index e8cc328fce2d..f6a5dea1a66c 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -263,4 +263,6 @@ extern int __virt_addr_valid(const volatile void *kaddr);
#include <asm-generic/memory_model.h>
#include <asm-generic/getorder.h>

+#define MAX_POSSIBLE_PHYSMEM_BITS 48
+
#endif /* _ASM_PAGE_H */
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..2ebc1d2d9a5c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -324,6 +324,8 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address)
#define MAX_PHYSMEM_BITS 46
#endif

+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+
#ifdef CONFIG_PPC_BOOK3S_64
#include <asm/book3s/64/mmu.h>
#else /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index a4d38092530a..8abec1461bf7 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -180,4 +180,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
#include <asm-generic/memory_model.h>
#include <asm-generic/getorder.h>

+#define MAX_POSSIBLE_PHYSMEM_BITS CONFIG_MAX_PHYSMEM_BITS
+
#endif /* _S390_PAGE_H */
diff --git a/arch/sh/include/asm/page.h b/arch/sh/include/asm/page.h
index 5eef8be3e59f..40c7e12cf09e 100644
--- a/arch/sh/include/asm/page.h
+++ b/arch/sh/include/asm/page.h
@@ -205,4 +205,6 @@ typedef struct page *pgtable_t;
#define ARCH_SLAB_MINALIGN 8
#endif

+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+
#endif /* __ASM_SH_PAGE_H */
diff --git a/arch/sparc/include/asm/page_32.h b/arch/sparc/include/asm/page_32.h
index b76d59edec8c..14e9ca4659d7 100644
--- a/arch/sparc/include/asm/page_32.h
+++ b/arch/sparc/include/asm/page_32.h
@@ -139,4 +139,6 @@ extern unsigned long pfn_base;
#include <asm-generic/memory_model.h>
#include <asm-generic/getorder.h>

+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+
#endif /* _SPARC_PAGE_H */
diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index e80f2d5bf62f..6d6f3654ead1 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -163,4 +163,6 @@ extern unsigned long PAGE_OFFSET;

#include <asm-generic/getorder.h>

+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYS_ADDRESS_BITS
+
#endif /* _SPARC64_PAGE_H */
diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index 6deb6cd236e3..c2eae59e6505 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -38,4 +38,6 @@ typedef union {
/* This covers all VMSPLIT_* and VMSPLIT_*_OPT variants */
#define PGD_KERNEL_START (CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)

+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+
#endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 33845d36897c..5fce514a49a0 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -45,7 +45,8 @@ typedef union {
*/
#define PTRS_PER_PTE 512

-#define MAX_POSSIBLE_PHYSMEM_BITS 36
#define PGD_KERNEL_START (CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)

+#define MAX_POSSIBLE_PHYSMEM_BITS 36
+
#endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 84bd9bdc1987..d808cfde3d19 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -64,8 +64,6 @@ extern unsigned int ptrs_per_p4d;
#define P4D_SIZE (_AC(1, UL) << P4D_SHIFT)
#define P4D_MASK (~(P4D_SIZE - 1))

-#define MAX_POSSIBLE_PHYSMEM_BITS 52
-
#else /* CONFIG_X86_5LEVEL */

/*
@@ -154,4 +152,6 @@ extern unsigned int ptrs_per_p4d;

#define PGD_KERNEL_START ((PAGE_SIZE / 2) / sizeof(pgd_t))

+#define MAX_POSSIBLE_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
+
#endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0787d33b80d8..132c20b6fd4f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -80,23 +80,7 @@
* as single (unsigned long) handle value.
*
* Note that object index <obj_idx> starts from 0.
- *
- * This is made more complicated by various memory models and PAE.
- */
-
-#ifndef MAX_POSSIBLE_PHYSMEM_BITS
-#ifdef MAX_PHYSMEM_BITS
-#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
-#else
-/*
- * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
- * be PAGE_SHIFT
*/
-#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
-#endif
-#endif
-
-#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)

/*
* Memory for allocating for handle keeps object position by
@@ -116,6 +100,25 @@
*/
#define OBJ_ALLOCATED_TAG 1
#define OBJ_TAG_BITS 1
+
+/*
+ * MAX_POSSIBLE_PHYSMEM_BITS should be defined by all archs using zsmalloc:
+ * Trying to guess it from MAX_PHYSMEM_BITS, or considering it BITS_PER_LONG,
+ * proved to be wrong by not considering PAE capabilities, or using SPARSEMEM
+ * only headers, leading to bad object encoding due to object index overflow.
+ */
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+ #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
+ #error "MAX_POSSIBLE_PHYSMEM_BITS HAS to be defined by arch using zsmalloc";
+#else
+ #ifndef CONFIG_64BIT
+ #if (MAX_POSSIBLE_PHYSMEM_BITS >= (BITS_PER_LONG + PAGE_SHIFT - OBJ_TAG_BITS))
+ #error "MAX_POSSIBLE_PHYSMEM_BITS is wrong for this arch";
+ #endif
+ #endif
+#endif
+
+#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)

--
2.19.1


2018-11-21 00:34:20

by Rafael Tinoco

[permalink] [raw]
Subject: Re: [PATCH v2] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support

On 11/20/18 10:11 PM, Rafael David Tinoco wrote:
> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
> physical frame number might be so big that zsmalloc obj encoding (to
> location) will break, causing:
>
> BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
> Read of size 4 at addr 00000000 by task mkfs.ext4/623
> CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15
> Hardware name: Generic DT based system
> [<c0418f7c>] (unwind_backtrace) from [<c0410ca4>] (show_stack+0x20/0x24)
> [<c0410ca4>] (show_stack) from [<c16bd540>] (dump_stack+0xbc/0xe8)
> [<c16bd540>] (dump_stack) from [<c06cab74>] (kasan_report+0x248/0x390)
> [<c06cab74>] (kasan_report) from [<c06c94e8>] (__asan_load4+0x78/0xb4)
> [<c06c94e8>] (__asan_load4) from [<c06ddc24>] (zs_map_object+0xa4/0x2bc)
> [<c06ddc24>] (zs_map_object) from [<bf0bbbd8>] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
> [<bf0bbbd8>] (zram_bvec_rw.constprop.2 [zram]) from [<bf0bc3cc>] (zram_make_request+0x234/0x46c [zram])
> [<bf0bc3cc>] (zram_make_request [zram]) from [<c09aff9c>] (generic_make_request+0x304/0x63c)
> [<c09aff9c>] (generic_make_request) from [<c09b0320>] (submit_bio+0x4c/0x1c8)
> [<c09b0320>] (submit_bio) from [<c0743570>] (submit_bh_wbc.constprop.15+0x238/0x26c)
> [<c0743570>] (submit_bh_wbc.constprop.15) from [<c0746cf8>] (__block_write_full_page+0x524/0x76c)
> [<c0746cf8>] (__block_write_full_page) from [<c07472c4>] (block_write_full_page+0x1bc/0x1d4)
> [<c07472c4>] (block_write_full_page) from [<c0748eb4>] (blkdev_writepage+0x24/0x28)
> [<c0748eb4>] (blkdev_writepage) from [<c064a780>] (__writepage+0x44/0x78)
> [<c064a780>] (__writepage) from [<c064b50c>] (write_cache_pages+0x3b8/0x800)
> [<c064b50c>] (write_cache_pages) from [<c064dd78>] (generic_writepages+0x74/0xa0)
> [<c064dd78>] (generic_writepages) from [<c0748e64>] (blkdev_writepages+0x18/0x1c)
> [<c0748e64>] (blkdev_writepages) from [<c064e890>] (do_writepages+0x68/0x134)
> [<c064e890>] (do_writepages) from [<c06368a4>] (__filemap_fdatawrite_range+0xb0/0xf4)
> [<c06368a4>] (__filemap_fdatawrite_range) from [<c0636b68>] (file_write_and_wait_range+0x64/0xd0)
> [<c0636b68>] (file_write_and_wait_range) from [<c0747af8>] (blkdev_fsync+0x54/0x84)
> [<c0747af8>] (blkdev_fsync) from [<c0739dac>] (vfs_fsync_range+0x70/0xd4)
> [<c0739dac>] (vfs_fsync_range) from [<c0739e98>] (do_fsync+0x4c/0x80)
> [<c0739e98>] (do_fsync) from [<c073a26c>] (sys_fsync+0x1c/0x20)
> [<c073a26c>] (sys_fsync) from [<c0401000>] (ret_fast_syscall+0x0/0x2c)
>
> when trying to decode (the pfn) and map the object.
>
> That happens because one architecture might not re-define
> MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For
> 32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will
> default to BITS_PER_LONG (32) in most cases, and, with PAE enabled,
> _PFN_BITS might be wrong: which may cause obj variable to overflow if
> frame number is in HIGHMEM and referencing a page above the 4GB
> watermark.
>
> commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if
> not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers
> and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't
> used, like in the example given above.
>
> Systems with potential for PAE exist for a long time and assuming
> BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better,
> however it is NOT a constant anymore for x86.
>
> SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every
> architecture using zsmalloc, together with a sanity check for
> MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems.
>
> Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
> Signed-off-by: Rafael David Tinoco <[email protected]>
> ---
> arch/arm/include/asm/pgtable-2level-types.h | 2 ++
> arch/arm/include/asm/pgtable-3level-types.h | 2 ++
> arch/arm64/include/asm/pgtable-types.h | 2 ++
> arch/ia64/include/asm/page.h | 2 ++
> arch/mips/include/asm/page.h | 2 ++
> arch/powerpc/include/asm/mmu.h | 2 ++
> arch/s390/include/asm/page.h | 2 ++
> arch/sh/include/asm/page.h | 2 ++
> arch/sparc/include/asm/page_32.h | 2 ++
> arch/sparc/include/asm/page_64.h | 2 ++
> arch/x86/include/asm/pgtable-2level_types.h | 2 ++
> arch/x86/include/asm/pgtable-3level_types.h | 3 +-
> arch/x86/include/asm/pgtable_64_types.h | 4 +--
> mm/zsmalloc.c | 35 +++++++++++----------
> 14 files changed, 45 insertions(+), 19 deletions(-)

Russel,

I have tried to place MAX_POSSIBLE_PHYSMEM_BITS in the best available
header for each architecture, considering different paging levels, PAE
existence, and existing similar definitions. Also, I have only
considered those architectures already having "sparsemem.h" header.

Would you mind reviewing it ?

Tks
--
Rafael D. Tinoco
Linaro Kernel Validation

2018-11-27 20:35:00

by Rafael Tinoco

[permalink] [raw]
Subject: Re: [PATCH v2] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support

On 11/20/18 10:18 PM, Rafael David Tinoco wrote:
>
> Russell,
>
> I have tried to place MAX_POSSIBLE_PHYSMEM_BITS in the best available
> header for each architecture, considering different paging levels, PAE
> existence, and existing similar definitions. Also, I have only
> considered those architectures already having "sparsemem.h" header.
>
> Would you mind reviewing it ?

Should I re-send the this v2 (as v3) with complete list of
get_maintainer.pl ? I was in doubt because I'm touching headers from
several archs and I'm not sure who, if it is accepted, would merge it.

Thank you
--
Rafael D. Tinoco
Linaro Kernel Validation

2018-11-29 02:55:32

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support

On (11/27/18 18:33), Rafael David Tinoco wrote:
> On 11/20/18 10:18 PM, Rafael David Tinoco wrote:
> >
> > Russell,
> >
> > I have tried to place MAX_POSSIBLE_PHYSMEM_BITS in the best available
> > header for each architecture, considering different paging levels, PAE
> > existence, and existing similar definitions. Also, I have only
> > considered those architectures already having "sparsemem.h" header.
> >
> > Would you mind reviewing it ?
>
> Should I re-send the this v2 (as v3) with complete list of
> get_maintainer.pl ? I was in doubt because I'm touching headers from
> several archs and I'm not sure who, if it is accepted, would merge it.

Yes, resending and Cc-ing archs' maintainers if the right thing to do.
It's also possible that they will ask to split the patch and do a
per-arch change.

-ss