2022-12-17 02:02:19

by Baoquan He

Subject: [PATCH v2 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

Problem:
***
Stephen reported that vread() skips vm_map_ram areas when reading out
/proc/kcore with the drgn utility. Please see the link below for more
details:

/proc/kcore reads 0's for vmap_block
https://lore.kernel.org/all/[email protected]/T/#u

Root cause:
***
The normal vmalloc APIs use struct vmap_area to manage the allocated
kernel virtual area and associate a vm_struct with it to store more
information and pass it out. However, an area reserved through the
vm_map_ram() interface has no vm_struct bound to it. So the current code
in vread() skips vm_map_ram areas via the 'if (!va->vm)' check.

Solution:
***
There are two types of vm_map_ram areas. In the first, the whole
vmap_area is reserved and mapped in one go; in the second, a vmap_area of
VMAP_BLOCK_SIZE is reserved in one go, but mapped piecemeal into smaller
regions several times via vb_alloc(). I will call the second type a vb
region.

Patches 1 and 2 add a flags field to struct vmap_area to mark these two
types of vm_map_ram areas, and add a bitmap field used_map to struct
vmap_block to mark in-use vb regions, differentiating them from the dirty
and free regions in a vmap_block.

With the help of vmap_area->flags and vmap_block->used_map, patch 3
recognizes these areas in vread() and handles each kind accordingly.

Besides,
***
Patch 5 ignores vmap areas with VM_UNINITIALIZED set in vm->flags. Such
an area is created by calling __vmalloc_node_range(); VM_UNINITIALIZED
indicates that it has a vm_struct associated with it but is still in the
middle of page allocation and mapping.

In patches 6 and 7, change the area flag from VM_ALLOC to VM_IOREMAP in
two places. This makes the areas show as 'ioremap' in /proc/vmallocinfo
and excludes them from /proc/kcore.

Testing
***
Only basic testing has been done.

Changelog
***
v1->v2:
- Change alloc_vmap_area() to take a va_flags argument so that
vmap_area->flags can be set for vm_map_ram areas at allocation time.
With this, no extra acquisition of vmap_area_lock is needed when
setting vmap_area->flags. Uladzislau reviewed v1 and pointed out the
issue.
- Improve vb_vread() to cover the case where reading is started from a
dirty or free region.

RFC->v1:
- Add a new field flags in vmap_area to mark vm_map_ram areas. Reusing
the vm union in vmap_area, as the RFC did, could be risky. I will
consider reusing the union to save memory later; for now, take the
simpler way so we can focus on resolving the main problem.
- Add patch 4~7 for optimization.

Baoquan He (7):
mm/vmalloc.c: add used_map into vmap_block to track space of
vmap_block
mm/vmalloc.c: add flags to mark vm_map_ram area
mm/vmalloc.c: allow vread() to read out vm_map_ram areas
mm/vmalloc: explicitly identify vm_map_ram area when shown in
/proc/vmallocinfo
mm/vmalloc: skip the uninitialized vmalloc areas
powerpc: mm: add VM_IOREMAP flag to the vmalloc area
sh: mm: set VM_IOREMAP flag to the vmalloc area

arch/powerpc/kernel/pci_64.c | 2 +-
arch/sh/kernel/cpu/sh4/sq.c | 2 +-
include/linux/vmalloc.h | 1 +
mm/vmalloc.c | 106 +++++++++++++++++++++++++++++------
4 files changed, 92 insertions(+), 19 deletions(-)

--
2.34.1


2022-12-17 02:21:09

by Baoquan He

Subject: [PATCH v2 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmallocinfo

Now, by marking VMAP_RAM in vmap_area->flags for vm_map_ram areas, we can
clearly differentiate them from other vmalloc areas. So identify a
vm_map_ram area by checking VMAP_RAM in vmap_area->flags when it is shown
in /proc/vmallocinfo.

Meanwhile, the code comment above the vm_map_ram check in s_show() is no
longer needed; remove it.

Signed-off-by: Baoquan He <[email protected]>
---
mm/vmalloc.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6612914459cf..3bfa872a4513 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4182,11 +4182,7 @@ static int s_show(struct seq_file *m, void *p)

va = list_entry(p, struct vmap_area, list);

- /*
- * s_show can encounter race with remove_vm_area, !vm on behalf
- * of vmap area is being tear down or vm_map_ram allocation.
- */
- if (!va->vm) {
+ if (!va->vm && (va->flags & VMAP_RAM)) {
seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
(void *)va->va_start, (void *)va->va_end,
va->va_end - va->va_start);
--
2.34.1

2022-12-17 02:21:54

by Baoquan He

Subject: [PATCH v2 5/7] mm/vmalloc: skip the uninitialized vmalloc areas

For areas allocated via the vmalloc_xxx() APIs, an unmapped area is
searched for and reserved, and new pages are allocated and mapped into it;
please see __vmalloc_node_range(). During this process, the flag
VM_UNINITIALIZED is set in vm->flags to indicate that page allocation and
mapping haven't been completed, until clear_vm_uninitialized_flag() is
called to clear it.

For this kind of area, if VM_UNINITIALIZED is still set, ignore it in
vread(): the pages newly allocated and being mapped in that area contain
only zero data, so reading them out via aligned_vread() is a waste of
time.

Signed-off-by: Baoquan He <[email protected]>
---
mm/vmalloc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3bfa872a4513..bdaceda1b878 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3617,6 +3617,11 @@ long vread(char *buf, char *addr, unsigned long count)
if (!vm && !flags)
continue;

+ if (vm->flags & VM_UNINITIALIZED)
+ continue;
+ /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+ smp_rmb();
+
vaddr = (char *) va->va_start;
size = flags ? va_size(va) : get_vm_area_size(vm);

--
2.34.1

2022-12-17 02:26:00

by Baoquan He

Subject: [PATCH v2 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area

Currently, vmalloc areas with the VM_IOREMAP flag set, apart from getting
the specific alignment clamping in __get_vm_area_node(), are
1) shown as ioremap in /proc/vmallocinfo;
2) ignored when /proc/kcore is read via vread().

So for the io mapping in ioremap_phb() of powerpc, set VM_IOREMAP in the
flags so the area is handled correctly as above.

Signed-off-by: Baoquan He <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Pali Rohár" <[email protected]>
Cc: [email protected]
---
arch/powerpc/kernel/pci_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 0c7cfb9fab04..fd42059ae2a5 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -132,7 +132,7 @@ void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size)
* address decoding but I'd rather not deal with those outside of the
* reserved 64K legacy region.
*/
- area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
+ area = __get_vm_area_caller(size, VM_IOREMAP, PHB_IO_BASE, PHB_IO_END,
__builtin_return_address(0));
if (!area)
return NULL;
--
2.34.1

2022-12-17 02:30:57

by Baoquan He

Subject: [PATCH v2 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block

In one vmap_block area, there could be three types of regions: region
being used which is allocated through vb_alloc(), dirty region which
is freed via vb_free() and free region. Among them, only used region
has available data. While there's no way to track those used regions
currently.

Here, add bitmap field used_map into vmap_block, and set/clear it during
allocation or freeing regions of vmap_block area.

This is a preparatoin for later use.

Signed-off-by: Baoquan He <[email protected]>
---
mm/vmalloc.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ccaa461998f3..5d3fd3e6fe09 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1896,6 +1896,7 @@ struct vmap_block {
spinlock_t lock;
struct vmap_area *va;
unsigned long free, dirty;
+ DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
unsigned long dirty_min, dirty_max; /*< dirty range */
struct list_head free_list;
struct rcu_head rcu_head;
@@ -1972,10 +1973,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
vb->va = va;
/* At least something should be left free */
BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+ bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
vb->free = VMAP_BBMAP_BITS - (1UL << order);
vb->dirty = 0;
vb->dirty_min = VMAP_BBMAP_BITS;
vb->dirty_max = 0;
+ bitmap_set(vb->used_map, 0, (1UL << order));
INIT_LIST_HEAD(&vb->free_list);

vb_idx = addr_to_vb_idx(va->va_start);
@@ -2081,6 +2084,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
pages_off = VMAP_BBMAP_BITS - vb->free;
vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
vb->free -= 1UL << order;
+ bitmap_set(vb->used_map, pages_off, (1UL << order));
if (vb->free == 0) {
spin_lock(&vbq->lock);
list_del_rcu(&vb->free_list);
@@ -2114,6 +2118,9 @@ static void vb_free(unsigned long addr, unsigned long size)
order = get_order(size);
offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
+ spin_lock(&vb->lock);
+ bitmap_clear(vb->used_map, offset, (1UL << order));
+ spin_unlock(&vb->lock);

vunmap_range_noflush(addr, addr + size);

--
2.34.1

2022-12-17 02:32:34

by Baoquan He

Subject: [PATCH v2 7/7] sh: mm: set VM_IOREMAP flag to the vmalloc area

Currently, vmalloc areas with the VM_IOREMAP flag set, apart from getting
the specific alignment clamping in __get_vm_area_node(), are
1) shown as ioremap in /proc/vmallocinfo;
2) ignored when /proc/kcore is read via vread().

So for the ioremap in __sq_remap() of sh, set VM_IOREMAP in the flags so
the area is handled correctly as above.

Signed-off-by: Baoquan He <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
arch/sh/kernel/cpu/sh4/sq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
index a76b94e41e91..27f2e3da5aa2 100644
--- a/arch/sh/kernel/cpu/sh4/sq.c
+++ b/arch/sh/kernel/cpu/sh4/sq.c
@@ -103,7 +103,7 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
#if defined(CONFIG_MMU)
struct vm_struct *vma;

- vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
+ vma = __get_vm_area_caller(map->size, VM_IOREMAP, map->sq_addr,
SQ_ADDRMAX, __builtin_return_address(0));
if (!vma)
return -ENOMEM;
--
2.34.1

2022-12-17 02:43:13

by Baoquan He

Subject: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

Currently, vread() can read out vmalloc areas which are associated with a
vm_struct. However, this doesn't work for areas created by the
vm_map_ram() interface because they have no associated vm_struct, so
vread() skips them.

Here, add a new function vb_vread() specifically to read out areas managed
by vmap_block. Then recognize vm_map_ram areas via vmap_area->flags and
handle them accordingly.

Signed-off-by: Baoquan He <[email protected]>
---
mm/vmalloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 59 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 190f29bbaaa7..6612914459cf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3515,6 +3515,51 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
return copied;
}

+static void vb_vread(char *buf, char *addr, int count)
+{
+ char *start;
+ struct vmap_block *vb;
+ unsigned long offset;
+ unsigned int rs, re, n;
+
+ vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
+
+ spin_lock(&vb->lock);
+ if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
+ spin_unlock(&vb->lock);
+ memset(buf, 0, count);
+ return;
+ }
+ for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
+ if (!count)
+ break;
+ start = vmap_block_vaddr(vb->va->va_start, rs);
+ if (addr < start) {
+ if (count == 0)
+ break;
+ *buf = '\0';
+ buf++;
+ addr++;
+ count--;
+ }
+ /*it could start reading from the middle of used region*/
+ offset = offset_in_page(addr);
+ n = (re - rs + 1) << PAGE_SHIFT - offset;
+ if (n > count)
+ n = count;
+ aligned_vread(buf, start+offset, n);
+
+ buf += n;
+ addr += n;
+ count -= n;
+ }
+ spin_unlock(&vb->lock);
+
+ /* zero-fill the left dirty or free regions */
+ if (count)
+ memset(buf, 0, count);
+}
+
/**
* vread() - read vmalloc area in a safe way.
* @buf: buffer for reading data
@@ -3545,7 +3590,7 @@ long vread(char *buf, char *addr, unsigned long count)
struct vm_struct *vm;
char *vaddr, *buf_start = buf;
unsigned long buflen = count;
- unsigned long n;
+ unsigned long n, size, flags;

addr = kasan_reset_tag(addr);

@@ -3566,12 +3611,16 @@ long vread(char *buf, char *addr, unsigned long count)
if (!count)
break;

- if (!va->vm)
+ vm = va->vm;
+ flags = va->flags & VMAP_FLAGS_MASK;
+
+ if (!vm && !flags)
continue;

- vm = va->vm;
- vaddr = (char *) vm->addr;
- if (addr >= vaddr + get_vm_area_size(vm))
+ vaddr = (char *) va->va_start;
+ size = flags ? va_size(va) : get_vm_area_size(vm);
+
+ if (addr >= vaddr + size)
continue;
while (addr < vaddr) {
if (count == 0)
@@ -3581,10 +3630,13 @@ long vread(char *buf, char *addr, unsigned long count)
addr++;
count--;
}
- n = vaddr + get_vm_area_size(vm) - addr;
+ n = vaddr + size - addr;
if (n > count)
n = count;
- if (!(vm->flags & VM_IOREMAP))
+
+ if ((flags & (VMAP_RAM|VMAP_BLOCK)) == (VMAP_RAM|VMAP_BLOCK))
+ vb_vread(buf, addr, n);
+ else if ((flags & VMAP_RAM) || !(vm->flags & VM_IOREMAP))
aligned_vread(buf, addr, n);
else /* IOREMAP area is treated as memory hole */
memset(buf, 0, n);
--
2.34.1

2022-12-17 04:45:29

by kernel test robot

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

Hi Baoquan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on akpm-mm/mm-everything]

url: https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20221217015435.73889-4-bhe%40redhat.com
patch subject: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
config: powerpc-randconfig-r031-20221216
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 98b13979fb05f3ed288a900deb843e7b27589e58)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/368cd65be8fedd1642e53393dc3f28ff8726122d
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
git checkout 368cd65be8fedd1642e53393dc3f28ff8726122d
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> mm/vmalloc.c:3563:35: warning: operator '<<' has lower precedence than '-'; '-' will be evaluated first [-Wshift-op-parentheses]
n = (re - rs + 1) << PAGE_SHIFT - offset;
~~ ~~~~~~~~~~~^~~~~~~~
mm/vmalloc.c:3563:35: note: place parentheses around the '-' expression to silence this warning
n = (re - rs + 1) << PAGE_SHIFT - offset;
~~~~~~~~~~~^~~~~~~~
1 warning generated.


vim +3563 mm/vmalloc.c

3533
3534 static void vb_vread(char *buf, char *addr, int count)
3535 {
3536 char *start;
3537 struct vmap_block *vb;
3538 unsigned long offset;
3539 unsigned int rs, re, n;
3540
3541 vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
3542
3543 spin_lock(&vb->lock);
3544 if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
3545 spin_unlock(&vb->lock);
3546 memset(buf, 0, count);
3547 return;
3548 }
3549 for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
3550 if (!count)
3551 break;
3552 start = vmap_block_vaddr(vb->va->va_start, rs);
3553 if (addr < start) {
3554 if (count == 0)
3555 break;
3556 *buf = '\0';
3557 buf++;
3558 addr++;
3559 count--;
3560 }
3561 /*it could start reading from the middle of used region*/
3562 offset = offset_in_page(addr);
> 3563 n = (re - rs + 1) << PAGE_SHIFT - offset;
3564 if (n > count)
3565 n = count;
3566 aligned_vread(buf, start+offset, n);
3567
3568 buf += n;
3569 addr += n;
3570 count -= n;
3571 }
3572 spin_unlock(&vb->lock);
3573
3574 /* zero-fill the left dirty or free regions */
3575 if (count)
3576 memset(buf, 0, count);
3577 }
3578

--
0-DAY CI Kernel Test Service
https://01.org/lkp



2022-12-17 07:49:23

by kernel test robot

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

Hi Baoquan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on akpm-mm/mm-everything]

url: https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20221217015435.73889-4-bhe%40redhat.com
patch subject: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
config: loongarch-randconfig-r006-20221216
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/368cd65be8fedd1642e53393dc3f28ff8726122d
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
git checkout 368cd65be8fedd1642e53393dc3f28ff8726122d
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

mm/vmalloc.c: In function 'vb_vread':
>> mm/vmalloc.c:3563:49: warning: suggest parentheses around '-' inside '<<' [-Wparentheses]
3563 | n = (re - rs + 1) << PAGE_SHIFT - offset;


vim +3563 mm/vmalloc.c

3533
3534 static void vb_vread(char *buf, char *addr, int count)
3535 {
3536 char *start;
3537 struct vmap_block *vb;
3538 unsigned long offset;
3539 unsigned int rs, re, n;
3540
3541 vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
3542
3543 spin_lock(&vb->lock);
3544 if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
3545 spin_unlock(&vb->lock);
3546 memset(buf, 0, count);
3547 return;
3548 }
3549 for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
3550 if (!count)
3551 break;
3552 start = vmap_block_vaddr(vb->va->va_start, rs);
3553 if (addr < start) {
3554 if (count == 0)
3555 break;
3556 *buf = '\0';
3557 buf++;
3558 addr++;
3559 count--;
3560 }
3561 /*it could start reading from the middle of used region*/
3562 offset = offset_in_page(addr);
> 3563 n = (re - rs + 1) << PAGE_SHIFT - offset;
3564 if (n > count)
3565 n = count;
3566 aligned_vread(buf, start+offset, n);
3567
3568 buf += n;
3569 addr += n;
3570 count -= n;
3571 }
3572 spin_unlock(&vb->lock);
3573
3574 /* zero-fill the left dirty or free regions */
3575 if (count)
3576 memset(buf, 0, count);
3577 }
3578

--
0-DAY CI Kernel Test Service
https://01.org/lkp



2022-12-17 10:34:18

by Baoquan He

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On 12/17/22 at 02:41pm, kernel test robot wrote:
> Hi Baoquan,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20221217015435.73889-4-bhe%40redhat.com
> patch subject: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
> config: loongarch-randconfig-r006-20221216
> compiler: loongarch64-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/368cd65be8fedd1642e53393dc3f28ff8726122d
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Baoquan-He/mm-vmalloc-c-allow-vread-to-read-out-vm_map_ram-areas/20221217-095615
> git checkout 368cd65be8fedd1642e53393dc3f28ff8726122d
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch olddefconfig
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> mm/vmalloc.c: In function 'vb_vread':
> >> mm/vmalloc.c:3563:49: warning: suggest parentheses around '-' inside '<<' [-Wparentheses]
> 3563 | n = (re - rs + 1) << PAGE_SHIFT - offset;

Thanks, below code change can fix the warning.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bdaceda1b878..ec5665e70114 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3544,7 +3544,7 @@ static void vb_vread(char *buf, char *addr, int count)
}
/*it could start reading from the middle of used region*/
offset = offset_in_page(addr);
- n = (re - rs + 1) << PAGE_SHIFT - offset;
+ n = ((re - rs + 1) << PAGE_SHIFT) - offset;
if (n > count)
n = count;
aligned_vread(buf, start+offset, n);

2022-12-17 12:26:42

by Lorenzo Stoakes

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On Sat, Dec 17, 2022 at 09:54:31AM +0800, Baoquan He wrote:
> Currently, vread can read out vmalloc areas which is associated with
> a vm_struct. While this doesn't work for areas created by vm_map_ram()
> interface because it doesn't have an associated vm_struct. Then in vread(),
> these areas will be skipped.
>
> Here, add a new function vb_vread() to read out areas managed by
> vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> and handle them respectively.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> mm/vmalloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 59 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 190f29bbaaa7..6612914459cf 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3515,6 +3515,51 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> return copied;
> }
>
> +static void vb_vread(char *buf, char *addr, int count)
> +{
> + char *start;
> + struct vmap_block *vb;
> + unsigned long offset;
> + unsigned int rs, re, n;
> +
> + vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> +
> + spin_lock(&vb->lock);
> + if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> + spin_unlock(&vb->lock);
> + memset(buf, 0, count);
> + return;
> + }
> + for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> + if (!count)
> + break;
> + start = vmap_block_vaddr(vb->va->va_start, rs);
> + if (addr < start) {
> + if (count == 0)
> + break;
> + *buf = '\0';
> + buf++;
> + addr++;
> + count--;
> + }

I may be missing something here, but is this not essentially 'if the address is
below a used region, write a single null byte into the buffer and continue,
assuming we are now in a used area?'

This doesn't seem right, but I am happy to be corrected (perhaps we only expect
to be a single byte below a start region?)

> + /*it could start reading from the middle of used region*/
> + offset = offset_in_page(addr);
> + n = (re - rs + 1) << PAGE_SHIFT - offset;

The kernel bot has already picked up on this paren issue :)

> + if (n > count)
> + n = count;
> + aligned_vread(buf, start+offset, n);
> +
> + buf += n;
> + addr += n;
> + count -= n;
> + }
> + spin_unlock(&vb->lock);
> +
> + /* zero-fill the left dirty or free regions */
> + if (count)
> + memset(buf, 0, count);
> +}
> +
> /**
> * vread() - read vmalloc area in a safe way.
> * @buf: buffer for reading data
> @@ -3545,7 +3590,7 @@ long vread(char *buf, char *addr, unsigned long count)
> struct vm_struct *vm;
> char *vaddr, *buf_start = buf;
> unsigned long buflen = count;
> - unsigned long n;
> + unsigned long n, size, flags;
>
> addr = kasan_reset_tag(addr);
>
> @@ -3566,12 +3611,16 @@ long vread(char *buf, char *addr, unsigned long count)
> if (!count)
> break;
>
> - if (!va->vm)
> + vm = va->vm;
> + flags = va->flags & VMAP_FLAGS_MASK;
> +
> + if (!vm && !flags)
> continue;
>

This seems very delicate now as going forward, vm _could_ be NULL. In fact, a
later patch in the series then goes on to use vm and assume it is not null (will
comment).

I feel we should be very explicit after here asserting that vm != NULL.

> - vm = va->vm;
> - vaddr = (char *) vm->addr;
> - if (addr >= vaddr + get_vm_area_size(vm))
> + vaddr = (char *) va->va_start;
> + size = flags ? va_size(va) : get_vm_area_size(vm);

For example here, I feel that this ternary should be reversed and based on
whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
set?

> +
> + if (addr >= vaddr + size)
> continue;
> while (addr < vaddr) {
> if (count == 0)
> @@ -3581,10 +3630,13 @@ long vread(char *buf, char *addr, unsigned long count)
> addr++;
> count--;
> }
> - n = vaddr + get_vm_area_size(vm) - addr;
> + n = vaddr + size - addr;
> if (n > count)
> n = count;
> - if (!(vm->flags & VM_IOREMAP))
> +
> + if ((flags & (VMAP_RAM|VMAP_BLOCK)) == (VMAP_RAM|VMAP_BLOCK))
> + vb_vread(buf, addr, n);
> + else if ((flags & VMAP_RAM) || !(vm->flags & VM_IOREMAP))
> aligned_vread(buf, addr, n);
> else /* IOREMAP area is treated as memory hole */
> memset(buf, 0, n);
> --
> 2.34.1
>

2022-12-17 12:39:29

by Lorenzo Stoakes

Subject: Re: [PATCH v2 5/7] mm/vmalloc: skip the uninitialized vmalloc areas

On Sat, Dec 17, 2022 at 09:54:33AM +0800, Baoquan He wrote:
> @@ -3617,6 +3617,11 @@ long vread(char *buf, char *addr, unsigned long count)
> if (!vm && !flags)
> continue;
>
> + if (vm->flags & VM_UNINITIALIZED)
> + continue;

This comes immediately after asserting that vm _might be null_. This surely must become:-

if (vm && vm->flags & VM_UNINITIALIZED)
continue;

2022-12-19 07:53:09

by Baoquan He

Subject: Re: [PATCH v2 5/7] mm/vmalloc: skip the uninitialized vmalloc areas

On 12/17/22 at 12:07pm, Lorenzo Stoakes wrote:
> On Sat, Dec 17, 2022 at 09:54:33AM +0800, Baoquan He wrote:
> > @@ -3617,6 +3617,11 @@ long vread(char *buf, char *addr, unsigned long count)
> > if (!vm && !flags)
> > continue;
> >
> > + if (vm->flags & VM_UNINITIALIZED)
> > + continue;
>
> This comes immediately after asserting that vm _might be null_. This surely must become:-
>
> if (vm && vm->flags & VM_UNINITIALIZED)
> continue;

You are right, will fix it in v3. Thanks for the careful reviewing.

2023-01-04 08:43:12

by Baoquan He

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On 12/17/22 at 12:06pm, Lorenzo Stoakes wrote:
> On Sat, Dec 17, 2022 at 09:54:31AM +0800, Baoquan He wrote:
> > Currently, vread can read out vmalloc areas which is associated with
> > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > interface because it doesn't have an associated vm_struct. Then in vread(),
> > these areas will be skipped.
> >
> > Here, add a new function vb_vread() to read out areas managed by
> > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > and handle them respectively.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > mm/vmalloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 59 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 190f29bbaaa7..6612914459cf 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3515,6 +3515,51 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> > return copied;
> > }
> >
> > +static void vb_vread(char *buf, char *addr, int count)
> > +{
> > + char *start;
> > + struct vmap_block *vb;
> > + unsigned long offset;
> > + unsigned int rs, re, n;
> > +
> > + vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > +
> > + spin_lock(&vb->lock);
> > + if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > + spin_unlock(&vb->lock);
> > + memset(buf, 0, count);
> > + return;
> > + }
> > + for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> > + if (!count)
> > + break;
> > + start = vmap_block_vaddr(vb->va->va_start, rs);
> > + if (addr < start) {
> > + if (count == 0)
> > + break;
> > + *buf = '\0';
> > + buf++;
> > + addr++;
> > + count--;
> > + }

Very sorry, Lorenzo, I just noticed this mail. It's very weird. Earlier,
Uladzislau's reply to patch 2/7 only showed up in my mutt mail client 10
days later. I am not sure whether it's my mail client's problem or a mail
server delivery issue.

>
> I may be missing something here, but is this not essentially 'if the address is
> below a used region, write a single null byte into the buffer and continue,
> assuming we are now in a used area?'

I'm not sure I got you. for_each_set_bitrange() only iterates the used
regions. So in the for loop, we zero-fill the buffer for the hole below
each used region, then read the used region out. You said 'continue';
I don't understand what you mean.

Assume we have 3 used regions in one vmap block, see below diagram.
|_______|______________|________|_____________|_____|_____________|______|
|hole 0 |used region 0 |hole 1 |used region 1|hole2|used region2 |hole 3 |

hole 0,1,2 will be set zero when we iterate to the used region above
them. And the last hole 3 is set at the end of this function. Please
help point it out if I got it wrong.

>
> This doesn't seem right, but I am happy to be corrected (perhaps we only expect
> to be a single byte below a start region?)
>
> > + /*it could start reading from the middle of used region*/
> > + offset = offset_in_page(addr);
> > + n = (re - rs + 1) << PAGE_SHIFT - offset;
>
> The kernel bot has already picked up on this paren issue :)

Right, has been handled. Thanks.

>
> > + if (n > count)
> > + n = count;
> > + aligned_vread(buf, start+offset, n);
> > +
> > + buf += n;
> > + addr += n;
> > + count -= n;
> > + }
> > + spin_unlock(&vb->lock);
> > +
> > + /* zero-fill the left dirty or free regions */
> > + if (count)
> > + memset(buf, 0, count);
> > +}
> > +
> > /**
> > * vread() - read vmalloc area in a safe way.
> > * @buf: buffer for reading data
> > @@ -3545,7 +3590,7 @@ long vread(char *buf, char *addr, unsigned long count)
> > struct vm_struct *vm;
> > char *vaddr, *buf_start = buf;
> > unsigned long buflen = count;
> > - unsigned long n;
> > + unsigned long n, size, flags;
> >
> > addr = kasan_reset_tag(addr);
> >
> > @@ -3566,12 +3611,16 @@ long vread(char *buf, char *addr, unsigned long count)
> > if (!count)
> > break;
> >
> > - if (!va->vm)
> > + vm = va->vm;
> > + flags = va->flags & VMAP_FLAGS_MASK;
> > +
> > + if (!vm && !flags)
> > continue;
> >
>
> This seems very delicate now as going forward, vm _could_ be NULL. In fact, a
> later patch in the series then goes on to use vm and assume it is not null (will
> comment).
>
> I feel we should be very explicit after here asserting that vm != NULL.
>
> > - vm = va->vm;
> > - vaddr = (char *) vm->addr;
> > - if (addr >= vaddr + get_vm_area_size(vm))
> > + vaddr = (char *) va->va_start;
> > + size = flags ? va_size(va) : get_vm_area_size(vm);
>
> For example here, I feel that this ternary should be reversed and based on
> whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
> set?

Now only vm_map_ram areas set flags; all other types have a non-NULL vm.
The temporary states, e.g. the vm == NULL && flags == 0 case, have already
been filtered out. Is the below what you suggested?

size = (!vm && flags) ? va_size(va) : get_vm_area_size(vm);
or
size = (vm && !flags) ? get_vm_area_size(vm) : va_size(va);

>
> > +
> > + if (addr >= vaddr + size)
> > continue;
> > while (addr < vaddr) {
> > if (count == 0)
> > @@ -3581,10 +3630,13 @@ long vread(char *buf, char *addr, unsigned long count)
> > addr++;
> > count--;
> > }
> > - n = vaddr + get_vm_area_size(vm) - addr;
> > + n = vaddr + size - addr;
> > if (n > count)
> > n = count;
> > - if (!(vm->flags & VM_IOREMAP))
> > +
> > + if ((flags & (VMAP_RAM|VMAP_BLOCK)) == (VMAP_RAM|VMAP_BLOCK))
> > + vb_vread(buf, addr, n);
> > + else if ((flags & VMAP_RAM) || !(vm->flags & VM_IOREMAP))
> > aligned_vread(buf, addr, n);
> > else /* IOREMAP area is treated as memory hole */
> > memset(buf, 0, n);
> > --
> > 2.34.1
> >
>

2023-01-04 20:50:32

by Lorenzo Stoakes

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On Wed, Jan 04, 2023 at 04:01:36PM +0800, Baoquan He wrote:
> On 12/17/22 at 12:06pm, Lorenzo Stoakes wrote:
> > On Sat, Dec 17, 2022 at 09:54:31AM +0800, Baoquan He wrote:
> > > Currently, vread can read out vmalloc areas which is associated with
> > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > interface because it doesn't have an associated vm_struct. Then in vread(),
> > > these areas will be skipped.
> > >
> > > Here, add a new function vb_vread() to read out areas managed by
> > > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > > and handle them respectively.
> > >
> > > Signed-off-by: Baoquan He <[email protected]>
> > > ---
> > > mm/vmalloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++------
> > > 1 file changed, 59 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 190f29bbaaa7..6612914459cf 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -3515,6 +3515,51 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> > > return copied;
> > > }
> > >
> > > +static void vb_vread(char *buf, char *addr, int count)
> > > +{
> > > + char *start;
> > > + struct vmap_block *vb;
> > > + unsigned long offset;
> > > + unsigned int rs, re, n;
> > > +
> > > + vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > > +
> > > + spin_lock(&vb->lock);
> > > + if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > > + spin_unlock(&vb->lock);
> > > + memset(buf, 0, count);
> > > + return;
> > > + }
> > > + for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> > > + if (!count)
> > > + break;
> > > + start = vmap_block_vaddr(vb->va->va_start, rs);
> > > + if (addr < start) {
> > > + if (count == 0)
> > > + break;
> > > + *buf = '\0';
> > > + buf++;
> > > + addr++;
> > > + count--;
> > > + }
>
> Very sorry, Lorenzo, I just noticed this mail. It's very weird. Earlier,
> Uladzislau's reply to patch 2/7 got to be seen in my mutt mail client 10
> days later. I am not sure it's my mail client's problem, or a mail server
> delivery issue.
>

Odd, maybe try lei with mutt, I find that works well :)

> >
> > I may be missing something here, but is this not essentially 'if the address is
> > below a used region, write a single null byte into the buffer and continue,
> > assuming we are now in a used area?'
>
> Not sure if I got you. for_each_set_bitrange only iterates the used
> regions. So in the for loop, what we do is fill zero into the buffer
> below the used region, then read out the used region. You said
> 'continue', I don't understand what it means.
>
> Assume we have 3 used regions in one vmap block, see below diagram.
> |_______|______________|________|_____________|_____|_____________|______|
> |hole 0 |used region 0 |hole 1 |used region 1|hole2|used region2 |hole 3 |
>
> hole 0,1,2 will be set zero when we iterate to the used region above
> them. And the last hole 3 is set at the end of this function. Please
> help point it out if I got it wrong.

Maybe let me rephrase:-

- We want to read `count` bytes from `addr` into `buf`
- We iterate over _used_ blocks, placing the start/end of each block in `rs`, `re`
respectively.
- If we hit a block whose start address is above the one in which we are interested then:-
- Place a zero byte in the buffer
- Increment `addr` by 1 byte
- Decrement the `count` by 1 byte
- Carry on

I am seriously confused as to why we do this? Surely we should be checking
whether the range [addr, addr + count) overlaps this block at all, and only then
copying the relevant region?

It's the fact that blocks are at base page granularity but then this condition
is at byte granularity that is confusing to me (again it's _very_ possible I am
just being dumb here and missing something, just really want to understand this
better :)
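The overlap-based handling being suggested here can be sketched as follows (a user-space illustration; the helper names are mine, not from the patch): for a request [addr, addr + count) and a used region [rs, re), compute how many bytes fall in the hole before the region and how many land inside it.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Bytes of the request that precede the region start (to be zero-filled). */
static size_t hole_bytes(size_t addr, size_t count, size_t rs)
{
	return addr < rs ? min_sz(count, rs - addr) : 0;
}

/* Bytes of the request that overlap the used region [rs, re). */
static size_t region_bytes(size_t addr, size_t count, size_t rs, size_t re)
{
	size_t lo = addr > rs ? addr : rs;
	size_t hi = min_sz(addr + count, re);

	return hi > lo ? hi - lo : 0;
}
```

Zero-filling hole_bytes() bytes and then copying region_bytes() bytes per region covers the same cases as the byte-at-a-time loop, but makes the interval arithmetic explicit.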

> > > - vm = va->vm;
> > > - vaddr = (char *) vm->addr;
> > > - if (addr >= vaddr + get_vm_area_size(vm))
> > > + vaddr = (char *) va->va_start;
> > > + size = flags ? va_size(va) : get_vm_area_size(vm);
> >
> > For example here, I feel that this ternary should be reversed and based on
> > whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
> > set?
>
> Now only vm_map_ram area sets flags, all other types has vm not null.
> Since those temporary state, e.g vm==NULL, flags==0 case has been
> filtered out. Is below you suggested?
>
> size = (!vm&&flags)? va_size(va) : get_vm_area_size(vm);
> or
> size = (vm&&!flags)? get_vm_area_size(vm):va_size(va);
>

Sorry, I didn't phrase this very well; my point is that the key thing you're
relying on here is whether vm exists in order to use it, so I simply meant:-

size = vm ? get_vm_area_size(vm) : va_size(va);

This just makes it really explicit that you need vm to be non-NULL, and you've
already done the flags check before so this should suffice.

2023-01-09 05:07:04

by Baoquan He

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On 01/04/23 at 08:20pm, Lorenzo Stoakes wrote:
> On Wed, Jan 04, 2023 at 04:01:36PM +0800, Baoquan He wrote:
> > On 12/17/22 at 12:06pm, Lorenzo Stoakes wrote:
> > > On Sat, Dec 17, 2022 at 09:54:31AM +0800, Baoquan He wrote:
> > > > Currently, vread can read out vmalloc areas which is associated with
> > > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > > interface because it doesn't have an associated vm_struct. Then in vread(),
> > > > these areas will be skipped.
> > > >
> > > > Here, add a new function vb_vread() to read out areas managed by
> > > > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > > > and handle them respectively.
> > > >
> > > > Signed-off-by: Baoquan He <[email protected]>
> > > > ---
> > > > mm/vmalloc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++------
> > > > 1 file changed, 59 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 190f29bbaaa7..6612914459cf 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -3515,6 +3515,51 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> > > > return copied;
> > > > }
> > > >
> > > > +static void vb_vread(char *buf, char *addr, int count)
> > > > +{
> > > > + char *start;
> > > > + struct vmap_block *vb;
> > > > + unsigned long offset;
> > > > + unsigned int rs, re, n;
> > > > +
> > > > + vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > > > +
> > > > + spin_lock(&vb->lock);
> > > > + if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > > > + spin_unlock(&vb->lock);
> > > > + memset(buf, 0, count);
> > > > + return;
> > > > + }
> > > > + for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> > > > + if (!count)
> > > > + break;
> > > > + start = vmap_block_vaddr(vb->va->va_start, rs);
> > > > + if (addr < start) {
> > > > + if (count == 0)
> > > > + break;
> > > > + *buf = '\0';
> > > > + buf++;
> > > > + addr++;
> > > > + count--;
> > > > + }
> >
> > Very sorry, Lorenzo, I just noticed this mail. It's very weird. Earlier,
> > Uladzislau's reply to patch 2/7 got to be seen in my mutt mail client 10
> > days later. I am not sure it's my mail client's problem, or a mail server
> > delivery issue.
> >
>
> Odd, maybe try lei with mutt I find that works well :)

Sorry for the late reply, I just came back from vacation.

Lei + mutt sounds like a good idea. I relied too much on mbsync in the
past.

>
> > >
> > > I may be missing something here, but is this not essentially 'if the address is
> > > below a used region, write a single null byte into the buffer and continue,
> > > assuming we are now in a used area?'
> >
> > Not sure if I got you. for_each_set_bitrange only iterates the used
> > regions. So in the for loop, what we do is fill zero into the buffer
> > below the used region, then read out the used region. You said
> > 'continue', I don't understand what it means.
> >
> > Assume we have 3 used regions in one vmap block, see below diagram.
> > |_______|______________|________|_____________|_____|_____________|______|
> > |hole 0 |used region 0 |hole 1 |used region 1|hole2|used region2 |hole 3 |
> >
> > hole 0,1,2 will be set zero when we iterate to the used region above
> > them. And the last hole 3 is set at the end of this function. Please
> > help point it out if I got it wrong.
>
> Maybe let me rephrase:-
>
> - We want to read `count` bytes from `addr` into `buf`
> - We iterate over _used_ blocks, placing the start/end of each block in `rs`, `re`
> respectively.
> - If we hit a block whose start address is above the one in which we are interested then:-
> - Place a zero byte in the buffer
> - Increment `addr` by 1 byte
> - Decrement the `count` by 1 byte
> - Carry on
>
> I am seriously confused as to why we do this? Surely we should be checking
> whether the range [addr, addr + count) overlaps this block at all, and only then
> copying the relevant region?

I guessed this could be your concern, but wasn't very sure. That
code block is copied from vread(), and my considerations are:
1) We could start reading from any position of the kcore file. /proc/kcore
is logically an ELF file, so it's allowed to read from anywhere, right? We
don't always have to read the entire file. So the vmap_block read is
not necessarily page aligned. It's very similar to the empty-area
filling in vread().
2) memset() does the zeroing byte by byte anyway. We can
change the code as below, though it doesn't save much effort, and we
need to introduce an extra local variable to store the value of
(start - addr).

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b054081aa66b..dce4a843a9e8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3576,6 +3576,15 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
+ if (addr < start) {
+ int num = min(count, (start - add));
+ memset(buf, 0, count);
+ count -= num;
+ if (count == 0)
+ break;
+ buf -= num;
+ addr -= num;
+ }
/*it could start reading from the middle of used region*/
offset = offset_in_page(addr);
n = ((re - rs + 1) << PAGE_SHIFT) - offset;

void *memset(void *s, int c, size_t count)
{
char *xs = s;

while (count--)
*xs++ = c;
return s;
}
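For reference, a corrected user-space sketch of the hole-filling step (the inline diff above has a few slips: 'add' for 'addr', the memset length, and the direction of the buf/addr updates); the helper name and signature are mine, for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero-fill the hole between *addr and the next used region starting at
 * `start`, capped at *count bytes, then advance the cursors. Returns the
 * number of bytes zeroed.
 */
static size_t fill_hole(char **buf, size_t *addr, size_t *count, size_t start)
{
	size_t num;

	if (*addr >= start || *count == 0)
		return 0;
	num = start - *addr;
	if (num > *count)
		num = *count;
	memset(*buf, 0, num);	/* zero only the hole bytes, not all of count */
	*buf += num;		/* advance, don't rewind */
	*addr += num;
	*count -= num;
	return num;
}
```

After this returns, either count is 0 (the caller can stop) or addr sits at the start of the used region, ready for the aligned read.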

>
> It's the fact that blocks are at base page granularity but then this condition
> is at byte granularity that is confusing to me (again it's _very_ possible I am
> just being dumb here and missing something, just really want to understand this
> better :)

I like this kind of reviewing, with careful checking and deep thinking.
For the above code block, I think it's a great point. From my point of
view, I like the memset version better; it's easier to understand. If we
all agree, we can change it to the memset way. When I made the patches,
several patch-related issues were hovering in my mind at the same time,
so I did not consider this one deeply enough.

>
> > > > - vm = va->vm;
> > > > - vaddr = (char *) vm->addr;
> > > > - if (addr >= vaddr + get_vm_area_size(vm))
> > > > + vaddr = (char *) va->va_start;
> > > > + size = flags ? va_size(va) : get_vm_area_size(vm);
> > >
> > > For example here, I feel that this ternary should be reversed and based on
> > > whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
> > > set?
> >
> > Now only vm_map_ram area sets flags, all other types has vm not null.
> > Since those temporary state, e.g vm==NULL, flags==0 case has been
> > filtered out. Is below you suggested?
> >
> > size = (!vm&&flags)? va_size(va) : get_vm_area_size(vm);
> > or
> > size = (vm&&!flags)? get_vm_area_size(vm):va_size(va);
> >
>
> Sorry I didn't phrase this very well, my point is that the key thing you're
> relying on here is whether vm exists in order to use it so I simply meant:-
>
> size = vm ? get_vm_area_size(vm) : va_size(va);
>
> This just makes it really explicit that you need vm to be non-NULL, and you've
> already done the flags check before so this should suffice.

Sounds reasonable, I will copy the above line you pasted. Thanks a lot.

2023-01-09 07:39:53

by Lorenzo Stoakes

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On Mon, Jan 09, 2023 at 12:35:04PM +0800, Baoquan He wrote:
> Sorry for late reply, just come back from vacation.

Hope you had a great time! :)

>
> Lei + mutt sounds like a good idea. I relied too much on mbsync in the
> past.
>

Yeah I'm finding it works well,
https://josefbacik.github.io/kernel/2021/10/18/lei-and-b4.html is a handy guide!

[snip]
> > Maybe let me rephrase:-
> >
> > - We want to read `count` bytes from `addr` into `buf`
> > - We iterate over _used_ blocks, placing the start/end of each block in `rs`, `re`
> > respectively.
> > - If we hit a block whose start address is above the one in which we are interested then:-
> > - Place a zero byte in the buffer
> > - Increment `addr` by 1 byte
> > - Decrement the `count` by 1 byte
> > - Carry on
> >
> > I am seriously confused as to why we do this? Surely we should be checking
> > whether the range [addr, addr + count) overlaps this block at all, and only then
> > copying the relevant region?
>
> I guessed this could be your concern, but not very sure. That
> code block is copied from vread(), and my considerations are:
> 1) We could starting read from any position of kcore file. /proc/kcore
> is a elf file logically, it's allowed to read from anywhere, right? We
> don't have to read the entire file always. So the vmap_block reading is
> not necessarily page aligned. It's very similar with the empty area
> filling in vread().
> 2) memset() is doing the byte by byte reading. We can
> change code as below. While we don't save the effort very much, and we
> need introduce an extra local variable to store the value of
> (start - end).
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b054081aa66b..dce4a843a9e8 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3576,6 +3576,15 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
> + if (addr < start) {
> + int num = min(count, (start - add));
> + memset(buf, 0, count);
> + count -= num;
> + if (count == 0)
> + break;
> + buf -= num;
> + addr -= num;
> + }
> /*it could start reading from the middle of used region*/
> offset = offset_in_page(addr);
> n = ((re - rs + 1) << PAGE_SHIFT) - offset;
>

The difference with vread() is that it uses a while loop rather than an if
clause, so it operates over the whole region byte by byte; your original would
only do this for 1 byte, so now things make a lot more sense!

This approach makes sense, though I'd put the count == 0 check first, and a
nit: 'add' should be 'addr'.

I am happy with either this or a while loop instead of an if, which it seems
is what the original issue was!

> void *memset(void *s, int c, size_t count)
> {
> char *xs = s;
>
> while (count--)
> *xs++ = c;
> return s;
> }
>
> >
> > It's the fact that blocks are at base page granularity but then this condition
> > is at byte granularity that is confusing to me (again it's _very_ possible I am
> > just being dumb here and missing something, just really want to understand this
> > better :)
>
> I like this kind of reviewing with careful checking and deep thinking.
> For above code block, I think it's a very great point. From my point of
> view, I like the memset version better, it's easier to understand. If we
> all agree, we can change it to take memset way. When I made patches,
> several issues related to patches were hovering in my mind at the same
> time, I did not consider this one so deeply.
>

Thanks :) I have a particular interest in vmalloc so am happy to dive in with
reviews here!

> >
> > > > > - vm = va->vm;
> > > > > - vaddr = (char *) vm->addr;
> > > > > - if (addr >= vaddr + get_vm_area_size(vm))
> > > > > + vaddr = (char *) va->va_start;
> > > > > + size = flags ? va_size(va) : get_vm_area_size(vm);
> > > >
> > > > For example here, I feel that this ternary should be reversed and based on
> > > > whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
> > > > set?
> > >
> > > Now only vm_map_ram area sets flags, all other types has vm not null.
> > > Since those temporary state, e.g vm==NULL, flags==0 case has been
> > > filtered out. Is below you suggested?
> > >
> > > size = (!vm&&flags)? va_size(va) : get_vm_area_size(vm);
> > > or
> > > size = (vm&&!flags)? get_vm_area_size(vm):va_size(va);
> > >
> >
> > Sorry I didn't phrase this very well, my point is that the key thing you're
> > relying on here is whether vm exists in order to use it so I simply meant:-
> >
> > size = vm ? get_vm_area_size(vm) : va_size(va);
> >
> > This just makes it really explicit that you need vm to be non-NULL, and you've
> > already done the flags check before so this should suffice.
>
> Sounds reasonable, I will copy above line you pasted. Thanks a lot.
>

Cheers!

2023-01-09 13:05:19

by Baoquan He

Subject: Re: [PATCH v2 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas

On 01/09/23 at 07:12am, Lorenzo Stoakes wrote:
> On Mon, Jan 09, 2023 at 12:35:04PM +0800, Baoquan He wrote:
> > Sorry for late reply, just come back from vacation.
>
> Hope you had a great time! :)

Thanks.

>
> >
> > Lei + mutt sounds like a good idea. I relied too much on mbsync in the
> > past.
> >
>
> Yeah I'm finding it works well,
> https://josefbacik.github.io/kernel/2021/10/18/lei-and-b4.html is a handy guide!

Very helpful, will try.

>
> [snip]
> > > Maybe let me rephrase:-
> > >
> > > - We want to read `count` bytes from `addr` into `buf`
> > > - We iterate over _used_ blocks, placing the start/end of each block in `rs`, `re`
> > > respectively.
> > > - If we hit a block whose start address is above the one in which we are interested then:-
> > > - Place a zero byte in the buffer
> > > - Increment `addr` by 1 byte
> > > - Decrement the `count` by 1 byte
> > > - Carry on
> > >
> > > I am seriously confused as to why we do this? Surely we should be checking
> > > whether the range [addr, addr + count) overlaps this block at all, and only then
> > > copying the relevant region?
> >
> > I guessed this could be your concern, but not very sure. That
> > code block is copied from vread(), and my considerations are:
> > 1) We could starting read from any position of kcore file. /proc/kcore
> > is a elf file logically, it's allowed to read from anywhere, right? We
> > don't have to read the entire file always. So the vmap_block reading is
> > not necessarily page aligned. It's very similar with the empty area
> > filling in vread().
> > 2) memset() is doing the byte by byte reading. We can
> > change code as below. While we don't save the effort very much, and we
> > need introduce an extra local variable to store the value of
> > (start - end).
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b054081aa66b..dce4a843a9e8 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3576,6 +3576,15 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
> > + if (addr < start) {
> > + int num = min(count, (start - add));
> > + memset(buf, 0, count);
> > + count -= num;
> > + if (count == 0)
> > + break;
> > + buf -= num;
> > + addr -= num;
> > + }
> > /*it could start reading from the middle of used region*/
> > offset = offset_in_page(addr);
> > n = ((re - rs + 1) << PAGE_SHIFT) - offset;
> >
>
> The difference with vread() is that uses a while loop rather than an if clause
> so operates over the whole region byte-by-byte, your original would only do this
> for 1 byte so now things make a lot more sense!

Oops, that 'if clause' is a code bug; I only got your point just now, my
dumb head.

>
> This approach makes sense though I'd put the count == 0 check first and nit
> 'add' should be 'addr'.
>
> I am happy with either this or a while loop instead of an if which it seems is
> what the original issue was!

OK, I will think again about which one is more appropriate.

>
> > void *memset(void *s, int c, size_t count)
> > {
> > char *xs = s;
> >
> > while (count--)
> > *xs++ = c;
> > return s;
> > }
> >
> > >
> > > It's the fact that blocks are at base page granularity but then this condition
> > > is at byte granularity that is confusing to me (again it's _very_ possible I am
> > > just being dumb here and missing something, just really want to understand this
> > > better :)
> >
> > I like this kind of reviewing with careful checking and deep thinking.
> > For above code block, I think it's a very great point. From my point of
> > view, I like the memset version better, it's easier to understand. If we
> > all agree, we can change it to take memset way. When I made patches,
> > several issues related to patches were hovering in my mind at the same
> > time, I did not consider this one so deeply.
> >
>
> Thanks :) I have a particular interest in vmalloc so am happy to dive in with
> reviews here!
>
> > >
> > > > > > - vm = va->vm;
> > > > > > - vaddr = (char *) vm->addr;
> > > > > > - if (addr >= vaddr + get_vm_area_size(vm))
> > > > > > + vaddr = (char *) va->va_start;
> > > > > > + size = flags ? va_size(va) : get_vm_area_size(vm);
> > > > >
> > > > > For example here, I feel that this ternary should be reversed and based on
> > > > > whether vm is null, unles we expect vm to ever be non-null _and_ flags to be
> > > > > set?
> > > >
> > > > Now only vm_map_ram area sets flags, all other types has vm not null.
> > > > Since those temporary state, e.g vm==NULL, flags==0 case has been
> > > > filtered out. Is below you suggested?
> > > >
> > > > size = (!vm&&flags)? va_size(va) : get_vm_area_size(vm);
> > > > or
> > > > size = (vm&&!flags)? get_vm_area_size(vm):va_size(va);
> > > >
> > >
> > > Sorry I didn't phrase this very well, my point is that the key thing you're
> > > relying on here is whether vm exists in order to use it so I simply meant:-
> > >
> > > size = vm ? get_vm_area_size(vm) : va_size(va);
> > >
> > > This just makes it really explicit that you need vm to be non-NULL, and you've
> > > already done the flags check before so this should suffice.
> >
> > Sounds reasonable, I will copy above line you pasted. Thanks a lot.

Thanks again for the careful reviewing and the great suggestions and findings.