Various notifications of type "BUG kmalloc-4096 () : Redzone
overwritten" have been observed recently in various parts of
the kernel. After some time, it has been made a relation with
the use of BTRFS filesystem.
[ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten
[ 22.809971] -----------------------------------------------------------------------------
[ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
[ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
[ 22.811193] __slab_alloc.constprop.26+0x44/0x70
[ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec
[ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs]
[ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs]
[ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs]
[ 22.812321] find_free_extent+0x680/0x12a4 [btrfs]
[ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs]
[ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
[ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs]
[ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs]
[ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs]
[ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs]
[ 22.813973] close_ctree+0xf8/0x2a4 [btrfs]
[ 22.814107] generic_shutdown_super+0x80/0x110
[ 22.814250] kill_anon_super+0x18/0x30
[ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs]
[ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
[ 22.814841] proc_cgroup_show+0xc0/0x248
[ 22.814967] proc_single_show+0x54/0x98
[ 22.815086] seq_read+0x278/0x45c
[ 22.815190] __vfs_read+0x28/0x17c
[ 22.815289] vfs_read+0xa8/0x14c
[ 22.815381] ksys_read+0x50/0x94
[ 22.815475] ret_from_syscall+0x0/0x38
Commit 69d2480456d1 ("btrfs: use copy_page for copying pages instead
of memcpy") changed the way bitmap blocks are copied. But allthough
bitmaps have the size of a page, they were allocated with kzalloc().
Most of the time, kzalloc() allocates aligned blocks of memory, so
copy_page() can be used. But when some debug options like SLAB_DEBUG
are activated, kzalloc() may return unaligned pointer.
On powerpc, memcpy(), copy_page() and other copying functions use
'dcbz' instruction which provides an entire zeroed cacheline to avoid
memory read when the intention is to overwrite a full line. Functions
like memcpy() are writen to care about partial cachelines at the start
and end of the destination, but copy_page() assumes it gets pages. As
pages are naturally cache aligned, copy_page() doesn't care about
partial lines. This means that when copy_page() is called with a
misaligned pointer, a few leading bytes are zeroed.
To fix it, allocate bitmaps with get_zeroed_page() instead of kzalloc()
Reported-by: Erhard F. <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204371
Fixes: 69d2480456d1 ("btrfs: use copy_page for copying pages instead of memcpy")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Tested-by: Erhard F. <[email protected]>
---
fs/btrfs/free-space-cache.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 062be9dde4c6..3229a058e025 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -764,7 +764,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
} else {
ASSERT(num_bitmaps);
num_bitmaps--;
- e->bitmap = kzalloc(PAGE_SIZE, GFP_NOFS);
+ e->bitmap = (void *)get_zeroed_page(GFP_NOFS);
if (!e->bitmap) {
kmem_cache_free(
btrfs_free_space_cachep, e);
@@ -1881,7 +1881,7 @@ static void free_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *bitmap_info)
{
unlink_free_space(ctl, bitmap_info);
- kfree(bitmap_info->bitmap);
+ free_page((unsigned long)bitmap_info->bitmap);
kmem_cache_free(btrfs_free_space_cachep, bitmap_info);
ctl->total_bitmaps--;
ctl->op->recalc_thresholds(ctl);
@@ -2135,7 +2135,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
}
/* allocate the bitmap */
- info->bitmap = kzalloc(PAGE_SIZE, GFP_NOFS);
+ info->bitmap = (void *)get_zeroed_page(GFP_NOFS);
spin_lock(&ctl->tree_lock);
if (!info->bitmap) {
ret = -ENOMEM;
@@ -2146,7 +2146,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
out:
if (info) {
- kfree(info->bitmap);
+ free_page((unsigned long)info->bitmap);
kmem_cache_free(btrfs_free_space_cachep, info);
}
@@ -2802,7 +2802,7 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
if (entry->bytes == 0) {
ctl->free_extents--;
if (entry->bitmap) {
- kfree(entry->bitmap);
+ free_page((unsigned long)entry->bitmap);
ctl->total_bitmaps--;
ctl->op->recalc_thresholds(ctl);
}
@@ -3606,7 +3606,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache,
}
if (!map) {
- map = kzalloc(PAGE_SIZE, GFP_NOFS);
+ map = (void *)get_zeroed_page(GFP_NOFS);
if (!map) {
kmem_cache_free(btrfs_free_space_cachep, info);
return -ENOMEM;
@@ -3635,7 +3635,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache,
if (info)
kmem_cache_free(btrfs_free_space_cachep, info);
- kfree(map);
+ free_page((unsigned long)map);
return 0;
}
--
2.17.1
On Sat, Aug 17, 2019 at 07:44:39AM +0000, Christophe Leroy wrote:
> Various notifications of type "BUG kmalloc-4096 () : Redzone
> overwritten" have been observed recently in various parts of
> the kernel. After some time, it has been made a relation with
> the use of BTRFS filesystem.
>
> [ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten
> [ 22.809971] -----------------------------------------------------------------------------
>
> [ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
> [ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
> [ 22.811193] __slab_alloc.constprop.26+0x44/0x70
> [ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec
> [ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs]
> [ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs]
> [ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs]
> [ 22.812321] find_free_extent+0x680/0x12a4 [btrfs]
> [ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs]
> [ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
> [ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs]
> [ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs]
> [ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs]
> [ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs]
> [ 22.813973] close_ctree+0xf8/0x2a4 [btrfs]
> [ 22.814107] generic_shutdown_super+0x80/0x110
> [ 22.814250] kill_anon_super+0x18/0x30
> [ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs]
> [ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
> [ 22.814841] proc_cgroup_show+0xc0/0x248
> [ 22.814967] proc_single_show+0x54/0x98
> [ 22.815086] seq_read+0x278/0x45c
> [ 22.815190] __vfs_read+0x28/0x17c
> [ 22.815289] vfs_read+0xa8/0x14c
> [ 22.815381] ksys_read+0x50/0x94
> [ 22.815475] ret_from_syscall+0x0/0x38
>
> Commit 69d2480456d1 ("btrfs: use copy_page for copying pages instead
> of memcpy") changed the way bitmap blocks are copied. But allthough
> bitmaps have the size of a page, they were allocated with kzalloc().
>
> Most of the time, kzalloc() allocates aligned blocks of memory, so
> copy_page() can be used. But when some debug options like SLAB_DEBUG
> are activated, kzalloc() may return unaligned pointer.
>
> On powerpc, memcpy(), copy_page() and other copying functions use
> 'dcbz' instruction which provides an entire zeroed cacheline to avoid
> memory read when the intention is to overwrite a full line. Functions
> like memcpy() are writen to care about partial cachelines at the start
> and end of the destination, but copy_page() assumes it gets pages.
This assumption is not documented nor any pitfalls mentioned in
include/asm-generic/page.h that provides the generic implementation. I
as an API user cannot check each arch implementation for additional
constraints or I would expect that it deals with the boundary cases the
same way as arch-specific memcpy implementations.
Another thing that is lost is the slub debugging support for all
architectures, because get_zeroed_pages lacking the red zones and sanity
checks.
I find working with raw pages in this code a bit inconsistent with the
rest of btrfs code, but that's rather minor compared to the above.
Summing it up, I think that the proper fix should go to copy_page
implementation on architectures that require it or make it clear what
are the copy_page constraints.
Le 19/08/2019 à 19:46, David Sterba a écrit :
> On Sat, Aug 17, 2019 at 07:44:39AM +0000, Christophe Leroy wrote:
>> Various notifications of type "BUG kmalloc-4096 () : Redzone
>> overwritten" have been observed recently in various parts of
>> the kernel. After some time, it has been made a relation with
>> the use of BTRFS filesystem.
>>
>> [ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten
>> [ 22.809971] -----------------------------------------------------------------------------
>>
>> [ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
>> [ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
>> [ 22.811193] __slab_alloc.constprop.26+0x44/0x70
>> [ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec
>> [ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs]
>> [ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs]
>> [ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs]
>> [ 22.812321] find_free_extent+0x680/0x12a4 [btrfs]
>> [ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs]
>> [ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
>> [ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs]
>> [ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs]
>> [ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs]
>> [ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs]
>> [ 22.813973] close_ctree+0xf8/0x2a4 [btrfs]
>> [ 22.814107] generic_shutdown_super+0x80/0x110
>> [ 22.814250] kill_anon_super+0x18/0x30
>> [ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs]
>> [ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
>> [ 22.814841] proc_cgroup_show+0xc0/0x248
>> [ 22.814967] proc_single_show+0x54/0x98
>> [ 22.815086] seq_read+0x278/0x45c
>> [ 22.815190] __vfs_read+0x28/0x17c
>> [ 22.815289] vfs_read+0xa8/0x14c
>> [ 22.815381] ksys_read+0x50/0x94
>> [ 22.815475] ret_from_syscall+0x0/0x38
>>
>> Commit 69d2480456d1 ("btrfs: use copy_page for copying pages instead
>> of memcpy") changed the way bitmap blocks are copied. But allthough
>> bitmaps have the size of a page, they were allocated with kzalloc().
>>
>> Most of the time, kzalloc() allocates aligned blocks of memory, so
>> copy_page() can be used. But when some debug options like SLAB_DEBUG
>> are activated, kzalloc() may return unaligned pointer.
>>
>> On powerpc, memcpy(), copy_page() and other copying functions use
>> 'dcbz' instruction which provides an entire zeroed cacheline to avoid
>> memory read when the intention is to overwrite a full line. Functions
>> like memcpy() are writen to care about partial cachelines at the start
>> and end of the destination, but copy_page() assumes it gets pages.
>
> This assumption is not documented nor any pitfalls mentioned in
> include/asm-generic/page.h that provides the generic implementation. I
> as an API user cannot check each arch implementation for additional
> constraints or I would expect that it deals with the boundary cases the
> same way as arch-specific memcpy implementations.
For me, copy_page() is there to ... copy pages. Not to copy any piece of
RAM having the size of a page.
But it happened to others. See commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e
>
> Another thing that is lost is the slub debugging support for all
> architectures, because get_zeroed_pages lacking the red zones and sanity
> checks.
>
> I find working with raw pages in this code a bit inconsistent with the
> rest of btrfs code, but that's rather minor compared to the above.
What about using kmem_cache instead ? I see kmem_cache is already widely
used in BTRFS, so using it also for block of memory of size PAGE_SIZE
should be ok ?
AFAICS, kmem_cache has the red zones and sanity checks.
>
> Summing it up, I think that the proper fix should go to copy_page
> implementation on architectures that require it or make it clear what
> are the copy_page constraints.
>
I guess anybody using copy_page() to copy something else than a page is
on his/her own.
But following that (bad) experience, I propose a patch to at least
detect it early, see https://patchwork.ozlabs.org/patch/1148033/
Christophe
On Mon, Aug 19, 2019 at 07:46:00PM +0200, David Sterba wrote:
> Another thing that is lost is the slub debugging support for all
> architectures, because get_zeroed_pages lacking the red zones and sanity
> checks.
>
> I find working with raw pages in this code a bit inconsistent with the
> rest of btrfs code, but that's rather minor compared to the above.
>
> Summing it up, I think that the proper fix should go to copy_page
> implementation on architectures that require it or make it clear what
> are the copy_page constraints.
The whole point of copy_page is to copy exactly one page and it makes
sense to assume that is aligned. A sane memcpy would use the same
underlying primitives as well after checking they fit. So I think the
prime issue here is btrfs' use of copy_page instead of memcpy. The
secondary issue is slub fucking up alignments for no good reason. We
just got bitten by that crap again in XFS as well :(
On 8/20/19 4:30 AM, Christoph Hellwig wrote:
> On Mon, Aug 19, 2019 at 07:46:00PM +0200, David Sterba wrote:
>> Another thing that is lost is the slub debugging support for all
>> architectures, because get_zeroed_pages lacking the red zones and sanity
>> checks.
>>
>> I find working with raw pages in this code a bit inconsistent with the
>> rest of btrfs code, but that's rather minor compared to the above.
>>
>> Summing it up, I think that the proper fix should go to copy_page
>> implementation on architectures that require it or make it clear what
>> are the copy_page constraints.
>
> The whole point of copy_page is to copy exactly one page and it makes
> sense to assume that is aligned. A sane memcpy would use the same
> underlying primitives as well after checking they fit. So I think the
> prime issue here is btrfs' use of copy_page instead of memcpy. The
> secondary issue is slub fucking up alignments for no good reason. We
> just got bitten by that crap again in XFS as well :(
Meh, I should finally get back to https://lwn.net/Articles/787740/ right
On Tue, Aug 20, 2019 at 01:06:25PM +0200, Vlastimil Babka wrote:
> > The whole point of copy_page is to copy exactly one page and it makes
> > sense to assume that is aligned. A sane memcpy would use the same
> > underlying primitives as well after checking they fit. So I think the
> > prime issue here is btrfs' use of copy_page instead of memcpy. The
> > secondary issue is slub fucking up alignments for no good reason. We
> > just got bitten by that crap again in XFS as well :(
>
> Meh, I should finally get back to https://lwn.net/Articles/787740/ right
Yes. For now Dave came up with an idea for a workaround that will
be forward-compatible with that:
https://www.spinics.net/lists/linux-xfs/msg30521.html