2001-03-04 22:17:49

by Kenn Humborg

Subject: kmalloc() alignment


Does kmalloc() make any guarantees of the alignment of allocated
blocks? Will the returned block always be 4-, 8- or 16-byte
aligned, for example?

Later,
Kenn


2001-03-04 22:32:20

by Alan

Subject: Re: kmalloc() alignment

> Does kmalloc() make any guarantees of the alignment of allocated
> blocks? Will the returned block always be 4-, 8- or 16-byte
> aligned, for example?

There are people who assume 16-byte alignment guarantees. I don't think anyone
has formally specified the guarantee beyond 4 bytes, though.

2001-03-04 22:41:42

by Manfred Spraul

Subject: Re: kmalloc() alignment

>
> Does kmalloc() make any guarantees of the alignment of allocated
> blocks? Will the returned block always be 4-, 8- or 16-byte
> aligned, for example?
>

4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
64-bit cpus.

--
Manfred

2001-03-05 09:40:40

by Rogier Wolff

Subject: Re: kmalloc() alignment

Alan Cox wrote:
> > Does kmalloc() make any guarantees of the alignment of allocated
> > blocks? Will the returned block always be 4-, 8- or 16-byte
> > aligned, for example?

> There are people who assume 16-byte alignment guarantees. I don't
> think anyone has formally specified the guarantee beyond 4 bytes, though.

What does "formally specified" mean?

As far as I know, you can count on 16-byte alignment from
kmalloc. The trouble is that you would have to keep the original
pointer and free that if you have to do the rounding yourself.

I once wrote a kmalloc(*) that would allow you to free any pointer
inside the kmalloc-ed area. This is dangerous as freeing a random
pointer is more likely to "work". But in this case it would be very
convenient.

Roger.

(*) Too buggy for anyone but me.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2001-03-05 13:22:35

by Alan

Subject: Re: kmalloc() alignment

> As far as I know, you can count on 16-byte alignment from
> kmalloc. The trouble is that you would have to keep the original

Actually it depends on the debug settings


2001-03-05 13:22:55

by Rogier Wolff

Subject: Re: kmalloc() alignment

Alan Cox wrote:
> > As far as I know, you can count on 16-byte alignment from
> > kmalloc. The trouble is that you would have to keep the original
>
> Actually it depends on the debug settings

Actually THAT's a bug in the debug stuff....

Roger.


--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2001-03-06 00:07:33

by Kenn Humborg

Subject: Re: kmalloc() alignment

On Sun, Mar 04, 2001 at 11:41:12PM +0100, Manfred Spraul wrote:
> >
> > Does kmalloc() make any guarantees of the alignment of allocated
> > blocks? Will the returned block always be 4-, 8- or 16-byte
> > aligned, for example?
> >
>
> 4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
> 64-bit cpus.

So, to summarise (for 32-bit CPUs):

o Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.

o If you need larger alignment, you need to alloc a larger space,
round as necessary, and keep the original pointer for kfree()
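
The over-allocate-and-round approach in the second point can be sketched roughly as below. This is a userspace illustration only: malloc() and free() stand in for kmalloc() and kfree(), and the struct and function names are hypothetical, not anything from the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps both pointers so the raw one can be freed. */
struct aligned_buf {
    void *raw;      /* what the allocator returned; this is what gets freed */
    void *aligned;  /* rounded-up address inside the raw block; use this one */
};

/*
 * Allocate size bytes aligned to align (a power of two).
 * malloc() is an assumption standing in for kmalloc().
 */
static int aligned_buf_alloc(struct aligned_buf *b, size_t size, size_t align)
{
    uintptr_t addr;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate by align - 1 so an aligned address always fits. */
    b->raw = malloc(size + align - 1);
    if (!b->raw)
        return -1;

    /* Round the raw address up to the next multiple of align. */
    addr = ((uintptr_t)b->raw + align - 1) & ~(uintptr_t)(align - 1);
    b->aligned = (void *)addr;
    return 0;
}

static void aligned_buf_free(struct aligned_buf *b)
{
    free(b->raw);   /* never free b->aligned */
    b->raw = NULL;
    b->aligned = NULL;
}
```

The cost is up to align - 1 wasted bytes per allocation, plus the bookkeeping for the second pointer.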

Maybe I'll just use get_free_pages, since it's a 64KB chunk that
I need (and it's only a once-off).
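
For reference, the page allocator takes an order (a power-of-two page count) rather than a byte size. The sketch below shows the arithmetic for a 64KB request, assuming 4KB pages; SKETCH_PAGE_SIZE and order_for_size() are hypothetical names for illustration. In the kernel, the resulting order would be passed to __get_free_pages() and later to free_pages().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KB pages, as on i386 */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(size != 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under the 4KB-page assumption, a 64KB chunk is 16 pages, so order_for_size(64 * 1024) gives 4.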

Thanks for your advice.

Later,
Kenn

2001-03-06 00:16:23

by H. Peter Anvin

Subject: Re: kmalloc() alignment

Followup to: <[email protected]>
By author: Kenn Humborg <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Sun, Mar 04, 2001 at 11:41:12PM +0100, Manfred Spraul wrote:
> > >
> > > Does kmalloc() make any guarantees of the alignment of allocated
> > > blocks? Will the returned block always be 4-, 8- or 16-byte
> > > aligned, for example?
> > >
> >
> > 4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
> > 64-bit cpus.
>
> So, to summarise (for 32-bit CPUs):
>
> o Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.
>
> o If you need larger alignment, you need to alloc a larger space,
> round as necessary, and keep the original pointer for kfree()
>
> Maybe I'll just use get_free_pages, since it's a 64KB chunk that
> I need (and it's only a once-off).
>

It might be worth asking the question if larger blocks are more
aligned?

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-03-06 00:29:44

by Kenn Humborg

Subject: Re: kmalloc() alignment

On Mon, Mar 05, 2001 at 04:15:36PM -0800, H. Peter Anvin wrote:
> > So, to summarise (for 32-bit CPUs):
> >
> > o Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.
> >
> > o If you need larger alignment, you need to alloc a larger space,
> > round as necessary, and keep the original pointer for kfree()
> >
> > Maybe I'll just use get_free_pages, since it's a 64KB chunk that
> > I need (and it's only a once-off).
> >
>
> It might be worth asking the question if larger blocks are more
> aligned?

OK, I'll bite...

Are larger blocks more aligned?

Later,
Kenn

2001-03-06 02:12:11

by Alan

Subject: Re: kmalloc() alignment

> > It might be worth asking the question if larger blocks are more
> > aligned?
>
> OK, I'll bite...
> Are larger blocks more aligned?

Only get_free_page()

Alan

2001-03-06 05:05:32

by H. Peter Anvin

Subject: Re: kmalloc() alignment

Alan Cox wrote:
>
> > > It might be worth asking the question if larger blocks are more
> > > aligned?
> >
> > OK, I'll bite...
> > Are larger blocks more aligned?
>
> Only get_free_page()
>

I wonder if it would be practical/reasonable to guarantee better
alignment for larger allocations (at least for sizes that are powers of
two); especially 8- and 16-byte alignment is sometimes necessary.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-03-06 08:31:36

by Rogier Wolff

Subject: Re: kmalloc() alignment


> Followup to: <[email protected]>
> By author: Kenn Humborg <[email protected]>
> In newsgroup: linux.dev.kernel
> >
> > On Sun, Mar 04, 2001 at 11:41:12PM +0100, Manfred Spraul wrote:
> > > >
> > > > Does kmalloc() make any guarantees of the alignment of allocated
> > > > blocks? Will the returned block always be 4-, 8- or 16-byte
> > > > aligned, for example?
> > > >
> > >
> > > 4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
> > > 64-bit cpus.
> >
> > So, to summarise (for 32-bit CPUs):
> >
> > o Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.
> >
> > o If you need larger alignment, you need to alloc a larger space,
> > round as necessary, and keep the original pointer for kfree()
> >
> > Maybe I'll just use get_free_pages, since it's a 64KB chunk that
> > I need (and it's only a once-off).

My old kmalloc would actually use n+10 bytes if you request n bytes.
As memory comes in pools of powers of two, if you request 64k, you
would actually use 128k of memory. If you use "get_free_pages", you'll
not have the overhead, and actually allocate the 64k you need.
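
The arithmetic behind the 64k-becomes-128k claim can be sketched as follows. The 10-byte overhead and the power-of-two pools come from the description above; the function names are hypothetical, and this is not the code of that old kmalloc.

```c
#include <assert.h>
#include <stddef.h>

#define OLD_KMALLOC_OVERHEAD 10  /* per-block overhead quoted in the message */

/* Round n up to the next power of two (n itself if already one). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    assert(n != 0 && n <= ((size_t)1 << (sizeof(size_t) * 8 - 1)));
    while (p < n)
        p <<= 1;
    return p;
}

/* Pool block size actually consumed for a request of the given size. */
static size_t pool_block_size(size_t request)
{
    return round_up_pow2(request + OLD_KMALLOC_OVERHEAD);
}
```

A request of 65536 bytes needs 65546 bytes with the overhead, which no longer fits the 64k pool and so lands in the 128k one: pool_block_size(65536) is 131072.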

I'm not sure what the slab stuff does...

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2001-03-06 09:00:21

by Philipp Rumpf

Subject: Re: kmalloc() alignment

On Sun, Mar 04, 2001 at 10:34:31PM +0000, Alan Cox wrote:
> > Does kmalloc() make any guarantees of the alignment of allocated
> > blocks? Will the returned block always be 4-, 8- or 16-byte
> > aligned, for example?
>
> There are people who assume 16-byte alignment guarantees. I don't think anyone
> has formally specified the guarantee beyond 4 bytes, though.

Userspace malloc is "suitably aligned for any kind of variable", so I think
expecting 8-byte alignment (long long on 32-bit platforms) should be okay.
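
A quick userspace check of that malloc property might look like the sketch below. It assumes a C11 compiler for _Alignof, and is_aligned() is a hypothetical helper; it says nothing about what kmalloc() itself guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Returns 1 if a fresh malloc() block is aligned for long long, else 0. */
static int malloc_suits_long_long(void)
{
    void *p = malloc(64);
    int ok;

    if (!p)
        return 0;
    ok = is_aligned(p, _Alignof(long long));
    free(p);
    return ok;
}
```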

From reading the code it seems as though we actually use L1_CACHE_BYTES,
and I think it might be a good idea to document the current behaviour (as
long as there's no good reason to change it ?)

diff -ur linux/mm/slab.c linux-prumpf/mm/slab.c
--- linux/mm/slab.c Tue Mar 6 00:54:38 2001
+++ linux-prumpf/mm/slab.c Tue Mar 6 01:00:47 2001
@@ -1525,9 +1525,10 @@
  * @flags: the type of memory to allocate.
  *
  * kmalloc is the normal method of allocating memory
- * in the kernel. Note that the @size parameter must be less than or
- * equals to %KMALLOC_MAXSIZE and the caller must ensure this. The @flags
- * argument may be one of:
+ * in the kernel. It returns a pointer (aligned to a hardware cache line
+ * boundary) to the allocated memory, or %NULL in case of failure. Note that
+ * the @size parameter must be less than or equal to %KMALLOC_MAXSIZE and
+ * the caller must ensure this. The @flags argument may be one of:
  *
  * %GFP_USER - Allocate memory on behalf of user. May sleep.
  *

2001-03-06 12:12:01

by Xavier Bestel

Subject: Re: kmalloc() alignment

On 06 Mar 2001 09:31:01 +0100, Rogier Wolff wrote:
>
> > Followup to: <[email protected]>
> > By author: Kenn Humborg <[email protected]>
> > In newsgroup: linux.dev.kernel
> > >
> > > On Sun, Mar 04, 2001 at 11:41:12PM +0100, Manfred Spraul wrote:
> > > > >
> > > > > Does kmalloc() make any guarantees of the alignment of allocated
> > > > > blocks? Will the returned block always be 4-, 8- or 16-byte
> > > > > aligned, for example?
> > > > >
> > > >
> > > > 4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
> > > > 64-bit cpus.
> > >
> > > So, to summarise (for 32-bit CPUs):
> > >
> > > o Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.
> > >
> > > o If you need larger alignment, you need to alloc a larger space,
> > > round as necessary, and keep the original pointer for kfree()
> > >
> > > Maybe I'll just use get_free_pages, since it's a 64KB chunk that
> > > I need (and it's only a once-off).
>
> My old kmalloc would actually use n+10 bytes if you request n bytes.
> As memory comes in pools of powers of two, if you request 64k, you
> would actually use 128k of memory. If you use "get_free_pages", you'll
> not have the overhead, and actually allocate the 64k you need.
>
> I'm not sure what the slab stuff does...

A properly initialised (i.e. default settings) 64k slab would put object
descriptors outside the slab itself, and so use the expected number of
pages for each 64k object, I believe.
Small or non n*512 sized objects are a different story.

Xav

2001-03-06 12:12:01

by Alan

Subject: Re: kmalloc() alignment

> > There are people who assume 16-byte alignment guarantees. I don't think anyone
> > has formally specified the guarantee beyond 4 bytes, though.
>
> Userspace malloc is "suitably aligned for any kind of variable", so I think
> expecting 8-byte alignment (long long on 32-bit platforms) should be okay.
>
> From reading the code it seems as though we actually use L1_CACHE_BYTES,
> and I think it might be a good idea to document the current behaviour (as
> long as there's no good reason to change it ?)

With slab poisoning I don't believe this is true.

2001-03-07 07:54:15

by Jauder Ho

Subject: RAID, 2.4.2 and Buslogic


Leonard,

My story is somewhat similar to what Dick Johnson has encountered except
this is with 2.4.2 running on a pentium 200.

I encountered an oops last night while untarring a file. Upon reboot, it
appears that the partition labels disappeared along with the superblock.
Unfortunately, I was not able to recover and had to redo the setup from
scratch.


Here is the lspci output

deepthought%jauderho% lspci
00:00.0 Host bridge: Intel Corporation 430TX - 82439TX MTXC (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 01)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 02)
00:0b.0 VGA compatible controller: ATI Technologies Inc 210888GX [Mach64 GX] (rev 01)
00:0d.0 Ethernet controller: Accton Technology Corporation SMC2-1211TX (rev 10)
00:0f.0 SCSI storage controller: BusLogic BT-946C (BA80C30) [MultiMaster 10] (rev 08)



Unfortunately, the System.map was deleted during a compile but attached is
the dmesg output.

EXT2-fs error (device md(9,0)): ext2_add_entry: bad entry in directory #343396: inode out of bounds - offset=0, inode=343396, rec_len=12, name_len=1
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 12
EXT2-fs error (device md(9,0)): free_inode: reserved inode or nonexistent inode
kernel BUG at inode.c:885!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01425ba>]
EFLAGS: 00010292
eax: 0000001b ebx: c2af8ba0 ecx: c373c000 edx: 00000001
esi: c023b9e0 edi: c38bd017 ebp: c3c285e0 esp: c1877f24
ds: 0018 es: 0018 ss: 0018
Process tar (pid: 5383, stackpage=c1877000)
Stack: c01fd7e5 c01fd865 00000375 c2af8ba0 c13b8f40 c014fa07 c2af8ba0 ffffffff
       000001fd c3c285e0 c3c28650 c13b8f40 00000007 c3932560 c0138ef7 fffffffe
       c013a773 c3c285e0 c13b8ee0 000001fd c13b8ee0 c1877fa4 c13b8ee0 c1e5c000
Call Trace: [<c014fa07>] [<c0138ef7>] [<c013a773>] [<c013a816>] [<c0108de3>]

Code: 0f 0b 83 c4 0c eb 6f 39 1b 74 3b f6 83 ec 00 00 00 07 75 26
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 474218
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 474219
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 474216

...

EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 1062908
EXT2-fs error (device md(9,0)): ext2_readdir: bad entry in directory #310689: inode out of bounds - offset=0, inode=310689, rec_len=12, name_len=1
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 228935
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212584
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212583
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212586
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212588
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212589
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212587
EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 212585
EXT2-fs error (device md(9,0)): ext2_find_entry: bad entry in directory #883010: inode out of bounds - offset=60, inode=245344, rec_len=4036, name_len=16



--Jauder







PS. Is there a minimum processor speed requirement to do RAID? I know the
pentium 200 is pretty wimpy but if this is the failure mode it was
certainly unexpected.



2001-03-07 08:23:26

by Andreas Dilger

Subject: Re: RAID, 2.4.2 and Buslogic

Jauder Ho writes:
> My story is somewhat similar to what Dick Johnson has encountered except
> this is with 2.4.2 running on a pentium 200.
>
> EXT2-fs error (device md(9,0)): ext2_add_entry: bad entry in directory #343396: inode out of bounds - offset=0, inode=343396, rec_len=12, name_len=1
> EXT2-fs error (device md(9,0)): ext2_write_inode: bad inode number: 12
>
> EXT2-fs error (device md(9,0)): free_inode: reserved inode or nonexistent inode
> kernel BUG at inode.c:885!

Inode 12 is a perfectly valid inode number for any filesystem, so your
ext2 superblock must have been corrupt (or zeroed out) at this point.
The value for sb->u.ext2_sb.s_es->s_inodes_count must have been < 12
(likely zero), which would explain all of these errors. Strange.
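
The reasoning above can be illustrated with a simplified range check. This is a sketch loosely modelled on the kind of test ext2 applies, not the kernel's code: the names are hypothetical, and the constants (root inode 2, first non-reserved inode 11) are assumptions based on the original ext2 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ROOT_INO   2   /* assumption: ext2 root directory inode */
#define SKETCH_FIRST_INO  11  /* assumption: first non-reserved inode */

/* True if ino is usable given the superblock's total inode count. */
static bool inode_number_ok(uint32_t ino, uint32_t inodes_count)
{
    /* Reserved inodes (other than the root) are never valid here. */
    if (ino != SKETCH_ROOT_INO && ino < SKETCH_FIRST_INO)
        return false;
    /* Anything past the count recorded in the superblock is out of range. */
    return ino <= inodes_count;
}

/* The case from the report: inode 12 checked against a zeroed count. */
static void inode_check_example(void)
{
    assert(!inode_number_ok(12, 0));       /* corrupt superblock: rejected */
    assert(inode_number_ok(12, 100000));   /* sane superblock: accepted */
}
```

With a count of zero, every inode number fails the second test, which matches the flood of "bad inode number" messages in the log.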

I have posted (twice) a patch which would prevent the BUG from happening.
Granted, it won't help your RAID/SCSI corruption problem (*). Please see

[PATCH] sanity checks for ext2 root inode

in l-k archives. I don't think this is in either Linus' or Alan's tree.

Cheers, Andreas

(*) in normal cases this prevents a small filesystem corruption from
halting the system, but in your case, the BUG may have prevented
larger corruption by halting the system before more damage was done?
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert