The stable util-linux-ng 2.17 release is available at
ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.17/
Feedback and bug reports, as always, are welcomed.
Karel
Util-linux-ng 2.17 Release Notes (08-Jan-2010)
==============================================
Release highlights
------------------
fallocate:
- this NEW COMMAND is a command line interface to fallocate
Linux syscall and allows to preallocate blocks to a file.
unshare
- this NEW COMMAND is a command line interface to unshare Linux syscall
and allows to run program with some namespaces unshared from parent.
wipefs
- this NEW COMMAND is based on libblkid and allows to remove filesystem
or RAID signatures from a device.
libblkid:
- libblkid allows to gather information about block device topology,
currently supported methods are:
* ioctl - supported since kernel 2.6.32
* sysfs - supported since kernel 2.6.31
* fallback for DM, MD, LVM and EVMS on old kernels (based on code
from xfsprogs/libdisk)
The topology support is mostly designed for mkfs programs or partitioning
tools (already used in mkfs.xfs, mkex2fs, libparted and fdisk)
- libblkid supports partition tables parsing (currently supported are
aix, bsd, dos, mac, gpt, minix, sgi, solaris, sun and unixware). This
functionality is designed for mkfs programs, DeviceKits, [k]partx or so.
- libblkid API documentation is available at
http://ftp.kernel.org/pub/linux/utils/util-linux-ng/libblkid-docs/
blockdev:
- supports all new topology ioctls
fdisk:
- the fdisk command aligns newly created partitions to minimum_io_size
boundary ("minimum_io_size" is physical sector size or stripe chunk
size on RAIDs).
- the fdisk command supports disks with alignment_offset now.
Stable maintenance releases between v2.16 and v2.17
---------------------------------------------------
util-linux-ng 2.16.1 [07-Sep-2009]
* ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.16/v2.16.1-ReleaseNotes
ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.16/v2.16.1-ChangeLog
util-linux-ng 2.16.2 [30-Nov-2009]
* ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.16/v2.16.2-ReleaseNotes
ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.16/v2.16.2-ChangeLog
ChangeLog between v2.16 and v2.17
---------------------------------
For more details see ChangeLog files at:
ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/v2.17/
addpart:
- addpart.8 formatting [Peter Breitenlohner]
blkid:
- add ID_FS_AMBIVALENT for udev output [Karel Zak]
- add pretty output, document -L incompatibility with e2fsprogs [Karel Zak]
- allow to use -s <TAG> for low-level probing (-p mode) [Karel Zak]
blockdev:
- add support for uint and ushort ioctls [Karel Zak]
- add topology ioctls support [Karel Zak]
- blockdev.8 formatting [Peter Breitenlohner]
- fix topology ioctls [Karel Zak]
- refactoring (better commands definition) [Karel Zak]
- use c.h [Karel Zak]
build-sys:
- add --disable-makeinstall-setuid [Karel Zak]
- add HAVE_LIBBLKID_INTERNAL [Karel Zak]
- check for pkg-config before gtk-doc [Karel Zak]
- check for union semun instead of using _SEM_SEMUN_UNDEFINED [Guillem Jover]
- clean up gtk-doc stuff [Karel Zak]
- clean up gtk-doc usage [Karel Zak]
- cleanup --disable-{fallocate,pivot_root,unshare} [Karel Zak]
- cleanup AM_CFLAGS usage [Karel Zak]
- cleanup static building [Karel Zak]
- detect if const is available [Guillem Jover]
- detect if volatile is available [Guillem Jover]
- don't distribute generated *.pc files [Karel Zak]
- don't distribute generated blkid.h [Karel Zak]
- enable silent rules if automake >= 1.11 [Guillem Jover]
- fix (official) gtk-doc.make [Karel Zak]
- fix BUILD_PIVOT_ROOT condition [Karel Zak]
- fix blkid CFLAGS in fdisk/Makefile.am [Karel Zak]
- fix out-of-source build [Karel Zak]
- release++ (v2.17-rc1) [Karel Zak]
- release++ (v2.17-rc2) [Karel Zak]
- release++ (v2.17-rc3) [Karel Zak]
- remove LT_STATIC_LDFLAGS [Karel Zak]
- remove gtkdocize from autogen.sh [Karel Zak]
- remove obsolete --with-fsprobe from distcheck flags [Karel Zak]
- rewrite TLS detection [Karel Zak]
cal:
- cal.1 formatting [Peter Breitenlohner]
- fix (harmless) typo [Peter Breitenlohner]
- fix broken computation for Sep 1752 [Peter Breitenlohner]
- remove obsolete <localeinfo.h> include [Guillem Jover]
- use c.h [Karel Zak]
cfdisk:
- cfdisk.8 formatting [Peter Breitenlohner]
- more key alternatives [Jan Sarenik]
chfn:
- chfn.1 formatting [Peter Breitenlohner]
chrt:
- use c.h [Karel Zak]
chsh:
- chsh.1 formatting [Peter Breitenlohner]
ctrlaltdel:
- ctrlaltdel.8 formatting [Peter Breitenlohner]
cytune:
- cytune.8 missing description of `-S', formatting [Peter Breitenlohner]
ddate:
- ddate.1 formatting [Peter Breitenlohner]
delpart:
- delpart.8 formatting [Peter Breitenlohner]
dmesg:
- add -r to help output [Karel Zak]
- dmesg.1 formatting [Peter Breitenlohner]
- fix typo in man page [Ken Kopin]
docs:
- README width and language correction [Jan Sarenik]
- add 'unshare' and 'wipefs' to AUTHORS [Karel Zak]
- add LGPLv2+ to list of licenses [Karel Zak]
- add ngettext() into TODO file [Karel Zak]
- add v2.17 ReleaseNotes [Karel Zak]
- update "The Perfect Patch" URL [Karel Zak]
- update AUTHORS file [Karel Zak]
- update TODO [Karel Zak]
- update TODO file [Karel Zak]
- update TODO list [Karel Zak]
- update v2.17 ReleaseNotes [Karel Zak]
elvtune:
- elvtune.8 formatting [Peter Breitenlohner]
fallocate:
- check for ERANGE errors [Karel Zak]
- new command [Karel Zak, Eric Sandeen]
fdformat:
- fdformat.8 formatting [Peter Breitenlohner]
- fix memory leak in verify_disk() [Cristian Rodríguez]
fdisk:
- add basic routines for LBA alignment [Karel Zak]
- add regression test listing empty/nonsense images [Zdenek Behan]
- align end of partition when defined by +size{K,M,G} [Karel Zak]
- check for partition boundary [Karel Zak]
- fdisk.8 formatting [Peter Breitenlohner]
- fix strict-aliasing bugs [Karel Zak]
- offer aligned first sector [Karel Zak]
- print info and recommendations about alignment [Karel Zak]
- read topology info from libblkid [Karel Zak]
- sgi label - remove duplicate swab16swab[16,32]() definitions [Karel Zak]
- sleep-after-sync and fsync usage [Karel Zak]
- use c.h [Karel Zak]
- use minimal_io_size for the first partition [Karel Zak]
findfs:
- fix typo in findfs.8 [Karel Zak]
flock:
- fix hang when parent ignores SIGCHLD [Mike Frysinger]
fsck:
- document fsck behavior wrt nofail option and fstype 'auto' [Ludwig Nussel]
- fsck.8 formatting [Peter Breitenlohner]
- honor nofail option in fsck [Ludwig Nussel]
fsck.minix:
- fix broken zone checking [Karel Zak]
- fix strict-aliasing bugs [Karel Zak]
- fsck.minix.8 formatting [Peter Breitenlohner]
fstab:
- fstab.5 formatting [Peter Breitenlohner]
getopt:
- getopt.1 formatting [Peter Breitenlohner]
hexdump:
- bug in hexdump when offset == file length [Américo Wang]
- hexdump.1 erroneous .Nm "" [Peter Breitenlohner]
hwclock:
- do not access hardware clock when using --systz [Scott James Remnant]
- hwclock.8 formatting [Peter Breitenlohner]
- set kernel timezone with --systz --utc [Scott James Remnant]
- use c.h [Karel Zak]
- use time limit for KDGHWCLK busy wait [Karel Zak]
include:
- add c.h with fundamental C definitions [Karel Zak]
- use c.h in canonicalize.h [Karel Zak]
initctl:
- fix strict-aliasing bugs [Karel Zak]
- initctl.8 formatting [Peter Breitenlohner]
ionice:
- add a note about none class and CFQ [Karel Zak]
- ionice.1 formatting [Peter Breitenlohner]
ipcmk:
- ipcmk.1 formatting [Peter Breitenlohner]
ipcrm:
- ipcrm.1 formatting [Peter Breitenlohner]
ipcs:
- ipcs.1 formatting [Peter Breitenlohner]
- use __GLIBC__ instead of obsolete __GNU_LIBRARY__ [Guillem Jover]
isosize:
- isosize.8 formatting [Peter Breitenlohner]
kill:
- kill.1 formatting [Peter Breitenlohner]
- use c.h [Karel Zak]
last:
- fix utmp.ut_time usage [Karel Zak]
- last.1 formatting [Peter Breitenlohner]
ldattach:
- ldattach.8 formatting [Peter Breitenlohner]
- use c.h [Karel Zak]
lib:
- add a generic crc32() [Karel Zak]
- bug (typo) in function MD5Final() [Karel Zak]
- fix file descriptor leak in is_mounted() [Theodore Ts'o]
- fix lib/Makefile.am (remove pttype.c) [Karel Zak]
- import whole ismounted.c code from e2fsprogs [Karel Zak]
- remove pttype.c [Karel Zak]
libblkid:
- DRBD support for blkid [Bastian Friedrich]
- add *.ko.gz support to modules.dep parser [Karel Zak]
- add AIX partitions support [Karel Zak]
- add BLKID_SUBLKS_* flags [Karel Zak]
- add BLKID_{VERSION,DATE} to blkid.h [Karel Zak]
- add BSD partitions support [Karel Zak]
- add DM topology support (for old kernels) [Karel Zak]
- add EFI GPT partitions support [Karel Zak]
- add EVMS topology support (for old kernels) [Karel Zak]
- add LVM topology support (for old kernels) [Karel Zak]
- add MAC partitions support [Karel Zak]
- add MD topology support (for old kernels) [Karel Zak]
- add MINIX partitions support [Karel Zak]
- add MS-DOS partitions support [Karel Zak]
- add SGI partitions support [Karel Zak]
- add SOLARIS-X86 partitions support [Karel Zak]
- add SUN partitions support [Karel Zak]
- add UBI volume support [Corentin Chary]
- add UBIFS support [Corentin Chary]
- add UNIXWARE partitions support [Karel Zak]
- add a probe for bfs [Christoph Hellwig]
- add blkid_devno_to_wholedisk() [Karel Zak]
- add blkid_driver_has_major() [Karel Zak]
- add blkid_new_probe_from_filename() [Karel Zak]
- add blkid_partition_get_type_string() [Karel Zak]
- add blkid_probe_get_{size,sectorsize,devno} [Karel Zak]
- add blkit_[un]ref() to TODO [Karel Zak]
- add chain structs [Karel Zak]
- add functions for chain tags [Karel Zak]
- add generic filter functions [Karel Zak]
- add generic function for binary data [Karel Zak]
- add missing comments [Karel Zak]
- add missing comments to probe.c [Karel Zak]
- add missing packed attributes [Karel Zak]
- add mkfs sample [Karel Zak]
- add note about UUID_SUB, increment number of superblock values [Karel Zak]
- add partitions filter routines [Karel Zak]
- add partitions parsing support [Karel Zak]
- add partitions sample [Karel Zak]
- add private blkid_topology_set_*() functions [Karel Zak]
- add samples/topology.c [Karel Zak]
- add sector size funcs to blkid.h.in [Karel Zak]
- add superblocks chain [Karel Zak]
- add superblocks filter functions [Karel Zak]
- add superblocks.c sample [Karel Zak]
- add support for SBMAGIC and SBMAGIC_OFFSET [Karel Zak]
- add support for VMFS (VMware File System) [Mike Hommey]
- add support for topology ioctls [Karel Zak]
- add test cases for VMFS [Mike Hommey]
- add topology support [Karel Zak]
- allow linking with uClibc [Daniel Mierswa]
- allow to change dimension of probing area [Karel Zak]
- allow to read in sectors [Karel Zak]
- allows more probing methods for topology chain [Karel Zak]
- announce Joliet extension [Maxim Levitsky]
- cleanup blkid_probe_set_device() [Karel Zak]
- cleanup topology fallback [Karel Zak]
- convert GPT partition LBA to 512-byte sectors [Karel Zak]
- cosmetic change in topology sample [Karel Zak]
- create a generic blkid_encode_to_utf8() [Karel Zak]
- create a generic blkid_unparse_uuid() [Karel Zak]
- does not return useless binary data [Karel Zak]
- don't return empty LABELs [Karel Zak]
- don't scan private /dev/.udev directory [Karel Zak]
- fix Adaptec RAID detection [Karel Zak]
- fix FALSE definition [Karel Zak]
- fix FAT super block definition [Lawrence Rust]
- fix NTFS non-ASCII labels [Karel Zak]
- fix UFS detection [Karel Zak]
- fix blkid_devno_to_wholedisk() [Karel Zak]
- fix blkid_do_probe() to work properly with chains [Karel Zak]
- fix blkid_fstatat() code [Karel Zak]
- fix blkid_probe_set_utf8label() call for Joliet [Karel Zak]
- fix buffer overflow in blkid_encode_string() [Florian Zumbiehl]
- fix cache->probe memory leak [Karel Zak]
- fix ext2 detection on systems with ext4 only [Karel Zak]
- fix gcc warning (warn_unused_result) [Karel Zak]
- fix highpoint37x detection [Karel Zak]
- fix non-magic FAT detection [Karel Zak]
- fix probing for binary interface [Karel Zak]
- fix segfault in blkid_do_probe() [Karel Zak]
- fix the default cache file path [Karel Zak]
- fix topology information values [Eric Sandeen]
- fix typo (swsupend -> swsuspend) [Karel Zak]
- fix typo s/Hihg/High/ [Jim Meyering]
- fix warning message in mkfs sample [Karel Zak]
- gtkdocize (API docs generated by gtk-docs) [Karel Zak]
- minor changes to dm topology code [Karel Zak]
- minor changes to samples [Karel Zak]
- minor fix in topology sample [Karel Zak]
- move FS/raid stuff to superblocks directory [Karel Zak]
- move blkid_known_fstype() to superblocks.c [Karel Zak]
- move filter macros to header file [Karel Zak]
- prefer ISO9660 PVD Label to Joliet Label [Karel Zak]
- properly reset position in probing chains [Karel Zak]
- refresh blkid.{h,sym} [Karel Zak]
- remove duplicate debug message [Karel Zak]
- remove superblock functions from probe.c [Karel Zak]
- rename highpoint RAIDs to hpt{37,45}x_raid_member [Karel Zak]
- return first detected crypto device [Scott James Remnant]
- topology - add logical and physical sector size [Karel Zak]
- topology - ignore non-blockdevs [Karel Zak]
- trim tailing whitespace from unicode LABELs [Karel Zak]
- update docs/.gitignore [Karel Zak]
- use BLKSSZGET for GPT sectors [Karel Zak]
- use blkid_new_probe_from_filename() in docs [Karel Zak]
- use c.h [Karel Zak]
- use c.h in samples [Karel Zak]
- use chains in blkid_do_{safe,full,}_probe() [Karel Zak]
- use chains in prober (de)initialization [Karel Zak]
- use fstatat(), improve readdir() usage [Karel Zak]
- use private {lookup,get}_value functions [Karel Zak]
- use superblock filter functions [Karel Zak]
- use superblocks.h [Karel Zak]
- use the new API in whole u-l-ng [Karel Zak]
libuuid:
- remove .UE macro from libuuid man pages. [Milan Broz]
line:
- remove deprecated #ident directive [Karel Zak]
losetup:
- losetup.8 formatting [Peter Breitenlohner]
- remove unused macro [Karel Zak]
lscpu:
- add {32,64}-bit CPU modes detection [Karel Zak]
- lscpu.1 formatting [Peter Breitenlohner]
mcookie:
- mcookie.1 formatting [Peter Breitenlohner]
mesg:
- mesg.1 formatting [Peter Breitenlohner]
mkfs:
- mkfs.8 incomplete sentence and formatting [Peter Breitenlohner]
mkfs.bfs:
- mkfs.bfs.8 formatting [Peter Breitenlohner]
mkfs.cramfs:
- fix gcc warning (incompatible pointer type) [Karel Zak]
mkfs.minix:
- fix strict-aliasing bugs [Karel Zak]
- mkfs.minix.8 formatting [Peter Breitenlohner]
mkswap:
- fix memory leaks, cleanup check_blocks() [Karel Zak]
- mkswap.8 formatting [Peter Breitenlohner]
- restore device argument in mkswap.8 synopsis [Peter Breitenlohner]
- unbreak -c ("check") option. [Peter De Wachter]
- use libblkid to detect PT [Karel Zak]
more:
- limited line buffer length results in corrupted UTF-8 text [Karel Zak]
- more.1 formatting [Peter Breitenlohner]
mount:
- add --no-canonicalize option [Karel Zak]
- add a note about /dev/disk/by-* to mount.8 [Karel Zak]
- add a note about bind-dir remounts [Karel Zak]
- add info about ext{3,4} barriers to mount.8 [Karel Zak]
- add long options to mount.8 [Karel Zak]
- add squashfs to mount.8 [Karel Zak]
- add ubifs to the mount.8 man page [Sebastian Andrzej Siewior]
- and libblkid covert /dev/dm-N to /dev/mapper/<name> [Karel Zak]
- better --move description [Karel Zak]
- check for unsuccessful read-only bind mounts [Karel Zak]
- disable --no-canonicalize for non-root users [Karel Zak]
- document changed semantics of tmpfs size option in mount.8 [[email protected]]
- fix mount.8, xfs attr2 is enabled by default [Karel Zak]
- fix reference to samba-client in mount.8 [Karel Zak]
- fix typo in mount.8 [Karel Zak]
- mention mtab for single mount point mount in mount.8 [Peter Volkov]
- more explicitly explain 'strictatime' in mount.8 [Karel Zak]
- more verbose "mount only root can do that" message [Karel Zak]
- mount.8 formatting [Peter Breitenlohner]
- move info about devices to the top of mount.8 [Karel Zak]
- update list of pseudo filesystems [Karel Zak]
namei:
- better mount points detection [Karel Zak]
- fix alone symlink evaluation [Karel Zak]
- gater information about / (root) [Karel Zak]
- namei.1 formatting [Peter Breitenlohner]
- use c.h [Karel Zak]
newgrp:
- newgrp.1 formatting [Peter Breitenlohner]
- use c.h, remove tailing whitespace [Karel Zak]
partx:
- partx.8 formatting [Peter Breitenlohner]
- use c.h [Karel Zak]
- work properly with 512 sectors (dos PT) [Karel Zak]
pg:
- command enters infinite loop [Mike Frysinger]
- compiler warning with NLS disabled [Peter Breitenlohner]
- pg.1 formatting [Peter Breitenlohner]
pivot_root:
- pivot_root.8 formatting [Peter Breitenlohner]
po:
- fix grammar glitch in german translation [Hendrik Lönngren]
- fix msgid bugs [Karel Zak]
- merge changes [Karel Zak]
- update POTFILES.in [Karel Zak]
- update cs.po (from translationproject.org) [Petr Pisar]
- update eu.po (from translationproject.org) [Mikel Olasagasti Uranga]
- update eu.po (from translationproject.org) [Mikel Olasagasti]
- update fi.po (from translationproject.org) [Lauri Nurmi]
- update fr.po (from translationproject.org) [Nicolas Provost]
- update id.po (from translationproject.org) [Arif E. Nugroho]
- update ja.po (from translationproject.org) [Makoto Kato]
- update pl.po (from translationproject.org) [Jakub Bogusz]
- update po/POTFILES.in [Karel Zak]
- update vi.po (from translationproject.org) [Clytie Siddall]
- update zh_CN.po (from translationproject.org) [Ray Wang]
rdev:
- rdev.8 formatting [Peter Breitenlohner]
readprofile:
- readprofile.1 formatting [Peter Breitenlohner]
rename:
- rename.1 formatting [Peter Breitenlohner]
renice:
- renice.1 formatting [Peter Breitenlohner]
reset:
- reset.1 formatting [Peter Breitenlohner]
rtcwake:
- add S5 support [Karel Zak]
- ignore the tm_isdst field returned from the RTC [Paul Fox]
- rtcwake.8 formatting [Peter Breitenlohner]
scriptreplay:
- fix typo in error message [Karel Zak]
- scriptreplay.1 formatting [Peter Breitenlohner]
setarch:
- setarch.8 formatting [Peter Breitenlohner]
setsid:
- setsid.1 formatting [Peter Breitenlohner]
setterm:
- setterm.1 formatting [Peter Breitenlohner]
- use c.h, remove tailing whitespace [Karel Zak]
sfdisk:
- confused about disk size [Karel Zak]
- dump has to be $LANG insensitive [Karel Zak]
- sfdisk.8 formatting [Peter Breitenlohner]
- use c.h, remove obsolete #ifdefs [Karel Zak]
shutdown:
- shutdown.8 formatting [Peter Breitenlohner]
simpleinit:
- simpleinit.8 formatting [Peter Breitenlohner]
swapon:
- fix typo on swapon.8 manpage [Florentin Duneau]
- handle <=linux-2.6.19 bug in /proc/swaps [Mike Frysinger]
- more robust progname probing [Karel Zak]
- swapon.8 formatting [Peter Breitenlohner]
switch_root:
- add note about subroots to switch_root.8 [Karel Zak]
- remove TIOCSCTTY and setsid() [Karel Zak]
- switch_root.8 formatting [Peter Breitenlohner]
tailf:
- fix printf format [Mike Frysinger]
- report inotify_add_watch() problems [Karel Zak]
tests:
- add BFS libblkid regression test [Karel Zak]
- add NTFS blkid test [Karel Zak]
- add UBIFS test image to blkid test suite [Corentin Chary]
- add UFS test image for libblkid [Karel Zak]
- add VIA RAID test image for libblkid [Karel Zak]
- add adaptec RAID test [Karel Zak]
- add blkid regression tests for ISO9660 [Karel Zak]
- add hpt37x RAID test [Karel Zak]
- add hpt45x RAID test [Karel Zak]
- add isw RAID test [Karel Zak]
- add jmicron RAID test [Karel Zak]
- add lsi RAID test [Karel Zak]
- add nvidia RAID test [Karel Zak]
- add partitions probing test [Karel Zak]
- add promise RAID test [Karel Zak]
- add silicon RAID test [Karel Zak]
- fdisk doslabel test also checks changing partition type [Zdenek Behan]
- fdisk doslabel test also checks setting partition active [Zdenek Behan]
- refresh GPT regression test [Karel Zak]
- refresh lscpu tests [Karel Zak]
- remove vol_id from tests [Karel Zak]
- rename blkid/images to blkid/images-fs [Karel Zak]
- swapon workaround for libtool wrapper [Karel Zak]
- test for basic functionality of sun labels [Zdenek Behan]
- update fsck.ismounted test [Karel Zak]
tunelp:
- tunelp.8 formatting [Peter Breitenlohner]
ul:
- ul.1 erroneous .SH instead of .Sh [Peter Breitenlohner]
umount:
- add --no-canonicalize [Karel Zak]
- umount.8 command line for umount helpers, formatting [Peter Breitenlohner]
unshare:
- new command [Mikhail Gusarov]
uuidd:
- uuidd.8 formatting [Peter Breitenlohner]
uuidgen:
- uuidgen.1 formatting [Peter Breitenlohner]
vipw:
- vipw.8 remove erroneous empty line, formatting [Peter Breitenlohner]
whereis:
- whereis.1 formatting [Peter Breitenlohner]
wipefs:
- fix coding style [Karel Zak]
- new command [Karel Zak]
- remove obsolete comment [Karel Zak]
write:
- write.1 formatting [Peter Breitenlohner]
--
Karel Zak <[email protected]>
On 01/08/2010 01:33 AM, Karel Zak wrote:
>
> fdisk:
> - the fdisk command aligns newly created partitions to minimum_io_size
> boundary ("minimum_io_size" is physical sector size or stripe chunk
> size on RAIDs).
>
> - the fdisk command supports disks with alignment_offset now.
>
I think we should align, by default, much more aggressively than that --
because frequently we just don't know what the real physical alignment
is (think of flash media, which uses large erase blocks underneath.)
Windows aligns partitions 1 MB boundaries by default now -- I think
that's probably a reasonably good idea, at least for any disk that's not
tiny, say 256 MB or less.
-hpa
On 2010-01-08, at 14:43, H. Peter Anvin wrote:
> On 01/08/2010 01:33 AM, Karel Zak wrote:
>> fdisk:
>> - the fdisk command aligns newly created partitions to
>> minimum_io_size
>> boundary ("minimum_io_size" is physical sector size or stripe
>> chunk
>> size on RAIDs).
>>
>> - the fdisk command supports disks with alignment_offset now.
>
> I think we should align, by default, much more aggressively than
> that --
> because frequently we just don't know what the real physical alignment
> is (think of flash media, which uses large erase blocks underneath.)
> Windows aligns partitions 1 MB boundaries by default now -- I think
> that's probably a reasonably good idea, at least for any disk that's
> not
> tiny, say 256 MB or less.
I agree whole heartedly. We steer users very sharply away from using
partitions at all, because on h/w RAID devices the 512-byte offset
from fdisk completely kills RAID-5/6 performance.
Making the default minimum alignment for DOS/GPT partitions makes a
lot of sense, and LVM PEs should be on 1MB boundaries as well (I don't
think that is the case today either).
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On 01/08/2010 02:40 PM, Andreas Dilger wrote:
> On 2010-01-08, at 14:43, H. Peter Anvin wrote:
>> On 01/08/2010 01:33 AM, Karel Zak wrote:
>>> fdisk:
>>> - the fdisk command aligns newly created partitions to
>>> minimum_io_size
>>> boundary ("minimum_io_size" is physical sector size or stripe
>>> chunk
>>> size on RAIDs).
>>>
>>> - the fdisk command supports disks with alignment_offset now.
>>
>> I think we should align, by default, much more aggressively than
>> that --
>> because frequently we just don't know what the real physical alignment
>> is (think of flash media, which uses large erase blocks underneath.)
>> Windows aligns partitions 1 MB boundaries by default now -- I think
>> that's probably a reasonably good idea, at least for any disk that's
>> not
>> tiny, say 256 MB or less.
>
> I agree whole heartedly. We steer users very sharply away from using
> partitions at all, because on h/w RAID devices the 512-byte offset
> from fdisk completely kills RAID-5/6 performance.
>
> Making the default minimum alignment for DOS/GPT partitions makes a
> lot of sense, and LVM PEs should be on 1MB boundaries as well (I don't
> think that is the case today either).
>
As far as I can tell, there is absolutely no reason to not align all
partitions, all the time (for both GPT and MBR... GPT may need a "dummy
alignment partition" to fulfill the "no nonpartitioned space" dictum,
although it seems like an impossible requirement in practice -- I think
they main reason for it is to avoid abusers like Grub relying on putting
data in unpartitioned space.)
-hpa
>>>>> "Andreas" == Andreas Dilger <[email protected]> writes:
Andreas> I agree whole heartedly. We steer users very sharply away from
Andreas> using partitions at all, because on h/w RAID devices the
Andreas> 512-byte offset from fdisk completely kills RAID-5/6
Andreas> performance.
I don't have a problem aligning to 1MB (taking alignment_offset into
account) by default but there needs to be an easy override. There are
RAID arrays that internally compensate for the legacy 63 sector offset
and we'll cause misalignment if we start at 1MB on those.
I'm heavily lobbying our storage partners to make sure they fill out the
right SCSI bits if their LUNs are not naturally aligned. But there are
obviously going to be legacy devices out there that will need manual
alignment compensation.
--
Martin K. Petersen Oracle Linux Engineering
On 01/10/2010 10:36 PM, Martin K. Petersen wrote:
>>>>>> "Andreas" == Andreas Dilger <[email protected]> writes:
>
> Andreas> I agree whole heartedly. We steer users very sharply away from
> Andreas> using partitions at all, because on h/w RAID devices the
> Andreas> 512-byte offset from fdisk completely kills RAID-5/6
> Andreas> performance.
>
> I don't have a problem aligning to 1MB (taking alignment_offset into
> account) by default but there needs to be an easy override. There are
> RAID arrays that internally compensate for the legacy 63 sector offset
> and we'll cause misalignment if we start at 1MB on those.
>
> I'm heavily lobbying our storage partners to make sure they fill out the
> right SCSI bits if their LUNs are not naturally aligned. But there are
> obviously going to be legacy devices out there that will need manual
> alignment compensation.
>
Yes, but we shouldn't default to braindead mode.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Fri 2010-01-08 13:43:47, H. Peter Anvin wrote:
> On 01/08/2010 01:33 AM, Karel Zak wrote:
> >
> > fdisk:
> > - the fdisk command aligns newly created partitions to minimum_io_size
> > boundary ("minimum_io_size" is physical sector size or stripe chunk
> > size on RAIDs).
> >
> > - the fdisk command supports disks with alignment_offset now.
> >
>
> I think we should align, by default, much more aggressively than that --
> because frequently we just don't know what the real physical alignment
> is (think of flash media, which uses large erase blocks underneath.)
Flash has special mapping layer, and does not care (SD/MMC), or is a
raw nand and can't be used as block device (smartmedia).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 01/11/2010 06:05 AM, Pavel Machek wrote:
> On Fri 2010-01-08 13:43:47, H. Peter Anvin wrote:
>> On 01/08/2010 01:33 AM, Karel Zak wrote:
>>>
>>> fdisk:
>>> - the fdisk command aligns newly created partitions to minimum_io_size
>>> boundary ("minimum_io_size" is physical sector size or stripe chunk
>>> size on RAIDs).
>>>
>>> - the fdisk command supports disks with alignment_offset now.
>>>
>>
>> I think we should align, by default, much more aggressively than that --
>> because frequently we just don't know what the real physical alignment
>> is (think of flash media, which uses large erase blocks underneath.)
>
> Flash has special mapping layer, and does not care (SD/MMC), or is a
> raw nand and can't be used as block device (smartmedia).
>
Uhm, that's just plain wrong.
It doesn't matter if there is a "special mapping layer" -- if you're
crossing multiple erase blocks you're still having more churn in your
flash translation layer, with more wear on the device, and lower
performance than if you didn't.
-hpa
On Mon, 2010-01-11 at 15:05 +0100, Pavel Machek wrote:
> On Fri 2010-01-08 13:43:47, H. Peter Anvin wrote:
> > On 01/08/2010 01:33 AM, Karel Zak wrote:
> > >
> > > fdisk:
> > > - the fdisk command aligns newly created partitions to minimum_io_size
> > > boundary ("minimum_io_size" is physical sector size or stripe chunk
> > > size on RAIDs).
> > >
> > > - the fdisk command supports disks with alignment_offset now.
> > >
> >
> > I think we should align, by default, much more aggressively than that --
> > because frequently we just don't know what the real physical alignment
> > is (think of flash media, which uses large erase blocks underneath.)
>
> Flash has special mapping layer, and does not care (SD/MMC), or is a
> raw nand and can't be used as block device (smartmedia).
MMC does care, and we measured it, non-4KiB aligned writes were
noticeably slower on real eMMC and some MMC cards. I cannot remember the
numbers, but I vaguely recall something like 11%.
So the first statement is incorrect. It really does matter. E.g., we
observed that FAT FS was slower if the partition did not start from 4KiB
boundary.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On Mon 2010-01-11 08:52:56, H. Peter Anvin wrote:
> On 01/11/2010 06:05 AM, Pavel Machek wrote:
> >On Fri 2010-01-08 13:43:47, H. Peter Anvin wrote:
> >>On 01/08/2010 01:33 AM, Karel Zak wrote:
> >>>
> >>>fdisk:
> >>> - the fdisk command aligns newly created partitions to minimum_io_size
> >>> boundary ("minimum_io_size" is physical sector size or stripe chunk
> >>> size on RAIDs).
> >>>
> >>> - the fdisk command supports disks with alignment_offset now.
> >>>
> >>
> >>I think we should align, by default, much more aggressively than that --
> >>because frequently we just don't know what the real physical alignment
> >>is (think of flash media, which uses large erase blocks underneath.)
> >
> >Flash has special mapping layer, and does not care (SD/MMC), or is a
> >raw nand and can't be used as block device (smartmedia).
> >
>
> Uhm, that's just plain wrong.
>
> It doesn't matter if there is a "special mapping layer" -- if you're
> crossing multiple erase blocks you're still having more churn in
> your flash translation layer, with more wear on the device, and
> lower performance than if you didn't.
Eraseblocks really should not matter. It is not as if each logical
sector belongs to one eraseblock....
(OTOH, maybe the eraseblock *groups* that are basis for wear-leveling
do, or maybe firmware is doing something really really strange.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 01/11/2010 12:17 PM, Pavel Machek wrote:
>>
>> Uhm, that's just plain wrong.
>>
>> It doesn't matter if there is a "special mapping layer" -- if you're
>> crossing multiple erase blocks you're still having more churn in
>> your flash translation layer, with more wear on the device, and
>> lower performance than if you didn't.
>
> Eraseblocks really should not matter. It is not as if each logical
> sector belongs to one eraseblock....
>
> (OTOH, maybe the eraseblock *groups* that are basis for wear-leveling
> do, or maybe firmware is doing something really really strange.)
> Pavel
Maybe they "should not" matter, but they *do* matter. In most existing
FTLs, each logical sector *does* belong to one erase block, although
which particular erase block that is of course moves around. However,
the invariant that matters though -- and the reason alignment matters --
is that for most FTLs, the *offset* of any particular logical sector
within the erase block it currently belongs to is invariant, i.e. the
FTL operates on physical sectors which are the same size as the erase
blocks.
-hpa
On Mon, 11 January 2010 15:05:42 +0100, Pavel Machek wrote:
>
> Flash has special mapping layer, and does not care (SD/MMC), or is a
> raw nand and can't be used as block device (smartmedia).
Complete bullsh*t. Smartmedia can (and arguably must) be used as a
block device and all devices you listed do care about alignment.
Jörn
--
Consensus is no proof!
-- John Naisbitt
On 01/11/2010 11:19 PM, Jörn Engel wrote:
> On Mon, 11 January 2010 15:05:42 +0100, Pavel Machek wrote:
>>
>> Flash has special mapping layer, and does not care (SD/MMC), or is a
>> raw nand and can't be used as block device (smartmedia).
>
> Complete bullsh*t. Smartmedia can (and arguably must) be used as a
> block device and all devices you listed do care about alignment.
>
I believe for Smartmedia the FTL is done host side, but that there is a
standard-specified FTL (called CIS, Card Information System.)
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Tue, 12 January 2010 00:19:24 -0800, H. Peter Anvin wrote:
>
> I believe for Smartmedia the FTL is done host side, but that there is a
> standard-specified FTL (called CIS, Card Information System.)
A CIS block/page is part of the FTL design. Not sure whether it also
lends its name to it. I'll just take your word for it.
Whether the FTL is done host side depends on the card reader and your
definition of host side. Most old card readers implemented it in the
driver, but some newer ones prefer to export a usb mass storage
interface, thereby not requiring any driver CD.
Jörn
--
You cannot suppose that Moliere ever troubled himself to be original in the
matter of ideas. You cannot suppose that the stories he tells in his plays
have never been told before. They were culled, as you very well know.
-- Andre-Louis Moreau in Scarabouche
On Mon, 2010-01-11 at 15:26 -0800, H. Peter Anvin wrote:
> On 01/11/2010 12:17 PM, Pavel Machek wrote:
> >>
> >> Uhm, that's just plain wrong.
> >>
> >> It doesn't matter if there is a "special mapping layer" -- if you're
> >> crossing multiple erase blocks you're still having more churn in
> >> your flash translation layer, with more wear on the device, and
> >> lower performance than if you didn't.
> >
> > Eraseblocks really should not matter. It is not as if each logical
> > sector belongs to one eraseblock....
> >
> > (OTOH, maybe the eraseblock *groups* that are basis for wear-leveling
> > do, or maybe firmware is doing something really really strange.)
> > Pavel
>
> Maybe they "should not" matter, but they *do* matter. In most existing
> FTLs, each logical sector *does* belong to one erase block, although
> which particular erase block that is of course moves around. However,
> the invariant that matters though -- and the reason alignment matters --
> is that for most FTLs, the *offset* of any particular logical sector
> within the erase block it currently belongs to is invariant, i.e. the
> FTL operates on physical sectors which are the same size as the erase
> blocks.
I think the other reason why alignment matters, at least on eMMC, is
that they most probably have several MLC NAND chips inside, and they use
"striping". And what matters is whether your I/O request is aligned to
the "striping I/O group" size or not. Yes, because the offsets are most
probably invariant.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On Tue, 2010-01-12 at 15:33 +0200, Artem Bityutskiy wrote:
> I think the other reason why alignment matters, at least on eMMC, is
s/eMMC/eMMC we tested/
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On 01/12/2010 01:37 AM, Jörn Engel wrote:
> On Tue, 12 January 2010 00:19:24 -0800, H. Peter Anvin wrote:
>>
>> I believe for Smartmedia the FTL is done host side, but that there is a
>> standard-specified FTL (called CIS, Card Information System.)
>
> A CIS block/page is part of the FTL design. Not sure whether it also
> lends its name to it. I'll just take your word for it.
>
> Whether the FTL is done host side depends on the card reader and your
> definition of host side. Most old card readers implemented it in the
> driver, but some newer ones prefer to export a usb mass storage
> interface, thereby not requiring any driver CD.
>
"Host side" in this case means "not on the card". The electrical
interface to Smartmedia is, indeed, raw NAND.
-hpa