A well-known kernel bug is that the kernel guesses at the partition type
and the partitions on any disk it encounters. This is bad because
needless I/O is done, slowing down the boot, sometimes quite a lot,
especially when I/O errors occur. And it is bad because sometimes
we guess wrong.
In other words, we need the user space command `partition',
where "partition -t dos /dev/sda" reads a DOS-type partition
table. (And "partition /dev/sda" tries all known heuristics
to decide what type of partitioning might be present.)
There are two possible variants: (i) partition tells the kernel
to do the partition table reading, and (ii) partition uses partx
to read the partition table and tells the kernel one-by-one
about the partitions found this way.
Since this is a fundamental change, a long transition period
is needed, and that period could start with a kernel boot parameter
telling the kernel not to do partition table parsing on a particular
disk, or a particular type of disks, or all disks.
This could have been the intro to a patch doing that, but is not.
(It is just an RFC.)
The tiny patch below prompted the above - it was suggested by Uwe Bonnes
who encountered USB devices without partition table where our present
heuristics did not suffice to stop partition table parsing.
It causes the kernel to ignore partitions of type 0. A band-aid.
I think nobody uses such partitions seriously, but nevertheless this
should probably live in -mm for a while to see if anybody complains.
Andries
diff -uprN -X /linux/dontdiff a/fs/partitions/msdos.c b/fs/partitions/msdos.c
--- a/fs/partitions/msdos.c 2004-12-29 03:39:55.000000000 +0100
+++ b/fs/partitions/msdos.c 2005-02-26 22:21:06.000000000 +0100
@@ -430,6 +430,8 @@ int msdos_partition(struct parsed_partit
 	for (slot = 1 ; slot <= 4 ; slot++, p++) {
 		u32 start = START_SECT(p)*sector_size;
 		u32 size = NR_SECTS(p)*sector_size;
+		if (SYS_IND(p) == 0)
+			continue;
 		if (!size)
 			continue;
 		if (is_extended_partition(p)) {
>>>>> "Andries" == Andries Brouwer <[email protected]> writes:
Andrew,
Andries> I think nobody uses such partitions seriously, but nevertheless
Andries> this should probably live in -mm for a while to see if anybody
Andries> complains.
the partition table of the USB stick in question is valid:
1B0: 00 00 00 00 00 00 00 00 53 3F 3C B9 00 00 00 01 ........S?<.....
1C0: 01 00 06 10 21 7D 25 00 00 00 DB F3 01 00 00 00 ....!}%.........
1D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
*
1F0: 00 00 00 00 00 00 00 54 72 75 6D 70 4D 53 55 AA .......TrumpMSU.
Entry 1 is a FAT partition of exactly the size of the stick, and entries 2
to 4 are empty, marked by id zero. However the manufacturer decided to put a
name string "Trump" ( /sbin/lsusb gives
Bus 004 Device 012: ID 090a:1bc0 Trumpion Microelectronics, Inc.) just before
the "55 AA" partition table magic and our code reads this string as a
(bogus) size for the fourth entry, taking it for real.
Please consider the patch for the main kernel distribution.
Cheers
--
Uwe Bonnes [email protected]
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
On Sat, 26 Feb 2005, Uwe Bonnes wrote:
>
> Andrew,
>
> Andries> I think nobody uses such partitions seriously, but nevertheless
> Andries> this should probably live in -mm for a while to see if anybody
> Andries> complains.
>
> the partition table of the USB stick in question is valid:
>
> 1B0: 00 00 00 00 00 00 00 00 53 3F 3C B9 00 00 00 01 ........S?<.....
> 1C0: 01 00 06 10 21 7D 25 00 00 00 DB F3 01 00 00 00 ....!}%.........
> 1D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> *
> 1F0: 00 00 00 00 00 00 00 54 72 75 6D 70 4D 53 55 AA .......TrumpMSU.
>
> Entry 1 is a FAT partition of exactly the size of the stick, and entries 2
> to 4 are empty, marked by id zero. However the manufacturer decided to put a
> name string "Trump" ( /sbin/lsusb gives
> Bus 004 Device 012: ID 090a:1bc0 Trumpion Microelectronics, Inc.) just before
> the "55 AA" partition table magic and our code reads this string as a
> (bogus) size for the fourth entry, taking it for real.
Would it not make more sense to just sanity-check the size itself, and
throw it out if the partition size (plus start) is bigger than the disk
size?
We already do that within extended partitions, ie we do
if (offs + size > this_size)
continue;
to just ignore crap. For some reason we don't do this for the primary one
(possibly because we don't trust disk size reporting, I guess).
There might well be people who use partition type 0, just because they
just never _set_ the partition type.. I don't think Linux has ever cared
about any type except for the "extended partition" type, so checking for
zero doesn't seem very safe..
Linus
On Sat, 26 Feb 2005, Linus Torvalds wrote:
>
> Would it not make more sense to just sanity-check the size itself, and
> throw it out if the partition size (plus start) is bigger than the disk
> size?
Something like this (TOTALLY UNTESTED AS USUAL!)?
What do fdisk and other tools do on that disk? Just out of interest..
Linus
---
===== fs/partitions/msdos.c 1.26 vs edited =====
--- 1.26/fs/partitions/msdos.c 2004-11-09 12:43:17 -08:00
+++ edited/fs/partitions/msdos.c 2005-02-26 14:33:33 -08:00
@@ -381,6 +381,7 @@
 int msdos_partition(struct parsed_partitions *state, struct block_device *bdev)
 {
 	int sector_size = bdev_hardsect_size(bdev) / 512;
+	sector_t nr_sectors;
 	Sector sect;
 	unsigned char *data;
 	struct partition *p;
@@ -426,11 +427,12 @@
 	 * On the second pass look inside *BSD, Unixware and Solaris partitions.
 	 */
+	nr_sectors = get_capacity(bdev->bd_disk);
 	state->next = 5;
 	for (slot = 1 ; slot <= 4 ; slot++, p++) {
 		u32 start = START_SECT(p)*sector_size;
 		u32 size = NR_SECTS(p)*sector_size;
-		if (!size)
+		if (!size || size > nr_sectors)
 			continue;
 		if (is_extended_partition(p)) {
 			/* prevent someone doing mkfs or mkswap on an
On Sat, Feb 26, 2005 at 02:28:45PM -0800, Linus Torvalds wrote:
> Would it not make more sense to just sanity-check the size itself, and
> throw it out if the partition size (plus start) is bigger than the disk
> size?
I don't mind.
> There might well be people who use partition type 0, just because they
> just never _set_ the partition type.. I don't think Linux has ever cared
> about any type except for the "extended partition" type, so checking for
> zero doesn't seem very safe..
The default fdisk will assign type 83 to a newly created partition.
One has to change it by hand to 0. So, I do not think testing against 0
is so bad. It is a heuristic; you give another heuristic. Probably there will
be a point in time where we need both.
(About type 0: DOS has used type 0 as definition of unused. It is not
bad if Linux uses DOS-conventions for a DOS-type partition table.)
Andries
On Sat, 26 Feb 2005, Andries Brouwer wrote:
>
> The default fdisk will assign type 83 to a newly created partition.
Ok. Is that a "it has done so for the last 5 years" thing?
> (About type 0: DOS has used type 0 as definition of unused. It is not
> bad if Linux uses DOS-conventions for a DOS-type partition table.)
Agreed. At the same time, I could well imagine that some people might use
such a type exactly to make DOS ignore it (but I assume the same is true
of the regular 0x83 type too, so maybe I'm just being difficult).
There's certainly a good argument for fixing a known problem (Uwe) and a
small enough risk of it breaking anything else.
Linus
>>>>> "Linus" == Linus Torvalds <[email protected]> writes:
Linus> On Sat, 26 Feb 2005, Linus Torvalds wrote:
>> Would it not make more sense to just sanity-check the size itself,
>> and throw it out if the partition size (plus start) is bigger than
>> the disk size?
Linus> Something like this (TOTALLY UNTESTED AS USUAL!)?
Yes, your approach also yields no phantom partitions.
Linus> What do fdisk and other tools do on that disk? Just out of
Linus> interest..
To be honest:
r50:~ # fdisk /dev/sda
Command (m for help): p
Disk /dev/sda: 65 MB, 65536000 bytes
17 heads, 33 sectors/track, 228 cylinders
Units = cylinders of 561 * 512 = 287232 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         229       63981+   6  FAT16
Partition 1 has different physical/logical beginnings (non-Linux?):
     phys=(0, 1, 1) logical=(0, 1, 5)
Partition 1 has different physical/logical endings:
     phys=(125, 16, 33) logical=(228, 2, 26)
/dev/sda4         3512348     6003585   698791990+   0  Empty
Partition 4 has different physical/logical beginnings (non-Linux?):
     phys=(0, 0, 0) logical=(3512347, 6, 16)
Partition 4 has different physical/logical endings:
     phys=(0, 0, 0) logical=(6003584, 7, 6)
Partition 4 does not end on cylinder boundary.
--
Uwe Bonnes [email protected]
On Sat, Feb 26, 2005 at 03:12:28PM -0800, Linus Torvalds wrote:
> > The default fdisk will assign type 83 to a newly created partition.
>
> Ok. Is that a "it has done so for the last 5 years" thing?
The last twelve years.
> > (About type 0: DOS has used type 0 as definition of unused. It is not
> > bad if Linux uses DOS-conventions for a DOS-type partition table.)
>
> Agreed. At the same time, I could well imagine that some people might use
> such a type exactly to make DOS ignore it (but I assume the same is true
> of the regular 0x83 type too, so maybe I'm just being difficult).
>
> There's certainly a good argument for fixing a known problem (Uwe) and a
> small enough risk of it breaking anything else.
Yes.
Andries
(Concerning the "size" version: it occurred to me that there is one
very minor objection: For extended partitions so far the size did
not normally play a role. Only the starting sector was significant.
If, at some moment we decide also to check the size, then a weaker
check, namely only checking for non-extended partitions, might be
better at first.)
(Yes, disk capacity is not always known - see e.g. ll_rw_blk.c:
/* Test device or partition size, when known. */
See also sd.c, with the strange
sdkp->capacity = 0x200000; /* 1 GB - random */
In such cases we just access the blocks user space tells us to access.)
On Sun, 27 Feb 2005, Uwe Bonnes wrote:
>
> /dev/sda4 3512348 6003585 698791990+ 0 Empty
> Partition 4 has different physical/logical beginnings (non-Linux?):
> phys=(0, 0, 0) logical=(3512347, 6, 16)
> Partition 4 has different physical/logical endings:
> phys=(0, 0, 0) logical=(6003584, 7, 6)
> Partition 4 does not end on cylinder boundary.
Yeah, your case could check for zero in the physical sector stuff too, but
I'm not sure that matters, since nobody really cares about the physical
values and they've long been too limited to matter. So I'm not at all
convinced that adding a few more checks for zero would be any better than
checking the one zero that Andries does.
I think Andries' patch is fine. We should probably do the same for the
extended partition case, just to be consistent.
Linus
On Sun, 27 Feb 2005, Andries Brouwer wrote:
>
> (Concerning the "size" version: it occurred to me that there is one
> very minor objection: For extended partitions so far the size did
> not normally play a role. Only the starting sector was significant.
> If, at some moment we decide also to check the size, then a weaker
> check, namely only checking for non-extended partitions, might be
> better at first.)
Yes. I agree - checking the size is likely _more_ dangerous and likely to
break something silly than checking the ID for zero.
So your patch it is. I'll put it in immediately after doing a 2.6.11
(no need to worry about getting into 2.6.11, since afaik the worst problem
right now is an extra partition that isn't usable).
Linus
On Sat, Feb 26, 2005 at 03:46:03PM -0800, Linus Torvalds wrote:
> We should probably do the same for the
> extended partition case, just to be consistent.
True.
diff -uprN -X /linux/dontdiff a/fs/partitions/msdos.c b/fs/partitions/msdos.c
--- a/fs/partitions/msdos.c 2004-12-29 03:39:55.000000000 +0100
+++ b/fs/partitions/msdos.c 2005-02-27 01:10:06.000000000 +0100
@@ -114,6 +114,9 @@ parse_extended(struct parsed_partitions
 		 */
 		for (i=0; i<4; i++, p++) {
 			u32 offs, size, next;
+
+			if (SYS_IND(p) == 0)
+				continue;
 			if (!NR_SECTS(p) || is_extended_partition(p))
 				continue;
@@ -430,6 +433,8 @@ int msdos_partition(struct parsed_partit
 	for (slot = 1 ; slot <= 4 ; slot++, p++) {
 		u32 start = START_SECT(p)*sector_size;
 		u32 size = NR_SECTS(p)*sector_size;
+		if (SYS_IND(p) == 0)
+			continue;
 		if (!size)
 			continue;
 		if (is_extended_partition(p)) {
>>>>> "Linus" == Linus Torvalds <[email protected]> writes:
Linus> On Sun, 27 Feb 2005, Andries Brouwer wrote:
>> (Concerning the "size" version: it occurred to me that there is one
>> very minor objection: For extended partitions so far the size did not
>> normally play a role. Only the starting sector was significant. If,
>> at some moment we decide also to check the size, then a weaker check,
>> namely only checking for non-extended partitions, might be better at
>> first.)
Linus> Yes. I agree - checking the size is likely _more_ dangerous and
Linus> likely to break something silly than checking the ID for zero.
Linus> So your patch it is. I'll put it in immediately after doing a
Linus> 2.6.11 (no need to worry about getting into 2.6.11, since afaik
Linus> the worst problem right now is an extra partition that isn't
Linus> usable).
Well,
on a Suse 9.2 System with Suse Hotplug, the phantom partition was somehow
recognized as Reiserfs, and then the Hotplug mechanism trying to mount the
bogus partition as a Reiser Filesystem ended in an Oops...
--
Uwe Bonnes [email protected]
On Sun, 27 Feb 2005, Uwe Bonnes wrote:
>
> on a Suse 9.2 System with Suse Hotplug, the phantom partition was somehow
> recognized as Reiserfs, and then the Hotplug mechanism trying to mount the
> bogus partition as a Reiser Filesystem ended in an Oops...
Heh. That oops would be interesting in itself, since it implies that
reiserfs is not doing very well on the sanity-checking front.
But yes, point taken.
Linus
On Sun, Feb 27, 2005 at 01:47:43AM +0100, Uwe Bonnes wrote:
> on a Suse 9.2 System with Suse Hotplug, the phantom partition was somehow
> recognized as Reiserfs, and then the Hotplug mechanism trying to mount the
> bogus partition as a Reiser Filesystem ended in an Oops...
Always report the oops. It is well-known that mounting garbage may crash
the kernel. Earlier the reply was "don't do that then". Nowadays we have
more layers of software trying to probe and do automatic things, and the
kernel must survive attempts to mount garbage.
Andries
>>>>> "Linus" == Linus Torvalds <[email protected]> writes:
Linus> On Sun, 27 Feb 2005, Uwe Bonnes wrote:
>> on a Suse 9.2 System with Suse Hotplug, the phantom partition was
>> somehow recognized as Reiserfs, and then the Hotplug mechanism trying
>> to mount the bogus partition as a Reiser Filesystem ended in an
>> Oops...
Linus> Heh. That oops would be interesting in itself, since it implies
Linus> that reiserfs is not doing very well on the sanity-checking
Linus> front.
Linus> But yes, point taken.
I have to admit, I only caught a glimpse of the oops, about a week ago, while
I was busy with other tasks. The oops also left no trace in the syslog. As
syslog is on a reiserfs partition, perhaps reiserfs refused to write anything
after the oops. When I try to reproduce it by mounting the bogus sda4
partition of the USB stick on two different setups, reiserfs now clearly
refuses to mount it. Perhaps the oops was unrelated.
Appended is an oops that happened yesterday, also related to reiserfs. It is
probably totally useless, as it occurred on a custom Suse kernel. A reiserfsck
after the oops corrected some errors, perhaps introduced by the oops one week
earlier. I don't understand what to do about the missing ksyms..
B.t.w.: While I have your attention, can I point you to
http://www.ussg.iu.edu/hypermail/linux/kernel/0410.1/1246.html
where I reported a Linux misbehaviour and a possible patch. I tried to get
the attention of several kernel developers, to no avail so far ...
Cheers
--
Uwe Bonnes [email protected]
=== undecoded excerpt from the syslog ===
Feb 26 18:47:40 r50 kernel: Unable to handle kernel paging request at virtual address 8a5e65b8
Feb 26 18:47:40 r50 kernel: printing eip:
Feb 26 18:47:40 r50 kernel: c01d1482
Feb 26 18:47:40 r50 kernel: *pde = 00000000
Feb 26 18:47:40 r50 kernel: Oops: 0000 [#1]
Feb 26 18:47:40 r50 kernel: Modules linked in: usb_storage nvram usbserial parport_pc lp parport cpufreq_userspace speedstep_centrino freq_table thermal processor fan button battery ac edd snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc ipv6 pcmcia joydev sg st sd_mod sr_mod scsi_mod ide_cd cdrom subfs nls_utf8 ntfs i2c_i801 i2c_core cinergyT2 dvb_core e1000 ehci_hcd hw_random uhci_hcd ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core intel_agp agpgart evdev dm_mod usbcore reiserfs
Feb 26 18:47:40 r50 kernel: CPU: 0
Feb 26 18:47:40 r50 kernel: EIP: 0060:[<c01d1482>] Tainted: G U VLI
Feb 26 18:47:40 r50 kernel: EFLAGS: 00010202 (2.6.11-rc4-bk2)
Feb 26 18:47:40 r50 kernel: EIP is at memcpy+0x12/0x30
Feb 26 18:47:40 r50 kernel: eax: 00000004 ebx: df47b058 ecx: 00000004 edx: 00000010
Feb 26 18:47:40 r50 kernel: esi: 8a5e65b8 edi: df47b058 ebp: c35d7b34 esp: c35d7af4
Feb 26 18:47:40 r50 kernel: ds: 007b es: 007b ss: 0068
Feb 26 18:47:40 r50 kernel: Process xauth (pid: 30301, threadinfo=c35d6000 task=c73d5aa0)
Feb 26 18:47:40 r50 kernel: Stack: dfb3c000 df451458 c35d7c94 e0a5acf4 7aaaaa5a c35d7c94 df6a9ad8 e0a70707
Feb 26 18:47:40 r50 kernel: df6a9ad8 7aaaaa5a 00000004 df451458 c35d7c94 df6a9ad8 df451458 00000005
Feb 26 18:47:41 r50 kernel: c35d7c94 c15cf424 df451458 00000004 c35d7ea8 00000027 00000001 c35d7c94
Feb 26 18:47:41 r50 kernel: Call Trace:
Feb 26 18:47:41 r50 kernel: [<e0a5acf4>] replace_key+0x34/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70707>] internal_shift_left+0xa7/0xd0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70a9a>] balance_internal_when_delete+0x1ca/0x220 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7114d>] balance_internal+0x5bd/0x800 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c02f846e>] io_schedule+0xe/0x20
Feb 26 18:47:41 r50 kernel: [<c0152da9>] bh_lru_install+0x89/0xb0
Feb 26 18:47:41 r50 kernel: [<c013dc08>] mark_page_accessed+0x28/0x30
Feb 26 18:47:41 r50 kernel: [<e0a67f3b>] dc_check_balance_internal+0x37b/0x390 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5ae57>] do_balance+0xc7/0xf0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a68b1c>] fix_nodes+0x31c/0x360 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7456e>] reiserfs_cut_from_item+0x32e/0x4e0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a74a83>] reiserfs_do_truncate+0x2e3/0x540 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a73da8>] reiserfs_delete_object+0x28/0x60 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5d768>] reiserfs_delete_inode+0x68/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c013e4d9>] truncate_inode_pages+0x9/0x10
Feb 26 18:47:41 r50 kernel: [<e0a5d700>] reiserfs_delete_inode+0x0/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c01678d4>] generic_delete_inode+0xa4/0x130
Feb 26 18:47:41 r50 kernel: [<c0167add>] iput+0x4d/0x70
Feb 26 18:47:41 r50 kernel: [<c015ebfa>] sys_unlink+0xba/0x130
Feb 26 18:47:41 r50 kernel: [<c014602d>] do_munmap+0xcd/0x100
Feb 26 18:47:41 r50 kernel: [<c0102c49>] sysenter_past_esp+0x52/0x79
Feb 26 18:47:41 r50 kernel: Code: e0 ff ff 21 e2 3b 42 18 73 06 8b 50 fd 31 c0 c3 31 d2 b8 f2 ff ff ff c3 90 57 56 53 89 c3 89 c8 89 d6 c1 e8 02 89 ca 89 df 89 c1 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 89 d8 5b 5e 5f c3
Feb 26 18:47:41 r50 kernel: Badness in do_exit at kernel/exit.c:790
Feb 26 18:47:41 r50 kernel: [<c011c0ae>] do_exit+0x2de/0x2f0
Feb 26 18:47:41 r50 kernel: [<c010448e>] die+0x12e/0x130
Feb 26 18:47:41 r50 kernel: [<c0114e05>] do_page_fault+0x395/0x588
Feb 26 18:47:41 r50 kernel: [<c01d1482>] memcpy+0x12/0x30
Feb 26 18:47:41 r50 kernel: [<e0a765f4>] get_cnode+0x14/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7a2a2>] journal_mark_dirty+0x102/0x230 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6e473>] leaf_copy_items_entirely+0x1d3/0x240 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6eb43>] leaf_copy_items+0xf3/0x110 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6ed52>] leaf_move_items+0x62/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c0114a70>] do_page_fault+0x0/0x588
Feb 26 18:47:41 r50 kernel: [<c0103dbb>] error_code+0x2b/0x30
Feb 26 18:47:41 r50 kernel: [<c01d1482>] memcpy+0x12/0x30
Feb 26 18:47:41 r50 kernel: [<e0a5acf4>] replace_key+0x34/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70707>] internal_shift_left+0xa7/0xd0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70a9a>] balance_internal_when_delete+0x1ca/0x220 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7114d>] balance_internal+0x5bd/0x800 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c02f846e>] io_schedule+0xe/0x20
Feb 26 18:47:41 r50 kernel: [<c0152da9>] bh_lru_install+0x89/0xb0
Feb 26 18:47:41 r50 kernel: [<c013dc08>] mark_page_accessed+0x28/0x30
Feb 26 18:47:41 r50 kernel: [<e0a67f3b>] dc_check_balance_internal+0x37b/0x390 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5ae57>] do_balance+0xc7/0xf0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a68b1c>] fix_nodes+0x31c/0x360 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7456e>] reiserfs_cut_from_item+0x32e/0x4e0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a74a83>] reiserfs_do_truncate+0x2e3/0x540 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a73da8>] reiserfs_delete_object+0x28/0x60 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5d768>] reiserfs_delete_inode+0x68/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c013e4d9>] truncate_inode_pages+0x9/0x10
Feb 26 18:47:41 r50 kernel: [<e0a5d700>] reiserfs_delete_inode+0x0/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c01678d4>] generic_delete_inode+0xa4/0x130
Feb 26 18:47:41 r50 kernel: [<c0167add>] iput+0x4d/0x70
Feb 26 18:47:41 r50 kernel: [<c015ebfa>] sys_unlink+0xba/0x130
Feb 26 18:47:41 r50 kernel: [<c014602d>] do_munmap+0xcd/0x100
Feb 26 18:47:41 r50 kernel: [<c0102c49>] sysenter_past_esp+0x52/0x79
Feb 26 18:47:41 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 150 0x0 SD] stat data
Feb 26 18:47:41 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 149 0x0 SD] stat data
Feb 26 18:47:41 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:41 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 146 0x0 SD] stat data
Feb 26 18:47:42 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:42 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:42 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 146 0x0 SD] stat data
Feb 26 18:47:43 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:43 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:43 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 146 0x0 SD] stat data
Feb 26 18:47:44 r50 kernel: ReiserFS: warning: is_internal: free space seems wrong: level=2, nr_items=169, free_space=16 rdkey
Feb 26 18:47:44 r50 kernel: ReiserFS: hda5: warning: vs-5150: search_by_key: invalid format found in block 8212. Fsck?
Feb 26 18:47:44 r50 kernel: ReiserFS: hda5: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [147 146 0x0 SD] stat data
=== ksymoops decoded ===
ksymoops 2.4.9 on i686 2.6.11-rc4-bk2. Options used
-V (default)
-k /proc/kallsyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.11-rc4-bk2/ (default)
-m /boot/System.map-2.6.11-rc4-bk2 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Feb 26 18:47:40 r50 kernel: Unable to handle kernel paging request at virtual address 8a5e65b8
Feb 26 18:47:40 r50 kernel: c01d1482
Feb 26 18:47:40 r50 kernel: *pde = 00000000
Feb 26 18:47:40 r50 kernel: Oops: 0000 [#1]
Feb 26 18:47:40 r50 kernel: CPU: 0
Feb 26 18:47:40 r50 kernel: EIP: 0060:[<c01d1482>] Tainted: G U VLI
Using defaults from ksymoops -t elf32-i386 -a i386
Feb 26 18:47:40 r50 kernel: EFLAGS: 00010202 (2.6.11-rc4-bk2)
Feb 26 18:47:40 r50 kernel: eax: 00000004 ebx: df47b058 ecx: 00000004 edx: 00000010
Feb 26 18:47:40 r50 kernel: esi: 8a5e65b8 edi: df47b058 ebp: c35d7b34 esp: c35d7af4
Feb 26 18:47:40 r50 kernel: ds: 007b es: 007b ss: 0068
Feb 26 18:47:40 r50 kernel: Stack: dfb3c000 df451458 c35d7c94 e0a5acf4 7aaaaa5a c35d7c94 df6a9ad8 e0a70707
Feb 26 18:47:40 r50 kernel: df6a9ad8 7aaaaa5a 00000004 df451458 c35d7c94 df6a9ad8 df451458 00000005
Feb 26 18:47:41 r50 kernel: c35d7c94 c15cf424 df451458 00000004 c35d7ea8 00000027 00000001 c35d7c94
Feb 26 18:47:41 r50 kernel: Call Trace:
Feb 26 18:47:41 r50 kernel: [<e0a5acf4>] replace_key+0x34/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70707>] internal_shift_left+0xa7/0xd0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70a9a>] balance_internal_when_delete+0x1ca/0x220 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7114d>] balance_internal+0x5bd/0x800 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c02f846e>] io_schedule+0xe/0x20
Feb 26 18:47:41 r50 kernel: [<c0152da9>] bh_lru_install+0x89/0xb0
Feb 26 18:47:41 r50 kernel: [<c013dc08>] mark_page_accessed+0x28/0x30
Feb 26 18:47:41 r50 kernel: [<e0a67f3b>] dc_check_balance_internal+0x37b/0x390 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5ae57>] do_balance+0xc7/0xf0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a68b1c>] fix_nodes+0x31c/0x360 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7456e>] reiserfs_cut_from_item+0x32e/0x4e0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a74a83>] reiserfs_do_truncate+0x2e3/0x540 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a73da8>] reiserfs_delete_object+0x28/0x60 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5d768>] reiserfs_delete_inode+0x68/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c013e4d9>] truncate_inode_pages+0x9/0x10
Feb 26 18:47:41 r50 kernel: [<e0a5d700>] reiserfs_delete_inode+0x0/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c01678d4>] generic_delete_inode+0xa4/0x130
Feb 26 18:47:41 r50 kernel: [<c0167add>] iput+0x4d/0x70
Feb 26 18:47:41 r50 kernel: [<c015ebfa>] sys_unlink+0xba/0x130
Feb 26 18:47:41 r50 kernel: [<c014602d>] do_munmap+0xcd/0x100
Feb 26 18:47:41 r50 kernel: [<c0102c49>] sysenter_past_esp+0x52/0x79
Feb 26 18:47:41 r50 kernel: Code: e0 ff ff 21 e2 3b 42 18 73 06 8b 50 fd 31 c0 c3 31 d2 b8 f2 ff ff ff c3 90 57 56 53 89 c3 89 c8 89 d6 c1 e8 02 89 ca 89 df 89 c1 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 89 d8 5b 5e 5f c3
>>EIP; c01d1482 <memcpy+12/30> <=====
>>ebx; df47b058 <pg0+1f035058/3fbb8400>
>>edi; df47b058 <pg0+1f035058/3fbb8400>
>>ebp; c35d7b34 <pg0+3191b34/3fbb8400>
>>esp; c35d7af4 <pg0+3191af4/3fbb8400>
Trace; e0a5acf4 <pg0+20614cf4/3fbb8400>
Trace; e0a70707 <pg0+2062a707/3fbb8400>
Trace; e0a70a9a <pg0+2062aa9a/3fbb8400>
Trace; e0a7114d <pg0+2062b14d/3fbb8400>
Trace; c02f846e <io_schedule+e/20>
Trace; c0152da9 <bh_lru_install+89/b0>
Trace; c013dc08 <mark_page_accessed+28/30>
Trace; e0a67f3b <pg0+20621f3b/3fbb8400>
Trace; e0a5ae57 <pg0+20614e57/3fbb8400>
Trace; e0a68b1c <pg0+20622b1c/3fbb8400>
Trace; e0a7456e <pg0+2062e56e/3fbb8400>
Trace; e0a74a83 <pg0+2062ea83/3fbb8400>
Trace; e0a73da8 <pg0+2062dda8/3fbb8400>
Trace; e0a5d768 <pg0+20617768/3fbb8400>
Trace; c013e4d9 <truncate_inode_pages+9/10>
Trace; e0a5d700 <pg0+20617700/3fbb8400>
Trace; c01678d4 <generic_delete_inode+a4/130>
Trace; c0167add <iput+4d/70>
Trace; c015ebfa <sys_unlink+ba/130>
Trace; c014602d <do_munmap+cd/100>
Trace; c0102c49 <sysenter_past_esp+52/79>
This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.
Code; c01d1457 <__get_user_4+7/17>
00000000 <_EIP>:
Code; c01d1457 <__get_user_4+7/17>
0: e0 ff loopne 1 <_EIP+0x1>
Code; c01d1459 <__get_user_4+9/17>
2: ff 21 jmp *(%ecx)
Code; c01d145b <__get_user_4+b/17>
4: e2 3b loop 41 <_EIP+0x41>
Code; c01d145d <__get_user_4+d/17>
6: 42 inc %edx
Code; c01d145e <__get_user_4+e/17>
7: 18 73 06 sbb %dh,0x6(%ebx)
Code; c01d1461 <__get_user_4+11/17>
a: 8b 50 fd mov 0xfffffffd(%eax),%edx
Code; c01d1464 <__get_user_4+14/17>
d: 31 c0 xor %eax,%eax
Code; c01d1466 <__get_user_4+16/17>
f: c3 ret
Code; c01d1467 <bad_get_user+0/9>
10: 31 d2 xor %edx,%edx
Code; c01d1469 <bad_get_user+2/9>
12: b8 f2 ff ff ff mov $0xfffffff2,%eax
Code; c01d146e <bad_get_user+7/9>
17: c3 ret
Code; c01d146f <bad_get_user+8/9>
18: 90 nop
Code; c01d1470 <memcpy+0/30>
19: 57 push %edi
Code; c01d1471 <memcpy+1/30>
1a: 56 push %esi
Code; c01d1472 <memcpy+2/30>
1b: 53 push %ebx
Code; c01d1473 <memcpy+3/30>
1c: 89 c3 mov %eax,%ebx
Code; c01d1475 <memcpy+5/30>
1e: 89 c8 mov %ecx,%eax
Code; c01d1477 <memcpy+7/30>
20: 89 d6 mov %edx,%esi
Code; c01d1479 <memcpy+9/30>
22: c1 e8 02 shr $0x2,%eax
Code; c01d147c <memcpy+c/30>
25: 89 ca mov %ecx,%edx
Code; c01d147e <memcpy+e/30>
27: 89 df mov %ebx,%edi
Code; c01d1480 <memcpy+10/30>
29: 89 c1 mov %eax,%ecx
This decode from eip onwards should be reliable
Code; c01d1482 <memcpy+12/30>
00000000 <_EIP>:
Code; c01d1482 <memcpy+12/30> <=====
0: f3 a5 repz movsl %ds:(%esi),%es:(%edi) <=====
Code; c01d1484 <memcpy+14/30>
2: f6 c2 02 test $0x2,%dl
Code; c01d1487 <memcpy+17/30>
5: 74 02 je 9 <_EIP+0x9>
Code; c01d1489 <memcpy+19/30>
7: 66 a5 movsw %ds:(%esi),%es:(%edi)
Code; c01d148b <memcpy+1b/30>
9: f6 c2 01 test $0x1,%dl
Code; c01d148e <memcpy+1e/30>
c: 74 01 je f <_EIP+0xf>
Code; c01d1490 <memcpy+20/30>
e: a4 movsb %ds:(%esi),%es:(%edi)
Code; c01d1491 <memcpy+21/30>
f: 89 d8 mov %ebx,%eax
Code; c01d1493 <memcpy+23/30>
11: 5b pop %ebx
Code; c01d1494 <memcpy+24/30>
12: 5e pop %esi
Code; c01d1495 <memcpy+25/30>
13: 5f pop %edi
Code; c01d1496 <memcpy+26/30>
14: c3 ret
Feb 26 18:47:41 r50 kernel: [<c011c0ae>] do_exit+0x2de/0x2f0
Feb 26 18:47:41 r50 kernel: [<c010448e>] die+0x12e/0x130
Feb 26 18:47:41 r50 kernel: [<c0114e05>] do_page_fault+0x395/0x588
Feb 26 18:47:41 r50 kernel: [<c01d1482>] memcpy+0x12/0x30
Feb 26 18:47:41 r50 kernel: [<e0a765f4>] get_cnode+0x14/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7a2a2>] journal_mark_dirty+0x102/0x230 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6e473>] leaf_copy_items_entirely+0x1d3/0x240 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6eb43>] leaf_copy_items+0xf3/0x110 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a6ed52>] leaf_move_items+0x62/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c0114a70>] do_page_fault+0x0/0x588
Feb 26 18:47:41 r50 kernel: [<c0103dbb>] error_code+0x2b/0x30
Feb 26 18:47:41 r50 kernel: [<c01d1482>] memcpy+0x12/0x30
Feb 26 18:47:41 r50 kernel: [<e0a5acf4>] replace_key+0x34/0x70 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70707>] internal_shift_left+0xa7/0xd0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a70a9a>] balance_internal_when_delete+0x1ca/0x220 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7114d>] balance_internal+0x5bd/0x800 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c02f846e>] io_schedule+0xe/0x20
Feb 26 18:47:41 r50 kernel: [<c0152da9>] bh_lru_install+0x89/0xb0
Feb 26 18:47:41 r50 kernel: [<c013dc08>] mark_page_accessed+0x28/0x30
Feb 26 18:47:41 r50 kernel: [<e0a67f3b>] dc_check_balance_internal+0x37b/0x390 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5ae57>] do_balance+0xc7/0xf0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a68b1c>] fix_nodes+0x31c/0x360 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a7456e>] reiserfs_cut_from_item+0x32e/0x4e0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a74a83>] reiserfs_do_truncate+0x2e3/0x540 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a73da8>] reiserfs_delete_object+0x28/0x60 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<e0a5d768>] reiserfs_delete_inode+0x68/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c013e4d9>] truncate_inode_pages+0x9/0x10
Feb 26 18:47:41 r50 kernel: [<e0a5d700>] reiserfs_delete_inode+0x0/0xe0 [reiserfs]
Feb 26 18:47:41 r50 kernel: [<c01678d4>] generic_delete_inode+0xa4/0x130
Feb 26 18:47:41 r50 kernel: [<c0167add>] iput+0x4d/0x70
Feb 26 18:47:41 r50 kernel: [<c015ebfa>] sys_unlink+0xba/0x130
Feb 26 18:47:41 r50 kernel: [<c014602d>] do_munmap+0xcd/0x100
Feb 26 18:47:41 r50 kernel: [<c0102c49>] sysenter_past_esp+0x52/0x79
2 warnings issued. Results may not be reliable.
On Sun, Feb 27, 2005 at 12:40:53AM +0100, Andries Brouwer wrote:
> (Concerning the "size" version: it occurred to me that there is one
> very minor objection: For extended partitions so far the size did
> not normally play a role. Only the starting sector was significant.
> If, at some moment we decide also to check the size, then a weaker
> check, namely only checking for non-extended partitions, might be
> better at first.)
I recently encountered a disk that had clipping enabled. If you go
for the size implementation, take care that people can still run a
program to unclip the disk after the disk has been detected and the
partition rejected.
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
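A rough sketch of the weaker size check discussed above, assuming that
"checking the size" means checking that an entry fits within the capacity
the disk currently reports. The helper name entry_acceptable() and its
parameters are hypothetical and are not taken from fs/partitions/msdos.c.
All values are in sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: decide whether a DOS partition
 * table entry is acceptable for a disk reporting `capacity` sectors.
 */
static bool entry_acceptable(uint64_t start, uint64_t size,
                             uint64_t capacity, bool is_extended)
{
    if (start >= capacity)
        return false;               /* entry starts beyond the disk */
    if (is_extended)
        return true;                /* weaker check: size is ignored */
    return size <= capacity - start;    /* must fit entirely */
}
```

On a clipped disk the reported capacity is too small, so a check like this
would reject valid entries. Whatever calls it would have to be re-runnable
once the disk has been unclipped and reports its real capacity.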
Andries Brouwer wrote:
> In other words, we need the user space command `partition',
> where "partition -t dos /dev/sda" reads a DOS-type partition
> table.
So if you, e.g., hotplug a new device, its partitions won't be
accessible before you (or some hotplug manager, etc.) run
"partition"?
> The two variants are: (i) partition tells the kernel
> to do the partition table reading, and (ii) partition uses partx
> to read the partition table and tells the kernel one-by-one
> about the partitions found this way.
I guess, once you've reached the point where the kernel is
unable to find partitions without user-space help, you may
as well do everything in user space.
> Since this is a fundamental change,
Pretty much, yes. Except for a few embedded systems (*), this
would mark the end of kernels that can do anything useful
without initrd or initramfs.
(*) Oh, regarding the other exception, ceterum censeo nfsroot
esse delendam.
> a long transition period
> is needed, and that period could start with a kernel boot parameter
> telling the kernel not to do partition table parsing on a particular
> disk, or a particular type of disks, or all disks.
... and allow "partition" to override partitions previously
auto-detected by the kernel. That way, you can phase in
"partition" without needing to change your kernel setup.
Besides, the ability to correct past mistakes would also be
useful if auto-detection from user space yields garbage.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/
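To illustrate what "doing everything in user space" might involve for the
DOS case, here is a minimal sketch that scans the four primary slots of an
already-read first sector. It assumes the usual MBR layout (table at byte
446, 16-byte entries, 0x55 0xAA signature) and skips type 0 and zero-size
entries, as the band-aid patch does. The names struct dos_part and
parse_dos_table() are made up for this example, and extended partitions
are not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dos_part {           /* hypothetical result type */
    int      slot;          /* 1..4 */
    uint8_t  type;          /* system indicator byte */
    uint32_t start;         /* first sector (LBA) */
    uint32_t size;          /* length in sectors */
};

/* Read a little-endian 32-bit value from an unaligned buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Returns the number of entries stored in out[] (at most 4),
 * or -1 if the sector lacks the 0x55 0xAA signature.
 */
static int parse_dos_table(const uint8_t sector[512], struct dos_part out[4])
{
    int n = 0;

    assert(sector != NULL && out != NULL);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    for (int slot = 1; slot <= 4; slot++) {
        const uint8_t *e = sector + 446 + 16 * (slot - 1);
        uint8_t type = e[4];
        uint32_t size = le32(e + 12);

        if (type == 0 || size == 0)
            continue;       /* unused slot, or type 0 as in the patch */

        out[n].slot  = slot;
        out[n].type  = type;
        out[n].start = le32(e + 8);
        out[n].size  = size;
        n++;
    }
    return n;
}
```

A caller would read the first 512 bytes of the device into a buffer and
pass it in. Start and size come back in sectors exactly as stored in the
table.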
On Sat, 19 Mar 2005, Werner Almesberger wrote:
> Andries Brouwer wrote:
>> The two variants are: (i) partition tells the kernel
>> to do the partition table reading, and (ii) partition uses partx
>> to read the partition table and tells the kernel one-by-one
>> about the partitions found this way.
>
> I guess, once you've reached the point where the kernel is
> unable to find partitions without user-space help, you may
> as well do everything in user space.
I agree. This is a userspace job. It can be done very easily using
device-mapper. I think EVMS does something similar.
Some time ago I even asked on LKML about an option to disable the kernel
partition driver (perhaps only for some devices) from the kernel command
line, to allow other tools to do the job (because now I have an unusable
/dev/sda1 and a usable /dev/evms/sda1, and this leads to stupid mistakes).
But there were no replies.
Grzegorz Kulewski
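For variant (ii) quoted above, where user space tells the kernel about
partitions one by one, the existing BLKPG ioctl from <linux/blkpg.h> is one
way to do it. The following is only a sketch: the helper names fill_blkpg(),
add_partition() and del_partition() are invented here, start and length are
given in bytes, and fd is assumed to be an open descriptor for the whole
disk.

```c
#include <linux/blkpg.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill in a BLKPG request; start and length are in bytes. */
static void fill_blkpg(struct blkpg_ioctl_arg *arg,
                       struct blkpg_partition *part,
                       int op, int pno, long long start, long long length)
{
    memset(part, 0, sizeof(*part));
    part->start  = start;
    part->length = length;
    part->pno    = pno;

    memset(arg, 0, sizeof(*arg));
    arg->op      = op;
    arg->datalen = sizeof(*part);
    arg->data    = part;
}

/* Tell the kernel about one partition; returns the ioctl result. */
static int add_partition(int fd, int pno, long long start, long long length)
{
    struct blkpg_ioctl_arg arg;
    struct blkpg_partition part;

    fill_blkpg(&arg, &part, BLKPG_ADD_PARTITION, pno, start, length);
    return ioctl(fd, BLKPG, &arg);
}

/* Drop a partition the kernel currently knows about. */
static int del_partition(int fd, int pno)
{
    struct blkpg_ioctl_arg arg;
    struct blkpg_partition part;

    fill_blkpg(&arg, &part, BLKPG_DEL_PARTITION, pno, 0, 0);
    return ioctl(fd, BLKPG, &arg);
}
```

Overriding an auto-detected partition, as suggested earlier in the thread,
would then amount to del_partition() followed by add_partition() with the
corrected values. Both calls fail with -1 if the descriptor is invalid or
the kernel refuses the request.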