2009-06-16 19:25:57

by Jesse Barnes

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Thu, 28 May 2009 02:14:46 -0700 (PDT)
Martin Knoblauch <[email protected]> wrote:
> > I expect the duplicate comes from a left-over mount in initramfs
> > which isn't a duplicate in the sense of a bug in vfs or mount or
> > anything. I guess, it is just still mounted in the initial kernel
> > rootfs, below the root from the disk. It could be that a umount
> > from initramfs did go wrong because of a changed timing.
> >
>
> This is what I suspect as well. I know for sure that the first
> sysfs-line in /proc/mounts
>
> | none /sys sysfs rw 0 0
>
> is already there (2.6.29-rc1 and up) when entering startup-skripts.
> It is supposed to be unmounted before, but something seems to prevent
> it. I have idea how to capture debug output from the initrd/init
> script :-(

What's the latest here Martin? It sounded like this was a userspace
issue, with something reading the VPD over and over? Or was it just a
longer timeout that caused a specific driver to slow everything down?

--
Jesse Barnes, Intel Open Source Technology Center


2009-06-17 08:36:11

by Martin Knoblauch

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow


----- Original Message ----

> From: Jesse Barnes <[email protected]>
> To: Martin Knoblauch <[email protected]>
> Cc: Kay Sievers <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Sent: Tuesday, June 16, 2009 9:25:47 PM
> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
>
> On Thu, 28 May 2009 02:14:46 -0700 (PDT)
> Martin Knoblauch wrote:
> > > I expect the duplicate comes from a left-over mount in initramfs
> > > which isn't a duplicate in the sense of a bug in vfs or mount or
> > > anything. I guess, it is just still mounted in the initial kernel
> > > rootfs, below the root from the disk. It could be that a umount
> > > from initramfs did go wrong because of a changed timing.
> > >
> >
> > This is what I suspect as well. I know for sure that the first
> > sysfs-line in /proc/mounts
> >
> > | none /sys sysfs rw 0 0
> >
> > is already there (2.6.29-rc1 and up) when entering startup-skripts.
> > It is supposed to be unmounted before, but something seems to prevent
> > it. I have idea how to capture debug output from the initrd/init
> > script :-(
>
> What's the latest here Martin? It sounded like this was a userspace
> issue, with something reading the VPD over and over? Or was it just a
> longer timeout that caused a specific driver to slow everything down?
>

Not sure about the VPD thing. Anyway, no real news. Still happens in 2.6.30. But it only happens on a certain HW platform (HP/DL380G4). The folks at HP try to reproduce in their environment.

Cheers
Martin

2009-06-20 16:37:46

by jim owens

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

Martin Knoblauch wrote:
> ----- Original Message ----
>
>> From: Jesse Barnes <[email protected]>
>> To: Martin Knoblauch <[email protected]>
>> Cc: Kay Sievers <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
>> Sent: Tuesday, June 16, 2009 9:25:47 PM
>> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
>>
>> On Thu, 28 May 2009 02:14:46 -0700 (PDT)
>> Martin Knoblauch wrote:
>>>> I expect the duplicate comes from a left-over mount in initramfs
>>>> which isn't a duplicate in the sense of a bug in vfs or mount or
>>>> anything. I guess, it is just still mounted in the initial kernel
>>>> rootfs, below the root from the disk. It could be that a umount
>>>> from initramfs did go wrong because of a changed timing.
>>>>
>>> This is what I suspect as well. I know for sure that the first
>>> sysfs-line in /proc/mounts
>>>
>>> | none /sys sysfs rw 0 0
>>>
>>> is already there (2.6.29-rc1 and up) when entering startup-skripts.
>>> It is supposed to be unmounted before, but something seems to prevent
>>> it. I have idea how to capture debug output from the initrd/init
>>> script :-(
>> What's the latest here Martin? It sounded like this was a userspace
>> issue, with something reading the VPD over and over? Or was it just a
>> longer timeout that caused a specific driver to slow everything down?
>>
>
> Not sure about the VPD thing. Anyway, no real news. Still happens in 2.6.30. But it only happens on a certain HW platform (HP/DL380G4). The folks at HP try to reproduce in their environment.
>
> Cheers
> Martin

I reproduced this and verified Martin's analysis. Conclusions:

- >>> | none /sys sysfs rw 0 0

is because the initrd "umount /sys" fails with EBUSY

|commit 1120f8b8169fb2cb51219d326892d963e762edb6
|Author: Stephen Hemminger <[email protected]>
|Date: Thu Dec 18 09:17:16 2008 -0800
|
| PCI: handle long delays in VPD access

does not have a bug. The longer timeout makes the problem visible.

/sys is busy because udev is trying to read the vpd and the
cciss pci device always fails the vpd with ETIMEOUT. If all
timeouts are before or after the umount, no firmware load problem.

IMO there is either a vpd read bug on this platform or it is
unsupported and ETIMEOUT is the wrong error.

... now I punt this to the HP platform/driver people.

jim

2009-06-20 16:59:15

by Matthew Wilcox

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Sat, Jun 20, 2009 at 12:37:34PM -0400, jim owens wrote:
> Martin Knoblauch wrote:
>> ----- Original Message ----
>>
>>> From: Jesse Barnes <[email protected]>
>>> To: Martin Knoblauch <[email protected]>
>>> Cc: Kay Sievers <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
>>> Sent: Tuesday, June 16, 2009 9:25:47 PM
>>> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
>> Not sure about the VPD thing. Anyway, no real news. Still happens in 2.6.30. But it only happens on a certain HW platform (HP/DL380G4). The folks at HP try to reproduce in their environment.
>
> I reproduced this and verified Martin's analysis. Conclusions:
>
> - >>> | none /sys sysfs rw 0 0
>
> is because the initrd "umount /sys" fails with EBUSY
>
> |commit 1120f8b8169fb2cb51219d326892d963e762edb6
> |Author: Stephen Hemminger <[email protected]>
> |Date: Thu Dec 18 09:17:16 2008 -0800
> |
> | PCI: handle long delays in VPD access
>
> does not have a bug. The longer timeout makes the problem visible.
>
> /sys is busy because udev is trying to read the vpd and the
> cciss pci device always fails the vpd with ETIMEOUT. If all
> timeouts are before or after the umount, no firmware load problem.
>
> IMO there is either a vpd read bug on this platform or it is
> unsupported and ETIMEOUT is the wrong error.
>
> ... now I punt this to the HP platform/driver people.

OK, let's add the PCI list in on this too ...

There are a bunch of PCI devices with buggy VPD. I've got one (it's a prototype card, I think, so I shan't name the vendor). lspci reports:

Capabilities: [84] Vital Product Data
Unknown small resource type 00
[last line repeated about 32768 times]

and it takes for-freaking-ever. I posted a patch to pciutils on May
13th to fix this:

---

I have several cards which report more-or-less garbage in their VPD.
It can take an extraordinarily long time to read all their VPD and none
of it is of interest. Instead, if we find an unknown resource type,
just stop trying to read any more.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/ls-vpd.c b/ls-vpd.c
index f50d7a4..1ba917f 100644
--- a/ls-vpd.c
+++ b/ls-vpd.c
@@ -204,7 +204,7 @@ cap_vpd(struct device *d)
default:
printf("\t\tUnknown %s resource type %02x\n",
(tag & 0x80) ? "large" : "small", tag & ~0x80);
- break;
+ return;
}

res_addr += res_len;

-----

but I don't know what udev is doing. The udev source doesn't seem to
read PCI vpd itself:

udev-0.141$ find -type f |xargs grep -il vpd
./extras/volume_id/lib/adaptec_raid.c
./extras/scsi_id/scsi_id.8
./extras/scsi_id/scsi.h
./extras/scsi_id/scsi_id.config
./extras/scsi_id/scsi_serial.c

so there must be some script that it's invoking which is doing that.
Anyone familiar with udev?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-06-20 18:21:51

by Kay Sievers

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Sat, Jun 20, 2009 at 18:58, Matthew Wilcox<[email protected]> wrote:

> but I don't know what udev is doing.  The udev source doesn't seem to
> read PCI vpd itself:
>
> udev-0.141$ find -type f |xargs grep -il vpd
> ./extras/volume_id/lib/adaptec_raid.c
> ./extras/scsi_id/scsi_id.8
> ./extras/scsi_id/scsi.h
> ./extras/scsi_id/scsi_id.config
> ./extras/scsi_id/scsi_serial.c
>
> so there must be some script that it's invoking which is doing that.
> Anyone familiar with udev?

scsi_id is usually also called for cciss devices:
KERNEL=="cciss*", ..., IMPORT{program}="scsi_id ...

Kay

2009-06-20 18:26:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Sat, Jun 20, 2009 at 08:19:43PM +0200, Kay Sievers wrote:
> On Sat, Jun 20, 2009 at 18:58, Matthew Wilcox<[email protected]> wrote:
>
> > but I don't know what udev is doing. ??The udev source doesn't seem to
> > read PCI vpd itself:
> >
> > udev-0.141$ find -type f |xargs grep -il vpd
> > ./extras/volume_id/lib/adaptec_raid.c
> > ./extras/scsi_id/scsi_id.8
> > ./extras/scsi_id/scsi.h
> > ./extras/scsi_id/scsi_id.config
> > ./extras/scsi_id/scsi_serial.c
> >
> > so there must be some script that it's invoking which is doing that.
> > Anyone familiar with udev?
>
> scsi_id is usually also called for cciss devices:
> KERNEL=="cciss*", ..., IMPORT{program}="scsi_id ...

yes, but it's getting SCSI VPD (by asking for mode pages from the SCSI
device). This problem is with PCI VPD which is totally different.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-06-20 18:36:48

by Kay Sievers

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Sat, Jun 20, 2009 at 20:26, Matthew Wilcox<[email protected]> wrote:
> On Sat, Jun 20, 2009 at 08:19:43PM +0200, Kay Sievers wrote:
>> On Sat, Jun 20, 2009 at 18:58, Matthew Wilcox<[email protected]> wrote:
>>
>> > but I don't know what udev is doing. ??The udev source doesn't seem to
>> > read PCI vpd itself:
>> >
>> > udev-0.141$ find -type f |xargs grep -il vpd
>> > ./extras/volume_id/lib/adaptec_raid.c
>> > ./extras/scsi_id/scsi_id.8
>> > ./extras/scsi_id/scsi.h
>> > ./extras/scsi_id/scsi_id.config
>> > ./extras/scsi_id/scsi_serial.c
>> >
>> > so there must be some script that it's invoking which is doing that.
>> > Anyone familiar with udev?
>>
>> scsi_id is usually also called for cciss devices:
>>   KERNEL=="cciss*", ..., IMPORT{program}="scsi_id ...
>
> yes, but it's getting SCSI VPD (by asking for mode pages from the SCSI
> device).  This problem is with PCI VPD which is totally different.

Ah, I see. There is no tool around udev, I know of, which does this.

Maybe someone is still using the broken-by-design libsysfs, which
opens _every_ file it can find in /sys, even when not asked for
anything specific.

Thanks,
Kay

2009-06-20 19:06:38

by Matthew Wilcox

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

On Sat, Jun 20, 2009 at 08:36:25PM +0200, Kay Sievers wrote:
> Ah, I see. There is no tool around udev, I know of, which does this.
>
> Maybe someone is still using the broken-by-design libsysfs, which
> opens _every_ file it can find in /sys, even when not asked for
> anything specific.

I did consider that option, but dismissed it ... I didn't think anyone
was using something that old. Martin, could you take a look at what
scripts you're running?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-06-20 21:17:19

by jim owens

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

Matthew Wilcox wrote:
> On Sat, Jun 20, 2009 at 08:36:25PM +0200, Kay Sievers wrote:
>> Ah, I see. There is no tool around udev, I know of, which does this.
>>
>> Maybe someone is still using the broken-by-design libsysfs, which
>> opens _every_ file it can find in /sys, even when not asked for
>> anything specific.
>
> I did consider that option, but dismissed it ... I didn't think anyone
> was using something that old.

My test is a fresh RHEL 4.3 install with 2.6.30-rc8. Whatever
scripts are running are stock redhat and I'm still in initrd
so what would I be looking for?

The initrd timeouts all seem to be triggered by the insmod and
only 1 more vpd read happens in init 3 at the end after HAL starts.

This is the BUG() I set to ID (maybe incorrectly)
that the vpd reads were comming from udev:

[ 5.260664] ------------[ cut here ]------------
[ 5.260667] kernel BUG at
/root/linux-2.6.30-rc8/drivers/pci/access.c:207!
[ 5.260671] invalid opcode: 0000 [#1] SMP
[ 5.260675] last sysfs file:
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.2/0000:04:03.0/msi_bus
[ 5.260678] CPU 1
[ 5.260681] Modules linked in: cciss(+) sd_mod scsi_mod
[ 5.260689] Pid: 476, comm: udev Not tainted 2.6.30-rc8 #33 ProLiant
DL380 G4
[ 5.260692] RIP: 0010:[<ffffffff803b282f>] [<ffffffff803b282f>]
pci_vpd_pci22_wait+0xef/0x100
[ 5.260703] RSP: 0018:ffff8801199e7dd8 EFLAGS: 00010246
[ 5.260706] RAX: 0000000000000000 RBX: ffff88011ad60440 RCX:
0000000000001001
[ 5.260709] RDX: 0000000000007c7b RSI: 0000000000000001 RDI:
ffffffff8072c160
[ 5.260712] RBP: ffff8801199e7e08 R08: 0000000000000001 R09:
0000000000000001
[ 5.260715] R10: 0000000000000000 R11: 0000000000000006 R12:
ffff88011ac8d800
[ 5.260718] R13: 00000000fffee028 R14: ffff88011ac8d800 R15:
ffff88011992e000
[ 5.260722] FS: 0000000000000000(0000) GS:ffff88002edd9000(0000)
knlGS:0000000000000000
[ 5.260725] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5.260728] CR2: 00000000022562b8 CR3: 00000001198ef000 CR4:
00000000000006e0
[ 5.260731] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 5.260734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 5.260738] Process udev (pid: 476, threadinfo ffff8801199e6000, task
ffff88011b78c3c0)
[ 5.260740] Stack:
[ 5.260742] ffff8801199e7e08 0000ffff803b268d 0000000000001000
0000000000000000
[ 5.260746] 0000000000000000 ffff88011ad60440 ffff8801199e7e68
ffffffff803b292c
[ 5.260752] ffffffff803455f6 ffff88011ad60458 0000000000001000
0000000000001000
[ 5.260757] Call Trace:
[ 5.260760] [<ffffffff803b292c>] pci_vpd_pci22_read+0xec/0x170
[ 5.260765] [<ffffffff803455f6>] ? read+0x96/0x1b0
[ 5.260771] [<ffffffff803b22a9>] pci_read_vpd+0x29/0x40
[ 5.260775] [<ffffffff803bab06>] read_vpd_attr+0x46/0x50
[ 5.260780] [<ffffffff803456fb>] read+0x19b/0x1b0
[ 5.260784] [<ffffffff80363566>] ? security_file_permission+0x16/0x20
[ 5.260790] [<ffffffff802e3f5d>] vfs_read+0xbd/0x190
[ 5.260797] [<ffffffff802e4365>] sys_read+0x55/0x90
[ 5.260801] [<ffffffff8020b71b>] system_call_fastpath+0x16/0x1b
[ 5.260807] Code: c9 c3 8b 35 74 33 fe 00 31 c0 4c 89 e2 48 c7 c7 ed
89 6a 80 e8 b3 3d e9 ff 8b 05 5d 33 fe 00 ff c0 89 05 55 33 fe 00 ff c8
75 04 <0f> 0b eb fe b8 92 ff ff ff e9 35 ff ff ff 66 66 90 55 48 89 e5
[ 5.260845] RIP [<ffffffff803b282f>] pci_vpd_pci22_wait+0xef/0x100
[ 5.260849] RSP <ffff8801199e7dd8>
[ 5.260854] ---[ end trace 37f1683404a28980 ]---


here is a single user boot with a timeout printk() and a hack
in mkinitrd to sleep before and after the insmod cciss:

[ 3.858278] NET: Registered protocol family 17
[ 3.864293] Freeing unused kernel memory: 2416k freed
Red Hat nash version 4.2.1.6 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
[ 3.876470] udev used greatest stack depth: 5632 bytes left
Loading scsi_mod.ko module
[ 3.977087] SCSI subsystem initialized
Loading sd_mod.ko module
[ 3.987672] Driver 'sd' needs updating - please use bus_type methods
[ 4.280095] input: AT Translated Set 2 keyboard as /class/input/input0
[ 4.626432] input: PS/2 Generic Mouse as /class/input/input1
JIM sleep 1
Loading cciss.ko module
[ 4.999782] HP CISS Driver (v 3.6.20)
[ 5.003771] cciss 0000:04:03.0: PCI INT A -> GSI 51 (level, low) ->
IRQ 51
[ 5.052021] IRQ 51/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
[ 5.059001] cciss0: <0x46> at PCI 0000:04:03.0 IRQ 51 using DAC
[ 5.065800] blocks= 71122560 block_size= 512
[ 5.070602] heads=255, sectors=32, cylinders=8716
[ 5.070603]
[ 5.077428] blocks= 71122560 block_size= 512
[ 5.082151] heads=255, sectors=32, cylinders=8716
[ 5.082153]
[ 5.089413] cciss/c0d0: p1
[ 5.131820] blocks= 71122560 block_size= 512
[ 5.139627] heads=255, sectors=32, cylinders=8716
[ 5.139629]
[ 5.146117] blocks= 71122560 block_size= 512
[ 5.150887] heads=255, sectors=32, cylinders=8716
[ 5.150889]
[ 5.157577] cciss/c0d1: p1
[ 5.160261] TIMEOUT 1 ffff88011ad54800
[ 5.164164] p2 p3
[ 5.208880] blocks= 142253280 block_size= 512
[ 5.215732] heads=255, sectors=32, cylinders=17433
[ 5.215734]
[ 5.223191] blocks= 142253280 block_size= 512
[ 5.227965]
[ 5.227966] TIMEOUT 2 ffff88011ad54800
[ 5.232942] heads=255, sectors=32, cylinders=17433
[ 5.232944]
[ 5.239516] cciss/c0d2: p1 p2 p3
[ 5.292263]
[ 5.292265] TIMEOUT 3 ffff88011ad54800
[ 5.356262]
[ 5.356263] TIMEOUT 4 ffff88011ad54800
[ 5.420266]
[ 5.420267] TIMEOUT 5 ffff88011ad54800
[ 5.484855]
[ 5.484857] TIMEOUT 6 ffff88011ad54800
[ 5.491980] blocks= 71122560 block_size= 512
[ 5.496991] heads=255, sectors=32, cylinders=8716
[ 5.496993]
[ 5.503484] blocks= 71122560 block_size= 512
[ 5.508761] heads=255, sectors=32, cylinders=8716
[ 5.508763]
[ 5.515729] cciss/c0d3: p1
[ 5.548263]
[ 5.548264] TIMEOUT 7 ffff88011ad54800
[ 5.610138] work_for_cpu used greatest stack depth: 4592 bytes left
[ 5.612264]
[ 5.612266] TIMEOUT 8 ffff88011ad54800
[ 5.674223]
[ 5.674224] TIMEOUT 9 ffff88011ad54800
[ 5.736261]
[ 5.736263] TIMEOUT 10 ffff88011ad54800
[ 5.800261]
[ 5.800263] TIMEOUT 11 ffff88011ad54800
[ 5.864259]
[ 5.864260] TIMEOUT 12 ffff88011ad54800
[ 5.928259]
[ 5.928260] TIMEOUT 13 ffff88011ad54800
[ 5.992011]
[ 5.992012] TIMEOUT 14 ffff88011ad54800
[ 6.056011]
[ 6.056012] TIMEOUT 15 ffff88011ad54800
[ 6.120011]
[ 6.120013] TIMEOUT 16 ffff88011ad54800
[ 6.184011]
[ 6.184012] TIMEOUT 17 ffff88011ad54800
[ 6.248011]
[ 6.248012] TIMEOUT 18 ffff88011ad54800
[ 6.312011]
[ 6.312012] TIMEOUT 19 ffff88011ad54800
[ 6.376011]
[ 6.376013] TIMEOUT 20 ffff88011ad54800
JIM sleep 1
Loading jbd.ko module
Loading ext3.ko module
Creating root device
Mounting root filesystem
[ 6.760540] kjournald starting. Commit interval 5 seconds
[ 6.765559] EXT3-fs: mounted filesystem with writeback data mode.
Switching to new root
[ 6.946395] SELinux: Disabled at runtime.
[ 6.950840] type=1404 audit(1245517131.950:2): selinux=0
auid=4294967295 ses=4294967295
INIT: version 2.85 booting
[ 7.235810] uname used greatest stack depth: 4296 bytes left
[ 7.304350] awk used greatest stack depth: 3688 bytes left
Welcome to Red Hat Enterprise Linux AS
Press 'I' to enter interactive startup.
Starting udev: [ OK ]
Initializing hardware... storage network audio done[ OK ]
Configuring kernel parameters: [ OK ]
Setting clock (utc): Sat Jun 20 12:59:02 EDT 2009 [ OK ]
Setting hostname dl380.lnx.usa.hp.com: [ OK ]
Checking root filesystem
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/cciss/c0d1p3
/1: clean, 348377/4177920 files, 2477319/8351791 blocks
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
No devices found
no block devices found
Setting up Logical Volume Management: [ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/cciss/c0d1p1
/boot12: clean, 46/26104 files, 26042/104388 blocks
[ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling swap space: [ OK ]
sh-3.00#

2009-06-21 10:54:29

by Martin Knoblauch

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow


----- Original Message ----

> From: jim owens <[email protected]>
> To: Martin Knoblauch <[email protected]>
> Cc: Jesse Barnes <[email protected]>; Kay Sievers <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Sent: Saturday, June 20, 2009 6:37:34 PM
> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
>
> Martin Knoblauch wrote:
> > ----- Original Message ----
> >
> >> From: Jesse Barnes
> >> To: Martin Knoblauch
> >> Cc: Kay Sievers ; Andrew Morton
> ; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> >> Sent: Tuesday, June 16, 2009 9:25:47 PM
> >> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
> >>
> >> On Thu, 28 May 2009 02:14:46 -0700 (PDT)
> >> Martin Knoblauch wrote:
> >>>> I expect the duplicate comes from a left-over mount in initramfs
> >>>> which isn't a duplicate in the sense of a bug in vfs or mount or
> >>>> anything. I guess, it is just still mounted in the initial kernel
> >>>> rootfs, below the root from the disk. It could be that a umount
> >>>> from initramfs did go wrong because of a changed timing.
> >>>>
> >>> This is what I suspect as well. I know for sure that the first
> >>> sysfs-line in /proc/mounts
> >>>
> >>> | none /sys sysfs rw 0 0
> >>>
> >>> is already there (2.6.29-rc1 and up) when entering startup-skripts.
> >>> It is supposed to be unmounted before, but something seems to prevent
> >>> it. I have idea how to capture debug output from the initrd/init
> >>> script :-(
> >> What's the latest here Martin? It sounded like this was a userspace
> >> issue, with something reading the VPD over and over? Or was it just a
> >> longer timeout that caused a specific driver to slow everything down?
> >>
> >
> > Not sure about the VPD thing. Anyway, no real news. Still happens in 2.6.30.
> But it only happens on a certain HW platform (HP/DL380G4). The folks at HP try
> to reproduce in their environment.
> >
> > Cheers
> > Martin
>
> I reproduced this and verified Martin's analysis. Conclusions:
>

Cool, so I am not seeing gremlins :-)

Cheers
Martin

2009-06-21 10:58:02

by Martin Knoblauch

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow


----- Original Message ----

> From: jim owens <[email protected]>
> To: Matthew Wilcox <[email protected]>
> Cc: Kay Sievers <[email protected]>; Martin Knoblauch <[email protected]>; Jesse Barnes <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Sent: Saturday, June 20, 2009 11:17:06 PM
> Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow
>
> Matthew Wilcox wrote:
> > On Sat, Jun 20, 2009 at 08:36:25PM +0200, Kay Sievers wrote:
> >> Ah, I see. There is no tool around udev, I know of, which does this.
> >>
> >> Maybe someone is still using the broken-by-design libsysfs, which
> >> opens _every_ file it can find in /sys, even when not asked for
> >> anything specific.
> >
> > I did consider that option, but dismissed it ... I didn't think anyone
> > was using something that old.
>
> My test is a fresh RHEL 4.3 install with 2.6.30-rc8. Whatever
> scripts are running are stock redhat and I'm still in initrd
> so what would I be looking for?
>

same here. Anything in initrd is stock RedHat stuff, except the kernel modules

> The initrd timeouts all seem to be triggered by the insmod and
> only 1 more vpd read happens in init 3 at the end after HAL starts.
>
> This is the BUG() I set to ID (maybe incorrectly)
> that the vpd reads were comming from udev:
>
> [ 5.260664] ------------[ cut here ]------------
> [ 5.260667] kernel BUG at /root/linux-2.6.30-rc8/drivers/pci/access.c:207!
> [ 5.260671] invalid opcode: 0000 [#1] SMP
> [ 5.260675] last sysfs file:
> /sys/devices/pci0000:00/0000:00:02.0/0000:02:00.2/0000:04:03.0/msi_bus
> [ 5.260678] CPU 1
> [ 5.260681] Modules linked in: cciss(+) sd_mod scsi_mod
> [ 5.260689] Pid: 476, comm: udev Not tainted 2.6.30-rc8 #33 ProLiant DL380 G4
> [ 5.260692] RIP: 0010:[] []
> pci_vpd_pci22_wait+0xef/0x100
> [ 5.260703] RSP: 0018:ffff8801199e7dd8 EFLAGS: 00010246
> [ 5.260706] RAX: 0000000000000000 RBX: ffff88011ad60440 RCX: 0000000000001001
> [ 5.260709] RDX: 0000000000007c7b RSI: 0000000000000001 RDI: ffffffff8072c160
> [ 5.260712] RBP: ffff8801199e7e08 R08: 0000000000000001 R09: 0000000000000001
> [ 5.260715] R10: 0000000000000000 R11: 0000000000000006 R12: ffff88011ac8d800
> [ 5.260718] R13: 00000000fffee028 R14: ffff88011ac8d800 R15: ffff88011992e000
> [ 5.260722] FS: 0000000000000000(0000) GS:ffff88002edd9000(0000)
> knlGS:0000000000000000
> [ 5.260725] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 5.260728] CR2: 00000000022562b8 CR3: 00000001198ef000 CR4: 00000000000006e0
> [ 5.260731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 5.260734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 5.260738] Process udev (pid: 476, threadinfo ffff8801199e6000, task
> ffff88011b78c3c0)
> [ 5.260740] Stack:
> [ 5.260742] ffff8801199e7e08 0000ffff803b268d 0000000000001000
> 0000000000000000
> [ 5.260746] 0000000000000000 ffff88011ad60440 ffff8801199e7e68
> ffffffff803b292c
> [ 5.260752] ffffffff803455f6 ffff88011ad60458 0000000000001000
> 0000000000001000
> [ 5.260757] Call Trace:
> [ 5.260760] [] pci_vpd_pci22_read+0xec/0x170
> [ 5.260765] [] ? read+0x96/0x1b0
> [ 5.260771] [] pci_read_vpd+0x29/0x40
> [ 5.260775] [] read_vpd_attr+0x46/0x50
> [ 5.260780] [] read+0x19b/0x1b0
> [ 5.260784] [] ? security_file_permission+0x16/0x20
> [ 5.260790] [] vfs_read+0xbd/0x190
> [ 5.260797] [] sys_read+0x55/0x90
> [ 5.260801] [] system_call_fastpath+0x16/0x1b
> [ 5.260807] Code: c9 c3 8b 35 74 33 fe 00 31 c0 4c 89 e2 48 c7 c7 ed 89 6a 80
> e8 b3 3d e9 ff 8b 05 5d 33 fe 00 ff c0 89 05 55 33 fe 00 ff c8 75 04 <0f> 0b eb
> fe b8 92 ff ff ff e9 35 ff ff ff 66 66 90 55 48 89 e5
> [ 5.260845] RIP [] pci_vpd_pci22_wait+0xef/0x100
> [ 5.260849] RSP
> [ 5.260854] ---[ end trace 37f1683404a28980 ]---
>
>
> here is a single user boot with a timeout printk() and a hack
> in mkinitrd to sleep before and after the insmod cciss:
>

How do you get to see the debug-output from initrd? I never managed that.

> [ 3.858278] NET: Registered protocol family 17
> [ 3.864293] Freeing unused kernel memory: 2416k freed
> Red Hat nash version 4.2.1.6 starting
> Mounted /proc filesystem
> Mounting sysfs
> Creating /dev
> Starting udev
> [ 3.876470] udev used greatest stack depth: 5632 bytes left
> Loading scsi_mod.ko module
> [ 3.977087] SCSI subsystem initialized
> Loading sd_mod.ko module
> [ 3.987672] Driver 'sd' needs updating - please use bus_type methods
> [ 4.280095] input: AT Translated Set 2 keyboard as /class/input/input0
> [ 4.626432] input: PS/2 Generic Mouse as /class/input/input1
> JIM sleep 1
> Loading cciss.ko module
> [ 4.999782] HP CISS Driver (v 3.6.20)
> [ 5.003771] cciss 0000:04:03.0: PCI INT A -> GSI 51 (level, low) -> IRQ 51
> [ 5.052021] IRQ 51/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
> [ 5.059001] cciss0: <0x46> at PCI 0000:04:03.0 IRQ 51 using DAC
> [ 5.065800] blocks= 71122560 block_size= 512
> [ 5.070602] heads=255, sectors=32, cylinders=8716
> [ 5.070603]
> [ 5.077428] blocks= 71122560 block_size= 512
> [ 5.082151] heads=255, sectors=32, cylinders=8716
> [ 5.082153]
> [ 5.089413] cciss/c0d0: p1
> [ 5.131820] blocks= 71122560 block_size= 512
> [ 5.139627] heads=255, sectors=32, cylinders=8716
> [ 5.139629]
> [ 5.146117] blocks= 71122560 block_size= 512
> [ 5.150887] heads=255, sectors=32, cylinders=8716
> [ 5.150889]
> [ 5.157577] cciss/c0d1: p1
> [ 5.160261] TIMEOUT 1 ffff88011ad54800
> [ 5.164164] p2 p3
> [ 5.208880] blocks= 142253280 block_size= 512
> [ 5.215732] heads=255, sectors=32, cylinders=17433
> [ 5.215734]
> [ 5.223191] blocks= 142253280 block_size= 512
> [ 5.227965]
> [ 5.227966] TIMEOUT 2 ffff88011ad54800
> [ 5.232942] heads=255, sectors=32, cylinders=17433
> [ 5.232944]
> [ 5.239516] cciss/c0d2: p1 p2 p3
> [ 5.292263]
> [ 5.292265] TIMEOUT 3 ffff88011ad54800
> [ 5.356262]
> [ 5.356263] TIMEOUT 4 ffff88011ad54800
> [ 5.420266]
> [ 5.420267] TIMEOUT 5 ffff88011ad54800
> [ 5.484855]
> [ 5.484857] TIMEOUT 6 ffff88011ad54800
> [ 5.491980] blocks= 71122560 block_size= 512
> [ 5.496991] heads=255, sectors=32, cylinders=8716
> [ 5.496993]
> [ 5.503484] blocks= 71122560 block_size= 512
> [ 5.508761] heads=255, sectors=32, cylinders=8716
> [ 5.508763]
> [ 5.515729] cciss/c0d3: p1
> [ 5.548263]
> [ 5.548264] TIMEOUT 7 ffff88011ad54800
> [ 5.610138] work_for_cpu used greatest stack depth: 4592 bytes left
> [ 5.612264]
> [ 5.612266] TIMEOUT 8 ffff88011ad54800
> [ 5.674223]
> [ 5.674224] TIMEOUT 9 ffff88011ad54800
> [ 5.736261]
> [ 5.736263] TIMEOUT 10 ffff88011ad54800
> [ 5.800261]
> [ 5.800263] TIMEOUT 11 ffff88011ad54800
> [ 5.864259]
> [ 5.864260] TIMEOUT 12 ffff88011ad54800
> [ 5.928259]
> [ 5.928260] TIMEOUT 13 ffff88011ad54800
> [ 5.992011]
> [ 5.992012] TIMEOUT 14 ffff88011ad54800
> [ 6.056011]
> [ 6.056012] TIMEOUT 15 ffff88011ad54800
> [ 6.120011]
> [ 6.120013] TIMEOUT 16 ffff88011ad54800
> [ 6.184011]
> [ 6.184012] TIMEOUT 17 ffff88011ad54800
> [ 6.248011]
> [ 6.248012] TIMEOUT 18 ffff88011ad54800
> [ 6.312011]
> [ 6.312012] TIMEOUT 19 ffff88011ad54800
> [ 6.376011]
> [ 6.376013] TIMEOUT 20 ffff88011ad54800
> JIM sleep 1
> Loading jbd.ko module
> Loading ext3.ko module
> Creating root device
> Mounting root filesystem
> [ 6.760540] kjournald starting. Commit interval 5 seconds
> [ 6.765559] EXT3-fs: mounted filesystem with writeback data mode.
> Switching to new root
> [ 6.946395] SELinux: Disabled at runtime.
> [ 6.950840] type=1404 audit(1245517131.950:2): selinux=0 auid=4294967295
> ses=4294967295
> INIT: version 2.85 booting
> [ 7.235810] uname used greatest stack depth: 4296 bytes left
> [ 7.304350] awk used greatest stack depth: 3688 bytes left
> Welcome to Red Hat Enterprise Linux AS
> Press 'I' to enter interactive startup.
> Starting udev: [ OK ]
> Initializing hardware... storage network audio done[ OK ]
> Configuring kernel parameters: [ OK ]
> Setting clock (utc): Sat Jun 20 12:59:02 EDT 2009 [ OK ]
> Setting hostname dl380.lnx.usa.hp.com: [ OK ]
> Checking root filesystem
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/cciss/c0d1p3
> /1: clean, 348377/4177920 files, 2477319/8351791 blocks
> [ OK ]
> Remounting root filesystem in read-write mode: [ OK ]
> No devices found
> no block devices found
> Setting up Logical Volume Management: [ OK ]
> Checking filesystems
> Checking all file systems.
> [/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/cciss/c0d1p1
> /boot12: clean, 46/26104 files, 26042/104388 blocks
> [ OK ]
> Mounting local filesystems: [ OK ]
> Enabling local filesystem quotas: [ OK ]
> Enabling swap space: [ OK ]
> sh-3.00#

2009-06-21 13:50:26

by jim owens

[permalink] [raw]
Subject: Re: Analyzed/Solved/Bisected: Booting 2.6.30-rc2-git7 very slow

Martin Knoblauch wrote:
>
> How do you get to see the debug-output from initrd? I never managed that.

I did all my boot testing from ILO with VSP for a serial console.

jim