LinuxLists.cc - How does the disk buffer cache work?

2002-12-31 00:20:53

Subject: How does the disk buffer cache work?

Earlier I wrote to the list where my SS10 hung on the partition check
if a bad disk was installed.

This behavior is new to the 2.4.20 kernel. I previously ran 2.2.20 on the
machine. (the default in a Debian 3.0r0 install) I can't vouch for 2.4
kernels previous to 2.4.20.

I have traced the problem to a hang in one of the disk buffer caches.

Can anyone tell me how to correct the behavior so that I:

1. Don't break things for other parts of the kernel
2. The disk cache will return with an error for a hung disk?

Here's the tail of the console with debugging printk's inserted:

sda: Spinning up disk.......................................................................................................not responding...
sda : READ CAPACITY failed.
sda : status = 0, message = 00, host = 0, driver = 28
Current sd00:00: sense key Not Ready
Additional sense indicates Logical unit is in process of becoming ready
sda : block size assumed to be 512 bytes, disk size 1GB.
Partition check:
sda:sun.c/sun_partition: Before read_dev_sector
check.c/read_dev_sector: offset = 0
check.c/read_dev_sector: passing page filler function @ f004fd14
filemap.c/read_cache_page: enter
filemap.c/read_cache_page: before __read_cache_page
filemap.c/__read_cache_page: enter
block_dev.c/block_read_full_page: before first do
block_dev.c/block_read_full_page: before if (!nr)
block_dev.c/block_read_full_page: before stage two
block_dev.c/block_read_full_page: before starting I/O
block_dev.c/block_read_full_page: returning
filemap.c/read_cache_page: after __read_cache_page
filemap.c/read_cache_page: after mark_page_accessed

[.. the next function call in read_cache_page() is lock_page(), which we
hang forever on ..]

Can those more familiar with the buffer caches advise me on a solution?
Errors on cached devices should propagate up to higher layers. As is, the
machine hangs forever when reading sector 0 to check the partition table.

Thanks!

- Matt

2002-12-31 01:16:06

by Andrew Morton

Subject: Re: How does the disk buffer cache work?

Matthew Zahorik wrote:
>
> Earlier I wrote to the list where my SS10 hung on the partition check
> if a bad disk was installed.
>
> This behavior is new to the 2.4.20 kernel. I previously ran 2.2.20 on the
> machine. (the default in a Debian 3.0r0 install) I can't vouch for 2.4
> kernels previous to 2.4.20.
>
> I have traced the problem to a hang in one of the disk buffer caches.
>
> Can anyone tell me how to correct the behavior so that I:
>
> 1. Don't break things for other parts of the kernel
> 2. The disk cache will return with an error for a hung disk?
>
> Here's the tail of the console with debugging printk's inserted:
>
> ...
> [.. the next function call in read_cache_page() is lock_page(), which we
> hang forever on ..]

lock_page() will sleep until the page is unlocked. The page is unlocked
from end_buffer_io_sync(), which is called from within the context of
the disk device driver's interrupt handler.

This is probably a device driver or interrupt routing problem: the disk
controller hardware interrupts are not making it through to the CPU.

2002-12-31 02:45:16

by Matthew Zahorik

Subject: Re: How does the disk buffer cache work?

On Mon, 30 Dec 2002, Andrew Morton wrote:

> > [.. the next function call in read_cache_page() is lock_page(), which we
> > hang forever on ..]
>
> lock_page() will sleep until the page is unlocked. The page is unlocked
> from end_buffer_io_sync(), which is called from within the context of
> the disk device driver's interrupt handler.

Okay, I'll track it down there. Probably the driver not calling
end_buffer_io_sync() when timed out. When the bad drive is detached,
things work fine - leading me to believe that hardware and interrupt
routing wise things are okay.

Thanks!

- Matt

2002-12-31 04:05:46

by Andrew Morton

Subject: Re: How does the disk buffer cache work?

Matthew Zahorik wrote:
>
> On Mon, 30 Dec 2002, Andrew Morton wrote:
>
> > > [.. the next function call in read_cache_page() is lock_page(), which we
> > > hang forever on ..]
> >
> > lock_page() will sleep until the page is unlocked. The page is unlocked
> > from end_buffer_io_sync(), which is called from within the context of
> > the disk device driver's interrupt handler.
>
> Okay, I'll track it down there. Probably the driver not calling
> end_buffer_io_sync() when timed out. When the bad drive is detached,
> things work fine - leading me to believe that hardware and interrupt
> routing wise things are okay.
>

It won't call end_buffer_io_sync() explicitly - it calls the function which
is pointed at by the relevant buffer's b_end_io vector. Typically that
will point at end_buffer_io_async() or end_buffer_io_sync().
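
The indirection described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct buffer`, `submit_model()`, `driver_complete()` and `end_io_sync_model()` are hypothetical stand-ins, and only the `b_end_io` field name is taken from the message above. It shows why a buffer stays locked forever if the driver never invokes the completion pointer, even for a failed request.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel buffer descriptor. */
struct buffer {
    int locked;     /* 1 while I/O is in flight */
    int uptodate;   /* 1 if the data read is valid */
    void (*b_end_io)(struct buffer *bh, int uptodate); /* completion vector */
};

/* Model of a synchronous completion handler: record the result, unlock. */
static void end_io_sync_model(struct buffer *bh, int uptodate)
{
    bh->uptodate = uptodate;
    bh->locked = 0; /* anything sleeping on the lock could now proceed */
}

/* Submission: lock the buffer and remember which handler to call later. */
static void submit_model(struct buffer *bh)
{
    bh->locked = 1;
    bh->uptodate = 0;
    bh->b_end_io = end_io_sync_model;
}

/* What a driver does when a request finishes, successfully or not.
 * It never names the handler; it only calls through the pointer. */
static void driver_complete(struct buffer *bh, int success)
{
    if (bh->b_end_io != NULL)
        bh->b_end_io(bh, success);
}
```

In this model, a failed request still has to reach `driver_complete(bh, 0)` for the buffer to be unlocked; a request that is requeued indefinitely never gets there.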

2003-01-01 19:06:51

by Matthew Zahorik

Subject: sd driver NOT_READY behavior / was Re: How does the disk buffer cache work?

On Mon, 30 Dec 2002, Andrew Morton wrote:

> Matthew Zahorik wrote:
> >
> > Earlier I wrote to the list where my SS10 hung on the partition check
> > if a bad disk was installed.
>
> lock_page() will sleep until the page is unlocked. The page is unlocked
> from end_buffer_io_sync(), which is called from within the context of
> the disk device driver's interrupt handler.
>
> This is probably a device driver or interrupt routing problem: the disk
> controller hardware interrupts are not making it through to the CPU.

Found the problem, don't know how to fix it. 2.4.20 kernel.

The bad drive is returning "NOT READY" to sd. According to this code in
scsi_lib.c/scsi_io_completion():

    if ((SCpnt->sense_buffer[0] & 0x7f) == 0x70) {
        /*
         * If the device is in the process of becoming ready,
         * retry.
         */
        if (SCpnt->sense_buffer[12] == 0x04 &&
            SCpnt->sense_buffer[13] == 0x01) {
            scsi_queue_next_request(q, SCpnt);
            return;
        }

My sense is [0] = 0x70, [2] = 0x2 (not ready) [12] = 4 [13] = 1.
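
For reference, the byte offsets quoted above can be decoded with a small standalone helper. This is a sketch under the assumption of fixed-format sense data; `struct sense_info`, `decode_sense()` and `is_becoming_ready()` are hypothetical names, not kernel functions. Unlike the kernel excerpt above, it also checks the sense key.

```c
#include <stdint.h>

/* Fields of interest in fixed-format SCSI sense data (hypothetical struct). */
struct sense_info {
    uint8_t response_code; /* byte 0, low 7 bits (0x70 = current error) */
    uint8_t sense_key;     /* byte 2, low 4 bits (0x02 = NOT READY) */
    uint8_t asc;           /* byte 12, additional sense code */
    uint8_t ascq;          /* byte 13, additional sense code qualifier */
};

/* Extract the fields; buf must hold at least 14 bytes of sense data. */
static struct sense_info decode_sense(const uint8_t *buf)
{
    struct sense_info s;
    s.response_code = buf[0] & 0x7f;
    s.sense_key = buf[2] & 0x0f;
    s.asc = buf[12];
    s.ascq = buf[13];
    return s;
}

/* True for NOT READY with ASC/ASCQ 04/01, "in process of becoming ready". */
static int is_becoming_ready(const uint8_t *buf)
{
    struct sense_info s = decode_sense(buf);
    return s.response_code == 0x70 && s.sense_key == 0x02 &&
           s.asc == 0x04 && s.ascq == 0x01;
}
```

With the values reported above ([0] = 0x70, [2] = 0x2, [12] = 4, [13] = 1), `is_becoming_ready()` returns 1.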

Unfortunately, the drive never becomes ready. Therefore the request is
resubmitted, forever. Therefore the sector read never returns success,
therefore you hang on read or write of a drive that returns NOT_READY
forever. Therefore I'm hanging on the read of the partition table,
therefore my kernel won't start with a bad drive in the system.

2.2 behavior was different. A not ready would be labeled as a SCSI error
and the failure to read was passed up through the layers.

Now, I could put that behavior back, where a NOT_READY is a fatal error,
but I'm afraid to screw up other situations where NOT_READY means you
should wait a little longer. (hot plug, removables, etc?)

What is the correct behavior that I should implement?

a. if !removable && not ready then error
b. if not ready then increase count until threshold then error
c. if not ready then error
d. none of the above

Thanks!

- Matt

2003-01-01 20:41:26

by Alan

Subject: Re: sd driver NOT_READY behavior / was Re: How does the disk buffer cache work?

On Wed, 2003-01-01 at 19:19, Matthew Zahorik wrote:
> What is the correct behavior that I should implement?
>
> a. if !removable && not ready then error
> b. if not ready then increase count until threshold then error
> c. if not ready then error
> d. none of the above

I would go for a time limit (you don't want to keep spamming the same
command but to poll politely really IMHO)
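
A time limit of this kind might look roughly like the following userspace simulation. It is a sketch only, not a patch against sd or scsi_lib.c: `struct not_ready_policy`, `not_ready_decide()` and `poll_until_ready()` are hypothetical names, and the caller chooses the deadline and poll interval.

```c
enum retry_action { RETRY_LATER, FAIL_REQUEST };

/* Hypothetical policy: how long to wait overall, and how often to poll. */
struct not_ready_policy {
    unsigned long deadline_ms; /* give up after this much total waiting */
    unsigned long poll_ms;     /* delay between polls */
};

/* Decide what to do when a command comes back "becoming ready".
 * Timestamps are milliseconds from any monotonic clock. */
static enum retry_action not_ready_decide(const struct not_ready_policy *p,
                                          unsigned long first_seen_ms,
                                          unsigned long now_ms)
{
    if (now_ms - first_seen_ms >= p->deadline_ms)
        return FAIL_REQUEST; /* propagate an error to the upper layers */
    return RETRY_LATER;      /* wait poll_ms, then try again */
}

/* Simulate polling a device that becomes ready at device_ready_at_ms
 * (0 means it never becomes ready). Returns 0 on success, -1 on timeout,
 * and stores the number of polls issued in *polls. */
static int poll_until_ready(const struct not_ready_policy *p,
                            unsigned long device_ready_at_ms,
                            unsigned int *polls)
{
    unsigned long now = 0;

    *polls = 0;
    for (;;) {
        (*polls)++;
        if (device_ready_at_ms != 0 && now >= device_ready_at_ms)
            return 0;
        if (not_ready_decide(p, 0, now) == FAIL_REQUEST)
            return -1;
        /* Guard against a zero interval so simulated time always advances. */
        now += p->poll_ms ? p->poll_ms : 1;
    }
}
```

With an illustrative 30 second deadline and 1 second interval, a drive that never becomes ready produces an error after a bounded number of polls instead of an endless requeue, while a drive that spins up in time still succeeds.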

2003-01-05 01:21:22

by John Bäckstrand

Subject: Re: How does the disk buffer cache work?

Matthew Zahorik wrote:
>
> Earlier I wrote to the list where my SS10 hung on the partition check
> if a bad disk was installed.
>
> This behavior is new to the 2.4.20 kernel. I previously ran 2.2.20 on the
> machine. (the default in a Debian 3.0r0 install) I can't vouch for 2.4
> kernels previous to 2.4.20.
>
> I have traced the problem to a hang in one of the disk buffer caches.
>
> Can anyone tell me how to correct the behavior so that I:
>
> 1. Don't break things for other parts of the kernel
> 2. The disk cache will return with an error for a hung disk?
>
> Here's the tail of the console with debugging printk's inserted:
>
> ...
> [.. the next function call in read_cache_page() is lock_page(), which we
> hang forever on ..]

This happens to me as well. 2.5.35 (I think) and 2.4.20
do not work, but a Slackware 2.2 boot disk is fine,
so something is wrong. The hdd is fine in DOS as well.

---
John Bäckstrand

2003-01-05 19:00:36

by John Bäckstrand

Subject: Re: How does the disk buffer cache work?

> > Earlier I wrote to the list where my SS10 hung on the partition check
>
> This happens to me as well. 2.5.35 (I think) and 2.4.20
> do not work, but a Slackware 2.2 boot disk is fine,
> so something is wrong. The hdd is fine in DOS as well.

More details: lspci -vv output for the IDE controller:

00:07.1 IDE interface: Intel Corp. 82371FB PIIX IDE [Triton I] (rev 02) (prog-if 80 [Master])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Region 0: [virtual] I/O ports at 01f0
        Region 1: [virtual] I/O ports at 03f4
        Region 2: [virtual] I/O ports at 0170
        Region 3: [virtual] I/O ports at 0374
        Region 4: I/O ports at 3000 [size=16]

The hdd is a 1GB ST51080A, but I don't know if it's this
particular hdd that causes the problem, or if it's
something else. A cdrom on the same channel works. I don't
have any other hdds to test with right now.

---
John Bäckstrand

2003-01-07 16:18:49

by John Bäckstrand

Subject: Re: How does the disk buffer cache work?

> More details: lspci -vv output for the IDE controller:
>
> 00:07.1 IDE interface: Intel Corp. 82371FB PIIX IDE [Triton I] (rev 02) (prog-if 80 [Master])
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>         Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 32
>         Region 0: [virtual] I/O ports at 01f0
>         Region 1: [virtual] I/O ports at 03f4
>         Region 2: [virtual] I/O ports at 0170
>         Region 3: [virtual] I/O ports at 0374
>         Region 4: I/O ports at 3000 [size=16]
>
> The hdd is a 1GB ST51080A, but I don't know if it's this
> particular hdd that causes the problem, or if it's
> something else. A cdrom on the same channel works. I don't
> have any other hdds to test with right now.

I saw that this mail did not connect to the right
thread, so to recap: with this hdd as master on my
second IDE channel, 2.4.20 hangs at the partition
check. A backtrace:

lock_page
read_cache_page
read_dev_sector
handle_ide_mess
msdos_partition
check_partition
grok_partitions
register_disk
ide_geninit
ide_init
blk_dev_init
device_init
do_initcalls

There is a primary msdos partition on this drive that
uses the entire disk, but I also tried with a cleared
partition table. The 2.2 Slackware boot disk is fine, and
DOS is fine using the hdd. If anybody wants more info, just
tell me what to do. I'm not on the list, so please reply to
me directly. I also tried lots of IDE patches, but they
didn't help. Are there any specific kernel versions or
kernel options I should try, disable, or enable?

---
John Bäckstrand