2002-12-27 20:50:16

by Matthew Zahorik

[permalink] [raw]
Subject: Hang on partition check, 2.4 kernels

I have a sparc SS10 that I recently installed with Debian 3.0r0.

There are two drives in the machine, sda (ID #1) and sdb (ID #3)

sda is dead, sdb is functioning fine.

Rather than awaiting a replacement for sda, I went ahead and installed
Debian on sdb. Worked fine. Happy day.

Debian 3.0r0 on sparc32 installs a 2.2.20 kernel. Obtained and compiled
2.4.20. Rebooted.

The 2.4.20 kernel hangs forever waiting to read the partitions on sda.

2.2.20 didn't do this. It printed a message about failing to read the
table, then continued.

Wandered through the 2.2.20 and 2.4.20 code and compared.

In 2.2.20, the partition check for Sun is in
drivers/block/genhd.c:sun_partition()

This code calls bread() directly, immediately handling the error and
skipping a bad drive.

In 2.4.20, the partition check for Sun is in
fs/partitions/sun.c:sun_partition()

This calls read_dev_sector in fs/partitions/check.c, which
calls read_cache_page then calls wait_on_page is there is an error.
This seems to wait forever.

Can someone explain how/why read_cache_page is called, how it's wrong, and
why bread() isn't sufficient in this function? I don't know how to fix
the code once VM got involved.

I could simply pull the damaged drive and the machine would boot (I'd have
to adjust root to point to sda instead of sdb...) But I'd like to fix
the behavior now - the machine should boot despite the failure.

Thanks!

- Matt