Date: Tue, 24 Jun 2003 19:43:31 +0200
From: Willy Tarreau <willy@w.ods.org>
To: Stephan von Krawczynski <skraw@ithnet.com>
Cc: linux-kernel@vger.kernel.org, willy@w.ods.org, marcelo@conectiva.com.br,
       kpfleming@cox.net, stoffel@lucent.com, gibbs@scsiguy.com,
       green@namesys.com
Subject: Re: Undo aic7xxx changes (now rc7+aic20030603)
Message-ID: <20030624174331.GA31650@alpha.home.local>
References: <20030509150207.3ff9cd64.skraw@ithnet.com> <41560000.1055306361@caspian.scsiguy.com> <20030611222346.0a26729e.skraw@ithnet.com> <16103.39056.810025.975744@gargle.gargle.HOWL> <20030613114531.2b7235e7.skraw@ithnet.com> <20030621105019.GA834@pcw.home.local> <20030623133053.30d6cb88.skraw@ithnet.com> <20030624131138.249fb7df.skraw@ithnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030624131138.249fb7df.skraw@ithnet.com>
User-Agent: Mutt/1.4i
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1833
Lines: 39

Hi Stephan,

> Is it possible that the verification errors do not occur because of a read
> problem, but because of a page cached block getting trashed somehow between
> "tar to tape" and "read from tape". I would suspect that some blocks survive in
> memory and are re-used during verification. If for some reason this data is
> invalid or corrupted the verification fails although the read was correct.

That seems strange to me; I don't see how we could cache data from a char
device. It is possible that chkblk and tar don't use the same block size, and
that your problem only occurs on larger transfers or on particularly aligned
ones.

You could try increasing the block size in chkblk to something bigger than a
page, for example. I don't know if tar reads your tape at full speed, but if
it can't keep up with the tape, an overrun may occur and something finally
gets dropped :-/
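
To illustrate the kind of test I mean, here is a rough sketch. It is not
chkblk itself, and the names block_checksums and first_mismatch are made up
for this example. It assumes that reading the same data twice with unbuffered
reads larger than a page, and comparing one CRC32 per block, is enough to show
whether a mismatch depends on the transfer size:

```python
import mmap
import zlib

PAGE_SIZE = mmap.PAGESIZE  # usually 4096, but taken from the system


def block_checksums(path, block_size=16 * PAGE_SIZE):
    """Read path in unbuffered chunks of block_size bytes.

    Returns a list with one CRC32 per chunk read. The default block size
    (16 pages) is an arbitrary choice for this sketch.
    """
    if block_size <= PAGE_SIZE:
        raise ValueError("block_size should be larger than one page")
    sums = []
    # buffering=0 gives a raw file object, so each read() is a single
    # read request of block_size bytes rather than a buffered one.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:  # end of data
                break
            sums.append(zlib.crc32(chunk) & 0xFFFFFFFF)
    return sums


def first_mismatch(a, b):
    """Return the index of the first differing block, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Running block_checksums() on the source data and again on what comes back
from the tape, then passing both lists to first_mismatch(), would tell you
which block differs. Repeating that with several block sizes might show
whether only large or oddly aligned transfers are affected.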

> I know that this sounds weird, but nevertheless possible, or not?
> It may even be worse, the data may have also been left from the original nfs
> action, correct?
> Is there a way to completely invalidate/flush all cached blocks concerning this
> fs (besides umount)?

I don't believe that's the cause. But as Justin says, this card can reach very
high performance and stress the hardware. Perhaps you have a rare weakness in
your hardware that only shows up under these conditions, although I don't know
how this could be checked.

IIRC, you said that it works flawlessly in UP and you need SMP to hit the bug.
Perhaps your second CPU is sometimes flaky (bad cache, etc...) :-/

Cheers,
Willy
