2009-10-09 06:18:09

by Andy Isaacson

Subject: DMAR regression in 2.6.31 leads to ext4 corruption?

[resending to fit under vger's size limits, sorry if anybody gets this
twice.]

I'm testing DMAR support on 2.6.32 on Intel VT-d laptop platforms. It
was pretty stable circa 2.6.31-rc5 (we have dozens of machines running
2.6.31-rc8), but in the last two weeks I've had a bunch of instability
on Linus' tip kernels that looked potentially like IOMMU badness.

For example,
http://lkml.org/lkml/2009/9/28/201

Today while running 817b33d38 I got the following (on a Thinkpad X200
I'd replaced the Dell with, just in case it was previously-good hardware
going bad).

[ 29.450550] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 79
[ 30.022328] DRHD: handling fault status reg 3
[ 30.022328] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
[ 30.022328] DMAR:[fault reason 05] PTE Write access is not set
[ 30.146136] DRHD: handling fault status reg 3
[ 30.248938] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
[ 30.248939] DMAR:[fault reason 05] PTE Write access is not set
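When combing a longer dmesg for these faults, it helps to pair each "Request device" line with the "fault reason" line that follows it. Below is a minimal Python sketch. It assumes the exact 2.6.31-era message format shown above; other kernel versions word these messages differently. The function and field names are made up for illustration. In the PCI address 00:02.0, the fields are bus 00, device 02, function 0.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed format of the DMAR fault messages quoted above.
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<dir>Read|Write)\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.*)")


class DmarFault(NamedTuple):
    direction: str               # "Read" or "Write"
    bus: int
    device: int
    function: int
    addr: int                    # DMA address the device tried to use
    reason: Optional[int] = None
    reason_text: str = ""


def parse_dmar_faults(lines: List[str]) -> List[DmarFault]:
    """Pair each 'Request device' line with the 'fault reason' line after it."""
    faults: List[DmarFault] = []
    pending: Optional[DmarFault] = None
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            if pending is not None:          # previous request had no reason line
                faults.append(pending)
            pending = DmarFault(m["dir"], int(m["bus"], 16), int(m["dev"], 16),
                                int(m["fn"]), int(m["addr"], 16))
            continue
        m = REASON_RE.search(line)
        if m and pending is not None:
            faults.append(pending._replace(reason=int(m["code"]),
                                           reason_text=m["text"].strip()))
            pending = None
    if pending is not None:
        faults.append(pending)
    return faults


if __name__ == "__main__":
    log = [
        "[ 30.022328] DRHD: handling fault status reg 3",
        "[ 30.022328] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000",
        "[ 30.022328] DMAR:[fault reason 05] PTE Write access is not set",
    ]
    for f in parse_dmar_faults(log):
        print(f"{f.bus:02x}:{f.device:02x}.{f.function} DMA {f.direction} "
              f"at {f.addr:#x}: reason {f.reason} ({f.reason_text})")
```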

The full output of fsck and full dmesg are at the URL below.

I don't know that DMAR is resulting in my repeated filesystem
corruption, but it does seem like a potential cause (and would explain
why I'm seeing this whereas most people aren't, since few people are
using VT-d *and* i915).

I see that the BROKEN_GFX_WA code has been removed; do we actually
believe that the relevant code is working? Could it be corrupting my
AHCI DMAs if not? At the end of the last thread Ted thought that we'd
lost a write of an inode block; this time the symptoms look different,
in that I don't see one inode block representing a significant data
loss (though I'm by no means an expert).

Complete dmesg etc are at
http://web.hexapodia.org/~adi/bugs/20091008-ext4-dmar/

I'll try running with BROKEN_GFX_WA turned back on and see if that
improves things at all.

Thanks,
-andy


2009-10-09 23:37:45

by Andy Isaacson

Subject: Re: DMAR regression in 2.6.31 leads to ext4 corruption?

On Thu, Oct 08, 2009 at 11:17:29PM -0700, Andy Isaacson wrote:
> I'm testing DMAR support on 2.6.32 on Intel VT-d laptop platforms. It
> was pretty stable circa 2.6.31-rc5 (we have dozens of machines running
> 2.6.31-rc8), but in the last two weeks I've had a bunch of instability
> on Linus' tip kernels that looked potentially like IOMMU badness.
>
> For example,
> http://lkml.org/lkml/2009/9/28/201
>
> Today while running 817b33d38 I got the following (on a Thinkpad X200
> I'd replaced the Dell with, just in case it was previously-good hardware
> going bad).
>
> [ 29.450550] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 79
> [ 30.022328] DRHD: handling fault status reg 3
> [ 30.022328] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
> [ 30.022328] DMAR:[fault reason 05] PTE Write access is not set
> [ 30.146136] DRHD: handling fault status reg 3
> [ 30.248938] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
> [ 30.248939] DMAR:[fault reason 05] PTE Write access is not set
>
> The full output of fsck and full dmesg are at the URL below.

I also have a "e2image -r" for this filesystem. Please let me know if
you'd like any further information.

-andy

2009-10-10 00:10:18

by Chris Wright

Subject: Re: DMAR regression in 2.6.31 leads to ext4 corruption?

* Andy Isaacson ([email protected]) wrote:
> Today while running 817b33d38 I got the following (on a Thinkpad X200
> I'd replaced the Dell with, just in case it was previously-good hardware
> going bad).
>
> [ 29.450550] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 79
> [ 30.022328] DRHD: handling fault status reg 3
> [ 30.022328] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
> [ 30.022328] DMAR:[fault reason 05] PTE Write access is not set
> [ 30.146136] DRHD: handling fault status reg 3
> [ 30.248938] DMAR:[DMA Write] Request device [00:02.0] fault addr ddae28000
> [ 30.248939] DMAR:[fault reason 05] PTE Write access is not set

There's some timing coincidence there, but there's a full half second
between the ext4 error and the DMAR fault (and there are various DMAR
faults for the same buffer along the way, both before and after the ext4
error). That fault is quite typical of a driver bug, and it's the VGA
device (or rather its driver) that is culpable. The IOMMU caught the VGA
device trying to do a DMA write to a buffer mapped r/o.
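To make the "mapped r/o" point concrete: VT-d leaf page-table entries carry separate read and write permission bits. These are bit 0 and bit 1; Linux's intel-iommu code calls them DMA_PTE_READ and DMA_PTE_WRITE. A DMA write through an entry whose write bit is clear is blocked and reported as fault reason 05, as in the log. Below is a minimal Python sketch. It assumes a flattened one-level table, whereas real hardware walks several levels. `check_dma` and the dict layout are hypothetical.

```python
from typing import Dict, Optional

# VT-d second-level PTE permission bits (Linux: DMA_PTE_READ, DMA_PTE_WRITE).
PTE_READ = 1 << 0
PTE_WRITE = 1 << 1

# Fault reason codes as they appear in the kernel log.
FAULT_PTE_WRITE = 5   # "PTE Write access is not set"
FAULT_PTE_READ = 6    # "PTE Read access is not set"

PAGE_SHIFT = 12       # 4 KiB IOMMU pages


def check_dma(ptes: Dict[int, int], iova: int, is_write: bool) -> Optional[int]:
    """Return None if the DMA is allowed, else the fault reason code.

    ptes maps an IOVA page frame number to its permission bits; a missing
    entry behaves like one with both bits clear. Hypothetical, flattened model.
    """
    perms = ptes.get(iova >> PAGE_SHIFT, 0)
    if is_write and not (perms & PTE_WRITE):
        return FAULT_PTE_WRITE
    if not is_write and not (perms & PTE_READ):
        return FAULT_PTE_READ
    return None


if __name__ == "__main__":
    buf = 0xddae28000                        # fault address from the log, as an example
    ptes = {buf >> PAGE_SHIFT: PTE_READ}     # buffer mapped read-only
    print(check_dma(ptes, buf, is_write=False))   # None: the read is allowed
    print(check_dma(ptes, buf, is_write=True))    # 5: the write is blocked
```

The blocked write never reaches memory, which is why such a fault points at the device driver rather than at corrupted data.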

> The full output of fsck and full dmesg are at the URL below.
>
> I don't know that DMAR is resulting in my repeated filesystem
> corruption, but it does seem like a potential cause (and would explain
> why I'm seeing this whereas most people aren't, since few people are
> using VT-d *and* i915).

I do use it every day on my primary workstation (x200), and haven't
had any issue (I'm using ext3).

> I see that the BROKEN_GFX_WA code has been removed; do we actually
> believe that the relevant code is working? Could it be corrupting my
> AHCI DMAs if not?

It should be for your adapter (after 66a4fe0c merged in agp fixes).
While it could still be broken (aside from the initial faults before the
device is even initialized in Linux -- I'm not seeing any faults, btw),
iommu=pt will put all devices in a 1:1 mapped domain and would suppress
the DMAR faults you see (similar to intel_iommu=off, but still allowing
the IOMMU to be used for PCI device assignment). However, doing that
or enabling the gfx workaround would allow the device to generate invalid
DMA requests, since it effectively disables the IOMMU for the gfx device,
which would leave more opportunity for DMA-related corruption.
The earlier fs issues we saw w/ the IOMMU were when it was actively
blocking disk DMA requests, but that's not happening here.
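The trade-off described above can be modelled as two kinds of domain:

- **Translated domain.** Every DMA address must hit a page-table entry with the right permission.
- **1:1 (identity) domain.** This is roughly what iommu=pt sets up. The bus address simply is the physical address.

Below is a simplified Python sketch. The class names are hypothetical, and it ignores details such as exactly which address ranges a real identity map covers.

```python
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1
PTE_READ, PTE_WRITE = 1, 2


class TranslatedDomain:
    """Normal DMAR operation: every DMA address is looked up and checked."""

    def __init__(self, ptes: Dict[int, Tuple[int, int]]):
        # IOVA page frame -> (host physical page frame, permission bits)
        self.ptes = ptes

    def translate(self, iova: int, is_write: bool) -> Optional[int]:
        entry = self.ptes.get(iova >> PAGE_SHIFT)
        needed = PTE_WRITE if is_write else PTE_READ
        if entry is None or not (entry[1] & needed):
            return None                  # blocked; would be logged as a DMAR fault
        host_pfn, _ = entry
        return (host_pfn << PAGE_SHIFT) | (iova & PAGE_MASK)


class IdentityDomain:
    """Simplified 1:1 passthrough: in this model nothing is checked or remapped."""

    def translate(self, iova: int, is_write: bool) -> Optional[int]:
        return iova                      # the DMA lands wherever it points


if __name__ == "__main__":
    stray = 0xddae28000                  # hypothetical bogus GPU write target
    strict = TranslatedDomain({stray >> PAGE_SHIFT: (0x1234, PTE_READ)})
    print(strict.translate(stray, is_write=True))                 # None: faulted
    print(hex(IdentityDomain().translate(stray, is_write=True)))  # 0xddae28000
```

The same stray write that produces a visible fault in the translated domain silently goes through in the identity domain. This is why suppressing the faults does not make the device any safer.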

> At the end of the last thread Ted thought that we'd
> lost a write of an inode block; this time the symptoms look different,
> in that I don't see one inode block representing a significant data
> loss (though I'm by no means an expert).
>
> Complete dmesg etc are at
> http://web.hexapodia.org/~adi/bugs/20091008-ext4-dmar/
>
> I'll try running with BROKEN_GFX_WA turned back on and see if that
> improves things at all.

thanks,
-chris

2009-10-10 01:47:52

by Andy Isaacson

Subject: Re: DMAR regression in 2.6.31 leads to ext4 corruption?

On Fri, Oct 09, 2009 at 05:09:26PM -0700, Chris Wright wrote:
> There's some timing coincidence there, but there's a full half second
> between the ext4 error and the DMAR fault (and there are various DMAR
> faults for the same buffer along the way, both before and after the ext4
> error). That fault is quite typical of a driver bug, and it's the VGA
> device (or rather its driver) that is culpable. The IOMMU caught the VGA
> device trying to do a DMA write to a buffer mapped r/o.

Yeah, this timing coincidence isn't very compelling to me, but
*something* sure is hosing my filesystems. (Note that I've had ext3 and
ext4 both fail with what look like missed, misdirected, or corrupted
writes.)

> > The full output of fsck and full dmesg are at the URL below.
> >
> > I don't know that DMAR is resulting in my repeated filesystem
> > corruption, but it does seem like a potential cause (and would explain
> > why I'm seeing this whereas most people aren't, since few people are
> > using VT-d *and* i915).
>
> I do use it every day on my primary workstation (x200), and haven't
> had any issue (I'm using ext3).

What BIOS version are you using? The X200 I'm testing on has "BIOS
Version 2.02 (6DET38WW) 2008-12-19". Hmm, I see there's a 3.08
2009-09-07 available...

Could you provide a dmesg from a working machine?

> > I see that the BROKEN_GFX_WA code has been removed; do we actually
> > believe that the relevant code is working? Could it be corrupting my
> > AHCI DMAs if not?
>
> It should be for your adapter (after 66a4fe0c merged in agp fixes).
> While it could still be broken (aside from the initial faults before the
> device is even initialized in Linux -- I'm not seeing any faults, btw),
> iommu=pt will put all devices in a 1:1 mapped domain and would suppress
> the DMAR faults you see (similar to intel_iommu=off, but still allowing
> the IOMMU to be used for PCI device assignment). However, doing that
> or enabling the gfx workaround would allow the device to generate invalid
> DMA requests, since it effectively disables the IOMMU for the gfx device,
> which would leave more opportunity for DMA-related corruption.

OK, thanks for confirming that the BROKEN_GFX_WA issues should be fixed
in current Linus kernels; I'm certainly running with 66a4fe0c.

> The earlier fs issues we saw w/ the IOMMU were when it was actively
> blocking disk DMA requests, but that's not happening here.

Well, we don't know for sure what happened on the previous boot where
the filesystem corruption occurred. I'm imagining a nightmare scenario
where erroneous GPU writes cause DMAR faults and handling them somehow
causes AHCI DMA requests to get lost.

I'm going to go ahead on the theory that the BIOS needs an update.

-andy

2009-10-14 12:10:37

by David Woodhouse

Subject: Re: DMAR regression in 2.6.31 leads to ext4 corruption?

On Fri, 2009-10-09 at 18:47 -0700, Andy Isaacson wrote:
> Well, we don't know for sure what happened on the previous boot where
> the filesystem corruption occurred. I'm imagining a nightmare scenario
> where erroneous GPU writes cause DMAR faults and handling them somehow
> causes AHCI DMA requests to get lost.

Seems unlikely. The GPU faults happen whenever the GATT changes, because
it translates _every_ address in the GATT through the IOMMU right there
and then -- so if parts of the table are uninitialised, they'll cause
stray write faults. But no writes are actually _happening_.
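The explanation above can be sketched as follows. If every GATT entry is translated through the IOMMU as soon as the table changes, then each entry the driver never filled in produces a fault at that moment, whether or not the GPU ever touches that address. Below is a minimal Python model under that assumption. The table contents, addresses and function name are hypothetical.

```python
from typing import List, Set, Tuple

PAGE_SHIFT = 12


def gatt_update_faults(gatt: List[int], writable_pfns: Set[int]) -> List[Tuple[int, int]]:
    """Return (entry index, address) for each GATT entry the IOMMU would
    reject when the whole table is translated on update.

    Only lookups happen here; nothing is read from or written to memory,
    so a fault reported this way does not mean any data was written.
    """
    return [(index, addr) for index, addr in enumerate(gatt)
            if (addr >> PAGE_SHIFT) not in writable_pfns]


if __name__ == "__main__":
    scratch = 0x7f000000                 # hypothetical page the driver mapped writable
    stale = 0xddae28000                  # hypothetical leftover value in the table
    gatt = [scratch] * 4 + [stale] * 2   # last two entries never initialised
    for index, addr in gatt_update_faults(gatt, {scratch >> PAGE_SHIFT}):
        print(f"GATT entry {index}: write fault at {addr:#x}")
```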

> I'm going to go ahead on the theory that the BIOS needs an update.

I can't really imagine how that would help, or how the BIOS could be
responsible for this. I'm more inclined to blame the drive. It's not an
SSD, is it?

--
dwmw2

2009-10-14 15:27:09

by Bhavesh Davda

Subject: RE: DMAR regression in 2.6.31 leads to ext4 corruption?

Sorry if this is unrelated, but I'm also seeing an IOMMU PTE Write fault
early in boot on my Lenovo x200 running 2.6.32-rc4. I'm just using ext3,
and I see no visible file system corruption so far.

[ 0.208727] DMAR: Forcing write-buffer flush capability
...
[ 0.221299] DMAR: Host address width 36
[ 0.221299] DMAR: DRHD base: 0x000000feb03000 flags: 0x0
[ 0.221299] IOMMU feb03000: ver 1:0 cap c9008020e30260 ecap 1000
[ 0.221299] DMAR: DRHD base: 0x000000feb01000 flags: 0x0
[ 0.221299] IOMMU feb01000: ver 1:0 cap c0000020630260 ecap 1000
[ 0.221299] DMAR: DRHD base: 0x000000feb00000 flags: 0x0
[ 0.221299] IOMMU feb00000: ver 1:0 cap c0000020630270 ecap 1000
[ 0.221299] DMAR: DRHD base: 0x000000feb02000 flags: 0x1
[ 0.221299] IOMMU feb02000: ver 1:0 cap c9008020630260 ecap 1000
[ 0.221299] DMAR: RMRR base: 0x000000f2826c00 end: 0x000000f28273ff
[ 0.221299] DMAR: RMRR base: 0x000000bdc00000 end: 0x000000bfffffff
[ 0.221299] DMAR: No ATSR found
...
[ 0.224001] DRHD: handling fault status reg 3
[ 0.224003] DMAR:[DMA Write] Request device [00:02.0] fault addr 95e7000
[ 0.224004] DMAR:[fault reason 05] PTE Write access is not set
[ 0.224084] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

BIOS version 6DET33WW (1.10)

Thanks

- Bhavesh

Bhavesh P. Davda

2009-10-14 15:35:20

by David Woodhouse

Subject: RE: DMAR regression in 2.6.31 leads to ext4 corruption?

On Wed, 2009-10-14 at 08:26 -0700, Bhavesh Davda wrote:
> [ 0.224001] DRHD: handling fault status reg 3
> [ 0.224003] DMAR:[DMA Write] Request device [00:02.0] fault addr 95e7000
> [ 0.224004] DMAR:[fault reason 05] PTE Write access is not set

It's harmless. Your BIOS just isn't completely initialising the GATT.

--
dwmw2

2009-10-14 17:52:52

by Andy Isaacson

Subject: Re: DMAR regression in 2.6.31 leads to ext4 corruption?

On Wed, Oct 14, 2009 at 01:09:26PM +0100, David Woodhouse wrote:
> On Fri, 2009-10-09 at 18:47 -0700, Andy Isaacson wrote:
> > Well, we don't know for sure what happened on the previous boot where
> > the filesystem corruption occurred. I'm imagining a nightmare scenario
> > where erroneous GPU writes cause DMAR faults and handling them somehow
> > causes AHCI DMA requests to get lost.
>
> Seems unlikely. The GPU faults happen whenever the GATT changes, because
> it translates _every_ address in the GATT through the IOMMU right there
> and then -- so if parts of the table are uninitialised, they'll cause
> stray write faults. But no writes are actually _happening_.
>
> > I'm going to go ahead on the theory that the BIOS needs an update.
>
> I can't really imagine how that would help, or how the BIOS could be
> responsible for this. I'm more inclined to blame the drive. It's not an
> SSD, is it?

It's a Fujitsu (now serviced by Toshiba?) MHZ2160BH. smartctl says:

Device Model: FUJITSU MHZ2160BH G1
Serial Number: K60WT8C2HHRS
Firmware Version: 0084000A
User Capacity: 160,041,885,696 bytes
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       219593
  2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       27721728
  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       406
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       8589934592000
  7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       112
  8 Seek_Time_Performance   0x0005   100   100   019    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       1598
 10 Spin_Retry_Count        0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       284
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       78
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1216
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       38 (Lifetime Min/Max 21/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       247
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       457965568
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       10448
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       1529011503750
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0
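Some of the RAW_VALUE numbers above look implausibly large for a 160 GB drive: Reallocated_Sector_Ct, Reallocated_Event_Count and Run_Out_Cancel. A SMART raw value is a 6-byte (48-bit) field whose layout is vendor-specific, and some drives pack several counters into it. Splitting the value into 16-bit words can make such numbers easier to read. Below is a minimal Python sketch. Which word, if any, holds the real count on this Fujitsu drive is not established here. By contrast, the normalized VALUE column sits well above THRESH for every attribute listed.

```python
from typing import Tuple


def raw48_words(raw: int) -> Tuple[int, int, int]:
    """Split a 48-bit SMART raw value into three 16-bit words,
    most significant word first."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48 bits wide")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


if __name__ == "__main__":
    for name, raw in [("Reallocated_Sector_Ct", 8589934592000),
                      ("Reallocated_Event_Count", 457965568),
                      ("Run_Out_Cancel", 1529011503750)]:
        print(name, raw48_words(raw))
    # Reallocated_Sector_Ct (2000, 0, 0)
    # Reallocated_Event_Count (0, 6988, 0)
    # Run_Out_Cancel (356, 48, 646)
```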

-andy