by Jiang Liu

Subject: Re: hpsa driver bug crack kernel down!

Hi Baoquan,
Could you please provide the output of "lspci -vvvv"?
Is device "hpsa 0000:03:00.0" a legacy PCI device (non-PCIe)?
The problem may be related to the IOMMU driver.
Thanks!
Gerry

On 2014/4/10 12:03, Bjorn Helgaas wrote:
> [+cc Joerg, iommu list]
>
> On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso <[email protected]> wrote:
>> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
>>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
>>>> On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
>>>>> On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
>>>>>> [+linux-scsi]
>>>>>> On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
>>>>>>> On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The kernel is 3.14.0+ which is pulled just now.
>>>>>>>
>>>>>>> Cc'ing more people.
>>>>>>>
>>>>>>> While the hpsa driver appears to be involved in some way, I'm not sure
>>>>>>> if this is a related issue, but as of today's pull I'm getting another
>>>>>>> problem that causes my DL980 not to come up.
>>>>>>>
>>>>>>> *Massive* amounts of:
>>>>>>>
>>>>>>> DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>>> dmar: DRHD: handling fault status reg 602
>>>>>>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>>>>>>>
>>>>>>> Then:
>>>>>>>
>>>>>>> hpsa 0000:03:00.0: Controller lockup detected: 0xffff0000
>>>>>>> ...
>>>>>>> Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
>>>>>>> ...
>>>>>>>
>>>>>>> Screenshot of the actual LOCKUP:
>>>>>>> http://stgolabs.net/hpsa-hard-lockup-3.14+.png
>>>>>>>
>>>>>>> While I haven't bisected, things worked fine at least until commit
>>>>>>> 39de65aa2c3e (April 2nd).
>>>>>>>
>>>>>>> Any ideas?
>>>>>>
>>>>>> Well, it's either a DMA remapping issue or a hpsa one. Your assertion
>>>>>> that everything worked fine until 39de65aa2c3e would tend to vindicate
>>>>>> hpsa,
>>>>
>>>> Hmm here you mean DMA, right?
>>>
>>> No, it vindicates the hpsa changes ... they don't seem to be causing
>>> problems until something goes wrong with dma remapping.
>>>
>>>>> because all the hpsa changes went in before that under
>>>>> Missing crucial info:
>>>>>
>>>>> commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>>>>>
>>>>>> Merge: 3e75c6d b2bff6c
>>>>>> Author: Linus Torvalds <[email protected]>
>>>>>> Date: Tue Apr 1 18:49:04 2014 -0700
>>>>>>
>>>>>> Merge tag 'scsi-misc' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>>>
>>>>>> can you revalidate that this commit works OK just to make sure?
>>>>
>>>> Ok so I don't see those DMA messages and system starts just fine. I'm
>>>> thinking perhaps something broke after the IO mmu stuff in commit
>>>> 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
>>>> causing the CPU stalls and just blame hpsa in the path as a side effect?
>>>>
>>>> /me goes out to try the commit.
>>>
>>> That's my guess. The DMAR messages are DMA remapping issues caused in
>>> the IOMMU. If I had to guess, I'd say the DMAR fault message is
>>> indicating the IOMMU is calling for a mapping address before it can
>>> satisfy the driver read request, which is causing the hang apparently in
>>> the hpsa driver.
>>>
>>> I've added linux-pci to the cc; I think they deal with iommu issues on
>>> x86.
>>
>> So that merge commit appears to be the culprit, I see both the DMA
>> messages and the lockup blaming hpsa...
>
> My understanding so far (please correct me if I'm wrong):
>
> 39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
> 1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
> 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")

2014-04-10 08:46:57

On Thu, 2014-04-10 at 09:14 -0600, Bjorn Helgaas wrote:
> > Thus, my first guess would be that we are quite happily setting up the
> > requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> > device actually tries to do DMA.
> >
> I like the "wrong IOMMU (or no IOMMU at all)" theory. If we didn't
> connect the device with an IOMMU at all, that would explain the device
> DMAing directly to a physical address, wouldn't it?

An unlikely failure mode. We're much more likely to see *wrong* IOMMU
than no IOMMU. And thus we'd still see the distinctive virtual addresses
just below 4GiB.

However, Rob's answer may solve that puzzle. If this is one of those
abominations where the device continues to do DMA to system memory even
after the OS is up and running and *thinks* it has control of the
hardware, then the offending address will be listed in an RMRR entry
(which tells the OS to set up a 1:1 mapping for access to certain memory
ranges for a given device). And will be inside an E820 reserved region.

A little odd that such an error would trigger only when we're actually
trying to initialise the device from the Linux driver, not as soon as we
enable the IOMMU. But all things are possible.

But the DMAR table and dmesg that I asked for would give us a bit more
information and hopefully let us stop speculating...

> > We should also rate-limit DMA faults, which would avoid the lockup
> > failure mode. Bjorn, what should an IOMMU driver *do* when it detects
> > that a device is creating an endless stream of DMA faults and isn't
> > aborting the transaction?
>
> You mentioned that POWER with EEH does something intelligent in this
> case, but I'm not familiar with that code. We have AER support, which
> can result in resetting a device, but I think DMA faults are reported
> differently, and I don't think there's any nice existing way for PCI
> to deal with them. Maybe there should be, though.

Quite frankly, I don't care how *you* deal with them, or even if you
can. All I want to know is how I tell you about the problem, because *I*
sure as hell don't want to be trying to deal with it in the IOMMU code.
That's a generic PCI layer thing. :)

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

Attachments:

smime.p7s (3.36 kB)
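
The rate-limiting idea raised in the message above can be illustrated with a simple interval/burst limiter: report at most a fixed number of faults per time window and count the rest as suppressed. This is only a minimal Python sketch of the general technique, not the kernel's implementation; the class name FaultRateLimiter, its parameters, and the use of floating-point timestamps are all assumptions made for the example.

```python
class FaultRateLimiter:
    """Allow at most `burst` reports per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None  # start time of the current window
        self.reported = 0         # reports allowed in the current window
        self.suppressed = 0       # total reports dropped so far

    def allow(self, now):
        """Return True if a fault seen at time `now` should be reported."""
        # Open a new window on the first call or once the old one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.reported = 0
        if self.reported < self.burst:
            self.reported += 1
            return True
        self.suppressed += 1
        return False


limiter = FaultRateLimiter(interval=5.0, burst=3)
decisions = [limiter.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 5.1)]
```

A limiter like this only keeps the log flood from starving the CPU; it does not answer the question of what to do with a device that keeps faulting.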

2014-04-10 15:43:31

by Bjorn Helgaas

Subject: Re: hpsa driver bug crack kernel down!

On Tue, Apr 8, 2014 at 8:39 PM, Baoquan He <[email protected]> wrote:
> Hi,
>
> The kernel is 3.14.0+ which is pulled just now.
>
>
> [ 18.402695] systemd[1]: Set hostname to
> <hp-sl4545g7-01.rhts.eng.bos.redhat.com>.
> [ 18.408456] random: systemd urandom read with 70 bits of entropy
> available
> [ 18md[1]: Expecting device
> dev-mapper-rhel_hp\x2d\x2dsl4545g7\x2d\x2d01\x2droot.device...
> Expecting device
> dev-mapper-rhel_hp\x2d\x2dsl4545g7\...droot.device...
> [ 18.860704] systemd[1]: Starting -.slice.
> [ OK ] Created slice -.slice.
> [ 18.866030] systemd[1]: Created slice -.slice.
> [ 18.869466] systemd[1]: Starting System Slice.
> [ OK ] Created slice System Sl 18.939116] systemd[1]: Created
> slice System Slice.
> [ 18.976213] systemd[1]: Starting Slices.
> [ OK ] Reached target Slices.
> [ 18.981154] systemd[1]: Reached target Slices.
> [ 18.984183] systemd[1]: Starting Timers.
> [ OK ] Reached target Timers.
> [ 18.989161] systemd[1]: Reached target Timers.
> [ 18.992004] systemd[1]: Starting Journal Socket.
> [ OK ] Listening on Journal Socket.
> [ 18.997174] systemd[1]: Listening on Journal Socket.
> [ 19.000702] systemd[1]: Starting dracut cmdline hook...
> Starting dracut cmdline hook...
> [ 19.006697] systemd[1]: Started Load KernModules.
> [ 19.110408] systemd[1]: Starting Setup Virtual Console...
> Starting Setup Virtual Console...
> [ 19.116652] systemd[1]: Starting Journal Service...
> Starting Journal Service...
> [ OK ] Started Journal Service.
> [ 19.127172] systemd[1]: Started Journal Service.
> [ OK ] Listening on udev Kernel Socket.
> [ 19.141504] systemd-journald[281]: Vac[ OK ] Listening on udev
> Control Socket.
> [ OK ] Reached target Sockets.
> Starting Create list of required static device nodes...rrent
> kernel...
> Starting Apply Kernel Variables...
> [ OK ] Reached target Swap.
> [ OK ] Reached target Local File Systems.
> [ OK ] Started dracut cmdline hook.
> [ OK ] Started Setup Virtual Console.
> [ OK ] Started Apply Kernel Variables.
> [ OK ] Started Create list of required static device nodes ...current
> kernel.
> Starting Create static device nodes in /dev...
> Starting dracut pre-udev hook...
> [ OK ] Started Create static device nodes in /dev.
> [ 20.247819] device-mapper: uevent: version 1.0.3
> [ 20.251101] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30)
> initialised: [email protected]
> [ OK ] Started dracut pre-udev hook.
> Starting udev Kernel Device Manager...
> [ 20.322923] systemd-udevd[335]: starting version 208
> [ OK ] Started udev Kernel Device Manager.
> Starting udev Coldplug all Devices...
> Mounting Configuration File System...
> [ OK ] Mounted Configuration File System.
> [ OK ] Started udev Coldplug all Devices.
> Starting dracut initqueue hook...
> [ OK ][1] HP HPSA Driver (v 3.4.4-1)
> [ 20.832850] hpsa 0000:05:00.0: can't disable ASPM; OS doesn't have
> ASPM control
> Reached target System Initialization.
> [ 20.875178] ACPI: PCI Interrupt Link [I0C0] enabled at IRQ 36
> [ 20.909000] hpsa 0000:05:00.0: MSIX
> [ 20.911586] hpsa 0000:05:00.0: Logical aborts not supported
> [ 20.916004] [drm] Initialized drm 1.1.0 20060810
> [ 20.936139] hpsa 0000:05:00.0: hpsa0: <0x323b> at IRQ 73 using DAC
> [ 20.956967] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 20.956997] IP: [<ffffffffa004b97f>]
> hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]
> [ 20.957003] PGD 0
> [ 20.957012] Oops: 0002 [#1] SMP
> [ 20.957035] Modules linked in: drm(+) libata hpsa(+) i2c_core
> dm_mirror dm_region_hash dm_log dm_mod
> [ 20.957046] CPU: 10 PID: 341 Comm: systemd-udevd Not tainted 3.14.0+
> #28
> [ 20.957049] Hardware name: HP ProLiant SL4545 G7/, BIOS A31
> 12/08/2012
> [ 20.957055] task: ffff880824191b40 ti: ffff88082309c000 task.ti:
> ffff88082309c000
> [ 20.957078] RIP: 0010:[<ffffffffa004b97f>] [<ffffffffa004b97f>]
> hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]
> [ 20.957083] RSP: 0018:ffff88082309da18 EFLAGS: 00010297
> [ 20.957088] RAX: 0000000000000000 RBX: 000000007c000167 RCX:
> 0000000000000004
> [ 20.957091] RDX: 000000000000

What happened with this original report? This looks like a different
problem than the DMA fault reported by Davidlohr. I'd start by
disassembling the hpsa module and matching the IP to a line.
Documentation/oops-tracing.txt might have useful tips on how to do
that.
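
As a rough illustration of the first step in matching the IP to a line, the symbol+offset notation in the oops above can be split into its parts, and the "Oops: 0002" error code can be decoded with the standard x86 page-fault bits (bit 0: page was present, bit 1: write access, bit 2: user mode). This is a minimal Python sketch; the helper names parse_symbol and decode_x86_fault_code are made up for this example.

```python
import re

# Matches e.g. "hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]".
SYMBOL_RE = re.compile(
    r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[([\w-]+)\])?"
)


def parse_symbol(text):
    """Split 'symbol+0xOFF/0xSIZE [module]' into its components."""
    match = SYMBOL_RE.search(text)
    if match is None:
        return None
    symbol, offset, size, module = match.groups()
    return {
        "symbol": symbol,
        "offset": int(offset, 16),  # byte offset of the faulting instruction
        "size": int(size, 16),      # total size of the function in bytes
        "module": module,           # None for built-in code
    }


def decode_x86_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


info = parse_symbol("hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]")
fault = decode_x86_fault_code(int("0002", 16))
```

Read this way, the oops is a kernel-mode write to a page that was not present, 0x4ff bytes into hpsa_enter_performant_mode. With a module built with debug info, something like gdb's "list *(hpsa_enter_performant_mode+0x4ff)" on hpsa.ko should then point at the source line.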

2014-04-10 15:54:25

On Thu, 2014-04-10 at 16:34 +0800, Jiang Liu wrote:
> Hi Baoquan,
> Could you please provide the output of "lspci -vvvv"?

Reran as root, attached again.

Attachments:

lspci2.txt (144.16 kB)

2014-04-10 16:19:56

by Davidlohr Bueso

Subject: Re: hpsa driver bug crack kernel down!

On Thu, 2014-04-10 at 08:46 +0000, Woodhouse, David wrote:
> On Thu, 2014-04-10 at 09:15 +0200, Joerg Roedel wrote:
> > [+ David, VT-d maintainer ]
> >
> > Jiang, David, can you please have a look into this issue?
> >
>
> > > > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
> > > > >> > > > > dmar: DRHD: handling fault status reg 602
> > > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>
> That "Present bit in context entry is clear" fault means that we have
> not set up *any* mappings for this PCI device… on this IOMMU.
>
> > > Yes, specifically (finally done bisecting):
> > >
> > > commit 2e45528930388658603ea24d49cf52867b928d3e
> > > Author: Jiang Liu <[email protected]>
> > > Date: Wed Feb 19 14:07:36 2014 +0800
> > >
> > > iommu/vt-d: Unify the way to process DMAR device scope array
>
> This commit is about how we decide which IOMMU a given PCI device is
> attached to.
>
> Thus, my first guess would be that we are quite happily setting up the
> requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> device actually tries to do DMA.
>
> However, I'm not 100% convinced of that. The fault address looks
> suspiciously like a true physical address, not a virtual bus address of
> the type that we'd normally allocate for a dma_map_* operation. Those
> would start at 0xfffff000 and work downwards, typically.
>
> Do you have 'iommu=pt' on the kernel command line?

No.

> Can I see the full
> dmesg as this system boots, and also a copy of the DMAR table?

Attaching a dmesg from one of the kernels that boots. It doesn't appear
to have much of the related information... is there any debug config
option I can enable that might give you more data?

Attachments:

dmesg.out (100.37 kB)
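
When a dmesg does contain the fault messages quoted above, a small script can pull out which device is faulting and at which addresses, which is the information the discussion keeps returning to. The following is a minimal Python sketch that assumes the log line format shown in this thread; the function names parse_faults and count_by_device are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000"
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+)"
)


def parse_faults(log_text):
    """Return a list of (device, access kind, fault address) tuples."""
    return [
        (device, kind, int(addr, 16))
        for kind, device, addr in FAULT_RE.findall(log_text)
    ]


def count_by_device(faults):
    """Count how many faults each requesting device generated."""
    return Counter(device for device, _kind, _addr in faults)


sample = """\
DMAR:[fault reason 02] Present bit in context entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
"""
faults = parse_faults(sample)
counts = count_by_device(faults)
```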

2014-04-10 16:32:30

On Thu, 2014-04-10 at 17:17 -0600, Shuah Khan wrote:
> This smells very much like the problem that was solved a couple of years
> ago for the SI domain. It is likely that this path was broken by the DMAR
> device scope array change. Please take a look to see if the following
> no longer occurs. It looks like the BIOS could be expecting this RMRR to
> still be mapped.
>
> /*
>  * We want to prevent any device associated with an RMRR from
>  * getting placed into the SI Domain. This is done because
>  * problems exist when devices are moved in and out of domains
>  * and their respective RMRR info is lost. We exempt USB devices
>  * from this process due to their usage of RMRRs that are known
>  * to not be needed after BIOS hand-off to OS.
>  */
> if (device_has_rmrr(dev) &&
>     (pdev->class >> 8) != PCI_CLASS_SERIAL_USB)
>         return 0;

Yeah, I'd be inclined to agree.... although I've tested with graphics
*since* these patches. That's another case where we need to preserve the
RMRR mapping after the driver takes over — and it *was* working.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

Attachments:

smime.p7s (5.61 kB)
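
The check quoted above relies on the layout of the PCI class field: the upper 16 bits of the 24-bit class code hold the base class and subclass, and shifting right by 8 drops the programming interface byte. The following Python sketch mirrors that condition for illustration only; the function name excluded_from_si_domain is hypothetical, and the example class codes (0x010400 for a RAID controller, 0x0c0320 for an EHCI USB controller) are illustrative values rather than ones taken from the systems in this thread.

```python
PCI_CLASS_SERIAL_USB = 0x0C03  # base class 0x0c (serial bus), subclass 0x03 (USB)


def excluded_from_si_domain(has_rmrr, class_code):
    """Mirror the quoted condition: RMRR devices other than USB are kept out.

    `class_code` is the 24-bit value (base class, subclass, prog-if).
    """
    return has_rmrr and (class_code >> 8) != PCI_CLASS_SERIAL_USB


raid_with_rmrr = excluded_from_si_domain(True, 0x010400)
usb_with_rmrr = excluded_from_si_domain(True, 0x0C0320)
raid_without_rmrr = excluded_from_si_domain(False, 0x010400)
```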

2014-04-11 09:21:30

Sorry for the delay, I've been having to take turns for this box.

On Fri, 2014-04-11 at 09:18 +0000, Woodhouse, David wrote:
> On Thu, 2014-04-10 at 09:19 -0700, Davidlohr Bueso wrote:
> > Attaching a dmesg from one of the kernels that boots. It doesn't appear
> > to have much of the related information... is there any debug config
> > option I can enable that might give you more data?
>
> I'd like the contents of /sys/firmware/acpi/tables/DMAR please.

Attached is the disassembly of the raw output.

> And
> please could you also apply this patch to both the last-working and
> first-failing kernels and show me the output in both cases?

I still haven't managed to get the info for the first failing
kernel, but below is the output for the last working one. Thanks.

Device 0:03:00.0 on IOMMU at a8000000
Device 0:03:00.0 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.0 [0x7f61e000 - 0x7f61ffff]
Device 0:02:00.0 on IOMMU at a8000000
Device 0:02:00.0 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.2 [0x7f61e000 - 0x7f61ffff]
Device 0:02:00.2 on IOMMU at a8000000
Device 0:02:00.2 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:00:1d.0 [0x7f7e7000 - 0x7f7ecfff]
Device 0:00:1d.0 on IOMMU at a8000000
Device 0:00:1d.0 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:00:1d.1 [0x7f7e7000 - 0x7f7ecfff]
Device 0:00:1d.1 on IOMMU at a8000000
Device 0:00:1d.1 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:00:1d.2 [0x7f7e7000 - 0x7f7ecfff]
Device 0:00:1d.2 on IOMMU at a8000000
Device 0:00:1d.2 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:00:1d.3 [0x7f7e7000 - 0x7f7ecfff]
Device 0:00:1d.3 on IOMMU at a8000000
Device 0:00:1d.3 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.0 [0x7f7e7000 - 0x7f7ecfff]
Device 0:02:00.0 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.2 [0x7f7e7000 - 0x7f7ecfff]
Device 0:02:00.2 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.4 [0x7f7e7000 - 0x7f7ecfff]
Device 0:02:00.4 on IOMMU at a8000000
Device 0:02:00.4 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:00:1d.7 [0x7f7ee000 - 0x7f7effff]
Device 0:00:1d.7 on IOMMU at a8000000
Device 0:00:1d.7 on IOMMU at a8000000
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
Device 0:00:1f.0 on IOMMU at a8000000
Device 0:00:1f.0 on IOMMU at a8000000
PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
Device 0:00:00.0 on IOMMU at a8000000
Device 0:00:01.0 on IOMMU at a8000000
Device 0:00:02.0 on IOMMU at a8000000
Device 0:00:03.0 on IOMMU at a8000000
Device 0:00:04.0 on IOMMU at a8000000
Device 0:00:05.0 on IOMMU at a8000000
Device 0:00:06.0 on IOMMU at a8000000
Device 0:00:07.0 on IOMMU at a8000000
Device 0:00:08.0 on IOMMU at a8000000
Device 0:00:09.0 on IOMMU at a8000000
Device 0:00:0a.0 on IOMMU at a8000000
Device 0:00:14.0 on IOMMU at a8000000
Device 0:00:1c.0 on IOMMU at a8000000
Device 0:00:1c.4 on IOMMU at a8000000
Device 0:00:1d.0 on IOMMU at a8000000
Device 0:00:1d.1 on IOMMU at a8000000
Device 0:00:1d.2 on IOMMU at a8000000
Device 0:00:1d.3 on IOMMU at a8000000
Device 0:00:1d.7 on IOMMU at a8000000
Device 0:00:1e.0 on IOMMU at a8000000
Device 0:00:1f.0 on IOMMU at a8000000
Device 0:04:00.0 on IOMMU at a8000000
Device 0:04:00.1 on IOMMU at a8000000
Device 0:04:00.2 on IOMMU at a8000000
Device 0:04:00.3 on IOMMU at a8000000
Device 0:03:00.0 on IOMMU at a8000000
Device 0:02:00.0 on IOMMU at a8000000
Device 0:02:00.2 on IOMMU at a8000000
Device 0:02:00.4 on IOMMU at a8000000
Device 0:01:03.0 on IOMMU at a8000000
Device 0:50:00.0 on IOMMU at ac000000
Device 0:50:01.0 on IOMMU at ac000000
Device 0:50:02.0 on IOMMU at ac000000
Device 0:50:03.0 on IOMMU at ac000000
Device 0:50:04.0 on IOMMU at ac000000
Device 0:50:05.0 on IOMMU at ac000000
Device 0:50:06.0 on IOMMU at ac000000
Device 0:50:07.0 on IOMMU at ac000000
Device 0:50:08.0 on IOMMU at ac000000
Device 0:50:09.0 on IOMMU at ac000000
Device 0:50:0a.0 on IOMMU at ac000000
Device 0:50:14.0 on IOMMU at a8000000
Device 0:a0:00.0 on IOMMU at b0000000
Device 0:a0:01.0 on IOMMU at b0000000
Device 0:a0:02.0 on IOMMU at b0000000
Device 0:a0:03.0 on IOMMU at b0000000
Device 0:a0:04.0 on IOMMU at b0000000
Device 0:a0:05.0 on IOMMU at b0000000
Device 0:a0:06.0 on IOMMU at b0000000
Device 0:a0:07.0 on IOMMU at b0000000
Device 0:a0:08.0 on IOMMU at b0000000
Device 0:a0:09.0 on IOMMU at b0000000
Device 0:a0:0a.0 on IOMMU at b0000000
Device 0:a0:14.0 on IOMMU at a8000000
Device 0:7c:00.0 on IOMMU at a8000000
Device 0:7c:08.0 on IOMMU at a8000000
Device 0:82:00.0 on IOMMU at a8000000
Device 0:82:08.0 on IOMMU at a8000000

Attachments:

DMAR.dsl (28.46 kB)
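
Debug output like the listing above is easier to compare between a working and a failing kernel once it is grouped per IOMMU. The sketch below, in Python, assumes exactly the two line formats shown in this message ("Device ... on IOMMU at ..." and "IOMMU: Setting identity map for device ..."); the function names are hypothetical. It also checks whether a given fault address falls inside an identity-mapped range for a device.

```python
import re
from collections import defaultdict

DEVICE_RE = re.compile(r"^Device (\S+) on IOMMU at ([0-9a-f]+)$")
IDMAP_RE = re.compile(
    r"^IOMMU: Setting identity map for device (\S+) "
    r"\[0x([0-9a-f]+) - 0x([0-9a-f]+)\]$"
)


def normalize(bdf):
    """Drop the PCI domain so '0:02:00.0' and '0000:02:00.0' compare equal."""
    return bdf.split(":", 1)[1] if bdf.count(":") == 2 else bdf


def parse_debug_output(text):
    """Return (devices per IOMMU base address, identity-map ranges per device)."""
    by_iommu = defaultdict(set)
    identity_maps = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        match = DEVICE_RE.match(line)
        if match:
            by_iommu[match.group(2)].add(normalize(match.group(1)))
            continue
        match = IDMAP_RE.match(line)
        if match:
            start, end = int(match.group(2), 16), int(match.group(3), 16)
            identity_maps[normalize(match.group(1))].append((start, end))
    return by_iommu, identity_maps


def is_identity_mapped(identity_maps, bdf, addr):
    """True if `addr` lies inside an identity-mapped range for the device."""
    ranges = identity_maps.get(normalize(bdf), [])
    return any(start <= addr <= end for start, end in ranges)


sample = """\
Device 0:03:00.0 on IOMMU at a8000000
IOMMU: Setting identity map for device 0000:02:00.0 [0x7f61e000 - 0x7f61ffff]
Device 0:02:00.0 on IOMMU at a8000000
Device 0:50:00.0 on IOMMU at ac000000
Device 0:50:14.0 on IOMMU at a8000000
"""
by_iommu, identity_maps = parse_debug_output(sample)
```

Applied to the output above, the fault address 7f61e000 reported earlier for device 02:00.0 falls inside the identity-mapped range [0x7f61e000 - 0x7f61ffff] that the last working kernel sets up for that device.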

2014-04-14 16:20:04

by Jiang Liu

Subject: Re: hpsa driver bug crack kernel down!

Hi Davidlohr,
Thanks for providing the DMAR table. According to the DMAR
table, a bug in the IOMMU driver causes it to mishandle this entry:

[1D2h 0466   1]      Device Scope Entry Type : 01
[1D3h 0467   1]                 Entry Length : 0A
[1D4h 0468   2]                     Reserved : 0000
[1D6h 0470   1]               Enumeration ID : 00
[1D7h 0471   1]               PCI Bus Number : 00
[1D8h 0472   2]                     PCI Path : 1C,04
[1DAh 0474   2]                     PCI Path : 00,02

The patch I sent out should fix this bug. Could you please give it a
try?
Thanks!
Gerry
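
The entry above has two PCI Path elements, so it describes a device behind a bridge: start on bus 00, go through device 1c function 4, then take device 00 function 2 on the bus behind that bridge. The following Python sketch shows that walk in general terms. It is not the driver's code; the function names are hypothetical, and the bridge table mapping 00:1c.4 to secondary bus 02 is an assumption made for the example (it is consistent with the devices on bus 02 in the debug output, but the thread does not state it).

```python
def resolve_device_scope(start_bus, path, secondary_bus_of):
    """Walk a DMAR device scope path and return the final (bus, device, function).

    `path` is a list of (device, function) pairs. Every element except the
    last names a bridge, and `secondary_bus_of` maps a bridge's
    (bus, device, function) to the bus number behind it.
    """
    if not path:
        raise ValueError("empty PCI path")
    bus = start_bus
    for device, function in path[:-1]:
        # Intermediate hop: descend to the bus behind this bridge.
        bus = secondary_bus_of[(bus, device, function)]
    device, function = path[-1]
    return (bus, device, function)


def format_bdf(bdf):
    """Format a (bus, device, function) tuple as 'bb:dd.f'."""
    return "%02x:%02x.%x" % bdf


# Assumed topology: bridge 00:1c.4 leads to bus 02 (not confirmed by the thread).
bridges = {(0x00, 0x1C, 4): 0x02}
target = resolve_device_scope(0x00, [(0x1C, 4), (0x00, 2)], bridges)
```

Under that assumption the entry resolves to 02:00.2. Code that considers only one element of the path, rather than walking all of them, would match the wrong device.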

2014-04-14 16:44:26

by Davidlohr Bueso

Subject: Re: hpsa driver bug crack kernel down!

On Tue, 2014-04-15 at 00:19 +0800, Jiang Liu wrote:
> Hi Davidlohr,
> Thanks for providing the DMAR table. According to the DMAR
> table, one bug in the iommu driver fails to handle this entry:
> [1D2h 0466 1] Device Scope Entry Type : 01
> [1D3h 0467 1] Entry Length : 0A
> [1D4h 0468 2] Reserved : 0000
> [1D6h 0470 1] Enumeration ID : 00
> [1D7h 0471 1] PCI Bus Number : 00
> [1D8h 0472 2] PCI Path : 1C,04
> [1DAh 0474 2] PCI Path : 00,02
>
> And the patch sent out by me should fix this bug. Could you please help
> to have a try?

Sorry, I am unable to find any patches from you regarding this issue...
I must be missing something. Could you please point me to the lkml link?

Thanks.

2014-04-14 16:47:56

by Davidlohr Bueso

Subject: Re: hpsa driver bug crack kernel down!

On Mon, 2014-04-14 at 09:44 -0700, Davidlohr Bueso wrote:
> On Tue, 2014-04-15 at 00:19 +0800, Jiang Liu wrote:
> > Hi Davidlohr,
> > Thanks for providing the DMAR table. According to the DMAR
> > table, one bug in the iommu driver fails to handle this entry:
> > [1D2h 0466 1] Device Scope Entry Type : 01
> > [1D3h 0467 1] Entry Length : 0A
> > [1D4h 0468 2] Reserved : 0000
> > [1D6h 0470 1] Enumeration ID : 00
> > [1D7h 0471 1] PCI Bus Number : 00
> > [1D8h 0472 2] PCI Path : 1C,04
> > [1DAh 0474 2] PCI Path : 00,02
> >
> > And the patch sent out by me should fix this bug. Could you please help
> > to have a try?
>
> Sorry, I am unable to find any patches from you regarding this issue...
> I must be missing something. Could you please point me to the lkml link?

Never mind, I got it internally. I'll let you know as soon as I can
test it later today.

2014-04-14 17:05:14

On Wed, 2014-04-16 at 15:37 +0200, [email protected] wrote:
> Hey David,
>
> On Mon, Apr 14, 2014 at 05:03:51PM +0000, Woodhouse, David wrote:
> > Jiang, if you can then let me have a copy with a signed-off-by I'll
> > shepherd it upstream along with your other patch which is already in my
> > iommu-2.6.git tree.
>
> What is the state of these fixes? I plan to send out a pull-request
> before easter and hoped to include these fixes as well.

I'm travelling and was going to do some final testing and send out a
pull request after I got home tomorrow. But since you ask...

Please pull from
git://git.infradead.org/iommu-2.6.git

David Woodhouse (1):
iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges

Jiang Liu (2):
iommu/vt-d: fix memory leakage caused by commit ea8ea46
iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors

drivers/iommu/dmar.c | 3 ++-
drivers/iommu/intel-iommu.c | 10 +++++++---
2 files changed, 9 insertions(+), 4 deletions(-)

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

Attachments:

smime.p7s (3.36 kB)

2014-04-16 14:13:28

by Joerg Roedel

Subject: Re: hpsa driver bug crack kernel down!

On Wed, Apr 16, 2014 at 01:58:44PM +0000, Woodhouse, David wrote:
> On Wed, 2014-04-16 at 15:37 +0200, [email protected] wrote:
> > What is the state of these fixes? I plan to send out a pull-request
> > before easter and hoped to include these fixes as well.
>
> I'm travelling and was going to do some final testing and send out a
> pull request after I got home tomorrow. But since you ask...
>
> Please pull from
> git://git.infradead.org/iommu-2.6.git
>
> David Woodhouse (1):
> iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges
>
> Jiang Liu (2):
> iommu/vt-d: fix memory leakage caused by commit ea8ea46
> iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors
>
> drivers/iommu/dmar.c | 3 ++-
> drivers/iommu/intel-iommu.c | 10 +++++++---
> 2 files changed, 9 insertions(+), 4 deletions(-)

Pulled, thanks David. I will also do some additional testing before
sending it upstream.

Joerg