Taking an SD/MMC controller (over PCI) as an example, the following
is the sequence of events that produced the log flooding:
0. We have a host and a guest, both running the latest x86_64 kernels.

1. The host machine is booted (with intel_iommu=on), and the DMA PTEs
are set up for the controller (on the host) for the first time.

2. The SD controller device is assigned to an (L1) guest in a KVM VM
(via virt-manager).

3. The KVM VM is booted up.

4. The above triggers a re-setup of the DMA PTEs on the host, for a
second time.

It is observed that the new PTEs formed (on the host) are identical to
the original PTEs, and thus the following messages, accompanied by
stacktraces, overwhelm the logs:
......
DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
......
Since the PTEs are identical, there is no cause for concern, and we can
easily avoid the log flood for this non-error case.
Signed-off-by: Ajay Garg <[email protected]>
---
drivers/iommu/intel/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..8bea8b4e3ff9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2370,7 +2370,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
* touches the iova range
*/
tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
- if (tmp) {
+ if (tmp && (tmp != pteval)) {
static int dumps = 5;
pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
iov_pfn, tmp, (unsigned long long)pteval);
--
2.30.2
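In isolation, the semantics of the changed condition are as in the
sketch below (plain C, not kernel code; publish_pte() and its names are
illustrative only). cmpxchg64_local() returns the previous value of the
slot, so any non-zero return used to be reported; with the patch, a
return equal to the value being installed is treated as a benign,
idempotent re-map:

#include <stdint.h>

static int publish_pte(uint64_t *slot, uint64_t pteval)
{
	uint64_t tmp = *slot;	/* cmpxchg64_local() returns the old value */

	if (tmp == 0) {
		*slot = pteval;	/* slot was empty: install the mapping */
		return 0;
	}

	if (tmp != pteval)	/* occupied with a *different* value */
		return -1;	/* genuine conflict: worth a warning */

	return 0;		/* same value: idempotent re-map, keep quiet */
}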
On 2021/10/2 20:40, Ajay Garg wrote:
> Taking an SD/MMC controller (over PCI) as an example, the following
> is the sequence of events that produced the log flooding:
>
> 0. We have a host and a guest, both running the latest x86_64 kernels.
>
> 1. The host machine is booted (with intel_iommu=on), and the DMA PTEs
> are set up for the controller (on the host) for the first time.
>
> 2. The SD controller device is assigned to an (L1) guest in a KVM VM
> (via virt-manager).
Shouldn't the domain be switched from the default domain to an
unmanaged domain when the device is assigned to the guest?
Even if you want to re-setup the same mappings, you need to unmap all
the existing mappings first, right?
Best regards,
baolu
Thanks Lu for the reply.
>
> Shouldn't the domain be switched from the default domain to an
> unmanaged domain when the device is assigned to the guest?
>
> Even if you want to re-setup the same mappings, you need to unmap all
> the existing mappings first, right?
>
Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
need to take.
Maybe the patch could suppress the flooding till then?
Thanks and Regards,
Ajay
On Sat, 2 Oct 2021 22:48:24 +0530
Ajay Garg <[email protected]> wrote:
> Thanks Lu for the reply.
>
> >
> > Shouldn't the domain be switched from the default domain to an
> > unmanaged domain when the device is assigned to the guest?
> >
> > Even if you want to re-setup the same mappings, you need to unmap all
> > the existing mappings first, right?
> >
>
> Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> need to take.
> Maybe the patch could suppress the flooding till then?
No, this is wrong. The PTE values should not exist; it doesn't matter
that they're the same. Is the host driver failing to remove mappings,
and somehow they persist in the new vfio-owned domain? There's
definitely a bug beyond logging going on here. Thanks,
Alex
Thanks Alex for the reply.
Lu, Alex:
I got my diagnosis regarding the host driver wrong, my apologies.
There is no issue with the PCI device's host driver (confirmed by
preventing the host driver from loading at host bootup). Thus, there is
nothing to fix on the host-driver side.
Rather, it seems some DMA mapping/unmapping inconsistency occurs when
kvm/qemu boots up with the PCI device attached to the guest.
I added debug logs to the "vfio_iommu_type1_ioctl" method in
"vfio_iommu_type1.c" (on the host machine).
When the guest boots up, repeated DMA mappings are observed for the
same address in the host machine's logs (without a corresponding
DMA unmapping first):
##########################################################################################
ajay@ajay-Latitude-E6320:~$ tail -f /var/log/syslog | grep "ajay: "
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [ 146.202297] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [ 146.583179] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:32 ajay-Latitude-E6320 kernel: [ 146.583253] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:12:36 ajay-Latitude-E6320 kernel: [ 150.105584] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [ 180.986499] ajay: _UNMAP_DMA for [0x7ffe724a9840] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [ 180.986559] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [ 180.986638] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:07 ajay-Latitude-E6320 kernel: [ 181.087359] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct 7 14:13:13 ajay-Latitude-E6320 kernel: [ 187.271232] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
Oct 7 14:13:13 ajay-Latitude-E6320 kernel: [ 187.271320] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
....
##########################################################################################
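The debug patch itself is not included in the thread; a minimal
reconstruction that would produce logs of this shape (the exact names
and placement are assumptions; the real handler also validates argsz
and flags before calling the worker) might look like this, inside
vfio_iommu_type1_ioctl():

case VFIO_IOMMU_MAP_DMA: {
	struct vfio_iommu_type1_dma_map map;
	int ret;

	if (copy_from_user(&map, (void __user *)arg, sizeof(map)))
		return -EFAULT;

	ret = vfio_dma_do_map(iommu, &map);
	/* hypothetical debug line: "arg" is the userspace pointer,
	 * which matches the [0x7ffe...] addresses in the logs above */
	pr_info("ajay: _MAP_DMA for [0x%lx] status [%d]\n", arg, ret);
	return ret;
}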
I'll try and backtrack to the userspace process that is sending these ioctls.
Thanks and Regards,
Ajay
> I'll try and backtrack to the userspace process that is sending these ioctls.
>
The userspace process is qemu.
I compiled qemu from the latest source, installed it via "sudo make
install" on the host machine, rebooted the host machine, and booted up
the guest machine. Now, no kernel log flooding is seen on the host
machine.
For me, the issue is thus closed/invalid; admins may take the
necessary action to mark it officially ;)
Thanks and Regards,
Ajay
The flooding was seen again today, after I booted the host machine in
the morning.
Need to look into what the heck is going on ...
On Mon, 11 Oct 2021 11:49:33 +0530
Ajay Garg <[email protected]> wrote:
> The flooding was seen again today, after I booted the host machine in
> the morning.
> Need to look into what the heck is going on ...
Even this QEMU explanation doesn't make a lot of sense: vfio tracks
userspace mappings and will return an -EEXIST error for duplicate or
overlapping IOVA entries. We expect to have an entirely empty IOMMU
domain when a device is assigned, so it seems the only way userspace
could trigger duplicate PTEs would be if mappings already exist, or we
have a bug somewhere.
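For reference, the check being described sits near the top of
vfio_dma_do_map() in drivers/vfio/vfio_iommu_type1.c; paraphrased from
memory rather than quoted verbatim:

	mutex_lock(&iommu->lock);

	/*
	 * Any overlap with an already-tracked vfio_dma is refused up
	 * front, before the IOMMU page tables are touched at all.
	 */
	if (vfio_find_dma(iommu, map->iova, map->size)) {
		ret = -EEXIST;
		goto out_unlock;
	}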
If the most recent instance is purely on bare metal, then it seems the
host itself has conflicting mappings. I can only speculate with the
limited data presented, but I'm suspicious there's something happening
with RMRRs here (but that should also entirely preclude assignment).
dmesg, lspci -vvv, and VM configuration would be useful. Thanks,
Alex
Thanks Alex for your time.
I think I may have found the issue. Right now, when doing a DMA
unmapping, we do a "soft unmapping" only, as the PTE values
themselves are not cleared in the unlinked page-table frame.
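The v2 patch is not part of this thread, so the following is only a
sketch of the fix being described, under the assumption that it
hard-clears the leaf PTEs before the containing page-table page is
unlinked and freed (zero_stale_ptes() is a hypothetical helper; struct
dma_pte is the Intel IOMMU PTE, a single u64 'val'):

#define PTES_PER_TABLE 512	/* 4 KiB table / 8-byte struct dma_pte */

/*
 * Hypothetical helper: zero every entry of a page table before the
 * page is unlinked and freed, so that a later reuse of the page
 * cannot surface stale PTE values as "already set".
 */
static void zero_stale_ptes(struct dma_pte *table)
{
	int i;

	for (i = 0; i < PTES_PER_TABLE; i++)
		table[i].val = 0ULL;
}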
I have made the (simple) changes, and things are looking good so far
(almost an hour in).
However, this time I will give it a day ;)
If not a single flood is observed in the next 24 hours, I will float
the v2 patch for review.
Thanks again for your time and patience.
Thanks and Regards,
Ajay
Hi Alex, Lu.
Posted the v2 patch, as per
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/059955.html
Kindly review, and let's continue on that thread.
Thanks and Regards,
Ajay
> >