by Joao Martins

[permalink] [raw]

Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver

On 2/28/22 21:01, Jason Gunthorpe wrote:
> On Mon, Feb 28, 2022 at 01:01:39PM +0000, Joao Martins wrote:
>> On 2/25/22 20:44, Jason Gunthorpe wrote:
>>> On Fri, Feb 25, 2022 at 07:18:37PM +0000, Joao Martins wrote:
>>>> On 2/23/22 01:03, Jason Gunthorpe wrote:
>>>>> On Tue, Feb 22, 2022 at 11:55:55AM +0000, Joao Martins wrote:
>>>>>>>> If by conclusion you mean the whole thing to be merged, how can the work be
>>>>>>>> broken up to pieces if we busy-waiting on the new subsystem? Or maybe you meant
>>>>>>>> in terms of direction...
>>>>>>>
>>>>>>> I think go ahead and build it on top of iommufd, start working out the
>>>>>>> API details, etc. I think once the direction is concluded the new APIs
>>>>>>> will go forward.
>>>>>>>
>>>>>> /me nods, will do. Looking at your repository it is looking good.
>>>>>
>>>>> I would like to come with some plan for dirty tracking on iommufd and
>>>>> combine that with a plan for dirty tracking inside the new migration
>>>>> drivers.
>>>>>
>>>> I had a few things going on my end over the past weeks, albeit it is
>>>> getting a bit better now and I will be coming back to this topic. I hope/want
>>>> to give you a more concrete update/feedback over the coming week or two wrt
>>>> to dirty-tracking+iommufd+amd.
>>>>
>>>> So far, I am not particularly concerned that this will affect overall iommufd
>>>> design. The main thing is really lookups to get vendor iopte, upon on what might
>>>> be a iommu_sync_dirty_bitmap(domain, iova, size) API. For toggling
>>>> the tracking,
>>>
>>> I'm not very keen on these multiplexer interfaces. I think you should
>>> just add a new ops to the new iommu_domain_ops 'set_dirty_tracking'
>>> 'read_dirty_bits'
>>>
>>> NULL op means not supported.
>>>
>>> IMHO we don't need a kapi wrapper if only iommufd is going to call the
>>> op.
>>>
>>
>> OK, good point.
>>
>> Even without a kapi wrapper I am still wondering whether the iommu op needs to
>> be something like a generic iommu feature toggling (e.g. .set_feature()), rather
>> than one that sits "hardcoded" as set_dirty(). Unless dirty right now is about
>> the only feature we will change out-of-band in the protection-domain.
>> I guess I can stay with set_ad_tracking/set_dirty_tracking and if should
>> need arise we will expand with a generic .set_feature(dom, IOMMU_DIRTY | IOMMU_ACCESS).
>
> I just generally dislike multiplexers like this. We are already
> calling through a function pointer struct, why should the driver
> implement another switch/case just to find out which function pointer
> the caller really ment to use? It doesn't make things faster, it
> doesn't make things smaller, it doesn't use less LOC. Why do it?
>
I concur with you in the above but I don't mean it like a multiplexer.
Rather, mimicking the general nature of feature bits in the protection domain
(or the hw pagetable abstraction). So hypothetically for every bit ... if you
wanted to create yet another op that just flips a bit of the
DTEs/PASID-table/CD it would be an excessive use of callbacks to get
to in the iommu_domain_ops if they all set do the same thing.
Right now it's only Dirty (Access bit I don't see immediate use for it
right now) bits, but could there be more? Perhaps this is something to
think about.

Anyways, I am anyways sticking with plain ops function.

>> Regarding the dirty 'data' that's one that I am wondering about. I called it 'sync'
>> because the sync() doesn't only read, but also "writes" to root pagetable to update
>> the dirty bit (and then IOTLB flush). And that's about what VFIO current interface
>> does (i.e. there's only a GET_BITMAP in the ioctl) and no explicit interface to clear.
>
> 'read and clear' is what I'd suggest
>
/me nods

>>> - What about the unmap and read dirty without races operation that
>>> vfio has?
>>>
>> I am afraid that might need a new unmap iommu op that reads the dirty bit
>> after clearing the page table entry. It's marshalling the bits from
>> iopte into a bitmap as opposed to some logic added on top. As far as I
>> looked for AMD this isn't difficult to add, (same for Intel) it can
>> *I think* reuse all of the unmap code.
>
> Ok. It feels necessary to be complete
>
Yes, I guess it's mandatory if we want to fully implement vfio-compat dirty
tracking IOCTLs, too.

>>> Yes, this is a point that needs some answering. One option is to pass
>>> in the tracking range list from userspace. Another is to query it in
>>> the driver from the currently mapped areas in IOAS.
>>>
>> I sort of like the second given that we de-duplicate info that is already
>> tracked by IOAS -- it would be mostly internal and then it would be a
>> matter of when does this device tracker is set up, and whether we should
>> differentiate tracker "start"/"stop" vs "setup"/"teardown".
>
> One problem with this is that devices that don't support dynamic
> tracking are stuck in vIOMMU cases where the IOAS will have some tiny
> set of all memory mapped.
>
Sorry to be pedantic, when you say 'dynamic tracking' for you it just means
that you have no limitation of ranges and fw/hw can cope with being fed of
'new-ranges' to enable dirty-tracking. Or does it mean something else at
hw level? Like dirty bitmap being pulled out of device memory, and hw keeps
its own state of dirtying and sw 'just' needs to sync the bitmap every scan.