by Suzuki K Poulose

[permalink] [raw]

Subject: Re: [PATCH v5 16/18] kvm: arm64: Set a limit on the IPA size

On 09/25/2018 10:59 AM, Auger Eric wrote:
> Hi Suzuki,
>
> On 9/17/18 12:41 PM, Suzuki K Poulose wrote:
>> So far we have restricted the IPA size of the VM to the default
>> value (40bits). Now that we can manage the IPA size per VM and
>> support dynamic stage2 page tables, we can allow VMs to have
>> larger IPA. This patch introduces a the maximum IPA size
>> supported on the host.
> to be reworded
> This is decided by the following factors :

Sure

>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 51ecf0f7c912..76972b19bdd7 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -34,6 +34,9 @@
>> #include <asm/kvm_coproc.h>
>> #include <asm/kvm_mmu.h>
>>
>> +/* Maximum phys_shift supported for any VM on this host */
>> +static u32 kvm_ipa_limit;
>> +
>> /*
>> * ARMv8 Reset Values
>> */
>> @@ -135,6 +138,46 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>> return kvm_timer_vcpu_reset(vcpu);
>> }
>>
>> +void kvm_set_ipa_limit(void)
>> +{
>> + unsigned int ipa_max, va_max, parange;
>> +
>> + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
>> + ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
>> +
>> + /* Raise the limit to the default size for backward compatibility */
>> + if (ipa_max < KVM_PHYS_SHIFT) {
>> + WARN_ONCE(1,
>> + "PARange is %d bits, unsupported configuration!",
>> + ipa_max);
>> + ipa_max = KVM_PHYS_SHIFT;
> I don't really get what does happen in this case. The CPU cannot handle
> PA up to ipa_max so can the VM run properly? In case it is a
> showstopper, kvm_set_ipa_limit should return an error, cascaded by
> init_common_resources. Otherwise the warning message may be reworded.

I think this was a warning added to warn against the older
Foundation model which had a 36bit PA size. So the VTCR was progammed
with a 36bit limit, while the KVM guest was allowed to create 40bit
IPA space, though it wouldn't fly well if someone tried to.

With this series, I think we may expose the real IPA_MAX (which could
be < 40bit) and warn the user if someone tried to create a VM with
40bit IPA (vm_type == 0) and let the call succeed (for the sake of ABI).

Marc, Christoffer, Eric

Thoughts ?

>> + }
>> +
>> + /* Clamp it to the PA size supported by the kernel */
>> + ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max;
>> + /*
>> + * Since our stage2 table is dependent on the stage1 page table code,
>> + * we must always honor the following condition:
>> + *
>> + * Number of levels in Stage1 >= Number of levels in Stage2.
>> + *
>> + * So clamp the ipa limit further down to limit the number of levels.
>> + * Since we can concatenate upto 16 tables at entry level, we could
>> + * go upto 4bits above the maximum VA addressible with the current
> addressable?

Sure

>> + * number of levels.
>> + */
>> + va_max = PGDIR_SHIFT + PAGE_SHIFT - 3;
>> + va_max += 4;
>> +
>> + if (va_max < ipa_max) {
>> + kvm_info("Limiting IPA limit to %dbytes due to host VA bits limitation\n",
>> + va_max);
>> + ipa_max = va_max;
> you have a trace for this limitation but none for the comparison against
> PHYS_MASK_SHIFT.

May be I could add a message which only mentions what is the limiting
factor kernel VA vs kernel PA support

>> + }
>> +
>> + kvm_ipa_limit = ipa_max;
>> +}
>> +
>> /*
>> * Configure the VTCR_EL2 for this VM. The VTCR value is common
>> * across all the physical CPUs on the system. We use system wide
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 43e716bc3f08..631f9a3ad99a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -1413,6 +1413,8 @@ static int init_common_resources(void)
>> kvm_vmid_bits = kvm_get_vmid_bits();
>> kvm_info("%d-bit VMID\n", kvm_vmid_bits);
>>
>> + kvm_set_ipa_limit();
> As we have a kvm_info for the supported vmid_bits, may be good to output
> the max IPA size supported by the host whatever the applied clamps?

Sure, will do that.

Thanks
Suzuki

2018-09-25 11:58:00

by Eric Auger

[permalink] [raw]

Subject: Re: [PATCH v5 11/18] kvm: arm64: Dynamic configuration of VTTBR mask

Hi Suzuki,

On 9/20/18 5:22 PM, Suzuki K Poulose wrote:
>
>
> On 20/09/18 15:07, Auger Eric wrote:
>> Hi Suzuki,
>> On 9/17/18 12:41 PM, Suzuki K Poulose wrote:
>>> On arm64 VTTBR_EL2:BADDR holds the base address for the stage2
>>> translation table. The Arm ARM mandates that the bits BADDR[x-1:0]
>>> should be 0, where 'x' is defined for a given IPA Size and the
>>> number of levels for a translation granule size. It is defined
>>> using some magical constants. This patch is a reverse engineered
>>> implementation to calculate the 'x' at runtime for a given ipa and
>>> number of page table levels. See patch for more details.
>>>
>>> Cc: Marc Zyngier <[email protected]>
>>> Cc: Christoffer Dall <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>
>>> ---
>>> Changes since V3:
>>> - Update reference to latest ARM ARM and improve commentary
>>> ---
>>> arch/arm64/include/asm/kvm_arm.h | 63 +++++++++++++++++++++++++++++---
>>> arch/arm64/include/asm/kvm_mmu.h | 25 ++++++++++++-
>>> 2 files changed, 81 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_arm.h
>>> b/arch/arm64/include/asm/kvm_arm.h
>>> index 14317b3a1820..3fb1d440be6e 100644
>>> --- a/arch/arm64/include/asm/kvm_arm.h
>>> +++ b/arch/arm64/include/asm/kvm_arm.h
>>> @@ -123,7 +123,6 @@
>>> #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT)
>>> #define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT)
>>> #define VTCR_EL2_T0SZ_MASK 0x3f
>>> -#define VTCR_EL2_T0SZ_40B   24
>>> #define VTCR_EL2_VS_SHIFT 19
>>> #define VTCR_EL2_VS_8BIT   (0 << VTCR_EL2_VS_SHIFT)
>>> #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)
>>> @@ -140,11 +139,8 @@
>>>    * Note that when using 4K pages, we concatenate two first level
>>> page tables
>>>    * together. With 16K pages, we concatenate 16 first level page
>>> tables.
>>>    *
>>> - * The magic numbers used for VTTBR_X in this patch can be found in
>>> Tables
>>> - * D4-23 and D4-25 in ARM DDI 0487A.b.
>>>    */
>>>
>>> -#define VTCR_EL2_T0SZ_IPA   VTCR_EL2_T0SZ_40B
>>> #define VTCR_EL2_COMMON_BITS       (VTCR_EL2_SH0_INNER |
>>> VTCR_EL2_ORGN0_WBWA | \
>>>                               VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
>>>
>>> @@ -175,9 +171,64 @@
>>> #endif
>>>
>>> #define VTCR_EL2_FLAGS                     (VTCR_EL2_COMMON_BITS |
>>> VTCR_EL2_TGRAN_FLAGS)
>>> -#define VTTBR_X                             (VTTBR_X_TGRAN_MAGIC -
>>> VTCR_EL2_T0SZ_IPA)
>>> +/*
>>> + * ARM VMSAv8-64 defines an algorithm for finding the translation table
>>> + * descriptors in section D4.2.8 in ARM DDI 0487C.a.
>>> + *
>>> + * The algorithm defines the expectations on the BaseAddress (for
>>> the page
>>> + * table) bits resolved at each level based on the page size, entry
>>> level
>>> + * and T0SZ. The variable "x" in the algorithm also affects the
>>> VTTBR:BADDR
>>> + * for stage2 page table.
>>> + *
>>> + * The value of "x" is calculated as :
>>> + * x = Magic_N - T0SZ
>>
>> What is not crystal clear to me is the "if SL0b,c = n" case where x get
>> a value not based on Magic_N. Please could you explain why it is not
>> relevant?
>
> We only care about the "x" for the "entry" level of the table look up
> to make sure that the VTTBR is physical address meets the required
> alignment. In both cases, if SL0 b,c == n, x is (PAGE_SHIFT) iff the
> level you are looking at is not the "entry level". So this should always
> be page aligned, like any intermediate level table.

Oh OK I get it now.
>
> The Magic value is needed only needed for the "entry" level due to the
> fact that we may have lesser bits to resolve (i.e, depending on your
> PAMax or in other words T0SZ) than the intermediate levels (where we
> always resolve {PAGE_SHIFT - 3} bits. This is further complicated by the
> fact that Stage2 could use different number of levels for a given T0SZ
> than the stage1.
> I acknowledge that the algorithm is a bit too cryptic and I spent quite
> sometime decode it to the formula we use below ;-).
>
> I could update the comment to :
>
> /*
> * ARM VMSAv8-64 defines an algorithm for finding the translation table
> * descriptors in section D4.2.8 in ARM DDI 0487C.a.
> *
> * The algorithm defines the expectations on the translation table
> * addresses for each level, based on PAGE_SIZE, entry level
> * and the translation table size (T0SZ). The variable "x" in the
> * algorithm determines the alignment of a table base address at a given
> * level and thus determines the alignment of VTTBR:BADDR for stage2
> * page table entry level.
> * Since the number of bits resolved at the entry level could vary
> * depending on the T0SZ, the value of "x" is defined based on a
> * Magic constant for a given PAGE_SIZE and Entry Level. The
> * intermediate levels must be always aligned to the PAGE_SIZE (i.e,
> * x = PAGE_SHIFT).
> *
> * The value of "x" for entry level is calculated as :
> *     x = Magic_N - T0SZ
> *
Looks OK.

Thank you for the explanation.

Eric
>
> ...
>
> Suzuki
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy
> the information in any medium. Thank you.