arch_has_hw_pte_young() returns false on riscv by default. When it is
false, MGLRU reclaim skips most of its page table walks, and
__wp_page_copy_user() takes an extra, unnecessary step.

The RISC-V Privileged Specification allows two schemes for managing the
A and D bits.

So add a config option to select the scheme; the default is y. For simple
riscv CPU implementations that only generate a page fault, deselect it.

Signed-off-by: Jinyu Tang <[email protected]>
---
arch/riscv/Kconfig | 10 ++++++++++
arch/riscv/include/asm/pgtable.h | 7 +++++++
2 files changed, 17 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..17c82885549c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -180,6 +180,16 @@ config PAGE_OFFSET
default 0x80000000 if 64BIT && !MMU
default 0xff60000000000000 if 64BIT
+config ARCH_HAS_HARDWARE_PTE_YOUNG
+ bool "Hardware Set PTE Access Bit"
+ default y
+ help
+ Select if hardware sets the A bit when a PTE is accessed. The default
+ is 'Y', because most RISC-V CPU hardware can manage the A and D bits.
+ However, a simple RISC-V implementation may not support hardware
+ setting of the A bit and may only generate a page fault; in that
+ case, deselect this option.
+
config KASAN_SHADOW_OFFSET
hex
depends on KASAN_GENERIC
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4eba9a98d0e3..1db54ab4e1ba 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -532,6 +532,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
*/
return ptep_test_and_clear_young(vma, address, ptep);
}
+#ifdef CONFIG_ARCH_HAS_HARDWARE_PTE_YOUNG
+#define arch_has_hw_pte_young arch_has_hw_pte_young
+static inline bool arch_has_hw_pte_young(void)
+{
+ return true;
+}
+#endif
#define pgprot_noncached pgprot_noncached
static inline pgprot_t pgprot_noncached(pgprot_t _prot)
--
2.30.2
On Sun, Jan 29, 2023 at 02:49:56PM +0800, Jinyu Tang wrote:
> The arch_has_hw_pte_young() is false for riscv by default. If it's
> false, page table walk is almost skipped for MGLRU reclaim. And it
> will also cause useless step in __wp_page_copy_user().
>
> RISC-V Privileged Book says that riscv have two schemes to manage A
> and D bit.
>
> So add a config for selecting, the default is true. For simple
> implementation riscv CPU which just generate page fault, unselect it.
>
> Signed-off-by: Jinyu Tang <[email protected]>
> ---
> arch/riscv/Kconfig | 10 ++++++++++
> arch/riscv/include/asm/pgtable.h | 7 +++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..17c82885549c 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -180,6 +180,16 @@ config PAGE_OFFSET
> default 0x80000000 if 64BIT && !MMU
> default 0xff60000000000000 if 64BIT
>
> +config ARCH_HAS_HARDWARE_PTE_YOUNG
> + bool "Hardware Set PTE Access Bit"
> + default y
> + help
> + Select if hardware set A bit when PTE is accessed. The default is
> + 'Y', because most RISC-V CPU hardware can manage A and D bit.
> + But RISC-V may have simple implementation that do not support
> + hardware set A bit but only generate page fault, for that case just
> + unselect it.
> +
> config KASAN_SHADOW_OFFSET
> hex
> depends on KASAN_GENERIC
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..1db54ab4e1ba 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -532,6 +532,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> */
> return ptep_test_and_clear_young(vma, address, ptep);
> }
> +#ifdef CONFIG_ARCH_HAS_HARDWARE_PTE_YOUNG
> +#define arch_has_hw_pte_young arch_has_hw_pte_young
> +static inline bool arch_has_hw_pte_young(void)
> +{
> + return true;
> +}
> +#endif
>
> #define pgprot_noncached pgprot_noncached
> static inline pgprot_t pgprot_noncached(pgprot_t _prot)
> --
> 2.30.2
>
Reviewed-by: Andrew Jones <[email protected]>
Thanks,
drew
On Sun, Jan 29, 2023 at 02:49:56PM +0800, Jinyu Tang wrote:
> The arch_has_hw_pte_young() is false for riscv by default. If it's
> false, page table walk is almost skipped for MGLRU reclaim. And it
> will also cause useless step in __wp_page_copy_user().
>
> RISC-V Privileged Book says that riscv have two schemes to manage A
> and D bit.
>
> So add a config for selecting, the default is true. For simple
> implementation riscv CPU which just generate page fault, unselect it.
>
> Signed-off-by: Jinyu Tang <[email protected]>
> ---
> arch/riscv/Kconfig | 10 ++++++++++
> arch/riscv/include/asm/pgtable.h | 7 +++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..17c82885549c 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -180,6 +180,16 @@ config PAGE_OFFSET
> default 0x80000000 if 64BIT && !MMU
> default 0xff60000000000000 if 64BIT
>
> +config ARCH_HAS_HARDWARE_PTE_YOUNG
> + bool "Hardware Set PTE Access Bit"
> + default y
> + help
> + Select if hardware set A bit when PTE is accessed. The default is
> + 'Y', because most RISC-V CPU hardware can manage A and D bit.
> + But RISC-V may have simple implementation that do not support
> + hardware set A bit but only generate page fault, for that case just
> + unselect it.
Hmm, I am not really sure if this is the right way to go. Should we
really be defaulting this option to enabled if there are going to be
implementations that do not support it?
Thanks,
Conor.
On Sun, Jan 29, 2023 at 12:21 PM Jinyu Tang <[email protected]> wrote:
>
> The arch_has_hw_pte_young() is false for riscv by default. If it's
> false, page table walk is almost skipped for MGLRU reclaim. And it
> will also cause useless step in __wp_page_copy_user().
>
> RISC-V Privileged Book says that riscv have two schemes to manage A
> and D bit.
>
> So add a config for selecting, the default is true. For simple
> implementation riscv CPU which just generate page fault, unselect it.
I totally disagree with this approach.

Almost all existing RISC-V platforms lack HW support for
PTE.A and PTE.D updates.

We want the same kernel image to run on HW with or without PTE.A
and PTE.D updates, so a kconfig-based approach is not going to
fly.
>
> Signed-off-by: Jinyu Tang <[email protected]>
> ---
> arch/riscv/Kconfig | 10 ++++++++++
> arch/riscv/include/asm/pgtable.h | 7 +++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..17c82885549c 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -180,6 +180,16 @@ config PAGE_OFFSET
> default 0x80000000 if 64BIT && !MMU
> default 0xff60000000000000 if 64BIT
>
> +config ARCH_HAS_HARDWARE_PTE_YOUNG
> + bool "Hardware Set PTE Access Bit"
> + default y
> + help
> + Select if hardware set A bit when PTE is accessed. The default is
> + 'Y', because most RISC-V CPU hardware can manage A and D bit.
> + But RISC-V may have simple implementation that do not support
> + hardware set A bit but only generate page fault, for that case just
> + unselect it.
> +
> config KASAN_SHADOW_OFFSET
> hex
> depends on KASAN_GENERIC
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4eba9a98d0e3..1db54ab4e1ba 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -532,6 +532,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> */
> return ptep_test_and_clear_young(vma, address, ptep);
> }
> +#ifdef CONFIG_ARCH_HAS_HARDWARE_PTE_YOUNG
> +#define arch_has_hw_pte_young arch_has_hw_pte_young
> +static inline bool arch_has_hw_pte_young(void)
> +{
> + return true;
Drop the kconfig option ARCH_HAS_HARDWARE_PTE_YOUNG
and instead use code patching to return true only when the Svadu
ISA extension is available in the DT ISA string.
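
To illustrate the runtime-detection idea, here is a minimal userspace sketch, not kernel code. It assumes the ISA string is lowercase with multi-letter extensions separated by underscores (e.g. "rv64imafdc_zicsr_svadu"). The names isa_string_has_ext(), probe_isa() and has_svadu are hypothetical, and a plain flag stands in for the code patching that the kernel would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): report whether the multi-letter
 * extension `ext` appears as a complete underscore-separated token in an
 * ISA string such as "rv64imafdc_zicsr_svadu".
 */
static bool isa_string_has_ext(const char *isa, const char *ext)
{
	size_t ext_len = strlen(ext);
	const char *p = strchr(isa, '_');

	while (p) {
		const char *tok = p + 1;
		const char *end = strchr(tok, '_');
		size_t tok_len = end ? (size_t)(end - tok) : strlen(tok);

		/* Match whole tokens only, so "svadux" does not match "svadu". */
		if (tok_len == ext_len && strncmp(tok, ext, ext_len) == 0)
			return true;
		p = end;
	}
	return false;
}

/* Stand-in for the result of boot-time ISA probing (assumption). */
static bool has_svadu;

static void probe_isa(const char *isa)
{
	has_svadu = isa_string_has_ext(isa, "svadu");
}

/* Decided at run time from the probed ISA string, not by a Kconfig symbol. */
static bool arch_has_hw_pte_young(void)
{
	return has_svadu;
}
```

With this shape, one binary reports true after probe_isa("rv64imafdc_zicsr_svadu") and false on a platform whose ISA string does not list svadu.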
> +}
> +#endif
>
> #define pgprot_noncached pgprot_noncached
> static inline pgprot_t pgprot_noncached(pgprot_t _prot)
> --
> 2.30.2
>
Regards,
Anup
On Mon, Jan 30, 2023 at 03:55:55PM +0530, Anup Patel wrote:
> On Sun, Jan 29, 2023 at 12:21 PM Jinyu Tang <[email protected]> wrote:
> >
> > The arch_has_hw_pte_young() is false for riscv by default. If it's
> > false, page table walk is almost skipped for MGLRU reclaim. And it
> > will also cause useless step in __wp_page_copy_user().
> >
> > RISC-V Privileged Book says that riscv have two schemes to manage A
> > and D bit.
> >
> > So add a config for selecting, the default is true. For simple
> > implementation riscv CPU which just generate page fault, unselect it.
>
> I totally disagree with this approach.
>
> Almost all existing RISC-V platforms don't have HW support
> PTE.A and PTE.D updates.
>
> We want the same kernel image to run HW with/without PTE.A
> and PTE.D updates so kconfig based approach is not going to
> fly.
>
> >
> > Signed-off-by: Jinyu Tang <[email protected]>
> > ---
> > arch/riscv/Kconfig | 10 ++++++++++
> > arch/riscv/include/asm/pgtable.h | 7 +++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..17c82885549c 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -180,6 +180,16 @@ config PAGE_OFFSET
> > default 0x80000000 if 64BIT && !MMU
> > default 0xff60000000000000 if 64BIT
> >
> > +config ARCH_HAS_HARDWARE_PTE_YOUNG
> > + bool "Hardware Set PTE Access Bit"
> > + default y
> > + help
> > + Select if hardware set A bit when PTE is accessed. The default is
> > + 'Y', because most RISC-V CPU hardware can manage A and D bit.
> > + But RISC-V may have simple implementation that do not support
> > + hardware set A bit but only generate page fault, for that case just
> > + unselect it.
> > +
> > config KASAN_SHADOW_OFFSET
> > hex
> > depends on KASAN_GENERIC
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 4eba9a98d0e3..1db54ab4e1ba 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -532,6 +532,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> > */
> > return ptep_test_and_clear_young(vma, address, ptep);
> > }
> > +#ifdef CONFIG_ARCH_HAS_HARDWARE_PTE_YOUNG
>
> > +#define arch_has_hw_pte_young arch_has_hw_pte_young
> > +static inline bool arch_has_hw_pte_young(void)
> > +{
> > + return true;
>
> Drop the kconfig option ARCH_HAS_HARDWARE_PTE_YOUNG
> and instead use code patching to return true only when Svadu
> ISA extension is available in DT ISA string.
Indeed. I should have checked if there was an extension for this
first. It crossed my mind that we should only be enabling features
when the extensions are present, but looking at the privileged manual
isn't sufficient to learn about the Svadu extension. I should have
checked https://wiki.riscv.org/display/HOME/Specification+Status
Anyway, I retract my r-b and agree with Anup.
Thanks,
drew
On 30 Jan 2023, at 10:49, Andrew Jones <[email protected]> wrote:
>
> On Mon, Jan 30, 2023 at 03:55:55PM +0530, Anup Patel wrote:
>> On Sun, Jan 29, 2023 at 12:21 PM Jinyu Tang <[email protected]> wrote:
>>>
>>> The arch_has_hw_pte_young() is false for riscv by default. If it's
>>> false, page table walk is almost skipped for MGLRU reclaim. And it
>>> will also cause useless step in __wp_page_copy_user().
>>>
>>> RISC-V Privileged Book says that riscv have two schemes to manage A
>>> and D bit.
>>>
>>> So add a config for selecting, the default is true. For simple
>>> implementation riscv CPU which just generate page fault, unselect it.
>>
>> I totally disagree with this approach.
>>
>> Almost all existing RISC-V platforms don't have HW support
>> PTE.A and PTE.D updates.
>>
>> We want the same kernel image to run HW with/without PTE.A
>> and PTE.D updates so kconfig based approach is not going to
>> fly.
>>
>>>
>>> Signed-off-by: Jinyu Tang <[email protected]>
>>> ---
>>> arch/riscv/Kconfig | 10 ++++++++++
>>> arch/riscv/include/asm/pgtable.h | 7 +++++++
>>> 2 files changed, 17 insertions(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index e2b656043abf..17c82885549c 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -180,6 +180,16 @@ config PAGE_OFFSET
>>> default 0x80000000 if 64BIT && !MMU
>>> default 0xff60000000000000 if 64BIT
>>>
>>> +config ARCH_HAS_HARDWARE_PTE_YOUNG
>>> + bool "Hardware Set PTE Access Bit"
>>> + default y
>>> + help
>>> + Select if hardware set A bit when PTE is accessed. The default is
>>> + 'Y', because most RISC-V CPU hardware can manage A and D bit.
>>> + But RISC-V may have simple implementation that do not support
>>> + hardware set A bit but only generate page fault, for that case just
>>> + unselect it.
>>> +
>>> config KASAN_SHADOW_OFFSET
>>> hex
>>> depends on KASAN_GENERIC
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index 4eba9a98d0e3..1db54ab4e1ba 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -532,6 +532,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>>> */
>>> return ptep_test_and_clear_young(vma, address, ptep);
>>> }
>>> +#ifdef CONFIG_ARCH_HAS_HARDWARE_PTE_YOUNG
>>
>>> +#define arch_has_hw_pte_young arch_has_hw_pte_young
>>> +static inline bool arch_has_hw_pte_young(void)
>>> +{
>>> + return true;
>>
>> Drop the kconfig option ARCH_HAS_HARDWARE_PTE_YOUNG
>> and instead use code patching to return true only when Svadu
>> ISA extension is available in DT ISA string.
>
> Indeed. I should have checked if there was an extension for this
> first. It crossed my mind that we should only be enabling features
> when the extensions are present, but looking at the privileged manual
> isn't sufficient to learn about the Svadu extension. I should have
> checked https://wiki.riscv.org/display/HOME/Specification+Status
>
> Anyway, I retract my r-b and agree with Anup.
Svadu is a bit of a mess. For years it’s been legal to implement
hardware A/D tracking, and such implementations exist (it’s what QEMU
has done for many years, and I know of an FPGA-based implementation
that does it too). Yet RVA20S64 outlaws that by requiring what it calls
Ssptead, and Svadu is then introduced to re-allow that behaviour, gated
behind a CSR bit.
Jess