2022-02-09 16:07:42

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH] microblaze: remove CONFIG_SET_FS

From: Arnd Bergmann <[email protected]>

Remove the address space override API set_fs(). The microblaze user
address space is now limited to TASK_SIZE.

To support this we implement and wire in __get_kernel_nofault and
__set_kernel_nofault.

The function user_addr_max is removed as there is a default definition
provided when CONFIG_SET_FS is not used.

Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/microblaze/Kconfig | 1 -
arch/microblaze/include/asm/thread_info.h | 6 ---
arch/microblaze/include/asm/uaccess.h | 56 ++++++++++-------------
arch/microblaze/kernel/asm-offsets.c | 1 -
4 files changed, 25 insertions(+), 39 deletions(-)

diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 59798e43cdb0..1fb1cec087b7 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -42,7 +42,6 @@ config MICROBLAZE
select CPU_NO_EFFICIENT_FFS
select MMU_GATHER_NO_RANGE
select SPARSE_IRQ
- select SET_FS
select ZONE_DMA
select TRACE_IRQFLAGS_SUPPORT

diff --git a/arch/microblaze/include/asm/thread_info.h b/arch/microblaze/include/asm/thread_info.h
index 44f5ca331862..a0ddd2a36fb9 100644
--- a/arch/microblaze/include/asm/thread_info.h
+++ b/arch/microblaze/include/asm/thread_info.h
@@ -56,17 +56,12 @@ struct cpu_context {
__u32 fsr;
};

-typedef struct {
- unsigned long seg;
-} mm_segment_t;
-
struct thread_info {
struct task_struct *task; /* main task structure */
unsigned long flags; /* low level flags */
unsigned long status; /* thread-synchronous flags */
__u32 cpu; /* current CPU */
__s32 preempt_count; /* 0 => preemptable,< 0 => BUG*/
- mm_segment_t addr_limit; /* thread address space */

struct cpu_context cpu_context;
};
@@ -80,7 +75,6 @@ struct thread_info {
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
- .addr_limit = KERNEL_DS, \
}

/* how to get the thread information struct from C */
diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index d2a8ef9f8978..346fe4618b27 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -16,45 +16,20 @@
#include <asm/extable.h>
#include <linux/string.h>

-/*
- * On Microblaze the fs value is actually the top of the corresponding
- * address space.
- *
- * The fs value determines whether argument validity checking should be
- * performed or not. If get_fs() == USER_DS, checking is performed, with
- * get_fs() == KERNEL_DS, checking is bypassed.
- *
- * For historical reasons, these macros are grossly misnamed.
- *
- * For non-MMU arch like Microblaze, KERNEL_DS and USER_DS is equal.
- */
-# define MAKE_MM_SEG(s) ((mm_segment_t) { (s) })
-
-# define KERNEL_DS MAKE_MM_SEG(0xFFFFFFFF)
-# define USER_DS MAKE_MM_SEG(TASK_SIZE - 1)
-
-# define get_fs() (current_thread_info()->addr_limit)
-# define set_fs(val) (current_thread_info()->addr_limit = (val))
-# define user_addr_max() get_fs().seg
-
-# define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
-
static inline int access_ok(const void __user *addr, unsigned long size)
{
if (!size)
goto ok;

- if ((get_fs().seg < ((unsigned long)addr)) ||
- (get_fs().seg < ((unsigned long)addr + size - 1))) {
- pr_devel("ACCESS fail at 0x%08x (size 0x%x), seg 0x%08x\n",
- (__force u32)addr, (u32)size,
- (u32)get_fs().seg);
+ if ((((unsigned long)addr) > TASK_SIZE) ||
+ (((unsigned long)addr + size - 1) > TASK_SIZE)) {
+ pr_devel("ACCESS fail at 0x%08x (size 0x%x)",
+ (__force u32)addr, (u32)size);
return 0;
}
ok:
- pr_devel("ACCESS OK at 0x%08x (size 0x%x), seg 0x%08x\n",
- (__force u32)addr, (u32)size,
- (u32)get_fs().seg);
+ pr_devel("ACCESS OK at 0x%08x (size 0x%x)\n",
+ (__force u32)addr, (u32)size);
return 1;
}

@@ -280,6 +255,25 @@ extern long __user_bad(void);
__gu_err; \
})

+#define __get_kernel_nofault(dst, src, type, label) \
+{ \
+ type __user *p = (type __force __user *)(src); \
+ type data; \
+ if (__get_user(data, p)) \
+ goto label; \
+ *(type *)dst = data; \
+}
+
+#define __put_kernel_nofault(dst, src, type, label) \
+{ \
+ type __user *p = (type __force __user *)(dst); \
+ type data = *(type *)src; \
+ if (__put_user(data, p)) \
+ goto label; \
+}
+
+#define HAVE_GET_KERNEL_NOFAULT
+
static inline unsigned long
raw_copy_from_user(void *to, const void __user *from, unsigned long n)
{
diff --git a/arch/microblaze/kernel/asm-offsets.c b/arch/microblaze/kernel/asm-offsets.c
index b77dd188dec4..47ee409508b1 100644
--- a/arch/microblaze/kernel/asm-offsets.c
+++ b/arch/microblaze/kernel/asm-offsets.c
@@ -86,7 +86,6 @@ int main(int argc, char *argv[])
/* struct thread_info */
DEFINE(TI_TASK, offsetof(struct thread_info, task));
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
- DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
DEFINE(TI_CPU_CONTEXT, offsetof(struct thread_info, cpu_context));
DEFINE(TI_PREEMPT_COUNT, offsetof(struct thread_info, preempt_count));
BLANK();
--
2.29.2



2022-02-09 18:20:28

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

On Wed, Feb 09, 2022 at 02:50:32PM +0100, Michal Simek wrote:
> I can't see any issue with the patch when I run it on real HW.
> Tested-by: Michal Simek <[email protected]>
>
> Christoph: Is there any recommended test suite which I should run?

No. For architectures that already didn't use set_fs internally
there is nothing specific to test. Running some perf or backtrace
tests might be useful to check if the non-faulting kernel helpers
work properly.

2022-02-10 12:56:43

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] microblaze: remove CONFIG_SET_FS

From: Arnd
> Sent: 09 February 2022 14:49
>
> Remove the address space override API set_fs(). The microblaze user
> address space is now limited to TASK_SIZE.
>
> To support this we implement and wire in __get_kernel_nofault and
> __set_kernel_nofault.
>
> The function user_addr_max is removed as there is a default definition
> provided when CONFIG_SET_FS is not used.
...
> static inline int access_ok(const void __user *addr, unsigned long size)
> {
> if (!size)
> goto ok;
>
> - if ((get_fs().seg < ((unsigned long)addr)) ||
> - (get_fs().seg < ((unsigned long)addr + size - 1))) {
> - pr_devel("ACCESS fail at 0x%08x (size 0x%x), seg 0x%08x\n",
> - (__force u32)addr, (u32)size,
> - (u32)get_fs().seg);
> + if ((((unsigned long)addr) > TASK_SIZE) ||
> + (((unsigned long)addr + size - 1) > TASK_SIZE)) {
> + pr_devel("ACCESS fail at 0x%08x (size 0x%x)",
> + (__force u32)addr, (u32)size);
> return 0;

Isn't that the wrong check?
If 'size' is big 'addr + size' can wrap.

It needs to be (addr >= TASK_SIZE || size > TASK_SIZE - addr)

Which is pretty much a generic version.
Although typical 64bit architectures can use the faster:
((addr | size) >> 62)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2022-02-10 16:49:21

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] microblaze: remove CONFIG_SET_FS

From: Arnd Bergmann
> Sent: 10 February 2022 13:30
...
> #define __access_ok(addr, size) \
> ((get_fs().seg == KERNEL_DS.seg) || \
> (((unsigned long)addr < get_fs().seg) && \
> (unsigned long)size < (get_fs().seg - (unsigned long)addr)))

That one is strange.
Seems to be optimised for kernel accesses!

> ia64 and sparc skip the size check entirely but rely on an unmapped page
> at the beginning of the kernel address range, which avoids this problem
> but may also be dangerous.

An unmapped page requires that the kernel do sequential accesses
(or, at least, not big offset) - which is normally fine.
Especially for 64bit where there is plenty of address space.
I guess it could be problematic for 32bit if you can/want to
use 'big pages' for the kernel addresses.
Losing a single (typically) 4k page isn't a problem.

Certainly not mapping the page at TASK_SIZE is a good safety check.
Actually, setting TASK_SIZE to 0xc0000000 - PAGE_SIZE and never
mapping the last user page has the same effect.
Except I bet the ldso has to go there :-(
Not to mention the instruction sets where loading the constant
would then be two instructions.

...
> > Although typical 64bit architectures can use the faster:
> > ((addr | size) >> 62)
>
> I think this is the best version, and it's already widely used:

I just did a quick check, both clang and gcc optimise out
constant values for 'size'.

> static inline int __range_ok(unsigned long addr, unsigned long size)
> {
> return size <= TASK_SIZE && addr <= (TASK_SIZE - size);
> }
>
> since 'size' is usually constant, so this turns into a single comparison
> against a compile-time constant.

Hmmm... maybe there should be a comment that it is the same as
the more obvious:
(addr <= TASK_SIZE && addr <= TASK_SIZE - size)
but is better for constant size.
(Provided TASK_SIZE is a constant.)

I'm sure Linus was 'unhappy' about checking against 2^63 for
32bit processes on a 64bit kernel.

Hmmm compat code that has 32bit addr/size needn't even call
access_ok() - it can never access kernel memory at all.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-02-10 20:09:27

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

On Thu, Feb 10, 2022 at 3:21 PM David Laight <[email protected]> wrote:
> From: Arnd Bergmann Sent: 10 February 2022 13:30
>
> > static inline int __range_ok(unsigned long addr, unsigned long size)
> > {
> > return size <= TASK_SIZE && addr <= (TASK_SIZE - size);
> > }
> >
> > since 'size' is usually constant, so this turns into a single comparison
> > against a compile-time constant.
>
> Hmmm... maybe there should be a comment that it is the same as
> the more obvious:
> (addr <= TASK_SIZE && addr <= TASK_SIZE - size)
> but is better for constant size.
> (Provided TASK_SIZE is a constant.)
>
> I'm sure Linus was 'unhappy' about checking against 2^63 for
> 32bit processes on a 64bit kernel.
>
> Hmmm compat code that has 32bit addr/size needn't even call
> access_ok() - it can never access kernel memory at all.

I suppose the generic function should compare against
TASK_SIZE_MAX or user_addr_max() then, to make it a
constant while TASK_SIZE potentially depends on compat
mode.

Arnd

2022-02-10 22:44:27

by Stafford Horne

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

On Thu, Feb 10, 2022 at 02:21:07PM +0000, David Laight wrote:
> From: Arnd Bergmann
> > static inline int __range_ok(unsigned long addr, unsigned long size)
> > {
> > return size <= TASK_SIZE && addr <= (TASK_SIZE - size);
> > }
> >
> > since 'size' is usually constant, so this turns into a single comparison
> > against a compile-time constant.
>
> Hmmm... maybe there should be a comment that it is the same as
> the more obvious:
> (addr <= TASK_SIZE && addr <= TASK_SIZE - size)
> but is better for constant size.
> (Provided TASK_SIZE is a constant.)

Ah, this makes sense that. I might as well revert the OpenRISC patch for this.
Though, we shall move to the generic version in the end.

With ideal compare:

-rwxrwxr-x. 1 shorne shorne 6011932 Feb 9 17:57 vmlinux

With comparing (size <= TASK_SIZE && addr <= TASK_SIZE - size):

-rwxrwxr-x. 1 shorne shorne 6003556 Feb 11 07:18 vmlinux

-Stafford

2022-02-11 17:40:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

On Thu, Feb 10, 2022 at 10:36 AM David Laight <[email protected]> wrote:
> From: Arnd Sent: 09 February 2022 14:49
> >
> > Remove the address space override API set_fs(). The microblaze user
> > address space is now limited to TASK_SIZE.
> >
> > To support this we implement and wire in __get_kernel_nofault and
> > __set_kernel_nofault.
> >
> > The function user_addr_max is removed as there is a default definition
> > provided when CONFIG_SET_FS is not used.
> ...
> > static inline int access_ok(const void __user *addr, unsigned long size)
> > {
> > if (!size)
> > goto ok;
> >
> > - if ((get_fs().seg < ((unsigned long)addr)) ||
> > - (get_fs().seg < ((unsigned long)addr + size - 1))) {
> > - pr_devel("ACCESS fail at 0x%08x (size 0x%x), seg 0x%08x\n",
> > - (__force u32)addr, (u32)size,
> > - (u32)get_fs().seg);
> > + if ((((unsigned long)addr) > TASK_SIZE) ||
> > + (((unsigned long)addr + size - 1) > TASK_SIZE)) {
> > + pr_devel("ACCESS fail at 0x%08x (size 0x%x)",
> > + (__force u32)addr, (u32)size);
> > return 0;
>
> Isn't that the wrong check?
> If 'size' is big 'addr + size' can wrap.

Ah right, that seems dangerous, and we should probably fix that first, with
a backport to stable kernels before my patch. I see the same bug on csky
and hexagon:

static inline int __access_ok(unsigned long addr, unsigned long size)
{
unsigned long limit = current_thread_info()->addr_limit.seg;
return ((addr < limit) && ((addr + size) < limit));
}

#define __access_ok(addr, size) \
((get_fs().seg == KERNEL_DS.seg) || \
(((unsigned long)addr < get_fs().seg) && \
(unsigned long)size < (get_fs().seg - (unsigned long)addr)))

ia64 and sparc skip the size check entirely but rely on an unmapped page
at the beginning of the kernel address range, which avoids this problem
but may also be dangerous.

m68k-coldfire-mmu has a comment about needing a check, but tests
for neither address nor size.

> It needs to be (addr >= TASK_SIZE || size > TASK_SIZE - addr)
>
> Which is pretty much a generic version.
> Although typical 64bit architectures can use the faster:
> ((addr | size) >> 62)

I think this is the best version, and it's already widely used:

static inline int __range_ok(unsigned long addr, unsigned long size)
{
return size <= TASK_SIZE && addr <= (TASK_SIZE - size);
}

since 'size' is usually constant, so this turns into a single comparison
against a compile-time constant.

Arnd

2022-02-11 22:35:02

by Stafford Horne

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

On Fri, Feb 11, 2022 at 03:10:10PM -0600, Eric W. Biederman wrote:
> Arnd Bergmann <[email protected]> writes:
> >
> > I had previously gotten stuck at ia64, but gave it another go now
> > and uploaded an updated branch with ia64 taken care of and another
> > patch to clean up bits afterwards.
> >
> > I only gave it light testing so far, mainly building the defconfig for every
> > architecture. I'll post the series once the build bots are happy with the
> > branch overall.
> >
>
> Thank you so much for doing this work.
>

I'll echo this. Thank you, the changes look good. I test built and booted the
OpenRISC architecture and it works.

I can drop the openrisc only patch for this from the openrisc queue now.

-Stafford

2022-02-14 13:04:52

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] microblaze: remove CONFIG_SET_FS

Arnd Bergmann <[email protected]> writes:

> On Fri, Feb 11, 2022 at 6:46 PM Linus Torvalds
> <[email protected]> wrote:
>> On Fri, Feb 11, 2022 at 9:00 AM Arnd Bergmann <[email protected]> wrote:
>> >
>> > I have now uploaded a cleanup series to
>> > https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=set_fs
>> >
>> > This uses the same access_ok() function across almost all
>> > architectures, with the exception of those that need something else,
>> > and I then I went further and killed off set_fs for everything other
>> > than ia64.
>>
>> Thanks, looks good to me.
>>
>> Can you say why you didn't convert ia64? I don't see any set_fs() use
>> there, except for the unaligned handler, which looks trivial to
>> remove. It looks like the only reason for it is kernel-mode unaligned
>> exceptions, which we should just turn fatal, I suspect (they already
>> get logged).
>>
>> And ia64 people could make the unaligned handling do the kernel mode
>> case in emulate_load/store_int() - it doesn't look *that* painful.
>>
>> But maybe you noticed something else?
>>
>> It would be really good to just be able to say that set_fs() no longer
>> exists at all.
>
> I had previously gotten stuck at ia64, but gave it another go now
> and uploaded an updated branch with ia64 taken care of and another
> patch to clean up bits afterwards.
>
> I only gave it light testing so far, mainly building the defconfig for every
> architecture. I'll post the series once the build bots are happy with the
> branch overall.
>

Thank you so much for doing this work.

Eric