Hi all,
this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.
[Note to Linus: I'd like to get this into linux-next rather earlier
than later. Do you think it is ok to add this tree to linux-next?]
The file system part has been posted a few times, and the read/write side
has been pretty much unchanced. For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them. The reasons for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it. All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.
Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged. The process to convert
architectures is roughtly:
(1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
(2) implement __get_kernel_nofault and __put_kernel_nofault
(3) remove the arch specific address limitation functionality
Changes since v2:
- add back the patch to support splice through read_iter/write iter
on /proc/sys/*
- entirely remove the tests that depend on set_fs. Note that for
lkdtm the maintainer (Kees) disagrees with this request from Linus
- fix a wrong check in the powerpc access_ok, and drop a few spurious
cleanups there
Changes since v1:
- drop the patch to remove the non-iter ops for /dev/zero and
/dev/null as they caused a performance regression
- don't enable user access in __get_kernel on powerpc
- xfail the set_fs() based lkdtm tests
Diffstat:
On Thu, Sep 3, 2020 at 7:22 AM Christoph Hellwig <[email protected]> wrote:
>
> [Note to Linus: I'd like to get this into linux-next rather earlier
> than later. Do you think it is ok to add this tree to linux-next?]
This whole series looks really good to me now, with each patch looking
like a clear cleanup on its own.
So ack on the whole series, and yes, please get this into linux-next
and ready for 5.10. Whether through Al or something else.
Thanks,
Linus
* Christoph Hellwig <[email protected]> wrote:
> Hi all,
>
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
Cool! For the x86 bits:
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo
On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:
> * Christoph Hellwig <[email protected]> wrote:
> > this series removes the last set_fs() used to force a kernel address
> > space for the uaccess code in the kernel read/write/splice code, and then
> > stops implementing the address space overrides entirely for x86 and
> > powerpc.
>
> Cool! For the x86 bits:
>
> Acked-by: Ingo Molnar <[email protected]>
set_fs() is older than some kernel hackers!
$ cd linux-0.11/
$ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
./include/asm/segment.h-62-{
./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned short) val));
./include/asm/segment.h-64-}
On Fri, Sep 4, 2020 at 10:58 AM Alexey Dobriyan <[email protected]> wrote:
>
> set_fs() is older than some kernel hackers!
>
> $ cd linux-0.11/
> $ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
Oh, it's older than that. It was there (as set_fs) in 0.10, and may
even predate that. But sadly, I don't have tar-balls for 0.02 and
0.03, so can't check.
The actual use of %fs as the user space segment is already there in
0.01, but there was no 'set_fs()'. That was a simpler and more direct
time, and "get_fs()" looked like this back then:
#define _fs() ({ \
register unsigned short __res; \
__asm__("mov %%fs,%%ax":"=a" (__res):); \
__res;})
and all the setting was basically part of the kernel entry asm and. Lovely.
Linus
From: Alexey Dobriyan
> Sent: 04 September 2020 18:58
>
> On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:
> > * Christoph Hellwig <[email protected]> wrote:
> > > this series removes the last set_fs() used to force a kernel address
> > > space for the uaccess code in the kernel read/write/splice code, and then
> > > stops implementing the address space overrides entirely for x86 and
> > > powerpc.
> >
> > Cool! For the x86 bits:
> >
> > Acked-by: Ingo Molnar <[email protected]>
>
> set_fs() is older than some kernel hackers!
>
> $ cd linux-0.11/
> $ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
> ./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
> ./include/asm/segment.h-62-{
> ./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned short) val));
> ./include/asm/segment.h-64-}
What is this strange %fs register you are talking about.
Figure 2-4 only has CS, DS, SS and ES.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Le 04/09/2020 à 23:01, David Laight a écrit :
> From: Alexey Dobriyan
>> Sent: 04 September 2020 18:58
>>
>> On Fri, Sep 04, 2020 at 08:00:24AM +0200, Ingo Molnar wrote:
>>> * Christoph Hellwig <[email protected]> wrote:
>>>> this series removes the last set_fs() used to force a kernel address
>>>> space for the uaccess code in the kernel read/write/splice code, and then
>>>> stops implementing the address space overrides entirely for x86 and
>>>> powerpc.
>>>
>>> Cool! For the x86 bits:
>>>
>>> Acked-by: Ingo Molnar <[email protected]>
>>
>> set_fs() is older than some kernel hackers!
>>
>> $ cd linux-0.11/
>> $ find . -type f -name '*.h' | xargs grep -e set_fs -w -n -A3
>> ./include/asm/segment.h:61:extern inline void set_fs(unsigned long val)
>> ./include/asm/segment.h-62-{
>> ./include/asm/segment.h-63- __asm__("mov %0,%%fs"::"a" ((unsigned short) val));
>> ./include/asm/segment.h-64-}
>
> What is this strange %fs register you are talking about.
> Figure 2-4 only has CS, DS, SS and ES.
>
Intel added registers FS and GS in the i386
Christophe
From: Christophe Leroy
> Sent: 05 September 2020 08:16
>
> Le 04/09/2020 à 23:01, David Laight a écrit :
> > From: Alexey Dobriyan
> >> Sent: 04 September 2020 18:58
...
> > What is this strange %fs register you are talking about.
> > Figure 2-4 only has CS, DS, SS and ES.
> >
>
> Intel added registers FS and GS in the i386
I know, I've got both the 'iAPX 286 Programmer's Reference Manual'
and the '80386 Programmer's Reference Manual' on my shelf.
I don't have the 8088 book though - which I used in 1982.
The old books are a lot easier to read if, for instance,
you are trying to work out how to back and forth to real mode
to do bios calls.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
> -----Original Message-----
> From: Segher Boessenkool <[email protected]>
> Sent: 10 September 2020 16:21
> To: David Laight <[email protected]>
> Cc: 'Christophe Leroy' <[email protected]>; 'Linus Torvalds' <torvalds@linux-
> foundation.org>; linux-arch <[email protected]>; Kees Cook <[email protected]>; the
> arch/x86 maintainers <[email protected]>; Nick Desaulniers <[email protected]>; Linux Kernel
> Mailing List <[email protected]>; Alexey Dobriyan <[email protected]>; Luis Chamberlain
> <[email protected]>; Al Viro <[email protected]>; linux-fsdevel <[email protected]>;
> linuxppc-dev <[email protected]>; Christoph Hellwig <[email protected]>
> Subject: Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
>
> On Thu, Sep 10, 2020 at 12:26:53PM +0000, David Laight wrote:
> > Actually this is pretty sound:
> > __label__ label;
> > register int eax asm ("eax");
> > // Ensure eax can't be reloaded from anywhere
> > // In particular it can't be reloaded after the asm goto line
> > asm volatile ("" : "=r" (eax));
>
> This asm is fine. It says it writes the "eax" variable, which lives in
> the eax register *in that asm* (so *not* guaranteed after it!).
>
> > // Provided gcc doesn't save eax here...
> > asm volatile goto ("xxxxx" ::: "eax" : label);
>
> So this is incorrect.
From the other email:
> It is neither input nor output operand here! Only *then* is a local
> register asm guaranteed to be in the given reg: as input or output to an
> inline asm.
Ok, so adding '"r" (eax)' to the input section helps a bit.
> > // ... and reload the saved value here.
> > // The input value here will be that modified by the 'asm goto'.
> > // Since this modifies eax it can't be moved before the 'asm goto'.
> > asm volatile ("" : "+r" (eax));
> > // So here eax must contain the value set by the "xxxxx" instructions.
>
> No, the register eax will contain the value of the eax variable. In the
> asm; it might well be there before or after the asm as well, but none of
> that is guaranteed.
Perhaps not 'guaranteed', but very unlikely to be wrong.
It doesn't give gcc much scope for not generating the desired code.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Sep 10, 2020 at 03:31:53PM +0000, David Laight wrote:
> > > asm volatile ("" : "+r" (eax));
> > > // So here eax must contain the value set by the "xxxxx" instructions.
> >
> > No, the register eax will contain the value of the eax variable. In the
> > asm; it might well be there before or after the asm as well, but none of
> > that is guaranteed.
>
> Perhaps not 'guaranteed', but very unlikely to be wrong.
> It doesn't give gcc much scope for not generating the desired code.
Wanna bet? :-)
Correct is correct. Anything else is not.
Segher