[apologies for enormous Cc; I've talked to some of you in private mail
and after being politely asked to explain WTF was all that thing for
and how was it supposed to work, well...]
Below is an attempt to describe how kernel threads work now. I'm
going to put a cleaned up variant into Documentation/something, so any
questions, suggestions of improvements, etc. are very welcome.
1. Basic rules for process lifetime.
Except for the initial process (init_task, eventual idle thread on the boot
CPU) all processes are created by do_fork(). There are three classes of
those: kernel threads, userland processes and idle threads to be. There are
few low-level operations involved:
* a kernel thread can spawn a new kernel thread; the primitive
doing that is kernel_thread().
* a userland process can spawn a new userland process; that's
done by sys_fork()/sys_vfork()/sys_clone()/sys_clone2().
* a kernel thread can become a userland process. The primitive
is kernel_execve().
* a kernel thread can spawn a future idle thread; that's done
by fork_idle(). Result is *not* scheduled until the secondary CPU gets
initialized and its state is heavily overwritten in process.
Under no circumstances a userland process can become a kernel thread or
spawn one. And kernel threads never do fork(2) et.al.
Note that kernel_thread() and kernel_execve() are really very low-level.
In particular, any process, be it a userland one or a kernel thread, can
ask a dedicated kernel thread (kthreadd) to spawn a kernel thread and
have a given function executed in it. Or to stop a thread that had been
spawned that way. Or ask to spawn it tied to given CPU, etc. The public
interfaces are in linux/kthread.h and the code implementing them is in
kernel/kthread.c; kernel_thread() is what it uses internally. Another
group of related public APIs deals with spawning a userland process from
kernel - call_usermodehelper() and friends in linux/kmod.h and kernel/kmod.c.
These two groups cover everything kernel-thread-related we care about in
the kernel and I'm not going to deal with them here. What I'm going to
describe is the primitives used to implement those mechanisms.
Historically the situation used to be different - kernel_thread() used to
be a fairly widely used public API until 2006 or so. Some out-of-tree
code might still be using it; the proper fix is to switch to use of
kthread_run() and be done with that. kthread_run() has calling conventions
and rules for callback similar to what kernel_thread() used to have, so
conversion tends to be trivial.
The rules for kernel_thread() callbacks (all 6 of them ;-) have changed,
though. What we currently have is
kernel_thread(fn, arg)
where arg is void * and fn is an int-returning function with void * as
argument. New kernel thread is created and fn(arg) is called in it.
It should either never return (run forever, as kthreadd or call something
that would terminate the thread - do_exit() or, in one case, panic())
or return 0. That is done after kernel_execve() has returned 0 and
then the thread will proceed into userland context created by that execve.
Note that some architectures still have kernel_execve() itself switch
to userland upon success; that's fine - this is just another case of
callback never returning to caller. In other words, this switchover
to new model isn't a flagday affair - all callbacks are already in the
form that works both for converted and unconverted architectures.
2. How should kernel_thread() and kernel_execve() work for
converted architecture?
Recall how the fork() works. We have the syscall call do_fork(), passing
it the pointer to struct pt_regs created on syscall entry and holding the
userland state of caller. do_fork() has new task_struct and new kernel
stack allocated; then it calls copy_thread(), which sets the arch-dependent
things for new process. Then it makes the new task_struct visible to
scheduler and once it's picked for execution, it'll be woken up and proceed
to return to userland, restoring the userland state copied from the parent.
The work of copy_thread() is to arrange the things up for that.
It copies pt_regs to wherever the child would expect to find them on return
from syscall (usually on child's kernel stack) and sets things up so that
when the scheduler finally does switch_to() into the newborn, it will be
woken up in the code that will drive it to userland. Normally switch_to()
wakes the next process up in the place where it has given the CPU last time,
i.e. in the same switch_to(). We could, in principle, set the things up
for newborn so that they would look that way. No architecture goes to
such pains, though - no point faking a fairly deep call chain, especially
since changes in scheduler might require modifying all such fakers. What's
done instead is a much shorter call chain - we act as if we had given CPU
up in the very beginning of ret_from_fork(), called from the syscall entry
glue. Since we won't be going through the parts of schedule() done after
switch_to(), ret_from_fork() starts with calling schedule_tail() to mop
up. Then it's off to the normal return from syscall.
Old implementation of kernel_thread() had been rather convoluted. In the
best case, it filled struct pt_regs according to its arguments and passed
them to do_fork(). The goal was to fool the code doing return from
syscall into *not* leaving the kernel mode, so that newborn would have
(after emptying its kernel stack) end up in a helper function with the
right values in registers. Running in kernel mode. The helper took
fn and arg, and called the former passing it the latter. Then it called
do_exit(), assuming it got that far. Contortions came from the "fool
the return from syscall into leaving us in kernel mode" part.
New implementation is much simpler. Generic kernel_thread() still does
do_fork(), but instead of filling pt_regs it passes fn and arg in a couple
of arguments that are blindly passed to copy_thread() and passes NULL as
pt_regs pointer. In that case copy_thread() should still arrange the things
up for switch_to(), but instead of ret_from_fork() we want to wake up in
a slightly different function. The name (just as in case of ret_from_fork)
is entirely up to the architecture; I've called it ret_from_kernel_thread
in most of the cases. What it does is almost identical to what ret_from_fork()
does; it calls schedule_tail() to mop up, then it does fn(arg), using the
information left to it by copy_thread(), then it's off to return from syscall.
Note the difference between that and the old one: instead of
* schedule_tail() finishes the things for scheduler
* return to userland, fooled into leaving us in kernel mode; registers
are set from what we'd left in pt_regs.
* we are in helper() (and still in kernel mode, with empty kernel
stack), which calls fn(arg)
* fn(arg) either never returns or does successful kernel_execve(),
which does magic to switch to user mode and jump into the image we got from
the binary loaded by kernel_execve()
we have
* schedule_tail() finishes the things for scheduler
* fn(arg) is called
* fn(arg) either never returns or does successful kernel_execve(),
which doesn't have to do any magic - it has pt_regs on kernel stack in
the right position, so filling them up as usual and returning 0 to caller
is just fine
* we proceed to return to userland. Nobody needs to be fooled,
everything happens as on normal return from execve(2) - registers are
set as needed by the contents of pt_regs, as filled by do_execve() and
we are off to user mode at the entry point of new binary.
The new variant is obviously nowhere near as hairy. Moreover, kernel_execve()
can be completely generic as well. Even better, we don't have to cope with
clone(2) or execve(2) done with non-empty kernel stack (which was a fairly
common way to do aforementioned black magic in kernel_execve()), so sys_execve()
doesn't need anything convoluted to find the pt_regs to pass to do_execve().
In other words, on converted platforms we can switch sys_execve() to
completely generic version as well.
3. Gory details.
As I mentioned above, there's a couple of do_fork() arguments that are passed
to copy_thread() as-is, without even looking at them. Those we use to pass
fn and arg. It's the second and the third argument resp.; for userland
clone2(2) we'd be passing userland stack pointer and stack size in them.
Only ia64 has clone2() wired, so usually copy_thread() instance names them
'usp' and 'unused' resp. - fork()/vfork()/clone() pass userland stack pointer
to do_fork(), but pass 0 as the 3rd argument. In any case, for copy_thread()
it's something like
if (unlikely(!regs)) {
set the things up, expecting (unsigned long)fn in argument 2
and (unsigned long)arg in argument 3
} else {
copy *regs to child, etc.
}
How to set the things up depends mostly on the way switch_to() is implemented.
In any case, we need to clean the child's pt_regs - it wouldn't do to leak
random kernel data in to userland registers if the child eventually becomes
a userland process. If switch_to() restores callee-saved registers of the
process it switches to before jumping to the place where that process should
be woken up (i.e. if the processes appear to sleep at the very end of
switch_to() and not in the middle), it's probably the best to pass fn and
arg in a pair of callee-saved registers; then ret_from_kernel_thread() will
find them already loaded into those registers by switch_to(). If switch_to()
is something like
save callee-saved registers of last process
save stack pointer
save l as wakep location
restore stack pointer of the next process
jump to wakeup location of the next process
l: restore callee-saved registers of the next process
(i.e. the wakeup location is in the middle of switch_to), it's probably best
to save fn and arg in child's pt_regs and read them explicitly in
ret_from_kernel_thread(), since switch_to() won't get to restoring callee-saved
registers when it switches to newborn. Stack pointer is almost certainly
switched before the jump; if it isn't, we are going to notice that as soon
as you look at ret_from_fork() - it would be in the same situation and it
would have to set the stack pointer itself, so we can just duplicate that.
In any case, ret_from_fork() and ret_from_kernel_thread() will be very
similar. So much that on a predicated architecture it might make sense
to merge them and just make the call of payload predicated on "is it
a kernel thread" or something equivalent. I'd done that for arm (and
fucked up the call setup in case of thumb-mode kernel, which rmk had fun
to debug) and ia64. Probably the same could be done for parisc as well.
One note about clearing the child's pt_regs: we won't be using it to
fool the return from syscall into anything, so most of the convolutions
go away. It's probably a good idea to set it so that user_mode(child_regs)
would be false - more robust that way. The only case when they end up
anywhere near return from syscall codepath is if we'd done successful
do_execve(), so we can count on at least start_thread() having been done.
Usually that's enough; in one case (ppc64) I ended up with a lovely
detection job finding out why it wasn't. Turned out that there was a
field (childregs->softe) that was set to 1 by kernel_execve() black
magic; return from syscall logics relied on it being non-zero. Something
similar might easily be true elsewhere; that's a potential pitfall that
might be useful to keep in mind debugging that stuff.
4. What is done?
I've done the conversions for almost all architectures, but quite a few
are completely untested.
I'm fairly sure about alpha, x86 and um. Tested and I understand the
architecture well enough. arm, mips and c6x had been tested by architecture
maintainers. This stuff also works. alpha, arm, x86 and um are fully
converted in mainline by now.
Next group: m68k, ppc, s390, arm64, parisc. I'm reasonably sure those
are OK, but I'd like the maintainers to take a look.
sparc: Dave said he'll look it through. I'm still in one piece and not
charred, so either it's OK or he didn't have time to read it yet. Works
here, anyway.
Next comes the pile of embedded architectures where the best that can be said
about what I have is that it might serve as a starting point for producing
something that works. I've no hardware, no emulated setups and my
knowledge of architecture comes from architecture manuals and nothing
else. Those are
avr32
blackfin
cris
frv
h8300
m32r
microblaze
mn10300
score
sh
unicore32
Maintainers are Cc'd. My (very, _very_ tentative) patchsets are in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal arch-$ARCH
Nearly in the same state: ia64. The only difference is that I've tested
it under ski(1) and it seems to work. Accuracy of ski(1) for the purposes
of finding bugs in asm glue is not inspiring, though.
Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
I would very much appreciate ACKs/testing/fixes/outright replacements/etc.
for this stuff. Right now all infrastructure is in the mainline and
per-architecture bits are entirely independent from each other. As soon
as maintainer in question is OK with what's in such per-architecture branch,
I'll be quite happy to put it into never-rebased mode, so that it would be
safe to pull. There are some fun things that'll become possible once
all architectures are converted, but let's handle that stuff first, OK?
On Wed, Oct 17, 2012 at 2:35 AM, Al Viro <[email protected]> wrote:
> [apologies for enormous Cc; I've talked to some of you in private mail
> and after being politely asked to explain WTF was all that thing for
> and how was it supposed to work, well...]
[...]
> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
I'm doing xtensa part.
BTW, what linus-arch ML might be for?
--
Thanks.
-- Max
On Wed, Oct 17, 2012 at 09:32:34AM +0400, Max Filippov wrote:
> On Wed, Oct 17, 2012 at 2:35 AM, Al Viro <[email protected]> wrote:
> > [apologies for enormous Cc; I've talked to some of you in private mail
> > and after being politely asked to explain WTF was all that thing for
> > and how was it supposed to work, well...]
>
> [...]
>
> > Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>
> I'm doing xtensa part.
Thanks; I hope this variant is going to be less painful than messing with
ret_from_kernel_execve()...
> BTW, what linus-arch ML might be for?
A typo, noticed only when I got a bounce ;-)
My apologies...
On 17 October 2012 00:35, Al Viro <[email protected]> wrote:
>
> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>
I did most of the OpenRISC conversion last weekend... the
kernel_thread bits work fine but I end up with the init thread dying
with what I've got now for kernel_execve. Once I've got that sorted
out, I'll pass this along to you.
Thanks for the above explanation... it took me a good while to wrap my
head around how this was supposed to work from reading through the
arch's that you had already converted. All in all, though, I find
it's a nice cleanup and, conceptually, an improvement over what we had
before.
Thanks,
Jonas
2012/10/17 Jonas Bonn <[email protected]>:
> On 17 October 2012 00:35, Al Viro <[email protected]> wrote:
>>
>> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>>
>
> I did most of the OpenRISC conversion last weekend... the
> kernel_thread bits work fine but I end up with the init thread dying
> with what I've got now for kernel_execve. Once I've got that sorted
> out, I'll pass this along to you.
I am testing the Microblaze conversion and I see the similar problem
with GENERIC_KERNEL_EXECVE
(commit: http://git.kernel.org/?p=linux/kernel/git/viro/signal.git;a=commit;h=6aa044199aed5b541eba7fe7f25efdfb3a655a58)
I have look at the patch and I have found this.
(From description above: a kernel thread can become a userland
process. The primitive is kernel_execve())
In init/main.c:795/run_init_process() kernel_execve is called.
In old style, kernel_execve is called which runs microblaze
kernel_execve which calls __NR_execve as syscall.
In entry.S user exception detects that jump comes from kernel space
and save pt_regs on the current stack
and calls sys_execve and then microblaze_execve with 4th argument
which is pointer to pt_regs, etc.
In the patch above there is directly used current_pt_regs() function
which works good for newly created threads
when pt_regs are exactly in current_pt_regs() position but not for
pt_regs which are saved on the stack
which is the init task case.
Also this is the reason why microblaze has implementation for calling
_user_exception from the kernel space.
I believe that it is called just once for /init.
My question is how should /init be called? Because I need to save
pt_regs to current_pt_regs() position where
generic kernel_execve expects it.
Thanks,
Michal
--
Michal Simek, Ing. (M.Eng)
w: http://www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
On Wed, Oct 17, 2012 at 04:27:06PM +0200, Michal Simek wrote:
> In the patch above there is directly used current_pt_regs() function
> which works good for newly created threads
> when pt_regs are exactly in current_pt_regs() position but not for
> pt_regs which are saved on the stack
> which is the init task case.
init_task does *not* do kernel_execve(). It's PID 0, not PID 1.
init is spawned by it.
> My question is how should /init be called? Because I need to save
> pt_regs to current_pt_regs() position where
> generic kernel_execve expects it.
What happens during boot is this:
* init_task (not to be confused with init) is used as current during
infrastructure initializations. Once everything needed for scheduler and
for working fork is set, we spawn two threads - future init and future
kthreadd. The last thing we do with init_task is telling init that kthreadd
has been spawned. After that init_task turns itself into an idle thread.
* future init waits for kthreadd to be spawned (it would be more
natural to fork them in opposite order, but we want init to have PID 1 -
too much stuff in userland depends on that). Then it does the rest of
initialization, including setting up initramfs contents. And does
kernel_execve() on /init. Note that this is a task that had been created
by kernel_thread() and is currently in function called from
ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
That's where pt_regs need to be set up; note that they'll be passed to
start_thread() before you return to userland. If there are any magic bits
in pt_regs needed by return-from-syscall code, set them in kthread case of
copy_thread().
On Wed, Oct 17, 2012 at 05:07:03PM +0100, Al Viro wrote:
> What happens during boot is this:
> * init_task (not to be confused with init) is used as current during
> infrastructure initializations. Once everything needed for scheduler and
> for working fork is set, we spawn two threads - future init and future
> kthreadd. The last thing we do with init_task is telling init that kthreadd
> has been spawned. After that init_task turns itself into an idle thread.
> * future init waits for kthreadd to be spawned (it would be more
> natural to fork them in opposite order, but we want init to have PID 1 -
> too much stuff in userland depends on that). Then it does the rest of
> initialization, including setting up initramfs contents. And does
> kernel_execve() on /init. Note that this is a task that had been created
> by kernel_thread() and is currently in function called from
> ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
> That's where pt_regs need to be set up; note that they'll be passed to
> start_thread() before you return to userland. If there are any magic bits
> in pt_regs needed by return-from-syscall code, set them in kthread case of
> copy_thread().
PS: I suspect that we end up with the wrong value in childregs->msr;
start_thread() only add MSR_UMS there. I'd suggest running the kernel
with these patches + printk childregs->msr the very first time start_thread()
is called and see what it prints, then working kernel + such printk and
compare the results...
On Tue, Oct 16, 2012 at 11:35:08PM +0100, Al Viro wrote:
> 1. Basic rules for process lifetime.
> Except for the initial process (init_task, eventual idle thread on the boot
> CPU) all processes are created by do_fork(). There are three classes of
> those: kernel threads, userland processes and idle threads to be. There are
> few low-level operations involved:
> * a kernel thread can spawn a new kernel thread; the primitive
> doing that is kernel_thread().
> * a userland process can spawn a new userland process; that's
> done by sys_fork()/sys_vfork()/sys_clone()/sys_clone2().
> * a kernel thread can become a userland process. The primitive
> is kernel_execve().
> * a kernel thread can spawn a future idle thread; that's done
> by fork_idle(). Result is *not* scheduled until the secondary CPU gets
> initialized and its state is heavily overwritten in process.
Minor correction: while the first two cases go through do_fork() to
copy_process() to copy_thread(), fork_idle() calls copy_process() directly.
> 4. What is done?
> I've done the conversions for almost all architectures, but quite a few
> are completely untested.
>
> I'm fairly sure about alpha, x86 and um. Tested and I understand the
> architecture well enough. arm, mips and c6x had been tested by architecture
> maintainers. This stuff also works. alpha, arm, x86 and um are fully
> converted in mainline by now.
arm64 fixed and tested by maintainer, put in no-rebase mode.
sparc corrected to avoid branching beyond what ba,pt allows, ACKed by Davem
in that form. In no-rebase mode.
m68k tested and ACKed on coldfire; I think that along with aranym testing
here that is enough. In no-rebase mode.
Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
Tony an incremental patch cleaning copy_thread() up, waiting for results of
testing that on SMP box.
Even more surprisingly, unicore32 variant turned out to contain only one
obvious typo. Fixed and tested by maintainer of unicore32 tree and actually
applied there, I've pulled his branch at that point.
microblaze: some fixes from Michal folded, still breakage with kernel_execve()
side of things.
Since there had been no signs of life from hexagon folks, I'd done (absolutely
blind and untested) tentative patches; see #arch-hexagon. Same situation
as with most of the embedded architectures - i.e. take with a cartload of salt,
that pair of patches is intended to be a possible starting point for producing
something working.
At that point we have the following situation:
alpha done
arm done
arm64 done
avr32 untested
blackfin untested
c6x done
cris untested
frv untested, maintainer going to test
h8300 untested
hexagon untested
ia64 apparently works, needs the final ACK from Tony.
m32r untested
m68k done
microblaze partially tested, maintainer hunting breakage down
mips done
mn10300 untested
openrisc maintainers said to have partially working variant
parisc should work, needs testing and ACK
powerpc should work, needs testing and ACK
s390 should work, needs testing and ACK
score untested
sh untested, maintainers planned reviewing and testing
sparc done
tile maintainers writing that one
um done
unicore32 done
x86 done
xtensa maintainers writing that one
One more thing: AFAICS, just about everything has something along the lines
of
if (!usp)
usp = <current userland sp>
do_fork(flags, usp, ....)
in their sys_clone(). How about taking that into copy_thread()? After
all, the logics there is
copy all the state, including userland stack pointer to child
override userland stack pointer with what the caller passed to
copy_thread()
often enough with "... and if we are about to override it with something
different, do the following extra work". Turning that into
copy all the state, including userland stack pointer to child
if (usp) {
override the userland stack pointer for child and maybe do
some extra work
}
would seem to be a fairly natural thing. Does anybody see problems with
doing that on their architecture? Note that with that fork() becomes
simply
#ifndef CONFIG_MMU
return -EINVAL;
#else
return do_fork(SIGCHLD, 0, current_pt_regs(), 0, NULL, NULL);
#endif
and similar for vfork(). And these can definitely drop the Cthulhu-awful
kludges for obtaining pt_regs (OK, on everything that doesn't do
kernel_thread() via syscall-from-kernel, but by now only xtensa is still
doing that). In some cases we need to do a bit of work before that
(gather callee-saved registers so that the child could get them as on alpha,
mips, m68k, openrisc, parisc, ppc and x86, flush userland register windows
on sparc and get psr/wim values on sparc32), but a lot more architectures
lose the asm wrappers for those and the rest can get rid of assorted
ugliness involved in getting that struct pt_regs *.
BTW, alpha seems to be doing an absolutely pointless work on the way out of
sys_fork() et.al. - saving callee-saved registers is needed, all right,
but why bother restoring all of them on the way out in the parent? All
we need is rp; that's ~0.3Kb of useless reads from memory on each fork()...
The same goes for m68k; there the amount of traffic is less, but still, what
the hell for? Child needs callee-saved registers restored (and usually will
have that done by switch_to()), but the parent needs only to make sure they
are saved and available for copy_thread() to bring them to child (incidentally,
copying registers is needed only when they are not embedded into task_struct.
At least um is doing a memcpy() for no reason whatsoever; fix will be sent
to rw shortly and ISTR seeing something similar on some of the other
architectures).
Another cross-architecture thing: folks, watch out for what's being done with
thread flags; I've just found a lovely bug on alpha where we have prctl(2)
doing non-atomic modifications of those (as in ti->flags = (ti->flags&~x)|y;),
which is obviously broken; TIF_SIGPENDING can be set asynchronously and even
from an interrupt. Fix for this one is going to Linus shortly (adding
a separate field for thread-synchronous flags, taking obviously t-s ones
there, including the UAC_... bunch set by that prctl()), but I don't think
that I can audit that for all architectures efficiently; cursory look has
found a braino on frv (fix being discussed with dhowells), but there may bloody
well be more of that fun.
> Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
> Tony an incremental patch cleaning copy_thread() up, waiting for results of
> testing that on SMP box.
Tiny bit faster than plain 3.7-rc1. lmbench3 reports fork+execve test at between
558 to 567 usec with the new code, compared with 562-572 usec with the old.
-Tony
On Fri, Oct 19, 2012 at 05:16:50PM +0000, Luck, Tony wrote:
> > Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
> > Tony an incremental patch cleaning copy_thread() up, waiting for results of
> > testing that on SMP box.
>
> Tiny bit faster than plain 3.7-rc1. lmbench3 reports fork+execve test at between
> 558 to 567 usec with the new code, compared with 562-572 usec with the old.
Are you OK with the state of comments in call_payload() in the current
form of that sucker? Right now in #arch-ia64 is looks so:
ENTRY(call_payload)
.prologue ASM_UNW_PRLG_RP|ASM_UNW_PRLG_PFS, ASM_UNW_PRLG_GRSAVE(0)
/* call the kernel_thread payload; fn is in r4, arg - in r5 */
alloc loc1=ar.pfs,0,3,1,0
mov loc0=rp
mov loc2=gp
mov out0=r5 // arg
ld8 r14 = [r4], 8 // fn.address
;;
mov b6 = r14
ld8 gp = [r4] // fn.gp
;;
br.call.sptk.many rp=b6 // fn(arg)
.ret12: mov gp=loc2
mov rp=loc0
mov ar.pfs=loc1
/* ... and if it has returned, we are going to userland */
cmp.ne pKStk,pUStk=r0,r0
br.ret.sptk.many rp
END(call_payload)
IIRC, the lack of comments on function with unusual calling conventions was
the last remaining issue...
On Fri, Oct 19, 2012 at 10:30 AM, Al Viro <[email protected]> wrote:
> IIRC, the lack of comments on function with unusual calling conventions was
> the last remaining issue...
Stylistically other asm functions have huge block header
comments detailing register usage. But typically those
are way more complex. I think your inline comments
work fine here.
-Tony
On Fri, Oct 19, 2012 at 11:01:53AM -0700, Tony Luck wrote:
> On Fri, Oct 19, 2012 at 10:30 AM, Al Viro <[email protected]> wrote:
> > IIRC, the lack of comments on function with unusual calling conventions was
> > the last remaining issue...
>
> Stylistically other asm functions have huge block header
> comments detailing register usage. But typically those
> are way more complex. I think your inline comments
> work fine here.
Thanks; ACKed-by and Tested-by added, branch put into no-rebase mode.
So openrisc and ia64 are in the finished pile now...
Also provide an optimized current_pt_regs() while we're at it.
Signed-off-by: Chris Metcalf <[email protected]>
---
arch/tile/Kconfig | 2 +
arch/tile/include/asm/processor.h | 3 ++
arch/tile/include/asm/switch_to.h | 5 +-
arch/tile/kernel/entry.S | 11 ----
arch/tile/kernel/intvec_32.S | 15 ++++++
arch/tile/kernel/intvec_64.S | 15 ++++++
arch/tile/kernel/process.c | 103 ++++++++++++++-----------------------
7 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 875d008..ea7f61e 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -21,6 +21,8 @@ config TILE
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
+ select GENERIC_KERNEL_THREAD
+ select GENERIC_KERNEL_EXECVE
# FIXME: investigate whether we need/want these options.
# select HAVE_IOREMAP_PROT
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 8c4dd9f..9a83e53 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -239,6 +239,9 @@ unsigned long get_wchan(struct task_struct *p);
#define KSTK_TOP(task) (task_ksp0(task) - STACK_TOP_DELTA)
#define task_pt_regs(task) \
((struct pt_regs *)(task_ksp0(task) - KSTK_PTREGS_GAP) - 1)
+#define current_pt_regs() \
+ ((struct pt_regs *)((stack_pointer | (THREAD_SIZE - 1)) - \
+ (KSTK_PTREGS_GAP - 1)) - 1)
#define task_sp(task) (task_pt_regs(task)->sp)
#define task_pc(task) (task_pt_regs(task)->pc)
/* Aliases for pc and sp (used in fs/proc/array.c) */
diff --git a/arch/tile/include/asm/switch_to.h b/arch/tile/include/asm/switch_to.h
index 1d48c5f..b8f888c 100644
--- a/arch/tile/include/asm/switch_to.h
+++ b/arch/tile/include/asm/switch_to.h
@@ -68,7 +68,10 @@ extern unsigned long get_switch_to_pc(void);
/* Support function for forking a new task. */
void ret_from_fork(void);
-/* Called from ret_from_fork() when a new process starts up. */
+/* Support function for forking a new kernel thread. */
+void ret_from_kernel_thread(void *fn, void *arg);
+
+/* Called from ret_from_xxx() when a new process starts up. */
struct task_struct *sim_notify_fork(struct task_struct *prev);
#endif /* !__ASSEMBLY__ */
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index c31637b..f116cb0 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -28,17 +28,6 @@ STD_ENTRY(current_text_addr)
STD_ENDPROC(current_text_addr)
/*
- * Implement execve(). The i386 code has a note that forking from kernel
- * space results in no copy on write until the execve, so we should be
- * careful not to write to the stack here.
- */
-STD_ENTRY(kernel_execve)
- moveli TREG_SYSCALL_NR_NAME, __NR_execve
- swint1
- jrp lr
- STD_ENDPROC(kernel_execve)
-
-/*
* We don't run this function directly, but instead copy it to a page
* we map into every user process. See vdso_setup().
*
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 6943515..58aad51 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1291,6 +1291,21 @@ STD_ENTRY(ret_from_fork)
}
STD_ENDPROC(ret_from_fork)
+STD_ENTRY(ret_from_kernel_thread)
+ jal sim_notify_fork
+ jal schedule_tail
+ FEEDBACK_REENTER(ret_from_fork)
+ {
+ move r0, r31
+ jalr r30
+ }
+ FEEDBACK_REENTER(ret_from_kernel_thread)
+ {
+ movei r30, 0 /* not an NMI */
+ j .Lresume_userspace /* jump into middle of interrupt_return */
+ }
+ STD_ENDPROC(ret_from_kernel_thread)
+
/*
* Code for ill interrupt.
*/
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 7c06d59..a84911c 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1150,6 +1150,21 @@ STD_ENTRY(ret_from_fork)
}
STD_ENDPROC(ret_from_fork)
+STD_ENTRY(ret_from_kernel_thread)
+ jal sim_notify_fork
+ jal schedule_tail
+ FEEDBACK_REENTER(ret_from_fork)
+ {
+ move r0, r31
+ jalr r30
+ }
+ FEEDBACK_REENTER(ret_from_kernel_thread)
+ {
+ movei r30, 0 /* not an NMI */
+ j .Lresume_userspace /* jump into middle of interrupt_return */
+ }
+ STD_ENDPROC(ret_from_kernel_thread)
+
/* Various stub interrupt handlers and syscall handlers */
STD_ENTRY_LOCAL(_kernel_double_fault)
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 307d010..ca6de8d 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -157,24 +157,44 @@ void arch_release_thread_info(struct thread_info *info)
static void save_arch_state(struct thread_struct *t);
int copy_thread(unsigned long clone_flags, unsigned long sp,
- unsigned long stack_size,
+ unsigned long arg,
struct task_struct *p, struct pt_regs *regs)
{
- struct pt_regs *childregs;
+ struct pt_regs *childregs = task_pt_regs(p);
unsigned long ksp;
+ unsigned long *callee_regs;
/*
- * When creating a new kernel thread we pass sp as zero.
- * Assign it to a reasonable value now that we have the stack.
+ * Set up the stack and stack pointer appropriately for the
+ * new child to find itself woken up in __switch_to().
+ * The callee-saved registers must be on the stack to be read;
+ * the new task will then jump to assembly support to handle
+ * calling schedule_tail(), etc., and (for userspace tasks)
+ * returning to the context set up in the pt_regs.
*/
- if (sp == 0 && regs->ex1 == PL_ICS_EX1(KERNEL_PL, 0))
- sp = KSTK_TOP(p);
+ ksp = (unsigned long) childregs;
+ ksp -= C_ABI_SAVE_AREA_SIZE; /* interrupt-entry save area */
+ ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
+ ksp -= CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long);
+ callee_regs = (unsigned long *)ksp;
+ ksp -= C_ABI_SAVE_AREA_SIZE; /* __switch_to() save area */
+ ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
+ p->thread.ksp = ksp;
- /*
- * Do not clone step state from the parent; each thread
- * must make its own lazily.
- */
- task_thread_info(p)->step_state = NULL;
+ /* Record the pid of the task that created this one. */
+ p->thread.creator_pid = current->pid;
+
+ if (unlikely(!regs)) {
+ /* kernel thread */
+ memset(childregs, 0, sizeof(struct pt_regs));
+ memset(&callee_regs[2], 0,
+ (CALLEE_SAVED_REGS_COUNT - 2) * sizeof(unsigned long));
+ callee_regs[0] = sp; /* r30 = function */
+ callee_regs[1] = arg; /* r31 = arg */
+ childregs->ex1 = PL_ICS_EX1(KERNEL_PL, 0);
+ p->thread.pc = (unsigned long) ret_from_kernel_thread;
+ return 0;
+ }
/*
* Start new thread in ret_from_fork so it schedules properly
@@ -182,20 +202,24 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
*/
p->thread.pc = (unsigned long) ret_from_fork;
+ /*
+ * Do not clone step state from the parent; each thread
+ * must make its own lazily.
+ */
+ task_thread_info(p)->step_state = NULL;
+
/* Save user stack top pointer so we can ID the stack vm area later. */
p->thread.usp0 = sp;
- /* Record the pid of the process that created this one. */
- p->thread.creator_pid = current->pid;
-
/*
* Copy the registers onto the kernel stack so the
* return-from-interrupt code will reload it into registers.
*/
- childregs = task_pt_regs(p);
*childregs = *regs;
childregs->regs[0] = 0; /* return value is zero */
childregs->sp = sp; /* override with new user stack pointer */
+ memcpy(callee_regs, ®s->regs[CALLEE_SAVED_FIRST_REG],
+ CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long));
/*
* If CLONE_SETTLS is set, set "tp" in the new task to "r4",
@@ -204,24 +228,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
if (clone_flags & CLONE_SETTLS)
childregs->tp = regs->regs[4];
- /*
- * Copy the callee-saved registers from the passed pt_regs struct
- * into the context-switch callee-saved registers area.
- * This way when we start the interrupt-return sequence, the
- * callee-save registers will be correctly in registers, which
- * is how we assume the compiler leaves them as we start doing
- * the normal return-from-interrupt path after calling C code.
- * Zero out the C ABI save area to mark the top of the stack.
- */
- ksp = (unsigned long) childregs;
- ksp -= C_ABI_SAVE_AREA_SIZE; /* interrupt-entry save area */
- ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
- ksp -= CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long);
- memcpy((void *)ksp, ®s->regs[CALLEE_SAVED_FIRST_REG],
- CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long));
- ksp -= C_ABI_SAVE_AREA_SIZE; /* __switch_to() save area */
- ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
- p->thread.ksp = ksp;
#if CHIP_HAS_TILE_DMA()
/*
@@ -650,37 +656,6 @@ unsigned long get_wchan(struct task_struct *p)
return 0;
}
-/*
- * We pass in lr as zero (cleared in kernel_thread) and the caller
- * part of the backtrace ABI on the stack also zeroed (in copy_thread)
- * so that backtraces will stop with this function.
- * Note that we don't use r0, since copy_thread() clears it.
- */
-static void start_kernel_thread(int dummy, int (*fn)(int), int arg)
-{
- do_exit(fn(arg));
-}
-
-/*
- * Create a kernel thread
- */
-int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
-{
- struct pt_regs regs;
-
- memset(®s, 0, sizeof(regs));
- regs.ex1 = PL_ICS_EX1(KERNEL_PL, 0); /* run at kernel PL, no ICS */
- regs.pc = (long) start_kernel_thread;
- regs.flags = PT_FLAGS_CALLER_SAVES; /* need to restore r1 and r2 */
- regs.regs[1] = (long) fn; /* function pointer */
- regs.regs[2] = (long) arg; /* parameter register */
-
- /* Ok, create the new process.. */
- return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, ®s,
- 0, NULL, NULL);
-}
-EXPORT_SYMBOL(kernel_thread);
-
/* Flush thread state. */
void flush_thread(void)
{
--
1.7.10.3
Also provide an optimized current_pt_regs() while we're at it.
Signed-off-by: Chris Metcalf <[email protected]>
---
[Re-sending to correct linus-arch / linux-arch typo.]
arch/tile/Kconfig | 2 +
arch/tile/include/asm/processor.h | 3 ++
arch/tile/include/asm/switch_to.h | 5 +-
arch/tile/kernel/entry.S | 11 ----
arch/tile/kernel/intvec_32.S | 15 ++++++
arch/tile/kernel/intvec_64.S | 15 ++++++
arch/tile/kernel/process.c | 103 ++++++++++++++-----------------------
7 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 875d008..ea7f61e 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -21,6 +21,8 @@ config TILE
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
+ select GENERIC_KERNEL_THREAD
+ select GENERIC_KERNEL_EXECVE
# FIXME: investigate whether we need/want these options.
# select HAVE_IOREMAP_PROT
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 8c4dd9f..9a83e53 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -239,6 +239,9 @@ unsigned long get_wchan(struct task_struct *p);
#define KSTK_TOP(task) (task_ksp0(task) - STACK_TOP_DELTA)
#define task_pt_regs(task) \
((struct pt_regs *)(task_ksp0(task) - KSTK_PTREGS_GAP) - 1)
+#define current_pt_regs() \
+ ((struct pt_regs *)((stack_pointer | (THREAD_SIZE - 1)) - \
+ (KSTK_PTREGS_GAP - 1)) - 1)
#define task_sp(task) (task_pt_regs(task)->sp)
#define task_pc(task) (task_pt_regs(task)->pc)
/* Aliases for pc and sp (used in fs/proc/array.c) */
diff --git a/arch/tile/include/asm/switch_to.h b/arch/tile/include/asm/switch_to.h
index 1d48c5f..b8f888c 100644
--- a/arch/tile/include/asm/switch_to.h
+++ b/arch/tile/include/asm/switch_to.h
@@ -68,7 +68,10 @@ extern unsigned long get_switch_to_pc(void);
/* Support function for forking a new task. */
void ret_from_fork(void);
-/* Called from ret_from_fork() when a new process starts up. */
+/* Support function for forking a new kernel thread. */
+void ret_from_kernel_thread(void *fn, void *arg);
+
+/* Called from ret_from_xxx() when a new process starts up. */
struct task_struct *sim_notify_fork(struct task_struct *prev);
#endif /* !__ASSEMBLY__ */
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index c31637b..f116cb0 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -28,17 +28,6 @@ STD_ENTRY(current_text_addr)
STD_ENDPROC(current_text_addr)
/*
- * Implement execve(). The i386 code has a note that forking from kernel
- * space results in no copy on write until the execve, so we should be
- * careful not to write to the stack here.
- */
-STD_ENTRY(kernel_execve)
- moveli TREG_SYSCALL_NR_NAME, __NR_execve
- swint1
- jrp lr
- STD_ENDPROC(kernel_execve)
-
-/*
* We don't run this function directly, but instead copy it to a page
* we map into every user process. See vdso_setup().
*
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 6943515..58aad51 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1291,6 +1291,21 @@ STD_ENTRY(ret_from_fork)
}
STD_ENDPROC(ret_from_fork)
+STD_ENTRY(ret_from_kernel_thread)
+ jal sim_notify_fork
+ jal schedule_tail
+ FEEDBACK_REENTER(ret_from_fork)
+ {
+ move r0, r31
+ jalr r30
+ }
+ FEEDBACK_REENTER(ret_from_kernel_thread)
+ {
+ movei r30, 0 /* not an NMI */
+ j .Lresume_userspace /* jump into middle of interrupt_return */
+ }
+ STD_ENDPROC(ret_from_kernel_thread)
+
/*
* Code for ill interrupt.
*/
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 7c06d59..a84911c 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1150,6 +1150,21 @@ STD_ENTRY(ret_from_fork)
}
STD_ENDPROC(ret_from_fork)
+STD_ENTRY(ret_from_kernel_thread)
+ jal sim_notify_fork
+ jal schedule_tail
+ FEEDBACK_REENTER(ret_from_fork)
+ {
+ move r0, r31
+ jalr r30
+ }
+ FEEDBACK_REENTER(ret_from_kernel_thread)
+ {
+ movei r30, 0 /* not an NMI */
+ j .Lresume_userspace /* jump into middle of interrupt_return */
+ }
+ STD_ENDPROC(ret_from_kernel_thread)
+
/* Various stub interrupt handlers and syscall handlers */
STD_ENTRY_LOCAL(_kernel_double_fault)
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 307d010..ca6de8d 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -157,24 +157,44 @@ void arch_release_thread_info(struct thread_info *info)
static void save_arch_state(struct thread_struct *t);
int copy_thread(unsigned long clone_flags, unsigned long sp,
- unsigned long stack_size,
+ unsigned long arg,
struct task_struct *p, struct pt_regs *regs)
{
- struct pt_regs *childregs;
+ struct pt_regs *childregs = task_pt_regs(p);
unsigned long ksp;
+ unsigned long *callee_regs;
/*
- * When creating a new kernel thread we pass sp as zero.
- * Assign it to a reasonable value now that we have the stack.
+ * Set up the stack and stack pointer appropriately for the
+ * new child to find itself woken up in __switch_to().
+ * The callee-saved registers must be on the stack to be read;
+ * the new task will then jump to assembly support to handle
+ * calling schedule_tail(), etc., and (for userspace tasks)
+ * returning to the context set up in the pt_regs.
*/
- if (sp == 0 && regs->ex1 == PL_ICS_EX1(KERNEL_PL, 0))
- sp = KSTK_TOP(p);
+ ksp = (unsigned long) childregs;
+ ksp -= C_ABI_SAVE_AREA_SIZE; /* interrupt-entry save area */
+ ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
+ ksp -= CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long);
+ callee_regs = (unsigned long *)ksp;
+ ksp -= C_ABI_SAVE_AREA_SIZE; /* __switch_to() save area */
+ ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
+ p->thread.ksp = ksp;
- /*
- * Do not clone step state from the parent; each thread
- * must make its own lazily.
- */
- task_thread_info(p)->step_state = NULL;
+ /* Record the pid of the task that created this one. */
+ p->thread.creator_pid = current->pid;
+
+ if (unlikely(!regs)) {
+ /* kernel thread */
+ memset(childregs, 0, sizeof(struct pt_regs));
+ memset(&callee_regs[2], 0,
+ (CALLEE_SAVED_REGS_COUNT - 2) * sizeof(unsigned long));
+ callee_regs[0] = sp; /* r30 = function */
+ callee_regs[1] = arg; /* r31 = arg */
+ childregs->ex1 = PL_ICS_EX1(KERNEL_PL, 0);
+ p->thread.pc = (unsigned long) ret_from_kernel_thread;
+ return 0;
+ }
/*
* Start new thread in ret_from_fork so it schedules properly
@@ -182,20 +202,24 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
*/
p->thread.pc = (unsigned long) ret_from_fork;
+ /*
+ * Do not clone step state from the parent; each thread
+ * must make its own lazily.
+ */
+ task_thread_info(p)->step_state = NULL;
+
/* Save user stack top pointer so we can ID the stack vm area later. */
p->thread.usp0 = sp;
- /* Record the pid of the process that created this one. */
- p->thread.creator_pid = current->pid;
-
/*
* Copy the registers onto the kernel stack so the
* return-from-interrupt code will reload it into registers.
*/
- childregs = task_pt_regs(p);
*childregs = *regs;
childregs->regs[0] = 0; /* return value is zero */
childregs->sp = sp; /* override with new user stack pointer */
+ memcpy(callee_regs, ®s->regs[CALLEE_SAVED_FIRST_REG],
+ CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long));
/*
* If CLONE_SETTLS is set, set "tp" in the new task to "r4",
@@ -204,24 +228,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
if (clone_flags & CLONE_SETTLS)
childregs->tp = regs->regs[4];
- /*
- * Copy the callee-saved registers from the passed pt_regs struct
- * into the context-switch callee-saved registers area.
- * This way when we start the interrupt-return sequence, the
- * callee-save registers will be correctly in registers, which
- * is how we assume the compiler leaves them as we start doing
- * the normal return-from-interrupt path after calling C code.
- * Zero out the C ABI save area to mark the top of the stack.
- */
- ksp = (unsigned long) childregs;
- ksp -= C_ABI_SAVE_AREA_SIZE; /* interrupt-entry save area */
- ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
- ksp -= CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long);
- memcpy((void *)ksp, ®s->regs[CALLEE_SAVED_FIRST_REG],
- CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long));
- ksp -= C_ABI_SAVE_AREA_SIZE; /* __switch_to() save area */
- ((long *)ksp)[0] = ((long *)ksp)[1] = 0;
- p->thread.ksp = ksp;
#if CHIP_HAS_TILE_DMA()
/*
@@ -650,37 +656,6 @@ unsigned long get_wchan(struct task_struct *p)
return 0;
}
-/*
- * We pass in lr as zero (cleared in kernel_thread) and the caller
- * part of the backtrace ABI on the stack also zeroed (in copy_thread)
- * so that backtraces will stop with this function.
- * Note that we don't use r0, since copy_thread() clears it.
- */
-static void start_kernel_thread(int dummy, int (*fn)(int), int arg)
-{
- do_exit(fn(arg));
-}
-
-/*
- * Create a kernel thread
- */
-int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
-{
- struct pt_regs regs;
-
- memset(®s, 0, sizeof(regs));
- regs.ex1 = PL_ICS_EX1(KERNEL_PL, 0); /* run at kernel PL, no ICS */
- regs.pc = (long) start_kernel_thread;
- regs.flags = PT_FLAGS_CALLER_SAVES; /* need to restore r1 and r2 */
- regs.regs[1] = (long) fn; /* function pointer */
- regs.regs[2] = (long) arg; /* parameter register */
-
- /* Ok, create the new process.. */
- return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, ®s,
- 0, NULL, NULL);
-}
-EXPORT_SYMBOL(kernel_thread);
-
/* Flush thread state. */
void flush_thread(void)
{
--
1.7.10.3
On Fri, Oct 19, 2012 at 04:25:12PM -0400, Chris Metcalf wrote:
> Also provide an optimized current_pt_regs() while we're at it.
Applied. BTW, are you sure you want to record parent's pid and not tid?
Anyway, here's a followup on top of this one (again, completely untested) -
switching to generic sys_execve(). Does that look right for you? While
we are at it, I wonder if any of PTREGS_SYSCALL wrappers are needed -
current_pt_regs() would do just as well, won't it? It's a couple of
arithmetical operations vs. arith operation + branch; even if the latter
is somehow cheaper, can't be cheaper by much. And I'd expect it to be
costlier, actually, what with the icache effects.
Until now current_pt_regs() was not an option for execve(), due to
kernel_thread() implementation being what it was, but now it should
work for all of them...
diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 3063e6f..404a3dd 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -304,9 +304,6 @@ long compat_sys_sched_rr_get_interval(compat_pid_t pid,
struct compat_timespec __user *interval);
/* These are the intvec_64.S trampolines. */
-long _compat_sys_execve(const char __user *path,
- const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp);
long _compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
struct compat_sigaltstack __user *uoss_ptr);
long _compat_sys_rt_sigreturn(void);
diff --git a/arch/tile/include/asm/elf.h b/arch/tile/include/asm/elf.h
index f8ccf08..b73e103 100644
--- a/arch/tile/include/asm/elf.h
+++ b/arch/tile/include/asm/elf.h
@@ -148,6 +148,7 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
#define compat_start_thread(regs, ip, usp) do { \
regs->pc = ptr_to_compat_reg((void *)(ip)); \
regs->sp = ptr_to_compat_reg((void *)(usp)); \
+ single_step_execve(); \
} while (0)
/*
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 9a83e53..879073e 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -211,6 +211,7 @@ static inline void start_thread(struct pt_regs *regs,
{
regs->pc = pc;
regs->sp = usp;
+ single_step_execve();
}
/* Free all resources held by a thread. */
diff --git a/arch/tile/include/asm/syscalls.h b/arch/tile/include/asm/syscalls.h
index 06f0464..0d52992 100644
--- a/arch/tile/include/asm/syscalls.h
+++ b/arch/tile/include/asm/syscalls.h
@@ -68,9 +68,10 @@ long _sys_sigaltstack(const stack_t __user *, stack_t __user *);
long _sys_rt_sigreturn(void);
long _sys_clone(unsigned long clone_flags, unsigned long newsp,
void __user *parent_tid, void __user *child_tid);
-long _sys_execve(const char __user *filename,
+long sys_execve(const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp);
+#define sys_execve sys_execve
#include <asm-generic/syscalls.h>
diff --git a/arch/tile/include/asm/unistd.h b/arch/tile/include/asm/unistd.h
index 6e032a0..dab827d 100644
--- a/arch/tile/include/asm/unistd.h
+++ b/arch/tile/include/asm/unistd.h
@@ -16,4 +16,5 @@
#define __ARCH_WANT_SYS_LLSEEK
#endif
#define __ARCH_WANT_SYS_NEWFSTATAT
+#define __ARCH_WANT_SYS_EXECVE
#include <uapi/asm/unistd.h>
diff --git a/arch/tile/kernel/compat.c b/arch/tile/kernel/compat.c
index d67459b..a8e5a84 100644
--- a/arch/tile/kernel/compat.c
+++ b/arch/tile/kernel/compat.c
@@ -103,7 +103,6 @@ long compat_sys_sched_rr_get_interval(compat_pid_t pid,
#define compat_sys_readahead sys32_readahead
/* Call the trampolines to manage pt_regs where necessary. */
-#define compat_sys_execve _compat_sys_execve
#define compat_sys_sigaltstack _compat_sys_sigaltstack
#define compat_sys_rt_sigreturn _compat_sys_rt_sigreturn
#define sys_clone _sys_clone
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 58aad51..174b837 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1476,7 +1476,6 @@ STD_ENTRY_LOCAL(bad_intr)
}; \
STD_ENDPROC(_##x)
-PTREGS_SYSCALL(sys_execve, r3)
PTREGS_SYSCALL(sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(sys_rt_sigreturn, r0)
PTREGS_SYSCALL(sys_cmpxchg_badaddr, r1)
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index a84911c4..283efed 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1205,11 +1205,9 @@ STD_ENTRY_LOCAL(bad_intr)
}; \
STD_ENDPROC(_##x)
-PTREGS_SYSCALL(sys_execve, r3)
PTREGS_SYSCALL(sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(sys_rt_sigreturn, r0)
#ifdef CONFIG_COMPAT
-PTREGS_SYSCALL(compat_sys_execve, r3)
PTREGS_SYSCALL(compat_sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(compat_sys_rt_sigreturn, r0)
#endif
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index ca6de8d..58f8fd1 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -594,51 +594,6 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
parent_tidptr, child_tidptr);
}
-/*
- * sys_execve() executes a new program.
- */
-SYSCALL_DEFINE4(execve, const char __user *, path,
- const char __user *const __user *, argv,
- const char __user *const __user *, envp,
- struct pt_regs *, regs)
-{
- long error;
- struct filename *filename;
-
- filename = getname(path);
- error = PTR_ERR(filename);
- if (IS_ERR(filename))
- goto out;
- error = do_execve(filename->name, argv, envp, regs);
- putname(filename);
- if (error == 0)
- single_step_execve();
-out:
- return error;
-}
-
-#ifdef CONFIG_COMPAT
-long compat_sys_execve(const char __user *path,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs *regs)
-{
- long error;
- struct filename *filename;
-
- filename = getname(path);
- error = PTR_ERR(filename);
- if (IS_ERR(filename))
- goto out;
- error = compat_do_execve(filename->name, argv, envp, regs);
- putname(filename);
- if (error == 0)
- single_step_execve();
-out:
- return error;
-}
-#endif
-
unsigned long get_wchan(struct task_struct *p)
{
struct KBacktraceIterator kbt;
diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
index b08095b..359e76f 100644
--- a/arch/tile/kernel/sys.c
+++ b/arch/tile/kernel/sys.c
@@ -107,7 +107,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
#endif
/* Call the trampolines to manage pt_regs where necessary. */
-#define sys_execve _sys_execve
#define sys_sigaltstack _sys_sigaltstack
#define sys_rt_sigreturn _sys_rt_sigreturn
#define sys_clone _sys_clone
On 10/19/2012 5:35 PM, Al Viro wrote:
> On Fri, Oct 19, 2012 at 04:25:12PM -0400, Chris Metcalf wrote:
>> Also provide an optimized current_pt_regs() while we're at it.
> Applied. BTW, are you sure you want to record parent's pid and not tid?
By recording ->pid rather than ->pgid, we ARE recording the tid :-)
> Anyway, here's a followup on top of this one (again, completely untested) -
> switching to generic sys_execve(). Does that look right for you?
It does look right, but it doesn't quite work as-is. But after some tweaks
it did yield a kernel that booted up userspace correctly, so I think it's
basically good.
First, the compat_sys_execve() declaration provided in
arch/tile/include/asm/compat.h isn't right, so I deleted that (you had only
deleted the PTREGS_SYSCALL trampoline declaration, _compat_sys_execve).
However, then arch/tile/kernel/compat.c failed to build, because
<linux/compat.h> is included before <asm/unistd.h>, and <asm/unistd.h>
provides __ARCH_WANT_SYS_EXECVE, and so we end up with no declaration at
all for compat_sys_execve. For most platforms this is no big deal, but on
tile we use the __SYSCALL #define to provide the actual syscall table, and
for that to work we need a declaration in scope for each syscall at the
time we create the table.
The best solution seems likely to be to copy the other place in
<linux/compat.h> where we need to do something configurable (that is,
CONFIG_ARCH_WANT_OLD_COMPAT_IPC), and just convert __ARCH_WANT_SYS_EXECVE
to be a Kconfig option.
Another possibility is to pre-include <asm/unistd.h> in the tile compat.c
before including <linux/compat.h>. This requires adding some #undefs for
_SC_3264, etc., in <asm-generic/unistd.h>, since we'll need to include the
header twice, once to satisfy <linux/compat.h>, and then again to actually
provide the body of the syscall array. If we go down this path, I suspect
we should just make <linux/compat.h> include <asm/unistd.h>, so it gets the
__ARCH_WANT_SYS_EXECVE define provided. Otherwise we have the ugly
requirement of requiring the C file to include specific headers in specific
order for it to work right.
> While
> we are at it, I wonder if any of PTREGS_SYSCALL wrappers are needed -
> current_pt_regs() would do just as well, won't it? It's a couple of
> arithmetical operations vs. arith operation + branch; even if the latter
> is somehow cheaper, can't be cheaper by much. And I'd expect it to be
> costlier, actually, what with the icache effects.
Yes, that's a good idea. I'll look at it when I'm back in the office next
week.
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
On Sat, Oct 20, 2012 at 09:06:57AM -0400, Chris Metcalf wrote:
> First, the compat_sys_execve() declaration provided in
> arch/tile/include/asm/compat.h isn't right, so I deleted that (you had only
> deleted the PTREGS_SYSCALL trampoline declaration, _compat_sys_execve).
>
> However, then arch/tile/kernel/compat.c failed to build, because
> <linux/compat.h> is included before <asm/unistd.h>, and <asm/unistd.h>
> provides __ARCH_WANT_SYS_EXECVE, and so we end up with no declaration at
> all for compat_sys_execve. For most platforms this is no big deal, but on
> tile we use the __SYSCALL #define to provide the actual syscall table, and
> for that to work we need a declaration in scope for each syscall at the
> time we create the table.
>
> The best solution seems likely to be to copy the other place in
> <linux/compat.h> where we need to do something configurable (that is,
> CONFIG_ARCH_WANT_OLD_COMPAT_IPC), and just convert __ARCH_WANT_SYS_EXECVE
> to be a Kconfig option.
Frankly, I hope to get rid of the damn thing completely. By now we have
at least some variant of execve conversions for just about everything;
I certainly hope that by the beginning of the next cycle we'll have it
defined on everything. And put unconditional declarations in syscalls.h
and compat.h
Actually, we can make the declaration in linux/compat.h unconditional
right now. The only obstacle is the situation on arm64; there the mainline
has C variant of that sucker (with struct pt_regs * in arguments) in
arch/arm64/kernel/sys_compat.c. So we could ask Linus to pull
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 execve
and then do the following in one patch:
introduce current_pt_regs() on tile (from your commit)
get rid of pt_regs * argument in tile compat_sys_execve(), making it
use current_pt_regs(); kill its wrapper
kill the declarations of compat_sys_execve()/_compat_sys_execve()
in tile asm/compat.h
make declaration in linux/compat.h unconditional
Note that this does *not* depend on kernel_thread/kernel_execve patch -
kernel_execve() is never going to hit compat_sys_execve(), since it's
only called from kernel threads and those are not going to be 32bit.
After that we can do kernel_thread/kernel_execve commit and
sys_execve() conversion with nothing outside of arch/tile touched.
Comments? If Catalin feels that arm64 series needs more testing, we could
reorder it a bit - all we really need there is getting rid of pt_regs argument
in compat_sys_execve() and just as on tile, that isn't tied into kernel_execve
and kernel_thread issues.
FWIW, it's my fault - I assumed that it would be easier to sort the compat
side of things out on per-arch basis. Should've consolidated *that* first
(unlike sys_execve(), that doesn't have any prereqs), followed by the
infrastructure for kernel_thread/kernel_execve/sys_execve series, followed
by the per-architecture conversions. Too late for that now, of course,
and fortunately the mess is fairly small.
On Sat, Oct 20, 2012 at 04:34:01PM +0100, Al Viro wrote:
> On Sat, Oct 20, 2012 at 09:06:57AM -0400, Chris Metcalf wrote:
> > First, the compat_sys_execve() declaration provided in
> > arch/tile/include/asm/compat.h isn't right, so I deleted that (you had only
> > deleted the PTREGS_SYSCALL trampoline declaration, _compat_sys_execve).
> >
> > However, then arch/tile/kernel/compat.c failed to build, because
> > <linux/compat.h> is included before <asm/unistd.h>, and <asm/unistd.h>
> > provides __ARCH_WANT_SYS_EXECVE, and so we end up with no declaration at
> > all for compat_sys_execve. For most platforms this is no big deal, but on
> > tile we use the __SYSCALL #define to provide the actual syscall table, and
> > for that to work we need a declaration in scope for each syscall at the
> > time we create the table.
> >
> > The best solution seems likely to be to copy the other place in
> > <linux/compat.h> where we need to do something configurable (that is,
> > CONFIG_ARCH_WANT_OLD_COMPAT_IPC), and just convert __ARCH_WANT_SYS_EXECVE
> > to be a Kconfig option.
>
> Frankly, I hope to get rid of the damn thing completely. By now we have
> at least some variant of execve conversions for just about everything;
> I certainly hope that by the beginning of the next cycle we'll have it
> defined on everything. And put unconditional declarations in syscalls.h
> and compat.h
>
> Actually, we can make the declaration in linux/compat.h unconditional
> right now. The only obstacle is the situation on arm64; there the mainline
> has C variant of that sucker (with struct pt_regs * in arguments) in
> arch/arm64/kernel/sys_compat.c. So we could ask Linus to pull
> git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 execve
> and then do the following in one patch:
> introduce current_pt_regs() on tile (from your commit)
> get rid of pt_regs * argument in tile compat_sys_execve(), making it
> use current_pt_regs(); kill its wrapper
> kill the declarations of compat_sys_execve()/_compat_sys_execve()
> in tile asm/compat.h
> make declaration in linux/compat.h unconditional
> Note that this does *not* depend on kernel_thread/kernel_execve patch -
> kernel_execve() is never going to hit compat_sys_execve(), since it's
> only called from kernel threads and those are not going to be 32bit.
> After that we can do kernel_thread/kernel_execve commit and
> sys_execve() conversion with nothing outside of arch/tile touched.
Another possible variant is for you to merge that branch from arm64 tree
(only 3 commits it it) and then do as described above. I.e. on top of
that apply the thing below, followed by your kernel_thread()/kernel_execve()
patch (sans current_pt_regs() part), followed by obviously massaged parts
of generic sys_execve for tile patch I've sent. FWIW, I've put the
whole series (based at the end of arm64 branch) in signal.git#arch-tile.
Comments?
Drop struct pt_regs * argument in compat_sys_execve()
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 3063e6f..3bcf1b9 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -275,9 +275,6 @@ extern int compat_setup_rt_frame(int sig, struct k_sigaction *ka,
struct compat_sigaction;
struct compat_siginfo;
struct compat_sigaltstack;
-long compat_sys_execve(const char __user *path,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp, struct pt_regs *);
long compat_sys_rt_sigaction(int sig, struct compat_sigaction __user *act,
struct compat_sigaction __user *oact,
size_t sigsetsize);
@@ -304,9 +301,6 @@ long compat_sys_sched_rr_get_interval(compat_pid_t pid,
struct compat_timespec __user *interval);
/* These are the intvec_64.S trampolines. */
-long _compat_sys_execve(const char __user *path,
- const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp);
long _compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
struct compat_sigaltstack __user *uoss_ptr);
long _compat_sys_rt_sigreturn(void);
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 8c4dd9f..9a83e53 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -239,6 +239,9 @@ unsigned long get_wchan(struct task_struct *p);
#define KSTK_TOP(task) (task_ksp0(task) - STACK_TOP_DELTA)
#define task_pt_regs(task) \
((struct pt_regs *)(task_ksp0(task) - KSTK_PTREGS_GAP) - 1)
+#define current_pt_regs() \
+ ((struct pt_regs *)((stack_pointer | (THREAD_SIZE - 1)) - \
+ (KSTK_PTREGS_GAP - 1)) - 1)
#define task_sp(task) (task_pt_regs(task)->sp)
#define task_pc(task) (task_pt_regs(task)->pc)
/* Aliases for pc and sp (used in fs/proc/array.c) */
diff --git a/arch/tile/kernel/compat.c b/arch/tile/kernel/compat.c
index d67459b..a8e5a84 100644
--- a/arch/tile/kernel/compat.c
+++ b/arch/tile/kernel/compat.c
@@ -103,7 +103,6 @@ long compat_sys_sched_rr_get_interval(compat_pid_t pid,
#define compat_sys_readahead sys32_readahead
/* Call the trampolines to manage pt_regs where necessary. */
-#define compat_sys_execve _compat_sys_execve
#define compat_sys_sigaltstack _compat_sys_sigaltstack
#define compat_sys_rt_sigreturn _compat_sys_rt_sigreturn
#define sys_clone _sys_clone
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 7c06d59..73f6c0a 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1194,7 +1194,6 @@ PTREGS_SYSCALL(sys_execve, r3)
PTREGS_SYSCALL(sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(sys_rt_sigreturn, r0)
#ifdef CONFIG_COMPAT
-PTREGS_SYSCALL(compat_sys_execve, r3)
PTREGS_SYSCALL(compat_sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(compat_sys_rt_sigreturn, r0)
#endif
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 307d010..9dc1391 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -614,8 +614,7 @@ out:
#ifdef CONFIG_COMPAT
long compat_sys_execve(const char __user *path,
compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs *regs)
+ compat_uptr_t __user *envp)
{
long error;
struct filename *filename;
@@ -624,7 +623,8 @@ long compat_sys_execve(const char __user *path,
error = PTR_ERR(filename);
if (IS_ERR(filename))
goto out;
- error = compat_do_execve(filename->name, argv, envp, regs);
+ error = compat_do_execve(filename->name, argv, envp,
+ current_pt_regs());
putname(filename);
if (error == 0)
single_step_execve();
diff --git a/include/linux/compat.h b/include/linux/compat.h
index d0ced10..d2db710 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -286,10 +286,8 @@ asmlinkage ssize_t compat_sys_pwritev(unsigned long fd,
int compat_do_execve(const char *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp, struct pt_regs *regs);
-#ifdef __ARCH_WANT_SYS_EXECVE
asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
const compat_uptr_t __user *envp);
-#endif
asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
compat_ulong_t __user *outp, compat_ulong_t __user *exp,
Hi Al,
Sorry, couldn't reply earlier.
On Sat, Oct 20, 2012 at 04:34:01PM +0100, Al Viro wrote:
> On Sat, Oct 20, 2012 at 09:06:57AM -0400, Chris Metcalf wrote:
> > First, the compat_sys_execve() declaration provided in
> > arch/tile/include/asm/compat.h isn't right, so I deleted that (you had only
> > deleted the PTREGS_SYSCALL trampoline declaration, _compat_sys_execve).
> >
> > However, then arch/tile/kernel/compat.c failed to build, because
> > <linux/compat.h> is included before <asm/unistd.h>, and <asm/unistd.h>
> > provides __ARCH_WANT_SYS_EXECVE, and so we end up with no declaration at
> > all for compat_sys_execve. For most platforms this is no big deal, but on
> > tile we use the __SYSCALL #define to provide the actual syscall table, and
> > for that to work we need a declaration in scope for each syscall at the
> > time we create the table.
> >
> > The best solution seems likely to be to copy the other place in
> > <linux/compat.h> where we need to do something configurable (that is,
> > CONFIG_ARCH_WANT_OLD_COMPAT_IPC), and just convert __ARCH_WANT_SYS_EXECVE
> > to be a Kconfig option.
>
> Frankly, I hope to get rid of the damn thing completely. By now we have
> at least some variant of execve conversions for just about everything;
> I certainly hope that by the beginning of the next cycle we'll have it
> defined on everything. And put unconditional declarations in syscalls.h
> and compat.h
>
> Actually, we can make the declaration in linux/compat.h unconditional
> right now. The only obstacle is the situation on arm64; there the mainline
> has C variant of that sucker (with struct pt_regs * in arguments) in
> arch/arm64/kernel/sys_compat.c. So we could ask Linus to pull
> git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 execve
> and then do the following in one patch:
> introduce current_pt_regs() on tile (from your commit)
> get rid of pt_regs * argument in tile compat_sys_execve(), making it
> use current_pt_regs(); kill its wrapper
> kill the declarations of compat_sys_execve()/_compat_sys_execve()
> in tile asm/compat.h
> make declaration in linux/compat.h unconditional
> Note that this does *not* depend on kernel_thread/kernel_execve patch -
> kernel_execve() is never going to hit compat_sys_execve(), since it's
> only called from kernel threads and those are not going to be 32bit.
> After that we can do kernel_thread/kernel_execve commit and
> sys_execve() conversion with nothing outside of arch/tile touched.
>
> Comments? If Catalin feels that arm64 series needs more testing, we could
> reorder it a bit - all we really need there is getting rid of pt_regs argument
> in compat_sys_execve() and just as on tile, that isn't tied into kernel_execve
> and kernel_thread issues.
I don't mind merging it earlier (if Linus is ok to take it after the
merging window). It works fine in my tests.
--
Catalin
On 10/20/2012 1:16 PM, Al Viro wrote:
> On Sat, Oct 20, 2012 at 04:34:01PM +0100, Al Viro wrote:
>> On Sat, Oct 20, 2012 at 09:06:57AM -0400, Chris Metcalf wrote:
>>> First, the compat_sys_execve() declaration provided in
>>> arch/tile/include/asm/compat.h isn't right, so I deleted that (you had only
>>> deleted the PTREGS_SYSCALL trampoline declaration, _compat_sys_execve).
>>>
>>> However, then arch/tile/kernel/compat.c failed to build, because
>>> <linux/compat.h> is included before <asm/unistd.h>, and <asm/unistd.h>
>>> provides __ARCH_WANT_SYS_EXECVE, and so we end up with no declaration at
>>> all for compat_sys_execve. For most platforms this is no big deal, but on
>>> tile we use the __SYSCALL #define to provide the actual syscall table, and
>>> for that to work we need a declaration in scope for each syscall at the
>>> time we create the table.
>>>
>>> The best solution seems likely to be to copy the other place in
>>> <linux/compat.h> where we need to do something configurable (that is,
>>> CONFIG_ARCH_WANT_OLD_COMPAT_IPC), and just convert __ARCH_WANT_SYS_EXECVE
>>> to be a Kconfig option.
>> Frankly, I hope to get rid of the damn thing completely. By now we have
>> at least some variant of execve conversions for just about everything;
>> I certainly hope that by the beginning of the next cycle we'll have it
>> defined on everything. And put unconditional declarations in syscalls.h
>> and compat.h
>>
>> Actually, we can make the declaration in linux/compat.h unconditional
>> right now. The only obstacle is the situation on arm64; there the mainline
>> has C variant of that sucker (with struct pt_regs * in arguments) in
>> arch/arm64/kernel/sys_compat.c. So we could ask Linus to pull
>> git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 execve
>> and then do the following in one patch:
>> introduce current_pt_regs() on tile (from your commit)
>> get rid of pt_regs * argument in tile compat_sys_execve(), making it
>> use current_pt_regs(); kill its wrapper
>> kill the declarations of compat_sys_execve()/_compat_sys_execve()
>> in tile asm/compat.h
>> make declaration in linux/compat.h unconditional
>> Note that this does *not* depend on kernel_thread/kernel_execve patch -
>> kernel_execve() is never going to hit compat_sys_execve(), since it's
>> only called from kernel threads and those are not going to be 32bit.
>> After that we can do kernel_thread/kernel_execve commit and
>> sys_execve() conversion with nothing outside of arch/tile touched.
> Another possible variant is for you to merge that branch from arm64 tree
> (only 3 commits it it) and then do as described above. I.e. on top of
> that apply the thing below, followed by your kernel_thread()/kernel_execve()
> patch (sans current_pt_regs() part), followed by obviously massaged parts
> of generic sys_execve for tile patch I've sent. FWIW, I've put the
> whole series (based at the end of arm64 branch) in signal.git#arch-tile.
> Comments?
I fetched the series from your arch-tile branch and built it, and it works
fine. It looks good from my inspection:
Acked-by: Chris Metcalf <[email protected]>
As you had suggested in an earlier email, I went ahead and eliminated the
special pt_regs handling for sigaltstack, rt_sigreturn, and clone. (Also a
tilepro-specific syscall that was also using PTREG_SYSCALL.) I'll send it
separately and you can include it in your tree if you like.
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
Using the new current_pt_regs() model, we can remove some trampolines
from assembly code and call directly to the C syscall implementations.
rt_sigreturn() and clone() still need some assembly wrapping, but no
longer are passed a pt_regs pointer. sigaltstack() and the
tilepro-specific cmpxchg_badaddr() syscalls are now just straight C.
Signed-off-by: Chris Metcalf <[email protected]>
---
This is based on Al Viro's signal.git#arch-tile branch, which includes
some ARM64 changes and some Tile changes for generic kernel execve.
arch/tile/include/asm/compat.h | 9 +++------
arch/tile/include/asm/syscalls.h | 19 +++++++++++++------
arch/tile/kernel/compat.c | 4 ++--
arch/tile/kernel/compat_signal.c | 10 +++++-----
arch/tile/kernel/intvec_32.S | 13 +------------
arch/tile/kernel/intvec_64.S | 13 +------------
arch/tile/kernel/process.c | 6 +++---
arch/tile/kernel/signal.c | 9 +++++----
arch/tile/kernel/sys.c | 8 +++-----
arch/tile/mm/fault.c | 5 +++--
10 files changed, 39 insertions(+), 57 deletions(-)
diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 3bcf1b9..ca61fb4 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -280,10 +280,9 @@ long compat_sys_rt_sigaction(int sig, struct compat_sigaction __user *act,
size_t sigsetsize);
long compat_sys_rt_sigqueueinfo(int pid, int sig,
struct compat_siginfo __user *uinfo);
-long compat_sys_rt_sigreturn(struct pt_regs *);
+long compat_sys_rt_sigreturn(void);
long compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
- struct compat_sigaltstack __user *uoss_ptr,
- struct pt_regs *);
+ struct compat_sigaltstack __user *uoss_ptr);
long compat_sys_truncate64(char __user *filename, u32 dummy, u32 low, u32 high);
long compat_sys_ftruncate64(unsigned int fd, u32 dummy, u32 low, u32 high);
long compat_sys_pread64(unsigned int fd, char __user *ubuf, size_t count,
@@ -300,9 +299,7 @@ long compat_sys_fallocate(int fd, int mode,
long compat_sys_sched_rr_get_interval(compat_pid_t pid,
struct compat_timespec __user *interval);
-/* These are the intvec_64.S trampolines. */
-long _compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
- struct compat_sigaltstack __user *uoss_ptr);
+/* Assembly trampoline to avoid clobbering r0. */
long _compat_sys_rt_sigreturn(void);
#endif /* _ASM_TILE_COMPAT_H */
diff --git a/arch/tile/include/asm/syscalls.h b/arch/tile/include/asm/syscalls.h
index 0d52992..369696d 100644
--- a/arch/tile/include/asm/syscalls.h
+++ b/arch/tile/include/asm/syscalls.h
@@ -51,8 +51,7 @@ long sys_cacheflush(unsigned long addr, unsigned long len,
#ifndef __tilegx__
/* mm/fault.c */
-long sys_cmpxchg_badaddr(unsigned long address, struct pt_regs *);
-long _sys_cmpxchg_badaddr(unsigned long address);
+long sys_cmpxchg_badaddr(unsigned long address);
#endif
#ifdef CONFIG_COMPAT
@@ -63,15 +62,23 @@ long sys_truncate64(const char __user *path, loff_t length);
long sys_ftruncate64(unsigned int fd, loff_t length);
#endif
-/* These are the intvec*.S trampolines. */
-long _sys_sigaltstack(const stack_t __user *, stack_t __user *);
-long _sys_rt_sigreturn(void);
-long _sys_clone(unsigned long clone_flags, unsigned long newsp,
+/* Provide versions of standard syscalls that use current_pt_regs(). */
+long sys_clone(unsigned long clone_flags, unsigned long newsp,
void __user *parent_tid, void __user *child_tid);
long sys_execve(const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp);
+long sys_rt_sigreturn(void);
+long sys_sigaltstack(const stack_t __user *, stack_t __user *);
+#define sys_clone sys_clone
#define sys_execve sys_execve
+#define sys_rt_sigreturn sys_rt_sigreturn
+#define sys_sigaltstack sys_sigaltstack
+
+/* These are the intvec*.S trampolines. */
+long _sys_rt_sigreturn(void);
+long _sys_clone(unsigned long clone_flags, unsigned long newsp,
+ void __user *parent_tid, void __user *child_tid);
#include <asm-generic/syscalls.h>
diff --git a/arch/tile/kernel/compat.c b/arch/tile/kernel/compat.c
index a8e5a84..a2e8055 100644
--- a/arch/tile/kernel/compat.c
+++ b/arch/tile/kernel/compat.c
@@ -102,9 +102,9 @@ long compat_sys_sched_rr_get_interval(compat_pid_t pid,
#define compat_sys_fadvise64_64 sys32_fadvise64_64
#define compat_sys_readahead sys32_readahead
-/* Call the trampolines to manage pt_regs where necessary. */
-#define compat_sys_sigaltstack _compat_sys_sigaltstack
+/* Call the assembly trampolines where necessary. */
#define compat_sys_rt_sigreturn _compat_sys_rt_sigreturn
+#undef sys_clone
#define sys_clone _sys_clone
/*
diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
index 08b4fe1..210a9bb 100644
--- a/arch/tile/kernel/compat_signal.c
+++ b/arch/tile/kernel/compat_signal.c
@@ -197,8 +197,7 @@ int copy_siginfo_from_user32(siginfo_t *to, struct compat_siginfo __user *from)
}
long compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
- struct compat_sigaltstack __user *uoss_ptr,
- struct pt_regs *regs)
+ struct compat_sigaltstack __user *uoss_ptr)
{
stack_t uss, uoss;
int ret;
@@ -219,7 +218,7 @@ long compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
set_fs(KERNEL_DS);
ret = do_sigaltstack(uss_ptr ? (stack_t __user __force *)&uss : NULL,
(stack_t __user __force *)&uoss,
- (unsigned long)compat_ptr(regs->sp));
+ (unsigned long)compat_ptr(current_pt_regs()->sp));
set_fs(seg);
if (ret >= 0 && uoss_ptr) {
if (!access_ok(VERIFY_WRITE, uoss_ptr, sizeof(*uoss_ptr)) ||
@@ -232,8 +231,9 @@ long compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
}
/* The assembly shim for this function arranges to ignore the return value. */
-long compat_sys_rt_sigreturn(struct pt_regs *regs)
+long compat_sys_rt_sigreturn(void)
{
+ struct pt_regs *regs = current_pt_regs();
struct compat_rt_sigframe __user *frame =
(struct compat_rt_sigframe __user *) compat_ptr(regs->sp);
sigset_t set;
@@ -248,7 +248,7 @@ long compat_sys_rt_sigreturn(struct pt_regs *regs)
if (restore_sigcontext(regs, &frame->uc.uc_mcontext))
goto badframe;
- if (compat_sys_sigaltstack(&frame->uc.uc_stack, NULL, regs) != 0)
+ if (compat_sys_sigaltstack(&frame->uc.uc_stack, NULL) != 0)
goto badframe;
return 0;
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 174b837..f212bf7 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1452,15 +1452,6 @@ STD_ENTRY_LOCAL(bad_intr)
panic "Unhandled interrupt %#x: PC %#lx"
STD_ENDPROC(bad_intr)
-/* Put address of pt_regs in reg and jump. */
-#define PTREGS_SYSCALL(x, reg) \
- STD_ENTRY(_##x); \
- { \
- PTREGS_PTR(reg, PTREGS_OFFSET_BASE); \
- j x \
- }; \
- STD_ENDPROC(_##x)
-
/*
* Special-case sigreturn to not write r0 to the stack on return.
* This is technically more efficient, but it also avoids difficulties
@@ -1476,11 +1467,9 @@ STD_ENTRY_LOCAL(bad_intr)
}; \
STD_ENDPROC(_##x)
-PTREGS_SYSCALL(sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(sys_rt_sigreturn, r0)
-PTREGS_SYSCALL(sys_cmpxchg_badaddr, r1)
-/* Save additional callee-saves to pt_regs, put address in r4 and jump. */
+/* Save additional callee-saves to pt_regs and jump to standard function. */
STD_ENTRY(_sys_clone)
push_extra_callee_saves r4
j sys_clone
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 283efed..54bc9a6 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1181,15 +1181,6 @@ STD_ENTRY_LOCAL(bad_intr)
panic "Unhandled interrupt %#x: PC %#lx"
STD_ENDPROC(bad_intr)
-/* Put address of pt_regs in reg and jump. */
-#define PTREGS_SYSCALL(x, reg) \
- STD_ENTRY(_##x); \
- { \
- PTREGS_PTR(reg, PTREGS_OFFSET_BASE); \
- j x \
- }; \
- STD_ENDPROC(_##x)
-
/*
* Special-case sigreturn to not write r0 to the stack on return.
* This is technically more efficient, but it also avoids difficulties
@@ -1205,14 +1196,12 @@ STD_ENTRY_LOCAL(bad_intr)
}; \
STD_ENDPROC(_##x)
-PTREGS_SYSCALL(sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(sys_rt_sigreturn, r0)
#ifdef CONFIG_COMPAT
-PTREGS_SYSCALL(compat_sys_sigaltstack, r2)
PTREGS_SYSCALL_SIGRETURN(compat_sys_rt_sigreturn, r0)
#endif
-/* Save additional callee-saves to pt_regs, put address in r4 and jump. */
+/* Save additional callee-saves to pt_regs and jump to standard function. */
STD_ENTRY(_sys_clone)
push_extra_callee_saves r4
j sys_clone
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 58f8fd1..6e7fb4e 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -584,10 +584,10 @@ int do_work_pending(struct pt_regs *regs, u32 thread_info_flags)
}
/* Note there is an implicit fifth argument if (clone_flags & CLONE_SETTLS). */
-SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
- void __user *, parent_tidptr, void __user *, child_tidptr,
- struct pt_regs *, regs)
+SYSCALL_DEFINE4(clone, unsigned long, clone_flags, unsigned long, newsp,
+ void __user *, parent_tidptr, void __user *, child_tidptr)
{
+ struct pt_regs *regs = current_pt_regs();
if (!newsp)
newsp = regs->sp;
return do_fork(clone_flags, newsp, regs, 0,
diff --git a/arch/tile/kernel/signal.c b/arch/tile/kernel/signal.c
index 67efb65..657a7ac 100644
--- a/arch/tile/kernel/signal.c
+++ b/arch/tile/kernel/signal.c
@@ -37,10 +37,10 @@
#define DEBUG_SIG 0
-SYSCALL_DEFINE3(sigaltstack, const stack_t __user *, uss,
- stack_t __user *, uoss, struct pt_regs *, regs)
+SYSCALL_DEFINE2(sigaltstack, const stack_t __user *, uss,
+ stack_t __user *, uoss)
{
- return do_sigaltstack(uss, uoss, regs->sp);
+ return do_sigaltstack(uss, uoss, current_pt_regs()->sp);
}
@@ -83,8 +83,9 @@ void signal_fault(const char *type, struct pt_regs *regs,
}
/* The assembly shim for this function arranges to ignore the return value. */
-SYSCALL_DEFINE1(rt_sigreturn, struct pt_regs *, regs)
+SYSCALL_DEFINE0(rt_sigreturn)
{
+ struct pt_regs *regs = current_pt_regs();
struct rt_sigframe __user *frame =
(struct rt_sigframe __user *)(regs->sp);
sigset_t set;
diff --git a/arch/tile/kernel/sys.c b/arch/tile/kernel/sys.c
index 359e76f..02ff5c0 100644
--- a/arch/tile/kernel/sys.c
+++ b/arch/tile/kernel/sys.c
@@ -106,13 +106,11 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
#define sys_readahead sys32_readahead
#endif
-/* Call the trampolines to manage pt_regs where necessary. */
-#define sys_sigaltstack _sys_sigaltstack
+/* Call the assembly trampolines where necessary. */
+#undef sys_rt_sigreturn
#define sys_rt_sigreturn _sys_rt_sigreturn
+#undef sys_clone
#define sys_clone _sys_clone
-#ifndef __tilegx__
-#define sys_cmpxchg_badaddr _sys_cmpxchg_badaddr
-#endif
/*
* Note that we can't include <linux/unistd.h> here since the header
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index fe811fa..3d2b81c 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -70,9 +70,10 @@ static noinline void force_sig_info_fault(const char *type, int si_signo,
* Synthesize the fault a PL0 process would get by doing a word-load of
* an unaligned address or a high kernel address.
*/
-SYSCALL_DEFINE2(cmpxchg_badaddr, unsigned long, address,
- struct pt_regs *, regs)
+SYSCALL_DEFINE1(cmpxchg_badaddr, unsigned long, address)
{
+ struct pt_regs *regs = current_pt_regs();
+
if (address >= PAGE_OFFSET)
force_sig_info_fault("atomic segfault", SIGSEGV, SEGV_MAPERR,
address, INT_DTLB_MISS, current, regs);
--
1.7.10.3
On Tue, Oct 23, 2012 at 01:30:26PM -0400, Chris Metcalf wrote:
> I fetched the series from your arch-tile branch and built it, and it works
> fine. It looks good from my inspection:
>
> Acked-by: Chris Metcalf <[email protected]>
Thanks; Acked-by applied, branch pushed and put into no-rebase mode.
BTW, something like detached Acked-by objects might be a good idea - i.e.
commit-like git object with amendment to commit message of a given ancestor.
The situation when ACKs come only after the commit has been pushed is quite
common. Linus, what do you think about usefulness of such thing? Ability
to append ACKed-by/Tested-by of an earlier commit to a branch instead of
git commit --amend + possibly some cherry-picks + force-push, that is.
> As you had suggested in an earlier email, I went ahead and eliminated the
> special pt_regs handling for sigaltstack, rt_sigreturn, and clone. (Also a
> tilepro-specific syscall that was also using PTREG_SYSCALL.) I'll send it
> separately and you can include it in your tree if you like.
OK by me... Or just pull this arch-tile into a separate branch in your
tree, add commit on top of it and I'll pull that.
AFAICS, the current state is:
avr32 untested
blackfin untested
cris untested
frv untested, maintainer going to test
h8300 untested
hexagon maintainer hunting the breakage down
m32r untested
microblaze maintainer hunting the breakage down
mn10300 untested
powerpc should work, needs testing and ACK
s390 should work, needs testing and ACK
score untested
sh untested, maintainers planned reviewing and testing
xtensa maintainers writing that one
everything else done
so we are more than halfway through on that one... FWIW, here's what I want
to do next in that area:
* consider taking the "if (!usp) usp = <parent's usp>" logics from
clone(2) to copy_thread(). Note that it allows to have fork and vfork
(where implemented) simply pass 0. Per-architecture change, independent for
different architectures.
* do generic sys_fork() and sys_vfork() variants, have the
architectures that want to use them do so.
* consider making regs argument of copy_thread() unused. The thing
is, we can tell whether we have userland process or kernel thread being
spawned just by looking at p->flags & PF_KTHREAD. And on the "userland
process" side of things, it's often just as cheap (or cheaper) to use
current_pt_regs() instead of using the argument, especially on architectures
where 6-argument functions get the last args passed on stack. Again,
independent for different architectures.
* kill pt_regs argument of do_execve/compat_do_execve/do_execve_common/
search_binary_handler/->load_binary. Depends on having everything switched
to generic kernel_execve/sys_execve; local to fs/* and on the last step
a couple of binfmt instances in arch (alpha ecoff, ia32 aout)
* kill ifdefs by CONFIG_GENERIC_KERNEL_{THREAD,EXECVE} once everything
has those always true.
* kill pt_regs argument of copy_thread() if all architectures can live
with that. Same for copy_process() and do_fork(). A flagday change, obviously.
* kill idle_regs() - it'll be unused by that point.
On 10/23/2012 2:41 PM, Al Viro wrote:
> On Tue, Oct 23, 2012 at 01:30:26PM -0400, Chris Metcalf wrote:
>> As you had suggested in an earlier email, I went ahead and eliminated the
>> special pt_regs handling for sigaltstack, rt_sigreturn, and clone. (Also a
>> tilepro-specific syscall that was also using PTREG_SYSCALL.) I'll send it
>> separately and you can include it in your tree if you like.
> OK by me... Or just pull this arch-tile into a separate branch in your
> tree, add commit on top of it and I'll pull that.
Done in
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git viro-signal-tile
Thanks!
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
On Tue, Oct 23, 2012 at 03:22:36PM -0400, Chris Metcalf wrote:
> On 10/23/2012 2:41 PM, Al Viro wrote:
> > On Tue, Oct 23, 2012 at 01:30:26PM -0400, Chris Metcalf wrote:
> >> As you had suggested in an earlier email, I went ahead and eliminated the
> >> special pt_regs handling for sigaltstack, rt_sigreturn, and clone. (Also a
> >> tilepro-specific syscall that was also using PTREG_SYSCALL.) I'll send it
> >> separately and you can include it in your tree if you like.
> > OK by me... Or just pull this arch-tile into a separate branch in your
> > tree, add commit on top of it and I'll pull that.
>
> Done in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git viro-signal-tile
Argh... That's exactly the reason why I would like to get amending commits -
you've done that commit on top of the identical tree, but without your
Acked-by on the last commit. Result: merge conflict ;-/
Could you test the following on top of your patch? It gets rid of
regs use in sys_clone() *and* of the regs argument in copy_thread().
If that work (including SMP - note that it changes the path taken
by copy_thread() when called by fork_idle()), that should be all
we'll need in arch/tile for killing idle_regs() and killing the pt_regs
passing to do_fork()/copy_process()/copy_thread().
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 6e7fb4e..1c20029 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -158,7 +158,7 @@ static void save_arch_state(struct thread_struct *t);
int copy_thread(unsigned long clone_flags, unsigned long sp,
unsigned long arg,
- struct task_struct *p, struct pt_regs *regs)
+ struct task_struct *p, struct pt_regs *unused)
{
struct pt_regs *childregs = task_pt_regs(p);
unsigned long ksp;
@@ -184,7 +184,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
/* Record the pid of the task that created this one. */
p->thread.creator_pid = current->pid;
- if (unlikely(!regs)) {
+ if (unlikely(p->flags & PF_KTHREAD)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
memset(&callee_regs[2], 0,
@@ -208,25 +208,26 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
*/
task_thread_info(p)->step_state = NULL;
- /* Save user stack top pointer so we can ID the stack vm area later. */
- p->thread.usp0 = sp;
-
/*
* Copy the registers onto the kernel stack so the
* return-from-interrupt code will reload it into registers.
*/
- *childregs = *regs;
+ *childregs = *current_pt_regs();
childregs->regs[0] = 0; /* return value is zero */
- childregs->sp = sp; /* override with new user stack pointer */
- memcpy(callee_regs, ®s->regs[CALLEE_SAVED_FIRST_REG],
+ if (sp)
+ childregs->sp = sp; /* override with new user stack pointer */
+ memcpy(callee_regs, &childregs->regs[CALLEE_SAVED_FIRST_REG],
CALLEE_SAVED_REGS_COUNT * sizeof(unsigned long));
+ /* Save user stack top pointer so we can ID the stack vm area later. */
+ p->thread.usp0 = childregs->sp;
+
/*
* If CLONE_SETTLS is set, set "tp" in the new task to "r4",
* which is passed in as arg #5 to sys_clone().
*/
if (clone_flags & CLONE_SETTLS)
- childregs->tp = regs->regs[4];
+ childregs->tp = childregs->regs[4];
#if CHIP_HAS_TILE_DMA()
@@ -587,10 +588,7 @@ int do_work_pending(struct pt_regs *regs, u32 thread_info_flags)
SYSCALL_DEFINE4(clone, unsigned long, clone_flags, unsigned long, newsp,
void __user *, parent_tidptr, void __user *, child_tidptr)
{
- struct pt_regs *regs = current_pt_regs();
- if (!newsp)
- newsp = regs->sp;
- return do_fork(clone_flags, newsp, regs, 0,
+ return do_fork(clone_flags, newsp, current_pt_regs(), 0,
parent_tidptr, child_tidptr);
}
On Tue, 23 Oct 2012, Al Viro wrote:
> On Tue, Oct 23, 2012 at 01:30:26PM -0400, Chris Metcalf wrote:
>
> > I fetched the series from your arch-tile branch and built it, and it works
> > fine. It looks good from my inspection:
> >
> > Acked-by: Chris Metcalf <[email protected]>
>
> Thanks; Acked-by applied, branch pushed and put into no-rebase mode.
>
> BTW, something like detached Acked-by objects might be a good idea - i.e.
> commit-like git object with amendment to commit message of a given ancestor.
> The situation when ACKs come only after the commit has been pushed is quite
> common. Linus, what do you think about usefulness of such thing? Ability
> to append ACKed-by/Tested-by of an earlier commit to a branch instead of
> git commit --amend + possibly some cherry-picks + force-push, that is.
I agree that this is a common issue. Acked-by/Reviewed-by mails come
in after the fact that the patch has been committed to an immutable
(i.e no-rebase mode) branch or if the change in question already hit
Linus tree.
Still it would be nice to have a recording of that in the git tree
itself.
Something like: "git --attach SHA1 <comment>" would be appreciated!
Thanks,
tglx
On Tue, Oct 23, 2012 at 10:47:28PM +0200, Thomas Gleixner wrote:
> I agree that this is a common issue. Acked-by/Reviewed-by mails come
> in after the fact that the patch has been committed to an immutable
> (i.e no-rebase mode) branch or if the change in question already hit
> Linus tree.
>
> Still it would be nice to have a recording of that in the git tree
> itself.
>
> Something like: "git --attach SHA1 <comment>" would be appreciated!
It is spelled:
git notes add -m <comment> SHA1
The resulting notes are stored in a separate revision-controlled branch
and can be pushed and pulled like regular refs. Note, though, that the
default refspecs do not yet include refs/notes, so you'd have to add
them manually. The workflows around notes are not very mature yet, so if
you start using them, feedback would be appreciated.
-Peff
On 23 October 2012 21:51, Jeff King <[email protected]> wrote:
> On Tue, Oct 23, 2012 at 10:47:28PM +0200, Thomas Gleixner wrote:
>
>> I agree that this is a common issue. Acked-by/Reviewed-by mails come
>> in after the fact that the patch has been committed to an immutable
>> (i.e no-rebase mode) branch or if the change in question already hit
>> Linus tree.
>>
>> Still it would be nice to have a recording of that in the git tree
>> itself.
>>
>> Something like: "git --attach SHA1 <comment>" would be appreciated!
>
> It is spelled:
>
> git notes add -m <comment> SHA1
>
> The resulting notes are stored in a separate revision-controlled branch
> and can be pushed and pulled like regular refs. Note, though, that the
> default refspecs do not yet include refs/notes, so you'd have to add
> them manually. The workflows around notes are not very mature yet, so if
> you start using them, feedback would be appreciated.
What would be nice is that notes are pushed/pulled automatically with
standard git push/fetch/pull commands. Usually git walks the DAG
starting with the pulled commit or tag and following the parents. With
notes, the reference is reversed, the note pointing to the commit and
not the other way around. So handling this automatically in Git would
be really useful.
The other feature I'd like is that notes are automatically folded in
the log during git rebase (maybe similar to the squash option). If you
rebase, you lose all the notes (though this depends on the workflow,
it may not be needed with published branches).
--
Catalin
On Tue, Oct 23, 2012 at 10:09:46PM +0100, Catalin Marinas wrote:
> > It is spelled:
> >
> > git notes add -m <comment> SHA1
> >
> > The resulting notes are stored in a separate revision-controlled branch
> > and can be pushed and pulled like regular refs. Note, though, that the
> > default refspecs do not yet include refs/notes, so you'd have to add
> > them manually. The workflows around notes are not very mature yet, so if
> > you start using them, feedback would be appreciated.
>
> What would be nice is that notes are pushed/pulled automatically with
> standard git push/fetch/pull commands. Usually git walks the DAG
> starting with the pulled commit or tag and following the parents. With
> notes, the reference is reversed, the note pointing to the commit and
> not the other way around. So handling this automatically in Git would
> be really useful.
Right, that's what I meant about the refspecs. You can configure git to
push or pull them automatically, but it is not the default. Something
like:
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/origin/*'
would be a start, but you'd also want to "git notes merge" upstream's
changes into your local notes (you _could_ just fetch straight into
refs/notes/, but if you are making your own notes locally, you have to
resolve it somehow). Exactly how to make this smooth is one of the workflow
considerations; there's been discussion, but most people aren't using
the feature, so we don't have a lot of data.
If you are asking whether we could "auto-follow" notes for commits that
have been fetched like we do for tags, the answer is "not really". The
notes tree is version-controlled as a whole, so you generally fetch the
whole thing or not. And the remote does not advertise note information
the same way we advertise peeled tag references, so a client does not
know which notes are available for fetch. The intended strategy is to
pull in the notes or not (though you can have multiple notes refs with
different names, and fetch just a subset of them).
> The other feature I'd like is that notes are automatically folded in
> the log during git rebase (maybe similar to the squash option). If you
> rebase, you lose all the notes (though this depends on the workflow,
> it may not be needed with published branches).
Git-rebase can automatically copy notes from one commit to another
during a rebase, but you need to set notes.rewriteRef to do so (see "git
help config" for details). The reason for this conservative default is
that some notes may not be appropriate for automatic copying (e.g., a
notes tree containing QA approval should probably be invalidated during
a rebase, whereas one with commentary probably should).
Squashing the notes into the commit message during rebase would be a
useful feature (at least for some type of notes), but that feature does
not currently exist (and as far as I recall, this is the first it has
been proposed).
Again, I think a lot of this comes down to the fact that not many people
are really using notes for their daily workflow, so these itches are not
coming up and getting scratched.
-Peff
On Tue, 23 Oct 2012, Jeff King wrote:
> On Tue, Oct 23, 2012 at 10:47:28PM +0200, Thomas Gleixner wrote:
>
> > I agree that this is a common issue. Acked-by/Reviewed-by mails come
> > in after the fact that the patch has been committed to an immutable
> > (i.e no-rebase mode) branch or if the change in question already hit
> > Linus tree.
> >
> > Still it would be nice to have a recording of that in the git tree
> > itself.
> >
> > Something like: "git --attach SHA1 <comment>" would be appreciated!
>
> It is spelled:
>
> git notes add -m <comment> SHA1
Cool!
> The resulting notes are stored in a separate revision-controlled branch
Which branch(es) is/are that ? What are the semantics of that?
Assume I commit something to branch "foo"
Now I get that late Ack/Reviewed-by and want to associate that to that
commit in branch "foo". Does that go into "notes/foo" ?
If yes, good. (Any other sensible prefix is good as well). If no,
where does it go to?
Later when I send a pull request to my upstream maintainer for branch
"foo" does he get "notes/foo" automagically or do I have to request to
pull him that separately?
Either way is fine for me, though something which lets me "automate"
that would be appreciated. I can work around that easily as my pull
requests are generated via scripts, so I can add the secondary one for
the dependent "notes" branch if necessary. Though it would be nice to
avoid that. Avoiding that, i.e having a straight connection (maybe
configurable) between "foo" and "notes/foo" and the commits which have
not yet hit my upstream maintainer would make my life easier. I.e. I
just have to check "foo" for stuff which is not upstream yet instead
of checking both, but that might just be my laziness.
Thoughts?
tglx
On Tue, Oct 23, 2012 at 11:25:06PM +0200, Thomas Gleixner wrote:
> > The resulting notes are stored in a separate revision-controlled branch
>
> Which branch(es) is/are that ? What are the semantics of that?
They are stored in refs/notes/commits by default, but you can have
multiple notes refs if you want to store logically distinct sets of
notes.
A notes ref's tree is just a tree whose entries are sha1s, and the file
contents contain the notes themselves (the sha1s are broken down into
subdirectories for performance, but "git notes" handles this behind the
scenes). Technically you could check it out as a branch, edit, and
commit, but "git checkout" is not happy to have a HEAD outside of
refs/heads/, so you are stuck with plumbing like:
$ git checkout `git rev-parse refs/notes/commits`
$ edit edit edit
$ git commit ...
$ git update-ref refs/notes/commits HEAD
It's probably not good for much beyond exploring how notes are
implemented. See "git help notes" for more discussion.
> Assume I commit something to branch "foo"
>
> Now I get that late Ack/Reviewed-by and want to associate that to that
> commit in branch "foo". Does that go into "notes/foo" ?
No. It would go into refs/notes/commits, or you could ask it to go to
refs/notes/acks if you wanted to keep them separate from your default
notes. It is indexed by commit object, not by branch (so if that branch
later goes away, the notes are always still attached to the commit
objects, assuming they got merged in).
> Later when I send a pull request to my upstream maintainer for branch
> "foo" does he get "notes/foo" automagically or do I have to request to
> pull him that separately?
No, he would have to pull your notes separately. Most of the discussion
around sharing has been configuring the default refspec configuration to
fetch notes. But in the kernel you guys use a lot of one-off pulls
without configured remotes. I'm not sure what the right workflow would
be. It might simply be to ask git to always pull particular notes
commits at the same time (so you might push your notes to
refs/notes/for-linus, and then git would automatically grab the notes
when somebody pulls refs/heads/for-linus).
> Either way is fine for me, though something which lets me "automate"
> that would be appreciated. I can work around that easily as my pull
> requests are generated via scripts, so I can add the secondary one for
> the dependent "notes" branch if necessary. Though it would be nice to
> avoid that. Avoiding that, i.e having a straight connection (maybe
> configurable) between "foo" and "notes/foo" and the commits which have
> not yet hit my upstream maintainer would make my life easier. I.e. I
> just have to check "foo" for stuff which is not upstream yet instead
> of checking both, but that might just be my laziness.
>
> Thoughts?
That all makes sense. Putting extra work on the puller is not a good
long-term solution. So while sending them an extra "also pull these
notes" line, even if it ends up being a cut-and-pastable single-liner,
is not great (even if it is the most flexible thing). Using a convention
based on name-equivalence seems like a sensible compromise.
-Peff
On Tue, Oct 23, 2012 at 03:06:59PM -0700, Marc Gauthier wrote:
> Can a later commit be eventually be made to reference some set
> of notes added so far, so they become part of the whole history
> signed by the HEAD SHA1? hence pulled/pushed automatically as
> well. Otherwise do you not end up with a forever growing separate
> tree of notes that loses some of the properties of being behind
> the head SHA1 (and perhaps less scalable in manageability)?
> Also that way notes are separate only temporarily.
Interesting idea. It would be tough to do with existing objects. There
are really only two ways for a commit to reference objects:
1. Via a parent header. But we would not want to include the notes
tree as a separate parent. The semantics are all wrong, and would
make your commit look like a nonsense merge.
2. As an entry in a tree. But we do not enforce connectivity of
commits referenced in trees, because that is the way that
submodules are implemented.
So I think we would have to add a new header that says "also, here are
some notes for my history". That has two problems:
1. It's not backwards compatible. People with the new version of git
will expect to have objects referenced by the new header, but older
servers may not provide those objects (and vice versa). We can add
a protocol extension to communicate this, but fundamentally you are
going to lose the object connection any time it passes through a
repo running an older git.
2. It's impure from the perspective of git's data model. Adding in the
notes reference is not really a property of the commit. It's more
like saying "Oh, these other things happened to _past_ commits, and
I'm just now mentioning them". So you pick an arbitrary commit to
attach it to. What are the semantics with relation to that commit's
position in the history graph? If I have a commit that is identical
but without the notes reference, it will have a different sha1. But
is it the same commit?
So it's a bit ugly. I think I'd rather build out the transfer
infrastructure to pass the notes references around more gracefully
without trying to shoehorn them into the commit graph.
> As for automating the inclusion of notes in the flow, can that
> be conditional on some pattern in the note, so that e.g. the
> Acked-by's get included and folded in automatically, whereas
> others do not, according to settings?
Yeah. You can store arbitrary data in notes (e.g., one of the existing
uses of notes is to record metadata on the patch emails that led to a
commit). Right now you typically separate it out by data type into
separate refs, and then you ask git log to show you particular ones (so
we see refs/notes/commits with "--notes", but you can do "--notes=foo"
to see refs/notes/foo, or even show multiple refs).
For the fold-on-rebase idea, I'd think you would want something similar,
like setting rebase.foldNotes to "foo" to say "refs/notes/foo contains
pseudo-headers that should be folded in like a signed-off-by".
-Peff
On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner <[email protected]> wrote:
>>
>> It is spelled:
>>
>> git notes add -m <comment> SHA1
>
> Cool!
Don't use them for anything global.
Use them for local codeflow, but don't expect them to be distributed.
It's a separate "flow", and while it *can* be distributed, it's not
going to be for the kernel, for example. So no, don't start using this
to ack things, because the acks *will* get lost.
Linus
On Wed, Oct 24, 2012 at 04:02:49AM +0300, Linus Torvalds wrote:
> On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner <[email protected]> wrote:
> >>
> >> It is spelled:
> >>
> >> git notes add -m <comment> SHA1
> >
> > Cool!
>
> Don't use them for anything global.
>
> Use them for local codeflow, but don't expect them to be distributed.
> It's a separate "flow", and while it *can* be distributed, it's not
> going to be for the kernel, for example. So no, don't start using this
> to ack things, because the acks *will* get lost.
How about git commit --allow-empty, with
"belated ACK for <commit>
Acked-by: <...>
" as commit message? I mean, that ought to work and propagate sanely,
but I'm really not sure if that's something in a good taste and should
be allowed as a common practice...
Jeff King wrote:
> On Tue, Oct 23, 2012 at 11:25:06PM +0200, Thomas Gleixner wrote:
> > > The resulting notes are stored in a separate
> > > revision-controlled branch
> >
> > Which branch(es) is/are that ? What are the semantics of that?
[...]
Nice feature.
Can a later commit be eventually be made to reference some set
of notes added so far, so they become part of the whole history
signed by the HEAD SHA1? hence pulled/pushed automatically as
well. Otherwise do you not end up with a forever growing separate
tree of notes that loses some of the properties of being behind
the head SHA1 (and perhaps less scalable in manageability)?
Also that way notes are separate only temporarily.
As for automating the inclusion of notes in the flow, can that
be conditional on some pattern in the note, so that e.g. the
Acked-by's get included and folded in automatically, whereas
others do not, according to settings?
-Marc
On Wed, Oct 24, 2012 at 4:56 AM, Al Viro <[email protected]> wrote:
>
> How about git commit --allow-empty, with
> "belated ACK for <commit>
Don't bother. It's not that important, and it's just distracting.
It's not like this is vital information. If you pushed it out without
the ack, it's out without the ack. Big deal.
Linus
* Linus Torvalds <[email protected]> wrote:
> On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner <[email protected]> wrote:
> >>
> >> It is spelled:
> >>
> >> git notes add -m <comment> SHA1
> >
> > Cool!
>
> Don't use them for anything global.
>
> Use them for local codeflow, but don't expect them to be
> distributed. It's a separate "flow", and while it *can* be
> distributed, it's not going to be for the kernel, for example.
> So no, don't start using this to ack things, because the acks
> *will* get lost.
I'd also add a small meta argument: that it would be actively
wrong to *allow* 'belated' acks to be added. In practice acks
are most useful *before* a commit gets created and they often
have a mostly buerocratic role afterwards.
So we should encourage timely acks (which actually help
development), and accept ack-less patches as long as they are
correct and create no problems. More utility, less buerocracy.
Incorrect, ack-less patches causing problems will get all the
flames they deserve.
Thanks,
Ingo
Am 10/24/2012 0:23, schrieb Jeff King:
> For the fold-on-rebase idea, I'd think you would want something similar,
> like setting rebase.foldNotes to "foo" to say "refs/notes/foo contains
> pseudo-headers that should be folded in like a signed-off-by".
If you are rebasing anyway, you can already use interactive rebase's
--autosquash option:
# a late ACK came in:
git commit --allow-empty -m'squash! tile: support GENERIC_
Acked-by: A U Thor <[email protected]>'
git rebase -i --keep-empty --autosquash $forkpoint
Requires git 1.7.11 for --keep-empty and requires to edit out the
'squash!...' headers when the editor appears during the rebase.
-- Hannes
On Tue, Oct 23, 2012 at 10:22:45PM +0100, Jeff King wrote:
> On Tue, Oct 23, 2012 at 10:09:46PM +0100, Catalin Marinas wrote:
> > > It is spelled:
> > >
> > > git notes add -m <comment> SHA1
> > >
> > > The resulting notes are stored in a separate revision-controlled branch
> > > and can be pushed and pulled like regular refs. Note, though, that the
> > > default refspecs do not yet include refs/notes, so you'd have to add
> > > them manually. The workflows around notes are not very mature yet, so if
> > > you start using them, feedback would be appreciated.
> >
> > What would be nice is that notes are pushed/pulled automatically with
> > standard git push/fetch/pull commands. Usually git walks the DAG
> > starting with the pulled commit or tag and following the parents. With
> > notes, the reference is reversed, the note pointing to the commit and
> > not the other way around. So handling this automatically in Git would
> > be really useful.
>
> Right, that's what I meant about the refspecs. You can configure git to
> push or pull them automatically, but it is not the default. Something
> like:
>
> git config --add remote.origin.fetch '+refs/notes/*:refs/notes/origin/*'
Yes, but that's a bit more complicated than a simple pull. Anyway, Linus
seems to not be in favour of annotating commits later for adding acks,
so no need for such feature.
> > The other feature I'd like is that notes are automatically folded in
> > the log during git rebase (maybe similar to the squash option). If you
> > rebase, you lose all the notes (though this depends on the workflow,
> > it may not be needed with published branches).
>
> Git-rebase can automatically copy notes from one commit to another
> during a rebase, but you need to set notes.rewriteRef to do so (see "git
> help config" for details). The reason for this conservative default is
> that some notes may not be appropriate for automatic copying (e.g., a
> notes tree containing QA approval should probably be invalidated during
> a rebase, whereas one with commentary probably should).
Thanks, I wasn't aware of this.
> Squashing the notes into the commit message during rebase would be a
> useful feature (at least for some type of notes), but that feature does
> not currently exist (and as far as I recall, this is the first it has
> been proposed).
For some workflow - I post patches to the list, people reply with their
acks, I could just add those to notes and later fold them into the
existing commits before pushing the branch upstream. I guess it may be
just a matter of changing git format-patch to include the notes. I can
later reword he commits and drop the "Notes:" line.
--
Catalin
On 10/23/2012 4:36 PM, Al Viro wrote:
> Could you test the following on top of your patch? It gets rid of
> regs use in sys_clone() *and* of the regs argument in copy_thread().
> If that work (including SMP - note that it changes the path taken
> by copy_thread() when called by fork_idle()), that should be all
> we'll need in arch/tile for killing idle_regs() and killing the pt_regs
> passing to do_fork()/copy_process()/copy_thread().
It seems to work fine. Thanks!
Acked-by: Chris Metcalf <[email protected]>
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
On Thu, Oct 25, 2012 at 09:31:31AM -0400, Chris Metcalf wrote:
> On 10/23/2012 4:36 PM, Al Viro wrote:
> > Could you test the following on top of your patch? It gets rid of
> > regs use in sys_clone() *and* of the regs argument in copy_thread().
> > If that work (including SMP - note that it changes the path taken
> > by copy_thread() when called by fork_idle()), that should be all
> > we'll need in arch/tile for killing idle_regs() and killing the pt_regs
> > passing to do_fork()/copy_process()/copy_thread().
>
> It seems to work fine. Thanks!
Thanks. Put into no-rebase mode.
On Tue, Oct 16, 2012 at 11:35:08PM +0100, Al Viro wrote:
> Maintainers are Cc'd. My (very, _very_ tentative) patchsets are in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal arch-$ARCH
>
> Nearly in the same state: ia64. The only difference is that I've tested
> it under ski(1) and it seems to work. Accuracy of ski(1) for the purposes
> of finding bugs in asm glue is not inspiring, though.
>
> Not even a tentative patchset: hexagon, openrisc, tile, xtensa.
>
> I would very much appreciate ACKs/testing/fixes/outright replacements/etc.
> for this stuff. Right now all infrastructure is in the mainline and
> per-architecture bits are entirely independent from each other. As soon
> as maintainer in question is OK with what's in such per-architecture branch,
> I'll be quite happy to put it into never-rebased mode, so that it would be
> safe to pull. There are some fun things that'll become possible once
> all architectures are converted, but let's handle that stuff first, OK?
Latest version of the Hexagon patches look good. Thanks!
Acked-by: Richard Kuo <[email protected]>
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
The situation got much better by now. More than a half of
architectures are done - alpha arm arm64 c6x hexagon ia64 m68k mips openrisc
parisc sparc tile um unicore32 and x86.
Two more avait ACKs from maintainers - powerpc and s390. Should work,
AFAICS.
xtensa - Max was going to repost updated patches; waiting for that
to happen, but essentially it's done and tested.
microblaze - Michal was debugging kernel_execve side of it the last
time I've heard from him...
frv, mn10300 - dhowells was going to test those
sh - Paul Mundt was going to test and send fixes
avr32, blackfin, cris, h8300, m32r, score - no signs of life from
maintainers. Folks, please show up and at least test the damn patchsets.
Hell knows, they might even work - unicore32 one did, modulo trivial typo,
to my deep surprise...
On Fri, Oct 26, 2012 at 07:31:07PM +0100, Al Viro wrote:
> The situation got much better by now. More than a half of
> architectures are done - alpha arm arm64 c6x hexagon ia64 m68k mips openrisc
> parisc sparc tile um unicore32 and x86.
>
> Two more avait ACKs from maintainers - powerpc and s390. Should work,
> AFAICS.
>
> xtensa - Max was going to repost updated patches; waiting for that
> to happen, but essentially it's done and tested.
>
> microblaze - Michal was debugging kernel_execve side of it the last
> time I've heard from him...
>
> frv, mn10300 - dhowells was going to test those
>
> sh - Paul Mundt was going to test and send fixes
>
> avr32, blackfin, cris, h8300, m32r, score - no signs of life from
> maintainers. Folks, please show up and at least test the damn patchsets.
> Hell knows, they might even work - unicore32 one did, modulo trivial typo,
> to my deep surprise...
BTW, there's a tangentially related issue: several architectures have
very odd clone(2). Namely, blackfin, h8300, no-MMU microblaze and sh64 (==sh5)
silently ignore child_tidptr and parent_tidptr arguments. I.e. treat them
as NULL - or as if CLONE_PARENT_SETTID/CLONE_CHILD_SETTID/CLONE_CHILD_CLEARTID
were never set. With the patchset in the local part of queue it would be
trivial to switch to normal semantics; strictly speaking, it's an ABI change.
Somebody doing
n = 0x69696969;
if (clone(CLONE_PARENT_SETTID, 0, &n) > 0) {
if (n != 0x69696969) {
printf("oh, shit, we are not on blackfin\n");
exit(-1);
}
}
would run into a user-visible behaviour change, but IMO that's in the realm
of testing for known architecture-dependent bugs and finding them fixed...
Opinions, vetoes? Should we preserve the current behaviour in this case?
I would obviously prefer to just go ahead and fix the sucker - the odds of
any actual software depending on that behaviour are pretty much nil.
Linus, does that cross the boundary between bug fix and ABI breakage?
Another curious thing happens on blackfin; there we subtract 12 from usp
when it's non-zero (zero == inherit the parent's usp, as always). No idea
why is that done; this one definitely has to be preserved, so I'm just
wondering about the reasons behind that oddity... Mike?
On Fri, 26 Oct 2012 19:31:07 +0100
Al Viro <[email protected]> wrote:
> The situation got much better by now. More than a half of
> architectures are done - alpha arm arm64 c6x hexagon ia64 m68k mips openrisc
> parisc sparc tile um unicore32 and x86.
>
> Two more avait ACKs from maintainers - powerpc and s390. Should work,
> AFAICS.
Oops, sorry. I tested this weeks ago but it seems I never wrote a mail to
indicate success. The current git kernel works just fine.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Mon, Oct 29, 2012 at 08:53:39AM +0100, Martin Schwidefsky wrote:
> Oops, sorry. I tested this weeks ago but it seems I never wrote a mail to
> indicate success. The current git kernel works just fine.
"Current git" being what? Linus' tree? linux-next? signal.git#arch-s390?
FWIW, the relevant diff against mainline is below, linux-next already
contains it.
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3f3d9ca..3cdc0f1 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -136,6 +136,7 @@ config S390
select KTIME_SCALAR if 32BIT
select HAVE_ARCH_SECCOMP_FILTER
select GENERIC_KERNEL_THREAD
+ select GENERIC_KERNEL_EXECVE
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index bbbae41..ccbcab7 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -54,7 +54,6 @@
# define __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND
# endif
#define __ARCH_WANT_SYS_EXECVE
-#define __ARCH_WANT_KERNEL_EXECVE
/*
* "Conditional" syscalls
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index ef46f66..aa8f2ba 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -330,40 +330,18 @@ ENTRY(ret_from_fork)
la %r11,STACK_FRAME_OVERHEAD(%r15)
l %r12,__LC_THREAD_INFO
l %r13,__LC_SVC_NEW_PSW+4
- tm __PT_PSW+1(%r11),0x01 # forking a kernel thread ?
- je 1f
l %r1,BASED(.Lschedule_tail)
basr %r14,%r1 # call schedule_tail
TRACE_IRQS_ON
ssm __LC_SVC_NEW_PSW # reenable interrupts
- j sysc_tracenogo
-
-1: # it's a kernel thread
- st %r15,__PT_R15(%r11) # store stack pointer for new kthread
- l %r1,BASED(.Lschedule_tail)
- basr %r14,%r1 # call schedule_tail
- TRACE_IRQS_ON
- ssm __LC_SVC_NEW_PSW # reenable interrupts
- lm %r9,%r11,__PT_R9(%r11) # load gprs
+ tm __PT_PSW+1(%r11),0x01 # forking a kernel thread ?
+ jne sysc_tracenogo
+ # it's a kernel thread
+ lm %r9,%r10,__PT_R9(%r11) # load gprs
ENTRY(kernel_thread_starter)
la %r2,0(%r10)
basr %r14,%r9
- la %r2,0
- br %r11 # do_exit
-
-#
-# kernel_execve function needs to deal with pt_regs that is not
-# at the usual place
-#
-ENTRY(ret_from_kernel_execve)
- ssm __LC_PGM_NEW_PSW # disable I/O and ext. interrupts
- lr %r15,%r2
- lr %r11,%r2
- ahi %r15,-STACK_FRAME_OVERHEAD
- xc __SF_BACKCHAIN(4,%r15),__SF_BACKCHAIN(%r15)
- l %r12,__LC_THREAD_INFO
- ssm __LC_SVC_NEW_PSW # reenable interrupts
- j sysc_return
+ j sysc_tracenogo
/*
* Program check handler routine
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index 07d8de3..499e95e 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -352,33 +352,17 @@ sysc_tracenogo:
ENTRY(ret_from_fork)
la %r11,STACK_FRAME_OVERHEAD(%r15)
lg %r12,__LC_THREAD_INFO
- tm __PT_PSW+1(%r11),0x01 # forking a kernel thread ?
- je 1f
brasl %r14,schedule_tail
TRACE_IRQS_ON
ssm __LC_SVC_NEW_PSW # reenable interrupts
- j sysc_tracenogo
-1: # it's a kernel thread
- stg %r15,__PT_R15(%r11) # store stack pointer for new kthread
- brasl %r14,schedule_tail
- TRACE_IRQS_ON
- ssm __LC_SVC_NEW_PSW # reenable interrupts
- lmg %r9,%r11,__PT_R9(%r11) # load gprs
+ tm __PT_PSW+1(%r11),0x01 # forking a kernel thread ?
+ jne sysc_tracenogo
+ # it's a kernel thread
+ lmg %r9,%r10,__PT_R9(%r11) # load gprs
ENTRY(kernel_thread_starter)
la %r2,0(%r10)
basr %r14,%r9
- la %r2,0
- br %r11 # do_exit
-
-ENTRY(ret_from_kernel_execve)
- ssm __LC_PGM_NEW_PSW # disable I/O and ext. interrupts
- lgr %r15,%r2
- lgr %r11,%r2
- aghi %r15,-STACK_FRAME_OVERHEAD
- xc __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
- lg %r12,__LC_THREAD_INFO
- ssm __LC_SVC_NEW_PSW # reenable interrupts
- j sysc_return
+ j sysc_tracenogo
/*
* Program check handler routine
On Mon, 29 Oct 2012 13:25:21 +0000
Al Viro <[email protected]> wrote:
> On Mon, Oct 29, 2012 at 08:53:39AM +0100, Martin Schwidefsky wrote:
>
> > Oops, sorry. I tested this weeks ago but it seems I never wrote a mail to
> > indicate success. The current git kernel works just fine.
>
> "Current git" being what? Linus' tree? linux-next? signal.git#arch-s390?
> FWIW, the relevant diff against mainline is below, linux-next already
> contains it.
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal arch-s390
against Linux 3.7-rc3
# uname -a
Linux r3545014 3.7.0-rc3-00002-g95a96a7 #1 SMP Mon Oct 29 15:32:18 CET 2012 s390x s390x s390x GNU/Linux
works as it is. Feel free to add my Acked-By.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Mon, Oct 29, 2012 at 03:38:23PM +0100, Martin Schwidefsky wrote:
> On Mon, 29 Oct 2012 13:25:21 +0000
> Al Viro <[email protected]> wrote:
>
> > On Mon, Oct 29, 2012 at 08:53:39AM +0100, Martin Schwidefsky wrote:
> >
> > > Oops, sorry. I tested this weeks ago but it seems I never wrote a mail to
> > > indicate success. The current git kernel works just fine.
> >
> > "Current git" being what? Linus' tree? linux-next? signal.git#arch-s390?
> > FWIW, the relevant diff against mainline is below, linux-next already
> > contains it.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal arch-s390
>
> against Linux 3.7-rc3
>
> # uname -a
> Linux r3545014 3.7.0-rc3-00002-g95a96a7 #1 SMP Mon Oct 29 15:32:18 CET 2012 s390x s390x s390x GNU/Linux
>
> works as it is. Feel free to add my Acked-By.
Done and pushed, should propagate shortly. It's in no-rebase mode now.
Hi Al,
2012/10/17 Al Viro <[email protected]>:
> On Wed, Oct 17, 2012 at 05:07:03PM +0100, Al Viro wrote:
>> What happens during boot is this:
>> * init_task (not to be confused with init) is used as current during
>> infrastructure initializations. Once everything needed for scheduler and
>> for working fork is set, we spawn two threads - future init and future
>> kthreadd. The last thing we do with init_task is telling init that kthreadd
>> has been spawned. After that init_task turns itself into an idle thread.
>> * future init waits for kthreadd to be spawned (it would be more
>> natural to fork them in opposite order, but we want init to have PID 1 -
>> too much stuff in userland depends on that). Then it does the rest of
>> initialization, including setting up initramfs contents. And does
>> kernel_execve() on /init. Note that this is a task that had been created
>> by kernel_thread() and is currently in function called from
>> ret_from_kernel_thread(). Its kernel stack has been set up by copy_thread().
>> That's where pt_regs need to be set up; note that they'll be passed to
>> start_thread() before you return to userland. If there are any magic bits
>> in pt_regs needed by return-from-syscall code, set them in kthread case of
>> copy_thread().
>
> PS: I suspect that we end up with the wrong value in childregs->msr;
> start_thread() only add MSR_UMS there. I'd suggest running the kernel
> with these patches + printk childregs->msr the very first time start_thread()
> is called and see what it prints, then working kernel + such printk and
> compare the results...
sorry for taking this so long.
I have looked at it and fix it.
Here is the branch based on rc5 (information below)
and here is giweb.
http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
I have also looked at your sys_fork / sys_vfork / sys_clone unification
and I have fixed it for Microblaze.
Also I have done some tests on it for sure.
I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
Are you OK with that?
Do you need to test anything else for MB?
Thanks,
Michal
The following changes since commit 77b67063bb6bce6d475e910d3b886a606d0d91f7:
Linus Torvalds (1):
Linux 3.7-rc5
are available in the git repository at:
git://git.monstr.eu/linux-2.6-microblaze.git viro/arch-microblaze-rc5
Al Viro (5):
microblaze: switch to generic kernel_thread()
microblaze: switch to generic kernel_execve()
microblaze: switch to generic sys_execve()
generic sys_fork / sys_vfork / sys_clone
microblaze: switch to generic fork/vfork/clone
Michal Simek (3):
microblaze: Fix bug with schedule_tail
microblaze: Define current_pt_regs
microblaze: Remove BIP from childregs
arch/Kconfig | 11 ++++
arch/microblaze/Kconfig | 3 +
arch/microblaze/include/asm/processor.h | 10 +---
arch/microblaze/include/asm/unistd.h | 6 ++
arch/microblaze/kernel/entry-nommu.S | 20 +++-----
arch/microblaze/kernel/entry.S | 57 ++++-------------------
arch/microblaze/kernel/process.c | 77 ++++++++++---------------------
arch/microblaze/kernel/sys_microblaze.c | 53 ---------------------
arch/microblaze/kernel/syscall_table.S | 6 +--
include/asm-generic/syscalls.h | 7 +--
kernel/fork.c | 43 +++++++++++++++++
11 files changed, 111 insertions(+), 182 deletions(-)
--
Michal Simek, Ing. (M.Eng)
w: http://www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
On Thu, Nov 15, 2012 at 05:41:16PM +0100, Michal Simek wrote:
> Here is the branch based on rc5 (information below)
> and here is giweb.
> http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
>
> I have also looked at your sys_fork / sys_vfork / sys_clone unification
> and I have fixed it for Microblaze.
>
> Also I have done some tests on it for sure.
>
> I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
> Are you OK with that?
Umm... In principle - yes, but I've a couple of question abouts those.
1) What's that set_fs(USER_DS) in start_thread() for? Note that we do the same
thing in flush_old_exec(), at the same time we remove PF_KTHREAD from
current->flags.
While we are at it, if we *ever* hit do_signal() with KERNEL_DS, we are
very deep in trouble. set_fs(USER_DS) in setup_{rt_,}frame() is pointless.
2) your definition of current_pt_regs() is an exact copy of on in
include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
needed at all? IOW, I'd rather added #include <linux/ptrace.h> to
arch/microblaze/kernel/process.c instead...
2012/11/15 Al Viro <[email protected]>:
> On Thu, Nov 15, 2012 at 05:41:16PM +0100, Michal Simek wrote:
>> Here is the branch based on rc5 (information below)
>> and here is giweb.
>> http://developer.petalogix.com/git/gitweb.cgi?p=linux-2.6-microblaze.git;a=shortlog;h=refs/heads/viro/arch-microblaze-rc5
>>
>> I have also looked at your sys_fork / sys_vfork / sys_clone unification
>> and I have fixed it for Microblaze.
>>
>> Also I have done some tests on it for sure.
>>
>> I would add sys_execve/kernel_execve/kernel_thread patches to my next branch.
>> Are you OK with that?
>
> Umm... In principle - yes, but I've a couple of question abouts those.
sure.
BTW: that generic sys_fork / sys_vfork / sys_clone will go through your tree.
> 1) What's that set_fs(USER_DS) in start_thread() for? Note that we do the same
> thing in flush_old_exec(), at the same time we remove PF_KTHREAD from
> current->flags.
ok. Will remove it.
> While we are at it, if we *ever* hit do_signal() with KERNEL_DS, we are
> very deep in trouble. set_fs(USER_DS) in setup_{rt_,}frame() is pointless.
I have seen that several your signal patches around signal are there.
Do you have set of tests which should run it?
> 2) your definition of current_pt_regs() is an exact copy of on in
> include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
> needed at all? IOW, I'd rather added #include <linux/ptrace.h> to
> arch/microblaze/kernel/process.c instead...
Agree. Fixed.
I have updated that branch or I can send you patches if you like.
Thanks,
Michal
--
Michal Simek, Ing. (M.Eng)
w: http://www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
On Fri, Nov 16, 2012 at 08:59:25AM +0100, Michal Simek wrote:
> Do you have set of tests which should run it?
>
>
> > 2) your definition of current_pt_regs() is an exact copy of on in
> > include/linux/ptrace.h; why is "microblaze: Define current_pt_regs"
> > needed at all? IOW, I'd rather added #include <linux/ptrace.h> to
> > arch/microblaze/kernel/process.c instead...
>
> Agree. Fixed.
>
> I have updated that branch or I can send you patches if you like.
Pulled; see #arch-microblaze in there (== beginning of your branch).
As for the other things I'd like to see confirmed... See #for-michal;
4 commits in there had been hanging around for a long time and if
you are OK with those, I'd like to see them gone into mainline,
by whichever path you prefer.
Another thing that looks like a bug - consider the following testcase:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
void handler(int n, siginfo_t *foo, void *bar)
{
char *signame = n == SIGUSR1 ? "SIGUSR1" : "SIGUSR2";
stack_t stack;
sigaltstack(NULL, &stack);
printf("took %s%s\n", signame,
stack.ss_flags == SS_ONSTACK ? " on altstack" : "");
if (n == SIGUSR1)
raise(SIGUSR2);
printf("%s done\n", signame);
}
main()
{
struct sigaction s = {
.sa_sigaction = handler,
.sa_flags = SA_ONSTACK | SA_SIGINFO
};
stack_t stack = {.ss_sp = malloc(16384), .ss_size = 16384};
sigaction(SIGUSR2, &s, NULL);
sigaction(SIGUSR1, &s, NULL);
sigaltstack(&stack, NULL);
raise(SIGUSR1);
}
Should print
took SIGUSR1 on altstack
took SIGUSR2 on altstack
SIGUSR2 done
SIGUSR1 done
- we raise SIGUSR1, it's marked onstack, we flip to altstack, raise SIGUSR2
and we are still on altstack, obviously. Now, think what happens on the
way *out* - rt_sigreturn from the second handler will call do_sigaltstack(),
passing it the saved altstack settings... and the user stack pointer we'll
get once we return to caller. I.e. something within the altstack. Which
will give you -EPERM. Which will have microblaze sys_rt_sigreturn() force-feed
you SIGSEGV, AFAICS.
IOW, there's a reason why rt_sigreturn implementations ignore -EPERM from
do_sigaltstack(). A bad one, but... FWIW, sigaltstack handling is a mess
right now:
* every architecture has the sucker done separately, even though
there's very little point doing so; worse yet, they tend to come with
asm wrappers from hell, all for no good reason - we need to get userland
stack pointer and that's done in all kinds of messy ways.
* biarch ones have compat versions that ought to be mergable as
well.
* rt_sigreturn instances call do_sigaltstack() and ignore just
about everything; -EFAULT is not ignored, but realistically it's impossible
to hit - you'd need a race with munmap() ripping the stack page from under
you just as you've almost finished with sigreturn. Accesses on both sides
of that stack_t had been done by that point, so nothing short of such munmap()
would do. Everything else *is* ignored, or we are screwed. AFAICS, that's
what microblaze has stepped into.
As far as I can tell, the sane way to deal with that would be to introduce
(mandatory) helper that would give you the current userland stack pointer.
It's almost always either user_stack_pointer(current_pt_regs()) or rdusp().
There are few exceptions - itanic has user_stack_pointer giving the backing
store of register stack instead of desired r12 and several architectures
lack user_stack_pointer() even though the stack pointer is saved in pt_regs
and helper is trivial to add. That dealt with, we can take sys_sigaltstack()
to kernel/signal.c unconditionally. And kill the wrappers on almost everything.
The next step is unifying compat variants; AFAICS, that's also not a problem.
Then we need bool restore_altstack(const stack_t __user *) and compat
counterpart - originally with "call do_sigaltstack(), fail if and only if
it has returned -EFAULT", then a saner behaviour ("if we are not asked to
change the current settings, just succeed and to hell with on_sig_stack()
check; any other error case means that sigframe had been deliberately messed
with and deserves a failure").
Linus, do you have any objections to the above? FWIW, I've a tentative
patchset in that direction (most of it from the last cycle); right now
it + stuff currently in signal.git#for-next is at -3.4KLoC and I hadn't
dealt with the biarch side of things yet...
On Sat, Nov 17, 2012 at 7:45 PM, Al Viro <[email protected]> wrote:
>
> Linus, do you have any objections to the above? FWIW, I've a tentative
> patchset in that direction (most of it from the last cycle); right now
> it + stuff currently in signal.git#for-next is at -3.4KLoC and I hadn't
> dealt with the biarch side of things yet...
I have absolutely no objections. sigaltstack has always been kind of
messy, and made worse by the fact that it gets effectively no testing
(because it's generally not used by normal code and even code that
uses it tends to use it only for very uncommon events). So forcing all
the sigaltstack code into generic code and at least avoiding the
"different architectures can get things subtly - or not so subtly -
wrong in different ways" sounds like a good thing.
Linus
From: Linus Torvalds <[email protected]>
Date: Sun, 18 Nov 2012 08:45:43 -1000
> On Sat, Nov 17, 2012 at 7:45 PM, Al Viro <[email protected]> wrote:
>>
>> Linus, do you have any objections to the above? FWIW, I've a tentative
>> patchset in that direction (most of it from the last cycle); right now
>> it + stuff currently in signal.git#for-next is at -3.4KLoC and I hadn't
>> dealt with the biarch side of things yet...
>
> I have absolutely no objections. sigaltstack has always been kind of
> messy, and made worse by the fact that it gets effectively no testing
> (because it's generally not used by normal code and even code that
> uses it tends to use it only for very uncommon events). So forcing all
> the sigaltstack code into generic code and at least avoiding the
> "different architectures can get things subtly - or not so subtly -
> wrong in different ways" sounds like a good thing.
FWIW, if folks are looking for testcases there are a small number in
glibc, a quick grep shows:
nptl/tst-cancel20.c
nptl/tst-cancel21.c
nptl/tst-signal6.c
debug/tst-longjmp_chk2.c
LTP probably has a bunch too.
On Sun, Nov 18, 2012 at 02:03:32PM -0500, David Miller wrote:
> > I have absolutely no objections. sigaltstack has always been kind of
> > messy, and made worse by the fact that it gets effectively no testing
> > (because it's generally not used by normal code and even code that
> > uses it tends to use it only for very uncommon events). So forcing all
> > the sigaltstack code into generic code and at least avoiding the
> > "different architectures can get things subtly - or not so subtly -
> > wrong in different ways" sounds like a good thing.
>
> FWIW, if folks are looking for testcases there are a small number in
> glibc, a quick grep shows:
>
> nptl/tst-cancel20.c
> nptl/tst-cancel21.c
> nptl/tst-signal6.c
> debug/tst-longjmp_chk2.c
>
> LTP probably has a bunch too.
Might be a good idea to start adding tests/* in the kernel tree, perhaps?
Ones in glibc had been present prior to the LGPLv3 clusterfuck, by the
look of it...
From: Al Viro <[email protected]>
Date: Sun, 18 Nov 2012 19:59:21 +0000
> Might be a good idea to start adding tests/* in the kernel tree,
> perhaps?
I've always been a strong advocate of this idea.
I think if someone just did the work to get things going, it would
just pick up it's own momentum and get merged quite easily.
On Sun, Nov 18, 2012 at 07:59:21PM +0000, Al Viro wrote:
> On Sun, Nov 18, 2012 at 02:03:32PM -0500, David Miller wrote:
> > > I have absolutely no objections. sigaltstack has always been kind of
> > > messy, and made worse by the fact that it gets effectively no testing
> > > (because it's generally not used by normal code and even code that
> > > uses it tends to use it only for very uncommon events). So forcing all
> > > the sigaltstack code into generic code and at least avoiding the
> > > "different architectures can get things subtly - or not so subtly -
> > > wrong in different ways" sounds like a good thing.
> >
> > FWIW, if folks are looking for testcases there are a small number in
> > glibc, a quick grep shows:
> >
> > nptl/tst-cancel20.c
> > nptl/tst-cancel21.c
> > nptl/tst-signal6.c
> > debug/tst-longjmp_chk2.c
> >
> > LTP probably has a bunch too.
>
> Might be a good idea to start adding tests/* in the kernel tree, perhaps?
> Ones in glibc had been present prior to the LGPLv3 clusterfuck, by the
> look of it...
Incidentally, sparc64 is also broken the same way:
sparc64:~# gcc sigreturn-test.c
sparc64:~# ./a.out
entering SIGUSR1 on altstack
entering SIGUSR2 on altstack
SIGUSR2 done
SIGUSR1 done
sparc64:~# gcc -m64 sigreturn-test.c
sparc64:~# ./a.out
entering SIGUSR1 on altstack
entering SIGUSR2 on altstack
SIGUSR2 done
Segmentation fault
sparc64:~#
32bit rt_sigreturn (in signal32.c) simply ignores all errors. One in
signal_64.c fails on *any* error from do_sigaltstack(), with the results
above...
sigreturn-test.c is what I've posted upthread. I think this one is
-stable fodder. It's not hard to deal with, fortunately. Are you
OK with the patch below? Should be the minimal fix, getting rid of
those segfaults and converting to usual semantics here...
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 867de2f..689e1ba 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -295,9 +295,7 @@ void do_rt_sigreturn(struct pt_regs *regs)
err |= restore_fpu_state(regs, fpu_save);
err |= __copy_from_user(&set, &sf->mask, sizeof(sigset_t));
- err |= do_sigaltstack(&sf->stack, NULL, (unsigned long)sf);
-
- if (err)
+ if (err || do_sigaltstack(&sf->stack, NULL, (unsigned long)sf) == -EFAULT)
goto segv;
err |= __get_user(rwin_save, &sf->rwin_save);
From: Al Viro <[email protected]>
Date: Sun, 18 Nov 2012 21:02:53 +0000
> Are you OK with the patch below? Should be the minimal fix, getting
> rid of those segfaults and converting to usual semantics here...
>
> Signed-off-by: Al Viro <[email protected]>
Yep, looks fine:
Acked-by: David S. Miller <[email protected]>
On Sun, Nov 18, 2012 at 04:18:33PM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Sun, 18 Nov 2012 21:02:53 +0000
>
> > Are you OK with the patch below? Should be the minimal fix, getting
> > rid of those segfaults and converting to usual semantics here...
> >
> > Signed-off-by: Al Viro <[email protected]>
>
> Yep, looks fine:
>
> Acked-by: David S. Miller <[email protected]>
Er... So which tree should that go through? sparc or signal? There's
a similar microblaze patch and a few more of the "do_sigaltstack() takes
userland pointer" variety, so I can put together a pile in
signal.git#for-linus, but if you prefer that to go through sparc tree,
I'm fine with that...
From: Al Viro <[email protected]>
Date: Mon, 19 Nov 2012 01:10:13 +0000
> On Sun, Nov 18, 2012 at 04:18:33PM -0500, David Miller wrote:
>> From: Al Viro <[email protected]>
>> Date: Sun, 18 Nov 2012 21:02:53 +0000
>>
>> > Are you OK with the patch below? Should be the minimal fix, getting
>> > rid of those segfaults and converting to usual semantics here...
>> >
>> > Signed-off-by: Al Viro <[email protected]>
>>
>> Yep, looks fine:
>>
>> Acked-by: David S. Miller <[email protected]>
>
> Er... So which tree should that go through? sparc or signal? There's
> a similar microblaze patch and a few more of the "do_sigaltstack() takes
> userland pointer" variety, so I can put together a pile in
> signal.git#for-linus, but if you prefer that to go through sparc tree,
> I'm fine with that...
I'm happy to take it via my sparc tree, and I'll queue it up for
-stable as well. Can you resend it to me with a proper commit message?
Thanks a lot Al.
On Sun, Nov 18, 2012 at 08:30:05PM -0500, David Miller wrote:
> > Er... So which tree should that go through? sparc or signal? There's
> > a similar microblaze patch and a few more of the "do_sigaltstack() takes
> > userland pointer" variety, so I can put together a pile in
> > signal.git#for-linus, but if you prefer that to go through sparc tree,
> > I'm fine with that...
>
> I'm happy to take it via my sparc tree, and I'll queue it up for
> -stable as well. Can you resend it to me with a proper commit message?
sparc64: not any error from do_sigaltstack() should fail rt_sigreturn()
If a signal handler is executed on altstack and another signal comes,
we will end up with rt_sigreturn() on return from the second handler
getting -EPERM from do_sigaltstack(). It's perfectly OK, since we
are not asking to change the settings; in fact, they couldn't have been
changed during the second handler execution exactly because we'd been
on altstack all along. 64bit sigreturn on sparc treats any error from
do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics
we are using on other architectures.
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 867de2f..689e1ba 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -295,9 +295,7 @@ void do_rt_sigreturn(struct pt_regs *regs)
err |= restore_fpu_state(regs, fpu_save);
err |= __copy_from_user(&set, &sf->mask, sizeof(sigset_t));
- err |= do_sigaltstack(&sf->stack, NULL, (unsigned long)sf);
-
- if (err)
+ if (err || do_sigaltstack(&sf->stack, NULL, (unsigned long)sf) == -EFAULT)
goto segv;
err |= __get_user(rwin_save, &sf->rwin_save);
From: Al Viro <[email protected]>
Date: Mon, 19 Nov 2012 02:35:07 +0000
> sparc64: not any error from do_sigaltstack() should fail rt_sigreturn()
>
> If a signal handler is executed on altstack and another signal comes,
> we will end up with rt_sigreturn() on return from the second handler
> getting -EPERM from do_sigaltstack(). It's perfectly OK, since we
> are not asking to change the settings; in fact, they couldn't have been
> changed during the second handler execution exactly because we'd been
> on altstack all along. 64bit sigreturn on sparc treats any error from
> do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics
> we are using on other architectures.
>
> Cc: [email protected]
> Signed-off-by: Al Viro <[email protected]>
Applied, thanks.
On Sun, Nov 18, 2012 at 03:48:59PM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Sun, 18 Nov 2012 19:59:21 +0000
>
> > Might be a good idea to start adding tests/* in the kernel tree,
> > perhaps?
>
> I've always been a strong advocate of this idea.
I would also love to see this happen.
thanks,
greg k-h
On Sun, Nov 18, 2012 at 08:45:43AM -1000, Linus Torvalds wrote:
> On Sat, Nov 17, 2012 at 7:45 PM, Al Viro <[email protected]> wrote:
> >
> > Linus, do you have any objections to the above? FWIW, I've a tentative
> > patchset in that direction (most of it from the last cycle); right now
> > it + stuff currently in signal.git#for-next is at -3.4KLoC and I hadn't
> > dealt with the biarch side of things yet...
>
> I have absolutely no objections. sigaltstack has always been kind of
> messy, and made worse by the fact that it gets effectively no testing
> (because it's generally not used by normal code and even code that
> uses it tends to use it only for very uncommon events). So forcing all
> the sigaltstack code into generic code and at least avoiding the
> "different architectures can get things subtly - or not so subtly -
> wrong in different ways" sounds like a good thing.
OK... Intermediate state (do_sigaltstack() guts still need to be cleaned up
and distributed between the 4 callers) is in signal.git#master right now;
_not_ for merge in that form. Crap found in process:
* microblaze, sparc64 and tile (compat side) have the same bug (see
above). Fixed in that queue.
* score, sh64 and openrisc all tried call do do_sigaltstack() passing
it an address of on-stack copy; no, set_fs() was not used. Fixed in that queue.
* alpha and c6x do not bother to restore saved altstack settings on
sigreturn.
* c6x, cris and hexagon don't bother *saving* altstack settings on
signal arrival. And yes, it does mean that cris and hexagon call
do_sigaltstack() on an unitialized chunk of userland stack. Needs to be
fixed (and in a way that could go into -stable, unfortunately ;-/)
I hadn't done anything serious with altstack users yet; I'm afraid that
this code (i.e. sigframe allocation logics) will also bring a lot of
fun weirdness... ;-/
On Sun, Nov 18, 2012 at 10:27:24PM -0500, David Miller wrote:
> > Cc: [email protected]
> > Signed-off-by: Al Viro <[email protected]>
>
> Applied, thanks.
Hmm... There's something odd going on with {rt_,}sigaction on sparc -
we *do* have sa_restorer in struct sigaction and struct old_sigaction,
but it's not used for anything whatsoever. There's also a separately
passed restorer pointer for rt_sigaction() and *that* is used instead,
but not reported via *oact.
What's the reason for that weirdness? I understand why we do that on
alpha (we have no sa_restorer in struct sigaction we'd inherited from
OSF/1), but sparc always had perfectly normal sigaction->sa_restorer
field all along - even for old sigaction(2)...
On Mon, Nov 26, 2012 at 05:10:02AM +0000, Al Viro wrote:
> On Sun, Nov 18, 2012 at 10:27:24PM -0500, David Miller wrote:
> > > Cc: [email protected]
> > > Signed-off-by: Al Viro <[email protected]>
> >
> > Applied, thanks.
>
> Hmm... There's something odd going on with {rt_,}sigaction on sparc -
> we *do* have sa_restorer in struct sigaction and struct old_sigaction,
> but it's not used for anything whatsoever. There's also a separately
> passed restorer pointer for rt_sigaction() and *that* is used instead,
> but not reported via *oact.
>
> What's the reason for that weirdness? I understand why we do that on
> alpha (we have no sa_restorer in struct sigaction we'd inherited from
> OSF/1), but sparc always had perfectly normal sigaction->sa_restorer
> field all along - even for old sigaction(2)...
PS: speaking of weirdness, what's the reason for sparc and ppc (and nothing
else) expecting the first argument of sigaction(2) to be minus signal
number? ABI archaeology is fun...