2017-07-12 21:28:00

by Matthias Kaehlcke

[permalink] [raw]
Subject: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

Commit f05058c4d652 supposedly "forces a stack frame to be created before
the inline asm code if CONFIG_FRAME_POINTER is enabled by listing the
stack pointer as an output operand for the get_user() inline assembly
statement.". This doesn't work as intended, at least with gcc v4.9.2 and
x86-64 the generated code is exactly the same with and without the patch.
However clang adds an extra instruction that adjusts %rsp, which ends up
causing double faults all over the place.

Signed-off-by: Matthias Kaehlcke <[email protected]>
---
arch/x86/include/asm/uaccess.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 476ea27f490b..9ec2beab73df 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -161,11 +161,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
({ \
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
- register void *__sp asm(_ASM_SP); \
__chk_user_ptr(ptr); \
might_fault(); \
- asm volatile("call __get_user_%P4" \
- : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
+ asm volatile("call __get_user_%P3" \
+ : "=a" (__ret_gu), "=r" (__val_gu) \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
--
2.13.2.932.g7449e964c-goog


2017-07-12 22:12:48

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Wed, Jul 12, 2017 at 02:27:44PM -0700, Matthias Kaehlcke wrote:
> Commit f05058c4d652 supposedly "forces a stack frame to be created before
> the inline asm code if CONFIG_FRAME_POINTER is enabled by listing the
> stack pointer as an output operand for the get_user() inline assembly
> statement.". This doesn't work as intended, at least with gcc v4.9.2 and
> x86-64 the generated code is exactly the same with and without the patch.
> However clang adds an extra instruction that adjusts %rsp, which ends up
> causing double faults all over the place.

I don't think reverting it is the right approach, because that will
still break frame pointers in certain cases.

The original commit probably should have clarified:

" ... forces a stack frame *if it doesn't already exist*."

In *most* cases it will have no effect, as you saw, because users of
get_user() tend to do other function calls beforehand, so they will have
already saved the frame pointer before calling it.

However, that isn't always the case. We found that certain configs
change GCC's behavior such that, for certain get_user() call sites, the
containing function doesn't saved the frame pointer before inserting
get_user()'s inline asm.

GCC completely ignores inline asm, so it has no idea that it has a call
instruction in it. So in general, *any* inline asm with a call
instruction needs this constraint, to force the frame pointer to be
saved, if it hasn't already.

This is admittedly an awkward way of achieving this goal, but it's the
only way I know how to do it with GCC.

What extra instruction does clang add?

--
Josh

2017-07-12 22:20:44

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

Hi Josh,

thanks for your prompt reply.

El Wed, Jul 12, 2017 at 05:12:42PM -0500 Josh Poimboeuf ha dit:

> On Wed, Jul 12, 2017 at 02:27:44PM -0700, Matthias Kaehlcke wrote:
> > Commit f05058c4d652 supposedly "forces a stack frame to be created before
> > the inline asm code if CONFIG_FRAME_POINTER is enabled by listing the
> > stack pointer as an output operand for the get_user() inline assembly
> > statement.". This doesn't work as intended, at least with gcc v4.9.2 and
> > x86-64 the generated code is exactly the same with and without the patch.
> > However clang adds an extra instruction that adjusts %rsp, which ends up
> > causing double faults all over the place.
>
> I don't think reverting it is the right approach, because that will
> still break frame pointers in certain cases.
>
> The original commit probably should have clarified:
>
> " ... forces a stack frame *if it doesn't already exist*."
>
>
> In *most* cases it will have no effect, as you saw, because users of
> get_user() tend to do other function calls beforehand, so they will have
> already saved the frame pointer before calling it.
>
> However, that isn't always the case. We found that certain configs
> change GCC's behavior such that, for certain get_user() call sites, the
> containing function doesn't saved the frame pointer before inserting
> get_user()'s inline asm.
>
> GCC completely ignores inline asm, so it has no idea that it has a call
> instruction in it. So in general, *any* inline asm with a call
> instruction needs this constraint, to force the frame pointer to be
> saved, if it hasn't already.

Thanks for the clarification!

> This is admittedly an awkward way of achieving this goal, but it's the
> only way I know how to do it with GCC.
>
> What extra instruction does clang add?

I was looking at the get_user() call in drm_mode_setcrtc(). The code
generated by clang without the patch is:

if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
ffffffff8138695c: 00
ffffffff8138695d: 49 03 06 add (%r14),%rax
ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>

And with the patch:

if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
ffffffff81386a5d: 00
ffffffff81386a5e: 49 03 06 add (%r14),%rax
ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
ffffffff81386a66: e8 15 a5 f0 ff callq
ffffffff81290f80 <__get_user_4>

2017-07-12 22:35:53

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > This is admittedly an awkward way of achieving this goal, but it's the
> > only way I know how to do it with GCC.
> >
> > What extra instruction does clang add?
>
> I was looking at the get_user() call in drm_mode_setcrtc(). The code
> generated by clang without the patch is:
>
> if (get_user(out_id, &set_connectors_ptr[i])) {
> ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> ffffffff8138695c: 00
> ffffffff8138695d: 49 03 06 add (%r14),%rax
> ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>
>
> And with the patch:
>
> if (get_user(out_id, &set_connectors_ptr[i])) {
> ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> ffffffff81386a5d: 00
> ffffffff81386a5e: 49 03 06 add (%r14),%rax
> ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
> ffffffff81386a66: e8 15 a5 f0 ff callq
> ffffffff81290f80 <__get_user_4>

Hm, that seems odd. Can you sure the disassembly for the whole
function?

--
Josh

2017-07-12 22:36:35

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Wed, Jul 12, 2017 at 05:35:47PM -0500, Josh Poimboeuf wrote:
> On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > > This is admittedly an awkward way of achieving this goal, but it's the
> > > only way I know how to do it with GCC.
> > >
> > > What extra instruction does clang add?
> >
> > I was looking at the get_user() call in drm_mode_setcrtc(). The code
> > generated by clang without the patch is:
> >
> > if (get_user(out_id, &set_connectors_ptr[i])) {
> > ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > ffffffff8138695c: 00
> > ffffffff8138695d: 49 03 06 add (%r14),%rax
> > ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>
> >
> > And with the patch:
> >
> > if (get_user(out_id, &set_connectors_ptr[i])) {
> > ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > ffffffff81386a5d: 00
> > ffffffff81386a5e: 49 03 06 add (%r14),%rax
> > ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
> > ffffffff81386a66: e8 15 a5 f0 ff callq
> > ffffffff81290f80 <__get_user_4>
>
> Hm, that seems odd. Can you sure the disassembly for the whole
> function?

Er, share :-)

--
Josh

2017-07-12 23:22:19

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Wed, Jul 12, 2017 at 05:36:30PM -0500 Josh Poimboeuf ha dit:

> On Wed, Jul 12, 2017 at 05:35:47PM -0500, Josh Poimboeuf wrote:
> > On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > > > This is admittedly an awkward way of achieving this goal, but it's the
> > > > only way I know how to do it with GCC.
> > > >
> > > > What extra instruction does clang add?
> > >
> > > I was looking at the get_user() call in drm_mode_setcrtc(). The code
> > > generated by clang without the patch is:
> > >
> > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > ffffffff8138695c: 00
> > > ffffffff8138695d: 49 03 06 add (%r14),%rax
> > > ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>
> > >
> > > And with the patch:
> > >
> > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > ffffffff81386a5d: 00
> > > ffffffff81386a5e: 49 03 06 add (%r14),%rax
> > > ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
> > > ffffffff81386a66: e8 15 a5 f0 ff callq
> > > ffffffff81290f80 <__get_user_4>
> >
> > Hm, that seems odd. Can you sure the disassembly for the whole
> > function?
>
> Er, share :-)

Sure, please find below the disassemblies with and without the
patch. The exact extra instruction differs from the one above, the
disassembly above is from a debug session with some 'random' kernel
version (bisect), the ones below from a v4.12ish kernel. At the bottom
you also find a log of a double faults observed with the patch.

If you are interested in building the kernel with clang yourself I can
provide instructions, it is fairly painless nowadays as long as you
have a recent version of clang (a somewhat older version should also
do for this issue with some extra kernel patches).

With f05058c4d652:

ffffffff81367550 <drm_mode_setcrtc>:
* Returns:
* Zero on success, negative errno on failure.
*/
int drm_mode_setcrtc(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
ffffffff81367550: e8 db 71 2e 00 callq ffffffff8164e730 <__fentry__>
ffffffff81367555: 55 push %rbp
ffffffff81367556: 48 89 e5 mov %rsp,%rbp
ffffffff81367559: 41 57 push %r15
ffffffff8136755b: 41 56 push %r14
ffffffff8136755d: 41 55 push %r13
ffffffff8136755f: 41 54 push %r12
ffffffff81367561: 53 push %rbx
ffffffff81367562: 48 81 ec c0 00 00 00 sub $0xc0,%rsp
ffffffff81367569: 49 89 f6 mov %rsi,%r14
ffffffff8136756c: 48 89 fb mov %rdi,%rbx
#define DRM_SWITCH_POWER_DYNAMIC_OFF 3

static __inline__ int drm_core_check_feature(struct drm_device *dev,
int feature)
{
return ((dev->driver->driver_features & feature) ? 1 : 0);
ffffffff8136756f: 48 8b 43 20 mov 0x20(%rbx),%rax
uint32_t __user *set_connectors_ptr;
struct drm_modeset_acquire_ctx ctx;
int ret;
int i;

if (!drm_core_check_feature(dev, DRIVER_MODESET))
ffffffff81367573: f6 80 81 01 00 00 20 testb $0x20,0x181(%rax)
ffffffff8136757a: 75 09 jne ffffffff81367585 <drm_mode_setcrtc+0x35>
ffffffff8136757c: 6a ea pushq $0xffffffffffffffea
ffffffff8136757e: 41 5c pop %r12
ffffffff81367580: e9 d5 04 00 00 jmpq ffffffff81367a5a <drm_mode_setcrtc+0x50a>
ffffffff81367585: 6a de pushq $0xffffffffffffffde
ffffffff81367587: 41 5c pop %r12

/*
* Universal plane src offsets are only 16.16, prevent havoc for
* drivers using universal plane code internally.
*/
if (crtc_req->x & 0xffff0000 || crtc_req->y & 0xffff0000)
ffffffff81367589: 41 81 7e 14 ff ff 00 cmpl $0xffff,0x14(%r14)
ffffffff81367590: 00
ffffffff81367591: 0f 87 c3 04 00 00 ja ffffffff81367a5a <drm_mode_setcrtc+0x50a>
ffffffff81367597: 41 81 7e 18 ff ff 00 cmpl $0xffff,0x18(%r14)
ffffffff8136759e: 00
ffffffff8136759f: 0f 87 b5 04 00 00 ja ffffffff81367a5a <drm_mode_setcrtc+0x50a>
return -ERANGE;

crtc = drm_crtc_find(dev, crtc_req->crtc_id);
ffffffff813675a5: 41 8b 76 0c mov 0xc(%r14),%esi
ffffffff813675a9: 48 89 df mov %rbx,%rdi
ffffffff813675ac: e8 15 fe ff ff callq ffffffff813673c6 <drm_crtc_find>
ffffffff813675b1: 49 89 c5 mov %rax,%r13
if (!crtc) {
ffffffff813675b4: 4d 85 ed test %r13,%r13
ffffffff813675b7: 0f 84 7d 04 00 00 je ffffffff81367a3a <drm_mode_setcrtc+0x4ea>
ffffffff813675bd: 48 89 5d b0 mov %rbx,-0x50(%rbp)
DRM_DEBUG_KMS("Unknown CRTC ID %d\n", crtc_req->crtc_id);
return -ENOENT;
}
DRM_DEBUG_KMS("[CRTC:%d:%s]\n", crtc->base.id, crtc->name);
ffffffff813675c1: 41 8b 4d 78 mov 0x78(%r13),%ecx
ffffffff813675c5: 4d 8b 45 20 mov 0x20(%r13),%r8
ffffffff813675c9: 6a 04 pushq $0x4
ffffffff813675cb: 5e pop %rsi
ffffffff813675cc: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff813675d3: 48 c7 c2 e8 55 9b 81 mov $0xffffffff819b55e8,%rdx
ffffffff813675da: 31 c0 xor %eax,%eax
ffffffff813675dc: e8 45 c6 ff ff callq ffffffff81363c26 <drm_printk>

mutex_lock(&crtc->dev->mode_config.mutex);
ffffffff813675e1: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff813675e5: 48 81 c7 b0 02 00 00 add $0x2b0,%rdi
ffffffff813675ec: e8 b2 3c 2e 00 callq ffffffff8164b2a3 <mutex_lock>
ffffffff813675f1: 48 8d 9d 38 ff ff ff lea -0xc8(%rbp),%rbx
drm_modeset_acquire_init(&ctx, 0);
ffffffff813675f8: 31 f6 xor %esi,%esi
ffffffff813675fa: 48 89 df mov %rbx,%rdi
ffffffff813675fd: e8 ef 7d 00 00 callq ffffffff8136f3f1 <drm_modeset_acquire_init>
ffffffff81367602: 49 8d 46 24 lea 0x24(%r14),%rax
ffffffff81367606: 48 89 45 a0 mov %rax,-0x60(%rbp)
ffffffff8136760a: 31 c0 xor %eax,%eax
ffffffff8136760c: 48 89 45 c0 mov %rax,-0x40(%rbp)
ffffffff81367610: 31 c0 xor %eax,%eax
ffffffff81367612: 48 89 45 d0 mov %rax,-0x30(%rbp)
ffffffff81367616: 31 c0 xor %eax,%eax
ffffffff81367618: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff8136761c: 45 31 ff xor %r15d,%r15d
ffffffff8136761f: 48 89 45 a8 mov %rax,-0x58(%rbp)
ffffffff81367623: 4c 89 6d b8 mov %r13,-0x48(%rbp)
ffffffff81367627: e9 0a 03 00 00 jmpq ffffffff81367936 <drm_mode_setcrtc+0x3e6>
ffffffff8136762c: 48 8d 9d 38 ff ff ff lea -0xc8(%rbp),%rbx
}
}
kfree(connector_set);
drm_mode_destroy(dev, mode);
if (ret == -EDEADLK) {
drm_modeset_backoff(&ctx);
ffffffff81367633: 48 89 df mov %rbx,%rdi
ffffffff81367636: e8 88 7e 00 00 callq ffffffff8136f4c3 <drm_modeset_backoff>
ffffffff8136763b: e9 f6 02 00 00 jmpq ffffffff81367936 <drm_mode_setcrtc+0x3e6>
goto out;
if (crtc_req->mode_valid) {
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
if (!crtc->primary->fb) {
ffffffff81367640: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax
ffffffff81367647: 48 8b 98 b0 00 00 00 mov 0xb0(%rax),%rbx
ffffffff8136764e: 48 85 db test %rbx,%rbx
ffffffff81367651: 0f 84 19 01 00 00 je ffffffff81367770 <drm_mode_setcrtc+0x220>
*
* This function increments the framebuffer's reference count.
*/
static inline void drm_framebuffer_get(struct drm_framebuffer *fb)
{
drm_mode_object_get(&fb->base);
ffffffff81367657: 48 8d 7b 18 lea 0x18(%rbx),%rdi
ffffffff8136765b: e8 97 de 00 00 callq ffffffff813754f7 <drm_mode_object_get>
ffffffff81367660: 48 89 5d d0 mov %rbx,-0x30(%rbp)
ffffffff81367664: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff81367668: 48 8b 5d b0 mov -0x50(%rbp),%rbx
ret = -ENOENT;
goto out;
}
}

mode = drm_mode_create(dev);
ffffffff8136766c: 48 89 df mov %rbx,%rdi
ffffffff8136766f: e8 32 08 00 00 callq ffffffff81367ea6 <drm_mode_create>
if (!mode) {
ffffffff81367674: 48 85 c0 test %rax,%rax
ffffffff81367677: 74 38 je ffffffff813676b1 <drm_mode_setcrtc+0x161>
ffffffff81367679: 48 89 c1 mov %rax,%rcx
ret = -ENOMEM;
goto out;
}

ret = drm_mode_convert_umode(mode, &crtc_req->mode);
ffffffff8136767c: 48 89 4d c0 mov %rcx,-0x40(%rbp)
ffffffff81367680: 48 89 c7 mov %rax,%rdi
ffffffff81367683: 48 8b 75 a0 mov -0x60(%rbp),%rsi
ffffffff81367687: e8 3a 1b 00 00 callq ffffffff813691c6 <drm_mode_convert_umode>
ffffffff8136768c: 41 89 c4 mov %eax,%r12d
if (ret) {
ffffffff8136768f: 45 85 e4 test %r12d,%r12d
ffffffff81367692: 74 2c je ffffffff813676c0 <drm_mode_setcrtc+0x170>
DRM_DEBUG_KMS("Invalid mode\n");
ffffffff81367694: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff8136769b: 48 c7 c2 1c 7b 9b 81 mov $0xffffffff819b7b1c,%rdx
ffffffff813676a2: 31 c0 xor %eax,%eax
ffffffff813676a4: 6a 04 pushq $0x4
ffffffff813676a6: 5e pop %rsi
ffffffff813676a7: e8 7a c5 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff813676ac: e9 a0 02 00 00 jmpq ffffffff81367951 <drm_mode_setcrtc+0x401>
ffffffff813676b1: 31 c0 xor %eax,%eax
ffffffff813676b3: 48 89 45 c0 mov %rax,-0x40(%rbp)
ffffffff813676b7: 6a f4 pushq $0xfffffffffffffff4
ffffffff813676b9: 41 5c pop %r12
ffffffff813676bb: e9 91 02 00 00 jmpq ffffffff81367951 <drm_mode_setcrtc+0x401>
* Drivers not implementing the universal planes API use a
* default formats list provided by the DRM core which doesn't
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ffffffff813676c0: 49 8b bd 98 00 00 00 mov 0x98(%r13),%rdi
ffffffff813676c7: 80 bf a4 00 00 00 00 cmpb $0x0,0xa4(%rdi)
ffffffff813676ce: 0f 84 cf 00 00 00 je ffffffff813677a3 <drm_mode_setcrtc+0x253>
&format_name));
goto out;
}
}

ret = drm_crtc_check_viewport(crtc, crtc_req->x, crtc_req->y,
ffffffff813676d4: 41 8b 76 14 mov 0x14(%r14),%esi
ffffffff813676d8: 41 8b 56 18 mov 0x18(%r14),%edx
ffffffff813676dc: 4c 89 ef mov %r13,%rdi
ffffffff813676df: 48 8b 4d c0 mov -0x40(%rbp),%rcx
ffffffff813676e3: 4c 8b 45 d0 mov -0x30(%rbp),%r8
ffffffff813676e7: e8 de fd ff ff callq ffffffff813674ca <drm_crtc_check_viewport>
ffffffff813676ec: 41 89 c4 mov %eax,%r12d
mode, fb);
if (ret)
ffffffff813676ef: 45 85 e4 test %r12d,%r12d
ffffffff813676f2: 0f 85 59 02 00 00 jne ffffffff81367951 <drm_mode_setcrtc+0x401>
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff813676f8: 41 8b 4e 08 mov 0x8(%r14),%ecx
ffffffff813676fc: 48 8b 45 c0 mov -0x40(%rbp),%rax
ffffffff81367700: 48 85 c0 test %rax,%rax
ffffffff81367703: 74 1e je ffffffff81367723 <drm_mode_setcrtc+0x1d3>
ffffffff81367705: 85 c9 test %ecx,%ecx
ffffffff81367707: 75 1a jne ffffffff81367723 <drm_mode_setcrtc+0x1d3>
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ffffffff81367709: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff81367710: 48 c7 c2 43 7b 9b 81 mov $0xffffffff819b7b43,%rdx
ffffffff81367717: 31 c0 xor %eax,%eax
ffffffff81367719: 6a 04 pushq $0x4
ffffffff8136771b: 5e pop %rsi
ffffffff8136771c: e8 05 c5 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff81367721: eb 44 jmp ffffffff81367767 <drm_mode_setcrtc+0x217>
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff81367723: 48 85 c0 test %rax,%rax
ffffffff81367726: 0f 95 c0 setne %al
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0 && (!mode || !fb)) {
ffffffff81367729: 48 83 7d d0 00 cmpq $0x0,-0x30(%rbp)
ffffffff8136772e: 0f 95 c2 setne %dl
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff81367731: 85 c9 test %ecx,%ecx
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0 && (!mode || !fb)) {
ffffffff81367733: 74 1e je ffffffff81367753 <drm_mode_setcrtc+0x203>
ffffffff81367735: 20 d0 and %dl,%al
ffffffff81367737: 75 1a jne ffffffff81367753 <drm_mode_setcrtc+0x203>
DRM_DEBUG_KMS("Count connectors is %d but no mode or fb set\n",
ffffffff81367739: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff81367740: 48 c7 c2 67 7b 9b 81 mov $0xffffffff819b7b67,%rdx
ffffffff81367747: 31 c0 xor %eax,%eax
ffffffff81367749: 6a 04 pushq $0x4
ffffffff8136774b: 5e pop %rsi
ffffffff8136774c: e8 d5 c4 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff81367751: eb 14 jmp ffffffff81367767 <drm_mode_setcrtc+0x217>
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff81367753: 85 c9 test %ecx,%ecx
crtc_req->count_connectors);
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0) {
ffffffff81367755: 74 3e je ffffffff81367795 <drm_mode_setcrtc+0x245>
u32 out_id;

/* Avoid unbounded kernel memory allocation */
if (crtc_req->count_connectors > config->num_connector) {
ffffffff81367757: 48 8b 45 b0 mov -0x50(%rbp),%rax
ffffffff8136775b: 3b 88 10 04 00 00 cmp 0x410(%rax),%ecx
ffffffff81367761: 0f 86 8d 00 00 00 jbe ffffffff813677f4 <drm_mode_setcrtc+0x2a4>
ffffffff81367767: 6a ea pushq $0xffffffffffffffea
ffffffff81367769: 41 5c pop %r12
ffffffff8136776b: e9 da 01 00 00 jmpq ffffffff8136794a <drm_mode_setcrtc+0x3fa>
if (crtc_req->mode_valid) {
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
if (!crtc->primary->fb) {
DRM_DEBUG_KMS("CRTC doesn't have current FB\n");
ffffffff81367770: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff81367777: 48 c7 c2 ed 7a 9b 81 mov $0xffffffff819b7aed,%rdx
ffffffff8136777e: 31 c0 xor %eax,%eax
ffffffff81367780: 6a 04 pushq $0x4
ffffffff81367782: 5e pop %rsi
ffffffff81367783: e8 9e c4 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff81367788: 6a ea pushq $0xffffffffffffffea
ffffffff8136778a: 41 5c pop %r12
ffffffff8136778c: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff81367790: e9 b5 01 00 00 jmpq ffffffff8136794a <drm_mode_setcrtc+0x3fa>
ffffffff81367795: 31 d2 xor %edx,%edx
ffffffff81367797: 48 8d b5 38 ff ff ff lea -0xc8(%rbp),%rsi
ffffffff8136779e: e9 46 01 00 00 jmpq ffffffff813678e9 <drm_mode_setcrtc+0x399>
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ret = drm_plane_check_pixel_format(crtc->primary,
fb->format->format);
ffffffff813677a3: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff813677a7: 48 8b 40 38 mov 0x38(%rax),%rax
ffffffff813677ab: 8b 30 mov (%rax),%esi
* default formats list provided by the DRM core which doesn't
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ret = drm_plane_check_pixel_format(crtc->primary,
ffffffff813677ad: e8 40 f4 00 00 callq ffffffff81376bf2 <drm_plane_check_pixel_format>
ffffffff813677b2: 41 89 c4 mov %eax,%r12d
fb->format->format);
if (ret) {
ffffffff813677b5: 45 85 e4 test %r12d,%r12d
ffffffff813677b8: 0f 84 16 ff ff ff je ffffffff813676d4 <drm_mode_setcrtc+0x184>
struct drm_format_name_buf format_name;
DRM_DEBUG_KMS("Invalid pixel format %s\n",
ffffffff813677be: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff813677c2: 48 8b 40 38 mov 0x38(%rax),%rax
ffffffff813677c6: 8b 38 mov (%rax),%edi
ffffffff813677c8: 48 8d b5 18 ff ff ff lea -0xe8(%rbp),%rsi
ffffffff813677cf: e8 c7 03 00 00 callq ffffffff81367b9b <drm_get_format_name>
ffffffff813677d4: 48 89 c1 mov %rax,%rcx
ffffffff813677d7: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff813677de: 48 c7 c2 2a 7b 9b 81 mov $0xffffffff819b7b2a,%rdx
ffffffff813677e5: 31 c0 xor %eax,%eax
ffffffff813677e7: 6a 04 pushq $0x4
ffffffff813677e9: 5e pop %rsi
ffffffff813677ea: e8 37 c4 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff813677ef: e9 5d 01 00 00 jmpq ffffffff81367951 <drm_mode_setcrtc+0x401>
{
if (size != 0 && n > SIZE_MAX / size)
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
return kmalloc(n * size, flags);
return __kmalloc(n * size, flags);
ffffffff813677f4: 48 c1 e1 03 shl $0x3,%rcx
ffffffff813677f8: be c0 00 40 01 mov $0x14000c0,%esi
ffffffff813677fd: 48 89 cf mov %rcx,%rdi
ffffffff81367800: e8 85 78 dd ff callq ffffffff8113f08a <__kmalloc>
}

connector_set = kmalloc_array(crtc_req->count_connectors,
sizeof(struct drm_connector *),
GFP_KERNEL);
if (!connector_set) {
ffffffff81367805: 48 85 c0 test %rax,%rax
ffffffff81367808: 48 8d b5 38 ff ff ff lea -0xc8(%rbp),%rsi
ffffffff8136780f: 0f 84 c2 00 00 00 je ffffffff813678d7 <drm_mode_setcrtc+0x387>
ffffffff81367815: 31 db xor %ebx,%ebx
ffffffff81367817: 48 89 c2 mov %rax,%rdx
ffffffff8136781a: 48 89 55 c8 mov %rdx,-0x38(%rbp)
ffffffff8136781e: eb 35 jmp ffffffff81367855 <drm_mode_setcrtc+0x305>
DRM_DEBUG_KMS("Connector id %d unknown\n",
out_id);
ret = -ENOENT;
goto out;
}
DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n",
ffffffff81367820: 41 8b 4d 28 mov 0x28(%r13),%ecx
ffffffff81367824: 4d 8b 45 48 mov 0x48(%r13),%r8
ffffffff81367828: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff8136782f: 48 c7 c2 94 57 9b 81 mov $0xffffffff819b5794,%rdx
ffffffff81367836: 31 c0 xor %eax,%eax
ffffffff81367838: 6a 04 pushq $0x4
ffffffff8136783a: 5e pop %rsi
ffffffff8136783b: e8 e6 c3 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff81367840: 48 8b 45 c8 mov -0x38(%rbp),%rax
connector->base.id,
connector->name);

connector_set[i] = connector;
ffffffff81367844: 4e 89 2c f8 mov %r13,(%rax,%r15,8)
if (!connector_set) {
ret = -ENOMEM;
goto out;
}

for (i = 0; i < crtc_req->count_connectors; i++) {
ffffffff81367848: ff c3 inc %ebx
ffffffff8136784a: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff8136784e: 48 8d b5 38 ff ff ff lea -0xc8(%rbp),%rsi
ffffffff81367855: 41 8b 56 08 mov 0x8(%r14),%edx
ffffffff81367859: 39 d3 cmp %edx,%ebx
ffffffff8136785b: 0f 83 85 00 00 00 jae ffffffff813678e6 <drm_mode_setcrtc+0x396>
connector_set[i] = NULL;
ffffffff81367861: 4c 63 fb movslq %ebx,%r15
ffffffff81367864: 4a 83 24 f8 00 andq $0x0,(%rax,%r15,8)
set_connectors_ptr = (uint32_t __user *)(unsigned long)crtc_req->set_connectors_ptr;
if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff81367869: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
ffffffff81367870: 00
ffffffff81367871: 49 03 06 add (%r14),%rax
ffffffff81367874: 48 8b 65 a8 mov -0x58(%rbp),%rsp
ffffffff81367878: e8 73 26 f1 ff callq ffffffff81279ef0 <__get_user_4>
ffffffff8136787d: 49 89 d4 mov %rdx,%r12
ffffffff81367880: 48 89 65 a8 mov %rsp,-0x58(%rbp)
ffffffff81367884: 85 c0 test %eax,%eax
ffffffff81367886: 0f 85 a0 00 00 00 jne ffffffff8136792c <drm_mode_setcrtc+0x3dc>
*/
static inline struct drm_connector *drm_connector_lookup(struct drm_device *dev,
uint32_t id)
{
struct drm_mode_object *mo;
mo = drm_mode_object_find(dev, id, DRM_MODE_OBJECT_CONNECTOR);
ffffffff8136788c: ba c0 c0 c0 c0 mov $0xc0c0c0c0,%edx
ffffffff81367891: 48 8b 7d b0 mov -0x50(%rbp),%rdi
ffffffff81367895: 44 89 e6 mov %r12d,%esi
ffffffff81367898: e8 f7 db 00 00 callq ffffffff81375494 <drm_mode_object_find>
ffffffff8136789d: 49 89 c5 mov %rax,%r13
return mo ? obj_to_connector(mo) : NULL;
ffffffff813678a0: 4d 85 ed test %r13,%r13
ret = -EFAULT;
goto out;
}

connector = drm_connector_lookup(dev, out_id);
if (!connector) {
ffffffff813678a3: 74 0a je ffffffff813678af <drm_mode_setcrtc+0x35f>
ffffffff813678a5: 49 83 c5 d8 add $0xffffffffffffffd8,%r13
ffffffff813678a9: 0f 85 71 ff ff ff jne ffffffff81367820 <drm_mode_setcrtc+0x2d0>
DRM_DEBUG_KMS("Connector id %d unknown\n",
ffffffff813678af: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff813678b6: 48 c7 c2 95 7b 9b 81 mov $0xffffffff819b7b95,%rdx
ffffffff813678bd: 31 c0 xor %eax,%eax
ffffffff813678bf: 6a 04 pushq $0x4
ffffffff813678c1: 5e pop %rsi
ffffffff813678c2: 44 89 e1 mov %r12d,%ecx
ffffffff813678c5: e8 5c c3 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff813678ca: 6a fe pushq $0xfffffffffffffffe
ffffffff813678cc: 41 5c pop %r12
ffffffff813678ce: 4c 8b 7d c8 mov -0x38(%rbp),%r15
ffffffff813678d2: e9 b5 fe ff ff jmpq ffffffff8136778c <drm_mode_setcrtc+0x23c>
if (crtc_req->count_connectors > config->num_connector) {
ret = -EINVAL;
goto out;
}

connector_set = kmalloc_array(crtc_req->count_connectors,
ffffffff813678d7: 49 89 c7 mov %rax,%r15
ffffffff813678da: 6a f4 pushq $0xfffffffffffffff4
ffffffff813678dc: 41 5c pop %r12
ffffffff813678de: 31 c0 xor %eax,%eax
ffffffff813678e0: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff813678e4: eb 64 jmp ffffffff8136794a <drm_mode_setcrtc+0x3fa>
ffffffff813678e6: 49 89 c7 mov %rax,%r15

connector_set[i] = connector;
}
}

set.crtc = crtc;
ffffffff813678e9: 4c 89 ad 78 ff ff ff mov %r13,-0x88(%rbp)
set.x = crtc_req->x;
ffffffff813678f0: 41 8b 4e 14 mov 0x14(%r14),%ecx
ffffffff813678f4: 89 4d 88 mov %ecx,-0x78(%rbp)
set.y = crtc_req->y;
ffffffff813678f7: 41 8b 4e 18 mov 0x18(%r14),%ecx
ffffffff813678fb: 89 4d 8c mov %ecx,-0x74(%rbp)
set.mode = mode;
ffffffff813678fe: 48 8b 45 c0 mov -0x40(%rbp),%rax
ffffffff81367902: 48 89 45 80 mov %rax,-0x80(%rbp)
set.connectors = connector_set;
ffffffff81367906: 4c 89 7d 90 mov %r15,-0x70(%rbp)
set.num_connectors = crtc_req->count_connectors;
ffffffff8136790a: 89 d0 mov %edx,%eax
ffffffff8136790c: 48 89 45 98 mov %rax,-0x68(%rbp)
set.fb = fb;
ffffffff81367910: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff81367914: 48 89 85 70 ff ff ff mov %rax,-0x90(%rbp)
ret = __drm_mode_set_config_internal(&set, &ctx);
ffffffff8136791b: 48 8d bd 70 ff ff ff lea -0x90(%rbp),%rdi
ffffffff81367922: e8 bf fa ff ff callq ffffffff813673e6 <__drm_mode_set_config_internal>
ffffffff81367927: 41 89 c4 mov %eax,%r12d
ffffffff8136792a: eb 1e jmp ffffffff8136794a <drm_mode_setcrtc+0x3fa>
ffffffff8136792c: 6a f2 pushq $0xfffffffffffffff2
ffffffff8136792e: 41 5c pop %r12
ffffffff81367930: 4c 8b 7d c8 mov -0x38(%rbp),%r15
ffffffff81367934: eb 14 jmp ffffffff8136794a <drm_mode_setcrtc+0x3fa>
DRM_DEBUG_KMS("[CRTC:%d:%s]\n", crtc->base.id, crtc->name);

mutex_lock(&crtc->dev->mode_config.mutex);
drm_modeset_acquire_init(&ctx, 0);
retry:
ret = drm_modeset_lock_all_ctx(crtc->dev, &ctx);
ffffffff81367936: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff8136793a: 48 89 de mov %rbx,%rsi
ffffffff8136793d: e8 08 7b 00 00 callq ffffffff8136f44a <drm_modeset_lock_all_ctx>
ffffffff81367942: 41 89 c4 mov %eax,%r12d
if (ret)
ffffffff81367945: 45 85 e4 test %r12d,%r12d
ffffffff81367948: 74 16 je ffffffff81367960 <drm_mode_setcrtc+0x410>
set.num_connectors = crtc_req->count_connectors;
set.fb = fb;
ret = __drm_mode_set_config_internal(&set, &ctx);

out:
if (fb)
ffffffff8136794a: 48 83 7d d0 00 cmpq $0x0,-0x30(%rbp)
ffffffff8136794f: 74 6b je ffffffff813679bc <drm_mode_setcrtc+0x46c>
* This function decrements the framebuffer's reference count and frees the
* framebuffer if the reference count drops to zero.
*/
static inline void drm_framebuffer_put(struct drm_framebuffer *fb)
{
drm_mode_object_put(&fb->base);
ffffffff81367951: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff81367955: 48 8d 78 18 lea 0x18(%rax),%rdi
ffffffff81367959: e8 46 db 00 00 callq ffffffff813754a4 <drm_mode_object_put>
ffffffff8136795e: eb 62 jmp ffffffff813679c2 <drm_mode_setcrtc+0x472>
drm_modeset_acquire_init(&ctx, 0);
retry:
ret = drm_modeset_lock_all_ctx(crtc->dev, &ctx);
if (ret)
goto out;
if (crtc_req->mode_valid) {
ffffffff81367960: 41 83 7e 20 00 cmpl $0x0,0x20(%r14)
ffffffff81367965: 0f 84 8d fd ff ff je ffffffff813676f8 <drm_mode_setcrtc+0x1a8>
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
ffffffff8136796b: 41 8b 76 10 mov 0x10(%r14),%esi
ffffffff8136796f: 83 fe ff cmp $0xffffffff,%esi
ffffffff81367972: 0f 84 c8 fc ff ff je ffffffff81367640 <drm_mode_setcrtc+0xf0>
ffffffff81367978: 48 8b 5d b0 mov -0x50(%rbp),%rbx
}
fb = crtc->primary->fb;
/* Make refcounting symmetric with the lookup path. */
drm_framebuffer_get(fb);
} else {
fb = drm_framebuffer_lookup(dev, crtc_req->fb_id);
ffffffff8136797c: 48 89 df mov %rbx,%rdi
ffffffff8136797f: e8 a3 b3 00 00 callq ffffffff81372d27 <drm_framebuffer_lookup>
ffffffff81367984: 48 89 c1 mov %rax,%rcx
if (!fb) {
ffffffff81367987: 48 89 4d d0 mov %rcx,-0x30(%rbp)
ffffffff8136798b: 48 85 c0 test %rax,%rax
ffffffff8136798e: 0f 85 d8 fc ff ff jne ffffffff8136766c <drm_mode_setcrtc+0x11c>
DRM_DEBUG_KMS("Unknown FB ID%d\n",
ffffffff81367994: 41 8b 4e 10 mov 0x10(%r14),%ecx
ffffffff81367998: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff8136799f: 48 c7 c2 0b 7b 9b 81 mov $0xffffffff819b7b0b,%rdx
ffffffff813679a6: 31 c0 xor %eax,%eax
ffffffff813679a8: 6a 04 pushq $0x4
ffffffff813679aa: 5e pop %rsi
ffffffff813679ab: e8 76 c2 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff813679b0: 31 c0 xor %eax,%eax
ffffffff813679b2: 48 89 45 d0 mov %rax,-0x30(%rbp)
ffffffff813679b6: 6a fe pushq $0xfffffffffffffffe
ffffffff813679b8: 41 5c pop %r12
ffffffff813679ba: eb 06 jmp ffffffff813679c2 <drm_mode_setcrtc+0x472>
ffffffff813679bc: 31 c0 xor %eax,%eax
ffffffff813679be: 48 89 45 d0 mov %rax,-0x30(%rbp)

out:
if (fb)
drm_framebuffer_put(fb);

if (connector_set) {
ffffffff813679c2: 4d 85 ff test %r15,%r15
ffffffff813679c5: 74 27 je ffffffff813679ee <drm_mode_setcrtc+0x49e>
ffffffff813679c7: 31 db xor %ebx,%ebx
ffffffff813679c9: eb 17 jmp ffffffff813679e2 <drm_mode_setcrtc+0x492>
for (i = 0; i < crtc_req->count_connectors; i++) {
if (connector_set[i])
ffffffff813679cb: 48 63 c3 movslq %ebx,%rax
ffffffff813679ce: 49 8b 3c c7 mov (%r15,%rax,8),%rdi
ffffffff813679d2: 48 85 ff test %rdi,%rdi
ffffffff813679d5: 74 09 je ffffffff813679e0 <drm_mode_setcrtc+0x490>
* This function decrements the connector's reference count and frees the
* object if the reference count drops to zero.
*/
static inline void drm_connector_put(struct drm_connector *connector)
{
drm_mode_object_put(&connector->base);
ffffffff813679d7: 48 83 c7 28 add $0x28,%rdi
ffffffff813679db: e8 c4 da 00 00 callq ffffffff813754a4 <drm_mode_object_put>
out:
if (fb)
drm_framebuffer_put(fb);

if (connector_set) {
for (i = 0; i < crtc_req->count_connectors; i++) {
ffffffff813679e0: ff c3 inc %ebx
ffffffff813679e2: 41 3b 5e 08 cmp 0x8(%r14),%ebx
ffffffff813679e6: 72 e3 jb ffffffff813679cb <drm_mode_setcrtc+0x47b>
ffffffff813679e8: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff813679ec: eb 03 jmp ffffffff813679f1 <drm_mode_setcrtc+0x4a1>
ffffffff813679ee: 45 31 ff xor %r15d,%r15d
if (connector_set[i])
drm_connector_put(connector_set[i]);
}
}
kfree(connector_set);
ffffffff813679f1: 48 8b 7d c8 mov -0x38(%rbp),%rdi
ffffffff813679f5: e8 55 78 dd ff callq ffffffff8113f24f <kfree>
drm_mode_destroy(dev, mode);
ffffffff813679fa: 48 8b 7d b0 mov -0x50(%rbp),%rdi
ffffffff813679fe: 48 8b 75 c0 mov -0x40(%rbp),%rsi
ffffffff81367a02: e8 ef 04 00 00 callq ffffffff81367ef6 <drm_mode_destroy>
if (ret == -EDEADLK) {
ffffffff81367a07: 41 83 fc dd cmp $0xffffffdd,%r12d
ffffffff81367a0b: 0f 84 1b fc ff ff je ffffffff8136762c <drm_mode_setcrtc+0xdc>
ffffffff81367a11: 4c 8d b5 38 ff ff ff lea -0xc8(%rbp),%r14
drm_modeset_backoff(&ctx);
goto retry;
}
drm_modeset_drop_locks(&ctx);
ffffffff81367a18: 4c 89 f7 mov %r14,%rdi
ffffffff81367a1b: e8 62 7b 00 00 callq ffffffff8136f582 <drm_modeset_drop_locks>
drm_modeset_acquire_fini(&ctx);
ffffffff81367a20: 4c 89 f7 mov %r14,%rdi
ffffffff81367a23: e8 ad 7a 00 00 callq ffffffff8136f4d5 <drm_modeset_acquire_fini>
mutex_unlock(&crtc->dev->mode_config.mutex);
ffffffff81367a28: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff81367a2c: 48 81 c7 b0 02 00 00 add $0x2b0,%rdi
ffffffff81367a33: e8 b0 38 2e 00 callq ffffffff8164b2e8 <mutex_unlock>
ffffffff81367a38: eb 20 jmp ffffffff81367a5a <drm_mode_setcrtc+0x50a>
if (crtc_req->x & 0xffff0000 || crtc_req->y & 0xffff0000)
return -ERANGE;

crtc = drm_crtc_find(dev, crtc_req->crtc_id);
if (!crtc) {
DRM_DEBUG_KMS("Unknown CRTC ID %d\n", crtc_req->crtc_id);
ffffffff81367a3a: 41 8b 4e 0c mov 0xc(%r14),%ecx
ffffffff81367a3e: 6a 04 pushq $0x4
ffffffff81367a40: 5e pop %rsi
ffffffff81367a41: 48 c7 c7 5d 82 95 81 mov $0xffffffff8195825d,%rdi
ffffffff81367a48: 48 c7 c2 d9 7a 9b 81 mov $0xffffffff819b7ad9,%rdx
ffffffff81367a4f: 31 c0 xor %eax,%eax
ffffffff81367a51: e8 d0 c1 ff ff callq ffffffff81363c26 <drm_printk>
ffffffff81367a56: 6a fe pushq $0xfffffffffffffffe
ffffffff81367a58: 41 5c pop %r12
drm_modeset_drop_locks(&ctx);
drm_modeset_acquire_fini(&ctx);
mutex_unlock(&crtc->dev->mode_config.mutex);

return ret;
}
ffffffff81367a5a: 44 89 e0 mov %r12d,%eax
ffffffff81367a5d: 48 81 c4 c0 00 00 00 add $0xc0,%rsp
ffffffff81367a64: 5b pop %rbx
ffffffff81367a65: 41 5c pop %r12
ffffffff81367a67: 41 5d pop %r13
ffffffff81367a69: 41 5e pop %r14
ffffffff81367a6b: 41 5f pop %r15
ffffffff81367a6d: 5d pop %rbp
ffffffff81367a6e: c3 retq

============================================================

Without f05058c4d652:

ffffffff8136744c <drm_mode_setcrtc>:
* Returns:
* Zero on success, negative errno on failure.
*/
int drm_mode_setcrtc(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
ffffffff8136744c: e8 1f 71 2e 00 callq ffffffff8164e570 <__fentry__>
ffffffff81367451: 55 push %rbp
ffffffff81367452: 48 89 e5 mov %rsp,%rbp
ffffffff81367455: 41 57 push %r15
ffffffff81367457: 41 56 push %r14
ffffffff81367459: 41 55 push %r13
ffffffff8136745b: 41 54 push %r12
ffffffff8136745d: 53 push %rbx
ffffffff8136745e: 48 81 ec b8 00 00 00 sub $0xb8,%rsp
ffffffff81367465: 49 89 f6 mov %rsi,%r14
ffffffff81367468: 48 89 fb mov %rdi,%rbx
#define DRM_SWITCH_POWER_DYNAMIC_OFF 3

static __inline__ int drm_core_check_feature(struct drm_device *dev,
int feature)
{
return ((dev->driver->driver_features & feature) ? 1 : 0);
ffffffff8136746b: 48 8b 43 20 mov 0x20(%rbx),%rax
uint32_t __user *set_connectors_ptr;
struct drm_modeset_acquire_ctx ctx;
int ret;
int i;

if (!drm_core_check_feature(dev, DRIVER_MODESET))
ffffffff8136746f: f6 80 81 01 00 00 20 testb $0x20,0x181(%rax)
ffffffff81367476: 75 09 jne ffffffff81367481 <drm_mode_setcrtc+0x35>
ffffffff81367478: 6a ea pushq $0xffffffffffffffea
ffffffff8136747a: 41 5c pop %r12
ffffffff8136747c: e9 be 04 00 00 jmpq ffffffff8136793f <drm_mode_setcrtc+0x4f3>
ffffffff81367481: 6a de pushq $0xffffffffffffffde
ffffffff81367483: 41 5c pop %r12

/*
* Universal plane src offsets are only 16.16, prevent havoc for
* drivers using universal plane code internally.
*/
if (crtc_req->x & 0xffff0000 || crtc_req->y & 0xffff0000)
ffffffff81367485: 41 81 7e 14 ff ff 00 cmpl $0xffff,0x14(%r14)
ffffffff8136748c: 00
ffffffff8136748d: 0f 87 ac 04 00 00 ja ffffffff8136793f <drm_mode_setcrtc+0x4f3>
ffffffff81367493: 41 81 7e 18 ff ff 00 cmpl $0xffff,0x18(%r14)
ffffffff8136749a: 00
ffffffff8136749b: 0f 87 9e 04 00 00 ja ffffffff8136793f <drm_mode_setcrtc+0x4f3>
return -ERANGE;

crtc = drm_crtc_find(dev, crtc_req->crtc_id);
ffffffff813674a1: 41 8b 76 0c mov 0xc(%r14),%esi
ffffffff813674a5: 48 89 df mov %rbx,%rdi
ffffffff813674a8: e8 15 fe ff ff callq ffffffff813672c2 <drm_crtc_find>
ffffffff813674ad: 49 89 c5 mov %rax,%r13
if (!crtc) {
ffffffff813674b0: 4d 85 ed test %r13,%r13
ffffffff813674b3: 0f 84 66 04 00 00 je ffffffff8136791f <drm_mode_setcrtc+0x4d3>
ffffffff813674b9: 48 89 5d b0 mov %rbx,-0x50(%rbp)
DRM_DEBUG_KMS("Unknown CRTC ID %d\n", crtc_req->crtc_id);
return -ENOENT;
}
DRM_DEBUG_KMS("[CRTC:%d:%s]\n", crtc->base.id, crtc->name);
ffffffff813674bd: 41 8b 4d 78 mov 0x78(%r13),%ecx
ffffffff813674c1: 4d 8b 45 20 mov 0x20(%r13),%r8
ffffffff813674c5: 6a 04 pushq $0x4
ffffffff813674c7: 5e pop %rsi
ffffffff813674c8: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff813674cf: 48 c7 c2 d8 55 9b 81 mov $0xffffffff819b55d8,%rdx
ffffffff813674d6: 31 c0 xor %eax,%eax
ffffffff813674d8: e8 45 c6 ff ff callq ffffffff81363b22 <drm_printk>

mutex_lock(&crtc->dev->mode_config.mutex);
ffffffff813674dd: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff813674e1: 48 81 c7 b0 02 00 00 add $0x2b0,%rdi
ffffffff813674e8: e8 e6 3b 2e 00 callq ffffffff8164b0d3 <mutex_lock>
ffffffff813674ed: 48 8d 9d 40 ff ff ff lea -0xc0(%rbp),%rbx
drm_modeset_acquire_init(&ctx, 0);
ffffffff813674f4: 31 f6 xor %esi,%esi
ffffffff813674f6: 48 89 df mov %rbx,%rdi
ffffffff813674f9: e8 d8 7d 00 00 callq ffffffff8136f2d6 <drm_modeset_acquire_init>
ffffffff813674fe: 49 8d 46 24 lea 0x24(%r14),%rax
ffffffff81367502: 48 89 45 a8 mov %rax,-0x58(%rbp)
ffffffff81367506: 31 c0 xor %eax,%eax
ffffffff81367508: 48 89 45 c0 mov %rax,-0x40(%rbp)
ffffffff8136750c: 31 c0 xor %eax,%eax
ffffffff8136750e: 48 89 45 d0 mov %rax,-0x30(%rbp)
ffffffff81367512: 31 c0 xor %eax,%eax
ffffffff81367514: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff81367518: 45 31 ff xor %r15d,%r15d
ffffffff8136751b: 4c 89 6d b8 mov %r13,-0x48(%rbp)
ffffffff8136751f: e9 f7 02 00 00 jmpq ffffffff8136781b <drm_mode_setcrtc+0x3cf>
ffffffff81367524: 48 8d 9d 40 ff ff ff lea -0xc0(%rbp),%rbx
}
}
kfree(connector_set);
drm_mode_destroy(dev, mode);
if (ret == -EDEADLK) {
drm_modeset_backoff(&ctx);
ffffffff8136752b: 48 89 df mov %rbx,%rdi
ffffffff8136752e: e8 75 7e 00 00 callq ffffffff8136f3a8 <drm_modeset_backoff>
ffffffff81367533: e9 e3 02 00 00 jmpq ffffffff8136781b <drm_mode_setcrtc+0x3cf>
goto out;
if (crtc_req->mode_valid) {
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
if (!crtc->primary->fb) {
ffffffff81367538: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax
ffffffff8136753f: 48 8b 98 b0 00 00 00 mov 0xb0(%rax),%rbx
ffffffff81367546: 48 85 db test %rbx,%rbx
ffffffff81367549: 0f 84 19 01 00 00 je ffffffff81367668 <drm_mode_setcrtc+0x21c>
*
* This function increments the framebuffer's reference count.
*/
static inline void drm_framebuffer_get(struct drm_framebuffer *fb)
{
drm_mode_object_get(&fb->base);
ffffffff8136754f: 48 8d 7b 18 lea 0x18(%rbx),%rdi
ffffffff81367553: e8 45 de 00 00 callq ffffffff8137539d <drm_mode_object_get>
ffffffff81367558: 48 89 5d d0 mov %rbx,-0x30(%rbp)
ffffffff8136755c: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff81367560: 48 8b 5d b0 mov -0x50(%rbp),%rbx
ret = -ENOENT;
goto out;
}
}

mode = drm_mode_create(dev);
ffffffff81367564: 48 89 df mov %rbx,%rdi
ffffffff81367567: e8 1f 08 00 00 callq ffffffff81367d8b <drm_mode_create>
if (!mode) {
ffffffff8136756c: 48 85 c0 test %rax,%rax
ffffffff8136756f: 74 38 je ffffffff813675a9 <drm_mode_setcrtc+0x15d>
ffffffff81367571: 48 89 c1 mov %rax,%rcx
ret = -ENOMEM;
goto out;
}

ret = drm_mode_convert_umode(mode, &crtc_req->mode);
ffffffff81367574: 48 89 4d c0 mov %rcx,-0x40(%rbp)
ffffffff81367578: 48 89 c7 mov %rax,%rdi
ffffffff8136757b: 48 8b 75 a8 mov -0x58(%rbp),%rsi
ffffffff8136757f: e8 27 1b 00 00 callq ffffffff813690ab <drm_mode_convert_umode>
ffffffff81367584: 41 89 c4 mov %eax,%r12d
if (ret) {
ffffffff81367587: 45 85 e4 test %r12d,%r12d
ffffffff8136758a: 74 2c je ffffffff813675b8 <drm_mode_setcrtc+0x16c>
DRM_DEBUG_KMS("Invalid mode\n");
ffffffff8136758c: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff81367593: 48 c7 c2 0c 7b 9b 81 mov $0xffffffff819b7b0c,%rdx
ffffffff8136759a: 31 c0 xor %eax,%eax
ffffffff8136759c: 6a 04 pushq $0x4
ffffffff8136759e: 5e pop %rsi
ffffffff8136759f: e8 7e c5 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff813675a4: e9 8d 02 00 00 jmpq ffffffff81367836 <drm_mode_setcrtc+0x3ea>
ffffffff813675a9: 31 c0 xor %eax,%eax
ffffffff813675ab: 48 89 45 c0 mov %rax,-0x40(%rbp)
ffffffff813675af: 6a f4 pushq $0xfffffffffffffff4
ffffffff813675b1: 41 5c pop %r12
ffffffff813675b3: e9 7e 02 00 00 jmpq ffffffff81367836 <drm_mode_setcrtc+0x3ea>
* Drivers not implementing the universal planes API use a
* default formats list provided by the DRM core which doesn't
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ffffffff813675b8: 49 8b bd 98 00 00 00 mov 0x98(%r13),%rdi
ffffffff813675bf: 80 bf a4 00 00 00 00 cmpb $0x0,0xa4(%rdi)
ffffffff813675c6: 0f 84 cf 00 00 00 je ffffffff8136769b <drm_mode_setcrtc+0x24f>
&format_name));
goto out;
}
}

ret = drm_crtc_check_viewport(crtc, crtc_req->x, crtc_req->y,
ffffffff813675cc: 41 8b 76 14 mov 0x14(%r14),%esi
ffffffff813675d0: 41 8b 56 18 mov 0x18(%r14),%edx
ffffffff813675d4: 4c 89 ef mov %r13,%rdi
ffffffff813675d7: 48 8b 4d c0 mov -0x40(%rbp),%rcx
ffffffff813675db: 4c 8b 45 d0 mov -0x30(%rbp),%r8
ffffffff813675df: e8 e2 fd ff ff callq ffffffff813673c6 <drm_crtc_check_viewport>
ffffffff813675e4: 41 89 c4 mov %eax,%r12d
mode, fb);
if (ret)
ffffffff813675e7: 45 85 e4 test %r12d,%r12d
ffffffff813675ea: 0f 85 46 02 00 00 jne ffffffff81367836 <drm_mode_setcrtc+0x3ea>
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff813675f0: 41 8b 4e 08 mov 0x8(%r14),%ecx
ffffffff813675f4: 48 8b 45 c0 mov -0x40(%rbp),%rax
ffffffff813675f8: 48 85 c0 test %rax,%rax
ffffffff813675fb: 74 1e je ffffffff8136761b <drm_mode_setcrtc+0x1cf>
ffffffff813675fd: 85 c9 test %ecx,%ecx
ffffffff813675ff: 75 1a jne ffffffff8136761b <drm_mode_setcrtc+0x1cf>
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ffffffff81367601: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff81367608: 48 c7 c2 33 7b 9b 81 mov $0xffffffff819b7b33,%rdx
ffffffff8136760f: 31 c0 xor %eax,%eax
ffffffff81367611: 6a 04 pushq $0x4
ffffffff81367613: 5e pop %rsi
ffffffff81367614: e8 09 c5 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff81367619: eb 44 jmp ffffffff8136765f <drm_mode_setcrtc+0x213>
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff8136761b: 48 85 c0 test %rax,%rax
ffffffff8136761e: 0f 95 c0 setne %al
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0 && (!mode || !fb)) {
ffffffff81367621: 48 83 7d d0 00 cmpq $0x0,-0x30(%rbp)
ffffffff81367626: 0f 95 c2 setne %dl
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff81367629: 85 c9 test %ecx,%ecx
DRM_DEBUG_KMS("Count connectors is 0 but mode set\n");
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0 && (!mode || !fb)) {
ffffffff8136762b: 74 1e je ffffffff8136764b <drm_mode_setcrtc+0x1ff>
ffffffff8136762d: 20 d0 and %dl,%al
ffffffff8136762f: 75 1a jne ffffffff8136764b <drm_mode_setcrtc+0x1ff>
DRM_DEBUG_KMS("Count connectors is %d but no mode or fb set\n",
ffffffff81367631: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff81367638: 48 c7 c2 57 7b 9b 81 mov $0xffffffff819b7b57,%rdx
ffffffff8136763f: 31 c0 xor %eax,%eax
ffffffff81367641: 6a 04 pushq $0x4
ffffffff81367643: 5e pop %rsi
ffffffff81367644: e8 d9 c4 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff81367649: eb 14 jmp ffffffff8136765f <drm_mode_setcrtc+0x213>
if (ret)
goto out;

}

if (crtc_req->count_connectors == 0 && mode) {
ffffffff8136764b: 85 c9 test %ecx,%ecx
crtc_req->count_connectors);
ret = -EINVAL;
goto out;
}

if (crtc_req->count_connectors > 0) {
ffffffff8136764d: 74 3e je ffffffff8136768d <drm_mode_setcrtc+0x241>
u32 out_id;

/* Avoid unbounded kernel memory allocation */
if (crtc_req->count_connectors > config->num_connector) {
ffffffff8136764f: 48 8b 45 b0 mov -0x50(%rbp),%rax
ffffffff81367653: 3b 88 10 04 00 00 cmp 0x410(%rax),%ecx
ffffffff81367659: 0f 86 8d 00 00 00 jbe ffffffff813676ec <drm_mode_setcrtc+0x2a0>
ffffffff8136765f: 6a ea pushq $0xffffffffffffffea
ffffffff81367661: 41 5c pop %r12
ffffffff81367663: e9 c7 01 00 00 jmpq ffffffff8136782f <drm_mode_setcrtc+0x3e3>
if (crtc_req->mode_valid) {
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
if (!crtc->primary->fb) {
DRM_DEBUG_KMS("CRTC doesn't have current FB\n");
ffffffff81367668: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff8136766f: 48 c7 c2 dd 7a 9b 81 mov $0xffffffff819b7add,%rdx
ffffffff81367676: 31 c0 xor %eax,%eax
ffffffff81367678: 6a 04 pushq $0x4
ffffffff8136767a: 5e pop %rsi
ffffffff8136767b: e8 a2 c4 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff81367680: 6a ea pushq $0xffffffffffffffea
ffffffff81367682: 41 5c pop %r12
ffffffff81367684: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff81367688: e9 a2 01 00 00 jmpq ffffffff8136782f <drm_mode_setcrtc+0x3e3>
ffffffff8136768d: 31 d2 xor %edx,%edx
ffffffff8136768f: 48 8d b5 40 ff ff ff lea -0xc0(%rbp),%rsi
ffffffff81367696: e9 36 01 00 00 jmpq ffffffff813677d1 <drm_mode_setcrtc+0x385>
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ret = drm_plane_check_pixel_format(crtc->primary,
fb->format->format);
ffffffff8136769b: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff8136769f: 48 8b 40 38 mov 0x38(%rax),%rax
ffffffff813676a3: 8b 30 mov (%rax),%esi
* default formats list provided by the DRM core which doesn't
* match real hardware capabilities. Skip the check in that
* case.
*/
if (!crtc->primary->format_default) {
ret = drm_plane_check_pixel_format(crtc->primary,
ffffffff813676a5: e8 ee f3 00 00 callq ffffffff81376a98 <drm_plane_check_pixel_format>
ffffffff813676aa: 41 89 c4 mov %eax,%r12d
fb->format->format);
if (ret) {
ffffffff813676ad: 45 85 e4 test %r12d,%r12d
ffffffff813676b0: 0f 84 16 ff ff ff je ffffffff813675cc <drm_mode_setcrtc+0x180>
struct drm_format_name_buf format_name;
DRM_DEBUG_KMS("Invalid pixel format %s\n",
ffffffff813676b6: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff813676ba: 48 8b 40 38 mov 0x38(%rax),%rax
ffffffff813676be: 8b 38 mov (%rax),%edi
ffffffff813676c0: 48 8d b5 20 ff ff ff lea -0xe0(%rbp),%rsi
ffffffff813676c7: e8 b4 03 00 00 callq ffffffff81367a80 <drm_get_format_name>
ffffffff813676cc: 48 89 c1 mov %rax,%rcx
ffffffff813676cf: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff813676d6: 48 c7 c2 1a 7b 9b 81 mov $0xffffffff819b7b1a,%rdx
ffffffff813676dd: 31 c0 xor %eax,%eax
ffffffff813676df: 6a 04 pushq $0x4
ffffffff813676e1: 5e pop %rsi
ffffffff813676e2: e8 3b c4 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff813676e7: e9 4a 01 00 00 jmpq ffffffff81367836 <drm_mode_setcrtc+0x3ea>
{
if (size != 0 && n > SIZE_MAX / size)
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
return kmalloc(n * size, flags);
return __kmalloc(n * size, flags);
ffffffff813676ec: 48 c1 e1 03 shl $0x3,%rcx
ffffffff813676f0: be c0 00 40 01 mov $0x14000c0,%esi
ffffffff813676f5: 48 89 cf mov %rcx,%rdi
ffffffff813676f8: e8 cc 78 dd ff callq ffffffff8113efc9 <__kmalloc>
}

connector_set = kmalloc_array(crtc_req->count_connectors,
sizeof(struct drm_connector *),
GFP_KERNEL);
if (!connector_set) {
ffffffff813676fd: 48 85 c0 test %rax,%rax
ffffffff81367700: 48 8d b5 40 ff ff ff lea -0xc0(%rbp),%rsi
ffffffff81367707: 0f 84 b2 00 00 00 je ffffffff813677bf <drm_mode_setcrtc+0x373>
ffffffff8136770d: 31 db xor %ebx,%ebx
ffffffff8136770f: 48 89 c2 mov %rax,%rdx
ffffffff81367712: 48 89 55 c8 mov %rdx,-0x38(%rbp)
ffffffff81367716: eb 35 jmp ffffffff8136774d <drm_mode_setcrtc+0x301>
DRM_DEBUG_KMS("Connector id %d unknown\n",
out_id);
ret = -ENOENT;
goto out;
}
DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n",
ffffffff81367718: 41 8b 4d 28 mov 0x28(%r13),%ecx
ffffffff8136771c: 4d 8b 45 48 mov 0x48(%r13),%r8
ffffffff81367720: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff81367727: 48 c7 c2 84 57 9b 81 mov $0xffffffff819b5784,%rdx
ffffffff8136772e: 31 c0 xor %eax,%eax
ffffffff81367730: 6a 04 pushq $0x4
ffffffff81367732: 5e pop %rsi
ffffffff81367733: e8 ea c3 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff81367738: 48 8b 45 c8 mov -0x38(%rbp),%rax
connector->base.id,
connector->name);

connector_set[i] = connector;
ffffffff8136773c: 4e 89 2c f8 mov %r13,(%rax,%r15,8)
if (!connector_set) {
ret = -ENOMEM;
goto out;
}

for (i = 0; i < crtc_req->count_connectors; i++) {
ffffffff81367740: ff c3 inc %ebx
ffffffff81367742: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff81367746: 48 8d b5 40 ff ff ff lea -0xc0(%rbp),%rsi
ffffffff8136774d: 41 8b 56 08 mov 0x8(%r14),%edx
ffffffff81367751: 39 d3 cmp %edx,%ebx
ffffffff81367753: 73 79 jae ffffffff813677ce <drm_mode_setcrtc+0x382>
connector_set[i] = NULL;
ffffffff81367755: 4c 63 fb movslq %ebx,%r15
ffffffff81367758: 4a 83 24 f8 00 andq $0x0,(%rax,%r15,8)
set_connectors_ptr = (uint32_t __user *)(unsigned long)crtc_req->set_connectors_ptr;
if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff8136775d: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
ffffffff81367764: 00
ffffffff81367765: 49 03 06 add (%r14),%rax
ffffffff81367768: e8 83 26 f1 ff callq ffffffff81279df0 <__get_user_4>
ffffffff8136776d: 49 89 d4 mov %rdx,%r12
ffffffff81367770: 85 c0 test %eax,%eax
ffffffff81367772: 0f 85 99 00 00 00 jne ffffffff81367811 <drm_mode_setcrtc+0x3c5>
*/
static inline struct drm_connector *drm_connector_lookup(struct drm_device *dev,
uint32_t id)
{
struct drm_mode_object *mo;
mo = drm_mode_object_find(dev, id, DRM_MODE_OBJECT_CONNECTOR);
ffffffff81367778: ba c0 c0 c0 c0 mov $0xc0c0c0c0,%edx
ffffffff8136777d: 48 8b 7d b0 mov -0x50(%rbp),%rdi
ffffffff81367781: 44 89 e6 mov %r12d,%esi
ffffffff81367784: e8 b1 db 00 00 callq ffffffff8137533a <drm_mode_object_find>
ffffffff81367789: 49 89 c5 mov %rax,%r13
return mo ? obj_to_connector(mo) : NULL;
ffffffff8136778c: 4d 85 ed test %r13,%r13
ret = -EFAULT;
goto out;
}

connector = drm_connector_lookup(dev, out_id);
if (!connector) {
ffffffff8136778f: 74 06 je ffffffff81367797 <drm_mode_setcrtc+0x34b>
ffffffff81367791: 49 83 c5 d8 add $0xffffffffffffffd8,%r13
ffffffff81367795: 75 81 jne ffffffff81367718 <drm_mode_setcrtc+0x2cc>
DRM_DEBUG_KMS("Connector id %d unknown\n",
ffffffff81367797: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff8136779e: 48 c7 c2 85 7b 9b 81 mov $0xffffffff819b7b85,%rdx
ffffffff813677a5: 31 c0 xor %eax,%eax
ffffffff813677a7: 6a 04 pushq $0x4
ffffffff813677a9: 5e pop %rsi
ffffffff813677aa: 44 89 e1 mov %r12d,%ecx
ffffffff813677ad: e8 70 c3 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff813677b2: 6a fe pushq $0xfffffffffffffffe
ffffffff813677b4: 41 5c pop %r12
ffffffff813677b6: 4c 8b 7d c8 mov -0x38(%rbp),%r15
ffffffff813677ba: e9 c5 fe ff ff jmpq ffffffff81367684 <drm_mode_setcrtc+0x238>
if (crtc_req->count_connectors > config->num_connector) {
ret = -EINVAL;
goto out;
}

connector_set = kmalloc_array(crtc_req->count_connectors,
ffffffff813677bf: 49 89 c7 mov %rax,%r15
ffffffff813677c2: 6a f4 pushq $0xfffffffffffffff4
ffffffff813677c4: 41 5c pop %r12
ffffffff813677c6: 31 c0 xor %eax,%eax
ffffffff813677c8: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff813677cc: eb 61 jmp ffffffff8136782f <drm_mode_setcrtc+0x3e3>
ffffffff813677ce: 49 89 c7 mov %rax,%r15

connector_set[i] = connector;
}
}

set.crtc = crtc;
ffffffff813677d1: 4c 89 6d 80 mov %r13,-0x80(%rbp)
set.x = crtc_req->x;
ffffffff813677d5: 41 8b 4e 14 mov 0x14(%r14),%ecx
ffffffff813677d9: 89 4d 90 mov %ecx,-0x70(%rbp)
set.y = crtc_req->y;
ffffffff813677dc: 41 8b 4e 18 mov 0x18(%r14),%ecx
ffffffff813677e0: 89 4d 94 mov %ecx,-0x6c(%rbp)
set.mode = mode;
ffffffff813677e3: 48 8b 45 c0 mov -0x40(%rbp),%rax
ffffffff813677e7: 48 89 45 88 mov %rax,-0x78(%rbp)
set.connectors = connector_set;
ffffffff813677eb: 4c 89 7d 98 mov %r15,-0x68(%rbp)
set.num_connectors = crtc_req->count_connectors;
ffffffff813677ef: 89 d0 mov %edx,%eax
ffffffff813677f1: 48 89 45 a0 mov %rax,-0x60(%rbp)
set.fb = fb;
ffffffff813677f5: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff813677f9: 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp)
ret = __drm_mode_set_config_internal(&set, &ctx);
ffffffff81367800: 48 8d bd 78 ff ff ff lea -0x88(%rbp),%rdi
ffffffff81367807: e8 d6 fa ff ff callq ffffffff813672e2 <__drm_mode_set_config_internal>
ffffffff8136780c: 41 89 c4 mov %eax,%r12d
ffffffff8136780f: eb 1e jmp ffffffff8136782f <drm_mode_setcrtc+0x3e3>
ffffffff81367811: 6a f2 pushq $0xfffffffffffffff2
ffffffff81367813: 41 5c pop %r12
ffffffff81367815: 4c 8b 7d c8 mov -0x38(%rbp),%r15
ffffffff81367819: eb 14 jmp ffffffff8136782f <drm_mode_setcrtc+0x3e3>
DRM_DEBUG_KMS("[CRTC:%d:%s]\n", crtc->base.id, crtc->name);

mutex_lock(&crtc->dev->mode_config.mutex);
drm_modeset_acquire_init(&ctx, 0);
retry:
ret = drm_modeset_lock_all_ctx(crtc->dev, &ctx);
ffffffff8136781b: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff8136781f: 48 89 de mov %rbx,%rsi
ffffffff81367822: e8 08 7b 00 00 callq ffffffff8136f32f <drm_modeset_lock_all_ctx>
ffffffff81367827: 41 89 c4 mov %eax,%r12d
if (ret)
ffffffff8136782a: 45 85 e4 test %r12d,%r12d
ffffffff8136782d: 74 16 je ffffffff81367845 <drm_mode_setcrtc+0x3f9>
set.num_connectors = crtc_req->count_connectors;
set.fb = fb;
ret = __drm_mode_set_config_internal(&set, &ctx);

out:
if (fb)
ffffffff8136782f: 48 83 7d d0 00 cmpq $0x0,-0x30(%rbp)
ffffffff81367834: 74 6b je ffffffff813678a1 <drm_mode_setcrtc+0x455>
* This function decrements the framebuffer's reference count and frees the
* framebuffer if the reference count drops to zero.
*/
static inline void drm_framebuffer_put(struct drm_framebuffer *fb)
{
drm_mode_object_put(&fb->base);
ffffffff81367836: 48 8b 45 d0 mov -0x30(%rbp),%rax
ffffffff8136783a: 48 8d 78 18 lea 0x18(%rax),%rdi
ffffffff8136783e: e8 07 db 00 00 callq ffffffff8137534a <drm_mode_object_put>
ffffffff81367843: eb 62 jmp ffffffff813678a7 <drm_mode_setcrtc+0x45b>
drm_modeset_acquire_init(&ctx, 0);
retry:
ret = drm_modeset_lock_all_ctx(crtc->dev, &ctx);
if (ret)
goto out;
if (crtc_req->mode_valid) {
ffffffff81367845: 41 83 7e 20 00 cmpl $0x0,0x20(%r14)
ffffffff8136784a: 0f 84 a0 fd ff ff je ffffffff813675f0 <drm_mode_setcrtc+0x1a4>
/* If we have a mode we need a framebuffer. */
/* If we pass -1, set the mode with the currently bound fb */
if (crtc_req->fb_id == -1) {
ffffffff81367850: 41 8b 76 10 mov 0x10(%r14),%esi
ffffffff81367854: 83 fe ff cmp $0xffffffff,%esi
ffffffff81367857: 0f 84 db fc ff ff je ffffffff81367538 <drm_mode_setcrtc+0xec>
ffffffff8136785d: 48 8b 5d b0 mov -0x50(%rbp),%rbx
}
fb = crtc->primary->fb;
/* Make refcounting symmetric with the lookup path. */
drm_framebuffer_get(fb);
} else {
fb = drm_framebuffer_lookup(dev, crtc_req->fb_id);
ffffffff81367861: 48 89 df mov %rbx,%rdi
ffffffff81367864: e8 64 b3 00 00 callq ffffffff81372bcd <drm_framebuffer_lookup>
ffffffff81367869: 48 89 c1 mov %rax,%rcx
if (!fb) {
ffffffff8136786c: 48 89 4d d0 mov %rcx,-0x30(%rbp)
ffffffff81367870: 48 85 c0 test %rax,%rax
ffffffff81367873: 0f 85 eb fc ff ff jne ffffffff81367564 <drm_mode_setcrtc+0x118>
DRM_DEBUG_KMS("Unknown FB ID%d\n",
ffffffff81367879: 41 8b 4e 10 mov 0x10(%r14),%ecx
ffffffff8136787d: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff81367884: 48 c7 c2 fb 7a 9b 81 mov $0xffffffff819b7afb,%rdx
ffffffff8136788b: 31 c0 xor %eax,%eax
ffffffff8136788d: 6a 04 pushq $0x4
ffffffff8136788f: 5e pop %rsi
ffffffff81367890: e8 8d c2 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff81367895: 31 c0 xor %eax,%eax
ffffffff81367897: 48 89 45 d0 mov %rax,-0x30(%rbp)
ffffffff8136789b: 6a fe pushq $0xfffffffffffffffe
ffffffff8136789d: 41 5c pop %r12
ffffffff8136789f: eb 06 jmp ffffffff813678a7 <drm_mode_setcrtc+0x45b>
ffffffff813678a1: 31 c0 xor %eax,%eax
ffffffff813678a3: 48 89 45 d0 mov %rax,-0x30(%rbp)

out:
if (fb)
drm_framebuffer_put(fb);

if (connector_set) {
ffffffff813678a7: 4d 85 ff test %r15,%r15
ffffffff813678aa: 74 27 je ffffffff813678d3 <drm_mode_setcrtc+0x487>
ffffffff813678ac: 31 db xor %ebx,%ebx
ffffffff813678ae: eb 17 jmp ffffffff813678c7 <drm_mode_setcrtc+0x47b>
for (i = 0; i < crtc_req->count_connectors; i++) {
if (connector_set[i])
ffffffff813678b0: 48 63 c3 movslq %ebx,%rax
ffffffff813678b3: 49 8b 3c c7 mov (%r15,%rax,8),%rdi
ffffffff813678b7: 48 85 ff test %rdi,%rdi
ffffffff813678ba: 74 09 je ffffffff813678c5 <drm_mode_setcrtc+0x479>
* This function decrements the connector's reference count and frees the
* object if the reference count drops to zero.
*/
static inline void drm_connector_put(struct drm_connector *connector)
{
drm_mode_object_put(&connector->base);
ffffffff813678bc: 48 83 c7 28 add $0x28,%rdi
ffffffff813678c0: e8 85 da 00 00 callq ffffffff8137534a <drm_mode_object_put>
out:
if (fb)
drm_framebuffer_put(fb);

if (connector_set) {
for (i = 0; i < crtc_req->count_connectors; i++) {
ffffffff813678c5: ff c3 inc %ebx
ffffffff813678c7: 41 3b 5e 08 cmp 0x8(%r14),%ebx
ffffffff813678cb: 72 e3 jb ffffffff813678b0 <drm_mode_setcrtc+0x464>
ffffffff813678cd: 4c 8b 6d b8 mov -0x48(%rbp),%r13
ffffffff813678d1: eb 03 jmp ffffffff813678d6 <drm_mode_setcrtc+0x48a>
ffffffff813678d3: 45 31 ff xor %r15d,%r15d
if (connector_set[i])
drm_connector_put(connector_set[i]);
}
}
kfree(connector_set);
ffffffff813678d6: 48 8b 7d c8 mov -0x38(%rbp),%rdi
ffffffff813678da: e8 af 78 dd ff callq ffffffff8113f18e <kfree>
drm_mode_destroy(dev, mode);
ffffffff813678df: 48 8b 7d b0 mov -0x50(%rbp),%rdi
ffffffff813678e3: 48 8b 75 c0 mov -0x40(%rbp),%rsi
ffffffff813678e7: e8 ef 04 00 00 callq ffffffff81367ddb <drm_mode_destroy>
if (ret == -EDEADLK) {
ffffffff813678ec: 41 83 fc dd cmp $0xffffffdd,%r12d
ffffffff813678f0: 0f 84 2e fc ff ff je ffffffff81367524 <drm_mode_setcrtc+0xd8>
ffffffff813678f6: 4c 8d b5 40 ff ff ff lea -0xc0(%rbp),%r14
drm_modeset_backoff(&ctx);
goto retry;
}
drm_modeset_drop_locks(&ctx);
ffffffff813678fd: 4c 89 f7 mov %r14,%rdi
ffffffff81367900: e8 62 7b 00 00 callq ffffffff8136f467 <drm_modeset_drop_locks>
drm_modeset_acquire_fini(&ctx);
ffffffff81367905: 4c 89 f7 mov %r14,%rdi
ffffffff81367908: e8 ad 7a 00 00 callq ffffffff8136f3ba <drm_modeset_acquire_fini>
mutex_unlock(&crtc->dev->mode_config.mutex);
ffffffff8136790d: 49 8b 7d 00 mov 0x0(%r13),%rdi
ffffffff81367911: 48 81 c7 b0 02 00 00 add $0x2b0,%rdi
ffffffff81367918: e8 fb 37 2e 00 callq ffffffff8164b118 <mutex_unlock>
ffffffff8136791d: eb 20 jmp ffffffff8136793f <drm_mode_setcrtc+0x4f3>
if (crtc_req->x & 0xffff0000 || crtc_req->y & 0xffff0000)
return -ERANGE;

crtc = drm_crtc_find(dev, crtc_req->crtc_id);
if (!crtc) {
DRM_DEBUG_KMS("Unknown CRTC ID %d\n", crtc_req->crtc_id);
ffffffff8136791f: 41 8b 4e 0c mov 0xc(%r14),%ecx
ffffffff81367923: 6a 04 pushq $0x4
ffffffff81367925: 5e pop %rsi
ffffffff81367926: 48 c7 c7 4d 82 95 81 mov $0xffffffff8195824d,%rdi
ffffffff8136792d: 48 c7 c2 c9 7a 9b 81 mov $0xffffffff819b7ac9,%rdx
ffffffff81367934: 31 c0 xor %eax,%eax
ffffffff81367936: e8 e7 c1 ff ff callq ffffffff81363b22 <drm_printk>
ffffffff8136793b: 6a fe pushq $0xfffffffffffffffe
ffffffff8136793d: 41 5c pop %r12
drm_modeset_drop_locks(&ctx);
drm_modeset_acquire_fini(&ctx);
mutex_unlock(&crtc->dev->mode_config.mutex);

return ret;
}
ffffffff8136793f: 44 89 e0 mov %r12d,%eax
ffffffff81367942: 48 81 c4 b8 00 00 00 add $0xb8,%rsp
ffffffff81367949: 5b pop %rbx
ffffffff8136794a: 41 5c pop %r12
ffffffff8136794c: 41 5d pop %r13
ffffffff8136794e: 41 5e pop %r14
ffffffff81367950: 41 5f pop %r15
ffffffff81367952: 5d pop %rbp
ffffffff81367953: c3 retq

============================================================

Double fault with f05058c4d652:

[ 3.798722] PANIC: double fault, error_code: 0x0
[ 3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
[ 3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[ 3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
[ 3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
[ 3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
[ 3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
[ 3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
[ 3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
[ 3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
[ 3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
[ 3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
[ 3.890136] FS: 00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
[ 3.899177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0
[ 3.913568] Call Trace:
[ 3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba
[ 3.937440] Kernel panic - not syncing: Machine halted.
[ 3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
[ 3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[ 3.960671] Call Trace:
[ 3.963398] <#DF>
[ 3.965637] __dump_stack+0x19/0x1b
[ 3.969531] dump_stack+0x42/0x60
[ 3.969541] PANIC: double fault, error_code: 0x0
[ 3.969545] CPU: 0 PID: 719 Comm: rsyslogd Not tainted 4.12.0-00023-g711d82c128ff #107
[ 3.969546] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[ 3.969548] task: ffff880075210000 task.stack: ffffc90000554000
[ 3.969554] RIP: 0010:SyS_setgroups+0x6b/0xd4
[ 3.969556] RSP: 0018:ffffffff8106360d EFLAGS: 00010297
[ 3.969558] RAX: 0000558afd6664e0 RBX: 0000000000000000 RCX: 0000000000000001
[ 3.969559] RDX: 000000000000000c RSI: 0000000000000000 RDI: ffffffff8106360d
[ 3.969560] RBP: ffffc90000557f48 R08: ffffffffffffffea R09: ffffc90000557e50
[ 3.969562] R10: 00007fb4acc1d0d0 R11: ffff880075210000 R12: 0000558afd6664e0
[ 3.969563] R13: 00007fb4acdf9780 R14: fffffffffffffff2 R15: ffff880076d13f50
[ 3.969565] FS: 00007fb4acdf9780(0000) GS:ffff88007ac00000(0000) knlGS:0000000000000000
[ 3.969566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.969568] CR2: ffffffff810635f8 CR3: 00000000751fb000 CR4: 00000000001006f0
[ 3.969569] Call Trace:
[ 3.969571] Code: 5d c3 89 df e8 9c fd ff ff 49 89 c7 4d 85 ff 74 5b 41 8b 4f 04 31 f6 6a f2 41 5e 6a ea 41 58 eb 22 48 63 de 49 8d 04 9c 48 89 fc <e8> 7c 66 21 00 85 c0 75 41 83 fa ff 74 39 48 89 e7 41 89 54 9f
[ 4.100376] panic+0xf0/0x23e
[ 4.103688] df_debug+0x31/0x31
[ 4.107193] do_double_fault+0xd8/0xfb
[ 4.111379] double_fault+0x22/0x30
[ 4.115274] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
[ 4.120624] RSP: 0018:0000000000000000 EFLAGS: 00010206
[ 4.126451] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
[ 4.134424] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
[ 4.142397] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
[ 4.150369] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
[ 4.158341] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
[ 4.166316] ? drm_mode_setcrtc+0x2b5/0x51f
[ 4.170988] WARNING: kernel stack regs at ffff88007ad05f58 in frecon:605 has bad 'bp' value ffffc90000d6fd90
[ 4.170990] unwind stack type:0 next_sp: (null) mask:0x10 graph_idx:0
[ 4.170993] ffff88007ad05d40: ffff88007ad05e20 (0xffff88007ad05e20)
[ 4.170995] ffff88007ad05d48: ffffffff810184b0 (show_trace_log_lvl+0x163/0x23a)
[ 4.170996] ffff88007ad05d50: 0000000000000000 ...
[ 4.170998] ffff88007ad05d58: ffff88007ad05000 (0xffff88007ad05000)
[ 4.170999] ffff88007ad05d60: ffff88007ad06000 (0xffff88007ad06000)
[ 4.171000] ffff88007ad05d68: 0000000000000000 ...
[ 4.171001] ffff88007ad05d70: 0000000000000010 (0x10)
[ 4.171003] ffff88007ad05d78: ffff880075b92f00 (0xffff880075b92f00)
[ 4.171004] ffff88007ad05d80: 0000010100000000 (0x10100000000)
[ 4.171005] ffff88007ad05d88: 0000000000000000 ...
[ 4.171006] ffff88007ad05d90: ffff88007ad05d40 (0xffff88007ad05d40)
[ 4.171008] ffff88007ad05d98: ffff88007ad05f58 (0xffff88007ad05f58)
[ 4.171010] ffff88007ad05da0: ffffffff81367878 (drm_mode_setcrtc+0x328/0x51f)
[ 4.171011] ffff88007ad05da8: 0000000000000004 (0x4)
[ 4.171013] ffff88007ad05db0: ffff88007ad05000 (0xffff88007ad05000)
[ 4.171014] ffff88007ad05db8: ffff88007ad06000 (0xffff88007ad06000)
[ 4.171015] ffff88007ad05dc0: 0000000000000000 ...
[ 4.171016] ffff88007ad05dc8: 0000000000000010 (0x10)
[ 4.171018] ffff88007ad05dd0: ffff88007ad05f58 (0xffff88007ad05f58)
[ 4.171019] ffff88007ad05dd8: ffff880075b92f00 (0xffff880075b92f00)
[ 4.171020] ffff88007ad05de0: ffffffff8195ba29 (0xffffffff8195ba29)
[ 4.171022] ffff88007ad05de8: 0000000081e70fc0 (0x81e70fc0)
[ 4.171023] ffff88007ad05df0: ffffffff8195b9f9 (0xffffffff8195b9f9)
[ 4.171024] ffff88007ad05df8: 0000000000000086 (0x86)
[ 4.171025] ffff88007ad05e00: 0000000000000000 ...
[ 4.171027] ffff88007ad05e08: ffff880077a25000 (0xffff880077a25000)
[ 4.171028] ffff88007ad05e10: ffff88007ad05f00 (0xffff88007ad05f00)
[ 4.171029] ffff88007ad05e18: ffff880075b92f00 (0xffff880075b92f00)
[ 4.171031] ffff88007ad05e20: ffff88007ad05e30 (0xffff88007ad05e30)
[ 4.171033] ffff88007ad05e28: ffffffff81018647 (show_stack+0x45/0x47)
[ 4.171034] ffff88007ad05e30: ffff88007ad05e40 (0xffff88007ad05e40)
[ 4.171038] ffff88007ad05e38: ffffffff8126f18f (__dump_stack+0x19/0x1b)
[ 4.171039] ffff88007ad05e40: ffff88007ad05e70 (0xffff88007ad05e70)
[ 4.171042] ffff88007ad05e48: ffffffff8126f158 (dump_stack+0x42/0x60)
[ 4.171043] ffff88007ad05e50: 0000000000000086 (0x86)
[ 4.171044] ffff88007ad05e58: 0000000000000000 ...
[ 4.171045] ffff88007ad05e60: 0000000200000000 (0x200000000)
[ 4.171046] ffff88007ad05e68: ffffffff819641ff (0xffffffff819641ff)
[ 4.171048] ffff88007ad05e70: ffff88007ad05f08 (0xffff88007ad05f08)
[ 4.171050] ffff88007ad05e78: ffffffff81047bea (panic+0xf0/0x23e)
[ 4.171051] ffff88007ad05e80: ffff88007ad05e90 (0xffff88007ad05e90)
[ 4.171052] ffff88007ad05e88: 000000007596b000 (0x7596b000)
[ 4.171054] ffff88007ad05e90: 00000000001006e0 (0x1006e0)
[ 4.171055] ffff88007ad05e98: 0000000000000002 (0x2)
[ 4.171056] ffff88007ad05ea0: 0000000000000000 ...
[ 4.171057] ffff88007ad05ea8: ffffffff81c42180 (0xffffffff81c42180)
[ 4.171058] ffff88007ad05eb0: 0000000200000000 (0x200000000)
[ 4.171060] ffff88007ad05eb8: 00000000ffff0a30 (0xffff0a30)
[ 4.171061] ffff88007ad05ec0: 0000003000000008 (0x3000000008)
[ 4.171062] ffff88007ad05ec8: ffff88007ad05f18 (0xffff88007ad05f18)
[ 4.171064] ffff88007ad05ed0: ffff88007ad05e90 (0xffff88007ad05e90)
[ 4.171065] ffff88007ad05ed8: ba00000000000024 (0xba00000000000024)
[ 4.171066] ffff88007ad05ee0: ffff88007ad05f58 (0xffff88007ad05f58)
[ 4.171067] ffff88007ad05ee8: 0000000000000000 ...
[ 4.171069] ffff88007ad05ef0: ffff880077a25000 (0xffff880077a25000)
[ 4.171070] ffff88007ad05ef8: ffff88007ad05f58 (0xffff88007ad05f58)
[ 4.171071] ffff88007ad05f00: ffff880075b92f00 (0xffff880075b92f00)
[ 4.171073] ffff88007ad05f08: ffff88007ad05f20 (0xffff88007ad05f20)
[ 4.171075] ffff88007ad05f10: ffffffff81037185 (df_debug+0x31/0x31)
[ 4.171075] ffff88007ad05f18: 0000000000000000 ...
[ 4.171077] ffff88007ad05f20: ffff88007ad05f48 (0xffff88007ad05f48)
[ 4.171079] ffff88007ad05f28: ffffffff81016626 (do_double_fault+0xd8/0xfb)
[ 4.171080] ffff88007ad05f30: 0000000000000001 (0x1)
[ 4.171082] ffff88007ad05f38: ffffc90000d6fdd0 (0xffffc90000d6fdd0)
[ 4.171082] ffff88007ad05f40: 0000000000000000 ...
[ 4.171084] ffff88007ad05f48: ffff88007ad05f59 (0xffff88007ad05f59)
[ 4.171086] ffff88007ad05f50: ffffffff8164df72 (double_fault+0x22/0x30)
[ 4.171087] ffff88007ad05f58: 0000000000000000 ...
[ 4.171088] ffff88007ad05f60: ffffc90000d6fdd0 (0xffffc90000d6fdd0)
[ 4.171090] ffff88007ad05f68: ffff880077a25000 (0xffff880077a25000)
[ 4.171091] ffff88007ad05f70: 0000000000000000 ...
[ 4.171092] ffff88007ad05f78: ffffc90000d6fd90 (0xffffc90000d6fd90)
[ 4.171093] ffff88007ad05f80: 0000000000000000 ...
[ 4.171094] ffff88007ad05f88: 0000000000000556 (0x556)
[ 4.171095] ffff88007ad05f90: 0000000000000300 (0x300)
[ 4.171096] ffff88007ad05f98: 0000000000000308 (0x308)
[ 4.171098] ffff88007ad05fa0: 00000000014000c0 (0x14000c0)
[ 4.171099] ffff88007ad05fa8: 0000559e707c4d60 (0x559e707c4d60)
[ 4.171100] ffff88007ad05fb0: 0000000000000008 (0x8)
[ 4.171101] ffff88007ad05fb8: 0000000000000001 (0x1)
[ 4.171103] ffff88007ad05fc0: ffffc90000d6fcc8 (0xffffc90000d6fcc8)
[ 4.171105] ffff88007ad05fc8: ffffffff81367805 (drm_mode_setcrtc+0x2b5/0x51f)
[ 4.171107] ffff88007ad05fd0: ffffffffffffffff (0xffffffffffffffff)
[ 4.171109] ffff88007ad05fd8: ffffffff81367878 (drm_mode_setcrtc+0x328/0x51f)
[ 4.171110] ffff88007ad05fe0: 0000000000000010 (0x10)
[ 4.171111] ffff88007ad05fe8: 0000000000010206 (0x10206)
[ 4.171112] ffff88007ad05ff0: 0000000000000000 ...
[ 4.171114] ffff88007ad05ff8: 0000000000000018 (0x18)
[ 4.171115] </#DF>
[ 5.819749] Shutting down cpus with NMI
[ 5.824035] Kernel Offset: disabled
[ 5.827930] ACPI MEMORY or I/O RESET_REG.

2017-07-13 18:00:14

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Wed, Jul 12, 2017 at 04:22:13PM -0700, Matthias Kaehlcke wrote:
> El Wed, Jul 12, 2017 at 05:36:30PM -0500 Josh Poimboeuf ha dit:
>
> > On Wed, Jul 12, 2017 at 05:35:47PM -0500, Josh Poimboeuf wrote:
> > > On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > > > > This is admittedly an awkward way of achieving this goal, but it's the
> > > > > only way I know how to do it with GCC.
> > > > >
> > > > > What extra instruction does clang add?
> > > >
> > > > I was looking at the get_user() call in drm_mode_setcrtc(). The code
> > > > generated by clang without the patch is:
> > > >
> > > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > > ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > > ffffffff8138695c: 00
> > > > ffffffff8138695d: 49 03 06 add (%r14),%rax
> > > > ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>
> > > >
> > > > And with the patch:
> > > >
> > > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > > ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > > ffffffff81386a5d: 00
> > > > ffffffff81386a5e: 49 03 06 add (%r14),%rax
> > > > ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
> > > > ffffffff81386a66: e8 15 a5 f0 ff callq
> > > > ffffffff81290f80 <__get_user_4>
> > >
> > > Hm, that seems odd. Can you sure the disassembly for the whole
> > > function?
> >
> > Er, share :-)
>
> Sure, please find below the disassemblies with and without the
> patch. The exact extra instruction differs from the one above, the
> disassembly above is from a debug session with some 'random' kernel
> version (bisect), the ones below from a v4.12ish kernel. At the bottom
> you also find a log of a double faults observed with the patch.
>
> If you are interested in building the kernel with clang yourself I can
> provide instructions, it is fairly painless nowadays as long as you
> have a recent version of clang (a somewhat older version should also
> do for this issue with some extra kernel patches).

Here's the reason for the double fault. First it puts zero on the stack
at offset -0x58:

> ffffffff81367616: 31 c0 xor %eax,%eax
> ffffffff81367618: 48 89 45 c8 mov %rax,-0x38(%rbp)
> ffffffff8136761c: 45 31 ff xor %r15d,%r15d
> ffffffff8136761f: 48 89 45 a8 mov %rax,-0x58(%rbp)

Then, later, it copies that zeroed word from the stack to RSP:

> ffffffff81367874: 48 8b 65 a8 mov -0x58(%rbp),%rsp

Then it double faults because the call instruction tries to write RIP on
the stack, but RSP is zero:

> ffffffff81367878: e8 73 26 f1 ff callq ffffffff81279ef0 <__get_user_4>

Then clang tries to put RSP's value on the stack, at the same stack slot
where the original zero was stored (though it never reaches this point):

> ffffffff8136787d: 49 89 d4 mov %rdx,%r12
> ffffffff81367880: 48 89 65 a8 mov %rsp,-0x58(%rbp)

The panic is consistent with the above. RIP points to the call
instruction, RSP is zero:

> [ 3.798722] PANIC: double fault, error_code: 0x0
> [ 3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
> [ 3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
> [ 3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
> [ 3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
> [ 3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
> [ 3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
> [ 3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
> [ 3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
> [ 3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
> [ 3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
> [ 3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
> [ 3.890136] FS: 00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
> [ 3.899177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0
> [ 3.913568] Call Trace:
> [ 3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba
> [ 3.937440] Kernel panic - not syncing: Machine halted.
> [ 3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
> [ 3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
> [ 3.960671] Call Trace:
> [ 3.963398] <#DF>
> [ 3.965637] __dump_stack+0x19/0x1b
> [ 3.969531] dump_stack+0x42/0x60

clang is obviously getting confused by the RSP output constraint. I
think it tries to take the constraint literally, since it takes RSP as
an output from the inline asm and stores it on the stack. However, that
behavior doesn't really make sense for a "register" variable. It also
doesn't explain why it's zeroing the register out first.

What happens if you try the below patch instead of the revert? Any
chance the offending instruction goes away?

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 11433f9..beac907 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
might_fault(); \
asm volatile("call __get_user_%P4" \
: "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
- : "0" (ptr), "i" (sizeof(*(ptr)))); \
+ : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
})

2017-07-13 18:47:52

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Thu, Jul 13, 2017 at 01:00:01PM -0500 Josh Poimboeuf ha dit:

> On Wed, Jul 12, 2017 at 04:22:13PM -0700, Matthias Kaehlcke wrote:
> > El Wed, Jul 12, 2017 at 05:36:30PM -0500 Josh Poimboeuf ha dit:
> >
> > > On Wed, Jul 12, 2017 at 05:35:47PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > > > > > This is admittedly an awkward way of achieving this goal, but it's the
> > > > > > only way I know how to do it with GCC.
> > > > > >
> > > > > > What extra instruction does clang add?
> > > > >
> > > > > I was looking at the get_user() call in drm_mode_setcrtc(). The code
> > > > > generated by clang without the patch is:
> > > > >
> > > > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > > > ffffffff81386955: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > > > ffffffff8138695c: 00
> > > > > ffffffff8138695d: 49 03 06 add (%r14),%rax
> > > > > ffffffff81386960: e8 2b a5 f0 ff callq ffffffff81290e90 <__get_user_4>
> > > > >
> > > > > And with the patch:
> > > > >
> > > > > if (get_user(out_id, &set_connectors_ptr[i])) {
> > > > > ffffffff81386a56: 4a 8d 04 bd 00 00 00 lea 0x0(,%r15,4),%rax
> > > > > ffffffff81386a5d: 00
> > > > > ffffffff81386a5e: 49 03 06 add (%r14),%rax
> > > > > ffffffff81386a61: 48 8b 64 24 28 mov 0x28(%rsp),%rsp
> > > > > ffffffff81386a66: e8 15 a5 f0 ff callq
> > > > > ffffffff81290f80 <__get_user_4>
> > > >
> > > > Hm, that seems odd. Can you sure the disassembly for the whole
> > > > function?
> > >
> > > Er, share :-)
> >
> > Sure, please find below the disassemblies with and without the
> > patch. The exact extra instruction differs from the one above, the
> > disassembly above is from a debug session with some 'random' kernel
> > version (bisect), the ones below from a v4.12ish kernel. At the bottom
> > you also find a log of a double faults observed with the patch.
> >
> > If you are interested in building the kernel with clang yourself I can
> > provide instructions, it is fairly painless nowadays as long as you
> > have a recent version of clang (a somewhat older version should also
> > do for this issue with some extra kernel patches).
>
> Here's the reason for the double fault. First it puts zero on the stack
> at offset -0x58:
>
> > ffffffff81367616: 31 c0 xor %eax,%eax
> > ffffffff81367618: 48 89 45 c8 mov %rax,-0x38(%rbp)
> > ffffffff8136761c: 45 31 ff xor %r15d,%r15d
> > ffffffff8136761f: 48 89 45 a8 mov %rax,-0x58(%rbp)
>
> Then, later, it copies that zeroed word from the stack to RSP:
>
> > ffffffff81367874: 48 8b 65 a8 mov -0x58(%rbp),%rsp
>
> Then it double faults because the call instruction tries to write RIP on
> the stack, but RSP is zero:
>
> > ffffffff81367878: e8 73 26 f1 ff callq ffffffff81279ef0 <__get_user_4>
>
> Then clang tries to put RSP's value on the stack, at the same stack slot
> where the original zero was stored (though it never reaches this point):
>
> > ffffffff8136787d: 49 89 d4 mov %rdx,%r12
> > ffffffff81367880: 48 89 65 a8 mov %rsp,-0x58(%rbp)
>
> The panic is consistent with the above. RIP points to the call
> instruction, RSP is zero:
>
> > [ 3.798722] PANIC: double fault, error_code: 0x0
> > [ 3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
> > [ 3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
> > [ 3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
> > [ 3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
> > [ 3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
> > [ 3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
> > [ 3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
> > [ 3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
> > [ 3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
> > [ 3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
> > [ 3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
> > [ 3.890136] FS: 00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
> > [ 3.899177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0
> > [ 3.913568] Call Trace:
> > [ 3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba
> > [ 3.937440] Kernel panic - not syncing: Machine halted.
> > [ 3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
> > [ 3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
> > [ 3.960671] Call Trace:
> > [ 3.963398] <#DF>
> > [ 3.965637] __dump_stack+0x19/0x1b
> > [ 3.969531] dump_stack+0x42/0x60
>
> clang is obviously getting confused by the RSP output constraint. I
> think it tries to take the constraint literally, since it takes RSP as
> an output from the inline asm and stores it on the stack. However, that
> behavior doesn't really make sense for a "register" variable. It also
> doesn't explain why it's zeroing the register out first.

Thanks for your analysis!

> What happens if you try the below patch instead of the revert? Any
> chance the offending instruction goes away?
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 11433f9..beac907 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> might_fault(); \
> asm volatile("call __get_user_%P4" \
> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> __builtin_expect(__ret_gu, 0); \
> })

The generated code is basically the same, only that now the value from
the stack is stored in a register and written twice to RSP:

ffffffff813676ba: 31 c0 xor %eax,%eax
ffffffff813676bc: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff813676c0: 45 31 ff xor %r15d,%r15d
ffffffff813676c3: 48 89 45 a8 mov %rax,-0x58(%rbp)
...
ffffffff81367918: 48 8b 4d a8 mov -0x58(%rbp),%rcx
ffffffff8136791c: 48 89 cc mov %rcx,%rsp
ffffffff8136791f: 48 89 cc mov %rcx,%rsp
ffffffff81367922: e8 69 26 f1 ff callq ffffffff81279f90 <__get_user_4>

2017-07-13 19:25:59

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 11:47:48AM -0700, Matthias Kaehlcke wrote:
> > What happens if you try the below patch instead of the revert? Any
> > chance the offending instruction goes away?
> >
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > index 11433f9..beac907 100644
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > might_fault(); \
> > asm volatile("call __get_user_%P4" \
> > : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> > (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > __builtin_expect(__ret_gu, 0); \
> > })
>
> The generated code is basically the same, only that now the value from
> the stack is stored in a register and written twice to RSP:
>
> ffffffff813676ba: 31 c0 xor %eax,%eax
> ffffffff813676bc: 48 89 45 c8 mov %rax,-0x38(%rbp)
> ffffffff813676c0: 45 31 ff xor %r15d,%r15d
> ffffffff813676c3: 48 89 45 a8 mov %rax,-0x58(%rbp)
> ...
> ffffffff81367918: 48 8b 4d a8 mov -0x58(%rbp),%rcx
> ffffffff8136791c: 48 89 cc mov %rcx,%rsp
> ffffffff8136791f: 48 89 cc mov %rcx,%rsp
> ffffffff81367922: e8 69 26 f1 ff callq ffffffff81279f90 <__get_user_4>

LOL. Why corrupt the stack pointer with a single instruction (reading a
zero from memory, no less) when you can instead do it with three
instructions, including two duplicates?

Anyway this seems like a clang bug to me. If I specify RSP as an input
register then the compiler shouldn't overwrite it first. For that
matter it has no reason to overwrite it if it's an output register
either.

--
Josh

2017-07-13 19:38:35

by Michael Davidson

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 12:25 PM, Josh Poimboeuf <[email protected]> wrote:

>
> Anyway this seems like a clang bug to me. If I specify RSP as an input
> register then the compiler shouldn't overwrite it first. For that
> matter it has no reason to overwrite it if it's an output register
> either.
>

It's certainly a difference in behavior between clang and gcc.

My question is whether this particular construct is really a
"supported" (or, at least, reasonably guaranteed) way of forcing gcc
to create a stack frame if none exists. or whether it is something
that "just happens to work".

If someone could explain the rationale behind *why* this works the way
that it does on gcc that might help convince the clang people that
this is actually a bug rather than just a piece of undefined behavior
which gcc and clang happen to handle differently.

2017-07-13 20:19:04

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 12:38:32PM -0700, Michael Davidson wrote:
> On Thu, Jul 13, 2017 at 12:25 PM, Josh Poimboeuf <[email protected]> wrote:
>
> >
> > Anyway this seems like a clang bug to me. If I specify RSP as an input
> > register then the compiler shouldn't overwrite it first. For that
> > matter it has no reason to overwrite it if it's an output register
> > either.
> >
>
> It's certainly a difference in behavior between clang and gcc.
>
> My question is whether this particular construct is really a
> "supported" (or, at least, reasonably guaranteed) way of forcing gcc
> to create a stack frame if none exists. or whether it is something
> that "just happens to work".
>
> If someone could explain the rationale behind *why* this works the way
> that it does on gcc that might help convince the clang people that
> this is actually a bug rather than just a piece of undefined behavior
> which gcc and clang happen to handle differently.

Disclaimer: I'm no compiler expert, and there are usually a variety of
opinions about compiler undefined behavior. So it would probably be
good for real compiler people to participate in the discussion.

But I think there are two separate issues here.

1) The first issue is whether it's supported behavior to specify RSP as
an output constraint in order to force GCC to create a stack frame.
As far as I know, this is a quirk of GCC, and not really considered
defined behavior.

However, the idea was suggested by some GCC developers:

https://gcc.gnu.org/ml/gcc/2015-07/msg00079.html

So at least it seems to be endorsed by GCC to some degree. If you
need details on why it works, that thread has the details.

2) The second issue is whether clang should corrupt RSP. I don't see a
reason for clang to do that. IMO, when using a local register
variable as an input or output to inline asm, the compiler should
leave the contents of the register alone.

FWIW, my reading of the GCC manual seems to support that:

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables


--
Josh

2017-07-13 20:22:12

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:

> Thanks for your analysis!
>
>> What happens if you try the below patch instead of the revert? Any
>> chance the offending instruction goes away?
>>
>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>> index 11433f9..beac907 100644
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>> might_fault(); \
>> asm volatile("call __get_user_%P4" \
>> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
>> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
>> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
>> (x) = (__force __typeof__(*(ptr))) __val_gu; \
>> __builtin_expect(__ret_gu, 0); \
>> })
>
> The generated code is basically the same, only that now the value from
> the stack is stored in a register and written twice to RSP:
>

AFAIR clang works much better with global named registers.
Could you check if the patch bellow helps?


---
arch/x86/include/asm/uaccess.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a059aac9e937..121204387978 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -157,15 +157,18 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
* Clang/LLVM cares about the size of the register, but still wants
* the base register for something that ends up being a pair.
*/
+
+register unsigned long __current_sp asm(_ASM_SP);
+
#define get_user(x, ptr) \
({ \
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
- register void *__sp asm(_ASM_SP); \
__chk_user_ptr(ptr); \
might_fault(); \
asm volatile("call __get_user_%P4" \
- : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
+ : "=a" (__ret_gu), "=r" (__val_gu), \
+ "+r" (__current_sp) \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
--
2.13.0


2017-07-13 20:34:21

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 11:20:04PM +0300, Andrey Rybainin wrote:
> On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:
>
> > Thanks for your analysis!
> >
> >> What happens if you try the below patch instead of the revert? Any
> >> chance the offending instruction goes away?
> >>
> >> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> >> index 11433f9..beac907 100644
> >> --- a/arch/x86/include/asm/uaccess.h
> >> +++ b/arch/x86/include/asm/uaccess.h
> >> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> >> might_fault(); \
> >> asm volatile("call __get_user_%P4" \
> >> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> >> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> >> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> >> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> >> __builtin_expect(__ret_gu, 0); \
> >> })
> >
> > The generated code is basically the same, only that now the value from
> > the stack is stored in a register and written twice to RSP:
> >
>
> AFAIR clang works much better with global named registers.
> Could you check if the patch bellow helps?

And yet another one to try (clobbering sp) :-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 11433f9..21f0c39 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
({ \
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
- register void *__sp asm(_ASM_SP); \
__chk_user_ptr(ptr); \
might_fault(); \
- asm volatile("call __get_user_%P4" \
- : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
- : "0" (ptr), "i" (sizeof(*(ptr)))); \
+ asm volatile("call __get_user_%P3" \
+ : "=a" (__ret_gu), "=r" (__val_gu) \
+ : "0" (ptr), "i" (sizeof(*(ptr))) \
+ : "sp"); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
})

2017-07-13 21:12:48

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Thu, Jul 13, 2017 at 03:34:16PM -0500 Josh Poimboeuf ha dit:

> On Thu, Jul 13, 2017 at 11:20:04PM +0300, Andrey Rybainin wrote:
> > On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:
> >
> > > Thanks for your analysis!
> > >
> > >> What happens if you try the below patch instead of the revert? Any
> > >> chance the offending instruction goes away?
> > >>
> > >> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > >> index 11433f9..beac907 100644
> > >> --- a/arch/x86/include/asm/uaccess.h
> > >> +++ b/arch/x86/include/asm/uaccess.h
> > >> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > >> might_fault(); \
> > >> asm volatile("call __get_user_%P4" \
> > >> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > >> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > >> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> > >> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > >> __builtin_expect(__ret_gu, 0); \
> > >> })
> > >
> > > The generated code is basically the same, only that now the value from
> > > the stack is stored in a register and written twice to RSP:
> > >
> >
> > AFAIR clang works much better with global named registers.
> > Could you check if the patch bellow helps?
>
> And yet another one to try (clobbering sp) :-)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 11433f9..21f0c39 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> ({ \
> int __ret_gu; \
> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> - register void *__sp asm(_ASM_SP); \
> __chk_user_ptr(ptr); \
> might_fault(); \
> - asm volatile("call __get_user_%P4" \
> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> + asm volatile("call __get_user_%P3" \
> + : "=a" (__ret_gu), "=r" (__val_gu) \
> + : "0" (ptr), "i" (sizeof(*(ptr))) \
> + : "sp"); \
> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> __builtin_expect(__ret_gu, 0); \
> })

This compiles with both gcc and clang, clang does not corrupt the
stack pointer. I wouldn't be able to tell though if it forces a stack
frame if it doesn't already exist, as the original patch intends.

2017-07-13 21:14:18

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Thu, Jul 13, 2017 at 11:20:04PM +0300 Andrey Rybainin ha dit:

> On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:
>
> > Thanks for your analysis!
> >
> >> What happens if you try the below patch instead of the revert? Any
> >> chance the offending instruction goes away?
> >>
> >> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> >> index 11433f9..beac907 100644
> >> --- a/arch/x86/include/asm/uaccess.h
> >> +++ b/arch/x86/include/asm/uaccess.h
> >> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> >> might_fault(); \
> >> asm volatile("call __get_user_%P4" \
> >> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> >> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> >> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> >> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> >> __builtin_expect(__ret_gu, 0); \
> >> })
> >
> > The generated code is basically the same, only that now the value from
> > the stack is stored in a register and written twice to RSP:
> >
>
> AFAIR clang works much better with global named registers.
> Could you check if the patch bellow helps?
>
>
> ---
> arch/x86/include/asm/uaccess.h | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index a059aac9e937..121204387978 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -157,15 +157,18 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> * Clang/LLVM cares about the size of the register, but still wants
> * the base register for something that ends up being a pair.
> */
> +
> +register unsigned long __current_sp asm(_ASM_SP);
> +
> #define get_user(x, ptr) \
> ({ \
> int __ret_gu; \
> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> - register void *__sp asm(_ASM_SP); \
> __chk_user_ptr(ptr); \
> might_fault(); \
> asm volatile("call __get_user_%P4" \
> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> + : "=a" (__ret_gu), "=r" (__val_gu), \
> + "+r" (__current_sp) \
> : "0" (ptr), "i" (sizeof(*(ptr)))); \
> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> __builtin_expect(__ret_gu, 0); \

Thanks for the suggestion, however it fails to build with both gcc and clang:

fs/ioctl.c:585:6: error: use of undeclared identifier '__current_sp'
if (get_user(count, &argp->dest_count)) {
^
arch/x86/include/asm/uaccess.h:168:16: note: expanded from macro 'get_user'
"+r" (__current_sp)
\

The references I found refer to __current_sp as an intrinsic function
for ARM32.

2017-07-13 21:27:48

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"



On 07/14/2017 12:14 AM, Matthias Kaehlcke wrote:
> El Thu, Jul 13, 2017 at 11:20:04PM +0300 Andrey Rybainin ha dit:
>
>> On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:
>>
>>> Thanks for your analysis!
>>>
>>>> What happens if you try the below patch instead of the revert? Any
>>>> chance the offending instruction goes away?
>>>>
>>>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>>>> index 11433f9..beac907 100644
>>>> --- a/arch/x86/include/asm/uaccess.h
>>>> +++ b/arch/x86/include/asm/uaccess.h
>>>> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>>>> might_fault(); \
>>>> asm volatile("call __get_user_%P4" \
>>>> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
>>>> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
>>>> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
>>>> (x) = (__force __typeof__(*(ptr))) __val_gu; \
>>>> __builtin_expect(__ret_gu, 0); \
>>>> })
>>>
>>> The generated code is basically the same, only that now the value from
>>> the stack is stored in a register and written twice to RSP:
>>>
>>
>> AFAIR clang works much better with global named registers.
>> Could you check if the patch bellow helps?
>>
>>
>> ---
>> arch/x86/include/asm/uaccess.h | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>> index a059aac9e937..121204387978 100644
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -157,15 +157,18 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>> * Clang/LLVM cares about the size of the register, but still wants
>> * the base register for something that ends up being a pair.
>> */
>> +
>> +register unsigned long __current_sp asm(_ASM_SP);
>> +
>> #define get_user(x, ptr) \
>> ({ \
>> int __ret_gu; \
>> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
>> - register void *__sp asm(_ASM_SP); \
>> __chk_user_ptr(ptr); \
>> might_fault(); \
>> asm volatile("call __get_user_%P4" \
>> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
>> + : "=a" (__ret_gu), "=r" (__val_gu), \
>> + "+r" (__current_sp) \
>> : "0" (ptr), "i" (sizeof(*(ptr)))); \
>> (x) = (__force __typeof__(*(ptr))) __val_gu; \
>> __builtin_expect(__ret_gu, 0); \
>
> Thanks for the suggestion, however it fails to build with both gcc and clang:
>
> fs/ioctl.c:585:6: error: use of undeclared identifier '__current_sp'
> if (get_user(count, &argp->dest_count)) {
> ^
> arch/x86/include/asm/uaccess.h:168:16: note: expanded from macro 'get_user'
> "+r" (__current_sp)
> \
>
> The references I found refer to __current_sp as an intrinsic function
> for ARM32.

What? __current_sp declared right above get_user() as "register unsigned long __current_sp asm(_ASM_SP);"
Did you actually applied my patch or you just modified the code yourself but have missed
"register unsigned long __current_sp asm(_ASM_SP);" ?

FWIW patch works (builds) for me with gcc.

2017-07-13 21:34:14

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 02:12:45PM -0700, Matthias Kaehlcke wrote:
> El Thu, Jul 13, 2017 at 03:34:16PM -0500 Josh Poimboeuf ha dit:
> > And yet another one to try (clobbering sp) :-)
> >
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > index 11433f9..21f0c39 100644
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > ({ \
> > int __ret_gu; \
> > register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> > - register void *__sp asm(_ASM_SP); \
> > __chk_user_ptr(ptr); \
> > might_fault(); \
> > - asm volatile("call __get_user_%P4" \
> > - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > + asm volatile("call __get_user_%P3" \
> > + : "=a" (__ret_gu), "=r" (__val_gu) \
> > + : "0" (ptr), "i" (sizeof(*(ptr))) \
> > + : "sp"); \
> > (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > __builtin_expect(__ret_gu, 0); \
> > })
>
> This compiles with both gcc and clang, clang does not corrupt the
> stack pointer. I wouldn't be able to tell though if it forces a stack
> frame if it doesn't already exist, as the original patch intends.

Whether it forces the stack frame on clang is a very minor issue
compared to the double fault. That really only matters when you want to
use CONFIG_STACK_VALIDATION to get 100% reliable stacktraces with frame
pointers. And that feature is currently very GCC-specific. So you
probably don't need to worry about that for now, at least until you want
to do live patching with a clang-compiled kernel.

IIRC, clobbering SP does at least force the stack frame on GCC, though I
need to double check that. I can try to work up an official patch in
the next week or so (need to do some testing first).

--
Josh

2017-07-13 21:43:29

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Fri, Jul 14, 2017 at 12:25:42AM +0300 Andrey Rybainin ha dit:

>
>
> On 07/14/2017 12:14 AM, Matthias Kaehlcke wrote:
> > El Thu, Jul 13, 2017 at 11:20:04PM +0300 Andrey Rybainin ha dit:
> >
> >> On 07/13/2017 09:47 PM, Matthias Kaehlcke wrote:
> >>
> >>> Thanks for your analysis!
> >>>
> >>>> What happens if you try the below patch instead of the revert? Any
> >>>> chance the offending instruction goes away?
> >>>>
> >>>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> >>>> index 11433f9..beac907 100644
> >>>> --- a/arch/x86/include/asm/uaccess.h
> >>>> +++ b/arch/x86/include/asm/uaccess.h
> >>>> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> >>>> might_fault(); \
> >>>> asm volatile("call __get_user_%P4" \
> >>>> : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> >>>> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> >>>> + : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp)); \
> >>>> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> >>>> __builtin_expect(__ret_gu, 0); \
> >>>> })
> >>>
> >>> The generated code is basically the same, only that now the value from
> >>> the stack is stored in a register and written twice to RSP:
> >>>
> >>
> >> AFAIR clang works much better with global named registers.
> >> Could you check if the patch bellow helps?
> >>
> >>
> >> ---
> >> arch/x86/include/asm/uaccess.h | 7 +++++--
> >> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> >> index a059aac9e937..121204387978 100644
> >> --- a/arch/x86/include/asm/uaccess.h
> >> +++ b/arch/x86/include/asm/uaccess.h
> >> @@ -157,15 +157,18 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> >> * Clang/LLVM cares about the size of the register, but still wants
> >> * the base register for something that ends up being a pair.
> >> */
> >> +
> >> +register unsigned long __current_sp asm(_ASM_SP);
> >> +
> >> #define get_user(x, ptr) \
> >> ({ \
> >> int __ret_gu; \
> >> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> >> - register void *__sp asm(_ASM_SP); \
> >> __chk_user_ptr(ptr); \
> >> might_fault(); \
> >> asm volatile("call __get_user_%P4" \
> >> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> >> + : "=a" (__ret_gu), "=r" (__val_gu), \
> >> + "+r" (__current_sp) \
> >> : "0" (ptr), "i" (sizeof(*(ptr)))); \
> >> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> >> __builtin_expect(__ret_gu, 0); \
> >
> > Thanks for the suggestion, however it fails to build with both gcc and clang:
> >
> > fs/ioctl.c:585:6: error: use of undeclared identifier '__current_sp'
> > if (get_user(count, &argp->dest_count)) {
> > ^
> > arch/x86/include/asm/uaccess.h:168:16: note: expanded from macro 'get_user'
> > "+r" (__current_sp)
> > \
> >
> > The references I found refer to __current_sp as an intrinsic function
> > for ARM32.
>
> What? __current_sp declared right above get_user() as "register unsigned long __current_sp asm(_ASM_SP);"
> Did you actually applied my patch or you just modified the code yourself but have missed
> "register unsigned long __current_sp asm(_ASM_SP);" ?

Indeed, since the patch is only a few lines and I had the function
already open in the editor it seemed easier to change the affected
lines than to apply the patch and I missed the definition <:‑|

After adding the missing line the code builds with clang and the stack
pointer is not corrupted.

2017-07-13 21:52:19

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 02:43:26PM -0700, Matthias Kaehlcke wrote:
> > >> AFAIR clang works much better with global named registers.

Is that a bug or a feature?

--
Josh

2017-07-13 21:57:07

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Thu, Jul 13, 2017 at 04:34:06PM -0500 Josh Poimboeuf ha dit:

> On Thu, Jul 13, 2017 at 02:12:45PM -0700, Matthias Kaehlcke wrote:
> > El Thu, Jul 13, 2017 at 03:34:16PM -0500 Josh Poimboeuf ha dit:
> > > And yet another one to try (clobbering sp) :-)
> > >
> > > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > > index 11433f9..21f0c39 100644
> > > --- a/arch/x86/include/asm/uaccess.h
> > > +++ b/arch/x86/include/asm/uaccess.h
> > > @@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > > ({ \
> > > int __ret_gu; \
> > > register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> > > - register void *__sp asm(_ASM_SP); \
> > > __chk_user_ptr(ptr); \
> > > might_fault(); \
> > > - asm volatile("call __get_user_%P4" \
> > > - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > > - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > > + asm volatile("call __get_user_%P3" \
> > > + : "=a" (__ret_gu), "=r" (__val_gu) \
> > > + : "0" (ptr), "i" (sizeof(*(ptr))) \
> > > + : "sp"); \
> > > (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > > __builtin_expect(__ret_gu, 0); \
> > > })
> >
> > This compiles with both gcc and clang, clang does not corrupt the
> > stack pointer. I wouldn't be able to tell though if it forces a stack
> > frame if it doesn't already exist, as the original patch intends.
>
> Whether it forces the stack frame on clang is a very minor issue
> compared to the double fault.

I totally agree, I was mainly concerned about not breaking the
solution that currently works with gcc.

> That really only matters when you want to use
> CONFIG_STACK_VALIDATION to get 100% reliable stacktraces with frame
> pointers. And that feature is currently very GCC-specific. So you
> probably don't need to worry about that for now, at least until you want
> to do live patching with a clang-compiled kernel.

Eventually I expect that there will be interest in live patching
clang-compiled kernels, however at this stage it probably isn't an
overly important feature.

> IIRC, clobbering SP does at least force the stack frame on GCC, though I
> need to double check that. I can try to work up an official patch in
> the next week or so (need to do some testing first).

Sounds great.

Thanks again for looking into this and coming up with a solution!

Matthias

2017-07-19 17:46:36

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 13, 2017 at 02:57:04PM -0700, Matthias Kaehlcke wrote:
> El Thu, Jul 13, 2017 at 04:34:06PM -0500 Josh Poimboeuf ha dit:
>
> > On Thu, Jul 13, 2017 at 02:12:45PM -0700, Matthias Kaehlcke wrote:
> > > El Thu, Jul 13, 2017 at 03:34:16PM -0500 Josh Poimboeuf ha dit:
> > > > And yet another one to try (clobbering sp) :-)
> > > >
> > > > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > > > index 11433f9..21f0c39 100644
> > > > --- a/arch/x86/include/asm/uaccess.h
> > > > +++ b/arch/x86/include/asm/uaccess.h
> > > > @@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > > > ({ \
> > > > int __ret_gu; \
> > > > register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> > > > - register void *__sp asm(_ASM_SP); \
> > > > __chk_user_ptr(ptr); \
> > > > might_fault(); \
> > > > - asm volatile("call __get_user_%P4" \
> > > > - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > > > - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > > > + asm volatile("call __get_user_%P3" \
> > > > + : "=a" (__ret_gu), "=r" (__val_gu) \
> > > > + : "0" (ptr), "i" (sizeof(*(ptr))) \
> > > > + : "sp"); \
> > > > (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > > > __builtin_expect(__ret_gu, 0); \
> > > > })
> > >
> > > This compiles with both gcc and clang, clang does not corrupt the
> > > stack pointer. I wouldn't be able to tell though if it forces a stack
> > > frame if it doesn't already exist, as the original patch intends.
> >
> > Whether it forces the stack frame on clang is a very minor issue
> > compared to the double fault.
>
> I totally agree, I was mainly concerned about not breaking the
> solution that currently works with gcc.
>
> > That really only matters when you want to use
> > CONFIG_STACK_VALIDATION to get 100% reliable stacktraces with frame
> > pointers. And that feature is currently very GCC-specific. So you
> > probably don't need to worry about that for now, at least until you want
> > to do live patching with a clang-compiled kernel.
>
> Eventually I expect that there will be interest in live patching
> clang-compiled kernels, however at this stage it probably isn't an
> overly important feature.
>
> > IIRC, clobbering SP does at least force the stack frame on GCC, though I
> > need to double check that. I can try to work up an official patch in
> > the next week or so (need to do some testing first).
>
> Sounds great.
>
> Thanks again for looking into this and coming up with a solution!

After doing some testing, I don't think this approach is going to work
after all. In addition to forcing the stack frame, it also causes GCC
to add an unnecessary extra instruction to the epilogue of each affected
function:

lea -0x10(%rbp),%rsp

We shouldn't be inserting extra instructions like that. I also don't
think the other suggestion of turning the '__sp' register variable into
a global variable is a very good solution either, as that just wastes
memory for no reason.

It would be nice if both compilers could agree on a way to support this.

--
Josh

2017-07-19 21:50:42

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Wed, Jul 19, 2017 at 12:46:31PM -0500 Josh Poimboeuf ha dit:

> On Thu, Jul 13, 2017 at 02:57:04PM -0700, Matthias Kaehlcke wrote:
> > El Thu, Jul 13, 2017 at 04:34:06PM -0500 Josh Poimboeuf ha dit:
> >
> > > On Thu, Jul 13, 2017 at 02:12:45PM -0700, Matthias Kaehlcke wrote:
> > > > El Thu, Jul 13, 2017 at 03:34:16PM -0500 Josh Poimboeuf ha dit:
> > > > > And yet another one to try (clobbering sp) :-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > > > > index 11433f9..21f0c39 100644
> > > > > --- a/arch/x86/include/asm/uaccess.h
> > > > > +++ b/arch/x86/include/asm/uaccess.h
> > > > > @@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> > > > > ({ \
> > > > > int __ret_gu; \
> > > > > register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> > > > > - register void *__sp asm(_ASM_SP); \
> > > > > __chk_user_ptr(ptr); \
> > > > > might_fault(); \
> > > > > - asm volatile("call __get_user_%P4" \
> > > > > - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> > > > > - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> > > > > + asm volatile("call __get_user_%P3" \
> > > > > + : "=a" (__ret_gu), "=r" (__val_gu) \
> > > > > + : "0" (ptr), "i" (sizeof(*(ptr))) \
> > > > > + : "sp"); \
> > > > > (x) = (__force __typeof__(*(ptr))) __val_gu; \
> > > > > __builtin_expect(__ret_gu, 0); \
> > > > > })
> > > >
> > > > This compiles with both gcc and clang, clang does not corrupt the
> > > > stack pointer. I wouldn't be able to tell though if it forces a stack
> > > > frame if it doesn't already exist, as the original patch intends.
> > >
> > > Whether it forces the stack frame on clang is a very minor issue
> > > compared to the double fault.
> >
> > I totally agree, I was mainly concerned about not breaking the
> > solution that currently works with gcc.
> >
> > > That really only matters when you want to use
> > > CONFIG_STACK_VALIDATION to get 100% reliable stacktraces with frame
> > > pointers. And that feature is currently very GCC-specific. So you
> > > probably don't need to worry about that for now, at least until you want
> > > to do live patching with a clang-compiled kernel.
> >
> > Eventually I expect that there will be interest in live patching
> > clang-compiled kernels, however at this stage it probably isn't an
> > overly important feature.
> >
> > > IIRC, clobbering SP does at least force the stack frame on GCC, though I
> > > need to double check that. I can try to work up an official patch in
> > > the next week or so (need to do some testing first).
> >
> > Sounds great.
> >
> > Thanks again for looking into this and coming up with a solution!
>
> After doing some testing, I don't think this approach is going to work
> after all. In addition to forcing the stack frame, it also causes GCC
> to add an unnecessary extra instruction to the epilogue of each affected
> function:
>
> lea -0x10(%rbp),%rsp
>
> We shouldn't be inserting extra instructions like that. I also don't
> think the other suggestion of turning the '__sp' register variable into
> a global variable is a very good solution either, as that just wastes
> memory for no reason.
>
> It would be nice if both compilers could agree on a way to support this.

Thanks for the update, Josh.

I will ask compiler folks to bring this up with LLVM.

2017-07-20 10:01:44

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

2017-07-19 20:46 GMT+03:00 Josh Poimboeuf <[email protected]>:

>
> After doing some testing, I don't think this approach is going to work
> after all. In addition to forcing the stack frame, it also causes GCC
> to add an unnecessary extra instruction to the epilogue of each affected
> function:
>
> lea -0x10(%rbp),%rsp
>
> We shouldn't be inserting extra instructions like that. I also don't
> think the other suggestion of turning the '__sp' register variable into
> a global variable is a very good solution either, as that just wastes
> memory for no reason.
>

Wastes memory? How is that wastes memory? That doesn't make any sense.

> It would be nice if both compilers could agree on a way to support this.
>
> --
> Josh

2017-07-20 15:18:20

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 20, 2017 at 01:01:39PM +0300, Andrey Ryabinin wrote:
> 2017-07-19 20:46 GMT+03:00 Josh Poimboeuf <[email protected]>:
>
> >
> > After doing some testing, I don't think this approach is going to work
> > after all. In addition to forcing the stack frame, it also causes GCC
> > to add an unnecessary extra instruction to the epilogue of each affected
> > function:
> >
> > lea -0x10(%rbp),%rsp
> >
> > We shouldn't be inserting extra instructions like that. I also don't
> > think the other suggestion of turning the '__sp' register variable into
> > a global variable is a very good solution either, as that just wastes
> > memory for no reason.
> >
>
> Wastes memory? How is that wastes memory? That doesn't make any sense.

Yes, you're right, that doesn't make any sense. I think I was trying to
wrap my head around what it means to have a global register variable --
in a header file no less -- and why clang would treat that differently
than a local register variable.

--
Josh

2017-07-20 15:30:27

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

2017-07-20 18:18 GMT+03:00 Josh Poimboeuf <[email protected]>:
> On Thu, Jul 20, 2017 at 01:01:39PM +0300, Andrey Ryabinin wrote:
>> 2017-07-19 20:46 GMT+03:00 Josh Poimboeuf <[email protected]>:
>>
>> >
>> > After doing some testing, I don't think this approach is going to work
>> > after all. In addition to forcing the stack frame, it also causes GCC
>> > to add an unnecessary extra instruction to the epilogue of each affected
>> > function:
>> >
>> > lea -0x10(%rbp),%rsp
>> >
>> > We shouldn't be inserting extra instructions like that. I also don't
>> > think the other suggestion of turning the '__sp' register variable into
>> > a global variable is a very good solution either, as that just wastes
>> > memory for no reason.
>> >
>>
>> Wastes memory? How is that wastes memory? That doesn't make any sense.
>
> Yes, you're right, that doesn't make any sense. I think I was trying to
> wrap my head around what it means to have a global register variable --
> in a header file no less -- and why clang would treat that differently
> than a local register variable.
>

FWIW bellow is my understanding of what's going on.

It seems clang treats local named register almost the same as ordinary
local variables.
The only difference is that before reading the register variable clang
puts variable's value into the specified register.

So clang just assigns stack slot for the variable __sp where it's
going to keep variable's value.
But since __sp is unitialized (we haven't assign anything to it), the
value of the __sp is some garbage from stack.
inline asm specifies __sp as input, so clang assumes that it have to
load __sp into 'rsp' because inline asm is going to use
it. And it just loads garbage from stack into 'rsp'

In fact, such behavior (I mean storing the value on stack and loading
into reg before the use) is very useful.
Clang's behavior allows to keep the value assigned to the
call-clobbered register across the function calls.

Unlike clang, gcc assigns value to the register right away and doesn't
store the value anywhere else. So if the reg is
call clobbered register you have to be absolutely sure that there is
no subsequent function call that might clobber the register.

E.g. see some real examples
https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
Fix improper register assignment").
These bugs shouldn't happen with clang.

But the global named register works slightly differently in clang. For
the global, the value is just the value of the register itself,
whatever it is. Read/write from global named register is just like
direct read/write to the register

2017-07-20 20:56:57

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
> FWIW bellow is my understanding of what's going on.
>
> It seems clang treats local named register almost the same as ordinary
> local variables.
> The only difference is that before reading the register variable clang
> puts variable's value into the specified register.
>
> So clang just assigns stack slot for the variable __sp where it's
> going to keep variable's value.
> But since __sp is unitialized (we haven't assign anything to it), the
> value of the __sp is some garbage from stack.
> inline asm specifies __sp as input, so clang assumes that it have to
> load __sp into 'rsp' because inline asm is going to use
> it. And it just loads garbage from stack into 'rsp'
>
> In fact, such behavior (I mean storing the value on stack and loading
> into reg before the use) is very useful.
> Clang's behavior allows to keep the value assigned to the
> call-clobbered register across the function calls.
>
> Unlike clang, gcc assigns value to the register right away and doesn't
> store the value anywhere else. So if the reg is
> call clobbered register you have to be absolutely sure that there is
> no subsequent function call that might clobber the register.
>
> E.g. see some real examples
> https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
> Fix improper register assignment").
> These bugs shouldn't happen with clang.
>
> But the global named register works slightly differently in clang. For
> the global, the value is just the value of the register itself,
> whatever it is. Read/write from global named register is just like
> direct read/write to the register

Thanks, that clears up a lot of the confusion for me.

Still, unfortunately, I don't think that's going to work for GCC.
Changing the '__sp' register variable to global in the header file
causes it to make a *bunch* of changes across the kernel, even in
functions which don't do inline asm. It seems to be disabling some
optimizations across the board.

I do have another idea, which is to replace all uses of

asm(" ... call foo ... " : outputs : inputs : clobbers);

with a new ASM_CALL macro:

ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);

Then the compiler differences can be abstracted out, with GCC adding
"sp" as an output constraint and clang doing nothing (for now).

--
Josh

2017-07-21 09:11:10

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"



On 07/20/2017 11:56 PM, Josh Poimboeuf wrote:
> On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
>> FWIW bellow is my understanding of what's going on.
>>
>> It seems clang treats local named register almost the same as ordinary
>> local variables.
>> The only difference is that before reading the register variable clang
>> puts variable's value into the specified register.
>>
>> So clang just assigns stack slot for the variable __sp where it's
>> going to keep variable's value.
>> But since __sp is unitialized (we haven't assign anything to it), the
>> value of the __sp is some garbage from stack.
>> inline asm specifies __sp as input, so clang assumes that it have to
>> load __sp into 'rsp' because inline asm is going to use
>> it. And it just loads garbage from stack into 'rsp'
>>
>> In fact, such behavior (I mean storing the value on stack and loading
>> into reg before the use) is very useful.
>> Clang's behavior allows to keep the value assigned to the
>> call-clobbered register across the function calls.
>>
>> Unlike clang, gcc assigns value to the register right away and doesn't
>> store the value anywhere else. So if the reg is
>> call clobbered register you have to be absolutely sure that there is
>> no subsequent function call that might clobber the register.
>>
>> E.g. see some real examples
>> https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
>> Fix improper register assignment").
>> These bugs shouldn't happen with clang.
>>
>> But the global named register works slightly differently in clang. For
>> the global, the value is just the value of the register itself,
>> whatever it is. Read/write from global named register is just like
>> direct read/write to the register
>
> Thanks, that clears up a lot of the confusion for me.
>
> Still, unfortunately, I don't think that's going to work for GCC.
> Changing the '__sp' register variable to global in the header file
> causes it to make a *bunch* of changes across the kernel, even in
> functions which don't do inline asm. It seems to be disabling some
> optimizations across the board.

All I see is just bunch of reordering of independent instructions, like this:

-ffffffff81012760: 5b pop %rbx
-ffffffff81012761: 31 c0 xor %eax,%eax
+ffffffff81012760: 31 c0 xor %eax,%eax
+ffffffff81012762: 5b pop %rbx

-ffffffff810c29ae: 48 83 c4 28 add $0x28,%rsp
-ffffffff810c29b2: 89 d8 mov %ebx,%eax
+ffffffff810c29ae: 89 d8 mov %ebx,%eax
+ffffffff810c29b0: 48 83 c4 28 add $0x28,%rsp

I haven't noticed any single bad/harmful change. The size of .text remained the same.

And btw, arm/arm64 already use global current_stack_pointer just fine.

> I do have another idea, which is to replace all uses of
>
> asm(" ... call foo ... " : outputs : inputs : clobbers);
>
> with a new ASM_CALL macro:
>
> ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);
>
> Then the compiler differences can be abstracted out, with GCC adding
> "sp" as an output constraint and clang doing nothing (for now).
>

2017-07-21 13:24:58

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Fri, Jul 21, 2017 at 12:13:31PM +0300, Andrey Ryabinin wrote:
> > Still, unfortunately, I don't think that's going to work for GCC.
> > Changing the '__sp' register variable to global in the header file
> > causes it to make a *bunch* of changes across the kernel, even in
> > functions which don't do inline asm. It seems to be disabling some
> > optimizations across the board.
>
> All I see is just bunch of reordering of independent instructions, like this:
>
> -ffffffff81012760: 5b pop %rbx
> -ffffffff81012761: 31 c0 xor %eax,%eax
> +ffffffff81012760: 31 c0 xor %eax,%eax
> +ffffffff81012762: 5b pop %rbx
>
> -ffffffff810c29ae: 48 83 c4 28 add $0x28,%rsp
> -ffffffff810c29b2: 89 d8 mov %ebx,%eax
> +ffffffff810c29ae: 89 d8 mov %ebx,%eax
> +ffffffff810c29b0: 48 83 c4 28 add $0x28,%rsp
>
> I haven't noticed any single bad/harmful change. The size of .text remained the same.

I compiled with -ffunction-sections to make the comparisons easier. The
reordering is much more extreme than your example. (This is with GCC 7,
btw). And it's not just reordering of instructions. It's control flow
changes as well.

Also, the text size grew a little:

text data bss dec hex filename
10630602 8295074 16461824 35387500 21bf86c vmlinux.before
10634013 8295074 16461824 35390911 21c05bf vmlinux.after

A small two-line change, which is supposed to be a noop, or at least
should only affect a small number of functions, but which instead
affects optimization decisions across the entire kernel, is actively
harmful IMO.

> And btw, arm/arm64 already use global current_stack_pointer just fine.

I wonder if they looked for the impact.

--
Josh

2017-07-29 00:38:57

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Thu, Jul 20, 2017 at 03:56:52PM -0500 Josh Poimboeuf ha dit:

> On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
> > FWIW bellow is my understanding of what's going on.
> >
> > It seems clang treats local named register almost the same as ordinary
> > local variables.
> > The only difference is that before reading the register variable clang
> > puts variable's value into the specified register.
> >
> > So clang just assigns stack slot for the variable __sp where it's
> > going to keep variable's value.
> > But since __sp is unitialized (we haven't assign anything to it), the
> > value of the __sp is some garbage from stack.
> > inline asm specifies __sp as input, so clang assumes that it have to
> > load __sp into 'rsp' because inline asm is going to use
> > it. And it just loads garbage from stack into 'rsp'
> >
> > In fact, such behavior (I mean storing the value on stack and loading
> > into reg before the use) is very useful.
> > Clang's behavior allows to keep the value assigned to the
> > call-clobbered register across the function calls.
> >
> > Unlike clang, gcc assigns value to the register right away and doesn't
> > store the value anywhere else. So if the reg is
> > call clobbered register you have to be absolutely sure that there is
> > no subsequent function call that might clobber the register.
> >
> > E.g. see some real examples
> > https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
> > Fix improper register assignment").
> > These bugs shouldn't happen with clang.
> >
> > But the global named register works slightly differently in clang. For
> > the global, the value is just the value of the register itself,
> > whatever it is. Read/write from global named register is just like
> > direct read/write to the register
>
> Thanks, that clears up a lot of the confusion for me.
>
> Still, unfortunately, I don't think that's going to work for GCC.
> Changing the '__sp' register variable to global in the header file
> causes it to make a *bunch* of changes across the kernel, even in
> functions which don't do inline asm. It seems to be disabling some
> optimizations across the board.
>
> I do have another idea, which is to replace all uses of
>
> asm(" ... call foo ... " : outputs : inputs : clobbers);
>
> with a new ASM_CALL macro:
>
> ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);
>
> Then the compiler differences can be abstracted out, with GCC adding
> "sp" as an output constraint and clang doing nothing (for now).

The idea sounds interesting, however I see two issues with ASM_CALL():

Inline assembly expects the individual elements of outputs, inputs and
clobbers to be comma separated, and so does the macro for it's
parameters.

The assembler template can refer to the position of output and input
operands, adding "sp" as output changes the positions of the inputs
wrt to the clang version.

Not sure how to move forward from here. Not even using an ugly #ifdef
seems to be a halfway reasonable solution, since get_user() isn't the
only place using this construct and #ifdefs would result in highly
redundant macros in multiple places.

2017-07-29 00:55:26

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Fri, Jul 28, 2017 at 05:38:52PM -0700, Matthias Kaehlcke wrote:
> El Thu, Jul 20, 2017 at 03:56:52PM -0500 Josh Poimboeuf ha dit:
>
> > On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
> > > FWIW bellow is my understanding of what's going on.
> > >
> > > It seems clang treats local named register almost the same as ordinary
> > > local variables.
> > > The only difference is that before reading the register variable clang
> > > puts variable's value into the specified register.
> > >
> > > So clang just assigns stack slot for the variable __sp where it's
> > > going to keep variable's value.
> > > But since __sp is unitialized (we haven't assign anything to it), the
> > > value of the __sp is some garbage from stack.
> > > inline asm specifies __sp as input, so clang assumes that it have to
> > > load __sp into 'rsp' because inline asm is going to use
> > > it. And it just loads garbage from stack into 'rsp'
> > >
> > > In fact, such behavior (I mean storing the value on stack and loading
> > > into reg before the use) is very useful.
> > > Clang's behavior allows to keep the value assigned to the
> > > call-clobbered register across the function calls.
> > >
> > > Unlike clang, gcc assigns value to the register right away and doesn't
> > > store the value anywhere else. So if the reg is
> > > call clobbered register you have to be absolutely sure that there is
> > > no subsequent function call that might clobber the register.
> > >
> > > E.g. see some real examples
> > > https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
> > > Fix improper register assignment").
> > > These bugs shouldn't happen with clang.
> > >
> > > But the global named register works slightly differently in clang. For
> > > the global, the value is just the value of the register itself,
> > > whatever it is. Read/write from global named register is just like
> > > direct read/write to the register
> >
> > Thanks, that clears up a lot of the confusion for me.
> >
> > Still, unfortunately, I don't think that's going to work for GCC.
> > Changing the '__sp' register variable to global in the header file
> > causes it to make a *bunch* of changes across the kernel, even in
> > functions which don't do inline asm. It seems to be disabling some
> > optimizations across the board.
> >
> > I do have another idea, which is to replace all uses of
> >
> > asm(" ... call foo ... " : outputs : inputs : clobbers);
> >
> > with a new ASM_CALL macro:
> >
> > ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);
> >
> > Then the compiler differences can be abstracted out, with GCC adding
> > "sp" as an output constraint and clang doing nothing (for now).
>
> The idea sounds interesting, however I see two issues with ASM_CALL():
>
> Inline assembly expects the individual elements of outputs, inputs and
> clobbers to be comma separated, and so does the macro for it's
> parameters.

I think this isn't a problem, see the (untested and unfinished) patch
below for an idea of how the arguments can be handled.

> The assembler template can refer to the position of output and input
> operands, adding "sp" as output changes the positions of the inputs
> wrt to the clang version.

True, though I think we could convert all ASM_CALL users to use named
operands instead of the (bug-prone) numbered ones. It would be an
improvement regardless.

> Not sure how to move forward from here. Not even using an ugly #ifdef
> seems to be a halfway reasonable solution, since get_user() isn't the
> only place using this construct and #ifdefs would result in highly
> redundant macros in multiple places.

I still think ASM_CALL might work. The below patch is what I came up
with last week before I got pulled into some other work. I'll be out on
vacation next week but I hope to finish the patch after that.


diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..5da4bcbeebab 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -216,14 +216,14 @@ static inline int alternatives_text_reserved(void *start, void *end)
* Otherwise, old function is used.
*/
#define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
- output, input...) \
+ output, input, clobbers...) \
{ \
- register void *__sp asm(_ASM_SP); \
- asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
- "call %P[new2]", feature2) \
- : output, "+r" (__sp) \
- : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
- [new2] "i" (newfunc2), ## input); \
+ ASM_CALL(ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1, \
+ "call %P[new2]", feature2), \
+ ARGS(output), \
+ ARGS([old] "i" (oldfunc), [new1] "i" (newfunc1), \
+ [new2] "i" (newfunc2), ARGS(input)), \
+ ## clobbers); \
}

/*
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index b4a0d43248cf..21adcc8e739f 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -45,8 +45,8 @@ static inline void clear_page(void *page)
clear_page_rep, X86_FEATURE_REP_GOOD,
clear_page_erms, X86_FEATURE_ERMS,
"=D" (page),
- "0" (page)
- : "memory", "rax", "rcx");
+ "0" (page),
+ ARGS("memory", "rax", "rcx"));
}

void copy_page(void *to, void *from);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index cb976bab6299..2a585b799a3e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -471,8 +471,7 @@ int paravirt_disable_iospace(void);
*/
#ifdef CONFIG_X86_32
#define PVOP_VCALL_ARGS \
- unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; \
- register void *__sp asm("esp")
+ unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
#define PVOP_CALL_ARGS PVOP_VCALL_ARGS

#define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x))
@@ -492,8 +491,7 @@ int paravirt_disable_iospace(void);
/* [re]ax isn't an arg, but the return val */
#define PVOP_VCALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
- __edx = __edx, __ecx = __ecx, __eax = __eax; \
- register void *__sp asm("rsp")
+ __edx = __edx, __ecx = __ecx, __eax = __eax;
#define PVOP_CALL_ARGS PVOP_VCALL_ARGS

#define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x))
@@ -541,24 +539,24 @@ int paravirt_disable_iospace(void);
/* This is 32-bit specific, but is okay in 64-bit */ \
/* since this condition will never hold */ \
if (sizeof(rettype) > sizeof(unsigned long)) { \
- asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
- post \
- : call_clbr, "+r" (__sp) \
- : paravirt_type(op), \
- paravirt_clobber(clbr), \
- ##__VA_ARGS__ \
- : "memory", "cc" extra_clbr); \
+ ASM_CALL(pre \
+ paravirt_alt(PARAVIRT_CALL) \
+ post, \
+ ARGS(call_clbr), \
+ ARGS(paravirt_type(op), \
+ paravirt_clobber(clbr), \
+ ##__VA_ARGS__), \
+ ARGS("memory", "cc" extra_clbr)); \
__ret = (rettype)((((u64)__edx) << 32) | __eax); \
} else { \
- asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
- post \
- : call_clbr, "+r" (__sp) \
- : paravirt_type(op), \
- paravirt_clobber(clbr), \
- ##__VA_ARGS__ \
- : "memory", "cc" extra_clbr); \
+ ASM_CALL(pre \
+ paravirt_alt(PARAVIRT_CALL) \
+ post, \
+ ARGS(call_clbr), \
+ ARGS(paravirt_type(op), \
+ paravirt_clobber(clbr), \
+ ##__VA_ARGS__), \
+ ARGS("memory", "cc" extra_clbr)); \
__ret = (rettype)(__eax & PVOP_RETMASK(rettype)); \
} \
__ret; \
@@ -578,14 +576,14 @@ int paravirt_disable_iospace(void);
({ \
PVOP_VCALL_ARGS; \
PVOP_TEST_NULL(op); \
- asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
- post \
- : call_clbr, "+r" (__sp) \
- : paravirt_type(op), \
- paravirt_clobber(clbr), \
- ##__VA_ARGS__ \
- : "memory", "cc" extra_clbr); \
+ ASM_CALL(pre \
+ paravirt_alt(PARAVIRT_CALL) \
+ post, \
+ ARGS(call_clbr), \
+ ARGS(paravirt_type(op), \
+ paravirt_clobber(clbr), \
+ ##__VA_ARGS__), \
+ ARGS("memory", "cc" extra_clbr)); \
})

#define __PVOP_VCALL(op, pre, post, ...) \
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..19a034cbde37 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -102,16 +102,14 @@ static __always_inline bool should_resched(int preempt_offset)
extern asmlinkage void ___preempt_schedule(void);
# define __preempt_schedule() \
({ \
- register void *__sp asm(_ASM_SP); \
- asm volatile ("call ___preempt_schedule" : "+r"(__sp)); \
+ ASM_CALL("call ___preempt_schedule",,); \
})

extern asmlinkage void preempt_schedule(void);
extern asmlinkage void ___preempt_schedule_notrace(void);
# define __preempt_schedule_notrace() \
({ \
- register void *__sp asm(_ASM_SP); \
- asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \
+ ASM_CALL("call ___preempt_schedule_notrace",,); \
})
extern asmlinkage void preempt_schedule_notrace(void);
#endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 550bc4ab0df4..2bbfb777c8bb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,20 +677,19 @@ static inline void sync_core(void)
* Like all of Linux's memory ordering operations, this is a
* compiler barrier as well.
*/
- register void *__sp asm(_ASM_SP);

#ifdef CONFIG_X86_32
- asm volatile (
+ ASM_CALL(
"pushfl\n\t"
"pushl %%cs\n\t"
"pushl $1f\n\t"
"iret\n\t"
- "1:"
- : "+r" (__sp) : : "memory");
+ "1:",
+ , , "memory");
#else
unsigned int tmp;

- asm volatile (
+ ASM_CALL(
UNWIND_HINT_SAVE
"mov %%ss, %0\n\t"
"pushq %q0\n\t"
@@ -702,8 +701,8 @@ static inline void sync_core(void)
"pushq $1f\n\t"
"iretq\n\t"
UNWIND_HINT_RESTORE
- "1:"
- : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+ "1:",
+ "=&r" (tmp), , ARGS("cc", "memory"));
#endif
}

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..9d20f4fb5d7c 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -99,25 +99,24 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
/*
* lock for writing
*/
-#define ____down_write(sem, slow_path) \
-({ \
- long tmp; \
- struct rw_semaphore* ret; \
- register void *__sp asm(_ASM_SP); \
- \
- asm volatile("# beginning down_write\n\t" \
- LOCK_PREFIX " xadd %1,(%4)\n\t" \
- /* adds 0xffff0001, returns the old value */ \
- " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
- /* was the active mask 0 before? */\
- " jz 1f\n" \
- " call " slow_path "\n" \
- "1:\n" \
- "# ending down_write" \
- : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
- : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
- : "memory", "cc"); \
- ret; \
+#define ____down_write(sem, slow_path) \
+({ \
+ long tmp; \
+ struct rw_semaphore* ret; \
+ \
+ ASM_CALL("# beginning down_write\n\t" \
+ LOCK_PREFIX " xadd %1,(%3)\n\t" \
+ /* adds 0xffff0001, returns the old value */ \
+ " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
+ /* was the active mask 0 before? */ \
+ " jz 1f\n" \
+ " call " slow_path "\n" \
+ "1:\n" \
+ "# ending down_write", \
+ ARGS("+m" (sem->count), "=d" (tmp), "=a" (ret)), \
+ ARGS("a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)), \
+ ARGS("memory", "cc")); \
+ ret; \
})

static inline void __down_write(struct rw_semaphore *sem)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..c7be076ee703 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,12 +166,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
({ \
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
- register void *__sp asm(_ASM_SP); \
__chk_user_ptr(ptr); \
might_fault(); \
- asm volatile("call __get_user_%P4" \
- : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
- : "0" (ptr), "i" (sizeof(*(ptr)))); \
+ ASM_CALL("call __get_user_%P3", \
+ ARGS("=a" (__ret_gu), "=r" (__val_gu)), \
+ ARGS("0" (ptr), "i" (sizeof(*(ptr))))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
})
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index b16f6a1d8b26..8b63e2cb1eaf 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -38,10 +38,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
X86_FEATURE_REP_GOOD,
copy_user_enhanced_fast_string,
X86_FEATURE_ERMS,
- ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
- "=d" (len)),
- "1" (to), "2" (from), "3" (len)
- : "memory", "rcx", "r8", "r9", "r10", "r11");
+ ARGS("=a" (ret), "=D" (to), "=S" (from), "=d" (len)),
+ ARGS("1" (to), "2" (from), "3" (len)),
+ ARGS("memory", "rcx", "r8", "r9", "r10", "r11"));
return ret;
}

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 11071fcd630e..2feddeeec1d1 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -114,9 +114,8 @@ extern struct { char _entry[32]; } hypercall_page[];
register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
- register void *__sp asm(_ASM_SP);

-#define __HYPERCALL_0PARAM "=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM "=r" (__res)
#define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1)
#define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2)
#define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
@@ -146,10 +145,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_0ARG(); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_0PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER0); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_0PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER0); \
(type)__res; \
})

@@ -157,10 +156,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_1ARG(a1); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_1PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER1); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_1PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER1); \
(type)__res; \
})

@@ -168,10 +167,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_2ARG(a1, a2); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_2PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER2); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_2PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER2); \
(type)__res; \
})

@@ -179,10 +178,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_3ARG(a1, a2, a3); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_3PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER3); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_3PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER3); \
(type)__res; \
})

@@ -190,10 +189,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_4ARG(a1, a2, a3, a4); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_4PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER4); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_4PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER4); \
(type)__res; \
})

@@ -201,10 +200,10 @@ extern struct { char _entry[32]; } hypercall_page[];
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_5ARG(a1, a2, a3, a4, a5); \
- asm volatile (__HYPERCALL \
- : __HYPERCALL_5PARAM \
- : __HYPERCALL_ENTRY(name) \
- : __HYPERCALL_CLOBBER5); \
+ ASM_CALL(__HYPERCALL, \
+ __HYPERCALL_5PARAM, \
+ __HYPERCALL_ENTRY(name), \
+ __HYPERCALL_CLOBBER5); \
(type)__res; \
})

@@ -218,10 +217,10 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);

stac();
- asm volatile("call *%[call]"
- : __HYPERCALL_5PARAM
- : [call] "a" (&hypercall_page[call])
- : __HYPERCALL_CLOBBER5);
+ ASM_CALL("call *%[call]",
+ __HYPERCALL_5PARAM,
+ [call] "a" (&hypercall_page[call]),
+ __HYPERCALL_CLOBBER5);
clac();

return (long)__res;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fb0055953fbc..4d0d326029a7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5284,16 +5284,15 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,

static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
{
- register void *__sp asm(_ASM_SP);
ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;

if (!(ctxt->d & ByteOp))
fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;

- asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
- : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
- [fastop]"+S"(fop), "+r"(__sp)
- : "c"(ctxt->src2.val));
+ ASM_CALL("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n",
+ ARGS("+a"(ctxt->dst.val), "+d"(ctxt->src.val),
+ [flags]"+D"(flags), [fastop]"+S"(fop)),
+ ARGS("c"(ctxt->src2.val)));

ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
if (!fop) /* exception is returned in fop variable */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ffd469ecad57..d42806ad3c9c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8671,7 +8671,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
{
u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- register void *__sp asm(_ASM_SP);

if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -8686,7 +8685,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
desc = (gate_desc *)vmx->host_idt_base + vector;
entry = gate_offset(*desc);
- asm volatile(
+ ASM_CALL(
#ifdef CONFIG_X86_64
"mov %%" _ASM_SP ", %[sp]\n\t"
"and $0xfffffffffffffff0, %%" _ASM_SP "\n\t"
@@ -8695,16 +8694,14 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
#endif
"pushf\n\t"
__ASM_SIZE(push) " $%c[cs]\n\t"
- "call *%[entry]\n\t"
- :
+ "call *%[entry]\n\t",
#ifdef CONFIG_X86_64
- [sp]"=&r"(tmp),
+ [sp]"=&r"(tmp)
#endif
- "+r"(__sp)
- :
- [entry]"r"(entry),
- [ss]"i"(__KERNEL_DS),
- [cs]"i"(__KERNEL_CS)
+ ,
+ ARGS([entry]"r"(entry),
+ [ss]"i"(__KERNEL_DS),
+ [cs]"i"(__KERNEL_CS))
);
}
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2a1fa10c6a98..ca642ac3a390 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -802,7 +802,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
if (is_vmalloc_addr((void *)address) &&
(((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
- register void *__sp asm("rsp");
unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
/*
* We're likely to be running with very little stack space
@@ -814,13 +813,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
* and then double-fault, though, because we're likely to
* break the console driver and lose most of the stack dump.
*/
- asm volatile ("movq %[stack], %%rsp\n\t"
- "call handle_stack_overflow\n\t"
- "1: jmp 1b"
- : "+r" (__sp)
- : "D" ("kernel stack overflow (page fault)"),
- "S" (regs), "d" (address),
- [stack] "rm" (stack));
+ ASM_CALL("movq %[stack], %%rsp\n\t"
+ "call handle_stack_overflow\n\t"
+ "1: jmp 1b",
+ ,
+ ARGS("D" ("kernel stack overflow (page fault)"),
+ "S" (regs), "d" (address), [stack] "rm" (stack)));
unreachable();
}
#endif
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 3f8c88e29a46..3a57f32a0bfd 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -589,4 +589,18 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
(_________p1); \
})

+#define ARGS(args...) args
+
+#ifdef CONFIG_FRAME_POINTER
+
+#define ASM_CALL(str, outputs, inputs, clobbers...) \
+ asm volatile(str : outputs : inputs : "sp", ## clobbers)
+
+#else
+
+#define ASM_CALL(str, outputs, inputs, clobbers...) \
+ asm volatile(str : outputs : inputs : clobbers);
+
+#endif
+
#endif /* __LINUX_COMPILER_H */
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 6a1af43862df..766a00ebf80c 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -193,13 +193,8 @@ they mean, and suggestions for how to fix them.

If it's a GCC-compiled .c file, the error may be because the function
uses an inline asm() statement which has a "call" instruction. An
- asm() statement with a call instruction must declare the use of the
- stack pointer in its output operand. For example, on x86_64:
-
- register void *__sp asm("rsp");
- asm volatile("call func" : "+r" (__sp));
-
- Otherwise the stack frame may not get created before the call.
+ asm() statement with a call instruction must use the ASM_CALL macro,
+ which forces the frame pointer to be saved before the call.


2. file.o: warning: objtool: .text+0x53: unreachable instruction
@@ -221,7 +216,7 @@ they mean, and suggestions for how to fix them.
change ENDPROC to END.


-4. file.o: warning: objtool: func(): can't find starting instruction
+3. file.o: warning: objtool: func(): can't find starting instruction
or
file.o: warning: objtool: func()+0x11dd: can't decode instruction

@@ -230,7 +225,7 @@ they mean, and suggestions for how to fix them.
section like .data or .rodata.


-5. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
+4. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function

This is a kernel entry/exit instruction like sysenter or iret. Such
instructions aren't allowed in a callable function, and are most
@@ -239,7 +234,7 @@ they mean, and suggestions for how to fix them.
annotated with the unwind hint macros in asm/unwind_hints.h.


-6. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
+5. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame

This is a dynamic jump or a jump to an undefined symbol. Objtool
assumed it's a sibling call and detected that the frame pointer
@@ -253,7 +248,7 @@ they mean, and suggestions for how to fix them.
the unwind hint macros in asm/unwind_hints.h.


-7. file: warning: objtool: func()+0x5c: stack state mismatch
+6. file: warning: objtool: func()+0x5c: stack state mismatch

The instruction's frame pointer state is inconsistent, depending on
which execution path was taken to reach the instruction.
@@ -270,7 +265,7 @@ they mean, and suggestions for how to fix them.
asm/unwind_hints.h.


-8. file.o: warning: objtool: funcA() falls through to next function funcB()
+7. file.o: warning: objtool: funcA() falls through to next function funcB()

This means that funcA() doesn't end with a return instruction or an
unconditional jump, and that objtool has determined that the function

2017-07-29 00:58:16

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

On Fri, Jul 28, 2017 at 07:55:21PM -0500, Josh Poimboeuf wrote:
> +#define ASM_CALL(str, outputs, inputs, clobbers...) \
> + asm volatile(str : outputs : inputs : "sp", ## clobbers)

And note this part isn't right, the sp should be in the output operands.

--
Josh

2017-07-29 01:06:35

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH] Revert "x86/uaccess: Add stack frame output operand in get_user() inline asm"

El Fri, Jul 28, 2017 at 07:55:21PM -0500 Josh Poimboeuf ha dit:

> On Fri, Jul 28, 2017 at 05:38:52PM -0700, Matthias Kaehlcke wrote:
> > El Thu, Jul 20, 2017 at 03:56:52PM -0500 Josh Poimboeuf ha dit:
> >
> > > On Thu, Jul 20, 2017 at 06:30:24PM +0300, Andrey Ryabinin wrote:
> > > > FWIW bellow is my understanding of what's going on.
> > > >
> > > > It seems clang treats local named register almost the same as ordinary
> > > > local variables.
> > > > The only difference is that before reading the register variable clang
> > > > puts variable's value into the specified register.
> > > >
> > > > So clang just assigns stack slot for the variable __sp where it's
> > > > going to keep variable's value.
> > > > But since __sp is unitialized (we haven't assign anything to it), the
> > > > value of the __sp is some garbage from stack.
> > > > inline asm specifies __sp as input, so clang assumes that it have to
> > > > load __sp into 'rsp' because inline asm is going to use
> > > > it. And it just loads garbage from stack into 'rsp'
> > > >
> > > > In fact, such behavior (I mean storing the value on stack and loading
> > > > into reg before the use) is very useful.
> > > > Clang's behavior allows to keep the value assigned to the
> > > > call-clobbered register across the function calls.
> > > >
> > > > Unlike clang, gcc assigns value to the register right away and doesn't
> > > > store the value anywhere else. So if the reg is
> > > > call clobbered register you have to be absolutely sure that there is
> > > > no subsequent function call that might clobber the register.
> > > >
> > > > E.g. see some real examples
> > > > https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
> > > > Fix improper register assignment").
> > > > These bugs shouldn't happen with clang.
> > > >
> > > > But the global named register works slightly differently in clang. For
> > > > the global, the value is just the value of the register itself,
> > > > whatever it is. Read/write from global named register is just like
> > > > direct read/write to the register
> > >
> > > Thanks, that clears up a lot of the confusion for me.
> > >
> > > Still, unfortunately, I don't think that's going to work for GCC.
> > > Changing the '__sp' register variable to global in the header file
> > > causes it to make a *bunch* of changes across the kernel, even in
> > > functions which don't do inline asm. It seems to be disabling some
> > > optimizations across the board.
> > >
> > > I do have another idea, which is to replace all uses of
> > >
> > > asm(" ... call foo ... " : outputs : inputs : clobbers);
> > >
> > > with a new ASM_CALL macro:
> > >
> > > ASM_CALL(" ... call foo ... ", outputs, inputs, clobbers);
> > >
> > > Then the compiler differences can be abstracted out, with GCC adding
> > > "sp" as an output constraint and clang doing nothing (for now).
> >
> > The idea sounds interesting, however I see two issues with ASM_CALL():
> >
> > Inline assembly expects the individual elements of outputs, inputs and
> > clobbers to be comma separated, and so does the macro for it's
> > parameters.
>
> I think this isn't a problem, see the (untested and unfinished) patch
> below for an idea of how the arguments can be handled.

Good point!

> > The assembler template can refer to the position of output and input
> > operands, adding "sp" as output changes the positions of the inputs
> > wrt to the clang version.
>
> True, though I think we could convert all ASM_CALL users to use named
> operands instead of the (bug-prone) numbered ones. It would be an
> improvement regardless.

Agreed, that sounds like a good solution and a desirable improvement.

> > Not sure how to move forward from here. Not even using an ugly #ifdef
> > seems to be a halfway reasonable solution, since get_user() isn't the
> > only place using this construct and #ifdefs would result in highly
> > redundant macros in multiple places.
>
> I still think ASM_CALL might work. The below patch is what I came up
> with last week before I got pulled into some other work. I'll be out on
> vacation next week but I hope to finish the patch after that.

Thanks for working on this.

Enjoy your vacation!

Matthias

> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 1b020381ab38..5da4bcbeebab 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -216,14 +216,14 @@ static inline int alternatives_text_reserved(void *start, void *end)
> * Otherwise, old function is used.
> */
> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
> - output, input...) \
> + output, input, clobbers...) \
> { \
> - register void *__sp asm(_ASM_SP); \
> - asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
> - "call %P[new2]", feature2) \
> - : output, "+r" (__sp) \
> - : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
> - [new2] "i" (newfunc2), ## input); \
> + ASM_CALL(ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1, \
> + "call %P[new2]", feature2), \
> + ARGS(output), \
> + ARGS([old] "i" (oldfunc), [new1] "i" (newfunc1), \
> + [new2] "i" (newfunc2), ARGS(input)), \
> + ## clobbers); \
> }
>
> /*
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index b4a0d43248cf..21adcc8e739f 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -45,8 +45,8 @@ static inline void clear_page(void *page)
> clear_page_rep, X86_FEATURE_REP_GOOD,
> clear_page_erms, X86_FEATURE_ERMS,
> "=D" (page),
> - "0" (page)
> - : "memory", "rax", "rcx");
> + "0" (page),
> + ARGS("memory", "rax", "rcx"));
> }
>
> void copy_page(void *to, void *from);
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index cb976bab6299..2a585b799a3e 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -471,8 +471,7 @@ int paravirt_disable_iospace(void);
> */
> #ifdef CONFIG_X86_32
> #define PVOP_VCALL_ARGS \
> - unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; \
> - register void *__sp asm("esp")
> + unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
> #define PVOP_CALL_ARGS PVOP_VCALL_ARGS
>
> #define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x))
> @@ -492,8 +491,7 @@ int paravirt_disable_iospace(void);
> /* [re]ax isn't an arg, but the return val */
> #define PVOP_VCALL_ARGS \
> unsigned long __edi = __edi, __esi = __esi, \
> - __edx = __edx, __ecx = __ecx, __eax = __eax; \
> - register void *__sp asm("rsp")
> + __edx = __edx, __ecx = __ecx, __eax = __eax;
> #define PVOP_CALL_ARGS PVOP_VCALL_ARGS
>
> #define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x))
> @@ -541,24 +539,24 @@ int paravirt_disable_iospace(void);
> /* This is 32-bit specific, but is okay in 64-bit */ \
> /* since this condition will never hold */ \
> if (sizeof(rettype) > sizeof(unsigned long)) { \
> - asm volatile(pre \
> - paravirt_alt(PARAVIRT_CALL) \
> - post \
> - : call_clbr, "+r" (__sp) \
> - : paravirt_type(op), \
> - paravirt_clobber(clbr), \
> - ##__VA_ARGS__ \
> - : "memory", "cc" extra_clbr); \
> + ASM_CALL(pre \
> + paravirt_alt(PARAVIRT_CALL) \
> + post, \
> + ARGS(call_clbr), \
> + ARGS(paravirt_type(op), \
> + paravirt_clobber(clbr), \
> + ##__VA_ARGS__), \
> + ARGS("memory", "cc" extra_clbr)); \
> __ret = (rettype)((((u64)__edx) << 32) | __eax); \
> } else { \
> - asm volatile(pre \
> - paravirt_alt(PARAVIRT_CALL) \
> - post \
> - : call_clbr, "+r" (__sp) \
> - : paravirt_type(op), \
> - paravirt_clobber(clbr), \
> - ##__VA_ARGS__ \
> - : "memory", "cc" extra_clbr); \
> + ASM_CALL(pre \
> + paravirt_alt(PARAVIRT_CALL) \
> + post, \
> + ARGS(call_clbr), \
> + ARGS(paravirt_type(op), \
> + paravirt_clobber(clbr), \
> + ##__VA_ARGS__), \
> + ARGS("memory", "cc" extra_clbr)); \
> __ret = (rettype)(__eax & PVOP_RETMASK(rettype)); \
> } \
> __ret; \
> @@ -578,14 +576,14 @@ int paravirt_disable_iospace(void);
> ({ \
> PVOP_VCALL_ARGS; \
> PVOP_TEST_NULL(op); \
> - asm volatile(pre \
> - paravirt_alt(PARAVIRT_CALL) \
> - post \
> - : call_clbr, "+r" (__sp) \
> - : paravirt_type(op), \
> - paravirt_clobber(clbr), \
> - ##__VA_ARGS__ \
> - : "memory", "cc" extra_clbr); \
> + ASM_CALL(pre \
> + paravirt_alt(PARAVIRT_CALL) \
> + post, \
> + ARGS(call_clbr), \
> + ARGS(paravirt_type(op), \
> + paravirt_clobber(clbr), \
> + ##__VA_ARGS__), \
> + ARGS("memory", "cc" extra_clbr)); \
> })
>
> #define __PVOP_VCALL(op, pre, post, ...) \
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index ec1f3c651150..19a034cbde37 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -102,16 +102,14 @@ static __always_inline bool should_resched(int preempt_offset)
> extern asmlinkage void ___preempt_schedule(void);
> # define __preempt_schedule() \
> ({ \
> - register void *__sp asm(_ASM_SP); \
> - asm volatile ("call ___preempt_schedule" : "+r"(__sp)); \
> + ASM_CALL("call ___preempt_schedule",,); \
> })
>
> extern asmlinkage void preempt_schedule(void);
> extern asmlinkage void ___preempt_schedule_notrace(void);
> # define __preempt_schedule_notrace() \
> ({ \
> - register void *__sp asm(_ASM_SP); \
> - asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \
> + ASM_CALL("call ___preempt_schedule_notrace",,); \
> })
> extern asmlinkage void preempt_schedule_notrace(void);
> #endif
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 550bc4ab0df4..2bbfb777c8bb 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -677,20 +677,19 @@ static inline void sync_core(void)
> * Like all of Linux's memory ordering operations, this is a
> * compiler barrier as well.
> */
> - register void *__sp asm(_ASM_SP);
>
> #ifdef CONFIG_X86_32
> - asm volatile (
> + ASM_CALL(
> "pushfl\n\t"
> "pushl %%cs\n\t"
> "pushl $1f\n\t"
> "iret\n\t"
> - "1:"
> - : "+r" (__sp) : : "memory");
> + "1:",
> + , , "memory");
> #else
> unsigned int tmp;
>
> - asm volatile (
> + ASM_CALL(
> UNWIND_HINT_SAVE
> "mov %%ss, %0\n\t"
> "pushq %q0\n\t"
> @@ -702,8 +701,8 @@ static inline void sync_core(void)
> "pushq $1f\n\t"
> "iretq\n\t"
> UNWIND_HINT_RESTORE
> - "1:"
> - : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
> + "1:",
> + "=&r" (tmp), , ARGS("cc", "memory"));
> #endif
> }
>
> diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
> index a34e0d4b957d..9d20f4fb5d7c 100644
> --- a/arch/x86/include/asm/rwsem.h
> +++ b/arch/x86/include/asm/rwsem.h
> @@ -99,25 +99,24 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
> /*
> * lock for writing
> */
> -#define ____down_write(sem, slow_path) \
> -({ \
> - long tmp; \
> - struct rw_semaphore* ret; \
> - register void *__sp asm(_ASM_SP); \
> - \
> - asm volatile("# beginning down_write\n\t" \
> - LOCK_PREFIX " xadd %1,(%4)\n\t" \
> - /* adds 0xffff0001, returns the old value */ \
> - " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
> - /* was the active mask 0 before? */\
> - " jz 1f\n" \
> - " call " slow_path "\n" \
> - "1:\n" \
> - "# ending down_write" \
> - : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
> - : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
> - : "memory", "cc"); \
> - ret; \
> +#define ____down_write(sem, slow_path) \
> +({ \
> + long tmp; \
> + struct rw_semaphore* ret; \
> + \
> + ASM_CALL("# beginning down_write\n\t" \
> + LOCK_PREFIX " xadd %1,(%3)\n\t" \
> + /* adds 0xffff0001, returns the old value */ \
> + " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
> + /* was the active mask 0 before? */ \
> + " jz 1f\n" \
> + " call " slow_path "\n" \
> + "1:\n" \
> + "# ending down_write", \
> + ARGS("+m" (sem->count), "=d" (tmp), "=a" (ret)), \
> + ARGS("a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS)), \
> + ARGS("memory", "cc")); \
> + ret; \
> })
>
> static inline void __down_write(struct rw_semaphore *sem)
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 184eb9894dae..c7be076ee703 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -166,12 +166,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
> ({ \
> int __ret_gu; \
> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
> - register void *__sp asm(_ASM_SP); \
> __chk_user_ptr(ptr); \
> might_fault(); \
> - asm volatile("call __get_user_%P4" \
> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
> - : "0" (ptr), "i" (sizeof(*(ptr)))); \
> + ASM_CALL("call __get_user_%P3", \
> + ARGS("=a" (__ret_gu), "=r" (__val_gu)), \
> + ARGS("0" (ptr), "i" (sizeof(*(ptr))))); \
> (x) = (__force __typeof__(*(ptr))) __val_gu; \
> __builtin_expect(__ret_gu, 0); \
> })
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index b16f6a1d8b26..8b63e2cb1eaf 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -38,10 +38,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
> X86_FEATURE_REP_GOOD,
> copy_user_enhanced_fast_string,
> X86_FEATURE_ERMS,
> - ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
> - "=d" (len)),
> - "1" (to), "2" (from), "3" (len)
> - : "memory", "rcx", "r8", "r9", "r10", "r11");
> + ARGS("=a" (ret), "=D" (to), "=S" (from), "=d" (len)),
> + ARGS("1" (to), "2" (from), "3" (len)),
> + ARGS("memory", "rcx", "r8", "r9", "r10", "r11"));
> return ret;
> }
>
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index 11071fcd630e..2feddeeec1d1 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -114,9 +114,8 @@ extern struct { char _entry[32]; } hypercall_page[];
> register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
> register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
> register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
> - register void *__sp asm(_ASM_SP);
>
> -#define __HYPERCALL_0PARAM "=r" (__res), "+r" (__sp)
> +#define __HYPERCALL_0PARAM "=r" (__res)
> #define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1)
> #define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2)
> #define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
> @@ -146,10 +145,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_0ARG(); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_0PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER0); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_0PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER0); \
> (type)__res; \
> })
>
> @@ -157,10 +156,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_1ARG(a1); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_1PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER1); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_1PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER1); \
> (type)__res; \
> })
>
> @@ -168,10 +167,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_2ARG(a1, a2); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_2PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER2); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_2PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER2); \
> (type)__res; \
> })
>
> @@ -179,10 +178,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_3ARG(a1, a2, a3); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_3PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER3); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_3PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER3); \
> (type)__res; \
> })
>
> @@ -190,10 +189,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_4ARG(a1, a2, a3, a4); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_4PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER4); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_4PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER4); \
> (type)__res; \
> })
>
> @@ -201,10 +200,10 @@ extern struct { char _entry[32]; } hypercall_page[];
> ({ \
> __HYPERCALL_DECLS; \
> __HYPERCALL_5ARG(a1, a2, a3, a4, a5); \
> - asm volatile (__HYPERCALL \
> - : __HYPERCALL_5PARAM \
> - : __HYPERCALL_ENTRY(name) \
> - : __HYPERCALL_CLOBBER5); \
> + ASM_CALL(__HYPERCALL, \
> + __HYPERCALL_5PARAM, \
> + __HYPERCALL_ENTRY(name), \
> + __HYPERCALL_CLOBBER5); \
> (type)__res; \
> })
>
> @@ -218,10 +217,10 @@ privcmd_call(unsigned call,
> __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
>
> stac();
> - asm volatile("call *%[call]"
> - : __HYPERCALL_5PARAM
> - : [call] "a" (&hypercall_page[call])
> - : __HYPERCALL_CLOBBER5);
> + ASM_CALL("call *%[call]",
> + __HYPERCALL_5PARAM,
> + [call] "a" (&hypercall_page[call]),
> + __HYPERCALL_CLOBBER5);
> clac();
>
> return (long)__res;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index fb0055953fbc..4d0d326029a7 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -5284,16 +5284,15 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
>
> static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
> {
> - register void *__sp asm(_ASM_SP);
> ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
>
> if (!(ctxt->d & ByteOp))
> fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
>
> - asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
> - : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
> - [fastop]"+S"(fop), "+r"(__sp)
> - : "c"(ctxt->src2.val));
> + ASM_CALL("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n",
> + ARGS("+a"(ctxt->dst.val), "+d"(ctxt->src.val),
> + [flags]"+D"(flags), [fastop]"+S"(fop)),
> + ARGS("c"(ctxt->src2.val)));
>
> ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
> if (!fop) /* exception is returned in fop variable */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ffd469ecad57..d42806ad3c9c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8671,7 +8671,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
> static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> {
> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> - register void *__sp asm(_ASM_SP);
>
> if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
> == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
> @@ -8686,7 +8685,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
> desc = (gate_desc *)vmx->host_idt_base + vector;
> entry = gate_offset(*desc);
> - asm volatile(
> + ASM_CALL(
> #ifdef CONFIG_X86_64
> "mov %%" _ASM_SP ", %[sp]\n\t"
> "and $0xfffffffffffffff0, %%" _ASM_SP "\n\t"
> @@ -8695,16 +8694,14 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> #endif
> "pushf\n\t"
> __ASM_SIZE(push) " $%c[cs]\n\t"
> - "call *%[entry]\n\t"
> - :
> + "call *%[entry]\n\t",
> #ifdef CONFIG_X86_64
> - [sp]"=&r"(tmp),
> + [sp]"=&r"(tmp)
> #endif
> - "+r"(__sp)
> - :
> - [entry]"r"(entry),
> - [ss]"i"(__KERNEL_DS),
> - [cs]"i"(__KERNEL_CS)
> + ,
> + ARGS([entry]"r"(entry),
> + [ss]"i"(__KERNEL_DS),
> + [cs]"i"(__KERNEL_CS))
> );
> }
> }
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2a1fa10c6a98..ca642ac3a390 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -802,7 +802,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> if (is_vmalloc_addr((void *)address) &&
> (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
> address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
> - register void *__sp asm("rsp");
> unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
> /*
> * We're likely to be running with very little stack space
> @@ -814,13 +813,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> * and then double-fault, though, because we're likely to
> * break the console driver and lose most of the stack dump.
> */
> - asm volatile ("movq %[stack], %%rsp\n\t"
> - "call handle_stack_overflow\n\t"
> - "1: jmp 1b"
> - : "+r" (__sp)
> - : "D" ("kernel stack overflow (page fault)"),
> - "S" (regs), "d" (address),
> - [stack] "rm" (stack));
> + ASM_CALL("movq %[stack], %%rsp\n\t"
> + "call handle_stack_overflow\n\t"
> + "1: jmp 1b",
> + ,
> + ARGS("D" ("kernel stack overflow (page fault)"),
> + "S" (regs), "d" (address), [stack] "rm" (stack)));
> unreachable();
> }
> #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 3f8c88e29a46..3a57f32a0bfd 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -589,4 +589,18 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
> (_________p1); \
> })
>
> +#define ARGS(args...) args
> +
> +#ifdef CONFIG_FRAME_POINTER
> +
> +#define ASM_CALL(str, outputs, inputs, clobbers...) \
> + asm volatile(str : outputs : inputs : "sp", ## clobbers)
> +
> +#else
> +
> +#define ASM_CALL(str, outputs, inputs, clobbers...) \
> + asm volatile(str : outputs : inputs : clobbers);
> +
> +#endif
> +
> #endif /* __LINUX_COMPILER_H */
> diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
> index 6a1af43862df..766a00ebf80c 100644
> --- a/tools/objtool/Documentation/stack-validation.txt
> +++ b/tools/objtool/Documentation/stack-validation.txt
> @@ -193,13 +193,8 @@ they mean, and suggestions for how to fix them.
>
> If it's a GCC-compiled .c file, the error may be because the function
> uses an inline asm() statement which has a "call" instruction. An
> - asm() statement with a call instruction must declare the use of the
> - stack pointer in its output operand. For example, on x86_64:
> -
> - register void *__sp asm("rsp");
> - asm volatile("call func" : "+r" (__sp));
> -
> - Otherwise the stack frame may not get created before the call.
> + asm() statement with a call instruction must use the ASM_CALL macro,
> + which forces the frame pointer to be saved before the call.
>
>
> 2. file.o: warning: objtool: .text+0x53: unreachable instruction
> @@ -221,7 +216,7 @@ they mean, and suggestions for how to fix them.
> change ENDPROC to END.
>
>
> -4. file.o: warning: objtool: func(): can't find starting instruction
> +3. file.o: warning: objtool: func(): can't find starting instruction
> or
> file.o: warning: objtool: func()+0x11dd: can't decode instruction
>
> @@ -230,7 +225,7 @@ they mean, and suggestions for how to fix them.
> section like .data or .rodata.
>
>
> -5. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
> +4. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
>
> This is a kernel entry/exit instruction like sysenter or iret. Such
> instructions aren't allowed in a callable function, and are most
> @@ -239,7 +234,7 @@ they mean, and suggestions for how to fix them.
> annotated with the unwind hint macros in asm/unwind_hints.h.
>
>
> -6. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
> +5. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
>
> This is a dynamic jump or a jump to an undefined symbol. Objtool
> assumed it's a sibling call and detected that the frame pointer
> @@ -253,7 +248,7 @@ they mean, and suggestions for how to fix them.
> the unwind hint macros in asm/unwind_hints.h.
>
>
> -7. file: warning: objtool: func()+0x5c: stack state mismatch
> +6. file: warning: objtool: func()+0x5c: stack state mismatch
>
> The instruction's frame pointer state is inconsistent, depending on
> which execution path was taken to reach the instruction.
> @@ -270,7 +265,7 @@ they mean, and suggestions for how to fix them.
> asm/unwind_hints.h.
>
>
> -8. file.o: warning: objtool: funcA() falls through to next function funcB()
> +7. file.o: warning: objtool: funcA() falls through to next function funcB()
>
> This means that funcA() doesn't end with a return instruction or an
> unconditional jump, and that objtool has determined that the function