2018-11-15 20:52:32

by Alexander Popov

[permalink] [raw]
Subject: Re: Investigating a stack state mismatch in Linux kernel

On 15.11.2018 13:24, Florian Weimer wrote:
> * Alexander Popov:
>
>> Of course, there is a naive solution for this issue -- just skip stackleak
>> instrumentation for acpi_duplicate_processor_id(). But it would be great to find
>> out the reasons behind this compiler behavior. It might help to create a better
>> solution.
>
> Please show us the RTL dumps with both compilers, both before and after
> the plugin pass.

Thanks a lot for your reply, Florian!

I have more information to share.

I attach the list of gcc passes. The 'rtl-stackleak_cleanup' runs after the
'rtl-reload' pass.

I attach RTL dumps both for gcc-5 and gcc-7 for:
- 'rtl-reload' pass,
- 'rtl-stackleak_cleanup' pass,
- 'rtl-pro_and_epilogue' pass.

I've found out that for gcc-5:
- if I put the 'rtl-stackleak_cleanup' pass (deleting CALL insn) *before* the
'rtl-pro_and_epilogue' pass, objtool reports about stack state mismatch in
acpi_duplicate_processor_id();
- if I put the 'rtl-stackleak_cleanup' pass *after* the 'rtl-pro_and_epilogue'
pass, objtool *doesn't* report about stack state mismatch.

So gcc-5 does some mistake during the 'rtl-pro_and_epilogue' pass. And gcc-7
doesn't have this issue.

In the original grsecurity code the stackleak RTL pass was registered just
before the 'rtl-final' pass. Some time ago Richard Sandiford noted that:

>>> This might be too late, since it happens e.g. after addresses have
>>> been calculated for branch ranges, and after machine-specific passes
>>> (e.g. bundling on ia64).
>>>
>>> The stack size is final after reload, so inserting the pass after that
>>> might be better.

https://lore.kernel.org/patchwork/patch/879912/


So what is the best moment when we know the stack frame size and can safely
delete the CALL insn using delete_insn_and_edges()?

Thanks!

Best regards,
Alexander


Attachments:
gcc5-after-stackleak_cleanup.txt (7.89 kB)
gcc5-after-reload.txt (16.60 kB)
gcc5-after-pro_and_epilogue.txt (10.08 kB)
gcc7-after-stackleak_cleanup.txt (8.00 kB)
gcc7-after-reload.txt (18.76 kB)
gcc7-after-pro_and_epilogue.txt (10.93 kB)
gcc-passes.txt (20.23 kB)
Download all attachments

2018-11-20 23:11:05

by Alexander Popov

[permalink] [raw]
Subject: Re: Investigating a stack state mismatch in Linux kernel

Hello everyone!

At irc.freenode.org/#gcc people told me that I should CC [email protected] to get
some attention of gcc developers.

Link to previous discussion:
https://www.openwall.com/lists/kernel-hardening/2018/11/14/1


On 15.11.2018 23:51, Alexander Popov wrote:
> In the original grsecurity code the stackleak RTL pass was registered just
> before the 'rtl-final' pass. Some time ago Richard Sandiford noted that:
>
>>>> This might be too late, since it happens e.g. after addresses have
>>>> been calculated for branch ranges, and after machine-specific passes
>>>> (e.g. bundling on ia64).
>>>>
>>>> The stack size is final after reload, so inserting the pass after that
>>>> might be better.
>
> https://lore.kernel.org/patchwork/patch/879912/
>
> So what is the best moment when we know the stack frame size and can safely
> delete the CALL insn using delete_insn_and_edges()?

At irc.oftc.net/#gcc Segher (kudos to him!) confirmed that 'final' pass is too
late for this and proposed registering 'stackleak_cleanup' pass before
'machine_reorg' pass. It's the moment when the stack frame size is already final
and function prologues and epilogues are generated. That would also fit
Richard's concerns.

It looks reasonable -- that's what gcc/target.def says about
machine_dependent_reorg() hook, which is called during 'machine_reorg' pass:

"If non-null, this hook performs a target-specific pass over the
instruction stream. The compiler will run it at all optimization levels,
just before the point at which it normally does delayed-branch scheduling.
The exact purpose of the hook varies from target to target. Some use
it to do transformations that are necessary for correctness, such as
laying out in-function constant pools or avoiding hardware hazards.
Others use it as an opportunity to do some machine-dependent optimizations".


So I would appreciate any comments on the following solution:

diff --git a/scripts/gcc-plugins/stackleak_plugin.c
b/scripts/gcc-plugins/stackleak_plugin.c
index 2f48da9..6f41b32 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -363,10 +363,12 @@ __visible int plugin_init(struct plugin_name_args
*plugin_info,
PASS_POS_INSERT_BEFORE);

/*
- * The stackleak_cleanup pass should be executed after the
- * "reload" pass, when the stack frame size is final.
+ * The stackleak_cleanup pass should be executed before the "mach"
+ * pass, which performs the machine dependent code transformations.
+ * It's the moment when the stack frame size is already final and
+ * function prologues and epilogues are generated.
*/
- PASS_INFO(stackleak_cleanup, "reload", 1, PASS_POS_INSERT_AFTER);
+ PASS_INFO(stackleak_cleanup, "mach", 1, PASS_POS_INSERT_BEFORE);

if (!plugin_default_version_check(version, &gcc_version)) {
error(G_("incompatible gcc/plugin versions"));

2018-11-24 08:36:37

by Alexander Monakov

[permalink] [raw]
Subject: Re: Investigating a stack state mismatch in Linux kernel

Hi,

On Wed, 21 Nov 2018, Alexander Popov wrote:

> Hello everyone!
>
> At irc.freenode.org/#gcc people told me that I should CC [email protected] to get
> some attention of gcc developers.
>
> Link to previous discussion:
> https://www.openwall.com/lists/kernel-hardening/2018/11/14/1

So just for information, it isn't a plugin bug. I'll elaborate below, but in
short, objtool was complaining about unreachable code. To be fair though,
gcc could have made the problem easier to spot.

Now for the details.

The repro uses an unusual (randomized) kernel config, in particular gcov is
enabled and NR_CPUS is 1. Thus we have

static int duplicate_processor_ids[] = {
[0 ... 1 - 1] = -1,
};

and the compiler deduces that the loop in acpi_duplicate_processor_id does not
iterate:

bool acpi_duplicate_processor_id(int proc_id)
{
int i;

for (i = 0; i < nr_duplicate_ids; i++) {
if (duplicate_processor_ids[i] == proc_id)
return true;
}
return false;
}

However as gcov is enabled, loop backedge remains, followed by gcov counter
increment and __builtin_unreachable() (on GIMPLE). On RTL, however, the basic
block with the increment ends with a barrier rtx, which is not visible in
assembly and looks just as if control flow falls through to a function exit.
Since the exit in question corresponds to a shrink-wrapped early exit that
does not push/pop registers, objtool complains.

Ultimately this is poor luck, if gcc optimized the code better or terminated
the block that-should-not-be-reached with ud2, it would not arise.

Alexander