On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
> If uboot passes a blank string to console_setup then it results in a trashed memory.
> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
> is a blank parameter being passed to console_setup from uboot.
> In case it detects that the console parameter is blank then
> it doesn't setup the serial device and it gracefully exits.
>
> Signed-off-by: Shreyas Joshi <[email protected]>
> ---
> V1:
> Fixed console_loglevel to default as per the review comments
>
> kernel/printk/printk.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ad4606234545..e9ad730991e0 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2165,7 +2165,10 @@ static int __init console_setup(char *str)
>  	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
>  	char *s, *options, *brl_options = NULL;
>  	int idx;
> -
> +	if (str[0] == 0) {
> +		return 1;
> +	}
>  	if (_braille_console_setup(&str, &brl_options))
>  		return 1;
I have fixed formatting and pushed it into printk/linux.git,
branch for-5.8.
Best Regards,
Petr
Cc-ing Guenter,
On (20/05/22 12:00), Petr Mladek wrote:
> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
> > If uboot passes a blank string to console_setup then it results in a trashed memory.
> > Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
> > is a blank parameter being passed to console_setup from uboot.
> > In case it detects that the console parameter is blank then
> > it doesn't setup the serial device and it gracefully exits.
> >
> > Signed-off-by: Shreyas Joshi <[email protected]>
> > ---
> > V1:
> > Fixed console_loglevel to default as per the review comments
> >
> > kernel/printk/printk.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ad4606234545..e9ad730991e0 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2165,7 +2165,10 @@ static int __init console_setup(char *str)
> >  	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
> >  	char *s, *options, *brl_options = NULL;
> >  	int idx;
> > -
> > +	if (str[0] == 0) {
> > +		return 1;
> > +	}
> >  	if (_braille_console_setup(&str, &brl_options))
> >  		return 1;
>
> I have fixed formatting and pushed it into printk/linux.git,
> branch for-5.8.
Petr, this patch's causing regressions for us. We use blank console= boot
param to bypass dts. It appears that it'd be better to revert the change.
-ss
On 10/5/20 7:59 PM, Sergey Senozhatsky wrote:
> Cc-ing Guenter,
>
> On (20/05/22 12:00), Petr Mladek wrote:
>> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
>>> If uboot passes a blank string to console_setup then it results in a trashed memory.
>>> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
>>> is a blank parameter being passed to console_setup from uboot.
>>> In case it detects that the console parameter is blank then
>>> it doesn't setup the serial device and it gracefully exits.
>>>
>>> Signed-off-by: Shreyas Joshi <[email protected]>
>>> ---
>>> V1:
>>> Fixed console_loglevel to default as per the review comments
>>>
>>> kernel/printk/printk.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>> index ad4606234545..e9ad730991e0 100644
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> @@ -2165,7 +2165,10 @@ static int __init console_setup(char *str)
>>>  	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
>>>  	char *s, *options, *brl_options = NULL;
>>>  	int idx;
>>> -
>>> +	if (str[0] == 0) {
>>> +		return 1;
>>> +	}
>>>  	if (_braille_console_setup(&str, &brl_options))
>>>  		return 1;
>>
>> I have fixed formatting and pushed it into printk/linux.git,
>> branch for-5.8.
>
> Petr, this patch's causing regressions for us. We use blank console= boot
> param to bypass dts. It appears that it'd be better to revert the change.
>
Not just to bypass dts, it was also possible to use console= to disable consoles
passed as config option, as well as other default console options. A quick test
confirms that this affects all platforms/architectures, not just Chromebooks.
Prior to this patch, it was possible to disable a default console with an
empty "console=" parameter. This is no longer possible. This means that
this patch results in a substantial (and, as far as I can see, completely
undiscussed) functionality change.
I don't understand why (yet), but the patch also causes regressions with
seemingly unrelated functionality, specifically with dm-verity on at least
one Chromebook platform. I filed crbug.com/1135157 to track the problem,
and reverted the patch from all our stable releases immediately after
the last round of stable release merges.
On a side note, I don't see the problem presumably fixed with this
patch in any of my tests.
Guenter
On Mon, Oct 05, 2020 at 08:35:59PM -0700, Guenter Roeck wrote:
> On 10/5/20 7:59 PM, Sergey Senozhatsky wrote:
> > Cc-ing Guenter,
> >
> > On (20/05/22 12:00), Petr Mladek wrote:
> >> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
> >>> If uboot passes a blank string to console_setup then it results in a trashed memory.
> >>> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
> >>> is a blank parameter being passed to console_setup from uboot.
> >>> In case it detects that the console parameter is blank then
> >>> it doesn't setup the serial device and it gracefully exits.
> >>>
> >>> Signed-off-by: Shreyas Joshi <[email protected]>
> >>> ---
> >>> V1:
> >>> Fixed console_loglevel to default as per the review comments
> >>>
> >>> kernel/printk/printk.c | 5 ++++-
> >>> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >>> index ad4606234545..e9ad730991e0 100644
> >>> --- a/kernel/printk/printk.c
> >>> +++ b/kernel/printk/printk.c
> >>> @@ -2165,7 +2165,10 @@ static int __init console_setup(char *str)
> >>>  	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
> >>>  	char *s, *options, *brl_options = NULL;
> >>>  	int idx;
> >>> -
> >>> +	if (str[0] == 0) {
> >>> +		return 1;
> >>> +	}
> >>>  	if (_braille_console_setup(&str, &brl_options))
> >>>  		return 1;
> >>
> >> I have fixed formatting and pushed it into printk/linux.git,
> >> branch for-5.8.
> >
> > Petr, this patch's causing regressions for us. We use blank console= boot
> > param to bypass dts. It appears that it'd be better to revert the change.
> >
>
> Not just to bypass dts, it was also possible to use console= to disable consoles
> passed as config option, as well as other default console options. A quick test
> confirms that this affects all platforms/architectures, not just Chromebooks.
> Prior to this patch, it was possible to disable a default console with an
> empty "console=" parameter. This is no longer possible. This means that
> this patch results in a substantial (and, as far as I can see, completely
> undiscussed) functionality change.
>
> I don't understand why (yet), but the patch also causes regressions with
> seemingly unrelated functionality, specifically with dm-verity on at least
> one Chromebook platform. I filed crbug.com/1135157 to track the problem,
> and reverted the patch from all our stable releases immediately after
> the last round of stable release merges.
>
> On a side note, I don't see the problem presumably fixed with this
> patch in any of my tests.
I have no problem reverting this in the stable trees, but are you going
to hit this issue in Linus's tree in the next release?
thanks,
greg k-h
On (20/10/05 20:35), Guenter Roeck wrote:
> On a side note, I don't see the problem presumably fixed with this
> patch in any of my tests.
Hmm. This is rather interesting. Empty console= certainly oops-es my laptop,
but not the cros board I just tested this on. Do we carry around any chromeos
patches that may affect the parsing of the kernel boot command line?
-ss
On Tue 2020-10-06 15:59:07, Sergey Senozhatsky wrote:
> On (20/10/05 20:35), Guenter Roeck wrote:
> > On a side note, I don't see the problem presumably fixed with this
> > patch in any of my tests.
>
> Hmm. This is rather interesting. Empty console= certainly oops-es my laptop,
Just by chance. Do you have any log with the Oops? Or does it die
silently?
Best Regards,
Petr
On Mon 2020-10-05 20:35:59, Guenter Roeck wrote:
> On 10/5/20 7:59 PM, Sergey Senozhatsky wrote:
> > On (20/05/22 12:00), Petr Mladek wrote:
> >> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
> >>> If uboot passes a blank string to console_setup then it results in a trashed memory.
> >>> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
> >>> is a blank parameter being passed to console_setup from uboot.
> >>> In case it detects that the console parameter is blank then
> >>> it doesn't setup the serial device and it gracefully exits.
> >>>
> > Petr, this patch's causing regressions for us. We use blank console= boot
> > param to bypass dts. It appears that it'd be better to revert the change.
> >
> Not just to bypass dts, it was also possible to use console= to disable consoles
> passed as config option, as well as other default console options. A quick test
> confirms that this affects all platforms/architectures, not just Chromebooks.
> Prior to this patch, it was possible to disable a default console with an
> empty "console=" parameter. This is no longer possible. This means that
> this patch results in a substantial (and, as far as I can see, completely
> undiscussed) functionality change.
Where is this behavior documented, please?
I do not see it anywhere (documentation, git log, google) and it is far from
obvious from the code. It seems that any random string would do the
same job, e.g. console=none.
Of course, we need to restore the original behavior when it breaks
existing systems. But I want to be sure that there is no better
solution.
And it makes perfect sense to disable all consoles or drop all defined
by dts. But I would prefer to do it in a more obvious way, for
example by parameters like:
+ console=none
+ no-console
+ no-dtd-console
+ no-default-console
JFYI, the console= parameter handling is a real historical mess. We are
always surprised by what undefined behavior people depend on. For
example, see:
+ commit 33225d7b0ac9903c5701b ("printk: Correctly set CON_CONSDEV
even when preferred console was not registered")
+ commit e369d8227fd211be3624 ("printk: Fix preferred console
selection with multiple matches")
> I don't understand why (yet), but the patch also causes regressions with
> seemingly unrelated functionality, specifically with dm-verity on at least
> one Chromebook platform. I filed crbug.com/1135157 to track the problem,
> and reverted the patch from all our stable releases immediately after
> the last round of stable release merges.
>
> On a side note, I don't see the problem presumably fixed with this
> patch in any of my tests.
Console drivers might provide a custom match() callback to handle
various aliases. I guess that some driver wrongly matches the empty
string stored in the array of preferred consoles.
There are likely other ways to fix the original problem.
Best Regards,
Petr
On 10/6/20 2:52 AM, Petr Mladek wrote:
> On Mon 2020-10-05 20:35:59, Guenter Roeck wrote:
>> On 10/5/20 7:59 PM, Sergey Senozhatsky wrote:
>>> On (20/05/22 12:00), Petr Mladek wrote:
>>>> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
>>>>> If uboot passes a blank string to console_setup then it results in a trashed memory.
>>>>> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
>>>>> is a blank parameter being passed to console_setup from uboot.
>>>>> In case it detects that the console parameter is blank then
>>>>> it doesn't setup the serial device and it gracefully exits.
>>>>>
>>> Petr, this patch's causing regressions for us. We use blank console= boot
>>> param to bypass dts. It appears that it'd be better to revert the change.
>>>
>> Not just to bypass dts, it was also possible to use console= to disable consoles
>> passed as config option, as well as other default console options. A quick test
>> confirms that this affects all platforms/architectures, not just Chromebooks.
>> Prior to this patch, it was possible to disable a default console with an
>> empty "console=" parameter. This is no longer possible. This means that
>> this patch results in a substantial (and, as far as I can see, completely
>> undiscussed) functionality change.
>
> Where is this behavior documented, please?
>
I don't know. I didn't find it either. All I know is that Chromebooks
apparently used it from day 1 to disable the console, and I always thought
it was official behavior until I stumbled over the problem last weekend,
tried to look it up, and failed to find it.
> I do not see it anywhere (documentation, git log, google) and it is far from
> obvious from the code. It seems that any random string would do the
> same job, e.g. console=none.
>
Agreed about the "far from obvious". From looking at the code, it seems like
an unintended (?) side effect to me.
> Of course, we need to restore the original behavior when it breaks
> existing systems. But I want to be sure that there is no better
> solution.
>
> And it makes perfect sense to disable all consoles or drop all defined
> by dts. But I would prefer to make it more obvious way, for
> example by parameters like:
>
> + console=none
> + no-console
> + no-dtd-console
> + no-default-console
>
Again, the problem isn't limited to dts provided consoles, or at least
that was my understanding. I am still trying to understand how default
consoles are defined, so I may get something wrong. Anyway, personally I
liked "console=", but that is just me. Anything else should work for us
as long as it is backward compatible (which excludes the no-xxx options).
Whatever is decided, I'd like to have it made official and documented to
avoid a similar problem in the future.
>
> JFYI, the console= parameter handling is a real historical mess. We are
> always surprised what undefined behavior people depend on. For
> example, see:
>
> + commit 33225d7b0ac9903c5701b ("printk: Correctly set CON_CONSDEV
> even when preferred console was not registered")
>
> + commit e369d8227fd211be3624 ("printk: Fix preferred console
> selection with multiple matches")
>
>> I don't understand why (yet), but the patch also causes regressions with
>> seemingly unrelated functionality, specifically with dm-verity on at least
>> one Chromebook platform. I filed crbug.com/1135157 to track the problem,
>> and reverted the patch from all our stable releases immediately after
>> the last round of stable release merges.
>>
>> On a side note, I don't see the problem presumably fixed with this
>> patch in any of my tests.
>
> Console drivers might provide a custom match() callback to handle
> various aliases. I guess that some driver wrongly matches the empty
> string stored in the array of preferred consoles.
>
That might well be. Obviously no Chromebook ever had a problem with it.
I'll keep trying; maybe I can find a qemu emulation that crashes with it.
Unfortunately we don't have a traceback, so it is difficult to determine
what actually caused the problem. Maybe Sergey can provide one.
> There are likely other ways to fix the original problem.
>
Most definitely. In either case, again, I'd like to make sure that we get
some official means to disable a pre-configured console using the
command line.
Thanks,
Guenter
On 10/5/20 10:08 PM, Greg Kroah-Hartman wrote:
> On Mon, Oct 05, 2020 at 08:35:59PM -0700, Guenter Roeck wrote:
>> On 10/5/20 7:59 PM, Sergey Senozhatsky wrote:
>>> Cc-ing Guenter,
>>>
>>> On (20/05/22 12:00), Petr Mladek wrote:
>>>> On Fri 2020-05-22 16:53:06, Shreyas Joshi wrote:
>>>>> If uboot passes a blank string to console_setup then it results in a trashed memory.
>>>>> Ultimately, the kernel crashes during freeing up the memory. This fix checks if there
>>>>> is a blank parameter being passed to console_setup from uboot.
>>>>> In case it detects that the console parameter is blank then
>>>>> it doesn't setup the serial device and it gracefully exits.
>>>>>
>>>>> Signed-off-by: Shreyas Joshi <[email protected]>
>>>>> ---
>>>>> V1:
>>>>> Fixed console_loglevel to default as per the review comments
>>>>>
>>>>> kernel/printk/printk.c | 5 ++++-
>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>>>> index ad4606234545..e9ad730991e0 100644
>>>>> --- a/kernel/printk/printk.c
>>>>> +++ b/kernel/printk/printk.c
>>>>> @@ -2165,7 +2165,10 @@ static int __init console_setup(char *str)
>>>>>  	char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
>>>>>  	char *s, *options, *brl_options = NULL;
>>>>>  	int idx;
>>>>> -
>>>>> +	if (str[0] == 0) {
>>>>> +		return 1;
>>>>> +	}
>>>>>  	if (_braille_console_setup(&str, &brl_options))
>>>>>  		return 1;
>>>>
>>>> I have fixed formatting and pushed it into printk/linux.git,
>>>> branch for-5.8.
>>>
>>> Petr, this patch's causing regressions for us. We use blank console= boot
>>> param to bypass dts. It appears that it'd be better to revert the change.
>>>
>>
>> Not just to bypass dts, it was also possible to use console= to disable consoles
>> passed as config option, as well as other default console options. A quick test
>> confirms that this affects all platforms/architectures, not just Chromebooks.
>> Prior to this patch, it was possible to disable a default console with an
>> empty "console=" parameter. This is no longer possible. This means that
>> this patch results in a substantial (and, as far as I can see, completely
>> undiscussed) functionality change.
>>
>> I don't understand why (yet), but the patch also causes regressions with
>> seemingly unrelated functionality, specifically with dm-verity on at least
>> one Chromebook platform. I filed crbug.com/1135157 to track the problem,
>> and reverted the patch from all our stable releases immediately after
>> the last round of stable release merges.
>>
>> On a side note, I don't see the problem presumably fixed with this
>> patch in any of my tests.
>
> I have no problem reverting this in the stable trees, but are you going
> to hit this issue in Linus's tree in the next release?
>
Not sure what you mean by "next release". As mentioned, I already reverted
the patch from all Chrome OS stable branches. We have already seen the problem
in the top-of-tree test branch (which is presumably why Sergey brought it up
back in May), so we'll definitely have to either revert this patch in the next
Chrome OS stable branch (presumably based on 5.10 unless that changes), or
we'll have to find some other (backward-compatible) solution to disable
the default console on Chromebooks.
Since the patch is already reverted in our branches, it is not an urgent
problem for us. But we will need some solution - I really don't want to
carry reverts of upstream patches in our trees.
Guenter
On (20/10/06 11:54), Petr Mladek wrote:
> On Tue 2020-10-06 15:59:07, Sergey Senozhatsky wrote:
> > On (20/10/05 20:35), Guenter Roeck wrote:
> > > On a side note, I don't see the problem presumably fixed with this
> > > patch in any of my tests.
> >
> > Hmm. This is rather interesting. Empty console= certainly oops-es my laptop,
>
> Just by chance. Do you have any log with the Oops? Or does it die
> silently?
The laptop in question has fbdev and no serial. It dies with a blank screen.
I'll try to dig into it and get some backtrace or anything useful.
-ss
On Tue 2020-10-06 03:45:00, Guenter Roeck wrote:
> On 10/6/20 2:52 AM, Petr Mladek wrote:
> > And it makes perfect sense to disable all consoles or drop all defined
> > by dts. But I would prefer to make it more obvious way, for
> > example by parameters like:
> >
> > + console=none
> > + no-console
> > + no-dtd-console
> > + no-default-console
> >
> Again, the problem isn't limited to dts provided consoles, or at least
> that was my understanding. I am still trying to understand how default
> consoles are defined, so I may get something wrong. Anyway, personally I
> liked "console=", but that is just me. Anything else should work for us
> as long as it is backward compatible (which excludes the no-xxx options).
Here is my understanding:
The consoles can be defined by SPCR, dts, and on the command line, that is,
by anyone calling add_preferred_console().

Then the various devices call register_console(). They are registered
only when they match any console in the console_cmdline[] array, see
try_enable_new_console().

The only exception is when the array is empty (or only a braille console
was added). Then the first console with a tty binding is registered.
This special case is done by the following code in register_console():

	/*
	 * See if we want to use this console driver. If we
	 * didn't select a console we take the first one
	 * that registers here.
	 */
	if (!has_preferred_console) {
		if (newcon->index < 0)
			newcon->index = 0;
		if (newcon->setup == NULL ||
		    newcon->setup(newcon, NULL) == 0) {
			newcon->flags |= CON_ENABLED;
			if (newcon->device) {
				newcon->flags |= CON_CONSDEV;
				has_preferred_console = true;
			}
		}
	}

> Whatever is decided, I'd like to have it made official and documented to
> avoid a similar problem in the future.
Sure. I am going to play with the code. I would prefer to avoid
introducing back the crash that was solved by the patch.
If the change is simple, we could use it. If not, we should just
revert the problematic patch and come up with something better
for 5.10 or later.

We need to be careful because the behavior is not defined. It seems
that many people actually also use console=null for this purpose, see
https://www.programmersought.com/article/19374022450/
https://developer.toradex.com/knowledge-base/how-to-disable-enable-debug-messages-in-linux
https://unix.stackexchange.com/questions/117926/try-to-disable-console-output-console-null-doesnt-work
Best Regards,
Petr
On 10/6/20 6:33 AM, Sergey Senozhatsky wrote:
> On (20/10/06 11:54), Petr Mladek wrote:
>> On Tue 2020-10-06 15:59:07, Sergey Senozhatsky wrote:
>>> On (20/10/05 20:35), Guenter Roeck wrote:
>>>> On a side note, I don't see the problem presumably fixed with this
>>>> patch in any of my tests.
>>>
>>> Hmm. This is rather interesting. Empty console= certainly oops-es my laptop,
>>
>> Just by chance. Do you have any log with the Oops? Or does it die
>> silently?
>
> The laptop in question has fbdev and no serial. It dies with blank screen.
> I'll try to dig it and get some backtrace or anything useful.
>
Some versions of systemd (and possibly other distributions) apparently
react badly if no console is present. See [1]. Maybe that is what
happens with your laptop?

That exchange leads to the question of what should be done with /dev/console
if there is no console. On Chromebooks we see an error when trying
to open it, if I recall correctly.
Guenter
---
[1] https://github.com/systemd/systemd/issues/13332
On (20/10/06 07:22), Guenter Roeck wrote:
> On 10/6/20 6:33 AM, Sergey Senozhatsky wrote:
> > On (20/10/06 11:54), Petr Mladek wrote:
> >> On Tue 2020-10-06 15:59:07, Sergey Senozhatsky wrote:
> >>> On (20/10/05 20:35), Guenter Roeck wrote:
> >>>> On a side note, I don't see the problem presumably fixed with this
> >>>> patch in any of my tests.
> >>>
> >>> Hmm. This is rather interesting. Empty console= certainly oops-es my laptop,
> >>
> >> Just by chance. Do you have any log with the Oops? Or does it die
> >> silently?
> >
> > The laptop in question has fbdev and no serial. It dies with blank screen.
> > I'll try to dig it and get some backtrace or anything useful.
> >
>
> Some versions of systemd (and possibly other distributions) apparently
> react allergic if no console is present. See [1]. Maybe that is what
> happens with your laptop ?
Seems to be crashing before /init
> That exchange leads to the question what should be done with /dev/console
> if there is no console. On Chromebooks we see an error when trying
> to open it if I recall correctly.
A wild guess:
Devices that you test, do they have 'blah blah console= blah blah'
command line? If so, does anything change if you revert the patch in
question and change kernel boot command line to 'blah blah console='?
-ss
On Tue 2020-10-06 15:43:28, Petr Mladek wrote:
> On Tue 2020-10-06 03:45:00, Guenter Roeck wrote:
> > On 10/6/20 2:52 AM, Petr Mladek wrote:
> > > And it makes perfect sense to disable all consoles or drop all defined
> > > by dts. But I would prefer to make it more obvious way, for
> > > example by parameters like:
> > >
> > > + console=none
> > > + no-console
> > > + no-dtd-console
> > > + no-default-console
> > >
> > Again, the problem isn't limited to dts provided consoles, or at least
> > that was my understanding. I am still trying to understand how default
> > consoles are defined, so I may get something wrong. Anyway, personally I
> > liked "console=", but that is just me. Anything else should work for us
> > as long as it is backward compatible (which excludes the no-xxx options).
>
> Here is my understanding:
>
> The consoles can be defined by SPCR, dts, and on the command line, that is,
> by anyone calling add_preferred_console().
>
> Then the various devices call register_console(). They are registered
> only when they match any console in console_cmdline[] array, see
> try_enable_new_console().
>
> > Whatever is decided, I'd like to have it made official and documented to
> > avoid a similar problem in the future.
Sigh, it is an even bigger mess than I expected. There is a magic
variable "console_set_on_cmdline". It is used, for example, in
of_console_check() to prevent using the default console from dts.
It is used in a few more places to prevent the default console.
But there are other places where add_preferred_console() is
called without checking this variable.

As a result, "console=" has a different effect on different systems.
I tend to revert the problematic patch now.
And I would try to clean up this mess for-5.11. There is a big chance
that people used the empty console= only on systems where it disabled
all default consoles. I would try to make it the official global
behavior. But this would need some longer testing in linux-next, ...
Best Regards,
Petr
On (20/10/06 18:35), Petr Mladek wrote:
> > > Whatever is decided, I'd like to have it made official and documented to
> > > avoid a similar problem in the future.
>
> Sigh, it is an even bigger mess than I expected. There is a magic
> variable "console_set_on_cmdline". It is used, for example, in
> of_console_check() to prevent using the default console from dts.
I wonder if we can do something like:
---
@@ -2200,6 +2200,9 @@ static int __init console_setup(char *str)
 	char *s, *options, *brl_options = NULL;
 	int idx;
 
 	if (str[0] == 0) {
+		console_set_on_cmdline = 1;
 		return 1;
 	}
 	if (_braille_console_setup(&str, &brl_options))
 		return 1;
---
-ss
On Wed 2020-10-07 02:15:04, Sergey Senozhatsky wrote:
> On (20/10/06 18:35), Petr Mladek wrote:
> > > > Whatever is decided, I'd like to have it made official and documented to
> > > > avoid a similar problem in the future.
> >
> > Sigh, it is an even bigger mess than I expected. There is a magic
> > variable "console_set_on_cmdline". It is used, for example, in
> > of_console_check() to prevent using the default console from dts.
>
> I wonder if we can do something like:
>
> ---
> @@ -2200,6 +2200,9 @@ static int __init console_setup(char *str)
>  	char *s, *options, *brl_options = NULL;
>  	int idx;
>
>  	if (str[0] == 0) {
> +		console_set_on_cmdline = 1;
Unfortunately, this is not enough. We will also need to prevent
enabling the fallback console when has_preferred_console is not set.
The following might work:
		/*
		 * Dirty hack to prevent using any console with tty
		 * binding as a fallback and adding the empty
		 * name into console_cmdline array.
		 */
		preferred_console = MAX_CMDLINECONSOLES;
>  		return 1;
>  	}
It might be the minimal change that would fix the regression and keep
the original fix. But it would make the code even more hairy.
It might be acceptable as a hotfix. But we really should somehow clean
up the code and try to make the behavior more consistent.
Best Regards,
Petr
On (20/10/07 09:28), Petr Mladek wrote:
>
> 		/*
> 		 * Dirty hack to prevent using any console with tty
> 		 * binding as a fallback and adding the empty
> 		 * name into console_cmdline array.
> 		 */
> 		preferred_console = MAX_CMDLINECONSOLES;
Let me dump my findings so far. I still don't understand what exactly
crashes the laptop (blank screen is not very helpful).
So, things start with the "preferred_console = -1". In console_setup()
we call __add_preferred_console(). Since we have no consoles, the
name matching loop is not executed, and console selection counter remains
at 0. After the loop, despite the fact that we don't have the console
(`name' is empty), we still set `preferred_console', to 0. This affects
register_console(). Since we have `preferred_console >= 0' we don't
execute the newcon->setup(), but, more importantly, we don't set the
newcon->flags |= CON_ENABLED. Now, we call try_enable_new_console():
since there are no consoles, the ->match() loop is not executed.
newcon does not have CON_ENABLED set, so try_enable_new_console()
returns -ENOENT. Both for user_specified=true and for fallback
user_specified=false cases. At this point we hit error-return path
from register_console() - we don't add newcon to the list of console
drivers. The console drivers list, thus, remains empty. So far so good.
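
The walk-through above can be modelled in user space. The sketch below is a heavily simplified model and not the kernel code: struct model_state, struct model_console and all function names are hypothetical, the constants are arbitrary, and it assumes every console has a tty binding and a setup() that succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_CMDLINE 8	/* hypothetical stand-in for MAX_CMDLINECONSOLES */
#define MODEL_NAME_LEN 16

struct model_state {
	char cmdline[MODEL_MAX_CMDLINE][MODEL_NAME_LEN];	/* models console_cmdline[] */
	int preferred_console;		/* -1: nothing requested */
	bool has_preferred_console;
};

struct model_console {
	const char *name;
	bool enabled;			/* models CON_ENABLED */
};

static void model_init(struct model_state *st)
{
	memset(st, 0, sizeof(*st));
	st->preferred_console = -1;
}

/* Models adding a preferred console without any check for an empty name. */
static void model_add_preferred(struct model_state *st, const char *name)
{
	int i;

	assert(name != NULL);
	/* The scan stops at the first slot whose name is empty. */
	for (i = 0; i < MODEL_MAX_CMDLINE && st->cmdline[i][0]; i++) {
		if (strcmp(st->cmdline[i], name) == 0) {
			st->preferred_console = i;
			return;
		}
	}
	if (i == MODEL_MAX_CMDLINE)
		return;
	/* Even an empty name makes slot i the "preferred" one. */
	st->preferred_console = i;
	strncpy(st->cmdline[i], name, MODEL_NAME_LEN - 1);
	st->cmdline[i][MODEL_NAME_LEN - 1] = '\0';
}

/* Returns true when the console would end up in the driver list. */
static bool model_register_console(struct model_state *st,
				   struct model_console *con)
{
	int i;

	if (!st->has_preferred_console)
		st->has_preferred_console = st->preferred_console >= 0;

	/* Fallback: nothing was requested, so take the first console. */
	if (!st->has_preferred_console) {
		con->enabled = true;
		st->has_preferred_console = true;
	}

	/* Match loop: an empty first slot means the body never runs. */
	for (i = 0; i < MODEL_MAX_CMDLINE && st->cmdline[i][0]; i++) {
		if (strcmp(st->cmdline[i], con->name) == 0) {
			con->enabled = true;
			return true;
		}
	}
	/* No match: only a console enabled by the fallback gets through. */
	return con->enabled;
}
```

In this model, calling model_add_preferred(&st, "") sets preferred_console to 0 while leaving the array effectively empty, so a following model_register_console() skips the fallback, matches nothing, and rejects the console.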
Now. Things get strange in init/main.c
We have that kernel_init_freeable()->console_on_rootfs() control path.
console_on_rootfs() attempts to filp_open()->tty_open() /dev/console.
This ends up in printk's console_device(), which iterates the list of
console drivers and returns associated console->device back to tty. The
problem is that console drivers list is empty, so the function returns
NULL, and filp_open("/dev/console") fails. But the console_on_rootfs()
comment says that this function should never fail (!). This sort of
makes me wonder if "console=" is actually legal.
What this filp_open() failure means in particular, is that we never
create stdin/out/err fds, because we error-out and don't invoke
init_dup(file).
Things look different in older kernels. For instance, even in 5.4
the corresponding code looks as follows:
	/* Open the /dev/console on the rootfs, this should never fail */
	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
		pr_err("Warning: unable to open an initial console.\n");

	(void) ksys_dup(0);
	(void) ksys_dup(0);
Somehow, the fact that we don't init_dup(file) causes problems on my
laptop, but, at the moment, I can't tell exactly where. Perhaps more
experienced people will be like "darn, this is trivial, the problem is
here, here and there".
Hint: I can crash my laptop when I remove the "console=" boot param and
comment out init_dup(file) calls in console_on_rootfs().
I guess the problem is somewhat related to missing stdin/out/err fds.
Any ideas?
-ss
On 10/7/20 5:30 AM, Sergey Senozhatsky wrote:
[ ... ]
>
> console_on_rootfs() attempts to filp_open()->tty_open() /dev/console.
> This ends up in printk's console_device(), which iterates the list of
> console drivers and returns associated console->device back to tty. The
> problem is that console drivers list is empty, so the function returns
> NULL, and filp_open("/dev/console") fails. But the console_on_rootfs()
> comment says that this function should never fail (!). This sort of
> makes me wonder if "console=" is actually legal.
>
I would not want to use a term such as "legal". It just happened to work
and was used.
> Hint: I can crash my laptop when I remove the "console=" boot param and
> comment out init_dup(file) calls in console_on_rootfs().
>
I can see two options: Link /dev/console to /dev/null if there is no console,
or do something like
	if (IS_ERR(file)) {
		pr_warn("Warning: unable to open an initial console.\n");
		file = filp_open("/dev/null", O_RDWR, 0);
		if (IS_ERR(file))
			return;
	}
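The same fallback pattern can be tried out in user space. This is only a sketch with plain open(2) instead of filp_open(); the function name and the two path parameters are hypothetical, and it says nothing about whether the fallback node exists that early in boot.

```c
#include <fcntl.h>
#include <stdio.h>

/*
 * Try the console path first; if that fails, warn and try the
 * fallback path. Returns a file descriptor, or -1 if both fail.
 */
int open_console_or_fallback(const char *console_path, const char *fallback_path)
{
	int fd = open(console_path, O_RDWR);

	if (fd < 0) {
		fprintf(stderr, "Warning: unable to open an initial console.\n");
		fd = open(fallback_path, O_RDWR);
	}
	return fd;
}
```

Calling open_console_or_fallback("/dev/console", "/dev/null") yields a usable descriptor whenever at least one of the two nodes can be opened.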
Guenter
On (20/10/07 08:57), Guenter Roeck wrote:
> On 10/7/20 5:30 AM, Sergey Senozhatsky wrote:
[..]
> I can see two options: Link /dev/console to /dev/null if there is no console,
> or do something like
>
> if (IS_ERR(file)) {
> pr_warn("Warning: unable to open an initial console.\n");
> file = filp_open("/dev/null", O_RDWR, 0);
> if (IS_ERR(file))
> return;
> }
As far as I can tell, /dev/null does not exist yet at this stage
(at least not in my system). But generally the idea looks interesting.
-ss
On (20/10/07 21:30), Sergey Senozhatsky wrote:
> On (20/10/07 09:28), Petr Mladek wrote:
> >
> > /*
> > * Dirty hack to prevent using any console with tty
> > * binding as a fallback and adding the empty
> > * name into console_cmdline array.
> > */
> > preferred_console = MAX_CMDLINECONSOLES;
[..]
> Hint: I can crash my laptop when I remove the "console=" boot param and
> comment out init_dup(file) calls in console_on_rootfs().
My guess is that since we don't have stdin/out/err fds then,
theoretically, something like this can happen
	int main()
	{
		...
		int fd0 = open(.... );			// fd0 is 0
		int fd1 = open(..., "vfat.ko");		// fd1 is 1
		fprintf(stdout, "loading vfat\n");
		...
	}
stdout (fd 1) is not stdout, it's fd that we got from open(vfat.ko).
Does this make sense?
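The fd reuse is easy to reproduce in user space. A minimal sketch, assuming a POSIX system with /dev/null available: a child process closes fds 0, 1 and 2 and then opens a file twice; open(2) always hands out the lowest free descriptor, so the second open lands on fd 1. The function name is made up for the illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that has no stdin/stdout/stderr and report which fd
 * its second open() call receives. Returns -1 if fork/wait fails and
 * 255 if one of the open() calls in the child fails.
 */
int second_open_fd_without_stdio(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		int first, second;

		/* Simulate an init process that never got fds 0, 1, 2. */
		close(0);
		close(1);
		close(2);

		first = open("/dev/null", O_WRONLY);	/* lowest free fd: 0 */
		second = open("/dev/null", O_WRONLY);	/* next free fd: 1 */
		if (first < 0 || second < 0)
			_exit(255);
		_exit(second);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Anything the child later writes to "stdout" would go into whatever file that second open() referred to.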
-ss
On (20/10/08 01:29), Sergey Senozhatsky wrote:
> On (20/10/07 08:57), Guenter Roeck wrote:
> > On 10/7/20 5:30 AM, Sergey Senozhatsky wrote:
>
> [..]
>
> > I can see two options: Link /dev/console to /dev/null if there is no console,
> > or do something like
> >
> > if (IS_ERR(file)) {
> > pr_warn("Warning: unable to open an initial console.\n");
> > file = filp_open("/dev/null", O_RDWR, 0);
> > if (IS_ERR(file))
> > return;
> > }
>
> As far as I can tell, /dev/null does not exist yet at this stage
> (at least not in my system). But generally the idea looks interesting.
Hmm. How about this. console= is undocumented and unspecified - it
may work sometimes or it may kill the system (and theoretically even
corrupt some files, depending on what fd 1 and fd 2 point to). So
maybe we can document console= and handle it in printk, rather than
somewhere deep in init/main.c
IOW add one more flag (yeah, I know) and set it when console_setup()
sees console= boot param. The idea is to allow console registration,
but all consoles should be disabled (cleared CON_ENABLED bit). This
would be easier to document, at least.
Schematically:
---
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 929e86a01148..b71ff9d87693 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -281,6 +281,7 @@ static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
static int preferred_console = -1;
static bool has_preferred_console;
+static bool mute_consoles = false;
int console_set_on_cmdline;
EXPORT_SYMBOL(console_set_on_cmdline);
@@ -2141,6 +2142,9 @@ static int __add_preferred_console(char *name, int idx, char *options,
struct console_cmdline *c;
int i;
+ if (mute_consoles)
+ return 0;
+
/*
* See if this tty is not yet registered, and
* if we have a slot free.
@@ -2189,6 +2193,11 @@ static int __init console_setup(char *str)
char *s, *options, *brl_options = NULL;
int idx;
+ if (str[0] == 0) {
+ mute_consoles = true;
+ return 0;
+ }
+
if (_braille_console_setup(&str, &brl_options))
return 1;
@@ -2630,6 +2639,9 @@ EXPORT_SYMBOL(console_stop);
void console_start(struct console *console)
{
+ if (mute_consoles)
+ return;
+
console_lock();
console->flags |= CON_ENABLED;
console_unlock();
@@ -2811,6 +2823,9 @@ void register_console(struct console *newcon)
console_drivers->next = newcon;
}
+ if (mute_consoles)
+ newcon->flags &= ~CON_ENABLED;
+
if (newcon->flags & CON_EXTENDED)
nr_ext_console_drivers++;
On Wed 2020-10-07 21:30:44, Sergey Senozhatsky wrote:
> On (20/10/07 09:28), Petr Mladek wrote:
> >
> > /*
> > * Dirty hack to prevent using any console with tty
> > * binding as a fallback and adding the empty
> > * name into console_cmdline array.
> > */
> > preferred_console = MAX_CMDLINECONSOLES;
>
> Let me dump my findings so far. I still don't understand what exactly
> crashes the laptop (blank screen is not very helpful).
>
> So, things start with the "preferred_console = -1". In console_setup()
> we call __add_preferred_console(). Since we have no consoles, the
> name matching loop is not executed, and console selection counter remains
> at 0. After the loop, despite the fact that we don't have the console
> (`name' is empty), we still set `preferred_console', to 0.
Heh, we actually add the console. But it is ignored in all the later
loops because the name is "". All the loops take this as
the end of the loop.
> This affects
> register_console(). Since we have `preferred_console >= 0' we don't
> execute the newcon->setup(), but, more importantly, we don't set the
> newcon->flags |= CON_ENABLED. Now, we call try_enable_new_console():
> since there are no consoles, the ->match() loop is not executed.
> newcon does not have CON_ENABLED set, so try_enable_new_console()
> returns -ENOENT. Both for user_specified=true and for fallback
> user_specified=false cases. At this point we hit error-return path
> from register_console() - we don't add newcon to the list of console
> drivers. The console drivers list, thus, remains empty. So far so good.
>
> Now. Things get strange in init/main.c
>
> We have that kernel_init_freeable()->console_on_rootfs() control path.
>
> console_on_rootfs() attempts to filp_open()->tty_open() /dev/console.
> This ends up in printk's console_device(), which iterates the list of
> console drivers and returns associated console->device back to tty. The
> problem is that console drivers list is empty, so the function returns
> NULL, and filp_open("/dev/console") fails. But the console_on_rootfs()
> comment says that this function should never fail (!). This sort of
> makes me wonder if "console=" is actually legal.
>
> What this filp_open() failure means in particular, is that we never
> create stdin/out/err fds, because we error-out and don't invoke
> init_dup(file).
>
> Things look different in older kernels. For instance, even in 5.4
> the corresponding code looks as follows:
>
> /* Open the /dev/console on the rootfs, this should never fail */
> if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
> pr_err("Warning: unable to open an initial console.\n");
>
> (void) ksys_dup(0);
> (void) ksys_dup(0);
>
> Somehow, the fact that we don't init_dup(file) causes problems on my
> laptop, but, at the moment, I can't tell exactly where. Perhaps more
> experienced people will be like "darn, this is trivial, the problem is
> here, here and there".
>
> Hint: I can crash my laptop when I remove the "console=" boot param and
> comment out init_dup(file) calls in console_on_rootfs().
>
> I guess the problem is somewhat related to missing stdin/out/err fds.
I wonder if you see the problem solved by the commit 2d3145f8d2809592ef8
("early init: fix error handling when opening /dev/console").
I am also curious about the commit 74f1a299107b9e1a56 "Revert "fs:
remove ksys_dup()"". I wonder why it was safe to call ksys_dup(0);
even though the previous ksys_open() failed.
Best Regards,
Petr
PS: I am quite busy with something else this week. I wish I had more
time to dig into it. It should be better the following week.
Anyway, you seem to be on the right way. And we really should
understand the need of stdout and stderr before allowing
to disable all consoles.
On Thu 2020-10-08 14:52:38, Sergey Senozhatsky wrote:
> On (20/10/08 01:29), Sergey Senozhatsky wrote:
> > On (20/10/07 08:57), Guenter Roeck wrote:
> > > On 10/7/20 5:30 AM, Sergey Senozhatsky wrote:
> >
> > [..]
> >
> > > I can see two options: Link /dev/console to /dev/null if there is no console,
> > > or do something like
> > >
> > > if (IS_ERR(file)) {
> > > pr_warn("Warning: unable to open an initial console.\n");
> > > file = filp_open("/dev/null", O_RDWR, 0);
> > > if (IS_ERR(file))
> > > return;
> > > }
> >
> > As far as I can tell, /dev/null does not exist yet at this stage
> > (at least not in my system). But generally the idea looks interesting.
>
> Hmm. How about this. console= is undocumented and unspecified - it
> may work sometimes or it may kill the system (and theoretically even
> corrupt some files, depending on what fd 1 and fd 2 point to). So
> maybe we can document console= and handle it in printk, rather than
> somewhere deep in init/main.c
>
> IOW add one more flag (yeah, I know) and set it when console_setup()
> sees console= boot param. The idea is to allow console registration,
> but all consoles should be disabled (cleared CON_ENABLED bit). This
> would be easier to document, at least.
>
> Schematically:
>
> ---
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 929e86a01148..b71ff9d87693 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -281,6 +281,7 @@ static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
>
> static int preferred_console = -1;
> static bool has_preferred_console;
> +static bool mute_consoles = false;
> int console_set_on_cmdline;
> EXPORT_SYMBOL(console_set_on_cmdline);
>
> @@ -2141,6 +2142,9 @@ static int __add_preferred_console(char *name, int idx, char *options,
> struct console_cmdline *c;
> int i;
>
> + if (mute_consoles)
> + return 0;
> +
> /*
> * See if this tty is not yet registered, and
> * if we have a slot free.
Interesting idea. Well, it looks like yet another mess:
+ it would show the consoles in /proc/consoles
even though they will be basically unusable
+ it is yet another way to affect the amount of messages
on console. We already have console_loglevel, ignore_loglevel.
+ this effect is far from obvious when using console=""
IMHO, we should try to understand why it actually crashes first.
It might help to solve the problem some cleaner way.
Thanks a lot for digging into it.
Best Regards,
Petr
On (20/10/08 10:50), Petr Mladek wrote:
> On Wed 2020-10-07 21:30:44, Sergey Senozhatsky wrote:
> > On (20/10/07 09:28), Petr Mladek wrote:
> > >
> > > /*
> > > * Dirty hack to prevent using any console with tty
> > > * binding as a fallback and adding the empty
> > > * name into console_cmdline array.
> > > */
> > > preferred_console = MAX_CMDLINECONSOLES;
> >
> > Let me dump my findings so far. I still don't understand what exactly
> > crashes the laptop (blank screen is not very helpful).
> >
> > So, things start with the "preferred_console = -1". In console_setup()
> > we call __add_preferred_console(). Since we have no consoles, the
> > name matching loop is not executed, and console selection counter remains
> > at 0. After the loop, despite the fact that we don't have the console
> > (`name' is empty), we still set `preferred_console', to 0.
>
> Heh, we actually add the console.
To the console drivers list? I don't think so. At least on my laptop
what I have is as follows:
	/* See if this console matches one we selected on the command line */
	err = try_enable_new_console(newcon, true);

	/* If not, try to match against the platform default(s) */
	if (err == -ENOENT)
		err = try_enable_new_console(newcon, false);

	/* printk() messages are not printed to the Braille console. */
	if (err || newcon->flags & CON_BRL)
		return;
We hit this error return, because both try_enable_new_console() calls
return -ENOENT. So this is never executed:
	...
	console_lock();
	if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
		newcon->next = console_drivers;
		console_drivers = newcon;
		if (newcon->next)
			newcon->next->flags &= ~CON_CONSDEV;
		/* Ensure this flag is always set for the head of the list */
		newcon->flags |= CON_CONSDEV;
	} else {
		newcon->next = console_drivers->next;
		console_drivers->next = newcon;
	}
	...
The console driver list is 0x00.
> I wonder if you see the problem solved by the commit 2d3145f8d2809592ef8
> ("early init: fix error handling when opening /dev/console").
/dev/console does exist. What does not exist is console driver, because
console drivers list is NULL. So the failure here is not filp_open()
per se, but tty_lookup_driver()->console_device(), which returns NULL.
As far as I'm concerned.
> I am also curious about the commit 74f1a299107b9e1a56 "Revert "fs:
> remove ksys_dup()"". I wonder why it was safe to call ksys_dup(0);
> even though the previous ksys_open() failed.
I'm quite sure ksys_dup(0) fails, in fact. I guess the issue here boils
down to user-space that does modprobe/fsck/mount and what kind of things
it attempts to do with standard file descriptors 0/1/2.
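A user-space analogue of the failing dup, as a minimal sketch assuming a POSIX system: a child closes fd 0 and then calls dup(0), which fails with EBADF because there is nothing to duplicate. The function name is hypothetical, and this only mirrors the ksys_dup(0) case by analogy.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child without fd 0 and report the errno that dup(0) sets.
 * Returns 0 if dup(0) unexpectedly succeeds and -1 if fork/wait fails.
 */
int dup_errno_with_stdin_closed(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Same situation as a failed open of the initial console. */
		close(0);
		if (dup(0) < 0)
			_exit(errno);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

So the two dup calls after a failed open do not crash anything by themselves; they just leave fds 1 and 2 unallocated as well.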
> PS: I am quite busy with something else this week.
Sure, no prob. Thanks.
-ss
On (20/10/08 21:20), Sergey Senozhatsky wrote:
[..]
> > > Let me dump my findings so far. I still don't understand what exactly
> > > crashes the laptop (blank screen is not very helpful).
> > >
> > > So, things start with the "preferred_console = -1". In console_setup()
> > > we call __add_preferred_console(). Since we have no consoles, the
> > > name matching loop is not executed, and console selection counter remains
> > > at 0. After the loop, despite the fact that we don't have the console
> > > (`name' is empty), we still set `preferred_console', to 0.
> >
> > Heh, we actually add the console.
>
> To the console drivers list?
Oh, sorry, I realized that you were talking about __add_preferred_console(),
not about console drivers list and console registration.
Well, yeah, that's funny. We sort of add preferred console. But since
it has empty name it's not recognized by printk as legit console. So
essentially it sort of does not exist, yet the preferred selector tells
printk that console does exist.
-ss
On (20/10/08 11:01), Petr Mladek wrote:
>
> Interesting idea. Well, it looks like yet another mess:
>
> + it would show the consoles in /proc/consoles
> even though they will be basically unusable
Which is fine, no? We already can have disabled consoles in
/proc/consoles.
$ cat /proc/consoles
tty0 -WU ( C p ) 4:1
So tty0 is not 'E'-enabled. I see no problems with that.
These are the flags that /proc/consoles handles
	con_flags[] = {
		{ CON_ENABLED,		'E' },
		{ CON_CONSDEV,		'C' },
		{ CON_BOOT,		'B' },
		{ CON_PRINTBUFFER,	'p' },
		{ CON_BRL,		'b' },
		{ CON_ANYTIME,		'a' },
	};
Why do you think that having disabled consoles in /proc/consoles
is a mess?
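For reference, a small sketch of how such a table can be turned into the flag column: one character per table entry, a space where the flag is clear. The MODEL_CON_* bit values and all names here are assumptions made for the illustration, not copied from the kernel headers.

```c
#include <stddef.h>

/* Assumed bit values, for illustration only. */
enum {
	MODEL_CON_PRINTBUFFER	= 1 << 0,
	MODEL_CON_CONSDEV	= 1 << 1,
	MODEL_CON_ENABLED	= 1 << 2,
	MODEL_CON_BOOT		= 1 << 3,
	MODEL_CON_ANYTIME	= 1 << 4,
	MODEL_CON_BRL		= 1 << 5,
};

struct con_flag_model {
	unsigned int flag;
	char name;
};

/* Same order as the table above: E C B p b a */
static const struct con_flag_model con_flags_model[] = {
	{ MODEL_CON_ENABLED,		'E' },
	{ MODEL_CON_CONSDEV,		'C' },
	{ MODEL_CON_BOOT,		'B' },
	{ MODEL_CON_PRINTBUFFER,	'p' },
	{ MODEL_CON_BRL,		'b' },
	{ MODEL_CON_ANYTIME,		'a' },
};

#define N_CON_FLAGS (sizeof(con_flags_model) / sizeof(con_flags_model[0]))

/* out must have room for N_CON_FLAGS + 1 bytes (7 here). */
void render_console_flags(unsigned int flags, char *out)
{
	size_t i;

	for (i = 0; i < N_CON_FLAGS; i++)
		out[i] = (flags & con_flags_model[i].flag) ?
			 con_flags_model[i].name : ' ';
	out[i] = '\0';
}
```

A console with only the CONSDEV and PRINTBUFFER bits set renders with 'C' and 'p' and a blank in the 'E' position, which matches the tty0 line above.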
> IMHO, we should try to understand why it actually crashes first.
> It might help to solve the problem some cleaner way.
Well, I guess, we have files (either regular files or devices) sitting
in fd-s 0,1,2. God knows what mount/fsck/modprobe can fprintf(), for
instance, to stdout/stderr and what they can corrupt.
-ss
On (20/10/08 11:01), Petr Mladek wrote:
>
> + it is yet another way to affect the amount of messages
> on console. We already have console_loglevel, ignore_loglevel.
True. Yes, there are "alternative" ways of doing this, but what we
have to face here is - console= has been used for a long time, and
it does, sometimes, bad things that can kill the system. And we,
probably, don't have that many options, we need to "fix" console=
and make it safe, while preserving the behaviour that people are
used to by now. console= is a buggy feature by now.
-ss
On Thu 2020-10-08 14:52:38, Sergey Senozhatsky wrote:
> On (20/10/08 01:29), Sergey Senozhatsky wrote:
> > On (20/10/07 08:57), Guenter Roeck wrote:
> > > On 10/7/20 5:30 AM, Sergey Senozhatsky wrote:
> >
> > [..]
> >
> > > I can see two options: Link /dev/console to /dev/null if there is no console,
> > > or do something like
> > >
> > > if (IS_ERR(file)) {
> > > pr_warn("Warning: unable to open an initial console.\n");
> > > file = filp_open("/dev/null", O_RDWR, 0);
> > > if (IS_ERR(file))
> > > return;
> > > }
> >
> > As far as I can tell, /dev/null does not exist yet at this stage
> > (at least not in my system). But generally the idea looks interesting.
>
> Hmm. How about this. console= is undocumented and unspecified - it
> may work sometimes or it may kill the system (and theoretically even
> corrupt some files, depending on what fd 1 and fd 2 point to). So
> maybe we can document console= and handle it in printk, rather than
> somewhere deep in init/main.c
I have dug more into it. If I get it correctly, /dev/console is really
used as stdin, stdout, and stderr for the init process. It has been
like this from the very beginning.
In theory, it might be possible to fall back to /dev/null. But it
would not solve the problem when anyone tries to use /dev/console
later.
IMHO, creating /dev/console really _should not_ fail. It means
that we should register some console.
> IOW add one more flag (yeah, I know) and set it when console_setup()
> sees console= boot param. The idea is to allow console registration,
> but all consoles should be disabled (cleared CON_ENABLED bit). This
> would be easier to document, at least.
It seems that introducing a new option/flag is the best solution
after all. All other flags are manipulated in different situations
and it would not be easy to define a sane behavior.
I like the proposed "mute_consoles". Well, I have it associated rather
with CONSOLE_LOGLEVEL_SILENT than with disabled console.
I have played with it and am going to send two patches as RFC.
Best Regards,
Petr
On (20/10/22 13:38), Petr Mladek wrote:
> > Hmm. How about this. console= is undocumented and unspecified - it
> > may work sometimes or it may kill the system (and theoretically even
> > corrupt some files, depending on what fd 1 and fd 2 point to). So
> > maybe we can document console= and handle it in printk, rather than
> > somewhere deep in init/main.c
>
> I have dug more into it. If I get it correctly, /dev/console is really
> used as stdin, stdout, and stderr for the init process. It has been
> like this from the very beginning.
>
> In theory, it might be possible to fall back to /dev/null. But it
> would not solve the problem when anyone tries to use /dev/console
> later.
>
> IMHO, creating /dev/console really _should not_ fail. It means
> that we should register some console.
Yes, I didn't find out exactly why the kernel panics yet. Got
interrupted. What I did notice (when we don't have stdin/out/err)
was init process installing "/" as fd 0, and then doing things
like fprintf(stderr, "running early hook"), perhaps some of those
fprintf()-s end up in the wrong place.
> > IOW add one more flag (yeah, I know) and set it when console_setup()
> > sees console= boot param. The idea is to allow console registration,
> > but all consoles should be disabled (cleared CON_ENABLED bit). This
> > would be easier to document, at least.
>
> It seems that introducing a new option/flag is the best solution
> after all. All other flags are manipulated in different situations
> and it would not be easy to define a sane behavior.
>
> I like the proposed "mute_consoles". Well, I have it associated rather
> with CONSOLE_LOGLEVEL_SILENT than with disabled console.
>
> I have played with it and am going to send two patches as RFC.
Cool, thanks. I'll reply to that RFC patch set; there are some
more ideas, that we can discuss.
-ss