Date:   Tue, 21 Mar 2023 17:38:01 +0100
From:   Petr Mladek <pmladek@suse.com>
To:     John Ogness <john.ogness@linutronix.de>
Cc:     Sergey Senozhatsky <senozhatsky@chromium.org>,
        Steven Rostedt <rostedt@goodmis.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        linux-kernel@vger.kernel.org,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH printk v1 07/18] printk: nobkl: Add buffer management
Message-ID: <ZBndaSUFd4ipvKwj@alley>
References: <20230302195618.156940-1-john.ogness@linutronix.de>
 <20230302195618.156940-8-john.ogness@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20230302195618.156940-8-john.ogness@linutronix.de>
Precedence: bulk

On Thu 2023-03-02 21:02:07, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> In case of hostile takeovers it must be ensured that the previous
> owner cannot scribble over the output buffer of the emergency/panic
> context. This is achieved by:
> 
>  - Adding a global output buffer instance for early boot (pre per CPU
>    data being available).
> 
>  - Allocating an output buffer per console for threaded printers once
>    printer threads become available.
> 
>  - Allocating per CPU output buffers per console for printing from
>    all contexts not covered by the other buffers.
> 
>  - Choosing the appropriate buffer is handled in the acquire/release
>    functions.
> 
> The output buffer is wrapped into a separate data structure so other
> context related fields can be added in later steps.
> 
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -166,6 +166,47 @@ static inline bool cons_check_panic(void)
>  	return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
>  }
>  
> +static struct cons_context_data early_cons_ctxt_data __initdata;
> +
> +/**
> + * cons_context_set_pbufs - Set the output text buffer for the current context
> + * @ctxt:	Pointer to the acquire context
> + *
> + * Buffer selection:
> + *   1) Early boot uses the global (initdata) buffer
> + *   2) Printer threads use the dynamically allocated per-console buffers
> + *   3) All other contexts use the per CPU buffers
> + *
> + * This guarantees that there is no concurrency on the output records ever.
> + * Early boot and per CPU nesting is not a problem. The takeover logic
> + * tells the interrupted context that the buffer has been overwritten.
> + *
> + * There are two critical regions that matter:
> + *
> + * 1) Context is filling the buffer with a record. After interruption
> + *    it continues to sprintf() the record and before it goes to
> + *    write it out, it checks the state, notices the takeover, discards
> + *    the content and backs out.
> + *
> + * 2) Context is in a unsafe critical region in the driver. After
> + *    interruption it might read overwritten data from the output
> + *    buffer. When it leaves the critical region it notices and backs
> + *    out. Hostile takeovers in driver critical regions are best effort
> + *    and there is not much that can be done about that.
> + */
> +static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
> +{
> +	struct console *con = ctxt->console;
> +
> +	/* Thread context or early boot? */
> +	if (ctxt->thread)
> +		ctxt->pbufs = con->thread_pbufs;
> +	else if (!con->pcpu_data)
> +		ctxt->pbufs = &early_cons_ctxt_data.pbufs;
> +	else
> +		ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);

What exactly do we need the per-CPU buffers for, please?
Is it for an early boot or panic or another scenario?

I would expect that per-console buffer should be enough.
The per-console atomic lock should define who owns
the per-console buffer. The buffer must be accessed
carefully because any context could loose the atomic lock.
Why is kthread special?

The per-CPU buffer actually looks dangerous. It might
be used by more NOBKL consoles. How is the access synchronized
please? By console_list_lock? It is not obvious to me.


On the contrary, we might need 4 static buffers for the early
boot. For example, one atomic console might start printing
in the normal context. Second atomic console might use
the same static buffer in IRQ context. But the first console
will not realize it because it did not loose the per-CPU
atomic lock when the CPU handled the interrupt..
Or is this handled another way, please?

Best Regards,
Petr