2024-05-02 21:36:43

by Sam James

[permalink] [raw]
Subject: Re: [oss-security] escaping terminal control characters (was Re: backdoor in upstream xz/liblzma leading to ssh server compromise)

Solar Designer <[email protected]> writes:

> On Wed, Apr 03, 2024 at 11:03:17AM +1100, Matthew Fernandez wrote:
>> On 4/1/24 08:30, Solar Designer wrote:
>> >On Sat, Mar 30, 2024 at 04:37:48PM -0000, Tavis Ormandy wrote:
>> >>It was also pointed out they submitted an odd PR to libarchive:
>> >>
>> >>https://github.com/libarchive/libarchive/pull/1609
>> >>
>> >>In summary, they replaced calls to safe_fprintf() with fprintf() --
>> >>meaning control characters are no longer filtered from errors. That
>> >>seems pretty minor, but now that we know they were in the business of
>> >>obfuscating the presence of backdoors -- seems a bit suspicious.
>> >>
>> >>Regardless, that change has now been reverted:
>> >>
>> >>https://github.com/libarchive/libarchive/pull/2101
>> >
>> >This does look minor indeed - not usable for large-scale attacks, and
>> >libarchive is quite unique in that it even bothered to filter control
>> >characters, whereas most command-line tools outputting filenames don't
>> >bother. My guess is it could have been an early experiment to see
>> >whether the project would accept PRs degrading security.
>> >
>> >That said, here's an excellent write-up by David Leadbeater on specific
>> >ways that specific terminal emulators may be usefully attacked with
>> >control sequences:
>> >
>> >https://dgl.cx/2023/09/ansi-terminal-security#vulnerabilities-using-known-replies
>>
>> Is the currently accepted wisdom that any application printing to
>> stdout/stderr should take steps to avoid control characters in the
>> output?
>
> First, let's limit this to cases where the control characters come from
> potentially untrusted input to the program. Obviously, many programs
> generate terminal escapes on their own (usually via a library), for
> their intended functionality (colorized listings, TUIs, etc.) Some
> programs pass potential control characters from their trusted input.
>
> Second, I think no, there isn't currently an established opinion on
> whether programs should perform such filtering of untrusted input.

Lasse has put up an initial implementation for xz:
https://github.com/tukaani-project/xz/pull/118.

Comments are welcome. It was a TODO from a long time ago ;)

We're not sure how much is overkill (or underkill) for this, especially
given it gets harder when Unicode is involved.

> [...]

thanks,
sam


Attachments:
signature.asc (385.00 B)

2024-05-03 10:48:00

by Steffen Nurpmeso

[permalink] [raw]
Subject: Re: [oss-security] escaping terminal control characters (was Re: backdoor in upstream xz/liblzma leading to ssh server compromise)

Sam James wrote in
<[email protected]>:
|Solar Designer <[email protected]> writes:
|> On Wed, Apr 03, 2024 at 11:03:17AM +1100, Matthew Fernandez wrote:
|>> On 4/1/24 08:30, Solar Designer wrote:
|>>>On Sat, Mar 30, 2024 at 04:37:48PM -0000, Tavis Ormandy wrote:
|>>>>It was also pointed out they submitted an odd PR to libarchive:
|>>>>
|>>>>https://github.com/libarchive/libarchive/pull/1609
|>>>>
|>>>>In summary, they replaced calls to safe_fprintf() with fprintf() --
|>>>>meaning control characters are no longer filtered from errors. That
|>>>>seems pretty minor, but now that we know they were in the business of
|>>>>obfuscating the presence of backdoors -- seems a bit suspicious.
|>>>>
|>>>>Regardless, that change has now been reverted:
|>>>>
|>>>>https://github.com/libarchive/libarchive/pull/2101
|>>>
|>>>This does look minor indeed - not usable for large-scale attacks, and
|>>>libarchive is quite unique in that it even bothered to filter control
|>>>characters, whereas most command-line tools outputting filenames don't
|>>>bother. My guess is it could have been an early experiment to see
|>>>whether the project would accept PRs degrading security.
|>>>
|>>>That said, here's an excellent write-up by David Leadbeater on specific
|>>>ways that specific terminal emulators may be usefully attacked with
|>>>control sequences:
|>>>
|>>>https://dgl.cx/2023/09/ansi-terminal-security#vulnerabilities-using-know\
|>>>n-replies
|>>
|>> Is the currently accepted wisdom that any application printing to
|>> stdout/stderr should take steps to avoid control characters in the
|>> output?
|>
|> First, let's limit this to cases where the control characters come from
|> potentially untrusted input to the program. Obviously, many programs
|> generate terminal escapes on their own (usually via a library), for
|> their intended functionality (colorized listings, TUIs, etc.) Some
|> programs pass potential control characters from their trusted input.
|>
|> Second, I think no, there isn't currently an established opinion on
|> whether programs should perform such filtering of untrusted input.
|
|Lasse has put up an initial implementation for xz:
|https://github.com/tukaani-project/xz/pull/118.
|
|Comments are welcome. It was a TODO from a long time ago ;)
|
|We're not sure how much is overkill (or underkill) for this, especially
|given it gets harder when Unicode is involved.
|
|> [...]

For this purpose there exists the (very very expensive)

https://man.netbsd.org/vis.3

series of functions. Or you do something like this, where "isuni"
gives you "this is a UTF-8 nl_langinfo(CODESET)".

if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
wc != '\t'){
if((wc & ~S(wchar_t,037)) == 0)
wc = isuni ? 0x2400 | wc : '?';
else if(wc == 0177)
wc = isuni ? 0x2421 : '?';
else
wc = isuni ? 0x2426 : '?';
}else if(isuni){ /* TODO ctext */
/* Need to filter out L-TO-R and R-TO-R marks TODO ctext */
if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
continue;
/* And some zero-width messes */
if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
continue;
/* Oh about the ISO C wide character interfaces, baby! */
if(wc == 0xFEFF)
continue;
}

if((n = wctomb(mbb, wc)) <= 0)
continue;

This can be made better (for example the above requires "wc" to be
an actual ISO 10646 codepoint, which ISO C etc), but the key point
is that the Unicode standard gives you everything needed to
properly mask these sequences, from its very beginning in 1993:

2400..2424 ; 1.1 # [37] SYMBOL FOR NULL..SYMBOL FOR NEWLINE

and i have yet to see a font which does not support those.
(Whether and how users can make sense of them totally aside.)
Of course you loose the copy&paste capability.

--End of <[email protected]>

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

2024-05-03 10:50:44

by Steffen Nurpmeso

[permalink] [raw]
Subject: Re: [oss-security] escaping terminal control characters (was Re: backdoor in upstream xz/liblzma leading to ssh server compromise)

Steffen Nurpmeso wrote in
<20240502223912.08A3RYp4@steffen%sdaoden.eu>:
|Sam James wrote in
| <[email protected]>:
||Solar Designer <[email protected]> writes:
||> On Wed, Apr 03, 2024 at 11:03:17AM +1100, Matthew Fernandez wrote:
||>> On 4/1/24 08:30, Solar Designer wrote:
||>>>On Sat, Mar 30, 2024 at 04:37:48PM -0000, Tavis Ormandy wrote:
...
||>> Is the currently accepted wisdom that any application printing to
||>> stdout/stderr should take steps to avoid control characters in the
||>> output?
||>
||> First, let's limit this to cases where the control characters come from
||> potentially untrusted input to the program. Obviously, many programs
||> generate terminal escapes on their own (usually via a library), for
||> their intended functionality (colorized listings, TUIs, etc.) Some
||> programs pass potential control characters from their trusted input.
||>
||> Second, I think no, there isn't currently an established opinion on
||> whether programs should perform such filtering of untrusted input.
||
||Lasse has put up an initial implementation for xz:
||https://github.com/tukaani-project/xz/pull/118.
||
||Comments are welcome. It was a TODO from a long time ago ;)
||
||We're not sure how much is overkill (or underkill) for this, especially
||given it gets harder when Unicode is involved.
||
||> [...]
|
|For this purpose there exists the (very very expensive)
|
| https://man.netbsd.org/vis.3
|
|series of functions. Or you do something like this, where "isuni"
|gives you "this is a UTF-8 nl_langinfo(CODESET)".
...
| if((wc & ~S(wchar_t,037)) == 0)
| wc = isuni ? 0x2400 | wc : '?';
| else if(wc == 0177)
| wc = isuni ? 0x2421 : '?';
| else
| wc = isuni ? 0x2426 : '?';
...
|This can be made better (for example the above requires "wc" to be
|an actual ISO 10646 codepoint, which ISO C etc), but the key point
|is that the Unicode standard gives you everything needed to
|properly mask these sequences, from its very beginning in 1993:
|
| 2400..2424 ; 1.1 # [37] SYMBOL FOR NULL..SYMBOL FOR NEWLINE
|
|and i have yet to see a font which does not support those.
|(Whether and how users can make sense of them totally aside.)
|Of course you loose the copy&paste capability.

While here please let me back the not yet gracefully supported
shell escape mechanism $''.
The current approach seems to be to be as atomic as possible:

# touch $(printf 'a\rb\tc\a')
# ll
->
-rw-r----- 1 steffen steffen 0 May 3 00:46 'c'$'\a'
-rw-r----- 1 steffen steffen 0 May 3 00:46 'a'$'\r''b'

(GNU coreutils). Isn't that just terrible? In (the development
version of) my mailer tab-completion leads to

#..mbox? /tmp/<TAB>
$'a\rb'
$'c\a'

which i find at least a little bit better. (Do not even think
about looking in its implementation though, look ICU or what.)

And even though currently unsupported, it should be said that with
"grapheme clusters" and in general things like ligatures and other
such language-specific constructs which need to look at
surroundings -- in general interfaces like towupper() etc are not
useful in global context, entire sentences have to be looked at as
a whole due to this! --, shell quotes should be extended to the
largest possible range possible. Ie, all the iconv(3)s that are
currently used because of a lack of other interfaces should be
enabled to see the longest possible (sub)string, not the most
atomar, as seen above.

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)