2009-03-01 05:47:58

by Pavel Roskin

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

On Sat, 2009-02-28 at 19:24 +0000, Dave wrote:
> I'm aware of at least a couple users of orinoco who have picked up
> corrupt firmware# from the linux-firmware tree*.
>
> I've verified that the firmware in the repository itself is correct.
>
> It appears that downloading the file using the blob/raw links from
> gitweb causes the corruption (0xc3 everywhere). At least it does with
> firefox.

I can confirm the problem with Firefox 3.0.6. But it's not "0xc3
everywhere". The corrupted file is a result of recoding from iso-8859-1
to utf-8. The correct agere_sta_fw.bin is 65046 bytes long. The
corrupted agere_sta_fw.bin is 89729 bytes long.

There is a way to recover the original binary with GNU recode:
recode utf8..iso8859-1 agere_sta_fw.bin
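
A minimal Python sketch of the recoding described above, assuming the
corruption really is a plain iso-8859-1 to utf-8 recode. The helper names and
the six sample bytes are made up for illustration; they are not taken from the
firmware. Every byte of 0x80 or above becomes two bytes, the first of which is
always 0xc2 or 0xc3, so if the assumption holds, 89729 - 65046 = 24683 bytes of
the original file have the high bit set.

```python
def corrupt(blob: bytes) -> bytes:
    """Treat each byte as an ISO-8859-1 character and re-encode it as UTF-8."""
    return blob.decode("iso-8859-1").encode("utf-8")


def repair(damaged: bytes) -> bytes:
    """Undo corrupt(); only valid if the data really was recoded this way."""
    return damaged.decode("utf-8").encode("iso-8859-1")


sample = bytes([0x00, 0x7F, 0x80, 0xBF, 0xC0, 0xFF])  # made-up bytes, not real firmware
damaged = corrupt(sample)

# Bytes below 0x80 pass through unchanged; each byte >= 0x80 grows to two bytes.
high_bytes = sum(1 for b in sample if b >= 0x80)
print(len(sample), len(damaged), high_bytes)
print(damaged.hex(" "))
print(repair(damaged) == sample)
```

The repair step is the same transformation as the recode command above, and it
only works if the damaged data is valid utf-8 whose characters all fit in
iso-8859-1.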

wget 1.11.4 also gets a corrupted file 89729 bytes long.

$ wget "http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=blob;f=agere_sta_fw.bin;h=bae000f5a7162f5a5b052a2f5b78016e95f825c5;hb=d4cfa9f14c55e9d62f053a542fac21744f22546b"
--2009-03-01 00:42:38-- http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=blob;f=agere_sta_fw.bin;h=bae000f5a7162f5a5b052a2f5b78016e95f825c5;hb=d4cfa9f14c55e9d62f053a542fac21744f22546b
Resolving git.kernel.org... 204.152.191.40, 149.20.20.136
Connecting to git.kernel.org|204.152.191.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: `index.html?p=linux%2Fkernel%2Fgit%2Fdwmw2%2Flinux-firmware.git;a=blob;f=agere_sta_fw.bin;h=bae000f5a7162f5a5b052a2f5b78016e95f825c5;hb=d4cfa9f14c55e9d62f053a542fac21744f22546b'

[ <=> ] 89,729 237K/s in 0.4s

2009-03-01 00:42:39 (237 KB/s) - `index.html?p=linux%2Fkernel%2Fgit%2Fdwmw2%2Flinux-firmware.git;a=blob;f=agere_sta_fw.bin;h=bae000f5a7162f5a5b052a2f5b78016e95f825c5;hb=d4cfa9f14c55e9d62f053a542fac21744f22546b' saved [89729]

curl 7.18.2 also gets the corrupted file:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 89729    0 89729    0     0   111k      0 --:--:-- --:--:-- --:--:--  191k

My strong impression is that the recoding takes place on the server. I
think the bug should be reported to the gitweb maintainers unless it is a
local breakage on the kernel.org site.

--
Regards,
Pavel Roskin


2009-03-03 19:00:25

by Dave Kilroy

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

Adding the git mailing list.

Pavel Roskin wrote:
> On Sat, 2009-02-28 at 19:24 +0000, Dave wrote:
>> I'm aware of at least a couple users of orinoco who have picked up
>> corrupt firmware# from the linux-firmware tree*.
>>
>> I've verified that the firmware in the repository itself is correct.
>>
>> It appears that downloading the file using the blob/raw links from
>> gitweb causes the corruption (0xc3 everywhere). At least it does with
>> firefox.
>
> I can confirm the problem with Firefox 3.0.6. But it's not "0xc3
> everywhere". The corrupted file is a result of recoding from iso-8859-1
> to utf-8. The correct agere_sta_fw.bin is 65046 bytes long. The
> corrupted agere_sta_fw.bin is 89729 bytes long.
>
> There is a way to recover the original binary with GNU recode:
> recode utf8..iso8859-1 agere_sta_fw.bin
>
> wget 1.11.4 also gets a corrupted file 89729 bytes long.
>
> curl 7.18.2 also gets the corrupted file:
>
> My strong impression is that the recoding takes place on the server. I
> think the bug should be reported to the gitweb maintainers unless it is a
> local breakage on the kernel.org site.

Thanks Pavel.

I just did a quick scan of the gitweb README - is this an issue with the
$mimetypes_file or $fallback_encoding configuration variables?


Regards,

Dave.

#<http://marc.info/?l=orinoco-users&m=123411762524637>
*<http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=shortlog>

2009-03-04 00:26:48

by Jakub Narebski

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

Dave <[email protected]> writes:

> Adding the git mailing list.
>
> Pavel Roskin wrote:
> > On Sat, 2009-02-28 at 19:24 +0000, Dave wrote:

>>> I'm aware of at least a couple users of orinoco who have picked up
>>> corrupt firmware# from the linux-firmware tree*.
>>>
>>> I've verified that the firmware in the repository itself is correct.
>>>
>>> It appears that downloading the file using the blob/raw links from
>>> gitweb causes the corruption (0xc3 everywhere). At least it does with
>>> firefox.
>>
>> I can confirm the problem with Firefox 3.0.6. But it's not "0xc3
>> everywhere". The corrupted file is a result of recoding from iso-8859-1
>> to utf-8. The correct agere_sta_fw.bin is 65046 bytes long. The
>> corrupted agere_sta_fw.bin is 89729 bytes long.

[...]
>> My strong impression is that the recoding takes place on the server. I
>> think the bug should be reported to the gitweb maintainers unless it is a
>> local breakage on the kernel.org site.
>
> Thanks Pavel.
>
> I just did a quick scan of the gitweb README - is this an issue with the
> $mimetypes_file or $fallback_encoding configuration variables?

First, what version of gitweb do you use? It should be in the 'Generator'
meta header, or (in older gitweb) in comments at the top of the HTML
source.

Second, the file is actually sent to the browser 'as is', using binmode :raw
(or at least it should be, according to my understanding of Perl). A *.bin
binary file gets the application/octet-stream mimetype, and no charset info
is sent. git.kernel.org should have a modern enough gitweb to do this.
Strange...

--
Jakub Narebski
Poland
ShadeHawk on #git

2009-03-04 23:52:38

by Dave Kilroy

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

Jakub Narebski wrote:
> Dave <[email protected]> writes:
>>> My strong impression is that the recoding takes place on the server. I
>>> think the bug should be reported to the gitweb maintainers unless it is a
>>> local breakage on the kernel.org site.
>> Thanks Pavel.
>>
>> I just did a quick scan of the gitweb README - is this an issue with the
>> $mimetypes_file or $fallback_encoding configuration variables?
>
> First, what version of gitweb do you use? It should be in the 'Generator'
> meta header, or (in older gitweb) in comments at the top of the HTML
> source.

Not sure where I'd find the meta header, but at the top of the HTML:

<!-- git web interface version 1.4.5-rc0.GIT-dirty, (C) 2005-2006, Kay
Sievers <[email protected]>, Christian Gierke -->
<!-- git core binaries version 1.6.1.1 -->

> Second, the file is actually sent to the browser 'as is', using binmode :raw
> (or at least it should be, according to my understanding of Perl). A *.bin
> binary file gets the application/octet-stream mimetype, and no charset info
> is sent. git.kernel.org should have a modern enough gitweb to do this.
> Strange...

Dug around gitweb.perl in the main git repo. Then looked at the
git/warthog9/gitweb.git repo (after noting the Git Wiki says kernel.org
is running John Hawley's branch).

One notable change to git_blob_plain:

     undef $/;
     binmode STDOUT, ':raw';
-    print <$fd>;
+    #print <$fd>;
+    $output .= <$fd>;
     binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
     $/ = "\n";

     close $fd;
+
+    return $output;

If that's the code that's running, doesn't that mean the binmode change
has no effect on the concatenation to $output? So the blob gets utf-8
encoded when it is actually printed.
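
A rough Python analogue of what the diff appears to do (Python rather than
gitweb's Perl; the function names are hypothetical, and this illustrates the
mechanism only, it is not the code kernel.org runs). Writing the bytes while
the stream is raw keeps them intact; collecting them in a string and writing
that string later, once the stream is back in utf-8 mode, encodes every byte
of 0x80 or above as two bytes.

```python
import io


def serve_direct(blob: bytes, raw_out: io.BytesIO) -> None:
    """Like 'print <$fd>' while STDOUT is in :raw mode: bytes go out untouched."""
    raw_out.write(blob)


def serve_via_buffer(blob: bytes, raw_out: io.BytesIO) -> None:
    """Like '$output .= <$fd>' followed by a later print on a :utf8 stream."""
    # The buffered string holds one character (code point below 256) per byte.
    output = blob.decode("iso-8859-1")
    # By the time the buffer is printed, the stream is a utf-8 text stream.
    text_out = io.TextIOWrapper(raw_out, encoding="utf-8", newline="")
    text_out.write(output)
    text_out.flush()
    text_out.detach()  # leave raw_out open for the caller


blob = bytes([0x01, 0xFF, 0x7F, 0x80])  # made-up bytes, not real firmware

direct = io.BytesIO()
serve_direct(blob, direct)

buffered = io.BytesIO()
serve_via_buffer(blob, buffered)

print(direct.getvalue().hex(" "))
print(buffered.getvalue().hex(" "))
```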


Regards,

Dave.

2009-03-05 17:26:41

by Pavel Roskin

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

On Wed, 2009-03-04 at 23:52 +0000, Dave wrote:
>      binmode STDOUT, ':raw';
> -    print <$fd>;
> +    #print <$fd>;
> +    $output .= <$fd>;
>      binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi

Nice catch!

Looking at the gitweb repository from kernel.org, two instances of
circumventing binmode were introduced by this commit:

commit c79ae555fb3c89d91b4cafbfce306e695720507b
Author: John Hawley <[email protected]>
Date: Thu Dec 28 21:59:43 2006 -0800

Last of the changes to deal with channeling the text through the caching
engine. Wow is this a total hack.

The original behavior was restored in git_snapshot() by the recent
commit c15229acd9bedf165f1eb05d99fa989d3b9f3e32, but git_blob_plain()
remains broken.

I don't see an easy fix. We cannot manipulate the blob to counteract
the encoding, as it may not be valid utf-8, and therefore won't be
output in the utf-8 mode.

Maybe binmode should be raw everywhere, and adding to $output should
recode data to utf-8 from other encodings where needed, but it would be
a massive patch, I'm afraid. Or it would be a small patch requiring
massive testing.

Adding John Hawley to cc:

--
Regards,
Pavel Roskin

2009-03-06 00:03:56

by Jakub Narebski

Subject: Re: [Orinoco-users] linux-firmware binary corruption with gitweb

On Thu, 5 March 2009, Dave wrote:
> Jakub Narebski wrote:
>> Dave <[email protected]> writes:

>>>> My strong impression is that the recoding takes place on the server. I
>>>> think the bug should be reported to the gitweb maintainers unless it is a
>>>> local breakage on the kernel.org site.

It is on the server, but kernel.org runs a modified version of gitweb, and
the bug is in the modifications. See below.

CC-ed John 'Warthog9' Hawley, the maintainer of gitweb on kernel.org.

>>>>
>>> Thanks Pavel.
>>>
>>> I just did a quick scan of the gitweb README - is this an issue with the
>>> $mimetypes_file or $fallback_encoding configuration variables?
>>
>> First, what version of gitweb do you use? It should be in the 'Generator'
>> meta header, or (in older gitweb) in comments at the top of the HTML
>> source.
>
> Not sure where I'd find the meta header,

<meta name="generator" content="gitweb/1.4.5-rc0.GIT-dirty git/1.6.1.1"/>

> but at the top of the HTML:
>
> <!-- git web interface version 1.4.5-rc0.GIT-dirty, (C) 2005-2006, Kay
> Sievers <[email protected]>, Christian Gierke -->
> <!-- git core binaries version 1.6.1.1 -->

The question was whether it is an extremely old version of gitweb, without
the fix for raw blob ('blob_plain') output of non-utf8, non-text files. But
the answer is that it is a _modified_ version of gitweb; see below.

>
>> Second, the file is actually sent to the browser 'as is', using binmode :raw
>> (or at least it should be, according to my understanding of Perl). A *.bin
>> binary file gets the application/octet-stream mimetype, and no charset info
>> is sent. git.kernel.org should have a modern enough gitweb to do this.
>> Strange...
>
> Dug around gitweb.perl in the main git repo. Then looked at the
> git/warthog9/gitweb.git repo (after noting the Git Wiki says kernel.org
> is running John Hawley's branch).
>
> One notable change to git_blob_plain:
>
>      undef $/;
>      binmode STDOUT, ':raw';
> -    print <$fd>;
> +    #print <$fd>;
> +    $output .= <$fd>;
>      binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
>      $/ = "\n";
>
>      close $fd;
> +
> +    return $output;
>
> If that's the code that's running, doesn't that mean the binmode change
> has no effect on the concatenation to $output? So the blob gets utf-8
> encoded when it is actually printed.

That is the culprit. kernel.org runs a modified version of gitweb, with
added caching. I guess that the above change was made so that 'blob_plain'
output can be cached... but it loses "rawness", and I guess it also loses
mimetype info (unless "print $cgi->header(...)" is also changed to
append to $output).

One possible solution would be to redirect STDOUT to a scalar and return
that scalar; do that whenever caching _output_, and print all cached
_output_ data in :raw mode.

    close STDOUT;
    open STDOUT, '>', \$output or die "Can't open STDOUT: $!";


BTW. f5aa79d (gitweb: safely output binary files for 'blob_plain' action)
was my third patch for git...

--
Jakub Narebski
Poland