2002-08-04 22:21:24

by Petr Vandrovec

[permalink] [raw]
Subject: [PATCH] IDE udma_status = 0x76 and 2.5.30...

Hi,
patch below fixes troubles with PDC20265 reporting udma_status = 0x76
with kernels since pcidma cleanup (it was already reported here, but
I did not found patch for it). Code before change (IDE110) set highest
bit of last dword to 1 on all devices except TRM290. New code
set it only on TRM290 devices, which breaks at least mine PDC20265.
Please send it to Linus in your next update.

BTW, are there any TRM290 owners using 2.5.30? Old code set length to
((length >> 2) - 1) << 16, while new code does not have special handling
for TRM290. Or do I miss something?

And BTW#2, mine problematic Toshiba disk works fine with PDC20265 with
512B request size... It breaks with i845 and i440BX, under any UDMA.
Thanks,
Petr Vandrovec
[email protected]


diff -urdN linux/drivers/ide/pcidma.c linux/drivers/ide/pcidma.c
--- linux/drivers/ide/pcidma.c 2002-08-04 17:54:41.000000000 +0200
+++ linux/drivers/ide/pcidma.c 2002-08-04 23:52:33.000000000 +0200
@@ -403,10 +403,8 @@
sg++;
}

-#ifdef CONFIG_BLK_DEV_TRM290
- if (ch->chipset == ide_trm290)
+ if (ch->chipset != ide_trm290)
*--table |= cpu_to_le32(0x80000000);
-#endif

return ch->sg_nents;
}


2002-08-05 09:34:38

by Marcin Dalecki

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

Uz.ytkownik Petr Vandrovec napisa?:
> Hi,
> patch below fixes troubles with PDC20265 reporting udma_status = 0x76
> with kernels since pcidma cleanup (it was already reported here, but
> I did not found patch for it). Code before change (IDE110) set highest
> bit of last dword to 1 on all devices except TRM290. New code
> set it only on TRM290 devices, which breaks at least mine PDC20265.
> Please send it to Linus in your next update.

Yes it is obviously correct.

> BTW, are there any TRM290 owners using 2.5.30? Old code set length to
> ((length >> 2) - 1) << 16, while new code does not have special handling
> for TRM290. Or do I miss something?

The new code is overwriting those values in the host controller driver
itself.

> And BTW#2, mine problematic Toshiba disk works fine with PDC20265 with
> 512B request size... It breaks with i845 and i440BX, under any UDMA.

Hmm... It is very well possible that the Toshiba doesn't like the
fact that the intel chipsets cheat and do something like UDMA88 instead
of UDMA100. Could you verify this by checking whatever forcing them to
UDMA66 helps please? Vojtech?

2002-08-05 10:30:29

by Marcin Dalecki

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

Uz.ytkownik Petr Vandrovec napisa?:
> On 5 Aug 02 at 11:33, Marcin Dalecki wrote:
>
>>Uz.ytkownik Petr Vandrovec napisa?:
>>
>>> BTW, are there any TRM290 owners using 2.5.30? Old code set length to
>>>((length >> 2) - 1) << 16, while new code does not have special handling
>>>for TRM290. Or do I miss something?
>>
>>The new code is overwriting those values in the host controller driver
>>itself.
>
>
> Really? I'm not able to locate such overwrite in trm290.c. Are they hidden
> somewhere else?

This should handle it:

hwif->seg_boundary_mask = 0xffffffff;
>
> Also BUG_ON() in udma_new_table is bogus. Change code:
>
> - u32 cur_len = sg_dma_len(sg) & 0xffff;
> + u32 cur_len = sg_dma_len(sg);
>
> /* Delete this ... */
> BUG_ON(cur_len > ch->max_segment_size);
>
> *table++ = cpu_to_le32(cur_addr);
> - *table++ = cpu_to_le32(cur_len);
> + *table++ = cpu_to_le32(cur_len & 0xffff);
>
> Without first change BUG_ON will not trigger on any transfer: values
> up to 0xFE00 are legal, and values over 0x10000 get cut down to
> 0x0xxxx...
>
> Second change is needed only if we have some driver setting
> max_segment_size to value > 0xffff: currently we do not have such driver,
> default is 0xfe00, and value set by cs5530 is 0xfe00 too.

Well trm390 *does* set ->seg_boundary_mask.

>>Hmm... It is very well possible that the Toshiba doesn't like the
>>fact that the intel chipsets cheat and do something like UDMA88 instead
>>of UDMA100. Could you verify this by checking whatever forcing them to
>>UDMA66 helps please? Vojtech?
>
>
> It happens with UDMA0 too (and I tried slowest possible timming at
> i845, and it still happens).

I'm more and more amanzed, why my system works at all...

> P.S.: Marcin, are you Marcin or Martin? MAINTAINERS says Martin,
> but your replies state Marcin...

It doesn't really matter. Marcin is just the polish spelling of the name
Martin. So choose whatever you want please.


2002-08-05 10:41:12

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

On 5 Aug 02 at 11:33, Marcin Dalecki wrote:
> Uz.ytkownik Petr Vandrovec napisa?:
> > BTW, are there any TRM290 owners using 2.5.30? Old code set length to
> > ((length >> 2) - 1) << 16, while new code does not have special handling
> > for TRM290. Or do I miss something?
>
> The new code is overwriting those values in the host controller driver
> itself.

Really? I'm not able to locate such overwrite in trm290.c. Are they hidden
somewhere else?

Also BUG_ON() in udma_new_table is bogus. Change code:

- u32 cur_len = sg_dma_len(sg) & 0xffff;
+ u32 cur_len = sg_dma_len(sg);

/* Delete this ... */
BUG_ON(cur_len > ch->max_segment_size);

*table++ = cpu_to_le32(cur_addr);
- *table++ = cpu_to_le32(cur_len);
+ *table++ = cpu_to_le32(cur_len & 0xffff);

Without first change BUG_ON will not trigger on any transfer: values
up to 0xFE00 are legal, and values over 0x10000 get cut down to
0x0xxxx...

Second change is needed only if we have some driver setting
max_segment_size to value > 0xffff: currently we do not have such driver,
default is 0xfe00, and value set by cs5530 is 0xfe00 too.

> > And BTW#2, mine problematic Toshiba disk works fine with PDC20265 with
> > 512B request size... It breaks with i845 and i440BX, under any UDMA.
>
> Hmm... It is very well possible that the Toshiba doesn't like the
> fact that the intel chipsets cheat and do something like UDMA88 instead
> of UDMA100. Could you verify this by checking whatever forcing them to
> UDMA66 helps please? Vojtech?

It happens with UDMA0 too (and I tried slowest possible timming at
i845, and it still happens).
Petr Vandrovec
[email protected]

P.S.: Marcin, are you Marcin or Martin? MAINTAINERS says Martin,
but your replies state Marcin...

2002-08-05 14:03:51

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

On 5 Aug 02 at 12:28, Marcin Dalecki wrote:
> Uz.ytkownik Petr Vandrovec napisa?:
> > On 5 Aug 02 at 11:33, Marcin Dalecki wrote:
> >
> >>Uz.ytkownik Petr Vandrovec napisa?:
> >>
> >>> BTW, are there any TRM290 owners using 2.5.30? Old code set length to
> >>>((length >> 2) - 1) << 16, while new code does not have special handling
> >>>for TRM290. Or do I miss something?
> >>
> >>The new code is overwriting those values in the host controller driver
> >>itself.
> >
> >
> > Really? I'm not able to locate such overwrite in trm290.c. Are they hidden
> > somewhere else?
>
> This should handle it:
>
> hwif->seg_boundary_mask = 0xffffffff;

Nope. This is another problem. TRM290 code did:

xcount = bcount & 0xffff;
if (is_trm290_chipset)
xcount = ((xcount >> 2) - 1) << 16;
...
*table++ = cpu_to_le32(xcount);

Current code does not do xcount modification for trm290 - while
other hosts want byte count in lower 16bit word of length,
trm290 apparently wants dword count (less 1) in upper 16bit word of length.

> >
> > Also BUG_ON() in udma_new_table is bogus. Change code:
> >
> > - u32 cur_len = sg_dma_len(sg) & 0xffff;
> > + u32 cur_len = sg_dma_len(sg);
> >
> > /* Delete this ... */
> > BUG_ON(cur_len > ch->max_segment_size);
> >
> > *table++ = cpu_to_le32(cur_addr);
> > - *table++ = cpu_to_le32(cur_len);
> > + *table++ = cpu_to_le32(cur_len & 0xffff);
> >
> > Without first change BUG_ON will not trigger on any transfer: values
> > up to 0xFE00 are legal, and values over 0x10000 get cut down to
> > 0x0xxxx...
> >
> > Second change is needed only if we have some driver setting
> > max_segment_size to value > 0xffff: currently we do not have such driver,
> > default is 0xfe00, and value set by cs5530 is 0xfe00 too.
>
> Well trm390 *does* set ->seg_boundary_mask.

But max_segment_size is what we are testing here. If you want test
crossing 64K boundary, you want

BUG_ON((cur_addr & ch->seg_boundary_mask) + cur_len - 1 > ch->seg_boundary_mask),

as even 2 sector transfer can cross 64K boundary which IDE specs
forbids (even 1 sector can do that, but I believe that bio layer
sends properly aligned sectors to us).

>
> >>Hmm... It is very well possible that the Toshiba doesn't like the
> >>fact that the intel chipsets cheat and do something like UDMA88 instead
> >>of UDMA100. Could you verify this by checking whatever forcing them to
> >>UDMA66 helps please? Vojtech?
> >
> >
> > It happens with UDMA0 too (and I tried slowest possible timming at
> > i845, and it still happens).
>
> I'm more and more amanzed, why my system works at all...

PDC20265 works correctly without setting stop-bit in last descriptor
for reads, as IDE drive signals no more data, and we stop udma engine
manually in such case. But for writes PDC20265 prefetches beyond last
pointer, finds garbage here (probably descriptor crossing 64KB, or
odd length or ...), and aborts whole transfer in the middle (about
4 bytes before end of real write, when it tries to prefetch beginning
of next sector).
Petr Vandrovec
[email protected]

2002-08-05 14:22:32

by Marcin Dalecki

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

Uz.ytkownik Petr Vandrovec napisa?:

> PDC20265 works correctly without setting stop-bit in last descriptor
> for reads, as IDE drive signals no more data, and we stop udma engine
> manually in such case. But for writes PDC20265 prefetches beyond last
> pointer, finds garbage here (probably descriptor crossing 64KB, or
> odd length or ...), and aborts whole transfer in the middle (about
> 4 bytes before end of real write, when it tries to prefetch beginning
> of next sector).

What about inserting a trap pointer there? One which would allow to
determine "overflows" fast? However I don't see a way to trigger
something as convenient as a simple page fault here.

2002-08-05 20:12:46

by Andre Hedrick

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...


This has now become an utter JOKE!

If we examine Table 2-3 of the Retired SFF-8038i rev 1.0 the only document
to describe the behaviors of Bus Master IDE Status Register.
It is a R/W field.

Bit 7: Simplex RO
Bit 6: Device 1 DMA Capable RW
Bit 5: Device 0 DMA Capable RW
Bit 4: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
Bit 3: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
Bit 2: Bus Master Interrupt STATUS R Clear W
Bit 1: Bus Master Error STATUS R Clear W
Bit 0: Bus Master Active STATUS

Vendor Write, is not a published or listed techincal term.
It is me trying to present this clearly enough so that the masses will see
how poorly the general understanding of the basics in 2.5.

Not to worry, I am sure UnderDog will come an save the day soon enough.

Regards,

Andre Hedrick
LAD Storage Consulting Group

2002-08-05 20:20:30

by Alan

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

On Mon, 2002-08-05 at 21:08, Andre Hedrick wrote:
> If we examine Table 2-3 of the Retired SFF-8038i rev 1.0 the only document
> to describe the behaviors of Bus Master IDE Status Register.
> It is a R/W field.

Andre - you are assuming the drive vendor follows 8038i and didnt just
leave the reserved bits random. At that point 0x76 makes reasonable
sense since its two DMA capable devices + a reserved bit and interrupt
with bus master error.

Which simply means the code is still broken rather than anything truely
insane is happening.

Alan

2002-08-05 20:28:15

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...

On 5 Aug 02 at 13:08, Andre Hedrick wrote:
> Bit 7: Simplex RO
> Bit 6: Device 1 DMA Capable RW
> Bit 5: Device 0 DMA Capable RW
> Bit 4: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> Bit 3: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> Bit 2: Bus Master Interrupt STATUS R Clear W
> Bit 1: Bus Master Error STATUS R Clear W
> Bit 0: Bus Master Active STATUS
>
> Vendor Write, is not a published or listed techincal term.
> It is me trying to present this clearly enough so that the masses will see
> how poorly the general understanding of the basics in 2.5.

If you'll bother with reading code, you'll find that dma status
is reported after:

(dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0;

Because of same code is used in your 2.4.x ide-dma.c (at least in
2.4.19-rc5), I'm really happy that now I know that you are really
familiar with your code.

Code ORs read value with 0x10 to make sure that it reports non-zero
value when error happens (when chip returs dma_stat == 0x00). And 0x10
was choosen because of this bit should be always zero, as you know.

Maybe we should print value after ANDing with 0xEF, but why? Everybody
can read code when in doubt.
Petr Vandrovec
[email protected]

2002-08-05 21:14:17

by Andre Hedrick

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...


Petr,

Funny reading the code of udma_status in pcidma.c I see nothing that can
report the contents of busmaster register undoctored. Only udma_stop
returns a binary state or that is all it can do.

Your point about about code familiarity is a lark.

On Mon, 5 Aug 2002, Petr Vandrovec wrote:

> Code ORs read value with 0x10 to make sure that it reports non-zero
> value when error happens (when chip returs dma_stat == 0x00). And 0x10
> was choosen because of this bit should be always zero, as you know.

The only time known when it returns a non 0xE7 maskable value or the inner
bits are set, is when you poke the dma engine when it is BUSY and/or EOT
has not occurred. So clearly you have broken the EOT checks because
clearly it is not needed in broken host drivers.

Now, to hand you a clue bat!

I have observed this in the port forward into 2.5 also, and have seen
dma_status return on a simplex setup with 0xFF. Now given that I have
correct EOT checks, and from a brief overview of 2.5 can not tell if
they are exist anymore, it can only mean that events are outside the model
of the DMA State Machine.

The result is if the inner two bits on the top and bottom nibbles are set,
the damn thing is busy and code is wrong on the calling events.


Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Mon, 5 Aug 2002, Petr Vandrovec wrote:

> On 5 Aug 02 at 13:08, Andre Hedrick wrote:
> > Bit 7: Simplex RO
> > Bit 6: Device 1 DMA Capable RW
> > Bit 5: Device 0 DMA Capable RW
> > Bit 4: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> > Bit 3: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> > Bit 2: Bus Master Interrupt STATUS R Clear W
> > Bit 1: Bus Master Error STATUS R Clear W
> > Bit 0: Bus Master Active STATUS
> >
> > Vendor Write, is not a published or listed techincal term.
> > It is me trying to present this clearly enough so that the masses will see
> > how poorly the general understanding of the basics in 2.5.
>
> If you'll bother with reading code, you'll find that dma status
> is reported after:
>
> (dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0;
>
> Because of same code is used in your 2.4.x ide-dma.c (at least in
> 2.4.19-rc5), I'm really happy that now I know that you are really
> familiar with your code.
>
> Code ORs read value with 0x10 to make sure that it reports non-zero
> value when error happens (when chip returs dma_stat == 0x00). And 0x10
> was choosen because of this bit should be always zero, as you know.
>
> Maybe we should print value after ANDing with 0xEF, but why? Everybody
> can read code when in doubt.
> Petr Vandrovec
> [email protected]
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


2002-08-05 21:23:23

by Andre Hedrick

[permalink] [raw]
Subject: Re: [PATCH] IDE udma_status = 0x76 and 2.5.30...


Try

dma_timer_expiry()

if ((dma_stat & 0x18) == 0x18) /* BUSY Stupid Early Timer !! */
return WAIT_CMD;

That will expand the dma_timer_expiry timeout to give the hardware a
chance to complete; however, if ya'll have successfully goat screwed the
EOT you die in a LOOP.

On Mon, 5 Aug 2002, Andre Hedrick wrote:

>
> Petr,
>
> Funny reading the code of udma_status in pcidma.c I see nothing that can
> report the contents of busmaster register undoctored. Only udma_stop
> returns a binary state or that is all it can do.
>
> Your point about about code familiarity is a lark.
>
> On Mon, 5 Aug 2002, Petr Vandrovec wrote:
>
> > Code ORs read value with 0x10 to make sure that it reports non-zero
> > value when error happens (when chip returs dma_stat == 0x00). And 0x10
> > was choosen because of this bit should be always zero, as you know.
>
> The only time known when it returns a non 0xE7 maskable value or the inner
> bits are set, is when you poke the dma engine when it is BUSY and/or EOT
> has not occurred. So clearly you have broken the EOT checks because
> clearly it is not needed in broken host drivers.
>
> Now, to hand you a clue bat!
>
> I have observed this in the port forward into 2.5 also, and have seen
> dma_status return on a simplex setup with 0xFF. Now given that I have
> correct EOT checks, and from a brief overview of 2.5 can not tell if
> they are exist anymore, it can only mean that events are outside the model
> of the DMA State Machine.
>
> The result is if the inner two bits on the top and bottom nibbles are set,
> the damn thing is busy and code is wrong on the calling events.
>
>
> Cheers,
>
> Andre Hedrick
> LAD Storage Consulting Group
>
> On Mon, 5 Aug 2002, Petr Vandrovec wrote:
>
> > On 5 Aug 02 at 13:08, Andre Hedrick wrote:
> > > Bit 7: Simplex RO
> > > Bit 6: Device 1 DMA Capable RW
> > > Bit 5: Device 0 DMA Capable RW
> > > Bit 4: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> > > Bit 3: Reserved "MUST RETURN ZERO ON READS" !!! Vendor Write
> > > Bit 2: Bus Master Interrupt STATUS R Clear W
> > > Bit 1: Bus Master Error STATUS R Clear W
> > > Bit 0: Bus Master Active STATUS
> > >
> > > Vendor Write, is not a published or listed techincal term.
> > > It is me trying to present this clearly enough so that the masses will see
> > > how poorly the general understanding of the basics in 2.5.
> >
> > If you'll bother with reading code, you'll find that dma status
> > is reported after:
> >
> > (dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0;
> >
> > Because of same code is used in your 2.4.x ide-dma.c (at least in
> > 2.4.19-rc5), I'm really happy that now I know that you are really
> > familiar with your code.
> >
> > Code ORs read value with 0x10 to make sure that it reports non-zero
> > value when error happens (when chip returs dma_stat == 0x00). And 0x10
> > was choosen because of this bit should be always zero, as you know.
> >
> > Maybe we should print value after ANDing with 0xEF, but why? Everybody
> > can read code when in doubt.
> > Petr Vandrovec
> > [email protected]
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Andre Hedrick
LAD Storage Consulting Group