Assembly code in .../source/lib/raid6/sse2.c

I’ve been working on software to check our 11TB RAID6 arrays for parity and q syndrome consistency (the built-in process only yields an error count and doesn’t properly correct errors). Everything I’ve come up with so far is too slow, so I’m looking at the SSE2-optimized assembly in the kernel sources. After tracing the operations, most of it makes sense, but some of it doesn’t. Following is the relevant section of code from …/source/lib/raid6/sse2.c. Comments that are not mine are removed. Does anyone understand what’s happening with the pcmpgtb operations?

	asm volatile("pxor %xmm5,%xmm5");   // xmm5 is zeroed by XORing it with itself
//...more code, but nothing that appears to affect xmm5
	for ( d = 0 ; d < bytes ; d += 64 ) {
		for ( z = z0 ; z >= 0 ; z-- ) {
			asm volatile("prefetchnta %0" :: "m" (dptr[z][d]));
			asm volatile("prefetchnta %0" :: "m" (dptr[z][d+32]));
			asm volatile("pcmpgtb %xmm4,%xmm5");    // Here's what I don't get. Shouldn't xmm5 always be zero here,
			                                        // and therefore never greater than xmm4?
			asm volatile("pcmpgtb %xmm6,%xmm7");
			asm volatile("pcmpgtb %xmm12,%xmm13");
			asm volatile("pcmpgtb %xmm14,%xmm15");
			asm volatile("paddb %xmm4,%xmm4");
			asm volatile("paddb %xmm6,%xmm6");
			asm volatile("paddb %xmm12,%xmm12");
			asm volatile("paddb %xmm14,%xmm14");
			asm volatile("pand %xmm0,%xmm5");
			asm volatile("pand %xmm0,%xmm7");
			asm volatile("pand %xmm0,%xmm13");
			asm volatile("pand %xmm0,%xmm15");
			asm volatile("pxor %xmm5,%xmm4");
			asm volatile("pxor %xmm7,%xmm6");
			asm volatile("pxor %xmm13,%xmm12");
			asm volatile("pxor %xmm15,%xmm14");
			asm volatile("movdqa %0,%%xmm5" :: "m" (dptr[z][d]));
			asm volatile("movdqa %0,%%xmm7" :: "m" (dptr[z][d+16]));
			asm volatile("movdqa %0,%%xmm13" :: "m" (dptr[z][d+32]));
			asm volatile("movdqa %0,%%xmm15" :: "m" (dptr[z][d+48]));
			asm volatile("pxor %xmm5,%xmm2");
			asm volatile("pxor %xmm7,%xmm3");
			asm volatile("pxor %xmm13,%xmm10");
			asm volatile("pxor %xmm15,%xmm11");
			asm volatile("pxor %xmm5,%xmm4");
			asm volatile("pxor %xmm7,%xmm6");
			asm volatile("pxor %xmm13,%xmm12");
			asm volatile("pxor %xmm15,%xmm14");
			asm volatile("pxor %xmm5,%xmm5");   // xmm5 is zeroed by XORing it with itself
			asm volatile("pxor %xmm7,%xmm7");
			asm volatile("pxor %xmm13,%xmm13");
			asm volatile("pxor %xmm15,%xmm15");
		}

On 2011-12-07 00:36, kylefaucett wrote:
> I’ve been working on software to check our 11TB RAID6 arrays for parity
> and q syndrome consistency (the built-in process only yields an error

You should be asking in the programming subforum, not here.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos E. R. wrote:
> On 2011-12-07 00:36, kylefaucett wrote:
>> I’ve been working on software to check our 11TB RAID6 arrays for parity
>> and q syndrome consistency (the built-in process only yields an error
>
> You should be asking in the programming subforum, not here.

And probably even better would be the Linux RAID mailing list:

http://vger.kernel.org/vger-lists.html#linux-raid

Sorry about posting in the wrong category - the sub-forum listing at Get Technical Help Here doesn’t show the development forum, and I figured this sub-forum was most closely related to kernel code.

Anyway, I figured it out. The pcmpgtb instruction performs a two’s-complement signed comparison byte by byte, not unsigned as I had assumed; the reference material I found didn’t mention that. So register xmm5 really is zero going into the comparison, but any byte of xmm4 that is greater than 0x7F when treated as unsigned is less than zero in a signed comparison, so pcmpgtb fills the corresponding byte of xmm5 with all ones. That is exactly a test for whether the highest bit of each byte is set, which sets up the XOR mask for folding in the reduction polynomial (the q syndrome calculation is basically a linear feedback shift register).

Thanks, djh-novell, for the info about the mailing list.