From: "Dave Korn"
Subject: [VXWORKS BUG+FIX] Nasty bug in VxWorks PPC Cache library/ Compilers and optimization.
Date: 11 Dec 2000 00:00:00 GMT
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3612.1700
X-Complaints-To: news@u-net.net
X-Trace: newsr1.u-net.net 976556785 193.123.2.233 (Mon, 11 Dec 2000 17:46:25 GMT)
Organization: Lumber Cartel (tinlc) Members #2234-2237 (owing to browser refresh)
NNTP-Posting-Date: Mon, 11 Dec 2000 17:46:25 GMT
Newsgroups: comp.os.vxworks

Good day everybody,

Please take a look at the following output, which I obtained by running

>arppc x F:\TORNADO\target\lib\libPPC604gnuvx.a cacheALib.o
>objdumpppc -Sr cachealib.o

in my $WIND_BASE/target/lib directory.

;// 000000c8   add     r5,r5,r4
;// 000000cc   rlwinm  r4,r4,0,0,26
;// 000000d0   cmpwi   cr3,r3,1
;// 000000d4   beq     cr3,000000e8
;// 000000d8   cmpwi   r3,0
;// 000000dc   bne     0000014c
;// 000000e0   icbi    r0,r4
;// 000000e4   b       000000ec
;// 000000e8   dcbi    r0,r4
;// 000000ec   addi    r4,r4,32
;// 000000f0   cmplw   r4,r5
;// 000000f4   bge     0000016c
;// 000000f8   beq     cr3,000000e8
;// 000000fc   b       000000e0

Now, according to the EABI spec, condition register fields cr2, cr3 and cr4
are non-volatile, so any routine that wants to use them must save them on
entry and restore them on exit. The cacheArchInvalidate routine above clearly
does no such thing. [I've also checked the code path through cacheInvalidate
that leads here, and it neither saves nor uses cr3.]

This bug only shows up under very limited circumstances. To the best of my
knowledge, the cr fields are only used for long-term storage when you have
the optimizer turned on, and perhaps only at high levels. Use of the -g flag
(and probably -fvolatile too, though I haven't checked) kills this
optimization.

When you do get bitten by this one, though, it's going to be nasty. The
apparent symptom is liable to be something such as an if ..
else statement branching the wrong way, where the test in the condition is
an expression that is used repeatedly in the enclosing function. Something
like

void somefunction(lots of args)
{
    if (some condition)
        do something;
    else
        do something else;

    ... more code .....

    if (same condition as before)
        do something;
    else
        do something else;

    ... more code, including some that does IO or for other reasons
        invalidates the cache .......

    if (same condition again)
        do something;
    else
        do something else;
}

...and you find that the third if (..) sometimes takes the opposite decision
to the first two, despite there being no code in between that could alter
the values of the variables on which the condition depends. What has
happened is that the compiler evaluated the condition once, kept the result
in a non-volatile CR field (cr3 here) on the assumption that it survives any
call, and the cache invalidate in the middle overwrote it.

The fix is simple enough. Here's a little bit of assembler code that
hot-patches your OS to make the code above use cr6 in place of cr3; cr6 is
volatile under the EABI, so cacheArchInvalidate is free to clobber it. The
patch checks that all the instructions it is about to modify are where they
should be, and won't do anything if it doesn't recognize the hex values
corresponding to those instructions. You might want to remove the comments
if you aren't using the C preprocessor on your .S files, and you might need
to convert crX and rX into plain X.

// int FixOSProblem(void)
// 0 = fixed, 1 = not able to fix - code didn't match pattern.
        .globl  FixOSProblem
FixOSProblem:
        b       .go

// template: the original instructions, laid out at the same relative offsets
.insns:
.insnc8:  add     r5,r5,r4
.insncc:  rlwinm  r4,r4,0,0,26
.insnd0:  cmpwi   cr3,r3,1        // change me
.insnd4:  beq     cr3,.insne8     // and me
.insnd8:  cmpwi   r3,0
.insndc:  bne     .insn14c
.insne0:  icbi    r0,r4
.insne4:  b       .insnec
.insne8:  dcbi    r0,r4
.insnec:  addi    r4,r4,32
.insnf0:  cmplw   r4,r5
.insnf4:  bge     .insn16c
.insnf8:  beq     cr3,.insne8     // and me
.insnfc:  b       .insne0
.insn14c: ori     r0,r0,r0
.insn16c: ori     r0,r0,r0

// replacements: identical except cr3 -> cr6
.repls:
.replc8:  add     r5,r5,r4
.replcc:  rlwinm  r4,r4,0,0,26
.repld0:  cmpwi   cr6,r3,1        // change me
.repld4:  beq     cr6,.reple8     // and me
.repld8:  cmpwi   r3,0
.repldc:  bne     .repl14c
.reple0:  icbi    r0,r4
.reple4:  b       .replec
.reple8:  dcbi    r0,r4
.replec:  addi    r4,r4,32
.replf0:  cmplw   r4,r5
.replf4:  bge     .repl16c
.replf8:  beq     cr6,.reple8     // and me
.replfc:  b       .reple0
.repl14c: ori     r0,r0,r0
.repl16c: ori     r0,r0,r0

// first verify the insns are as we expect.
.go:
        lis     r3,cacheArchInvalidate@ha
        addi    r3,r3,cacheArchInvalidate@l
        lis     r4,.insns@ha
        addi    r4,r4,.insns@l
        lis     r5,.repls@ha
        addi    r5,r5,.repls@l

        lwz     r6,.insnd0-.insns(r3)
        lwz     r7,.insnd0-.insns(r4)
        xor.    r6,r6,r7
        bne     cr0,.err

        lwz     r6,.insnd4-.insns(r3)
        lwz     r7,.insnd4-.insns(r4)
        xor.    r6,r6,r7
        bne     cr0,.err

        lwz     r6,.insnf8-.insns(r3)
        lwz     r7,.insnf8-.insns(r4)
        xor.    r6,r6,r7
        bne     cr0,.err

// all three match: copy the replacement words over the live code
        lwz     r6,.repld4-.repls(r5)
        stw     r6,.repld4-.repls(r3)
        lwz     r6,.replf8-.repls(r5)
        stw     r6,.replf8-.repls(r3)
        lwz     r6,.repld0-.repls(r5)
        stw     r6,.repld0-.repls(r3)

// flush d cache back to mem
        li      r4,.repld4-.repls
        li      r5,.replf8-.repls
        li      r6,.repld0-.repls
// make no assumptions about cache lines
// just flush all 3 modified insns
        dcbst   r6,r3
        dcbst   r4,r3
        dcbst   r5,r3
// wait for mem to update
        sync
// invalidate I cache
        icbi    r6,r3
        icbi    r4,r3
        icbi    r5,r3
// and context sync to ensure I cache invalidation completes.
        isync
        xor     r3,r3,r3        // return success
        blr

.err:
        li      r3,1            // return failure
        blr

hth,
DaveK
-- 
They laughed at Galileo. They laughed at Copernicus. They laughed at
Columbus. But remember, they also laughed at Bozo the Clown.