In the i386 calling convention, the caller passes the address for zdot's return value as a hidden first parameter, so the callee must pop this address off the stack before returning. I previously fixed the same bug in x86/zdot_sse2.S (issue #32), but that fix was clumsy and took 3 instructions. Mr. John suggested using "ret $0x4" instead, which returns and skips the hidden address (4 bytes) in a single instruction.
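
For illustration only, here is a minimal C sketch of a function that returns a complex value as a struct. The names zcomplex and zdot_ref are hypothetical and are not part of this repository. Assuming the i386 System V ABI, a compiler such as gcc with -m32 typically returns this struct through a hidden pointer argument and ends the function with "ret $4", which is the behavior the hand-written assembly has to match.

```c
/* Sketch only: zcomplex and zdot_ref are hypothetical names, not this project's API. */
#include <stddef.h>

typedef struct {
    double re;
    double im;
} zcomplex;

/*
 * Unconjugated complex dot product: sum of x[i] * y[i].
 *
 * Assumption: on the i386 System V ABI the 16-byte result is returned in
 * memory. The caller pushes a hidden pointer to the result slot as the
 * first argument, and the callee removes those 4 bytes on return
 * (typically "ret $4" in the compiler output).
 */
zcomplex zdot_ref(size_t n, const zcomplex *x, const zcomplex *y)
{
    zcomplex acc = {0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        acc.re += x[i].re * y[i].re - x[i].im * y[i].im;
        acc.im += x[i].re * y[i].im + x[i].im * y[i].re;
    }
    return acc;
}
```

Compiling this with "gcc -m32 -S" on an i386 System V target and reading the epilogue is one way to compare the compiler's return sequence with the one used in the .S files.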