Do not call memcpy for single-byte values; writing them directly involves less overhead and is therefore faster. The effect was verified with valgrind's callgrind profiler and wall-clock timing.
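
A minimal sketch of the two patterns, for illustration only. The helper names put_byte_memcpy and put_byte_direct, and the buffer layout, are hypothetical and not taken from the changed code. Both helpers store the same byte; the second avoids the library call. How much this saves likely depends on the compiler and optimization level, since some builds already inline small fixed-size memcpy calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one byte via memcpy (the pattern being removed).
 * Returns the position just past the written byte. */
static size_t put_byte_memcpy(uint8_t *buf, size_t pos, uint8_t value)
{
    assert(buf != NULL);
    memcpy(buf + pos, &value, sizeof value); /* sizeof value == 1 */
    return pos + sizeof value;
}

/* Hypothetical helper: append one byte with a plain store.
 * Same result as above, without the call into memcpy. */
static size_t put_byte_direct(uint8_t *buf, size_t pos, uint8_t value)
{
    assert(buf != NULL);
    buf[pos] = value;
    return pos + 1;
}
```

Both helpers leave the buffer in the same state, so the replacement is behavior-preserving for single-byte values.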