String escaping is a surprisingly performance-intensive operation.
I spent many hours in the profiler, and the root problem seems
to be that there is no easy way to detect the character classes
that need to be escaped, because these characters are spread all
over the ASCII table. I tried several approaches, including call
tables, restructuring the case condition, different types of if
conditions, and reordering the if conditions. What worked best
is this:
The regular case is that a character must not be escaped, so
we want to process that as fast as possible. To detect this as
quickly as possible, we have a lookup table that tells us whether
a char needs escaping ("needsEscape", below). This table has a
slot for each ASCII code. Note that it uses chars, because
anything larger causes worse cache behavior and anything smaller
requires bit indexing and masking operations, which are also
comparatively costly. So plain chars work best. We then do a
single lookup into the table to detect whether we need to escape
a character. If we do, we go into the depths of actual escape
processing. But if we do NOT need to escape, we just quickly
advance the index and are done with that char. It may look like
the extra table lookup costs performance, but the contrary is the
case: we get a more than 30% performance increase from it
(compared to the latest version of the code that did not do the
lookups).
The previous method relied on return codes to convey failure;
however, the return code was most often not checked. This could
lead to several issues, among them malformedness of the generated
string. Also, it is very unlikely to really get into an OOM
condition and, if so, it is extremely likely that the rest of
the (calling) app will not behave correctly anyway.
This patch takes a different approach: if we run out of memory,
we ignore the data for which no memory could be allocated. In
essence, this can result in the same random malformedness as the
previous approach, but we do not try to signal back to the
original caller. Note that with the previous approach this did
not work reliably either. The benefit of this approach is that we
gain some performance, have simpler code and have a clearly
defined failure state.
The regular printbuf calls always keep the string properly
terminated. However, this is not required during construction of
the printable string and is thus simply unnecessary overhead. I
have added a new function which avoids writing the terminator
until the result is finally finished. Both the profiler
(valgrind's callgrind) and wallclock timing show this is a clear
improvement.
Do not call memcpy() for single-byte values, as writing them
directly involves less overhead and thus is faster. The effect
was verified with valgrind's callgrind profiler and wallclock
timing.
We avoid writing to memory just to check the loop termination
condition. While it looks like this makes no real difference, the
profiler (valgrind's callgrind tool) tells us it actually does.
Also, wallclock time measured via time(1) is consistently a bit
lower than with the previous method. As it doesn't hurt anything
(in terms of more complex code or the like), I think the change
is worth it.
This function and its helpers used to call sprintbuf() for
constant strings. This costs a lot of performance, because the
call ends up in vprintf(), which is unnecessary as no formatting
of the string needs to be done.
In some other places of the code, printbuf_memappend() was
already used in such cases. This is the right route to follow,
because it greatly decreases processing time.
This patch does this replacement and keeps using sprintbuf() only
where it really is required. In my testing, this resulted in a
very noticeable performance improvement (n *times* faster, where
n depends on the details of what is formatted and with which
options).
Small strings inside json_objects had a high overhead because
dynamic memory allocation was needed for each of them. This also
meant that a pointer had to be stored and updated. This is now
changed so that small strings can be stored directly inside the
json_object. Note that on regular 64-bit machines a pointer takes
8 bytes, so even without increasing memory, we could store
strings of up to 7 bytes directly inside the object. The max size
is configurable. I have selected up to 31 bytes (which means a
buffer of 32 including the NUL byte). This brings a 24-byte
memory overhead, but I consider that still useful because the
memory allocator usually also has quite some overhead (16 bytes)
for dynamically allocated memory blocks. In any case, the max
buffer size can be tweaked via a #define.
These items were used for statistics tracking, but no code at all
exists to consume them. By removing them we save
a) space
because the counters required space, and did so in each and every
json object
b) performance
because calloc() needs to write less data and the counters are
no longer maintained; cache performance can be better, and the
load on OS main memory is lighter
We could conditionally enable/disable these counters, but I have
not done so, as they were really used nowhere and looked more
like a left-over from the import of the hashtable code.
This also adds a new API, json_global_set_string_hash(), which
permits selecting the hash function. The default is the only one
that was previously present, so there are no changes to existing
apps, and the new hash function needs to be explicitly opted in.
Especially for smaller strings, the Perl-like hash function seems
to be around twice as fast as the other one, with similarly good
results in value distribution.
In certain C libraries (e.g. uClibc), isnan() and related
functions are implemented in libm, so json-c needs to link
against it. This commit therefore adds an AC_TRY_LINK() test to
check whether a program calling isnan() can be properly linked
with no special flags. If not, we assume linking against libm is
needed.
The json-c.pc.in file is also adjusted so that in the case of static
linking against json-c, -lm is also used.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>