String escaping is a surprisingly performance-intensive operation.
I spent many hours in the profiler, and the root problem seems
to be that there is no easy way to detect the character classes
that need to be escaped; the root cause is that these
characters are spread all over the ASCII table. I tried
several approaches, including call tables, restructuring
the case condition, different types of if conditions, and
reordering the if conditions. What worked best is this:
The regular case is that a character must not be escaped, so
we want to process that as fast as possible. In order to
detect this as quickly as possible, we have a lookup table
that tells us if a char needs escaping ("needsEscape", below).
This table has a slot for each ASCII code. Note that it uses
chars, because anything larger causes worse cache behavior
and anything smaller requires bit indexing and masking
operations, which are also comparatively costly. So plain
chars work best. What we then do is a single lookup into the
table to detect if we need to escape a character. If we do,
we go into the depths of actual escape detection. But if we
do NOT need to escape, we just quickly advance the index
and are done with that char. Note that it may look like the
extra table lookup costs performance, but quite the contrary
is the case: we get a more than 30% performance increase due
to it (compared to the latest version of the code that did not
do the lookups).
The previous method relied on return codes to convey failure;
however, most often the return code was not checked. This could
lead to several issues, among them malformedness of the generated
string. Also, it is very unlikely to really get into an OOM
condition and, if we do, it is extremely likely that the rest of
the (calling) app will not behave correctly anyway.
This patch takes a different approach: if we run out of memory,
we ignore the data for which no memory could be allocated. In essence,
this can result in the same random malformedness as the previous
approach, but we do not try to signal back to the original caller.
Note that with the previous approach this did not work
reliably either. The benefit of this approach is that we gain some
performance, have simpler code, and have a clearly defined failure
state.
The regular printbuf calls always keep the string properly
terminated. However, this is not required during construction of
the printable string and is thus simply unnecessary overhead. I have
added a new function which avoids writing the terminator until the
result is finally finished. Both the profiler (valgrind's callgrind)
and wallclock timing show this is a clear improvement.
We avoid writing to "memory" just in order to check the loop
termination condition. While this looks like it makes really
no difference, the profiler (valgrind's callgrind tool) tells us it
actually does. Also, wallclock time as measured with time(1) is
consistently a bit lower than with the previous method. As it doesn't
hurt anything (in terms of more complex code or so), I think the
change is worth it.
This function and its helpers used to call sprintbuf() for constant
strings. This costs a lot of performance, because the call ends
up in vprintf(), which is unnecessary as there is no formatting
of the string to be done.
In some other places of the code, printbuf_memappend() was used in
such cases. This is the right route to follow, because it greatly
decreases processing time.
This patch makes this replacement and keeps using sprintbuf() only
where it really is required. In my testing, this resulted in a very
noticeable performance improvement (n *times* faster, where n
depends on the details of what is formatted and with which options).
Small strings inside json_objects had a high overhead because dynamic
memory allocation was needed for each of them. This also meant that the
pointer needed to be updated. This is now changed so that small strings
can be stored directly inside the json_object. Note that on regular
64-bit machines a pointer takes 8 bytes, so even without increasing
memory we could store strings of up to 7 bytes directly inside the object.
The max size is configurable. I have selected up to 31 bytes (which
means a buffer of 32 including the NUL byte). This brings 24 bytes of
memory overhead, but I consider that still useful because the memory
allocator usually also has quite some overhead (16 bytes) for
dynamically allocated memory blocks. In any case, the max buffer size
can be tweaked via a #define.
These items were used for statistics tracking, but no code at all
exists to consume them. By removing them we save
a) space,
because the counters required space, and did so in each and every
json object, and
b) performance,
because calloc() needs to write less data and the counters are
no longer maintained; cache performance can be better, and the load
on OS main memory is lighter.
We could conditionally enable/disable these counters, but I have not
done so because they were used nowhere and looked more like a
left-over from the import of the hashtable code.
Arrays can already be sorted with json_object_array_sort(), which uses
qsort() from the standard C library. This adds a counterpart using
bsearch() from the same library.
sscanf() is always a potential problem when converting numeric
values, as it does not correctly handle over- and underflow
(or at least gives no indication that it has done so).
This change converts json_object_get_double() to use strtod()
according to the CERT guidelines.
There are now three options: JSON_C_TO_STRING_SPACED, JSON_C_TO_STRING_PLAIN and JSON_C_TO_STRING_PRETTY.
This also adds a json_object_to_file_ext() that takes the same flags.
Existing output of json_object_to_json_string() is unchanged and uses JSON_C_TO_STRING_SPACED.
Thanks to Grant Edwards for the initial patches.
In building large systems, there are often clashes over the
preferred base type to use for bool/boolean. At least one
experience has been with a 3rd-party proprietary library which
cannot be changed. In that case, boolean was a synonym for
unsigned char and was used widely in packed structures.