LEJP JSON Stream Parser

Test app: ./test-apps/test-lejp.c -> libwebsockets-test-lejp

LEJP is a lightweight JSON stream parser.

The features are:

  • completely immune to input fragmentation: give it blocks of JSON of any size as they become available, and 1 byte or 100K at a time gives identical parsing results
  • input chunks discarded as they are parsed, whole JSON never needed in memory
  • nonrecursive, fixed stack usage of a few dozen bytes
  • no heap allocations at all, it only requires a ~500 byte context, usually placed on the caller's stack
  • creates callbacks to a user-provided handler as members are parsed out
  • no payload size limit, supports huge / endless strings bigger than system memory
  • collates utf-8 text payloads into a 250-byte chunk buffer in the json parser context object for ease of access

Type handling

LEJP leaves all numbers in text form. Ints and floats are signalled by different callbacks, but both are delivered as text strings in the first ctx->npos chars of ctx->buf.

For numeric types, you would typically use atoi() or similar to recover the number as a host type.
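As a minimal sketch of that conversion step, the helpers below turn the first npos chars of a text buffer into a long or a double. The names text_to_long() and text_to_double() are hypothetical and not part of LEJP; in a real callback you would pass ctx->buf and ctx->npos. The sketch uses strtol() / strtod() rather than atoi() so that malformed or out-of-range text can be detected, and it copies the text first so it does not assume the buffer is NUL-terminated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helpers, not part of LEJP: convert the first npos chars of
 * buf (standing in for ctx->buf / ctx->npos) into a host type.
 * Both return 0 on success, or -1 if the text is empty, too long, has
 * trailing junk, or is out of range.
 */
static int text_to_long(const char *buf, size_t npos, long *out)
{
	char tmp[64], *end;

	assert(buf && out);
	if (!npos || npos >= sizeof(tmp))
		return -1;

	/* private NUL-terminated copy, no assumption about the source */
	memcpy(tmp, buf, npos);
	tmp[npos] = '\0';

	errno = 0;
	*out = strtol(tmp, &end, 10);

	/* *end is nonzero if something was left unconverted */
	return (*end || errno == ERANGE) ? -1 : 0;
}

static int text_to_double(const char *buf, size_t npos, double *out)
{
	char tmp[64], *end;

	assert(buf && out);
	if (!npos || npos >= sizeof(tmp))
		return -1;

	memcpy(tmp, buf, npos);
	tmp[npos] = '\0';

	errno = 0;
	*out = strtod(tmp, &end);

	return (*end || errno == ERANGE) ? -1 : 0;
}
```

Inside a callback this would be used along the lines of text_to_long(ctx->buf, ctx->npos, &value) for an int, and text_to_double() for a float.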

Callback reasons

The user callback is not required to handle every callback reason. It only needs to process the data for the ones it is interested in.

Callback reason        JSON structure                                Associated data
LEJPCB_CONSTRUCTED     Created the parse context
LEJPCB_DESTRUCTED      Destroyed the parse context
LEJPCB_COMPLETE        The parsing completed OK
LEJPCB_FAILED          The parsing failed
LEJPCB_VAL_TRUE        boolean true
LEJPCB_VAL_FALSE       boolean false
LEJPCB_PAIR_NAME       The name part of a JSON key: value map pair   ctx->buf
LEJPCB_VAL_STR_START   A UTF-8 string is starting
LEJPCB_VAL_STR_CHUNK   The next string chunk                         ctx->npos bytes in ctx->buf
LEJPCB_VAL_STR_END     The last string chunk                         ctx->npos bytes in ctx->buf
LEJPCB_ARRAY_START     An array is starting
LEJPCB_ARRAY_END       An array has ended
LEJPCB_OBJECT_START    A JSON object is starting
LEJPCB_OBJECT_END      A JSON object has ended

Handling JSON UTF-8 strings

When a string is parsed, an advisory callback of LEJPCB_VAL_STR_START occurs first. No payload is delivered with the START callback.

Payload is collated into ctx->buf[], and the valid length is in ctx->npos.

For short strings, the whole payload is delivered in a single LEJPCB_VAL_STR_END callback.

For payloads larger than the size of ctx->buf[], LEJPCB_VAL_STR_CHUNK callbacks occur, each delivering the next sequential bufferload.

The last chunk (which may be zero length) is delivered by LEJPCB_VAL_STR_END.
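The START / CHUNK / END sequence described above can be reassembled on the user side roughly as follows. This is only a sketch: the SK_STR_* values are stand-ins for the library's string callback reasons, and struct str_acc and str_acc_feed() are hypothetical names introduced for illustration, not part of LEJP. In a real callback, buf and npos would be ctx->buf and ctx->npos.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the string callback reasons; real values come from the library */
enum { SK_STR_START, SK_STR_CHUNK, SK_STR_END };

/* Hypothetical user-side accumulator for one string value */
struct str_acc {
	char	out[1024];	/* reassembled, NUL-terminated at END */
	size_t	len;		/* bytes collected so far */
	int	complete;	/* nonzero once END has been seen */
	int	truncated;	/* nonzero if payload did not fit in out[] */
};

static void str_acc_feed(struct str_acc *a, int reason,
			 const char *buf, size_t npos)
{
	size_t room;

	assert(a);

	switch (reason) {
	case SK_STR_START:
		/* advisory only, no payload: just reset our state */
		a->len = 0;
		a->complete = 0;
		a->truncated = 0;
		break;

	case SK_STR_CHUNK:
	case SK_STR_END:
		/* append this bufferload, always keeping room for the NUL */
		room = sizeof(a->out) - 1 - a->len;
		if (npos > room) {
			npos = room;
			a->truncated = 1;
		}
		if (npos)
			memcpy(a->out + a->len, buf, npos);
		a->len += npos;

		if (reason == SK_STR_END) {
			/* END may carry zero bytes; it still closes the string */
			a->out[a->len] = '\0';
			a->complete = 1;
		}
		break;
	}
}
```

A fixed-size destination is used here to stay in keeping with the no-heap design. A payload that may be huge or endless would instead be processed chunk by chunk as it arrives, without collecting it.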

Parsing paths

LEJP maintains a "parsing path" in ctx->path that represents the context of the callback events. As a convenience, when you create the LEJP context you can pass in an array of path strings you want to match on. Any match can then be checked in the callback using ctx->path_match: it is 0 if there is no active match, otherwise it is the index of the matching entry in your path array, counting from 1 for the first entry.

JSON element                Representation in path
JSON Array                  []
JSON Map entry key string   keystring
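The 0 / 1-based match index convention can be illustrated with the sketch below. LEJP performs the matching itself and reports the result in ctx->path_match; match_index() here is a hypothetical, exact-compare stand-in used only to show how an enum can be kept aligned with the path array. The example paths "name" and "tags[]" are assumptions built from the notation in the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical paths of interest; their order defines the match index */
static const char * const paths[] = {
	"name",
	"tags[]",
};

/* 0 means "no active match", then one entry per path, counting from 1 */
enum { PM_NONE, PM_NAME, PM_TAGS };

/*
 * Illustrative stand-in for the matching LEJP does internally: returns the
 * 1-based index of path in list, or 0 if it is not listed.
 */
static int match_index(const char *path, const char * const *list,
		       size_t count)
{
	size_t n;

	assert(path && list);

	for (n = 0; n < count; n++)
		if (!strcmp(path, list[n]))
			return (int)n + 1;

	return 0;
}

/* What a callback might do with the index (ctx->path_match in real use) */
static const char *describe(int path_match)
{
	switch (path_match) {
	case PM_NAME:
		return "name value";
	case PM_TAGS:
		return "tags array member";
	default:
		return "not of interest";
	}
}
```

Keeping the enum in the same order as the path array lets the callback switch on the match index by name instead of by bare number.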

Comparison with LECP (CBOR parser)

LECP is based on the same principles as LEJP and shares most of the callbacks. The major differences:

  • LEJP values all appear in ctx->buf[], i.e., floating-point is provided to the callback in ASCII form like "1.0". CBOR provides a stricter typing system, and the different type values are provided either in ctx->buf[] for blobs or UTF-8 text strings, or in the item.u union for converted types, with additional callback reasons specific to each type.

  • CBOR "maps" use _OBJECT_START and _END parsing callbacks around the key / value pairs. LEJP has a special callback type, PAIR_NAME, for the key string. In LECP, where a key may be a string or an integer, keys are instead delivered through generic callbacks dependent on type, i.e., the generic string or integer callbacks, and the value part is delivered according to whatever type follows.