<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" >
<article>
<title>The C Code Generator</title>
<section>
<title>Design</title>
<para>The overall goal is to keep the code-generator as simple
as possible. Hopefully performance isn't sacrificed to that end!</para>
<para>Anyway, we generate very little code: mostly structure
definitions (for example, enums and structures for messages)
and some metadata, which is essentially reflection data.</para>
<para>Serialization and deserialization are implemented in a library,
libprotobuf-c, rather than in generated code.</para>
</section>
<section>
<title>The Generated Code</title>
<para>
For each enum, we generate a C enum.
For each message, we generate a C structure
which can be cast to a <type>ProtobufCMessage</type>.
</para>
<para>
For each enum and message, we generate a descriptor
object that allows us to implement a kind of reflection
on the structures.
</para>
<section><title>Naming Conventions</title>
<para>First, some naming conventions:
<itemizedlist>
<listitem><para>
The type names of enums, messages, and services
are in camel case (meaning WordsAreCrammedTogether),
except that double underscores are used to delimit
scopes. For example:
<programlisting><![CDATA[
package foo.bar;
message BazBah {
  required int32 val = 1;
}
]]></programlisting>
would generate a C type <type>Foo__Bar__BazBah</type>.</para>
</listitem><listitem>
<para>Function and global names are all lowercase: the words of a
camel-case name are separated by single underscores, and namespaces
are separated by double underscores.
For example:
<programlisting><![CDATA[
Foo__Bar__BazBah *foo__bar__baz_bah__unpack
    (ProtobufCAllocator *allocator,
     size_t length,
     const unsigned char *data);
]]></programlisting>
</para>
</listitem><listitem>
<para>Enum values are all uppercase.</para>
</listitem>
<listitem><para>
Suffixes we add to your symbol names are also
separated by a double underscore: for example,
the <function>__unpack</function> suffix of the function above.</para></listitem>
</itemizedlist>
</para>
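<para>
The scope and word rules above can be sketched in a few lines of C.
This is an illustration, not the generator's code:
<function>proto_name_to_c_type()</function> and
<function>proto_name_to_c_func()</function> are hypothetical helpers,
and they ignore corner cases (such as runs of capital letters or digits)
that the real generator may treat differently.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: "foo.bar.BazBah" -> "Foo__Bar__BazBah".
 * The first letter of each dotted scope is capitalized and each
 * '.' becomes a double underscore. */
static void
proto_name_to_c_type (const char *full_name, char *out, size_t out_size)
{
  size_t n = 0;
  int at_scope_start = 1;
  assert (out_size > 0);
  /* Stop early if the next step could need two bytes plus the NUL. */
  for (const char *p = full_name; *p != '\0' && n + 2 < out_size; p++)
    {
      if (*p == '.')
        {
          out[n++] = '_';
          out[n++] = '_';
          at_scope_start = 1;
        }
      else
        {
          out[n++] = at_scope_start ? (char) toupper ((unsigned char) *p) : *p;
          at_scope_start = 0;
        }
    }
  out[n] = '\0';
}

/* Hypothetical helper: "foo.bar.BazBah" -> "foo__bar__baz_bah".
 * Everything is lowercased; a capital letter inside a scope starts a
 * new word and is preceded by a single underscore; '.' becomes "__". */
static void
proto_name_to_c_func (const char *full_name, char *out, size_t out_size)
{
  size_t n = 0;
  int at_scope_start = 1;
  assert (out_size > 0);
  for (const char *p = full_name; *p != '\0' && n + 2 < out_size; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (c == '.')
        {
          out[n++] = '_';
          out[n++] = '_';
          at_scope_start = 1;
          continue;
        }
      if (isupper (c) && !at_scope_start)
        out[n++] = '_';
      out[n++] = (char) tolower (c);
      at_scope_start = 0;
    }
  out[n] = '\0';
}
]]></programlisting>
<para>
For <literal>foo.bar.BazBah</literal> these produce
<literal>Foo__Bar__BazBah</literal> and <literal>foo__bar__baz_bah</literal>;
appending <literal>__unpack</literal> to the latter gives the function name shown above.
</para>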
</section>
<section><title>Generated Descriptors</title>
<para>
We also generate descriptor objects for messages
and enums. These are declared in the .h files:
<programlisting><![CDATA[
extern const ProtobufCMessageDescriptor
    foo__bar__baz_bah__descriptor;
]]></programlisting>
</para>
</section>
<section><title>Message Methods</title>
<para>
The message structures all begin with <type>ProtobufCMessage</type>,
so they may be cast to that type.
</para>
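<para>
The cast is safe because C guarantees that a pointer to a structure,
suitably converted, points to its first member. The following
self-contained sketch shows the pattern; <type>MiniDescriptor</type>,
<type>MiniMessage</type> and <type>BazBah</type> are hypothetical
stand-ins, not the real protobuf-c types.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for ProtobufCMessageDescriptor / ProtobufCMessage. */
typedef struct { const char *name; } MiniDescriptor;
typedef struct { const MiniDescriptor *descriptor; } MiniMessage;

/* A "generated" message type: the base must be the first member. */
typedef struct {
  MiniMessage base;
  int32_t val;
} BazBah;

static const MiniDescriptor baz_bah_descriptor = { "foo.bar.BazBah" };

/* Generic code sees only the base and reaches all type information
 * through the descriptor. */
static const char *
message_type_name (const MiniMessage *message)
{
  return message->descriptor->name;
}
]]></programlisting>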
<para>
We generate some functions for each message:
<itemizedlist>
<listitem>
<para><function>unpack()</function>. Unpack serialized data into a
message of this type:
<programlisting><![CDATA[
Foo__Bar__BazBah *
foo__bar__baz_bah__unpack (ProtobufCAllocator *allocator,
                           size_t length,
                           const unsigned char *data);
]]></programlisting>
Note that <parameter>allocator</parameter> may be NULL, in which case
the default allocator is used.
</para>
</listitem>
<listitem>
<para><function>free_unpacked()</function>. Free a message
that you obtained with the unpack method:
<programlisting><![CDATA[
void
foo__bar__baz_bah__free_unpacked (Foo__Bar__BazBah *baz_bah,
                                  ProtobufCAllocator *allocator);
]]></programlisting>
</para>
</listitem>
<listitem>
<para><function>get_packed_size()</function>. Find how long
the serialized representation of the message will be:
<programlisting><![CDATA[
size_t
foo__bar__baz_bah__get_packed_size
    (const Foo__Bar__BazBah *message);
]]></programlisting>
</para>
</listitem>
<listitem>
<para><function>pack()</function>. Pack the message into a buffer,
which must be long enough to hold it (call get_packed_size() first!).
The return value is the number of bytes written.
<programlisting><![CDATA[
size_t
foo__bar__baz_bah__pack
    (const Foo__Bar__BazBah *message,
     unsigned char *packed_data_out);
]]></programlisting>
</para>
</listitem>
<listitem>
<para><function>pack_to_buffer()</function>. Pack the message
into a virtual buffer (see <xref linkend="buffers" />).
<programlisting><![CDATA[
size_t
foo__bar__baz_bah__pack_to_buffer
    (const Foo__Bar__BazBah *message,
     ProtobufCBuffer *buffer);
]]></programlisting>
</para>
</listitem>
</itemizedlist>
</para>
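<para>
The size-then-pack contract can be illustrated with a hand-written encoder
for a hypothetical message whose only field is an int32
<literal>val</literal> with field number 1. This is a sketch under that
assumption: <function>baz_bah_get_packed_size()</function> and
<function>baz_bah_pack()</function> are hypothetical stand-ins for the
generated functions, whereas the real library is driven by the message
descriptor. It follows the standard protobuf wire format, in which such a
field is a tag byte (0x08 for field 1 with the varint wire type) followed
by the value in 7-bit groups, least significant group first.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message with one field: int32 val = 1. */
typedef struct { int32_t val; } BazBah;

/* Number of bytes needed to varint-encode v. */
static size_t
varint_size (uint64_t v)
{
  size_t n = 1;
  while (v >= 0x80)
    {
      v >>= 7;
      n++;
    }
  return n;
}

/* Write v as a varint; returns the number of bytes written. */
static size_t
varint_pack (uint64_t v, unsigned char *out)
{
  size_t n = 0;
  while (v >= 0x80)
    {
      out[n++] = (unsigned char) (v | 0x80);  /* low 7 bits + continuation bit */
      v >>= 7;
    }
  out[n++] = (unsigned char) v;
  return n;
}

/* Tag for field 1, wire type 0 (varint): (1 << 3) | 0 == 0x08. */
#define BAZ_BAH_VAL_TAG 0x08

/* int32 values are sign-extended to 64 bits before encoding,
 * so negative values always take 10 bytes. */
static size_t
baz_bah_get_packed_size (const BazBah *message)
{
  return 1 + varint_size ((uint64_t) (int64_t) message->val);
}

/* The caller must supply at least baz_bah_get_packed_size() bytes. */
static size_t
baz_bah_pack (const BazBah *message, unsigned char *out)
{
  out[0] = BAZ_BAH_VAL_TAG;
  return 1 + varint_pack ((uint64_t) (int64_t) message->val, out + 1);
}
]]></programlisting>
<para>
A caller would call <function>baz_bah_get_packed_size()</function>,
allocate that many bytes, and then call <function>baz_bah_pack()</function>;
for <literal>val = 150</literal> the packed bytes are 0x08 0x96 0x01.
</para>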
</section>
<section><title>Services</title>
<para>
Services are collections of methods each having an input and output type.
Unlike messages where we generate a structure that corresponds
to the actual message object, for services we generate
a function that creates a <type>ProtobufCService</type>
from a collection of user-defined methods.
</para>
<para>
We also define simple functions that invoke each method of a service.
These functions work whether the service was created by
the <function>create_service</function> generated function
or instantiated by an RPC system.
</para>
<para>
Suppose we have a .proto file:
<programlisting><![CDATA[
message A {
  required uint32 val = 1;
}
message B {
  required string foo = 1;
}
service Convert {
  rpc Itoa (A) returns (B);
  rpc Atoi (B) returns (A);
}
]]></programlisting>
The generated code includes:
<programlisting><![CDATA[
struct _Convert_Service {
  ProtobufCService base;
  void (*itoa) (Convert_Service *service,
                const A *input,
                B__Closure closure,
                void *closure_data);
  void (*atoi) (Convert_Service *service,
                const B *input,
                A__Closure closure,
                void *closure_data);
  void (*destroy) (Convert_Service *service);
};
]]></programlisting>
To implement the service yourself, embed <type>Convert_Service</type>
as the first member of your own structure and fill in the method pointers.
In this example, <function>print_int_with_radix()</function> is assumed
to be defined elsewhere:
<programlisting><![CDATA[
#include <stdlib.h>   /* malloc, free, strtoul */

/* structure derived from Convert_Service. */
typedef struct {
  Convert_Service base;   /* must be first member */
  unsigned radix;
} Convert_WithRadix;

/* convert int to string (print_int_with_radix() is not shown) */
static void radix_itoa (Convert_Service *service,
                        const A *input,
                        B__Closure closure,
                        void *closure_data)
{
  char buf[256];
  Convert_WithRadix *wr = (Convert_WithRadix *) service;
  B rv;
  print_int_with_radix (input->val, wr->radix, buf);
  rv.descriptor = &b__descriptor;
  rv.foo = buf;                   /* B's only field is "foo" */
  closure (&rv, closure_data);    /* rv and buf are only valid during this call */
}

/* convert string to int: use strtoul */
static void radix_atoi (Convert_Service *service,
                        const B *input,
                        A__Closure closure,
                        void *closure_data)
{
  Convert_WithRadix *wr = (Convert_WithRadix *) service;
  A rv;
  rv.descriptor = &a__descriptor;
  rv.val = strtoul (input->foo, NULL, wr->radix);
  closure (&rv, closure_data);
}

/* create a new convert service by radix */
ProtobufCService *
create_convert_service_from_radix (unsigned radix)
{
  Convert_WithRadix *wr = malloc (sizeof (Convert_WithRadix));
  if (wr == NULL)
    return NULL;
  convert__init (&wr->base, (Convert__ServiceDestroy) free);
  wr->base.itoa = radix_itoa;
  wr->base.atoi = radix_atoi;
  wr->radix = radix;
  return (ProtobufCService *) wr;
}
]]></programlisting>
Just as with messages, you may cast
from <type>Convert_Service</type> to <type>ProtobufCService</type>,
as long as you have called the __init function first.
</para>
<para>
Conversely, we generate functions to help you invoke service
methods on generic <type>ProtobufCService</type> objects.
These go through the service's <function>invoke()</function> method,
so they work both on services created with create_service
and on factory-provided services, such as those provided by RPC systems.
For example:
<programlisting><![CDATA[
void convert__itoa (ProtobufCService *service,
                    const A *input,
                    B__Closure closure,
                    void *closure_data);
]]></programlisting>
</para>
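<para>
A self-contained sketch of how a wrapper like <function>convert__itoa()</function>
might dispatch through a single generic <function>invoke()</function> entry point.
Everything here is a hypothetical stand-in: <type>MiniService</type>, its
<function>invoke()</function> signature, and the method numbering
(Itoa as 0, Atoi as 1) are illustrative assumptions, not the library's
definitions. Because the output handed to the closure may live on the
implementation's stack, the example closure copies the result out.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the generated message types. */
typedef struct { uint32_t val; } A;
typedef struct { const char *foo; } B;

/* Generic closure: receives the output message (or NULL on failure). */
typedef void (*MiniClosure) (const void *output, void *closure_data);

/* Hypothetical generic service: one entry point for every method. */
typedef struct MiniService MiniService;
struct MiniService {
  void (*invoke) (MiniService *service, unsigned method_index,
                  const void *input, MiniClosure closure, void *closure_data);
};

enum { CONVERT_METHOD_ITOA = 0, CONVERT_METHOD_ATOI = 1 };  /* assumed order */

/* Typed wrapper in the style of convert__itoa(): it has no logic of
 * its own and just forwards to invoke() with the method's index. */
static void
convert_itoa (MiniService *service, const A *input,
              MiniClosure closure, void *closure_data)
{
  service->invoke (service, CONVERT_METHOD_ITOA, input, closure, closure_data);
}

/* A local implementation of invoke(); an RPC client could instead
 * serialize the input and send it over the network. */
static void
local_invoke (MiniService *service, unsigned method_index,
              const void *input, MiniClosure closure, void *closure_data)
{
  (void) service;
  if (method_index == CONVERT_METHOD_ITOA)
    {
      char buf[16];
      B output;
      snprintf (buf, sizeof buf, "%u", (unsigned) ((const A *) input)->val);
      output.foo = buf;
      closure (&output, closure_data);
    }
  else
    closure (NULL, closure_data);  /* method not implemented here */
}

/* Example closure: closure_data is assumed to point at a char[16]. */
static void
copy_result (const void *output, void *closure_data)
{
  char *dest = closure_data;
  const B *b = output;
  if (b != NULL)
    {
      strncpy (dest, b->foo, 15);
      dest[15] = '\0';
    }
  else
    dest[0] = '\0';
}
]]></programlisting>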
</section>
</section>
<section>
<title>The protobuf-c Library</title>
<para>This library is used by the generated code;
it includes common structures and enums,
as well as functions that most users of the generated code
will want.</para>
<para>
There are three main components:
<orderedlist>
<listitem><para>the Descriptor structures</para></listitem>
<listitem><para>helper structures and objects</para></listitem>
<listitem><para>packing and unpacking code</para></listitem>
</orderedlist>
</para>
</section>
<section>
<title>protobuf-c: the Descriptor structures</title>
<para>For example, enums are described in terms of structures:
<programlisting><![CDATA[
struct _ProtobufCEnumValue
{
  const char *name;
  const char *c_name;
  int value;
};

struct _ProtobufCEnumDescriptor
{
  const char *name;
  const char *short_name;
  const char *package_name;

  /* sorted by value */
  unsigned n_values;
  const ProtobufCEnumValue *values;

  /* sorted by name */
  unsigned n_value_names;
  const ProtobufCEnumValue *values_by_name;
};
]]></programlisting></para>
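<para>
Because <literal>values</literal> is sorted by value and
<literal>values_by_name</literal> by name, both lookups can use binary search.
The sketch below uses local copies of the two structures above (renamed
<type>EnumValue</type> and <type>EnumDescriptor</type> so it compiles on its own);
the lookup helpers and the sample <literal>Color</literal> enum, including its
<literal>c_name</literal> strings, are hypothetical.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the enum-descriptor structures shown above. */
typedef struct {
  const char *name;
  const char *c_name;
  int value;
} EnumValue;

typedef struct {
  const char *name;
  const char *short_name;
  const char *package_name;
  unsigned n_values;               /* values: sorted by value */
  const EnumValue *values;
  unsigned n_value_names;          /* values_by_name: sorted by name */
  const EnumValue *values_by_name;
} EnumDescriptor;

/* Hypothetical helper: binary search on the value-sorted array. */
static const EnumValue *
enum_value_by_number (const EnumDescriptor *desc, int value)
{
  unsigned lo = 0, hi = desc->n_values;
  while (lo < hi)
    {
      unsigned mid = lo + (hi - lo) / 2;
      if (desc->values[mid].value < value)
        lo = mid + 1;
      else if (desc->values[mid].value > value)
        hi = mid;
      else
        return &desc->values[mid];
    }
  return NULL;
}

/* Hypothetical helper: binary search on the name-sorted array. */
static const EnumValue *
enum_value_by_name (const EnumDescriptor *desc, const char *name)
{
  unsigned lo = 0, hi = desc->n_value_names;
  while (lo < hi)
    {
      unsigned mid = lo + (hi - lo) / 2;
      int cmp = strcmp (desc->values_by_name[mid].name, name);
      if (cmp < 0)
        lo = mid + 1;
      else if (cmp > 0)
        hi = mid;
      else
        return &desc->values_by_name[mid];
    }
  return NULL;
}

/* Sample data: enum Color { RED = 1; GREEN = 2; BLUE = 3; } in package foo. */
static const EnumValue color_values[] = {
  { "RED", "FOO__COLOR__RED", 1 },
  { "GREEN", "FOO__COLOR__GREEN", 2 },
  { "BLUE", "FOO__COLOR__BLUE", 3 },
};
static const EnumValue color_values_by_name[] = {
  { "BLUE", "FOO__COLOR__BLUE", 3 },
  { "GREEN", "FOO__COLOR__GREEN", 2 },
  { "RED", "FOO__COLOR__RED", 1 },
};
static const EnumDescriptor color_descriptor = {
  "foo.Color", "Color", "foo",
  3, color_values,
  3, color_values_by_name
};
]]></programlisting>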
<para>Likewise, messages are described by:
<programlisting><![CDATA[
struct _ProtobufCFieldDescriptor
{
  const char *name;
  int id;
  ProtobufCFieldLabel label;
  ProtobufCFieldType type;
  unsigned quantifier_offset;
  unsigned offset;
  void *descriptor;       /* for MESSAGE and ENUM types */
};

struct _ProtobufCMessageDescriptor
{
  const char *name;
  const char *short_name;
  const char *package_name;

  /* sorted by field-id */
  unsigned n_fields;
  const ProtobufCFieldDescriptor *fields;
};
]]></programlisting></para>
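<para>
The <literal>offset</literal> member is what makes generic code possible:
a field's storage is at <literal>(char *) message + offset</literal>.
A sketch with trimmed-down local copies of the two structures (only
<literal>name</literal>, <literal>id</literal> and <literal>offset</literal> are kept,
and every field is assumed to be an int32); <type>MiniFieldDescriptor</type>,
<type>Point</type> and the helpers are hypothetical. Since
<literal>fields</literal> is sorted by field id, a field can be found by binary search.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down copy of the field descriptor: label, type,
 * quantifier_offset and descriptor are omitted here. */
typedef struct {
  const char *name;
  int id;
  unsigned offset;   /* byte offset of the field inside the message */
} MiniFieldDescriptor;

typedef struct {
  const char *name;
  unsigned n_fields;                  /* sorted by field-id */
  const MiniFieldDescriptor *fields;
} MiniMessageDescriptor;

/* Example message: message Point { int32 x = 1; int32 y = 2; } */
typedef struct {
  const MiniMessageDescriptor *descriptor;
  int32_t x;
  int32_t y;
} Point;

static const MiniFieldDescriptor point_fields[] = {
  { "x", 1, offsetof (Point, x) },
  { "y", 2, offsetof (Point, y) },
};
static const MiniMessageDescriptor point_descriptor = {
  "Point", 2, point_fields
};

/* Binary search over the id-sorted field array. */
static const MiniFieldDescriptor *
find_field_by_id (const MiniMessageDescriptor *desc, int id)
{
  unsigned lo = 0, hi = desc->n_fields;
  while (lo < hi)
    {
      unsigned mid = lo + (hi - lo) / 2;
      if (desc->fields[mid].id < id)
        lo = mid + 1;
      else if (desc->fields[mid].id > id)
        hi = mid;
      else
        return &desc->fields[mid];
    }
  return NULL;
}

/* Read an int32 field of any message using only its descriptor. */
static int32_t
get_int32_field (const void *message, const MiniFieldDescriptor *field)
{
  const int32_t *p = (const int32_t *) ((const char *) message + field->offset);
  return *p;
}
]]></programlisting>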
<para>
And finally services are described by:
<programlisting><![CDATA[
struct _ProtobufCMethodDescriptor
{
  const char *name;
  const ProtobufCMessageDescriptor *input;
  const ProtobufCMessageDescriptor *output;
};

struct _ProtobufCServiceDescriptor
{
  const char *name;
  unsigned n_methods;
  ProtobufCMethodDescriptor *methods;   /* sorted by name */
};
]]></programlisting></para>
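<para>
Since <literal>methods</literal> is sorted by name, a method can be located
with the standard library's <function>bsearch()</function>. A sketch with
local copies of the two structures (the input and output message descriptors
are left as opaque pointers); <function>find_method()</function> is a
hypothetical helper, and the table describes the <literal>Convert</literal>
service from earlier.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copies of the service-descriptor structures shown above. */
typedef struct {
  const char *name;
  const void *input;    /* opaque here */
  const void *output;   /* opaque here */
} MethodDescriptor;

typedef struct {
  const char *name;
  unsigned n_methods;
  const MethodDescriptor *methods;  /* sorted by name */
} ServiceDescriptor;

/* bsearch() comparator: key is a method name, element a MethodDescriptor. */
static int
compare_method_name (const void *key, const void *element)
{
  return strcmp ((const char *) key,
                 ((const MethodDescriptor *) element)->name);
}

/* Hypothetical helper: returns the method's index, or -1 if absent. */
static int
find_method (const ServiceDescriptor *service, const char *name)
{
  const MethodDescriptor *m =
    bsearch (name, service->methods, service->n_methods,
             sizeof (MethodDescriptor), compare_method_name);
  return m ? (int) (m - service->methods) : -1;
}

/* "Atoi" sorts before "Itoa". */
static const MethodDescriptor convert_methods[] = {
  { "Atoi", NULL, NULL },
  { "Itoa", NULL, NULL },
};
static const ServiceDescriptor convert_descriptor = {
  "Convert", 2, convert_methods
};
]]></programlisting>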
</section>
<section>
<title>protobuf-c: helper structures and typedefs</title>
<para>We define typedefs for a few types
which are used in .proto files but do not
have obvious standard C equivalents:
<itemizedlist>
<listitem><para>a boolean type (<type>protobuf_c_boolean</type>)</para></listitem>
<listitem><para>a binary-data (bytes) type (<type>ProtobufCBinaryData</type>)</para></listitem>
<listitem><para>the various int types (<type>int32_t</type>, <type>uint32_t</type>, <type>int64_t</type>, <type>uint64_t</type>)
are obtained by including <filename>inttypes.h</filename></para></listitem>
</itemizedlist>
</para>
<para>We also define a simple allocator object, <type>ProtobufCAllocator</type>,
that lets you control how allocations are done.
It is used predominantly for parsing.</para>
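<para>
An allocator object of this kind can be as small as two function pointers
plus a context pointer. The sketch below builds a counting allocator on top
of <function>malloc()</function> and <function>free()</function>;
<type>MiniAllocator</type> and its member names are hypothetical, and the
real <type>ProtobufCAllocator</type> may have different or additional members.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator object: two callbacks and their context. */
typedef struct {
  void *(*alloc) (void *allocator_data, size_t size);
  void (*dealloc) (void *allocator_data, void *pointer);
  void *allocator_data;
} MiniAllocator;

/* Context for a counting allocator. */
typedef struct {
  size_t live_blocks;   /* allocations not yet released */
  size_t total_bytes;   /* bytes requested so far */
} AllocStats;

static void *
counting_alloc (void *allocator_data, size_t size)
{
  AllocStats *stats = allocator_data;
  void *p = malloc (size);
  if (p != NULL)
    {
      stats->live_blocks++;
      stats->total_bytes += size;
    }
  return p;
}

static void
counting_dealloc (void *allocator_data, void *pointer)
{
  AllocStats *stats = allocator_data;
  if (pointer != NULL)
    stats->live_blocks--;
  free (pointer);
}
]]></programlisting>
<para>
One use for such an allocator is checking, in tests, that every block
allocated while unpacking is released again when the message is freed.
</para>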
<para>There is also a virtual buffer facility: a buffer
only has to implement a method that appends binary data
to it. This can be used to serialize messages
to different targets (instead of a flat slab of data).</para>
<para>We define a base-type for all messages,
for code that handles messages generically.
All it has is the descriptor object.</para>
<section id="buffers">
<title>Buffers</title>
<para>One important helper type is the <type>ProtobufCBuffer</type>
which allows you to abstract the target of serialization. The only
thing that a buffer has is an <function>append</function> method:
<programlisting><![CDATA[
struct _ProtobufCBuffer
{
  void (*append) (ProtobufCBuffer *buffer,
                  size_t len,
                  const unsigned char *data);
};
]]></programlisting>
ProtobufCBuffer subclasses are often defined on the stack.
</para>
<para>
For example, to write to a <type>FILE</type> you could define:
<programlisting><![CDATA[
#include <stdio.h>

typedef struct
{
  ProtobufCBuffer base;   /* must be first member */
  FILE *fp;
} BufferAppendToFile;

static void my_buffer_file_append (ProtobufCBuffer *buffer,
                                   size_t len,
                                   const unsigned char *data)
{
  BufferAppendToFile *file_buf = (BufferAppendToFile *) buffer;
  fwrite (data, len, 1, file_buf->fp);   /* XXX: no error handling! */
}
]]></programlisting>
</para>
<para>
To use this new type of buffer, you would do something like:
<programlisting><![CDATA[
...
BufferAppendToFile tmp;
tmp.base.append = my_buffer_file_append;
tmp.fp = fp;
protobuf_c_message_pack_to_buffer ((ProtobufCMessage *) &message, &tmp.base);
...
]]></programlisting>
</para>
<para>
A commonly used built-in subtype is <type>ProtobufCBufferSimple</type>,
which is declared on the stack and uses a scratch buffer provided by the user
for its initial allocation. When it runs out of space, it grows exponentially.
To create one, use code like:
<programlisting><![CDATA[
unsigned char pad[128];
ProtobufCBufferSimple buf = PROTOBUF_C_BUFFER_SIMPLE_INIT (pad);
ProtobufCBuffer *buffer = (ProtobufCBuffer *) &buf;
protobuf_c_buffer_append (buffer, 6, (unsigned char *) "hi mom");
]]></programlisting>
You can access the data as buf.len and buf.data. For example,
<programlisting><![CDATA[
assert (buf.len == 6);
assert (memcmp (buf.data, "hi mom", 6) == 0);
]]></programlisting>
To release any memory the buffer allocated, use:
<programlisting><![CDATA[
PROTOBUF_C_BUFFER_SIMPLE_CLEAR (&buf);
]]></programlisting>
</para>
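<para>
How such a buffer might grow can be sketched as follows: it starts out
using the caller's scratch array, and when an append does not fit it
allocates a block at least twice as large, copies the existing data over,
and remembers that the new memory must be freed. <type>SimpleBuffer</type>,
<literal>SIMPLE_BUFFER_INIT</literal>, <function>simple_buffer_append()</function>
and <function>simple_buffer_clear()</function> are hypothetical stand-ins; unlike
the real <type>ProtobufCBufferSimple</type>, this sketch is not a
<type>ProtobufCBuffer</type> subtype and its append function is called directly.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ProtobufCBufferSimple. */
typedef struct {
  size_t alloced;       /* capacity of data */
  size_t len;           /* bytes used */
  unsigned char *data;
  int must_free_data;   /* nonzero once data points at heap memory */
} SimpleBuffer;

#define SIMPLE_BUFFER_INIT(pad) { sizeof (pad), 0, (pad), 0 }

/* Append len bytes, doubling the capacity as often as needed.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int
simple_buffer_append (SimpleBuffer *buf, size_t len, const unsigned char *data)
{
  size_t needed = buf->len + len;
  if (needed > buf->alloced)
    {
      size_t new_alloced = buf->alloced ? buf->alloced * 2 : 16;
      unsigned char *new_data;
      while (new_alloced < needed)
        new_alloced *= 2;
      new_data = malloc (new_alloced);
      if (new_data == NULL)
        return -1;
      if (buf->len > 0)
        memcpy (new_data, buf->data, buf->len);
      if (buf->must_free_data)
        free (buf->data);
      buf->data = new_data;
      buf->alloced = new_alloced;
      buf->must_free_data = 1;
    }
  if (len > 0)
    memcpy (buf->data + buf->len, data, len);
  buf->len = needed;
  return 0;
}

/* Release heap memory, if any; the scratch pad belongs to the caller. */
static void
simple_buffer_clear (SimpleBuffer *buf)
{
  if (buf->must_free_data)
    free (buf->data);
  buf->data = NULL;
  buf->len = buf->alloced = 0;
  buf->must_free_data = 0;
}
]]></programlisting>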
</section>
</section>
<section>
<title>protobuf-c: packing and unpacking messages</title>
<para>
To pack a message, first compute its packed size,
then provide a buffer of at least that size to pack into.
<programlisting><![CDATA[
size_t protobuf_c_message_get_packed_size
    (const ProtobufCMessage *message);
size_t protobuf_c_message_pack (const ProtobufCMessage *message,
                                unsigned char *out);
]]></programlisting>
</para>
<para>
Or you can use the "streaming" approach:
<programlisting><![CDATA[
size_t protobuf_c_message_pack_to_buffer
    (const ProtobufCMessage *message,
     ProtobufCBuffer *buffer);
]]></programlisting>
where <type>ProtobufCBuffer</type> is a base object with an append method.
See <xref linkend="buffers" />.
</para>
<para>
To unpack a message, simply call
<programlisting><![CDATA[
ProtobufCMessage *
protobuf_c_message_unpack (const ProtobufCMessageDescriptor *descriptor,
                           ProtobufCAllocator *allocator,
                           size_t len,
                           const unsigned char *data);
]]></programlisting>
If you pass NULL for <parameter>allocator</parameter>, then
the default allocator will be used.
</para>
<para>
You can cast the result to the type that matches
the descriptor.
</para>
<para>
The result of unpacking should be freed with protobuf_c_message_free().
</para>
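<para>
Unpacking walks the wire format in the opposite direction. The sketch below
decodes a hypothetical message whose only field is an int32
<literal>val</literal> with field number 1 into a caller-provided structure.
<function>varint_unpack()</function> and <function>baz_bah_unpack_sketch()</function>
are hypothetical; unlike this sketch, <function>protobuf_c_message_unpack()</function>
is descriptor-driven and returns a newly allocated message, and protobuf decoders
normally skip or preserve unknown fields rather than reject them.
</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message with one field: int32 val = 1. */
typedef struct { int32_t val; } BazBah;

/* Decode one varint from data[0..len). Returns the number of bytes
 * consumed, or 0 if the input is truncated or longer than 10 bytes. */
static size_t
varint_unpack (const unsigned char *data, size_t len, uint64_t *value)
{
  uint64_t v = 0;
  size_t i;
  for (i = 0; i < len && i < 10; i++)
    {
      v |= (uint64_t) (data[i] & 0x7f) << (7 * i);
      if ((data[i] & 0x80) == 0)   /* no continuation bit: last byte */
        {
          *value = v;
          return i + 1;
        }
    }
  return 0;
}

/* Returns 0 on success, -1 on malformed input or an unexpected field.
 * If the field appears more than once, the last value wins. */
static int
baz_bah_unpack_sketch (const unsigned char *data, size_t len, BazBah *out)
{
  size_t pos = 0;
  while (pos < len)
    {
      uint64_t tag, value;
      size_t n = varint_unpack (data + pos, len - pos, &tag);
      if (n == 0)
        return -1;
      pos += n;
      if (tag != ((1u << 3) | 0u))   /* field 1, wire type 0 (varint) */
        return -1;
      n = varint_unpack (data + pos, len - pos, &value);
      if (n == 0)
        return -1;
      pos += n;
      out->val = (int32_t) (uint32_t) value;   /* truncate to 32 bits (two's complement assumed) */
    }
  return 0;
}
]]></programlisting>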
</section>
<section>
<title>Author</title>
<para>Dave Benson.</para>
</section>
</article>