Blame - Documentation/filesystems/relayfs.txt - kernel/msm-4.9

relayfs - a high-speed data relay filesystem
============================================

relayfs is a filesystem designed to provide an efficient mechanism for
tools and facilities to relay large and potentially sustained streams
of data from kernel space to user space.

The main abstraction of relayfs is the 'channel'. A channel consists
of a set of per-cpu kernel buffers each represented by a file in the
relayfs filesystem. Kernel clients write into a channel using
efficient write functions which automatically log to the current cpu's
channel buffer. User space applications mmap() the per-cpu files and
retrieve the data as it becomes available.

The format of the data logged into the channel buffers is completely
up to the relayfs client; relayfs does, however, provide hooks which
allow clients to impose some structure on the buffer data. relayfs
does not implement any form of data filtering either - this too is
left to the client. The purpose is to keep relayfs as simple as
possible.

This document provides an overview of the relayfs API. The details of
the function parameters are documented along with the functions in the
filesystem code - please see that for details.

Semantics
=========

Each relayfs channel has one buffer per CPU; each buffer has one or
more sub-buffers. Messages are written to the first sub-buffer until
it is too full to contain a new message, in which case the message is
written to the next sub-buffer (if available). Messages are never
split across sub-buffers. At this point, userspace can be notified so
it empties the first sub-buffer, while the kernel continues writing to
the next.

When a sub-buffer is full, the kernel knows how many bytes of it are
padding i.e. unused. Userspace can use this knowledge to copy only
valid data.
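
To illustrate these semantics, here is a small userspace model (not
relayfs code) of how messages fill sub-buffers and how padding arises.
The struct model_buf type, the model_*() functions and the sizes are
hypothetical names chosen only for this sketch; it ignores wrap-around
and consumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 16  /* hypothetical sizes, chosen only for the demo */
#define N_SUBBUFS   4

/* Userspace model of one per-cpu buffer; not the kernel's rchan_buf. */
struct model_buf {
	char   data[N_SUBBUFS][SUBBUF_SIZE];
	size_t padding[N_SUBBUFS]; /* unused bytes at the end of each sub-buffer */
	size_t cur;                /* index of the sub-buffer being written */
	size_t offset;             /* write offset within that sub-buffer */
};

/*
 * Append one message.  A message that does not fit in the space left
 * is never split: the rest of the current sub-buffer becomes padding
 * and the message goes to the next sub-buffer.  Returns 0 on success,
 * -1 if the message cannot be stored in this simplified model.
 */
static int model_write(struct model_buf *b, const void *msg, size_t len)
{
	if (len > SUBBUF_SIZE)
		return -1;
	if (b->offset + len > SUBBUF_SIZE) {
		if (b->cur + 1 >= N_SUBBUFS)
			return -1;
		b->padding[b->cur] = SUBBUF_SIZE - b->offset;
		b->cur++;
		b->offset = 0;
	}
	memcpy(&b->data[b->cur][b->offset], msg, len);
	b->offset += len;
	return 0;
}

/* Number of valid bytes a reader should copy from a full sub-buffer. */
static size_t model_valid_bytes(const struct model_buf *b, size_t subbuf)
{
	assert(subbuf < N_SUBBUFS);
	return SUBBUF_SIZE - b->padding[subbuf];
}
```

With 16-byte sub-buffers, two 10-byte messages end up in two different
sub-buffers, and the first sub-buffer carries 6 bytes of padding.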

After copying it, userspace can notify the kernel that a sub-buffer
has been consumed.

relayfs can operate in a mode where it will overwrite data not yet
collected by userspace, and not wait for it to consume it.

relayfs itself does not provide for communication of such data between
userspace and kernel, allowing the kernel side to remain simple and
not impose a single interface on userspace. It does provide a set of
examples and a separate helper though, described below.

klog and relay-apps example code
================================

relayfs itself is ready to use, but to make things easier, a couple
of simple utility functions and a set of examples are provided.

The relay-apps example tarball, available on the relayfs sourceforge
site, contains a set of self-contained examples, each consisting of a
pair of .c files containing boilerplate code for each of the user and
kernel sides of a relayfs application; combined, these two sets of
boilerplate code provide glue to easily stream data to disk, without
having to bother with mundane housekeeping chores.

The 'klog debugging functions' patch (klog.patch in the relay-apps
tarball) provides a couple of high-level logging functions to the
kernel which allow writing formatted text or raw data to a channel,
regardless of whether a channel to write into exists or not, or
whether relayfs is compiled into the kernel or is configured as a
module. These functions allow you to put unconditional 'trace'
statements anywhere in the kernel or kernel modules; only when there
is a 'klog handler' registered will data actually be logged (see the
klog and kleak examples for details).

It is of course possible to use relayfs from scratch i.e. without
using any of the relay-apps example code or klog, but you'll have to
implement communication between userspace and kernel, allowing both to
convey the state of buffers (full, empty, amount of padding).

klog and the relay-apps examples can be found in the relay-apps
tarball on http://relayfs.sourceforge.net


The relayfs user space API
==========================

relayfs implements basic file operations for user space access to
relayfs channel buffer data. Here are the file operations that are
available and some comments regarding their behavior:

  open()    enables user to open an _existing_ buffer.

  mmap()    results in channel buffer being mapped into the caller's
            memory space. Note that you can't do a partial mmap - you must
            map the entire file, which is NRBUF * SUBBUFSIZE.

  read()    read the contents of a channel buffer. The bytes read are
            'consumed' by the reader i.e. they won't be available again
            to subsequent reads. If the channel is being used in
            no-overwrite mode (the default), it can be read at any time
            even if there's an active kernel writer. If the channel is
            being used in overwrite mode and there are active channel
            writers, results may be unpredictable - users should make
            sure that all logging to the channel has ended before using
            read() with overwrite mode.

  poll()    POLLIN/POLLRDNORM/POLLERR supported. User applications are
            notified when sub-buffer boundaries are crossed.

  close()   decrements the channel buffer's refcount. When the refcount
            reaches 0 i.e. when no process or kernel client has the buffer
            open, the channel buffer is freed.


In order for a user application to make use of relayfs files, the
relayfs filesystem must be mounted. For example,

        mount -t relayfs relayfs /mnt/relay

NOTE:   relayfs doesn't need to be mounted for kernel clients to create
        or use channels - it only needs to be mounted when user space
        applications need access to the buffer data.
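
A user space consumer might start from helpers like the following
sketch. It assumes the per-cpu file naming used by relay_open() (the
base name followed by the cpu number), a hypothetical base name and
mount point, and that read() returns 0 once no more data is currently
available. relay_file_path(), relay_mmap_len() and drain_fd() are
illustrative names, not part of relayfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<dir>/<base><cpu>", e.g. "/mnt/relay/cpu2". */
static int relay_file_path(char *out, size_t out_len, const char *dir,
			   const char *base, unsigned int cpu)
{
	int n;

	assert(out && dir && base);
	n = snprintf(out, out_len, "%s/%s%u", dir, base, cpu);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Length to pass to mmap(): the whole file, never a part of it. */
static size_t relay_mmap_len(size_t n_subbufs, size_t subbuf_size)
{
	return n_subbufs * subbuf_size;
}

/*
 * Copy everything read() currently returns from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on error.
 */
static long drain_fd(int in_fd, int out_fd)
{
	char chunk[4096];
	long total = 0;

	for (;;) {
		ssize_t off = 0;
		ssize_t n = read(in_fd, chunk, sizeof(chunk));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;
		while (off < n) {
			ssize_t w = write(out_fd, chunk + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
	return total;
}
```

A consumer would typically open() the path for each cpu, wait in
poll() for POLLIN, and then call drain_fd() with the relay file as
in_fd and a log file as out_fd.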


The relayfs kernel API
======================

Here's a summary of the API relayfs provides to in-kernel clients:


  channel management functions:

    relay_open(base_filename, parent, subbuf_size, n_subbufs,
               callbacks)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)
    relayfs_create_dir(name, parent)
    relayfs_remove_dir(dentry)
    relayfs_create_file(name, parent, mode, fops, data)
    relayfs_remove_file(dentry)

  channel management typically called on instigation of userspace:

    relay_subbufs_consumed(chan, cpu, subbufs_consumed)

  write functions:

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

  callbacks:

    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, buf, is_global)
    remove_buf_file(dentry)

  helper functions:

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to create a channel, along with its per-cpu
channel buffers. Each channel buffer will have an associated file
created for it in the relayfs filesystem, which can be opened and
mmapped from user space if desired. The files are named
basename0...basenameN-1 where N is the number of online cpus, and by
default will be created in the root of the filesystem. If you want a
directory structure to contain your relayfs files, you can create it
with relayfs_create_dir() and pass the parent directory to
relay_open(). Clients are responsible for cleaning up any directory
structure they create when the channel is closed - use
relayfs_remove_dir() for that.

The total size of each per-cpu buffer is calculated by multiplying the
number of sub-buffers by the sub-buffer size passed into relay_open().
The idea behind sub-buffers is that they're basically an extension of
double-buffering to N buffers, and they also allow applications to
easily implement random-access-on-buffer-boundary schemes, which can
be important for some high-volume applications. The number and size
of sub-buffers is completely dependent on the application and even for
the same application, different conditions will warrant different
values for these parameters at different times. Typically, the right
values to use are best decided after some experimentation; in general,
though, it's safe to assume that having only 1 sub-buffer is a bad
idea - you're guaranteed to either overwrite data or lose events
depending on the channel mode being used.

Channel 'modes'
---------------

relayfs channels can be used in either of two modes - 'overwrite' or
'no-overwrite'. The mode is entirely determined by the implementation
of the subbuf_start() callback, as described below. In 'overwrite'
mode, also known as 'flight recorder' mode, writes continuously cycle
around the buffer and will never fail, but will unconditionally
overwrite old data regardless of whether it's actually been consumed.
In no-overwrite mode, writes will fail i.e. data will be lost, if the
number of unconsumed sub-buffers equals the total number of
sub-buffers in the channel. It should be clear that if there is no
consumer or if the consumer can't consume sub-buffers fast enough,
data will be lost in either case; the only difference is whether data
is lost from the beginning or the end of a buffer.

As explained above, a relayfs channel is made up of one or more
per-cpu channel buffers, each implemented as a circular buffer
subdivided into one or more sub-buffers. Messages are written into
the current sub-buffer of the channel's current per-cpu buffer via the
write functions described below. Whenever a message can't fit into
the current sub-buffer, because there's no room left for it, the
client is notified via the subbuf_start() callback that a switch to a
new sub-buffer is about to occur. The client uses this callback to 1)
initialize the next sub-buffer if appropriate 2) finalize the previous
sub-buffer if appropriate and 3) return a boolean value indicating
whether or not to actually go ahead with the sub-buffer switch.

To implement 'no-overwrite' mode, the client would provide an
implementation of the subbuf_start() callback something like the
following:

static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
        if (prev_subbuf)
                *((unsigned *)prev_subbuf) = prev_padding;

        if (relay_buf_full(buf))
                return 0;

        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

If the current buffer is full i.e. all sub-buffers remain unconsumed,
the callback returns 0 to indicate that the buffer switch should not
occur yet i.e. until the consumer has had a chance to read the current
set of ready sub-buffers. For the relay_buf_full() function to make
sense, the consumer is responsible for notifying relayfs when
sub-buffers have been consumed via relay_subbufs_consumed(). Any
subsequent attempts to write into the buffer will again invoke the
subbuf_start() callback with the same parameters; only when the
consumer has consumed one or more of the ready sub-buffers will
relay_buf_full() return 0, in which case the buffer switch can
continue.

The implementation of the subbuf_start() callback for 'overwrite' mode
would be very similar:

static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
        if (prev_subbuf)
                *((unsigned *)prev_subbuf) = prev_padding;

        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

In this case, the relay_buf_full() check is meaningless and the
callback always returns 1, causing the buffer switch to occur
unconditionally. It's also meaningless for the client to use the
relay_subbufs_consumed() function in this mode, as it's never
consulted.
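
The difference between the two modes can be seen in a small userspace
model of the sub-buffer counters. This is only a sketch: struct
model_buf, its fields and the model_*()/start_*() functions are
hypothetical, and the bookkeeping (a filled sub-buffer is counted once,
then the callback decides whether the switch goes ahead) is an
assumption about how such counters could behave, not a copy of the
relayfs implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the per-buffer counters; names are illustrative. */
struct model_buf {
	size_t n_subbufs;
	size_t produced; /* sub-buffers filled by the writer */
	size_t consumed; /* sub-buffers reported consumed by the reader */
	int    stalled;  /* last switch refused; sub-buffer already counted */
};

/* Idea behind relay_buf_full(): every sub-buffer is still unconsumed. */
static int model_buf_full(const struct model_buf *b)
{
	return b->produced - b->consumed >= b->n_subbufs;
}

/* Idea behind relay_subbufs_consumed(): the reader frees sub-buffers. */
static void model_subbufs_consumed(struct model_buf *b, size_t n)
{
	assert(b->consumed + n <= b->produced);
	b->consumed += n;
}

/* Decision of a no-overwrite subbuf_start(): refuse while full. */
static int start_no_overwrite(const struct model_buf *b)
{
	return model_buf_full(b) ? 0 : 1;
}

/* Decision of an overwrite subbuf_start(): always switch. */
static int start_overwrite(const struct model_buf *b)
{
	(void)b;
	return 1;
}

/* Try a sub-buffer switch; returns 1 if it happened, 0 if refused. */
static int model_switch(struct model_buf *b,
			int (*start)(const struct model_buf *))
{
	if (!b->stalled)
		b->produced++; /* the sub-buffer just filled is now ready */
	b->stalled = !start(b);
	return !b->stalled;
}
```

With two sub-buffers and no consumer, the no-overwrite callback allows
one switch and then refuses every further one until the reader reports
a consumed sub-buffer; the overwrite callback never refuses.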

The default subbuf_start() implementation, used if the client doesn't
define any callbacks, or doesn't define the subbuf_start() callback,
implements the simplest possible 'no-overwrite' mode i.e. it does
nothing but return 0.

Header information can be reserved at the beginning of each sub-buffer
by calling the subbuf_start_reserve() helper function from within the
subbuf_start() callback. This reserved area can be used to store
whatever information the client wants. In the example above, room is
reserved in each sub-buffer to store the padding count for that
sub-buffer. This is filled in for the previous sub-buffer in the
subbuf_start() implementation; the padding value for the previous
sub-buffer is passed into the subbuf_start() callback along with a
pointer to the previous sub-buffer, since the padding value isn't
known until a sub-buffer is filled. The subbuf_start() callback is
also called for the first sub-buffer when the channel is opened, to
give the client a chance to reserve space in it. In this case the
previous sub-buffer pointer passed into the callback will be NULL, so
the client should check the value of the prev_subbuf pointer before
writing into the previous sub-buffer.
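
On the user side, a reader of such sub-buffers has to skip the
reserved header and stop before the padding. The following sketch
assumes the layout used in the example callbacks above (an unsigned int
padding count at the start of each sub-buffer) and that the reader runs
on the same machine, so byte order and sizeof(unsigned int) match.
struct subbuf_view and subbuf_payload() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Valid payload of one full sub-buffer, as seen by a reader. */
struct subbuf_view {
	const char *data; /* first byte after the reserved header */
	size_t      len;  /* valid bytes, excluding header and padding */
};

/*
 * Locate the payload of a full sub-buffer whose first bytes hold the
 * padding count.  Returns 0 on success, -1 if the header is
 * inconsistent with the sub-buffer size.
 */
static int subbuf_payload(const void *subbuf, size_t subbuf_size,
			  struct subbuf_view *out)
{
	unsigned int padding;

	assert(subbuf && out);
	if (subbuf_size < sizeof(padding))
		return -1;
	/* memcpy avoids relying on the alignment of the mapping */
	memcpy(&padding, subbuf, sizeof(padding));
	if (padding > subbuf_size - sizeof(padding))
		return -1;
	out->data = (const char *)subbuf + sizeof(padding);
	out->len = subbuf_size - sizeof(padding) - padding;
	return 0;
}
```

A reader that has mmap()ed a per-cpu file could call subbuf_payload()
on each ready sub-buffer and copy only out->len bytes to disk.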

Writing to a channel
--------------------

Kernel clients write data into the current cpu's channel buffer using
relay_write() or __relay_write(). relay_write() is the main logging
function - it uses local_irq_save() to protect the buffer and should be
used if you might be logging from interrupt context. If you know
you'll never be logging from interrupt context, you can use
__relay_write(), which only disables preemption. These functions
don't return a value, so you can't determine whether or not they
failed - the assumption is that you wouldn't want to check a return
value in the fast logging path anyway, and that they'll always succeed
unless the buffer is full and no-overwrite mode is being used, in
which case you can detect a failed write in the subbuf_start()
callback by calling the relay_buf_full() helper function.

relay_reserve() is used to reserve a slot in a channel buffer which
can be written to later. This would typically be used in applications
that need to write directly into a channel buffer without having to
stage data in a temporary buffer beforehand. Because the actual write
may not happen immediately after the slot is reserved, applications
using relay_reserve() can keep a count of the number of bytes actually
written, either in space reserved in the sub-buffers themselves or as
a separate array. See the 'reserve' example in the relay-apps tarball
at http://relayfs.sourceforge.net for an example of how this can be
done. Because the write is under control of the client and is
separated from the reserve, relay_reserve() doesn't protect the buffer
at all - it's up to the client to provide the appropriate
synchronization when using relay_reserve().
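
The split between reserving and writing, and the reason for keeping a
separate count of bytes actually written, can be modeled in userspace
as follows. This is not the 'reserve' example from relay-apps; struct
slot_buf, model_reserve() and model_fill() are hypothetical, and the
model is single-threaded, so it leaves out the synchronization a real
client would have to provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_BUF_SIZE 64 /* hypothetical size for the demo */

struct slot_buf {
	char   data[SLOT_BUF_SIZE];
	size_t reserved; /* bytes handed out by model_reserve() */
	size_t written;  /* bytes the client has actually filled in */
};

/* Hand out a slot of len bytes, or NULL if there is no room left. */
static void *model_reserve(struct slot_buf *b, size_t len)
{
	void *slot;

	assert(b);
	if (len > SLOT_BUF_SIZE - b->reserved)
		return NULL;
	slot = b->data + b->reserved;
	b->reserved += len;
	return slot;
}

/* Fill a previously reserved slot and account for the bytes written. */
static void model_fill(struct slot_buf *b, void *slot, const void *src,
		       size_t len)
{
	memcpy(slot, src, len);
	b->written += len;
}
```

Between model_reserve() and model_fill() the two counters differ, which
is the window in which a reader cannot assume that reserved space
already holds valid data.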

Closing a channel
-----------------

The client calls relay_close() when it's finished using the channel.
The channel and its associated buffers are destroyed when there are no
longer any references to any of the channel buffers. relay_flush()
forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.

Creating non-relay files
------------------------

relay_open() automatically creates files in the relayfs filesystem to
represent the per-cpu kernel buffers; it's often useful for
applications to be able to create their own files alongside the relay
files in the relayfs filesystem as well e.g. 'control' files much like
those created in /proc or debugfs for similar purposes, used to
communicate control information between the kernel and user sides of a
relayfs application. For this purpose the relayfs_create_file() and
relayfs_remove_file() API functions exist. For relayfs_create_file(),
the caller passes in a set of user-defined file operations to be used
for the file and an optional void * to a user-specified data item,
which will be accessible via inode->u.generic_ip (see the relay-apps
tarball for examples). The file_operations are a required parameter
to relayfs_create_file() and thus the semantics of these files are
completely defined by the caller.

See the relay-apps tarball at http://relayfs.sourceforge.net for
examples of how these non-relay files are meant to be used.

Creating relay files in other filesystems
-----------------------------------------

By default of course, relay_open() creates relay files in the relayfs
filesystem. Because relay_file_operations is exported, however, it's
also possible to create and use relay files in other pseudo-filesystems
such as debugfs.

For this purpose, two callback functions are provided,
create_buf_file() and remove_buf_file(). create_buf_file() is called
once for each per-cpu buffer from relay_open() to allow the client to
create a file to be used to represent the corresponding buffer; if
this callback is not defined, the default implementation will create
and return a file in the relayfs filesystem to represent the buffer.
The callback should return the dentry of the file created to represent
the relay buffer. Note that the parent directory passed to
relay_open() (and passed along to the callback), if specified, must
exist in the same filesystem the new relay file is created in. If
create_buf_file() is defined, remove_buf_file() must also be defined;
it's responsible for deleting the file(s) created in create_buf_file()
and is called during relay_close().

The create_buf_file() implementation can also be defined in such a way
as to allow the creation of a single 'global' buffer instead of the
default per-cpu set. This can be useful for applications interested
mainly in seeing the relative ordering of system-wide events without
the need to bother with saving explicit timestamps for the purpose of
merging/sorting per-cpu files in a postprocessing step.

To have relay_open() create a global buffer, the create_buf_file()
implementation should set the value of the is_global outparam to a
non-zero value in addition to creating the file that will be used to
represent the single buffer. In the case of a global buffer,
create_buf_file() and remove_buf_file() will be called only once. The
normal channel-writing functions e.g. relay_write() can still be used
- writes from any cpu will transparently end up in the global buffer -
but since it is a global buffer, callers should make sure they use the
proper locking for such a buffer, either by wrapping writes in a
spinlock, or by copying a write function from relayfs_fs.h and
creating a local version that internally does the proper locking.

See the 'exported-relayfile' examples in the relay-apps tarball for
examples of creating and using relay files in debugfs.

Misc
----

Some applications may want to keep a channel around and re-use it
rather than open and close a new channel for each use. relay_reset()
can be used for this purpose - it resets a channel to its initial
state without reallocating channel buffer memory or destroying
existing mappings. It should however only be called when it's safe to
do so i.e. when the channel isn't currently being written to.

Finally, there are a couple of utility callbacks that can be used for
different purposes. buf_mapped() is called whenever a channel buffer
is mmapped from user space and buf_unmapped() is called when it's
unmapped. The client can use this notification to trigger actions
within the kernel application, such as enabling/disabling logging to
the channel.


Resources
=========

For news, example code, mailing list, etc. see the relayfs homepage:

    http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for relayfs came about as a result of discussions
on tracing involving the following:

Michel Dagenais         <michel.dagenais@polymtl.ca>
Richard Moore           <richardj_moore@uk.ibm.com>
Bob Wisniewski          <bob@watson.ibm.com>
Karim Yaghmour          <karim@opersys.com>
Tom Zanussi             <zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of useful suggestions and bug
reports.