| 1 | An introduction to the videobuf layer |
| 2 | Jonathan Corbet <corbet@lwn.net> |
| 3 | Current as of 2.6.33 |
| 4 | |
| 5 | The videobuf layer functions as a sort of glue layer between a V4L2 driver |
| 6 | and user space. It handles the allocation and management of buffers for |
| 7 | the storage of video frames. There is a set of functions which can be used |
| 8 | to implement many of the standard POSIX I/O system calls, including read(), |
| 9 | poll(), and, happily, mmap(). Another set of functions can be used to |
| 10 | implement the bulk of the V4L2 ioctl() calls related to streaming I/O, |
| 11 | including buffer allocation, queueing and dequeueing, and streaming |
| 12 | control. Using videobuf imposes a few design decisions on the driver |
| 13 | author, but the payback comes in the form of reduced code in the driver and |
| 14 | a consistent implementation of the V4L2 user-space API. |
| 15 | |
| 16 | Buffer types |
| 17 | |
| 18 | Not all video devices use the same kind of buffers. In fact, there are (at |
| 19 | least) three common variations: |
| 20 | |
| 21 | - Buffers which are scattered in both the physical and (kernel) virtual |
| 22 | address spaces. (Almost) all user-space buffers are like this, but it |
| 23 | makes great sense to allocate kernel-space buffers this way as well when |
| 24 | it is possible. Unfortunately, it is not always possible; working with |
| 25 | this kind of buffer normally requires hardware which can do |
| 26 | scatter/gather DMA operations. |
| 27 | |
| 28 | - Buffers which are physically scattered, but which are virtually |
| 29 | contiguous; buffers allocated with vmalloc(), in other words. These |
| 30 | buffers are just as hard to use for DMA operations, but they can be |
| 31 | useful in situations where DMA is not available but virtually-contiguous |
| 32 | buffers are convenient. |
| 33 | |
| 34 | - Buffers which are physically contiguous. Allocation of this kind of |
| 35 | buffer can be unreliable on fragmented systems, but simpler DMA |
| 36 | controllers cannot deal with anything else. |
| 37 | |
| 38 | Videobuf can work with all three types of buffers, but the driver author |
| 39 | must pick one at the outset and design the driver around that decision. |
| 40 | |
| 41 | [It's worth noting that there's a fourth kind of buffer: "overlay" buffers |
| 42 | which are located within the system's video memory. The overlay |
| 43 | functionality is considered to be deprecated for most use, but it still |
| 44 | shows up occasionally in system-on-chip drivers where the performance |
| 45 | benefits merit the use of this technique. Overlay buffers can be handled |
| 46 | as a form of scattered buffer, but there are very few implementations in |
| 47 | the kernel and a description of this technique is currently beyond the |
| 48 | scope of this document.] |
| 49 | |
| 50 | Data structures, callbacks, and initialization |
| 51 | |
| 52 | Depending on which type of buffer is being used, the driver should |
| 53 | include one of the following files: |
| 54 | |
| 55 | <media/videobuf-dma-sg.h> /* Physically scattered */ |
| 56 | <media/videobuf-vmalloc.h> /* vmalloc() buffers */ |
| 57 | <media/videobuf-dma-contig.h> /* Physically contiguous */ |
| 58 | |
| 59 | The driver's data structure describing a V4L2 device should include a |
| 60 | struct videobuf_queue instance for the management of the buffer queue, |
| 61 | along with a list_head for the queue of available buffers. There will also |
| 62 | need to be an interrupt-safe spinlock which is used to protect (at least) |
| 63 | the queue. |
| 64 | |
| 65 | The next step is to write four simple callbacks to help videobuf deal with |
| 66 | the management of buffers: |
| 67 | |
| 68 | struct videobuf_queue_ops { |
| 69 | int (*buf_setup)(struct videobuf_queue *q, |
| 70 | unsigned int *count, unsigned int *size); |
| 71 | int (*buf_prepare)(struct videobuf_queue *q, |
| 72 | struct videobuf_buffer *vb, |
| 73 | enum v4l2_field field); |
| 74 | void (*buf_queue)(struct videobuf_queue *q, |
| 75 | struct videobuf_buffer *vb); |
| 76 | void (*buf_release)(struct videobuf_queue *q, |
| 77 | struct videobuf_buffer *vb); |
| 78 | }; |
| 79 | |
| 80 | buf_setup() is called early in the I/O process, when streaming is being |
| 81 | initiated; its purpose is to tell videobuf about the I/O stream. The count |
| 82 | parameter will be a suggested number of buffers to use; the driver should |
| 83 | check it for rationality and adjust it if need be. As a practical rule, a |
| 84 | minimum of two buffers is needed for proper streaming, and there is |
| 85 | usually a maximum (which cannot exceed 32) which makes sense for each |
| 86 | device. The size parameter should be set to the expected (maximum) size |
| 87 | for each frame of data. |
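As a rough illustration of the checks described above, here is a userspace
model of a buf_setup() callback. It is a sketch, not kernel code: struct
sketch_queue stands in for struct videobuf_queue, and the limits and the
640x480, two-bytes-per-pixel format are assumptions made for the example.

```c
/* Userspace model of a buf_setup() callback; not kernel code. */

/* Hypothetical device limits, chosen only for illustration. */
#define SKETCH_MIN_BUFFERS	2U	/* practical minimum for streaming */
#define SKETCH_MAX_BUFFERS	32U	/* upper bound mentioned in the text */
#define SKETCH_FRAME_WIDTH	640U
#define SKETCH_FRAME_HEIGHT	480U
#define SKETCH_BYTES_PER_PIXEL	2U

/* Stand-in for struct videobuf_queue; the real one lives in the kernel. */
struct sketch_queue {
	void *priv_data;
};

/* Clamp the suggested buffer count and report the maximum frame size. */
static int sketch_buf_setup(struct sketch_queue *q,
			    unsigned int *count, unsigned int *size)
{
	(void)q;	/* unused in this model */

	if (*count < SKETCH_MIN_BUFFERS)
		*count = SKETCH_MIN_BUFFERS;
	if (*count > SKETCH_MAX_BUFFERS)
		*count = SKETCH_MAX_BUFFERS;

	*size = SKETCH_FRAME_WIDTH * SKETCH_FRAME_HEIGHT *
		SKETCH_BYTES_PER_PIXEL;
	return 0;
}
```

A real driver would normally pick a device-specific maximum below 32 and
compute the size from the currently negotiated format.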
| 88 | |
| 89 | Each buffer (in the form of a struct videobuf_buffer pointer) will be |
| 90 | passed to buf_prepare(), which should set the buffer's size, width, height, |
| 91 | and field fields properly. If the buffer's state field is |
| 92 | VIDEOBUF_NEEDS_INIT, the driver should pass it to: |
| 93 | |
| 94 | int videobuf_iolock(struct videobuf_queue *q, struct videobuf_buffer *vb, |
| 95 | struct v4l2_framebuffer *fbuf); |
| 96 | |
| 97 | Among other things, this call will usually allocate memory for the buffer. |
| 98 | Finally, the buf_prepare() function should set the buffer's state to |
| 99 | VIDEOBUF_PREPARED. |
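The sequence described for buf_prepare() can be modelled in userspace as
follows. This is only a sketch: the structure, the state names, and
sketch_iolock() are simplified stand-ins for struct videobuf_buffer, the
VIDEOBUF_* states, and videobuf_iolock(), and the frame format is assumed.

```c
/* Userspace model of a buf_prepare() callback; not kernel code. */

/* Mirrors only the two states named in the text; values are arbitrary. */
enum sketch_state {
	SKETCH_NEEDS_INIT,
	SKETCH_PREPARED,
};

/* Stand-in for the fields of struct videobuf_buffer set by buf_prepare(). */
struct sketch_buffer {
	unsigned int width;
	unsigned int height;
	unsigned long size;
	int field;
	enum sketch_state state;
	int memory_ready;	/* set by the iolock stand-in */
};

/* Stand-in for videobuf_iolock(): pretend the memory was set up. */
static int sketch_iolock(struct sketch_buffer *vb)
{
	vb->memory_ready = 1;
	return 0;
}

static int sketch_buf_prepare(struct sketch_buffer *vb, int field)
{
	int ret;

	/* Hypothetical 640x480 format at two bytes per pixel. */
	vb->width = 640;
	vb->height = 480;
	vb->size = 640UL * 480UL * 2UL;
	vb->field = field;

	if (vb->state == SKETCH_NEEDS_INIT) {
		ret = sketch_iolock(vb);
		if (ret)
			return ret;	/* leave the state untouched on failure */
	}

	vb->state = SKETCH_PREPARED;
	return 0;
}
```

Note that a buffer which has already been initialized skips the iolock step
and goes straight to the prepared state.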
| 100 | |
| 101 | When a buffer is queued for I/O, it is passed to buf_queue(), which should |
| 102 | put it onto the driver's list of available buffers and set its state to |
| 103 | VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock |
| 104 | held; if it tries to acquire it as well things will come to a screeching |
| 105 | halt. Yes, this is the voice of experience. Note also that videobuf may |
| 106 | wait on the first buffer in the queue; placing other buffers in front of it |
| 107 | could again gum up the works. So use list_add_tail() to enqueue buffers. |
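The ordering requirement can be seen in a small userspace model. The list
code below imitates the kernel's list_head and list_add_tail() closely
enough to show the FIFO behaviour; every sketch_ name is a hypothetical
stand-in, and no locking is modelled because the real callback runs with
the queue spinlock already held.

```c
#include <stddef.h>

/* Minimal userspace imitation of the kernel's circular list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Same effect as the kernel's list_add_tail(): insert just before head. */
static void sketch_list_add_tail(struct sketch_list *item,
				 struct sketch_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

enum sketch_state { SKETCH_PREPARED, SKETCH_QUEUED };

struct sketch_buffer {
	struct sketch_list queue;	/* must stay the first member here */
	enum sketch_state state;
	int index;
};

struct sketch_device {
	struct sketch_list buffers;	/* the driver's list of available buffers */
};

/*
 * Model of buf_queue(). In a real driver the queue spinlock is already
 * held at this point, so no lock is taken here.
 */
static void sketch_buf_queue(struct sketch_device *dev,
			     struct sketch_buffer *vb)
{
	sketch_list_add_tail(&vb->queue, &dev->buffers);
	vb->state = SKETCH_QUEUED;
}

/* Return the oldest queued buffer, or NULL when the list is empty. */
static struct sketch_buffer *sketch_first_buffer(struct sketch_device *dev)
{
	if (dev->buffers.next == &dev->buffers)
		return NULL;
	/* Valid only because queue is the first member of sketch_buffer. */
	return (struct sketch_buffer *)dev->buffers.next;
}
```

Because new buffers always go to the tail, the buffer that videobuf may be
waiting on stays at the front of the list.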
| 108 | |
| 109 | Finally, buf_release() is called when a buffer is no longer intended to be |
| 110 | used. The driver should ensure that there is no I/O active on the buffer, |
| 111 | then pass it to the appropriate free routine(s): |
| 112 | |
| 113 | /* Scatter/gather drivers */ |
| 114 | int videobuf_dma_unmap(struct videobuf_queue *q, |
| 115 | struct videobuf_dmabuf *dma); |
| 116 | int videobuf_dma_free(struct videobuf_dmabuf *dma); |
| 117 | |
| 118 | /* vmalloc drivers */ |
| 119 | void videobuf_vmalloc_free(struct videobuf_buffer *buf); |
| 120 | |
| 121 | /* Contiguous drivers */ |
| 122 | void videobuf_dma_contig_free(struct videobuf_queue *q, |
| 123 | struct videobuf_buffer *buf); |
| 124 | |
| 125 | One way to ensure that a buffer is no longer under I/O is to pass it to: |
| 126 | |
| 127 | int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr); |
| 128 | |
| 129 | Here, vb is the buffer, non_blocking indicates whether non-blocking I/O |
| 130 | should be used (it should be zero in the buf_release() case), and intr |
| 131 | controls whether an interruptible wait is used. |
| 132 | |
| 133 | File operations |
| 134 | |
| 135 | At this point, much of the work is done; much of the rest is slipping |
| 136 | videobuf calls into the implementation of the other driver callbacks. The |
| 137 | first step is in the open() function, which must initialize the |
| 138 | videobuf queue. The function to use depends on the type of buffer used: |
| 139 | |
| 140 | void videobuf_queue_sg_init(struct videobuf_queue *q, |
| 141 | struct videobuf_queue_ops *ops, |
| 142 | struct device *dev, |
| 143 | spinlock_t *irqlock, |
| 144 | enum v4l2_buf_type type, |
| 145 | enum v4l2_field field, |
| 146 | unsigned int msize, |
| 147 | void *priv); |
| 148 | |
| 149 | void videobuf_queue_vmalloc_init(struct videobuf_queue *q, |
| 150 | struct videobuf_queue_ops *ops, |
| 151 | struct device *dev, |
| 152 | spinlock_t *irqlock, |
| 153 | enum v4l2_buf_type type, |
| 154 | enum v4l2_field field, |
| 155 | unsigned int msize, |
| 156 | void *priv); |
| 157 | |
| 158 | void videobuf_queue_dma_contig_init(struct videobuf_queue *q, |
| 159 | struct videobuf_queue_ops *ops, |
| 160 | struct device *dev, |
| 161 | spinlock_t *irqlock, |
| 162 | enum v4l2_buf_type type, |
| 163 | enum v4l2_field field, |
| 164 | unsigned int msize, |
| 165 | void *priv); |
| 166 | |
| 167 | In each case, the parameters are the same: q is the queue structure for the |
| 168 | device, ops is the set of callbacks as described above, dev is the device |
| 169 | structure for this video device, irqlock is an interrupt-safe spinlock to |
| 170 | protect access to the data structures, type is the buffer type used by the |
| 171 | device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field |
| 172 | describes which field is being captured (often V4L2_FIELD_NONE for |
| 173 | progressive devices), msize is the size of any containing structure used |
| 174 | around struct videobuf_buffer, and priv is a private data pointer which |
| 175 | shows up in the priv_data field of struct videobuf_queue. Note that these |
| 176 | are void functions which, evidently, are immune to failure. |
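The msize parameter is easiest to understand with a concrete containing
structure. The userspace sketch below uses stand-in types; struct
sketch_driver_buffer and its sequence field are hypothetical, and the
embedded buffer is placed first on the assumption that the wrapper and the
embedded structure should share an address.

```c
#include <stddef.h>

/* Stand-in for struct videobuf_buffer; the real one is much larger. */
struct sketch_videobuf_buffer {
	unsigned int width;
	unsigned int height;
	unsigned long size;
};

/* Hypothetical per-buffer driver structure wrapped around the videobuf one. */
struct sketch_driver_buffer {
	struct sketch_videobuf_buffer vb;
	unsigned int sequence;		/* hypothetical driver-private field */
};

/* The value a driver built this way would pass as msize. */
#define SKETCH_MSIZE sizeof(struct sketch_driver_buffer)

/* Recover the containing structure, in the style of container_of(). */
static struct sketch_driver_buffer *
to_sketch_driver_buffer(struct sketch_videobuf_buffer *vb)
{
	return (struct sketch_driver_buffer *)
		((char *)vb - offsetof(struct sketch_driver_buffer, vb));
}
```

With this layout, a callback that receives only the embedded buffer pointer
can still reach the driver's private per-buffer data.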
| 177 | |
| 178 | V4L2 capture drivers can be written to support either of two APIs: the |
| 179 | read() system call and the rather more complicated streaming mechanism. As |
| 180 | a general rule, it is necessary to support both to ensure that all |
| 181 | applications have a chance of working with the device. Videobuf makes it |
| 182 | easy to do that with the same code. To implement read(), the driver need |
| 183 | only make a call to one of: |
| 184 | |
| 185 | ssize_t videobuf_read_one(struct videobuf_queue *q, |
| 186 | char __user *data, size_t count, |
| 187 | loff_t *ppos, int nonblocking); |
| 188 | |
| 189 | ssize_t videobuf_read_stream(struct videobuf_queue *q, |
| 190 | char __user *data, size_t count, |
| 191 | loff_t *ppos, int vbihack, int nonblocking); |
| 192 | |
| 193 | Either one of these functions will read frame data into data, returning the |
| 194 | amount actually read; the difference is that videobuf_read_one() will only |
| 195 | read a single frame, while videobuf_read_stream() will read multiple frames |
| 196 | if they are needed to satisfy the count requested by the application. A |
| 197 | typical driver read() implementation will start the capture engine, call |
| 198 | one of the above functions, then stop the engine before returning (though a |
| 199 | smarter implementation might leave the engine running for a little while in |
| 200 | anticipation of another read() call happening in the near future). |
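The difference between the two read helpers can be modelled in userspace.
This sketch assumes a hypothetical frame source with tiny four-byte frames;
it ignores blocking, error handling, and any leftover data from a partially
consumed frame, all of which the real functions have to deal with.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_FRAME_SIZE 4U	/* tiny hypothetical frame, in bytes */

/* Hypothetical frame source: frame n is filled with the byte value n. */
struct sketch_source {
	unsigned char next_frame;
};

static void sketch_capture(struct sketch_source *src, unsigned char *frame)
{
	memset(frame, src->next_frame++, SKETCH_FRAME_SIZE);
}

/* Model of videobuf_read_one(): never returns more than one frame. */
static size_t sketch_read_one(struct sketch_source *src,
			      unsigned char *data, size_t count)
{
	unsigned char frame[SKETCH_FRAME_SIZE];

	sketch_capture(src, frame);
	if (count > SKETCH_FRAME_SIZE)
		count = SKETCH_FRAME_SIZE;
	memcpy(data, frame, count);
	return count;
}

/* Model of videobuf_read_stream(): keeps capturing until count is met. */
static size_t sketch_read_stream(struct sketch_source *src,
				 unsigned char *data, size_t count)
{
	size_t done = 0;

	while (done < count) {
		unsigned char frame[SKETCH_FRAME_SIZE];
		size_t chunk = count - done;

		sketch_capture(src, frame);
		if (chunk > SKETCH_FRAME_SIZE)
			chunk = SKETCH_FRAME_SIZE;
		memcpy(data + done, frame, chunk);
		done += chunk;
	}
	return done;
}
```

With a ten-byte request, the first model hands back four bytes from a single
frame, while the second spans three frames to return all ten.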
| 201 | |
| 202 | The poll() function can usually be implemented with a direct call to: |
| 203 | |
| 204 | unsigned int videobuf_poll_stream(struct file *file, |
| 205 | struct videobuf_queue *q, |
| 206 | poll_table *wait); |
| 207 | |
| 208 | Note that the actual wait queue eventually used will be the one associated |
| 209 | with the first available buffer. |
| 210 | |
| 211 | When streaming I/O is done to kernel-space buffers, the driver must support |
| 212 | the mmap() system call to enable user space to access the data. In many |
| 213 | V4L2 drivers, the often-complex mmap() implementation simplifies to a |
| 214 | single call to: |
| 215 | |
| 216 | int videobuf_mmap_mapper(struct videobuf_queue *q, |
| 217 | struct vm_area_struct *vma); |
| 218 | |
| 219 | Everything else is handled by the videobuf code. |
| 220 | |
| 221 | The release() function requires two separate videobuf calls: |
| 222 | |
| 223 | void videobuf_stop(struct videobuf_queue *q); |
| 224 | int videobuf_mmap_free(struct videobuf_queue *q); |
| 225 | |
| 226 | The call to videobuf_stop() terminates any I/O in progress - though it is |
| 227 | still up to the driver to stop the capture engine. The call to |
| 228 | videobuf_mmap_free() will ensure that all buffers have been unmapped; if |
| 229 | so, they will all be passed to the buf_release() callback. If buffers |
| 230 | remain mapped, videobuf_mmap_free() returns an error code instead. The |
| 231 | purpose is clearly to cause the closing of the file descriptor to fail if |
| 232 | buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully |
| 233 | ignores its return value. |
| 234 | |
| 235 | ioctl() operations |
| 236 | |
| 237 | The V4L2 API includes a very long list of driver callbacks to respond to |
| 238 | the many ioctl() commands made available to user space. A number of these |
| 239 | - those associated with streaming I/O - turn almost directly into videobuf |
| 240 | calls. The relevant helper functions are: |
| 241 | |
| 242 | int videobuf_reqbufs(struct videobuf_queue *q, |
| 243 | struct v4l2_requestbuffers *req); |
| 244 | int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b); |
| 245 | int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b); |
| 246 | int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b, |
| 247 | int nonblocking); |
| 248 | int videobuf_streamon(struct videobuf_queue *q); |
| 249 | int videobuf_streamoff(struct videobuf_queue *q); |
| 250 | |
| 251 | So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's |
| 252 | vidioc_reqbufs() callback which, in turn, usually only needs to locate the |
| 253 | proper struct videobuf_queue pointer and pass it to videobuf_reqbufs(). |
| 254 | These support functions can replace a great deal of buffer management |
| 255 | boilerplate in a lot of V4L2 drivers. |
| 256 | |
| 257 | The vidioc_streamon() and vidioc_streamoff() functions will be a bit more |
| 258 | complex, of course, since they will also need to deal with starting and |
| 259 | stopping the capture engine. |
| 260 | |
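One possible shape for the streaming callbacks is modelled below in
userspace. Everything here is a stand-in: the sketch_videobuf_* functions
merely imitate the helpers of similar name, engine_running is a
hypothetical flag for the capture hardware, and the ordering (start the
engine only after the helper succeeds, stop it before calling the helper)
is an assumption rather than something videobuf requires.

```c
/* Userspace model; the structures below are stand-ins, not kernel types. */
struct sketch_queue {
	int streaming;
};

struct sketch_device {
	struct sketch_queue vb_queue;
	int engine_running;	/* hypothetical capture-engine state */
};

/* Stand-ins for videobuf_streamon() and videobuf_streamoff(). */
static int sketch_videobuf_streamon(struct sketch_queue *q)
{
	q->streaming = 1;
	return 0;
}

static int sketch_videobuf_streamoff(struct sketch_queue *q)
{
	q->streaming = 0;
	return 0;
}

static int sketch_vidioc_streamon(struct sketch_device *dev)
{
	int ret = sketch_videobuf_streamon(&dev->vb_queue);

	if (ret)
		return ret;
	dev->engine_running = 1;	/* start the hardware only on success */
	return 0;
}

static int sketch_vidioc_streamoff(struct sketch_device *dev)
{
	dev->engine_running = 0;	/* stop the hardware first */
	return sketch_videobuf_streamoff(&dev->vb_queue);
}
```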
| 261 | Buffer allocation |
| 262 | |
| 263 | Thus far, we have talked about buffers, but have not looked at how they are |
| 264 | allocated. The scatter/gather case is the most complex on this front. The |
| 265 | driver can leave buffer allocation entirely up to the |
| 266 | videobuf layer; in this case, buffers will be allocated as anonymous |
| 267 | user-space pages and will be very scattered indeed. If the application is |
| 268 | using user-space buffers, no allocation is needed; the videobuf layer will |
| 269 | take care of calling get_user_pages() and filling in the scatterlist array. |
| 270 | |
| 271 | If the driver needs to do its own memory allocation, it should be done in |
| 272 | the vidioc_reqbufs() function, *after* calling videobuf_reqbufs(). The |
| 273 | first step is a call to: |
| 274 | |
| 275 | struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf); |
| 276 | |
| 277 | The returned videobuf_dmabuf structure (defined in |
| 278 | <media/videobuf-dma-sg.h>) includes a couple of relevant fields: |
| 279 | |
| 280 | struct scatterlist *sglist; |
| 281 | int sglen; |
| 282 | |
| 283 | The driver must allocate an appropriately-sized scatterlist array and |
| 284 | populate it with pointers to the pieces of the allocated buffer; sglen |
| 285 | should be set to the length of the array. |
| 286 | |
| 287 | Drivers using the vmalloc() method need not (and cannot) concern themselves |
| 288 | with buffer allocation at all; videobuf will handle those details. The |
| 289 | same is normally true of contiguous-DMA drivers as well; videobuf will |
| 290 | allocate the buffers (with dma_alloc_coherent()) when it sees fit. That |
| 291 | means that these drivers may be trying to do high-order allocations at any |
| 292 | time, an operation which is not always guaranteed to work. Some drivers |
| 293 | play tricks by allocating DMA space at system boot time; videobuf does not |
| 294 | currently play well with those drivers. |
| 295 | |
| 296 | As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer, |
| 297 | as long as that buffer is physically contiguous. Normal user-space |
| 298 | allocations will not meet that criterion, but buffers obtained from other |
| 299 | kernel drivers, or those contained within huge pages, will work with these |
| 300 | drivers. |
| 301 | |
| 302 | Filling the buffers |
| 303 | |
| 304 | The final part of a videobuf implementation has no direct callback - it's |
| 305 | the portion of the code which actually puts frame data into the buffers, |
| 306 | usually in response to interrupts from the device. For all types of |
| 307 | drivers, this process works approximately as follows: |
| 308 | |
| 309 | - Obtain the next available buffer and make sure that somebody is actually |
| 310 | waiting for it. |
| 311 | |
| 312 | - Get a pointer to the memory and put video data there. |
| 313 | |
| 314 | - Mark the buffer as done and wake up the process waiting for it. |
| 315 | |
| 316 | The first step is done by looking at the driver-managed list_head structure |
| 317 | - the one which is filled in the buf_queue() callback. Because starting |
| 318 | the engine and enqueueing buffers are done in separate steps, it's possible |
| 319 | for the engine to be running without any buffers available - in the |
| 320 | vmalloc() case especially. So the driver should be prepared for the list |
| 321 | to be empty. It is equally possible that nobody is yet interested in the |
| 322 | buffer; the driver should not remove it from the list or fill it until a |
| 323 | process is waiting on it. That test can be done by examining the buffer's |
| 324 | done field (a wait_queue_head_t structure) with waitqueue_active(). |
| 325 | |
| 326 | A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for |
| 327 | DMA; that ensures that the videobuf layer will not try to do anything with |
| 328 | it while the device is transferring data. |
| 329 | |
| 330 | For scatter/gather drivers, the needed memory pointers will be found in the |
| 331 | scatterlist structure described above. Drivers using the vmalloc() method |
| 332 | can get a memory pointer with: |
| 333 | |
| 334 | void *videobuf_to_vmalloc(struct videobuf_buffer *buf); |
| 335 | |
| 336 | For contiguous DMA drivers, the function to use is: |
| 337 | |
| 338 | dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf); |
| 339 | |
| 340 | The contiguous DMA API goes out of its way to hide the kernel-space address |
| 341 | of the DMA buffer from drivers. |
| 342 | |
| 343 | The final step is to set the size field of the relevant videobuf_buffer |
| 344 | structure to the actual size of the captured image, set state to |
| 345 | VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the |
| 346 | buffer is owned by the videobuf layer and the driver should not touch it |
| 347 | again. |
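The whole frame-completion sequence can be sketched as a userspace model.
The types are simplified stand-ins: has_waiter plays the role of
waitqueue_active() on the buffer's done field, woken plays the role of
wake_up(), a memcpy() replaces the DMA transfer, and a plain singly-linked
FIFO replaces the driver's list_head.

```c
#include <stddef.h>
#include <string.h>

enum sketch_state { SKETCH_QUEUED, SKETCH_ACTIVE, SKETCH_DONE };

struct sketch_buffer {
	struct sketch_buffer *next;	/* simple FIFO link, oldest first */
	enum sketch_state state;
	int has_waiter;		/* stands in for waitqueue_active() */
	int woken;		/* stands in for wake_up() */
	unsigned long size;
	unsigned char data[16];
};

struct sketch_device {
	struct sketch_buffer *head;	/* list filled by buf_queue() */
};

/* Returns 1 if a frame was delivered, 0 if it had to be dropped. */
static int sketch_frame_ready(struct sketch_device *dev,
			      const unsigned char *frame, unsigned long len)
{
	struct sketch_buffer *vb = dev->head;

	if (vb == NULL)			/* engine running, no buffers */
		return 0;
	if (!vb->has_waiter)		/* nobody wants this buffer yet */
		return 0;
	if (len > sizeof(vb->data))
		len = sizeof(vb->data);

	dev->head = vb->next;		/* take it off the driver's list */
	vb->next = NULL;
	vb->state = SKETCH_ACTIVE;	/* hands off while data moves */
	memcpy(vb->data, frame, len);
	vb->size = len;			/* actual size of the captured image */
	vb->state = SKETCH_DONE;
	vb->woken = 1;
	return 1;
}
```

In a real driver this logic would typically run in the interrupt handler
under the queue spinlock, and the buffer must not be touched after the
wakeup.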
| 348 | |
| 349 | Developers who are interested in more information can go into the relevant |
| 350 | header files; there are a few low-level functions declared there which have |
| 351 | not been talked about here. Also worthwhile is the vivi driver |
| 352 | (drivers/media/video/vivi.c), which is maintained as an example of how V4L2 |
| 353 | drivers should be written. Vivi only uses the vmalloc() API, but it's good |
| 354 | enough to get started with. Note also that all of these calls are exported |
| 355 | GPL-only, so they will not be available to non-GPL kernel modules. |