Execute-in-place for file mappings
----------------------------------
Carsten Otte, 2005-06-23 (commit d763b7a)

Motivation
----------
File mappings are performed by mapping page cache pages into userspace.
Likewise, read and write file operations transfer data to and from the page
cache.

For memory-backed storage devices that use the block device interface, the
page cache pages are in fact copies of the original storage, so the same data
is held in memory twice. Various approaches exist to avoid the extra copy. The
ramdisk driver, for example, reads the data into the page cache, keeps a
reference to those pages, and later discards the original backing data.

Execute-in-place solves this problem the other way around: instead of keeping
the data in the page cache, it eliminates the page cache copy completely. With
execute-in-place, read and write operations are performed directly from/to the
memory-backed storage device. For file mappings, the storage device itself is
mapped directly into userspace.

This implementation was initially written for shared memory segments between
different virtual machines on s390 hardware, allowing multiple machines to
share the same binaries and libraries.

Implementation
--------------
Execute-in-place is implemented in three parts: a block device operation, an
address space operation, and file operations.

A block device operation named direct_access is used to retrieve a reference
(pointer) to an on-disk block. The reference is expected to be a
CPU-addressable physical address and to remain valid until the release
operation is performed. A struct block_device reference is used to address the
device, and a sector_t argument identifies the individual block. As an
alternative, memory technology devices can be used for this.
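
As a rough userspace model (not kernel code), the sketch below treats a
memory-backed device as one contiguous buffer. The hypothetical function
toy_direct_access() mirrors the idea of direct_access: given a device and a
sector number, it hands back a pointer into the storage itself instead of
copying the block. The struct, the function name and the 512-byte sector size
are assumptions for illustration; the real operation's signature is defined by
the kernel's struct block_device_operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512  /* assumption: 512-byte sectors */

/* Hypothetical memory-backed device: its storage is always CPU-addressable. */
struct toy_blockdev {
    uint8_t *mem;        /* start of the backing memory */
    size_t nr_sectors;   /* device size in sectors */
};

/*
 * Hypothetical stand-in for direct_access: return a pointer to the memory
 * that holds the given sector, or NULL if the sector is out of range.
 * No data is copied; the caller accesses the device memory itself.
 */
static void *toy_direct_access(struct toy_blockdev *bdev, uint64_t sector)
{
    assert(bdev != NULL && bdev->mem != NULL);
    if (sector >= bdev->nr_sectors)
        return NULL;
    return bdev->mem + (size_t)sector * TOY_SECTOR_SIZE;
}
```

Writing through the returned pointer changes the device contents directly,
which is why such a reference has to stay valid until it is released.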

The block device operation is optional. The following block devices support
it as of today:
- dcssblk: s390 dcss block device driver

An address space operation named get_xip_page is used to retrieve a reference
to a struct page. To address the target page, a reference to an address_space
and a sector number are provided. A third argument indicates whether the
function should allocate blocks if needed.
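
The role of the third argument can be illustrated with a userspace model. In
the sketch below, a hypothetical struct toy_mapping maps file sectors to device
sectors, and the hypothetical toy_get_xip_page() returns a pointer into device
memory rather than a struct page. When the sector is not yet allocated (a
hole), it allocates a device sector only if create is nonzero and otherwise
returns NULL. All names, the bump allocator and the fixed file size are
assumptions; how the kernel operation reports holes is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512   /* assumption: 512-byte sectors */
#define TOY_FILE_SECTORS 8    /* assumption: tiny fixed-size files */
#define TOY_UNMAPPED UINT64_MAX

/* Hypothetical memory-backed device with a trivial bump allocator. */
struct toy_device {
    uint8_t *mem;
    uint64_t nr_sectors;
    uint64_t next_free;       /* next unallocated device sector */
};

/* Hypothetical address space: maps file sectors to device sectors. */
struct toy_mapping {
    struct toy_device *dev;
    uint64_t map[TOY_FILE_SECTORS];  /* TOY_UNMAPPED marks a hole */
};

static void toy_mapping_init(struct toy_mapping *m, struct toy_device *dev)
{
    m->dev = dev;
    for (size_t i = 0; i < TOY_FILE_SECTORS; i++)
        m->map[i] = TOY_UNMAPPED;
}

/*
 * Hypothetical stand-in for get_xip_page: return a pointer to the device
 * memory backing file sector 'sector'. For a hole, allocate a device
 * sector first if 'create' is nonzero; otherwise return NULL.
 */
static void *toy_get_xip_page(struct toy_mapping *m, uint64_t sector, int create)
{
    assert(m != NULL && m->dev != NULL);
    if (sector >= TOY_FILE_SECTORS)
        return NULL;
    if (m->map[sector] == TOY_UNMAPPED) {
        if (!create || m->dev->next_free >= m->dev->nr_sectors)
            return NULL;                  /* hole, or device full */
        m->map[sector] = m->dev->next_free++;
    }
    return m->dev->mem + (size_t)m->map[sector] * TOY_SECTOR_SIZE;
}
```

A read path would call this with create set to 0, while a write path that
extends a file would set it to allocate the missing blocks.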

This address space operation is mutually exclusive with readpage and
writepage, which perform page cache read/write operations.
The following filesystems support it as of today:
- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
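
To make the mutual exclusion concrete, here is a userspace sketch of a reduced,
hypothetical operations table (struct toy_aops, not the kernel's struct
address_space_operations) together with a check that a table uses either the
page cache entry points or get_xip_page, but not both. Treating a table that
provides neither as invalid is an assumption of this sketch, and the stub
functions exist only so the table can be filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced address space operations table. */
struct toy_aops {
    int (*readpage)(void *page);
    int (*writepage)(void *page);
    void *(*get_xip_page)(void *mapping, unsigned long sector, int create);
};

/* Stub operations, only used to populate example tables. */
static int toy_readpage_stub(void *page) { (void)page; return 0; }
static int toy_writepage_stub(void *page) { (void)page; return 0; }
static void *toy_get_xip_page_stub(void *mapping, unsigned long sector, int create)
{
    (void)mapping; (void)sector; (void)create;
    return NULL;
}

/*
 * A table is consistent if it uses exactly one style: page cache
 * (readpage/writepage) or execute-in-place (get_xip_page).
 */
static bool toy_aops_valid(const struct toy_aops *ops)
{
    assert(ops != NULL);
    bool uses_page_cache = ops->readpage != NULL || ops->writepage != NULL;
    bool uses_xip = ops->get_xip_page != NULL;
    return uses_page_cache != uses_xip;
}
```

A filesystem supporting both modes could keep two separate tables and pick one
when an inode is set up, so each table passes this kind of check on its own.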

A set of file operations that use get_xip_page can be found in
mm/filemap_xip.c. The following file operation implementations are provided:
- aio_read/aio_write
- readv/writev
- sendfile
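
The actual implementations live in mm/filemap_xip.c; the following is only a
userspace model of the idea behind an XIP read. It assumes a hypothetical
struct toy_xip_file whose sectors have already been resolved to device memory
pointers (the job get_xip_page does), and the hypothetical toy_xip_read()
copies straight from that memory into the caller's buffer, one sector at a
time, with no page cache copy in between. Holes (NULL pointers) read as
zeroes, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTOR_SIZE 512   /* assumption: 512-byte sectors */

/* Hypothetical file on a memory-backed device. */
struct toy_xip_file {
    const uint8_t **sector_ptr;  /* device memory per file sector, NULL = hole */
    size_t nr_sectors;
    size_t size;                 /* file size in bytes */
};

/*
 * Copy up to 'len' bytes starting at *ppos directly from device memory
 * into 'buf'. Advances *ppos and returns the number of bytes read.
 */
static size_t toy_xip_read(const struct toy_xip_file *f, uint8_t *buf,
                           size_t len, size_t *ppos)
{
    assert(f != NULL && buf != NULL && ppos != NULL);
    size_t pos = *ppos, done = 0;

    while (done < len && pos < f->size) {
        size_t idx = pos / TOY_SECTOR_SIZE;
        size_t off = pos % TOY_SECTOR_SIZE;
        size_t n = TOY_SECTOR_SIZE - off;      /* rest of this sector */
        if (n > len - done)
            n = len - done;                    /* rest of the buffer */
        if (n > f->size - pos)
            n = f->size - pos;                 /* rest of the file */

        const uint8_t *src = idx < f->nr_sectors ? f->sector_ptr[idx] : NULL;
        if (src)
            memcpy(buf + done, src + off, n);  /* straight from storage */
        else
            memset(buf + done, 0, n);          /* hole reads as zeroes */
        done += n;
        pos += n;
    }
    *ppos = pos;
    return done;
}
```

A read that crosses a sector boundary simply takes the next device pointer;
nothing is ever staged in an intermediate cache page.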

The generic file operations do_sync_read/do_sync_write can be used to implement
classic synchronous IO calls.
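
The pattern behind such a synchronous wrapper can be sketched in userspace:
build a per-call I/O control block that carries the file position, call the
asynchronous-style read operation with it, then write the updated position
back to the caller. In the sketch, struct toy_kiocb, toy_aio_read() and
toy_sync_read() are hypothetical, and a static string stands in for an
XIP-backed file. The kernel helper additionally waits for requests that the
aio operation reports as queued; this sketch omits that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, reduced I/O control block carrying the file position. */
struct toy_kiocb {
    long long pos;
};

/* Stand-in for file contents on a memory-backed device. */
static const char toy_data[] = "execute-in-place";

/* Hypothetical async-style read that always completes immediately. */
static ssize_t toy_aio_read(struct toy_kiocb *iocb, char *buf, size_t len)
{
    size_t size = sizeof toy_data - 1;   /* exclude the terminating NUL */
    if (iocb->pos >= (long long)size)
        return 0;                        /* at or past end of file */
    size_t n = size - (size_t)iocb->pos;
    if (n > len)
        n = len;
    memcpy(buf, toy_data + iocb->pos, n);
    iocb->pos += (long long)n;
    return (ssize_t)n;
}

/* Synchronous read built on the async-style operation. */
static ssize_t toy_sync_read(char *buf, size_t len, long long *ppos)
{
    assert(buf != NULL && ppos != NULL);
    struct toy_kiocb iocb = { .pos = *ppos };
    ssize_t ret = toy_aio_read(&iocb, buf, len);
    /* a queued request would have to be waited for here (not modeled) */
    *ppos = iocb.pos;
    return ret;
}
```

Keeping the position in the control block, rather than in the file, is what
lets the same aio operation serve both asynchronous and synchronous callers.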

Shortcomings
------------
This implementation is limited to storage devices that are CPU-addressable at
all times (no highmem or similar). It works well on ROM/RAM, but enhancements
are needed to make it work with flash in read+write mode.
Putting the Linux kernel and/or its modules on an xip filesystem does not mean
they execute in place; they are still copied.