| __BUILTIN_PREFETCH(3) | Library Functions Manual | __BUILTIN_PREFETCH(3) | 
__builtin_prefetch — GNU extension to prefetch memory
void __builtin_prefetch(const void *addr, ...);
The __builtin_prefetch() function prefetches memory from
  addr. The rationale is to minimize cache-miss latency by
  moving data into a cache before the data is accessed. Possible use cases
  include frequently called sections of code in which it is known that the data
  at a given address is likely to be accessed soon.
In addition to addr, there are two optional stdarg(3) arguments, rw and locality, given in that order. The value of rw is either 0 or 1, corresponding to a read prefetch and a write prefetch, respectively. The default value of rw is 0. The value of locality should be an integer between 0 and 3. The higher the value, the higher the temporal locality of the data. When locality is 0, it is assumed that there is little or no temporal locality in the data; after the access, the data need not be left in the cache. The default value of locality is 3. Both rw and locality must be compile-time constant integers.
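As a minimal sketch of how the optional arguments might be used, the following assumes two hypothetical helpers, scale_in_place() and sum_floats(), and an arbitrary prefetch distance PREFETCH_AHEAD. None of these names come from any library, and a useful distance depends on the machine.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; a real value needs measurement. */
#define PREFETCH_AHEAD 16

/* Hypothetical helper: multiply n floats by k in place. */
void
scale_in_place(float *v, size_t n, float k)
{
	for (size_t i = 0; i < n; i++) {
		/* rw = 1: the element will be written; locality = 0: no reuse. */
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&v[i + PREFETCH_AHEAD], 1, 0);
		v[i] *= k;
	}
}

/* Hypothetical helper: add up n floats. */
float
sum_floats(const float *v, size_t n)
{
	float s = 0.0f;

	for (size_t i = 0; i < n; i++) {
		/* Defaults apply here: equivalent to rw = 0, locality = 3. */
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&v[i + PREFETCH_AHEAD]);
		s += v[i];
	}
	return s;
}
```

The bounds check keeps the address expression itself within the array; the prefetch only hints at a future access and does not change the values computed.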
The __builtin_prefetch() function
    translates into prefetch instructions only if the architecture supports
    them. If there is no support, addr is evaluated
    only if it has side effects, and gcc(1)
    issues no warning.
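Code that must also build with compilers lacking the builtin could hide the call behind a macro. The sketch below assumes a hypothetical macro name, PREFETCH_READ, and relies on the __GNUC__ macro that gcc(1) and compatible compilers define. The fallback still evaluates addr once, so an argument with side effects behaves the same in both branches.

```c
/* PREFETCH_READ is a hypothetical name used only for illustration. */
#if defined(__GNUC__)
#define PREFETCH_READ(addr)	__builtin_prefetch((addr), 0, 3)
#else
/* No builtin available: evaluate addr for its side effects, nothing else. */
#define PREFETCH_READ(addr)	((void)(addr))
#endif

/* Hypothetical helper: hint at *p, then return the next element. */
const int *
skip_with_prefetch(const int *p)
{
	PREFETCH_READ(p++);	/* the increment happens in either branch */
	return p;
}
```

Calling skip_with_prefetch() on the first element of an array returns a pointer to the second element whether or not a prefetch instruction was emitted.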
The following excerpt is from the cpu_in_cksum() function, which calculates
  checksums for the inet(4) headers:
while (mlen >= 32) {
	/* Prefetch the next 32-byte block while the current one is summed. */
	__builtin_prefetch(data + 32);
	partial += *(uint16_t *)data;
	partial += *(uint16_t *)(data + 2);
	partial += *(uint16_t *)(data + 4);
	...
	partial += *(uint16_t *)(data + 28);
	partial += *(uint16_t *)(data + 30);
	data += 32;
	mlen -= 32;
	...
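The same pattern can be tried in isolation with a simplified, self-contained sketch. The function sum16_blocks() below is hypothetical and is not the actual cpu_in_cksum() code: it only adds up 16-bit words in 32-byte blocks, ignores any trailing bytes, and performs no carry folding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sum the 16-bit words of data, 32 bytes at a time. */
uint32_t
sum16_blocks(const uint8_t *data, size_t mlen)
{
	uint32_t partial = 0;

	while (mlen >= 32) {
		/* Hint at the next block before working on the current one. */
		__builtin_prefetch(data + 32);
		for (size_t off = 0; off < 32; off += 2) {
			uint16_t w;

			/* memcpy avoids unaligned 16-bit loads. */
			memcpy(&w, data + off, sizeof(w));
			partial += w;
		}
		data += 32;
		mlen -= 32;
	}
	/* Fewer than 32 bytes remain; this sketch leaves them unsummed. */
	return partial;
}
```

Because the loop runs only while at least 32 bytes remain, data + 32 points at most one past the end of the buffer, so the address expression stays valid even on the last iteration.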
Ulrich Drepper, What Every Programmer Should Know About Memory, http://www.akkadia.org/drepper/cpumemory.pdf, November 21, 2007.
| December 22, 2010 | NetBSD 10.1 |