struct sk_buff

sk_buff is the main networking structure representing a packet.

Basic sk_buff geometry

struct sk_buff itself is a metadata structure and does not hold any packet data. All the data is held in associated buffers.

sk_buff.head points to the main “head” buffer. The head buffer is divided into two parts:

  • data buffer, containing headers and sometimes payload; this is the part of the skb operated on by the common helpers such as skb_put() or skb_pull();

  • shared info (struct skb_shared_info) which holds an array of pointers to read-only data in the (page, offset, length) format.

Optionally skb_shared_info.frag_list may point to another skb.

A basic diagram may look like this:

                                ---------------
                               | sk_buff       |
                                ---------------
   ,---------------------------  + head
  /          ,-----------------  + data
 /          /      ,-----------  + tail
|          |      |            , + end
|          |      |           |
v          v      v           v
 -----------------------------------------------
| headroom | data |  tailroom | skb_shared_info |
 -----------------------------------------------
                               + [page frag]
                               + [page frag]
                               + [page frag]
                               + [page frag]       ---------
                               + frag_list    --> | sk_buff |
                                                   ---------
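
The pointer arithmetic behind the diagram can be sketched in userspace. This is a minimal model, not kernel code: struct model_skb and every model_*() helper are hypothetical names that only mimic what skb_reserve(), skb_put(), skb_push() and skb_pull() do to the four pointers. skb_push() is not described above; it is included because it is the helper that consumes headroom. The shared info area and the error handling of the real helpers are left out, and the asserts stand in for the kernel's own bounds checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of the four pointers in the diagram. */
struct model_skb {
    unsigned char *head; /* start of the head buffer */
    unsigned char *data; /* start of the packet data */
    unsigned char *tail; /* end of the packet data */
    unsigned char *end;  /* end of the data area (shared info would follow) */
};

/* Allocate a head buffer of `size` bytes; the skb starts out empty. */
static inline int model_alloc(struct model_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (!skb->head)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return 0;
}

static inline size_t model_headroom(const struct model_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

static inline size_t model_tailroom(const struct model_skb *skb)
{
    return (size_t)(skb->end - skb->tail);
}

static inline size_t model_len(const struct model_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Mimics skb_reserve(): open up headroom in a still-empty buffer. */
static inline void model_reserve(struct model_skb *skb, size_t len)
{
    assert(model_len(skb) == 0 && len <= model_tailroom(skb));
    skb->data += len;
    skb->tail += len;
}

/* Mimics skb_put(): grow the data area at the tail, return the new region. */
static inline unsigned char *model_put(struct model_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert(len <= model_tailroom(skb));
    skb->tail += len;
    return old_tail;
}

/* Mimics skb_push(): grow the data area into the headroom. */
static inline unsigned char *model_push(struct model_skb *skb, size_t len)
{
    assert(len <= model_headroom(skb));
    skb->data -= len;
    return skb->data;
}

/* Mimics skb_pull(): drop `len` bytes from the front of the data area. */
static inline unsigned char *model_pull(struct model_skb *skb, size_t len)
{
    assert(len <= model_len(skb));
    skb->data += len;
    return skb->data;
}
```

In this model, reserving 64 bytes in a 256-byte buffer and then putting 100 bytes of payload leaves 64 bytes of headroom and 92 bytes of tailroom; a later model_push() of a 20-byte header shrinks the headroom to 44 bytes without moving the payload.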

Shared skbs and skb clones

sk_buff.users is a simple refcount allowing multiple entities to keep a struct sk_buff alive. skbs with sk_buff.users != 1 are referred to as shared skbs (see skb_shared()).
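
The "shared" test can be illustrated with a small userspace sketch. The names below (struct model_skb, model_get(), model_put_ref(), model_shared()) are hypothetical, and a C11 atomic_int stands in for the kernel's refcount type; the only part taken from the text above is the users != 1 comparison.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: only the metadata refcount is represented. */
struct model_skb {
    atomic_int users;
};

/* A freshly allocated skb has exactly one user. */
static inline void model_init(struct model_skb *skb)
{
    atomic_init(&skb->users, 1);
}

/* Take an additional reference on the same metadata struct. */
static inline struct model_skb *model_get(struct model_skb *skb)
{
    atomic_fetch_add(&skb->users, 1);
    return skb;
}

/* Drop a reference; returns true when the last user is gone. */
static inline bool model_put_ref(struct model_skb *skb)
{
    return atomic_fetch_sub(&skb->users, 1) == 1;
}

/* Same comparison as described for skb_shared(): users != 1. */
static inline bool model_shared(struct model_skb *skb)
{
    return atomic_load(&skb->users) != 1;
}
```

Note that every holder in this sketch points at the same struct, which is what distinguishes a shared skb from a clone.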

skb_clone() allows for fast duplication of skbs. None of the data buffers get copied, but the caller gets a new metadata struct (struct sk_buff). skb_shared_info.dataref indicates the number of skbs pointing at the same packet data (i.e. clones).
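
A userspace sketch of the cloning idea follows. It is a simplified model under stated assumptions, not the kernel implementation: struct model_buf, struct model_skb and the model_*() functions are hypothetical, the data reference count is a plain int rather than an atomic, and it lives directly in the buffer struct instead of in a separate shared info area.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical head buffer carrying its own data reference count. */
struct model_buf {
    unsigned char data[128];
    int dataref; /* number of metadata structs pointing at this buffer */
};

/* Hypothetical metadata struct. */
struct model_skb {
    int users;              /* references to this metadata struct */
    struct model_buf *head; /* packet data, possibly shared with clones */
    size_t len;
};

static inline struct model_skb *model_alloc_skb(void)
{
    struct model_skb *skb = calloc(1, sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = calloc(1, sizeof(*skb->head));
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->users = 1;
    skb->head->dataref = 1;
    return skb;
}

/* New metadata struct, same data buffer: nothing in the buffer is copied. */
static inline struct model_skb *model_clone(const struct model_skb *skb)
{
    struct model_skb *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    *n = *skb;          /* copy the metadata, including the head pointer */
    n->users = 1;       /* the clone starts with its own single user */
    n->head->dataref++; /* one more skb points at the packet data */
    return n;
}

/* Free the metadata; free the data only when the last reference goes. */
static inline void model_free_skb(struct model_skb *skb)
{
    if (--skb->head->dataref == 0)
        free(skb->head);
    free(skb);
}
```

After one model_clone() call there are two metadata structs, each with users == 1, and a single buffer with dataref == 2.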

dataref and headerless skbs

Transport layers send out clones of payload skbs they hold for retransmissions. To allow lower layers of the stack to prepend their headers we split skb_shared_info.dataref into two halves. The lower 16 bits count the overall number of references. The higher 16 bits indicate how many of the references are payload-only. skb_header_cloned() checks whether the skb is allowed to add or write headers.
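
The split can be worked through with plain integer arithmetic. The sketch below is a userspace illustration: the MODEL_* macros and dataref_*() helpers are hypothetical names, and the final check assumes that the header may be written exactly when one reference that is not payload-only remains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants describing the 16/16 split. */
#define MODEL_DATAREF_SHIFT 16
#define MODEL_DATAREF_MASK  ((1 << MODEL_DATAREF_SHIFT) - 1)

/* Lower half: overall number of references to the data. */
static inline int dataref_total(int dataref)
{
    return dataref & MODEL_DATAREF_MASK;
}

/* Upper half: how many of those references are payload-only. */
static inline int dataref_payload_only(int dataref)
{
    return dataref >> MODEL_DATAREF_SHIFT;
}

/* References that may still care about the header area. */
static inline int dataref_header_users(int dataref)
{
    return dataref_total(dataref) - dataref_payload_only(dataref);
}

/* Assumption: headers are writable only for a sole header user. */
static inline bool dataref_header_writable(int dataref)
{
    return dataref_header_users(dataref) == 1;
}
```

For example, a payload-only owner plus one clone gives a total of 2 and a payload-only count of 1, so one header user remains and the clone may write headers. A second clone raises the header users to 2 and the headers are no longer writable.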

The creator of the skb (e.g. TCP) marks its skb as sk_buff.nohdr (via __skb_header_release()). Any clone created from a marked skb will get sk_buff.hdr_len populated with the available headroom. If that clone is the only one in existence, it is able to modify the headroom at will. The sequence of calls inside the transport layer is:

<alloc skb>
skb_reserve()
__skb_header_release()
skb_clone()
// send the clone down the stack

This is not a very generic construct and it depends on the transport layers doing the right thing. In practice there’s usually only one payload-only skb. Having multiple payload-only skbs with different hdr_len values is not possible. The payload-only skbs should never leave their owner.
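
The call sequence above can be traced with a small userspace model. Everything here is a sketch under assumptions: struct model_skb, struct model_shinfo and the model_*() helpers are hypothetical stand-ins for the kernel objects, the headroom is tracked as a plain number instead of pointers, and the writability check assumes the dataref arithmetic described in this section.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DATAREF_SHIFT 16
#define MODEL_DATAREF_MASK  ((1 << MODEL_DATAREF_SHIFT) - 1)

/* Hypothetical stand-in for the shared info area. */
struct model_shinfo {
    int dataref;
};

/* Hypothetical stand-in for the metadata struct. */
struct model_skb {
    struct model_shinfo *shinfo;
    unsigned int headroom; /* bytes available in front of the data */
    unsigned int hdr_len;  /* headroom recorded at clone time */
    bool nohdr;            /* set on a payload-only skb */
};

/* <alloc skb>: one plain reference, no headroom yet. */
static inline void model_alloc(struct model_skb *skb,
                               struct model_shinfo *shinfo)
{
    shinfo->dataref = 1;
    skb->shinfo = shinfo;
    skb->headroom = 0;
    skb->hdr_len = 0;
    skb->nohdr = false;
}

/* Mimics skb_reserve(). */
static inline void model_reserve(struct model_skb *skb, unsigned int len)
{
    skb->headroom += len;
}

/* Mimics __skb_header_release(): the owner becomes payload-only. */
static inline void model_header_release(struct model_skb *skb)
{
    skb->nohdr = true;
    skb->shinfo->dataref = 1 + (1 << MODEL_DATAREF_SHIFT);
}

/* Mimics skb_clone(): a clone of a marked skb records the headroom. */
static inline struct model_skb model_clone(const struct model_skb *skb)
{
    struct model_skb n = *skb;

    n.hdr_len = skb->nohdr ? skb->headroom : skb->hdr_len;
    n.nohdr = false;
    n.shinfo->dataref++;
    return n;
}

/* Assumption: writable when exactly one header user remains. */
static inline bool model_may_write_header(const struct model_skb *skb)
{
    int d = skb->shinfo->dataref;

    return (d & MODEL_DATAREF_MASK) - (d >> MODEL_DATAREF_SHIFT) == 1;
}
```

With 128 bytes reserved, the first clone sees hdr_len == 128 and may write headers. Once a second clone of the same owner exists, neither clone may write headers in this model.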