The Linux Kernel
5.19.0

Devlink Port

A devlink port (devlink-port) is a port that exists on the device and represents a logically separate ingress/egress point of the device. A devlink port can be of one of several flavours. The port flavour, together with the port attributes, describes what the port represents.

A device driver that intends to publish a devlink port sets the devlink port attributes and registers the devlink port.
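The ordering in the previous paragraph (set the attributes, then register) can be modelled in a few lines of standalone C. This is a sketch under stated assumptions: `struct sketch_port`, `struct sketch_devlink`, `sketch_port_register()`, the fixed-size port table and the chosen error codes are all hypothetical, and none of them is the in-kernel devlink API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 8

/* Hypothetical stand-ins for a devlink instance and its ports; these are
 * NOT the kernel's struct devlink or struct devlink_port. */
struct sketch_port {
	unsigned int index;   /* port index, assumed unique per instance */
	bool attrs_set;       /* flavour and attributes already filled in */
	bool registered;
};

struct sketch_devlink {
	struct sketch_port *ports[SKETCH_MAX_PORTS];
	size_t nports;
};

/* Publish a port: its attributes must be set first, and its index must
 * not clash with a port that is already registered. */
static int sketch_port_register(struct sketch_devlink *dl,
				struct sketch_port *port)
{
	assert(dl != NULL && port != NULL);

	if (!port->attrs_set)
		return -EINVAL;           /* attributes come before registration */
	for (size_t i = 0; i < dl->nports; i++)
		if (dl->ports[i]->index == port->index)
			return -EEXIST;   /* duplicate port index */
	if (dl->nports == SKETCH_MAX_PORTS)
		return -ENOSPC;           /* fixed-size table in this sketch only */

	dl->ports[dl->nports++] = port;
	port->registered = true;
	return 0;
}
```

A port whose attributes were never filled in is refused, so nothing half-described is ever published.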

Devlink port flavours are described below.

List of devlink port flavours

Flavour

Description

DEVLINK_PORT_FLAVOUR_PHYSICAL

Any kind of physical port. This can be an eswitch physical port or any other physical port on the device.

DEVLINK_PORT_FLAVOUR_DSA

This indicates a DSA interconnect port.

DEVLINK_PORT_FLAVOUR_CPU

This indicates a CPU port applicable only to DSA.

DEVLINK_PORT_FLAVOUR_PCI_PF

This indicates an eswitch port representing a port of a PCI physical function (PF).

DEVLINK_PORT_FLAVOUR_PCI_VF

This indicates an eswitch port representing a port of a PCI virtual function (VF).

DEVLINK_PORT_FLAVOUR_PCI_SF

This indicates an eswitch port representing a port of a PCI subfunction (SF).

DEVLINK_PORT_FLAVOUR_VIRTUAL

This indicates a virtual port for a PCI virtual function.
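The table can be turned into code that groups the flavours the same way the descriptions do. In the sketch below, `enum sketch_port_flavour`, the short labels and the helper functions are hypothetical. Their names and numeric values are not the kernel's `DEVLINK_PORT_FLAVOUR_*` constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the flavours in the table above. Names and
 * numeric values are illustrative and do not match the kernel's enum. */
enum sketch_port_flavour {
	SKETCH_FLAVOUR_PHYSICAL,
	SKETCH_FLAVOUR_DSA,
	SKETCH_FLAVOUR_CPU,
	SKETCH_FLAVOUR_PCI_PF,
	SKETCH_FLAVOUR_PCI_VF,
	SKETCH_FLAVOUR_PCI_SF,
	SKETCH_FLAVOUR_VIRTUAL,
	SKETCH_FLAVOUR_COUNT,   /* number of flavours, not a flavour */
};

/* Short labels chosen for this sketch only. */
static const char *const sketch_flavour_names[SKETCH_FLAVOUR_COUNT] = {
	[SKETCH_FLAVOUR_PHYSICAL] = "physical",
	[SKETCH_FLAVOUR_DSA]      = "dsa",
	[SKETCH_FLAVOUR_CPU]      = "cpu",
	[SKETCH_FLAVOUR_PCI_PF]   = "pci_pf",
	[SKETCH_FLAVOUR_PCI_VF]   = "pci_vf",
	[SKETCH_FLAVOUR_PCI_SF]   = "pci_sf",
	[SKETCH_FLAVOUR_VIRTUAL]  = "virtual",
};

static const char *sketch_flavour_name(enum sketch_port_flavour f)
{
	assert((unsigned int)f < SKETCH_FLAVOUR_COUNT);
	return sketch_flavour_names[f];
}

/* Eswitch ports that represent a PCI PF, VF or SF. */
static bool sketch_is_pci_function_port(enum sketch_port_flavour f)
{
	return f == SKETCH_FLAVOUR_PCI_PF || f == SKETCH_FLAVOUR_PCI_VF ||
	       f == SKETCH_FLAVOUR_PCI_SF;
}

/* Flavours the table ties to DSA (interconnect and CPU ports). */
static bool sketch_is_dsa_related(enum sketch_port_flavour f)
{
	return f == SKETCH_FLAVOUR_DSA || f == SKETCH_FLAVOUR_CPU;
}
```

The VIRTUAL flavour is deliberately left out of `sketch_is_pci_function_port()`, because the table describes it as a virtual port for a VF rather than as an eswitch port that represents one.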

A devlink port can also have a type, which depends on the link layer of the port. The port types are described below.

List of devlink port types

Type

Description

DEVLINK_PORT_TYPE_ETH

The driver should set this port type when the link layer of the port is Ethernet.

DEVLINK_PORT_TYPE_IB

The driver should set this port type when the link layer of the port is InfiniBand.

DEVLINK_PORT_TYPE_AUTO

The user sets this type when the driver should detect the port type automatically.
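The sketch below shows one way a driver could act on an AUTO request: an explicit Ethernet or InfiniBand choice is kept as is, and AUTO falls back to the link layer the driver detected. The enums and `sketch_resolve_port_type()` are hypothetical. The rule that an explicit choice always wins is this sketch's assumption, not documented kernel behaviour.

```c
#include <assert.h>

/* Hypothetical port types; not the kernel's DEVLINK_PORT_TYPE_* values. */
enum sketch_port_type {
	SKETCH_TYPE_NOTSET,   /* no concrete type could be chosen */
	SKETCH_TYPE_AUTO,
	SKETCH_TYPE_ETH,
	SKETCH_TYPE_IB,
};

/* Link layer as detected by a (hypothetical) driver probe. */
enum sketch_link_layer {
	SKETCH_LINK_UNKNOWN,
	SKETCH_LINK_ETHERNET,
	SKETCH_LINK_INFINIBAND,
};

/* Turn the user's request into a concrete type. Returns
 * SKETCH_TYPE_NOTSET if AUTO was requested but detection failed. */
static enum sketch_port_type
sketch_resolve_port_type(enum sketch_port_type requested,
			 enum sketch_link_layer detected)
{
	assert(requested != SKETCH_TYPE_NOTSET);  /* caller must ask for something */

	if (requested != SKETCH_TYPE_AUTO)
		return requested;                 /* explicit ETH/IB request kept */

	switch (detected) {
	case SKETCH_LINK_ETHERNET:
		return SKETCH_TYPE_ETH;
	case SKETCH_LINK_INFINIBAND:
		return SKETCH_TYPE_IB;
	default:
		return SKETCH_TYPE_NOTSET;
	}
}
```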

PCI controllers

In most cases a PCI device has only one controller. A controller may consist of multiple physical functions, virtual functions and subfunctions. A function consists of one or more ports, and each such port is represented by a devlink eswitch port.

However, a PCI device connected to multiple CPUs, to multiple PCI root complexes, or to a SmartNIC may have multiple controllers. In a device with multiple controllers, each controller is identified by a unique controller number. The eswitch resides on the PCI device and supports ports of multiple controllers.

An example view of a system with two controllers:

             ---------------------------------------------------------
             |                                                       |
             |           --------- ---------         ------- ------- |
-----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
| connect |  | -------                       -------                 |
-----------  |     | controller_num=1 (no eswitch)                   |
             ------|--------------------------------------------------
             (internal wire)
                   |
             ---------------------------------------------------------
             | devlink eswitch ports and reps                        |
             | ----------------------------------------------------- |
             | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             |                                                       |
             |                                                       |
-----------  |           --------- ---------         ------- ------- |
| smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
| connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
-----------  | -------                       -------                 |
             |                                                       |
             |  local controller_num=0 (eswitch)                     |
             ---------------------------------------------------------

In the above example, the external controller (controller number 1) does not have the eswitch. The local controller (controller number 0) has the eswitch. The devlink instance on the local controller has eswitch devlink ports for both controllers.
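The devlink instance on controller 0 holds ports for functions on both controllers, so each port label has to encode the controller as well as the PF/VF/SF numbers. The C sketch below builds such labels. `struct sketch_eswitch_port`, `sketch_port_label()` and the "c<ctrl>" prefix format are this sketch's own choices, loosely following the ctrl/pf/vf/sf labels in the diagram. They are not a statement of the kernel's naming format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum sketch_fn_kind { SKETCH_FN_PF, SKETCH_FN_VF, SKETCH_FN_SF };

/* Hypothetical description of one eswitch port owned by the devlink
 * instance on the local controller (controller 0 in the example). */
struct sketch_eswitch_port {
	unsigned int controller;  /* controller number of the function */
	bool external;            /* function lives on another controller */
	enum sketch_fn_kind kind;
	unsigned int pf;          /* parent PF number */
	unsigned int num;         /* VF or SF number; unused for a PF */
};

/* Format "pf0vf2" for a local function or "c1pf0sf5" for a function on
 * external controller 1. Returns snprintf()'s result, i.e. the length
 * the full label needs. */
static int sketch_port_label(const struct sketch_eswitch_port *p,
			     char *buf, size_t len)
{
	char prefix[16] = "";

	assert(p != NULL && buf != NULL && len > 0);
	if (p->external)
		snprintf(prefix, sizeof(prefix), "c%u", p->controller);

	switch (p->kind) {
	case SKETCH_FN_VF:
		return snprintf(buf, len, "%spf%uvf%u", prefix, p->pf, p->num);
	case SKETCH_FN_SF:
		return snprintf(buf, len, "%spf%usf%u", prefix, p->pf, p->num);
	case SKETCH_FN_PF:
	default:
		return snprintf(buf, len, "%spf%u", prefix, p->pf);
	}
}
```

Under this scheme, VF 2 of PF 0 is labelled `pf0vf2` on the local controller and `c1pf0vf2` on the external one. The controller prefix is what keeps the two eswitch ports distinct within the single devlink instance.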

Function configuration

A user can configure a function attribute before the PCI function is enumerated. Usually this means the user should configure the function attribute before a bus-specific device for the function is created. However, when SR-IOV is enabled, virtual function devices are created on the PCI bus right away. Hence, the function attribute should be configured before the virtual function device is bound to its driver. For subfunctions, this means the user should configure the port function attribute before activating the port function.

A user may set the hardware address of the function using the ‘devlink port function set hw_addr’ command. For an Ethernet port function, this is a MAC address.
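For an Ethernet port function, setting the hardware address amounts to parsing and checking a MAC address and honouring the "configure before bind/activate" rule above. The sketch below does both. `struct sketch_port_function` and its helpers are hypothetical. Rejecting a zero or multicast address, and refusing the change once the function is active, are policies this sketch assumes. The text above only says the address *should* be set beforehand.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a port function; not a kernel structure. */
struct sketch_port_function {
	uint8_t hw_addr[6];
	bool active;  /* bound to a driver (VF) or activated (SF) */
};

/* Parse "aa:bb:cc:dd:ee:ff"; trailing characters are rejected. */
static bool sketch_parse_mac(const char *s, uint8_t mac[6])
{
	unsigned int b[6];
	char extra;

	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c", &b[0], &b[1], &b[2],
		   &b[3], &b[4], &b[5], &extra) != 6)
		return false;
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)b[i];
	return true;
}

/* Non-zero and unicast: the I/G bit (bit 0 of the first octet) is clear. */
static bool sketch_mac_is_unicast(const uint8_t mac[6])
{
	uint8_t any = 0;

	for (int i = 0; i < 6; i++)
		any |= mac[i];
	return any != 0 && (mac[0] & 0x01) == 0;
}

/* Set the hardware address; this sketch refuses once the function is
 * in use, since configuration is meant to happen before bind/activate. */
static int sketch_set_hw_addr(struct sketch_port_function *fn, const char *s)
{
	uint8_t mac[6];

	assert(fn != NULL && s != NULL);
	if (fn->active)
		return -EBUSY;
	if (!sketch_parse_mac(s, mac) || !sketch_mac_is_unicast(mac))
		return -EINVAL;
	memcpy(fn->hw_addr, mac, sizeof(mac));
	return 0;
}
```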

Subfunction

A subfunction is a lightweight function that has a parent PCI function on which it is deployed. Subfunctions are created and deployed one at a time (in units of 1). Unlike SR-IOV VFs, a subfunction does not require its own PCI virtual function. A subfunction communicates with the hardware through the parent PCI function.

To use a subfunction, a three-step setup sequence is followed: (1) create: create a subfunction; (2) configure: configure the subfunction attributes; (3) deploy: deploy the subfunction.

Subfunction management is done using the devlink port user interface. The user performs the setup on the subfunction management device.

(1) Create

A subfunction is created using the devlink port interface. A user adds the subfunction by adding a devlink port of subfunction flavour. The devlink kernel code calls down to the subfunction management driver (through its devlink ops) and asks it to create a subfunction devlink port. The driver then instantiates the subfunction port and any associated objects, such as health reporters and the representor netdevice.

(2) Configure

After creation, the subfunction devlink port exists but is not active yet. This means the entities are created on the devlink side and the eswitch port representor is created, but the subfunction device itself is not created. A user may use the eswitch port representor to apply settings, such as putting it into a bridge or adding TC rules. A user may also configure the hardware address (such as a MAC address) of the subfunction while the subfunction is inactive.

(3) Deploy

Once a subfunction is configured, the user must activate it to use it. Upon activation, the subfunction management driver asks the subfunction management device to instantiate the subfunction device on a particular PCI function. The subfunction device is created on the auxiliary bus (see Documentation/driver-api/auxiliary_bus.rst). At this point, a matching subfunction driver binds to the subfunction’s auxiliary device.

Rate object management

Devlink provides an API to manage the TX rates of a single devlink port or of a group of ports. This is done through rate objects, which can be one of two types:

leaf

Represents a single devlink port; created and destroyed by the driver. Since a leaf has a 1:1 mapping to its devlink port, in user space it is referred to as pci/<bus_addr>/<port_index>.

node

Represents a group of rate objects (leaves and/or nodes); created and deleted on request from user space; initially empty (no rate objects added). In user space it is referred to as pci/<bus_addr>/<node_name>, where node_name can be any identifier except a decimal number, to avoid collisions with leaves.
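Because leaves use a numeric port index and nodes use any non-decimal name, the last path component alone tells the two apart. The sketch below implements that rule. `sketch_rate_kind_of()`, `sketch_node_name_ok()` and the example bus address in the usage note are hypothetical. The code only checks the shape of the handle and does not query any device.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum sketch_rate_kind { SKETCH_RATE_INVALID, SKETCH_RATE_LEAF, SKETCH_RATE_NODE };

/* True if s is a non-empty run of decimal digits. */
static bool sketch_is_decimal(const char *s)
{
	if (*s == '\0')
		return false;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return false;
	return true;
}

/* A new node name must not be a decimal number, or it would collide with
 * the pci/<bus_addr>/<port_index> handles of leaves. */
static bool sketch_node_name_ok(const char *name)
{
	return name != NULL && *name != '\0' && !sketch_is_decimal(name);
}

/* Classify "pci/<bus_addr>/<x>": a decimal <x> is a leaf (port index),
 * anything else is a node name. */
static enum sketch_rate_kind sketch_rate_kind_of(const char *handle)
{
	const char *last;

	assert(handle != NULL);
	if (strncmp(handle, "pci/", 4) != 0)
		return SKETCH_RATE_INVALID;
	last = strrchr(handle, '/');       /* never NULL: "pci/" has a '/' */
	if (last == handle + 3 || last[1] == '\0')
		return SKETCH_RATE_INVALID;    /* no <x> component */
	return sketch_is_decimal(last + 1) ? SKETCH_RATE_LEAF : SKETCH_RATE_NODE;
}
```

For example, `pci/0000:03:00.0/1` classifies as a leaf and `pci/0000:03:00.0/group1` as a node. A request to create a node named `42` would be refused by `sketch_node_name_ok()`.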

The API allows configuring the following rate object parameters:

tx_share

Minimum TX rate value shared among all other rate objects or, if the object is part of a group, among the other rate objects in the same parent group.

tx_max

Maximum TX rate value.

parent

Parent node name. The parent node's rate limits act as additional limits on top of the limits of all its children: the parent's tx_max is an upper limit for the children, and the parent's tx_share is the total bandwidth distributed among the children.
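Since a parent's tx_max caps its children, the ceiling that actually applies to a rate object is the smallest tx_max found on the path from that object up through its parents. The sketch below computes this. `struct sketch_rate`, the "0 means no limit" convention and the assumption of an acyclic parent chain are all this sketch's own. How tx_share is split among children is not modelled, because the text above does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory rate object (leaf or node). A value of 0 means
 * "no limit set". Units only need to be consistent (e.g. bytes/s). */
struct sketch_rate {
	const char *name;
	uint64_t tx_share;                /* carried along, unused below */
	uint64_t tx_max;
	const struct sketch_rate *parent; /* NULL for a top-level object */
};

/* Effective ceiling: the object's own tx_max, further capped by every
 * ancestor's tx_max. Returns 0 if no limit is set anywhere on the path.
 * Assumes the parent chain is acyclic. */
static uint64_t sketch_effective_tx_max(const struct sketch_rate *r)
{
	uint64_t limit = 0;

	assert(r != NULL);
	for (; r != NULL; r = r->parent) {
		if (r->tx_max != 0 && (limit == 0 || r->tx_max < limit))
			limit = r->tx_max;
	}
	return limit;
}
```

With a node capped at 1000 units, a child leaf asking for 5000 is effectively held to 1000, while a child asking for 300 keeps its own lower limit.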

Driver implementations may support either or both rate object types, as well as any subset of the methods for setting their parameters.

Terms and Definitions

Term

Definitions

PCI device

A physical PCI device that has one or more PCI buses and consists of one or more PCI controllers.

PCI controller

A controller consists of potentially multiple physical functions, virtual functions and subfunctions.

Port function

An object to manage the function of a port.

Subfunction

A lightweight function that has a parent PCI function on which it is deployed.

Subfunction device

A bus device of the subfunction, usually on an auxiliary bus.

Subfunction driver

A device driver for the subfunction auxiliary device.

Subfunction management device

A PCI physical function that supports subfunction management.

Subfunction management driver

A device driver for a PCI physical function that supports subfunction management using the devlink port interface.

Subfunction host driver

A device driver for a PCI physical function that hosts subfunction devices. In most cases it is the same as the subfunction management driver. When a subfunction is used on an external controller, the subfunction management and host drivers are different.

© Copyright The kernel development community.
