The Linux Kernel
5.15.0

Compute Express Link Memory Devices

A Compute Express Link Memory Device is a CXL component that implements the CXL.mem protocol. It contains some amount of volatile memory, persistent memory, or both. It is enumerated as a PCI device for configuration and passing messages over an MMIO mailbox. Its contribution to the System Physical Address space is handled via HDM (Host Managed Device Memory) decoders that optionally define a device’s contribution to an interleaved address range across multiple devices underneath a host-bridge or interleaved across host-bridges.

Driver Infrastructure

This section covers the driver infrastructure for a CXL memory device.

CXL Memory Device

This driver implements the PCI-exclusive functionality of a CXL device as defined by the Compute Express Link specification. A CXL device may surface certain functionality even if it isn’t CXL enabled.

The driver has several responsibilities, mainly:
  • Create the memX device and register on the CXL bus.

  • Enumerate the device’s register interfaces and map them.

  • Probe the device attributes to establish the sysfs interface.

  • Provide an IOCTL interface to userspace to communicate with the device for things like firmware update.

struct mbox_cmd

A command to be submitted to hardware.

Definition

struct mbox_cmd {
  u16 opcode;
  void *payload_in;
  void *payload_out;
  size_t size_in;
  size_t size_out;
  u16 return_code;
#define CXL_MBOX_SUCCESS 0
};

Members

opcode

(input) The command set and command submitted to hardware.

payload_in

(input) Pointer to the input payload.

payload_out

(output) Pointer to the output payload. Must be allocated by the caller.

size_in

(input) Number of bytes to load from payload_in.

size_out

(input) Maximum number of bytes that may be loaded into payload_out. (output) Number of bytes generated by the device. For fixed-size output commands this is always expected to be deterministic. For variable-size output commands, it gives the exact number of bytes written.

return_code

(output) Error code returned from hardware.

Description

This is the primary mechanism used to send commands to the hardware. All the fields except payload_* correspond exactly to the fields described in the Command Register section, CXL 2.0 8.2.8.4.5. payload_in is written to, and payload_out is read from, the Command Payload Registers defined in CXL 2.0 8.2.8.4.8.
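As a rough illustration of how the input and output roles of these fields fit together, the following userspace sketch mirrors the structure and submits it to a stand-in for the hardware. The `u16` typedef, `mock_submit()`, `example_send()`, the opcode value, and the 8-byte device response are assumptions made for this example; none of them are taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t u16; /* userspace stand-in for the kernel type */

#define CXL_MBOX_SUCCESS 0

/* Userspace mirror of the structure documented above. */
struct mbox_cmd {
	u16 opcode;
	void *payload_in;
	void *payload_out;
	size_t size_in;
	size_t size_out;
	u16 return_code;
};

/*
 * Hypothetical stand-in for the hardware: pretends the device generates
 * 8 bytes of 0xAB, never writing more than the caller allowed.
 */
static int mock_submit(struct mbox_cmd *cmd)
{
	size_t produced = 8;

	if (produced > cmd->size_out)
		produced = cmd->size_out;
	if (cmd->payload_out)
		memset(cmd->payload_out, 0xAB, produced);
	cmd->size_out = produced;            /* (output) bytes generated */
	cmd->return_code = CXL_MBOX_SUCCESS; /* (output) device status */
	return 0;
}

/* Build a command with no input payload and a caller-allocated output. */
static int example_send(unsigned char *out, size_t out_len, size_t *got)
{
	struct mbox_cmd cmd = {
		.opcode = 0x4000,    /* illustrative opcode value */
		.payload_in = NULL,
		.size_in = 0,
		.payload_out = out,  /* must be allocated by the caller */
		.size_out = out_len, /* (input) capacity of out */
	};
	int rc;

	assert(out && got);
	rc = mock_submit(&cmd);
	if (rc)
		return rc;
	if (cmd.return_code != CXL_MBOX_SUCCESS)
		return -1;
	*got = cmd.size_out;
	return 0;
}
```

Note that size_out is read twice: once by the mock device as a capacity limit, and once by the caller as the number of bytes actually produced.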

struct cxl_mem_command

Driver representation of a memory device command

Definition

struct cxl_mem_command {
  struct cxl_command_info info;
  enum opcode opcode;
  u32 flags;
#define CXL_CMD_FLAG_NONE 0
#define CXL_CMD_FLAG_FORCE_ENABLE BIT(0)
};

Members

info

Command information as it exists for the UAPI

opcode

The actual bits used for the mailbox protocol

flags

Set of flags affecting driver behavior.

  • CXL_CMD_FLAG_FORCE_ENABLE: In cases of error, commands with this flag will be enabled by the driver regardless of what hardware may have advertised.

Description

The cxl_mem_command is the driver’s internal representation of commands that are supported by the driver. Some of these commands may not be supported by the hardware. The driver will use info to validate the fields passed in by the user then submit the opcode to the hardware.

See struct cxl_command_info.

int __cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, struct mbox_cmd *mbox_cmd)

Execute a mailbox command

Parameters

struct cxl_mem *cxlm

The CXL memory device to communicate with.

struct mbox_cmd *mbox_cmd

Command to send to the memory device.

Context

Any context. Expects mbox_mutex to be held.

Return

-ETIMEDOUT if timeout occurred waiting for completion. 0 on success.

Caller should check the return code in mbox_cmd to make sure it succeeded.

Description

This is a generic form of the CXL mailbox send command, and so it uses only the registers defined by the mailbox capability ID (CXL 2.0 8.2.8.4). Memory devices, and perhaps other types of CXL devices, may have further information available upon error conditions. Driver facilities wishing to send mailbox commands should use the wrapper command.

The CXL spec allows for up to two mailboxes. The intention is for the primary mailbox to be OS controlled and the secondary mailbox to be used by system firmware. This allows the OS and firmware to communicate with the device and not need to coordinate with each other. The driver only uses the primary mailbox.

int cxl_mem_mbox_get(struct cxl_mem *cxlm)

Acquire exclusive access to the mailbox.

Parameters

struct cxl_mem *cxlm

The memory device to gain access to.

Context

Any context. Takes the mbox_mutex.

Return

0 if exclusive access was acquired.

void cxl_mem_mbox_put(struct cxl_mem *cxlm)

Release exclusive access to the mailbox.

Parameters

struct cxl_mem *cxlm

The CXL memory device to communicate with.

Context

Any context. Expects mbox_mutex to be held.

int handle_mailbox_cmd_from_user(struct cxl_mem *cxlm, const struct cxl_mem_command *cmd, u64 in_payload, u64 out_payload, s32 *size_out, u32 *retval)

Dispatch a mailbox command for userspace.

Parameters

struct cxl_mem *cxlm

The CXL memory device to communicate with.

const struct cxl_mem_command *cmd

The validated command.

u64 in_payload

Pointer to userspace’s input payload.

u64 out_payload

Pointer to userspace’s output payload.

s32 *size_out

(Input) Max payload size to copy out. (Output) Payload size hardware generated.

u32 *retval

Hardware generated return code from the operation.

Return

  • 0 - Mailbox transaction succeeded. This implies the mailbox protocol completed successfully, not that the operation itself was successful.

  • -ENOMEM - Couldn’t allocate a bounce buffer.

  • -EFAULT - Something happened with copy_to/from_user.

  • -EINTR - Mailbox acquisition interrupted.

  • -EXXX - Transaction level failures.

Description

Creates the appropriate mailbox command and dispatches it on behalf of a userspace request. The input and output payloads are copied from and to userspace.

See cxl_send_cmd().

int cxl_validate_cmd_from_user(struct cxl_mem *cxlm, const struct cxl_send_command *send_cmd, struct cxl_mem_command *out_cmd)

Check fields for CXL_MEM_SEND_COMMAND.

Parameters

struct cxl_mem *cxlm

struct cxl_mem device whose mailbox will be used.

const struct cxl_send_command *send_cmd

struct cxl_send_command copied in from userspace.

struct cxl_mem_command *out_cmd

Sanitized and populated struct cxl_mem_command.

Return

  • 0 - out_cmd is ready to send.

  • -ENOTTY - Invalid command specified.

  • -EINVAL - Reserved fields or invalid values were used.

  • -ENOMEM - Input or output buffer wasn’t sized properly.

  • -EPERM - Attempted to use a protected command.

Description

The result of this command is a fully validated command in out_cmd that is safe to send to the hardware.

See handle_mailbox_cmd_from_user()
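The size portion of this validation can be sketched as follows. This is a userspace approximation, not the driver's code: `struct mock_cmd_info` and `mock_check_sizes()` are hypothetical names, and the exact rules (fixed-size input must match exactly, fixed-size output needs a buffer at least that large, negative user sizes are rejected with -EINVAL) are assumptions inferred from the -1 "variable length" convention and the return codes listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the per-command information: -1 means variable length. */
struct mock_cmd_info {
	int32_t size_in;
	int32_t size_out;
};

/*
 * Check user-supplied payload sizes against what the command expects.
 * Returns 0 when the sizes are acceptable, a negative errno otherwise.
 */
static int mock_check_sizes(const struct mock_cmd_info *info,
			    int32_t in_size, int32_t out_size)
{
	assert(info);

	/* Assumed: a negative size from the user is an invalid value. */
	if (in_size < 0 || out_size < 0)
		return -EINVAL;

	/* Fixed-size input: the user must supply exactly that many bytes. */
	if (info->size_in >= 0 && in_size != info->size_in)
		return -ENOMEM;

	/* Fixed-size output: the user buffer must be large enough to hold it. */
	if (info->size_out >= 0 && out_size < info->size_out)
		return -ENOMEM;

	return 0;
}
```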

int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, u16 opcode, void *in, size_t in_size, void *out, size_t out_size)

Send a mailbox command to a memory device.

Parameters

struct cxl_mem *cxlm

The CXL memory device to communicate with.

u16 opcode

Opcode for the mailbox command.

void *in

The input payload for the mailbox command.

size_t in_size

The length of the input payload

void *out

Caller allocated buffer for the output.

size_t out_size

Expected size of output.

Context

Any context. Will acquire and release mbox_mutex.

Return

  • >=0 - Number of bytes returned in out.

  • -E2BIG - Payload is too large for hardware.

  • -EBUSY - Couldn’t acquire exclusive mailbox access.

  • -EFAULT - Hardware error occurred.

  • -ENXIO - Command completed, but device reported an error.

  • -EIO - Unexpected output size.

Description

Mailbox commands may execute successfully yet the device itself reported an error. While this distinction can be useful for commands from userspace, the kernel will only be able to use results when both are successful.

See __cxl_mem_mbox_send_cmd()
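The difference between a mailbox transaction that completes and a command that the device accepts can be sketched with a small userspace mock. Everything prefixed `mock_` or `MOCK_` is hypothetical, including the 256-byte payload limit and the boolean that stands in for mbox_mutex; only the mapping of conditions to error codes follows the Return list above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_PAYLOAD_MAX 256 /* assumed hardware payload limit */

struct mock_mem {
	bool mbox_busy;         /* stands in for mbox_mutex being unavailable */
	uint16_t device_status; /* return code the fake device reports */
	size_t device_out_len;  /* bytes the fake device produces */
};

/* Fake low-level send: the mailbox protocol itself always completes. */
static int mock_transport(struct mock_mem *m, void *out, size_t *out_len,
			  uint16_t *status)
{
	size_t n = m->device_out_len < *out_len ? m->device_out_len : *out_len;

	if (out && n)
		memset(out, 0, n);
	*out_len = n;
	*status = m->device_status;
	return 0;
}

/* Wrapper: succeeds only if both the transport and the device succeed. */
static int mock_send_cmd(struct mock_mem *m, const void *in, size_t in_size,
			 void *out, size_t out_size)
{
	size_t got = out_size;
	uint16_t status = 0;
	int rc;

	assert(m);
	(void)in; /* the fake device ignores its input */

	if (in_size > MOCK_PAYLOAD_MAX || out_size > MOCK_PAYLOAD_MAX)
		return -E2BIG;
	if (m->mbox_busy)
		return -EBUSY;

	m->mbox_busy = true;  /* acquire exclusive access */
	rc = mock_transport(m, out, &got, &status);
	m->mbox_busy = false; /* release exclusive access */

	if (rc)
		return rc;     /* transport-level failure */
	if (status != 0)
		return -ENXIO; /* completed, but device reported an error */
	if (got != out_size)
		return -EIO;   /* unexpected output size */
	return (int)got;
}
```

A kernel-internal caller in this model only ever looks at the single return value, which is why the device status and the transport status are folded together.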

int cxl_mem_setup_regs(struct cxl_mem *cxlm)

Setup necessary MMIO.

Parameters

struct cxl_mem *cxlm

The CXL memory device to communicate with.

Return

0 if all necessary registers mapped.

Description

A memory device is required by spec to implement a certain set of MMIO regions. The purpose of this function is to enumerate and map those registers.

void cxl_walk_cel(struct cxl_mem *cxlm, size_t size, u8 *cel)

Walk through the Command Effects Log.

Parameters

struct cxl_mem *cxlm

Device.

size_t size

Length of the Command Effects Log.

u8 *cel

CEL

Description

Iterate over each entry in the CEL and determine if the driver supports the command. If so, the command is enabled for the device and can be used later.
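A simplified userspace sketch of this walk is shown below. It assumes each CEL entry is 4 bytes (a little-endian 16-bit opcode followed by 16 bits of effects), and it uses a hypothetical three-entry table of known opcodes and a 32-bit mask in place of the driver's command table and enabled-commands bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CEL_ENTRY_SIZE 4 /* assumed: 16-bit opcode + 16-bit effects */

/* Hypothetical opcodes this mock driver knows; the array index is the command id. */
static const uint16_t mock_known_opcodes[] = { 0x4000, 0x0400, 0x0401 };
#define MOCK_NR_CMDS (sizeof(mock_known_opcodes) / sizeof(mock_known_opcodes[0]))

/* Return the command id for an opcode, or -1 if the mock driver lacks it. */
static int mock_find_cmd(uint16_t opcode)
{
	size_t i;

	for (i = 0; i < MOCK_NR_CMDS; i++)
		if (mock_known_opcodes[i] == opcode)
			return (int)i;
	return -1;
}

/* Walk the log and return a bitmask of enabled command ids. */
static uint32_t mock_walk_cel(const uint8_t *cel, size_t size)
{
	uint32_t enabled = 0;
	size_t i, n = size / MOCK_CEL_ENTRY_SIZE;

	assert(cel || size == 0);

	for (i = 0; i < n; i++) {
		const uint8_t *e = cel + i * MOCK_CEL_ENTRY_SIZE;
		uint16_t opcode = (uint16_t)(e[0] | (e[1] << 8)); /* little endian */
		int id = mock_find_cmd(opcode);

		if (id < 0)
			continue; /* device supports it, driver does not */
		enabled |= UINT32_C(1) << id;
	}
	return enabled;
}
```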

int cxl_mem_get_partition_info(struct cxl_mem *cxlm, u64 *active_volatile_bytes, u64 *active_persistent_bytes, u64 *next_volatile_bytes, u64 *next_persistent_bytes)

Get partition info

Parameters

struct cxl_mem *cxlm

The device to act on

u64 *active_volatile_bytes

returned active volatile capacity

u64 *active_persistent_bytes

returned active persistent capacity

u64 *next_volatile_bytes

returned next volatile capacity

u64 *next_persistent_bytes

returned next persistent capacity

Description

Retrieve the current partition info for the device specified. If not 0, the ‘next’ values are pending and take effect on the next cold reset.

See CXL 8.2.9.5.2.1 Get Partition Info

Return

0 if no error, or the result of the mailbox command.
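To make the byte-valued outputs concrete, the sketch below decodes a Get Partition Info style payload in userspace. It assumes the payload holds four little-endian 64-bit capacity fields in the order active volatile, active persistent, next volatile, next persistent, and that each field counts units of 256 MiB; both the layout and the unit should be checked against the specification section cited above. All `mock_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_CAPACITY_UNIT (UINT64_C(256) << 20) /* assumed 256 MiB unit */

struct mock_partition_info {
	uint64_t active_volatile_bytes;
	uint64_t active_persistent_bytes;
	uint64_t next_volatile_bytes;
	uint64_t next_persistent_bytes;
};

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t mock_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Convert the four capacity fields from device units to bytes. */
static void mock_decode_partition_info(const uint8_t *payload,
				       struct mock_partition_info *out)
{
	assert(payload && out);

	out->active_volatile_bytes   = mock_le64(payload + 0)  * MOCK_CAPACITY_UNIT;
	out->active_persistent_bytes = mock_le64(payload + 8)  * MOCK_CAPACITY_UNIT;
	out->next_volatile_bytes     = mock_le64(payload + 16) * MOCK_CAPACITY_UNIT;
	out->next_persistent_bytes   = mock_le64(payload + 24) * MOCK_CAPACITY_UNIT;
}
```

In this model, zero in both ‘next’ fields means no partition change is pending.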

int cxl_mem_enumerate_cmds(struct cxl_mem *cxlm)

Enumerate commands for a device.

Parameters

struct cxl_mem *cxlm

The device.

Description

Returns 0 if enumeration completed successfully.

CXL devices have optional support for certain commands. This function will determine the set of supported commands for the hardware and update the enabled_cmds bitmap in the cxlm.

int cxl_mem_identify(struct cxl_mem *cxlm)

Send the IDENTIFY command to the device.

Parameters

struct cxl_mem *cxlm

The device to identify.

Return

0 if identify was executed successfully.

Description

This will dispatch the identify command to the device and on success populate structures to be exported to sysfs.

CXL Core

The CXL core objects like ports, decoders, and regions are shared between the subsystem drivers cxl_acpi, cxl_pci, and core drivers (port-driver, region-driver, nvdimm object-drivers… etc).

struct cxl_decoder

CXL address range decode configuration

Definition

struct cxl_decoder {
  struct device dev;
  int id;
  struct range range;
  int interleave_ways;
  int interleave_granularity;
  enum cxl_decoder_type target_type;
  unsigned long flags;
  struct cxl_dport *target[];
};

Members

dev

this decoder’s device

id

kernel device name id

range

address range considered by this decoder

interleave_ways

number of cxl_dports in this decode

interleave_granularity

data stride per dport

target_type

accelerator vs expander (type2 vs type3) selector

flags

memory type capabilities and locking

target

active ordered target list in current decoder configuration
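One way to picture how interleave_ways and interleave_granularity relate to the target list is a plain modulo model, sketched below in userspace. This is a deliberately simplified illustration and is not taken from the driver or the specification; real decoders may select targets differently. `struct mock_decoder` and `mock_decode_target()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of a decoder: an inclusive range plus interleave settings. */
struct mock_decoder {
	uint64_t start;
	uint64_t end; /* inclusive */
	int interleave_ways;
	int interleave_granularity; /* bytes per target before moving on */
};

/*
 * Return the index into the target list that an address maps to under a
 * simple round-robin model, or -1 if the address is outside the range.
 */
static int mock_decode_target(const struct mock_decoder *d, uint64_t addr)
{
	uint64_t chunk;

	assert(d && d->interleave_ways > 0 && d->interleave_granularity > 0);

	if (addr < d->start || addr > d->end)
		return -1;

	/* Which granularity-sized chunk of the range does addr fall in? */
	chunk = (addr - d->start) / (uint64_t)d->interleave_granularity;

	/* Chunks are handed to targets in order, wrapping around. */
	return (int)(chunk % (uint64_t)d->interleave_ways);
}
```

With two ways and a 256-byte granularity, the first 256 bytes of the range map to target 0, the next 256 to target 1, and the pattern then repeats.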

struct cxl_port

logical collection of upstream port devices and downstream port devices to construct a CXL memory decode hierarchy.

Definition

struct cxl_port {
  struct device dev;
  struct device *uport;
  int id;
  struct list_head dports;
  struct ida decoder_ida;
  resource_size_t component_reg_phys;
};

Members

dev

this port’s device

uport

PCI or platform device implementing the upstream port capability

id

id for port device-name

dports

cxl_dport instances referenced by decoders

decoder_ida

allocator for decoder ids

component_reg_phys

component register capability base address (optional)

struct cxl_dport

CXL downstream port

Definition

struct cxl_dport {
  struct device *dport;
  int port_id;
  resource_size_t component_reg_phys;
  struct cxl_port *port;
  struct list_head list;
};

Members

dport

PCI bridge or firmware device representing the downstream link

port_id

unique hardware identifier for dport in decoder target list

component_reg_phys

downstream port component registers

port

reference to cxl_port that contains this downstream port

list

node for a cxl_port’s list of cxl_dport instances

The CXL core provides a set of interfaces that can be consumed by CXL aware drivers. The interfaces allow for creation, modification, and destruction of regions, memory devices, ports, and decoders. CXL aware drivers must register with the CXL core via these interfaces in order to be able to participate in cross-device interleave coordination. The CXL core also establishes and maintains the bridge to the nvdimm subsystem.

The CXL core introduces a sysfs hierarchy to control the devices that are instantiated by the core.

The core CXL PMEM infrastructure supports persistent memory provisioning and serves as a bridge to the LIBNVDIMM subsystem. A CXL ‘bridge’ device is added at the root of a CXL device topology if platform firmware advertises at least one persistent memory capable CXL window. That root-level bridge corresponds to a LIBNVDIMM ‘bus’ device. Then, for each cxl_memdev in the CXL device topology, a bridge device is added to host a LIBNVDIMM dimm object. When these bridges are registered, native LIBNVDIMM uapis are translated to CXL operations, for example, namespace label access commands.

CXL device capabilities are enumerated by PCI DVSEC (Designated Vendor-Specific Extended Capability) and/or descriptors provided by platform firmware. They can be defined as a set, like the device and component registers mandated by CXL Section 8.1.12.2 Memory Device PCIe Capabilities and Extended Capabilities, or they can be individual capabilities appended to bridged and endpoint devices.

The core provides common infrastructure for enumerating and mapping these discrete capabilities.

External Interfaces

CXL IOCTL Interface

Not all of the commands that the driver supports are always available for use by userspace. Userspace must check the results from the QUERY command in order to determine the live set of commands.

struct cxl_command_info

Command information returned from a query.

Definition

struct cxl_command_info {
  __u32 id;
  __u32 flags;
#define CXL_MEM_COMMAND_FLAG_MASK GENMASK(0, 0)
  __s32 size_in;
  __s32 size_out;
};

Members

id

ID number for the command.

flags

Flags that specify command behavior.

size_in

Expected input size, or -1 if variable length.

size_out

Expected output size, or -1 if variable length.

Description

Represents a single command that is supported by both the driver and the hardware. This is returned as part of an array from the query ioctl. The following would be a command that takes a variable length input and returns 0 bytes of output.

  • id = 10

  • flags = 0

  • size_in = -1

  • size_out = 0

See struct cxl_mem_query_commands.

struct cxl_mem_query_commands

Query supported commands.

Definition

struct cxl_mem_query_commands {
  __u32 n_commands;
  __u32 rsvd;
  struct cxl_command_info __user commands[];
};

Members

n_commands

In/out parameter. When n_commands is > 0, the driver will return min(num_support_commands, n_commands). When n_commands is 0, driver will return the number of total supported commands.

rsvd

Reserved for future use.

commands

Output array of supported commands. This array must be allocated by userspace to be at least min(num_support_commands, n_commands)

Description

Allows userspace to query the commands supported by both the driver and the hardware. Commands that are not supported by both are not returned in the query.

Examples

  • { .n_commands = 0 } // Get number of supported commands

  • { .n_commands = 15, .commands = buf } // Return first 15 (or fewer) supported commands

See struct cxl_command_info.
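The two examples above suggest a two-step pattern: ask for the count, allocate, then fetch. The sketch below imitates the documented n_commands semantics with a userspace mock instead of the real ioctl, so it can be run anywhere. `mock_query()`, `mock_fetch_all()`, the mirror struct, and the three-entry command table are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the command information layout, using fixed-width types. */
struct mock_command_info {
	uint32_t id;
	uint32_t flags;
	int32_t size_in;
	int32_t size_out;
};

#define MOCK_NUM_SUPPORTED 3

/* Hypothetical set of commands supported by both "driver" and "hardware". */
static const struct mock_command_info mock_supported[MOCK_NUM_SUPPORTED] = {
	{ .id = 1, .flags = 0, .size_in = 0,  .size_out = 67 },
	{ .id = 2, .flags = 0, .size_in = 0,  .size_out = -1 },
	{ .id = 3, .flags = 0, .size_in = -1, .size_out = 0 },
};

/* Mimics the documented behavior of n_commands as an in/out parameter. */
static void mock_query(uint32_t *n_commands, struct mock_command_info *commands)
{
	uint32_t i, n;

	assert(n_commands);
	if (*n_commands == 0) {
		*n_commands = MOCK_NUM_SUPPORTED; /* report the total only */
		return;
	}
	assert(commands);
	n = *n_commands < MOCK_NUM_SUPPORTED ? *n_commands : MOCK_NUM_SUPPORTED;
	for (i = 0; i < n; i++)
		commands[i] = mock_supported[i];
	*n_commands = n;
}

/* Step 1: get the count. Step 2: allocate and fetch. The caller frees. */
static struct mock_command_info *mock_fetch_all(uint32_t *count)
{
	struct mock_command_info *buf;
	uint32_t n = 0;

	mock_query(&n, NULL);
	if (n == 0) {
		*count = 0; /* nothing to fetch; a second n == 0 query would only ask for the count */
		return NULL;
	}
	buf = calloc(n, sizeof(*buf));
	if (!buf)
		return NULL;
	mock_query(&n, buf);
	*count = n;
	return buf;
}
```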

struct cxl_send_command

Send a command to a memory device.

Definition

struct cxl_send_command {
  __u32 id;
  __u32 flags;
  union {
    struct {
      __u16 opcode;
      __u16 rsvd;
    } raw;
    __u32 rsvd;
  };
  __u32 retval;
  struct {
    __s32 size;
    __u32 rsvd;
    __u64 payload;
  } in;
  struct {
    __s32 size;
    __u32 rsvd;
    __u64 payload;
  } out;
};

Members

id

The command to send to the memory device. This must be one of the commands returned by the query command.

flags

Flags for the command (input).

{unnamed_union}

anonymous

raw

Special fields for raw commands

raw.opcode

Opcode passed to hardware when using the RAW command.

raw.rsvd

Must be zero.

rsvd

Must be zero.

retval

Return value from the memory device (output).

in

Parameters associated with input payload.

in.size

Size of the payload to provide to the device (input).

in.rsvd

Must be zero.

in.payload

Pointer to memory for payload input, payload is little endian.

out

Parameters associated with output payload.

out.size

Size of the payload received from the device (input/output). This field is filled in by userspace to let the driver know how much space was allocated for output. It is populated by the driver to let userspace know how large the output payload actually was.

out.rsvd

Must be zero.

out.payload

Pointer to memory for payload output, payload is little endian.

Description

Mechanism for userspace to send a command to the hardware for processing. The driver will do basic validation on the command sizes. In some cases even the payload may be introspected. Userspace is required to allocate large enough buffers for size_out which can be variable length in certain situations.
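A minimal sketch of how userspace might fill this structure follows. It uses a local mirror of the layout with fixed-width types so that it compiles without kernel headers; `struct mock_send_command` and `mock_prepare_send()` are hypothetical, and a real program would use the UAPI definition and pass the result to the driver's send ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout documented above, for illustration only. */
struct mock_send_command {
	uint32_t id;
	uint32_t flags;
	union {
		struct {
			uint16_t opcode;
			uint16_t rsvd;
		} raw;
		uint32_t rsvd;
	};
	uint32_t retval;
	struct {
		int32_t size;
		uint32_t rsvd;
		uint64_t payload;
	} in;
	struct {
		int32_t size;
		uint32_t rsvd;
		uint64_t payload;
	} out;
};

/* Fill a command for a non-RAW command id with optional in/out buffers. */
static void mock_prepare_send(struct mock_send_command *c, uint32_t id,
			      const void *in, int32_t in_size,
			      void *out, int32_t out_size)
{
	assert(c);

	/* Zero everything first so that every rsvd field is zero. */
	memset(c, 0, sizeof(*c));

	c->id = id;
	c->in.size = in_size;
	c->in.payload = (uint64_t)(uintptr_t)in;   /* pointer carried as 64 bits */
	c->out.size = out_size;                    /* space allocated for output */
	c->out.payload = (uint64_t)(uintptr_t)out;
}
```

After the command completes, a caller would read retval for the device's return value and out.size for the number of bytes actually produced.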


© Copyright The kernel development community.
