Compute Express Link Memory Devices¶
A Compute Express Link Memory Device is a CXL component that implements the CXL.mem protocol. It contains some amount of volatile memory, persistent memory, or both. It is enumerated as a PCI device for configuration and passing messages over an MMIO mailbox. Its contribution to the System Physical Address space is handled via HDM (Host Managed Device Memory) decoders that optionally define a device’s contribution to an interleaved address range across multiple devices underneath a host-bridge or interleaved across host-bridges.
Driver Infrastructure¶
This section covers the driver infrastructure for a CXL memory device.
CXL Memory Device¶
This driver implements the PCI-exclusive functionality for a CXL device as defined by the Compute Express Link specification. CXL devices may surface certain functionality even when they are not CXL enabled. While this driver focuses on the PCI-specific aspects of a CXL device, it binds to the CXL memory device class code, and therefore the cxl_pci implementation is centered on CXL memory devices.
The driver has several responsibilities, mainly (a simplified probe flow is sketched after this list):
- Create the memX device and register it on the CXL bus.
- Enumerate the device's register interface and map it.
- Register an nvdimm bridge device with cxl_core.
- Register a CXL mailbox with cxl_core.
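As an illustration of how these responsibilities hang together, here is a minimal, hedged sketch of a probe flow. It is not the actual cxl_pci probe implementation; apart from cxl_find_regblock() and CXL_REGLOC_RBI_MEMDEV (documented below) the remaining steps are summarized as comments rather than real driver symbols, and the driver-internal headers are assumed to be included.

static int example_cxl_pci_probe(struct pci_dev *pdev,
                                 const struct pci_device_id *id)
{
        struct cxl_register_map map;
        int rc;

        rc = pcim_enable_device(pdev);
        if (rc)
                return rc;

        /* Enumerate the device's register interface so it can be mapped */
        rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
        if (rc)
                return rc;

        /*
         * From here the real driver would map the registers described by
         * map, register the primary mailbox with cxl_core, create the memX
         * device on the CXL bus, and register the nvdimm bridge device when
         * persistent memory is present.
         */
        return 0;
}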
-
int
__cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *mbox_cmd)¶ Execute a mailbox command
Parameters
struct cxl_dev_state *cxlds - The device state to communicate with.
struct cxl_mbox_cmd *mbox_cmd - Command to send to the memory device.
Context
Any context. Expects mbox_mutex to be held.
Return
-ETIMEDOUT if timeout occurred waiting for completion. 0 on success.
Caller should check the return code in mbox_cmd to make sure it succeeded.
Description
This is a generic form of the CXL mailbox send command that only uses the registers defined by the mailbox capability ID - CXL 2.0 8.2.8.4. Memory devices, and perhaps other types of CXL devices, may have further information available upon error conditions. Driver facilities wishing to send mailbox commands should use the wrapper command.
The CXL spec allows for up to two mailboxes. The intention is for the primary mailbox to be OS controlled and the secondary mailbox to be used by system firmware. This allows the OS and firmware to communicate with the device and not need to coordinate with each other. The driver only uses the primary mailbox.
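As a hedged illustration of the doorbell handshake this send path depends on, the sketch below polls the mailbox control register until the doorbell bit clears. It assumes the register layout of CXL 2.0 8.2.8.4 and the driver's CXLDEV_MBOX_CTRL_OFFSET / CXLDEV_MBOX_CTRL_DOORBELL definitions, with an arbitrary 2 second timeout rather than the driver's actual policy.

static int example_mbox_wait_for_doorbell(void __iomem *mbox)
{
        unsigned long timeout = jiffies + msecs_to_jiffies(2000);

        /* The doorbell bit stays set while the device owns the mailbox */
        while (readl(mbox + CXLDEV_MBOX_CTRL_OFFSET) &
               CXLDEV_MBOX_CTRL_DOORBELL) {
                if (time_after(jiffies, timeout))
                        return -ETIMEDOUT;
                cpu_relax();
        }

        return 0;
}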
-
int
cxl_pci_mbox_get(struct cxl_dev_state *cxlds)¶ Acquire exclusive access to the mailbox.
Parameters
struct cxl_dev_state *cxlds - The device state to gain access to.
Context
Any context. Takes the mbox_mutex.
Return
0 if exclusive access was acquired.
-
void
cxl_pci_mbox_put(struct cxl_dev_state *cxlds)¶ Release exclusive access to the mailbox.
Parameters
struct cxl_dev_state *cxlds - The device state to communicate with.
Context
Any context. Expects mbox_mutex to be held.
-
int
cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type, struct cxl_register_map *map)¶ Locate register blocks by type
Parameters
struct pci_dev *pdev - The CXL PCI device to enumerate.
enum cxl_regloc_type type - Register Block Indicator id
struct cxl_register_map *map - Enumeration output, clobbered on error
Return
0 if register block enumerated, negative error code otherwise
Description
A CXL DVSEC may point to one or more register blocks; search for them by type.
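For example, a caller that wants the component register block might use it as in the hedged snippet below (error handling trimmed, surrounding probe context assumed):

static int example_find_component_regs(struct pci_dev *pdev)
{
        struct cxl_register_map map;
        int rc;

        rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
        if (rc)
                return rc;

        /* map.barno and map.block_offset now say where to ioremap from */
        return 0;
}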
CXL Core¶
The CXL core objects like ports, decoders, and regions are shared between the subsystem drivers cxl_acpi, cxl_pci, and core drivers (port-driver, region-driver, nvdimm object-drivers, etc.).
-
struct
cxl_register_map¶ DVSEC harvested register block mapping parameters
Definition
struct cxl_register_map {
        void __iomem *base;
        u64 block_offset;
        u8 reg_type;
        u8 barno;
        union {
                struct cxl_component_reg_map component_map;
                struct cxl_device_reg_map device_map;
        };
};
Members
base - virtual base of the register-block-BAR + block_offset
block_offset - offset to start of register block in barno
reg_type - see enum cxl_regloc_type
barno - PCI BAR number containing the register block
{unnamed_union} - anonymous
component_map - cxl_reg_map for component registers
device_map - cxl_reg_maps for device registers
-
struct
cxl_decoder¶ CXL address range decode configuration
Definition
struct cxl_decoder {
        struct device dev;
        int id;
        struct range range;
        int interleave_ways;
        int interleave_granularity;
        enum cxl_decoder_type target_type;
        unsigned long flags;
        int nr_targets;
        struct cxl_dport *target[];
};
Members
dev - this decoder's device
id - kernel device name id
range - address range considered by this decoder
interleave_ways - number of cxl_dports in this decode
interleave_granularity - data stride per dport
target_type - accelerator vs expander (type2 vs type3) selector
flags - memory type capabilities and locking
nr_targets - number of elements in target
target - active ordered target list in current decoder configuration
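As a simplified illustration of how interleave_ways and interleave_granularity position an address within target, the sketch below uses plain modulo arithmetic; the actual HDM decoder programming supports more elaborate decode schemes than this.

/* Simplified model only: index into target[] for an offset within this
 * decoder's range, assuming a plain modulo interleave. */
static unsigned int example_target_index(unsigned long long offset,
                                         unsigned int interleave_ways,
                                         unsigned int interleave_granularity)
{
        return (offset / interleave_granularity) % interleave_ways;
}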
-
enum
cxl_nvdimm_brige_state¶ state machine for managing bus rescans
Constants
CXL_NVB_NEW - Set at bridge create and after cxl_pmem_wq is destroyed
CXL_NVB_DEAD - Set at bridge unregistration to preclude async probing
CXL_NVB_ONLINE - Target state after successful ->probe()
CXL_NVB_OFFLINE - Target state after ->remove() or failed ->probe()
-
struct
cxl_port¶ logical collection of upstream port devices and downstream port devices to construct a CXL memory decode hierarchy.
Definition
struct cxl_port {
        struct device dev;
        struct device *uport;
        int id;
        struct list_head dports;
        struct ida decoder_ida;
        resource_size_t component_reg_phys;
};
Members
dev - this port's device
uport - PCI or platform device implementing the upstream port capability
id - id for port device-name
dports - cxl_dport instances referenced by decoders
decoder_ida - allocator for decoder ids
component_reg_phys - component register capability base address (optional)
-
struct
cxl_dport¶ CXL downstream port
Definition
struct cxl_dport {
        struct device *dport;
        int port_id;
        resource_size_t component_reg_phys;
        struct cxl_port *port;
        struct list_head list;
};
Members
dport - PCI bridge or firmware device representing the downstream link
port_id - unique hardware identifier for dport in decoder target list
component_reg_phys - downstream port component registers
port - reference to the cxl_port that contains this downstream port
list - node for a cxl_port's list of cxl_dport instances
The CXL core provides a set of interfaces that can be consumed by CXL aware drivers. The interfaces allow for creation, modification, and destruction of regions, memory devices, ports, and decoders. CXL aware drivers must register with the CXL core via these interfaces in order to be able to participate in cross-device interleave coordination. The CXL core also establishes and maintains the bridge to the nvdimm subsystem.
CXL core introduces sysfs hierarchy to control the devices that are instantiated by the core.
-
struct cxl_port *
devm_cxl_add_port(struct device *host, struct device *uport, resource_size_t component_reg_phys, struct cxl_port *parent_port)¶ register a cxl_port in CXL memory decode hierarchy
Parameters
struct device *host - host device for devm operations
struct device *uport - “physical” device implementing this upstream port
resource_size_t component_reg_phys - (optional) for configurable cxl_port instances
struct cxl_port *parent_port - next hop up in the CXL memory decode hierarchy
-
int
cxl_add_dport(struct cxl_port *port, struct device *dport_dev, int port_id, resource_size_t component_reg_phys)¶ append downstream port data to a cxl_port
Parameters
struct cxl_port *port - the cxl_port that references this dport
struct device *dport_dev - firmware or PCI device representing the dport
int port_id - identifier for this dport in a decoder's target list
resource_size_t component_reg_phys - optional location of CXL component registers
Description
Note that all allocations and links are undone by cxl_port deletion and release.
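A hedged sketch of how a platform driver might combine devm_cxl_add_port() and cxl_add_dport(); the host, uport_dev, and dport_dev pointers and port_id are placeholders for whatever the caller enumerated from firmware, and CXL_RESOURCE_NONE is assumed to mark an absent component register block.

static int example_register_port(struct device *host, struct device *uport_dev,
                                 struct device *dport_dev,
                                 resource_size_t component_reg_phys,
                                 struct cxl_port *parent_port, int port_id)
{
        struct cxl_port *port;

        port = devm_cxl_add_port(host, uport_dev, component_reg_phys,
                                 parent_port);
        if (IS_ERR(port))
                return PTR_ERR(port);

        /* one call per downstream port discovered under uport_dev */
        return cxl_add_dport(port, dport_dev, port_id, CXL_RESOURCE_NONE);
}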
-
int
__cxl_driver_register(struct cxl_driver *cxl_drv, struct module *owner, const char *modname)¶ register a driver for the cxl bus
Parameters
struct cxl_driver *cxl_drv - cxl driver structure to attach
struct module *owner - owning module/driver
const char *modname - KBUILD_MODNAME for parent driver
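In practice, drivers are expected to use the cxl_driver_register() / module_cxl_driver() convenience wrappers rather than calling __cxl_driver_register() directly. A minimal, hypothetical registration might look like the sketch below; the driver name and callbacks are placeholders, the callback signatures follow struct cxl_driver, and the usual module and cxl driver-internal headers are assumed.

static int example_probe(struct device *dev)
{
        /* bind to a device on the cxl bus (port, nvdimm bridge, etc.) */
        return 0;
}

static void example_remove(struct device *dev)
{
}

static struct cxl_driver example_cxl_driver = {
        .name = "example_cxl",
        .probe = example_probe,
        .remove = example_remove,
};
module_cxl_driver(example_cxl_driver);
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);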
The core CXL PMEM infrastructure supports persistent memory provisioning and serves as a bridge to the LIBNVDIMM subsystem. A CXL ‘bridge’ device is added at the root of a CXL device topology if platform firmware advertises at least one persistent memory capable CXL window. That root-level bridge corresponds to a LIBNVDIMM ‘bus’ device. Then, for each cxl_memdev in the CXL device topology, a bridge device is added to host a LIBNVDIMM dimm object. When these bridges are registered, native LIBNVDIMM uapis are translated to CXL operations, for example, namespace label access commands.
CXL device capabilities are enumerated by PCI DVSEC (Designated Vendor-Specific Extended Capability) and/or descriptors provided by platform firmware. They can be defined as a set, like the device and component registers mandated by CXL Section 8.1.12.2 Memory Device PCIe Capabilities and Extended Capabilities, or they can be individual capabilities appended to bridged and endpoint devices.
Provide common infrastructure for enumerating and mapping these discrete capabilities.
Core implementation of the CXL 2.0 Type-3 Memory Device Mailbox. The implementation is used by the cxl_pci driver to initialize the device and implement the cxl_mem.h IOCTL UAPI. It also implements the backend of the cxl_pmem_ctl() transport for LIBNVDIMM.
External Interfaces¶
CXL IOCTL Interface¶
Not all commands that the driver supports are always available for use by userspace. Userspace must check the results of the QUERY command to determine the live set of commands.
-
struct
cxl_command_info¶ Command information returned from a query.
Definition
struct cxl_command_info {
        __u32 id;
        __u32 flags;
#define CXL_MEM_COMMAND_FLAG_MASK GENMASK(0, 0)
        __s32 size_in;
        __s32 size_out;
};
Members
id - ID number for the command.
flags - Flags that specify command behavior.
size_in - Expected input size, or -1 if variable length.
size_out - Expected output size, or -1 if variable length.
Description
Represents a single command that is supported by both the driver and the hardware. This is returned as part of an array from the query ioctl. The following would be a command that takes a variable length input and returns 0 bytes of output.
id = 10
flags = 0
size_in = -1
size_out = 0
See struct cxl_mem_query_commands.
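Expressed as a C initializer, that example entry would be:

struct cxl_command_info info = {
        .id = 10,
        .flags = 0,
        .size_in = -1,  /* variable length input */
        .size_out = 0,  /* no output payload */
};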
-
struct
cxl_mem_query_commands¶ Query supported commands.
Definition
struct cxl_mem_query_commands {
        __u32 n_commands;
        __u32 rsvd;
        struct cxl_command_info __user commands[];
};
Members
n_commands - In/out parameter. When n_commands is > 0, the driver will return min(num_support_commands, n_commands) entries. When n_commands is 0, the driver will return the number of total supported commands.
rsvd - Reserved for future use.
commands - Output array of supported commands. This array must be allocated by userspace to hold at least min(num_support_commands, n_commands) entries.
Description
Allow userspace to query the available commands supported by both the driver and the hardware. Commands that aren't supported by either the driver or the hardware are not returned in the query.
Examples
{ .n_commands = 0 } // Get number of supported commands
{ .n_commands = 15, .commands = buf } // Return the first 15 (or fewer) supported commands
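A hedged userspace sketch of that two-step pattern, assuming the <linux/cxl_mem.h> uapi header and a /dev/cxl/mem0 character device (error handling trimmed):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/cxl_mem.h>

int main(void)
{
        struct cxl_mem_query_commands count = { .n_commands = 0 };
        struct cxl_mem_query_commands *query;
        unsigned int i;
        int fd;

        fd = open("/dev/cxl/mem0", O_RDWR);
        if (fd < 0)
                return 1;

        /* n_commands == 0: ask how many commands are supported */
        if (ioctl(fd, CXL_MEM_QUERY_COMMANDS, &count) < 0)
                return 1;

        /* allocate room for that many entries and fetch them */
        query = calloc(1, sizeof(*query) +
                          count.n_commands * sizeof(query->commands[0]));
        query->n_commands = count.n_commands;
        if (ioctl(fd, CXL_MEM_QUERY_COMMANDS, query) < 0)
                return 1;

        for (i = 0; i < query->n_commands; i++)
                printf("command %u: size_in=%d size_out=%d\n",
                       query->commands[i].id, query->commands[i].size_in,
                       query->commands[i].size_out);

        free(query);
        close(fd);
        return 0;
}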
-
struct
cxl_send_command¶ Send a command to a memory device.
Definition
struct cxl_send_command {
        __u32 id;
        __u32 flags;
        union {
                struct {
                        __u16 opcode;
                        __u16 rsvd;
                } raw;
                __u32 rsvd;
        };
        __u32 retval;
        struct {
                __s32 size;
                __u32 rsvd;
                __u64 payload;
        } in;
        struct {
                __s32 size;
                __u32 rsvd;
                __u64 payload;
        } out;
};
Members
id - The command to send to the memory device. This must be one of the commands returned by the query command.
flags - Flags for the command (input).
{unnamed_union} - anonymous
raw - Special fields for raw commands
raw.opcode - Opcode passed to hardware when using the RAW command.
raw.rsvd - Must be zero.
rsvd - Must be zero.
retval - Return value from the memory device (output).
in - Parameters associated with input payload.
in.size - Size of the payload to provide to the device (input).
in.rsvd - Must be zero.
in.payload - Pointer to memory for payload input; payload is little endian.
out - Parameters associated with output payload.
out.size - Size of the payload received from the device (input/output). This field is filled in by userspace to let the driver know how much space was allocated for output. It is populated by the driver to let userspace know how large the output payload actually was.
out.rsvd - Must be zero.
out.payload - Pointer to memory for payload output; payload is little endian.
Description
Mechanism for userspace to send a command to the hardware for processing. The driver will do basic validation on the command sizes; in some cases even the payload may be introspected. Userspace is required to allocate a large enough buffer for size_out, which can be variable length in certain situations.
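To make that concrete, here is a hedged userspace sketch of sending a no-input, fixed-output command whose id and size_out were taken from a prior query; the command choice and buffer handling are illustrative, not a recommended policy (error handling trimmed).

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/cxl_mem.h>

static long example_send(int fd, const struct cxl_command_info *info)
{
        struct cxl_send_command cmd;
        void *out = calloc(1, info->size_out);

        if (!out)
                return -1;

        memset(&cmd, 0, sizeof(cmd));
        cmd.id = info->id;              /* an id reported by the query */
        cmd.in.size = 0;                /* this command takes no input payload */
        cmd.out.size = info->size_out;  /* how much space was allocated */
        cmd.out.payload = (uint64_t)(uintptr_t)out;

        if (ioctl(fd, CXL_MEM_SEND_COMMAND, &cmd) < 0) {
                free(out);
                return -1;
        }

        /* cmd.retval carries the device's mailbox return code and
         * cmd.out.size the number of bytes actually returned. */
        free(out);
        return cmd.retval;
}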