vllm.model_executor.layers.quantization.kernels.scaled_mm.xpu
XPUFP8ScaledMMLinearKernel
Bases: FP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/xpu.py
__init__
__init__(
    c: FP8ScaledMMLinearLayerConfig,
    layer_param_names: Sequence[str],
) -> None
apply_scaled_mm
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
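The page does not describe what apply_scaled_mm computes. As a rough orientation only, the sketch below shows the conventional meaning of a scaled matrix multiply, assuming A is the quantized activation, B the quantized weight, and As and Bs are per-tensor scalar scales. It uses plain Python lists instead of tensors, leaves out out_dtype and output_shape, and is not the XPU kernel; the name reference_scaled_mm is hypothetical.

```python
def reference_scaled_mm(A, B, As, Bs, bias=None):
    """Reference semantics (assumed): (A * As) @ (B * Bs) + bias.

    A is an M x K list of lists, B is K x N, As and Bs are scalar scales,
    and bias is an optional length-N list added to every output row.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Accumulate on the raw (quantized) values first ...
            acc = sum(A[i][k] * B[k][j] for k in range(inner))
            # ... then apply both scales once to the accumulated result.
            val = acc * As * Bs
            if bias is not None:
                val += bias[j]
            out_row.append(val)
        out.append(out_row)
    return out


result = reference_scaled_mm(
    A=[[1, 2], [3, 4]],
    B=[[1, 0], [0, 1]],
    As=0.5,
    Bs=2.0,
    bias=[1.0, 1.0],
)
```

Applying the scales after accumulation is equivalent to dequantizing both operands first when the scales are per-tensor scalars, which is why the sketch multiplies by As * Bs only once per output element.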
apply_weights
can_implement (classmethod)
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
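The return type tuple[bool, str | None] suggests a "supported flag plus optional reason" contract. The sketch below illustrates how a caller might consume such a result, assuming the string carries a failure reason when the flag is False. HypotheticalConfig, HypotheticalKernel, pick_kernel, and the per_tensor_scales field are all invented for illustration; they are not vLLM classes or fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypotheticalConfig:
    # Stand-in for FP8ScaledMMLinearLayerConfig; its real fields are not
    # shown on this page, so this single field is an assumption.
    per_tensor_scales: bool


class HypotheticalKernel:
    @classmethod
    def can_implement(cls, c: HypotheticalConfig) -> tuple[bool, Optional[str]]:
        # Return (False, reason) when the config is unsupported,
        # and (True, None) when the kernel can handle it.
        if not c.per_tensor_scales:
            return False, "only per-tensor scales are supported"
        return True, None


def pick_kernel(candidates, c):
    """Return the first candidate whose can_implement accepts the config."""
    reasons = []
    for kernel in candidates:
        ok, reason = kernel.can_implement(c)
        if ok:
            return kernel
        reasons.append(f"{kernel.__name__}: {reason}")
    raise ValueError("no kernel can implement this config: " + "; ".join(reasons))


chosen = pick_kernel([HypotheticalKernel], HypotheticalConfig(per_tensor_scales=True))
```

Returning the reason alongside the flag lets the caller report why every candidate was rejected instead of failing with a bare "unsupported" error.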