Projection

dattri.func.projection.random_project(feature: Dict[str, Tensor] | Tensor, feature_batch_size: int, proj_dim: int, proj_max_batch_size: int, device: str, proj_seed: int = 0, *, use_half_precision: bool = True) Callable

Randomly projects the features to a smaller dimension.

Parameters:
  • feature (Union[Dict[str, Tensor], Tensor]) – The feature to be projected. This can simply be a tensor of size [feature_batch_size, feature_dim]. More typically, if it is the gradient of a torch.nn.Module model, it has a structure similar to the result of model.named_parameters().

  • feature_batch_size (int) – The batch size of each tensor in the feature to be projected. Features are typically gradients of a torch.nn.Module model, but are not restricted to this.

  • proj_dim (int) – Dimension of the projected feature.

  • proj_max_batch_size (int) – The maximum batch size used by fast_jl if the CudaProjector is used. Must be a multiple of 8. The maximum batch size is 32 for A100 GPUs, 16 for V100 GPUs, 40 for H100 GPUs.

  • device (str) – “cuda” or “cpu”.

  • proj_seed (int) – Random seed used by the projector. Defaults to 0.

  • use_half_precision (bool) – If True, torch.float16 is used for all computations and arrays are stored in torch.float16. Defaults to True.

Returns:

A function that projects a feature to a smaller dimension.
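
The idea behind a seeded random projection can be sketched as follows. This is a minimal illustration using a dense Gaussian matrix and plain Python lists to stay dependency-free; it is not dattri's implementation, and the helper name make_random_projector is hypothetical. It only shows the general pattern the signature suggests: build a projector once from a seed, then apply it to features of shape [feature_batch_size, feature_dim].

```python
import math
import random


def make_random_projector(feature_dim, proj_dim, proj_seed=0):
    """Hypothetical helper: return a function mapping rows of length
    feature_dim to rows of length proj_dim via a fixed Gaussian matrix."""
    rng = random.Random(proj_seed)
    # Scaling by 1/sqrt(proj_dim) keeps squared norms unchanged in expectation.
    scale = 1.0 / math.sqrt(proj_dim)
    # Dense projection matrix of shape [feature_dim, proj_dim].
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(proj_dim)]
              for _ in range(feature_dim)]

    def project(features):
        # features: list of rows, i.e. shape [feature_batch_size, feature_dim].
        return [[sum(row[i] * matrix[i][j] for i in range(feature_dim))
                 for j in range(proj_dim)]
                for row in features]

    return project


# Toy features with feature_batch_size=4 and feature_dim=32.
features = [[float(i + b) for i in range(32)] for b in range(4)]
project = make_random_projector(feature_dim=32, proj_dim=8, proj_seed=0)
projected = project(features)  # shape [4, 8]
```

Reusing the same proj_seed reproduces the same projection matrix, which is what makes projected features from different batches comparable.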

dattri.func.projection.arnoldi_project(feature_dim: int, func: Callable, x: List, argnums: int = 0, proj_dim: int = 100, max_iter: int = 100, norm_constant: float = 1.0, tol: float = 1e-07, mode: str = 'rev-fwd', regularization: float = 0.0, seed: int = 0, device: torch.device = 'cpu') Callable

Applies the Arnoldi algorithm to approximate the inverse-Hessian-vector product (iHVP).

Parameters:
  • feature_dim (int) – Dimension of the features to be projected. Typically, this equals the number of parameters in the model (dimension of the gradient vectors).

  • func (Callable) – A Python function that takes one or more arguments. Must return a single-element Tensor. The Hessian will be calculated on this function. The positional arguments to func must all be Tensors.

  • x (List) – List of arguments for func.

  • argnums (int) – An integer defaulting to 0. Specifies which argument of func to compute the Hessian with respect to.

  • proj_dim (int) – Dimension after the projection. This corresponds to the number of top eigenvalues (top-k eigenvalues) to keep for the Hessian approximation.

  • max_iter (int) – An integer defaulting to 100. Specifies the maximum number of Arnoldi iterations used to compute the iHVP.

  • norm_constant (float) – A float defaulting to 1.0. Specifies a constant value for the norm of each projection. In some situations (e.g. with a large number of parameters) it might be advisable to set norm_constant > 1 to avoid dividing projection components by a large normalization factor.

  • tol (float) – A float defaulting to 1e-7. Specifies the break condition that decides if the algorithm has converged. If the torch.norm of the current basis vector is less than tol, then the algorithm is truncated.

  • mode (str) –

    The auto diff mode, which can have one of the following values:

    • rev-rev: calculate the Hessian with two reverse-mode auto-diff. It has better compatibility while costing more memory.

    • rev-fwd: calculate the Hessian with the composition of reverse-mode and forward-mode. It’s more memory-efficient but may not be supported by some operators.

  • regularization (float) – A float defaulting to 0.0. Specifies the regularization term to be added to the Hessian vector product, which is useful for the later inverse calculation if the Hessian matrix is singular or ill-conditioned. Specifically, the regularization term is regularization * v.

  • seed (int) – Random seed used by the projector. Defaults to 0.

  • device (torch.device) – “cuda” or “cpu”. Defaults to “cpu”.
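
The two values of mode and the regularization term can be illustrated with a small sketch built on torch.func (PyTorch 2.x). This is not dattri's code: the function names and the toy loss are hypothetical, and it assumes that rev-rev means a reverse-mode pass over a reverse-mode gradient and that rev-fwd means a forward-mode pass over a reverse-mode gradient. Both compute the same Hessian-vector product, to which regularization * v is then added.

```python
import torch
from torch.func import grad, jvp, vjp


def loss(theta):
    # Toy scalar function standing in for a model loss (hypothetical).
    return (theta ** 3).sum() + theta[0] * theta[1]


def hvp_rev_rev(func, theta, v):
    # Reverse-mode over reverse-mode: a VJP applied to grad(func).
    _, vjp_fn = vjp(grad(func), theta)
    return vjp_fn(v)[0]


def hvp_rev_fwd(func, theta, v):
    # Forward-mode over reverse-mode: a JVP applied to grad(func).
    _, tangent_out = jvp(grad(func), (theta,), (v,))
    return tangent_out


theta = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 0.0, -1.0])
regularization = 0.1

# Regularized Hessian-vector product: H v + regularization * v.
hv_rr = hvp_rev_rev(loss, theta, v) + regularization * v
hv_rf = hvp_rev_fwd(loss, theta, v) + regularization * v
```

For this toy loss the Hessian at theta is diag(6, 12, 18) with an extra 1 in positions (0, 1) and (1, 0), so both variants should agree on the regularized product.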

Returns:

A function that applies the Arnoldi algorithm to the input feature.
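
For readers unfamiliar with the underlying iteration, the following is a minimal, generic Arnoldi sketch in plain Python. It is not dattri's implementation; the names arnoldi_iteration and toy_hvp are hypothetical, and the Hessian is a toy diagonal matrix so that the Hessian-vector product is elementwise. It shows how max_iter bounds the number of basis vectors and how tol truncates the iteration when the next basis vector has a norm close to zero.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def arnoldi_iteration(hvp, feature_dim, max_iter=100, tol=1e-7, seed=0):
    """Build an orthonormal Krylov basis and the Hessenberg coefficients."""
    rng = random.Random(seed)
    start = [rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
    norm = math.sqrt(dot(start, start))
    basis = [[a / norm for a in start]]
    hessenberg_cols = []  # column n holds the coefficients of hvp(basis[n])
    for n in range(max_iter):
        w = hvp(basis[n])
        col = []
        # Modified Gram-Schmidt: remove components along earlier basis vectors.
        for q in basis:
            h = dot(w, q)
            col.append(h)
            w = [wi - h * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        col.append(norm)
        hessenberg_cols.append(col)
        if norm < tol:  # Krylov subspace exhausted: truncate early.
            break
        basis.append([wi / norm for wi in w])
    return basis, hessenberg_cols


# Toy Hessian: diagonal, so the Hessian-vector product is elementwise.
diag = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def toy_hvp(v):
    return [d * x for d, x in zip(diag, v)]


basis, hessenberg_cols = arnoldi_iteration(toy_hvp, feature_dim=6, max_iter=3)
```

In a full pipeline, the small Hessenberg matrix would then be eigendecomposed and only the top proj_dim eigenpairs kept, which matches the description of proj_dim above; that step is omitted here to keep the sketch short.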