# All-at-once optimization for CP tensor decomposition

We explain how to use the `cp_opt` function, which implements the **CP-OPT** method that fits the CP model using *direct* or *all-at-once* optimization. This is in contrast to the `cp_als` function, which implements the **CP-ALS** method that fits the CP model using *alternating* optimization. The CP-OPT method is described in the following reference:

- E. Acar, D. M. Dunlavy and T. G. Kolda, A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions, J. Chemometrics, 25(2):67-86, 2011, http://doi.org/10.1002/cem.1335

This method works with any tensor that supports the functions `size`, `norm`, and `mttkrp`. This includes not only `tensor` and `sptensor`, but also `ktensor`, `ttensor`, and `sumtensor`.
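For instance, here is a minimal sketch (assuming the Tensor Toolbox is on the MATLAB path; the random sparse tensor and the rank are illustrative choices, not part of the original example) showing that `cp_opt` can be applied directly to an `sptensor`:

```
% Illustrative only: fit a rank-3 CP model to a random sparse tensor.
% This works because sptensor implements size, norm, and mttkrp.
rng(0);                          % reproducibility
X = sptenrand([50 40 30], 500);  % 50 x 40 x 30 sparse tensor, ~500 nonzeros
M = cp_opt(X, 3, 'printitn', 0); % suppress per-iteration printing
```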

## Contents

- Major overhaul in Tensor Toolbox Version 3.3, August 2022
- Optimization methods
- Optimization parameters
- Simple example problem #1
- Calling `cp_opt` method and evaluating its outputs
- Specifying different optimization method
- Calling `cp_opt` method with higher rank
- Simple example problem #2 (nonnegative)
- Call the `cp_opt` method with lower bound of zero
- Reproducibility
- Specifying different optimization method

## Major overhaul in Tensor Toolbox Version 3.3, August 2022

The code was completely overhauled in the Version 3.3 release of the Tensor Toolbox. The old code is available in `cp_opt_legacy` and documented here. The major differences are as follows:

- The function being optimized is now $\|X - M\|^2 / \|X\|^2$, where *X* is the data tensor and *M* is the model. Previously, the function being optimized was $\frac{1}{2}\|X - M\|^2$. The new formulation differs only by a constant factor, but its advantage is that the convergence tests (e.g., the norm of the gradient) are less sensitive to the scale of *X*. (A sketch of computing this objective in MATLAB follows this list.)
- We now support the MATLAB Optimization Toolbox methods `fminunc` and `fmincon`, the latter of which supports bound constraints.
- The input and output arguments are different. We've retained the legacy version for those who cannot easily change their workflow.
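For concreteness, here is a sketch of the new objective in MATLAB (assuming `X` is the data tensor and `M` is a `ktensor` model already in the workspace; the variable names are illustrative):

```
% New objective: squared relative error, f = ||X - M||^2 / ||X||^2.
% full(M) expands the ktensor so it can be subtracted from the data tensor.
f_new = norm(X - full(M))^2 / norm(X)^2;

% Legacy objective: (1/2)||X - M||^2, which differs from the new one
% only by the constant factor 2/||X||^2.
f_old = 0.5 * norm(X - full(M))^2;
```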

## Optimization methods

The `cp_opt` method uses optimization methods from other packages; see Optimization Methods for Tensor Toolbox. As of Version 3.3, we distribute the default method (`'lbfgsb'`) along with the Tensor Toolbox.

Options for `'method'`:

- `'lbfgsb'` (default) - Uses **L-BFGS-B** by Stephen Becker, which implements the bound-constrained, limited-memory BFGS method. This code is distributed along with the Tensor Toolbox and will be activated on first use. Supports bound constraints.
- `'fminunc'` or `'fmincon'` - Routines provided by the **MATLAB Optimization Toolbox**. The latter supports bound constraints. (It also supports linear and nonlinear constraints, but we have not included an interface to those.)
- `'lbfgs'` - Uses **POBLANO** Version 1.2 by Evrim Acar, Daniel Dunlavy, and Tamara Kolda, which implements the limited-memory BFGS method. Does not support bound constraints, but it is pure MATLAB and may work if `'lbfgsb'` does not (see the sketch following this list).
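For example, to fall back to the pure-MATLAB POBLANO routine (a sketch; it assumes POBLANO is installed and that `X` and `R` are defined as in the examples below):

```
% Illustrative only: use POBLANO's limited-memory BFGS instead of L-BFGS-B.
M = cp_opt(X, R, 'method', 'lbfgs');
```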

## Optimization parameters

The full list of optimization parameters for each method is provided in Optimization Methods for Tensor Toolbox. We list a few of the most relevant ones here; an example call using them follows the list.

- `'gtol'` - The stopping tolerance for the norm of the gradient (or the equivalent constrained condition). Defaults to 1e-5.
- `'lower'` - Lower bounds, which can be a scalar (e.g., 0) or a vector. Defaults to `-Inf` (no lower bound).
- `'printitn'` - Frequency of printing. Normally, this specifies how often to print in terms of the number of iterations. However, the MATLAB Optimization Toolbox methods limit the options to 0 for none, 1 for every iteration, or > 1 for just a final summary; these methods do not allow any finer granularity.
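Putting these together, here is a sketch of a call that sets all three parameters (the specific values are illustrative, not recommendations):

```
% Hypothetical call: tighten the gradient tolerance, require nonnegative
% factors, and print progress every 10 iterations.
M = cp_opt(X, R, 'gtol', 1e-6, 'lower', 0, 'printitn', 10);
```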

## Simple example problem #1

Create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise.

```
rng(1); % Reproducibility
R = 5;
problem = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10);
X = problem.Data;
M_true = problem.Soln;

% Create initial guess using 'nvecs'
M_init = create_guess('Data', X, 'Num_Factors', R, 'Factor_Generator', 'nvecs');
```

## Calling `cp_opt` method and evaluating its outputs

Here is an example call to the `cp_opt` method. By default, the method periodically prints the least-squares objective value (being minimized) and the norm of the gradient.

```
[M,M0,info] = cp_opt(X, R, 'init', M_init, 'printitn', 25);
```

```
CP-OPT CP Direct Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Size: 50 x 40 x 30, Rank: 5
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter 25, f(x) = 5.529663e-02, ||grad||_infty = 2.18e-04
Iter 50, f(x) = 9.818485e-03, ||grad||_infty = 1.78e-04
Iter 55, f(x) = 9.809476e-03, ||grad||_infty = 3.71e-06
End Main Loop

Final f: 9.8095e-03
Setup time: 0.016 seconds
Optimization time: 0.33 seconds
Iterations: 55
Total iterations: 161
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
```

It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit message. The message `CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH` means that it has converged because the function value stopped improving. The message `CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL` means that the gradient is sufficiently small.

```
exitmsg = info.optout.exit_condition
```

```
exitmsg = 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.'
```

The objective function here is different from the one used by CP-ALS. That method reports the *fit*, which is $1 - \frac{\|X - M\|}{\|X\|}$. In other words, the fit is the fraction of the data that is explained by the model. Because we have 10% noise, we do not expect the fit to be better than 90%. Since the objective minimized here is $f = \|X - M\|^2 / \|X\|^2$, the fit can be recovered as `1 - sqrt(info.f)`.

```
fit = 1 - sqrt(info.f)
```

```
fit = 0.9010
```
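As a sanity check, the same fit can be computed directly from its definition (a sketch, assuming `X` and the fitted model `M` from the call above are still in the workspace):

```
% Direct computation of the fit; should agree with 1 - sqrt(info.f)
% up to floating-point roundoff.
fit_direct = 1 - norm(X - full(M)) / norm(X)
```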

We can "score" the similarity of the model computed by CP and compare that with the truth. The `score` function on ktensor's gives a score in [0,1] with 1 indicating a perfect match. Because we have noise, we do not expect the fit to be perfect. See doc score for more details.

```
scr = score(M,M_true)
```

```
scr = 0.9986
```

## Specifying different optimization method

Here we repeat the run using `fminunc` from the MATLAB Optimization Toolbox.

```
[M,M0,info] = cp_opt(X, R, 'init', M_init, 'printitn', 25, 'method', 'fminunc');
```

```
CP-OPT CP Direct Optimzation
Unconstrained Optimization (via Optimization Toolbox)
Size: 50 x 40 x 30, Rank: 5
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

End Main Loop

Final f: 9.8097e-03
Setup time: 1.7 seconds
Optimization time: 2.5 seconds
Iterations: 107
Total iterations: 1004
Exit condition: Gradient norm smaller than tolerance
```

Check the exit condition and the fit:

```
exitmsg = info.optout.exit_condition
% Fit
fit = 1 - sqrt(info.f)
```

```
exitmsg = 'Gradient norm smaller than tolerance'
fit = 0.9010
```

## Calling `cp_opt` method with higher rank

Re-using the same example as before, consider the case where we don't know R in advance. We might guess too high. Here we show a case where we guess R+1 factors rather than R.

```
% Generate initial guess of the correct size
rng(2);
M_plus_init = create_guess('Data', X, 'Num_Factors', R+1, ...
    'Factor_Generator', 'nvecs');
```

```
% Run the algorithm (capture the info output so the checks below
% refer to this run rather than the previous one)
[M_plus,~,info] = cp_opt(X, R+1, 'init', M_plus_init, 'printitn', 25);
exitmsg = info.optout.exit_condition
fit = 1 - sqrt(info.f)
```

```
CP-OPT CP Direct Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Size: 50 x 40 x 30, Rank: 6
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter 25, f(x) = 5.530646e-02, ||grad||_infty = 2.50e-04
Iter 50, f(x) = 9.803871e-03, ||grad||_infty = 8.52e-05
Iter 56, f(x) = 9.785758e-03, ||grad||_infty = 5.15e-06
End Main Loop

Final f: 9.7858e-03
Setup time: 0.0026 seconds
Optimization time: 0.16 seconds
Iterations: 56
Total iterations: 163
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
exitmsg = 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.'
fit = 0.9011
```

```
% Check the answer (1 is perfect)
scr = score(M_plus, M_true)
```

```
scr = 0.9986
```

## Simple example problem #2 (nonnegative)

We can employ lower bounds to obtain a nonnegative factorization. First, we create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise. We select nonnegative factor matrices and lambdas. Because `create_problem` doesn't really know how to add noise without going negative, we *hack* it to make the observed tensor nonnegative.

```
R = 5;
rng(3);
problem2 = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10, ...
    'Factor_Generator', 'rand', 'Lambda_Generator', 'rand');
X = problem2.Data .* (problem2.Data > 0); % Force it to be nonnegative
M_true = problem2.Soln;
```
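To confirm the hack worked, here is a quick check (a sketch; `double` converts a Tensor Toolbox `tensor` to an ordinary MATLAB array):

```
% Verify that every entry of the observed tensor is nonnegative.
assert(min(double(X), [], 'all') >= 0, 'X has negative entries');
```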

## Call the `cp_opt` method with lower bound of zero

Here we specify a lower bound of zero via the `'lower',0` argument pair.

```
[M,M0,info] = cp_opt(X, R, 'init', 'rand', 'lower', 0, 'printitn', 25);

% Check the output
exitmsg = info.optout.exit_condition

% Check the fit
fit = 1 - sqrt(info.f)

% Evaluate the output
scr = score(M,M_true)
```

```
CP-OPT CP Direct Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Size: 50 x 40 x 30, Rank: 5
Lower bound: 0, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter 25, f(x) = 1.534325e-02, ||grad||_infty = 1.80e-03
Iter 50, f(x) = 9.897878e-03, ||grad||_infty = 1.74e-04
Iter 75, f(x) = 9.786600e-03, ||grad||_infty = 5.02e-05
Iter 90, f(x) = 9.776759e-03, ||grad||_infty = 3.17e-05
End Main Loop

Final f: 9.7768e-03
Setup time: 0.0024 seconds
Optimization time: 0.13 seconds
Iterations: 90
Total iterations: 186
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
exitmsg = 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.'
fit = 0.9011
scr = 0.9858
```

## Reproducibility

The parameters of a run are saved so that the run can be reproduced exactly, as follows.

```
cp_opt(X,R,info.params);
```

```
CP-OPT CP Direct Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Size: 50 x 40 x 30, Rank: 5
Lower bound: 0, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter 25, f(x) = 1.534325e-02, ||grad||_infty = 1.80e-03
Iter 50, f(x) = 9.897878e-03, ||grad||_infty = 1.74e-04
Iter 75, f(x) = 9.786600e-03, ||grad||_infty = 5.02e-05
Iter 90, f(x) = 9.776759e-03, ||grad||_infty = 3.17e-05
End Main Loop

Final f: 9.7768e-03
Setup time: 0.0022 seconds
Optimization time: 0.13 seconds
Iterations: 90
Total iterations: 186
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
```

## Specifying different optimization method

Here we repeat the nonnegative fit using `fmincon`, which supports the bound constraint.

```
[M,M0,info] = cp_opt(X, R, 'init', M_init, 'printitn', 25, ...
    'method', 'fmincon', 'lower', 0);
```

```
CP-OPT CP Direct Optimzation
Bound-Constrained Optimization (via Optimization Toolbox)
Size: 50 x 40 x 30, Rank: 5
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
Feasible point with lower objective function value found.

Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.

End Main Loop

Final f: 1.1224e-01
Setup time: 0.059 seconds
Optimization time: 32 seconds
Iterations: 375
Total iterations: 1244
Exit condition: Gradient norm smaller than tolerance
```

Check the exit condition and the fit:

```
exitmsg = info.optout.exit_condition
% Fit
fit = 1 - sqrt(info.f)
```

```
exitmsg = 'Gradient norm smaller than tolerance'
fit = 0.6650
```