# All-at-once optimization for CP tensor decomposition

We explain how to use the `cp_opt` function, which implements the **CP-OPT** method that fits the CP model using *direct* or *all-at-once* optimization. This is in contrast to the `cp_als` function, which implements the **CP-ALS** method that fits the CP model using *alternating* optimization. The CP-OPT method is described in the following reference:

- E. Acar, D. M. Dunlavy and T. G. Kolda, A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions, J. Chemometrics, 25(2):67-86, 2011, http://doi.org/10.1002/cem.1335

## Contents

- Third-party optimization software
- Check that the software is installed.
- Create an example problem.
- Create initial guess using 'nvecs'
- Call the `cp_opt` method
- Check the output
- Evaluate the output
- Overfitting example
- Nonnegative factorization
- Create an example problem.
- Generate initial guess of the correct size
- Call the `cp_opt` method
- Check the output
- Evaluate the output

## Third-party optimization software

The `cp_opt` method uses third-party optimization software to do the optimization. You can use either the L-BFGS-B library or the Poblano Toolbox.

The remainder of these instructions assume L-BFGS-B is being used; `cp_opt` can also be called with Poblano as the optimizer.

## Check that the software is installed.

Be sure that `lbfgsb` is in your path.

```
help lbfgsb
```

```
 x = lbfgsb( fcn, l, u ) uses the lbfgsb v.3.0 library (fortran files must
   be installed; see compile_mex.m) which is the L-BFGS-B algorithm.
   The algorithm is similar to the L-BFGS quasi-Newton algorithm, but also
   handles bound constraints via an active-set type iteration.
   This version is based on the modified C code L-BFGS-B-C, and so has a
   slightly different calling syntax than previous versions.

   The minimization problem that it solves is:
        min_x f(x)  subject to  l <= x <= u

   'fcn' is a function handle that accepts an input, 'x', and returns two
   outputs, 'f' (function value), and 'g' (function gradient).

   'l' and 'u' are column-vectors of constraints. Set their values to Inf
   if you want to ignore them. (You can set some values to Inf, but keep
   others enforced).

   The full format of the function is:
       [x,f,info] = lbfgsb( fcn, l, u, opts )
   where the output 'f' has the value of the function f at the final
   iterate and 'info' is a structure with useful information
   (self-explanatory, except for info.err. The first column of info.err
   is the history of the function values f, and the second column is the
   history of norm( gradient, Inf ).)

   The 'opts' structure allows you to pass further options. Possible
   field name values:

     opts.x0          The starting value (default: all zeros)
     opts.m           Number of limited-memory vectors to use in the
                      algorithm. Try 3 <= m <= 20. (default: 5)
     opts.factr       Tolerance setting (see the source code for more
                      info) (default: 1e7). This is later multiplied by
                      machine epsilon.
     opts.pgtol       Another tolerance setting, relating to
                      norm(gradient,Inf) (default: 1e-5)
     opts.maxIts      How many iterations to allow (default: 100)
     opts.maxTotalIts How many iterations to allow, including linesearch
                      iterations (default: 5000)
     opts.printEvery  How often to display information (default: 1)
     opts.errFcn      A function handle (or cell array of several function
                      handles) that computes whatever you want. The output
                      will be printed to the screen every 'printEvery'
                      iterations. (default: []) Results saved in columns 3
                      and higher of info.err variable

   Stephen Becker, srbecker@alumni.caltech.edu   Feb 14, 2012
   Updated Feb 21 2015, Stephen Becker, stephen.becker@colorado.edu
```
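You can also check for the solver programmatically. A minimal sketch using the standard MATLAB `exist` function:

```matlab
% exist returns a nonzero code when 'lbfgsb' is found on the MATLAB path
if exist('lbfgsb', 'file') == 0
    error('lbfgsb not found; install L-BFGS-B and add it to the MATLAB path.');
end
```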

## Create an example problem.

Create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise.

```
R = 5;
info = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10);
X = info.Data;
M_true = info.Soln;
```

## Create initial guess using 'nvecs'

```
M_init = create_guess('Data', X, 'Num_Factors', R, 'Factor_Generator', 'nvecs');
```

## Call the `cp_opt` method

Here is an example call to the `cp_opt` method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient.

```
[M,M0,output] = cp_opt(X, R, 'init', M_init);
```

```
Iter    10, f(x) = 8.891748e+03, ||grad||_infty = 2.76e+02
Iter    20, f(x) = 4.958865e+03, ||grad||_infty = 1.11e+01
Iter    30, f(x) = 2.006660e+03, ||grad||_infty = 2.84e+02
Iter    40, f(x) = 3.666126e+02, ||grad||_infty = 7.33e-01
Iter    48, f(x) = 3.666031e+02, ||grad||_infty = 5.80e-03
```

## Check the output

It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit message. The message `CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH` means that it has converged because the function value stopped improving.

```
exitmsg = output.ExitMsg
```

exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'

The fit is the percentage of the data that is explained by the model. Because we have noise, we do not expect the fit to be perfect.

```
fit = output.Fit
```

fit = 99.0192
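As a sanity check, the fit can be recomputed directly from the model and the data. A sketch, assuming the solver defines fit as the percentage of the data norm explained by the model (this definition is an assumption, not taken from the output above):

```matlab
% Recompute the fit from the ktensor M and the data tensor X:
% fit = 100 * (1 - ||X - full(M)|| / ||X||)
normresidual = norm(X - full(M));
fit_check = 100 * (1 - normresidual / norm(X));
```

With 10% noise, a fit near 99% is consistent with the model explaining essentially all of the structure in the data.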

## Evaluate the output

We can "score" the similarity between the model computed by CP-OPT and the true model. The `score` function on ktensor objects gives a score in [0,1], with 1 indicating a perfect match. Because we have noise, we do not expect the score to be perfect. See `doc score` for more details.

```
scr = score(M, M_true)
```

scr = 0.9990

## Overfitting example

Re-using the same example as before, consider the case where we don't know R in advance. We might guess too high. Here we show a case where we guess R+1 factors rather than R.

```
% Generate initial guess of the correct size
M_plus_init = create_guess('Data', X, 'Num_Factors', R+1, ...
    'Factor_Generator', 'nvecs');
```

```
% Run the algorithm
[M_plus,~,output] = cp_opt(X, R+1, 'init', M_plus_init);
exitmsg = output.ExitMsg
fit = output.Fit
```

```
Iter    10, f(x) = 8.892216e+03, ||grad||_infty = 2.76e+02
Iter    20, f(x) = 4.959036e+03, ||grad||_infty = 1.11e+01
Iter    30, f(x) = 2.242329e+03, ||grad||_infty = 2.38e+02
Iter    40, f(x) = 3.662549e+02, ||grad||_infty = 1.06e+00
Iter    50, f(x) = 3.659779e+02, ||grad||_infty = 2.57e+00
Iter    60, f(x) = 3.656326e+02, ||grad||_infty = 1.08e+00
Iter    70, f(x) = 3.654654e+02, ||grad||_infty = 4.21e-01
Iter    80, f(x) = 3.653954e+02, ||grad||_infty = 3.34e-01
Iter    90, f(x) = 3.652616e+02, ||grad||_infty = 2.89e-01
Iter   100, f(x) = 3.652308e+02, ||grad||_infty = 2.58e-01
Iter   110, f(x) = 3.651562e+02, ||grad||_infty = 1.28e-01
Iter   120, f(x) = 3.651236e+02, ||grad||_infty = 1.60e-01
Iter   130, f(x) = 3.650886e+02, ||grad||_infty = 1.76e-01
Iter   140, f(x) = 3.650385e+02, ||grad||_infty = 1.85e-01
Iter   150, f(x) = 3.650276e+02, ||grad||_infty = 1.94e-01
Iter   160, f(x) = 3.650209e+02, ||grad||_infty = 9.43e-02
Iter   170, f(x) = 3.650180e+02, ||grad||_infty = 1.41e-01
Iter   180, f(x) = 3.650100e+02, ||grad||_infty = 3.56e-01
Iter   190, f(x) = 3.649907e+02, ||grad||_infty = 3.44e-01
Iter   200, f(x) = 3.649657e+02, ||grad||_infty = 8.67e-02
Iter   210, f(x) = 3.649354e+02, ||grad||_infty = 1.11e-01
Iter   220, f(x) = 3.649190e+02, ||grad||_infty = 1.34e-01
Iter   230, f(x) = 3.649054e+02, ||grad||_infty = 6.80e-02
Iter   240, f(x) = 3.648948e+02, ||grad||_infty = 6.81e-02
Iter   250, f(x) = 3.648863e+02, ||grad||_infty = 1.15e-01
Iter   260, f(x) = 3.648811e+02, ||grad||_infty = 1.21e-01
Iter   270, f(x) = 3.648779e+02, ||grad||_infty = 5.69e-02
Iter   280, f(x) = 3.648764e+02, ||grad||_infty = 1.97e-01
Iter   290, f(x) = 3.648731e+02, ||grad||_infty = 1.21e-01
Iter   300, f(x) = 3.648704e+02, ||grad||_infty = 4.21e-02
Iter   310, f(x) = 3.648685e+02, ||grad||_infty = 2.84e-02
Iter   320, f(x) = 3.648668e+02, ||grad||_infty = 2.99e-02
Iter   330, f(x) = 3.648652e+02, ||grad||_infty = 1.05e-01
Iter   340, f(x) = 3.648575e+02, ||grad||_infty = 2.25e-01
Iter   350, f(x) = 3.648373e+02, ||grad||_infty = 1.22e-01
Iter   360, f(x) = 3.648303e+02, ||grad||_infty = 2.17e-01
Iter   370, f(x) = 3.648141e+02, ||grad||_infty = 9.75e-02
Iter   380, f(x) = 3.647937e+02, ||grad||_infty = 5.31e-01
Iter   390, f(x) = 3.647843e+02, ||grad||_infty = 3.80e-01
Iter   400, f(x) = 3.647760e+02, ||grad||_infty = 2.96e-01
Iter   410, f(x) = 3.647681e+02, ||grad||_infty = 5.99e-01
Iter   420, f(x) = 3.647614e+02, ||grad||_infty = 6.62e-02
Iter   430, f(x) = 3.647592e+02, ||grad||_infty = 1.09e-01
Iter   440, f(x) = 3.647524e+02, ||grad||_infty = 3.90e-02
Iter   450, f(x) = 3.647500e+02, ||grad||_infty = 1.08e-01
Iter   460, f(x) = 3.647468e+02, ||grad||_infty = 1.61e-02
Iter   470, f(x) = 3.647459e+02, ||grad||_infty = 3.14e-02
Iter   480, f(x) = 3.647453e+02, ||grad||_infty = 1.55e-02
Iter   490, f(x) = 3.647450e+02, ||grad||_infty = 3.27e-02
Iter   500, f(x) = 3.647446e+02, ||grad||_infty = 4.96e-02
Iter   510, f(x) = 3.647443e+02, ||grad||_infty = 4.67e-02
Iter   520, f(x) = 3.647441e+02, ||grad||_infty = 7.30e-02
Iter   530, f(x) = 3.647440e+02, ||grad||_infty = 1.40e-02
Iter   540, f(x) = 3.647439e+02, ||grad||_infty = 2.47e-02
Iter   543, f(x) = 3.647439e+02, ||grad||_infty = 6.08e-03
```

exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'

fit = 99.0242

```
% Check the answer (1 is perfect)
scr = score(M_plus, M_true)
```

scr = 0.9990

## Nonnegative factorization

We can employ lower bounds to get a nonnegative factorization.

## Create an example problem.

Create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise. We select nonnegative factor matrices and lambdas. The `create_problem` function doesn't really know how to add noise without going negative, so we *hack* it to make the observed tensor nonnegative.

```
R = 5;
info = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10, ...
    'Factor_Generator', 'rand', 'Lambda_Generator', 'rand');
X = info.Data .* (info.Data > 0); % Force it to be nonnegative
M_true = info.Soln;
```

## Generate initial guess of the correct size

```
M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'rand');
```

## Call the `cp_opt` method

Here we specify a lower bound of zero with the last two arguments.

```
[M,M0,output] = cp_opt(X, R, 'init', M_init, 'lower', 0);
```

```
Iter    10, f(x) = 1.224958e+02, ||grad||_infty = 7.85e+00
Iter    20, f(x) = 5.944929e+01, ||grad||_infty = 4.63e+00
Iter    30, f(x) = 4.820681e+01, ||grad||_infty = 2.46e+00
Iter    40, f(x) = 4.559372e+01, ||grad||_infty = 3.49e+00
Iter    50, f(x) = 4.183102e+01, ||grad||_infty = 2.25e+00
Iter    60, f(x) = 4.038936e+01, ||grad||_infty = 1.45e+00
Iter    70, f(x) = 3.909901e+01, ||grad||_infty = 1.86e+00
Iter    80, f(x) = 3.834299e+01, ||grad||_infty = 7.54e-01
Iter    90, f(x) = 3.820908e+01, ||grad||_infty = 5.62e-01
Iter   100, f(x) = 3.811943e+01, ||grad||_infty = 4.09e-01
Iter   110, f(x) = 3.810099e+01, ||grad||_infty = 3.97e-01
Iter   120, f(x) = 3.809854e+01, ||grad||_infty = 3.42e-01
Iter   130, f(x) = 3.809783e+01, ||grad||_infty = 3.12e-01
Iter   140, f(x) = 3.809766e+01, ||grad||_infty = 3.14e-01
Iter   150, f(x) = 3.809763e+01, ||grad||_infty = 3.24e-01
Iter   160, f(x) = 3.809762e+01, ||grad||_infty = 3.25e-01
Iter   170, f(x) = 3.809762e+01, ||grad||_infty = 3.25e-01
Iter   171, f(x) = 3.809762e+01, ||grad||_infty = 3.26e-01
```
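With the lower bound in place, every entry of every factor matrix should be nonnegative. A quick sketch to verify this, using the ktensor fields `M.u` (the cell array of factor matrices) and `M.lambda` (the weights):

```matlab
% Verify that the solution respects the zero lower bound
for n = 1:numel(M.u)
    assert(all(M.u{n}(:) >= 0));
end
```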

## Check the output

```
exitmsg = output.ExitMsg
```

exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'

The fit is the percentage of the data that is explained by the model. Because we have noise, we do not expect the fit to be perfect.

```
fit = output.Fit
```

fit = 99.0244

## Evaluate the output

We can "score" the similarity between the model computed by CP-OPT and the true model. The `score` function on ktensor objects gives a score in [0,1], with 1 indicating a perfect match. Because we have noise, we do not expect the score to be perfect. See `doc score` for more details.

```
scr = score(M, M_true)
```

scr = 0.9797