# Alternating randomized least squares for CP Decomposition

The function `cp_arls` computes an estimate of the best rank-R CP model of a tensor X using alternating *randomized* least-squares algorithm. The input X must be a (dense) `tensor`. The output CP model is a `ktensor`. The CP-ARLS method is described in the following reference:

- C. Battaglino, G. Ballard, T. G. Kolda. A Practical Randomized CP Tensor Decomposition. SIAM Journal on Matrix Analysis and Applications 39(2):876-901, 2018. https://doi.org/10.1137/17M1112303

## Contents

- Set up a sample problem
- Running the CP-ARLS method
- Speed things up by skipping the initial mixing
- Comparing with CP-ALS
- How well does the approximate fit do?
- Varying epoch size
- Set up another sample problem
- Terminating once a desired fit is achieved
- Changing the number of function evaluation samples
- Change the number of sampled rows in least squares solve

## Set up a sample problem

We set up an especially difficult and somewhat large sample problem that has high collinearity (0.9) and 1% noise. This is an example where the randomized method will generally outperform the standard method.

rng('default') %<- Setting random seed for reproducibility of this script sz = [200 300 400]; R = 5; ns = 0.01; coll = 0.9; info = create_problem('Size', sz, 'Num_Factors', R, 'Noise', ns, ... 'Factor_Generator', @(m,n) matrandcong(m,n,coll), ... 'Lambda_Generator', @ones); % Extract data and solution X = info.Data; M_true = info.Soln;

## Running the CP-ARLS method

Running the method is essentially the same as using CP-ALS, feed the data matrix and the desired rank. Note that the iteration is of the form NxN which is the number of epochs x the number of iterations per epoch. The default number of iterations per epoch is 50. At the end of each epoch, we check the convergence criteria. Because this is a randomized method, we do not achieve strict decrease in the objective function. Instead, we look at the number of epochs without improvement (newi) and exit when this crosses the predefined tolerance (`newitol`), which defaults to 5. It is important to note that the fit values that are reported are approximate, so this is why it is denoted by `f~` rather than just `f`.

tic [M1, ~, out1] = cp_arls(X,R); time1 = toc; scr1 = score(M1,M_true); fprintf('\n*** Results for CP-ARLS (with mixing) ***\n'); fprintf('Time (secs): %.3f\n', time1) fprintf('Score (max=1): %.3f\n', scr1);

CP-ARLS (with mixing): Iter 10x5: f~ = 9.879516e-01 newi = 0 Iter 20x5: f~ = 9.889153e-01 newi = 0 Iter 30x5: f~ = 9.892952e-01 newi = 2 Iter 39x5: f~ = 9.894103e-01 newi = 5 *** Results for CP-ARLS (with mixing) *** Time (secs): 1.957 Score (max=1): 0.841

## Speed things up by skipping the initial mixing

The default behavior is to mix the data in each mode using an FFT and diagonal random +/-1 matrix. This may add substantial preprocessing time, though it helps to ensure that the method converges. Oftentimes, such as with randomly-generated data, the mixing is not necessary.

tic [M2, ~, out2] = cp_arls(X,R,'mix',false); time2 = toc; scr2 = score(M2,M_true); fprintf('\n*** Results for CP-ARLS (no mix) ***\n'); fprintf('Time (secs): %.3f\n', time2) fprintf('Score (max=1): %.3f\n', scr2);

CP_ARLS (without mixing): Iter 10x5: f~ = 9.754541e-01 newi = 0 Iter 20x5: f~ = 9.877080e-01 newi = 0 Iter 27x5: f~ = 9.877772e-01 newi = 5 *** Results for CP-ARLS (no mix) *** Time (secs): 0.781 Score (max=1): 0.715

## Comparing with CP-ALS

CP-ALS may be somewhat faster, especially since this is a relatively small problem, but it usually will not achieve as good of an answer in terms of the score.

tic; [M3, ~, out3] = cp_als(X,R,'maxiters',500,'printitn',10); time3 = toc; scr3 = score(M3,M_true); fprintf('\n*** Results for CP-ALS ***\n'); fprintf('Time (secs): %.3f\n', time3) fprintf('Score (max=1): %.3f\n', scr3);

CP_ALS: Iter 10: f = 9.650407e-01 f-delta = 3.1e-04 Iter 20: f = 9.670704e-01 f-delta = 2.5e-04 Iter 30: f = 9.717365e-01 f-delta = 1.4e-04 Iter 32: f = 9.719407e-01 f-delta = 9.4e-05 Final f = 9.719407e-01 *** Results for CP-ALS *** Time (secs): 1.960 Score (max=1): 0.370

## How well does the approximate fit do?

It is possible to check the accuracy of the fit computation by having the code compute the true fit and the final solution, enabled by the `truefit` option.

```
[M4,~,out4] = cp_arls(X,R,'truefit',true);
```

CP-ARLS (with mixing): Iter 10x5: f~ = 9.885802e-01 newi = 0 Iter 20x5: f~ = 9.891681e-01 newi = 0 Iter 30x5: f~ = 9.894600e-01 newi = 0 Iter 40x5: f~ = 9.895941e-01 newi = 0 Iter 49x5: f~ = 9.896514e-01 newi = 5 Final fit = 9.895394e-01 Final estimated fit = 9.896870e-01

## Varying epoch size

It is possible to vary that number of iterations per epoch. Fewer iterations means that more time is spent checking for convergence and it may also be harder to detect as an single iteration can have some fluctuation and we are actually looking for the overall trend. In contrast, too many iterations means that the method won't realize when it has converged and may spend too much time computing.

tic M = cp_arls(X,R,'epoch',1,'newitol',20); toc fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 10x1: f~ = 9.771743e-01 newi = 0 Iter 20x1: f~ = 9.828682e-01 newi = 0 Iter 30x1: f~ = 9.861844e-01 newi = 0 Iter 40x1: f~ = 9.880263e-01 newi = 0 Iter 50x1: f~ = 9.885041e-01 newi = 0 Iter 60x1: f~ = 9.886651e-01 newi = 0 Iter 70x1: f~ = 9.888218e-01 newi = 0 Iter 80x1: f~ = 9.888750e-01 newi = 1 Iter 90x1: f~ = 9.889503e-01 newi = 2 Iter 100x1: f~ = 9.890277e-01 newi = 0 Iter 110x1: f~ = 9.892037e-01 newi = 0 Iter 120x1: f~ = 9.891564e-01 newi = 10 Iter 130x1: f~ = 9.892404e-01 newi = 1 Iter 140x1: f~ = 9.892272e-01 newi = 7 Iter 150x1: f~ = 9.893753e-01 newi = 0 Iter 160x1: f~ = 9.894109e-01 newi = 0 Iter 170x1: f~ = 9.894410e-01 newi = 0 Iter 180x1: f~ = 9.894406e-01 newi = 8 Iter 190x1: f~ = 9.894969e-01 newi = 7 Iter 200x1: f~ = 9.894815e-01 newi = 2 Iter 210x1: f~ = 9.895155e-01 newi = 12 Iter 220x1: f~ = 9.895438e-01 newi = 4 Iter 230x1: f~ = 9.895391e-01 newi = 3 Iter 240x1: f~ = 9.895673e-01 newi = 2 Iter 250x1: f~ = 9.895374e-01 newi = 8 Iter 260x1: f~ = 9.896127e-01 newi = 3 Iter 270x1: f~ = 9.896080e-01 newi = 9 Iter 280x1: f~ = 9.895872e-01 newi = 19 Iter 281x1: f~ = 9.895949e-01 newi = 20 Elapsed time is 2.928779 seconds. Score: 0.9285

tic M = cp_arls(X,R,'epoch',200,'newitol',3,'printitn',2); toc fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 2x200: f~ = 9.895845e-01 newi = 0 Iter 4x200: f~ = 9.896783e-01 newi = 0 Iter 6x200: f~ = 9.897641e-01 newi = 0 Iter 8x200: f~ = 9.896691e-01 newi = 2 Iter 9x200: f~ = 9.896866e-01 newi = 3 Elapsed time is 16.179026 seconds. Score: 0.9926

## Set up another sample problem

We set up another problem with 10% noise, but no collinearity.

sz = [200 300 400]; R = 5; ns = 0.10; info = create_problem('Size', sz, 'Num_Factors', R, 'Noise', ns, ... 'Factor_Generator', @rand,'Lambda_Generator', @ones); % Extract data and solution X = info.Data; M_true = info.Soln;

## Terminating once a desired fit is achieved

If we know the noise level is 10%, we would expect a fit of 0.90 at best. So, we can set a threshold that is close to that and terminate as soon as we achieve that accuracy. Since detecting convergence is hard for a randomized method, this can lead to speed ups. However, if the fit is not high enough, the accuracy may suffer consequently.

M = cp_arls(X,R,'newitol',20,'fitthresh',0.895,'truefit',true); fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 4x5: f~ = 8.963288e-01 newi = 0 Final fit = 8.962945e-01 Final estimated fit = 8.963288e-01 Score: 0.8962

## Changing the number of function evaluation samples

The function evaluation is approximate and based on sampling the number of entries specified by `nsampfit`. If this is too small, the samples will not be accurate enough. If this is too large, the computation will take too long. The default is , which should generally be sufficient. It may sometimes be possible to use smaller values. The same sampled entries are used for every convergence check --- we do not resample to check other entries.

M = cp_arls(X,R,'truefit',true,'nsampfit',100); fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 8x5: f~ = 8.987134e-01 newi = 5 Final fit = 9.032629e-01 Final estimated fit = 9.008413e-01 Score: 0.8823

## Change the number of sampled rows in least squares solve

The default number of sampled rows for the least squares solves is `ceil(10*R*log2( R ))`. This seemed to work well in most tests, but this can be varied higher or lower. For R=5, this means we sample 117 rows per solve. The rows are different for every least squares problem. Let's see what happens if we reduce this to 10.

M = cp_arls(X,R,'truefit',true,'nsamplsq',10); fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 9x5: f~ = 7.281605e-01 newi = 5 Final fit = 7.324549e-01 Final estimated fit = 7.296237e-01 Score: 0.1827

What if we use 25?

M = cp_arls(X,R,'truefit',true,'nsamplsq',25); fprintf('Score: %.4f\n',score(M,M_true));

CP-ARLS (with mixing): Iter 10x5: f~ = 8.780384e-01 newi = 1 Iter 14x5: f~ = 8.803480e-01 newi = 5 Final fit = 8.806791e-01 Final estimated fit = 8.809336e-01 Score: 0.8648