Generalized CP (GCP) Tensor Decomposition

This document outlines usage and examples for the generalized CP (GCP) tensor decomposition implmented in gcp_opt. GCP allows alternate objective functions besides sum of squared errors, which is the standard for CP. The code support both dense and sparse input tensors, but the sparse input tensors require randomized optimization methods. For some examples, see also GCP-OPT Examples with Amino Acids Dataset.

GCP is described in greater detail in the manuscripts:

Contents

Basic Usage

The idea of GCP is to use alternative objective functions. As such, the most important thing to specify is the objective function.

The command M = gcp_opt(X,R,'type',type) computes an estimate of the best rank-|R| generalized CP (GCP) decomposition of the tensor X for the specified generalized loss function specified by type. The input X can be a tensor or sparse tensor. The result M is a Kruskal tensor. Some options for the objective function are:

See Function Types for GCP for a complete list of options.

Manually specifying the loss function

Rather than specifying a type, the user has the option to explicitly specify the objection function, gradient, and lower bounds using the following options:

Note that the function must be able to work on vectors of x and m values.

Choice of Optimzation Method

The default optimization method is L-BFGS-B (bound-constrained limited-memory BFGS). To use this, install the third-party software:

The L-BFGS-B software can only be used for dense tensors. The other choice is to use a stochastic optimization method, either stochastic gradient descent (SGD) or ADAM. This can be used for dense or sparse tensors.

The command M = gcp_opt(X,R,...,'opt',opt) specifies the optimization method where opt is one of the following strings:

Each methods has parameters, which are described below.

Specifying Missing or Incomplete Data Using the Mask Option

If some entries of the tensor are unknown, the method can mask off that data during the fitting process. To do so, specify a mask tensor W that is the same size as the data tensor X. The mask tensor should be 1 if the entry in X is known and 0 otherwise. The call is M = gcp_opt(X,R,'type',type','mask',W).

Other Options

A few common options are as follows:

Specifying L-BFGS-B Parameters

In addition to the options above, there are two options used to modify the L-BFGS-B behavior.

It can sometimes be useful to increase or decrease pgtol depending on the objective function and size of the tensor.

Specifying SGD and ADAM Parameters

There are a number of parameters that can be adjusted for SGD and ADAM.

Stochastic Gradient. There are three different sampling methods for computing the stochastic gradient:

The options corresponding to these are as follows.

Estimating the Function. We also use sampling to estimate the function value.

[xsubs, xvals, wghts] = tt_sample_uniform(X, 10000);
fsampler = @() deal(xsubs, xvals, wghts);'

The 'stratified' sampler has an extra option: * 'oversample' - Factor to oversample when implicitly sampling zeros in the sparse case. Defaults to 1.1. Only adjust for very small tensors.

There are some other options that are needed for SGD, the learning rate and a decrease schedule. Our schedule is very simple - we decrease the rate if there is no improvement in the approximate function value after an epoch. After a specified number of decreases ('maxfails'), we quit.

There are some options that are specific to ADAM and generally needn't change:

Example on Gaussian distributed

We set up the example with known low-rank structure. Here nc is the rank and sz is the size.

clear
rng(4)
nc = 2;
sz = [50 60 70];
info = create_problem('Size',sz,'Num_Factors',nc);
X = info.Data;
M_true = info.Soln;
whos
  Name         Size                 Bytes  Class      Attributes

  M_true      50x60x70               3584  ktensor              
  X           50x60x70            1680376  tensor               
  info         1x1                1684312  struct               
  nc           1x1                      8  double               
  sz           1x3                     24  double               

Run GCP-OPT

tic, [M1,M0,out] = gcp_opt(X,nc,'type','normal','printitn',10); toc
fprintf('Final fit: %e (for comparison to f in CP-ALS)\n',1 - norm(X-full(M1))/norm(X));
fprintf('Score: %f\n',score(M1,M_true));
GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Generalized function Type: normal
Objective function: @(x,m)(m-x).^2
Gradient function: @(x,m)2.*(m-x)
Lower bound of factor matrices: -Inf
Optimization method: lbfgsb
Max iterations: 1000
Projected gradient tolerance: 21

Begin Main loop
Iter    10, f(x) = 9.397039e+04, ||grad||_infty = 2.88e+03
Iter    20, f(x) = 2.137406e+04, ||grad||_infty = 3.79e+01
Iter    30, f(x) = 2.092906e+04, ||grad||_infty = 1.64e+02
Iter    40, f(x) = 4.836752e+03, ||grad||_infty = 1.11e+03
Iter    47, f(x) = 1.320194e+03, ||grad||_infty = 1.73e+01
End Main Loop

Final objective: 1.3202e+03
Setup time: 0.01 seconds
Main loop time: 0.60 seconds
Outer iterations: 47
Total iterations: 117
L-BFGS-B Exit message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
Elapsed time is 0.611522 seconds.
Final fit: 9.004310e-01 (for comparison to f in CP-ALS)
Score: 0.999567

Compare to CP-ALS, which should usually be faster

tic, M2 = cp_als(X,nc,'init',tocell(M0),'printitn',1); toc
fprintf('Objective function: %e (for comparison to f(x) in GCP-OPT)\n', norm(X-full(M2))^2/prod(size(X)));
fprintf('Score: %f\n',score(M2,M_true));
CP_ALS:
 Iter  1: f = 5.755636e-01 f-delta = 5.8e-01
 Iter  2: f = 6.779797e-01 f-delta = 1.0e-01
 Iter  3: f = 8.776194e-01 f-delta = 2.0e-01
 Iter  4: f = 9.002202e-01 f-delta = 2.3e-02
 Iter  5: f = 9.005574e-01 f-delta = 3.4e-04
 Iter  6: f = 9.005586e-01 f-delta = 1.2e-06
 Final f = 9.005586e-01 
Elapsed time is 0.023634 seconds.
Objective function: 6.270533e-03 (for comparison to f(x) in GCP-OPT)
Score: 0.999536

Now let's try is with the ADAM functionality

tic, [M3,~,out] = gcp_opt(X,nc,'type','normal','opt','adam','init',M0,'printitn',1); toc
fprintf('Final fit: %e (for comparison to f in CP-ALS)\n',1 - norm(X-full(M1))/norm(X));
fprintf('Score: %f\n',score(M3,M_true));
GCP-OPT-ADAM (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Generalized function Type: normal
Objective function: @(x,m)(m-x).^2
Gradient function: @(x,m)2.*(m-x)
Lower bound of factor matrices: -Inf
Optimization method: adam
Max iterations (epochs): 1000
Iterations per epoch: 1000
Learning rate / decay / maxfails: 0.001 0.1 1
Function Sampler: uniform with 210000 samples
Gradient Sampler: uniform with 2100 samples

Begin Main loop
Initial f-est: 2.660878e+05
Epoch  1: f-est = 1.361652e+05, step = 0.001
Epoch  2: f-est = 1.330132e+05, step = 0.001
Epoch  3: f-est = 1.319047e+05, step = 0.001
Epoch  4: f-est = 2.539533e+04, step = 0.001
Epoch  5: f-est = 2.144132e+04, step = 0.001
Epoch  6: f-est = 2.403361e+03, step = 0.001
Epoch  7: f-est = 1.317062e+03, step = 0.001
Epoch  8: f-est = 1.316352e+03, step = 0.001
Epoch  9: f-est = 1.318167e+03, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 10: f-est = 1.313968e+03, step = 0.0001
Epoch 11: f-est = 1.314306e+03, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f-est: 1.3140e+03
Setup time: 0.05 seconds
Main loop time: 14.34 seconds
Total iterations: 11000
Elapsed time is 14.390320 seconds.
Final fit: 9.004310e-01 (for comparison to f in CP-ALS)
Score: 0.999683

Create an example Rayleigh tensor model and data instance.

Consider a tensor that is Rayleigh-distribued. This means its entries are all nonnegative. First, we generate such a tensor with low-rank structure.

clear
rng(65)
nc = 3;
sz = [50 60 70];
nd = length(sz);

% Create factor matrices that correspond to smooth sinusidal factors
U=cell(1,nd);
for k=1:nd
    V = 1.1 + cos(bsxfun(@times, 2*pi/sz(k)*(0:sz(k)-1)', 1:nc));
    U{k} = V(:,randperm(nc));
end
M_true = normalize(ktensor(U));
X = tenfun(@raylrnd, full(M_true));

Visualize the true solution

viz(M_true, 'Figure', 1)
ktensor/viz: Normalizing factors and sorting components according to the 2-norm.

ans = 

  struct with fields:

              height: 0.2933
               width: [3×1 double]
          GlobalAxis: [1×1 Axes]
          FactorAxes: [3×3 Axes]
    ModeTitleHandles: [3×1 Text]
    CompTitleHandles: [3×1 Text]
         PlotHandles: {3×3 cell}

Run GCP-OPT

tic, [M1,~,out] = gcp_opt(X,nc,'type','rayleigh','printitn',10); toc
fprintf('Score: %f\n',score(M1,M_true));
GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Generalized function Type: rayleigh
Objective function: @(x,m)2*log(m+1e-10)+(pi/4)*(x./(m+1e-10)).^2
Gradient function: @(x,m)2./(m+1e-10)-(pi/2)*x.^2./(m+1e-10).^3
Lower bound of factor matrices: 0
Optimization method: lbfgsb
Max iterations: 1000
Projected gradient tolerance: 21

Begin Main loop
Iter    10, f(x) = 9.142571e+05, ||grad||_infty = 1.41e+03
Positive dir derivative in projection 
Using the backtracking step
Iter    20, f(x) = 8.450604e+05, ||grad||_infty = 1.89e+03
Iter    30, f(x) = 7.770233e+05, ||grad||_infty = 1.41e+03
Iter    40, f(x) = 7.632798e+05, ||grad||_infty = 1.80e+03
Iter    50, f(x) = 7.580042e+05, ||grad||_infty = 1.10e+03
Iter    60, f(x) = 7.573270e+05, ||grad||_infty = 2.52e+02
Iter    70, f(x) = 7.572930e+05, ||grad||_infty = 7.99e+01
End Main Loop

Final objective: 7.5729e+05
Setup time: 0.01 seconds
Main loop time: 2.85 seconds
Outer iterations: 70
Total iterations: 165
L-BFGS-B Exit message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
Elapsed time is 2.852499 seconds.
Score: 0.795733

Visualize the solution from GCP-OPT

viz(M1, 'Figure', 2)
ktensor/viz: Normalizing factors and sorting components according to the 2-norm.

ans = 

  struct with fields:

              height: 0.2933
               width: [3×1 double]
          GlobalAxis: [1×1 Axes]
          FactorAxes: [3×3 Axes]
    ModeTitleHandles: [3×1 Text]
    CompTitleHandles: [3×1 Text]
         PlotHandles: {3×3 cell}

Now let's try is with the scarce functionality - this leaves out all but 10% of the data!

tic, [M2,~,out] = gcp_opt(X,nc,'type','rayleigh','opt','adam'); toc
fprintf('Final fit: %e (for comparison to f in CP-ALS)\n',1 - norm(X-full(M1))/norm(X));
fprintf('Score: %f\n',score(M2,M_true));
GCP-OPT-ADAM (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Generalized function Type: rayleigh
Objective function: @(x,m)2*log(m+1e-10)+(pi/4)*(x./(m+1e-10)).^2
Gradient function: @(x,m)2./(m+1e-10)-(pi/2)*x.^2./(m+1e-10).^3
Lower bound of factor matrices: 0
Optimization method: adam
Max iterations (epochs): 1000
Iterations per epoch: 1000
Learning rate / decay / maxfails: 0.001 0.1 1
Function Sampler: uniform with 210000 samples
Gradient Sampler: uniform with 2100 samples

Begin Main loop
Initial f-est: 2.715221e+06
Epoch  1: f-est = 1.017441e+06, step = 0.001
Epoch  2: f-est = 9.204216e+05, step = 0.001
Epoch  3: f-est = 8.791755e+05, step = 0.001
Epoch  4: f-est = 8.496629e+05, step = 0.001
Epoch  5: f-est = 8.276276e+05, step = 0.001
Epoch  6: f-est = 8.053227e+05, step = 0.001
Epoch  7: f-est = 7.839439e+05, step = 0.001
Epoch  8: f-est = 7.710536e+05, step = 0.001
Epoch  9: f-est = 7.653168e+05, step = 0.001
Epoch 10: f-est = 7.619699e+05, step = 0.001
Epoch 11: f-est = 7.600227e+05, step = 0.001
Epoch 12: f-est = 7.590060e+05, step = 0.001
Epoch 13: f-est = 7.585602e+05, step = 0.001
Epoch 14: f-est = 7.583133e+05, step = 0.001
Epoch 15: f-est = 7.582559e+05, step = 0.001
Epoch 16: f-est = 7.582295e+05, step = 0.001
Epoch 17: f-est = 7.582587e+05, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 18: f-est = 7.581745e+05, step = 0.0001
Epoch 19: f-est = 7.581687e+05, step = 0.0001
Epoch 20: f-est = 7.581605e+05, step = 0.0001
Epoch 21: f-est = 7.581473e+05, step = 0.0001
Epoch 22: f-est = 7.581537e+05, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f-est: 7.5815e+05
Setup time: 0.08 seconds
Main loop time: 40.89 seconds
Total iterations: 22000
Elapsed time is 40.977851 seconds.
Final fit: 5.380785e-01 (for comparison to f in CP-ALS)
Score: 0.797088

Visualize the solution with scarce

viz(M2, 'Figure', 3)
ktensor/viz: Normalizing factors and sorting components according to the 2-norm.

ans = 

  struct with fields:

              height: 0.2933
               width: [3×1 double]
          GlobalAxis: [1×1 Axes]
          FactorAxes: [3×3 Axes]
    ModeTitleHandles: [3×1 Text]
    CompTitleHandles: [3×1 Text]
         PlotHandles: {3×3 cell}

Boolean tensor.

The model will predict the odds of observing a 1. Recall that the odds related to the probability as follows. If $p$ is the probability adn $r$ is the odds, then $r = p / (1-p)$. Higher odds indicates a higher probability of observing a one.

clear
rng(7639)
nc = 3; % Number of components
sz = [50 60 70]; % Tensor size
nd = length(sz); % Number of dimensions

We assume that the underlying model tensor has factor matrices with only a few "large" entries in each column. The small entries should correspond to a low but nonzero entry of observing a 1, while the largest entries, if multiplied together, should correspond to a very high likelihood of observing a 1.

probrange = [0.01 0.99]; % Absolute min and max of probabilities
oddsrange = probrange ./ (1 - probrange);
smallval = nthroot(min(oddsrange)/nc,nd);
largeval = nthroot(max(oddsrange)/nc,nd);

A = cell(nd,1);
for k = 1:nd
    A{k} = smallval * ones(sz(k), nc);
    nbig = 5;
    for j = 1:nc
        p = randperm(sz(k));
        A{k}(p(1:nbig),j) = largeval;
    end
end
M_true = ktensor(A);

Convert K-tensor to an observed tensor Get the model values, which correspond to odds of observing a 1

Mfull = full(M_true);
% Convert odds to probabilities
Mprobs = Mfull ./ (1 + Mfull);
% Flip a coin for each entry, with the probability of observing a one
% dictated by Mprobs
Xfull = 1.0*(tensor(@rand, sz) < Mprobs);
% Convert to sparse tensor, real-valued 0/1 tensor since it was constructed
% to be sparse
X = sptensor(Xfull);
fprintf('Proportion of nonzeros in X is %.2f%%\n', 100*nnz(X) / prod(sz));
Proportion of nonzeros in X is 8.42%

Just for fun, let's visualize the distribution of the probabilities in the model tensor.

histogram(Mprobs(:))

Call GCP_OPT on the full tensor

[M1,~,out] = gcp_opt(Xfull, nc, 'type', 'binary','printitn',25);
fprintf('Final score: %f\n', score(M1,M_true));
GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Generalized function Type: binary
Objective function: @(x,m)log(m+1)-x.*log(m+1e-10)
Gradient function: @(x,m)1./(m+1)-x./(m+1e-10)
Lower bound of factor matrices: 0
Optimization method: lbfgsb
Max iterations: 1000
Projected gradient tolerance: 21

Begin Main loop
Iter    25, f(x) = 4.498422e+04, ||grad||_infty = 7.66e+01
Iter    50, f(x) = 4.323442e+04, ||grad||_infty = 1.27e+02
Iter    62, f(x) = 4.309850e+04, ||grad||_infty = 1.95e+01
End Main Loop

Final objective: 4.3098e+04
Setup time: 0.01 seconds
Main loop time: 1.45 seconds
Outer iterations: 62
Total iterations: 140
L-BFGS-B Exit message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
Final score: 0.739944

GCP-OPT as sparse tensor

[M2,~,out] = gcp_opt(X, nc, 'type', 'binary');
fprintf('Final score: %f\n', score(M2,M_true));
GCP-OPT-ADAM (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
Sparse tensor: 17690 (8.4%) Nonzeros and 192310 (91.58%) Zeros
Generalized function Type: binary
Objective function: @(x,m)log(m+1)-x.*log(m+1e-10)
Gradient function: @(x,m)1./(m+1)-x./(m+1e-10)
Lower bound of factor matrices: 0
Optimization method: adam
Max iterations (epochs): 1000
Iterations per epoch: 1000
Learning rate / decay / maxfails: 0.001 0.1 1
Function Sampler: stratified with 17690 nonzero and 17690 zero samples
Gradient Sampler: stratified with 1000 nonzero and 1000 zero samples

Begin Main loop
Initial f-est: 7.428218e+04
Epoch  1: f-est = 4.717953e+04, step = 0.001
Epoch  2: f-est = 4.654388e+04, step = 0.001
Epoch  3: f-est = 4.622779e+04, step = 0.001
Epoch  4: f-est = 4.556995e+04, step = 0.001
Epoch  5: f-est = 4.513884e+04, step = 0.001
Epoch  6: f-est = 4.497412e+04, step = 0.001
Epoch  7: f-est = 4.478373e+04, step = 0.001
Epoch  8: f-est = 4.417746e+04, step = 0.001
Epoch  9: f-est = 4.361799e+04, step = 0.001
Epoch 10: f-est = 4.347086e+04, step = 0.001
Epoch 11: f-est = 4.341842e+04, step = 0.001
Epoch 12: f-est = 4.338054e+04, step = 0.001
Epoch 13: f-est = 4.341225e+04, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f-est = 4.336923e+04, step = 0.0001
Epoch 15: f-est = 4.336528e+04, step = 0.0001
Epoch 16: f-est = 4.335960e+04, step = 0.0001
Epoch 17: f-est = 4.336279e+04, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f-est: 4.3360e+04
Setup time: 0.01 seconds
Main loop time: 33.04 seconds
Total iterations: 17000
Final score: 0.836891

Create and test a Poisson count tensor.

nc = 3;
sz = [80 90 100];
nd = length(sz);
paramRange = [0.5 60];
factorRange = paramRange.^(1/nd);
minFactorRatio = 95/100;
lambdaDamping = 0.8;
rng(21);
info = create_problem('Size', sz, ...
    'Num_Factors', nc, ...
    'Factor_Generator', @(m,n)factorRange(1)+(rand(m,n)>minFactorRatio)*(factorRange(2)-factorRange(1)), ...
    'Lambda_Generator', @(m,n)ones(m,1)*(lambdaDamping.^(0:n-1)'), ...
    'Sparse_Generation', 0.2);

M_true = normalize(arrange(info.Soln));
X = info.Data;
viz(M_true, 'Figure',3);
ktensor/viz: Normalizing factors and sorting components according to the 2-norm.

Loss function for Poisson negative log likelihood with identity link.

% Call GCP_OPT on sparse tensor
[M1,M0,out] = gcp_opt(X, nc, 'type', 'count','printitn',25);
fprintf('Final score: %f\n', score(M1,M_true));
GCP-OPT-ADAM (Generalized CP Tensor Decomposition)

Tensor size: 80 x 90 x 100 (720000 total entries)
Sparse tensor: 123856 (17%) Nonzeros and 596144 (82.80%) Zeros
Generalized function Type: count
Objective function: @(x,m)m-x.*log(m+1e-10)
Gradient function: @(x,m)1-x./(m+1e-10)
Lower bound of factor matrices: 0
Optimization method: adam
Max iterations (epochs): 1000
Iterations per epoch: 1000
Learning rate / decay / maxfails: 0.001 0.1 1
Function Sampler: stratified with 100000 nonzero and 100000 zero samples
Gradient Sampler: stratified with 1000 nonzero and 1000 zero samples

Begin Main loop
Initial f-est: 4.721309e+05
Epoch 14: f-est = 3.448798e+05, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 18: f-est = 3.447516e+05, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f-est: 3.4475e+05
Setup time: 0.13 seconds
Main loop time: 34.05 seconds
Total iterations: 18000
Final score: 0.954415