## 12.6 Computational Problems with Very Large Portfolios

In principle, mean-variance portfolio analysis can be applied in situations
in which there is a very large number of risky assets (e.g., \(N=5,000\)).
However, there are a number of practical problems that can arise.
First, the computation of efficient portfolios requires inverting
the \(N\times N\) asset return covariance matrix \(\Sigma\).
When \(N\) is very large, inverting \(\Sigma\) can be computationally
burdensome. Second, the practical application of the theory requires
the estimation of \(\Sigma\). Recall, there are \(N\) variance
terms and \(N(N-1)/2\) unique covariance terms in \(\Sigma\).
When \(N=5,000\), there are \(12,502,500\) unique elements of \(\Sigma\)
to estimate. And since each estimated element of \(\Sigma\)
has estimation error, there is a tremendous amount of estimation error
in the estimate of \(\Sigma\). There is an additional problem
with the estimation of \(\Sigma\) using the sample covariance
matrix of asset returns when \(N\) is very large. If the number of
assets, \(N\), is greater than the number of sample observations, \(T\),
then the \(N\times N\) sample covariance matrix:
\[\begin{eqnarray*}
\hat{\Sigma} & = & \frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{R}_{t}-\hat{\mu})(\mathbf{R}_{t}-\hat{\mu})^{\prime},\\
\hat{\mu} & = & \frac{1}{T}\sum_{t=1}^{T}\mathbf{R}_{t},
\end{eqnarray*}\]
is only positive semi-definite and has rank less than \(N\). This
means that \(\hat{\Sigma}\) is not invertible, and so mean-variance
efficient portfolios cannot be uniquely computed. This problem can
arise often. For example, suppose \(N=5,000\). For the sample covariance
matrix to be full rank, you need at least \(T=5,000\) sample observations.
For daily data, this means you would need \(5,000/250=20\) years of
daily data.^{83}
For weekly data, you would need \(5,000/52=96.2\) years of weekly data.
For monthly data, you would need \(5,000/12=417\) years of monthly
data.
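
The counts in this paragraph can be checked with a few lines of R. The sketch below is illustrative arithmetic only: the variable names are hypothetical, and the figures of 250 trading days, 52 weeks, and 12 months per year are the ones used in the text.

```r
# Number of unique elements in an N x N covariance matrix
N = 5000
n.var = N                  # variances on the diagonal
n.cov = N * (N - 1) / 2    # unique covariances below the diagonal
n.unique = n.var + n.cov   # equals N * (N + 1) / 2
n.unique

# Years of data implied by T = N observations at each sampling frequency
obs.per.year = c(daily = 250, weekly = 52, monthly = 12)
years.needed = N / obs.per.year
round(years.needed, 1)
```

Because the number of unique elements grows like \(N^{2}/2\) while the required sample length grows like \(N\), both problems become worse quickly as assets are added.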

**Example 2.31 (Singular sample return covariance matrix)**

To illustrate the rank failure of \(\hat{\Sigma}\) that occurs
when the number of assets \(N\) is greater than the number of data
observations \(T\), consider computing \(\hat{\Sigma}\) for
the six Vanguard mutual funds in the **IntroCompFinR** data object
`VanguardPrices` using only five monthly observations. The fund names
and the sample period of the data are:

`## [1] "vfinx" "veurx" "veiex" "vbltx" "vbisx" "vpacx"`

`## [1] "Jan 1995" "Dec 2014"`

The simple returns and the sample covariance matrix based on the first
five return observations are computed as follows:

```
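library(PerformanceAnalytics)  # provides Return.calculate()
# simple monthly returns; na.omit() drops the initial row of NAs
VanguardRetS = na.omit(Return.calculate(VanguardPrices, method="simple"))
# sample covariance matrix from T = 5 observations on N = 6 funds
covhat = cov(VanguardRetS[1:5, ])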
```

A quick way to determine if \(\hat{\Sigma}\) is full rank
(and invertible) is to compute the Cholesky decomposition \(\hat{\Sigma}=\hat{\mathbf{C}}\hat{\mathbf{C}}^{\prime}\),
where \(\hat{\mathbf{C}}\) is a lower triangular matrix with non-negative
diagonal elements. If all of the diagonal elements of \(\hat{\mathbf{C}}\)
are positive then \(\hat{\Sigma}\) is positive definite, full
rank, and invertible. In R, the function `chol()` computes this
decomposition; note that it returns the upper triangular factor
\(\hat{\mathbf{C}}^{\prime}\).

Here, applying `chol()` to `covhat` returns an error indicating that
\(\hat{\Sigma}\) is not positive definite and is less than full rank.
If we try to invert \(\hat{\Sigma}\) using `solve()`, we also get an
error indicating that \(\hat{\Sigma}\) is not invertible.^{84}
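
The same checks can be sketched on simulated data, so that the code runs without the Vanguard data. This is a minimal sketch under stated assumptions, not the code behind the results described above: the returns are random normal draws, and the names `sim.ret`, `covhat.sim`, and `try.msg()` are hypothetical. Because \(\hat{\mu}\) is estimated from the same data, the rank of the sample covariance matrix is at most \(T-1\), so with \(T=5\) and \(N=6\) the rank should be 4.

```r
# Simulated returns (an assumption for illustration): T = 5 observations, N = 6 assets
set.seed(123)
n.obs = 5
n.assets = 6
sim.ret = matrix(rnorm(n.obs * n.assets, mean = 0.01, sd = 0.05),
                 nrow = n.obs, ncol = n.assets)
covhat.sim = cov(sim.ret)

# Numerical rank from the QR decomposition; at most T - 1 because of demeaning
rank.sim = qr(covhat.sim)$rank
rank.sim

# Eigenvalues in decreasing order: the smallest are zero up to rounding error
ev.sim = eigen(covhat.sim, symmetric = TRUE, only.values = TRUE)$values
ev.sim

# Hypothetical helper: evaluate an expression and return the error message, if any
try.msg = function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}
try.msg(chol(covhat.sim))   # Cholesky factorization attempt
try.msg(solve(covhat.sim))  # inversion attempt
```

On a numerically singular matrix both `chol()` and `solve()` typically fail, but whether an error is raised can depend on rounding, so the rank and eigenvalue checks are the more reliable diagnostics.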

\(\blacksquare\)

Because of these practical problems with using the sample covariance matrix
\(\hat{\Sigma}\) to compute mean-variance efficient portfolios
when \(N\) is large, alternative methods for estimating
\(\Sigma\) are needed. One such method, based on the
*Single Index Model* for returns, is presented in Chapter 16.