# Sparse Versus Scarce

Many thanks to Jed Duersch for discussion of these concepts and contributions to this post.

Sometimes the term sparse is used to refer to a matrix that has a large fraction of missing entries, but the more typical usage of that term is to refer to a matrix that has a large fraction of zero entries.

Unfortunately, these concepts are often conflated for a variety of reasons. For instance, a common technique for replacing missing entries is to replace them by zeros. Assuming the data is centered, i.e., has mean zero, this may be a reasonable approach in some situations. Conversely, zeros are sometimes used as indicators of missingness, though this makes it impossible to identify known zeros and is not recommended!

We want to develop terminology and approaches for real-world data where both zeros and missing entries can be prevalent, and we need to be able to sort out which is which as well as store the information efficiently.

For the purposes of this discussion, we have matrices of size $m \times n$ with $p$ known nonzero entries, $q$ known zero entries, and $r$ unknown/missing entries. Necessarily, $p+q+r=mn$. We use the notation $\cong$ to indicate being approximately the same size and the notation $\ll$ to indicate at least one order of magnitude smaller.
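To make the bookkeeping concrete, here is a minimal sketch that counts $p$, $q$, and $r$ for a small matrix. It assumes missing entries are marked with `None`; that marker and the helper name `count_entries` are illustrative choices for this post, not part of any library.

```python
def count_entries(rows):
    """Return (p, q, r): known nonzeros, known zeros, and missing entries.

    Assumption: a missing entry is represented by None.
    """
    p = q = r = 0
    for row in rows:
        for value in row:
            if value is None:
                r += 1          # unknown / missing
            elif value == 0:
                q += 1          # known zero
            else:
                p += 1          # known nonzero
    return p, q, r


# A 2 x 3 example with two nonzeros, two known zeros, and two missing entries.
A = [
    [1.5, 0.0, None],
    [0.0, None, 2.0],
]
p, q, r = count_entries(A)
m, n = len(A), len(A[0])
assert p + q + r == m * n
```

The point of keeping `None` distinct from `0.0` is exactly the one made above: once zeros double as missing markers, $q$ and $r$ can no longer be told apart.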

# Dense Matrices with No Missing Data

If the matrix is dense and has no missing entries, we store it the usual way, i.e., in an array of length $mn$ with an explicit value for each entry.

• Storage for matrix: $mn$
• Storage for location of missing entries: N/A

# Dense Matrices with Some Missing Data

If the matrix is dense and has a few missing entries, we store it the usual way, i.e., in an array of length $mn$ with an explicit value for each entry. The missing entries can be any value, but we recommend zero since it’s convenient for certain computations that are common in such a scenario. We have to separately store the pattern of missing entries. We could store the locations as an array or hash table, but in this case it is just as easy to store it as a dense 0-1 indicator matrix.
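As a rough sketch of this layout, the snippet below splits a matrix into a flat value array (with zeros as placeholders) and a flat 0-1 indicator array of the same length. Two assumptions are mine: missing inputs are marked with `None`, and the indicator uses 1 for known and 0 for missing (the complementary convention works equally well). The function names are hypothetical.

```python
def split_values_and_mask(rows):
    """Flatten a matrix into (values, known), each of length m*n.

    values: floats, with 0.0 stored wherever the entry is missing
    known:  1 if the entry is known, 0 if it is missing
    """
    values, known = [], []
    for row in rows:
        for v in row:
            values.append(0.0 if v is None else float(v))
            known.append(0 if v is None else 1)
    return values, known


def mean_of_known(values, known):
    """Mean over known entries only."""
    count = sum(known)
    if count == 0:
        raise ValueError("no known entries")
    # Zero placeholders contribute nothing to the sum, so the values
    # array can be summed directly without consulting the mask.
    return sum(values) / count


values, known = split_values_and_mask([[1.0, None], [0.0, 3.0]])
average = mean_of_known(values, known)
```

This is one reason zero is a convenient placeholder: sums and inner products over the value array already ignore the missing entries, and the indicator is only needed for the count.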

• Storage for matrix: $mn$
• Storage for location of missing entries: $mn$

# Sparse Matrices with No Missing Data

We say that a matrix is sparse if $p \ll mn$. In this case, it’s more efficient to store and operate on only the $p$ nonzeros and their locations rather than explicitly storing the zeros. Implicit elements are defined to be zero, and we can infer the locations of the $q \cong mn$ zero entries from the locations of the nonzeros.
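One common way to store only the nonzeros and their locations is coordinate (COO) format: three parallel arrays of row indices, column indices, and values. The sketch below is my own minimal version of a matrix-vector product in that format, not a reference implementation; the name `coo_matvec` is hypothetical.

```python
def coo_matvec(m, rows, cols, vals, x):
    """Compute y = A @ x for an m-row matrix A stored in COO format.

    rows, cols, vals are parallel lists describing the p nonzeros.
    Entries that are not listed are implicitly zero and never touched.
    """
    y = [0.0] * m
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y


# A = [[0, 2, 0],
#      [1, 0, 3]]
rows = [0, 1, 1]
cols = [1, 0, 2]
vals = [2.0, 1.0, 3.0]
y = coo_matvec(2, rows, cols, vals, [1.0, 1.0, 1.0])
```

The loop runs over the $p$ stored entries only, so both storage and work scale with $p$ rather than $mn$.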

• Storage for matrix: $\mathcal{O}(p)$
• Storage for location of missing entries: N/A

# Scarce Matrices

I propose the term scarce for the case $p+q \ll mn$, that is, when very few entries are known (and some of those may be zero). The relationship between $p$ and $q$ is irrelevant to the storage in this case. We store the $p+q$ known entries and their locations, requiring $\mathcal{O}(p+q)$ storage. The (few) zeros are stored explicitly. Implicit entries are defined to be unknown, and we can infer the locations of the $r \cong mn$ missing entries from the locations of the known entries.
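A scarce matrix can be sketched with the same machinery as a sparse one; what changes is the meaning of an absent entry. The class below is a hypothetical illustration (the name `ScarceMatrix` and its methods are mine) that keeps known entries in a dictionary, explicit zeros included, and reports anything else as missing via `None`.

```python
class ScarceMatrix:
    """Stores only the known entries; anything not stored is missing."""

    def __init__(self, m, n):
        self.shape = (m, n)
        self.known = {}  # (i, j) -> value, explicit zeros included

    def set(self, i, j, value):
        self.known[(i, j)] = float(value)

    def get(self, i, j):
        # None signals "missing", which is distinct from a stored 0.0.
        return self.known.get((i, j))

    def num_missing(self):
        m, n = self.shape
        return m * n - len(self.known)


S = ScarceMatrix(1000, 1000)
S.set(3, 4, 2.5)   # known nonzero
S.set(10, 20, 0)   # known zero, stored explicitly
```

Here `S.get(10, 20)` returns `0.0` while `S.get(0, 0)` returns `None`, so a known zero and a missing entry stay distinguishable at $\mathcal{O}(p+q)$ storage.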

• Storage for matrix: $\mathcal{O}(p+q)$
• Storage for location of missing entries: 0

# Sparse Matrices with Some Missing Data

Such a matrix isn’t scarce: the majority of its entries are known, and they just happen to be mostly zeros. In this case, $p+r \ll mn$. We can separately store the $p$ nonzero entries and their locations and the $r$ missing locations, using a total of $\mathcal{O}(p+r)$ storage. The known zeros are stored implicitly, and we infer the locations of the known zeros from the locations of the known nonzero and missing entries.
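A minimal sketch of this three-state layout, assuming a dictionary for the nonzeros and a set for the missing locations (both are my choices of container, and the function names are hypothetical). The two structures are assumed to be disjoint.

```python
def lookup(nonzeros, missing, i, j):
    """Return the entry at (i, j): a nonzero value, 0.0, or None if missing.

    nonzeros: dict mapping (i, j) -> nonzero value   (p entries)
    missing:  set of (i, j) locations with no data   (r entries)
    """
    if (i, j) in missing:
        return None
    # Not missing and not stored as a nonzero: an implicit known zero.
    return nonzeros.get((i, j), 0.0)


def count_known_zeros(m, n, nonzeros, missing):
    """q is never stored; it follows from p + q + r = m * n."""
    return m * n - len(nonzeros) - len(missing)


nonzeros = {(0, 1): 4.0, (2, 2): -1.0}
missing = {(1, 1)}
```

Every lookup checks the missing set first, so the known zeros cost no storage at all.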

• Storage for matrix: $\mathcal{O}(p)$
• Storage for location of missing entries: $\mathcal{O}(r)$

# Sparse with Mix of Known Zeros and Missing Data
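In this regime the missingness pattern itself takes on the order of $mn$ bits, and the paragraph that follows mentions packing those bits as a way to save space. A minimal sketch of such a packed indicator, using a plain `bytearray` with eight flags per byte (the helper names `pack_bits` and `get_bit` are hypothetical):

```python
def pack_bits(flags):
    """Pack a sequence of 0/1 flags into a bytearray, 8 flags per byte."""
    packed = bytearray((len(flags) + 7) // 8)
    for k, flag in enumerate(flags):
        if flag:
            packed[k >> 3] |= 1 << (k & 7)
    return packed


def get_bit(packed, k):
    """Read flag k back out of the packed representation."""
    return (packed[k >> 3] >> (k & 7)) & 1


# 64 missingness flags for a flattened 8 x 8 matrix: every third entry missing.
flags = [1 if k % 3 == 0 else 0 for k in range(64)]
packed = pack_bits(flags)
```

The 64 flags occupy 8 bytes when packed, versus 256 bytes as 32-bit integers or 512 bytes as 64-bit doubles.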

If $p \ll mn$ but $q \cong mn$ and $r \cong mn$, then we have a difficult situation. In order to track the pattern of missing data, we require $mn$ storage. In this case, it’s probably best just to revert to the “Dense Matrices with Some Missing Data” case. However, it is possible to save space by storing the nonzeros and their locations along with a compressed binary indicator matrix. We can get 32-64X compression simply by packing the binary entries into bits rather than storing them as regular 32-bit integers or 64-bit doubles. If there is some pattern to the missingness (like every third entry is missing), then exploiting it is another way to achieve compression. But generally there is no efficient way to handle this situation.

# Use Cases

Exactly how the missing data is to be incorporated may impact the choice of data structures.

Mask. In some cases, the locations of missing entries may need to be masked to zero so that they are omitted from a computation. In this case, we would ideally have a 0/1 matrix that is efficient to multiply with.
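As an illustration of masking, consider a squared-error fit between the data and a model, where only known entries should contribute. The sketch below assumes flattened arrays and a mask with 1 for known and 0 for missing; the function name is hypothetical.

```python
def masked_squared_error(data, model, mask):
    """Sum of (data - model)^2 over entries where mask == 1.

    All three arguments are flat sequences of the same length.
    Multiplying by the 0/1 mask zeroes out the missing locations.
    """
    return sum(w * (a - b) ** 2 for a, b, w in zip(data, model, mask))


data = [1.0, 0.0, 2.0, 0.0]   # last entry is a zero placeholder for missing
model = [0.5, 0.5, 2.0, 9.0]
mask = [1, 1, 1, 0]
err = masked_squared_error(data, model, mask)
```

Without the mask, the placeholder in the last position would be treated as a known zero and would dominate the error.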

Selection. In other cases, we need to select randomly amongst the known entries. This is used, for instance, in stochastic gradient descent. We may need to be able to quickly determine if a selected entry is missing so that we can do rejection sampling. This entails usage of efficient data structures such as hash tables.
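A minimal sketch of rejection sampling over the known entries, assuming the missing locations are kept in a Python `set` (a hash table) so that each membership test takes constant time on average. The function name and signature are hypothetical.

```python
import random


def sample_known_entry(m, n, missing, rng):
    """Draw a uniformly random (i, j) among the known entries.

    missing: set of (i, j) locations with no data
    rng:     a random.Random instance
    Candidates that land on a missing location are rejected and redrawn.
    """
    if len(missing) >= m * n:
        raise ValueError("every entry is missing")
    while True:
        i = rng.randrange(m)
        j = rng.randrange(n)
        if (i, j) not in missing:
            return i, j


rng = random.Random(0)
missing = {(0, 0), (1, 1)}
i, j = sample_known_entry(3, 3, missing, rng)
```

Rejection works well when most entries are known; in the scarce case nearly every draw would be rejected, and it is better to sample directly from the stored list of known locations.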

# Commentary: How Sparse is Sparse?

I usually recommend that a matrix should not be considered sparse unless fewer than 10% of its entries are nonzero. Even if the storage is reduced by using a sparse representation, the computations are much less efficient and so dense storage is often preferable. It’s only when the storage savings is an order of magnitude or more that a sparse representation makes sense.
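A back-of-the-envelope check of the storage side of this rule, under assumptions of my own: coordinate format with 8-byte values and two 4-byte indices per nonzero, compared against a dense array of 8-byte values. Other sparse formats and index widths will shift the numbers somewhat.

```python
def storage_ratio(m, n, p, value_bytes=8, index_bytes=4):
    """Dense bytes divided by COO bytes for an m x n matrix with p nonzeros."""
    dense = m * n * value_bytes
    sparse = p * (value_bytes + 2 * index_bytes)
    return dense / sparse


ratio_10_percent = storage_ratio(1000, 1000, 100_000)  # 10% nonzero
ratio_5_percent = storage_ratio(1000, 1000, 50_000)    # 5% nonzero
```

Under these assumptions a matrix with 10% nonzeros saves only a factor of 5 in storage, and the order-of-magnitude threshold is reached at about 5% nonzeros.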