Matrix multiplication time complexity in MATLAB


Does anyone know which algorithm MATLAB uses for matrix multiplication, and what its time complexity is?

For completeness -- as mentioned elsewhere in this thread, MATLAB uses the DGEMM (Double-precision GEneral Matrix-Matrix multiplication) routine from BLAS (Basic Linear Algebra Subprograms).

Note that there is not one single implementation of BLAS -- it is tuned for particular processor architectures. Therefore you cannot be absolutely certain which algorithm is being used on your machine without finding out which version of BLAS is in use (in MATLAB, `version('-blas')` reports the BLAS library being used).

The BLAS specification defines the inputs and output of each subroutine, and provides acceptable error bounds for the output of each subroutine. Implementations are free to use whatever algorithm they like, as long as it follows the specification.

The reference implementation of BLAS uses a block matrix multiplication algorithm in DGEMM that has time complexity O(n^3) for multiplying two n x n matrices. I think it's reasonable to assume that most implementations of BLAS more or less follow the reference implementation.

Note that it doesn't use the naive matrix multiplication algorithm:

for i = 1:n
    for j = 1:n
        for k = 1:n
            c(i,j) = c(i,j) + a(i,k) * b(k,j);
        end
    end
end

This is because, typically, the entire matrix does not fit in local memory (cache). If the data is constantly being shifted into and out of local memory, the algorithm slows down. The block matrix algorithm breaks the operation into small blocks, such that each block is small enough to fit in local memory, which reduces the number of shifts into and out of memory.
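The idea can be sketched in Python (this is a hand-rolled illustration, not the actual DGEMM code; the function name and the block size `bs` are my own assumptions):

```python
def block_matmul(A, B, bs=2):
    """Blocked (tiled) multiplication of two n x n matrices given as
    lists of lists. Same O(n^3) operation count as the naive triple
    loop, but work proceeds tile by tile so each tile stays in cache."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # Iterate over bs x bs tiles of C, A and B.
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                # Multiply the (ii,kk) tile of A by the (kk,jj) tile of B
                # and accumulate the result into the (ii,jj) tile of C.
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + bs, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

In a real implementation the block size is tuned so that three bs x bs tiles (one each of A, B and C) fit in the cache at once; each tile is then reused many times before it is evicted.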

There do exist asymptotically faster matrix multiplication algorithms, e.g. the Strassen algorithm or the Coppersmith-Winograd algorithm, which grow more slowly than O(n^3). However, they are generally not cache-aware and ignore locality -- meaning that data continually needs to be shunted around in memory, so for most modern architectures the overall algorithm is actually slower than an optimized block matrix multiplication algorithm.
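For reference, one level of Strassen's recursion replaces the 8 block products of the ordinary 2x2 partitioned scheme with 7. A scalar Python sketch of that single level (illustrative only; real implementations recurse on matrix blocks and fall back to blocked multiplication below a cutoff size, and the function name is mine):

```python
def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    """One level of Strassen's algorithm, shown on scalars for clarity.
    Computes C = A * B for 2x2 A and B using 7 multiplications
    instead of 8, at the cost of extra additions."""
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the seven products into the four entries of C.
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return (c11, c12, c21, c22)
```

Applied recursively to n x n blocks, saving one of eight multiplications at every level is what yields the O(n^log2(7)) ≈ O(n^2.807) complexity -- but the extra additions and the scattered memory access pattern are why it rarely beats tuned DGEMM in practice.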

Wikipedia notes that the Strassen algorithm may provide speedups on a single-core CPU for matrix sizes greater than several thousand, but the speedup is only around 10% or so, and the developers of BLAS don't consider it worthwhile for this rare case (that said, a paper from 1996 claims a speed increase of around 10% over DGEMM for n above about 200 -- though I don't know how out of date that claim is). The Coppersmith-Winograd algorithm, on the other hand, "only provides an advantage for matrices so large that they cannot be processed by modern hardware".

So the answer is that MATLAB uses a naive, but efficient and cache-aware algorithm to get its blazing fast matrix multiplication.


I updated this answer by creating some videos that demonstrate the locality of the block matrix multiplication algorithm, compared to the naive algorithm.

In each of the following videos, we visualize the multiplication of two 8x8 matrices a and b to create the product c = a x b. The yellow highlight indicates which element in each of the matrices a, b and c is being processed at each step of the algorithm. You can see how the block matrix multiplication only works on small blocks of the matrix at a time, and re-uses each of those blocks multiple times, so that the number of times data must be shifted in and out of local memory is minimized.

