Prior implementations of Strassen's matrix multiplication algorithm on GPUs traded additional workspace, in the form of global memory or registers, for time. Although Strassen's algorithm offers a ...