# PyTorch: fast array computations on GPU

When using numpy at the beginning of a project we don’t think about performance. This is because it is more important for us to get solution that works. And only then, with real data, observing how our computer begins to slow down, we have to find answer on how to avoid this. And then we remember the GPU inside our computer and think: can it help us?

Yes, sure. PyTorch can utilize GPU and perform all computations on it. PyTorch uses Tensor primitives like numpy arrays. Unlike numpy, once declared tensors reside on GPU, and are calculated on it. Due to a lot of the parallel working GPU inits, computing performance arises dramatically.

We will take the matrix multiplication algorithm for the example.

First, let’s make sure the multiplication computations are performed correctly.

Output:

Everything is as expected.

Now, tensor comes into play. At this point, all the necessary modules and drivers have already been installed:

Output:

Let’s allow PyTorch to multiply matricies:

Output:

The result is the same as for the numpy.

Take a larger matrices and see how much time it takes to calculate in each of the three modes:

1) PyTorch in “cpu” (numpy – like) CPU only mode
2) PyTorch in “cuda” (GPU) mode
3) Numpy CPU mode

Output for M=100

Output for M=1000

Output for M=10000

For small matrices (M=100) numpy looks like a performance champion. Even torch in cpu mode loses to it. The worst results are observed for the GPU.

For a medium-sized matrices (M=1000) the results are leveled out. Numpy and torch in cpu mode are the same and only torch in gpu mode is far behind.

At the moment, everything looks sad for GPU. It makes a real breakthrough for large-sized (M=10000) matrices. Here the gap in execution speed is huge: 37 milliseconds for GPU versus 12 seconds for numpy.

Simultaneous computing is good