What is the difference between model.to(device) and model=model.to(device)?

Question

Suppose the model is initially on the CPU and I want to move it to GPU 0. I can do either of the following:

device = torch.device('cuda:0')
model = model.to(device)
# or
model.to(device)

What is the difference between those two lines?

Mano · Answer

Quoting the PyTorch "Saving and Loading Models" tutorial on to:

When loading a model on a GPU that was trained and saved on GPU,
  simply convert the initialized model to a CUDA optimized model using
  model.to(torch.device('cuda')). Also, be sure to use the
  .to(torch.device('cuda')) function on all model inputs to prepare
  the data for the model. Note that calling my_tensor.to(device)
  returns a new copy of my_tensor on GPU. It does NOT overwrite
  my_tensor. Therefore, remember to manually overwrite tensors:
  my_tensor = my_tensor.to(torch.device('cuda')).

For a torch.nn.Module, it usually does not matter whether you keep the return value: to moves the module's parameters and buffers in place and returns the module itself, so the bare call model.to(device) is enough (any saving from skipping the assignment is negligible). For a torch.Tensor, you must keep the return value, because to gives you a new tensor on the target device and leaves the original where it was.
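
A minimal sketch of the tensor half of this, runnable without a GPU. It assumes a dtype conversion is a fair stand-in for a device move: both go through Tensor.to, and the documented rule is the same (a new tensor is returned unless the tensor already matches the request). The names t, moved and same are illustrative only.

```python
import torch

t = torch.zeros(3)            # float32 tensor on the CPU

# Stand-in for t.to(device): a conversion that actually changes something.
moved = t.to(torch.float64)

print(t.dtype)      # t itself is untouched
print(moved.dtype)  # the converted data lives in a new tensor

# Discarding the return value therefore has no effect on t.
t.to(torch.float64)
print(t.dtype)

# When nothing needs to change, no copy is made and the memory is shared.
same = t.to(torch.float32)
print(same.data_ptr() == t.data_ptr())
```

With a GPU available, swapping torch.float64 for torch.device('cuda:0') should show the same pattern: t stays on the CPU unless you reassign it.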

Ref: Pytorch to()

youkaichao · Answer

There is no semantic difference: nn.Module.to moves the model to the device in place and returns that same model.
But be careful, because tensors behave differently.
For tensors (documentation):
import torch

a = torch.zeros(3)  # tensor a is on the CPU
device = torch.device('cuda:0')
b = a.to(device)
# a is still on the CPU
# b is on the GPU
# a and b are different tensors

For models (documentation):
import torch

a = torch.nn.Linear(2, 2)  # model a is on the CPU
device = torch.device('cuda:0')
b = a.to(device)
# a and b are both on the GPU
# a and b refer to the same model object (b is a)
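
The module case can be checked on a CPU-only machine with a small sketch. It again assumes a dtype conversion as a stand-in for the device move, since nn.Module.to accepts either. The Linear layer is just a placeholder model, and after_bare_call and returned are illustrative names.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)       # placeholder model, float32 on the CPU

# Call to() without keeping the return value.
model.to(torch.float64)
after_bare_call = model.weight.dtype
print(after_bare_call)        # the parameters were converted in place

# Keeping the return value hands back the very same object.
returned = model.to(torch.float32)
print(returned is model)
```

Because to returns the module itself, model = model.to(device) and model.to(device) leave you with the same object in the same state.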
