Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I think it's a pretty common message for PyTorch users with low GPU memory:

RuntimeError: CUDA out of memory. Tried to allocate ?? MiB (GPU ??; ?? GiB total capacity; ?? GiB already allocated; ?? MiB free; ?? cached)

I want to research object detection algorithms for my coursework. But many deep learning architectures require a large amount of GPU memory, so my machine can't train those models. I tried to process an image by moving each layer to the GPU and then moving it back to the CPU:

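    # Inside the model's forward(); X is assumed to already be on the GPU
    for m in self.children():
        m.cuda()                  # move this layer's weights to the GPU
        X = m(X)                  # run the layer
        m.cpu()                   # move the weights back to the CPU
        torch.cuda.empty_cache()  # return cached, unused blocks to the driver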

But it doesn't seem to be very effective. Are there any tips and tricks for training large deep learning models with little GPU memory? Thanks in advance!

Edit: I'm a beginner in deep learning. Apologies if this is a silly question :)

Question from: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch

He who fights dragons too long becomes a dragon himself; he who gazes too long into the abyss finds the abyss gazing back…

1 Answer

Calling

    import torch
    torch.cuda.empty_cache()

releases the memory that PyTorch's caching allocator has reserved but is not currently using, and you can also manually delete variables that are no longer needed:

    import gc
    del variables  # placeholder: whatever tensors/objects you no longer need
    gc.collect()
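To see what these commands actually change, here is a minimal sketch that compares `torch.cuda.memory_allocated()` (bytes held by live tensors) with `torch.cuda.memory_reserved()` (bytes the caching allocator has claimed from the driver). It assumes a CUDA device with some free memory; the 64 MiB tensor size and the helper names `report` and `demo` are arbitrary choices for illustration, and without CUDA it just prints a message.

```python
import gc

import torch


def report(tag):
    """Print and return (allocated, reserved) bytes on the current CUDA device."""
    allocated = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    print(f"{tag:>12}: allocated={allocated / 2**20:.1f} MiB, "
          f"reserved={reserved / 2**20:.1f} MiB")
    return allocated, reserved


def demo():
    """Return [(allocated, reserved), ...] at three points, or None without CUDA."""
    if not torch.cuda.is_available():
        print("CUDA is not available; nothing to measure")
        return None
    x = torch.empty(16, 1024, 1024, device="cuda")  # 16 Mi float32 values = 64 MiB
    snapshots = [report("after alloc")]
    del x         # drop the only reference, so the tensor's block becomes free
    gc.collect()  # also collect reference cycles that could keep objects alive
    snapshots.append(report("after del"))    # allocated drops, reserved usually not
    torch.cuda.empty_cache()                  # return unused cached blocks to the driver
    snapshots.append(report("after empty"))  # reserved should drop as well
    return snapshots


if __name__ == "__main__":
    demo()
```

The gap between the "after del" and "after empty" lines is the cache: PyTorch keeps freed blocks for reuse, which is why `nvidia-smi` can show memory as used even after tensors are gone.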

Even so, the error may appear again after these commands, because they do not free memory that is still in use: `del` only removes a reference, so a tensor's memory is released only once nothing else refers to it, and the model's parameters, the optimizer state and the activations kept for the backward pass stay on the GPU regardless. So reducing the batch_size (after restarting the kernel) and finding the largest batch_size that fits is often the best option, though not always a feasible one.
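Finding a workable batch_size can be partly automated. The sketch below is one simple approach, assuming a hypothetical `try_step(batch_size)` callable that you write yourself to run one forward/backward pass. It halves the batch size on every out-of-memory error, so it finds a power-of-two fraction of the starting size rather than the exact maximum. In real PyTorch code you would also drop any partial results and call `torch.cuda.empty_cache()` after each failed attempt.

```python
from typing import Callable


def find_max_batch_size(try_step: Callable[[int], None],
                        start: int = 256, minimum: int = 1) -> int:
    """Halve the batch size until `try_step(batch_size)` runs without an OOM error.

    `try_step` is a hypothetical callable you supply: it should run one
    forward/backward pass at the given batch size and let errors propagate.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # CUDA OOM surfaces as a RuntimeError whose message contains
            # "out of memory" (newer PyTorch raises torch.cuda.OutOfMemoryError,
            # a RuntimeError subclass). Anything else is a real bug: re-raise it.
            if "out of memory" not in str(err):
                raise
        batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")


def fake_step(batch_size: int) -> None:
    """Stand-in for a real training step: pretend batches above 48 run out of memory."""
    if batch_size > 48:
        raise RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")


if __name__ == "__main__":
    print(find_max_batch_size(fake_step))  # tries 256, 128, 64, then prints 32
```

Re-raising non-OOM errors matters: a shape mismatch should stop the search immediately instead of silently shrinking the batch.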

Another way to get deeper insight into how memory is allocated on the GPU is to use:

    torch.cuda.memory_summary(device=None, abbreviated=False)

where both arguments are optional. It returns a readable summary of memory allocation as a string (so print it), which helps you figure out why CUDA is running out of memory; you can then restart the kernel to keep the error from happening again (as I did in my case).
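A small sketch that wraps `torch.cuda.memory_summary` and also reports `torch.cuda.max_memory_allocated`, the peak memory held by tensors since the program started (or since the last `torch.cuda.reset_peak_memory_stats()` call). The helper name `show_memory` is made up for illustration; on a machine without CUDA it just returns a short message.

```python
import torch


def show_memory(device=None):
    """Return PyTorch's allocator report for `device` (current device if None)."""
    if not torch.cuda.is_available():
        return "CUDA is not available"
    # memory_summary returns a string, so it must be printed to be seen
    summary = torch.cuda.memory_summary(device=device, abbreviated=False)
    # Peak bytes held by tensors since startup or the last reset_peak_memory_stats()
    peak = torch.cuda.max_memory_allocated(device)
    return f"{summary}\npeak allocated: {peak / 2**20:.1f} MiB"


if __name__ == "__main__":
    print(show_memory())
```

Calling it right after a training step that fails intermittently shows whether the peak comes from the model itself or from the activations of a large batch.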

Passing the data through iteratively might help, but reducing the size of your network's layers or breaking them down can also be effective, since the model itself sometimes occupies a significant amount of memory (for example, when doing transfer learning).
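One way to read "passing the data iteratively" is gradient accumulation: split a large batch into micro-batches, call `backward()` on each so their gradients add up in `.grad`, and step the optimizer once. This is an interpretation rather than something the answer spells out. The sketch assumes a loss with the default `reduction='mean'`; `train_step_accumulated` and `micro_batch` are hypothetical names. Peak activation memory then scales with `micro_batch` instead of the full batch, although layers such as BatchNorm see smaller batches, so results can differ slightly.

```python
import torch
from torch import nn


def train_step_accumulated(model, optimizer, loss_fn, inputs, targets, micro_batch=8):
    """One optimizer step over `inputs`, processed `micro_batch` samples at a time.

    Assumes `loss_fn` averages over the batch (reduction='mean'). Only one
    micro-batch's activations are alive at once, while gradients accumulate
    in each parameter's `.grad` until the single optimizer.step().
    """
    device = next(model.parameters()).device
    optimizer.zero_grad()
    n = inputs.shape[0]
    total_loss = 0.0
    for start in range(0, n, micro_batch):
        x = inputs[start:start + micro_batch].to(device)
        y = targets[start:start + micro_batch].to(device)
        # Weight each micro-batch by its share of the samples so the summed
        # gradients equal the gradient of the full-batch mean loss.
        loss = loss_fn(model(x), y) * (x.shape[0] / n)
        loss.backward()  # adds into .grad and frees this micro-batch's graph
        total_loss += loss.item()
    optimizer.step()
    return total_loss  # matches the full-batch mean loss up to rounding


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 1).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    X, Y = torch.randn(64, 16), torch.randn(64, 1)  # the full batch stays on the CPU
    print(train_step_accumulated(model, opt, nn.MSELoss(), X, Y, micro_batch=8))
```

Keeping the full batch on the CPU and moving only one slice at a time to the GPU is the point of the design: the device never has to hold more than `micro_batch` samples and their activations.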

