Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

The goal is to train YOLO on multiple GPUs. According to AlexeyAB's Darknet, we should first train YOLO on a single GPU for 1000 iterations, and then continue training on multiple GPUs from the saved weights (1000_iter.weights). So we don't need to change any parameters in the .cfg file? Here is the .cfg I used when I trained my model on a single GPU:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

AlexeyAB says to modify the .cfg "if you get NaN". In my case I'm not getting NaN, but my loss is fluctuating. Shouldn't we change anything when we continue training on multiple GPUs, such as batch, subdivisions, learning_rate or burn_in? Or do we just continue training with the same configuration?
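A minimal sketch of the two-stage workflow described above, using Python only to build the Darknet command lines. The `detector train` subcommand and the `-gpus` flag come from the AlexeyAB Darknet CLI; the helper `darknet_train_cmd` and all file names (`data/obj.data`, `yolov3-custom.cfg`, `darknet53.conv.74`, `backup/yolov3-custom_1000.weights`) are hypothetical placeholders, and the backup name assumes Darknet's usual `<cfg name>_<iteration>.weights` pattern.

```python
import shlex
from typing import List, Optional, Sequence


def darknet_train_cmd(data: str, cfg: str, weights: str,
                      gpus: Optional[Sequence[int]] = None,
                      darknet: str = "./darknet") -> List[str]:
    """Build the argument list for a Darknet `detector train` run (hypothetical helper).

    Without `gpus`, Darknet trains on its default single GPU; with `gpus`,
    a `-gpus` flag listing the device indices is appended.
    """
    cmd = [darknet, "detector", "train", data, cfg, weights]
    if gpus:
        cmd += ["-gpus", ",".join(str(g) for g in gpus)]
    return cmd


# Stage 1: single GPU, starting from pretrained convolutional weights
# (stop it after roughly 1000 iterations).
stage1 = darknet_train_cmd("data/obj.data", "yolov3-custom.cfg",
                           "darknet53.conv.74")

# Stage 2: resume on 4 GPUs from the weights saved at iteration 1000
# (assumed backup file name).
stage2 = darknet_train_cmd("data/obj.data", "yolov3-custom.cfg",
                           "backup/yolov3-custom_1000.weights",
                           gpus=[0, 1, 2, 3])

print(shlex.join(stage1))
print(shlex.join(stage2))
```

The lists can also be passed directly to `subprocess.run` if you prefer launching the runs from Python.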

Question from: https://stackoverflow.com/questions/65838473/yolo-change-parameters-for-multi-gpu


1 Answer

You will need to change burn_in, max_batches and steps between the two cases. For example, if your final target is max_batches = 500200, your first .cfg file should have this:

burn_in=100
max_batches = 50000
policy=steps
steps=40000,45000

and the second file like this:

burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
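To keep the two files consistent, the schedule lines can be generated instead of edited by hand. This is a minimal sketch, assuming the common Darknet rule of thumb of placing the learning-rate drops at 80% and 90% of max_batches. The first file's 40000,45000 follows that rule exactly, while the second file's 400000,450000 is a rounded version, so it is passed explicitly. `net_schedule` is a hypothetical helper.

```python
from typing import Optional, Sequence


def net_schedule(max_batches: int, burn_in: int,
                 steps: Optional[Sequence[int]] = None) -> str:
    """Render the schedule lines of a Darknet [net] section (hypothetical helper).

    If `steps` is not given, use 80% and 90% of max_batches, computed with
    integer arithmetic so there are no float rounding surprises.
    """
    if steps is None:
        steps = (max_batches * 80 // 100, max_batches * 90 // 100)
    return "\n".join([
        f"burn_in={burn_in}",
        f"max_batches = {max_batches}",
        "policy=steps",
        "steps=" + ",".join(str(s) for s in steps),
    ])


first = net_schedule(50000, 100)                              # first .cfg
second = net_schedule(500200, 1000, steps=(400000, 450000))   # second .cfg
print(first)
print()
print(second)
```

Leaving `steps` out for 500200 would give 400160,450180, which is close to the rounded values above.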

According to AlexeyAB's instructions, you only need to change learning_rate if you get NaN. In that case, divide learning_rate by the number of GPUs and multiply burn_in by the same number.
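The NaN adjustment is simple arithmetic. For the question's learning_rate=0.001 and burn_in=1000 on 4 GPUs, it gives learning_rate=0.00025 and burn_in=4000. Below is a minimal sketch that computes these values and writes them into the [net] section of a cfg string. Both helpers are hypothetical, and the line-based parsing assumes the plain `key=value` layout shown above.

```python
import re
from typing import Tuple


def scale_for_gpus(learning_rate: float, burn_in: int,
                   num_gpus: int) -> Tuple[float, int]:
    """Divide learning_rate and multiply burn_in by the number of GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return learning_rate / num_gpus, burn_in * num_gpus


def set_net_option(cfg_text: str, key: str, value) -> str:
    """Rewrite an active `key=...` line inside the [net] section only.

    Commented lines such as '# batch=64' are left untouched because
    the pattern requires the key at the start of the line.
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    out, in_net = [], False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which section we are in; only [net] is modified.
            in_net = stripped == "[net]"
        elif in_net and pattern.match(line):
            line = f"{key}={value}"
        out.append(line)
    return "\n".join(out)


cfg = "[net]\nlearning_rate=0.001\nburn_in=1000\n[convolutional]\nfilters=32"
lr, burn_in = scale_for_gpus(0.001, 1000, num_gpus=4)
cfg = set_net_option(cfg, "learning_rate", lr)
cfg = set_net_option(cfg, "burn_in", burn_in)
print(cfg)
```

Restricting the rewrite to [net] avoids touching same-named keys in other sections, and per the answer this change is only needed if NaN actually appears.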

