Week 10

I spent the beginning of this week continuing to train and test neural networks, which can be seen below; binary map plots were made at thresholds of 0.5 and 0.2 respectively, and the differences in each neural network compared to its immediate predecessor have been made bold. I spent the latter part of the week writing my final report for the summer, which can be found on the homepage of this website. It doesn’t include any of the interesting analysis work I’ve done that could be used by the lab in future publications, but it does provide an overview of the development of the neural networks.
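
For reference, the binary maps in those plots come from thresholding the network’s voxel-wise output; a minimal sketch of that step is below (the shapes and names are mine, and I’m assuming the raw outputs have already been mapped to probabilities, e.g. with a sigmoid).

```python
import torch

def binary_map(probabilities: torch.Tensor, threshold: float) -> torch.Tensor:
    """Turn voxel-wise probabilities into a 0/1 segmentation map."""
    return (probabilities > threshold).float()

probs = torch.rand(64, 64, 64)     # placeholder output volume, not real swarm data
map_05 = binary_map(probs, 0.5)    # stricter map: only confident voxels kept
map_02 = binary_map(probs, 0.2)    # looser map: more voxels marked as bee
```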

Modification 23

  • I halved the original channel numbers and used an early stopping patience of 25
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • Before training, I loaded the weights of the pretrained model from modification 1.27 as a starting point (a rough sketch of this training setup, including the early stopping, follows this list)
  • The network stopped training at epoch 250/1000 with a training loss of 0.0226 and a validation loss of 0.0303
  • When tested on the unseen cube of data, this network had a loss of 0.573
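
Roughly, the warm start and the patience-based early stopping fit into training like the sketch below. The model, the data, the loss function, and the checkpoint path are all stand-ins so the snippet runs on its own; the actual network and data pipeline are the ones described in previous weeks.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data (the real network is the 3D segmentation model from
# earlier weeks, and the real data is the set of overlapping cube crops).
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(4, 1, kernel_size=3, padding=1),
)
cubes = torch.rand(16, 1, 32, 32, 32)
labels = (torch.rand(16, 1, 32, 32, 32) > 0.5).float()
train_loader = DataLoader(TensorDataset(cubes[:12], labels[:12]), batch_size=4)
val_loader = DataLoader(TensorDataset(cubes[12:], labels[12:]), batch_size=4)

# Warm start: load the weights of a previously trained model as the starting point.
# The path is hypothetical, and this only works if the saved architecture matches.
# model.load_state_dict(torch.load("modification_1_27.pt", map_location="cpu"))

criterion = nn.BCEWithLogitsLoss()  # stand-in loss
optimizer = torch.optim.Adam(model.parameters())

best_val, patience, wait = float("inf"), 25, 0
for epoch in range(1000):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)

    # Early stopping with a patience of 25: stop once the validation loss
    # hasn't improved for 25 consecutive epochs.
    if val_loss < best_val:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:
            break
```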

At this point, I decided to investigate how to better prevent overfitting. Though my implementation of early stopping had helped, I still noticed a significant difference between the validation loss and the training loss; the loss curves from modification 22 were a clear example of this gap. To remedy this issue, I began implementing dropout…
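
The next modification adds dropout, together with batch normalization, inside the network’s convolutional blocks. A minimal sketch of what one such 3D block might look like is below; the channel counts and dropout probability are illustrative, not the values actually used.

```python
import torch
from torch import nn

def conv_block(in_ch: int, out_ch: int, p_drop: float = 0.2) -> nn.Sequential:
    """One 3D convolutional block with batch normalization and dropout."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),  # normalizes activations across the batch, stabilizing training
        nn.ReLU(inplace=True),
        nn.Dropout3d(p_drop),    # zeroes whole channels at random to discourage overfitting
    )

block = conv_block(1, 8)
out = block(torch.rand(2, 1, 32, 32, 32))  # (batch, channels, depth, height, width)
print(out.shape)  # torch.Size([2, 8, 32, 32, 32])
```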

Modification 24

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 130/1000 with a training loss of 0.0261 and a validation loss of 0.0305 (this difference was much improved compared to before)
  • When tested on the unseen cube of data, this network had a loss of 0.267

I also determined that my model was learning too quickly. The majority of the plots generated during training show a sharp decrease in loss followed by a fast plateau, which appears to be due to a learning rate that is too high. I therefore focused on decreasing the learning rate, which had previously stayed at the default of 0.001.
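
Lowering it just means passing the rate to the optimizer explicitly instead of relying on its default; roughly (the model here is a stand-in):

```python
import torch
from torch import nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in for the real network

# Adam's default learning rate is 0.001; the next modifications pass smaller values
# (0.0001, 0.00005, 0.00001, and 0.000005) to slow down how quickly the loss drops.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
```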

Modification 25

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.0001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 130/1000 with a training loss of 0.0312 and a validation loss of 0.0356, and still appeared to learn too quickly
  • When tested on the unseen cube of data, this network had a loss of 0.299

Modification 26

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.00005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 220/1000 with a training loss of 0.0295 and a validation loss of 0.0425, and still appeared to learn too quickly
  • When tested on the unseen cube of data, this network had a loss of 0.284

Modification 27

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.00001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 260/1000 with a training loss of 0.0382 and a validation loss of 0.0372, and appeared to have an improved learning rate
  • When tested on the unseen cube of data, this network had a loss of 0.274

Modification 28

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 230/1000 with a training loss of 0.0491 and a validation loss of 0.0490, and appeared to have an improved learning rate, though learning took significantly longer
  • When tested on the unseen cube of data, this network had a loss of 0.247

After I was happy with the learning rate, I played around with various other things, but these didn’t seem to improve the network much…

Modification 29

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer, along with weight_decay=1e-5 (see the sketch after this list)
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 300/1000 with a training loss of 0.0448 and a validation loss of 0.0461
  • When tested on the unseen cube of data, this network had a loss of 0.293
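
The weight_decay argument is Adam’s built-in L2-style penalty, which shrinks the weights slightly on every step as an extra form of regularization on top of dropout; passing it looks roughly like this (the model is again a stand-in):

```python
import torch
from torch import nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in for the real network

# Same small learning rate as before, now with an L2-style weight penalty added.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, weight_decay=1e-5)
```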

Modification 30

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 10,092 cubes with 25 voxels overlap to use for training data (the effect of the overlap on the cube count is sketched after this list)
  • The network stopped training at epoch 230/1000 with a training loss of 0.0283 and a validation loss of 0.0254; it once again appeared that the learning rate was too small, but each training run was taking too long for me to experiment with increasing it significantly
  • When tested on the unseen cube of data, this network had a loss of 0.288
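
The jump from 605 cubes to 10,092 comes from increasing the overlap between neighbouring crops, which shrinks the step the cropping window takes; here is a rough sketch of overlap-based cropping, with made-up volume and cube sizes (the real crop size isn’t shown in this post):

```python
import numpy as np

def crop_cubes(volume, cube, overlap):
    """Slide a cube-shaped window across a 3D volume, stepping by (cube - overlap) voxels."""
    step = cube - overlap
    crops = []
    for z in range(0, volume.shape[0] - cube + 1, step):
        for y in range(0, volume.shape[1] - cube + 1, step):
            for x in range(0, volume.shape[2] - cube + 1, step):
                crops.append(volume[z:z + cube, y:y + cube, x:x + cube])
    return crops

swarm = np.random.rand(128, 128, 128)          # placeholder volume, not the real scan
few = crop_cubes(swarm, cube=32, overlap=13)   # larger step -> fewer cubes
many = crop_cubes(swarm, cube=32, overlap=25)  # more overlap -> smaller step -> many more cubes
print(len(few), len(many))  # 216 vs 2744 for these made-up sizes
```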

Modification 31

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data (see the rotation sketch after this list)
  • The network stopped training at epoch 90/1000 with a training loss of 0.1654 and a validation loss of 0.0930
  • When tested on the unseen cube of data, this network had a loss of 0.338
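
The augmentation itself is just a rotation of each copied cube by a multiple of 90 degrees (applied to the cube and, presumably, its labels together); a minimal sketch using torch.rot90, assuming the rotation is taken in the plane of the last two axes:

```python
import torch

def rotated_copy(cube: torch.Tensor, label: torch.Tensor, k: int):
    """Rotate a training cube and its label together by k * 90 degrees."""
    dims = (1, 2)  # rotate in the plane of the last two spatial axes
    return torch.rot90(cube, k, dims=dims), torch.rot90(label, k, dims=dims)

cube = torch.rand(32, 32, 32)                   # placeholder crop, not real swarm data
label = (torch.rand(32, 32, 32) > 0.5).float()  # placeholder segmentation labels
augmented = [rotated_copy(cube, label, k) for k in (1, 2, 3)]  # 90, 180, 270 degrees
```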

Modification 32

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data
  • The network stopped training at epoch 210/1000 with a training loss of 0.1683 and a validation loss of 0.0938
  • When tested on the unseen cube of data, this network had a loss of 0.345

Modification 33

  • Keeping the original channel numbers, I added batch normalization and dropout, and removed my implementation of early stopping
  • A learning rate of 0.000001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data
  • The network stopped training at epoch 1000/1000 with a training loss of 0.1567 and a validation loss of 0.0869
  • When tested on the unseen cube of data, this network had a loss of 0.329

I kept training networks with different learning rates and epoch numbers to see if I could get the data augmentation to help performance, but their results all remained around the same, so I don’t think they’re worth noting here. They also took much longer to train (up to 8 hours on the GPU), so my experimentation was limited by time constraints. Overall, I was surprised to see that increasing the amount of training data, either through more crops of the cube or through data augmentation, seemed to make the networks perform worse.

I also tested another one of my best-performing networks (modification 28) on a large swarm to see how well it segmented those bees and was very happy with the results! Unlike all three of the previous networks I’ve tested, this one segmented the whole swarm well and didn’t have any large components along the outer edges (segmentation plot: BigSwarm34).

I would like to end this blog by thanking everyone who made this summer research possible. Thank you to DREU for accepting me into this program, to Professor Peleg for welcoming me into your research lab, and most of all to Danielle for helping me every step of the way. I look forward to continuing to work with you.

Written on August 9, 2024