Week 10

I spent the beginning of this week continuing to train and test neural networks, which can be seen below; binary map plots were made at thresholds of 0.5 and 0.2 respectively, and the differences in each neural network compared to its immediate predecessor have been made bold. I spent the latter part of the week writing my final report for the summer, which can be found on the homepage of this website. It doesn’t include any of the interesting analysis work I’ve done that could be used by the lab in future publications, but it does provide an overview of the development of the neural networks.
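
For reference, the binary maps in those plots come from thresholding the network’s voxel-wise output; a minimal sketch of that step is below (the shapes and names are mine, and I’m assuming the raw outputs have already been mapped to probabilities, e.g. with a sigmoid).

```python
import torch

def binary_map(probabilities: torch.Tensor, threshold: float) -> torch.Tensor:
    """Turn voxel-wise probabilities into a 0/1 segmentation map."""
    return (probabilities > threshold).float()

probs = torch.rand(64, 64, 64)     # placeholder output volume, not real swarm data
map_05 = binary_map(probs, 0.5)    # stricter map: only confident voxels kept
map_02 = binary_map(probs, 0.2)    # looser map: more voxels marked as bee
```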

Modification 23

  • I halved the original channel numbers and used an early stopping patience of 25
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • Before training, I loaded the weights of the pretrained model from modification 1.27 as a starting point (a rough sketch of this training setup, including the early stopping, follows this list)
  • The network stopped training at epoch 250/1000 with a training loss of 0.0226 and a validation loss of 0.0303
  • When tested on the unseen cube of data, this network had a loss of 0.573
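
Roughly, the warm start and the patience-based early stopping fit into training like the sketch below. The model, the data, the loss function, and the checkpoint path are all stand-ins so the snippet runs on its own; the actual network and data pipeline are the ones described in previous weeks.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data (the real network is the 3D segmentation model from
# earlier weeks, and the real data is the set of overlapping cube crops).
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(4, 1, kernel_size=3, padding=1),
)
cubes = torch.rand(16, 1, 32, 32, 32)
labels = (torch.rand(16, 1, 32, 32, 32) > 0.5).float()
train_loader = DataLoader(TensorDataset(cubes[:12], labels[:12]), batch_size=4)
val_loader = DataLoader(TensorDataset(cubes[12:], labels[12:]), batch_size=4)

# Warm start: load the weights of a previously trained model as the starting point.
# The path is hypothetical, and this only works if the saved architecture matches.
# model.load_state_dict(torch.load("modification_1_27.pt", map_location="cpu"))

criterion = nn.BCEWithLogitsLoss()  # stand-in loss
optimizer = torch.optim.Adam(model.parameters())

best_val, patience, wait = float("inf"), 25, 0
for epoch in range(1000):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)

    # Early stopping with a patience of 25: stop once the validation loss
    # hasn't improved for 25 consecutive epochs.
    if val_loss < best_val:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:
            break
```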

At this point, I decided to investigate how to better prevent overfitting. Though my implementation of early stopping had helped, I still noticed a significant difference between the validation loss and the training loss; the loss curves from modification 22 were a clear example of this gap. To remedy this issue, I began implementing dropout…
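
The next modification adds dropout, together with batch normalization, inside the network’s convolutional blocks. A minimal sketch of what one such 3D block might look like is below; the channel counts and dropout probability are illustrative, not the values actually used.

```python
import torch
from torch import nn

def conv_block(in_ch: int, out_ch: int, p_drop: float = 0.2) -> nn.Sequential:
    """One 3D convolutional block with batch normalization and dropout."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),  # normalizes activations across the batch, stabilizing training
        nn.ReLU(inplace=True),
        nn.Dropout3d(p_drop),    # zeroes whole channels at random to discourage overfitting
    )

block = conv_block(1, 8)
out = block(torch.rand(2, 1, 32, 32, 32))  # (batch, channels, depth, height, width)
print(out.shape)  # torch.Size([2, 8, 32, 32, 32])
```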

Modification 24

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 130/1000 with a training loss of 0.0261 and a validation loss of 0.0305 (this difference was much improved compared to before)
  • When tested on the unseen cube of data, this network had a loss of 0.267

I also determined that my model was learning too quickly. The majority of the plots generated during training show a sharp decrease in loss followed by a fast plateau, which appears to be due to a learning rate that is too high. I therefore focused on decreasing the learning rate, which had previously stayed at the default of 0.001.
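
Lowering it just means passing the rate to the optimizer explicitly instead of relying on its default; roughly (the model here is a stand-in):

```python
import torch
from torch import nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in for the real network

# Adam's default learning rate is 0.001; the next modifications pass smaller values
# (0.0001, 0.00005, 0.00001, and 0.000005) to slow down how quickly the loss drops.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
```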

Modification 25

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.0001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 130/1000 with a training loss of 0.0312 and a validation loss of 0.0356, and still appeared to learn too quickly
  • When tested on the unseen cube of data, this network had a loss of 0.299

Modification 26

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.00005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 220/1000 with a training loss of 0.0295 and a validation loss of 0.0425, and still appeared to learn too quickly
  • When tested on the unseen cube of data, this network had a loss of 0.284

Modification 27

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.00001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 260/1000 with a training loss of 0.0382 and a validation loss of 0.0372, and appeared to have an improved learning rate
  • When tested on the unseen cube of data, this network had a loss of 0.274

Modification 28

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 230/1000 with a training loss of 0.0491 and a validation loss of 0.0490, and appeared to have an improved learning rate, though learning took significantly longer
  • When tested on the unseen cube of data, this network had a loss of 0.247

After I was happy with the learning rate, I played around with various other things, but these didn’t seem to improve the network much…

Modification 29

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer, along with weight_decay=1e-5 (see the sketch after this list)
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data
  • The network stopped training at epoch 300/1000 with a training loss of 0.0448 and a validation loss of 0.0461
  • When tested on the unseen cube of data, this network had a loss of 0.293
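
The weight_decay argument is Adam’s built-in L2-style penalty, which shrinks the weights slightly on every step as an extra form of regularization on top of dropout; passing it looks roughly like this (the model is again a stand-in):

```python
import torch
from torch import nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # stand-in for the real network

# Same small learning rate as before, now with an L2-style weight penalty added.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6, weight_decay=1e-5)
```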

Modification 30

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 10,092 cubes with 25 voxels overlap to use for training data (the effect of the overlap on the cube count is sketched after this list)
  • The network stopped training at epoch 230/1000 with a training loss of 0.0283 and a validation loss of 0.0254; it once again appeared that the learning rate was too small, but each training run was taking too long for me to experiment with increasing it significantly
  • When tested on the unseen cube of data, this network had a loss of 0.288
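
The jump from 605 cubes to 10,092 comes from increasing the overlap between neighbouring crops, which shrinks the step the cropping window takes; here is a rough sketch of overlap-based cropping, with made-up volume and cube sizes (the real crop size isn’t shown in this post):

```python
import numpy as np

def crop_cubes(volume, cube, overlap):
    """Slide a cube-shaped window across a 3D volume, stepping by (cube - overlap) voxels."""
    step = cube - overlap
    crops = []
    for z in range(0, volume.shape[0] - cube + 1, step):
        for y in range(0, volume.shape[1] - cube + 1, step):
            for x in range(0, volume.shape[2] - cube + 1, step):
                crops.append(volume[z:z + cube, y:y + cube, x:x + cube])
    return crops

swarm = np.random.rand(128, 128, 128)          # placeholder volume, not the real scan
few = crop_cubes(swarm, cube=32, overlap=13)   # larger step -> fewer cubes
many = crop_cubes(swarm, cube=32, overlap=25)  # more overlap -> smaller step -> many more cubes
print(len(few), len(many))  # 216 vs 2744 for these made-up sizes
```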

Modification 31

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000005 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data (see the rotation sketch after this list)
  • The network stopped training at epoch 90/1000 with a training loss of 0.1654 and a validation loss of 0.0930
  • When tested on the unseen cube of data, this network had a loss of 0.338
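
The augmentation itself is just a rotation of each copied cube by a multiple of 90 degrees (applied to the cube and, presumably, its labels together); a minimal sketch using torch.rot90, assuming the rotation is taken in the plane of the last two axes:

```python
import torch

def rotated_copy(cube: torch.Tensor, label: torch.Tensor, k: int):
    """Rotate a training cube and its label together by k * 90 degrees."""
    dims = (1, 2)  # rotate in the plane of the last two spatial axes
    return torch.rot90(cube, k, dims=dims), torch.rot90(label, k, dims=dims)

cube = torch.rand(32, 32, 32)                   # placeholder crop, not real swarm data
label = (torch.rand(32, 32, 32) > 0.5).float()  # placeholder segmentation labels
augmented = [rotated_copy(cube, label, k) for k in (1, 2, 3)]  # 90, 180, 270 degrees
```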

Modification 32

  • Keeping the original channel numbers, I added batch normalization and dropout, and used an early stopping patience of 25
  • A learning rate of 0.000001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data
  • The network stopped training at epoch 210/1000 with a training loss of 0.1683 and a validation loss of 0.0938
  • When tested on the unseen cube of data, this network had a loss of 0.345

Modification 33

  • Keeping the original channel numbers, I added batch normalization and dropout, and removed my implementation of early stopping
  • A learning rate of 0.000001 was fed to the Adam optimizer
  • The small swarm was divided into 605 cubes with 13 voxels overlap to use for training data; this data was then duplicated, and each copy was rotated by 90, 180, or 270 degrees to augment the training data
  • The network stopped training at epoch 1000/1000 with a training loss of 0.1567 and a validation loss of 0.0869
  • When tested on the unseen cube of data, this network had a loss of 0.329

I kept training networks with different learning rates and epoch numbers to see if I could get the data augmentation to help performance, but their results all remained around the same, so I don’t think they’re worth noting here. They also took much longer to train (up to 8 hours on the GPU), so my experimentation was limited by time constraints. Overall, I was surprised to see that increasing the amount of training data, either through more crops of the cube or through data augmentation, seemed to make the networks perform worse.

I also tested another one of my best-performing networks (modification 28) on a large swarm to see how well it segmented those bees and was very happy with the results! Unlike all three of the previous networks I’ve tested, this one segmented the whole swarm well and didn’t have any large components along the outer edges (segmentation plot: BigSwarm34).

I would like to end this blog by thanking everyone who made this summer research possible. Thank you to DREU for accepting me into this program, to Professor Peleg for welcoming me into your research lab, and most of all to Danielle for helping me every step of the way. I look forward to continuing to work with you.

Written on August 9, 2024