Week 2

Throughout this week I worked on labelling bees in a larger swarm to use for training data. Meanwhile, I performed some preliminary analysis using the bees I labeled last week, in order to better understand the data…

The first form of preliminary analysis I performed was to look at connected components in the original cube of bees. This visualization can be seen below using a threshold of 0.2. Using this threshold value, there are 61 separate components, and many are connected (as can be seen by the majority blue ‘bees’ in the rightmost image). I explored different threshold values ranging from 0.15 to 0.40 to try to find one which best reflected what I would expect the data to look like. Below, histograms showing the distribution of component sizes for some of these thresholds can be seen. In most of these figures, a large spike around 1 can be seen, which reflects the presence of many small, individual pixels, and there is always a singular very large component which is the background of the image.

I had labeled about 50 individual bees, and the closest I got to this value was 45 bees using a threshold of 0.17:

I then looked at the number of components and the average component sizes as a function of the threshold: There is a noticeable spike in the average component size around the threshold value of 0.25; below is the corresponding histogram for this threshold value.

I also made a similar histogram to show the distribution of sizes for the bees that I labeled. I was quite surprised by the large distribution of bee sizes–I was expecting more of a normal curve centered around one singular area. Instead, the bees seem fairly evenly distributed between 50 pixels large and over 700 pixels large. To better understand this, I visualized the 10 smallest bees and 10 largest bees. I found that the smallest bees were almost entirely bees that were located around the perimeter of the cube and were therefore cut off in certain areas. The 10 largest bees were around the size that I had come to expect and were not cut off on any edge.

All 49 bees visualized in 3D space:

The 10 smallest bees, located primarily along the cube’s perimeter:

The 10 largest bees:

I also looked at the differences in how I chose to label bees and how another person in the lab chose to label the bees. Most of the differences in our labeling occurred around the edges of bees and there was a total of 7871 pixels labeled differently–about seven percent of the total cube. Below are some images of the differences in our labeling for different slices of the data. I also visualized these differences in 3D, though I found this less easy to decipher.

After all these initial steps to better understand my input data were complete, it was time to start looking at the neural network. The first thing I did with my mentor, Danielle, was to take some time looking through the simple neural network she had built. We talked through all the steps and what was happening. Once I was sure I understood everything about the architecture, I was allowed to begin playing with it and making edits to see if I could improve the results

Written on June 14, 2024