Week 6
One mini research question that came about in discussion with Danielle this week was how much time could be saved creating training data if using a preliminary neural network’s output to assist with labeling. Currently, I am spending hours labeling bees to have more data to train the network on, but I also have a preliminary network that is capable of identifying bees fairly well. Therefore, it would theoretically be possible to feed new data into the network and use the output as a foundation for labeling; instead of starting from scratch, I would just have to clean up the data and fill in any gaps. We think this process may greatly increase the rate at which we could produce training data, therefore helping the network learn better, faster. Though this wouldn’t work for the swarm I’m currently working on, as I’ve come too far to be helped by the network, I’ve begun to look a little into how long the labeling process is taking to assess how much time could be saved.
I timed how long it took for me to label 10 bees. I chose clearly defined bees, and began the timer after I had located them; on average, each bee took around 8 minutes to label with a standard deviation of almost 2 minutes. The fastest bee to label took only 5 minutes and 43 seconds while the slowest took 11 minutes and 5 seconds. The total time for all 10 bees added up to about one hour and 24 minutes. I would roughly estimate that I spend about 75% of my time actually labeling the bees, and the rest (1) locating bees to label, (2) sorting through noise, (3) identifying the boundaries of blurry bees, and (4) performing quality control on completed bees. Additionally, bees that are not well defined (of which there are many) take significantly longer to identify (even double the time in some cases). I also took these time measurements after labeling upwards of 350 bees total, so my time had definitely decreased with increased practice. Therefore, if I were to assume that all bees were well defined, locating the bees was instantaneous, and minimal quality control was performed past the initial labeling, it would take roughly 6.6 hours to label a cube of bees containing 50 individual bees. Labeling a small swarm containing 400 bees would take at least 53 hours. Considering this is an extreme underestimation of the actual time it would take, it is clear that labeling bees for training data is an extremely time-intensive process. Therefore, it will be interesting to see how much this time could be decreased using starting labels from a neural network.
Actual times recorded (recorded as “minutes:seconds”) : 9:36, 8:03, 5:43, 7:28, 6:54, 8:40, 11:05, 8:26, 8:00, 9:51
By the end of this week, I was able to finish labeling about 350 bees.
At the end of this week, I was also able to start working on writing the code that I will be able to analyze the swarm with once my labeling is completed. Though I have some results from that already, I won’t put them here since they are done on an incomplete swarm. By next week, I should have the swarm fully labeled and will hopefully be able to report some preliminary findings.