We’ve released a new computer vision model for iNaturalist. This is our first model update since April 2022. The iNaturalist website, mobile apps, and API are all now using this new model. Here’s what’s new and different with this change:
To see if a particular species is included in this model, you can look at the “About” section of its taxon page.
Our previous model included 55,000 taxa and 27 million training photos. The new model was trained on over 60,000 taxa and almost 30 million training photos.
During previous training runs, our strategy was to train the entire model on the dataset. This means that all of the model weights were candidates for being updated, in order to learn the most efficient and useful visual features for making suggestions for the taxa in that dataset. When training this model, we froze most of the model weights (thereby freezing the visual feature extraction) and only trained the very last layer of the model, the layer that makes the taxa suggestions. This is a machine learning strategy known as transfer learning.
One way to think about this is to imagine that someone was asked to learn all about different kinds of cars. Later, that person was asked to differentiate between two different kinds of pickup trucks, but only using distinguishing characteristics they learned from their study of cars (for example, color, size, visual shape, branding, engine size, etc.), without learning anything new about pickup trucks (for example, bed capacity, towing limits, etc.). Chances are, that person could distinguish between most kinds of trucks without needing to learn anything new specifically about pickup trucks. They may not perform as well as someone who learned about trucks from the beginning, but they have strong foundational knowledge to draw upon for the task.
Our new model was trained using a transfer learning strategy. We reused the internal weights and visual features from our previous model, which was trained on 55,000 taxa. The advantage of this approach is that we didn’t need to learn all of those internal model weights and visual features again, so training was quite a bit faster. It’s only been four months since our last model was released, which is the shortest time between model releases so far.
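The idea of freezing the feature extraction and training only the last layer can be illustrated with a minimal, toy-sized sketch. This is not iNaturalist's training code: the hand-written `FROZEN_W` weights, the four-number "photos" in `TRAIN_DATA`, and every function name here are assumptions made up for illustration, and a real model would be a deep network with far more weights. The point it shows is only that the frozen weights are never updated, while the final classification layer is fit from scratch.

```python
import math

# Stand-in for the "pretrained" feature extractor (hypothetical values).
# In transfer learning these weights come from an earlier full training run
# and stay frozen: nothing below ever modifies them.
FROZEN_W = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
    [-1.0, 0.0, 1.0, 0.0],
    [0.0, -1.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
NUM_FEATURES = len(FROZEN_W)

# Toy labelled examples: (input vector, class index). Purely illustrative.
TRAIN_DATA = [
    ([1.0, 0.0, 0.0, 0.0], 0), ([0.9, 0.1, 0.0, 0.0], 0),
    ([0.0, 1.0, 0.0, 0.0], 1), ([0.1, 0.9, 0.0, 0.0], 1),
    ([0.0, 0.0, 1.0, 0.0], 2), ([0.0, 0.0, 0.9, 0.1], 2),
]


def extract_features(x):
    """Frozen layers: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(head, x):
    """Class probabilities: frozen features fed through the trainable head."""
    feats = extract_features(x)
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in head])


def mean_loss(head, data):
    """Average cross-entropy loss over the dataset."""
    return sum(-math.log(predict(head, x)[y]) for x, y in data) / len(data)


def train_head(data, num_classes, epochs=200, lr=0.5):
    """Fit only the final layer with plain stochastic gradient descent."""
    head = [[0.0] * NUM_FEATURES for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in data:
            feats = extract_features(x)
            probs = softmax([sum(w * f for w, f in zip(row, feats)) for row in head])
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(NUM_FEATURES):
                    head[c][j] -= lr * grad * feats[j]
    return head


# An untrained (all-zero) head predicts every class with equal probability.
loss_before = mean_loss([[0.0] * NUM_FEATURES for _ in range(3)], TRAIN_DATA)
head = train_head(TRAIN_DATA, num_classes=3)
loss_after = mean_loss(head, TRAIN_DATA)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because only the small `head` matrix receives gradient updates, each training step does far less work than updating every weight, which is the speed-up described above. The trade-off is also visible: the head can only separate classes using whatever features the frozen layers already produce.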
As with the pickup truck analogy, it could be that this model trained with the transfer learning approach is slightly less accurate overall than if we had trained the entire model again. However, in our testing this new model appears to achieve nearly the same accuracy as the previous model while containing more taxa. Our plan going forward will be to spend the time fully training a model about once a year to maximize accuracy with new photos and taxa, and to use the faster transfer learning approach in between full training runs so we can release models more frequently than we have in the past.
First, we are still working on new approaches to improve suggestions by combining visual similarity and geographic nearness. We still can’t share anything concrete, but we are getting closer.
Second, we’re still working to compress these newer models for on-device use. The in-camera suggestions in Seek continue to use the older model from March 2020.
Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.
In addition to adding observations and identifications, here are other ways you can help:
Comments
Excellent! Thank you for making these continuous improvements to the process, and thank you for sharing this concise update. Well done!
Wow! Thanks for sharing & thank you staff!
Awesome! Thanks for the explanation of how this works and even more thanks for making it work in the first place :)
Very nice work! I'm excited to see more updates. I love this use of this technology and love to see how it's evolving.
Impressive, very nice. Glad to see continued updates to the infrastructure.
In which countries or continents will this new model perform better than the old model? Can you say whether the 5,000 newly added species are mainly from Asia, or whether the improvement mostly concerns insects, worms, or centipedes?
Congratulations!
I want to highlight this discussion in the iNat-Forum: https://forum.inaturalist.org/t/possible-increase-in-cv-errors-around-organism-range-location/34411
For my part, I have noticed less accurate suggestions lately (without being aware that a new model had been released), and the 'seen nearby' species in particular have disappeared. Maybe the new model puts regions with fewer observations at a disadvantage and favors North American species in particular?
Good!
A question: is it possible, for the future, to use the shared computing power of users (those who voluntarily make themselves available) to train the entire model on the dataset, as is done, I believe, in astronomy, to manage large masses of data?
As reference: https://boinc.berkeley.edu/
Nice and important improvement! Congratulations!
Please - can we have a blog post, or just a link to explore - for the 5K new species?
How many for Africa? How many plants?
(With those gifted graphics you have brought us before?)
Excellent!
That's great news! Would also like to know a bit more about those new additions, if this is possible.
This is fast!
This is great! iNaturalist is truly the perfect example of how technology and nature can come together to create something amazing! I would love to see a list of what species have been given the honor of being added! I hope there were a lot of hexapods! But I don't wish to pressure anyone :)
I know at least one new CV-species 😄
Thank you for the update!
Have the "Included" labels on species "About" pages been updated?
Thanks so much! Been waiting for these small updates that help a ton!
https://www.inaturalist.org/taxa/260419-Panopeus-herbstii meets the 100 observation mark but isn’t included.
HELP!! I have been searching for "how to's" on this site - and while the drop down menu says "video tutorials" and other invitations, when you open it up it says "this page does not exist". I am so grateful for this tool but after trying to use it for a few years, I still am not proficient and would sincerely appreciate being able to learn from somebody who is proficient. THANKS again for this amazing tool and all your work.
Could we please get the new species list? It might help people see any misidentifications that they may have had.
Yes the new species list would be fascinating. :)
In April, when the export for this model was created and training was started, there were only 96 verifiable observations and only 28 research grade observations of this taxon, so it's possible that it was under one of the taxon cutoffs.
@carnifex - the new model was released just a few hours before this blog post was published, so none of the observations mentioned in that forum post would have been affected by the new model.
@valentino_traversa - unfortunately, computer vision training is not (to my knowledge) modular and granular the way that many scientific computing jobs are.
@dan_johnson - yep, they are automatically updated when the model goes live.
Whoop whoop!
Awesome stuff y'all! Thanks for sharing!
Great stuff, folks!
Great stuff, and seems like a smart strategy!
Thanks once again for everyone's hard work!
Magnificent! Wonderful! Amazing! :) Always love to hear about these updates.
Nice! Is there a way to see all the new species included in this model?
I'm happy to see the new ants added, hopefully even more will come with the next round as IDs continue going through.
Geographical inclusion is necessary. Many times the computer suggestions are faulty because the species is not found in that region. I think that should be the way to go.
Anyway, great progress so far. Congratulations.
Thanks, and the plain-language explanation of transfer learning is appreciated!
@alexshepard @valentino_traversa
I would be happy to contribute some processing power to distributed deep learning as well.
It seems like someone has worked on that topic. Here is a paper from 2021 I found: https://arxiv.org/pdf/2103.08894.pdf
Are males and females of dimorphic taxa learned separately?
@trichopria, no they are not, and neither are egg, larva, pupa, and adult of insects that have complete metamorphosis.
Nor are flowers, seeds, leaves, trunks, tubers, etc. - but it is an interesting idea for the future!
That's good to hear. The computer learning is one of the factors that keeps me motivated to put in so many hours photographing and editing photos to provide the best photos I possibly can for the learning models. I am always curious if anyone else puts in as many hours as I do every day to get the best possible taxon photos.
Although I suspect most of us who post thousands of photos do sometimes crop our pictures to make identification easier, it sounds like you, @royaltyler, do a much better and more consistent job of making the photos as good as possible. That's great!
Super good news! Transfer learning FTW!