Journal Archives for January 2021

January 22, 2021

Thoughts on the challenges of spider-IDing on iNaturalist

Long effortpost TL;DR: I am proposing that the people who help identify spiders on iNat (though this probably also applies to many other groups of small arthropods) make more liberal use of the Data Quality Assessment flag "No, it's as good as it can get" when we believe it is unlikely an observation will ever get a more specific ID. This is intended to be a conversation starter among the handful of people who follow spiders and help with identification. Again, this is not entirely specific to spiders but that's my area of study/interest so that's how I wrote this out.

Recently I spent a couple of weeks trying to review "all" of the Texas spider observations in Needs-ID state and give reasonable IDs where I could (even just moving from Order -> Family for later review). If you have spent a lot of time sorting through these buckets, you are probably familiar with some of the headaches. For a variety of reasons, many observations are simply not identifiable to any reasonable level (say, Family). After some discussion with a few of the other active identifiers, we decided it would be helpful to start tagging "unidentifiable" observations as such - using the Data Quality Assessment flag labeled "No, it's as good as it can get." This will turn the observation to either Research Grade or Casual depending on the community taxon level and (maybe?) the number of people who agree with that assessment. Either way, it will be filtered out of most people's (default) Identify criteria.

My main motivation for doing this is that the steady increase in observations is outpacing the ability of the identifiers to keep up. There is a limited number of people actively reviewing spiders in North America (my main focus but also the bulk of observations) while iNat's popularity is increasing. So the number of observations is growing rapidly, and the number of identifiers seems to be either flat or dropping. I don't have exact numbers (although certainly the data is out there) but I recall from previous conversations that in 2019 the number of Needs-ID (Araneae / United States) was approaching 200k. By mid-2020 it was over 300k, and at the end of 2020 it was over 400k. Over 40% of the total observations in iNat's history were uploaded in 2020 alone. Going through these buckets is fairly time consuming if you are trying to be accurate and (even better) helpful. Over a 2-week period (probably averaging ~8 hours a day) I reviewed something like 20 thousand observations (rough guess) and made about 5-6 thousand IDs - mostly to family or genus. Probably less than 10% of those IDs were to species, and probably only a couple percent actually resulted in an observation reaching RG. Of course I spent a lot of time consulting the literature and BugGuide, trying to include helpful comments where I could, etc. - so I was not going for maximum speed, but I was still attempting to get through as much as I could in the time I had. I'm just trying to give a rough idea of the time it would take an average(ish) person to work through a pile of a given size. A month of work and 10,000+ IDs later and I feel like I'm about where I started.
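
To put the "pile of a given size" idea into numbers, here is a rough back-of-the-envelope sketch. It only reuses the guesses quoted above (about 20 thousand observations reviewed over 2 weeks at ~8 hours a day, and a US Needs-ID pile of about 400k), and it assumes a constant pace and a pile that stops growing, which is clearly optimistic. The function and variable names are made up for this illustration.

```python
# Rough throughput estimate using the figures quoted above.
# All inputs are rough guesses from the post, not measured values.

REVIEWED = 20_000      # observations reviewed during the two-week push
DAYS = 14              # length of the push
HOURS_PER_DAY = 8      # approximate daily effort
BACKLOG = 400_000      # US spider observations in Needs-ID at the end of 2020


def hours_spent(days: int, hours_per_day: float) -> float:
    """Total review hours for a push of the given length."""
    return days * hours_per_day


def hours_to_clear(backlog: int, reviewed: int, hours: float) -> float:
    """Hours needed to look at `backlog` observations once, at the same pace.

    Assumes a constant pace and a backlog that stops growing.
    """
    return backlog * hours / reviewed


spent = hours_spent(DAYS, HOURS_PER_DAY)
rate = REVIEWED / spent
total_hours = hours_to_clear(BACKLOG, REVIEWED, spent)

print(f"Pace: about {rate:.0f} observations per hour")
print(f"One pass over the backlog: about {total_hours:.0f} hours, "
      f"or {total_hours / HOURS_PER_DAY:.0f} eight-hour days")
```

With these inputs that works out to roughly 180 observations per hour, and about 2,240 hours (280 eight-hour days) for one person to look at every observation in the pile once, before counting anything uploaded in the meantime.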

Anyway, before long I decided to start marking observations I considered plainly "unidentifiable," to remove them from Needs-ID status. The rough criterion I initially used was that, due to the photo quality, I couldn't confidently place the observation in any particular family, and I doubted anyone else could either. I did not apply this to anything with clear photos that I simply wasn't familiar with, nor did I apply it to confusing taxa like the many similar-looking Agelenidae, Philodromidae, Thomisidae, Dictynidae, etc., where the photos were good enough but I couldn't identify the spider any further. There is probably someone out there who studies the Dictynidae and is familiar enough with the patterns to make better IDs (even if that happens years later), and I don't want to get in the way of that. Basically, I applied it only to photos where the quality/focus/angles could not justify even a family-level ID. One example would be a photo of a "typical" orb web with no spider - you could give an ID of Araneoidea (could be Araneidae, Tetragnathidae, maybe Uloboridae?) - but is it really necessary to keep that as "Needs ID"? In most cases I left a copy/paste comment along the lines of "Unfortunately there is likely not enough detail to give a more specific ID" or "It is an orbweaver but I can't be sure which kind" so the observer at least knew that someone had reviewed it.

Examples of where I have been applying this:

  • Photos that are plainly too blurry or distant to even suggest a family
  • Photos that are too dark and I could not improve sufficiently with basic photo editing software
  • Night photos lit with flash (mostly orbweavers in webs) where only the rough shape is visible
  • (Most) shed exoskeletons that do not seem to have identifying features other than 8 legs
  • Partial/abandoned webbing with no animal visible
  • Multiple possible species/genera and definitely not enough detail in the photos to be more specific

A lot of these are cell phone images from users who made an iNat account, posted an observation or ten, then never came back. Many of them seem to be what we call "duress users" - students who had to make X number of observations for a school assignment, then never came back. I definitely support that (we want more people to discover iNat) but it leaves a lot of "frass" as BugGuide calls it. Also I want to make clear that I fully appreciate the challenge of making good photos of tiny (often moving) animals and I am not trying to criticize anyone's photography. Spiders are difficult to photograph well, even with dedicated equipment (I still suck at it) and I don't want to discourage people from submitting these observations. But at this point iNat has a rapidly growing pile of spider photos that I feel will never even be reviewed, and I think removing "unidentifiable" things from Needs-ID as we go will eventually help the small group of people who are willing to spend their time on this. Of course I know it is not really possible to make a definite ID without the specimen in hand, and for that reason many observations may never reach RG, and that's fine. But there are 1000s of cases where we have the same photos being reviewed by the same 5 or 6 people over the course of several years, each individually making the determination that "It looks like some type of orbweaver maybe but that's the best I can do" and then it is left there for the next person. Which eats up a lot of time and seems unproductive/frustrating to IDers. So I am trying to find a way to make things better without being too aggressive/critical or accidentally "hiding" something that could be scientifically interesting.

Some other ideas I have had in parallel with this:

  • An Observation Field indicating the observation has (multiple) high quality photos - for easier review, maybe by more seasoned arachnologists.

    Could be particularly helpful for the smaller or more cryptic spiders like Erigoninae/Linyphiinae, Thomisidae, uncommon Theridiids, etc. The idea being that we could present a more curated subset of high quality observations (e.g. all of wildcarrot's photos :) ) and request help from outside experts.
  • An Observation Field indicating the observation contains microscope photos - this is uncommon but I think would be useful.
  • "Holding bins" to help sort easily-confused or similar-looking taxa (like many Clubionidae/Cheiracanthiidae/Anyphaenidae) for later review

    Joe Lapp did some initial work on this while he was more active on iNat (I think he stepped back partly because of the stuff I'm hoping to improve)
  • Observation fields or some other way to tag things like egg sacs/webs/spiderlings for further review, but get them out of Needs-ID
  • Some easy way to tag-team other IDers on observations that need more people to correct the community ID

    A common example is: Computer Vision said *Oecobius* (it's not), some other person agreed, so now we need 4 votes to fix it. This could take years to happen naturally, especially on older observations.

I ran some quick numbers while I was working and found that almost 40% of the total observations in iNat's history (Spiders / Texas) were made in 2020. Almost 40,000 observations, just spiders, just Texas. For USA it was well over 40%. Over 2/3 of all US spider observations (400,000+) are Needs-ID. I expect iNat will continue to grow at a steady pace, or at least I don't see any reason why its popularity would suddenly fall off. This is awesome, but is overwhelming for the limited number of volunteers we have to try and sort through everything. So that's pretty much it - I am looking at this as a way to make Spider-IDing-on-iNat better for us, without upsetting observers or obscuring any potentially-interesting observations. I welcome anyone's thoughts. I chose a journal entry because many people are not active on the iNat forum and this seemed the best way to involve everyone who might have input. It might not be the best forum for an active conversation but we'll see.

I did save a bunch of representative examples of things I would or wouldn't treat as "unidentifiable" for various reasons, but I didn't include them here because I didn't want this to seem like a call-out post - more a group problem solving thing. But if there is interest I can include some examples. I have had this basic conversation with several people individually so I thought a sort of group discussion might be productive.

Thanks for reading (sorry for the wall of words) and for any opinions you would like to share about this, and thanks for the work you do to make iNat so awesome!


Posted on January 22, 2021 at 08:33 AM by jgw_atx | 6 comments