My student trained some classifiers on ImageNet, and the result was that all quadrupeds were predicted as "dog".
We then switched to MS COCO and found that, while the caption generation is often good for a laugh, the object detection was not very good in most cases.
Not detecting sheep in trees maybe shows that deep networks now actually have good "common sense".
I wonder how it fares on the combine harvester picture?
https://www.reddit.com/r/funny/comments/7rkvq4/i_had_to_look_a_few_times/
@deeds But Cloudsight is an interesting case! It seamlessly uses humans for the hard ones, and since the caption was part of the photo...
Object: "green and yellow combine harvester"
Scene: "This Look Like A Sick Concert Text"
@deeds Ha, what a picture! As you predicted, Microsoft Azure (trained on COCO, I believe) reported it as "a crowd of people".