Partial Annotation in Object Detection

In this post, I will discuss two papers that try to handle partially annotated datasets. Let's talk a bit about why we care about missing annotations in detection. First, labeling boxes is difficult and tedious, and increasing the size of the taxonomy dramatically increases the difficulty of the task. Second, suppose we originally have a training dataset with 20 categories and later want to add 10 new categories to the model. Do we need to re-annotate the training dataset, or are there techniques that handle the problem automatically? With the release of the Open Images Dataset and its huge number of images and annotations, the community has also become interested in this problem. I found two interesting papers:

  1. Wu, Zhe, et al. "Soft Sampling for Robust Object Detection." arXiv preprint arXiv:1806.06986 (2018).
  2. Niitani, Yusuke, et al. "Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects." arXiv preprint arXiv:1811.10862 (2018).

In the first paper, the authors study the robustness of object detection systems in the presence of missing annotations. I have done something similar before with COCO-like datasets, but the authors conduct their experiments more systematically than I did. Their conclusion is also interesting:

we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset.

The thing is: that conclusion is drawn with the detection threshold set at 0, which is not realistic for any deployed object detector. You have to set a higher threshold to maintain a certain precision/recall, and I believe any commercial object detection system does so. For that reason, if we look at the results at a threshold larger than 0.4, we can observe a significant drop in mAP, which makes sense. That being said, the authors also mention in Section 4 that "it is important for practitioners to tune the detection threshold per class when using detectors trained on missing labels".

The interesting part is the second figure of the paper, in which they show how the performance changes on the trainval and test sets of VOC2007 with different detection thresholds.

One thing worth noting about the experimental setting is that they drop ground-truth boxes across all classes, which is quite different from the scenario in which we add more classes to a trained model. The former setting does not change the taxonomy, while the latter revamps the whole label space.

Now, let's move on to the proposed method. First, they suggest using hard-example mining to tackle the missing-annotation problem: by mining hard examples, we avoid randomly sampling the regions whose annotations are missing. Then, they propose a weighting function that scales the gradient of background RoIs based on their IoU overlap with annotated boxes. At this point, you can see that the proposed method is just another form of the Balanced IoU sampler.

So, there is nothing surprising here.
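
To make the first idea concrete, here is a minimal sketch of an overlap-based weight for background RoIs. The exact weighting function in the paper is different (and more elaborate); the linear ramp, the `w_min` value, and the `(x1, y1, x2, y2)` box format below are my own illustrative assumptions. The returned weights would simply multiply the per-RoI classification loss during training.

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of boxes in (x1, y1, x2, y2) format."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def soft_sampling_weights(bg_rois, gt_boxes, w_min=0.25):
    """Down-weight background RoIs that overlap no annotated box, since those
    regions are the most likely to contain an unannotated object; RoIs close
    to an annotated box keep full weight. The linear ramp and w_min are
    illustrative, not the paper's exact function."""
    bg_rois = np.asarray(bg_rois, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    if len(gt_boxes) == 0:
        return np.full(len(bg_rois), w_min)
    max_overlap = iou_matrix(bg_rois, gt_boxes).max(axis=1)
    return w_min + (1.0 - w_min) * max_overlap
```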

The authors also propose another approach (otherwise, the paper would just be too short to appear at any conference). This time they want to weight the gradient of RoIs that are neither positives nor hard negatives. The weighting function is akin to the one above. Basically, they trust their model, hoping that it roughly predicts the correct objects: if such an RoI gets a high score, they treat it as a positive and keep its gradient; otherwise they suppress that RoI. They also mention that the trained model is quite weak, so this approach does not work well, and the experimental results clearly support that observation.
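
The sketch below shows that second idea of trusting a previously trained detector's scores for the ambiguous RoIs. The threshold and weight values are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def score_based_weights(old_model_scores, high_thresh=0.7, low_weight=0.1):
    """For RoIs that are neither matched positives nor mined hard negatives:
    if a previously trained detector is confident about an RoI, treat it as a
    pseudo-positive with full weight; otherwise suppress its gradient.
    high_thresh and low_weight are assumed values, not the paper's settings."""
    scores = np.asarray(old_model_scores, dtype=float)
    pseudo_positive = scores >= high_thresh
    weights = np.where(pseudo_positive, 1.0, low_weight)
    return pseudo_positive, weights
```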

Now, let's talk about the second paper. The proposed method is called pseudo label-guided sampling. The observation is that if an object is present in an image, some of its parts should be present as well: if you have a car in a photo, the photo probably also contains tires. In other words, this kind of approach is only suitable for hierarchical taxonomies.

The proposed method is composed of two components:

  1. Part-aware sampling: they simply ignore the classification loss of part categories when an instance of a part lies inside an instance of its subject category.

  2. Pseudo labels: used to exclude regions that are likely not to be annotated.

So, basically, they want to ignore the regions they believe are missing annotations.

Table 1 in the paper is interesting. There are two notions here:

  1. Included: the ratio between the number of part-category boxes that lie inside a subject-category box of the same instance and the total number of part-category boxes.
  2. Co-occur: the ratio between the number of images containing both the part and the subject categories and the total number of images containing the subject category.

The numbers suggest that missing annotations are a severe problem in the Open Images Dataset.

Even though they summarize the two algorithms in the paper, it is worth translating them into English:

  • Part-aware sampling: for each RoI proposal (Line 1), check whether the associated ground truth (Line 3) contains part categories (Line 4). If it does, ignore those labels (Line 6) that are not verified (Line 5). A rough sketch is given after this list.

  • Pseudo label-guided sampling: for each output of a trained model (Line 2), discard it if its score is smaller than the threshold or its label is in the verified set (Line 3), or if it is very close to any ground truth (Line 6). After that, for each RoI proposal (Line 8), add the boxes from the filtered output to the ignored group (Line 11) if their IoU with the RoI is high enough (Line 10). A sketch of this one also follows below.
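
Here is how I would sketch part-aware sampling. The `PARTS_OF` mapping, class-name strings, and `inside_thresh` are placeholders; the paper derives the subject/part relations from the Open Images class hierarchy.

```python
import numpy as np

# Hypothetical subject -> part relations; the paper uses the Open Images hierarchy.
PARTS_OF = {"Car": ["Tire", "Wheel"], "Person": ["Human face", "Human arm"]}

def part_aware_ignore_mask(rois, gt_boxes, gt_labels, verified_labels,
                           class_names, inside_thresh=0.9):
    """For each RoI lying almost entirely inside a ground-truth instance of a
    subject category, mark the classification loss of that subject's part
    classes as ignored, unless the part class is human-verified for the image.
    Returns a (num_rois, num_classes) boolean mask of losses to drop."""
    rois = np.asarray(rois, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    ignore = np.zeros((len(rois), len(class_names)), dtype=bool)
    if len(gt_boxes) == 0:
        return ignore
    # fraction of each RoI's area covered by each ground-truth box
    lt = np.maximum(rois[:, None, :2], gt_boxes[None, :, :2])
    rb = np.minimum(rois[:, None, 2:], gt_boxes[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    roi_area = (rois[:, 2] - rois[:, 0]) * (rois[:, 3] - rois[:, 1])
    coverage = inter / (roi_area[:, None] + 1e-9)

    for i in range(len(rois)):
        for j, subject in enumerate(gt_labels):
            if coverage[i, j] < inside_thresh:
                continue
            for part in PARTS_OF.get(subject, []):
                if part in class_names and part not in verified_labels:
                    ignore[i, class_names.index(part)] = True
    return ignore
```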

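And a corresponding sketch of pseudo label-guided sampling, following the English translation above. All three thresholds are assumed values, not the paper's, and the box format is again `(x1, y1, x2, y2)`.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def pseudo_label_ignore_mask(rois, det_boxes, det_scores, det_labels,
                             gt_boxes, verified_labels,
                             score_thresh=0.5, gt_iou_thresh=0.5, roi_iou_thresh=0.5):
    """1) Keep confident detections from a previously trained model whose class
       is NOT verified for this image and which do not coincide with an existing
       ground-truth box: these are likely missing annotations.
    2) Ignore the loss of any RoI that overlaps one of these pseudo boxes.
    Thresholds here are assumptions, not the paper's settings."""
    det_boxes = np.asarray(det_boxes, dtype=float)
    det_scores = np.asarray(det_scores, dtype=float)
    keep = (det_scores >= score_thresh) & np.array(
        [label not in verified_labels for label in det_labels], dtype=bool)
    pseudo = det_boxes[keep]
    if len(pseudo) and len(gt_boxes):
        pseudo = pseudo[box_iou(pseudo, gt_boxes).max(axis=1) < gt_iou_thresh]
    if len(pseudo) == 0:
        return np.zeros(len(rois), dtype=bool)
    return box_iou(rois, pseudo).max(axis=1) >= roi_iou_thresh
```
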
Experiments

Nothing special here; they just present their experiments. However, I will implement soft sampling and pseudo label-guided sampling in the next couple of weeks. Let's see if these methods can actually help my work or not.