Gvenet And Alice (full) Access

: Text-only pre-training does not adapt well to visual grounding; the AG-ALICE integration requires careful tuning of the attention temperature.
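
To make the attention temperature concrete, here is a minimal sketch of temperature-scaled dot-product attention weights. It is a generic illustration, not the AG-ALICE implementation: the note does not say how AG-ALICE applies its temperature, and the function names `softmax_with_temperature` and `attention_weights`, along with the toy vectors, are hypothetical.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Softmax over scores divided by a temperature (hypothetical helper).

    A temperature below 1 sharpens the distribution; above 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, temperature=1.0):
    """Scaled dot-product attention weights with an extra temperature.

    Assumption: the temperature divides the usual 1/sqrt(d) scaled scores.
    Other conventions exist; this is only one common choice.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax_with_temperature(scores, temperature)


# Toy example: one query attending over three keys.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sharp = attention_weights(query, keys, temperature=0.5)
flat = attention_weights(query, keys, temperature=2.0)
```

Under these assumptions, `sharp` concentrates more weight on the best-matching key than `flat` does. That trade-off between focused and diffuse attention is what tuning the temperature adjusts.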