Higher-level, semantic clustering, classification or understanding of video scenes