Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I built this same solution for myself last year, used Hugging Face's "SmolVLM". It works surprisingly well. I use the model to generate verbose descriptions of each image, embed the descriptions using another model, which I also use for the query embedding.

The stack is hacky, since it was mostly for myself...






Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: