spacecadet 11 days ago | on: Ask HN: What's the 2025 stack for a self-hosted ph...

I built this same solution for myself last year, using Hugging Face's "SmolVLM". It works surprisingly well. I use the model to generate verbose descriptions of each image, then embed those descriptions with a second model, which I also use to embed the search query. The stack is hacky, since it was mostly for myself...
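A minimal sketch of that caption-then-embed flow, assuming a plain brute-force cosine search. SmolVLM and the second embedding model are swapped out for hypothetical stand-ins (`toy_describe`, `toy_embed`, `FAKE_CAPTIONS`, `VOCAB`) so it runs without downloading anything. In a real setup `describe` would call the VLM and `embed` would call a sentence-embedding model. The important part is that the same `embed` function handles both the image descriptions and the query, so both end up in the same vector space.

```python
import math
from typing import Callable, List, Tuple

# Each index entry: (image path, generated description, description embedding)
Entry = Tuple[str, str, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(paths: List[str],
                describe: Callable[[str], str],
                embed: Callable[[str], List[float]]) -> List[Entry]:
    """Caption every image with the VLM, then embed the caption text."""
    index: List[Entry] = []
    for path in paths:
        desc = describe(path)
        index.append((path, desc, embed(desc)))
    return index


def search(query: str, index: List[Entry],
           embed: Callable[[str], List[float]], k: int = 3) -> List[str]:
    """Embed the query with the SAME text model and rank images by similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e[2]), reverse=True)
    return [path for path, _, _ in ranked[:k]]


# --- Hypothetical stand-ins for the real models (illustration only) ---
VOCAB = ["dog", "beach", "cat", "sofa", "mountain", "snow", "sunset"]

FAKE_CAPTIONS = {  # stand-in for SmolVLM output
    "img_001.jpg": "A dog running on a sandy beach at sunset.",
    "img_002.jpg": "A cat sleeping on a grey sofa.",
    "img_003.jpg": "A snow covered mountain under a clear sky.",
}


def toy_describe(path: str) -> str:
    """Pretend VLM: look up a canned caption for the image."""
    return FAKE_CAPTIONS[path]


def toy_embed(text: str) -> List[float]:
    """Pretend text embedder: word counts over a tiny fixed vocabulary."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    idx = build_index(list(FAKE_CAPTIONS), toy_describe, toy_embed)
    print(search("dog on the beach", idx, toy_embed, k=1))
```

For a small personal library a linear scan like this is usually fine; for a larger collection you'd likely store the vectors in a proper vector index instead of re-sorting everything per query.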