llama.cpp Qwen2.5-0.5B-Rag-Thinking-Flan-T5

  • I use a forked llama-cpp-python that supports T5 on the server, but it doesn't support newer models (like Gemma 3).
  • For search-query generation (query reformulation) I use flan-t5-base (flan-t5-large gives better results, but it's too large for just this task).
  • Qwen2.5-0.5B performs well for its small size.
  • In any case, Google's T5 series is amazing on CPU.
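To illustrate the query-reformulation step, here is a minimal sketch of calling a llama-cpp-python server that hosts flan-t5-base through an OpenAI-compatible `/v1/completions` endpoint. The server URL, prompt wording, and parameter values are assumptions for illustration, not the Space's actual code:

```python
# Hypothetical sketch: query reformulation against a llama-cpp-python
# server running flan-t5-base. URL and prompt template are assumptions.
import json
import urllib.request

SERVER_URL = "http://localhost:8000/v1/completions"  # assumed local server


def build_payload(user_question: str) -> dict:
    """Build a completion request asking the model to rewrite the
    question into a concise search query (query reformulation)."""
    prompt = f"Rewrite the question as a concise web search query: {user_question}"
    return {"prompt": prompt, "max_tokens": 64, "temperature": 0.1}


def reformulate(user_question: str) -> str:
    """Send the request and return the reformulated query text."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(build_payload(user_question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"].strip()
```

The reformulated query can then be passed to the retrieval step before Qwen2.5-0.5B generates the final answer.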

Huggingface Free CPU Limitations

  • When duplicating a Space, the build process can occasionally become stuck and require a manual restart to finish.
  • Spaces may unexpectedly stop working or even be deleted, forcing you to rebuild them. Refer to the issue for more information.