linear terms

DeepSeek-R1 reward model