1 min read · from Machine Learning
[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.
GPT-5.4-mini produces shorter, terser outputs by default. Vanilla-prompting accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals), and the official RLM implementation dropped too, from 69.7% to 50.2%. Our implementation, where the model writes Python to query the data instead of attending to all of it and pattern-matching the task, fell only from 72.7% to 69.5%. The architecture absorbed what the model couldn't.
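To make the "writes Python to query data" idea concrete, here is a minimal sketch, not the official RLM code: the class name `ReplEnv`, the `data` variable, and the example snippets are all hypothetical. The long context lives in a Python variable inside a REPL, and the model emits short code snippets that probe it instead of attending to the whole thing.

```python
import io
import contextlib

class ReplEnv:
    """Hypothetical sketch: holds the long context as `data` and
    executes model-written snippets against it."""
    def __init__(self, data):
        self.scope = {"data": data}

    def run(self, code):
        # Capture whatever the snippet prints and return it as the
        # observation the model sees next. A real system would sandbox
        # this exec call rather than run it directly.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.scope)
        return buf.getvalue().strip()

# A model would generate snippets like these instead of reading `data` whole:
env = ReplEnv(data=["error: disk full", "ok", "error: timeout", "ok"])
print(env.run("print(len(data))"))                              # size probe
print(env.run("print(sum('error' in line for line in data))"))  # targeted count
```

The point of the design is that only the snippet outputs enter the model's context, so accuracy no longer depends on the model attending over the full input.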
Also: on AIME 2025, this setup scores 80% vs 0% for vanilla prompting, the same pattern as GPT-5.2. Prompted vanilla, the model outputs a bare guess with no reasoning; the REPL forces it to compute the answer via code, reducing latency while increasing accuracy.
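The "forces it to compute via code" point can be illustrated with a toy counting question (not an actual AIME item): a terse model might emit a bare numeric guess, while one line of REPL code answers it exactly.

```python
# Toy illustration: "How many integers from 1 to 1000 are divisible
# by 3 or by 7?" A bare guess is error-prone; the REPL computes it.
count = sum(1 for n in range(1, 1001) if n % 3 == 0 or n % 7 == 0)
print(count)  # 333 + 142 - 47 by inclusion-exclusion = 428
```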
Our implementation also uses 5.1x fewer tokens than the official RLM and is 3.2x cheaper, and it works with every model.
Tagged with
#GPT-5.4-mini
#vanilla prompting
#accuracy
#recursive language models