wolfram index
Published On: 7/19/24, 12:44
Author: Julian Bleecker
Contributor: Julian Bleecker
It's interesting to imagine this as a 'benchmark' in some possible future, one presented almost like a weather report. You know how you see weather reports for different cities, and the weather changes all the time (which is why they call it weather)? You might imagine that LLMs and their ability to function on particular broad task idioms would fluctuate rather than stay static or always climb toward '100%'. The indices vary day by day, so you might choose which LLM 'brand' to use on any given day for any given task. Like having a favorite soft drink, you might have strong allegiances, but one day a Huberman-type knothead says there's a new brand and it's better for you because it has some characteristic that's great for online debates or gaming augments, and if you order a gig of tokens right now and use affiliate code HUBERMAN until Sunday, you'll get a t-shirt you can wear in HUBERVERSE to show everyone YOU'RE GETTIN' AFTER IT with ELMT ELELEM. Do You Want To Know More?
But what I was also going to say: the 'Functionality' column? That's, like... not so encouraging. I came across this through an email newsletter about the CrowdStrike conflagration, basically saying there's no way we are ready for AGI (if you believe that's coming), as these things mostly suck at getting things right with any consistency. (That's more or less the sentiment of the newsletter, not necessarily mine, although I'm probably close to curiously skeptical / we'll-get-what-we-deserve in attitude on this point.)
cf: https://www.wolfram.com/llm-benchmarking-project/