With many variations/platforms of AI/ML coming into play at an ever-accelerating rate, does anyone have thoughts on how their cost/performance/benefits will be evaluated, or will it be a Wild West culture for a number of years? Is there any central source for looking up this information, or will one be created? If so, who will evaluate and grade AI/ML platforms? Will there be a central site where an organization or individual can look for guidance in picking the best option for them? Also, how do people see the future of AI/ML being handled? With great power comes great danger if it isn't handled properly, so any thoughts in this area would be appreciated.
I'm just guessing here. But it feels like, in order to compare different AI/ML services (let's assume for the moment we're restricting the question to things like ChatGPT, which can be directly compared), you'd want some sort of "standard" benchmarks covering the reliability, stability, response time, cost, safety, bias and other quality metrics of the different tools. You could also compare power consumption!
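To make that concrete, here's a rough sketch (in Python) of what just one of those axes might look like - response time measured over repeated calls. The endpoint URL, API key and payload shape are placeholders I've made up, not any particular vendor's API:

```python
# Rough sketch: timing repeated calls to a hypothetical chat API endpoint.
# The URL, header names, and payload shape are placeholders, not a real vendor API.
import statistics
import time

import requests

API_URL = "https://example.com/v1/chat"   # placeholder endpoint
API_KEY = "YOUR_KEY_HERE"                 # placeholder credential
PROMPT = "Summarise last quarter's semiconductor revenue trends."

latencies = []
for _ in range(20):
    start = time.perf_counter()
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": PROMPT},
        timeout=60,
    )
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"median latency: {statistics.median(latencies):.2f}s")
print(f"stdev:          {statistics.stdev(latencies):.2f}s")
```

Cost, safety and bias are much harder to reduce to a loop like that, which is rather the point - latency is the easy part.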
I guess governments will want to start doing some of this for safety and bias. And tech journalists/analysts/etc. will also cover some of it (if they aren't already - I'm really not following closely).
I can't see any central site any time soon. There's no centrally agreed reference for CPUs or GPUs - so why would there be one for more complex, harder-to-measure AI/ML hardware/software systems?
Perhaps it's like cars. Governments benchmark the safety stuff. You take your pick of reviews for the other factors. And then test drive a few.
I can easily imagine that the results coming out of AI engines will be unstable (variable) for some time and that grading the quality of results will be difficult - an order of magnitude more difficult than something like CPU benchmarking, with all the benchmark gamesmanship that inevitably leads to.
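To illustrate why that grading is hard, here's a rough sketch of one naive way to put a number on run-to-run instability: ask the same question repeatedly and score how similar the answers are. ask_model() is a made-up stand-in for whatever client call you'd actually use, and plain text similarity is obviously a crude proxy for answer quality:

```python
# Rough sketch: quantifying run-to-run variability of a model's answers.
# ask_model() is a stand-in for a real API call to the service under test.
import itertools
from difflib import SequenceMatcher


def ask_model(prompt: str) -> str:
    """Placeholder for a real client call; returns the model's text answer."""
    raise NotImplementedError("wire this up to the service under test")


def stability_score(prompt: str, runs: int = 10) -> float:
    """Average pairwise text similarity across repeated runs (1.0 = identical)."""
    answers = [ask_model(prompt) for _ in range(runs)]
    pairs = itertools.combinations(answers, 2)
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(sims) / len(sims)
```

Two answers can be worded completely differently and both be correct, so even this simple metric needs a human (or another model) in the loop before it means anything.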
We also don't know how well such systems will scale and age. We assume that they will only get better and more reliable over time, but there's no guarantee. Nor do we know what the useful lifecycle of these systems is and whether they can be easily scaled up in the field.