10/03/2025
What is the IQ of AI, and who determines it, and how?
For humans, IQ can be derived from a set of standardized tests or subtests designed to assess intelligence. For AI, the candidate measures include:
The Turing Test, a method of inquiry in AI for determining whether or not a computer is capable of thinking like a human being;
Data testing;
Model testing;
Code testing;
Integration testing;
All types of benchmarking data sets, which test for accuracy, precision, latency (processor inference speed), reliability, efficiency, or cost. These focus on text, while other modalities (audio, images, video, and multimodal systems) remain largely unexamined, and user privacy, copyright infringement, interpretability, ethics, safety, and explainability are practically missing altogether.
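To make the usual benchmark numbers concrete, here is a minimal sketch in Python of how accuracy, precision, and latency might be computed for a binary task. The function `evaluate`, the `toy_model`, and the `toy_benchmark` data are all hypothetical names and values made up for illustration; real benchmark harnesses are far more involved.

```python
import time


def evaluate(model, examples, positive_label=1):
    """Return accuracy, precision and mean latency (seconds) on a binary task."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0          # predictions that match the expected label
    predicted_pos = 0    # predictions equal to the positive label
    true_pos = 0         # positive predictions that were actually positive
    total_time = 0.0
    for features, label in examples:
        start = time.perf_counter()
        prediction = model(features)
        total_time += time.perf_counter() - start
        if prediction == label:
            correct += 1
        if prediction == positive_label:
            predicted_pos += 1
            if label == positive_label:
                true_pos += 1
    n = len(examples)
    return {
        "accuracy": correct / n,
        # Precision is undefined with no positive predictions; report 0.0 here.
        "precision": true_pos / predicted_pos if predicted_pos else 0.0,
        "mean_latency_s": total_time / n,
    }


# Hypothetical toy "model": predicts 1 when the input is above a threshold.
def toy_model(x):
    return 1 if x > 0.5 else 0


# Hypothetical toy benchmark: (input, expected label) pairs, invented for illustration.
toy_benchmark = [(0.9, 1), (0.8, 0), (0.2, 0), (0.7, 1), (0.1, 1)]

scores = evaluate(toy_model, toy_benchmark)
print(scores)
```

Note what the three numbers leave out: nothing in them says anything about privacy, copyright, safety, or explainability.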
Overall, none of these is valid, and above all none can be trusted: they have the same issues as the training data sets, and more. Models can all perform well in controlled environments yet fail in critical, real-world circumstances over time.
One of these issues is that benchmarks, in their practical use, ignore the discriminatory and environmental damage done by AI technologies. This has allowed highly energy-inefficient and deeply biased AI models to reach the top of most benchmark leaderboards.
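A small sketch of how a ranking can change once energy is taken into account. Everything here is an assumption for illustration: the model names, the accuracy and energy figures are invented, and "accuracy per kWh" is a crude made-up ratio, not an established leaderboard metric.

```python
# Hypothetical leaderboard entries: every number here is invented for illustration.
entries = [
    {"model": "model_a", "accuracy": 0.90, "energy_kwh": 100.0},
    {"model": "model_b", "accuracy": 0.88, "energy_kwh": 10.0},
    {"model": "model_c", "accuracy": 0.80, "energy_kwh": 2.0},
]


def rank(rows, key):
    """Return model names sorted from best to worst by the given score."""
    return [row["model"] for row in sorted(rows, key=key, reverse=True)]


# Ranking the way an accuracy-only leaderboard would.
by_accuracy = rank(entries, key=lambda row: row["accuracy"])

# Ranking by a crude, assumed efficiency ratio: accuracy per kWh consumed.
by_accuracy_per_kwh = rank(
    entries, key=lambda row: row["accuracy"] / row["energy_kwh"]
)

print(by_accuracy)
print(by_accuracy_per_kwh)
```

With these made-up figures the order reverses completely: the model that tops the accuracy-only list comes last once energy use is counted.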
For example, LLM leaderboards feature various benchmarks such as HellaSwag, MMLU (Massive Multitask Language Understanding), GSM8K, and ARC, which cover reasoning, commonsense, and in-depth text understanding, alongside metrics such as hallucination rate.
It is all about increasing the AI hype: benchmarks “serve as the technological spectacle through which companies such as OpenAI and Google can market their technologies”.
The practice of optimising for high benchmark scores at the expense of insight and explanation is known as SOTA-chasing (SOTA: state of the art), a form of wild-goose chase.