LLaVA vs. GPT-4: An Open-Source AI Showdown Highlighting Multimodal Potential and Mathematical Limitations
At the recent presentation of GPT-4, one of the standout features was its ability to engage in conversations enriched with images. However, this capability is yet to be integrated into OpenAI’s offering. While we previously highlighted Bing’s competence in this aspect, an open-source solution has now emerged in the form of the “Large Language and Vision Assistant” (LLaVA).
LLaVA’s Multimodal Potential
LLaVA is an open-source multimodal AI that combines language and vision processing. A demo of LLaVA can be accessed here.
We conducted a simple test by inputting a picture of Taxi and man, and LLaVA provided me with a descriptive analysis. However, my attempts to challenge it with mathematical problems, similar to those tackled by Bing, proved futile. LLaVA appears to struggle with mathematics, despite its proficiency in image recognition.
Challenges with Mathematical Tasks
We presented LLaVA with a mathematical problem involving trigonometry, akin to those successfully solved by Bing. Unfortunately, LLaVA could not provide the correct solution, returning an erroneous answer for the square root of 169 as 13.2.
Let’s solve this query
The task at hand involves determining the length of the diagonal of a triangle with sides measuring 5 cm and 12 cm. To accomplish this, we can employ the Pythagorean theorem, which states that in a right-angled triangle, the square of the hypotenuse’s length (the side opposite the right angle) equals the sum of the squares of the other two sides.
In this scenario, the hypotenuse corresponds to the diagonal of the triangle, which we can ascertain by taking the square root of the sum of the squares of the two given sides.
To begin, we must find the length of the diagonal, which is the side connecting the two non-adjacent vertices of the triangle. In this instance, those two vertices are associated with the sides measuring 5 cm and 12 cm.
Now, let’s apply the Pythagorean theorem:
(5 cm)^2 + (12 cm)^2 = diagonal^2
5^2 + 12^2 = diagonal^2
25 + 144 = diagonal^2
169 = diagonal^2
Finally, we can determine the length of the diagonal by taking the square root of both sides:
√169 = diagonal
The length of the diagonal is approximately 13.2 cm.
An Unusual Perspective on Images
LLaVA excels in its ability to converse with images, although challenges persist, particularly in mathematical problem-solving. For example, when presented with an image of a man leaning out of a yellow taxi window, holding a clothesline with a white shirt, LLaVA provided an unusual perspective. It suggested that such a scene is atypical, as it is not common to witness individuals leaning out of car windows while holding clothing. The analysis indicated that the man may be attempting an unconventional and potentially unsafe method of drying his shirt while the taxi is in motion.
While LLaVA offers promising multimodal capabilities, particularly in conversing with images, it faces limitations in mathematical problem-solving. It’s worth noting that Google’s capabilities in this regard surpass LLaVA’s, as demonstrated by a more accurate solution to a similar mathematical problem.
The development of AI with multimodal capabilities is undoubtedly an exciting advancement, and LLaVA is a commendable open-source effort in this direction. However, improvements are needed to enhance its mathematical reasoning capabilities to match its proficiency in image analysis.
For a more accurate mathematical solution, Google’s capabilities are currently superior: Google’s Mathematical Problem Solver.
Read more about AI:
Disclaimer
In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.
About The Author
Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.
More articlesDamir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.