Problem Statement:
1) Over 70% of PDFs contain critical data in images such as charts and tables, especially research articles.
2) Gemini is currently released for English only.

Can we build a solution for:
1) Answering natural-language questions based on images in PDFs?
2) Making Gemini accessible to non-English speakers?

By leveraging Spire, OpenAI GPT-3.5, Gemini Pro Vision, and TruLens, I have built an application that solves both problems:
- Spire for image extraction
- OpenAI for translation to English (optional)
- Gemini Pro Vision for the answer
- TruLens for monitoring
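The four stages listed above can be sketched as a small pipeline. This is only an illustrative outline, not the application's actual code: the class `PdfImageQA`, its field names, and the `fake_*` functions are hypothetical stand-ins, and no real Spire, OpenAI, Gemini, or TruLens calls are made. Each stage is passed in as a plain callable so that the real services could be swapped in later.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PdfImageQA:
    """Hypothetical wiring of the four stages; every stage is an injected callable."""
    extract_images: Callable[[str], List[bytes]]            # stand-in for the Spire step
    answer_from_images: Callable[[str, List[bytes]], str]   # stand-in for Gemini Pro Vision
    translate_to_english: Optional[Callable[[str], str]] = None  # stand-in for GPT-3.5 (optional)
    log: List[dict] = field(default_factory=list)           # stand-in for TruLens monitoring

    def ask(self, pdf_path: str, question: str) -> str:
        # Stage 1: pull the images (charts, tables) out of the PDF.
        images = self.extract_images(pdf_path)
        if not images:
            raise ValueError(f"no images found in {pdf_path}")

        # Stage 2 (optional): translate the question to English.
        question_en = question
        if self.translate_to_english is not None:
            question_en = self.translate_to_english(question)

        # Stage 3: ask the vision model about the extracted images.
        answer = self.answer_from_images(question_en, images)

        # Stage 4: record the exchange so it can be reviewed or evaluated later.
        self.log.append({
            "pdf": pdf_path,
            "question": question,
            "question_en": question_en,
            "image_count": len(images),
            "answer": answer,
        })
        return answer


# Hypothetical stubs that only imitate the shape of each service.
def fake_extract(pdf_path: str) -> List[bytes]:
    return [b"chart-bytes", b"table-bytes"]


def fake_translate(question: str) -> str:
    known = {"¿Qué muestra el gráfico?": "What does the chart show?"}
    return known.get(question, question)


def fake_answer(question_en: str, images: List[bytes]) -> str:
    return f"Answer to '{question_en}' using {len(images)} image(s)"


if __name__ == "__main__":
    qa = PdfImageQA(
        extract_images=fake_extract,
        answer_from_images=fake_answer,
        translate_to_english=fake_translate,
    )
    print(qa.ask("paper.pdf", "¿Qué muestra el gráfico?"))
    print(qa.log[-1]["question_en"])
```

Keeping translation optional mirrors the description above: an English question skips that stage and goes directly to the vision step, while the log entry keeps both the original and the English question for monitoring.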
"excellent work. amazing and very useful idea"
Walaa Nasr Elghitany
Data scientist and doctor
"Great use of Gemini to make PDFs and images more accessible + use of trulens to make sure it's safe. Areas of improvement: - A narrower use case can often be more impactful than a general one, and bring a lot of value! Focus on selling to your first customers, not the whole market. - It would have been nice to see evaluations that validated the core capabilities of the app in addition to the harmlessness evaluations you completed."
Josh Reini
DevRel