India/Kerala

SIH grand finalist

April 8, 2024
Built for the SIH Grand Finale 2024, Fuzzify addresses a problem faced by police officers and others who manage large public databases: the same name is often spelled in several ways (for example, Laxmi written as Lakshmi, Lakxmy, or Lackshmy), which creates inconsistencies across records. Fuzzify uses a fine-tuned Llama 3.2 1B model, optimized for lightweight performance, to predict the possible pronunciations of a given name written in Latin (English) script. The model outputs these pronunciations in IPA (International Phonetic Alphabet) notation, capturing the range of ways the name may be pronounced. A custom embedder then converts the IPA representations into vectors, which are stored in a vector database. Searches rank the stored names by cosine similarity to the query vector and retrieve the closest matches.
  • Fine-tuned LLM: A Llama 3.2 1B model, fine-tuned and optimized for lightweight performance, predicts the possible pronunciations of a given name written in Latin (English) script.
  • Phonetic embedder: A custom embedder converts the IPA representations into vectors.
  • Vector database storage and retrieval: The custom embedder vectorizes each name, and the resulting vectors are stored in a vector database (ChromaDB). Queries are matched by cosine similarity, so results are retrieved based on pronunciation rather than spelling.
  • Frontend: A Flutter app that demonstrates the capabilities of the backend algorithm.
  • Flutter: For building the cross-platform mobile app.
  • ChromaDB: The vector database that stores the embedded vectors.
  • Unsloth: For fine-tuning the LLM.
  • Python FastAPI: For setting up the backend server.
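
The retrieval idea described above (embed the IPA form of each name, then rank by cosine similarity) can be illustrated with a small, dependency-free sketch. This is not Fuzzify's actual embedder, which is custom and not described here. As a stand-in it uses character bigram counts of the IPA string as the vector. The IPA transcriptions are hand-written assumptions rather than output from the fine-tuned model, and the names `ipa_ngrams`, `cosine_similarity`, `search`, and `RECORDS` are hypothetical.

```python
import math
from collections import Counter


def ipa_ngrams(ipa: str, n: int = 2) -> Counter:
    """Stand-in embedder: sparse vector of character n-gram counts.

    The string is padded with '#' so that word boundaries contribute n-grams.
    """
    padded = f"#{ipa}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical records: stored spelling -> assumed IPA transcription.
# In the real pipeline the IPA would come from the fine-tuned model.
RECORDS = {
    "Lakshmi": "ləkʃmi",
    "Laxmi": "ləkʃmi",
    "Lackshmy": "lækʃmi",
    "Ramesh": "rəmeːʃ",
}


def search(query_ipa: str, records: dict, top_k: int = 3) -> list:
    """Return the top_k (name, score) pairs ranked by cosine similarity."""
    query_vec = ipa_ngrams(query_ipa)
    scored = [
        (name, cosine_similarity(query_vec, ipa_ngrams(ipa)))
        for name, ipa in records.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search("ləkʃmi", RECORDS):
        print(f"{name}: {score:.3f}")
```

With these assumed transcriptions, spellings that share a pronunciation receive the same vector and therefore the highest score, while an unrelated name shares no bigrams with the query and scores zero. In the project itself the vectors are stored and queried through ChromaDB rather than a Python dictionary.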