Greetings!
I am thrilled to share the success of my latest endeavor at Right Information! As a data scientist and information designer, I had the opportunity to contribute to an impactful project called ‘AI-based gene expression profiling’ within our growing AI department for biomedical research.
The project had three key parts. First, I trained predictive models on gene expression datasets for various cancer types. Next, I analyzed these models to identify the genes that contributed most to their predictions. Lastly, I integrated AI-driven analysis of the scientific literature to extract essential insights, which greatly aided our domain experts. This last step was particularly significant for the challenge.
I created a series of plots that present insights from the dataset and from thousands of analyzed publications. One particularly engaging plot is showcased on our page, where you can explore it interactively and use our contextual chatbot to dig deeper. Additional details about the study can be found on our page and in our published paper.
Notable features of our web application include:
- Information Retrieval Animation: An animation with graphs showing gene regulation and correlations extracted from the literature. Each line in a graph represents a scientific article connecting two nodes, and users can retrieve information about that article by clicking on the line.
- Enhanced GPT-3.5 Chatbot: Our chatbot is powered by a GPT-3.5 model enhanced with retrieval over the analyzed publications. We split research papers into smaller text chunks and computed their embedding vectors using the ‘text-embedding-ada-002’ model. The chunks and their vectors are stored in a Chroma database, which lets the chatbot respond with the most relevant content when users ask about a specific publication. We reduced model hallucination using the LangChain library, though we faced some challenges working with such a new tool.
- Knowledge Base Summarization Table: To provide comprehensive insights into the problem, we generated a table of questions and answers based on GPT-3.5 analysis of 10,000 research papers.
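For readers unfamiliar with the chunk, embed, and retrieve pattern behind the chatbot, here is a minimal sketch of the general idea. It is an illustration under stated assumptions, not the project's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the ‘text-embedding-ada-002’ model, `InMemoryVectorStore` is a hypothetical stand-in for Chroma, and the prompt wording in `build_prompt` is only one assumed way of grounding answers in retrieved text. All names, chunk sizes, and sample texts are made up for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of at most `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Chroma."""

    def __init__(self):
        self._items = []  # list of (vector, chunk, source)

    def add(self, chunk, source):
        self._items.append((toy_embed(chunk), chunk, source))

    def query(self, question, top_k=2):
        """Return the top_k (score, chunk, source) tuples, best first."""
        q = toy_embed(question)
        scored = [
            (cosine_similarity(q, vec), chunk, source)
            for vec, chunk, source in self._items
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


def build_prompt(question, hits):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{source}] {chunk}" for _, chunk, source in hits)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Made-up sample texts, used only to exercise the sketch.
    papers = {
        "paper_a": "TP53 is frequently mutated in many cancer types.",
        "paper_b": "Dash callbacks update the layout of a web application.",
    }
    store = InMemoryVectorStore()
    for source, text in papers.items():
        for chunk in split_into_chunks(text):
            store.add(chunk, source)

    question = "Which gene is mutated in cancer?"
    print(build_prompt(question, store.query(question, top_k=1)))
```

In a real setup the prompt returned by `build_prompt` would be sent to the chat model, and the embedding and storage steps would be handled by the actual embedding model and vector database rather than these stand-ins.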
You can explore my work on the GitHub repository, where you’ll find the code and additional details in the readme: GitHub Repo
Discover the web application I submitted for the challenge here: Genomic Analysis Web App
I hope my project inspires fellow enthusiasts to explore the potential of Dash and similar technologies in AI-driven research. Your feedback is invaluable to me, as I am always striving to learn and improve. I am relatively new to this framework, so any guidance would be highly appreciated.
Major problems I couldn’t overcome:
- Asynchronous callbacks
- Handling long callbacks in a multipage setup
Thank you for taking the time to engage with my work. I look forward to hearing from you and contributing further to the growing community.
Best regards,
Antoni Dąbrowski
Data Scientist and Information Designer
at Right Information