Build a LangChain-based Semantic PDF Search App with No-Code Tools Bubble and Flowise
May 06, 2023
Hi everyone, today we are going to create a LangChain-based app without writing code. For that we are going to use a Bubble-based front end and a Flowise-based back end. We will build a search app for PDFs that uses OpenAI embeddings as well as the OpenAI completion API, together with the Pinecone vector database, which is another vendor in the vector database space, and of course Flowise to build our back end.
Just to recap from our previous video: any document ingestion system has two arms. In the first arm, the text is extracted from the document and split into chunks, which are then embedded and stored in the Pinecone vector database. Basically, these chunks are converted to numbers, or vectors, that are stored in Pinecone. On the other hand, we have the search query: when a user searches for anything in the document, the query is also converted to an embedding, and we look for similar vectors in our vector database. Once we find the relevant results, we submit them to the OpenAI completion API and ask it to formulate an answer from them. As mentioned, it's easier to decouple these two arms and think of them as two different systems that we need to build.
That is what we see here: two separate, decoupled systems. We'll use Flowise for the document ingestion portion, and for the query we'll use the Bubble-based front end to submit the question, and then the flow takes care of the rest of this diagram. To get started, the first thing we need to do is fork the Flowise repository, and you'll see something similar to this.
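To make the two arms concrete, here is a minimal, self-contained sketch of ingestion and query. It is an illustration only, not how Flowise or Pinecone work internally: the hypothetical `toy_embed` function is a bag-of-words count vector standing in for OpenAI embeddings, and the hypothetical `ToyVectorStore` is an in-memory list standing in for Pinecone. The chunks are passed in already split.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database: (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.records = []

    def upsert(self, chunks: list) -> None:
        # Ingestion arm: embed each chunk and store the vector next to its text.
        for chunk in chunks:
            self.records.append((toy_embed(chunk), chunk))

    def query(self, question: str, top_k: int = 2) -> list:
        # Query arm: embed the question, rank stored chunks by similarity, keep the top k.
        q = toy_embed(question)
        ranked = sorted(self.records, key=lambda rec: cosine_similarity(q, rec[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


store = ToyVectorStore()
store.upsert([
    "Congress shall make no law respecting an establishment of religion.",
    "The President shall be Commander in Chief of the Army and Navy.",
    "The judicial power shall be vested in one supreme Court.",
])
print(store.query("Who is commander in chief of the army?", top_k=1))
```

In the real app, the chunks returned by the query arm are what gets handed to the OpenAI completion API as context for the answer.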
You can keep all the settings as they are and create the fork. Once you do, the repository will be in your account. Now we go to render.com and create a new web service. Once you click that, you'll have these options: select the Flowise repository and hit Connect, and you'll probably see something similar to this. I'm going to call this Flowise doc QA, select the region closest to you and your users, and for the runtime select Node. The configuration here is something Flowise gives us: if you scroll down to the setup instructions in the Flowise repository, you'll see that the dependencies are installed with yarn install, the code is built with yarn build, and the application is started with yarn start. That is exactly what we configure here: the build command installs and builds the app, and the start command starts it, so use those for your setup as well.
The only additional thing that is key for Flowise is that the free instance might not be the best choice for running the app, because a free instance goes to sleep after 15 minutes of inactivity and does not save your flows. I suggest starting with the Starter plan or another paid plan that keeps your flows active and saved, so you can use them to call the API. Once you do that, your logs will look something like this: Render installs the app, runs the provided scripts, and gives you a link to the app. Once you click the link, you'll see the Flowise user interface. There's a marketplace where you can see examples, but we're going to build something completely new, so we'll start with a new flow.
Going back to our diagram, we need to take the document, create the chunks, and save them to the Pinecone vector database. To do so, I look in the document loaders and select the PDF file loader. It takes a text splitter as an input and outputs documents. For the text splitter, we can select the Recursive Character Text Splitter here. It's pretty easy: just drag and drop it and pick the options you want.
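The splitter's two main settings are the chunk size and the chunk overlap. The sketch below illustrates only those two parameters; it is a plain sliding window over characters, not the recursive algorithm the Recursive Character Text Splitter uses (that one prefers to break on paragraph and line boundaries before falling back to smaller separators). The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Sliding-window character splitter: each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when the overlap is large enough, which is why a little overlap helps keep retrieved context relevant.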
Then there is the chunk overlap, which helps keep the context relevant between neighboring chunks, so feel free to set some chunk overlap for your document. Next, we need to store the chunks in the Pinecone vector database. For that, I'm going to select the Pinecone option from the vector stores, the one that upserts documents. Once I do that, you'll notice it takes documents as well as embeddings as inputs, and this is where we provide the OpenAI embeddings, so we search for OpenAI Embeddings and use that node in our flow. The last thing: once we can upsert the documents and their embeddings, we need to ask the document questions, and that's where we use a retrieval QA chain. It takes two inputs: one is the vector store retriever, which we supply from the Pinecone node, and the other is an LLM, for which we use OpenAI as mentioned above. This completes the flow for our use case.
The next step is to fill in all of the API keys and settings: the OpenAI API key, the Pinecone API key, the Pinecone environment, and the index name. The OpenAI API key can be found in your OpenAI account. The part I'd like to highlight is Pinecone: when you create an account on Pinecone and then create an index, you give the index a name and a number of dimensions. I set the index name to docsqa and the dimensions to 1536. You'll want to remember that number, because 1536 is the dimension of the vectors produced by the OpenAI embedding model we'll be using. Once the index exists, you'll need the Pinecone API key, the Pinecone environment, and the index name. I've now filled in the API keys and the other information, so our back end is pretty much set up. To test it, I'm going to load a United States Constitution PDF and hit save.
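Two details from this step can be sketched in a few lines: the embedding length must match the index dimension (1536 for the docsqa index), and the retrieval QA chain combines the retrieved chunks with the question into a single prompt for the LLM. The sketch below is an assumption-laden illustration: `check_dimension` and `build_qa_prompt` are hypothetical helpers, and the prompt wording is made up, not the template Flowise or LangChain actually use.

```python
INDEX_DIMENSION = 1536  # dimension chosen for the docsqa Pinecone index


def check_dimension(vector: list, expected: int = INDEX_DIMENSION) -> None:
    """Reject vectors whose length does not match the index dimension."""
    if len(vector) != expected:
        raise ValueError(f"embedding has {len(vector)} dimensions, index expects {expected}")


def build_qa_prompt(question: str, context_chunks: list) -> str:
    """Hypothetical 'stuff everything into one prompt' template for the LLM call."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


check_dimension([0.0] * 1536)  # passes silently
print(build_qa_prompt("What is this document about?", ["We the People of the United States..."]))
```

A dimension mismatch is a common reason upserts fail, so it is worth checking the index settings against the embedding model before debugging anything else.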
Flowise asks me to give the flow a name, so I'll call it Doc QA. Something I missed earlier is that the updated version of Flowise also has a namespace option, which saves your embeddings in Pinecone as a separate partition. You can name it after the Constitution PDF or anything else you like. The next time you load a different document, you can save it to the same namespace or to a separate one, and when you search a namespace, only the documents saved to that partition are searched. This is useful if you plan to upload multiple documents and want to keep them separate, or if you have separate accounts for different users: give each user their own namespace, so that their searches only look in that user's namespace.
Flowise also gives you an option to test the flow directly, so I'm just going to ask, "What is this document about?" The embeddings have already been created and upserted to Pinecone, and now the flow generates a response based on our workflow, which is a good sign that it's working. The key part we're going to use from this back end is the API endpoint. The link shown here is specific to my application, and I'll use it in our Bubble front end. One additional thing to remember: the Render-provided link is specific to your app, but anyone who visits it will have access to the same flows and the work you've done, so keep it to yourself and make it hard to guess. If you go back to your Pinecone index, you'll notice some vectors stored there, which confirms that the Flowise flow ran successfully. Now let's go to Bubble and create a new app.
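The namespace behavior described above amounts to "one index, several isolated partitions". Here is a minimal in-memory model of that idea; it is not the Pinecone client API. The class `NamespacedStore` and the namespace names are hypothetical, and a simple keyword match stands in for vector similarity to keep the sketch short.

```python
from collections import defaultdict


class NamespacedStore:
    """Hypothetical model of one index split into namespace partitions."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)  # namespace -> stored chunks

    def upsert(self, namespace: str, chunks: list) -> None:
        # Chunks are written only into the named partition.
        self._partitions[namespace].extend(chunks)

    def search(self, namespace: str, keyword: str) -> list:
        # Only the requested partition is scanned; other namespaces are invisible.
        # .get() avoids creating an empty partition for unknown namespaces.
        return [c for c in self._partitions.get(namespace, []) if keyword.lower() in c.lower()]


store = NamespacedStore()
store.upsert("constitution-pdf", ["We the People of the United States"])
store.upsert("user-42", ["People also ask about pricing"])
print(store.search("constitution-pdf", "people"))  # only the Constitution chunk
```

For the per-user setup mentioned above, the namespace would simply be derived from the signed-in user's ID, so each search stays inside that user's documents.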
I've created something very simple here: a text box where users can ask a question, and once they hit Search, the answer is displayed below it. The logic behind this search starts with a loader, which is purely cosmetic, and then an API call, which is where we call the Flowise back end. Once we get the response, I set a custom state on this group. You can set up the state by clicking the element inspector (the "i" icon), which has a Custom states section; I gave the state a name and set its type to text.
Inside the workflow, the main step is the call to the back end, which is configured using the API Connector. You get the API Connector by adding it as a plugin. Once the API Connector is available, set it up with whatever name you want; the key part is the API call itself. I had to play around a bit to get the setup right. I set the data type to text, added a header with the key Content-Type and the value application/json, used the Flowise API URL mentioned above, and made it a POST call. The JSON body looks something like this, and keeping the question in quotes is important. To initialize the call, you can ask any question. Once everything is set up as described, take the URL of your Flowise application from here, paste it into the API Connector, and reinitialize the call with the same test question: What is this document about?
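For reference, this is roughly the same POST request the API Connector sends, written with Python's standard library. Assumptions: the base URL and chatflow ID are placeholders, and the path /api/v1/prediction/<chatflow-id> follows Flowise's prediction API, but copy the exact endpoint Flowise shows for your own flow. Using `json.dumps` also shows why the quotes matter: the question must be a quoted JSON string, with any inner quotes escaped.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, chatflow_id: str, question: str) -> urllib.request.Request:
    """Build a POST to the (assumed) Flowise prediction endpoint with a JSON body."""
    url = f"{base_url.rstrip('/')}/api/v1/prediction/{chatflow_id}"
    body = json.dumps({"question": question}).encode("utf-8")  # quotes and escaping handled here
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask(base_url: str, chatflow_id: str, question: str, timeout: float = 60.0):
    """Send the request and return the parsed JSON response (its shape depends on the Flowise version)."""
    req = build_prediction_request(base_url, chatflow_id, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder values; replace with your Render URL and chatflow ID.
req = build_prediction_request("https://your-app.onrender.com", "your-chatflow-id", "What is this document about?")
print(req.full_url, req.get_method())
```

Calling `ask(...)` against your deployed app should return the same answer you see when you initialize the call in Bubble.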
It hits the API and shows the output here. If we now preview the app and search for the same question, we should get the same answer: it hits the API and gives a very good answer. Let's try another question. Again, it hits the back end and returns an answer. So this is a working prototype: upload a document to Flowise, save the embeddings to Pinecone, and then call the generated API from the Bubble interface. With that, thanks again for watching the video. If you have any questions or comments, feel free to reach out on Twitter or here on YouTube, and I hope to have even more content available soon. Thanks!