Connecting Agents

Transcript

In this chapter, we're going to learn how to connect the agent to our video call. Let's start by obtaining the OpenAI API key, which you can do by visiting platform.openai.com or you can use the link in the description to let them know you came from this video. After that, go ahead and log in or create an account. If it's your first account, you will most likely have to create an organization. As for the project, you can use the default project.

Now, go inside of your settings. Once you are in your settings, go inside of billing and confirm you have sufficient credit balance. 5 or 6 dollars will be more than enough for you to complete this project. And after that, go inside of API keys and click Create New Secret Key. I'm going to call this Meet AI, select the default project with all permissions.

Let's go ahead and copy our key. Inside of your IDE, make sure you are in the default project. Go inside of .env here and set OPENAI_API_KEY to be your newly copied key. The next step we have to do is set up ngrok to expose our localhost so webhook handlers can communicate with it. So head to ngrok.com and create an account, or you can simply use the link in the description to let them know you came from this video.

Once you create an account, you will see a welcome screen like this. In here, select your operating system and simply follow the installation method. And once you've added your token, go ahead and try running this command right here. In order to properly test it, first make sure you have your app running on localhost:3000. And after that, change the command to target, well, localhost:3000.

This will generate the following forwarding URL using the HTTPS protocol. So now when you paste that into your browser's address bar, you will be able to see your project's login screen, meaning that our project is now accessible through this URL right here. The only problem with this is that every time you run this command, a new URL will be generated. And that's quite difficult to work with. So what ngrok offers is a static domain.

If you can't find it here, you can click inside of domains here and then find it down here. And then you can find start a tunnel button and you can copy this command. So now you can go ahead and run the following, which basically adds your static URL here and change the port to 3000. So what happens now is every time you run this command you will have your own static URL. There we go, our app is accessible through that URL.

Keep in mind that we are not meant to access our app through that URL. You shouldn't use it to share your app with someone else. We will have a deployment for that, because using this URL will cause many things to fail. The only thing we need this for is webhooks. So what I want to do now is go inside of my package.json here and add a dev:webhook script, and I simply want it to run this.

This is the command I want to run: ngrok http and then your static URL. So now, every time you're starting your project, in one terminal simply type npm run dev, and in another, npm run dev:webhook. As simple as that. So now that we have that, let's go ahead and set up the Stream webhook handler.

So the first thing you're going to have to do is just copy your URL, your static domain URL, and go inside of the Stream Dashboard, specifically Video and Audio, and click on Overview. In here, go ahead and scroll down until you find Webhook URL and paste it here. Make sure it includes the proper protocol, HTTPS. So you can visit that here, right here. And change this so it ends with /api/webhook.

Make sure to not misspell it. And for the events you want to listen to, we won't be needing all of this, but feel free to leave all of them selected so it's easier to develop right now. Just make sure you have all of them selected regarding call. And after that, make sure you click Submit. Perfect.

So now let's go ahead and actually build the endpoint. We're going to do it in the place we just added, /api/webhook. So inside of the src/app/api folder, create a new folder called webhook. And inside of here, route.ts. Inside of this route, let's prepare by adding the following imports from @stream-io/node-sdk.

We are going to use all of them except MessageNewEvent for now. So these are the events we are interested in for this chapter. Now let's go ahead and also add and, eq, and not from drizzle-orm, and let's also import NextRequest and NextResponse from next/server. Then we are going to import our database. We are going to import agents and meetings from the database schema.

And we are going to import streamVideo from lib/stream-video. What we have to do now is develop a method to verify the signature of whoever is trying to access this webhook, because this will not be protected via our auth. Instead, it will be protected via a signature. So we are adding a function called verifySignatureWithSDK, which accepts the body and the signature and calls streamVideo.verifyWebhook using the body and the signature. Now let's export an async function POST that takes the request as a NextRequest. And what we have to do now is obtain two headers, the signature and the API key, using the request's headers.get with their respective header names.

If you want to, you can also use the next/headers import. That is just in case you're using some newer version and this becomes outdated, even though it shouldn't, because this is the standard web Request API. So if you want to, this will also work: you can await headers and then do headers.get the same way, just in case you were interested. Now let's go ahead and check if any of these are missing, and if they are, return an error.

Now, in order to verify our signature, we have to turn the body of this request into a string. We can do that by using await req.text(). And then we can do a final check. If verifySignatureWithSDK, called with our body and the signature, failed, meaning we put an exclamation point in front of it, we send back an error meaning you don't have access to this endpoint. Now let's define our payload to be unknown and let's attempt to parse the string which we just created here.
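The checks described so far can be sketched as one small standalone function. This is only a sketch under assumptions: the header names x-signature and x-api-key, the HeaderSource interface, the checkWebhookRequest name, and the status codes are illustrative and not confirmed by the chapter. The verifier is passed in as a parameter so the example stays self-contained; in the route it would be the verifySignatureWithSDK function described above.

```typescript
// Minimal shape of something we can read headers from
// (a request's headers object fits this shape).
interface HeaderSource {
  get(name: string): string | null;
}

type CheckResult =
  | { ok: true }
  | { ok: false; status: number; error: string };

// Hypothetical helper: mirrors the order of checks described in the chapter.
function checkWebhookRequest(
  headers: HeaderSource,
  body: string,
  verify: (body: string, signature: string) => boolean,
): CheckResult {
  // Assumed header names; confirm them against the Stream documentation.
  const signature = headers.get("x-signature");
  const apiKey = headers.get("x-api-key");

  if (!signature || !apiKey) {
    return { ok: false, status: 400, error: "Missing signature or API key" };
  }

  // The signature is checked against the raw body string, not parsed JSON.
  if (!verify(body, signature)) {
    return { ok: false, status: 401, error: "Invalid signature" };
  }

  return { ok: true };
}
```

Keeping the body as a raw string until after verification matters, because re-serializing parsed JSON may not reproduce the exact bytes that were signed.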

So we are attempting to parse it. And if we fail to parse it, it means it's not valid JSON, so we can't work with it. And then, after we successfully parse it, let's prepare a constant, eventType. We have to cast the payload to an object type and then simply extract .type from it.
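As a rough illustration of the parse-then-extract step, here is a self-contained sketch. The function name parseEventType and the ParsedBody result shape are made up for this example; the route in the chapter does the same work inline and returns an error response when parsing fails.

```typescript
type ParsedBody =
  | { valid: false }
  | { valid: true; eventType: string | undefined };

// Hypothetical helper: parse the raw body and pull out the event type.
function parseEventType(body: string): ParsedBody {
  let payload: unknown;
  try {
    payload = JSON.parse(body);
  } catch {
    // Not valid JSON: the route would answer with an error here.
    return { valid: false };
  }

  // JSON.parse can also yield numbers, strings, or null, so guard before casting.
  if (typeof payload !== "object" || payload === null) {
    return { valid: true, eventType: undefined };
  }

  const type = (payload as Record<string, unknown>).type;
  return { valid: true, eventType: typeof type === "string" ? type : undefined };
}
```

Typing the payload as unknown first forces an explicit check before any field is read, which is the point of the cast mentioned above.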

And now we can finally check if the event type is one of the events that we need. The first one will be call.session_started. But before you do any development in here, what I want you to do is go outside of this if clause and simply return a success message at the end. Because otherwise, all other events which were not handled would end in an error. And that would eventually cause Stream to stop sending our webhook any events.
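The shape of that if/else chain with a success result at the end can be sketched like this. It is a simplified stand-in, not the route itself: the handleEvent name and the actions array exist only so the example can run on its own, and the event type strings are written as I understand Stream sends them, so treat them as assumptions.

```typescript
// Hypothetical stand-in for the route body: records what it would do
// and always ends with a success result.
function handleEvent(
  eventType: string | undefined,
  actions: string[],
): { status: string } {
  if (eventType === "call.session_started") {
    actions.push("connect agent");
  } else if (eventType === "call.session_participant_left") {
    actions.push("end call");
  }

  // Reached for handled AND unhandled events, so unknown event types
  // still receive a success response instead of an error.
  return { status: "ok" };
}
```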

So make sure to add that. Great. Now let's go to our call.session_started branch here and add two things. The first one will be the event, which is the payload with a type of CallSessionStartedEvent, which we imported from @stream-io/node-sdk. And then we read the meeting ID from event.call.custom?.meetingId.

Make sure to put a question mark here. So what is custom? If you remember, in our meetings procedures, when we do the create, we do await call.create and we add the custom fields there, meetingId and meetingName. So that's how we access the meeting ID here, even though it's a webhook with no meeting information. That's why we needed it. And we can actually remove this TODO comment now.

We fixed that. Now in here, we have to check if we don't have a meeting ID, because then we can just return the error. What we have to do now is find an existing meeting under certain conditions, first of all, a matching meeting ID. So existing meeting is an awaited database select from meetings, filtered by the matching meeting ID. But also, we have to ensure that its status is not completed or active.

And we can also add that it's not canceled. So under no circumstances should we find a meeting in one of those states; the only one we should actually look for is upcoming. So if you want to, you can actually use eq on meetings.status with upcoming. Or you can simply exclude the other values if you want to be explicit. Great.

Now let's go ahead and return an error here if that kind of meeting was not found. And immediately after that, I want to update this meeting to a status of active, just in case this fires twice. I found some rare cases where that can happen, especially if the webhook errored and was then retried. So it's important to update the meeting status to active as soon as possible, because what we're going to do next is connect the agent. If this event accidentally fires multiple times, it will connect multiple agents, which is not good, right? That's why we immediately set it to active: even if it fires again, the lookup will fail, because in here we explicitly say that the status must not be active. And you can also add not processing here. There we go.
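To see why flipping the status right away protects against a duplicate event, here is an in-memory sketch. It is not the Drizzle query from the chapter: the Meeting shape, the claimMeeting name, and the array standing in for the table are all assumptions for illustration, and the status names are taken from how they are spoken in this chapter, so the spelling in your schema may differ.

```typescript
type MeetingStatus = "upcoming" | "active" | "completed" | "processing" | "canceled";

interface Meeting {
  id: string;
  agentId: string;
  status: MeetingStatus;
}

// Hypothetical stand-in for "select the meeting, then update it to active".
function claimMeeting(meetings: Meeting[], meetingId: string): Meeting | undefined {
  const blocked: MeetingStatus[] = ["completed", "active", "canceled", "processing"];

  for (const meeting of meetings) {
    // Same filter as in the chapter: matching ID and none of the blocked states.
    if (meeting.id === meetingId && blocked.indexOf(meeting.status) === -1) {
      // Flip to active immediately so a repeated event no longer matches.
      meeting.status = "active";
      return meeting;
    }
  }
  return undefined;
}
```

Calling claimMeeting twice for the same meeting returns the meeting the first time and undefined the second time, which is the behavior that stops a second agent from being connected.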

So now we explicitly list all states that we don't want to match, or you can do the reverse and just target the upcoming ones. Great. So now what we want to do is find the existing agent for this meeting that we just updated. So existing agent is a select from agents where the agent's ID matches existingMeeting.agentId. If such an agent does not exist, we return an error.

And now we can connect to the Stream video call using the exact same method as we did in our create procedure here. So now that is the same instance. And before we go any further, we have to install a package for the OpenAI agent to connect. So I'm just going to shut this down and I will run npm install @stream-io/openai-realtime-api --legacy-peer-deps. And I will do npm run dev here, and just make sure you have your webhook running here.

And then inside of my package.json, I will simply show you all of my Stream versions, just in case yours keep failing and you don't know why. It might be due to newer versions. So you can always use the exact same versions as I do if you prefer doing so. Let's go ahead now and actually connect the agent. We're going to do that by adding a realtime client, which is await streamVideo.video.connectOpenAi. Pass it the call instance, and then pass it the OpenAI API key, which is process.env.OPENAI_API_KEY.

Always double check this by copying the name from your .env and pasting it here. The agent user ID will be existingAgent.id right here. Perfect. So now let's also update the session for this realtime client. Just be very, very careful here: this is not yet typed, so you can add anything here and it will not throw errors. So be very careful to spell this correctly. Call updateSession with instructions set to existingAgent.instructions. Like that.
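Since the session update is untyped, one possible way to guard against typos is a thin typed wrapper. This is a sketch and not part of the chapter's code: the SessionClient interface is only an assumed minimal shape of the realtime client, and SessionUpdate and applyInstructions are hypothetical names.

```typescript
// Only the fields this project sets; a misspelled key in an object literal
// passed as SessionUpdate is rejected by the compiler.
interface SessionUpdate {
  instructions: string;
}

// Assumed minimal shape of the realtime client: just the one method used here,
// accepting anything, which is what makes typos easy in the first place.
interface SessionClient {
  updateSession(update: Record<string, unknown>): unknown;
}

// Hypothetical wrapper: the only place where the untyped call happens.
function applyInstructions(client: SessionClient, update: SessionUpdate): void {
  client.updateSession({ instructions: update.instructions });
}
```

With this in place, writing applyInstructions(client, { instructons: "..." }) fails at compile time, while the same typo passed directly to the untyped method would go unnoticed.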

And that is it. That's all we need for connecting the agent to our call. But before we test it out, let's add one more event here. So let's do else if the event type is call.session_participant_left. So if anyone leaves the call, first of all, let's establish the event to be the payload typed as CallSessionParticipantLeftEvent, which we imported, and let's get the meeting ID.

The only problem is that in here we need to use a special way to obtain the meeting ID. We can't do it the same way as above, because there is no call object on this event; this is a participant event, right? So inside of the payload here, you can see you have call_cid, but you don't have the call ID. So you have to access the meeting ID like this. And if you're wondering why that format, it's because call_cid is formatted as the type, then a colon, and then the ID. That's why we need it like that. But in case we cannot find the meeting ID, return an error. And then let's go ahead and connect to the call again, just as we did above.
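Pulling the ID out of a value formatted as type, colon, ID can be sketched as below. The function name meetingIdFromCallCid is made up for this example, and it is a slightly more defensive variant than splitting on the colon and taking the second part, since it also handles a missing colon or an empty ID.

```typescript
// Hypothetical helper: "type:id" in, "id" out (or undefined if malformed).
function meetingIdFromCallCid(callCid: string): string | undefined {
  const separator = callCid.indexOf(":");
  if (separator === -1) {
    return undefined; // no colon, so there is no ID part
  }

  const id = callCid.slice(separator + 1);
  return id.length > 0 ? id : undefined;
}
```

For example, a value such as "default:abc123" yields "abc123", and the route can return its error whenever the helper returns undefined.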

And what we are going to do is end the call. The reason we are doing this is as a fail-safe, just in case, so we don't leave our agents hanging in the call too long, increasing your usage on Stream. So I think we are now ready to test this out. If you want to, you can comment out these three events so you don't have any errors here. And make sure you have your localhost running, and make sure you have your ngrok running.

And now what I highly suggest you do is, first of all, clear all of your meetings. So I'm going inside of my Neon tables, into meetings, and I have deleted all of my meetings here. Then make sure to create new meetings. You can ignore the fact that one of mine is active; I was testing it out. So let's go ahead and create a new one, Math Consultations. I created an agent, Math Tutor, and I will click create here. And just to show you what instructions this new agent of mine has: I just added "You're a helpful math assistant." That's it. And now I will click start meeting here.

And then I'm going to click join call. And this will fire an event. And as you can see, Math Tutor has joined the call. And you can see it immediately starts recording. Now I'm going to ask it a question.

What is 1 plus 1? Now I don't know if you're hearing the answer, but it's actually telling me the answer. Perfect. So now what I'm going to do is I'm going to go inside of my Neon console and just refresh my meetings. And this should now have a status of active.

And it does. Perfect. And now what we can do here is just end this call and go back to meetings here. And you can see it still says active, and that's fine because we don't yet have an event which will turn it into processing or completed. So it's completely fine that it still says active.

And in here, you should now see this kind of screen here. Amazing, amazing job. So that's exactly what I wanted us to do in this chapter. And we're going to leave the rest in the next chapter when we actually add some background jobs to process what we just had. So let's go ahead and merge this now.

So I'm going to go ahead and create a new branch, 23 connecting agents. And I will stage all of my changes. 23 connecting agents. Let's commit. And let's publish the branch.

And now let's go ahead and let's review our code. I am just focused on any potential issues or bugs. And let's look at this chapter's summary. So we introduced some new features, including a webhook endpoint to handle video call events, enabling real-time updates to meeting status and agent integration. And we also added a development script for webhook testing and updated dependencies to support real-time OpenAI integration.

In here, we have the sequence diagram explaining exactly what happens in our webhook. So once the endpoint is hit, we validate the headers and the signature, and parse the JSON body and event type. And in the call session started event, we find the meeting ID, fetch the agent, and then we connect that agent to the call and initialize the realtime client with the agent's instructions. And when the participant leaves, we end the call session. Perfect.

In here, it actually recommends using an environment variable instead of hard-coded text. And honestly, I didn't even know you can do this. So if you want to, feel free to change it to an environment variable like this. You learn new things every day. In here it suggests wrapping our webhooks inside of a try and catch.

We might look into that after we wrap up the entire webhook event. And in here it suggests already updating the meeting. So it's very aware of our code base. It knows that we expect the status completed and an ended-at value at some point. So it's super impressive how it knows our schema, but this is not the place to do that.

Completed is a status that will be handled differently. In here there's another try-catch suggestion, and I'm satisfied with our code, so let's go ahead and merge it. Once you've merged it, go back into your main branch and go ahead and hit synchronize changes. And then you should see the new branch 23 merged into your project. Amazing, amazing job, and see you in the next chapter.
