In this chapter, we're going to connect the seeded voices to the UI and set up an external text-to-speech API client. We're going to build a voice selector with a dropdown, which is going to communicate with a tRPC voices router that we are also going to create. Each voice is going to have an avatar, and we're going to have grouped sections: one for team voices, which we are going to create ourselves (and when I say we, I mean users of the app), and a separate section for built-in voices, which is a user-friendly way of saying seeded voices, right?
The ones we created in the previous chapter. We're also going to learn how to self-host the actual text-to-speech client. We will use Chatterbox text-to-speech along with a Modal serverless GPU, and that will use a FastAPI Python server with TypeScript types generated using OpenAPI. By the end, the settings panel right here will have a working voice selector and we will be able to actually try out our self-hosted text-to-speech client. So, let's go ahead and start with building the voices router.
So these are going to be the key concepts. The tRPC voices router, because it's going to be the first custom router that we're building within tRPC. We're going to learn how to create a context for voices, because we're going to have to combine both team voices and built-in voices. We're going to learn how to use a DiceBear collection for custom avatars. And we're going to show, in a real-world example, how to do the server prefetching, which we kind of explained in the last chapter, how to build an OpenAPI-typed client using FastAPI in Python, and how to host the Chatterbox text-to-speech model.
So let's start with the voices router using tRPC and database querying, followed by creating the UI which will render that, and finally generating the Chatterbox client. All right, so make sure you are on your main branch and make sure you have npm run dev running. This is the last thing we did, on localhost:3000/test, where we built a tRPC test page. So the first thing we can actually do is get rid of that. Let me zoom in just a bit. We're going to go inside of dashboard and remove the entire test page. We can also go inside of trpc, routers, _app, and remove the entire thing from here too.
And instead, what we can do is add a voices key and point it to the voices router. We can remove the base procedure import, and I'm just going to fix this import to be leaner. There we go. The voices router currently doesn't exist. That's fine.
Let's go ahead and change this to go back to localhost:3000 so we can see our home page. And now we're going to actually develop the voices router. So we're going to go inside of routers here and create a new file, voices.ts. Let's go ahead and add the imports which we are going to need: z from Zod and TRPCError from @trpc/server.
This is a library, not our internal tRPC server. You can notice a subtle difference. When we are referring to our own trpc folder and its server file, we use another slash here, so @/trpc/server. See, but that throws an error here. So @trpc/server is a package and @/trpc/server is our local tRPC server. Then prisma from our database module, and the deleteAudio helper from our recently created R2 module.
Alright, then let's go ahead and simply export const voicesRouter using createTRPCRouter, which I have seemingly forgotten to import. The most important thing. And let's also import the organization procedure, which is another thing we don't really have yet, so we're going to have to add that. Okay. So I'm going to go back to _app and import voicesRouter from ./voices.
So the auth procedure, the organization procedure, and the base procedure are something we were supposed to do in the tRPC chapter, but I have seemingly forgotten to do that, so let's go ahead and do it now. Perhaps this is better, because during the tRPC setup there was too much information at once, so maybe it's better that we do it now. Let's go and set up the tRPC init file. In here we have something called a base procedure. A base procedure means a public procedure.
Everyone can access it and everyone can query it. So now we're going to do the following. I'm going to clear up my tRPC context, for a simple reason: I don't need this. I'm just going to make it be this.
In fact, we can make it even easier. We can just make it return an empty object. You can leave it like this if you want to, simply if you want to keep this documentation, because we copied this from the tRPC docs.
So basically, this is what it was. If you want to, you can just return an empty object. This won't change the behavior at all, because we're not using this context for anything. So perhaps I should answer: why not implement auth here? Well, for a simple reason. In order to check whether a user is logged in, we need to call Clerk's await auth().
And it makes no sense to do that in here, because this context is run for every single procedure, right? So it makes no sense to add that overhead to both base procedures and the auth procedure. Additionally, we are going to need the organization ID too, so we would be adding even more overhead to load that. Because of that, context is not something we are interested in right now, so you can just make it return an empty object. Then let's go down here. So this is the base procedure, and using the base procedure we're going to learn how to build an authenticated procedure, which will call auth only when needed. Let's export const authProcedure, which is t.procedure.use with an async function. Let's open the function and extract next from the props. It's important that this eventually returns next.
You can see that that fixes all of the issues. This is basically a middleware. A middleware always needs to call next, right? A middleware isn't the end of the request. And from here, we're going to extract the user ID and the organization ID from await auth(), which we can import from @clerk/nextjs/server, and then we're just going to do some checks.
If there is no user ID, we're going to throw a TRPCError. We can import that from @trpc/server. Again, this is the package, right? And we throw it with a code of UNAUTHORIZED. And then we're just going to extend the context with the user ID, which needs to exist, and the organization ID, which doesn't need to exist, but still it's useful to have, at least, if it's there.
I mean, if you want to be strict, you can do this, and then you will never ever rely on the organization ID when using the auth procedure. Instead, if you ever need the organization, you will build an organization procedure, which requires both the user ID and the organization ID. It is very similar. You go ahead and export const orgProcedure, which calls t.procedure.use with an async function that extracts next.
Go ahead and immediately return next so you fix the errors. And in here, we're going to again extract the user ID and the organization ID. We're going to do a check for a missing user ID, but we're also going to do a check for a missing organization ID. In this case, we are forbidden, not unauthorized, but forbidden from moving forward.
So, organization required. Because if a user doesn't have an organization, then even though they are logged in, how do we know which organization they belong to? That's why we need that. And let's go ahead and extend the context object. There we go.
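To make the difference between the two procedures concrete, here is a minimal, dependency-free sketch of just the two checks. It is not the tRPC middleware itself: ProcedureError is a hypothetical stand-in for TRPCError, Session stands in for what Clerk's auth() resolves to (userId and orgId), and authGuard and orgGuard are illustrative names. In the real init file the same checks would sit inside t.procedure.use and finish by returning next with the extended context.

```typescript
// Hypothetical stand-in for the value resolved by Clerk's auth().
type Session = { userId: string | null; orgId: string | null };

// Hypothetical stand-in for TRPCError, carrying only the error code.
class ProcedureError extends Error {
  code: "UNAUTHORIZED" | "FORBIDDEN";

  constructor(code: "UNAUTHORIZED" | "FORBIDDEN", message?: string) {
    super(message ?? code);
    this.code = code;
  }
}

// Auth guard: only a signed-in user is required; orgId is passed through as-is.
function authGuard(session: Session): { userId: string; orgId: string | null } {
  if (!session.userId) throw new ProcedureError("UNAUTHORIZED");
  return { userId: session.userId, orgId: session.orgId };
}

// Organization guard: both values are required, so downstream code never sees null.
function orgGuard(session: Session): { userId: string; orgId: string } {
  if (!session.userId) throw new ProcedureError("UNAUTHORIZED");
  if (!session.orgId) throw new ProcedureError("FORBIDDEN", "Organization required");
  return { userId: session.userId, orgId: session.orgId };
}
```

The useful part is the return type of orgGuard: because orgId is narrowed to string there, any procedure built on top of it can use the organization ID without another null check.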
So that's what we wanted to do before. And while we are here, because I'm certain I will forget, we're going to enable transformers. Transformers basically allow us to pass data, specifically Date, Map, and Set, over the wire between server and client. It's very hard to explain what it is until you see it in action. But the problem is, if you don't enable these data transformers, you will start to see some errors when you pass props from a server component to a client component.
Most of the time, this doesn't really matter to us because of the way we are architecting our app, but still, it doesn't hurt us to enable this. So that's exactly what I'm going to do. Let's go inside of the init file and enable SuperJSON. Let's go ahead and do npm install superjson.
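Here is a small sketch of the problem transformers address. It uses only the built-in JSON functions and no SuperJSON at all, and the payload shape and the roundTrip helper are made up for illustration. It shows what happens to a Date and a Set when they go through a plain JSON round trip, which is roughly what crossing the wire without a transformer amounts to.

```typescript
// Hypothetical payload a server might want to send to the client.
const payload = {
  createdAt: new Date("2024-01-01T00:00:00.000Z"),
  tags: new Set(["narration", "calm"]),
};

// Plain JSON round trip: serialize on one side, parse on the other.
function roundTrip(value: unknown): any {
  return JSON.parse(JSON.stringify(value));
}

const received = roundTrip(payload);

// The Date arrives as an ISO string, and the Set arrives as an empty object.
console.log(typeof received.createdAt, received.createdAt);
console.log(received.tags instanceof Set, Object.keys(received.tags).length);
```

A data transformer exists to carry the extra type information along so that values like these can be rebuilt on the receiving side.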
While we are here, we can save the file even though it shows an error. Let's go inside of client.tsx and check if there is somewhere here where we should enable SuperJSON. Here it is. So this is the tRPC client. Let's remove this part and enable transformer: superjson.
Let's go inside of server.tsx, so the tRPC server, and do the same thing. Let's check if there is somewhere here where we need SuperJSON. Looks like we don't need it here, so let's check the query client file. Here we need it.
So let's go ahead and uncomment serializeData, uncomment deserializeData, and uncomment the import of superjson from superjson. Then let's go back to client.tsx and add an import of superjson from superjson. Let me just check that it's a named import. It is. So superjson from superjson should fix all errors.
And last but not least, the init file. I'm going to import superjson from superjson. So, seemingly right now, you won't notice any difference, but I will try to capture a moment where it might be easy to understand why we need this. Again, this is set and forget. You will never modify this again.
That's why I'm not paying too much attention to it. I'm not trying to dismiss it as unimportant. I just don't want you to be intimidated if you don't immediately understand what it is. Okay, now that we have that ready, let's go back to what we attempted to build, the voices router. So regarding the voices router, the last thing we did is we connected it here, right?
In the app router, make sure you have voices set to voicesRouter. And now let's go back here, and now we can import the organization procedure from the init file. Perfect. And what that allows us to do is use the organization procedure for the various procedures that we need to write. For example, the first procedure I want to write is getAll.
That's going to be an organization-level procedure. Let's define the input for getAll. So we're going to use Zod and define the params. It will accept query. Query will be a string that is trimmed and optional.
And the entire params object is optional too. So if you don't want to pass query, you don't have to pass an empty object. And then let's go ahead and define this query. So, an asynchronous function. Let's extract context and input from here.
And now let's use this optional query to check if we should enable a search filter on our Prisma database query. So if we have input?.query, let's do one thing. Otherwise, let's do another thing. In the important branch, let's write a Prisma filter by adding OR and then two options. So if we have a query, meaning if we have a search, let's search by name in the first element.
So contains: input.query, mode: insensitive as const. We are using a type assertion here because, as you've probably noticed, we are writing a Prisma filter here, but we didn't really define a type for it. So you can see this doesn't throw an error, but it should. Without as const, TypeScript widens insensitive to a plain string, which Prisma's filter type would reject. So we are using as const to stop it from erroring when we eventually pass it down to a Prisma query.
Alright, so yeah, you should be careful with your typing here. So name contains input.query, mode insensitive as const. And in the second element, we're going to query the description. So we will try to find the user's search either in the voice name or in the voice description. So again, contains input.query and mode insensitive.
So this way we are searching through voice names and their descriptions. Great. And yes, the else can be empty, just an empty object. And then in here, we're going to load custom voices and system voices separately. We can do that by calling prisma.voice.findMany within a Promise.all.
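The filter described above can be sketched as a small pure function. This is a dependency-free approximation under a few assumptions: TextMatch and SearchFilter are hand-written types that mimic the shape of a Prisma string filter, buildSearchFilter is an illustrative name, and the manual trim stands in for what the Zod schema does in the real router.

```typescript
// Hand-written approximation of a Prisma string filter (assumption, not Prisma's own type).
type TextMatch = { contains: string; mode: "insensitive" };

// Either an OR over name and description, or an empty filter that matches everything.
type SearchFilter = { OR?: [{ name: TextMatch }, { description: TextMatch }] };

// Both the input object and its query are optional, mirroring the optional params.
function buildSearchFilter(input?: { query?: string }): SearchFilter {
  const query = input?.query?.trim();
  if (!query) return {};

  // The TextMatch annotation keeps mode as the literal "insensitive",
  // which is the job `as const` does on an untyped object literal.
  const match: TextMatch = { contains: query, mode: "insensitive" };
  return { OR: [{ name: match }, { description: match }] };
}

console.log(buildSearchFilter());
console.log(JSON.stringify(buildSearchFilter({ query: "  deep  " })));
```

Because the result is always an object, it can be spread into both where clauses, which is how the same search ends up applied to the custom and the system query without writing it twice.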
So, since it expects two results, we need to have two prisma.voice.findMany calls. I'm going to duplicate them, and in the first one I'm going to open a where, and in this where I'm going to search for a specific variant, custom, and only custom voices whose organization matches the currently logged-in user's organization ID. What is the organization ID on the context? How did I get that? Well, let's take a look.
Where does the context come from? It comes here from .query. How is this populated with user ID and organization ID? Because of the organization procedure. If I change this to base procedure, you can see down here we have an error.
Organization ID isn't guaranteed here. That's why we invented the Organization procedure. It extends the base procedure and safely appends UserID and OrganizationID. That's why we needed to do that. We are fetching all custom voices made by this organization and we're simply going to spread search filter.
If you wrote anything incorrectly within the search filter, you should get errors here. I think we can try it out if I do something like this. Maybe not. Okay. Not my proudest code, but yeah, it's a way to not write the same search filter twice.
Perhaps it would be safer to write it out twice, because this isn't throwing me any errors. So we'll see. And what we have to do is give it an orderBy. So after where, let's do orderBy createdAt, so the latest ones. And let's use an explicit select here to only choose what we want to send to the front end. So I want to send the id, name, description, category, language, and variant.
So the reason I'm using select here is because I don't want to send the R2 object key, for example. Now, technically you could use omit and then just set the R2 object key to true. You could do that, but I think it depends on your safety threshold. Imagine that alongside the R2 object key you add another super important, super private field here. You will have to remember to update all of your omit lists, right? So select is more explicit than omit, and it's a matter of your privacy and security threshold. So make sure you have just these fields. And then in the second query we're going to have a similar where, but it's going to query by system, right?
In the first one we are querying by custom, which requires adding an organization ID, but in here we're just querying all built-in voices. Let's add an orderBy here and the exact same select. Now let me see what I did wrong here. Await Promise.all, which was supposed to receive an array. My bad.
So open an array within Promise.all. You can see the square bracket which I've just added, and close it here. And now there should be no errors. Perhaps I can retry this now, if I change this.
Okay, still nothing is telling me that it's wrong. So yeah, okay, I'm just going to leave it like this and we're simply going to see whether search works or not. Excellent. So now you can see we have custom and we have system. So we are separating them on the back end.
So we don't have to do any filtering on the front end, and we can just return them like this: custom and system. If you go ahead and run npx prisma studio, you will see that all of the voices we added in the previous chapter have, where is it, a variant of system. So these are all system voices.
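As a rough model of what getAll does, here is a sketch that runs against an in-memory array instead of Prisma. Everything in it is illustrative: the rows, the uppercase variant values, the reduced field list, the orgId field name, and the findVoices helper are assumptions, and the newest-first sort is my reading of "the latest ones". What it does mirror from the walkthrough is the two queries running in parallel, the organization check on custom voices only, and the explicit select that keeps the R2 object key on the server.

```typescript
type Variant = "SYSTEM" | "CUSTOM";

type VoiceRow = {
  id: string;
  name: string;
  variant: Variant;
  orgId: string | null;
  r2ObjectKey: string | null; // private: should never reach the client
  createdAt: Date;
};

// The explicit "select": only these fields are sent to the front end.
type VoiceListItem = Pick<VoiceRow, "id" | "name" | "variant">;

// Illustrative data standing in for the voices table.
const rows: VoiceRow[] = [
  { id: "v1", name: "Aria", variant: "SYSTEM", orgId: null, r2ObjectKey: "system/aria.wav", createdAt: new Date("2024-01-01") },
  { id: "v2", name: "Team Narrator", variant: "CUSTOM", orgId: "org_1", r2ObjectKey: "org_1/narrator.wav", createdAt: new Date("2024-02-01") },
  { id: "v3", name: "Other Team", variant: "CUSTOM", orgId: "org_2", r2ObjectKey: "org_2/other.wav", createdAt: new Date("2024-03-01") },
];

// Stand-in for a findMany call with where, orderBy, and select.
async function findVoices(where: (row: VoiceRow) => boolean): Promise<VoiceListItem[]> {
  return rows
    .filter(where)
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .map(({ id, name, variant }) => ({ id, name, variant }));
}

// Both lookups run in parallel and come back already separated.
async function getAll(orgId: string): Promise<{ custom: VoiceListItem[]; system: VoiceListItem[] }> {
  const [custom, system] = await Promise.all([
    findVoices((row) => row.variant === "CUSTOM" && row.orgId === orgId),
    findVoices((row) => row.variant === "SYSTEM"),
  ]);
  return { custom, system };
}

getAll("org_1").then((result) => console.log(result));
```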
So currently, custom is always going to be empty because we don't have the functionality to create new voices yet. But that's what we're doing. We're separating them on the back end so we don't have to do massive, slow filtering on the front end, just for an improved user experience. Now let's go ahead and add another procedure here. This is going to be a delete procedure.
Again, we're going to use the organization procedure, because only users from an organization should be able to remove something. The input it will accept is an ID, and this is not going to be a query, this is going to be a mutation. Okay, so what do we do inside of the mutation? Well, first things first, we need to find this voice using the ID. We need to check if it exists.
And more importantly, we need to check that the ID that was just passed belongs to a voice with a variant of custom, because we don't want to allow a user to delete a system voice. So only if the voice the user attempts to delete truly exists in the database with that ID, a variant of custom, and an organization ID matching the currently logged-in user's organization, only then shall we proceed. And if that is not the case, we're going to immediately throw an error. Technically, we could have called Prisma delete directly here, but I like to have explicit errors such as voice not found. Maybe there's even a way to do that with delete, I don't know. But now we can safely do this, which is just a simpler delete query. What we should do now is also clean up our R2 storage. So if we do have voice.r2ObjectKey, let's await deleteAudio with voice.r2ObjectKey and simply catch any errors here.
So obviously this is something that could be improved, in that we're not really doing anything if this fails. So if this scales, and I hope you manage to scale this to millions of users, there's a good chance you will have some orphaned R2 uploads, because eventually this will start to fail, right? One-in-a-million failures start to add up when you have millions of users. So because of that, what I would recommend doing here in the future, when you go to production, is to put this in some kind of background job with automatic retries, or at least have some cron job scanning for orphaned R2 uploads. That way you don't have files filling your storage that are already deleted from the database.
But for now, for tutorial purposes, this is more than okay. And once that is deleted, let's go ahead and return. If you want, you can add a comment: in production, consider background jobs, retries, cron jobs, et cetera. Brilliant, so we have now added two procedures. One to get all voices, and one to delete a custom voice.
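The order of operations in the delete mutation can be sketched without Prisma or R2. In this sketch the Map stands in for the voices table, the deleteAudio parameter stands in for the R2 helper, the orgId field name and uppercase variant values are assumptions, and the success return value is a guess, since the walkthrough only says that it returns.

```typescript
type Voice = {
  id: string;
  variant: "SYSTEM" | "CUSTOM";
  orgId: string | null;
  r2ObjectKey: string | null;
};

async function deleteVoice(
  voices: Map<string, Voice>, // stand-in for the voices table
  deleteAudio: (key: string) => Promise<void>, // stand-in for the R2 helper
  input: { id: string; orgId: string },
): Promise<{ success: true }> {
  // Only a custom voice owned by the caller's organization may be deleted.
  const voice = voices.get(input.id);
  if (!voice || voice.variant !== "CUSTOM" || voice.orgId !== input.orgId) {
    throw new Error("Voice not found");
  }

  voices.delete(voice.id);

  // Best-effort storage cleanup: a failure here must not fail the mutation.
  if (voice.r2ObjectKey) {
    await deleteAudio(voice.r2ObjectKey).catch(() => {
      // In production, consider background jobs, retries, or a cron job for orphans.
    });
  }

  return { success: true };
}
```

Note that a system voice and another organization's voice both produce the same not found error, so the response does not reveal whether an ID exists outside the caller's organization.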
Great. Now, I want to go ahead and build the UI for this. So I want to be able to fetch the voices and display them in a voice selector. In order to do that, we need to install @dicebear/collection and @dicebear/core. These are two packages used for a very simple task, creating unique avatars, which is just a very nice way of displaying our voices, because otherwise it just doesn't look as good.
Now I'm going to go inside of src, components, and create a folder called VoiceAvatar. In here I'm going to create useVoiceAvatar.ts and add the whole thing. So we need useMemo from React, createAvatar from @dicebear/core, and glass from @dicebear/collection. If you search for DiceBear, you will find a bunch of collections you can use (be mindful of their licenses, of course), and you can customize these avatars as you wish. I find the glass one to be perfectly fine for this project.
Now that we have this reusable hook to generate the voice avatar, let's create the actual component, voiceAvatar.tsx. Let's mark it with use client. Let's import our cn util, which helps us with class names. And let's import our recently created useVoiceAvatar from ./useVoiceAvatar. Let's define the props, which are a required seed, a required name, and an optional className.
Then let's export function VoiceAvatar, which has the seed, name, and className props as we defined above. In here, the first thing we want to do is generate the avatar using our useVoiceAvatar hook, passing in the seed. This will return us the avatar URL. And fun fact, this will generate an actual SVG. It will not actually make a network request to some URL. Right?
It's just that this is technically a URL. I think it's just a base64-generated URL. Okay, let's go ahead and return, and let's build the simple avatar component. So the Avatar component is going to have a class name of size-4, border-white, and shadow-xs. Let's fix the typo, and let's make this dynamic by also passing in className, which is a prop.
So if the user wants to modify this beyond what it currently is, they can do it using the className prop. Let's render the avatar URL in the AvatarImage component and give it an alt of name. And let's create a simple AvatarFallback with very small text in case the avatar URL fails for some reason. This will almost never happen, because we're using a generated URL instead of a network request, so I don't see how it can fail, but still it's nice to have a fallback which will simply display two letters. Great, okay. That's it for the voice avatar.
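To illustrate why no network request is involved, here is a dependency-free sketch that does not use DiceBear at all. It builds a tiny SVG from a seed and wraps it in a data URI, and it also shows a two-letter fallback. The hash, the colors, the function names, and the choice of URL-encoding for the data URI are all assumptions made for the example. The real hook delegates the drawing to createAvatar with the glass style.

```typescript
// Tiny deterministic hash so the same seed always yields the same hue (0-359).
function hueFromSeed(seed: string): number {
  let hash = 0;
  for (let i = 0; i < seed.length; i++) {
    hash = (hash * 31 + seed.charCodeAt(i)) % 360;
  }
  return hash;
}

// The whole image lives inside the URL itself, so rendering it needs no request.
function avatarDataUri(seed: string): string {
  const fill = `hsl(${hueFromSeed(seed)}, 70%, 60%)`;
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16">` +
    `<rect width="16" height="16" fill="${fill}"/></svg>`;
  return `data:image/svg+xml;utf8,${encodeURIComponent(svg)}`;
}

// Fallback text: the first two letters of the voice name, uppercased.
function fallbackInitials(name: string): string {
  return name.slice(0, 2).toUpperCase();
}

console.log(avatarDataUri("voice_123"));
console.log(fallbackInitials("Aria"));
```

Since the output depends only on the seed, the same voice gets the same avatar on every render, which is also why memoizing the result per seed is safe.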
Now what we have to do is build the text-to-speech voices context. We need this because it's how we're going to populate the voice selector. So let's go inside of features, text to speech, open a new folder called contexts, and inside of contexts create the text-to-speech voices context file (.tsx). In here I'm going to add use client. We're going to import createContext and useContext from React, and we're going to import the type inferRouterOutputs from @trpc/server. Again, from the package, not from our @/ alias, okay? From the package. And we're also going to import our AppRouter type from our tRPC routers _app.
So what AppRouter is, it's basically a single type definition of our entire API. You can see that in here we have voices and the voices router, and here we have the export for this. And in here you can see I have voices, and I have the getAll procedure. I see exactly what input it takes and exactly what output it provides.
And now we have to define the type of a singular text-to-speech voice item, which is basically a single voice with id, name, description, category, language, and variant. Since custom and system share the exact same select, we can choose which one of those we want to use. So I'm going to write the following: inferRouterOutputs, give it AppRouter, and then simply traverse through the router. So we are going into the voices router, the getAll procedure, and you can choose between custom or system.
They are the same. And simply index with number, which indicates any item in the array. And now when you hover over TTSVoiceItem, you should see the exact same item here. Now let's create the interface for this context. So the TTSVoicesContextValue will have customVoices, systemVoices, and allVoices, all sharing the same type, an array of TTSVoiceItem.
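The indexing trick is plain TypeScript and can be tried without tRPC. In this sketch RouterOutputs is written by hand and only approximates what inferRouterOutputs would produce for the app router. The field list is shortened, and the field types and example values are assumptions.

```typescript
// Hand-written stand-in for the inferred router output type.
type RouterOutputs = {
  voices: {
    getAll: {
      custom: Array<{ id: string; name: string; category: string; variant: "CUSTOM" | "SYSTEM" }>;
      system: Array<{ id: string; name: string; category: string; variant: "CUSTOM" | "SYSTEM" }>;
    };
  };
};

// Walk the type like a path; [number] means "the type of any element of the array".
type TTSVoiceItem = RouterOutputs["voices"]["getAll"]["custom"][number];

interface TTSVoicesContextValue {
  customVoices: TTSVoiceItem[];
  systemVoices: TTSVoiceItem[];
  allVoices: TTSVoiceItem[];
}

const aria: TTSVoiceItem = { id: "v1", name: "Aria", category: "GENERAL", variant: "SYSTEM" };

const value: TTSVoicesContextValue = {
  customVoices: [],
  systemVoices: [aria],
  allVoices: [aria],
};

console.log(value.allVoices.length);
```

The benefit of deriving the item type this way is that a change to the select in the router flows into the context and the selector automatically, with no second type to keep in sync.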
Then let's create the actual context using createContext, typing it as the value from above or null, with null as the initial value. Now that we have that, we can export function TTSVoicesProvider. This provider is going to have children and value props. The value is of type TTSVoicesContextValue. And very simply, let's return the provider.
So TTSVoicesContext, which we defined above, dot Provider, with a value of value from the props, and render the children inside. As simple as that. And then let's export one more thing: the useTTSVoices function. This hook will be used to access custom voices, system voices, and all voices in any component rendered inside of the children.
So let's get the context using useContext, passing the text-to-speech voices context. And in here, let's throw an error if the context doesn't exist. So if a component attempts to use this hook outside of this provider, we have to throw an error. This is standard practice when developing a context. You've probably seen this error yourself from various other packages.
For example, if we attempted to call useTRPC without its provider, it would throw the same kind of error. That's how context works. So this is the equivalent of how tRPC built their context and provider. Much simpler, of course, but I think you get what I'm trying to say here. All right.
And now we can build the actual voice selector. So I'm going to go inside of src here, inside of features, text to speech, components, and let's create voice selector .tsx. Let's add use client and import useStore from TanStack React Form. Let's import voice category labels from features, voices, data, voice categories.
So if you remember, it's basically a remapping of the voice category enum from the Prisma schema into user-friendly labels, fixing capitalization and adding spaces. I chose to do it manually rather than programmatically. Technically it could be done programmatically, but I just feel better doing it this way. And then let's import Field and FieldLabel from components, ui, field. Then every single thing from select, which is SelectContent, SelectGroup, SelectItem, SelectLabel, SelectSeparator, SelectTrigger, and SelectValue.
Then let's reuse our app form hook. If you remember, it allows us to share our form values between this component, this component, and now this component. That's why we need it, because they are three separate components that share the same form. So that's why we needed to develop it in this way, right? We created a context for that here. Then let's import our newly created voice avatar.
Then let's import the text-to-speech voices hook from our newly created contexts folder, so outside of components, inside of text to speech, contexts, the text-to-speech voices context. And besides that, we also need the text-to-speech form options, which we have in the same folder. So where the text-to-speech form is, it's right here. We are reusing this so we have the exact same form access in a completely separate component. Perfect. So, let's export this component, called VoiceSelector.
And immediately we can get custom voices, system voices, and all voices using the text-to-speech voices hook. The only thing we're going to do is alias allVoices to voices, for simplicity's sake. Then I'm going to define the form with useTypedAppFormContext and pass along the text-to-speech form options. Then I have to read the voice ID from the form values, because I have to somehow keep track of the currently selected voice ID. So I'm going to do that with the useStore selector hook.
So reading from form.store, I'm going to select s.values.voiceId, and that will give me the voice ID. And this is completely type safe, right? Because voiceId exists inside of the text-to-speech form options. That's why we are able to read it, and this is how we will get the current value of voiceId. And in the same way, let's add isSubmitting.
Then, let's see: does this voice ID, which we currently have in our form, actually exist? What does that mean? Well, say something in our form currently has a voiceId value of 123. What we have to check is whether this voice ID exists within custom voices, system voices, or simply all voices, right? Because if it doesn't, it can mean either that it's invalid, or that it existed before but has now been deleted.
So because of that, we need to carefully search for it. Right? Selected voice. Let's go through the alias for all voices, which is voices, and let's search for a single voice with a matching ID. Okay?
And now let's check if the selected voice is missing. How do we do that? Well, let's define hasMissingSelectedVoice, if either voice ID is missing or if we cannot find a selected voice. And then let's decide what to do.
OK, so what's the current voice? Well, it will depend on the following. If we have a selected voice, let's use the selected voice. As simple as that. Otherwise, if we deduced using these two rules that this is a missing selected voice, we're going to add a ternary here, and immediately we can fill in the other part. So this is the if and this is the else. The else is simply going to say, okay, just display the first item in the array. That's it. But in the missing case, what we're going to do is give this a name of Unavailable voice.
So if a glitch happens, we are going to tell the user something like: I don't know how you got this voice, but it is unavailable, I cannot find it in the array of items here. Okay, that's what we're doing. And now we can build the actual UI. So in here, we're going to do some form composition, using Field and FieldLabel, and call this Voice style.
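The fallback rules can be pulled out into a pure function to see all three outcomes side by side. This is a sketch under one stated assumption: "missing" is taken to mean that a voice ID is set in the form but no voice with that ID is in the list, which is the reading under which the first-item fallback is reachable. The narration is ambiguous on the exact condition, so treat that line as my interpretation. The function name and the placeholder's null category are illustrative.

```typescript
type Voice = { id: string; name: string; category: string | null };

function resolveCurrentVoice(voiceId: string, voices: Voice[]) {
  const selectedVoice = voices.find((voice) => voice.id === voiceId);

  // Assumption: an ID is set, but nothing in the list matches it.
  const hasMissingSelectedVoice = Boolean(voiceId) && !selectedVoice;

  const currentVoice: Voice | undefined = selectedVoice
    ? selectedVoice
    : hasMissingSelectedVoice
      ? { id: voiceId, name: "Unavailable voice", category: null } // placeholder entry
      : voices[0]; // nothing selected yet: show the first voice

  return { currentVoice, hasMissingSelectedVoice };
}

const voices: Voice[] = [
  { id: "v1", name: "Aria", category: "GENERAL" },
  { id: "v2", name: "Team Narrator", category: null },
];

console.log(resolveCurrentVoice("v2", voices).currentVoice?.name);
console.log(resolveCurrentVoice("deleted_id", voices).currentVoice?.name);
console.log(resolveCurrentVoice("", voices).currentVoice?.name);
```

Keeping the stale ID on the placeholder means the select still has an item whose value matches the form value, so the control can display something instead of appearing empty.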
Then we're going to render the Select component, give it a value of voiceId, and an onValueChange that takes the value and calls form.setFieldValue for the voiceId field with the new value. It's disabled if the form is submitting. Then we're going to add a SelectTrigger component with a class name of w-full, h-auto, gap-1, rounded-lg, bg-white, px-2, and py-1. Then we're going to render the SelectValue. In the SelectValue we're going to check if we have the current voice.
Okay. And if we have the current voice, we're going to open a fragment, and inside of the fragment we're going to add two elements. The first element is going to be a VoiceAvatar, which will accept the current voice ID and the current voice name to generate a unique-looking avatar. And the second element will be a span with a class name of truncate, text-sm, font-medium, and tracking-tight. In here, we're going to render the voice name, and then next to it we're going to render the current voice category. But we have to check if the category actually exists.
If it does, we're going to add a space, a dash, another space, and then we're going to go through our voice category labels and use the current voice category to map to a human-readable result. There we go. Perfect. And that's it for the select trigger. Now let's build the SelectContent.
So that's outside, right here. And let's check if we have a missing selected voice but we have managed to populate the current voice. In that case, let's do the following. So we're going to add a SelectGroup. Then we're going to add a SelectLabel with Selected voice.
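The label logic is small enough to sketch as a function. The category keys and labels below are made up, since the real voice category enum is defined in the Prisma schema from an earlier chapter, and formatVoiceLabel is an illustrative name rather than something from the project.

```typescript
// Hypothetical enum-to-label map; the real one mirrors the Prisma voice category enum.
const VOICE_CATEGORY_LABELS: Record<string, string> = {
  GENERAL: "General",
  CUSTOMER_SERVICE: "Customer Service",
};

// Name alone when there is no usable category, otherwise "Name - Label".
function formatVoiceLabel(voice: { name: string; category: string | null }): string {
  const label = voice.category ? VOICE_CATEGORY_LABELS[voice.category] : undefined;
  return label ? `${voice.name} - ${label}` : voice.name;
}

console.log(formatVoiceLabel({ name: "Aria", category: "CUSTOMER_SERVICE" }));
console.log(formatVoiceLabel({ name: "Unavailable voice", category: null }));
```

The category check matters mainly for the unavailable-voice placeholder, which has no category to look up.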
We're going to add a SelectItem with a value of the current voice ID. We're going to render the VoiceAvatar with seed and name. We are going to repeat the span from above, right? The current voice name and the current voice category, which, if it exists, is going to be mapped to a human-readable label.
And the class name will be truncate, text-sm, and font-medium. And then we're going to go outside of the SelectGroup here and check the following. Let me try to expand a bit. So if, inside of parentheses, customVoices.length is larger than 0 and systemVoices.length is larger than 0, only then render the SelectSeparator. Okay?
And now we have to render the other part. So outside of this fragment, and outside of this curly bracket and this normal bracket. So basically, what this was, was a group for a missing selected voice. But now we're actually doing the real custom voices here. So let's render a SelectGroup.
Let's render a SelectLabel with Team voices as the text inside, and then let's iterate with customVoices.map. Let me see what's wrong here. Okay, I have to render something. customVoices.map will simply use SelectItem, and, okay, no, that's not what it was. Let me check.
Something is wrong here. Am I missing? I am missing another parenthesis. There we go. Select item has a key and a value.
And inside we're going to render the VoiceAvatar with seed and name. And then we're going to repeat the span element once again, but this time it can be simpler, because at this point it's guaranteed that a category label exists. Okay? Because this isn't some scenario of a missing voice, right? Which had God knows what category at the time. Perhaps we decided to deprecate that voice.
Maybe it was deleted from the database. We can't really know. But this one is a simpler use case because we know this is currently active, this exists, there's no reason for us to do the complicated label thing. And then outside of this select group, let's go ahead and let me see, my apologies, not here, not outside of select group, outside of this here. Let's check again.
If custom voices.length is larger than 0 and system voices.length is larger than 0, let's render a select separator and now in here we can go ahead and we can actually copy the entire thing here and paste it and simply replace custom voices with system voices. And we can replace the label, which was team voices, and change it to built-in voices. And everything else is exactly the same. They have the same props, they look the same, they have the same class name. That's it.
That is our voice selector field. It looks complicated, but it isn't really. Most of the complexity comes from the missing-selected-voice case, and we need to cover that case because we allow voice deletion. So we have to think about the user experience.
Another reason this might happen comes later: if you have a huge number of voices, you might decide to add pagination. At that point, you should just show "unavailable voice" until you adjust this to work with pagination, and you'll have to rethink a bit how it works. But loading everything isn't really expensive here, because these are mostly just strings. So now that we have the selector and the API endpoint, let's render this properly.
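To recap the structure of the dropdown, here is a minimal sketch of the grouping logic as a plain function, without any React. The Voice shape, the Section type, and the buildVoiceSections name are made up for illustration and are not the project's actual code. The rule for the separator after the "Selected voice" group (render it whenever any other group follows) is a simplification and an assumption, not necessarily what the component in the video does.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
type Voice = { id: string; name: string; category?: string };

type Section =
  | { kind: "group"; label: string; voices: Voice[] }
  | { kind: "separator" };

// Returns the dropdown sections in the order they would be rendered.
function buildVoiceSections(
  customVoices: Voice[],
  systemVoices: Voice[],
  currentVoice?: Voice,
): Section[] {
  const sections: Section[] = [];
  const allVoices = [...customVoices, ...systemVoices];

  // The selected voice is "missing" when it is in neither loaded list,
  // for example because it was deleted.
  const currentId = currentVoice?.id;
  const isMissing =
    currentId !== undefined && !allVoices.some((v) => v.id === currentId);

  if (currentVoice && isMissing) {
    sections.push({ kind: "group", label: "Selected voice", voices: [currentVoice] });
    // Assumption: separate this group from whatever follows it.
    if (allVoices.length > 0) sections.push({ kind: "separator" });
  }

  if (customVoices.length > 0) {
    sections.push({ kind: "group", label: "Team voices", voices: customVoices });
  }

  // Only separate the two main groups when both actually have entries.
  if (customVoices.length > 0 && systemVoices.length > 0) {
    sections.push({ kind: "separator" });
  }

  if (systemVoices.length > 0) {
    sections.push({ kind: "group", label: "Built-in voices", voices: systemVoices });
  }

  return sections;
}
```

In the real component each group would map to a SelectGroup and each separator to a SelectSeparator; keeping the decision logic in one place like this can make the edge cases easier to reason about.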
So we're going to go inside of the source folder, app folder, dashboard, text-to-speech page. In here we have the text-to-speech view, and we're going to adapt it a little by importing trpc, HydrateClient, and prefetch from @/trpc/server. Note that this is our own TRPC server file, not the package: if you type @trpc/server it doesn't work, and once you add the forward slash after the @ it works. It's basically our server instance together with the HydrateClient component and the prefetch function. So add all of these. Then we're going to add searchParams as a prop. Because remember, we can now do searching.
So let's add searchParams. In here we will accept text and voice ID. Okay, I just told you it's because we will support searching. My apologies, that's not what these search params are for.
These search params are for the following scenario. You remember the homepage: when I click "Try now" there, look at the URL. The URL contains text. That's how we are going to populate the form, okay?
By looking at what text was passed. Later we will also be able to pass a voice ID through the URL. That's what this is for. My apologies, I confused this with our search functionality, which doesn't come until later.
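To make the idea concrete, here is a small sketch of building and reading such a pre-fill URL with the standard URLSearchParams API. The /text-to-speech path, the text and voiceId parameter names, and both helper names are assumptions based on what is said in this chapter, not code from the project.

```typescript
// Hypothetical helper: builds a link like the one behind a "Try now" button.
// The path and parameter names are assumptions.
function buildPrefillHref(text: string, voiceId?: string): string {
  const params = new URLSearchParams({ text });
  if (voiceId) params.set("voiceId", voiceId);
  return `/text-to-speech?${params.toString()}`;
}

// Hypothetical helper: reads the same values back out of a link.
function readPrefill(href: string): { text?: string; voiceId?: string } {
  const query = href.split("?")[1] ?? "";
  const params = new URLSearchParams(query);
  return {
    // get() returns null for a missing key; normalize that to undefined.
    text: params.get("text") ?? undefined,
    voiceId: params.get("voiceId") ?? undefined,
  };
}
```

URLSearchParams takes care of encoding, so spaces and characters like & in the text survive the round trip.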
Okay. So let's extract those values using await searchParams, which means we have to turn this page into an asynchronous function. Once we have text and voice ID, we are ready to prefetch. So let's prefetch trpc.voices.getAll with empty query options. And then, instead of rendering just the text-to-speech view, we render HydrateClient wrapping the text-to-speech view. Okay.
Let's do one more thing: extend the text-to-speech view with an initialValues prop, passing along the text and the voice ID.
Now that we've added that, let's modify the text-to-speech view component. It's inside features, text-to-speech, views, text-to-speech view. We start with the props. For that we need the import of our text-to-speech form, here it is, and we extend it by also importing the type for the text-to-speech form values. Then, inside the function, we accept initialValues and give it the proper type. It's optional, and its type is a partial version of the text-to-speech form values, so technically any of those fields could be passed through the URL.
Okay. But for now we only allow text and voice ID. We don't want users to use features we didn't enable ourselves. Great.
Now we have to populate our text-to-speech context with voices. We can do that because in the page component we just prefetched all voices, which means the cache we're about to read from already has them. So let's add the following imports: useSuspenseQuery from @tanstack/react-query and useTRPC from @/trpc/client. Again, this isn't a package.
This is our own TRPC client. Great. We also need to import our newly created context: the text-to-speech voices provider, from the text-to-speech voices context file. All right.
Now that we have that, we can initialize TRPC with the hook and fetch the voices. Let me collapse this so you can read it more easily: useSuspenseQuery with trpc.voices.getAll and empty query options. The exact same query we prefetched in the page.
So this uses the same key. It reads from the cache and effectively says, hey, I already have this, because the server component prefetched it and hydrated it for me. That makes for a much faster load than if we just used useQuery. You can do that, but it makes no sense since we have access to server components, right? Let's leverage them.
We chose Next.js, so let's use Next.js. From here we have access to data, which holds the voices, custom and system. For readability we can destructure it: from voices we extract custom and system, and rename them to customVoices and systemVoices.
Then let's build allVoices. How? By combining the two arrays. Next, let's define the fallback voice ID as the ID of the first item in allVoices or, if even that doesn't exist, an empty string. Now let's handle the scenario where a voice no longer exists, for example because it was deleted.
I'll leave a comment: the requested voice may no longer exist (deleted), so fall back to the first available voice. Then I'm going to define the resolved voice ID. We look at the initial values and check whether they contain a voice ID that actually exists.
If they do, we let the form be populated with the voice ID from the initial values. If not, we fall back to another one. What does that mean? If someone in the future opens text-to-speech with a voice ID that has been deleted, we cannot just blindly put it into our form. We have access to all voices here, so we check against them to see whether that voice actually exists. If they pass a proper ID and we find it in the allVoices array, which combines all custom voices and all system voices, sure, proceed. Otherwise, fall back.
That's what we solved with this. I know it's a bit confusing to do it now, since we don't really use it yet. But I will forget it if I don't implement it now.
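Here is a minimal sketch of that fallback logic as plain functions. The VoiceOption and FormValues shapes, the baseDefaults object, and the function names are made up for illustration; the real form has more fields and the project's names may differ.

```typescript
// Hypothetical shapes for illustration only.
type VoiceOption = { id: string; name: string };
type FormValues = { text: string; voiceId: string };

// Assumed stand-in for the form's built-in default values.
const baseDefaults: FormValues = { text: "", voiceId: "" };

// Use the requested voice only if it is among the loaded voices;
// otherwise fall back to the first voice, or "" when there are none.
function resolveVoiceId(
  requestedVoiceId: string | undefined,
  allVoices: VoiceOption[],
): string {
  const fallbackVoiceId = allVoices[0]?.id ?? "";
  if (requestedVoiceId && allVoices.some((v) => v.id === requestedVoiceId)) {
    return requestedVoiceId;
  }
  return fallbackVoiceId;
}

// Merge order matters: defaults first, then URL-provided values,
// then the validated voice ID so a deleted ID never reaches the form.
function buildDefaultValues(
  initialValues: Partial<FormValues> | undefined,
  allVoices: VoiceOption[],
): FormValues {
  return {
    ...baseDefaults,
    ...initialValues,
    voiceId: resolveVoiceId(initialValues?.voiceId, allVoices),
  };
}
```

Because voiceId is written last in the object literal, the resolved value always wins over whatever came in through the URL.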
Now, in the default values, which we cast as text-to-speech form values, let's spread the default text-to-speech values, then spread the initial values, and finally set our resolved voice ID, so we get rid of a deleted voice ahead of time. Great! Next we have to wrap everything this view renders in the text-to-speech voices provider. So let's do that.
There we go. In here we have to pass the value, which is customVoices, systemVoices, and allVoices. Now every single component within the text-to-speech form has access to all voices at all times.
We also have to pass these default values to the text-to-speech form. And you can already see something happened here. Let's go back to the dashboard and click one of the "Try now" buttons. You will see that the text is automatically populated now.
So that's what we just did: we enabled initial values passed through the URL. But the text is the easy part. The hard part was the voices. Why do I say that?
Well, it's easy to just load an ID. The problem is: what if we pass a voice ID through the URL but haven't loaded the voices yet? We can't know whether it's been deleted, we don't know what avatar to assign to it, we don't know its category, we don't know anything besides its ID. That's why this complication exists. We are doing this in an industry-standard way: before the form's initial values are set, we first fetch the records the user is trying to autofill from.
That's why we're doing it this way. I hope that clears it up. Great. So I believe that's it for this component.
I think we might be at the last component now. That is components, settings panel, settings. This component can finally render the voice selector. So let's remove the paragraph.
We can leave the ID and the class name as they are and simply render the voice selector. It's a simple relative import of the voice selector, because both files are in the same folder. If you want to, you can change it to the components alias path instead. And you can see that we didn't have to pass any props to it, because it uses the text-to-speech voices hook. It has access to the voices we loaded, okay? It also has access to the populated voice ID. And because the voices are already loaded, it is able to check whether this is a missing voice or not.
That's what this complication was all for. I hope I've cleared it up now. And here it is. So let's try it out and see if it works. What I mean by that is this.
Let's open Prisma Studio and find the voice you want. I'll pick one, Manual. Copy the ID of the voice, then go to the text-to-speech page and add a question mark, voiceId, an equals sign, and paste the ID.
Okay, let's see, and Manual is selected. Now let's see what happens for a random ID. You can see it falls back to Aaron, the first one in the array. That's what the complication was for. But yes, we now have a fully working voice selector.
Looks great. The only thing we cannot see right now is the team voices group, because we don't have any custom voices just yet. Great, so we learned a lot here. We created the TRPC voices router, we created a context for voices, we worked with DiceBear avatars, and we even did prefetching once again. The only thing left to do now isn't really related to voice selection, but let's go ahead and do it.
We're going to learn how to create the Chatterbox client with OpenAPI types and fetch. Actually, after some consideration, I decided we are not going to do the text-to-speech client in this chapter, for a very simple reason: it has absolutely nothing to do with voice selection. I think it's better that you pause here and understand what we just built, and that we have a pull request which matches the voice selection. What we plan to do next is start doing text-to-speech generations, so it makes more sense to generate the OpenAPI-typed client there rather than here.
So, no third step. We are finished at this point. I think this is a natural place to end this chapter, and I think most of you will agree it makes no sense to learn how to self-host Chatterbox right after doing voice selection. All right, meaning we are done.
So let's quickly go over these files. I have 16 changes. Among them are package-lock.json and package.json. I have some deletions, the health check and the test page. Other than that, I have modifications for TRPC.
These mostly enable SuperJSON and add the organization procedure and the auth procedure, and in the app folder we added the voices router. Everything else is components we need. So you should have the same changes as me. Great, so this is chapter five. Let's do git add, git commit with the message 05 voice selection, and git checkout -b, let's say 05 voice selection.
Alright, git push -u origin 05 voice selection. You can see we are checked out on that branch. Now let's go to our repository, open a pull request, and wait for the deployment and the review. One thing I forgot to do is npm run lint and npm run build, but I'm fairly certain both will work.
Lint has no problems except the warnings we've had since the first chapter. Looking good. All right, so let's wait for the review. And here we have the finished CI/CD. Down here I can see my Railway app is deployed.
Let's check the web instance. There we go, seems to work. Text to speech. And here we have our voice selector with all of the voices we have seeded.
Amazing! Let's check if the pre-fill works. Pre-fill works perfectly fine. Great! Our app works in production.
Now let's take a look at the summary by CodeRabbit. Voice avatars now display in the voice selection UI for better visual identification. The text-to-speech page supports URL parameters to preload text and voice preferences. Enhanced voice selector with improved organization and categorization, and voice management functionality for viewing and deleting voices; deleting is currently only on the API side. So let's take a look at the three comments we got.
The first one I probably had heard of, but I wasn't aware of it for this specific scenario: normalize search params values before passing them into form defaults. According to the Next.js App Router specification, repeated query keys produce array values. So if someone links to our page with multiple text values, text will be an array, which will break all the logic we have around it. So we definitely have to do something about this.
I mean, our app has no way of producing that, so someone would have to do this manually to break it. At that point, the user is at fault, right? But we could do something about it. Definitely a good comment by CodeRabbit here. I did not think about normalizing the search params.
I will see if this is something we have to do or not. You can see they labeled it as minor, simply because this is something our own app will not do. Only users might do this, or we might do it by accident. But I think they're mostly telling us to use the proper type, so we understand what might break: change it to a string or an array of strings. So a great comment by CodeRabbit here.
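One way the suggestion could be addressed is a small normalizing helper, sketched below. The firstValue and readTtsParams names and the text and voiceId keys are assumptions for illustration; this is not the change made in the video, only one possible shape for it.

```typescript
// A search param value can be a single string, an array (repeated key),
// or undefined (key not present).
type SearchParamValue = string | string[] | undefined;

// Collapse a possibly repeated query value to a single string,
// keeping the first occurrence.
function firstValue(value: SearchParamValue): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

// Hypothetical reader for the two parameters this page cares about.
function readTtsParams(searchParams: Record<string, SearchParamValue>): {
  text: string | undefined;
  voiceId: string | undefined;
} {
  return {
    text: firstValue(searchParams["text"]),
    voiceId: firstValue(searchParams["voiceId"]),
  };
}
```

With this in place, a URL with the text key repeated would populate the form with the first value instead of passing an array into the form defaults.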
The second one is telling us to add an explicit empty-state guard in case we forget to seed system voices. In our case, the entire app will not work without system voices, so there's not much point in adding more guards than we already did. And we did add a lot of guards for an invalid or missing voice ID; in that case we'll just end up with an empty voice ID. And the third one is telling us what I already told you.
We are currently swallowing the failure, but in production you should retry or at least consider putting this in a background job. Other than that, we did a very good job here. So let's merge these changes, then git checkout back to main and git pull origin main. There we go. That should synchronize everything, and I'm going to check that we are on main.
We are. And let's look at the graph. There we go: 05 voice selection merged into the main branch. I believe that marks the end of this chapter.
So yes, I decided that we're not going to do the text-to-speech client in this chapter, simply because we're going to do it in the next chapter together with the actual generation TRPC procedure. I think it fits better there, so we understand why we build it. If we did it now, we'd just forget why by the time we got to the next chapter. So amazing job, and see you in the next chapter.