In this chapter, we're going to create the text to speech page with a responsive three-panel layout, including the text input, the voice preview, and the settings sidebar. And then we're going to wire everything up with TanStack React Form, adding four voice adjustment sliders (creativity, voice variety, expression range, and natural flow) as well as the actual settings and history tabs. All of this is going to be connected using TanStack Form. So we're going to build it in the following order, starting with the layout and the three-panel shell. We're going to reuse our knowledge of layouts and pages to reuse the page header component, create this component, this component and this layout, and make everything look nice. And then we're actually going to connect these sliders, as well as this input, to state. And this will be interesting because these are completely separate components, and we need to find a way to use a single form instance across various detached components. That will be a really cool exercise, because we're going to have to learn the context-based form hook and form provider to wrap all of these panels so they can communicate with each other and share each other's values.
Excellent! So that is the goal for this chapter. So the last thing we did was, and this is only relevant to you if you're actually following my Git instructions, right? If you're just building your own way, you don't have to do this. But in the last chapter, we merged the 02 dashboard pull request, right?
We have this summary by CodeRabbit and this is a great reminder for me because there's a mistake in my quick actions, right? So I have to fix that. But what I wanted to tell you is that after we have merged this, essentially what happened with our git is that our current state, which is on branch 02 dashboard is identical to our main branch right? There is no difference between 02 dashboard and the main branch. This branch is not ahead of main branch.
In fact, it's behind main branch because the main branch now has the merge pull request commit. So in order to start developing this and something we should have done in the previous chapter is simply checkout back to main. And then let's go ahead and do git pull origin main which will essentially just pull the merged state from the remote main branch. So this command, which I just ran, git pull origin main, is essentially the same as if you go inside of your IDE here and select the main branch here and then click this, Synchronize Changes. So that's what it does.
Push, pull and push. It's a synchronization. You can see that we are completely clean here. And if you want to make sure you're on the same page as me, you can open up the graph. So we started the app using create next app.
That was the first commit. Then we added shadcn/ui, Clerk and Prisma. That was the second commit, but we are marking that as one because that was the first change that we actually did. I then added an additional commit to fix the postinstall script. Right?
This was just a very simple package.json postinstall script. And then what we did is we opened 02 dashboard in another branch, in a pull request, and then we merged that back to main. So your graph, if you followed my instructions, should look like this. What's important for you right now is that you make sure that you are on your main branch. Make sure you are not on any feature branch, and just make sure that when you actually run npm run dev you can go ahead and see the changes that we did in the previous chapter.
If you can see that, you are good to go. Perfect. Usually we are going to make sure that we do these changes at the end of each chapter so we don't have to remember to do it at the start of the next one. Alright, so I'm gonna go ahead and immediately go inside of my source, features, and then dashboard, data, quick actions, and in here I'm just going to fix the invalid href. You might or might not have this bug, so I am missing the text param here.
You can see that all of my other ones have it, but I forgot one here. Great, so CodeRabbit caught that bug. Amazing. Now, what we have to do is we have to build the actual text to speech page layout. So let's start by doing the following let's go inside of source app folder and let's go ahead and go inside of the dashboard.
So I'm going to add a text-to-speech folder like this, and then I'm going to add page.tsx inside. In here I'm going to go ahead and export default function TextToSpeechPage, and I'm just going to return a heading element which will render text to speech. So now if we go to localhost:3000/text-to-speech, this one right here, we should see that right here. Let's see. There we go. Text to speech.
And one important thing to notice is that the sidebar is still available. The reason the sidebar is available is because we developed text to speech within the dashboard route group. If you actually move it outside, you don't have to do this, I'm just doing this, you know, to demonstrate it. If you actually moved it outside, you can see that text to speech doesn't have the sidebar. So make sure that you are developing your text to speech within dashboard right here.
And if you get this unsaved files, don't worry. Again, this is just cache. You can close this, you can click save, don't save, it does not matter, just close this big .next folder, it's just cache, it will get regenerated, don't worry about it. Now let's go inside of text to speech. If you wrote this you can remove it because that's invalid syntax.
And now what I want to do here is I want to add the reserved metadata export. So I'm going to import the Metadata type. Let me go ahead and import type Metadata from next. And then I can export const metadata. It's important that it's named this way.
And this object will have a title of text to speech. So what happens then? If you look closely at your tab, you will see that now it has a text to speech label. But I want to make it just a little bit different. So I want to go ahead and go inside of source app folder layout.
This is the root layout where we define the font, the body, the HTML. In here we can do one cool thing. So we already have the metadata here, perfect. So let's change the title here so it's no longer Create Next App. Instead, let's go ahead and let's just make it say Resonance.
And for the description, it can be whatever you want. I'm going to add AI-powered text-to-speech and voice cloning platform. So now if I go to the dashboard you can see it says Resonance, but if I go to my text to speech page it just says text to speech. But what I would want it to say is Resonance. Actually, I would want it to say text to speech, pipe, Resonance. That's what I would like it to do.
And technically, we can do that by hand in each page that we develop. I can do text to speech, pipe, Resonance. But that's not exactly the proper way to do it. The proper way would be to focus on this layout inside of the app folder and open an object instead of a string for the title here, give it a default of Resonance, but add a template. And this template can use the following syntax: %s, and then pipe, Resonance. The %s is where each page's own title gets inserted.
And by doing this, you can just add various different titles to your pages, and notice the tab now: text to speech, pipe, Resonance. So now it's more of an industry standard. This is the behavior you see in production apps. I just wanted to show you that little trick. Great.
So now that we have the layout, let's actually develop the layout view. So I'm going to go inside of source, features, where we already have text to speech, and now I'm going to create a views folder. And in here I'm going to create text-to-speech-layout.tsx. I'm going to go ahead and import a reusable component we developed the last time, which is page header, and I'm going to export function TextToSpeechLayout, which simply has a children prop. And I'm going to return a div with class name flex, full height, a height reset using minimum height of 0, flex column and overflow hidden. I'm going to render the page header with a title of text to speech, and beneath it I'm going to render the children. This file by itself doesn't do anything because this is just a component.
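To make the template idea concrete, here is a small plain-TypeScript sketch. The title object has the same default and template fields described above, but the type is declared locally so the snippet runs on its own, and resolveTitle is a hypothetical helper that only imitates what the framework does with %s. It is not part of Next.js.

```typescript
// Local stand-in for the title object set in the root layout's metadata.
type TitleConfig = { default: string; template: string };

const rootTitle: TitleConfig = {
  default: "Resonance",
  template: "%s | Resonance",
};

// Hypothetical helper: imitates how a page title is slotted into the template.
// Pages without their own title fall back to the default.
function resolveTitle(config: TitleConfig, pageTitle?: string): string {
  return pageTitle ? config.template.replace("%s", pageTitle) : config.default;
}

console.log(resolveTitle(rootTitle, "Text to speech")); // Text to speech | Resonance
console.log(resolveTitle(rootTitle)); // Resonance
```

In the real app you would not write the helper at all. Each page just exports its own short title and the root layout's template does the rest.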
What we have to do now is we have to go inside of the app folder dashboard text to speech and let's go ahead and actually create layout.tsx. Since it's empty we now have an error so what we have to do is we have to import the text to speech layout and then we have to export default function layout. I'm just going to collapse this so it's easier to look at. So layout is a default export. Why is this a default export?
Because it's in a reserved file name. So almost every file within the app folder requires a default export to work. That's how routing is recognized within Next.js. So what we are basically doing is instead of writing the layout here directly in layout.tsx we are maintaining it and developing it inside of features text-to-speech views. So in here is where we write the actual content.
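As a rough sketch of that thin route file: the route-group folder name, the import path and the named export TextToSpeechLayout are all assumptions based on the narration, so adjust them to match your own project.

```tsx
// src/app/(dashboard)/text-to-speech/layout.tsx (path assumed from the narration)
import type { ReactNode } from "react";

// Named export from the features folder; the real UI lives there.
import { TextToSpeechLayout } from "@/features/text-to-speech/views/text-to-speech-layout";

// Default export, because layout.tsx is a reserved file name in the app folder.
export default function Layout({ children }: { children: ReactNode }) {
  return <TextToSpeechLayout>{children}</TextToSpeechLayout>;
}
```

The route file stays a few lines long, and everything you would actually want to edit sits next to the rest of the feature.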
All right. So now that we have this, you can see that we have a nice text to speech header at the top, and we can reuse it, you know, for voices, voice cloning and everything else we have in the future. Perfect. Now that we have that, let's go ahead and create the text to speech view. So I'm going to go back here in features, text to speech, views, and let's add text-to-speech-view.tsx. The text to speech view will have the following components, so I'm just gonna prepare these because they don't exist yet: we're gonna have the text input panel, the voice preview placeholder and the settings panel.
We don't have any of them, so I'm just going to comment them out so we can enable them later. And for now you can just export a function, text to speech view. The text to speech view is going to return a div with flex, min height zero, flex one and overflow hidden, and inside of it a div with flex, min height zero, flex one and flex-col, which is going to hold the text input panel and the voice preview placeholder. And outside of that inner div, we're gonna have the settings panel. So for now we can comment these three out, simply because none of them exist yet.
So what this is, it's basically the same thing as text to speech layout. It's just a function, just a component, right? So it's not exactly rendered anywhere yet. So now we're going to go inside of our text to speech page and we're just going to return text to speech view like this. So we should now have import text to speech view and import type metadata and we are just returning this.
So notice the difference. Every time that we are within the app folder and we develop a layout, we are using a default export. When we are in a page, we are using a default export. Here we have an exception because this is metadata, but I'm talking about the component, the JSX. It's required to do a default export, otherwise the component will not be recognized in routing system.
Same is true for the dashboard route group. If I go inside of page, default. But if I go ahead and click within the dashboard view, you can see we don't do default exports. Nothing would really break if you changed the dashboard view or something like that to be a default export. You would just have to change the way you import it.
But I don't like doing this because with default exports you can name it whatever you want. I don't like that. I like to explicitly export without default, and then I have to import like this. All right, let's make sure we're in sync. So far we only changed several files, so right now you should have text to speech view and text to speech layout. If you had the mistake, you will also have quick actions modified. Then text to speech page, text to speech layout and the general layout.
Perfect. So now let's go ahead and let's develop the text input panel. I'm going to go inside of source, features, text to speech, and we're going to open up components. Inside of components, I'm going to add text-input-panel.tsx. I'm going to mark this as use client.
I'm going to import useState from React and the coins icon from lucide-react. I'm going to add the badge, button and textarea components from shadcn/ui. I'm going to import my text max length, which we defined in the previous chapter actually. And I'm going to go ahead and start developing the text input panel. So again, this is a component, we're not within the app folder, so we can do an export function here.
So we have a named export. Since we don't have any form libraries yet, we're just going to maintain this with a useState. So text and set text. Then let's go ahead and let's return a div with flex, height full, minimum height of 0, flex-col and flex 1. Now we're going to develop the text input area, so that will be a div with relative, minimum height of 0 and flex 1, and within that container we're going to render a textarea component.
So first let's add the controlled props: value, which uses the text, and on change, which passes the event target value to set text. Then let's go ahead and add a placeholder here. Start typing or paste your text here. And then let's go ahead and give it a class name. So the class name will have absolute, inset zero, resize none, border zero.
Then it's going to continue with background transparent, padding 4, padding bottom of 6, and on large the padding will be 6. Then on large devices again, padding bottom will be 8. Text is going to be base. Leading is going to be relaxed. Then we're going to have tracking tight, shadow none, wrap break word.
And the last one is going to be focus visible ring zero. The maximum length of this textarea will be equivalent to the constant we defined. So this is enough for us to render this so we can actually see what we are developing. You can also pause the screen if you want to type out or confirm your class names. And again, if you have the Tailwind CSS extension, you can hover over your class names: if the definition pops up, it's correct, but if you accidentally made a typo, you can see that nothing pops up. Just to remind you of that trick. So now let's go ahead inside of our text to speech views, text to speech view. Let's uncomment the text input panel import, and you can see how we can import it from dot dot, components, text input panel, because we just have to exit the views folder and enter the components folder.
If you prefer, you can also go directly through the features folder, whatever you like. And let's now uncomment the text input panel inside of text to speech view. And just like that, you should be able to see the text, start typing or paste your text here. You can see it almost looks like the entire thing is the text input, which is exactly the kind of look we want to achieve. Great.
Now that we have that, let's go ahead and let's add a bottom fade overlay. So why do we need this exactly? Well, I think the best way to demonstrate is to add a bunch of text here. Basically, we wanna make sure that at the bottom here, you have kind of a fade. And the reason we need that is because we're gonna have some buttons here, and then those buttons will kind of overlap with the visible text, right? If you have a long line like this, let me try and delete it, yes, for example, the button will be right here, and that just doesn't look good. So let's go ahead and develop the bottom fade overlay, which is just a single self-closed div.
This is it. Class name: pointer events none, absolute, inset x 0, bottom 0, height 8, background linear to top, from background, to transparent. And you can see how now it's kind of subtly fading out. It's very, very subtle, but you can see that when you have a bunch of text, the lines almost disappear into a fog-like effect, I guess you could call it. Right?
That's the effect we want. It's a subtle change, but it makes it look much better. It doesn't seem so right now because we didn't actually develop the action bar, which we're gonna do next. So, just before the entire thing ends, let's add an action bar. The action bar is going to be a div with shrink 0, padding 4 and, on large, padding 6.
We're gonna start with the mobile layout, so while you're developing this I would recommend being zoomed in so you're looking at the mobile version here. So the mobile layout is going to have a div with class name flex, flex column, gap 3 and hidden on large devices, and we're very simply gonna go ahead and render a button with a class name of width full, labeled generate speech. Right, so this is it. This is the mobile view. So you can go ahead and paste some text here and you can see how it doesn't abruptly end.
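At this point the panel is roughly the sketch below. The class names are transcribed from the narration (and assume a recent Tailwind version), the constants import path and the TEXT_MAX_LENGTH casing are assumptions, and the badge and button imports are left out until they are used.

```tsx
// src/features/text-to-speech/components/text-input-panel.tsx (path assumed)
"use client";

import { useState } from "react";

import { Textarea } from "@/components/ui/textarea";
// Assumed location and casing of the constant defined in the previous chapter.
import { TEXT_MAX_LENGTH } from "@/features/text-to-speech/data/constants";

export function TextInputPanel() {
  // Plain local state for now; a form library replaces this later in the chapter.
  const [text, setText] = useState("");

  return (
    <div className="flex h-full min-h-0 flex-1 flex-col">
      {/* Text input area: the textarea fills this relative container */}
      <div className="relative min-h-0 flex-1">
        <Textarea
          value={text}
          onChange={(e) => setText(e.target.value)}
          placeholder="Start typing or paste your text here"
          className="absolute inset-0 resize-none border-0 bg-transparent p-4 pb-6 text-base leading-relaxed tracking-tight shadow-none wrap-break-word focus-visible:ring-0 lg:p-6 lg:pb-8"
          maxLength={TEXT_MAX_LENGTH}
        />
      </div>
    </div>
  );
}
```

The relative wrapper plus the absolutely positioned textarea is what makes the whole panel feel like one big input.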
If you comment out the bottom fade overlay, you will see that it kind of gets cut out. Which looks very bad when there's a lot of text. Maybe not so much on mobile, but on desktop, it looks even worse. But with this simple class name, you kind of make it like fade out into some kind of fog, right, it's just a subtle effect. All right, so that's it for the mobile view.
The mobile view is very simple actually. So outside of this div right here, we're now going to develop the desktop layout. Now the desktop layout will be different depending on if we have any text written or if we don't. So prepare a ternary like this. If text length is larger than zero it means we have some text.
Otherwise... let's go ahead and do this one first. So let's add a div which will be hidden by default and only visible on large devices, on desktop. And inside of that div, a simple paragraph with a class name of text small and text muted foreground: get started by typing or pasting text above. And then inside of this one, the branch for when we do have text, let's go ahead and start with a different div like this, with a class name of hidden, items center, justify between, and flex on large devices. So if I clean this up now, let me go ahead and zoom out. You have to be zoomed out so you can see the desktop mode. But yes, the goal is basically that there is no generate button until we start typing something.
Well, we don't even have the generate button on desktop right now, but you should see this text when you don't type anything, and then nothing when you type something. So now let's go ahead and let's add a badge. The badge will have a variant of outline and a class name of gap 1.5 and border dashed. Badge is a component from shadcn/ui which we added when we began developing this. Inside of the badge, let's render the coins icon with class name size 3 and text chart 5.
Beneath the coins icon we're gonna render a span with extra small text. And then in here we're going to render another span with a class name of tabular nums. This is a super cool class name which is basically used whenever you have moving numbers. I will try and remove this class so you see the difference. Basically, in here we are doing the exact same calculation that we did in the other text input panel.
You should have two text input panels. One inside of this feature you are developing right now and another in source, features, dashboard. You can see that in here we have the exact same function, right? It would probably be a good idea to separate this magic number into some form of constant. Perhaps we can actually do that.
I think that would just make sense. Let's go inside of text to speech data constants and I'm going to go ahead and do export const cost per unit and I'm going to make it 0.3. I mean 0.0003. Okay. And once you have cost per unit, let's go inside of, well we need to modify both of our text input panels.
Let's go ahead and first go inside of text to speech, components, text input panel. So the one we are just developing. And instead of this magic number, let's go ahead and use cost per unit. Which we can import from data constants. I like to use the explicit, but you can also use this if you prefer whatever you like.
Alright, so you should have cost per unit and you should have text max length. Now let's go ahead and copy this and let's close the text to speech feature. Let's go inside of dashboard components, text input panel, and let's go ahead and add both cost per unit and the text max length. And let's change the magic number with cost per unit. This way we don't have to worry about this changing in the future.
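Here is a minimal sketch of what the shared constant buys us. The upper-case name COST_PER_UNIT, the estimateCost helper and the four-decimal formatting are assumptions for illustration, and so is the formula itself (characters multiplied by cost per character), since the narration only says both panels use the same calculation.

```typescript
// Assumed casing for the exported constant; 0.0003 is the value settled on above.
const COST_PER_UNIT = 0.0003;

// Hypothetical helper: estimated cost as characters x cost per character,
// formatted with a fixed number of decimals so the badge text keeps a stable shape.
function estimateCost(text: string): string {
  return (text.length * COST_PER_UNIT).toFixed(4);
}

console.log(estimateCost("Hello world")); // 11 characters
console.log(estimateCost("a".repeat(1000))); // 1000 characters
```

With the number living in one place, a pricing change means editing the constants file and nothing else.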
Alright, and nothing much should really change. On desktop, When you start typing, it's very zoomed out, but you should be able to see the estimated cost calculation. This will be easier to demonstrate once we add the bottom voice preview panel because right now this is just huge and if I try to zoom in, okay, I can work with this. Perfect. But basically you can see that when I type something, I'm trying to demonstrate the badge expanding, right?
But basically if you don't use tabular nums class name, it can shift the layout. And it just looks very bad. So if you go inside of Features, Text-to-Speech Components, Text Input Panel, and find this where you calculate the cost and remove tabular nums. I'm not sure if this will exactly be a good example. Yeah, I'm not sure if you can see, but it's kind of Twitching.
See? You see how the border is going left and right. But that doesn't happen if you enable that class name. That's why we need it. Tabular nums.
And not many people know about that. But it's like a perfect solution whenever you need to display that kind of data that changes. You can see how now there's no shift at all, right? Another cool trick I hope you just learned. Great. Let's go ahead and focus back on finishing this. So we are in the desktop layout, and we just added the calculation and replaced the magic number with a constant. Now what we're gonna do is we're gonna go ahead and add an empty space here and write estimated, like that. Then let's go outside of this badge, and in here we're going to render a div and let's close it.
The div will have a class name of flex, items center and gap 3. In here let's add a paragraph with text extra small and tracking tight. And I'm gonna go ahead and render text.length, to locale string. And then I'm gonna add a span which is text muted foreground. And I'm gonna go ahead and render a divider, and we can render some empty space here. Like this. And then let's go ahead and display text max length, to locale string, characters.
Like this. So now on desktop, you should also have the amount of characters you have written. You can see that right here. All right. So that works.
And if you want to represent whitespace in another way, you can also use this Unicode. If you prefer that is the exact same effect. Perhaps your IDE even recommends this because this can work differently depending on like if you add enter here or maybe add some spaces here, right? Using a Unicode might actually be more consistent. So whatever you prefer, actually.
For example, I really like using those, so I know the white space is expected here. So it's an ampersand, nbsp, and a semicolon. Okay. And then after that paragraph, let's render a button, size small, generate speech.
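The character counter can be sketched as a plain function. The value 5000 for the max length, the slash used as the divider, the explicit en-US locale and the helper name are all assumptions for illustration. The non-breaking space is written as a Unicode escape, which is the string equivalent of the nbsp entity used in the JSX.

```typescript
// Assumed value; the real constant comes from the feature's constants file.
const TEXT_MAX_LENGTH = 5000;

// Non-breaking space, the same character the nbsp entity produces in JSX.
const NBSP = "\u00A0";

// Hypothetical helper: builds the counter text with thousands separators.
// The locale is pinned here so the output does not depend on the machine.
function formatCharacterCount(length: number, max: number = TEXT_MAX_LENGTH): string {
  const current = length.toLocaleString("en-US");
  const limit = max.toLocaleString("en-US");
  return `${current}${NBSP}/${NBSP}${limit} characters`;
}

console.log(formatCharacterCount(1234));
```

Using an explicit non-breaking space keeps the gap around the divider from collapsing, whichever way the markup ends up being formatted.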
And we finally have the desktop layout. So this is how it's gonna look like when there's no text, except later this will not be a text, this will be prompt suggestions. We're going to suggest some pre-made prompts for the user. So they can start typing and then they will be able to generate the speech. Alright, pretty good.
So now that we have a text input panel, let's actually develop the voice preview placeholder. So again, inside of components, I'm going to add voice-preview-placeholder.tsx. I'm going to add audio lines, book open, sparkles, and volume 2 from lucide-react. And I'm going to add button from components, UI, button. Let's export function voice preview placeholder and let's go ahead and develop this component.
This will be just a UI component, there won't be any logic here. So we're going to start with a div which is hidden on mobile, and on large devices it's flex, meaning visible. We're also going to have flex 1, full height, flex-col, items center, justify center, gap 6 and border top. Then in here let's go ahead and let's add a div with flex, flex-col, items center, gap 3. And then let's go ahead and add another div with relative, flex, width of 32, items center, and justify center.
And in here we're going to add a div which encapsulates our icon. So class name absolute, left 0, minus rotate 30, rounded full, background muted, and padding 4. And let's go ahead and render the volume 2 icon with the class name size 5, text muted foreground. Let's stop here and let's actually render this. So we're going to go back inside of text to speech views, text to speech view.
And let's uncomment the voice preview placeholder import. You can import it from dot dot, components, voice preview placeholder, or if you are like me and you prefer explicit imports, you can use at features, text to speech, components, voice preview placeholder. And let's uncomment its place. It's only visible on desktop so you have to zoom out. And this is basically what it looks like. We're going to have three icons like this, and then beneath them a text which says this is where the audio preview will appear.
Alright, let me try and zoom out a bit so maybe I can see while I'm developing. So let's go back inside of voice preview placeholder, and unfortunately I have to collapse the screen even more, I'm not sure if I can zoom out that much here. So to make it easier to look at this, perhaps you can add spaces here so you understand that this is a div encapsulating an icon, right? So you can now copy this and paste it, and this is the second icon. In here we're gonna have sparkles. And then let's go ahead and copy it again, and this is going to be audio lines. So all three, volume 2, sparkles and audio lines, have the same class name, size 5 and text muted foreground, but their containers will have different class names. So we are finished with the volume 2 one, but now we have to modify this one.
So it's not going to be left 0, we can remove that. This one will actually be relative. We're not going to have rotate. Z index will be 10. Rounded full stays and padding 4 stays. That's all good.
Background muted will be 4. That's all good. And let's go ahead and change it from background muted actually to background foreground. And let's go ahead and change the text to text background. So it's reverse.
All right. And then the audio lines one will not be aligned to the left but to the right. And everything else I think is exactly the same. So now let's take a look at what this looks like, and there we go. So a nice little three icon layout.
Perfect. Then beneath all that, actually outside of this div, let's go ahead and enter a paragraph. Preview will appear here. And this paragraph has large text, font semi-bold, tracking tight, and text foreground. And beneath it, we're gonna have another paragraph.
Once you generate, your audio result will appear here. Sit back and relax. Let me go ahead and fix the unnecessary white space. This paragraph will have a maximum width of 64, text center, text small and text muted foreground. And to wrap it all up, before we close this div, let's just add a button with a variant of outline, size small, a book open icon and the label don't know how.
And the apostrophe can be written as an apos entity like this. Let's see how it looks. There we go. Preview will appear here. Once you generate, your audio result will appear here.
Sit back and relax. And you can go ahead and try adding a bunch of text now, and you can see how this fade now looks much better, right? It doesn't have a clear cutout line here. So that's why I wanted to do that. Great.
So we now have that. And what should this button do? Well, usually it will lead to your documentation, right? So you can import link from next link. Let me show you, make sure you have link from next link like that.
And the href will be, you know, your documentation, or you can just use mailto and add your support email if you don't have anything in place right now. And make sure to add the asChild prop so this button becomes the link. So at least now the button has a purpose. It's gonna open an email. But yeah, in production you should probably make this go to some kind of documentation.
All right, now let's go ahead and develop the settings panel, which is the last shell layout component we are building. So inside of text to speech components, let's go ahead and add settings-panel.tsx. I'm gonna go ahead and import settings from lucide-react. I'm going to export function settings panel, and I'm going to go ahead and return inside of here. We're going to render a div.
The div is going to have a hidden class name so it's hidden on mobile. It's going to have a width of 105, which calculates to 420 pixels, a minimum height of 0, which is basically a height reset, flex-col, border left, and on large devices it will be flex. Basically hidden on mobile, flex on desktop, or invisible on mobile, visible on desktop with flex properties. In here we're gonna go ahead and add another div with flex, items center, gap 2, border bottom, px 4 and height of 12. Let's go ahead and let's render the settings icon with class name size 4.
Beneath it, let's go ahead and render a span with small text and font medium, rendering the text settings. Outside of the div encapsulating those two, let's render another div with flex, flex 1, items center, justify center and padding 4, and inside it a paragraph which renders voice settings will appear here. The paragraph will have text small and text muted foreground. And that's actually it for the settings panel right now. So we can go ahead inside of text to speech, views, text to speech view, and we can uncomment the settings panel.
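A minimal sketch of that button as a link follows. The component name HelpButton, the support address and the question mark in the label are placeholders for illustration. In the chapter the button sits directly inside the voice preview placeholder rather than in its own component.

```tsx
import Link from "next/link";
import { BookOpen } from "lucide-react";

import { Button } from "@/components/ui/button";

// Hypothetical wrapper component, only here so the snippet stands on its own.
export function HelpButton() {
  return (
    // asChild makes the Button render its child (the link) instead of a <button>,
    // so the styling stays the same but the element is a real anchor.
    <Button variant="outline" size="sm" asChild>
      {/* Placeholder address; point this at your docs once they exist. */}
      <Link href="mailto:support@example.com">
        <BookOpen />
        Don&apos;t know how?
      </Link>
    </Button>
  );
}
```

Swapping the mailto for a documentation URL later is a one-line change.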
Again, if you prefer, you can go ahead and import from this. I just really like being explicit and we can render the settings panel. What's important is that settings panel is not rendered inside of the same div as text input panel and voice preview placeholder as that's going to break the entire layout. And you can zoom out and there we go. This is the three panel layout, the text input, the voice preview, and the settings bar exactly as we envisioned.
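With all three panels uncommented, the view ends up roughly like the sketch below. The file names are assumed to be kebab-case versions of the spoken names, and relative imports are used here, though the alias-based imports work just as well.

```tsx
// src/features/text-to-speech/views/text-to-speech-view.tsx (path assumed)
import { SettingsPanel } from "../components/settings-panel";
import { TextInputPanel } from "../components/text-input-panel";
import { VoicePreviewPlaceholder } from "../components/voice-preview-placeholder";

export function TextToSpeechView() {
  return (
    <div className="flex min-h-0 flex-1 overflow-hidden">
      {/* Left column: text input on top, voice preview beneath it */}
      <div className="flex min-h-0 flex-1 flex-col">
        <TextInputPanel />
        <VoicePreviewPlaceholder />
      </div>
      {/* Sibling of the column, not a child of it, so it sits to the right */}
      <SettingsPanel />
    </div>
  );
}
```

The outer div is a row and the inner div is a column, which is why moving the settings panel inside the inner div would stack it under the preview instead of beside it.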
So that is part one done. We now have to go ahead and add TanStack Form, develop these sliders, and actually connect the entire form functionality and this value functionality, so that when you click generate we can actually submit all of that information from the various panels, which are detached from one another. So this is where we're going to have to learn how to pass form values through context. So before we go ahead and add TanStack React Form, I just want to make sure we are on the same page. So I have 11 changes in here, and I'm just going to go ahead and let you pause the screen so you can see all of the changes I have done.
Again, you don't need to have all of the changes as me. For example, quick actions. I had a bug with a missing text param. You might not have that. Perhaps you didn't modify the layout if you are not interested in the title thingies, but just go ahead and make sure that you didn't forget any crucial new files which were created.
All right. If you confirm everything is fine, Let's go ahead and let's do npm install tan stack react form. We're going to use this along with the field component from chat CN UI. And usually, the implementation you will see me do now for 10 stack React form is usually not as complicated. But since we are working with a very, very specific layout where our form is actually in this component and in this component, we need to do it in a very specific way.
So we're going to go ahead and learn how to use form in a context way. What I'm trying to say is, UIShats-EN has TanStackForm documentation here, And you can see how they intend to use it, right? You can see it's very simple. You have your use form and default values, validators, nothing you haven't seen before, right? Ours will be a little bit more complicated.
That's what I'm trying to say. We will have some things like this. We will have filled label and we will have form.filled. Thing is, But the problem is sharing the use form value across different components, right? That's what the goal is.
So now that you've installed that, what I like to do is I like to go inside of Packet.json simply so you're aware what version I am on. If you're on a drastically different version like version 5, it might be a good idea to look at the breaking changes or you can simply go ahead and use npm install tanstacked form with exact version that I am using. Great. Once you've confirmed you have the package, let's go ahead and create a new file within source hooks. So in here we only have use mobile, which we didn't develop, we got it from Shazzy and UI, but this one we will develop use app form, which will be used to create forms which span through multiple components.
So from the new package, @tanstack/react-form, let's import createFormHookContexts and createFormHook. From createFormHookContexts (let me go ahead and fix this) we can extract fieldContext, formContext, useFieldContext and useFormContext. Then createFormHook accepts fieldContext, formContext, fieldComponents, and formComponents. Those last two are empty, right? Just empty objects. And from createFormHook we can extract useAppForm and useTypedAppFormContext. And once we have those, we are ready, right?
We are basically initializing this once so we don't have to do it in every component that needs it. All right. So now what we're gonna do is we're gonna go inside of features, text-to-speech, the data folder, and we're gonna add sliders.ts. So, what is a slider? A slider is basically this.
Creativity, voice variety, expression range, and natural flow. And while we are going to present these to the user in a natural-language, human-readable way, the way we are going to store them depends on the Chatterbox text-to-speech parameters. Chatterbox text-to-speech is our AI model, which we're going to learn how to self-host, and the model accepts very specific parameters. So we're now going to create types for that so we map it correctly. Each slider ID can be one of the following: temperature, top P, top K, and repetition penalty. Now, I would advise you to write these carefully, because casing is important. But even if you write one incorrectly, don't worry: we will have type-safe APIs, so any typo here will eventually surface as an error, which is a good thing because then you will know exactly what to fix, right?
But try to do it correctly from the start so you reduce the number of problems. All right, so I'm just adding all the properties here, and then we're gonna go and list them. So we're gonna have the ID, label, left label, right label, minimum value, maximum value, increment step, and the default value. And then we can go ahead and create the various sliders. So we're going to have creativity, voice variety, expression range, and natural flow.
So for example, let's add the temperature ID, which we are going to label as creativity. The left label will be consistent, the right label will be expressive, minimum will be 0, maximum will be 2, step will be 0.1, and the default value will be 0.8. I didn't invent these numbers. They are taken from the Chatterbox TTS model, which we are going to work with later. If you already want to explore this, I'm going to show you where you can do that. Using the link on the screen, you can see the Chatterbox text-to-speech GitHub repository.
In here, if you want to, you can go ahead and read more about how it works, and you can even see its code. This is basically where I obtained these sliders and values from. So just to make you aware, these are not magic numbers, and they will make more sense later once we actually have a Python server which accepts those values. All right, so if you want, you can also visit my source code, go directly into text-to-speech, data, sliders.ts, and copy the entire file, because you won't really learn much by typing this out by hand and you have a higher chance of making an error. But however you prefer. So we just did the temperature ID, which we present to users as the creativity of the model. Now we're going to do the top P value, which we present to users as voice variety, ranging from stable all the way to dynamic, with a minimum of 0, maximum of 1, step of 0.05, and a default value of 0.95. Then we're going to add top K, which we present to users as expression range, ranging from subtle to dramatic.
Minimum of 1, maximum of 10,000, step of 100, default value of 1000. And then the last one will be repetition penalty, which we're going to name natural flow, ranging from rhythmic to varied, with a minimum of 1, maximum of 2, step of 0.1, and a default value of 1.2. I'm just gonna slowly go over this so you can pause at any moment to compare with your code or, as I said, you can simply go inside of the source code and look at it. So now let's go inside of the text-to-speech components folder and create text-to-speech-form.tsx. Let's go ahead and mark this as use client.
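To make the slider definitions above concrete, here is a minimal sketch of what sliders.ts could look like. The numbers and labels come from the walkthrough; the camelCase IDs and the names SliderId, SliderConfig and sliders are assumptions for illustration, so compare them with the actual source code before relying on them.

```typescript
// Assumed IDs: the walkthrough names temperature, top P, top K and
// repetition penalty; the camelCase spelling here is a guess.
type SliderId = "temperature" | "topP" | "topK" | "repetitionPenalty";

// Hypothetical shape for one slider entry (in the project this would be exported).
interface SliderConfig {
  id: SliderId;
  label: string;
  leftLabel: string;
  rightLabel: string;
  min: number;
  max: number;
  step: number;
  defaultValue: number;
}

const sliders: SliderConfig[] = [
  { id: "temperature", label: "Creativity", leftLabel: "Consistent", rightLabel: "Expressive", min: 0, max: 2, step: 0.1, defaultValue: 0.8 },
  { id: "topP", label: "Voice Variety", leftLabel: "Stable", rightLabel: "Dynamic", min: 0, max: 1, step: 0.05, defaultValue: 0.95 },
  { id: "topK", label: "Expression Range", leftLabel: "Subtle", rightLabel: "Dramatic", min: 1, max: 10000, step: 100, defaultValue: 1000 },
  { id: "repetitionPenalty", label: "Natural Flow", leftLabel: "Rhythmic", rightLabel: "Varied", min: 1, max: 2, step: 0.1, defaultValue: 1.2 },
];
```

Typing the ID as a union is what gives the "you will eventually see a mistake" behavior: a misspelled ID fails to compile instead of silently producing an unknown field.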
Let's import formOptions from @tanstack/react-form and z from zod. Let's import our useAppForm from hooks, use-app-form, which we have developed. Where was I? Here I am, use-app-form, perfect. And then let's go ahead and develop the text-to-speech form schema using Zod.
So open up a Zod object, and let's first require the user to enter a text with a minimum length of one, otherwise they get the error please enter some text. Secondly, they're going to have to select a voice ID from the database, otherwise they get please select a voice. And then there are the parameters, which aren't required and won't throw errors because we will have default values for them: temperature, top P, top K, and repetition penalty. Great. From that object we can create a type, the text-to-speech form values, using z.infer with typeof the TTS form schema. And then using that type we can create constants like the default text-to-speech values. In here I'm going to assign the text to be empty, voice ID to be empty, temperature to be 0.8, top P to be 0.95, top K to be 1000, and repetition penalty to be 1.2. These are the default values that the Chatterbox TTS model will use if you don't pass these values at all.
Right? So if you don't add them, these are going to be the defaults anyway. That's why I chose those, so users can at least visually see what the defaults are. Then let's export const text-to-speech form options using formOptions from @tanstack/react-form, passing along the default values. And then from here we can go ahead and export function text-to-speech form.
The text-to-speech form will have two props: children, and optional default values, which are text-to-speech form values. Then let's go ahead and define the form by using useAppForm. Let's open an object within this hook and start by spreading the text-to-speech form options we defined above, which lets the hook infer the default values. Now, we still have to pass the actual default values here, which we can do like this. If the component has default values passed, we're going to use them. Otherwise, we're gonna use the default text-to-speech values.
So what's the difference? The default values from the prop are often going to come from history, like an older generation that the user selected, so we populate the exact properties that existed at the time, right? So basically, if default values exist, it means we are revisiting an existing generation, whereas if they don't, it means this is a new generation we are trying to create. The validators are going to run on the onSubmit event, and we're going to validate using the text-to-speech form schema Zod object. And the onSubmit method will simply be an asynchronous function, with the generation logic to be added later on.
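The fallback between history values and fresh defaults can be sketched in plain TypeScript. This is not the Zod schema or the TanStack form from the walkthrough: requiredFieldErrors is a hand-written stand-in for the two required checks, and the names TTSFormValues, defaultTTSValues, resolveInitialValues and the camelCase field names are assumptions for illustration.

```typescript
// Assumed shape of the form values; field names are a guess at the casing.
interface TTSFormValues {
  text: string;
  voiceId: string;
  temperature: number;
  topP: number;
  topK: number;
  repetitionPenalty: number;
}

// topK follows the slider default of 1000 stated earlier in the chapter.
const defaultTTSValues: TTSFormValues = {
  text: "",
  voiceId: "",
  temperature: 0.8,
  topP: 0.95,
  topK: 1000,
  repetitionPenalty: 1.2,
};

// Hypothetical helper: values passed in (for example from a past generation)
// win; otherwise a new generation starts from the defaults.
function resolveInitialValues(fromHistory?: TTSFormValues): TTSFormValues {
  return fromHistory ?? defaultTTSValues;
}

// Plain-TypeScript stand-in for the two required rules described above.
function requiredFieldErrors(values: TTSFormValues): string[] {
  const errors: string[] = [];
  if (values.text.length < 1) errors.push("Please enter some text");
  if (values.voiceId.length < 1) errors.push("Please select a voice");
  return errors;
}
```

With the defaults, both checks fail, which is also why the form cannot be submitted until a voice can actually be selected.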
And for the return, we simply add form.AppForm and render the children inside. As simple as that, we now have a reusable text-to-speech form with context, default values, and validation that we can share across the layouts which we have created. So let's go ahead and continue, and let's create another component here called generate-button.tsx. We're going to mark this as use client and import Button and Spinner from shadcn/ui. Let's go ahead and export function GenerateButton. Let's prepare the types and props here. So GenerateButton will accept size, disabled, isSubmitting, onSubmit, and className.
The types for those will be: size is optional and can be either default or small, disabled is a required boolean, isSubmitting is a required boolean, onSubmit is a required function, and className is an optional string. And then in here we're going to return a Button component like this, and we're going to pass it the following props: size, className, onClick mapped to onSubmit, and disabled. We're then going to check if we are submitting, and render one thing if so and another thing otherwise. So if we are submitting, let's render a fragment with a Spinner with class name size 3 and the text generating.
Otherwise, let's simply render the generate speech label. Great. So we are writing this small component on purpose, because we are going to reuse it in two places, and it makes no sense to have this exact code twice when we can easily reuse it. Now let's go inside of components and create settings-panel-settings.tsx. In here I'm going to mark it as use client, and I'm going to import useStore from @tanstack/react-form. Then I'm going to import Field, FieldGroup, and FieldLabel from components, ui, field, followed by the Slider component from components, ui, slider. All of these are coming from shadcn/ui.
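The generate button's props and its two render states can be written down as types, leaving the JSX out. This is only a sketch: the "sm" size value is assumed from the usual shadcn/ui button sizes, the exact label strings are guesses, and generateButtonLabel is a hypothetical helper that mirrors the conditional render rather than the component itself.

```typescript
// Props as described above; "sm" standing in for "small" is an assumption.
interface GenerateButtonProps {
  size?: "default" | "sm";
  disabled: boolean;
  isSubmitting: boolean;
  onSubmit: () => void;
  className?: string;
}

// Hypothetical helper: which text the button shows in each state.
function generateButtonLabel(isSubmitting: boolean): string {
  return isSubmitting ? "Generating..." : "Generate speech";
}

// The two places the component gets reused, with their differing props.
const mobileProps: GenerateButtonProps = {
  className: "w-full",
  disabled: false,
  isSubmitting: false,
  onSubmit: () => {},
};

const desktopProps: GenerateButtonProps = {
  size: "sm",
  disabled: false,
  isSubmitting: false,
  onSubmit: () => {},
};
```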
We added them in the first chapter when we ran npx shadcn add --all. And then I'm going to add useTypedAppFormContext from hooks, use-app-form. This is what we developed recently, right? So you should definitely have useTypedAppFormContext, useAppForm, and all of these. Let me go back to where I was: settings panel settings. We're then going to import the sliders constant from the data folder, sliders, or from features, text-to-speech, data, sliders, and then I'm going to import the text-to-speech form options from text-to-speech-form. Again, whatever you prefer. I like seeing exactly what feature something belongs to. So yeah, if you want to, you can keep it much simpler, just ./text-to-speech-form.
Alright now that we have text-to-speech-form options let's actually develop the function settingsPanelSettings. So why such a weird name? Well, because this panel here is considered the Settings panel. And the Settings panel can either show history or settings. So depending on what the user clicks, we're gonna display something different here.
So if the user clicks on settings, we should show the settings panel settings. If the user clicks on history, we're gonna show the settings panel history. So, an unfortunate name, perhaps I could have thought of something better, but that's why it has such a weird name. Let's go ahead and get the form into this component by using useTypedAppFormContext and passing along the text-to-speech form options. And then from here we can also extract isSubmitting by calling useStore, passing form.store, and specifically selecting s.isSubmitting.
So now let's go ahead and return the UI, so we can actually see it. We're going to wrap this inside of an empty fragment. First, we're going to do a voice style dropdown section. This actually doesn't exist yet. So for now, let's just add a div with border bottom, border dashed, and padding 4.
And let's add a paragraph, voice selector coming soon, with a class name of text small and text muted foreground. All right? And then outside of that div, let's do the voice adjustments section. In here, we're going to render a div with a class name of padding 4 and flex 1. And then we're going to do the field composition.
So we're going to add a FieldGroup with class name gap 8. We imported FieldGroup, FieldLabel, and Field right here, so make sure you have those along with the Slider. And what we're gonna do is map over sliders, and for each slider we're gonna render form.Field, with the key set to slider.id and the name set to slider.id. And then in here, we can go ahead and do the following.
So we can extract the children property like this and get the field. We can render that within the field component. For each slider, we can render slider.label within field label. And then in here, we're just going to go ahead and make it pretty. So we're going to add a div with a class name flex items center and justify between.
And the first thing we're gonna display here is a span with the slider's left label, meaning the minimum end of the range, right? So let me close everything else. In sliders, for creativity this would be consistent, right? And then on the right end, it will say expressive. So if you pull the slider all the way here, it's consistent.
If you pull it here, it's expressive. That's what we're trying to render now. Alright, so that's the left label. And then next to it, another span right label. Identical class names and we are using justify between so there will be a bunch of space between them, right?
Alright, and then outside of this div, we have to render the slider. The slider is going to have a value which is held within an array, like this. See, it's within an array. That's because the shadcn/ui Slider component can hold multiple values, but for our use case we're just going to be working with a single value. So you have to put it within an array.
Just be careful you do that, okay? Then let's go ahead and add onValueChange to get the value, call field.handleChange, and pass the first and only element from the value. So again, the reason we need to do this is the same reason we had to wrap the value in an array: the Slider is naturally multi-value, but we are only using one value from it, so we have to extract the first value from what it gives us. See? And then let's go ahead and just pass along the other props.
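The array wrapping and unwrapping described above boils down to two tiny conversions. The helper names toSliderValue and fromSliderValue are hypothetical, and the fallback argument is an addition for safety that the walkthrough does not mention.

```typescript
// Form field holds a single number; the slider expects an array of numbers.
function toSliderValue(value: number): number[] {
  return [value];
}

// The slider reports an array; the form field wants the first (only) entry.
// The fallback covers the unexpected case of an empty array.
function fromSliderValue(values: number[], fallback: number): number {
  return values[0] ?? fallback;
}

// Round trip for the creativity slider's default value.
const sliderValue = toSliderValue(0.8);
const fieldValue = fromSliderValue(sliderValue, 0);
```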
Minimum is slider minimum, maximum is slider maximum, step is slider step, and disabled is set if the entire form is submitting. So if the user clicks generate, we're gonna automatically disable all of these. That's why we had to develop this context-based form hook, so the form provider can wrap all of these panels. We can initiate form submission from this component and it will immediately be reflected in this component. That's why we had to do it in a more complicated way.
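The idea that one panel starts a submission and another panel reacts to it can be illustrated with a tiny hand-written store. To be clear, this is not how TanStack Form is implemented and none of these names come from the library; createTinyStore is a hypothetical stand-in that only shows why a single shared store lets detached components stay in sync.

```typescript
type Listener = () => void;

interface FormState {
  values: { text: string };
  isSubmitting: boolean;
}

// Hypothetical minimal store: holds state and notifies subscribers on change.
function createTinyStore(initial: FormState) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: () => state,
    setState(patch: Partial<FormState>) {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: Listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// One store instance, shared the way a form provider shares one form.
const store = createTinyStore({ values: { text: "" }, isSubmitting: false });

// Two detached "panels" each track the same flag through their own subscription.
let slidersDisabled = store.getState().isSubmitting;
let buttonBusy = store.getState().isSubmitting;
store.subscribe(() => {
  slidersDisabled = store.getState().isSubmitting;
});
store.subscribe(() => {
  buttonBusy = store.getState().isSubmitting;
});

// The text input panel starts a submission; both panels observe it.
store.setState({ isSubmitting: true });
```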
And if you're confused about this, like FieldGroup, form.Field, Field, like what is this? Well, this is just shadcn/ui composition. I think we already saw something like that on the TanStack Form page. You can see those examples right here. Maybe not exactly this one, but yeah, you can see this children prop receiving the field, and that is the same thing as doing this.
I could have technically done it within a prop, but it's just cleaner to write the JSX as children rather than within a prop. I think they have some examples of doing it in a different way. Okay, so they seem to be doing it this way right now.
But yeah, you can do it in multiple ways. I suggest you read this page a little bit so you can see the various ways this can be used. Okay. So now, what we have to do is add some class names to this slider, just to make it look a little bit better.
Don't worry, it's not going to be too much. It's just that they are big class names. So the first class name is the following: asterisk, asterisk, colon, data-slot slider-thumb, colon, size 3. So yes, it has to be written specifically like this, with no spaces. Then, the second one has exactly the same prefix.
So, you can copy this part. And the only difference is that, after the colon, we do background foreground. And then the last one is slightly different. So again asterisk, asterisk, data-slot, but instead of slider-thumb it's slider-track, with height 1. That's it, that's the class name. And I'm going to zoom out just so you can see how they all look. So this is one class name, this is the second, and this is the third. Okay.
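Since class names are hard to follow when spoken, here is one possible reading of the three slider class names. The exact strings are reconstructed from the description and are an assumption (in particular the square-bracket data attribute syntax), so check them against the source code. The sliderClassName constant is a hypothetical name.

```typescript
// Each entry is ONE class name with no spaces inside it.
// Reconstructed from the spoken description; verify against the source code.
const sliderClassNames = [
  "**:data-[slot=slider-thumb]:size-3",
  "**:data-[slot=slider-thumb]:bg-foreground",
  "**:data-[slot=slider-track]:h-1",
];

// Joined with single spaces, ready to pass as the slider's className.
const sliderClassName = sliderClassNames.join(" ");
```

Keeping the entries in an array makes it harder to accidentally introduce a space in the middle of one of them.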
Perfect. Now that we have that, let's go ahead and develop the settings panel history. So, settings-panel-history. But we're not actually going to develop anything in here. We're just gonna add a placeholder state.
So import AudioLines, AudioWaveform, and Clock. And you can go ahead and export function SettingsPanelHistory here. Let's go ahead and return. What we're going to develop here is awfully similar to the voice preview placeholder, right? So we're going to have three icons like this, but just in a smaller arrangement.
So I'm going to go ahead and add the first div here: flex, height full, flex column, items center, justify center, gap 2, and padding 8. Then we're going to have another div with relative, flex, width of 25, items center, and justify center. And then we're going to render our three icons. So the first icon is going to be AudioLines, and its div will have absolute, left 0, minus rotate 30, rounded full, background muted, and padding 3.
The icon itself will have size 4 and text muted foreground. Then for the second item we're gonna have AudioWaveform, with relative, z-index of 10, rounded full, background foreground, and padding 3 on the div, and size 4 and text background on the icon. So the reverse. It's the exact same thing we did here, but we're just doing it with a smaller width, and I think some sizes are a little bit different. And then the last one is identical to the first one, except positioned on the right side instead of the left side.
So absolute, right 0, rotate 30, rounded full, background muted, padding 3, and the Clock icon with size 4 and text muted foreground. And then outside of this div, let's add a paragraph, no generations yet, with font semibold, tracking tight, and text foreground. And the last one, generate some audio and it will appear here, with a maximum width of 48, text center, text extra small, and text muted foreground. Great. So now let's go ahead and actually render those two components, right? Because we now have settings panel history and settings panel settings, but the actual settings panel is just this, right? So we have to change that.
So let us start by removing the import, and let's instead add History and Settings from lucide-react. Then let's add Tabs, TabsContent, TabsList, and TabsTrigger from components, ui, tabs. Then we're going to add SettingsPanelHistory from ./settings-panel-history and SettingsPanelSettings from settings panel, well, settings, all right? Then inside of the actual settings panel here, we can delete everything within the parent div.
So just leave hidden, width 105, minimum height of 0, flex column, border left, and lg flex. And in here, we're going to render the Tabs element. The tabs will have a default value of settings and a class name of flex, height full, minimum height of 0, flex column, and gap of 0. Inside we're gonna have TabsList. TabsList will have a TabsTrigger. The TabsTrigger will have a value of settings.
And in here we're gonna render the Settings icon with a class name of size 4, followed by the text settings, like this. And then we can copy that TabsTrigger and change this one to be history and the text to be history. I think we should already see that. There we go, so we have settings and history. Okay, not looking perfect, but they are visible. And what we have to do now is, outside of the TabsList but still within the Tabs, add TabsContent. And in here, let's render settings panel settings. Let's give the first TabsContent a value of settings. So that's what this one is controlling.
Let's give it a class name of margin top 0, flex, minimum height of 0, flex 1, flex column, and overflow y auto. Like that. Okay, it looks like something has happened here. So for now I'm just going to comment this out. And let me duplicate the TabsContent here, paste it, and change this to be a value of history, and in here we're gonna render settings panel history. Okay, and for the first one we can just put a paragraph, settings panel, for now.
Alright, let's expand. So now it's settings panel, but if I click on history, it should change to that, but it doesn't. So let's see exactly what we did wrong here. I didn't change the value. The value should be history.
Okay. Default value is settings, so that's what's going to be selected first. And when you click on history, it changes to no generations yet, generate some audio and it will appear here.
Alright, so now let's go ahead and quickly figure out what is happening with settings panel settings. When I uncomment this, we get an error that we aren't wrapping... Okay, so I think we need to wrap our app within the form component somewhere. I'm just not entirely sure where. So, let me see. Just a second. So we have to do it inside of the text-to-speech view. Okay.
So go ahead and delete this paragraph, settings panel, and make sure you render settings panel settings. Then go inside of text-to-speech, views, text-to-speech view. Now we have to modify this a bit. So let's mark this as use client first. That's very important. And then we're gonna import the text-to-speech form and the default TTS values from components, text-to-speech-form, like that.
And then we're going to wrap the entire thing in the text-to-speech form, like this. Okay. Let me go ahead and indent this. And let's pass in the default values to be the default TTS values. There we go.
So by doing this, that should be fixed. I'm going to change this import to @/features/text-to-speech, like this, and then I'm going to keep them together. And now, there we go. You can see our sliders. Creativity, consistent to expressive. Voice variety, stable to dynamic. Expression range, subtle to dramatic. Natural flow, rhythmic to varied.
And it changes depending on the tab we click on. And it says voice selector coming soon because we didn't develop it just yet. Perfect. So I just wanted to make sure we fixed that error before we continue, because what we have to do now for the tabs is add some class names. So let's go ahead and develop the class name.
Yes, here in the settings panel. Let's define the class name within a constant, tab trigger class name. This will be a relatively long one, so feel free to copy it from my source code if you want to. Flex 1, height full, gap 2, background transparent, rounded none, border x 0, border top 0, border bottom pixel, border bottom transparent, shadow none, and then data state active, border bottom foreground. So this is a single class name.
This isn't separated. It needs to be all together without any spaces. Then this is a long one: group data variant default, forward slash tabs list, data state active, shadow none. Super long one.
So it's important that you understand this is all one class name. There's no space here. Okay, the line doesn't end there, it just wraps so you can see the entire code without me having to scroll, all right? Same thing with this one. This is also one class name. All right.
Once we have the tab trigger class name, let's go ahead and find the TabsTrigger here and here. And let's give each a class name, passing in the tab trigger class name. Great. We have that. And I think we did all the other class names here.
So let me see. It's definitely not perfect. Not what I intended. So let me try and find exactly what I did wrong here. Flex, height full, minimum height 0, flex col, yep.
Oh, TabsList, okay. So TabsList is missing a class name: width full, background transparent, rounded none, border bottom, height 12, then group data orientation horizontal in square brackets, forward slash tabs, height 12, and padding 0. All right. Another super long one. So group data orientation horizontal, tabs, height 12.
I'm just going to double check that I didn't misspell anything. Feel free to copy this from the source code too, but it looks like everything is fine. Okay. And now they look as I intended them to look. Perfect.
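For reference, the two tab class name strings described above could be written out like this. These are reconstructed from the spoken description, so the exact Tailwind syntax (the bracketed data attributes and the named group after the forward slash) is an assumption; the constant names are hypothetical, and the source code remains the authority.

```typescript
// Trigger classes: the last two entries are each a single long class name.
const tabTriggerClassName = [
  "flex-1",
  "h-full",
  "gap-2",
  "bg-transparent",
  "rounded-none",
  "border-x-0",
  "border-t-0",
  "border-b-px",
  "border-b-transparent",
  "shadow-none",
  "data-[state=active]:border-b-foreground",
  "group-data-[variant=default]/tabs-list:data-[state=active]:shadow-none",
].join(" ");

// List classes, including the long orientation-scoped height.
const tabsListClassName = [
  "w-full",
  "bg-transparent",
  "rounded-none",
  "border-b",
  "h-12",
  "group-data-[orientation=horizontal]/tabs:h-12",
  "p-0",
].join(" ");
```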
Amazing. So you can see that since we are able to move these values, it means that the form is connected. The form values are actually changing right now. You could even test it by modifying the default TTS values. If you want, you can change everything to zero, and then when you refresh you will see that they are actually loaded.
So this way you can confirm that the values are actually loaded. Make sure to reset them afterwards. All right, so we are almost done. What we have to do now is go inside of the text-to-speech components, text input panel, because right now this isn't connected to the form. It has its own text state here. We no longer need this.
We no longer need useState from React. What we do need instead is useStore from @tanstack/react-form. We no longer need the Button, so we can remove that. But what we do need is useTypedAppFormContext from @/hooks/use-app-form.
And we also need ttsFormOptions from ./text-to-speech-form, and we need GenerateButton from ./generate-button. Again, you can replace both of these with features, text-to-speech, components if you prefer it this way. So make sure you have those, and now let's go ahead and fix some things here.
So instead of that previously written useState, we're now going to have form, from useTypedAppFormContext. We can get the value of the text using useStore, passing form.store and a selector that returns s.values.text. And you can see it's all properly written. You can see how we can now share all the form values and validation across different components, which are not next to each other and which are not nested in each other. So that's how you do that when you have complex layouts like ours.
Let's add isSubmitting the same way, and let's add isValid the same way. Perfect. So now we're going to do the following around the text area here. I'm going to add form.Field with the name text, because that's the form field we're going to control with this. And let's go ahead and just wrap the text area.
That's the only thing we need. Let's indent it. In here, we're gonna add the children prop, which can be written like this, and that will render the text area, okay? And then the onChange here will be modified to call field.handleChange with event.target.value. And the value here will be replaced with field.state.value.
So just like that we turn this into a controlled component, which will be disabled while it's submitting as well. Alright, and now we have to go down here and change from using this Button in the mobile layout to using GenerateButton, like this. Class name will be width full, disabled will be isSubmitting, isSubmitting will be isSubmitting, and onSubmit will be form.handleSubmit. And we're going to have the exact same thing in the desktop layout. So again, find the existing button.
Let's go ahead and remove it now. Paste this, and you can remove the class name and change the size to be small, like this. And I believe that is it. So, we now have a fully controlled form here. So everything now works.
I think there might be a way to test if it actually works. Should we use isValid somewhere? Yes, we should use isValid here in the desktop variant. So it should be disabled if isSubmitting or if not isValid. That should be the case, okay. For mobile you don't have to do it because the UX is different on mobile. So if it's not valid, well, it makes no sense right now because we hide these, but later, yeah, maybe we won't even need the isValid check, because we obviously don't even show it if it's not valid.
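The difference between the desktop and mobile disabled rules can be spelled out as two small functions. The function and type names are hypothetical; only the rules themselves (desktop checks both flags, mobile checks only isSubmitting) come from the walkthrough.

```typescript
// The two pieces of form state both buttons read.
interface SubmitState {
  isSubmitting: boolean;
  isValid: boolean;
}

// Desktop: disabled while submitting, or whenever the form is not valid.
function isDesktopGenerateDisabled(state: SubmitState): boolean {
  return state.isSubmitting || !state.isValid;
}

// Mobile: only disabled while submitting, since the UX differs there.
function isMobileGenerateDisabled(state: SubmitState): boolean {
  return state.isSubmitting;
}

// An idle but invalid form: desktop blocks the click, mobile does not.
const idleInvalid: SubmitState = { isSubmitting: false, isValid: false };
const desktopDisabled = isDesktopGenerateDisabled(idleInvalid);
const mobileDisabled = isMobileGenerateDisabled(idleInvalid);
```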
Right now the only thing that happens is this gets disabled, because we're not submitting anything and we're not loading anything. But what we could potentially do is go inside of the text-to-speech form, inside of onSubmit. I'm not sure if we can extract the values from here. Can I try getting the values just to see if that works? So let me go ahead and try hello world. And I'm going to go ahead and purposely mess around with these.
Maybe I won't even be able to submit. Yes, I'm not able to submit because I don't have the voice selector. I think that's why I can't even try it out. But since we are able to change the default values and see that that works, I think we are good to go. All right.
So I believe those are all the changes that we need. As you can see, at the end of this chapter I have 19 changes. If you want, you can pause the screen to confirm you aren't missing any important ones. As I said multiple times during this chapter, you don't have to have all of these files, depending on whether you made the same mistakes that I did. Perfect. So what we have to do now is just make sure everything works.
Don't go forward if something is broken. Your cost estimation should work. Your character counting should be working. You should be able to switch forms. You should be able to switch tabs.
It should look good on mobile too, right? I mean, this is the only thing you should see right now. And one thing I'm curious about is the text input panel in text-to-speech here. So when I have text, okay, so it's derived from here. So all the calculations are done using this text.
Okay, that's all I wanted to know. Perfect. So, what we're going to do now is open a pull request. So chapter 3, text-to-speech UI. First of all, let's run npm run lint to confirm that we didn't break anything.
Looks good. Then npm run build. Let's wait a second to see if it builds. Looking good, everything passes. If yours doesn't, I would suggest fixing any errors that you have.
Let's go ahead and do git checkout -b 03-text-to-speech-ui, then git add . which basically stages all the changes, then git commit with the message 03 text to speech UI, and then git push -u origin 03-text-to-speech-ui. There we go, our branch is now on GitHub. So let's go to our repository here. We can now open a pull request, and let's review our changes to see if we made any mistakes. And after a few moments, you can see that both our continuous integration and our continuous deployment, thanks to Railway, have successfully passed. So a preview of this pull request is now actually available. So if you were developing within a team, you could easily share your new changes, which are deployed.
So take a look at the URL: resonance, resonance, PR 2, Railway. So this is actually deployed on Railway, and you can share the progress of the app with your team or with whomever you are developing this with. And if you don't have this, it means that you didn't enable it. So go inside of your project on Railway, go inside of Settings, then Environments, and click Enable PR Environments. And for Base Environment, select Production.
And then it will apply to all future pull requests. And this environment, this temporary URL, will be cleaned up when the PR is closed. So a super cool feature from Railway, and this way you can check that whatever you just developed actually works in production. Brilliant! So we confirmed that, and now we have some changes and summaries from CodeRabbit.
So, new features. We added a complete text-to-speech page with an integrated form interface, a text input panel with real-time character counting, a voice settings panel, everything we already know. Brilliant. Here are some comments. In here it's telling us to add an explicit type to this generate button, but we actually don't need to do that, because we are explicitly using this button to submit.
So if anything, we can give it a type of submit. Next, in here, it told me that we have an invalid border bottom pixel class, and that's actually incorrect. The cool thing about CodeRabbit is that you can update its learnings. So if you respond that border bottom pixel is a standard Tailwind utility in newer versions, it will add that to its learnings. In here, it's telling me that a placeholder is not a substitute for an accessible label.
So yes, we are missing accessibility on this text area, and that's definitely something we could improve here. In here, it's telling me what we actually noticed ourselves: the generate button in the mobile layout is not using the isValid guard, whereas the desktop layout is. So I just responded that the mobile version needs a different user experience, and it added that learning too. In here it's telling us that the voice ID is required but defaults to an empty value, and the voice selector doesn't exist. So this is preventing the form from submitting.
We are aware of this, obviously, because we are in the middle of a tutorial. But yes, if this was a real pull request, it would have been a broken pull request, because you cannot submit the form: voice ID is always going to be empty because you cannot select it. Other than that, amazing, amazing job. Let's go ahead and merge all of these changes. This time we didn't have any major mistakes. And once you have merged those changes, we're going to immediately do what we had to do at the beginning of this chapter, which is check out back to main and run git pull origin main.
Because branch 03 and main are now on the same level, right? Because we just merged it. So just confirm that you have two closed pull requests. This is the most recent one, meaning we just merged all of those changes. Go inside of your IDE, confirm you're on your main branch, and you can always open the source control graph to confirm this is what it looks like.
So after we merged the dashboard pull request, we opened up a new branch, text-to-speech, and we merged that back to main. Brilliant! And you can confirm: if you have text-to-speech with components and views, that will let you know that you are on the correct branch. You should have all of that visible in the main branch. Great.
So I believe that marks the end of this chapter. We developed all of this, and see you in the next one, where we're not exactly going to be continuing to work on text-to-speech. We are going to add tRPC and R2 storage, simply so we prepare the ability to continue developing this beyond the UI. Amazing job and see you in the next chapter.