Uniform Tube Modeling of Speech Processing – VII


OK, so we have derived the all-pole model for the vocal tract: $V(z) = G / D(z)$, where $D(z) = 1 + \sum_{k=1}^{N} \alpha_k z^{-k}$.

Now look at the two-tube model, $N = 2$. Its transfer function is

$$V(z) = \frac{0.5\,(1 + r_G)(1 + r_1)(1 + r_2)\, z^{-1}}{1 + (r_1 r_2 + r_1 r_G)\, z^{-1} + r_2 r_G\, z^{-2}},$$

so $D(z)$ for the two-tube model is $D(z) = 1 + (r_1 r_2 + r_1 r_G)\, z^{-1} + r_2 r_G\, z^{-2}$.

Now suppose $r_G = 1$. What does that mean? The glottal impedance is infinite, $Z_G = \infty$. Recall the boundary condition we modeled at the glottis: $r_G = (Z_G - Z_t)/(Z_G + Z_t)$, where $Z_t$ is the impedance of the tube seen at the glottal end. (I wrote capital $R_G$ there; here let us use small $r_G$.) If $Z_G$ is infinite, then $r_G = 1$.

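
The two-tube expressions above are easy to check numerically. The following is a minimal sketch, not part of the lecture: the function names and the example values of $r_1$, $r_2$ and the impedances are hypothetical, chosen only for illustration.

```python
def glottal_reflection(z_g, z_t):
    """r_G = (Z_G - Z_t) / (Z_G + Z_t); tends to 1 as Z_G grows without bound."""
    return (z_g - z_t) / (z_g + z_t)


def two_tube_denominator(r1, r2, r_g):
    """Coefficients [1, d1, d2] of D(z) = 1 + d1 z^-1 + d2 z^-2 for the two-tube model."""
    return [1.0, r1 * r2 + r1 * r_g, r2 * r_g]


def two_tube_gain(r1, r2, r_g):
    """Numerator constant 0.5 (1 + r_G)(1 + r_1)(1 + r_2); the z^-1 factor is a pure delay."""
    return 0.5 * (1.0 + r_g) * (1.0 + r1) * (1.0 + r2)


# Hypothetical example values (assumptions, not taken from the lecture).
r1, r2 = 0.5, 0.8
r_g = glottal_reflection(z_g=1e12, z_t=1.0)   # very large glottal impedance
print(r_g)                                    # close to 1
print(two_tube_denominator(r1, r2, 1.0))      # D(z) for the case r_G = 1
print(two_tube_gain(r1, r2, 1.0))
```

With $r_G = 1$ the middle coefficient reduces to $r_1 + r_1 r_2$ and the last one to $r_2$.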
Now, with $r_G = 1$, $D(z)$ for the two-tube model becomes $D(z) = 1 + (r_1 + r_1 r_2)\, z^{-1} + r_2 z^{-2}$.

I want to write this as a recursion, so that it also works when there are $N$ tubes instead of two. With $r_G = 1$, $D_1(z) = 1 + r_1 z^{-1}$ and $D_2(z) = 1 + r_1 z^{-1} + r_1 r_2 z^{-1} + r_2 z^{-2}$, which is what I derived here. The first two terms are nothing but $D_1(z)$. The remaining two terms can be written as $r_2 z^{-2} D_1(z^{-1})$, because $D_1(z^{-1}) = 1 + r_1 z$, and multiplying by $r_2 z^{-2}$ gives $r_2 z^{-2} + r_1 r_2 z^{-1}$. So

$$D_2(z) = D_1(z) + r_2 z^{-2} D_1(z^{-1}).$$

$D_1(z)$ can be written the same way in terms of $D_0(z)$. With no junction there is nothing, so $D_0(z) = 1$ and $D_1(z) = D_0(z) + r_1 z^{-1} D_0(z^{-1})$. Put in the value of $D_0$ and the same expression comes out.

Similarly, for the $k$-th stage,

$$D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), \qquad k = 1, \dots, N,$$

and $D_N(z)$ is the same $D(z) = 1 + \sum_{k=1}^{N} \alpha_k z^{-k}$.

Suppose the number of sections is $N = 10$ and the load is $r_N = r_L = 1$, that is, no resistive load, which is reasonable at low frequencies. Knowing the last coefficient, $r_N = r_{10} = 1$, I can work back from the lip end to $r_9$, $r_8$, $r_7$ and so on.

What this says is: if I know the reflection coefficients, I can implement the tube digitally. On the other hand, if I know the signal coming out of the tube, I can estimate the area function at the different junctions, because

$$r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k},$$

where the $A_k$ are the cross-sectional areas of the different tubes. If the vocal tract is modeled by ten tubes, the last junction has $r_N = (A_{N+1} - A_N)/(A_{N+1} + A_N) = 1$. So if I know the cross-sectional areas, I can derive the reflection coefficients and model the tube in the digital domain. On the other hand, if I know the speech signal and can somehow find the coefficients $\alpha_k$ (and from them the reflection coefficients), then I can find the cross-sectional areas of the different tubes. There is an example from Rabiner and Schafer listing the cross-sectional areas for a vowel; I do not remember which vowel. Either I know the area function and can generate the vowel, or I know the vowel and can derive the area function.

So how do we implement this tube digitally? I have to implement $D(z) = 1 + \sum_{k=1}^{N} \alpha_k z^{-k}$. It is very simple. Take an input $u[n]$, say an impulse, and an output $u_L[n]$. The output is delayed by $z^{-1}$, weighted by $\alpha_1$ and fed back to the summing node; then delayed by $z^{-2}$ and weighted by $\alpha_2$; and so on up to $z^{-N}$ weighted by $\alpha_N$. The direct path has weight one. The equation is nothing but a digital filter, so the vocal tract can easily be implemented in the digital domain.

Next, the poles of the vocal tract. The tube model is all-pole: $V(z) = G / D(z)$ with $D(z) = 1 + \sum_{k=1}^{N} \alpha_k z^{-k}$. $D(z)$ has order $N$, so $V(z)$ has $N$ poles. A ten-section tube gives ten poles. Because the coefficients of $D(z)$ are real, every pole is either real or one of a complex conjugate pair. If all $N$ poles are complex, there are $N/2$ complex conjugate pairs, and

$$V(z) = \frac{G}{\prod_{k=1}^{N/2} (1 - \alpha_k z^{-1})(1 - \alpha_k^{*} z^{-1})},$$

where $\alpha_k$ now denotes the $k$-th pole, not the polynomial coefficient used above. Picture the unit circle with its real and imaginary axes: if a pole occurs at angle $\theta$, there is another pole, its conjugate, at angle $-\theta$. Every complex pole has a complex conjugate partner.

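
The steps described so far (areas to reflection coefficients, the recursion for $D_k(z)$, and the conjugate pole pair of a second-order $D(z)$) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the lecturer's code: all function names are hypothetical, the area values are made up, and the glottal end is assumed to have $r_G = 1$ as in the lecture.

```python
import cmath


def reflection_coefficients(areas, r_lips=1.0):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for inner junctions, then r_N = r_lips."""
    inner = [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]
    return inner + [r_lips]


def areas_from_reflection(r_inner, first_area):
    """Invert r_k for the inner junctions: A_{k+1} = A_k (1 + r_k) / (1 - r_k)."""
    areas = [first_area]
    for rk in r_inner:
        areas.append(areas[-1] * (1.0 + rk) / (1.0 - rk))
    return areas


def denominator_from_reflection(r):
    """D_k(z) = D_{k-1}(z) + r_k z^-k D_{k-1}(1/z), starting from D_0(z) = 1.

    Returns the coefficients of z^0, z^-1, ..., z^-N.
    """
    d = [1.0]
    for rk in r:
        padded = d + [0.0]          # D_{k-1}(z), padded to degree k
        reversed_ = padded[::-1]    # coefficients of z^-k D_{k-1}(1/z)
        d = [p + rk * q for p, q in zip(padded, reversed_)]
    return d


def second_order_poles(d):
    """Roots of z^2 + d[1] z + d[2], i.e. the poles of 1 / (1 + d[1] z^-1 + d[2] z^-2)."""
    root = cmath.sqrt(d[1] * d[1] - 4.0 * d[2])
    return (-d[1] + root) / 2.0, (-d[1] - root) / 2.0


# Two-tube check with hypothetical values r_1 = 0.5, r_2 = 0.8 (and r_G = 1).
d2 = denominator_from_reflection([0.5, 0.8])
print(d2)                       # compare with [1, r1 + r1*r2, r2]
p1, p2 = second_order_poles(d2)
print(p1, p2)                   # a complex conjugate pair

# Hypothetical two-section area function, lips modeled with r_N = 1.
r = reflection_coefficients([2.0, 6.0])
print(r, areas_from_reflection(r[:-1], 2.0))
```

The recursion reproduces the two-tube denominator derived by hand, and the two poles come out as conjugates of each other because the coefficients are real.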
If the pole is a complex number, I can write it as $\alpha_k = r_k e^{j\theta_k}$: the pole sits at angle $\theta_k$ with magnitude $r_k$. This $r_k$ is not the reflection coefficient; it is the magnitude of the pole. The complex conjugate pole has the same magnitude and the opposite angle, $\alpha_k^{*} = r_k e^{-j\theta_k}$.

Substituting $\alpha_k = r_k e^{j\theta_k}$ and $\alpha_k^{*} = r_k e^{-j\theta_k}$, each complex conjugate pair contributes

$$V_k(z) = \frac{1}{(1 - \alpha_k z^{-1})(1 - \alpha_k^{*} z^{-1})} = \frac{1}{1 - 2 r_k \cos\theta_k\, z^{-1} + r_k^{2} z^{-2}},$$

and the full transfer function is $V(z) = G \prod_{k=1}^{N/2} V_k(z)$. When I write $V_k(z)$ for a single pair, I leave the product sign out.

Now, what do $r_k$ and $\theta_k$ contribute? Let $r_k = e^{-b_k}$, so $b_k = -\ln r_k$. Then

$$V_k(z) = \frac{1}{1 - 2 e^{-b_k} \cos\theta_k\, z^{-1} + e^{-2 b_k} z^{-2}}.$$

The importance of $b_k$ is that it produces the bandwidth, while $\theta_k$ gives the formant position. As $r_k$ approaches one, the pole approaches the unit circle and I get a sharp resonance at the formant frequency; when $b_k$ is nonzero, the formant has a finite bandwidth. So $b_k$ gives me the bandwidth and $\theta_k$ gives me the formant position. This information will be used when we develop the linear prediction model of the speech production system. If I know $\theta_k$ and $r_k$, I can model the system: $\theta_k$ gives the formant frequency position, and an $r_k$ close to one, a pole close to the unit circle, gives a pronounced formant.

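
To make the roles of $\theta_k$ and $b_k$ concrete, here is a small sketch that converts between a pole and a formant frequency and bandwidth in hertz. The lecture only states that $\theta_k$ sets the position and $b_k$ the bandwidth; the conversions $F = \theta F_s / 2\pi$ and $B \approx b F_s / \pi$ used below are the usual textbook approximation for poles near the unit circle and are an assumption here, as are the function names and example numbers.

```python
import math


def resonator_from_pole(r, theta):
    """Denominator [1, -2 r cos(theta), r^2] of V_k(z) for one conjugate pole pair."""
    return [1.0, -2.0 * r * math.cos(theta), r * r]


def pole_to_formant(r, theta, fs):
    """Formant frequency and bandwidth in Hz (assumed mapping, see lead-in)."""
    b = -math.log(r)                      # b_k = -ln r_k, as in the lecture
    return theta * fs / (2.0 * math.pi), b * fs / math.pi


def formant_to_pole(freq_hz, bandwidth_hz, fs):
    """Inverse of pole_to_formant: returns (r, theta)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return r, theta


# Hypothetical formant: 500 Hz centre, 60 Hz bandwidth, 10 kHz sampling.
r, theta = formant_to_pole(500.0, 60.0, 10000.0)
print(r, theta)
print(resonator_from_pole(r, theta))
print(pole_to_formant(r, theta, 10000.0))
```

A narrower bandwidth pushes $r$ closer to one, which is the "pole approaches the unit circle" case described above.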
So if there are $N$ poles, there are $N/2$ complex conjugate pairs, and each pair gives me one formant frequency. Suppose I show you the spectrum of a speech signal with four peaks, $F_1$, $F_2$, $F_3$, $F_4$, and ask how many complex conjugate pairs there are. Four pairs. How many complex poles? Eight. So if an $N$-tube model gives $N$ poles in the transfer function, there are $N/2$ complex conjugate pairs, and I can get $N/2$ formant frequencies in the spectrogram.

It also means that if I can find the formant frequencies and formant bandwidths of a speech event, I can derive the vocal tract transfer function for that event. Take a steady-state vowel, do a frequency analysis, find the formant frequencies and bandwidths, and from them I can find the transfer function of the vocal tract.

Do not confuse the number of complex conjugate pairs with the number of poles. An $N$-tube model gives $N$ poles, hence $N/2$ complex conjugate pairs, hence $N/2$ formants. If I divide the whole vocal tract into $N = 10$ tubes, I get $10/2 = 5$ complex conjugate pairs, so five formant frequencies.

Now an example that I have given in the slides. Let the length of the vocal tract be $l = 17$ cm and the velocity of sound $c = 340$ m/s. Find the number of sections required to generate a voiced signal with 5 kHz bandwidth. For a 5 kHz bandwidth the sampling frequency should be $F_s = 10$ kHz. The delay of each section is $\tau = x / c$, where $x$ is the length of each section. I have to find the number of sections, so suppose $N$ sections are required.

If the length of the tube is $l$, then $x = l / N$ and $\tau = l / (N c)$. For the calculation take $l = 17.5$ cm and $c = 35{,}000$ cm/s (the ratio $l/c$ is the same as for 17 cm and 340 m/s). Then

$$\tau = \frac{17.5}{N \times 35000} = \frac{1}{2N \times 10^{3}}\ \text{s}.$$

The sampling period is $T = 1/F_s$, and $T = 2\tau$, the round-trip delay of one section, so $\tau = T/2 = 1/(2F_s)$. Therefore $1/(2F_s) = 1/(2N \times 10^{3})$; the twos cancel and $N = F_s / 10^{3}$. With $F_s = 10$ kHz, $N = 10$ sections. Equivalently, since $F_s = 2B$, $N/2$ is the bandwidth in kilohertz: a 5 kHz bandwidth requires ten sections.

You can expect similar problems. For example: I want to generate a signal with 4 kHz bandwidth and the sampling frequency is 8 kHz; find the minimum number of sections required.

Since the vocal tract can be modeled as an all-pole model (I am not going into the details of the formant trajectories here), the whole $V(z)$ is the product of the $V_k(z)$, and the digital implementation is again the same: a cascade of stages, each representing one formant frequency, formant one, formant two, formant three and so on. There are $N/2$ stages, so there are $N/2$ formant frequencies. So the lossless vocal tube model can be realized by a linear system, $V(z) = G / \left(1 + \sum_{k=1}^{N} \alpha_k z^{-k}\right)$. We will discuss LTI systems later.

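
The section-count arithmetic worked above, $T = 1/F_s = 2\tau$ with $\tau = l/(Nc)$, reduces to $N = 2 l F_s / c$. A minimal sketch follows; the function name is hypothetical, and the second call applies the same tract length and sound speed to the 8 kHz exercise, which is an assumption since the lecture leaves that case open.

```python
def sections_required(length_cm, speed_cm_per_s, fs_hz):
    """Number of sections N from T = 1/fs = 2*tau and tau = l / (N c)."""
    return 2.0 * length_cm * fs_hz / speed_cm_per_s


# Lecture example: l = 17.5 cm, c = 35,000 cm/s, Fs = 10 kHz (5 kHz bandwidth).
print(sections_required(17.5, 35000.0, 10000.0))

# Slide values l = 17 cm, c = 340 m/s have the same l/c ratio.
print(sections_required(17.0, 34000.0, 10000.0))

# Exercise: 4 kHz bandwidth, Fs = 8 kHz, same tract assumed.
print(sections_required(17.5, 35000.0, 8000.0))
```

Because $l/c$ is $0.5$ ms for this tract, the result is simply the sampling frequency expressed in kilohertz.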
In summary: the vocal tract is $V(z)$, the radiation is $R(z)$ and the glottal pulse is $G(z)$. The total transfer function of the speech production system is

$$H(z) = G(z)\, V(z)\, R(z).$$

So from the speech I want to find $H(z)$, which is the product of the glottal transfer function, the vocal tract transfer function and the lip radiation transfer function.

For voiced speech, an impulse train is modified by the glottal transfer function; that modified signal is fed to the vocal tract, and after lip radiation I get the speech signal. For unvoiced speech, the source is random noise, which is modified only by the vocal tract and the lip radiation to give the speech signal.

Of the three transfer functions, $G(z)$ can be approximated by a two-pole function, and for the radiation a single pole is required, so two plus one gives three poles in addition to those of $V(z)$. We will use this information when we discuss linear prediction analysis, for example when deciding how many poles are needed.

So this is the whole vocal tract tube modeling. In summary, the human vocal tract can be modeled using digital signal processing: it can be implemented as a digital linear filter. We have shown that if the vocal tract is simulated by a number of sections, say $N$ lossless tube sections, then it can be implemented by a linear system $V(z)$, which can be realized in the digital domain.

Why is it called uniform tube modeling? Because within each section the cross-sectional area of the vocal tract is uniform. The whole vocal tract can be a single tube, two tubes or $N$ tubes, but within each tube the cross-sectional area is constant. And it is lossless because, if it were lossy, all kinds of complexity would come up. If it is a lossless uniform tube model, it can be implemented by a digital system, $V(z)$.

Once I know that speech production can be implemented by a $V(z)$, can the output speech collected by a microphone be analyzed using linear signal processing? Yes. That is where the concept of linear prediction analysis comes from: the system can be modeled linearly, so from the signal I try to find the transfer function, that is, to predict the vocal tract constriction. If I know the signal, it is possible to find the cross-sectional areas of the $N$ sections from the behavior of the output speech. Or, if I know the area function, I can implement it digitally, excite it with an impulse and produce the speech. So this is the uniform tube model, or lossless tube modeling, of the speech production system. Thank you.
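
As a closing illustration, the all-pole filter $V(z) = G / (1 + \sum_k \alpha_k z^{-k})$ and the two excitations described in the lecture (impulse train for voiced speech, random noise for unvoiced speech) can be sketched in a few lines. This is a simplified sketch under stated assumptions: the glottal filter $G(z)$ and the radiation $R(z)$ are left out, the function names are hypothetical, and the coefficients are the two-tube values for the made-up choice $r_1 = 0.5$, $r_2 = 0.8$, $r_G = 1$.

```python
import random


def all_pole_filter(alpha, gain, x):
    """y[n] = gain * x[n] - sum_k alpha[k-1] * y[n-k], i.e. V(z) = G / (1 + sum alpha_k z^-k)."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k, a in enumerate(alpha, start=1):
            if n - k >= 0:
                acc -= a * y[n - k]     # feedback from past outputs
        y.append(acc)
    return y


def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


# Two-tube D(z) = 1 + (r1 + r1*r2) z^-1 + r2 z^-2 with hypothetical r1 = 0.5, r2 = 0.8.
alpha = [0.5 + 0.5 * 0.8, 0.8]

voiced = all_pole_filter(alpha, 1.0, impulse_train(200, 80))
unvoiced = all_pole_filter(alpha, 1.0, white_noise(200))
print(voiced[:5])
print(unvoiced[:5])
```

The sign of the feedback follows from writing $D(z)$ with a plus sign: moving the delayed output terms to the right-hand side of $D(z)\,Y(z) = G\,X(z)$ makes them negative.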
