<HTML>
<HEAD>
<TITLE>LREC 2000 - Paper 235 summary</title>
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!--
// preload images:
 if(document.images)
  {
  hom_d= new Image(100,20);   hom_d.src="../eikones/hom_d.gif";
  pap_g=new Image(100,20);    pap_g.src="../eikones/pap_g.gif";  
  pap_d=new Image(100,20);    pap_d.src="../eikones/pap_d.gif";  
  pap_l=new Image(100,20);    pap_l.src="../eikones/pap_l.gif";  
  hom_l=new Image(100,20);    hom_l.src="../eikones/hom_l.gif";
  aut_d=new Image(100,20);     aut_d.src="../eikones/aut_d.gif";
  aut_l=new Image(100,20);    aut_l.src="../eikones/aut_l.gif";
  Key_d=new Image(100,20);    Key_d.src="../eikones/Key_d.gif";
  Key_l=new Image(100,20);   Key_l.src="../eikones/Key_l.gif";
  ses_d=new Image(100,20);    ses_d.src="../eikones/ses_d.gif";
  ses_l=new Image(100,20);   ses_l.src="../eikones/ses_l.gif";
  abs_l=new Image(100,20);   abs_l.src="../eikones/abs_l.gif";
  abs_d=new Image(100,20);    abs_d.src="../eikones/abs_d.gif";
}

function changimg(imgName,imgObjName)
 {
  if (document.images)
   {
   // look up the preloaded global Image object by name instead of calling eval
   document.images[imgName].src=window[imgObjName].src;
   }
 }
//-->
</SCRIPT>

</HEAD>
<BODY marginwidth="0" marginheight="0" leftmargin="0" topmargin="0" rightmargin="0"  background="../eikones/fonto.jpg">
<TABLE align="center" border="0" width="100%" cellspacing="0" cellpadding="0" >
<TR>
<TD height="50" valign="middle" colspan="7" bgcolor="#003163"><font face="Arial" size="4" color="#ffffff"><b>LREC 2000</b> 2<sup>nd</sup>
      International Conference on Language Resources &amp; Evaluation</font></TD>
</TR>
 <tr bgcolor="#003162">
 <td width="100" valign="middle"><A href="../../default.htm" onmouseout="changimg('home','hom_d')" onmouseover="changimg('home','hom_l')"><IMG border="0" height="20" name="home" src="../eikones/hom_d.gif" width="100"></A></td>
 <TD width="100"><A href="../session.htm" onmouseout="changimg('sessions','ses_d')" onmouseover="changimg('sessions','ses_l')"><IMG border="0" height="20" name="sessions" src="../eikones/ses_d.gif" width="100"></A></TD>
 <TD width="100"><A href="../paper.htm" onmouseout="changimg('papers','pap_d')" onmouseover="changimg('papers','pap_l')"><IMG border="0" height="20" name="papers" src="../eikones/pap_d.gif" width="100"></a></TD>
 <TD width="100"><A href="../abstract.htm" onmouseout="changimg('abstracts','abs_d')" onmouseover="changimg('abstracts','abs_l')"><IMG border="0" height="20"  name="abstracts" src="../eikones/abs_d.gif" width="100"></A></TD>
 <TD width="100"><A href="../author.htm" onmouseout="changimg('authors','aut_d')" onmouseover="changimg('authors','aut_l')"><IMG border="0" height="20"  name="authors" src="../eikones/aut_d.gif" width="100"></a></TD>
 <TD width="100"><A href="../keyword.htm" onmouseout="changimg('keywords','Key_d')" onmouseover="changimg('keywords','Key_l')"><IMG border="0" height="20" name="keywords" src="../eikones/Key_d.gif" width="100"></A></TD>
<td width="1000">&nbsp;</td>
 </tr>
 </TABLE>
<BLOCKQUOTE style="MARGIN-RIGHT: 0px">
  <P><A href="234.htm">Previous Paper</A>&nbsp;&nbsp; <A href="236.htm">Next Paper</A></P></BLOCKQUOTE>
  <center>
<TABLE width="95%" Align="center" Border="1" bordercolor="#669999" cellspacing="1">
    <tr>
      <td width="15%" height="40"><b>Title</b></td>
      <td width="85%" height="40"><font color="#990033" size="4">A New Methodology for Speech Corpora Definition from Internet Documents</font></td>
    </tr>
    <tr>
      <td height="40"><b>Authors</b></td>
      <td height="40"><font color="#006600">Vaufreydaz D.</font> (Laboratoire CLIPS-IMAG, équipe GEOD, Université Joseph Fourier, Campus scientifique, B.P. 53, 38041 Grenoble cedex 9, France, Dominique.Vaufreydaz@imag.fr)<br><font color="#006600">Bergamini C.</font> (Laboratoire CLIPS-IMAG, équipe GEOD, Université Joseph Fourier, Campus scientifique, B.P. 53, 38041 Grenoble cedex 9, France, Carole.Bergamini@imag.fr)<br><font color="#006600">Serignat J.F.</font> (Laboratoire CLIPS-IMAG, équipe GEOD, Université Joseph Fourier, Campus scientifique, B.P. 53, 38041 Grenoble cedex 9, France, Jean-Francois.Serignat@imag.fr)<br><font color="#006600">Besacier L.</font> (Laboratoire CLIPS-IMAG, équipe GEOD, Université Joseph Fourier, Campus scientifique, B.P. 53, 38041 Grenoble cedex 9, France, Laurent.Besacier@imag.fr)<br><font color="#006600">Akbar M.</font> (Laboratoire CLIPS-IMAG, équipe GEOD, Université Joseph Fourier, Campus scientifique, B.P. 53, 38041 Grenoble cedex 9, France, Mohammad.Akbar@imag.fr)</td>
    </tr>
    <tr>
      <td height="40"><b>Keywords</b></td>
      <td height="40">&nbsp;</td>
    </tr>
      <tr>
      <td height="40"><b>Session</b></td>
      <td height="40">Session SP2 - Spoken Language Resources Issues from Construction to Validation</td>
    </tr>
     <tr>
      <td height="40"><b>Full Paper</b></td>
            <td height="40"><a href="../../ps/235.ps" target="newps" type="application/postscript">235.ps</a>, <a href="../../pdf/235.pdf" target="newpdf" type="application/pdf">235.pdf</a></td>
    </tr>
      <tr>
      <td height="40"><b>Abstract</b></td>
             <td height="40">This paper describes a new methodology for defining speech corpora from Internet documents, with the aim of recording a large speech database dedicated to the training and testing of acoustic models for speech recognition. The first section presents the Web robot that collects Web pages from the Internet, then explains the mechanism that filters Web text into French sentences. Some information about the corpus organization (90% for training and 10% for testing) is given. The third section presents the phoneme distribution of the corpus and compares it with other studies of the French language. Finally, the tools and planning for recording the speech database with more than one hundred speakers are described.</td>
    </tr>
  </table><br>
  </center>
</BODY>
</html>