
New Phonetic Design

To: openldap-devel@OpenLDAP.org
Subject: New Phonetic Design
From: Alexandre PAUZIES <apauzies@linagora.com>
Date: Wed, 22 Sep 2004 12:02:49 +0200
Organization: LINAGORA - http://www.linagora.com
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

Hi,

As you may know, OpenLDAP uses phonetic functions for approximate
matching. I've written a new phonetic mechanism for OpenLDAP 2.2.17. For
now it comes with a set of French rules, but new sets of rules can
easily be added.


Why do we need a new design?
############################

- For each new phonetic algorithm/language we need to implement a new
  function, add a #define, etc.
- Phonetic functions are not easy to understand or implement.
- The use of strcmp for matching does not allow a flexible match (see
  SLAPD_PHONETIC_V2_PRECISION).
- The use of #define does not allow switching algorithm/language at
  run time (so it cannot be used with language codes).


What does this design look like?
################################

- A new language can easily be added with a new entry (lang, rules,
  post-rules) in the phonetic lang table.
- A new algorithm can easily be added by writing a simple table of
  rules.
- Each rule is an action (find/replace...) with a set of conditions (is
  preceded by...), which are easy to implement.
- Each set of post-phonetic rules is a simple table of ordered
  characters.
- The precision of this phonetic mechanism can easily be changed.
- The default phonetic language can be changed from the config file.



How does it work?
#################

1) The phonetic rules
---------------------

- You need to write the phonetic rules for your own language/algorithm.

Here I define rules for the French language and the phonex algorithm (by
Frederic BROUARD):


static rule_t   phonetic_rules_fr_phonex[] =
  {
  }

A rule is defined by an action (e.g. FIND_REPLACE) with its arguments
(e.g. "h" -> "") and by a set of conditions (e.g. NOT PRECEDED BY 'c' OR
's' OR 'p').


This example:

    { {FIND_REPLACE, {"h", ""}}, {{PRECEDED, "csp", NOT|OR}} },

will delete every character 'h' that is not preceded by the character
'c', 's' or 'p'.
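
To make the effect of that rule concrete, here is a minimal standalone
sketch of the same behaviour. It is not the code from the patch:
delete_unless_preceded() is a hypothetical helper that hard-wires one
action (delete) and one condition (NOT PRECEDED BY), whereas the patch
drives this from the rule table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: delete every `target`
 * character from `word` (in place) unless the character just before it
 * is one of `keep_after`. */
static void delete_unless_preceded(char *word, char target,
                                   const char *keep_after)
{
    char *in;
    char *out = word;

    assert(word != NULL && keep_after != NULL && target != '\0');

    for (in = word; *in != '\0'; in++) {
        /* Look at the previous character that was actually kept. */
        int prev_protects =
            out > word && strchr(keep_after, out[-1]) != NULL;

        if (*in == target && !prev_protects)
            continue;           /* drop this character */
        *out++ = *in;           /* keep it */
    }
    *out = '\0';
}
```

Called as delete_unless_preceded(buf, 'h', "csp"), it should leave the
'h' of "chat" alone and drop the one in "thomas". The buffer must be
writable, since the word is edited in place.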


You can write rules with more than one condition (all of them must
hold), like this:

    {   {FIND_REPLACE,  {"s",   "z"}},  {{FOLLOWED,   "aeiou1234",  OR},
                                         {PRECEDED, "aeiou1234", OR}} },
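
As an illustration of how two conditions combine, the sketch below
replaces a character only when it is both preceded and followed by a
member of a given set. Again this is a simplified stand-in:
replace_between() is a hypothetical name and handles single characters
only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: replace `from` by `to`
 * wherever it is both preceded AND followed by a character of `set`. */
static void replace_between(char *word, char from, char to,
                            const char *set)
{
    size_t i;
    size_t len;

    assert(word != NULL && set != NULL && from != '\0' && to != '\0');

    len = strlen(word);
    /* The first and last characters can never satisfy both conditions. */
    for (i = 1; i + 1 < len; i++) {
        int preceded = strchr(set, word[i - 1]) != NULL;
        int followed = strchr(set, word[i + 1]) != NULL;

        if (word[i] == from && preceded && followed)
            word[i] = to;
    }
}
```

With replace_between(buf, 's', 'z', "aeiou1234"), the 's' of "rose" is
rewritten, while the ones in "sel" and "test" are left as they are.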


Another example: I want to delete the character 't' if it ends the word:

    { {FIND_REPLACE, {"t", ""}}, {{FOLLOWED, ALL, AND|NOT}} },

etc...

You can find more examples by looking in "phonetic.h".


2) The post-phonetic rules
--------------------------

At this point you have a phonetic function that returns a phonetic copy
of the word (like the old function did), but you can't select how
permissive/flexible your match will be. That is what the post_phonetic
function is for.


You need to define post-phonetic rules by assigning an integer (the
position of the char in the "char tab[]") to each character that is not
replaced/deleted by your phonetic algorithm.


Those rules will be used to convert the phonetic copy of your word to a
string representing a float value.


Example:

static char     phonetic_post_rules_fr_phonex[23] = /* 22 chars + '\0' */
  {
    '1', '2', '3', '4', '5', 'e', 'f', 'g', 'h', 'i', 'k',
    'l', 'n', 'o', 'r', 's', 't', 'u', 'w', 'x', 'y', 'z'
  };

will assign number 0 to char '1' ... and number 21 to char 'z'.

In this example, those numbers are treated as digits in base 22: the
digit for the character at position i (counting from 0) is multiplied
by 22^-(i+1), and the sum of all those terms becomes a float. This float
is then stored in a string.
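
The conversion can be sketched in a few lines of standalone C. This is
a re-implementation of the idea under some assumptions, not the patch's
code: the names phonetic_fraction() and phonetic_key() are hypothetical,
the table is passed as a NUL-terminated string, and a character that is
missing from the table is counted as digit 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Each character's index in `alphabet` is one digit of a base-N
 * fraction, N = strlen(alphabet): 0.d0 d1 d2 ... */
static double phonetic_fraction(const char *word, const char *alphabet)
{
    double base;
    double weight;
    double sum = 0.0;
    const char *p;

    assert(word != NULL && alphabet != NULL && alphabet[0] != '\0');

    base = (double)strlen(alphabet);
    weight = 1.0 / base;            /* weight of the first digit */
    for (p = word; *p != '\0'; p++) {
        const char *hit = strchr(alphabet, *p);

        /* Assumption: characters missing from the table count as 0. */
        if (hit != NULL)
            sum += (double)(hit - alphabet) * weight;
        weight /= base;             /* next digit is worth N times less */
    }
    return sum;
}

/* Store the fraction as a string with 20 digits after the point. */
static void phonetic_key(const char *word, const char *alphabet,
                         char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%.20f", phonetic_fraction(word, alphabet));
}
```

With the 22-character French table written as
"12345efghiklnorstuwxyz", a word made only of '1' maps to 0 and the
single letter "z" maps to 21/22. Words that share a prefix end up with
keys that share leading digits, which is what the comparison step
relies on.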


So, to set the precision/flexibility of this new phonetic mechanism, you
need to set SLAPD_PHONETIC_V2_PRECISION (in schema_init.c; the patch
sets it to 7) to the number of significant figures to compare in your
float (string) value.


The match is then done with
strncmp(word, post_phonetic_word, SLAPD_PHONETIC_V2_PRECISION).
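
A small sketch of that comparison, with a hypothetical keys_match()
wrapper so the effect of the precision is easy to try out:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: two post-phonetic keys match when their first
 * `precision` characters are identical. */
static int keys_match(const char *a, const char *b, size_t precision)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, precision) == 0;
}
```

Note that the count covers the whole string, so with keys of the form
"0.xxxxx..." a precision of 7 compares the leading "0." plus five
digits. A larger value makes the match stricter, a smaller one more
permissive.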



3) The Phonetic language table
------------------------------

Once you've defined  your phonetic and post-phonetic rules,  you need to
add them for your language to phonetic_lang[] :


static phonetic_t       phonetic_lang[] =
  {
    {"fr", phonetic_rules_fr_phonex, phonetic_post_rules_fr_phonex},
    {NULL, NULL, NULL},
  };
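
For illustration, a lookup over such a NULL-terminated table could look
like the sketch below. lang_entry and find_lang() are simplified,
hypothetical stand-ins for the patch's phonetic_t and its lookup loop;
the sketch also returns NULL when no language is given, which is an
addition on my side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for phonetic_t: only the fields needed here. */
typedef struct {
    const char *lang;
    const char *post_rules;
} lang_entry;

static const lang_entry lang_table[] = {
    { "fr", "12345efghiklnorstuwxyz" },
    { NULL, NULL }                  /* sentinel ends the table */
};

/* Return the entry for `lang`, or NULL if there is none. */
static const lang_entry *find_lang(const lang_entry *table,
                                   const char *lang)
{
    const lang_entry *e;

    assert(table != NULL);
    if (lang == NULL)
        return NULL;                /* no language configured */
    for (e = table; e->lang != NULL; e++)
        if (strcmp(e->lang, lang) == 0)
            return e;
    return NULL;
}
```

Adding a language is then just one more row before the sentinel.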


4) Slapd.conf
-------------

Set the default "lang" option in your slapd.conf like this:

lang fr

so the phonetic function knows which rules to use for your language.


5) Enable the new phonetic mechanism
------------------------------------

Finally, pass the "--enable-phonetic2" option to your configure script.


To do:
######

Maybe more actions/conditions should be added to this new mechanism to
suit all languages.

The LDAP_UTF8_APPROX flag passed to UTF8bvnormalize could be a problem
(i.e. I can't apply actions or check conditions on accented characters).

The "lang" option in the config file should be the default language and
not the only one: for attributes with language codes we should select
the corresponding phonetic rules if there are any, or else the default
ones (defined in the config file).
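
One possible shape for that selection, purely as a sketch of the idea
and not something the patch implements: choose_lang() is a hypothetical
function, and how the language code is extracted from the attribute is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: pick the language whose rules should be used for one
 * attribute. `known` is a NULL-terminated list of the languages that
 * have phonetic rules; `attr_lang` may be NULL when the attribute
 * carries no language code. */
static const char *choose_lang(const char *const *known,
                               const char *attr_lang,
                               const char *default_lang)
{
    const char *const *k;

    assert(known != NULL);
    if (attr_lang != NULL)
        for (k = known; *k != NULL; k++)
            if (strcmp(*k, attr_lang) == 0)
                return *k;          /* rules exist for this language */
    return default_lang;            /* fall back to the config value */
}
```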



Any comments will be appreciated.

Best regards,

Alexandre.

--- openldap-2.2.17/configure.in	2004-07-26 20:15:05.000000000 +0200
+++ openldap-2.2.17-phonetic2/configure.in	2004-09-20 16:17:56.805716056 +0200
@@ -194,6 +194,7 @@
 OL_ARG_ENABLE(slapi,[    --enable-slapi        enable SLAPI support (experimental)], no)dnl
 OL_ARG_ENABLE(slp,[    --enable-slp          enable SLPv2 support], no)dnl     
 OL_ARG_ENABLE(wrappers,[    --enable-wrappers	  enable tcp wrapper support], no)dnl
+OL_ARG_ENABLE(phonetic2,[    --enable-phonetic2	  enable new phonetic system for approx], no)dnl
 
 dnl ----------------------------------------------------------------
 dnl SLAPD Backend Options
@@ -1990,6 +1991,30 @@
 dnl fi
 
 dnl ----------------------------------------------------------------
+dnl PHONETIC2
+ol_link_math=no
+if test $ol_enable_phonetic2 != no ; then
+	AC_CHECK_HEADERS(math.h)
+	if test $ac_cv_header_math_h != yes ; then
+		AC_MSG_ERROR([could not locate <math.h>])
+	fi
+
+	AC_CHECK_LIB(m,powf,[have_m=yes],[have_m=no])
+	if test $have_m = yes ; then
+		ol_link_math="yes"
+	fi
+
+	if test $ol_link_math != no ; then
+		ac_save_LIBS="$LIBS"
+		LIBS="$LIBS -lm"
+
+	elif test $ol_enable_phonetic2 != auto ; then
+		AC_MSG_ERROR([could not locate Math library])
+	fi
+fi
+
+
+dnl ----------------------------------------------------------------
 dnl SQL
 ol_link_sql=no
 if test $ol_enable_sql != no ; then
@@ -2401,6 +2426,9 @@
 if test "$ol_enable_aci" != no ; then
 	AC_DEFINE(SLAPD_ACI_ENABLED,1,[define to support per-object ACIs])
 fi
+if test "$ol_enable_phonetic2" != no ; then
+	AC_DEFINE(SLAPD_PHONETIC_V2,1,[define to support new phonetic system])
+fi
 
 if test "$ol_link_modules" != no ; then
 	AC_DEFINE(SLAPD_MODULES,1,[define to support modules])
--- openldap-2.2.17/include/portable.h.in	2004-07-16 20:35:03.000000000 +0200
+++ openldap-2.2.17-phonetic2/include/portable.h.in	2004-09-20 16:25:52.775357672 +0200
@@ -977,6 +977,9 @@
 /* define to support SHELL backend */
 #undef SLAPD_SHELL
 
+/* define to support new Phonetic system */
+#undef SLAPD_PHONETIC_V2
+
 /* define to support SQL backend */
 #undef SLAPD_SQL
 
--- openldap-2.2.17/servers/slapd/schema_init.c	2004-08-30 18:18:31.000000000 +0200
+++ openldap-2.2.17-phonetic2/servers/slapd/schema_init.c	2004-09-20 16:39:35.628265016 +0200
@@ -61,6 +61,7 @@
 #define IA5StringApproxIndexer			approxIndexer
 #define IA5StringApproxFilter			approxFilter
 
+
 static int
 inValidate(
 	Syntax *syntax,
@@ -1400,6 +1401,11 @@
 #	define SLAPD_APPROX_WORDLEN 1
 #endif
 
+#if defined(SLAPD_PHONETIC_V2)
+#	define SLAPD_PHONETIC_V2_PRECISION 7
+#endif
+
+
 static int
 approxMatch(
 	int *matchp,
@@ -1412,6 +1418,8 @@
 	struct berval *nval, *assertv;
 	char *val, **values, **words, *c;
 	int i, count, len, nextchunk=0, nextavail=0;
+	char *tmp;
+
 
 	/* Yes, this is necessary */
 	nval = UTF8bvnormalize( value, NULL, LDAP_UTF8_APPROX, NULL );
@@ -1442,7 +1450,14 @@
 	values = (char **)ch_malloc( count * sizeof(char *) );
 	for ( c = nval->bv_val, i = 0;  i < count; i++, c += strlen(c) + 1 ) {
 		words[i] = c;
+#if defined(SLAPD_PHONETIC_V2)
+		tmp = phonetic_v2(c);
+		values[i] = post_phonetic_v2(tmp);
+		printf("[%s] -> [%s] -> [%s]\n", c, tmp, values[i]);
+		ch_free(tmp);
+#else
 		values[i] = phonetic(c);
+#endif
 	}
 
 	/* Work through the asserted value's words, to see if at least some
@@ -1467,11 +1482,22 @@
 		else {
 			/* Isolate the next word in the asserted value and phonetic it */
 			assertv->bv_val[nextchunk+len] = '\0';
+#if defined(SLAPD_PHONETIC_V2)
+			tmp = phonetic_v2(assertv->bv_val + nextchunk);
+			val = post_phonetic_v2(tmp);
+			printf("[%s] -> [%s] -> [%s...]\n", assertv->bv_val+nextchunk, tmp, val);
+			ch_free(tmp);
+#else
 			val = phonetic( assertv->bv_val + nextchunk );
+#endif
 
 			/* See if this phonetic chunk is in the remaining words of *value */
 			for( i=nextavail; i<count; i++ ){
+#if defined(SLAPD_PHONETIC_V2)
+				if( !strncmp( val, values[i], SLAPD_PHONETIC_V2_PRECISION ) ){
+#else
 				if( !strcmp( val, values[i] ) ){
+#endif
 					nextavail = i+1;
 					break;
 				}
@@ -1521,6 +1547,7 @@
 	void *ctx )
 {
 	char *c;
+	char *tmp;
 	int i,j, len, wordcount, keycount=0;
 	struct berval *newkeys;
 	BerVarray keys=NULL;
@@ -1551,7 +1578,13 @@
 		for( c = val.bv_val, i = 0; i < wordcount; c += len + 1 ) {
 			len = strlen( c );
 			if( len < SLAPD_APPROX_WORDLEN ) continue;
+#if defined (SLAPD_PHONETIC_V2)
+			tmp = phonetic_v2(c);
+			ber_str2bv( post_phonetic_v2( tmp ), 0, 0, &keys[keycount] );
+			ch_free(tmp);
+#else
 			ber_str2bv( phonetic( c ), 0, 0, &keys[keycount] );
+#endif
 			keycount++;
 			i++;
 		}
@@ -1576,6 +1609,7 @@
 	void *ctx )
 {
 	char *c;
+	char *tmp;
 	int i, count, len;
 	struct berval *val;
 	BerVarray keys;
@@ -1607,7 +1641,13 @@
 	for( c = val->bv_val, i = 0; i < count; c += len + 1 ) {
 		len = strlen(c);
 		if( len < SLAPD_APPROX_WORDLEN ) continue;
+#if defined (SLAPD_PHONETIC_V2)
+		tmp = phonetic_v2(c);
+		ber_str2bv( post_phonetic_v2( tmp ), 0, 0, &keys[i] );
+		ch_free(tmp);
+#else
 		ber_str2bv( phonetic( c ), 0, 0, &keys[i] );
+#endif
 		i++;
 	}
 
--- openldap-2.2.17/servers/slapd/phonetic.c	2004-01-01 19:16:34.000000000 +0100
+++ openldap-2.2.17-phonetic2/servers/slapd/phonetic.c	2004-09-20 18:09:57.158067208 +0200
@@ -23,6 +23,13 @@
  * software without specific prior written permission. This software
  * is provided ``as is'' without express or implied warranty.
  */
+/* Portions Copyright (c) 2004 Alexandre PAUZIES <apauzies@linagora.com>.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms are permitted
+ * provided that this notice is preserved and that due credit is given
+ * to Alexandre PAUZIES.
+ */
 
 #include "portable.h"
 
@@ -32,10 +39,13 @@
 #include <ac/string.h>
 #include <ac/socket.h>
 #include <ac/time.h>
+#include <math.h>
 
 #include "slap.h"
+#include "phonetic.h"
+
 
-#if !defined(SLAPD_METAPHONE) && !defined(SLAPD_PHONETIC)
+#if !defined(SLAPD_METAPHONE) && !defined(SLAPD_PHONETIC) && !defined(SLAPD_PHONETIC_V2)
 #define SLAPD_METAPHONE
 #endif
 
@@ -180,6 +190,197 @@
         return( ch_strdup( phoneme ) );
 }
 
+
+#elif defined(SLAPD_PHONETIC_V2)
+/* Portions Copyright (c) 2004 Alexandre PAUZIES <apauzies@linagora.com>.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms are permitted
+ * provided that this notice is preserved and that due credit is given
+ * to Alexandre PAUZIES.
+ */
+
+
+static int	is_followed(char *start, char *pos, condition_t *condition)
+{
+  char		*p;
+
+  if (*(++pos))
+    {
+      if (condition->flag & OR)
+	{
+	  if (strchr(condition->param, *pos) != NULL)
+	    return ((condition->flag & NOT) ? 0 : 1);
+	}
+      else if (condition->flag & AND)
+	{
+	  for (p = condition->param;
+	       *p && *pos && *p == *pos; p++, pos++);
+	  if (!*p)
+	    return ((condition->flag & NOT) ? 0 : 1);
+	}
+    }
+  return ((condition->flag & NOT) ? 1 : 0);
+}
+
+
+static int	is_repeated(char *start, char *pos, condition_t *condition)
+{
+  if ((*(pos+1)) && *pos == (*(pos + 1)))
+    return ((condition->flag & NOT) ? 0 : 1);
+  return ((condition->flag & NOT) ? 1 : 0);
+}
+
+
+static int	is_preceded(char *start, char *pos, condition_t *condition)
+{
+  int		i;
+
+  if (pos > start)
+    {
+      pos--;
+      if (condition->flag & OR)
+	{
+	  if (strchr(condition->param, *pos) != NULL)
+	    return ((condition->flag & NOT) ? 0 : 1);
+	}
+      else if (condition->flag & AND)
+	{
+	  for (i = strlen(condition->param) - 1;
+	       i >= 0 && pos >= start && condition->param[i] == *pos;
+	       i--, pos--);
+	  if (i < 0)
+	    return ((condition->flag & NOT) ? 0 : 1);
+	}
+    }
+  return ((condition->flag & NOT) ? 1 : 0);
+}
+
+
+static int	check_conditions(char *start, char *pos, rule_t *rule)
+{
+  int		i;
+  int		j;
+
+  for (i = 0; rule->conditions[i].name; i++)
+    for (j = 0; checks[j].name; j++)
+      if (checks[j].name == rule->conditions[i].name)
+	switch (rule->conditions[i].name)
+	  {
+	  case FOLLOWED:
+	    if (!checks[j].try(start, pos+strlen(rule->action.params[0])-1,
+			       &rule->conditions[i]))
+	      return 0;
+	    break; /* do not fall through and re-check from pos */
+	  default:
+	    if (!checks[j].try(start, pos, &rule->conditions[i])) return 0;
+	  }
+  return 1;
+}
+
+
+static char	*replace(char *start, char *pos, rule_t *rule)
+{
+  int	str_len;
+  int	look_for_len;
+  int	change_to_len;
+  int	diff_len;
+
+  str_len = strlen(pos);
+  look_for_len = strlen(rule->action.params[0]);
+  if (!look_for_len)
+    look_for_len++;
+  change_to_len = strlen(rule->action.params[1]);
+  diff_len = look_for_len - change_to_len;
+
+  if (diff_len < 0) // Do we really need this ?
+    pos = ch_realloc(pos, (size_t)(str_len - diff_len +1));
+  memmove(pos + change_to_len, pos + look_for_len, str_len - look_for_len + 1);
+  if (change_to_len)
+    memcpy(pos, rule->action.params[1], change_to_len);
+
+  return pos;
+}
+
+
+static void	*find_replace(char *start, char *pos, rule_t *rule)
+{
+  if (!*pos)
+    return NULL;
+  if (*rule->action.params[0])
+    if ((pos = strstr(pos, rule->action.params[0])) == NULL)
+      return NULL;
+
+  if (!check_conditions(start, pos, rule))
+    find_replace(start, ++pos, rule);
+  else if (*(pos = replace(start, pos, rule)))
+    find_replace(start, pos, rule);
+
+  return NULL;
+}
+
+
+char		*phonetic_v2(char *word)
+{
+  int		i;
+  int		j;
+  char		*s;
+  rule_t	*rules;
+
+  for (i = 0; phonetic_lang[i].lang != NULL &&
+	strcmp(phonetic_lang[i].lang, lang); i++);
+  if (phonetic_lang[i].lang == NULL)
+    return NULL; // Error, no phonetic rules found for this lang
+
+  rules = phonetic_lang[i].rules;
+  s = ch_strdup(word);
+
+  for (i = 0; rules[i].action.name; i++)
+    for (j = 0; commands[j].name; j++)
+      if (rules[i].action.name == commands[j].name)
+	commands[j].run(s, s, &rules[i]);
+
+  return s;
+}
+
+
+char		*post_phonetic_v2(char *word)
+{
+  int		*tab;
+  int		i;
+  int		j;
+  double	res;
+  char		*res_str;
+  char		*p;
+  char		*post_rules;
+
+  for (i = 0; phonetic_lang[i].lang != NULL &&
+	strcmp(phonetic_lang[i].lang, lang); i++);
+  if (phonetic_lang[i].lang == NULL)
+    return NULL; // Error, no post phonetic rules found for this lang
+
+  post_rules = phonetic_lang[i].post_rules;
+
+  tab = ch_malloc(sizeof(int) * strlen(word) + 1);
+  for (i = 0, p = word; *p; p++, i++)
+    for (j = 0; post_rules[j]; j++)
+      if (*p == post_rules[j])
+	tab[i] = j;
+
+  for (j = 0; post_rules[j]; j++);
+
+  for (res = 0.0, i = 0; i < strlen(word); i++)
+    res += tab[i] * powf(j, 0 -i -1);
+
+  if (tab)
+    ch_free (tab);
+
+  res_str = ch_malloc(sizeof(char) * 26);
+  sprintf(res_str, "%4.20f", res);
+
+  return res_str;
+}
+
 #elif defined(SLAPD_METAPHONE)
 
 /*
--- openldap-2.2.17/servers/slapd/proto-slap.h	2004-09-12 22:22:39.000000000 +0200
+++ openldap-2.2.17-phonetic2/servers/slapd/proto-slap.h	2004-09-20 14:18:23.046293256 +0200
@@ -901,6 +901,9 @@
  * phonetic.c
  */
 LDAP_SLAPD_F (char *) phonetic LDAP_P(( char *s ));
+LDAP_SLAPD_F (char *) phonetic_v2 LDAP_P(( char *s ));
+LDAP_SLAPD_F (char *) post_phonetic_v2 LDAP_P(( char *s ));
+
 
 /*
  * referral.c
@@ -1259,6 +1262,8 @@
 LDAP_SLAPD_V (unsigned long)		num_ops_initiated_[SLAP_OP_LAST];
 #endif /* SLAPD_MONITOR */
 
+LDAP_SLAPD_V (char *)		lang;
+
 LDAP_SLAPD_V (char *)		slapd_pid_file;
 LDAP_SLAPD_V (char *)		slapd_args_file;
 LDAP_SLAPD_V (time_t)		starttime;
--- openldap-2.2.17/servers/slapd/phonetic.h	1970-01-01 01:00:00.000000000 +0100
+++ openldap-2.2.17-phonetic2/servers/slapd/phonetic.h	2004-09-20 17:53:03.952097928 +0200
@@ -0,0 +1,194 @@
+/* phonetic.h - routines to do phonetic matching */
+/* This work is part of OpenLDAP Software <http://www.openldap.org/>.
+ *
+ * Copyright 1998-2004 The OpenLDAP Foundation.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted only as authorized by the OpenLDAP
+ * Public License.
+ *
+ * A copy of this license is available in the file LICENSE in the
+ * top-level directory of the distribution or, alternatively, at
+ * <http://www.OpenLDAP.org/license.html>.
+ */
+/* Portions Copyright (c) 2004 Alexandre PAUZIES <apauzies@linagora.com>.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms are permitted
+ * provided that this notice is preserved and that due credit is given
+ * to Alexandre PAUZIES.
+ */
+
+#ifndef _SLAP_PHONETIC_H_
+#define _SLAP_PHONETIC_H_
+
+#define NONE 0
+
+#define FOLLOWED 1
+#define REPEATED 2
+#define PRECEDED -1
+
+#define OR 1
+#define AND 2
+#define NOT 4
+#define ALL ""
+
+#define FIND_REPLACE 1
+
+
+
+#define MAX_CONDITIONS 4
+#define MAX_PARAMS 3
+
+
+
+typedef struct	check_s
+{
+  int		name;
+  int		(*try)();
+}		check_t;
+
+typedef struct	command_s
+{
+  int		name;
+  void		*(*run)();
+}		command_t;
+
+typedef struct	condition_s
+{
+  int		name;
+  char		*param;
+  int		flag;
+}		condition_t;
+
+typedef struct	action_s
+{
+  int		name;
+  char		*params[MAX_PARAMS];
+}		action_t;
+
+typedef struct	rule_s
+{
+  action_t	action;
+  condition_t	conditions[MAX_CONDITIONS];
+}		rule_t;
+
+
+typedef struct	phonetic_s
+{
+  char		*lang;
+  rule_t	*rules;
+  char		*post_rules;
+}		phonetic_t;
+
+
+static void	*find_replace(char *start, char *pos, rule_t *rule);
+static char	*replace(char *start, char *pos, rule_t *rule);
+static int	check_conditions(char *start, char *pos, rule_t *rule);
+static int	is_followed(char *start, char *pos, condition_t *condition);
+static int	is_preceded(char *start, char *pos, condition_t *condition);
+static int	is_repeated(char *start, char *pos, condition_t *condition);
+
+
+static command_t	commands[] =
+  {
+    { FIND_REPLACE,	find_replace },
+    { NONE,		NULL },
+  };
+
+static check_t	checks[] =
+  {
+    { PRECEDED, is_preceded },
+    { FOLLOWED, is_followed },
+    { REPEATED, is_repeated },
+    { NONE,	NULL },
+  };
+
+
+
+/* This is the phonex rules, by Frederic BROUARD
+   (http://sqlpro.developpez.com/Soundex/SQL_AZ_soundex.html) */
+
+static rule_t	phonetic_rules_fr_phonex[] =
+  {
+    { {FIND_REPLACE, {"y", "i"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"h", ""}}, {{PRECEDED, "csp", NOT|OR}} },
+    { {FIND_REPLACE, {"ph", "f"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gan", "kan"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gam", "kam"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gain", "kain"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gaim", "kaim"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ain", "yn"}}, {{FOLLOWED, "aeiou", OR}} },
+    { {FIND_REPLACE, {"ein", "yn"}}, {{FOLLOWED, "aeiou", OR}} },
+    { {FIND_REPLACE, {"aim", "yn"}}, {{FOLLOWED, "aeiou", OR}} },
+    { {FIND_REPLACE, {"eim", "yn"}}, {{FOLLOWED, "aeiou", OR}} },
+    { {FIND_REPLACE, {"eau", "o"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"oua", "2"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ein", "4"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ain", "4"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"eim", "4"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"aim", "4"}}, {{NONE, NULL, NONE}} },
+    /* { "é", "y", {{NONE, NULL, NONE}} }, */ // Could not be use
+    /* { "è", "y", {{NONE, NULL, NONE}} }, */ // (APPROX flag to
+    /* { "ê", "y", {{NONE, NULL, NONE}} }, */ // normalize())
+    { {FIND_REPLACE, {"ai", "y"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ei", "y"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"er", "yr"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"et", "yt"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ess", "yss"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"an", "1"}}, {{FOLLOWED, "aeiou1234", OR|NOT}} },
+    { {FIND_REPLACE, {"am", "1"}}, {{FOLLOWED, "aeiou1234", NOT|OR}} },
+    { {FIND_REPLACE, {"en", "1"}}, {{FOLLOWED, "aeiou1234", NOT|OR}} },
+    { {FIND_REPLACE, {"em", "1"}}, {{FOLLOWED, "aeiou1234", NOT|OR}} },
+    { {FIND_REPLACE, {"in", "4"}}, {{FOLLOWED, "aeiou1234", NOT|OR}} },
+    { {FIND_REPLACE, {"s", "z"}}, {{FOLLOWED, "aeiou1234", OR},
+				   {PRECEDED, "aeiou1234", OR}} },
+    { {FIND_REPLACE, {"oe", "e"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"eu", "e"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"au", "o"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"oi", "2"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"oy", "2"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ou", "3"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"sch", "5"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ch", "5"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"sh", "5"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ss", "s"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"sc", "s"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"c", "s"}}, {{FOLLOWED, "ei", OR}} },
+    { {FIND_REPLACE, {"c", "k"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"q", "k"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"qu", "k"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gu", "k"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"ga", "ka"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"go", "ko"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"gy", "ky"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"a", "o"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"d", "t"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"p", "t"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"j", "g"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"b", "f"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"v", "f"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"m", "n"}}, {{NONE, NULL, NONE}} },
+    { {FIND_REPLACE, {"t", ""}}, {{FOLLOWED, ALL, AND|NOT}} },
+    { {FIND_REPLACE, {"x", ""}}, {{FOLLOWED, ALL, AND|NOT}} },
+    { {FIND_REPLACE, {ALL, ""}}, {{REPEATED, NULL, NONE}} },
+    { {NONE, {NULL}}, {{NONE, NULL, NONE}} },
+};
+
+
+static char	phonetic_post_rules_fr_phonex[23] = /* 22 chars + '\0' terminator */
+  {
+    '1', '2', '3', '4', '5', 'e', 'f', 'g', 'h', 'i', 'k',
+    'l', 'n', 'o', 'r', 's', 't', 'u', 'w', 'x', 'y', 'z'
+  };
+
+
+static phonetic_t	phonetic_lang[] =
+  {
+    {"fr", phonetic_rules_fr_phonex, phonetic_post_rules_fr_phonex},
+    {NULL, NULL, NULL},
+  };
+
+
+#endif /* _SLAP_PHONETIC_H_ */
--- openldap-2.2.17/servers/slapd/config.c	2004-09-12 22:22:38.000000000 +0200
+++ openldap-2.2.17-phonetic2/servers/slapd/config.c	2004-09-20 12:01:04.084805784 +0200
@@ -91,6 +91,8 @@
 
 char   *strtok_quote_ptr;
 
+char	*lang = NULL;
+
 int use_reverse_lookup = 0;
 
 #ifdef LDAP_SLAPI
@@ -631,6 +633,24 @@
 		} else if ( strcasecmp( cargv[0], "replica-argsfile" ) == 0 ) {
 			/* ignore */ ;
 
+		/* get default lang for approx */
+		} else if ( strcasecmp( cargv[0], "lang" ) == 0 ) {
+			if ( cargc < 2 ) {
+#ifdef NEW_LOGGING
+				LDAP_LOG( CONFIG, CRIT, 
+					"%s: line %d missing lang name in \"lang <language>\" "
+					"line.\n", fname, lineno, 0 );
+#else
+				Debug( LDAP_DEBUG_ANY,
+	    "%s: line %d: missing lang name in \"lang <language>\" line\n",
+				    fname, lineno, 0 );
+#endif
+
+				return( 1 );
+			}
+
+			lang = ch_strdup( cargv[1] );
+
 		/* default password hash */
 		} else if ( strcasecmp( cargv[0], "password-hash" ) == 0 ) {
 			if ( cargc < 2 ) {

-- 
Alexandre PAUZIES <apauzies@linagora.com>
LINAGORA - http://www.linagora.com/

Follow-Ups:
- Re: New Phonetic Design
  - From: Howard Chu <hyc@symas.com>
- Re: New Phonetic Design
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
- Re: New Phonetic Design
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
