
RegEx to find misspelled words

Last post 11-05-2009, 5:11 PM by Aussie Susan. 1 replies.
  •  11-05-2009, 6:04 AM 57158

    RegEx to find misspelled words


    I need a RegEx that will find a specific word within a long string. The issue is that this word may be misspelled and I need to find it even so. I would like to accept a certain percentage of wrong-ness when looking for the word. Ex.

    The complete string: Hello, this is my comp/et sting to look at

    The word to search for: complete

    Let’s say that I wish to accept a maximum of two wrong letters in the above string; then the RegEx should match the word complete. However, if I only accept 1 wrong letter, it shouldn’t find it. Ideally the RegEx would also be able to handle whitespace and missing letters. Ex:

    The complete string: Hello, this is my com pee sting to look at

    The word to search for: complete

    This should match the word as well, even though there is a whitespace between ‘m’ and ‘p’ and the letter ‘l’ is missing.

    Is this possible at all with RegEx or should I be looking at an alternative way to solve it?

    Thanks, Tommy

  •  11-05-2009, 5:11 PM 57174 in reply to 57158

    Re: RegEx to find misspelled words

    Basically, this is not possible with a regex, if only because you are asking the regex to count the number of mismatches. (Actually, if you use a callback or delegate, depending on the regex variant you are using, it might be possible to do the counting, but you are still left with the problem of matching the entire word once the allowed number of mistakes has been made, and where would you stop?)

    I would suggest that you think about how you would do this mechanically. For a start, there is the issue of knowing how many characters to match, especially if whitespace can be added between characters. Following your example, the text "com plete" has 9 characters and the test word 'complete' has 8, so what would you do? And what about the same rules applied to the text "com pleted" or "the complete d structure"?
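To make the "how many characters to match" problem concrete, here is a rough Python sketch of the most mechanical approach: slide a window exactly as long as the search word across the text and count the positions that differ. The function names are made up for illustration, and this is only one possible reading of "wrong letters" (substitutions only, no inserted or missing characters).

```python
def window_mismatches(text, word):
    """Yield (start, window, mismatches) for every word-length window of text."""
    n = len(word)
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # Count positions where the window and the word disagree.
        mismatches = sum(1 for a, b in zip(window, word) if a != b)
        yield start, window, mismatches


def find_with_substitutions(text, word, max_wrong):
    """Return (start, window) pairs that differ from word in at most max_wrong positions."""
    return [(s, w) for s, w, m in window_mismatches(text, word) if m <= max_wrong]


first = "Hello, this is my comp/et sting to look at"
second = "Hello, this is my com pee sting to look at"

print(find_with_substitutions(first, "complete", 2))   # one window, starting at "comp/et"
print(find_with_substitutions(first, "complete", 1))   # nothing within 1 wrong letter
print(find_with_substitutions(second, "complete", 2))  # nothing: the extra space shifts the letters
```

With the first example string this behaves as requested (a hit at two wrong letters, none at one), but the second string gets no hit at all, because an inserted space pushes every later letter out of position. A fixed-width comparison cannot cope with added or missing characters.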

    I think even breaking the text into "words" would be a challenge given the whitespace issue, let alone the fuzzy matching (which is a quite different problem to regex pattern matching) you would need to do on each word.
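For the fuzzy matching side, one alternative that avoids splitting the text into words is approximate substring search by edit distance (the dynamic-programming scheme often attributed to Sellers). The sketch below is plain Python with a hypothetical function name; it assumes that "wrong-ness" means Levenshtein edits (a substituted, extra, or missing character each cost 1), which may not be exactly the measure the original question has in mind.

```python
def fuzzy_find(text, word, max_edits):
    """Return (end, edits) pairs where some substring of text ending at index
    `end` (exclusive) is within max_edits Levenshtein edits of word."""
    m = len(word)
    # col[i] = fewest edits between word[:i] and any substring ending at the
    # current text position. Row 0 is always 0: a match may start anywhere.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if word[i - 1] == ch else 1
            new[i] = min(
                col[i - 1] + cost,  # match or substitute
                col[i] + 1,         # extra character in the text
                new[i - 1] + 1,     # character of the word missing from the text
            )
        col = new
        if col[m] <= max_edits:
            hits.append((j + 1, col[m]))
    return hits


first = "Hello, this is my comp/et sting to look at"
second = "Hello, this is my com pee sting to look at"

print(fuzzy_find(first, "complete", 2))   # hits around "comp/et"
print(fuzzy_find(first, "complete", 1))   # no hits
print(fuzzy_find(second, "complete", 3))  # "com pee" needs 3 edits under this measure
```

Under this measure "com pee" is three edits away from "complete" (an extra space, a missing 'l', and a missing 't'), so it only matches once three edits are allowed. The sketch reports end positions only; recovering where each match starts, and deciding which of several overlapping hits to keep, is extra work of exactly the "where would you stop?" kind.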

    Just my 2c

    Susan
