
Need to replace non-alpha characters

Last post 01-18-2010, 9:22 PM by Aussie Susan. 3 replies.
  •  01-18-2010, 6:24 AM 58534

    Need to replace non-alpha characters

    Hi

     

    I am a software engineer working in VB.NET. Regex is its own language, and I am willing to learn it.
    The first information I found was at: http://www.regular-expressions.info/dotnet.html
    But I am missing an overview of the commands. Perhaps you can help me with this first case.

     

    My main question is:
    How can I replace the non-alpha characters in my string with the ASCII number of each such character?


    Example1
    Source:
    Hello world, I am 13 years old
    But quiete not     fat!!

    Result should be:
    Hello world, I am 13 years old [13] But quiete not [9] fat!!

    And back:
    Example2
    Source:
    Hello world, I am 13 years old [13] But quiete not [9] fat!!

    Result should be:
    Hello world, I am 13 years old
    But quiete not     fat!!


    [13] is used in my own textbox to show that a carriage return is used here, and [9] shows a tab.
    In this case, all characters outside ASCII 32 to ASCII 126 (my attempt: 0-9a-zA-Z_) should be replaced with their ASCII number enclosed in [ and ].

    http://www.asciitable.com

     

    If you look at the table linked above, there is a problem because "[" and "]" would have to be replaced too. But handling "[" and "]" specially can be a second update.

     

    Hope you can help me to create this Regex.

     

    Regards

     


    www.goldengel.ch
  •  01-18-2010, 6:01 PM 58564 in reply to 58534

    Re: Need to replace non-alpha characters

    You are correct in that "Regex is its own language" but it is a language that works at the character level and makes no interpretation of the characters itself.

    In your case, you can create a character set definition such as:

    [\r\n\t\f]    (for a specific set of characters)

    or 

    [^ -~]       (for every character outside the printable ASCII range - also '\p{C}' for control and other invisible characters etc)

    that will identify each specified character of interest. However, that is all it is designed to do: it cannot (of itself) convert that into a number or perform any other manipulations on the character (or string of characters). What a regex IS designed to do is to locate (and possibly extract) a sequence of characters that match some pattern and then return them to you so you can interpret them in some way. You can think of a regex engine as a fancy "fuzzy search" machine - it will find things for you but not much more.
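    As a minimal sketch of the "locate and return" role described above, here is the second character set applied to part of the sample text. It is written in Python rather than VB.NET purely for illustration, and the names NON_PRINTING and find_non_printing are made up for this example.

```python
import re

# Any single character outside the printable ASCII range (space through tilde).
NON_PRINTING = re.compile(r"[^ -~]")

def find_non_printing(text):
    """Return (position, character code) for every match, in order."""
    return [(m.start(), ord(m.group())) for m in NON_PRINTING.finditer(text)]

sample = "old\rBut quiete not\tfat!!"
print(find_non_printing(sample))  # [(3, 13), (18, 9)]
```

    The regex only reports where the characters are; turning each one into a number (ord here) is ordinary code outside the pattern.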

    In VB.NET there are 2 approaches you can take. The first is to realise that the collection of matches returned by the regex engine also includes the position within the source string where each match was found. You can therefore use this to perform your own string manipulation: convert the character to its ASCII code and then replace the single character with the correctly formatted character sequence that you create. (If you do this, I would recommend that you work backwards from the last match to the first, because the location information returned by the regex is relative to the original source string. By working backwards, your substitutions do not affect the locations of the preceding matches.)
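    The first approach might look roughly like this. Again this is a Python sketch and not VB.NET, and encode_backwards is a hypothetical name. It collects the matches first and then substitutes from the last one to the first, so the earlier positions stay valid.

```python
import re

NON_PRINTING = re.compile(r"[^ -~]")

def encode_backwards(text):
    """Replace each non-printing character with its code in brackets."""
    matches = list(NON_PRINTING.finditer(text))
    # Walk from the last match to the first: replacing one character with
    # several only shifts text to the right of it, so the positions of the
    # matches still to be processed are unaffected.
    for m in reversed(matches):
        token = "[%d]" % ord(m.group())
        text = text[:m.start()] + token + text[m.end():]
    return text

print(encode_backwards("old\rBut quiete not\tfat!!"))
# old[13]But quiete not[9]fat!!
```

    Note that this inserts no spaces around the bracketed numbers, unlike the hand-written example in the question.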

    The second way uses the .NET regex engine's delegate functions, which are called as each match is made. However, if you are just starting out with regexes, I would recommend that you understand the basics first (starting with what problems regexes are intended to solve) and then (much later) start looking at the fancier capabilities of whatever regex variant you are working with.
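    For later reference, a rough sketch of the callback idea, shown with Python's re.sub, which accepts a function as the replacement; in .NET the corresponding feature is the Regex.Replace overload that takes a MatchEvaluator delegate. The names encode and decode are made up, and treating "[" and "]" as characters to encode is one assumed way of handling the bracket problem raised in the question, not the only one.

```python
import re

# Non-printing characters, plus the two brackets so that a literal "[13]"
# in the source cannot be mistaken for an encoded carriage return.
TO_ENCODE = re.compile(r"[^ -~]|[\[\]]")
ENCODED = re.compile(r"\[([0-9]+)\]")

def encode(text):
    # The function is called once per match and returns the replacement.
    return TO_ENCODE.sub(lambda m: "[%d]" % ord(m.group()), text)

def decode(text):
    # Turn each bracketed number back into the character with that code.
    return ENCODED.sub(lambda m: chr(int(m.group(1))), text)

original = "old\rBut\tfat [x]"
packed = encode(original)
print(packed)                      # old[13]But[9]fat [91]x[93]
print(decode(packed) == original)  # True
```

    Because replaced text is not rescanned, decoding "[91]" back to "[" does not trigger another substitution, which is what makes the round trip work.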

    Also, please remember that not all regex variants are the same. Delegate functions are available in the .NET regex engine; PCRE provides a callback facility (using a different mechanism and pattern syntax); Perl allows you to insert code that is executed within the pattern; and so on. Each regex variant and access language provides you with a different set of capabilities, often with differing extensions to the basic regex pattern syntax. That is why I suggest you start with the basics and only later get into what else can be done.

    I wish I could remember where I first read this (and attribute it appropriately)  but there is a saying along the lines of: "I had a problem that I thought I could solve with a regex: now I have 2 problems!".

    Susan

  •  01-18-2010, 8:11 PM 58569 in reply to 58564

    Re: Need to replace non-alpha characters

    Ha ha ha. Now I have two problems. This pleases me.

     

    Thanks a lot, Susan, for taking the time to write down your knowledge for me. My problem is solved because I can simply use VB's IndexOf function if I cannot use Regex.

    But I am still interested in how the syntax works, and I really wonder what people use to learn it. I mean, with some colors I would understand it immediately. I have seen some editors that use colors for the Regex syntax. But I would be pleased to have a list explaining every keyword and piece of syntax. I think it would be easier to understand if the characters were not all black on white.

     

    Thanks again

    Nice day

    Timo


    www.goldengel.ch
  •  01-18-2010, 9:22 PM 58572 in reply to 58569

    Re: Need to replace non-alpha characters

    I would suggest that you do 3 things:

    1) do a Google search for a suitable phrase such as "regex syntax tutorial" - this returns over a million hits but the first few I saw were fairly good sites - and select one that suits you. They generally go over the basic syntax with examples

    2) find a real (but initially simple) problem that you can use as the basis for your learning

    3) get access to a regex tester (either download a free one or locate one on the web). Before you do this you will need to think about the platform you want to use because, as I said, not all regex variants are the same. Under Windows the .NET regex engine is readily available, as are free C# and VB development environments from Microsoft, but there are many others as well. If you are likely to work with web-based systems, then PHP and similar languages provide access to a regex engine (sometimes several) and you may need to identify the regex engine that lurks beneath the surface (PCRE is often used by PHP, for example)

    Once you have these, you can start by taking small steps and build up to a working pattern through trial and error.

    Susan
