RegexAdvice

A regular expression is a regular pattern for identifying a regular language

I still don't really understand the meaning of 'regular'. What is a regular language? Why is it regular? And how regular is regular? The word 'regular' is too ambiguous to be understood on its own. Does it really mean 'regular'? Is it similar in meaning to 'regular' in 'I'm his regular customer', or does it refer to something else? I have already read the book by Arto Salomaa (Jewels of Formal Language Theory). What I can understand is that a regular expression is .. (read the title), and that a regular language is a language that can be accepted by a DFA. But my instinct keeps telling me that this is not the complete answer (for sure it is not Salomaa's fault, since he's a REGULAR in formal theory!). So I keep searching to complete my record...

 

[ update 12th June 2008 ]

A language is regular if there is a finite-state automaton (FSA) that accepts it (per Wikipedia). One can build such an automaton using a transition diagram or a transition table.
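
As a minimal sketch of the transition-table idea, the Python snippet below describes a small made-up DFA that accepts binary strings containing an even number of 1s. The state names, the `accepts` helper and the example language are all assumptions chosen for illustration, and Python is an arbitrary choice. The regular expression beside it is intended to describe the same language, which is one way of seeing the link between 'accepted by a DFA' and 'matched by a regular expression'.

```python
import re

# Transition table of a hypothetical DFA: (current state, input symbol) -> next state.
# The language: binary strings with an even number of '1' characters.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}

# A regular expression meant to describe the same language.
EVEN_ONES = re.compile(r"(0*10*1)*0*")


def accepts(text):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = START
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            # A symbol outside the alphabet {0, 1}: reject.
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(accepts("0110"))                          # True: two 1s
print(accepts("1"))                             # False: one 1
print(EVEN_ONES.fullmatch("0110") is not None)  # True
```

The table has one row per (state, symbol) pair, so the whole machine is finite by construction; that finiteness is what the definition relies on.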
 

Posted by M.Yunus.S | 0 Comments

A good stripper is a perfect matcher...

This page is about regular expressions, not sexual content!

A good regular expression should be seen as a beautiful stripper. It has a solid body, is very attractive and 'hot', and really doesn't think twice about stripping its body (and others!) to search for a perfect match. It swallows everything and tests anything (have I gone too far?) just to catch what it is supposed to look for: the perfect match (the matching pattern). Unfortunately, this 'stripper' has a weakness. If we make any mistake in the pattern, even a single '.', we won't get anything, because the matching is discrete: the pattern is compared against the text bit by bit. The advantage is that if we just name it (correctly), we get it instantly!

The problem with creating regular expressions (regex) is that they're quite complex to build. Even a seasoned programmer will easily make mistakes. So a tool that can automatically generate regex patterns is much needed. Some have tried natural language, WYSIWYG and so on, but unfortunately regex is not well suited to being described in natural language or by any of these approaches, because they will (surely) make the notation lose its expressiveness...

 

So what should we do?

For now, I'm still asking myself (and others) this question. ...

 

Posted by M.Yunus.S | 0 Comments

Regular expression as a programming language. Is it possible?

 

I began to learn about regular expressions about 3 years ago. At that time I had never heard of them, or even expected to learn about such a thing. I was asked to learn Perl for some bioinformatics work, and I had no regex knowledge at all, only mathematical statements like x = { y | y is a subset of z } or y = {1,2,3}, and some basic knowledge of the Z language. Although these math statements do not closely resemble the regular expression statements used in programming languages, their foundation is still a 'regular expression' (since a regular expression is about stating the regular behavior of the item that we want to specify). We can express y = {1,2,3} as /[1-3]/. Anyway, from that point on I began to learn Perl, and slowly I was introduced to the m// operator, s/// and tr/// (text processing requires heavy use of these operators).
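
To make the correspondence concrete, here is a small sketch in Python (not the Perl used at the time) that checks membership in y = {1,2,3} both as a set and as the character class [1-3]. The second half shows rough Python counterparts of Perl's m//, s/// and tr/// on a made-up DNA string; the string and the variable names are assumptions for illustration only, and the Python calls are analogues rather than exact equivalents of the Perl operators.

```python
import re

# The set y = {1,2,3}, with its elements written as characters.
Y = {"1", "2", "3"}
# The same set expressed as a character class.
IN_Y = re.compile(r"[1-3]")


def in_set(ch):
    """Membership test using the explicit set."""
    return ch in Y


def in_pattern(ch):
    """Membership test using the regular expression."""
    return IN_Y.fullmatch(ch) is not None


# Rough analogues of the three Perl operators, on a hypothetical sequence.
dna = "ATGCGA"
has_start = re.search(r"ATG", dna) is not None               # like m/ATG/
rna = re.sub(r"T", "U", dna)                                 # like s/T/U/g
complement = dna.translate(str.maketrans("ATGC", "TACG"))    # like tr/ATGC/TACG/

print(in_set("2"), in_pattern("2"))  # True True
print(in_set("4"), in_pattern("4"))  # False False
print(rna)                           # AUGCGA
print(complement)                    # TACGCT
```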

What I like about regular expressions is their compactness. Techniques for simplifying code have been explored for a long time. One approach is to use functions (a concept that itself comes from math). Using a function (some use the word subroutine), we manage to reduce code and simplify it by calling a name instead of rewriting the same code. With a similar purpose in mind, we use regex to simplify complex requirements by representing a set of rules within a simple statement, e.g. /[a-z]/. One should realize that we are representing many lines of code within a single statement. Just imagine: using regular expressions as a programming language, a million lines of code could be turned into just several lines of code (or symbols).
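
A small sketch of that compactness, assuming the rule is 'the text consists only of lowercase letters a to z, and at least one of them'. Both function names are made up for this example. The first spells the rule out by hand; the second states it as a single character-class pattern.

```python
import re


def is_lowercase_word_manual(text):
    """Check the rule step by step, one character at a time."""
    if len(text) == 0:
        return False
    for ch in text:
        if ch < "a" or ch > "z":
            return False
    return True


def is_lowercase_word_regex(text):
    """State the same rule as one pattern."""
    return re.fullmatch(r"[a-z]+", text) is not None


print(is_lowercase_word_manual("regex"), is_lowercase_word_regex("regex"))  # True True
print(is_lowercase_word_manual("Regex"), is_lowercase_word_regex("Regex"))  # False False
```

The saving here is only a few lines, but the gap tends to widen as the rule grows (optional parts, repetition, alternatives), which is where the single-statement form pays off.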

I also believe that regular expressions could be a language that is easier to remember and faster to write. This is because in a regular expression we use simple symbols to represent (possibly) complex rules. It is often said that our brains remember things that we see visually more easily than things that are written or touched, and that human brains capture things in graphical form. If that is so, isn't it possible that we can remember a few simple symbols faster than a huge amount of text? Also, since we only need to write symbols, it will not take us long to write the code in regular expressions (except for complex rules).


A regular expression is specified using a finite set of symbols, such as '?' to represent optional existence and '+' to represent repetition, which makes it look rather encrypted. Programming languages were created to bring computer language (machine language) closer to natural language, so that it becomes easier for people to write code for computer programs. Based on this, it seems impossible for encrypted-looking code like a regular expression to be accepted as a high-level programming language. However, that is not a good enough excuse. Even in most programming languages today, code requires some comments to clarify its purpose or explain what it does. People might claim that some high-level languages are already self-explanatory (the code explains its purpose). However, many of us will find that this is not true in all cases. When a section of code becomes complex enough, even the most self-explanatory language requires at least a few comments to describe the code. Some of today's regular expression implementations allow comments to be included in the regular expression itself. So a regular expression is not encrypted at all when it is combined with some extra comments. In the implementation there is no difference in effective code size, since the comments are ignored.
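
As one example of an implementation that allows comments, Python's `re` module has a `re.VERBOSE` flag under which whitespace in the pattern is ignored and `#` starts a comment. The sketch below writes the same made-up pattern (a simple signed decimal number) twice, once terse and once commented; the pattern and the names `TERSE` and `COMMENTED` are assumptions for illustration.

```python
import re

# The terse form: compact, but the reader has to decode the symbols.
TERSE = re.compile(r"[+-]?\d+(\.\d+)?")

# The same pattern with comments; re.VERBOSE ignores the whitespace and comments.
COMMENTED = re.compile(
    r"""
    [+-]?        # optional sign
    \d+          # integer part: one or more digits
    (\.\d+)?     # optional fractional part: a dot followed by digits
    """,
    re.VERBOSE,
)

for text in ["42", "-3.14", "+7", "3.", "abc"]:
    terse_ok = TERSE.fullmatch(text) is not None
    commented_ok = COMMENTED.fullmatch(text) is not None
    print(text, terse_ok, commented_ok)
```

Both forms accept and reject the same strings, so the comments cost nothing in behavior while making the intent readable.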

I'm just writing down the general idea of how a regular expression could possibly be a programming language here. There are still a lot of things that need to be considered, studied and experimented with. But I still hope this idea will come true.

 

[ update 29th May 2008 ]

I've come across a few paragraphs in a book and some articles which seem related (unfortunately some of the reference details are missing):

'... If the level of programming language is low compared to the level of the problem it has to cope with, it is extremely difficult for a programmer to write an effective program. The design of programming language is primarily concerned with the reduction of the gap between these levels. ...'

Too large a gap between the programming language and the target machine results in (1) software unreliability, (2) lower machine efficiency, (3) excessive program size and (4) increased compiler complexity.

=Ichikawa (eds.) (1992) Language Architecture and Programming Environment. World Scientific Publishing.

A notation must be readable. '... with proper choice of vocabulary the notation can be quite readable. ...'

'... A natural, readable notation results from combining non-symbolic operator names with a right-associative infix syntax, and comma and colon rules that suppress many parentheses. ...'

=MacLennan, B.J. - A Simple, Natural Notation For Applicative Languages


'... Notation that is efficient and preferred is more appropriate. ...'

'... English like notation not necessarily improve programming skill. ...'

=Wright, T., Cockburn, A. (2005) Evaluation of Two Textual Programming Notations for Children. 6th Australasian User Interface Conference (AUIC2005), Newcastle. Conferences in Research and Practice in Information Technology, Vol 40.


'... Once the asymptotic notation is defined in terms of sets, it is only natural to manipulate it using set notation . If "the equal sign really means set inclusion", why not use set inclusion? ...'

'... The proposed switch to set theory would be intolerable if it resulted in a net decrease in expressive power. ...'

'... using a notation closer to the natural property. ...'

=Brassard, Gilles. - Crusade For A Better Notation


'... Notation suited as a tool of thought in any topic should permit easy introduction in the context of that topic. ...'

‘... The utility of a language as a tool of thought increases with the range of topics it can treat, but decreases with the amount of vocabulary and the complexity of grammatical rules which the user must keep in mind. Economy of notation is therefore important. ...’

‘... Economy requires that a large number of ideas be expressible in terms of relatively small vocabulary. ...’

‘... The subjects of mathematical analysis and computation can be represented in a variety of ways, and each representation may possess particular advantages. ...’

=Iverson, K. E. (1980) Notation as a tool of thought. Communications of the ACM. Vol 23:8 pg 444-465


'The primary purpose of notation is communication' - Patashnik, O.

 

[ update 12th June 2008 ]

I'm thinking about the advantages of natural (written) language over mathematical notation (and vice versa). (Most) natural language statements can be shortened using mathematical notation, e.g. 'one plus two is equal to three' becomes '1+2=3'. But there are situations where the mathematical notation gets longer than the natural language. For example (unfortunately I can't show the mathematical expression here), when we need to describe the relations between the elements of sets, I find it more convenient to use a simple sentence. I have arrived at the conclusion that this 'advantage' of one over the other exists simply because neither language (natural or mathematical) has a simple notation for describing the semantics of the other. If we need to describe a certain aspect of a language (X) in another language (Y), we must define a notation in Y that describes the semantics expressed in X in a similar fashion, that is, as precisely and understandably (with the same complexity of interpretation) as the original (X). So when we convert into the target language (Y), we get a complexity 'similar' to that of the original (X), but with a new notation.

This idea has basically been applied in going from regex to natural language, except with a different complexity. The complexity increases because a regex tends to describe a pattern of natural text (many characteristics and semantics in a minimal notation), not a direct one-to-one interpretation (I mean the whole expression, not the individual symbol). If we want regex to be able to describe the semantics of a program (that is, to serve as a programming language), there should be many (not all) one-to-one correspondences between regex statements and computer program commands.

Well, that's all. I haven't found any more ideas than this (yet!)
 

 

 

 

Posted by M.Yunus.S | 1 Comments