Hello,
I am quite new to regular expressions. I have been struggling with a regexp for most of tonight, and although it has been a slow process I had been making progress, up until 3 hours ago. First, let me introduce you to what I'm trying to do. I am sorry if my explanation is bad, but I find it quite hard to explain exactly what it is that I want.
My input string (the one I want to search) will be one or several occurrences of the following (the double quotes are not part of the string; they are there just to show you the string better):
"('[string of a-z, 0-9, score, space and slash]', '[string of a-z, 0-9, score, space and slash]', '[string of a-z, 0-9, score, space and slash]', '[string of 4 numbers]',
'[string of a-z, 0-9, score and slash]', '[string of 32 letters and number]', '[constant string]'), "
So to clarify the input could look like this:
('1ip5nf75165x2 y76gbls9vg2b 1eykrv0sfkj01', '1eykrv0sfkj01 1ip5nf75165x2 y76gbls9vg2b', 'y76gbls9vg2b 1eykrv0sfkj01 1ip5nf75165x2', '1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '666'), ('vxou799vo60q je2pis76nfja moo71t4xvtzq', 'moo71t4xvtzq vxou799vo60q je2pis76nfja', 'je2pis76nfja moo71t4xvtzq vxou799vo60q', '1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '666'), ('xphd4va0yn3k 1dpm8d4drfkcj 5vsufgm9pk98', '5vsufgm9pk98 xphd4va0yn3k 1dpm8d4drfkcj', '1dpm8d4drfkcj 5vsufgm9pk98 xphd4va0yn3k', '1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '666'), ('19vqw8y57g7lw 16uhqeqgqlweu 1fmg8hah2d0rf', '1fmg8hah2d0rf 19vqw8y57g7lw 16uhqeqgqlweu', '16uhqeqgqlweu 1fmg8hah2d0rf 19vqw8y57g7lw', '1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '666'),
I want to make sure that [constant string] is always 666. If it is any numerical string other than 666 (or even empty), I want my regexp to catch that occurrence.
Here is the regexp I have come up with (case (lowercase/uppercase) doesn't matter):
/\((\'[\w -\/]*\', ){3}\'\d{4}\', (\'[\w -\/]*\', ){1}(\'\w{32}\', ){1}\'[^6]*[^6]*[^6]*\'\), /i
I want this regexp to match all occurrences that have their [constant string] set to anything other than 666. However, it doesn't work. It seems to work a lot better if I change [^6]*[^6]*[^6] to some non-identical numbers, but I need the regexp to work regardless of the number I have set.
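To illustrate what the last part of that regexp seems to be doing, here is a minimal sketch that isolates only the final quoted field. It is written in Python purely for illustration (character classes should behave the same way in preg_match()), and the test values are made up. As far as I can tell, [^6]* means "any run of characters that are not 6", so repeating it three times does not mean "not the string 666":

```python
import re

# Only the final field of the original pattern: three copies of [^6]*
# between the quotes. Each copy matches any run of characters that are not '6'.
last_field = re.compile(r"'[^6]*[^6]*[^6]*'")

# Made-up test values for the [constant string] field.
for value in ["'666'", "'457'", "'656'", "'166'", "''"]:
    print(value, bool(last_field.fullmatch(value)))
# '666' False
# '457' True
# '656' False   <- differs from 666 but is not caught
# '166' False   <- differs from 666 but is not caught
# ''    True
```

So any value that merely contains a 6 is rejected, which is probably why the pattern appears to behave better with non-identical digits. Note also that [^6] matches quotes and commas, so inside the full regexp this part may run across field boundaries.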
So finally, to clarify, if the input is:
('1ip5nf75165x2 y76gbls9vg2b 1eykrv0sfkj01', '1eykrv0sfkj01
1ip5nf75165x2 y76gbls9vg2b', 'y76gbls9vg2b 1eykrv0sfkj01
1ip5nf75165x2', '1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
'457'), ('vxou799vo60q je2pis76nfja moo71t4xvtzq', 'moo71t4xvtzq
vxou799vo60q je2pis76nfja', 'je2pis76nfja moo71t4xvtzq vxou799vo60q',
'1986', 'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '457'),
('xphd4va0yn3k 1dpm8d4drfkcj 5vsufgm9pk98', '5vsufgm9pk98 xphd4va0yn3k
1dpm8d4drfkcj', '1dpm8d4drfkcj 5vsufgm9pk98 xphd4va0yn3k', '1986',
'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '457'), ('19vqw8y57g7lw
16uhqeqgqlweu 1fmg8hah2d0rf', '1fmg8hah2d0rf 19vqw8y57g7lw
16uhqeqgqlweu', '16uhqeqgqlweu 1fmg8hah2d0rf 19vqw8y57g7lw', '1986',
'Pop/rock', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', '457'),
and I have set the constant string to 457, my regexp should find nothing. But if any of the 457's should happen to be anything other than 457, I need to find that occurrence. I guess what I'm trying to do is find a way to match anything other than some word that I specify. As I understand it, [^x] matches all characters other than x, but I need a way to match all strings other than some string that I specify.
Maybe it is of some value if I mention that I am trying to use this regexp with PHP, and more specifically PHP's preg_match().
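One construct that expresses "any string other than a given word" is a negative lookahead, (?!...). The following is only a sketch under assumptions: it is in Python for illustration (PCRE, which preg_match() uses, supports the same lookahead syntax), it looks only at the last quoted field before the closing parenthesis, and CONSTANT and the shortened sample records are hypothetical:

```python
import re

CONSTANT = "457"  # hypothetical: the value every record is expected to carry

# (?!457') is a negative lookahead. It succeeds only when the text right
# after the opening quote is NOT "457" followed by the closing quote.
# [^']* then consumes whatever the field actually contains.
not_constant = re.compile(r"'(?!" + re.escape(CONSTANT) + r"')[^']*'\)")

# Shortened, made-up record tails in the same shape as the input above.
print(bool(not_constant.search("'Pop/rock', 'aaaa', '457'), ")))  # False
print(bool(not_constant.search("'Pop/rock', 'aaaa', '458'), ")))  # True
print(bool(not_constant.search("'Pop/rock', 'aaaa', ''), ")))     # True
```

The lookahead consumes no characters, so it can be placed right after the opening quote of the last field in a longer pattern without changing what the rest of the pattern matches.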
I hope someone can help me with this. Thank you very much in advance!
p.s.
If you have any additional suggestions that could increase the performance of my regexp, please tell me about them. In the application where the regexp will be used, it will be critical that it runs as fast as possible. The input string will be somewhere around 1-5 megabytes, so it will have a lot of data to process, and I guess even small optimizations can make a big difference to the final execution time.
/Robert
edit:
I discovered that replacing spaces " " with \s gave me a performance boost.
edit2:
I am starting to realize my regexp seems to be flawed in other ways. It seems that the part before the last bit (the bit that should check for all ints other than xyz) isn't working very well.
So now that the whole regexp is broken, here is a summary of what the whole regexp is supposed to achieve (I have decided to simplify it):
Determine if "('[random text]', '[random text]', '[random text]', '[random text]', '[random text]', '[32 chars and digits]', '[a number other than xyz]'), " exists in the input string. The input string can contain any number of occurrences of the form "('[random text]', '[random text]', '[random text]', '[random text]', '[random text]', '[32 chars and digits]', '[a number]'), ".
edit3:
I've been trying to make progress and after switching to the eregi function I wrote this regexp:
(\(('[^']*',){5}'[0-9a-z]{32}','666'\),)
It will attempt to find a match that has the last ,'number' set to 666, and it works. For instance, if the input looks like this (there are three occurrences of the pattern):
('jwoiqpeqy4dt 16krx4eqk2ff2 so3p2vts8b2r','so3p2vts8b2r jwoiqpeqy4dt 16krx4eqk2ff2','16krx4eqk2ff2 so3p2vts8b2r jwoiqpeqy4dt','1986','Pop/rock','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','656'),('jwoiqpeqy4dt 16krx4eqk2ff2 so3p2vts8b2r','so3p2vts8b2r jwoiqpeqy4dt 16krx4eqk2ff2','16krx4eqk2ff2 so3p2vts8b2r jwoiqpeqy4dt','1986','Pop/rock','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','666'),('jwoiqpeqy4dt 16krx4eqk2ff2 so3p2vts8b2r','so3p2vts8b2r jwoiqpeqy4dt 16krx4eqk2ff2','16krx4eqk2ff2 so3p2vts8b2r jwoiqpeqy4dt','1986','Pop/rock','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','656'),
then the middle occurrence will be matched.
However, this isn't really what I want. Instead, I would like it to match against the first row that doesn't have 666 as its last value. It feels like I'm so close to the solution...
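One way to turn the working pattern around, without negating anything inside the regexp, might be to capture the last field and compare it in ordinary code. This is a rough sketch in Python for illustration only. The names EXPECTED, first_mismatch and make_row are hypothetical, and make_row only builds shortened records in the comma-separated format shown above:

```python
import re

EXPECTED = "666"  # hypothetical: the constant every record should carry

# Same shape as the eregi pattern above, but the last field is captured
# instead of hard-coded, so the comparison happens outside the regexp.
record = re.compile(
    r"\((?:'[^']*',){5}'[0-9a-z]{32}','([^']*)'\),", re.IGNORECASE
)

def first_mismatch(text, expected=EXPECTED):
    """Return the first record whose last field differs from expected, else None."""
    for m in record.finditer(text):
        if m.group(1) != expected:
            return m.group(0)
    return None

def make_row(last):
    # Hypothetical helper: builds one record in the format shown above.
    fields = ["a b c", "b c a", "c a b", "1986", "Pop/rock", "a" * 32, last]
    return "(" + ",".join("'" + f + "'" for f in fields) + "),"

data = make_row("666") + make_row("656") + make_row("666")
print(first_mismatch(data))                  # the middle record, ending in '656'),
print(first_mismatch(make_row("666") * 3))   # None
```

The scan stops at the first record that differs, which may help on large inputs. One caveat: text that does not fit the record shape at all is skipped silently rather than reported.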