
TVshows regex problem

Last post 01-25-2012, 3:08 AM by misterb. 2 replies.
  •  01-23-2012, 9:04 PM 84549

    TVshows regex problem

    Hi guys,

    I need a regex that will help me rewrite the old posts on my TV shows blog. Until now I have been building titles with PHP code that constructs "show name"."Season number"."x"."Ep number"." - "."episode title"

    For example: No Ordinary Family 1x01 – Pilot

    To make the titles more usable, I need to rewrite them as "show name" saison "Season number" épisode "Ep number"

    For example: No Ordinary Family saison 1 épisode 1

    I tried to build my regex as (((?!\s+–\s+).)*)\s+(\d+)x(\d+)\s+\-\s+(((?!\s+–\s+).)*) but it doesn't work and I still don't understand why. Thanks for your advice.

     

     

  •  01-24-2012, 12:21 AM 84552 in reply to 84549

    Re: TVshows regex problem

    To understand what is going on here, you need to break the pattern down into its various parts. The first is

    (((?!\s+–\s+).)*)\s+

    What this does is step one character at a time through the text until it comes to a "whitespace dash whitespace" pattern, i.e. the " - " part of "1x01 - pilot".
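To see that first part in isolation, here is a small sketch. It uses Python purely for illustration (the question mentions PHP, and the regex syntax used here is the same in both), and it assumes the title contains an en dash exactly as it appears in the question. The trailing \s+ is left off so that only the "step through" behaviour is visible.

```python
import re

# First part of the pattern from the question, without the trailing \s+.
# The dash inside the lookahead is an en dash (U+2013), as posted.
first_part = re.compile(r"(((?!\s+–\s+).)*)")

# Assumption: the title uses the same en dash as the lookahead.
title = "No Ordinary Family 1x01 – Pilot"

# Each character is consumed only if " – " does not start at that position,
# so the capture stops just before the whitespace-dash-whitespace run.
captured = first_part.match(title).group(1)
print(repr(captured))  # 'No Ordinary Family 1x01'
```

Note that on its own this part swallows the "1x01" as well; it is only the backtracking forced by the rest of the pattern that would hand those characters back.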

    What happens after that is actually rather complicated, as it involves a lot of backtracking to try to get a match with the next part of the pattern, which is:

    (\d+)x(\d+)

    This part will look for text such as the "1x01" part of your first example.
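A quick way to check what this part picks up is to run it by itself. The sketch below is Python and only illustrative; the variable names are made up for the example. It also shows that the captured text keeps the leading zero, which matters if the rewritten title should say "épisode 1" rather than "épisode 01".

```python
import re

# The "digits x digits" part of the pattern, on its own.
episode_code = re.compile(r"(\d+)x(\d+)")

m = episode_code.search("No Ordinary Family 1x01 - Pilot")
season, episode = m.group(1), m.group(2)

print(season, episode)            # captured text keeps the leading zero: 1 01
print(int(season), int(episode))  # int() drops it: 1 1
```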

    The key reason your second example does not match is that it contains nothing that looks like this. As this is a required part of the overall pattern, the regex engine will fail to match your second example at all. (Actually, there are other reasons why it would not match, but that is enough to show you the problem.)
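The following sketch runs the full pattern, copied as posted, against three titles. It is Python and only meant as a quick check, not a fix. One assumption to flag: the dashes are taken exactly as they appear in this thread, so the lookaheads contain an en dash while the literal separator is an ASCII hyphen. If the forum software changed those characters, the third result may not reflect the original code.

```python
import re

# The full pattern from the question, copied as posted: en dash (U+2013) in
# the lookaheads, ASCII hyphen as the literal separator.
pattern = re.compile(r"(((?!\s+–\s+).)*)\s+(\d+)x(\d+)\s+\-\s+(((?!\s+–\s+).)*)")

hyphen_title = "No Ordinary Family 1x01 - Pilot"       # ASCII hyphen separator
second_example = "No Ordinary Family saison 1 épisode 1"
en_dash_title = "No Ordinary Family 1x01 – Pilot"      # en dash, as in the question

hyphen_match = pattern.match(hyphen_title)
second_match = pattern.match(second_example)
en_dash_match = pattern.match(en_dash_title)

# Groups 1, 3, 4 and 5 hold show name, season, episode and episode title.
print(hyphen_match.group(1, 3, 4, 5))
print(second_match)   # None: nothing looks like "digits x digits"
print(en_dash_match)  # None: \- does not match an en dash
```

So under that assumption the pattern does match a title separated by a plain hyphen, and the mismatch between the two kinds of dash is one candidate for why it fails on the first example as written.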

    As to how you can solve this, you really need to understand exactly what you are trying to match. In cases such as this I recommend that you collect several examples of all of the lines you want to match, knowing which parts you want to extract (such as the title, the series number, episode number etc.). Then work out the rules you would give a child for separating out any of the examples in the way you want. These rules might look like:

    step through until you get to the word "saison" or something that looks like "digits X digits" and call that the show name
    take the next set of digits you come to and call that the season number
    skip any "X" or spaces and the word "episode"
    take the number you get next and call that the episode number
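As a rough illustration of where rules like these can lead, here is one way they might be written down as a pattern. This is a sketch in Python, not something from the thread: the pattern, the group names and the rewrite_title helper are all hypothetical, and it has only been thought through against the two example titles in the question. A real set of titles would probably need more rules.

```python
import re

# Hypothetical pattern following the example rules; not from the thread.
TITLE = re.compile(
    r"""
    ^(?P<show>.+?)             # step through the text lazily ...
    \s+(?:saison\s+)?          # ... up to "saison" or "digits x digits"
    (?P<season>\d+)            # the next digits are the season number
    \s*(?:x|[ée]pisode)\s*     # skip the "x", spaces, or the word "épisode"
    (?P<episode>\d+)           # the next digits are the episode number
    """,
    re.IGNORECASE | re.VERBOSE,
)


def rewrite_title(title):
    """Return the title in 'show saison N épisode N' form, or unchanged."""
    m = TITLE.search(title)
    if m is None:
        return title  # leave anything that does not fit the rules alone
    # int() drops leading zeros, so "01" becomes 1.
    return f"{m['show']} saison {int(m['season'])} épisode {int(m['episode'])}"


print(rewrite_title("No Ordinary Family 1x01 – Pilot"))
print(rewrite_title("No Ordinary Family saison 1 épisode 1"))
```

Both calls print "No Ordinary Family saison 1 épisode 1". Because the pattern stops after the episode number, it does not care which kind of dash follows, and a title that is already in the new form passes through unchanged.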

    and so on. When you have a set of rules that will match ALL of the possible examples you might want to scan, then we can try to put those into a regular expression for you.

    Susan

  •  01-25-2012, 3:08 AM 84570 in reply to 84552

    Re: TVshows regex problem

    Thank you, Susan, for your explanation.