
Extract a Date in a Funky Format

Last post 11-02-2009, 1:41 AM by ChuckG. 4 replies.
  •  10-31-2009, 4:34 PM 57067

    Extract a Date in a Funky Format

    I have downloaded a trial version of a piece of software called Email2DB. It's an email parser/processor that allows you to filter emails based on criteria and extract data from within those filtered emails. I have managed to extract the other fields I need, but I'm having particular trouble with the date and conversion. I am a complete Regex noob, and have really no idea where to start with this one.

    I am a book reseller and buy many books from the same few sites. These sites always include the purchase date in their Order Confirmation Email. The date appears in the emails like this:

    Sat, Oct 31, 2009 01:15 PM

    I need to extract this line, discard the time, and convert the date to MM/DD/YYYY.

    So that line would become a database entry of:
    10/31/2009
     
    I realize regexes can't convert strings, only match them, so if you can just tell me how to isolate a variable date in the format of the example "Sat, Oct 31, 2009", it would be much appreciated. Then I'll just have to wait until the software developers tell me how to convert it.
     
    Is this possible? Thanks in advance!
  •  10-31-2009, 6:46 PM 57069 in reply to 57067

    Re: Extract a Date in a Funky Format

    A quick Google search for "email2db regex" turns up a page listing the regex syntax options. I think the following pattern should work:

    ([A-Za-z]+) ([0-9]+), ([0-9]+)

    Note that I would normally use '\d' instead of '[0-9]' and '\w' instead of '[A-Za-z]' (although '\w' also matches digits and the underscore, so it is a looser test), but I could not see the "standard" forms listed on the web page.

    Also, I'm not sure about the use of the parentheses. These are normally used so that you can retrieve the characters that matched those parts of the text at a later stage, as in the conversions you are talking about. However, again I couldn't see them on the web page, so you might want to take them out if the pattern doesn't work.

    This assumes that you are only scanning the line with the date on it.

    Also, that date format is not all that "funky": it is a fairly standard UNIX-style human-readable date string, and there are quite a few routines available that can parse those strings into the format you want.

    Susan 
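As a rough illustration of the pattern above, together with the conversion the original poster asked about, here is a minimal Python sketch. Python is used only for illustration and says nothing about how Email2DB itself evaluates patterns; the names `DATE_PATTERN`, `MONTHS`, and `to_mmddyyyy` are made up for this example, and English three-letter month abbreviations are assumed.

```python
import re

# The pattern from the reply above: month name, day, year.
# The parentheses are capture groups, used below for the conversion.
DATE_PATTERN = re.compile(r"([A-Za-z]+) ([0-9]+), ([0-9]+)")

# Assumption: the emails use English three-letter month abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def to_mmddyyyy(line):
    """Return the first date found in `line` as MM/DD/YYYY, or None."""
    match = DATE_PATTERN.search(line)
    if match is None:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.get(month_name[:3].title())
    if month is None:
        return None  # alphabetic word that is not a month, e.g. "Fred"
    return f"{month:02d}/{int(day):02d}/{year}"


print(to_mmddyyyy("Sat, Oct 31, 2009 01:15 PM"))  # prints 10/31/2009
```

The month lookup table is a design choice that keeps the sketch independent of locale settings; a date-parsing routine would be the more usual tool when one is available.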

     

  •  11-01-2009, 5:16 PM 57079 in reply to 57069

    Re: Extract a Date in a Funky Format

    Thanks for responding so quickly Susan!

    Unfortunately that didn't work for me. Once I removed the parentheses it did do something, but it grabbed exactly what I don't need from that string, namely the time. So the parentheses definitely aren't needed, but do you have any other suggestions for matching just the date ("Sat, Oct 31, 2009" in the line below) and not the time? I don't mind if I have to take the entire thing even with the time, just as long as it grabs the line with the date.

    Sat, Oct 31, 2009 01:15PM 

    Sorry to request additional help, Susan. I appreciate it very much though!

    --Chuck.

  •  11-02-2009, 12:02 AM 57082 in reply to 57079

    Re: Extract a Date in a Funky Format

    I don't understand how it matched the time part, as the first part of the match requires alphabetic characters followed by a space followed by digits, and there is nothing in the time part that could match that sequence. The only thing I can think of is that the program is actually doing a "replace" operation and is removing the date part, leaving only the time on the output line.

    Also, I'm a bit confused by your comment about "just as long as it grabs the line with the date". Are you expecting the pattern to find the date in the complete text of the email, or are you able to guide the regex to only that line and want it to extract the date part? I did say that the pattern assumes you are only scanning the line with the date on it.

    Assuming that you can sort out the find vs replace situation, if you are searching for a date within the entire text of an email, then I would use something like:

    [A-Za-z]+, [A-Za-z]+ [0-9]+, [0-9]+

    which also includes the day of the week to make it a bit more specific. The pattern I presented before would also match "Fred 56, 32" etc.

    In fact, if you can either accept the full date/time string or there is a way of extracting the match groups from the overall match, then something like:

    [A-Za-z]+, [A-Za-z]+ [0-9]+, [0-9]+ [0-9]{2}:[0-9]{2}[AP]M

    would match the entire string. While it would also match "Meters, Distance 23, 6594 43:23AMICABLE" this would seem a much less likely false positive.

    Susan
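The trade-off between the three patterns in this thread can be checked with a short Python sketch. The sample email body and the names `SHORT`, `WITH_WEEKDAY`, `FULL`, and `first_match` are invented for illustration. One thing worth noticing: the full pattern as written expects no space between the minutes and AM/PM ("01:15PM", as in the reply above), so it would not match the "01:15 PM" form from the first post unless an optional space were added.

```python
import re

# The three patterns discussed in this thread (capture groups omitted).
SHORT = re.compile(r"[A-Za-z]+ [0-9]+, [0-9]+")
WITH_WEEKDAY = re.compile(r"[A-Za-z]+, [A-Za-z]+ [0-9]+, [0-9]+")
FULL = re.compile(
    r"[A-Za-z]+, [A-Za-z]+ [0-9]+, [0-9]+ [0-9]{2}:[0-9]{2}[AP]M"
)


def first_match(pattern, text):
    """Return the text of the first match of `pattern`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Hypothetical email body containing a decoy before the real date line.
email_body = (
    "Order for Fred 56, 32 items\n"
    "Placed: Sat, Oct 31, 2009 01:15PM\n"
    "Thanks!"
)

print(first_match(SHORT, email_body))         # Fred 56, 32
print(first_match(WITH_WEEKDAY, email_body))  # Sat, Oct 31, 2009
print(first_match(FULL, email_body))          # Sat, Oct 31, 2009 01:15PM
```

Searching the whole body with the shortest pattern picks up the decoy first, which is why the more specific patterns are safer when the regex is not confined to the date line.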

  •  11-02-2009, 1:41 AM 57083 in reply to 57082

    Re: Extract a Date in a Funky Format

    You got it! Thanks ever so much. Your misunderstanding was completely accurate: I was using the regex as a pointer from which to start a parse, meaning it was just locating the regex match and then taking everything after it. Your questioning of the find/replace issue made me dig deeper and discover the "Use Find Mask" option, which lets me use the regex match itself as the field. I'm really not as daft as I sound; I had just never heard of regexes before yesterday or had a need for them. I do see how powerful they can be now, though. Thanks ever so much for your thoughtful and well-laid-out responses; it doesn't seem right to get your kind help for free.

    Thanks again Susan, I appreciate your time greatly.

     --Chuck
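The difference between the two behaviours described in the last post (taking everything after the match versus taking the match itself) can be sketched in Python as follows. This is only a guess at what the software was doing, expressed with Python's `re` module; `text_after_match` and `matched_text` are hypothetical helper names, not Email2DB options.

```python
import re

PATTERN = re.compile(r"[A-Za-z]+, [A-Za-z]+ [0-9]+, [0-9]+")
line = "Sat, Oct 31, 2009 01:15PM"


def text_after_match(pattern, text):
    """Use the match only as a pointer and return what follows it."""
    match = pattern.search(text)
    return text[match.end():].strip() if match else None


def matched_text(pattern, text):
    """Return the matched text itself (the 'find mask' style result)."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(text_after_match(PATTERN, line))  # 01:15PM
print(matched_text(PATTERN, line))      # Sat, Oct 31, 2009
```

Under the first behaviour a correct date pattern yields only the time, which matches the symptom reported earlier in the thread.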
