
Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

Last post 09-04-2008, 2:56 PM by prometheuzz. 9 replies.
  •  09-04-2008, 1:01 PM 45962

    Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    Hi there, I'm new here and just learning Regular Expressions and am looking for some help on the following problem.

     I want to replace <p> tags with <li> tags in a large HTML string using a regular expression.

     I would like the following input to translate to the following output:

    Input:

    <p>test</p>
    <p class="a">test</p>
    <p class="a">test</p>
    <p>test</p>
     

    Output:

    <p>test</p>
    <li>test</li>
    <li>test</li>
    <p>test</p>  

    So far I have the following:

    RegEx: <p class="a">(?'S1'.*)</p>

    Replace: <li>${S1}</li>

    The problem occurs when the Input is on the same line:

    Input:

     <p>test</p>
    <p class="a">test</p><p class="a">test</p>
    <p>test</p>

    Output:

    <p>test</p>
    <li>test</p><p class="a">test</p>
    <p>test</li>

    I understand that the regex finds the first <p class="a"> and then matches everything up to the last </p>. As you can see, that is not the result I want.

    Is there a way to specify "after finding <p class="a">, stop at the next </p>" and apply the regex to that?

    I am using C# with the following basic code for testing purposes.

    Regex.Replace(txtInput.Text, txtRegEx.Text, txtRegReplace.Text, RegexOptions.Singleline | RegexOptions.IgnoreCase);

    Any help would be appreciated.

    harry

  •  09-04-2008, 1:07 PM 45964 in reply to 45962

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    In your regex, replace the .* with [^<]*
  •  09-04-2008, 1:25 PM 45965 in reply to 45964

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    Thanks Prometheuzz! That totally did it.

    Could you explain exactly what it is doing there?

     

  •  09-04-2008, 1:41 PM 45969 in reply to 45965

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    Ah, I spoke a little too soon. This does solve the initial problem, but it brings a new one to light.

     In a more extended example:

     Input:
    <p class="a"><!--[if !supportLists]-->Item 1</p>

     I guess the regex fails, because it is finding a "<" and stopping there? Is there a way around that also?

  •  09-04-2008, 1:48 PM 45970 in reply to 45965

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    footose:

    Thanks Prometheuzz! That totally did it.

     Good to hear it, and you're welcome.

     

    footose:

    Could you explain exactly what it is doing there?

    Sure.

    The * is called a greedy quantifier. This means it will "eat" or "match" as much as it can.
    Take the string "<x>aaa</x>aaa</x>" for example. Using the regex "<x>.*</x>", the first thing the regex engine does is match the first three characters: "<x>". After that, the greedy ".*" matches "aaa</x>aaa</x>", but the "</x>" at the end of the regex still has to match something, so the engine backtracks a few positions, letting ".*" match "aaa</x>aaa", and then lets the final part of the regex, "</x>", match the last part of the string.

    <x>aaa</x>aaa</x>
    -->                  // "<x>" matches "<x>"
       ------------->    // ".*" matches "aaa</x>aaa</x>"
                 <---    // back-tracking: ".*" now matches "aaa</x>aaa"
                 --->    // "</x>" matches "</x>"

    Now, to make ".*" non-greedy (it's called reluctant in regex terms), you can add a "?" after it (try it out!). But in this case it is "better" to be more specific about what you want to match. The characters between the tags shouldn't contain a '<', so that's what you match: "zero or more occurrences of any character except '<'", which in regex is "[^<]*". The square brackets form a character class (or character set). The character class [abc] means "match one character, either 'a', 'b' or 'c'". Adding a ^ at the start of the character class negates it, so [^abc] means "any one character except 'a', 'b' or 'c'".
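A minimal C# sketch of the three variants discussed above, run against the same "<x>aaa</x>aaa</x>" string. The class and method names (GreedyDemo, FirstMatch) are made up for illustration and are not part of the original posts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Same input as the walkthrough above.
    public const string Input = "<x>aaa</x>aaa</x>";

    // Returns the text of the first match of the pattern, or null if there is none.
    public static string FirstMatch(string pattern)
    {
        Match m = Regex.Match(Input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Greedy: ".*" runs to the end and backtracks to the LAST "</x>".
        Console.WriteLine(FirstMatch("<x>.*</x>"));    // <x>aaa</x>aaa</x>

        // Reluctant: ".*?" grows one character at a time and stops at the FIRST "</x>".
        Console.WriteLine(FirstMatch("<x>.*?</x>"));   // <x>aaa</x>

        // Negated character class: "[^<]*" cannot cross a '<' at all.
        Console.WriteLine(FirstMatch("<x>[^<]*</x>")); // <x>aaa</x>
    }
}
```

The reluctant and negated-class versions agree here only because the text between the tags contains no '<'. They behave differently as soon as it does.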

  •  09-04-2008, 1:53 PM 45971 in reply to 45970

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    Great explanation.

     So can I match on a whole ending tag such as "</p>", instead of just the "<" (in case a user has input a "<" somewhere between the <p> and </p>)?

     

     <p class="a">(?'S1'[^</p>]*)</p>  ?

  •  09-04-2008, 2:08 PM 45973 in reply to 45971

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    double post
  •  09-04-2008, 2:22 PM 45977 in reply to 45971

    Re: Change <p>whatever</p> into <li>whatever</li>, based on a class name and multiple <p> tags.

    footose:

    Great explanation.

     So can I match on a whole ending tag such as "</p>", instead of just the "<" (in case a user has input a "<" somewhere between the <p> and </p>)?

     

     <p class="a">(?'S1'[^</p>]*)</p>  ?

    Nope, a character class matches only one character. So [^</p>] means: "match one character which can be anything except '<', '/', 'p' or '>'".
    The easiest way to do what you want is to make your dot-star reluctant by adding a ? after it:

     <p class="a">(?'S1'.*?)</p>

     A (slightly) more complex way would be to use a negative lookahead:

     <p class="a">(?'S1'((?!</p>).)*)</p>

    but that one may go over your head for now, since you mentioned you've just started with regexes.

     HTH
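To see the difference between the three patterns from this exchange, here is a small C# sketch. The input string is made up for illustration (it combines the Word comment from the earlier post with a second paragraph containing the letter 'p'), and the names ListItemDemo and ToListItems are hypothetical. The regex options match the ones in the original Regex.Replace call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ListItemDemo
{
    // Applies one of the patterns from the thread with the <li> replacement string.
    public static string ToListItems(string html, string pattern)
    {
        return Regex.Replace(html, pattern, "<li>${S1}</li>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public const string Html =
        "<p class=\"a\"><!--[if !supportLists]-->Item 1</p><p class=\"a\">Step 2</p>";

    public const string CharClass = @"<p class=""a"">(?'S1'[^</p>]*)</p>";
    public const string Reluctant = @"<p class=""a"">(?'S1'.*?)</p>";
    public const string Lookahead = @"<p class=""a"">(?'S1'((?!</p>).)*)</p>";

    public static void Main()
    {
        // [^</p>] excludes the single characters '<', '/', 'p' and '>', so it
        // stops at the comment in the first paragraph and at the 'p' in "Step".
        // Nothing matches and the input comes back unchanged.
        Console.WriteLine(ToListItems(Html, CharClass));

        // Both of these stop at the first "</p>" after each opening tag and print:
        // <li><!--[if !supportLists]-->Item 1</li><li>Step 2</li>
        Console.WriteLine(ToListItems(Html, Reluctant));
        Console.WriteLine(ToListItems(Html, Lookahead));
    }
}
```

On this input the reluctant and lookahead patterns give the same result. The lookahead version just states the "no </p> inside" rule explicitly, one character at a time.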

  •  09-04-2008, 2:44 PM 45979 in reply to 45977

    Clean pasted MS Word HTML, and add unordered list.

    This is awesome. Thank you very much.

    I went with your suggested "complex" way since you recommended it. I do not fully understand it at this point because of the "negative lookahead" portion, but I plan to get more into that shortly.

    Anyway, for anyone that is interested, the final code does the following:

     - Strip useless Word characters
     - Check for an unordered list in Word and change it to a real HTML unordered list
     - Strip all attributes from the "<p>" tags
     - Strip "&nbsp;" spaces

     Code:


     string html = "";
    html = txtInput.Text;

    StringCollection sc = new StringCollection();

    // get rid of unnecessary tag spans (comments and title)
    sc.Add(@"<!--(\w|\W)+?-->");
    sc.Add(@"<title>(\w|\W)+?</title>");
    // Get rid of classes and styles
    sc.Add(@"\s?class=\w+");
    sc.Add(@"\s+style='[^']+'");
    // Get rid of unnecessary tags
    sc.Add(@"<(meta|link|/?o:|/?style|/?font|/?div|/?st\d|/?head|/?html|/?body|/?span|!\[)[^>]*?>");
    // Get rid of empty paragraph tags
    //sc.Add(@"(<[^>]+>)+&nbsp;(</\w+>)+");
    // remove bizarre v: element attached to <img> tag
    sc.Add(@"\s+v:\w+=""[^""]+""");
    // remove extra lines
    sc.Add(@"(\r\n){2,}");


    foreach (string s in sc)
    {
        html = Regex.Replace(html, s, "", RegexOptions.IgnoreCase);
    }

    // replace the msolistparagraphcxspfirst with the UL and the LI tag
    // Regex: <p class=""MsoListParagraphCxSpFirst""[^>]*>(?'S1'((?!</p>).)*)</p>
    // Replace: <ul><li>${S1}</li>
    html = Regex.Replace(html, @"<p class=""MsoListParagraphCxSpFirst""[^>]*>(?'S1'((?!</p>).)*)</p>", "<ul><li>${S1}</li>", RegexOptions.Singleline);

    // replace the msolistparagraphcxspmiddle with li /li
    // Regex: <p class=""MsoListParagraphCxSpMiddle""[^>]*>(?'S1'((?!</p>).)*)</p>
    // Replace: <li>${S1}</li>
    html = Regex.Replace(html, @"<p class=""MsoListParagraphCxSpMiddle""[^>]*>(?'S1'((?!</p>).)*)</p>", "<li>${S1}</li>", RegexOptions.Singleline);

    // replace the msolistparagraphcxsplast with li /li and </ul>
    // Regex: <p class=""MsoListParagraphCxSpLast""[^>]*>(?'S1'((?!</p>).)*)</p>
    // Replace: <li>${S1}</li></ul>
    html = Regex.Replace(html, @"<p class=""MsoListParagraphCxSpLast""[^>]*>(?'S1'((?!</p>).)*)</p>", "<li>${S1}</li></ul>", RegexOptions.Singleline);

    // strip it all from the p-tag
    html = Regex.Replace(html, @"<p[^>]*>", "<p>", RegexOptions.IgnoreCase);

    // remove crappy spaces
    html = Regex.Replace(html, @"&nbsp;", "", RegexOptions.IgnoreCase);

    txtOutput.Text = html; 
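
The three list replacements above can be tried in isolation on a small hand-made snippet. This is only a sketch: the sample HTML is invented (it is not the Word input shown below), the cleanup passes from the StringCollection are skipped, and the names WordListDemo, PatternFor and ConvertList are hypothetical helpers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordListDemo
{
    // Builds the pattern from the post for a given Word list class name.
    static string PatternFor(string className)
    {
        return @"<p class=""" + className + @"""[^>]*>(?'S1'((?!</p>).)*)</p>";
    }

    // First item opens the <ul>, middle items become plain <li>, last item closes the </ul>.
    public static string ConvertList(string html)
    {
        html = Regex.Replace(html, PatternFor("MsoListParagraphCxSpFirst"),
            "<ul><li>${S1}</li>", RegexOptions.Singleline);
        html = Regex.Replace(html, PatternFor("MsoListParagraphCxSpMiddle"),
            "<li>${S1}</li>", RegexOptions.Singleline);
        html = Regex.Replace(html, PatternFor("MsoListParagraphCxSpLast"),
            "<li>${S1}</li></ul>", RegexOptions.Singleline);
        return html;
    }

    public static void Main()
    {
        // Invented sample input, assuming quoted class attributes.
        string html =
            "<p class=\"MsoNormal\">Intro</p>" +
            "<p class=\"MsoListParagraphCxSpFirst\" style='text-indent:-.25in'>One</p>" +
            "<p class=\"MsoListParagraphCxSpMiddle\">Two</p>" +
            "<p class=\"MsoListParagraphCxSpLast\">Three</p>";

        // Prints: <p class="MsoNormal">Intro</p><ul><li>One</li><li>Two</li><li>Three</li></ul>
        Console.WriteLine(ConvertList(html));
    }
}
```

Note that the patterns expect the class attribute to come first in the tag and to be written with double quotes. A paragraph written differently is left untouched.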

     

    Input:


    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="ProgId" content="Word.Document"><meta name="Generator" content="Microsoft Word 12"><meta name="Originator" content="Microsoft Word 12"><link rel="File-List" href="file:///C:%5CUsers%5CHarry%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_filelist.xml"><link rel="themeData" href="file:///C:%5CUsers%5CHarry%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_themedata.thmx"><link rel="colorSchemeMapping" href="file:///C:%5CUsers%5CHarry%5CAppData%5CLocal%5CTemp%5Cmsohtmlclip1%5C01%5Cclip_colorschememapping.xml"><!--[if gte mso 9]><xml>
     <w:WordDocument>
      <w:View>Normal</w:View>
      <w:Zoom>0</w:Zoom>
      <w:TrackMoves/>
      <w:TrackFormatting/>
      <w:PunctuationKerning/>
      <w:ValidateAgainstSchemas/>
      <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
      <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
      <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
      <w:DoNotPromoteQF/>
      <w:LidThemeOther>EN-US</w:LidThemeOther>
      <w:LidThemeAsian>X-NONE</w:LidThemeAsian>
      <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
      <w:Compatibility>
       <w:BreakWrappedTables/>
       <w:SnapToGridInCell/>
       <w:WrapTextWithPunct/>
       <w:UseAsianBreakRules/>
       <w:DontGrowAutofit/>
       <w:SplitPgBreakAndParaMark/>
       <w:DontVertAlignCellWithSp/>
       <w:DontBreakConstrainedForcedTables/>
       <w:DontVertAlignInTxbx/>
       <w:Word11KerningPairs/>
       <w:CachedColBalance/>
      </w:Compatibility>
      <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
      <m:mathPr>
       <m:mathFont m:val="Cambria Math"/>
       <m:brkBin m:val="before"/>
       <m:brkBinSub m:val="&#45;-"/>
       <m:smallFrac m:val="off"/>
       <m:dispDef/>
       <m:lMargin m:val="0"/>
       <m:rMargin m:val="0"/>
       <m:defJc m:val="centerGroup"/>
       <m:wrapIndent m:val="1440"/>
       <m:intLim m:val="subSup"/>
       <m:naryLim m:val="undOvr"/>
      </m:mathPr></w:WordDocument>
    </xml><![endif]--><!--[if gte mso 9]><xml>
     <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
      DefSemiHidden="true" DefQFormat="false" DefPriority="99"
      LatentStyleCount="267">
      <w:LsdException Locked="false" Priority="0" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
      <w:LsdException Locked="false" Priority="9" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
      <w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 1"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 2"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 3"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 4"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 5"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 6"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 7"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 8"/>
      <w:LsdException Locked="false" Priority="39" Name="toc 9"/>
      <w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
      <w:LsdException Locked="false" Priority="10" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Title"/>
      <w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
      <w:LsdException Locked="false" Priority="11" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
      <w:LsdException Locked="false" Priority="22" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
      <w:LsdException Locked="false" Priority="20" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
      <w:LsdException Locked="false" Priority="59" SemiHidden="false"
       UnhideWhenUsed="false" Name="Table Grid"/>
      <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
      <w:LsdException Locked="false" Priority="1" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 1"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
      <w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
      <w:LsdException Locked="false" Priority="34" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
      <w:LsdException Locked="false" Priority="29" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
      <w:LsdException Locked="false" Priority="30" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 1"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 2"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 2"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 3"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 3"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 4"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 4"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 5"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 5"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
      <w:LsdException Locked="false" Priority="60" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
      <w:LsdException Locked="false" Priority="61" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light List Accent 6"/>
      <w:LsdException Locked="false" Priority="62" SemiHidden="false"
       UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
      <w:LsdException Locked="false" Priority="63" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
      <w:LsdException Locked="false" Priority="64" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
      <w:LsdException Locked="false" Priority="65" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
      <w:LsdException Locked="false" Priority="66" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
      <w:LsdException Locked="false" Priority="67" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
      <w:LsdException Locked="false" Priority="68" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
      <w:LsdException Locked="false" Priority="69" SemiHidden="false"
       UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
      <w:LsdException Locked="false" Priority="70" SemiHidden="false"
       UnhideWhenUsed="false" Name="Dark List Accent 6"/>
      <w:LsdException Locked="false" Priority="71" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
      <w:LsdException Locked="false" Priority="72" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
      <w:LsdException Locked="false" Priority="73" SemiHidden="false"
       UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
      <w:LsdException Locked="false" Priority="19" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
      <w:LsdException Locked="false" Priority="21" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
      <w:LsdException Locked="false" Priority="31" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
      <w:LsdException Locked="false" Priority="32" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
      <w:LsdException Locked="false" Priority="33" SemiHidden="false"
       UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
      <w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
      <w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
     </w:LatentStyles>
    </xml><![endif]--><style>
    <!--
     /* Font Definitions */
     @font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;
        mso-font-charset:2;
        mso-generic-font-family:auto;
        mso-font-pitch:variable;
        mso-font-signature:0 268435456 0 0 -2147483648 0;}
    @font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;
        mso-font-charset:1;
        mso-generic-font-family:roman;
        mso-font-format:other;
        mso-font-pitch:variable;
        mso-font-signature:0 0 0 0 0 0;}
    @font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;
        mso-font-charset:0;
        mso-generic-font-family:swiss;
        mso-font-pitch:variable;
        mso-font-signature:-1610611985 1073750139 0 0 159 0;}
     /* Style Definitions */
     p.MsoNormal, li.MsoNormal, div.MsoNormal
        {mso-style-unhide:no;
        mso-style-qformat:yes;
        mso-style-parent:"";
        margin-top:0in;
        margin-right:0in;
        margin-bottom:10.0pt;
        margin-left:0in;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        mso-style-unhide:no;
        mso-style-qformat:yes;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:10.0pt;
        margin-left:.5in;
        mso-add-space:auto;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst
        {mso-style-priority:34;
        mso-style-unhide:no;
        mso-style-qformat:yes;
        mso-style-type:export-only;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        mso-add-space:auto;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle
        {mso-style-priority:34;
        mso-style-unhide:no;
        mso-style-qformat:yes;
        mso-style-type:export-only;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        mso-add-space:auto;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast
        {mso-style-priority:34;
        mso-style-unhide:no;
        mso-style-qformat:yes;
        mso-style-type:export-only;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:10.0pt;
        margin-left:.5in;
        mso-add-space:auto;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    .MsoChpDefault
        {mso-style-type:export-only;
        mso-default-props:yes;
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-fareast-font-family:Calibri;
        mso-fareast-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;
        mso-bidi-font-family:"Times New Roman";
        mso-bidi-theme-font:minor-bidi;}
    .MsoPapDefault
        {mso-style-type:export-only;
        margin-bottom:10.0pt;
        line-height:115%;}
    @page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;
        mso-header-margin:.5in;
        mso-footer-margin:.5in;
        mso-paper-source:0;}
    div.Section1
        {page:Section1;}
     /* List Definitions */
     @list l0
        {mso-list-id:1221133633;
        mso-list-type:hybrid;
        mso-list-template-ids:2137679924 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
    @list l0:level1
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Symbol;}
    ol
        {margin-bottom:0in;}
    ul
        {margin-bottom:0in;}
    -->
    </style><!--[if gte mso 10]>
    <style>
     /* Style Definitions */
     table.MsoNormalTable
        {mso-style-name:"Table Normal";
        mso-tstyle-rowband-size:0;
        mso-tstyle-colband-size:0;
        mso-style-noshow:yes;
        mso-style-priority:99;
        mso-style-qformat:yes;
        mso-style-parent:"";
        mso-padding-alt:0in 5.4pt 0in 5.4pt;
        mso-para-margin-top:0in;
        mso-para-margin-right:0in;
        mso-para-margin-bottom:10.0pt;
        mso-para-margin-left:0in;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-ascii-font-family:Calibri;
        mso-ascii-theme-font:minor-latin;
        mso-hansi-font-family:Calibri;
        mso-hansi-theme-font:minor-latin;}
    </style>
    <![endif]-->

    <p class="MsoListParagraphCxSpFirst" style='text-indent: -0.25in;'><!--[if !supportLists]--><span style='font-family: Symbol;'><span style=''><span style='font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    </span></span></span><!--[endif]-->List 1</p>

    <p class="MsoListParagraphCxSpMiddle" style='text-indent: -0.25in;'><!--[if !supportLists]--><span style='font-family: Symbol;'><span style=''><span style='font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    </span></span></span><!--[endif]-->List 2</p>

    <p class="MsoListParagraphCxSpMiddle" style='text-indent: -0.25in;'><!--[if !supportLists]--><span style='font-family: Symbol;'><span style=''><span style='font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    </span></span></span><!--[endif]-->List 3</p>

    <p class="MsoListParagraphCxSpLast" style='text-indent: -0.25in;'><!--[if !supportLists]--><span style='font-family: Symbol;'><span style=''><span style='font-family: &quot;Times New Roman&quot;; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    </span></span></span><!--[endif]-->List 4</p>


    Output:

    <ul>

    <li>List 1</li>

    <li>List 2</li>

    <li>List 3</li>

    <li>List 4</li>

    </ul>
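
    For readers following along, the step from the Input above to this Output can be sketched roughly as follows. This is only an illustration, not the code the poster ended up with: it uses Python's re module rather than the C# mentioned earlier in the thread, and the names LIST_P, BULLET, LI_RUN and word_list_to_ul are made up for the example. It assumes every list paragraph carries a class starting with "MsoListParagraph", that the bullet glyph sits between the [if !supportLists] and [endif] comments, and that the document contains no other <li> elements.

```python
import re

# One Word list paragraph: <p class="MsoListParagraph..." ...> ... </p>
LIST_P = re.compile(
    r'<p\s+class="MsoListParagraph\w*"[^>]*>(.*?)</p>',
    re.IGNORECASE | re.DOTALL,
)

# Word wraps the bullet glyph in conditional comments; match that whole span.
BULLET = re.compile(
    r'<!--\[if !supportLists\]-->.*?<!--\[endif\]-->',
    re.DOTALL,
)

# A run of <li> items separated only by whitespace.
LI_RUN = re.compile(r'<li>.*?</li>(?:\s*<li>.*?</li>)*', re.DOTALL)


def word_list_to_ul(html):
    """Turn Word list paragraphs into <li> items and wrap each run in <ul>."""

    def to_li(match):
        # Drop the bullet span, then trim leftover whitespace around the text.
        text = BULLET.sub('', match.group(1)).strip()
        return '<li>' + text + '</li>'

    html = LIST_P.sub(to_li, html)
    # Wrap every consecutive run of items in a single <ul> ... </ul> pair.
    return LI_RUN.sub(lambda m: '<ul>\n' + m.group(0) + '\n</ul>', html)
```

    The sketch leaves the <style> blocks untouched; removing those would be a separate clean-up pass.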


  •  09-04-2008, 2:56 PM 45980 in reply to 45979

    Re: Clean pasted MS Word HTML, and add unordered list.

    Good to hear you've solved your problem, and you're welcome.

    Here's a good explanation of what negative look-ahead is all about:

    http://www.regular-expressions.info/lookaround.html

    By the way, that entire website is an excellent resource for learning more about regexes!
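
    As a small illustration of the idea, here is a sketch of a negative look-ahead applied to the original <p class="a"> question from this thread. It is written with Python's re module rather than C#, the names PATTERN and p_to_li are invented for the example, and it is not necessarily the exact pattern used earlier in the thread. The look-ahead (?!</p>) is checked before each character is consumed, so the match cannot run past the first closing </p>, yet it still allows other tags such as <b> inside the paragraph (which [^<]* would not).

```python
import re

# Consume one character at a time, but only where "</p>" does NOT start.
# The look-ahead itself consumes nothing; it only vetoes the next step.
PATTERN = re.compile(
    r'<p class="a">((?:(?!</p>).)*)</p>',
    re.IGNORECASE | re.DOTALL,
)


def p_to_li(html):
    """Rewrite each <p class="a">...</p> as <li>...</li>."""
    return PATTERN.sub(r'<li>\1</li>', html)


if __name__ == '__main__':
    print(p_to_li('<p class="a">one</p><p class="a">t<b>w</b>o</p>'))
```

    Paragraphs without class="a" are left alone, and two such paragraphs on the same line are handled separately instead of being merged into one match.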
