Hi everyone, thanks for being here to help with regex. I am running a regex over a big XML database (a Wikipedia dump), and when it scans each article I want it to ignore anything placed between <!-- and -->, which is Wikipedia's comment syntax. My regex finds misspelled words, so if there are misspelled words inside a comment, I want it to skip them.
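One common way to get this behavior is to strip the comments out of each article before the spell-check regex ever sees the text. The sketch below assumes Python's `re` module. `MISSPELLED` is a hypothetical stand-in, because the real misspelling regex isn't shown here. It also assumes the text has already been unescaped: if you scan the raw dump file directly, the markers may appear XML-escaped as `&lt;!--` and `--&gt;`, and the comment pattern would need adjusting to match that.

```python
import re

# Matches a wiki comment: "<!--", then as few characters as possible, then "-->".
# re.DOTALL lets "." cross newlines, since comments can span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical stand-in for the real misspelling regex (not shown in the post).
MISSPELLED = re.compile(r"\bunneccessary\b", re.IGNORECASE)


def find_misspellings(article_text):
    """Return misspellings found outside <!-- ... --> comments."""
    # Replace each comment with a space so words on either side don't merge.
    visible = COMMENT.sub(" ", article_text)
    return MISSPELLED.findall(visible)


sample = (
    "<!--NOTE: Copy everything below here. "
    "Remember to remove unneccessary sections-->\n"
    "This unneccessary word is outside the comment."
)
print(find_misspellings(sample))  # ['unneccessary'] (only the one outside the comment)
```

The lazy `.*?` matters: a greedy `.*` would run from the first `<!--` to the last `-->` in the article and swallow ordinary text between two separate comments. An unterminated `<!--` is left in place, so any words after it are still checked.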
For instance, this is a result I get back when running my regex:
<!--NOTE: Copy everything below here. Remember to remove unneccessary sections-->
Notice how "unneccessary" has two "c"s, which makes it a misspelling. Can a regex be told never to look inside comments like that (so it stops returning false positives like this one)?
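If the scan has to stay a single regex pass, another widely used trick is alternation: one branch matches a whole comment and captures nothing, and a second branch captures the word you actually want. The engine scans left to right, so when it reaches `<!--` the comment branch consumes everything up to `-->`, and words inside it are never offered to the second branch. Below is a minimal sketch with Python's `re`. The word `unneccessary` is again a hypothetical placeholder for the real misspelling pattern.

```python
import re

# Branch 1 consumes a whole comment without capturing it.
# Branch 2 captures a (hypothetical) misspelled word in group 1.
PATTERN = re.compile(r"<!--.*?-->|\b(unneccessary)\b", re.DOTALL | re.IGNORECASE)


def find_outside_comments(text):
    """Return group-1 captures, dropping the matches that were comments."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = (
    "<!--NOTE: Copy everything below here. "
    "Remember to remove unneccessary sections-->\n"
    "This unneccessary word is outside the comment."
)
print(find_outside_comments(sample))  # ['unneccessary']
```

The comment branch has to come first in the alternation; otherwise the word branch could match inside a comment before the comment branch gets its turn.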
Here is a URL with the regex: