I am working with a tool our library uses, called MetaLib, which lets a person search one or more of our databases for articles and then connects to the full text by pulling various metadata and putting it into an OpenURL. MetaLib uses Perl-based regex in its configuration files to pull the data. I'm new to regex and am having trouble setting up the configuration for a particular database. The majority of the citation information is kept in one field, field "g". My problem is that the article page numbers may be in one of two formats:
1) g JOURNAL OF CHINA UNIVERSITY OF GEOSCIENCES v.18, no.1, pp.49-59, March 2007. (ISSN 1002-0705; Over 10 refs)-59
2) g OIL GEOPHYS. PROSPECTING v.39, no.2, pp.VI,218-221, 4/5/2004. (ISSN 1000-7210; 2 refs; In Chinese)-221
As you can see, the page numbers may come immediately after the "pp." or they may follow some Roman numerals and a comma. I need to create a regular expression that will pull the page numbers either way. I've created a couple of expressions but neither seems to work:
(?<=pp\.)(?([A-Z][A-Z]?)(?<=,)(\d{1,}-\d{1,})|(?<=pp\.)(\d{1,}-\d{1,}))
(?(pp\.[A-Z][A-Z]?)(\d{1,}-\d{1,})|((?<=pp\.)(\d{1,}-\d{1,})))
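To make the target behavior concrete, here is a minimal sketch in Python of one pattern that handles both sample lines. The pattern uses only syntax that Perl also accepts: an optional non-capturing group for the Roman numerals and comma, followed by a capture group for the page range. The names PAGES and extract_pages are hypothetical, and it is an assumption that MetaLib's configuration can use a capture group rather than requiring the whole match to be the page range. As far as I can tell, the conditional construct (?(...)yes|no) used in the two attempts above expects a group reference or a lookaround as its condition, not a bare pattern, which may be why they fail.

```python
import re

# Hypothetical pattern, not taken from MetaLib documentation.
#   pp\.              literal "pp."
#   (?:[IVXLCDM]+,)?  optional Roman numerals followed by a comma
#   (\d+-\d+)         capture group 1: the page range, e.g. 218-221
PAGES = re.compile(r"pp\.(?:[IVXLCDM]+,)?(\d+-\d+)")

# The two sample records from the question, copied verbatim.
SAMPLE_1 = ("g JOURNAL OF CHINA UNIVERSITY OF GEOSCIENCES v.18, no.1, "
            "pp.49-59, March 2007. (ISSN 1002-0705; Over 10 refs)-59")
SAMPLE_2 = ("g OIL GEOPHYS. PROSPECTING v.39, no.2, pp.VI,218-221, "
            "4/5/2004. (ISSN 1000-7210; 2 refs; In Chinese)-221")


def extract_pages(field_g):
    """Return the page range found after 'pp.', or None if there is none."""
    match = PAGES.search(field_g)
    return match.group(1) if match else None


if __name__ == "__main__":
    for record in (SAMPLE_1, SAMPLE_2):
        print(extract_pages(record))
```

With this approach the page range is read from capture group 1, so no lookbehind or conditional is needed.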
Any assistance would be appreciated.