Unicode defines a set of characters (code points), not an encoding. I guess you want to recognise a non-ASCII encoding, which might be UTF-16, UTF-8, UTF-7 or whatever else the jungle of standards defines.
I assume you want to recognize the most popular of them, which is UTF-8. It is specified in RFC 3629 (http://www.faqs.org/rfcs/rfc3629.html), which includes a formal definition of the encoding in ABNF. Quoting the RFC:
For the convenience of implementors using ABNF, a definition of UTF-8 in ABNF syntax is given here. A UTF-8 string is a sequence of octets representing a sequence of UCS characters. An octet sequence is valid UTF-8 only if it matches the following syntax, which is derived from the rules for encoding UTF-8 and is expressed in the ABNF
UTF8-octets = *( UTF8-char )
UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1 = %x00-7F
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
%xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
%xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
Below I translate this grammar into PCRE syntax; you should not find it hard to rewrite for egrep, vim and other tools.
Because no multi-byte UTF-8 sequence contains any byte from \x00 to \x7f, the quick test is to look for anything that is not US-ASCII:
[^\x00-\x7f]
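As a rough illustration, the same character class can be applied to raw bytes in Python. This is only a sketch: the function name non_ascii_offsets is made up for the example, and it reports byte offsets, not characters.

```python
import re
from typing import List

# Same class as the pattern above, compiled as a bytes pattern so it
# runs against the raw file content rather than decoded text.
NON_ASCII = re.compile(rb"[^\x00-\x7f]")

def non_ascii_offsets(data: bytes) -> List[int]:
    """Return the offset of every byte that is not 7-bit ASCII."""
    return [m.start() for m in NON_ASCII.finditer(data)]

if __name__ == "__main__":
    # 'é' is encoded as the two bytes C3 A9 in UTF-8.
    print(non_ascii_offsets("héllo".encode("utf-8")))
```

An empty result means the data is pure ASCII. A non-empty one only tells you that some 8-bit encoding is in use, not which one.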
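If you want to properly delimit each non-ASCII UTF-8 character (each one takes 2 to 4 bytes), use the pattern below. The spaces are there for readability only, so either enable extended mode (the x flag in PCRE) or strip them: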
( ( ( [\xf1-\xf3][\x80-\xbf] | \xf0[\x90-\xbf] | \xf4[\x80-\x8f] | [\xe1-\xec\xee-\xef] ) [\x80-\xbf] ) | [\xc2-\xdf] | \xe0[\xa0-\xbf] | \xed[\x80-\x9f] ) [\x80-\xbf]
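If you would rather try the pattern from a script, here is a minimal sketch in Python. It assumes the data is read as bytes, and it uses re.VERBOSE so the spaces in the pattern are ignored. The names UTF8_MULTIBYTE and multibyte_chars are invented for the example.

```python
import re
from typing import List

# The pattern above, unchanged. re.VERBOSE makes the regex engine ignore
# the whitespace that is only there for readability.
UTF8_MULTIBYTE = re.compile(
    rb"""
    ( ( ( [\xf1-\xf3][\x80-\xbf] | \xf0[\x90-\xbf] | \xf4[\x80-\x8f] | [\xe1-\xec\xee-\xef] ) [\x80-\xbf] )
      | [\xc2-\xdf] | \xe0[\xa0-\xbf] | \xed[\x80-\x9f] ) [\x80-\xbf]
    """,
    re.VERBOSE,
)

def multibyte_chars(data: bytes) -> List[bytes]:
    """Return each well-formed 2- to 4-byte UTF-8 sequence found in data."""
    return [m.group(0) for m in UTF8_MULTIBYTE.finditer(data)]

if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # a, é, €, an emoji
    for seq in multibyte_chars(sample):
        print(seq.hex(), seq.decode("utf-8"))
```

Overlong forms such as C0 80 and encoded surrogates such as ED A0 80 produce no match, which is what the RFC grammar requires.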
You might also discover that some OSes, tools and utilities put \xfe\xff or \xff\xfe at the start of the file, in which case it is probably UTF-16. See http://www.faqs.org/rfcs/rfc2781.html . UTF-16 encodes each character in 2 or 4 bytes, and the text usually starts with a "byte order mark": \xfe\xff for big-endian, \xff\xfe for little-endian.
(\xfe\xff|\xff\xfe) at the very start of the file marks a UTF-16 encoding of Unicode.
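The byte order mark check does not really need a regex. A small sketch in Python, using the BOM constants from the standard codecs module (the function name utf16_bom is made up for the example):

```python
import codecs
from typing import Optional

def utf16_bom(data: bytes) -> Optional[str]:
    """Return the UTF-16 flavour announced by a leading BOM, or None.

    Caveat: a UTF-32 little-endian BOM (FF FE 00 00) also starts with
    FF FE, so this check alone cannot tell the two apart.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # b"\xfe\xff"
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # b"\xff\xfe"
        return "utf-16-le"
    return None

if __name__ == "__main__":
    print(utf16_bom(b"\xff\xfeh\x00i\x00"))
    print(utf16_bom(b"no bom here"))
```

A missing BOM proves nothing, since UTF-16 text may legally be stored without one.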
Since you didn't provide an example of what you want to match (and the moderators will kill me for answering a question that doesn't follow the guidelines), can I suggest you check exactly what you need to catch with a regexp?
Dump your .txt file using a Unix/Linux tool like od:
od -c -x file.txt | more
If you see 00 bytes alternating with something else, it is likely to be UTF-16. Otherwise, if you find scattered bytes of 0x80 or above, in pairs, triples or (more rarely) groups of four, it is UTF-8.
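The same eyeball test can be roughly automated. The sketch below is only a heuristic under stated assumptions: the 25% threshold for NUL bytes is an arbitrary choice of mine, guess_encoding is a made-up name, and the UTF-8 check leans on Python's strict decoder rather than on the regex.

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess at the encoding of data, mirroring the od inspection."""
    if not data:
        return "empty"
    # Lots of 00 bytes: typical of UTF-16 text that is mostly Latin letters.
    # The one-in-four threshold is an assumption, not a rule from any standard.
    if data.count(0) * 4 >= len(data):
        return "probably utf-16"
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Python 3's decoder rejects overlong forms and surrogates.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "probably utf-8"

if __name__ == "__main__":
    # "file.txt" is the file name used in the od example above.
    with open("file.txt", "rb") as f:
        print(guess_encoding(f.read()))
```

Treat the answer as a hint only. Legacy 8-bit encodings such as Latin-1 usually end up as "unknown", and UTF-16 text in non-Latin scripts may contain few 00 bytes.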