Off the top of my head (can't test this just now) try
(\p{L}|\p{N})+
That is any Unicode letter (\p{L}) or any Unicode number (\p{N}). There may be some other combination that is closer to \w (alphanumerics), but I would need to look it up. However, given this as a start, I'm sure you can look up what Java can and cannot handle with this style of shortcut (note that Java only handles a subset of the Unicode properties, scripts and blocks that are available in other regex flavors).
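As a rough sketch of how this might look in Java (untested against your data; the class and method names are made up for illustration), the letter-or-number idea can be written as a character class. The second pattern assumes Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag makes \w itself Unicode-aware; that may be the closer equivalent to \w, since it also accepts the underscore.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
public class UnicodeWordDemo {
    // One or more Unicode letters (\p{L}) or Unicode numbers (\p{N}).
    private static final Pattern LETTERS_OR_NUMBERS =
            Pattern.compile("[\\p{L}\\p{N}]+");

    // Assumes Java 7+: with this flag, \w follows the Unicode definition
    // of a word character instead of the ASCII-only [a-zA-Z_0-9].
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isLettersOrNumbers(String s) {
        return LETTERS_OR_NUMBERS.matcher(s).matches();
    }

    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "h\u00e9llo123"; // "héllo123", written with an escape

        System.out.println(isLettersOrNumbers(accented));       // true
        System.out.println(isLettersOrNumbers("foo_bar"));      // false: '_' is not a letter or number
        System.out.println(isUnicodeWord("foo_bar"));           // true: '_' counts as a word character
        System.out.println(Pattern.matches("\\w+", accented));  // false: default \w is ASCII-only
    }
}
```

Note that \p{L} does not cover combining marks, so text that uses decomposed accents may need \p{M} added to the class.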
Susan