> What i was hoping for was some kind of term for one character or symbol and use that as a unit
There is one, kind-of: "grapheme cluster"[0]. This is the "unit" used by UAX29 to define text segmentation, and aliases to "user-perceived character"[1].
Most languages/APIs don't really consider them (although they crop up often in e.g. browser bug trackers), let alone provide first-class access to them. One of the very few APIs which actually acknowledges them is Cocoa's NSString, which has very good Unicode support (probably the best I know of, though Factor may have an even better one[3]); Apple even provides a document explaining grapheme clusters and how they relate to NSString[2]. NSString handles grapheme clusters by providing messages which work on codepoint ranges in an NSString; it doesn't treat clusters as first-class objects.
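To make the gap between codepoints and clusters concrete, here's a rough Python sketch. Big caveat: Python's standard library has no UAX29 segmenter, so `approx_clusters` below is a hypothetical helper of my own that just glues combining marks onto the preceding base character. That is a simplification, not the real grapheme cluster algorithm (it ignores Hangul jamo, emoji ZWJ sequences, regional indicators, conjuncts and so on).

```python
import unicodedata


def approx_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation only, NOT the full UAX29 algorithm.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
namaste = "\u0928\u092e\u0938\u094d\u0924\u0947"  # Devanagari "namaste"

# len() counts codepoints; the helper counts approximate clusters.
print(len(decomposed), len(approx_clusters(decomposed)))
print(len(namaste), len(approx_clusters(namaste)))
```

The decomposed "é" is two codepoints but one cluster under this approximation. For the Devanagari word the helper reports four clusters from six codepoints, and a real UAX29 implementation (or a native reader) may well group it differently, which is rather the point.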
> i guess if you asked a Sanskrit speaker how long a word/sentence was, you'd get the answer..
Indeed.
[0] http://www.unicode.org/glossary/#grapheme_cluster
[1] http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Bounda...
[2] https://developer.apple.com/library/mac/#documentation/Cocoa...
[3] The original implementor documented the whole process of creating Factor's Unicode library, and I learned a lot from it: http://useless-factor.blogspot.be/search/label/unicode