Support splitting strings into Unicode grapheme clusters

### Describe the feature

When working with Unicode, we usually care neither about bytes nor about code points (runes). What we mostly care about is the characters displayed on screen (grapheme clusters). Unicode provides an algorithm for splitting strings into grapheme clusters, which correspond to user-perceived characters. This feature request is about adding grapheme cluster splitting to builtin.



### Use Case

Anyone working with a UI who wants to:
- know how long a string is (in characters displayed on screen)
- know where the cursor is on screen
- use format strings with Unicode strings

Neither bytes nor runes provide this information.

Example:

This text should be right-aligned:
```
examples := [
	'\u006E\u0303',
	'\U0001F3F3\uFE0F\u200D\U0001F308',
	'ห์', 
	'ปีเตอร์'
]

println("0123456789abcdefgh")
for text in examples {
	println("${text:10}")
}
```

```
0123456789abcdefgh
         ñ
    🏳️‍🌈
        ห์
   ปีเตอร์

```

But it isn't.

### Proposed Solution

### Add a feature to split a string into graphemes

```
hello := 'Hello World 🏳️‍🌈'
hello_graphemes := hello.graphemes() // [`H`, `e`, `l`, `l`, `o`, ` `, `W`, `o`, `r`, `l`, `d`, ` `, `🏳️‍🌈`]
```

Current behavior:
```
examples := [
	'\u006E\u0303',
	'\U0001F3F3\uFE0F\u200D\U0001F308',
	'ห์', 
	'ปีเตอร์'
]

for text in examples {
	println("0123456789abcdefgh")
	println(text)
	println(text.runes())
}
```

```
0123456789abcdefgh
ñ
[`n`, `̃`]
0123456789abcdefgh
🏳️‍🌈
[`🏳`, `️`, `‍`, `🌈`]
0123456789abcdefgh
ห์
[`ห`, `์`]
0123456789abcdefgh
ปีเตอร์
[`ป`, `ี`, `เ`, `ต`, `อ`, `ร`, `์`]
```

Proposed behavior:
```
examples := [
	'\u006E\u0303',
	'\U0001F3F3\uFE0F\u200D\U0001F308',
	'ห์', 
	'ปีเตอร์'
]

for text in examples {
	println("0123456789abcdefgh")
	println(text)
	println(text.graphemes())
}
```

```
0123456789abcdefgh
ñ
[`ñ`]
0123456789abcdefgh
🏳️‍🌈
[`🏳️‍🌈`]
0123456789abcdefgh
ห์
[`ห์`]
0123456789abcdefgh
ปีเตอร์
[`ปี`, `เ`, `ต`, `อ`, `ร์`]
```

### Further suggestions

- Consider removing runes, or changing their implementation to use grapheme clusters instead of code points:
    - What is the rationale for having them?
    - In which situations do you want to work with code points rather than grapheme clusters?
- Consider using grapheme clusters for the width calculation in format strings.
- Consider making grapheme clusters a first-class citizen and hiding bytes behind a call.

For example:
```
string[n] ... access n-th grapheme
string.len ... number of graphemes
string.bytes()[n] ... access n-th byte
string.bytes().len ... number of bytes
```



### Other Information

Unicode reference and some more background information:
- [Unicode® Standard Annex #29 Unicode Text Segmentation](http://www.unicode.org/reports/tr29/)
- [[Medium] Dealing with Unicode strings, done right and better.](https://dev.to/cometkim/dealing-with-unicode-string-done-right-and-better-2nei)
- Example C implementation: https://libs.suckless.org/libgrapheme/

This feature would also fix this bug:
- https://github.com/vlang/v/discussions/18650

### Acknowledgements

- [X] I may be able to implement this feature request
- [X] This feature might incur a breaking change

### Version used

0.4.7

### Environment details (OS name and version, etc.)

```
V full version: V 0.4.7 7baff15
OS: linux, "Manjaro Linux"
Processor: 16 cpus, 64bit, little endian, AMD Ryzen 7 7840U w/ Radeon  780M Graphics

getwd: /home/pepper
vexe: /usr/lib/vlang/v
vexe mtime: 2024-08-26 17:34:57

vroot: NOT writable, value: /usr/lib/vlang
VMODULES: OK, value: /home/pepper/.vmodules
VTMP: OK, value: /tmp/v_1000

Git version: git version 2.46.0
Git vroot status: Error: fatal: not a git repository (or any of the parent directories): .git
.git/config present: false

CC version: cc (GCC) 14.2.1 20240805
thirdparty/tcc status: thirdparty-linux-amd64 0134e9b9-dirty
```
> [!NOTE]
> You can use the 👍 reaction to increase the issue's priority for developers.
>
> Please note that only the 👍 reaction to the issue itself counts as a vote.
> Other reactions and those to comments will not be taken into account.
