Many BERT models use `##` as the subword symbol, but not all. Some popular BERT-based models define their own subword symbol. For example, in `e5` the symbol is `▁` (U+2581), which is not the ASCII underscore `_`:

```
>>> a = '▁'
>>> a.encode('utf-8')
b'\xe2\x96\x81'
```
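
Because the two markers behave differently, code that merges subwords back into words has to know which convention is in use. The following is a minimal sketch, not any library's API: the function names and the token lists are hypothetical, and it assumes the `##` marker prefixes continuation pieces (WordPiece style) while `▁` prefixes the first piece of each word (SentencePiece style).

```python
# Hypothetical helpers for illustration; not part of any tokenizer library.
WORDPIECE_PREFIX = "##"        # assumed: marks a continuation piece
SENTENCEPIECE_MARK = "\u2581"  # assumed: marks the start of a new word


def merge_wordpiece(tokens):
    """Merge tokens where '##' marks a piece that continues the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith(WORDPIECE_PREFIX) and words:
            # Continuation: glue onto the previous word, dropping the marker.
            words[-1] += tok[len(WORDPIECE_PREFIX):]
        else:
            words.append(tok)
    return words


def merge_sentencepiece(tokens):
    """Merge tokens where U+2581 marks the first piece of each word."""
    words = []
    for tok in tokens:
        if tok.startswith(SENTENCEPIECE_MARK) or not words:
            # New word: drop the leading marker(s).
            words.append(tok.lstrip(SENTENCEPIECE_MARK))
        else:
            # No marker: this piece continues the previous word.
            words[-1] += tok
    return words


# Hand-written token lists, not real tokenizer output.
wp_words = merge_wordpiece(["token", "##ization", "works"])
sp_words = merge_sentencepiece(["\u2581token", "ization", "\u2581works"])
print(wp_words)
print(sp_words)
```

Note that the logic is inverted between the two: with `##` an unmarked token starts a word, while with `▁` an unmarked token continues one, so checking only for `#` silently misses the second case.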