Thursday, March 29, 2012

Easy way to remove international alphabet from all rows?

Is there an easy way to loop through all rows and remove all
international alphabet characters from a column in a table, for example
remove German umlauts "" and convert them to a simple "u".
Thanks,
lqSee the following thread:
http://groups-beta.google.com/group...b45abaa5cf410b8

Razvan|||OK< thanks on that. How about keeping out these characters in the first
place?|||laurenq uantrell (laurenquantrell@.hotmail.com) writes:
> Is there an easy way to loop through all rows and remove all
> international alphabet characters from a column in a table, for example
> remove German umlauts "" and convert them to a simple "u".

Here is a much smarter version than the one in the linkn Razvan foun

create table T (
s nvarchar(20)
)
insert into T values ('cre')
insert into T values ('pt de fois gras')
insert into T values ('HLNE MELANON')

update T
set s = convert(varchar(20), s) COLLATE Cyrillic_General_CS_AS
go
select * from T
go
drop table T

This assumes that your data only has Latin characters.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||laurenq uantrell (laurenquantrell@.hotmail.com) writes:
> OK< thanks on that. How about keeping out these characters in the first
> place?

I don't know, nuke the parts of the world where we insist on using other
languages than English?

Seriously, I think it would be a very bad idea. If you want to send
mail to these persons, you do want to spell their names properly, don't
you?

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||That's only one of the problems! I want to store the data with the
international characters but then I need to strip out the international
characters for certain output, such as in e-mail addresses...
lq|||Technically, international characters are permitted in email addresses
and domain names so stripping them out is still a bad idea.

You'll have to build your own table to translate characters that you
deem to be unacceptable. Google for some conversion tables but there
isn't really a single answer because it depends on which characters you
need to support (don't forget the two-byte character sets used in the
Far East and elsewhere for example).

--
David Portas
SQL Server MVP
--|||laurenq uantrell (laurenquantrell@.hotmail.com) writes:
> That's only one of the problems! I want to store the data with the
> international characters but then I need to strip out the international
> characters for certain output, such as in e-mail addresses...

I would recomment that you store the e-mail address. While the e-mail
address often can be formed as firstname.lastname@.domain.xxx, this is
not a requirement.

And the local transformation may not be the one you expect. in German
after often replaced with ae, oe and ue. (Whether this is actually common
in mail addresses, I don't know.)

If you are thinking of the comment part of the e-mail address:

John Smith <johnsmith@.example.com
(John Smith is the comment here) You should either encode the comment
according to MIME as per RFC2047, or send as-is. But don't misspell
people's names.

(By the way, permit me to point out that the term "international characters"
is a misnomer. The internationalest characters I can think of is the
English alphabet A-Z. Non-ASCII characters is a better term, and
technically precise.)

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland,
Thanks for that. You're right, I do mean non-ASCII characters...
This gets tricky because many city names and people names contain
non-ASCII characters and I want to display then properly, however, some
systems do not support non-ASCII characters and they need to be
cleansed for output to those systems, so as you point out, there is not
an always correct translation into ASCII characters.|||laurenq uantrell (laurenquantrell@.hotmail.com) writes:
> Thanks for that. You're right, I do mean non-ASCII characters...
> This gets tricky because many city names and people names contain
> non-ASCII characters and I want to display then properly, however, some
> systems do not support non-ASCII characters and they need to be
> cleansed for output to those systems, so as you point out, there is not
> an always correct translation into ASCII characters.

Systems in 2005 that is not capable to handle non-ASCII characters? So,
not all programs are capable to handle Unicode, but one would at least
imagine that they should be able handle 8-bit charset. If I were you,
I would consider to attempt to kill these systems by starvation and
not feed them any data. :-)

On a more serious point, there are one letter that the snippet I
posted does not handle and that is . This gets converted into a ?.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Windows file names is an example on one such system...|||Which Windows are you referring to? FAT32? NTFS does support non-ASCII
characters in files names, at least it does under Win 2003 so I don't
know why it wouldn't under other versions.

--
David Portas
SQL Server MVP
--|||laurenq uantrell (laurenquantrell@.hotmail.com) writes:
> Windows file names is an example on one such system...

The 8.3 names, yes, but you rarely have to see them. Even less rarely
have to form them. As long as you are on NTFS you can use the whole range
of Unicode, and use letters from any script. The FAT files systems may
not support this, but I'm pretty sure that they support the full range
of the current ANSI code page.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.aspsql

No comments:

Post a Comment