Re: [iDNS] Re: draft-jseng-utf5-00.txt
James Seng wrote:
>
> Terry Lambert wrote:
> > The multiple registrations issue is really a non-starter
> > for me, since I could argue that Hiragana registrations
> > of Kanji registrations must be made, even in the Unicode
> > case. I think a fiat that you must use Kanji, or that
> > Kanji and Hiragana machines of the same name are in fact
> > definitionally different names, is fine.
>
> that is true. but *why* make it more complicated. with RFC2047
> domain names, you not only need to duplicate it for hiragana
> and kanji, multiply this by the number of locale encodings,
> then multiply it by Base64, quoted-printable and UUencoded.
>
> so to get one domain name in RFC2047 in Chinese, i have to
> register and manage 2 (simp+trad) x 3 (locale encodings) x 3
> (transfer encodings) = 18 zone files. cool, now lets do the
> sub-delegation of my zone erm.., another permutation of 18x.
> by the 2nd level, it has become ridiculous! some languages
> have more than 10 locale encodings and good luck to those.
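For reference, the arithmetic in the quoted paragraph can be written out as below. This is only a sketch: it assumes the three factors (script variants, locale encodings, transfer encodings) multiply independently, and that each level of sub-delegation multiplies the total by 18 again, which is my reading of "another permutation of 18x". The names are illustrative.

```python
# Factors as given in the quoted text (counts are the quoted author's).
SCRIPT_VARIANTS = 2      # simplified + traditional
LOCALE_ENCODINGS = 3     # per the quoted example
TRANSFER_ENCODINGS = 3   # Base64, quoted-printable, UUencoded


def zone_files(levels: int) -> int:
    """Zone files needed if every delegation level multiplies the count again.

    Assumption: the per-level factor is the plain product of the three
    counts above, applied once per delegation level.
    """
    per_level = SCRIPT_VARIANTS * LOCALE_ENCODINGS * TRANSFER_ENCODINGS
    return per_level ** levels


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, zone_files(level))
```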
OK, here's my rationale:
1) Your requirement for multiple registrations in
support of a unification via Unicode (or some
other mechanism) is a strawman.
The reason this is true is because this particular
problem is wholly artificial, since it can be
trivially resolved by an appeal to locale-based
administrative fiat. That is, I, as Japan, can
demand that all registrations of Japanese host
names registered under .jp take place in ISO-2022-JP.
Similarly, I as China, can demand that they take
place in GB-5.
I think there is a correct argument for Unicode
as a means of code space unification, based on the
shared TLD's, and the desire to have physical
distance between your primary and secondary
name servers; maybe sufficient distance that you
place them in different countries.
For zone transfers between primary and secondary
name servers, I believe there is a need for a
codespace unification.
Discussions about multiple registration requirements
being the reason for unification are bogus; they
serve only to detract from your other arguments.
2) I point at RFC 2047 specifically because of where
you say the problem space lies: mail headers,
specifically the restricted alphabet of RFC 821
(RFC 822 has no such restriction, and the RFC 1034
restrictions are an artifact of implementation, as
later RFC's clarify).
There are no such restrictions on the host names
themselves; merely on the alphabet available in
which to express maildrop names.
If we are talking about satisfying RFC 821 rules
about maildrop naming, then RFC 2047 is appropriate
technology.
The suggestion that one must both "B" and "Q"
encode domains is inappropriate. Specifically,
if we are to try to accommodate legacy systems
(such as ignorant MUA's or resolvers), then we
_must_ use "Q" encoding. Specifically, Base64
encoding is vulnerable to distortion via case
folding, whereas you can over-encode characters
which would be hit by the case folding to protect
them, using quoted-printable.
Note: Since this transformation is suggested
to be accomplished by a resolver
supporting legacy applications, there
is never a situation when the data
actually encoded in the DNS itself
should be other than raw, binary data
in some unified codeset space.
3) You continue to argue that the email address, and
in particular, the portion to the left of the "@"
(at sign), need to somehow be localizable.
With respect, this is an issue best addressed by
(and within) the DRUMS working group.
This issue is not a strawman, per se, but it's
irrelevant to iDNS, and likewise clouds your
arguments.
4) There is already a supported mechanism for putting
a localized name out as a moniker for an email
address.
While I agree that we should be pursuing an iDNS,
it is a very small distance to travel to say that
your domain name, or actual email address, being
limited to a restricted alphabet, is really the
hardship you make it out to be. One could make
the same argument about the use of Arabic numbers
in IP address tuples. At some point, domain names
become abstractions. It doesn't matter that I
use an "English" domain name:
<A HREF="http://www.example.com/">Example</A>
So long as the tag "Example" is in the correct
character set and language. Likewise, your email
address is largely irrelevant:
<a href="mailto:?to=joe@example.com">
Joe User
</a>
This significantly weakens the supposed need for
an encoding mechanism that allows you to put email
addresses themselves into the local language, and
reduces it to a problem of business cards and
advertising media.
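To illustrate the case-folding point in (2) above, here is a minimal sketch. The "Q"-style helpers are hypothetical and are not a full RFC 2047 implementation: they over-encode every byte as =XX, and they assume the decoder accepts lowercase hex digits.

```python
import base64


def q_over_encode(raw: bytes) -> str:
    """Over-encode: write every byte as =XX so no literal letters remain."""
    return "".join(f"={byte:02X}" for byte in raw)


def q_decode(encoded: str) -> bytes:
    """Decode =XX escapes; hex digits are read case-insensitively."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "=":
            out.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(encoded[i]))
            i += 1
    return bytes(out)


raw = "電郵".encode("utf-8")

# Base64: upper and lower case letters are different symbols, so a
# transport that folds case silently changes the decoded bytes.
b64 = base64.b64encode(raw).decode("ascii")
b64_after_folding = base64.b64decode(b64.lower())

# Over-encoded "Q" style: only '=', digits and hex letters appear, and
# hex is case-insensitive, so folding does not change the payload.
q = q_over_encode(raw)
q_after_folding = q_decode(q.lower())

if __name__ == "__main__":
    print(b64, b64_after_folding == raw)
    print(q, q_after_folding == raw)
```

The cost is size: the over-encoded form takes three characters per byte, against roughly four characters per three bytes for Base64.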
> RFC2045-7 are not meant for data processing, and they *never*
> are. no one use MIME for data processing. it is a design as
> a way to encode data content.
It's a method of tunneling data in a non US-ASCII character
set through a transport that has a restricted alphabet.
It seems to me that this does exactly what you are wanting
in order to protect legacy programs, like MUA's.
As a direct side benefit, the use of RFC 2047 encoding
means that RFC 2047 aware MUA's will even do more than
you are asking for: they will, in fact, render the
text in the correct character set, ensuring the intended
visual appearance.
Note: the DNS itself would _not_ store these values encoded,
it would store raw binary data (or should). The encoding
could be done on the way in (and out) of the resolver code,
which is an intermediation between legacy systems and the
fully international iDNS.
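A rough sketch of what I mean by that intermediation follows. Everything in it is hypothetical: the zone table, the function names, and especially the use of plain hex as the legacy-safe encoding, which only stands in for whatever transform is finally chosen.

```python
# Back end: names are stored as raw bytes (UTF-8 here, purely as an
# assumed unified code set); nothing encoded lives in the zone data.
ZONE = {"電郵.example".encode("utf-8"): "192.0.2.1"}


def to_legacy(raw_name: bytes) -> str:
    """On the way out: make each label safe for a restricted alphabet."""
    return ".".join(label.hex() for label in raw_name.split(b"."))


def from_legacy(name: str) -> bytes:
    """On the way in: undo the transform before touching the zone data."""
    return b".".join(bytes.fromhex(label) for label in name.split("."))


def resolve_for_legacy(name: str):
    """Entry point a legacy application would call with an encoded name."""
    return ZONE.get(from_legacy(name))


if __name__ == "__main__":
    stored = next(iter(ZONE))
    external = to_legacy(stored)
    print(external, resolve_for_legacy(external.upper()))
```

Only the two boundary functions know about the encoding, so it can be replaced later without touching the stored data.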
> > Plain old vanilla RFC 821 is very clear:
>
> i think these are from RFC1034. but lets go on...
RFC 821, Section 4.1.2, page 30...
> > only domains. Maildrops are not persons. Locale specific
>
> IMHO, maildrops are person. jseng@pobox.org.sg my email address.
> and it means as much to me as James Seng. I have a chinese name
> and i would like to have a chinese email address. Consider someone,
> say from rural China area, who has *never* study English,
> jseng@pobox.org.sg is totally unintelligible to him but i am quite
> sure an email like 莊振宏@電郵.公司.新加坡 is reasonable to him.
> You may not understand this gibberish, but i am sure he can.
> this is the social reason why we are promoting I18N in DNS and
> the next step, is of cos Email. My Email address means a lot to
> you. To put it frankly, I dont really care if my Chinese Email
> doesnt makes sense to anyone in the English world. For them, I
> always have my English email address. But I *want* to have my
> Chinese Email to communicate with my fellow Chinese. And I am
> sure there are a lot of people who feel strongly about their own
> language to want this to happen.
<a href="mailto:莊振宏 <jseng@pobox.org.sg>">
莊振宏@電郵.公司.新加坡
</a>
What we are talking about here is business cards, advertising
media, and legacy applications. This is a convenience for
humans, and, as such, is of dubious value (see below).
> Do not deny us the right to communicate in our own language for
> the convience of yourself or because you cannot understand it.
This is a bogus argument, based on the assumption that the
email address is going to be exposed in its raw form. If
this were to happen, what you would see would not be:
莊振宏@電郵.公司.新加坡
But instead something like (pseudo-transformed):
A1DG5327@BC5A2.FD1D5F7.8E7245
...a UTF-5 encoded representation of the email address, in
the same US-ASCII subset that you are trying to overcome.
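For a concrete feel of such a transform, here is a sketch of a UTF-5 style encoder as I read the table in section 2.1 of the draft. The assumptions: each code point is split into 4-bit groups, the first group has the high bit of its quintet set (symbols G-V), and the rest are plain hex digits (0-9, A-F). Check the draft before relying on this. The function names are mine.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # one symbol per 5-bit value


def utf5_encode(text: str) -> str:
    """Encode each code point as a lead symbol (G-V) plus hex digits."""
    out = []
    for ch in text:
        nibbles = f"{ord(ch):X}"          # most significant nibble first
        lead = int(nibbles[0], 16) + 16   # set the high bit of the quintet
        out.append(DIGITS[lead] + nibbles[1:])
    return "".join(out)


def utf5_decode(encoded: str) -> str:
    """Inverse of utf5_encode; input is upper-cased, so folding is harmless."""
    chars = []
    value = None
    for symbol in encoded.upper():
        quintet = DIGITS.index(symbol)
        if quintet >= 16:                 # lead symbol starts a new character
            if value is not None:
                chars.append(chr(value))
            value = quintet - 16
        else:
            if value is None:
                raise ValueError("continuation symbol before any lead symbol")
            value = (value << 4) | quintet
    if value is not None:
        chars.append(chr(value))
    return "".join(chars)


def encode_name(name: str) -> str:
    """Encode a dotted name label by label, leaving the dots alone."""
    return ".".join(utf5_encode(label) for label in name.split("."))


if __name__ == "__main__":
    print(encode_name("電郵.公司"))
```

Either way, what comes out is letters and digits from the same restricted alphabet, which is the point being made above.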
> > > > It is not being updated to be cognizant of UTF-* or
> > > > Unicode -- nor should it be. That is the job of MIME.
> > >
> > > And MIME is unfortunately insufficient to provide a complete
> > > I18N solution. L10N yes, but not I18N where you want a unified
> > > code.
> >
> > I think if we are talking about a short term solution, then
> > L10N is sufficient, for the short term.
>
> What makes you think so? I would like to see an working
> implementation of l10n DNS or at least a proposal.
Because, if you are going to do something long-term, then
any stop-gap you come up with will be with us forever.
If the stop-gap doesn't lead naturally into the correct long
term solution, then it's the wrong stop-gap. Any stop-gap
that doesn't dictate a raw, unencoded back end, where legacy
applications are "protected" by encoding (perhaps UTF-5;
perhaps not) done at the resolver, is probably not going to
be successful.
> In case you wonder, we went thru at least 6 prototypes
> of iDNS before we come up with this proposal, including
> the one using RFC2047 domain names.
I appreciate the effort you have invested so far.
> > Can we separate the UTF-5 discussion from the DNS I18N
> > discussion?
>
> This is fine with me. We only bring up I18N DNS and EMail
> because I was asked to justify for Yet-another-UTF. So,
> is there anything suggestions you have for UTF-5?
Some suggestions, then.
1) In the abstract, remove the sentence:
"Example of such systems are the domain name
system and email addresses."
I believe that putting the encoded data into the
DNS itself, rather than making the transformation
at the resolver from a pure binary back end, is a
very bad thing. The DNS data itself should be in
whatever unified code set, going forward.
2) In the fourth paragraph of section 2, an attempt
is made to disclaim dependence on a particular
character set. I believe that the wording
should be strengthened to something similar to
the wording specifically calling out [US-ASCII]
in section 1 of [UTF8].
3) The table in section 2.1 uses the leading bit
for initial but not subsequent values. This
means that there are encoding restart issues
that [UTF8] does not suffer from.
I believe the encoding should be made less
vulnerable to this problem, through a different
sequence of introduction bits.
4) In 2.5, there is a discussion of auto-detection.
This discussion should be removed.
The purpose of this encoding is to ensure that
legacy systems can interoperate with international
data elements.
If it is algorithmically possible to detect data
thus encoded, then this goal has failed to be met.
Note: One method that occurs to me is to modify
the alphabet; specifically, to remove all
vowels. This would be much less error
prone, if you _must_, for some reason,
attempt auto-detection of data format.
5) In section 4, remove item "a", unless you specifically
restrict yourself to discussing possible resolver
behaviour on a legacy system.
Suggesting that other than binary data should be stored
in the iDNS is a blind alley which can trap us with
bulky data and fixed internal encodings forever.
I do not believe that UTF-5 should be applied to
the data actually stored in the iDNS, except as a
transform when externalizing that data to a legacy
application.
6) In section 4, item "b", remove the sentence:
"SMTP mailbox have a very strict check [RFC822]
dues to many potential security risks when using
symbols or special characters in mailbox."
This is a legacy data issue, not a security issue.
7) Reword the remainder of paragraph 1 of 4 "b".
I believe the emphasis on localizing email addresses
themselves, in support of legacy systems, to be
counterproductive.
Specifically, I believe that said systems are more
likely to suffer from buffer overrun issues. Further,
it is not correct to suggest UTF-5 as a "fix" for
issues better addressed by DRUMS.
8) I believe that you should remove paragraph 2 of
4 "b".
9) Paragraph 3 of 4 "b" deserves to be its own point.
10) Remove the reference to "morse code". Most of the
people in this forum appear to find it extravagant.
Suggested replacement examples:
o Caption Vision (closed captioning)
o Teletype
o Telephony
o Teletext
-- Terry Lambert
-- Whistle Communications, Inc.
-- terry@whistle.com
-------------------------------------------------------------------
This is formal notice under California Assembly Bill 1629, enacted
9/26/98 that any UCE sent to my email address will be billed $50
per incident to the legally allowed maximum of $25,000.