
Re: [iDNS] Re: draft-jseng-utf5-00.txt



James Seng wrote:
> 
> Terry Lambert wrote:
> > The multiple registrations issue is really a non-starter
> > for me, since I could argue that Hiragana registrations
> > of Kanji registrations must be made, even in the Unicode
> > case.  I think a fiat that you must use Kanji, or that
> > Kanji and Hiragana machines of the same name are in fact
> > definitionally different names, is fine.
> 
> that is true. but *why* make it more complicated. with RFC2047
> domain names, you not only need to duplicate it for hiragana
> and kanji, multiply this by the number of locale encodings,
> then multiply it by Base64, quoted-printable and UUencoded.
> 
> so to get one domain name in RFC2047 in Chinese, i have to
> register and manage 2 (simp+trad) x 3 (locale encodings) x 3
> (transfer encodings) = 18 zone files. cool, now lets do the
> sub-delegation of my zone erm.., another permutation of 18x.
> by the 2nd level, it has become ridiculous! some languages
> have more than 10 locale encodings and good luck to those.


OK, here's my rationale:

1)	Your requirement for multiple registrations in
	support of a unification via Unicode (or some
	other mechanism) is a strawman.

	The reason this is true is because this particular
	problem is wholly artificial, since it can be
	trivially resolved by an appeal to locale-based
	administrative fiat.  That is, I, as Japan, can
	demand that all registrations of Japanese host
	names registered under .jp take place in ISO-2022-JP.
	Similarly, I as China, can demand that they take
	place in GB 2312.

	I think there is a correct argument for Unicode
	as a means of code space unification, based on the
	shared TLD's, and the desire to have physical
	distance between your primary and secondary
	name servers; maybe sufficient distance that you
	place them in different countries.

	For zone transfers between primary and secondary
	name servers, I believe there is a need for a
	codespace unification.

	Discussions about multiple registration requirements
	being the reason for unification are bogus; they
	serve only to detract from your other arguments.


2)	I point at RFC 2047 specifically because of where
	you say the problem space lies: mail headers,
	specifically the restricted alphabet of RFC 821
	(RFC 822 has no such restriction, and the RFC 1034
	restrictions are an artifact of implementation, as
	later RFC's clarify).

	There are no such restrictions on the host names
	themselves; merely on the alphabet available in
	which to express maildrop names.

	If we are talking about satisfying RFC 821 rules
	about maildrop naming, then RFC 2047 is appropriate
	technology.

	The suggestion that one must both "B" and "Q"
	encode domains is inappropriate.  Specifically,
	if we are to try to accommodate legacy systems
	(such as ignorant MUA's or resolvers), then we
	_must_ use "Q" encoding.  Specifically, Base64
	encoding is vulnerable to distortion via case
	folding, whereas you can over-encode characters
	which would be hit by the case folding to protect
	them, using quoted-printable.

	Note:	Since this transformation is suggested
		to be accomplished by a resolver
		supporting legacy applications, there
		is never a situation when the data
		actually encoded in the DNS itself
		should be other than raw, binary data
		in some unified codeset space.

3)	You continue to argue that the email address, and
	in particular, the portion to the left of the "@"
	(at sign), need to somehow be localizable.

	With respect, this is an issue best addressed by
	(and within) the DRUMS working group.

	This issue is not a strawman, per se, but it's
	irrelevant to iDNS, and likewise clouds your
	arguments.


4)	There is already a supported mechanism for putting
	a localized name out as a moniker for an email
	address.

	While I agree that we should be pursuing an iDNS,
	it is a very long distance to travel to say that
	your domain name, or actual email address, being
	limited to a restricted alphabet, is really the
	hardship you make it out to be.  One could make
	the same argument about the use of Arabic numbers
	in IP address tuples.  At some point, domain names
	become abstractions.  It doesn't matter that I
	use an "English" domain name:

	    <A HREF="http://www.example.com/">Example</A>

	So long as the tag "Example" is in the correct
	character set and language.  Likewise, your email
	address is largely irrelevant:

	    <a href="mailto:?to=joe@example.com">
	    Joe User
	    </a>

	This significantly weakens the supposed need for
	an encoding mechanism that allows you to put email
	addresses themselves into the local language, and
	reduces it to a problem of business cards and
	advertising media.
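
To make the case-folding argument in point 2 concrete, here is a
minimal Python sketch.  It takes the UTF-8 bytes of one label and
compares Base64 with an "over-encoded" quoted-printable form in which
every byte is written as =XX.  The helper names are illustrative
only, and the RFC 2047 =?charset?Q?...?= wrapper is left out for
brevity.

```python
import base64

def q_over_encode(data: bytes) -> str:
    """Write every byte as =XX, so no literal letters are left to fold."""
    return "".join(f"={b:02X}" for b in data)

def q_decode(text: str) -> bytes:
    """Decode the all-hex form; hex digits mean the same in either case."""
    return bytes.fromhex(text.replace("=", ""))

raw = "電郵".encode("utf-8")

# Base64: upper- and lower-case letters are different symbols.
b64 = base64.b64encode(raw).decode("ascii")
b64_folded = b64.lower()          # what a case-folding hop would do
b64_survives = base64.b64decode(b64_folded) == raw

# Over-encoded Q: only '=' and hex digits appear in the output.
q = q_over_encode(raw)
q_survives = q_decode(q.lower()) == raw

print(b64, "->", b64_folded, "| survives folding:", b64_survives)
print(q, "->", q.lower(), "| survives folding:", q_survives)
```

The Base64 text still decodes after folding, but to different bytes,
which is the silent distortion described above; the all-hex form
decodes to the original bytes either way.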


> RFC2045-7 are not meant for data processing, and they *never*
> are. no one use MIME for data processing. it is a design as
> a way to encode data content.

It's a method of tunneling data in a non US-ASCII character
set through a transport that has a restricted alphabet.

It seems to me that this does exactly what you are wanting
in order to protect legacy programs, like MUA's.

As a direct side benefit, the use of RFC 2047 encoding
means that RFC 2047 aware MUA's will even do more than
you are asking for: they will, in fact, render the
text in the correct character set, ensuring the intended
visual appearance.
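
As a rough illustration of that behaviour, the following Python
sketch uses the standard library's email package to put a localized
display name next to an unchanged US-ASCII address, then decodes it
the way an RFC 2047 aware MUA would.  The name and mailbox are the
ones used in this thread; the choice of UTF-8 as the charset is an
assumption made for the example.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

display_name = "莊振宏"
mailbox = "jseng@pobox.org.sg"

# Sender side: the display name becomes an RFC 2047 encoded-word,
# while the address itself stays in the restricted alphabet.
header_value = formataddr((display_name, mailbox), charset="utf-8")

# Receiver side: split the header, then decode the encoded-word
# back into text in its declared character set.
encoded_name, address = parseaddr(header_value)
shown_name = str(make_header(decode_header(encoded_name)))

print(header_value)          # pure US-ASCII on the wire
print(shown_name, address)   # what an aware MUA can display
```

A legacy MUA simply shows the encoded-word as opaque ASCII; nothing
in the transport has to change.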

Note: the DNS itself would _not_ store these values encoded,
it would store raw binary data (or should).  The encoding
could be done on the way in (and out) of the resolver code,
which is an intermediation between legacy systems and the
fully international iDNS.
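
A minimal sketch of that split, in Python.  Everything here is
hypothetical: the zone table, the function names, and in particular
the per-label hex "armor", which merely stands in for whatever ASCII
transform (UTF-5, "Q" encoding, or something else) is finally chosen.
The point is only where the transform sits: at the resolver boundary,
never in the stored data.

```python
# Hypothetical zone data, keyed by raw UTF-8 names (no armor in storage).
ZONE = {"電郵.公司.新加坡".encode("utf-8"): "192.0.2.25"}

def armor(raw_name: bytes) -> str:
    """Resolver -> legacy application: raw labels to ASCII-only labels."""
    return ".".join(label.hex().upper() for label in raw_name.split(b"."))

def unarmor(ascii_name: str) -> bytes:
    """Legacy application -> resolver: ASCII-only labels back to raw bytes."""
    return b".".join(bytes.fromhex(label) for label in ascii_name.split("."))

def legacy_lookup(ascii_name: str):
    """Entry point for applications restricted to the legacy alphabet."""
    return ZONE.get(unarmor(ascii_name))

def native_lookup(name: str):
    """Entry point for internationalized applications: no armor at all."""
    return ZONE.get(name.encode("utf-8"))

raw = "電郵.公司.新加坡".encode("utf-8")
legacy_form = armor(raw)
print(legacy_form)
print(legacy_lookup(legacy_form), native_lookup("電郵.公司.新加坡"))
```

Swapping the placeholder transform for a different one changes only
armor() and unarmor(); the zone data is untouched.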


> > Plain old vanilla RFC 821 is very clear:
> 
> i think these are from RFC1034. but lets go on...

RFC 821, Section 4.1.2, page 30...

> > only domains.  Maildrops are not persons.  Locale specific
> 
> IMHO, maildrops are person. jseng@pobox.org.sg my email address.
> and it means as much to me as James Seng. I have a chinese name
> and i would like to have a chinese email address. Consider someone,
> say from a rural area of China, who has *never* studied English:
> jseng@pobox.org.sg is totally unintelligible to him, but i am quite
> sure an email like 莊振宏@電郵.公司.新加坡 is reasonable to him.
> You may not understand this gibberish, but i am sure he can.
> this is the social reason why we are promoting I18N in DNS and
> the next step, is of cos Email. My Email address means a lot to
> you. To put it frankly, I dont really care if my Chinese Email
> doesnt makes sense to anyone in the English world. For them, I
> always have my English email address. But I *want* to have my
> Chinese Email to communicate with my fellow Chinese. And I am
> sure there are a lot of people who feel strongly about their own
> language to want this to happen.

	<a href="mailto:莊振宏 <jseng@pobox.org.sg>">
	莊振宏@電郵.公司.新加坡
	</a>

What we are talking about here is business cards, advertising
media, and legacy applications.  This is a convenience for
humans, and, as such, is of dubious value (see below).


> Do not deny us the right to communicate in our own language for
> the convenience of yourself or because you cannot understand it.

This is a bogus argument, based on the assumption that the
email address is going to be exposed in its raw form.  If
this were to happen, what you would see would not be:

	莊振宏@電郵.公司.新加坡

But instead something like (pseudo-transformed):

	A1DG5327@BC5A2.FD1D5F7.8E7245

...a UTF-5 encoded representation of the email address, in
the same US-ASCII subset that you are trying to overcome.


> > > > It is not being updated to be cognizant of UTF-* or
> > > > Unicode -- nor should it be.  That is the job of MIME.
> > >
> > > And MIME is unfortunately insufficient to provide a complete
> > > I18N solution. L10N yes, but not I18N where you want a unified
> > > code.
> >
> > I think if we are talking about a short term solution, then
> > L10N is sufficient, for the short term.
> 
> What makes you think so? I would like to see an working
> implementation of l10n DNS or at least a proposal.

Because, if you are going to do something long-term, then
any stop-gap you come up with will be with us forever.

If the stop-gap doesn't lead naturally into the correct long
term solution, then it's the wrong stop-gap.  Any stop-gap
that doesn't dictate a raw, unencoded back end, where legacy
applications are "protected" by encoding (perhaps UTF-5;
perhaps not) done at the resolver, is probably not going to
be successful.


> In case you wonder, we went thru at least 6 prototypes
> of iDNS before we come up with this proposal, including
> the one using RFC2047 domain names.


I appreciate the effort you have invested so far.


> > Can we separate the UTF-5 discussion from the DNS I18N
> > discussion?
> 
> This is fine with me. We only bring up I18N DNS and EMail
> because I was asked to justify for Yet-another-UTF. So,
> is there anything suggestions you have for UTF-5?

Some suggestions, then.

1)	In the abstract, remove the sentence:

	"Example of such systems are the domain name
	 system and email addresses."

	I believe that putting the encoded data into the
	DNS itself, rather than making the transformation
	at the resolver from a pure binary back end, is a
	very bad thing.  The DNS data itself should be in
	whatever unified code set, going forward.

2)	In the fourth paragraph of section 2, an attempt
	is made to disclaim dependence on a particular
	character set.  I believe that the wording
	should be strengthened to something similar to
	the wording specifically calling out [US-ASCII]
	in section 1 of [UTF8].

3)	The table in section 2.1 uses the leading bit
	for initial but not subsequent values.  This
	means that there are encoding restart issues
	that [UTF8] does not suffer from.

	I believe the encoding should be made less
	vulnerable to this problem, through a different
	sequence of introduction bits.

4)	In 2.5, there is a discussion of auto-detection.
	This discussion should be removed.

	The purpose of this encoding is to ensure that
	legacy systems can interoperate with international
	data elements.

	If it is algorithmically possible to detect data
	thus encoded, then this goal has failed to be met.

	Note:	One method that occurs to me is to modify
		the alphabet; specifically, to remove all
		vowels.  This would be much less error
		prone, if you _must_, for some reason,
		attempt auto-detection of data format.

5)	In section 4, remove item "a", unless you specifically
	restrict yourself to discussing possible resolver
	behaviour on a legacy system.

	Suggesting that other than binary data should be stored
	in the iDNS is a blind alley which can trap us with
	bulky data and fixed internal encodings forever.

	I do not believe that UTF-5 should be applied to
	the data actually stored in the iDNS, except as a
	transform when externalizing that data to a legacy
	application.

6)	In section 4, item "b", remove the sentence:

	"SMTP mailbox have a very strict check [RFC822]
	 dues to many potential security risks when using
	 symbols or special characters in mailbox."

	This is a legacy data issue, not a security issue.

7)	Reword the remainder of paragraph 1 of 4 "b".

	I believe the emphasis on localizing email addresses
	themselves, in support of legacy systems, to be
	counterproductive.

	Specifically, I believe that said systems are more
	likely to suffer from buffer overrun issues.  Further,
	it is not correct to suggest UTF-5 as a "fix" for
	issues better addressed by DRUMS.

8)	I believe that you should remove paragraph 2 of
	4 "b".

9)	Paragraph 3 of 4 "b" deserves to be its own point.

10)	Remove the reference to "morse code".  Most of the
	people in this forum appear to find it extravagant.

	Suggested replacement examples:

	o	Caption Vision (closed captioning)
	o	Teletype
	o	Telephony
	o	Teletext
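
To illustrate one possible reading of suggestion 3, here is a small
Python sketch of a UTF-5-like scheme.  The layout is an assumption
made for illustration, not a quotation of the draft's table: each
code point is split into 4-bit quartets, the first quartet is sent
with the high bit set (symbols G-V), and the rest are sent without it
(symbols 0-9, A-F).  The function names are hypothetical.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"   # 32 symbols, 5 bits each

def utf5_like_encode(text: str) -> str:
    out = []
    for ch in text:
        quartets = [int(h, 16) for h in format(ord(ch), "X")]
        out.append(DIGITS[quartets[0] | 0x10])      # lead: high bit set
        out.extend(DIGITS[q] for q in quartets[1:])  # rest: high bit clear
    return "".join(out)

def utf5_like_decode(data: str) -> str:
    chars, value = [], None
    for sym in data:
        v = DIGITS.index(sym)
        if v & 0x10:                      # lead symbol starts a new character
            if value is not None:
                chars.append(chr(value))
            value = v & 0x0F
        elif value is None:
            raise ValueError("stream starts with a non-lead symbol")
        else:
            value = (value << 4) | v
    if value is not None:
        chars.append(chr(value))
    return "".join(chars)

encoded = utf5_like_encode("世界")
print(encoded)                            # KE16N54C
print(utf5_like_decode(encoded))          # round trip
# The lead symbol marks a start but carries no length, so a stream
# cut short still decodes, to a different character, with no error.
print(hex(ord(utf5_like_decode(encoded[:3]))))   # 0x4e1, not 0x4e16
```

In [UTF8] the lead octet states how many octets follow, so the same
truncation is detectable there; under the assumed layout above it is
not.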

-- Terry Lambert
-- Whistle Communications, Inc.
-- terry@whistle.com