Util/Punycode.php

Classes 
Kwf_Util_Punycode

\Kwf_Util_Punycode

author
Matthias Sommerfeld <mso@phlylabs.de>
author
Leonid Kogan <lko@neuse.de>
copyright
2004-2010 phlyLabs Berlin, http://phlylabs.de
version
0.6.9 2010-11-04
Properties
$NP
$_allow_overlong
$_api_encoding
$_base
$_damp
$_encode_german_sz
$_error
$_initial_bias
$_initial_n
$_invalid_ucs
$_lbase
$_lcount
$_max_ucs
$_ncount
$_punycode_prefix
$_sbase
$_scount
$_skew
$_strict_mode
$_tbase
$_tcount
$_tmax
$_tmin
$_vbase
$_vcount
Methods
__construct
_adapt
_apply_cannonical_ordering
_combine
_decode
_decode_digit
_encode
_encode_digit
_error
_get_combining_class
_hangul_compose
_hangul_decompose
_nameprep
_ucs4_string_to_ucs4
_ucs4_to_ucs4_string
_ucs4_to_utf8
_utf8_to_ucs4
decode
encode
encode_uri
get_last_error
set_parameter

Description

Encode/decode Internationalized Domain Names.

The class converts internationalized domain names (see RFC 3490 for details), as used by various registries worldwide, between their original (localized) form and the encoded form used in the DNS (Domain Name System).

The class provides two public methods, encode() and decode(), which do exactly what you would expect. You may pass complete domain names, simple strings, or complete email addresses. That means you can use any of the following notations:

  • www.nörgler.com
  • xn--nrgler-wxa
  • xn--brse-5qa.xn--knrz-1ra.info

Unicode input can be given as a UTF-8 string, a UCS-4 string, or a UCS-4 array. Unicode output is available in the same formats. You can select your preferred format via {@link set_parameter()}.

ACE input and output is always expected to be ASCII.
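
The following is a minimal usage sketch of the round trip described above, not an excerpt from the class itself. The helper name sketch_idn_roundtrip() is hypothetical; it only calls encode(), decode() and get_last_error() as documented on this page, and it assumes the class file has already been loaded.

```php
<?php
// Hypothetical helper: encodes a name to ACE and decodes it back again.
// $idn is expected to be a Kwf_Util_Punycode instance (assumption).
function sketch_idn_roundtrip(object $idn, string $domain): array
{
    $ace = $idn->encode($domain);   // Unicode -> ACE
    $unicode = $idn->decode($ace);  // ACE -> Unicode
    return [
        'ace' => $ace,
        'unicode' => $unicode,
        'last_error' => $idn->get_last_error(),
    ];
}

// Only runs where the class is actually available.
if (class_exists('Kwf_Util_Punycode')) {
    print_r(sketch_idn_roundtrip(new Kwf_Util_Punycode(), "www.n\u{00F6}rgler.com"));
}
```

Passing the converter in as an argument keeps the helper independent of how the class is loaded in a given project.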

Properties

$NP

 $NP = 'array'

Holds all relevant mapping tables. See RFC 3454 for details.

Details

visibility
protected
default
array
final
false
static
false
private
array
since
0.5.2

$_allow_overlong

 $_allow_overlong = 'false'

Details

visibility
protected
default
false
final
false
static
false

$_api_encoding

 $_api_encoding = 'utf8'

Details

visibility
protected
default
utf8
final
false
static
false

$_base

 $_base = '36'

Details

visibility
protected
default
36
final
false
static
false

$_damp

 $_damp = '700'

Details

visibility
protected
default
700
final
false
static
false

$_encode_german_sz

 $_encode_german_sz = 'true'

Details

visibility
protected
default
true
final
false
static
false

$_error

 $_error = 'false'

Details

visibility
protected
default
false
final
false
static
false

$_initial_bias

 $_initial_bias = '72'

Details

visibility
protected
default
72
final
false
static
false

$_initial_n

 $_initial_n = '0x80'

Details

visibility
protected
default
0x80
final
false
static
false

$_invalid_ucs

 $_invalid_ucs = '0x80000000'

Details

visibility
protected
default
0x80000000
final
false
static
false

$_lbase

 $_lbase = '0x1100'

Details

visibility
protected
default
0x1100
final
false
static
false

$_lcount

 $_lcount = '19'

Details

visibility
protected
default
19
final
false
static
false

$_max_ucs

 $_max_ucs = '0x10FFFF'

Details

visibility
protected
default
0x10FFFF
final
false
static
false

$_ncount

 $_ncount = '588'

Details

visibility
protected
default
588
final
false
static
false

$_punycode_prefix

 $_punycode_prefix = 'xn--'

Details

visibility
protected
default
xn--
final
false
static
false

$_sbase

 $_sbase = '0xAC00'

Details

visibility
protected
default
0xAC00
final
false
static
false

$_scount

 $_scount = '11172'

Details

visibility
protected
default
11172
final
false
static
false

$_skew

 $_skew = '38'

Details

visibility
protected
default
38
final
false
static
false

$_strict_mode

 $_strict_mode = 'false'

Details

visibility
protected
default
false
final
false
static
false

$_tbase

 $_tbase = '0x11A7'

Details

visibility
protected
default
0x11A7
final
false
static
false

$_tcount

 $_tcount = '28'

Details

visibility
protected
default
28
final
false
static
false

$_tmax

 $_tmax = '26'

Details

visibility
protected
default
26
final
false
static
false

$_tmin

 $_tmin = '1'

Details

visibility
protected
default
1
final
false
static
false

$_vbase

 $_vbase = '0x1161'

Details

visibility
protected
default
0x1161
final
false
static
false

$_vcount

 $_vcount = '21'

Details

visibility
protected
default
21
final
false
static
false

Methods

__construct

__construct( array $options = false ) : boolean

The constructor.

Arguments
$options
array
Output
boolean
Details
visibility
public
final
false
static
false
since
0.5.2

_adapt

_adapt( int $delta, int $npoints, int $is_first ) : int

Adapt the bias according to the current code point and position

Arguments
$delta
int
$npoints
int
$is_first
int
Output
int
Details
visibility
protected
final
false
static
false
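
As an illustration of what bias adaptation involves, here is a sketch that follows the adapt() pseudocode of RFC 3492 using the default values listed on this page ($_base 36, $_tmin 1, $_tmax 26, $_skew 38, $_damp 700). The function name is hypothetical, and the class's own implementation may differ in detail.

```php
<?php
// Sketch of RFC 3492 bias adaptation; the function name is hypothetical.
function sketch_adapt(int $delta, int $npoints, bool $isFirst): int
{
    $base = 36; $tmin = 1; $tmax = 26; $skew = 38; $damp = 700;

    // Scale delta down: strongly on the first call, by half afterwards.
    $delta = $isFirst ? intdiv($delta, $damp) : intdiv($delta, 2);
    // Compensate for the growing length of the decoded string.
    $delta += intdiv($delta, $npoints);

    $k = 0;
    while ($delta > intdiv(($base - $tmin) * $tmax, 2)) {
        $delta = intdiv($delta, $base - $tmin);
        $k += $base;
    }
    return $k + intdiv(($base - $tmin + 1) * $delta, $delta + $skew);
}
```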

_apply_cannonical_ordering

_apply_cannonical_ordering( array $input ) : array

Applies the canonical ordering to a decomposed UCS4 sequence

Arguments
$input
array
Decomposed UCS4 sequence
Output
array
Ordered UCS4 sequence
Details
visibility
protected
final
false
static
false

_combine

_combine( array $input ) : array

Composes a sequence of starter and non-starter characters

Arguments
$input
array
UCS4 Decomposed sequence
Output
array
Ordered UCS4 sequence
Details
visibility
protected
final
false
static
false

_decode

_decode(  $encoded ) : mixed

The actual decoding algorithm

Arguments
$encoded
string
Output
mixed
Details
visibility
protected
final
false
static
false
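
For orientation, the decoding procedure of RFC 3492 can be sketched as below for a single label without the "xn--" prefix. This is a standalone illustration under the default parameters listed on this page, not the class's code; all function names are hypothetical and the output is an array of code points.

```php
<?php
// Bias adaptation per RFC 3492 (hypothetical name).
function sketch_dec_adapt(int $delta, int $npoints, bool $first): int
{
    $delta = $first ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $npoints);
    for ($k = 0; $delta > 455; $k += 36) { // 455 = ((36 - 1) * 26) div 2
        $delta = intdiv($delta, 35);
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Maps an ASCII code point to its digit value, or 36 if it is no digit.
function sketch_dec_digit(int $cp): int
{
    if ($cp >= 0x30 && $cp <= 0x39) { return $cp - 0x30 + 26; }
    if ($cp >= 0x41 && $cp <= 0x5A) { return $cp - 0x41; }
    if ($cp >= 0x61 && $cp <= 0x7A) { return $cp - 0x61; }
    return 36;
}

// Decodes one Punycode label (without prefix) into code points.
function sketch_punycode_decode(string $label): array
{
    $base = 36; $tmin = 1; $tmax = 26;
    $n = 0x80; $i = 0; $bias = 72;
    $out = [];
    $pos = strrpos($label, '-');
    $basicLen = $pos === false ? 0 : $pos;
    for ($j = 0; $j < $basicLen; $j++) {
        $out[] = ord($label[$j]); // basic code points are copied literally
    }
    $in = $pos === false ? 0 : $pos + 1;
    $len = strlen($label);
    while ($in < $len) {
        $oldi = $i;
        $w = 1;
        for ($k = $base; ; $k += $base) {
            if ($in >= $len) { throw new InvalidArgumentException('Truncated input'); }
            $digit = sketch_dec_digit(ord($label[$in++]));
            if ($digit >= $base) { throw new InvalidArgumentException('Invalid digit'); }
            $i += $digit * $w;
            $t = $k <= $bias ? $tmin : ($k >= $bias + $tmax ? $tmax : $k - $bias);
            if ($digit < $t) { break; }
            $w *= $base - $t;
        }
        $count = count($out) + 1;
        $bias = sketch_dec_adapt($i - $oldi, $count, $oldi === 0);
        $n += intdiv($i, $count);
        $i %= $count;
        array_splice($out, $i, 0, [$n]); // insert code point n at position i
        $i++;
    }
    return $out;
}
```

The sketch omits the overflow checks that RFC 3492 asks for, so it is suited to illustration rather than untrusted input.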

_decode_digit

_decode_digit( int $cp ) : int

Decode a certain digit

Arguments
$cp
int
Output
int
Details
visibility
protected
final
false
static
false
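
A digit decoder in the sense of RFC 3492 maps an ASCII code point to a value from 0 to 35. The sketch below shows one way to write it; the function name and the choice of returning the base (36) for invalid input are assumptions, not taken from the class.

```php
<?php
// Hypothetical sketch: 'a'-'z' or 'A'-'Z' -> 0..25, '0'-'9' -> 26..35.
function sketch_decode_digit(int $cp): int
{
    if ($cp >= 0x30 && $cp <= 0x39) { return $cp - 0x30 + 26; } // digits
    if ($cp >= 0x41 && $cp <= 0x5A) { return $cp - 0x41; }      // upper case
    if ($cp >= 0x61 && $cp <= 0x7A) { return $cp - 0x61; }      // lower case
    return 36; // not a valid Punycode digit
}
```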

_encode

_encode(  $decoded ) : mixed

The actual encoding algorithm

Arguments
$decoded
string
Output
mixed
Details
visibility
protected
final
false
static
false
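
The encoding procedure of RFC 3492 can be sketched as follows for a single label given as an array of code points. It is a standalone illustration under the default parameters listed on this page, not the class's code; function names are hypothetical, and the "xn--" prefix and Nameprep step are left out.

```php
<?php
// Bias adaptation per RFC 3492 (hypothetical name).
function sketch_enc_adapt(int $delta, int $npoints, bool $first): int
{
    $delta = $first ? intdiv($delta, 700) : intdiv($delta, 2);
    $delta += intdiv($delta, $npoints);
    for ($k = 0; $delta > 455; $k += 36) { // 455 = ((36 - 1) * 26) div 2
        $delta = intdiv($delta, 35);
    }
    return $k + intdiv(36 * $delta, $delta + 38);
}

// Digit value 0..25 -> 'a'..'z', 26..35 -> '0'..'9'.
function sketch_enc_digit(int $d): string
{
    return chr($d + 22 + 75 * (int) ($d < 26));
}

// Encodes an array of integer code points into one Punycode label.
function sketch_punycode_encode(array $cps): string
{
    $base = 36; $tmin = 1; $tmax = 26;
    $n = 0x80; $delta = 0; $bias = 72;
    $out = '';
    foreach ($cps as $cp) {
        if ($cp < 0x80) { $out .= chr($cp); } // copy basic code points
    }
    $basic = strlen($out);
    $handled = $basic;
    if ($basic > 0) { $out .= '-'; }
    $total = count($cps);
    while ($handled < $total) {
        // Find the smallest code point not yet handled.
        $m = PHP_INT_MAX;
        foreach ($cps as $cp) {
            if ($cp >= $n && $cp < $m) { $m = $cp; }
        }
        $delta += ($m - $n) * ($handled + 1);
        $n = $m;
        foreach ($cps as $cp) {
            if ($cp < $n) { $delta++; }
            if ($cp === $n) {
                // Emit delta as a variable-length integer.
                $q = $delta;
                for ($k = $base; ; $k += $base) {
                    $t = $k <= $bias ? $tmin : ($k >= $bias + $tmax ? $tmax : $k - $bias);
                    if ($q < $t) { break; }
                    $out .= sketch_enc_digit($t + ($q - $t) % ($base - $t));
                    $q = intdiv($q - $t, $base - $t);
                }
                $out .= sketch_enc_digit($q);
                $bias = sketch_enc_adapt($delta, $handled + 1, $handled === $basic);
                $delta = 0;
                $handled++;
            }
        }
        $delta++;
        $n++;
    }
    return $out;
}
```

With the code points of "nörgler" as input, the sketch yields "nrgler-wxa", which matches the xn--nrgler-wxa notation shown in the description once the prefix is added.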

_encode_digit

_encode_digit( int $d ) : string

Encode a certain digit

Arguments
$d
int
Output
string
Details
visibility
protected
final
false
static
false
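
A digit encoder in the sense of RFC 3492 turns a value from 0 to 35 into one ASCII character. The following sketch uses the lower-case form only; the function name is hypothetical and the class may implement this differently.

```php
<?php
// Hypothetical sketch: 0..25 -> 'a'..'z', 26..35 -> '0'..'9'.
function sketch_encode_digit(int $d): string
{
    // 22 + 75 = 97 ('a') for letters, 26 + 22 = 48 ('0') for digits.
    return chr($d + 22 + 75 * (int) ($d < 26));
}
```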

_error

_error( string $error ) :

Internal error handling method

Arguments
$error
string
Details
visibility
protected
final
false
static
false

_get_combining_class

_get_combining_class( integer $char ) : integer

Returns the combining class of a certain wide char

Arguments
$char
integer
Wide char to check (32bit integer)
Output
integer
Combining class if found, else 0
Details
visibility
protected
final
false
static
false

_hangul_compose

_hangul_compose( array $input ) : array

Composes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

Arguments
$input
array
Decomposed UCS4 sequence
Output
array
UCS4 sequence with syllables composed
Details
visibility
protected
final
false
static
false
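
Hangul composition as described in the referenced Unicode report can be sketched like this, using the constants listed under Properties ($_sbase, $_lbase, $_vbase, $_tbase and the matching counts). The function name is hypothetical and the sketch is not taken from the class.

```php
<?php
// Hypothetical sketch: combines L+V and LV+T jamo pairs into syllables.
function sketch_hangul_compose(array $input): array
{
    $sbase = 0xAC00; $lbase = 0x1100; $vbase = 0x1161; $tbase = 0x11A7;
    $lcount = 19; $vcount = 21; $tcount = 28; $scount = 11172;

    if ($input === []) {
        return [];
    }
    $input = array_values($input);
    $last = $input[0];
    $result = [$last];
    for ($i = 1, $len = count($input); $i < $len; $i++) {
        $char = $input[$i];
        $lindex = $last - $lbase;
        $vindex = $char - $vbase;
        // Leading consonant followed by a vowel -> LV syllable.
        if ($lindex >= 0 && $lindex < $lcount && $vindex >= 0 && $vindex < $vcount) {
            $last = $sbase + ($lindex * $vcount + $vindex) * $tcount;
            $result[count($result) - 1] = $last;
            continue;
        }
        $sindex = $last - $sbase;
        $tindex = $char - $tbase;
        // LV syllable followed by a trailing consonant -> LVT syllable.
        if ($sindex >= 0 && $sindex < $scount && $sindex % $tcount === 0
            && $tindex > 0 && $tindex < $tcount) {
            $last += $tindex;
            $result[count($result) - 1] = $last;
            continue;
        }
        $last = $char;
        $result[] = $char;
    }
    return $result;
}
```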

_hangul_decompose

_hangul_decompose( integer $char ) : array

Decomposes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

Arguments
$char
integer
32bit UCS4 code point
Output
array
Either Hangul Syllable decomposed or original 32bit value as one value array
Details
visibility
protected
final
false
static
false
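
The arithmetic behind Hangul decomposition, as given in the referenced Unicode report, can be sketched with the constants listed under Properties. The function name is hypothetical; non-Hangul input is returned as a one-element array, mirroring the documented output.

```php
<?php
// Hypothetical sketch: splits a precomposed syllable into its jamo.
function sketch_hangul_decompose(int $char): array
{
    $sbase = 0xAC00; $lbase = 0x1100; $vbase = 0x1161; $tbase = 0x11A7;
    $tcount = 28; $ncount = 588; $scount = 11172;

    $sindex = $char - $sbase;
    if ($sindex < 0 || $sindex >= $scount) {
        return [$char]; // not a Hangul syllable
    }
    $result = [
        $lbase + intdiv($sindex, $ncount),           // leading consonant
        $vbase + intdiv($sindex % $ncount, $tcount), // vowel
    ];
    $t = $tbase + $sindex % $tcount;
    if ($t !== $tbase) {
        $result[] = $t; // trailing consonant, if present
    }
    return $result;
}
```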

_nameprep

_nameprep( array $input ) : string

Do Nameprep according to RFC3491 and RFC3454

Arguments
$input
array
Unicode Characters
Output
string
Unicode Characters, Nameprep'd
Details
visibility
protected
final
false
static
false

_ucs4_string_to_ucs4

_ucs4_string_to_ucs4( string $input ) : array

Convert UCS-4 string into UCS-4 array

Arguments
$input
string
Output
array
Details
visibility
protected
final
false
static
false

_ucs4_to_ucs4_string

_ucs4_to_ucs4_string( array $input ) : string

Convert UCS-4 array into UCS-4 string

Arguments
$input
array
Output
string
Details
visibility
protected
final
false
static
false
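
The two UCS-4 conversions above can be pictured with PHP's pack() and unpack(). The sketch below assumes that a UCS-4 string stores each code point as four bytes in big-endian order; this page does not state the byte order, so treat that as an assumption. Both function names are hypothetical.

```php
<?php
// Hypothetical sketch: 4-byte big-endian string -> array of code points.
function sketch_ucs4_string_to_array(string $input): array
{
    if (strlen($input) % 4 !== 0) {
        throw new InvalidArgumentException('Length must be a multiple of 4');
    }
    if ($input === '') {
        return [];
    }
    // unpack() returns a 1-indexed array, so re-index it from 0.
    return array_values(unpack('N*', $input));
}

// Hypothetical sketch: array of code points -> 4-byte big-endian string.
function sketch_ucs4_array_to_string(array $cps): string
{
    if ($cps === []) {
        return '';
    }
    return pack('N*', ...array_values($cps));
}
```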

_ucs4_to_utf8

_ucs4_to_utf8( string $input ) : string

Convert UCS-4 string into UTF-8 string. See _utf8_to_ucs4() for details.

Arguments
$input
string
Output
string
Details
visibility
protected
final
false
static
false
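
As a rough illustration of this direction, the sketch below turns an array of code points into a UTF-8 string. It covers only the one- to four-byte forms (up to 0x10FFFF, the $_max_ucs default) and is not the class's code; the function name is hypothetical.

```php
<?php
// Hypothetical sketch: array of code points -> UTF-8 string.
function sketch_codepoints_to_utf8(array $cps): string
{
    $out = '';
    foreach ($cps as $cp) {
        if ($cp < 0 || $cp > 0x10FFFF) {
            throw new InvalidArgumentException('Code point out of range');
        }
        if ($cp < 0x80) {            // 1 byte, 7 bits
            $out .= chr($cp);
        } elseif ($cp < 0x800) {     // 2 bytes, 11 bits
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {   // 3 bytes, 16 bits
            $out .= chr(0xE0 | ($cp >> 12))
                . chr(0x80 | (($cp >> 6) & 0x3F))
                . chr(0x80 | ($cp & 0x3F));
        } else {                     // 4 bytes, 21 bits
            $out .= chr(0xF0 | ($cp >> 18))
                . chr(0x80 | (($cp >> 12) & 0x3F))
                . chr(0x80 | (($cp >> 6) & 0x3F))
                . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}
```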

_utf8_to_ucs4

_utf8_to_ucs4( string $input ) : string

Converts a UTF-8 encoded string to its UCS-4 representation. By UCS-4 "strings" we mean arrays of 32-bit integers, each representing one of the "chars". This is because PHP cannot handle strings with a bit depth other than 8. The same applies to the reverse method, _ucs4_to_utf8().

The following UTF-8 encodings are supported:

  bytes  bits  representation
  1      7     0xxxxxxx
  2      11    110xxxxx 10xxxxxx
  3      16    1110xxxx 10xxxxxx 10xxxxxx
  4      21    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  5      26    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  6      31    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Each x represents a bit that can be used to store character data. The five- and six-byte sequences are part of Annex D of ISO/IEC 10646-1:2000.

Arguments
$input
string
Output
string
Details
visibility
protected
final
false
static
false
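
The byte patterns described for this method can be decoded along the following lines. This sketch handles only the one- to four-byte forms and does not reject overlong encodings, so it is narrower than what the description above lists; the function name is hypothetical.

```php
<?php
// Hypothetical sketch: UTF-8 string -> array of code points.
function sketch_utf8_to_codepoints(string $input): array
{
    $out = [];
    $len = strlen($input);
    for ($i = 0; $i < $len; $i += $extra + 1) {
        $b = ord($input[$i]);
        // The lead byte determines the payload bits and the number of
        // continuation bytes that follow.
        if ($b < 0x80) {
            $cp = $b; $extra = 0;
        } elseif (($b & 0xE0) === 0xC0) {
            $cp = $b & 0x1F; $extra = 1;
        } elseif (($b & 0xF0) === 0xE0) {
            $cp = $b & 0x0F; $extra = 2;
        } elseif (($b & 0xF8) === 0xF0) {
            $cp = $b & 0x07; $extra = 3;
        } else {
            throw new InvalidArgumentException("Unsupported lead byte at offset $i");
        }
        if ($i + $extra >= $len) {
            throw new InvalidArgumentException('Truncated sequence');
        }
        for ($j = 1; $j <= $extra; $j++) {
            $c = ord($input[$i + $j]);
            if (($c & 0xC0) !== 0x80) {
                throw new InvalidArgumentException('Invalid continuation byte');
            }
            $cp = ($cp << 6) | ($c & 0x3F); // append 6 payload bits
        }
        $out[] = $cp;
    }
    return $out;
}
```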

decode

decode( string $input,  $one_time_encoding = false ) : string

Decode a given ACE domain name

Arguments
$input
string
Domain name (ACE string)
$one_time_encoding
string
Desired output encoding (optional), see {@link set_parameter()}
Output
string
Decoded Domain name (UTF-8 or UCS-4)
Details
visibility
public
final
false
static
false

encode

encode( string $decoded,  $one_time_encoding = false ) : string

Encode a given UTF-8 domain name

Arguments
$decoded
string
Domain name (UTF-8 or UCS-4)
$one_time_encoding
string
Desired input encoding (optional), see {@link set_parameter()}
Output
string
Encoded Domain name (ACE string)
Details
visibility
public
final
false
static
false

encode_uri

encode_uri( string $uri ) : string

Removes a weakness of encode(), which cannot properly handle URIs but instead encodes their path or query components, too.

Arguments
$uri
string
Expects the URI as a UTF-8 (or ASCII) string
Output
string
The URI encoded to Punycode, everything but the host component is left alone
Details
visibility
public
final
false
static
false
since
0.6.4

get_last_error

get_last_error( ) : string

Use this method to get the last error that occurred

Output
string
The last error that occurred
Details
visibility
public
final
false
static
false

set_parameter

set_parameter( mixed $option, string $value = false ) : boolean

Sets a new option value. Available options and values:

  • encoding - Use UTF-8, UCS4 as array, or UCS4 as string as input ('utf8' for UTF-8, 'ucs4_string' and 'ucs4_array' respectively for UCS4). The output is always UTF-8.
  • overlong - Unicode does not allow unnecessarily long encodings of chars. To allow them, set this parameter to true, else to false. Default is false.
  • strict - true: strict mode, good for registration purposes, causes errors on failures. false: loose mode, ideal for "wildlife" applications, silently ignores errors and returns the original input instead.

Arguments
$option
mixed
Parameter to set (string: single parameter; array of Parameter => Value pairs)
$value
string
Value to use (if parameter 1 is a string)
Output
boolean
true on success, false otherwise
Details
visibility
public
final
false
static
false
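
A short sketch of how the array form of set_parameter() might be used follows. The option names 'encoding', 'overlong' and 'strict' and the value 'utf8' come from the description above; the helper name is hypothetical, and the exact value types the class accepts for the boolean options are an assumption.

```php
<?php
// Hypothetical helper: switches a converter into strict UTF-8 mode.
// $idn is expected to be a Kwf_Util_Punycode instance (assumption).
function sketch_configure_strict(object $idn): bool
{
    return (bool) $idn->set_parameter([
        'encoding' => 'utf8', // or 'ucs4_string' / 'ucs4_array'
        'overlong' => false,  // reject unnecessarily long encodings
        'strict'   => true,   // report errors instead of ignoring them
    ]);
}

// Only runs where the class is actually available.
if (class_exists('Kwf_Util_Punycode')) {
    var_dump(sketch_configure_strict(new Kwf_Util_Punycode()));
}
```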
Documentation was generated by DocBlox 0.12.3.