
AJAX and problems with encodings: encoding problems in jQuery Ajax and PHP

AJAX is a technology. One of the commonly used techniques of this technology is to send requests using an XMLHttpRequest class object.

Of course, there are no classes in JavaScript, but for convenience we will use this terminology.

The documentation for XMLHttpRequest states that the browser must support the following HTTP request methods:

GET, POST, HEAD, PUT, DELETE, OPTIONS

In practice, AJAX code sends almost exclusively GET and POST requests through an XMLHttpRequest object.

So, let's look at these two kinds of request.

1) GET requests.

With GET, information can reach the script on the server only through the URL and the headers. The request headers look like this:

Host: moy-rebenok
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://moy-rebenok/ajax.html

On the server, in ajax.php, you can use the construction $_GET["f"] to get the value of the variable f.

Why is there a problem with Russian letters? Because, as you know, Russian letters cannot be used in URLs directly; they have to be conveyed somehow with the Latin letters, digits and symbols that are allowed in a URL after the "?" sign.

People agreed to do this with escape sequences.

Escape sequence of the word "привет" ("hello") in windows-1251 encoding:

%EF%F0%E8%E2%E5%F2

Escape sequence of the word "привет" in UTF-8 encoding:

%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82

Escape sequence of the word "привет" in KOI8-R encoding:

%D0%D2%C9%D7%C5%D4

(The "%" sign, then the hexadecimal code of each byte.)

Thus, you can convey Russian letters, for example, like this:

GET http://moy-rebenok/ajax.php?f=%EF%F0%E8%E2%E5%F2

GET http://moy-rebenok/ajax.php?f=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82

Nobody limits you in this.

By the way, a GET request does not need a Content-Type header, because there is no content: there is only a request for a specific address, and all variables are sent to the server in the URL.

How do you produce the escape sequence in the encoding you need? You can build it by hand or any other way you like, in JavaScript of course. Again, no one limits you.
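
To make "by hand" concrete, here is a minimal sketch of such a home-made escaper for windows-1251. The function names toCp1251Byte and escapeCp1251 are made up for this illustration, and the sketch assumes the text contains only ASCII and the basic Russian alphabet; a real converter would need the full code page table.

```javascript
// Sketch: percent-encode a string as windows-1251 bytes.
// Assumption: only ASCII and the Russian alphabet (А..я, Ё, ё) are handled.
function toCp1251Byte(ch) {
  const code = ch.charCodeAt(0);
  if (code < 0x80) return code;                       // ASCII maps to itself
  if (code >= 0x0410 && code <= 0x044f) {
    return code - 0x0410 + 0xc0;                      // А..я -> 0xC0..0xFF
  }
  if (code === 0x0401) return 0xa8;                   // Ё
  if (code === 0x0451) return 0xb8;                   // ё
  throw new RangeError("No windows-1251 mapping in this sketch for: " + ch);
}

function escapeCp1251(text) {
  let out = "";
  for (const ch of text) {
    const byte = toCp1251Byte(ch);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch;                                      // unreserved ASCII stays readable
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(escapeCp1251("привет"));       // %EF%F0%E8%E2%E5%F2
console.log(encodeURIComponent("привет")); // %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```

The second line of output is the built-in UTF-8 variant, shown only for comparison with the hand-made one.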

But for convenience, people usually use one of three functions that are already built into JavaScript:

a) escape()

b) encodeURI()

c) encodeURIComponent()

In order:

a) escape():

Leaves Latin letters, digits and the symbols @*_+-./ as they are, and encodes everything else either as %xx or as %uxxxx.

In the second case, xxxx is not the UTF-8 bytes of the character but its Unicode code (a UTF-16 code unit).

There is no need to use this function: it is a legacy of the 1990s that survives only for compatibility, and its result can depend on the browser.

In addition, a string in this format is awkward to process on the server, at least quickly.

The JsHttpRequest library, written by a compatriot of ours, uses escape(). Not because the library is bad, but because it is designed to work in all browsers (including the most ancient ones).

b) encodeURI():

Leaves Latin letters, digits and the symbols ;,/?:@&=+$-_.!~*'()# as they are; everything else is encoded as escape sequences in UTF-8.

Standardized in ECMAScript (ECMA-262).

c) encodeURIComponent():

Leaves Latin letters, digits and the symbols -_.!~*'() as they are; everything else is encoded as escape sequences in UTF-8.

Standardized in ECMAScript (ECMA-262).

jQuery and prototype.js use it for requests sent with the GET method.
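
The difference between the three functions is easiest to see on one string. The sketch below runs all of them on a made-up sample (the variable name sample and its contents are just an illustration) that contains both URL delimiters and Cyrillic letters.

```javascript
// Sketch: the same string through the three built-in escaping functions.
const sample = "f=привет&x=1";

// escape(): non-standard %uXXXX notation for non-Latin characters
console.log(escape(sample));
// f%3D%u043F%u0440%u0438%u0432%u0435%u0442%26x%3D1

// encodeURI(): keeps URL delimiters such as = and &, escapes Cyrillic as UTF-8
console.log(encodeURI(sample));
// f=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82&x=1

// encodeURIComponent(): escapes the delimiters too, so it is safe for a single value
console.log(encodeURIComponent(sample));
// f%3D%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82%26x%3D1
```

This is why encodeURIComponent() is applied to each parameter value separately, while encodeURI() only makes sense for a whole URL.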

You may have heard someone say, "XMLHttpRequest only works with UTF-8."
Now you know that this is not entirely true.

When a GET request is used, the encoding of the transmitted data is not declared anywhere at all (!).

I repeat: Content-Type, where a charset could be specified, is not used in GET requests.

But because JavaScript has two convenient functions that turn any string into UTF-8 escape sequences, everyone uses them and works with UTF-8.

This is why jQuery does not even let you specify a charset when sending a request.

This is why Prototype.js still transmits UTF-8 in a GET request even when you specify encoding="windows-1251".

Simply because the code of these libraries uses the encodeURIComponent() function.

Well, there is absolutely nothing wrong with this. All you need to do to work in PHP in your usual encoding is call iconv:

$f = iconv("UTF-8", "windows-1251", $_GET["f"]);

By the way, we can do this precisely because $_GET understands escape sequences. Thanks to the creators of PHP.

That is, when a GET request comes in, PHP looks at the URL and builds the $_GET array for us, and we can do whatever we want with it. But this should be clear enough.
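
PHP hides the percent-decoding step, so it is easy to forget that the bytes behind the escapes still have some encoding. The sketch below, in JavaScript for consistency with the client-side examples, shows what a decoder has to do when it cannot be sure which encoding arrived. The names fromCp1251Byte and decodeParam are hypothetical, only the Russian alphabet is mapped, and the "try UTF-8, fall back to windows-1251" rule is a heuristic assumption, not something the article's PHP code does.

```javascript
// Sketch: decode a percent-encoded value, trying UTF-8 first.
function fromCp1251Byte(byte) {
  if (byte < 0x80) return String.fromCharCode(byte);                    // ASCII
  if (byte >= 0xc0) return String.fromCharCode(byte - 0xc0 + 0x0410);   // А..я
  if (byte === 0xa8) return "\u0401";                                   // Ё
  if (byte === 0xb8) return "\u0451";                                   // ё
  return "\ufffd";                                                      // not mapped in this sketch
}

function decodeParam(raw) {
  try {
    // decodeURIComponent understands UTF-8 only and throws URIError otherwise
    return decodeURIComponent(raw);
  } catch (e) {
    if (!(e instanceof URIError)) throw e;
    // Fallback assumption: every %XX is one windows-1251 byte
    return raw.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
      fromCp1251Byte(parseInt(hex, 16))
    );
  }
}

console.log(decodeParam("%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82")); // привет
console.log(decodeParam("%EF%F0%E8%E2%E5%F2"));                   // привет
```

Guessing like this is fragile, which is one more argument for the scheme described above: the client always sends UTF-8, and the server converts once, explicitly.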

2) POST requests.

This is where things get more interesting.

A request arrives at the server. The PHP handler looks at the Content-Type and, depending on it, fills the $_POST array and/or the $HTTP_RAW_POST_DATA variable. (In PHP 7 and later $HTTP_RAW_POST_DATA no longer exists; the raw body is read from php://input instead.)

It fills $_POST when the Content-Type is multipart/form-data or application/x-www-form-urlencoded.

What kind of Content-Type is that?

A very convenient one. It lets you pass several variables to a PHP script.

What exactly is a POST request?

It is the headers, followed by the content. The content is, generally speaking, arbitrary. That is, just bytes, bytes, bytes.

But from JavaScript you usually need to transfer not just bytes, bytes, bytes, but several pairs key=value, key=value, ...

Like in a GET request.

So people agreed on a convenient type, application/x-www-form-urlencoded.

To pass f=123 and gt=null, you send the content:

f=123&gt=null

Sounds familiar, doesn't it? Of course it does, and it's not for nothing that the type is called x-www-form-urlencoded.

Everything is the same as with a GET request.

And how is content generated in the jQuery and prototype.js libraries?

That's right, using the same encodeURIComponent() function, which means the escape sequences will be in UTF-8 encoding. (Regardless of what encoding you set in prototype.js).
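
As an illustration of that last point, here is a minimal sketch of how such a body can be assembled. The helper name buildFormBody is made up for this example; it is not the actual code of jQuery or prototype.js, only the same idea in a few lines.

```javascript
// Sketch: build an application/x-www-form-urlencoded body from an object.
// Every key and value goes through encodeURIComponent(), so the result is
// always UTF-8 escape sequences, whatever encoding the page itself uses.
function buildFormBody(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])))
    .join("&");
}

console.log(buildFormBody({ f: 123, gt: null }));
// f=123&gt=null
console.log(buildFormBody({ f: "привет" }));
// f=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```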

That's it. One more possibility remains: you can transmit not x-www-form-urlencoded content (i.e. not parameters) but ordinary text or binary content, which can then be read via $HTTP_RAW_POST_DATA.

To do this, set the Content-Type to text/xml or application/octet-stream, and set charset="windows-1251" there.

We pass a string in the required encoding to the send() function. (Prototype.js wraps this call in new Ajax.Request(...).)

And then what? The XMLHttpRequest object converts this string to UTF-8, no matter what encoding it was in. This is what the W3C documentation says, and this is what it really does.
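
The reason is that a JavaScript string has no byte encoding of its own; bytes appear only at the moment the string is serialized. The sketch below does not use XMLHttpRequest at all. It uses the standard TextEncoder, which performs the same kind of UTF-8 serialization, to show which bytes a string turns into.

```javascript
// Sketch: the bytes a JavaScript string becomes when it is serialized as UTF-8.
const bytes = new TextEncoder().encode("привет");

console.log(bytes.length); // 12 (two bytes per Cyrillic letter)
console.log(Array.from(bytes, (b) => b.toString(16).toUpperCase()).join(" "));
// D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82
```

These are the same bytes as in the UTF-8 escape sequence of this word, so a charset="windows-1251" label on such a body would simply be wrong.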

Conclusions:

2. You can transmit strings as if “in any other encodings,” if non-Latin characters are escaped.

3. There are 3 functions in JavaScript that escape non-Latin characters:

escape(), encodeURI() and encodeURIComponent().

The first produces a crooked Unicode notation of its own (%uXXXX). The other two produce UTF-8.

You can write your own functions that generate escape sequences in any encoding. It's possible, but not necessary. On the contrary, we should be glad that functions exist that convert any text to UTF-8. This is an extremely wonderful fact. The scheme in which all xhtml pages are in windows-1251, ajax responses from the server come in windows-1251, and ajax requests from the client go out in UTF-8 is perfectly acceptable and is used on most sites.

Just remember to use iconv as described above. And for the server to send JSON (or whatever you have) to JavaScript in the correct encoding (i.e. the same encoding in which all xhtml pages are sent), simply write this header at the beginning of your ajax.php:

header("Content-type: text/html; charset=windows-1251");

And everything will be ok.

Finally, a little subjective opinion:

Use jQuery, love people, give gifts.

Many developers have run into encoding problems with jQuery Ajax and PHP. Let's look at why they occur.

I hope, a priori, that no one serves a page in one encoding on the server and another on the client...

1. Recode all pages to utf-8: this encoding is multilingual, and you will not have problems with it in the future.

2. If you pass data to jQuery Ajax with the GET method, you may have a problem transmitting Cyrillic (Russian) text. Why? Because IE, for example, does not transmit the data in utf-8. The fix is to use encodeURIComponent:

$.ajax({
  dataType: "html",
  type: "GET",
  url: "ajax.php",
  // encode the value explicitly so that every browser sends UTF-8 escape sequences
  data: "query=" + encodeURIComponent("русский текст"),
  success: function (data) {
    alert(data);
  }
});

Now the Russian text arrives intact rather than as, say, question marks.

3. For reference, the correct spelling is utf-8 (write it this way everywhere!), not utf8; windows-1251, not windows1251; and so on. A wrong spelling can also cause problems, for example in IE, which is sensitive to the correct spelling. Still, to this day...

The PHP side of Ajax data processing

4. Don't forget to state in a header the encoding in which the client should receive the data. It is sent at the very beginning of the document.

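// at the very beginning of the PHP page, before any output:
// declares the encoding of the response this script sends
header("Content-Type: text/plain; charset=utf-8");

// manually recode the received data from UTF-8 to windows-1251
$name = iconv("UTF-8", "windows-1251", $_GET["name"]);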

5. If you write data to a database, do not forget to tell the database server which encoding the data arrives in, which encoding the results should come back in, and how to store it:

// note: MySQL's own name for this encoding is utf8 (no hyphen), unlike the HTTP charset name utf-8
mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $db);

If you have problems, write in the comments. We'll figure it out :)

For a year and a half now, a post about how artificial the encoding problems of so-called AJAX are has been gathering dust in my drafts.
Every time questions of this kind popped up on the forums, I wanted to give a link to it; every time the blog saw a surge of visits for the queries "encoding, ajax, problem," I wanted to publish it, but it always seemed that the post was not finished yet and needed a little more...
But just today a surprisingly similar post appeared: ajax, cp1251. Similar in content, but completely opposite in meaning.
So I decided to delete my draft and tell my "truth" in the form of criticism of fxposter's advice.

"It's no secret that the default encoding for data received via Ajax is UTF-8."

It actually is a secret. A secret for many. And many do not understand why this is so.
JavaScript strings (and regular expressions) are Unicode internally, whatever encoding the page itself uses.
Hence the so-called "problem": if the encoding is not specified explicitly and a non-Latin alphabet is used, the response is interpreted as a utf-8 sequence.

Update 29.11: fresh air and David Mzareulyan have cooled my ardor, so I hasten to clarify what exactly is discussed below.
So: you have a site in a single-byte encoding (no need for a fortune teller, it will be windows-1251), and you are eager to master a new buzzword called AJAX. After reading a little, you take your first timid steps in that direction, immediately step on a rake, and then, having caught your breath, rush to the forums crying for help. And help you will get: convert your site to utf-8, they say... Of course, of course, you reply, and off you go to redo it...
I want to warn you against such rash steps.

The standard solution, which everyone is eager to recommend, is "use utf-8 and there are no problems."

And the advisers are right: there really won't be any problems.

The traffic will simply double: every Cyrillic letter takes two bytes in UTF-8 instead of one in windows-1251. Same data, same result, but "twice" the traffic. Yeah?
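
How close to "twice" the real figure is depends on how much of the response is Cyrillic. The sketch below compares the two sizes for a made-up sample; the helper names utf8Size and singleByteSize and the sample strings are assumptions for this illustration, and the single-byte size simply assumes that every character exists in the code page.

```javascript
// Sketch: size of the same text in UTF-8 and in a single-byte encoding.
function utf8Size(text) {
  return new TextEncoder().encode(text).length;
}

// Assumption: every character of the text fits into one byte of the code page.
function singleByteSize(text) {
  return Array.from(text).length;
}

const russian = "привет";
const markup = '<a href="/news">привет</a>';

console.log(utf8Size(russian), singleByteSize(russian)); // 12 6
console.log(utf8Size(markup), singleByteSize(markup));
```

Pure Cyrillic text does double, while markup, digits and Latin letters stay at one byte each, so the overhead of a typical response is somewhere between zero and a factor of two.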

What are you saying about the powder?!?

If this factor seems insignificant to you, you can stop reading here and start converting your project to UTF-8.
For everyone else, I'll leave a few recipes that help avoid problems with single-byte encodings in so-called AJAX applications:

  • First, and most importantly, ALWAYS specify the content encoding. Any server response with text content must carry the header Content-Type: your/type; charset=your-charset.
    The cheapest way to do this is in the server configuration (for example, in PHP via default_charset).
  • Specify the charset when including JavaScript in the document (the charset attribute of the script tag).
  • Specify the CORRECT charset. fxposter's advice reads:

    "having previously set the appropriate header: Content-Type: text/html; charset=cp1251"

    In this particular case fxposter is his own worst enemy. The rule is:

    "Any registered IANA charset may be used, but UTF-8 is preferred."

    And no charset named cp1251 is registered there; the registered name is windows-1251.

To complete the picture, here are a couple of problem areas you will have to deal with:

  • Don't let AJAX responses that contain non-Latin characters stay in the browser cache (on 304 Not Modified the response is taken from the cache, but "some browsers" then assume utf-8 as the charset).
  • Authors of various JSON-encoding libraries (json_encode and the like) rely on the UTF-8 rule without reservation, but for browsers (as we found out earlier) the main thing is that the encoding is specified, and then everything falls into place.
    Hence the "problem": you need to encode data into JSON manually, because the common library functions expect utf-8 as input.
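
One possible way to sidestep the JSON issue, shown here only as a sketch, is to emit JSON that contains nothing but ASCII: every non-ASCII character is written as a \uXXXX escape, so the bytes are identical in windows-1251, UTF-8 or any other ASCII-compatible encoding. The function name toAsciiJson is made up for this example, and the sketch is in JavaScript rather than PHP; the idea carries over to any server language.

```javascript
// Sketch: JSON output that is pure ASCII and therefore encoding-independent.
function toAsciiJson(value) {
  // Replace every non-ASCII UTF-16 code unit with its \uXXXX escape
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const json = toAsciiJson({ greeting: "привет" });

console.log(json);                      // {"greeting":"\u043f\u0440\u0438\u0432\u0435\u0442"}
console.log(JSON.parse(json).greeting); // привет
```

The price is size: each Cyrillic letter becomes six ASCII characters, so this trades traffic for robustness.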

I expect the moral of this story from you in the comments.

AJAX and encoding issues


Developers very often run into encoding problems, especially those who work in windows-1251. Today I want to examine this problem, with examples and possible solutions.

AJAX

AJAX (Asynchronous JavaScript and XML) has become very popular, and it is difficult to imagine a modern website that does not use it. Essentially, Ajax is a background data exchange that lets you receive data without reloading the page: all kinds of "live searches", registrations, feedback forms, and so on.

Through AJAX we can transfer data using the POST and GET methods. Let's figure out what problems there may be in transmitting this data.

Let's start with GET.

When we transmit data via GET, we send the script a URL in which the Russian text has to be encoded as a certain sequence. It is called an escape sequence.

For example:

ajax.php?query=%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9+%D1%82%D0%B5%D0%BA%D1%81%D1%82

In this GET request, query carries the phrase "русский текст" ("Russian text"). But escape sequences differ depending on the encoding used. To convert Russian text into an escape sequence, the recommended tool is the encodeURIComponent() function, which converts the text to utf-8 and builds the escape sequence. Therefore, when text is sent through jQuery, Prototype and other frameworks, it arrives in UTF-8. If you work in windows-1251, you first have to convert the text from utf-8 to windows-1251 (this can be done with iconv, for example: $_GET["query"] = iconv("utf-8", "windows-1251", $_GET["query"])).
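
One detail of the sample URL is worth a sketch: the space is written there as "+", which is the form-encoding convention, while encodeURIComponent() writes it as %20. The comparison below uses the standard URLSearchParams class for the form-encoding variant; the variable names are made up for this illustration.

```javascript
// Sketch: two standard ways to build the same query string.
const text = "русский текст";

// encodeURIComponent() writes a space as %20
const viaComponent = "query=" + encodeURIComponent(text);
// URLSearchParams follows the form-encoding rules and writes a space as +
const viaParams = new URLSearchParams({ query: text }).toString();

console.log(viaComponent);
// query=%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9%20%D1%82%D0%B5%D0%BA%D1%81%D1%82
console.log(viaParams);
// query=%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9+%D1%82%D0%B5%D0%BA%D1%81%D1%82
```

Both variants carry the Cyrillic letters as the same UTF-8 escape sequences; only the representation of the space differs.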

We're done with the theory; now for the examples. There is one interesting nuance I discovered. I was working with jQuery; the site is in utf-8, the ajax request handler is in utf-8, the database is in utf-8; in short, every part of the site uses this encoding. Relying on the fact that jQuery sends requests through encodeURIComponent(), I did not call it myself. And there were no problems, in principle, until requests full of "krakozyabry" (garbled characters) began to arrive: Firefox, Chrome and Opera correctly sent requests in utf-8, but Internet Explorer, being the most outstanding browser, managed to send them in windows-1251.

I decided to run a test and find out in which situations IE sends the wrong encoding.

There are 2 scripts. The first passes the data as an object:

$.get("ajax.php", { "query": "русский текст" }, function (data) {
  alert(data);
});

The second passes the data as a ready-made string:

$.ajax({
  dataType: "html",
  type: "GET",
  url: "ajax.php",
  data: "query=русский текст",
  success: function (data) {
    alert(data);
  }
});

After checking both, I found that with the second script ($.ajax with the data passed as a string) IE sends windows-1251, not UTF-8. The solution is to add encodeURIComponent(), and everything will be fine:

$.ajax({
  dataType: "html",
  type: "GET",
  url: "ajax.php",
  data: "query=" + encodeURIComponent("русский текст"),
  success: function (data) {
    alert(data);
  }
});

Okay, we've sorted out GET requests.

Now let's take a quick look at POST.

Unlike GET requests, POST carries a Content-Type, which tells the server script what kind of data it is dealing with and lets you specify the encoding. For example, jQuery AJAX sends "application/x-www-form-urlencoded; charset=UTF-8" by default, but even if you specify "text/html; charset=windows-1251", the incoming data will still be in utf-8, because jQuery builds the escape sequences with the function described above.

But this is not so scary, because we can always convert the encoding and then work with the data. We can also return the results in our favorite encoding; the main thing is not to forget to send a header stating the encoding if you suddenly get "krakozyabry" back.

For example:

header("Content-type: text/html; charset=windows-1251");

header("Content-type: text/html; charset=utf-8");

Conclusion: to avoid encoding problems when transmitting data via AJAX, use encodeURIComponent(). If the server script that receives the requests works in an encoding other than utf-8, use the PHP function iconv and send the appropriate header.



December 17, 2007 at 07:59 pm Let’s figure it out once and for all: AJAX, “Cyrillic characters,” encodings, prototype.js, jQuery, JsHttpRequest
  • Website development


How to send and receive AJAX requests in the encoding we need, do we need to use single-byte encodings or can we do without UTF-8. This article will answer all these questions once and for all.
