UTF-8 MySQL Tomcat Struts
Yeah, I admit it. I have been neglecting you, faithful reader. But you know me, I always have a month or so each year when my updates are few and infrequent. And if you, based on the heading on this entry, assume that the first entry in a week will be full of strange three letter abbreviations, computer programming jargon and other information interesting to very few people but computer nerds like me, you’re absolutely right!
So this entry will not be very interesting for the majority of you, but some of you will love me for it. It will save you from hours of pulling your hair, swearing and coffee drinking at the office - time you should have used at home in front of your Xbox instead. Without further ado, I hereby bring you:
How to display UTF-8 characters in Tomcat using MySQL as the data source! There’s even a little something extra for you if you’re using Struts! I did this with MySQL 4.1, Tomcat 5.5 and Struts 1.2.
- Make sure your MySQL database is using the UTF-8 character set. For some reason, the default character set is Latin1. I think the argument for that is that databases using UTF-8 are larger, because UTF-8 uses more bytes than Latin1 to store non-ASCII characters. Screw that, use UTF-8 for all it’s worth. I used MySQL-Front to convert the database, changing the character set in the database, table and column properties to UTF-8. There probably is a much smoother way to do this, but it worked for me. As a side note, I have to say that I recommend using MySQL-Front for very little else than converting the database. It’s too bug-ridden for anything more.
- Tell the JDBC connector that it has to talk to the database using UTF-8. To do this add
useUnicode=true&characterEncoding=UTF-8
to the connection URL in the JDBC configuration. Your connection URL will look something like this:
jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8
If the URL lives in an XML file, such as a Tomcat context file, the & has to be written as &amp; there.
- The next step is to make sure the browser knows that what it’s receiving is actually UTF-8. In your JSP files add
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
at the top. In your Servlets, make sure the right HTTP headers are sent back to the client by adding the line
response.setContentType("text/html;charset=UTF-8");
Of course, you’ll have to use another MIME type than text/html if you’re not going to display HTML.
- After that, you’ll have to tell Java that you’re using UTF-8 by configuring the Java options. Add the parameter
-Dfile.encoding=UTF-8
to your Java options, either in the catalina.bat file or by clicking on the Java tab in the Monitor Tomcat program.
- The fifth thing you’ll have to do - but as far as I know, you’ll only have to do this if you’re using Struts to handle web forms - is to make sure all input from the client is converted to UTF-8. This is done with a little bit of coding and a configuration change. First, create a class containing the following code:
package filters;

import java.io.IOException;
import javax.servlet.*;

// Forces every incoming request to be decoded as UTF-8.
public class UTF8Filter implements Filter {

    public void init(FilterConfig filterConfig) throws ServletException {}

    public void doFilter(
            ServletRequest request,
            ServletResponse response,
            FilterChain chain)
            throws IOException, ServletException {
        // Has to be set before any request parameter is read.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {}
}
When that is done, you’ll have to configure the filter in your web.xml:
<filter>
    <filter-name>UTF8Filter</filter-name>
    <filter-class>filters.UTF8Filter</filter-class>
</filter>
<filter-mapping>
    <filter-name>UTF8Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
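Why bother with the filter at all? As far as I know, a servlet container falls back to ISO-8859-1 when the request doesn’t say which encoding it uses, so UTF-8 form data gets decoded with the wrong character set. Here is a minimal sketch of that mismatch using nothing but the standard library. The class name EncodingMismatchDemo and the sample word are made up for illustration, and the sketch assumes a JDK with java.nio.charset.StandardCharsets (Java 7 or later), which is newer than what I ran under Tomcat 5.5.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the web application itself.
class EncodingMismatchDemo {

    // Decodes the UTF-8 bytes of the text as ISO-8859-1,
    // which is the mismatch the filter is meant to prevent.
    static String decodeAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8, which is what the filter asks for.
    static String decodeAsUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "blåbær" written with Unicode escapes, so the source file's own
        // encoding cannot interfere with the demonstration.
        String submitted = "bl\u00e5b\u00e6r";

        // Each of the two non-ASCII letters turns into two garbage characters.
        System.out.println(decodeAsLatin1(submitted).length()); // prints 8
        System.out.println(decodeAsUtf8(submitted).length());   // prints 6
        System.out.println(decodeAsUtf8(submitted).equals(submitted)); // prints true
    }
}
```

The wrongly decoded string is the familiar "Ã¥" style garbage you see in forms and database rows when one of the steps is missing.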
If I’ve remembered everything you now have a UTF-8 compliant web application. Yay!
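On the storage argument from the first step, here is a rough sketch that simply counts bytes for a few sample strings in both encodings. The class name ByteCountDemo and the sample strings are my own, picked for illustration only, and the code assumes Java 7 or later for StandardCharsets. It says nothing about how MySQL actually lays out its columns on disk.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing encoded sizes.
class ByteCountDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as Latin1 (ISO-8859-1).
    static int latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String ascii = "Tomcat";             // plain ASCII
        String norwegian = "bl\u00e5b\u00e6r"; // "blåbær", two non-ASCII letters
        String euro = "\u20ac";              // euro sign, not representable in Latin1

        System.out.println(utf8Bytes(ascii) + " vs " + latin1Bytes(ascii));         // prints 6 vs 6
        System.out.println(utf8Bytes(norwegian) + " vs " + latin1Bytes(norwegian)); // prints 8 vs 6
        System.out.println(utf8Bytes(euro));                                        // prints 3
    }
}
```

So pure ASCII text costs nothing extra, and the typical Norwegian letter costs one extra byte. The euro sign takes three bytes in UTF-8, but Latin1 cannot store it at all.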
Thanks a million to the following sources:
Struts, UTF-8 and form submissions : Jiploo.com
Join me later, when I tell you the secret of how to internationalise your Struts application to display UTF-8 characters from the Application_xx.properties files. That was a true PITA to figure out, too.
Feedback
Do you have any thoughts you want to share? A question, maybe? Or is something in this post just plainly wrong? Then please send an e-mail to vegard at vegard dot net with your input. You can also use any of the other points of contact listed on the About page.
that was useful.. but why would I ever wanna use Tomcat??? talk ‘bout turning to the dark side! .Net might be a foul place and Gates the devil himself.. but Tomcat, that’s true HELL.. :)
Oh man, I can’t WAIT to hear how to internationalise my Strut! :)
I’m glad people like you think about these things, V, so I don’t have to. Instead, I can figure out what the wingflaps of a bird made of liquid sound like…
so…
can all this strutting be useful if you are making and marketing internet porn? is there a potential for making more money?
this is purely hypothetical, of course.
:-D
Actually UTF-8 doesn’t use more space than Latin1 IF you stay within the first 128 ASCII characters (code points 0-127). After that it uses 2, 3 or 4 bytes depending on where in the Unicode map the character sits. /geek off
Small update to the last post (where I forgot to enter my name). UTF-8 as originally designed uses 1-6 bytes to represent Unicode characters; since RFC 3629 it is limited to 1-4.
Roar: I certainly learn something new every day.
Klas: Sure it can be used for porn. All technology can be used for porn. In fact, porn drives technology.
Kristoffer: That’s why I wrote a new post today - just for you!
ju9||: You’ll probably burn in hell.
"Thanks a million to the following sources:
Struts, UTF-8 and form submissions : Jiploo.com"
You’re welcome.
Oh my god how much simpler this would have been if every piece of software could just stop with their incredibly stupid pseudo-arguments for not handling text as UTF-8 by default and just do it. Software not handling text as UTF-8 is by default broken, imho.
My god, thank you for this. I was missing one of the steps. I have not found another place that lists all the steps required like this. Thank you!
thanks for this nice post. I tried these steps for creating a Unicode-compatible website, but the problem is the database was not changed at all and it remained latin1 somehow!