UTF-8 MySQL Tomcat Struts

Yeah, I admit it. I have been neglecting you, faithful reader. But you know me, I always have a month or so each year when my updates are few and far between. And if you, based on the heading of this entry, assume that the first entry in a week will be full of strange three-letter abbreviations, computer programming jargon and other information interesting to very few people except computer nerds like me, you’re absolutely right!

So this entry will not be very interesting for the majority of you, but some of you will love me for it. It will save you from hours of pulling your hair, swearing and coffee drinking at the office - time you should have used at home in front of your Xbox instead. Without further ado, I hereby bring you:

How to display UTF-8 characters in Tomcat using MySQL as the data source! There’s even a little something extra for you if you’re using Struts! I did this with MySQL 4.1, Tomcat 5.5 and Struts 1.2.

  1. Make sure your MySQL database is using the UTF-8 character set (MySQL calls it utf8). For some reason, the default character set is Latin1. I think the argument for that is that databases using UTF-8 are larger because UTF-8 uses more bytes than Latin1 to store some characters. Screw that, use UTF-8 for all it’s worth. I used MySQL-Front to convert the database: change the character set in the database, table and column properties to UTF-8. There probably is a much smoother way to do this, but it worked for me. As a side note I have to say that I recommend using MySQL-Front for very little else than converting the database. It’s too bug-ridden for anything more.
  2. Tell the JDBC connector that it has to talk to the database using UTF-8. To do this add useUnicode=true&characterEncoding=UTF-8 to the connection URL in the JDBC configuration. Your connection URL will look something like this: jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8.
  3. The next step is to make sure the browser knows that what it’s receiving is actually UTF-8. In your JSP files add <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %> at the top. In your Servlets, make sure the right HTTP headers are sent back to the client by adding the line response.setContentType("text/html;charset=UTF-8"); before you call response.getWriter(). Of course, you’ll have to use a MIME type other than text/html if you’re not going to display HTML.
  4. After that, you’ll have to tell Java that you’re using UTF-8 by configuring the Java options. Add the parameter -Dfile.encoding=UTF-8 to your Java options, either in the catalina.bat file or by clicking on the Java tab in the Monitor Tomcat program.
  5. The fifth thing you’ll have to do - but as far as I know, you’ll only have to do this if you’re using Struts to handle web forms - is to make sure all input from the client is converted to UTF-8. This is done with a little bit of coding and a configuration change. First, create a class containing the following code:
package filters;

import java.io.IOException;
import javax.servlet.*;

public class UTF8Filter implements Filter {

  public void destroy() {}

  public void doFilter(
      ServletRequest request,
      ServletResponse response,
      FilterChain chain)
        throws IOException, ServletException {
    // Must run before anything reads the request parameters,
    // otherwise the container has already decoded them.
    request.setCharacterEncoding("UTF-8");
    chain.doFilter(request, response);
  }

  public void init(FilterConfig filterConfig) throws ServletException {}
}

When that is done, you’ll have to configure the filter in your web.xml:

<filter>
  <filter-name>UTF8Filter</filter-name>
  <filter-class>filters.UTF8Filter</filter-class>
</filter>
<filter-mapping>
  <filter-name>UTF8Filter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
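
If you wonder what the filter actually saves you from, here is a rough sketch of the problem. It doesn't touch Tomcat at all; it just mimics a browser sending UTF-8 bytes and a server decoding them as ISO-8859-1, which is what a servlet container falls back to when nobody has set the request encoding. The class and method names are made up for this illustration, and it assumes a newer JDK than the one I used back then (java.nio.charset.StandardCharsets arrived in Java 7).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the web application.
class EncodingMismatchDemo {

    // The bytes a browser sends for a form field on a UTF-8 page.
    static byte[] asSentByBrowser(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // What the server gets when it decodes those bytes as ISO-8859-1,
    // i.e. when no request encoding has been set.
    static String decodedWithoutFilter(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // What the server gets when it decodes the same bytes as UTF-8,
    // which is the effect of request.setCharacterEncoding("UTF-8").
    static String decodedWithFilter(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A word with three non-ASCII letters, written with escapes so the
        // source file's own encoding doesn't matter.
        String typed = "Bl\u00e5b\u00e6rsyltet\u00f8y";
        byte[] raw = asSentByBrowser(typed);

        // Print comparisons rather than the strings themselves, since the
        // console encoding is yet another thing that can go wrong.
        System.out.println(decodedWithoutFilter(raw).equals(typed)); // false
        System.out.println(decodedWithFilter(raw).equals(typed));    // true
    }
}
```

Each non-ASCII letter takes two bytes in UTF-8, and decoding them as ISO-8859-1 turns every one of those bytes into its own character. That’s where the familiar two-character garbage in form input comes from.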

If I’ve remembered everything you now have a UTF-8 compliant web application. Yay!
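
If the database still hands you question marks, the connection URL from step 2 is a good place to look first. Below is a minimal sketch of how that URL is put together, using the example host, port and database name from above. The class and helper names are hypothetical. The second helper is there because the URL often lives in an XML configuration file, where a bare ampersand is not allowed and has to be written as an entity.

```java
// Hypothetical helper class, only meant to illustrate step 2.
class JdbcUrlSketch {

    // Appends the two parameters from step 2 to a base JDBC URL,
    // using "?" or "&" depending on whether the URL already has a query part.
    static String withUtf8(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    // Escapes ampersands so the URL can be pasted into an XML attribute.
    // Only ampersands are handled here; nothing else is escaped.
    static String forXmlAttribute(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = withUtf8("jdbc:mysql://localhost:3306/mydatabase");
        // jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&characterEncoding=UTF-8
        System.out.println(url);
        // jdbc:mysql://localhost:3306/mydatabase?useUnicode=true&amp;characterEncoding=UTF-8
        System.out.println(forXmlAttribute(url));
    }
}
```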

Thanks a million to the following sources:

Struts, UTF-8 and form submissions : Jiploo.com

Google

Join me later, when I tell you the secret of how to internationalise your Struts application to display UTF-8 characters from the Application_xx.properties files. That was a true PITA to figure out, too.



