one.world – C# vs. Java Network Benchmarking


We performed several experiments intended to compare the performance of C#'s TCP socket implementation to that of Java's implementation. Our experiences and results are summarized here.


Experiences

  • When sending small (~8-byte) messages over a Java socket, calling setTcpNoDelay(true) on the socket to disable Nagle's algorithm yields a speedup of two orders of magnitude on a local-area network.
  • With Java on Windows 2000, system time is available only at a resolution of 10 ms. (This has been previously documented in section 8 of the Java Programmer's FAQ and elsewhere.) Such a coarse-grained clock complicates network benchmarking, as network round-trip times are often under a millisecond on a local-area network. By contrast, C# provides a timer with approximately nanosecond resolution in the System.Counter class.
  • We experimented with preflooding the network buffers: sending some messages without receiving any before entering a steady-state receive/send loop. We expected this to improve the measurements by excluding the cost of getting the first message into the network buffers. To our surprise, preflooding with N messages (N > 1) caused a slowdown roughly linear in N. We believe the extra messages form a backlog queued at the receiver, so each measured round-trip also includes the time spent waiting behind them, which artificially inflates the measured RTT. A single preflooding step does yield a noticeable improvement for C#, but not for Java.
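
As a minimal sketch of the first two points above, the following Java fragment disables Nagle's algorithm on a loopback connection and probes the smallest observable step of System.currentTimeMillis(). The class and method names (NoDelayExample, connectWithNoDelay, observedClockStepMillis) are illustrative and do not come from the original experiments, and the code uses present-day Java syntax rather than the 1.3-era language the benchmarks ran on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class; not part of the original benchmark code.
class NoDelayExample {

    /**
     * Opens a loopback TCP connection, disables Nagle's algorithm on the
     * client socket, and reports whether the option is now set.
     */
    static boolean connectWithNoDelay() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket accepted = listener.accept()) {
            client.setTcpNoDelay(true);   // send small writes immediately
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Busy-waits until System.currentTimeMillis() advances and returns the
     * size of that step in milliseconds, as a rough granularity probe.
     */
    static long observedClockStepMillis() {
        long start = System.currentTimeMillis();
        long next;
        do {
            next = System.currentTimeMillis();
        } while (next <= start);
        return next - start;
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay());
        System.out.println("Observed clock step (ms): " + observedClockStepMillis());
    }
}
```

The step reported by the probe depends on the operating system and JVM; a value near 10 would correspond to the 10 ms resolution described above.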

Results

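We found C# and Java to be very close in TCP socket performance, with Java having a slight edge: its mean time per 1000 round-trips was about 8% lower with client and server on the same machine and about 1% lower across two machines. The latter difference is smaller than either standard deviation. The setup and measurements were as follows.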
  • C# technical preview vs. Sun's HotSpot Client VM (build 1.3.0-C)
  • A TCP connection is established between client and server. The client sends 100-byte messages to the server, which immediately sends each one back to the client. A single throwaway message is sent to preflood the network buffers before performing trials.
  • Each trial consists of 1000 consecutive round-trips. (Timing many round-trips together compensates for Java's coarse-grained timing facilities.)
  • Each experiment consists of 100 trials. The results given are means and standard deviations over the trials in an experiment.
  • Client and server on same machine:
    • Windows 2000 SP1, 400 MHz P II, 128 MB
    • C#: 220.21 +/- 1.85 ms per 1000 round-trips
    • Java: 202.0 +/- 7.9 ms per 1000 round-trips
  • Client and server on distinct machines:
    • Client: Windows 2000 SP1, 400 MHz P II, 128 MB
    • Server: Windows 2000, 200 MHz PPro, 64 MB
    • C#: 1008.97 +/- 34.53 ms per 1000 round-trips
    • Java: 999.0 +/- 12.0 ms per 1000 round-trips
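
The measurement procedure described above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the original benchmark code: the class and method names (EchoBenchmark, runTrials, roundTrip, mean, stdDev) are hypothetical, the server runs as a thread in the same process over the loopback interface, timing uses System.nanoTime() (which did not exist in the JVM tested), TCP_NODELAY is enabled on both ends, and the standard deviation is computed as a sample standard deviation, which the text does not specify.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the echo benchmark; names are illustrative.
class EchoBenchmark {
    static final int MESSAGE_SIZE = 100; // bytes per message, as in the text

    /** Echoes fixed-size messages until the peer closes the connection. */
    static void echo(Socket socket) {
        try (Socket s = socket) {
            s.setTcpNoDelay(true);
            DataInputStream in = new DataInputStream(s.getInputStream());
            OutputStream out = s.getOutputStream();
            byte[] buffer = new byte[MESSAGE_SIZE];
            while (true) {
                in.readFully(buffer); // throws EOFException once the client closes
                out.write(buffer);
                out.flush();
            }
        } catch (IOException peerClosed) {
            // Client disconnected; the echo loop ends here.
        }
    }

    /** One round-trip: send a message, then wait for the full echo. */
    static void roundTrip(DataInputStream in, OutputStream out, byte[] message) throws IOException {
        out.write(message);
        out.flush();
        in.readFully(message);
    }

    /** Runs the trials and returns the elapsed milliseconds of each one. */
    static double[] runTrials(int trials, int roundTripsPerTrial) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
            Socket accepted = listener.accept();
            Thread server = new Thread(() -> echo(accepted));
            server.setDaemon(true);
            server.start();

            client.setTcpNoDelay(true);
            DataInputStream in = new DataInputStream(client.getInputStream());
            OutputStream out = client.getOutputStream();
            byte[] message = new byte[MESSAGE_SIZE];

            roundTrip(in, out, message); // single throwaway message before timing
            double[] elapsedMillis = new double[trials];
            for (int t = 0; t < trials; t++) {
                long start = System.nanoTime();
                for (int i = 0; i < roundTripsPerTrial; i++) {
                    roundTrip(in, out, message);
                }
                elapsedMillis[t] = (System.nanoTime() - start) / 1e6;
            }
            return elapsedMillis;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1); an assumption. */
    static double stdDev(double[] xs) {
        if (xs.length < 2) return 0;
        double m = mean(xs), sumSq = 0;
        for (double x : xs) sumSq += (x - m) * (x - m);
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] ms = runTrials(100, 1000); // 100 trials of 1000 round-trips
        System.out.printf("%.2f +/- %.2f ms per 1000 round-trips%n", mean(ms), stdDev(ms));
    }
}
```

Numbers produced by this sketch on current hardware are not comparable with the figures reported above, which came from separate client and server processes on the listed machines.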