Performance Comparison C vs. Java vs. Javascript vs. LuaJIT vs. PyPy vs. PHP vs. Python vs. Perl

This post was automatically copied from Performance Comparison C vs. Java vs. Javascript vs. LuaJIT vs. PyPy vs. PHP vs. Python vs. Perl on eklausmeier.goip.de.

1. Introduction. I always wanted to benchmark PHP, to confirm myself that choosing PHP as a static site generator is not a dead-end, compared, for example, against node.js. PHP 7 has already made huge performance advancements. See PHP 7: Features and Performance Comparison:

  • Drupal - Compared to previous versions, Drupal 8 runs 72 percent faster on PHP 7 when compared to previous versions.
  • Wordpress - For WordPress, a PHP 7 runtime only executes 25M CPU instructions compared to just under 100M doing the same job on older PHP versions.
  • Magento - For Magento, servers running PHP 7 are able to serve up to three times more requests as those running PHP 5.6.

Also, there were continuous improvements in PHP 7 and PHP 8, see PHP 8.1 Performance Is Continuing To Improve With Early Benchmarks.

2. Korol's and Zahariev's results. A specific comparison between PHP and node.js can be found in Viktor Korol's PHP vs NodeJS comparison and benchmarks, clearly favoring PHP. A performance comparison between PHP and various other languages was conducted in C++ vs. Python vs. PHP vs. Java vs. Others performance benchmark (2016 Q3). It shows that node.js is multiple times faster than PHP 7.

For the reader's convenience I reproduce the table from Ivan Zahariev, but leave out Rust, C#, Go, and Ruby:

Languagereal in s
C++0.951
PyPy1.496
node.js1.837
PHP 76.624
Java 812.144
Python 3.518.077
Perl25.068
Python 2.725.333

I had compared C vs. Lua vs LuaJIT and Java here, and I had previously benchmarked C vs. LuaJIT vs. various Lua versions vs. Pallene here.

3. Results. This time I again use the \(n\)-queens problem, i.e., how many times can \(n\) queens be put on an \(n\times n\) chess board without attacking any other queen. The programs in C, Python, PHP, Javascript (node.js), Lua, and Perl are given in 5. Programs. Task at hand is to compute from \(n=1\) to \(n=12\) all possible combinations. For example, for C I call time xdamcnt2 12. I ran the programs multiple times and took the lowest result. Results are "real" time in seconds, i.e., this includes "user" and "sys" times. Of course, all programs produced exactly the same results -- all computations are integer computations. Expected results are:

n2468101214
combinations100210440923527242,68014,20073,712365,596

Runtimes on NUC and Odroid are given in below table:

LanguageNUCRyzenOdroid
C0.170.150.44
Java 1.8.00.310.22108.17
node.js 16.40.340.211.67
LuaJIT0.490.332.06
PyPy3 7.3.51.570.86n/a
PHP 83.232.3542.38
Python 3.9.612.297.65168.17
Perl 5.3425.8721.14209.47

Conclusions:

  1. C is by far the fastest, by a factor of 2 on Intel to its nearest competitors: Java and Javascript. On Ryzen the difference is not so nuanced.
  2. Javascript and Java are almost similar in speed -- I didn't expect this result.
  3. Java's performance on ARM is terrible, even worse than Javascript, LuaJIT, and PHP combined. This means that very likely Java will run poorly on Apple's M1, or Amazon Graviton.
  4. Michael Pall's LuaJIT is a strong competitor to all other languages.
  5. PyPy is more than 7-times faster than Python.
  6. PHP 8 is faster than Python, but almost 10-times slower than Javascript, confirming the "adding numbers" benchmark as mentioned in the introduction from Viktor Korol, and in line with measurments from Ivan Zahariev.
  7. While C is roughly 2.5-times slower on ARM than on Intel, the other languages suffer a disproportionate degradation.

4. Machines used. Performance tests were run on Intel NUC Kit D54250WYKH, a quadcore i5-4250U machine.

$ lscpu
Architecture:            x86_64                                                                                                            
  CPU op-mode(s):        32-bit, 64-bit                                                                                                    
  Address sizes:         39 bits physical, 48 bits virtual                                                                                 
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
    CPU family:          6
    Model:               69
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         2600.0000 
    CPU min MHz:         800.0000
    BogoMIPS:            3792.18
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht t
                         m pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid ap
                         erfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic m
                         ovbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs 
                         ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsave
                         opt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    3 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Not affected

Operating system and version:

$ uname -a
Linux nuc 5.12.15-arch1-1 #1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000 x86_64 GNU/Linux

Performance tests were run on Ryzen 3400G.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics
    CPU family:          23
    Model:               24
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    Frequency boost:     enabled
    CPU max MHz:         3700.0000
    CPU min MHz:         1400.0000
    BogoMIPS:            7402.17
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb r
                         dtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe pop
                         cnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext p
                         erfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt s
                         ha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeass
                         ists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sme sev sev_es
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   256 KiB (4 instances)
  L2:                    2 MiB (4 instances)
  L3:                    4 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Operating system and version:

$ uname -a
Linux ryzen 5.12.15-arch1-1 #1 SMP PREEMPT Wed, 07 Jul 2021 23:35:29 +0000 x86_64 GNU/Linux

Tests were run on an Odroid XU4, an octacore ARM machine.

$ lscpu
Architecture:           armv7l
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A7
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r0p3
    CPU max MHz:        1500.0000
    CPU min MHz:        200.0000
    BogoMIPS:           120.00
    Flags:              half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
  Model name:           Cortex-A15
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r2p3
    CPU max MHz:        2000.0000
    CPU min MHz:        200.0000
    BogoMIPS:           120.00
    Flags:              half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

Operating system and version:

$ uname -a
Linux odroid 4.14.180-3-ARCH #1 SMP PREEMPT Sat Jun 5 22:29:47 UTC 2021 armv7l GNU/Linux

5. Programs. The C program. C compiler is gcc 11.1.0 and using flags -Wall -O3 -march=native.

#define abs(x)   ((x >= 0) ? x : -x)

/* Check if k-th queen is attacked by any other prior queen.
   Return nonzero if configuration is OK, zero otherwise.
*/
int configOkay (int k, int a[]) {
    int z = a[k];

    for (int j=1; j<k; ++j) {
        int l = z - a[j];
        if (l == 0  ||  abs(l) == k - j) return 0;
    }
    return 1;
}



long solve (int N, int a[]) {  // return number of positions
    long cnt = 0;
    int k = a[1] = 1;
    int N2 = N;  //(N + 1) / 2;

    loop:
        if (configOkay(k,a)) {
            if (k < N)  { a[++k] = 1;  goto loop; }
            else ++cnt;
        }
        do
            if (a[k] < N)  { a[k] += 1;  goto loop; }
        while (--k > 1);
        a[1] += 1;
        if (a[1] > N2) return cnt;
        k = 2 ,  a[2] = 1;
    goto loop;
}

The Java program. Java is 1.8.0_292-b10.

class xdamcnt2 {

    /* Check if k-th queen is attacked by any other prior queen.
       Return nonzero if configuration is OK, zero otherwise.
    */
    public static boolean configOkay (int k, int[] a) {
        int z = a[k];

        for (int j=1; j<k; ++j) {
            int l = z - a[j];
            if (l == 0  ||  Math.abs(l) == k - j) return false;
        }
        return true;
    }



    public static long solve (int N, int[] a) {  // return number of positions
        long cnt = 0;
        int k = a[1] = 1;
        int N2 = N;  //(N + 1) / 2;
        boolean flag;

        for (;;) {
            flag = false;
            if (configOkay(k,a)) {
                if (k < N)  { a[++k] = 1;  continue; }
                else ++cnt;
            }
            do
                if (a[k] < N)  { a[k] += 1;  flag = true; break; }
            while (--k > 1);
            if (flag) continue;
            a[1] += 1;
            if (a[1] > N2) return cnt;
            k = 2;  a[2] = 1;
        }
    }
    ...

The Javascript program. node.js is v16.4.2

/* Check if k-th queen is attacked by any other prior queen.
   Return nonzero if configuration is OK, zero otherwise.
*/
function configOkay (k, a) {
    var j;
    let z = a[k];

    for (j=1; j<k; ++j) {
        let l = z - a[j];
        if (l == 0  ||  Math.abs(l) == k - j) return 0;
    }
    return 1;
}



function solve (N, a) {  // return number of positions
    let cnt = 0;
    let k = a[1] = 1;
    let N2 = N;  //(N + 1) / 2;

    loop: for(;;) {
        if (configOkay(k,a)) {
            if (k < N)  { a[++k] = 1;  continue loop; }
            else ++cnt;
        }
        do
            if (a[k] < N)  { a[k] += 1;  continue loop; }
        while (--k > 1);
        a[1] += 1;
        if (a[1] > N2) return cnt;
        k = 2 ,  a[2] = 1;
    }
}

The Python program. Python is 3.9.6. PyPy is 7.3.5 for Python 3.7.0.

#  Check if k-th queen is attacked by any other prior queen.
#  Return 'True' if configuration is OK, 'False' otherwise.
#
def configOkay (k, a):
    z = a[k]

    for j in range(1,k):
        l = z - a[j]
        if (l == 0  or  abs(l) == k - j): return False
    return True




def solve (N, a):    # return number of positions
    cnt = 0
    k = a[1] = 1
    N2 = N    # (N + 1) / 2

    while True:
        flag = 0
        if configOkay(k,a):
            if k < N:  k += 1; a[k] = 1; continue
            else: cnt += 1

        while True:
            if a[k] < N:  a[k] += 1; flag = 1; break
            k -= 1
            if k <= 1: break
        if flag == 1: continue
        a[1] += 1
        if (a[1] > N2): return cnt
        k = 2
        a[2] = 1

The PHP program. PHP is 8.0.8.

<?php
/* Check if k-th queen is attacked by any other prior queen.
   Return nonzero if configuration is OK, zero otherwise.
*/
function configOkay (int $k, &$a) {
    $z = $a[$k];

    for ($j=1; $j<$k; ++$j) {
        $l = $z - $a[$j];
        if ($l == 0  ||  abs($l) == $k - $j) return 0;
    }
    return 1;
}



function solve (int $N, &$a) {  // return number of positions
    $cnt = 0;
    $k = $a[1] = 1;
    $N2 = $N;  //(N + 1) / 2;

    loop:
        if (configOkay($k,$a)) {
            if ($k < $N)  { $a[++$k] = 1;  goto loop; }
            else ++$cnt;
        }
        do
            if ($a[$k] < $N)  { $a[$k] += 1;  goto loop; }
        while (--$k > 1);
        $a[1] += 1;
        if ($a[1] > $N2) return $cnt;
        $k = 2 ;  $a[2] = 1;
    goto loop;
}

The Perl program. Perl is 5.34.0

use strict;


#  Check if k-th queen is attacked by any other prior queen.
#  Return nonzero if configuration is OK, zero otherwise.
#
sub configOkay {
    my ($k, $a) = ($_[0], $_[1]);
    my $z = $a->[$k];
    my ($j,$l);

    for ($j=1; $j<$k; ++$j) {
        $l = $z - $a->[$j];
        if ($l == 0  ||  abs($l) == $k - $j) { return 0; }
    }
    return 1;
}



sub solve {    # return number of positions
    my ($N, $a) = ($_[0], $_[1]);
    my $cnt = 0;
    my $k = $a->[1] = 1;
    my $N2 = $N;  # (N + 1) / 2;

    loop:
        if (configOkay($k,$a)) {
            if ($k < $N)  { $a->[++$k] = 1;  goto loop; }
            else { ++$cnt; }
        }
        do {
            if ($a->[$k] < $N)  { $a->[$k] += 1;  goto loop; }
        } while (--$k > 1);
        $a->[1] += 1;
        if ($a->[1] > $N2) { return $cnt; }
        $k = 2 ;  $a->[2] = 1;
    goto loop;
}