
Well, I don't know. Those two programs don't really do the same thing; there are an awful lot of comparisons in the second one. After making the awk program more similar to the Perl program, and using mawk instead of gawk (gawk is quite a bit slower), the numbers look a bit different:

  $ seq 100000000 > /tmp/numbers 
  $ time perl -MData::Dumper -ne '$n{length($_)}++; END {print Dumper(%n)}'  /tmp/numbers 
  $VAR1 = '7';
  $VAR2 = 900000;
  $VAR3 = '8';
  $VAR4 = 9000000;
  $VAR5 = '5';
  $VAR6 = 9000;
  $VAR7 = '4';
  $VAR8 = 900;
  $VAR9 = '6';
  $VAR10 = 90000;
  $VAR11 = '10';
  $VAR12 = 1;
  $VAR13 = '2';
  $VAR14 = 9;
  $VAR15 = '3';
  $VAR16 = 90;
  $VAR17 = '9';
  $VAR18 = 90000000;
  
  real 0m16.483s
  user 0m16.071s
  sys 0m0.352s
  $ time mawk '{ lengths[length($0)]++ } END { max = 0; for(l in lengths) if (int(l) > max) max = int(l); print max; }' /tmp/numbers 
  9

  real 0m5.980s
  user 0m5.493s
  sys 0m0.457s
[edit]: Actually had a bug in the initial implementation. Of course.


I used them both to find the longest line in a file. The Perl option just spits out the number of times each line length occurs. It will get messy if you have many different line lengths (which was not my case).
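If you only want the single maximum out of that histogram, here is a minimal sketch (my own variant, not from the thread) that mirrors the mawk END loop:

  $ perl -ne '$n{length($_)}++; END { for (keys %n) { $max = $_ if $_ > $max } print "$max\n" }' /tmp/numbers
  10

On the seq input this prints 10 rather than mawk's 9, because Perl's $_ still carries the trailing newline.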

You also have to take into account that awk's length($0) does not count the line terminator, while Perl's length($_) does.
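A minimal demonstration of that off-by-one on a single line of input:

  $ echo hello | awk '{ print length($0) }'
  5
  $ echo hello | perl -ne 'print length($_), "\n"'
  6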

Let's try the opposite: make the Perl script more like the awk one.

  $ time perl -ne 'if(length($_)>$n) {$n=length($_)}; END {print $n}'  rockyou.txt 
  286

  real 0m2,569s
  user 0m2,506s
  sys 0m0,056s

  $ time awk 'length($0) > max { max=length($0) } END { print max }' rockyou.txt 
  285

  real 0m3,768s
  user 0m3,714s
  sys 0m0,048s


Use `perl -lne ...` to have perl strip the trailing newline like awk does. It should then give the same result.
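Applied to the script above, that would be (the output should now match awk's 285; timings omitted):

  $ perl -lne 'if(length($_)>$n) {$n=length($_)}; END {print $n}' rockyou.txt
  285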


You're right. It even makes the times converge.



