Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Why perl has separate arrays and hashes: It's as if they thought it through (altreus.blogspot.com)
84 points by mst on June 21, 2011 | hide | past | favorite | 70 comments


Arguing that this cannot reasonably be done is unconvincing when well-designed languages like Lua pull it off.

The only thing about Lua's tables that consistently confuses people is the behavior of the length operator in the presence of "holes." The previous, somewhat confusing definition in Lua 5.1 (http://www.lua.org/manual/5.1/manual.html#2.5.5) has been replaced with:

    The length of a table t is only defined if the table is a
    sequence, that is, all its numeric keys comprise the set
    {1..n} for some integer n. In that case, n is its length.
    Note that a table like

     {10, 20, nil, 40}

    is not a sequence, because it has the key 4 but does
    not have the key 3. (So, there is no n such that the
    set {1..n} is equal to the set of numeric keys of
    that table.) Note, however, that non-numeric keys do
    not interfere with whether a table is a sequence.

    A program can modify the behavior of the length
    operator for any value but strings through the __len
    metamethod.
(http://www.lua.org/work/doc/manual.html#3.4.6)


Not only does Lua have the length problem, but array offsets are a matter of convention. Lua chooses 1 as the first element of an array instead of 0. But that is just convention. Your code could use 0 or -100. A confusing implementation of length and offsets by convention instead of using nearly universal array semantics from other languages is not "well-designed". It's a cheap hack to conflate everything into one table data structure.


> array offsets are a matter of convention. Lua chooses 1 as the first element of an array instead of 0. But that is just convention. Your code could use 0 or -100.

Not sure what your point is here. Lua arrays start at 1. If you disobey this rule, your table isn't really an array and the length operator will be undefined starting in Lua 5.2.

Sure you could disobey this rule and create an array that starts at -100. You can do the same in C:

    // Array whose indices start at -100:
    char *funky_array = (char*)malloc(n * sizeof(char)) + 100;
> It's a cheap hack to conflate everything into one table data structure.

You say "cheap hack" I say "brilliant optimization." Show me an equally-capable language that has a <100kb interpreter and performs as well as Lua. Lua fulfills its design goals brilliantly.


Perl 5 has $[ which lets you set the base for every array. eg. $[ = 1; would cause all arrays to be 1-based. Thankfully they've now deprecated this misfeature ...

http://perldoc.perl.org/perlvar.html#Deprecated-and-removed-...


I thought that was deprecated a long time ago. When I learned Perl in 2000 I remember I read not to use that "feature". Apparently it was just not recommended, for a very long time.


Certainly the advice has been don't use it for a very long time. At least since the 90s when I learned Perl. However it wasn't actually deprecated until last year (Perl 5.12). There were a sequence of steps leading up to full deprecation which you can read about in the link I provided above. Maybe some people were actually using it?!?


I'm not convinced Lua quite pulls it off. The combination of array and hash is awesome and I love it, but as a practical matter, all my tables are either arrays or hashes (objects), and I treat them as such. They aren't really unified, and this difference permeates everywhere, from implementation details, to the C api, to high level functions like pairs vs ipairs.

The blog post was trying speculate about slicing an array to a hash, or iterating backwards from "-1" in a hash, or something. Whatever he wanted (not quite sure), you can't do that in Lua. I don't know why you'd want to either. So Lua doesn't pull it off, but there's no reason to.


> The blog post was trying speculate about slicing an array to a hash, or iterating backwards from "-1" in a hash, or something.

It was mentioning features which don't really make sense on associative arrays, or are not compatible with mixed usage:

* Some features (e.g. sorting values) simply can't behave the same between arrays and associative arrays (on an array it will change the value's index, but it can't change the value's key on an associative array)

* Splicing (not slicing, though that would probably work as well) hardly makes sense on an associative array, you just don't need it (splicing is the replacement of an indexed section of an array by an other array — which depending on the language may be of a different size, including empty): you can just set the keys instead (e.g. combine with an other associative array)

* The final issue is with indexes, including "interesting" indexes (direct and in slices/splices) such as indexing arrays from the end (a very nice feature of some languages: Python and Ruby share it with perl) using negative numbers. You just can't do that if you mix things up, so you have to rely on calls like `val = a[length(a) - 2]` instead of just `val = a[-2]` (note that there are languages out there which do not conflate arrays and associative arrays and still don't implement this feature, and a restricted class of languages on which you can actually have negative indexes on arrays)


So Lua is able to pull it off, but there's something that consistently confuses people? Forgive me, but hose sound contradictory.


There's only one thing that consistently confuses people, it has nothing to do with any of the objections raised in the article, and it's being fixed to be less confusing.


Except that it was mentioned, at the very least tangentially, in the article:

what to do when, for example, someone creates an entry in an array by giving it an ordinal position that doesn't exist. Do we create an array of suitable length and fill it with bits of emptiness in order to maintain the illusion that this array is ordinal?


So Lua has negative indexes?


It may be better to think of it as a key/value table (a "dict" or "hash") that recognizes when it's being used as an array and optimizes its internal storage accordingly. It always has dict/hash operations, but indices of 1..n are stored more efficiently, and a couple functions reflect this common & algorithmically significant case: the length operator returns the n of 1..n, and "ipairs" iterates over the {1, value}..{n, value} pairs. (The un-ordered {key,value} iterator is "pairs".)

Typically, it's clear upfront whether you're using it as an array or as a table; there isn't much overlap. I understand it being confusing in the beginning, but it's really not a big problem, it's just Lua's version of "oh god, the parens/significant whitespace/etc.". (That and indexing from 1. Not a fan of that personally, but whatever, Lua is handy.)

So yes - you can use negative numbers as indices. Or strings. Or other tables, or arbitrary C pointers. Arrays are just handled better.


It has to do with "holes" in the array. If you have a Lua array like: { 1 , 2 , 3 , nil , 5 }. That "nil" causes problems. I, personally, haven't had an issue with this, but it does come up regularly on the Lua mailing list.


And that sequential tables are 1-indexed, not 0-indexed like nearly every other language on the planet.


That was intended to make it less confusing when non-programmers use it as a configuration language. It's a bit annoying, but the language is consistent about it: if all you want is a tiny library for JSON-like config files or data dumps, it cleanly scales down. If you want a powerful, modern hacking language, it's there, but the advanced features (tail-call optimization, co-routines, etc.) never get in the way.


First, it is pretty funny for this poster to use this as a reason why perl is a well thought out language. It like all languages has pros and cons, and picking one thing about perl that isn't even unique to perl isn't a good example of how perl is so well thought out.

Second, I agree with the premise that having two separate structures is better, but a lot of the issues brought up in this post just don't come up in the real world. For example with PHP once you understand that the array() construct is just a hash map with a linked list that stores the ordering, a lot of the these complications are much less complicated. Just don't think of the keys as indexes in an array, think of them as keys in a hash map that have nothing to do with ordering. If you want to change the order, you need to change the linked list, which initially is set by insert order but can be changed to something else by using various sort functions. Again, I'm not saying this is the greatest design in the world, but I am saying that living with this design in practical application isn't a big deal.

In fact, the biggest annoyance has nothing to do with any coding stuff, it is just that you pay a huge memory penalty if you just want a simple array of items. Luckily PHP added a few newer data structures (http://www.php.net/manual/en/spl.datastructures.php) like SplFixedArray which are much more memory efficient. Obviously in standard usage the memory doesn't really matter, but in some cases where you want very large data structures these new objects come in handy.


It like all languages has pros and cons, and picking one thing about perl that isn't even unique to perl isn't a good example of how perl is so well thought out.

I believe you've misread the intent of the title. I read it as an answer to the question: "Why does Perl have arrays and hashes, even though the language I'm used to does not?" And, this article takes a decent stab at explaining why Perl has arrays and hashes. It does not, as far as I can tell, make any attempt to explain why this proves that Perl is superior to all other languages or is more thought-out than other languages (except perhaps those languages that don't have separate array and hash datatypes).

The way I read it, it's saying in a lot more words, "You'll like it once you get used to it."


Do so many languages conflate arrays and hashes?

I know of PHP and JavaScript. PHP was not designed at all so no point discussing it. JavaScript was initially written in a week so we forgive Brendan Eich.


Javascript does not conflate arrays and hashes.

Technically Javascript does not have hashes at all, and you use bare objects instead, the confusion between objects and Array comes from PHP reprobates.

Lua is an other language with no arrays, and where hashes are used in stead. Additional entrants in the category are ~~Tcl~~ (got corrected, Tcl's associative arrays are called "arrays" but it has a separate "list" type for indexed sequences) and Awk.


Tcl doesn't really fall into this category -- Tcl's hashes are called arrays, but for sequential structures you would use a list instead.


Oh crap, the terminology got me confused, I thought they were used the same. Fixing now.


Ok don't include JavaScript. So there is only PHP that does this? I'm just trying to find out where the article is coming from. I didn't think this was a common theme amongst languages.


Javascript doesn't really conflate arrays and hashes in the way PHP does. It uses the same bracket syntax for both, but you either instantiate an array or a hash.

I don't think that allowing you to iterate over a hash as if it were an array is all that unforgivable, though it's often not the right thing to do and you can't expect the values in any particular order.


> I don't think that allowing you to iterate over a hash as if it were an array is all that unforgivable

it's not even doable in JavaScript: bare objects (~hashes) are iterated with for...in, using that with arrays gives inconsistent results: the array's keys will be iterated as if they were (string) properties, there are no guarantees they'll be iterated in numerical order, and any enumerable property added to the array itself or any of its ancestors will be iterated over as well.

Friends don't let friends iterate over arrays with for...in.


> Friends don't let friends iterate over arrays with for...in.

Which is too bad because it looks so much better.


I just use Array.prototype.[forEach|map|filter] instead, or the equivalent Underscore.js function (which aliases to native when it can) if IE-compatibility is needed.

If you need performances, nothing will beat C-style access anyway. And FWIW, using for...in on an array is slower than using Array.prototype.forEach.

edit: FWIW, even hand-rolling an each() function (in the style of underscore's) will be faster than for...in.


it works this way, though, because an array's keys actually are string properties. It's the array methods that treat them as if they were numeric.


Perl has 3 ways to iterate over a hash: "keys", "values", and "each". The first two do what you'd think; "each" in array context iterates over key/value pairs. All three return their results in an "apparently random" order.

You can also use "for" to iterate over a hash, same as you would over an array, but in that case the hash is converted to a list of alternating keys and values, which isn't as useful.

[edited to correct a typo]


The results are random in order, but consistent between calls, so it's not all bad


I had to look this up because I was thinking otherwise, but now I believe you are correct - it is just inconsistent between runs:

http://perldoc.perl.org/functions/each.html says "Hash entries are returned in an apparently random order. The actual random order is subject to change in future versions of Perl, but it is guaranteed to be in the same order as either the keys or values function would produce on the same (unmodified) hash. Since Perl 5.8.2 the ordering can be different even between different runs of Perl for security reasons (see Algorithmic Complexity Attacks in perlsec)."


> I don't think that allowing you to iterate over a hash as if it were an array is all that unforgivable

Ruby allows this. Both arrays and hashes provide an each method and include the Enumerable module.


Ruby even makes it invisible because of destructuring assignment of block variables.

    ruby-1.9.2-p180 > {1=>2}.each {|k,v| puts "#{k} #{v}"}
    1 2
     => {1=>2} 
    ruby-1.9.2-p180 > {1=>2}.sort.each {|k,v| puts "#{k} #{v}"}
    1 2
     => [[1, 2]]


"You either instantiate an array or a hash" suggests that arrays and hashes are distinct in Javascript, but they aren't. Arrays _are_ hashes (that is, Objects) with some extra functions in their prototype which let them, in the words of the standard, "give special treatment to a certain class of property names."


Yes, you're absolutely right. I meant that in usage you generally either instantiate a plan object and use it as a hash or you instantiate an array and use it as an array, but when you instantiate an array, you can treat it like any other object.


JavaScript does not do this. It doesn't even have hashes in the traditional sense (all keys are stringified in prototypical objects).


I'm not sure if I'd characterize Javascript's handling of arrays and hashes as conflated. The two are treated differently. To take an example from the post, Javascript Objects (hashes/Associative arrays) do not have a splice function.

To answer your question I don't know of many languages which conflate the two. I know a lot of languages which offer similar methods, to create parity between arrays and hashes (since they are similar enough data structures) but not many which treat them the same.


> I'm not sure if I'd characterize Javascript's handling of arrays and hashes as conflated.

Insomuch as [] and {} produce arrays and object, yes, you are correct. But those objects are strikingly similar and the functions in each prototype work on the other. To the outside observer, only the length bit works differently.

I can do:

  a = new Array();
  a["b"] = 1;
  for (x in a) { /* x will be "b" here*/ }
  a.length /* 0 */
and

  d = {0:"a",1:"b",2:"c",3:"d",length:4}
  Array.prototype.slice.call(d, 1,3) /* returns ["b", "c"] */
That's conflated a little bit.


Ruby 1.9 hashes preserve insertion order, so you could use them as arrays. Not sure why you'd want to, Ruby still has separate arrays.


The point of preserving insertion order is not to be able to treat hashes as arrays. It's to get deterministic and repeatable iteration order between different runs. Java has LinkedHashMap and LinkedHashSet if you care about iteration order and HashMap and HashSet if you don't.

With the repeatability it's easier to write automatic tests.

I worked at a company making a compiler in Java and we got different binaries when compiling without changing the source code. After we changed to LinkedHashMap and LinkedHashSet that problem was gone. That's an example where we wanted the deterministic aspect of the Linked versions.

Another way to get the same behavior in Java is to use TreeMap and TreeSet, but that only works if the things you put in are comparable, and you get different time complexities as well.


That this post has so many upvotes is evidence that a lot of nonprogrammers read HN.


I don't understand this comment. I'm a programmer, designing a language in my spare time, and I found it interesting, since I considered going the Lua/PHP route with my collections. I thought it was nice to find an article about actual programming an HN again, and I don't know why non-programmers would upvote it.

Please explain. I'm really curious.


My comment was reflecting my intuition that most programmers would not find the blog post useful or insightful.

The first two sentences of the blog post are, "I wonder why we have separate arrays and hashes in Perl. Other languages don't do it." However, as evidenced by only 4 contrary languages listed in the comments here, the vast majority of programming languages have separate array and hash structures.

Much of the post is absolutely trivial. The post is a combination of circular reasoning (arrays and hashes must be different because they... are different) and some sort of straw man gedanken (well it would be easy to do if we did this but then that doesn't work, so obviously this way's "thought out").

It's a whole post to point out that arrays and hashes are different and are used differently. Hurray? Let's upvote it?


I see what you mean, in that it could definitely have been a lot shorter and to the point. But it was satire, especially those first two sentences, to point out how silly he thinks it is to combine them. I think you may be taking it a tad too seriously.


I don't think it means there are less programmers on HN, rather I think it reflects the drop in the programming skill of the average user as HN has grown.


Quite. The whole point (haha) of an array is that you take a pointer to its first item + (the size of each item * the index number) and get a pointer to the item you want, which is bloody quick. No hashtable can compete with that - a hash is a compromise for when you can't use an int for whatever reason.


That's not the whole point of an array, an array also has practical uses!


It's almost as if the majority of programming languaguages are "thought through". I mean, really, who designs a programming language and doesn't think it through? It's a nontrivial undertaking typically undertaken by people that are smarter than the average bear.


This was a fun post, but I found the other post on this blog more interesting:

http://altreus.blogspot.com/2011/06/anatomy-of-types.html

Not terribly useful to anyone who's been around the block in Perl at least once, but it's a great explanation of the three Perl variable types. Saving it for the next time I need to teach someone Perl.


How is this different from Python's lists/dictionaries?

Also IIRC PHP does internally differenciate between array(1,2,3) and array('blah'=>1, 'bloop' =>); Just the syntax is not as clear.


> How is this different from Python's lists/dictionaries?

It's not. FWIW, I know of three languages which use the same data structure for arrays and associative arrays (/maps/hashes/whatever key/value collections are called these days): PHP, Lua and Awk (edit: had originally put "Tcl" here, but an other commenter set me straight: Tcl's arrays are called "list", Tcl's associative arrays are called "array", this got me confused. Sorry about that).

Some functional languages also use lists of (key, value) pairs for small associative arrays (they may or may not provide "proper" associative arrays as well). They are called "association lists"[0] in many lisps (using a list of dotted pairs, rather than a list of lists) and "proplists"[1] in Erlang (using a mixed list of tuples and atoms).

[0] http://www.gnu.org/s/emacs/manual/html_node/elisp/Associatio...

[1] http://www.erlang.org/doc/man/proplists.html


What it does internally isn't much compensation for the fact that the code I write becomes harder to work with. :(


The syntax is unclear, but also the behaviour is unclear.

$var[] = "entry1"; $var['something'] = "entry2"; $var[] = "entry3";

$var[0] is now "entry1", and $var["something"] is "entry2", but what's the index for entry3? And if you iterate over the array, will you get entry2 before entry3?

What if you create $var[$somestring], but $somestring is "29"? Now it will be at $var[29], and automatic indexing will reset to 30, which you may not expect.

There's a lot of argument to "just don't write code like that", but with the inconsistencies in PHP library functions (and PHP's careless type conversion) it's easy to come across cases where this sort of "do something unexpected instead of throwing an error" behaviour can bite people in the ass down the road, through no fault of their own.


Oh yes, I don't deny that it's horribly done. I use my own classes "Dict" and "Li" instead of arrays whenever I need them to be specific, generally.


From what I recall, Python (2.6) dictionaries do not necessarily preserve the order of items in the underlying structure. Eg. if "x = {1:1, 2:2, -1:5}", then x.items() will not necessaryily return "[(1, 1), (2, 2), (-1, 5)]"

However, in Python 2.7+ ordered dictionaries in the collections library do preserve order. Perhaps this achieves the desired behavior?


Array and hash types in perl likely evolved bottom-up from the lower-level data structures of the same name.

Other languages use terms like "List" and "Associative Array" for their abstract data types to differentiate from the implementation techniques. And yes, lists and associative arrays have slightly different properties, as do sets, stacks, and queues.


tl;dr: arrays, dictionaries, ordered dictionaries (lists, sets and so on) are all different classes of data structures with different properties. To be efficient one should chose best one, not rely on some "all included" generic datatype.


The PHP array is a wonderful data structure with an awesome set of properties. This blog post contains a lot of "problems" that are somewhat trivial and have been reasonably solved in PHP. This comment will be downvoted to oblivion without discussion.


> This comment will be downvoted to oblivion without discussion.

If you can see the future, I recommend playing the lottery. If you're a snarky ass, I recommend not commenting.

I've been working with PHP for five+ years now, and only started working with Python in the last year or so. You're right - working with PHP's arrayhash isn't too terribly complicated, but having actual arrays and hashes as different types is super convenient. At least once a day, I think fondly of Python and wish I could use it - list comprehensions and negative indices could eliminate the need for half of PHP's array functions and make for cleaner and more readable code.


> If you can see the future, I recommend playing the lottery.

I can't see the future, but I can edit the past. It's just a little disappointing to be downvoted without anyone actually commenting on the subject.

> but having actual arrays and hashes as different types is super convenient.

I agree especially if, in the language, arrays and hashes are objects with methods.

I didn't claim PHP's arrays were the greatest data structure on earth but ordered hashes that special-cases integer indexes is a pretty nice procedural-style data structure. PHP doesn't really have arrays; it's just the hash type is flexible enough to do double-duty.

Apparently, on Hacker News, having such opinions is both downvote worthy and punch-in-the-face worthy.


> It's just a little disappointing to be downvoted without anyone actually commenting on the subject.

When I saw your comment, it didn't look downvoted to me. You clearly realize that people can vote on comments - has it occurred to you that comments can bounce back from a negative score? That is, unless you append some passive aggressive horse-shit to the end.

Honestly, I don't know if we gain anything by PHP having arrays pull double duty in this way. The only advantage is, what, having one less keyword to remember?


You said that PHP's arrays/hashes were awesome, but did not say why. In other words, you did not contribute to the discussion. This may have attracted some of your downvotes; certainly it made it your post unlikely to convince anyone of anything.


You made my morning, good sir. I could not stop laughing. You have brought mirth and cheer to our workplace.


With all due respect to you as a human being, Sir, I have sincere urge to punch you in the face. Please, tell me you're trolling.


Please don't feed the trolls.


Using the same object for arrays and hashes seems like a terrible idea long before thinking about it this much. JavaScript does it, but JS goes for this super-simple, everything-is-a-function-hash-thing simplicity


As stated above Javascript does not do this it has separate array and Object types. Javascript developers might.


No it doesn't. Arrays are a type of Object, they're not distinct.


> JavaScript does it

No, javascript does not.


  zoidberg:~ phil$ node
  > a = {}
  {}
  > a[1] = 56
  56
  > a['1']
  56
  > a = [null, 56]
  [ null, 56 ]
  > a['1']
  56
  > a[1]
  56
  > typeof a
  'object'




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: