Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Why not store the IP Address as a 32-bit number (IPv4 addresses)? Why store it as a string in the first place? This is something I did not quite get. Also, wouldn't it be better to split out the domain from the email address for ease of filtering?

Also, how does performing joins give a performance advantage here. I'm assuming there would be queries to get at the IDs of at least one, but going up to 4, to get at the IDs of the items in the quad. Then there would be a lookup in the mapping table.

I have worked for some time in this industry, but I have never had to deal with relational databases (directly; query tuning and such were typically the domain of expert db people). It would be interesting to see an explanation of this aspect.

EDIT: To people who may want to comment, "You missed the point of the article!": no, I did not, but I want to focus on the technical things I can learn from this. I agree that ageism is a serious problem in this industry.



You could store the IP as a 32 bit unsigned int. There's no issue with that, but I would probably recommend nowadays to use an internal cidr or inet type if your database supports that. It probably wouldn't be better to split the email address from the domain for filtering since email address formats tend to be a bit tricky.

Joins give a performance advantage due to the fact that you aren't duplicating data unnecessarily. The slow query in question becomes five queries (4 for the data once for the final lookup) which can each be done quickly and if any one of them return nil, you can return a timeout.


Yes, some of the databases support native ip address types that can be manipulated efficiently. Better to use that (like inbuilt json querying capabilities in postgres) than come up with your own.

It is still not clear how is it better to do up to 5 separate queries or maybe a few joins, than to store the strings and construct indices and such on them? Is the idea that the cost of possibly establishing a new socket connection for some concurrent query execution or sequentially executing 5 queries is still < possible scans of the columns (even with indexing in effect)? Also, even if you had integers, don't you need some sort of an index on integer fields to avoid full table scans anyway?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: