
As someone heading down a similar path (and I'm fairly sure I've got sensible prefixes), can you share an example of a prefix that caused you trouble? Is it something like

    /path/to/big-dir/«lots-of-sequential-filenames»

?


Exactly. It works fine for most tasks, of course, but if you ever want to process the contents of the S3 bucket in bulk, nothing will ever be able to parallelize that one list request to /path/to/big-dir: the listing comes back in pages, and each page needs the continuation token from the one before it.

If you don't use the evenly-distributed-prefix trick, your only chance of speeding it up is knowing the file names beforehand. If they're all sequentially numbered, you might know them, of course.

The shardable prefix doesn't need to be at the top level. So you could also organize it like so, for example:

    /secret/documents/2016-01-01/00000001.doc
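To make that concrete, here's a rough sketch of how a date-sharded layout like the one above lets you fan the listing out. The names `date_prefixes` and `list_in_parallel` are made up for illustration, and `list_prefix` is a stand-in for whatever actually lists one prefix (with boto3 it could wrap the `list_objects_v2` paginator, but that part is an assumption and isn't shown here).

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def date_prefixes(base, start, days):
    """Build one prefix per day, e.g. 'secret/documents/2016-01-01/'."""
    return [
        f"{base}/{(start + timedelta(days=i)).isoformat()}/"
        for i in range(days)
    ]


def list_in_parallel(list_prefix, prefixes, workers=8):
    """Run one listing per prefix concurrently and merge the results.

    list_prefix is a hypothetical callable: prefix -> iterable of keys.
    Each call is still sequential internally, but the calls for
    different prefixes are independent, so they can run side by side.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(list_prefix, prefixes)  # keeps prefix order
        return [key for chunk in chunks for key in chunk]
```

With a single big directory there is only one prefix to hand to `list_in_parallel`, which is exactly the problem.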


Thanks! I've read the docs and blog posts, but it was interesting to see a real live antipattern.


I suppose it's like with regular file systems -- don't have too many files in a directory.

In your use case, consider `/path/to/big-dir/AA/AABB/AABBCC` or similar?
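One way to read that layout, as a minimal sketch: hash the file name and use progressively longer slices of the digest as nested directory names. The function name `sharded_key`, the choice of MD5, and keeping the original file name as the last path component are all my assumptions, not something from the comments above.

```python
import hashlib


def sharded_key(base, filename, levels=3, width=2):
    """Return a key like base/AA/AABB/AABBCC/filename.

    The hex digest of the file name is (roughly) uniformly distributed,
    so even sequentially numbered files spread across the shards.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest().upper()
    # Level i uses the first width * (i + 1) hex characters of the digest.
    parts = [digest[: width * (i + 1)] for i in range(levels)]
    return "/".join([base.rstrip("/"), *parts, filename])
```

The mapping is deterministic, so you can always recompute where a given file lives from its name alone.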


Sorry, that example wasn't my data; it was just to illustrate the question.



