I can recommend The Black Art of 3D Game Programming (I don't remember the author's name, it's that muscle guy :) and the series "Code on the Cob" by Chris Hargrove, written while he was working at 3D Realms.
I've only watched the intro lecture, and I can already tell this course would be fantastic. However, the professor mentioned that it only covers the processor. Does anyone know of similar courses/books about filesystems, networking, RAM, ...?
Does someone here use DuckDB in production? Is it as stable as SQLite?
The hosting company I have to work with has a very old version of SQLite installed on the server and they don't want to update it. So I was looking at whether I could replace it with DuckDB since it seems to be easy to install with pip.
You can get an overall "idea" of the code distribution, or where the meat is, by using `cloc . --by-file`. This shows you a listing of files and their respective number of lines of code and comments. You instantly get an idea of the "big" files of the project you probably should get to know.
You also see the comment-to-code ratio and can start documenting and writing tests as you go. It will help you understand the project, and it will be useful to everyone, especially when refactoring.
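If `cloc` isn't available where you work, a rough equivalent is easy to sketch in Python. This is a simplification, not a replacement: it treats every `#`-prefixed line as a comment, which only approximates what `cloc` does per language:

```python
from pathlib import Path

def line_stats(root: str, suffix: str = ".py") -> list[tuple[str, int, int]]:
    """Return (file, code_lines, comment_lines), biggest files first."""
    stats = []
    for path in Path(root).rglob(f"*{suffix}"):
        code = comments = 0
        for line in path.read_text(errors="ignore").splitlines():
            stripped = line.strip()
            if not stripped:
                continue  # skip blank lines entirely, as cloc does
            if stripped.startswith("#"):
                comments += 1
            else:
                code += 1
        stats.append((str(path), code, comments))
    # Sorting by code lines puts the "big" files first,
    # which is where the reading usually starts.
    return sorted(stats, key=lambda t: t[1], reverse=True)
```

The sorted listing gives you the same "where is the meat" overview as `cloc . --by-file`, just without the per-language comment parsing.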
0.0.1. Structure:
0.0.1.0: Visualization
Drawing helps me. Sketch the conceptual blocks of the project. It can start as just boxes with block names: "Codec", "Streaming", "Persistence", "Dispatcher", etc.
Draw on paper, a whiteboard, etc. You can also use something like Gliffy to diagram, both for yourself and when communicating with someone else: you can each make changes and make sure you're talking about the same thing.
And do it going down the abstraction scale, from higher abstractions to lower ones.
I also find that a quick, automatically generated "UML" diagram helps. I don't necessarily write these diagrams myself; I use a tool that tells me about the structure of a project.
pyreverse for Python and scaladiagrams for Scala are just a couple of examples; they generate an image you can save and glance over after the `cloc` pass.
0.0.1.1: Grep
I use `git grep` on the classes found in the biggest files identified with `cloc`. I can see how they "disseminate" through the project and how they're passed around and used.
0.0.1.2: Tests
I look at unit tests (if any) to get a sense of what the different parts do and how they're used. You learn a lot about "intent" from the tests if you're lucky enough to have them. I add tests as I get acquainted with the project.
If there's a CI config file, I'll check it out: it tells me how to build and deploy the project, and it's often more accurate than the readme, or else the builds would fail.
0.0.2: Musing
I maintain a separate "musing" file, sometimes called "refactoring", where I'll put a comment with the path to a file and the part it's about (say a function, or a class), and rewrite it. I keep it separate in the beginning because I don't know yet whether the refactoring is relevant or whether there's a Chesterton's Fence involved, and I don't want to pollute the repo. It is my way of grappling with the code. Several rewrites, often when I'm in a quiet place and my juices are flowing.
0.1. Taking notes for project:
One of my habits from the beginning was taking notes. I can now pull out my notebooks from my current job and see the chronology, dated by month and in order. I can walk through the different difficulties, conceptualizations, drawings and schematics, sequence diagrams, re-architectures, refactorings, etc.
As you embark on your journey in a project, your fresh eyes are valuable. You're seeing a lot of what has become "normal" for other developers. Note everything. The idiosyncratic build or deployment. Out of date documentation. Everything.
You can create well-written issues that follow a template, with expected and actual behaviors, screenshots, logs, stack traces, possible fixes, likely culprits, etc. If your repo does not have an issue template, propose one.
Write issues in a way to make them easier to fix for everyone. Your group can use your notes to rework an API or the user experience. You're also working to streamline future contributions for newcomers.
0.2. Knowledge Base
As your understanding of the project increases and you touch more and more parts, you can maintain a sort of knowledge base. It is highly likely that this knowledge base will serve as the project documentation if it has none.
1. Attention
1.1 Taking meeting notes:
Many meetings, few actions: everybody ends up talking about the same things, and issues drag on forever. Take notes and review them later. Write them up and send them to everyone. Look into "minutes of meeting" formats.
You can write the gist of what's said, and the action that must be taken, by whom, and when.
Taking these notes will also help mitigate distraction during the meeting, since you can review them at your own rhythm.
# Minutes of Meeting
--------------------
## Date: 2019-03-25
## Place: BIGCorp HQ. City, State.
## Participants:
### BIGCorp:
- John Doe (jd)
- MeMyself I (mmi)
### OtherCorp:
- Dilbert (db@othercorp.com)
- Dogbert (dg@othercorp.com)
## Topics:
- Scheduled information sending
- Information flow
- Architecture for Project X
## Details:
OtherCorp has raised some issues for the timeline of Project X...
Blah blah blah.
## Actions:
### BIGCorp:
- [ ] @bc: Send project X estimates by 2019-03-28
- [x] @bc: Send invoice and cheque for offices remodeling
- [ ] @mmi: Add different schemes for user authentication
- [ ] @mmi: Finalize migrations so we take into account user's timezone
### OtherCorp:
- [ ] @oc: Expose API end points for user's identity verification
- [ ] @oc: Cache the results of the most common queries
2. Skills
Understand the business value for the customer (what value is your code bringing to the customer, what are they trying to achieve).
Daily improvements. Reading good books and implementing what they contain. Writing the best code you can write. Reading the best code you can find. Consistent, continuous, relentless improvement.
If you can be better than you were when the day started, and do that daily, good things may happen.
Constantly transfer the quality gains to the projects you're involved in (better documentation techniques, better patterns, less complexity, etc.)
Help others write better code by setting the example. Mentor newcomers to the project. Help write tools for everyone to use.
Put yourself out of a job. Imagine you are gone: what will the person who replaces you need to know and do? Write that down somewhere. Imagine you lose your memory and have to rediscover your job every day: write a manual for yourself.
A colleague and I had to take over when the co-founder and CTO of a company we had joined a year and a half earlier quit without notice. One day he just called in sick; then he told us he was no longer with us.
We took over in the midst of major ongoing projects in which we had been involved in a technical capacity. We had about ten projects with large organizations at the time, and meetings lined up over the following days to solve bottlenecks. There were no processes, the accounting was ad hoc, everything was in his head. We had to consume thousands of emails, documents, and contracts, and go over everything, everything, to prepare.
I took everything I could off the plate of my colleague, who was an expert in deep learning. The first thing I did was create an "Operator's Manual": a handbook that would enable my colleague, and potentially anyone in the company, to run the company if something were to happen to me.
That was a repository on GitLab that contained everything. I listed all the things that needed to be done: bank transactions, taxes, accounting, etc. and distilled them into a system that allowed anyone to do that.
I used GitLab because it lets us write Markdown and has version control to track changes, even in the wiki, and it's what we already use for code, so we leverage existing expertise instead of adopting a new tool. But the "Job to be Done" is to disseminate information and be truly async; you can use Google Docs if you're more comfortable with that.
Meaning I literally wrote guides on how to pay taxes, with pictures of a cheque, where to put the amount, where to sign, what form to send, where to go to pay taxes including the GPS location of the building, a picture of the building, which floor to go to, which office in that floor, which lane in that office.
Every single time I had to do something, I wrote it down so it could be done by me if I forgot, and someone else if I couldn't do it.
When we onboarded people, I noticed we'd go over the same things, so that was put into a checklist of things to do when onboarding a new member, so you don't forget to create an email address before they come in to work, for example.
Since we kept having practically the same conversations with new hires, that went into an onboarding document explaining our stack, the books to read, why they are good books and what they will teach, which tools to master, what we need them for, how our workflow is structured, etc.
I listed the bank accounts, the companies we were involved with, their bank information, points of contact, etc. I wrote down how to prepare the invoices, created a LaTeX template, and found the laws that regulated all of that. Everything I did was to be put there so someone else could just pick it up and operate the company.
The way I do things is such that I could die without impacting the team. If you get over the morbid nature of that statement, you start seeing a lot of opportunities for things you could automate or make truly async: something left there for others to consult.
One of the problems we worked on was information asymmetry. We made sure everyone had access to all the information needed to make better decisions. I wrote minutes of all the meetings we went to, with all the details, put them on GitLab, and gave access to everyone on the team so anyone could read them whenever they wanted: client meetings, investor meetings, etc.
Here's our template:
# Minutes of Meeting
--------------------
## Date: 2019-03-25
## Place: BIGCorp HQ. City, State.
## Participants:
### BIGCorp:
- John Doe (jd)
- MeMyself I (mmi)
### OtherCorp:
- Dilbert (db@othercorp.com)
- Dogbert (dg@othercorp.com)
## Topics:
- Scheduled information sending
- Information flow
- Architecture for Project X
## Details:
OtherCorp has raised some issues for the timeline of Project X...
Blah blah blah.
## Actions:
### BIGCorp:
- [ ] @bc: Send project X estimates by 2019-03-28
- [x] @bc: Send invoice and cheque for offices remodeling
- [ ] @mmi: Add different schemes for user authentication
- [ ] @mmi: Finalize migrations so we take into account user's timezone
### OtherCorp:
- [ ] @oc: Expose API end points for user's identity verification
- [ ] @oc: Cache the results of the most common queries
Here's an excerpt from the reply about keeping up with information:
====BEGIN===
There are a few tricks:
Todo/Note dichotomy:
--------------------
Doing away with the "todo"/"note" dichotomy can be useful. I use TaskWarrior[0] to add tasks, but also notes. I have them organized into projects and tags such as +musing, +engineering, +read, +watch, etc.
I'm watching an interview of someone relevant and they mention a book?
task add +read "X book mentioned by Y in interview with Z: youtube.com/..."
The dump of these notes formed the seed of our company's knowledge base, and it continues to feed and refine it.
The tasks are in a `.task` directory that is a repo. I have an alias to push them:
function tupd() {
    git -C ~/.task commit -a -m "Update tasks $(whoami)@$(hostname)"
    git -C ~/.task push
}
I can pull my tasks and notes from any of my devices, and search them by tag, by project, by word or regular expression. You can set due dates, start dates, etc., so you set three or four important tasks for the next day and it sorts them by an urgency score. It is one of those tools you can get started with using a single command, yet it can do a lot of things.
If you're working on a project, you have a nice knowledge base consisting of book titles, articles, interviews, talks, remarks, ideas, etc.
I have a script that exports work related notes into Markdown and pushes to a temporary repository for all my colleagues to see. Then we transfer that into proper issues, or our knowledge base.
===END====
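The export script mentioned in the excerpt could look something like this minimal sketch. The schema is an assumption on my part: it expects the JSON that `task export` produces, an array of objects with `description`, `tags`, and `project` fields:

```python
import json

def notes_to_markdown(task_export_json: str, tag: str = "work") -> str:
    """Render tasks carrying a given tag as Markdown bullets, grouped by project."""
    tasks = json.loads(task_export_json)
    by_project: dict[str, list[str]] = {}
    for t in tasks:
        # Only share notes explicitly tagged for work; everything else stays private.
        if tag in t.get("tags", []):
            by_project.setdefault(t.get("project", "inbox"), []).append(t["description"])
    lines = []
    for project in sorted(by_project):
        lines.append(f"## {project}")
        lines += [f"- {d}" for d in by_project[project]]
    return "\n".join(lines)
```

The resulting Markdown can then be committed to the shared repository for colleagues to triage into issues or the knowledge base.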
Again, I use TaskWarrior because it lets me work in the command line, I push things to version control on GitLab, and everything is in plain text; but the "Job to be Done" is to quickly capture insights, knowledge, and tasks, and disseminate them.
For the product we're developing, I regularly send updates to everyone that explain what we're doing, why we're doing it, what we'll do next and why, and why I believe this is the best approach given the constraints/tradeoffs/objectives. Everyone can poke holes in them or chime in with their opinion, from our advisors at a strategy level to my colleagues on technical or product input, in case I missed something. This keeps people clear on things and unlocks individual initiative.
Management is tied to the person, really. Some get it right very, very often. Others need a talk because they veer off.
One other way I look at it is that I'm firing myself. I'm putting myself out of a job as I try to capture everything I can there. This institutionalizes knowledge: eventually anyone can do the job, and I can do something else. When we hired a person to deal with all that, we handed them the handbook and they didn't miss a beat. They knew what to do with practically no training, and they've been augmenting it with the things we have learned: tricky transactions, export procedures, etc.
Here are a few things I wrote in here that could be useful. They are designed to help you remember to do things, do them, learn from doing them, make sure everyone knows what they should be doing, remember why you're doing things in the first place (and whether you ought to be), and do the right things. You can skim the "tech" replies and go over "making the most out of meetings, leveraging your presence", "product development", "giving a damn", "If I disappear, what will happen", etc.
Also, a book I highly recommend is "High Output Management" by Andy Grove.
- https://news.ycombinator.com/item?id=24972611 (about consulting and clients, but you can abstract that as "stakeholders", and understanding the problem your "client", who can be your manager, has.)
- https://news.ycombinator.com/item?id=24209518 (on taking notes. When you're told something, or receive a remark, make sure to make a note and learn from it whether it's a mistake, or a colleague showing you something useful, or a task you must accomplish.. don't be told things twice or worse. Be on the ball and reliable).
- https://news.ycombinator.com/item?id=21427886 (template for taking minutes of meetings to dispatch to the team. Notes are in GitHub/GitLab so the team can access them, especially if they haven't attended).
- https://news.ycombinator.com/item?id=26123017 (fractal communication: communication that can penetrate several layers of management and be relevant to people with different profiles and skillsets)
- https://news.ycombinator.com/item?id=26179539 (remote work, use existing tooling and build our own. Jitsi videos, record everything, give access to everyone so they can reference them and go back to them, meetings once a week or two weeks to align)
I'm a designer-turned-developer and created a nice-looking solitaire card game website: https://online-solitaire.com/. It's making me around $1500 a month.
My approach was to find a niche of apps that were already popular and then see if I could make a better one.
Each section contains a story of some situation he was in where he faced a problem which he solved by applying one of various algo techniques (DP, divide and conquer, etc.). After reading CLRS for a class, it was nice to see how some of the most common textbook algorithms have been applied by a notable computer scientist.
A Coding for Interviews [1] group member mentioned that reading through the Java collections library [2] was the most valuable step he took while preparing for his Google interviews.
In addition to giving you a better understanding of the standard data structures, hearing a candidate say "well, the Java collections library uses this strategy..." is a strong positive signal.
Go download David MacKay's Information Theory, Inference and Learning Algorithms (free book). Go through the part on Bayes and the part on Neural Nets (and the info. theory part if you want to, which is fascinating but not as directly relevant), which is a total of roughly 20-30 chapters, some very short. Do as many exercises as you can do (i.e. try them all, fail and come back later if necessary), and try implementing those algorithms. That will get you boned up on this stuff generally.
From there:
Standard references are Hastie and Tibshirani, which you already have, Pattern Recognition by Duda, Hart, and Stork, and PRML by Chris Bishop (though I found it boring: too many unmotivated equations). All of Statistics, and especially All of Nonparametric Statistics, by Wasserman are excellent books that will fairly rapidly introduce you to large swaths of statistical models. Papoulis (1993) is quite a good reference on statistics in general, and Cover & Thomas is the usual reference of choice for information theory (which is very relevant to what you're interested in), but neither of those is much fun to actually read.
You seem less interested in classification/ML problems and more interested in straight-up stats and/or timeseries stuff. So some slightly deeper references:
- Given your interests you might absolutely love Kevin Murphy's PhD thesis on Dynamic Bayes Nets, which are excellent for describing phenomena in all three fields you mentioned.
- Check out Geoff Hinton's work, especially on deep belief nets (there's a Google tech talk and a lot of papers).
- Hinton and Ghahramani have a tutorial called "Parameter Estimation for Linear Dynamical Systems", which could be directly applicable to the models you're talking about
- If you're interested in these dynamic, causal models you'll want to learn about EM (which you should know already since you know HMMs), and its generalization Variational Bayes. MacKay has a terse chapter on variational inference; http://www.variational-bayes.org/vbpapers.html has more. One of those is an introductory paper by Ghahramani and some others, which is nice.
Some of those references (especially the VB stuff) can get slightly hairy in terms of the maths level required (depending on your background). Bayesian Computation with R (by Jim Albert), or Crawley's R book (for a more frequentist approach), can get you started with R, which saves you from having to implement all this stuff yourself, as much of it is already implemented. This might be your fastest route to writing code that does cool stuff: understand what the algorithm is, use somebody else's implementation, apply it to your own problem.
Build a web scraper and save the raw data somewhere like S3. Then run a job on top of that data to compute aggregated measures and save them somewhere. I built a project[1] like this and learned a lot in the process. I used Airflow to run the scraping tasks, saved the data in S3, used AWS Athena to run queries, and loaded the data into Redshift. I did all of this just to learn more about Airflow and some AWS tools.
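The aggregation step of a pipeline like this can be sketched in a few lines. The schema here is hypothetical, purely for illustration: one JSON file per scraped record, each with a `category` and a `price` field; a real job would read from S3 rather than a local directory:

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate(raw_dir: str) -> dict[str, float]:
    """Average price per category across all raw JSON files (one record per file)."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for path in Path(raw_dir).glob("*.json"):
        record = json.loads(path.read_text())
        acc = totals[record["category"]]
        acc[0] += record["price"]
        acc[1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}
```

In an Airflow setup, a task like this would sit downstream of the scraping tasks and write its output wherever the warehouse load expects it.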
First, I'd recommend being suspicious of survivorship bias and following in anyone's footsteps (including any mentioned here). Relevant xkcd: https://xkcd.com/1827/
After that, I'd recommend the Coursera course Learning How to Learn as pre-work.
3 free resources I highly recommend:
- Harvard's CS50. Cannot recommend this highly enough.
- MIT's Intro to Programming w/ EdX. Great way to learn more about problem solving
- FreeCodeCamp: Great directed resource for a path to learn front end development. There's so much information out there that it's helpful to have some guidance.
Just pick one and force yourself to use it to the exclusion of other editors. Future you will thank you, because you'll still be using it 20 years from now. "We are typists first, programmers second" comes to mind. You need to be able to move chunks of code around, substitute things with regexes, use marks, use editor macros, etc.
https://www.tarsnap.com/download.html How to write C. Study the "meta," that is, the choice of how the codebase is structured and the ruthless attention to detail. Pay attention to how functions are commented, both in the body of the function and in the prototypes. Use doxygen to help you navigate the codebase. Bonus: that'll teach you how to use doxygen to navigate a codebase.
You're not studying Arc to learn Arc. You're studying Arc to learn how to implement Arc. You'll learn the power of anaphoric macros. You'll learn the innards of Racket.
Questions to ask yourself: Why did Racket as a platform make it easier to implement Arc than, say, C/Golang/Ruby/Python? Now pick one of those and ask yourself: what would be required in order to implement Arc on that platform? For example, if you say "C," a partial answer would be "I'd have to write my own garbage collector," whereas for Golang or Lua that wouldn't be the case.
The enlightenment experience you want out of this self-study is realizing that it's very difficult to express the ideas embodied in the Arc codebase any more succinctly without sacrificing its power and flexibility.
Now implement the four 6.824 labs in Arc. No, I'm not kidding. I've done it. It won't take you very long at this point. You'll need to read the RPC section of Golang's standard library and understand how it works, then port those ideas to Arc. Don't worry about making it nice; just make it work. Port the lab's unit tests to Arc, then ensure your Arc version passes those tests. The performance is actually not too bad: the Arc version runs only a few times slower than the Golang version if I remember correctly.
== Matasano crypto challenges ==
http://www.matasano.com/articles/crypto-challenges/ Just trust me on this one. They're cool and fun and funny. If you've ever wanted to figure out how to steal encrypted song lyrics from the 70's, look no further.
== Misc ==
(This isn't programming, just useful or interesting.)
Don't fall in love with studying theory. Practice. Do what you want; do what interests you. Find new things that interest you. Push yourself. Do not identify yourself as "an X programmer," or as anything else. Don't get caught up in debates about what's better; instead explore what's possible.
I have worked at Microsoft, Google and Facebook as a software engineer, going through the full interview process every time.
The thing to realize is that being good at technical interviews (as done by the above companies) is a skill unto itself, but it is a skill that an intelligent person with a comp sci background can get significantly good at after 1 to 2 months of disciplined preparation. I went to a top-ranked school myself and had a comp sci degree, but I was very intimidated by technical interviews until I realized that this was no different from all the other intellectual hurdles/gauntlets I had successfully navigated up to that point by giving myself time to thoroughly prepare.
Get "Elements of Programming Interviews" and give yourself 2 months to prepare. Start with the "1-month" plan in the book, spending at least an hour a day at the very minimum. (I have worked through both Elements of Programming Interviews and Cracking the Coding Interview in their entirety, and while both are good, in my experience Elements of Programming Interviews was clearly the better preparation in terms of technical depth, breadth of exposure to the kinds of questions I faced in the full-day interviews, and succinctness of coding solutions.)
Get dry-erase paper, a notebook, or a whiteboard and work through the problems by hand, including the coding (important!). For the first week or two, give yourself an honest, focused couple of hours to wrestle with a problem before looking at the solution. It is not enough to settle for "I think I know how to solve this": actually code up the solution by hand and step through it with some simple cases. This is important; it lets you develop confidence in your ability to think methodically through a problem and gives you an opportunity to develop mental heuristics for tackling and testing unfamiliar problems. Developing confidence in your ability to think through interview-style problems is every bit as important as exposing yourself to interview-style problems. As you progress, you will work towards being able to deconstruct a problem and start coding a high-confidence solution in 15-20 minutes.
"Talk to yourself" as you try to solve a problem to simulate explaining your thought process to someone as you go along.
When going through the solutions in the book, do not gloss over a detail you do not understand. Go online and find alternative explanations/references if you don't understand some detail of the solution provided.
After a few weeks of this kind of daily disciplined prep, you should start feeling pretty good and your confidence should start building nicely. Lots of interview questions are variants of each other and once you have enough breadth, you start quickly being able to key into the "type" of question and possible solution approaches almost as soon as you hear it.
Last thing is when you feel ready to start doing interviews, do not interview with your "top choice" first. If you can find someone that has done interviews to give you a mock interview, great! If not, schedule interviews whose outcome you are not as attached to (relatively speaking) first.
Start with the MIT linear algebra course (18.06) by Gilbert Strang and Stanford course on linear dynamical systems (EE263) by Stephen Boyd. Then move on to Boyd's course on convex optimization (EE364). Lectures for all of these are on youtube.
Do not try to read any books on "machine learning" (most of which are a total mess) before you have this background or you will just end up hopelessly confused.
I had an interesting experience with triplebyte which wasn't as objectively bad as yours, but it also makes me skeptical of the company.
The first round was multiple-choice questions, relatively straightforward. The second round was a Skype call and felt incredibly subjective. I was asked questions about building out memcached to support arbitrarily sized values, and I got the same "smug" vibe you sensed.
The interview style was very much:
"Him: How would you do X?"
"me: Well that's not a simple problem, there are a lot of solutions each with tradeoffs."
"Him: Okay so name one"
"Me: So you could do X"
"Him: BUT THEN Y [GOTCHA!]"
"Me: Yes, that's one of the tradeoffs of X"
It wasn't clear to me what the heck he was even looking for. Was he hoping I'd list race-condition problems? Had he not even considered race-condition problems? Was he looking for a theoretical solution or a real-world solution? Also he kept going on random tangents ("That brings me to an interesting question, how would you shift a gigabyte of memory 1 bit?"). He seemed very concerned with efficiently bit-packing the header in this problem, which seems silly to me when we're talking about storing gigabytes.
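For what it's worth, that "shift a gigabyte by 1 bit" tangent does have a concrete answer: stream over the buffer and carry each byte's top bit into the byte before it, treating the buffer as one big-endian integer. A sketch in Python for clarity; a real implementation would operate on machine words and work in place rather than allocating a second buffer:

```python
def shift_left_one_bit(buf: bytes) -> bytes:
    """Shift the whole buffer left by one bit, treated as a big-endian integer.

    The bit shifted out of the first byte is dropped, so the result keeps the
    original length, mirroring a fixed-width shift.
    """
    out = bytearray(len(buf))
    carry = 0
    # Walk from the last byte to the first so each byte can pick up the
    # bit its right-hand neighbor shifts out.
    for i in range(len(buf) - 1, -1, -1):
        b = buf[i]
        out[i] = ((b << 1) & 0xFF) | carry
        carry = b >> 7
    return bytes(out)
```

The point is that it's a single linear pass with one bit of carried state, regardless of buffer size.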
My understanding was that Triplebyte was seeking to be the SAT of engineering; however, the SAT does heavy validation, with test-retest reliability and such. I had no particular reason to believe Triplebyte's interview was any more objective than any other company's.
I have learned a lot from reading the source code and watching it develop. It is written in modern Java 8; the authors are obviously experts in the language, the JVM, and the ecosystem. Since it is an MPP SQL engine, performance is very important, and the authors have been able to strike a good balance between performance and clean abstractions.
I have also learned a lot about how to evolve a product. Large features are added iteratively. In my own code I often found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I have seen how large features go from Feature 1.0 -> Feature 1.1 -> Feature 1.2 -> ... -> Feature 2.0 very quickly. This is much more difficult than it sounds: how can I implement 10% of a feature, still have it provide benefits, and still be able to ship it? I have seen how this technique allows code to make it into production quickly, where it is validated and hardened.
In some ways it reminds me of this: https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-.... You shouldn't be asking for a rewrite. Know where you want to go and carefully plan small steps from here to there.
Yes, AI/ML MOOCs teach the corresponding tools well, and the creation of new tools like Keras makes the field much more accessible. The obsolete gatekeeping by the AI/ML elites who say "you can't use AI/ML unless you have a PhD/5 years of research experience" is one of the things I really hate about the industry.
However, contrary to the think pieces that tend to pop up, taking and passing a MOOC doesn't make you an expert in the field (and this applies to most MOOCs, honestly). They're very good for getting an overview of the technology, but nothing beats applying the tools to a real-world, noisy dataset and solving the inevitable little problems that crop up along the way.
I encourage everyone in this thread not to take things personally. Rust vs. Go conversations are going to cause a lot of angst on both sides. I also encourage people not to assume or read into comments: when someone states that "Rust is not a plaything", they are not implying that Go is.
For some people who've spent time with it, Rust is a godsend, but that does not mean they implicitly hate Go in any way.
Go clearly has a lot of people who love it, just as many people love Rust, and because the two overlap, there will constantly be a conversation about which language to pick in cases where it looks like both could be used. As such, the learning curve for Rust will always be an impediment to its adoption, which is sad, because in all my experience it's the only language I've used that inherently addresses every core issue or fundamental bug I've encountered in my career. This is why I'm really excited that the Rust core development team has decided to focus heavily on the development experience this year.
There is also a public tutor [1] based on the course: basically a set of slides and audio (I think taken from an actual course) plus sets of questions to test your understanding of the material. It's interactive, though: you can submit your answers in Scheme and they're tested on the server.
I think it's a discontinued experiment, as there are only 2 tutors. For me, this approach works very, very well; you can read the material, listen to it, and self-test your understanding with basic yes/no answers.
ClickHouse, DuckDB, literally anything GIS related.
Thank you dude.