
>This is only feasible if the program takes no input, or a very limited set.

One of the insights that people making tools for dynamic languages discover over and over again is that most uses of dynamic features are highly static and constrained. In general, yes, a Python program can just do eval(input("enter some python code to execute :) \n>>")), but people mostly don't do this. People use extremely dynamic and extremely flexible constructs and features in very constrained ways. This is like the observation that most utterances in human languages are highly constrained and highly specific: even among syntactically valid and semantically meaningful sentences, not all utterances are equally probable, and some are so improbable as to be essentially irrelevant. People who try to memorize the dictionary never learn the language, because the vast majority of the dictionary is unused, and even the words that are used are only used in a subset of their possible meanings.

>Once you open it up to arbitrary input, no single run (or even a large set of runs) can capture everything the program might be expected to handle.

Anything is better than nothing, right? If your program keeps executing with the same set of types over and over again (and it will, because no program is infinitely generic; the human brains that wrote the code can't reason over infinity in general), wouldn't it be better to record this and make it available at static write-time?

Human brains are finite, so how do we reason over the "infinite" types that every Python program theoretically deals with? We don't! Like I said, most dynamic features are an illusion: there is a very finite set of uses that we have in mind for them. Here is an experiment you might try: the next time you write in a dynamic language, try to observe yourself thinking about the code. In the vast majority of cases, you will find that your brain already has a very specific type in mind for each variable (or else how can you do anything? Even printing the thing requires assuming it has a __repr__ method that doesn't fail).

>How does the type profiler know that the variable that only contained values like "123" or "456" was handling identifiers, not numbers?

It doesn't. I think you misunderstood the idea a little: the type profiler makes no attempt whatsoever at discerning the "meaning" of the data pointed to by variables, it will only record that your variable held strings during runtime. If the number of distinct string values the variable held was small enough, it might attempt to list them, like this: "Str WHERE Str in ["123","456"]". If the number of values was larger than some threshold but some predicate held for all of them, it can use that instead, i.e. "Str WHERE is_numeric(Str)". If a string variable was always tested against a regex before every use, it will notice that and include the regex in the type. Nothing smarter than this is attempted: it's just the info your VM or interpreter already knows, recorded instead of thrown away after each execution.
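
To make the idea concrete, here is a rough sketch of such a recorder in Python. It is only an illustration, not an existing tool: the names profile_types, describe and MAX_LISTED_VALUES are made up for this example, the threshold is arbitrary, and str.isdigit() stands in for the "is_numeric" predicate. The regex case is left out.

```python
import functools
import inspect
from collections import defaultdict

MAX_LISTED_VALUES = 3  # hypothetical threshold for listing literal values

# (function name, parameter name) -> every value seen at runtime
_observed = defaultdict(list)

def profile_types(func):
    """Record the arguments a function is actually called with."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            _observed[(func.__name__, name)].append(value)
        return func(*args, **kwargs)

    return wrapper

def describe(func_name, param):
    """Summarize what was seen; no guessing about what the data 'means'."""
    values = _observed.get((func_name, param), [])
    if not values:
        return "unobserved"
    type_names = sorted({type(v).__name__ for v in values})
    if type_names != ["str"]:
        return " | ".join(type_names)
    distinct = sorted(set(values))
    if len(distinct) <= MAX_LISTED_VALUES:
        return f"Str WHERE Str in {distinct}"
    if all(v.isdigit() for v in distinct):  # stand-in for is_numeric
        return "Str WHERE is_numeric(Str)"
    return "Str"

@profile_types
def lookup(key):
    return key

@profile_types
def parse(token):
    return token

for key in ["123", "456", "123"]:
    lookup(key)
for n in range(10):
    parse(str(n))

print(describe("lookup", "key"))   # few distinct values: listed literally
print(describe("parse", "token"))  # many values, all digits: predicate
```

Note that the summary for lookup only says which strings showed up. Whether they are identifiers or quantities is not something the recorder tries to decide.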

The profiler will not and cannot attempt to understand any "meaning" behind the data, nor does it need to in order to be useful; it's just a (dynamic) type system. No current type system, static or otherwise, attempts to say "'123' is a numeric, would you like to make it an int ?"; that would be painful, absurd in most cases I can think of, and misguided in general.



> wouldn't it be better to record this and make it available at static write-time

I think you misunderstand my position. It's better for the creator to simply specify it when they write the code - i.e. static typing.

> the type profiler makes no attempt whatsoever at discerning the "meaning" of the data

Your examples are waaay beyond what I need or expect. What I need is for the program to recognize that, for a number, 123 and 456 are valid, but "abc" will never be. Conversely, for an identifier, no matter how many runs use values like 123, someday someone might provide the value "abc" and that's ok. Also, any code that attempts to sum up a collection of identifiers should not be runnable, even if the identifiers in question all happen to be numbers.

This is something that static typing provides, and no amount of profiling will ever be able to divine.
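
A rough sketch of what I mean, in Python with type hints. Identifier and total are hypothetical names made up for this example. The assumption is that a static checker such as mypy is run over the code; it should reject sum(ids) before the program ever runs, and because the wrapper defines no addition, Python itself also refuses at runtime.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    """Hypothetical wrapper: an opaque ID that happens to be stored as text."""
    value: str

def total(amounts: list[int]) -> int:
    """Numbers can be summed; the annotation says so explicitly."""
    return sum(amounts)

# "abc" is a perfectly good identifier, even if every earlier run only saw digits.
ids = [Identifier("123"), Identifier("456"), Identifier("abc")]

try:
    sum(ids)  # a type checker should flag this line
    summed = True
except TypeError:
    # Identifier has no __add__/__radd__, so the runtime refuses as well.
    summed = False

print(total([123, 456]))
print(summed)
```

The distinction between the two kinds of value is written down once by the author, instead of being inferred from whichever values happened to flow through.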

> In the vast majority of cases, you will find that your brain already has a very specific type in mind for each variable

Sure, for the code I write. But I've seen plenty of code written by juniors where, upon inspection, it was completely inscrutable whether a given parameter expects an integer, a string, a brick wall or a banana.

Which is all to say that my default position is unchanged: dynamic typing is unhelpful for anything larger than small, single-purpose scripts.


> No current type system, static or otherwise, attempts to say "'123' is a numeric, would you like to make it an int ?"

SQLite's column type affinity will in fact do this: if you tell SQLite that a column is an INTEGER, it will turn '123' into 123, but it will happily take a string like "widget" and just store it as text.
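
This is easy to check with Python's built-in sqlite3 module. A minimal sketch, assuming an ordinary (non-STRICT) table; the table and column names are arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# Both values are bound as text; the column's INTEGER affinity decides storage.
conn.execute("INSERT INTO t VALUES (?)", ("123",))
conn.execute("INSERT INTO t VALUES (?)", ("widget",))

rows = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
print(rows)
conn.close()
```

typeof() reports the storage class SQLite actually used for each row, so the first row comes back as an integer and the second as text.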

I also wanted to add that your thoughts on this subject are well-stated and align with some work I've been patiently chipping away at, in the intersection of gradual typing and unit testing. I'll have more to say on that subject someday...



