As a geek, I actually think Excel is one of the few really nice things Microsoft has created (I realize it wasn't the first spreadsheet, but MS gets credit for taking it mainstream).
It's one of the few general-purpose programs that really empowers ordinary users.
Excel is essentially a functional programming environment used by hundreds of millions of people.
I agree with this. Excel is a wonderful product and I reach for it frequently when I need to get stuff done with some data. I tried to use OpenOffice and Apple Numbers and both are really poor by comparison.
Also, I've found that the combination of Perl and Unix command-line tools (cut, sort, uniq etc.) and Excel is really powerful. I can grind some data from log files or other sources into a format that Excel can read and then analyze it in Excel or get charts etc.
Excel and PowerPoint are the only Microsoft products I have on my Mac. Excel for its excellence and PowerPoint because I can exchange presentations with other people who use Powerpoint since none of the other presentation software really shares correctly with PowerPoint.
I had a similar experience, combining Excel (and VBA) with Perl and some other toolsets. I tend not to get too specific on here, but this allowed me to survive a major downsizing and extreme lack of budget while greatly improving accuracy and schedules, and to ride out a series of drastic changes to inputs that were "thrown over the wall" to my department. If I were to mention the dollar amounts being managed through those workbooks, people would shudder.
Excel has its cruft, and depending upon what you're doing you have to learn by experience some of its odder and dimmer (danker?) corners. But, that aside, IJFW. And quickly, and without choking.
That said, if you don't know what you're doing... Well, I had to "correct" a lot of people who assumed that because the machine showed them a number, it was right.
Excel will not enforce correct design, nor thinking.
Which reminds me of some crap dBASE programming I had to clean up, some years prior. And any number of other things.
It's not Excel. It's the people using it, and the people who assign them to use it without accounting for those people's limitations (and their own, apparently).
I have found mistakes in (other people's!) Excel spreadsheets that amounted to (real) 6 figure problems...
Then again a replacement system would probably cost about the same.
I was in a position where I did not have resources for anything else. (Yes, ironic, given the dollar amounts I was handling. But then, welcome to "big business"...)
And, with the changing shit raining down from on high, as well as the need to adapt processes for my own sake and survival, it ended up being for the best, anyway.
Within a limited value of "best". In retrospect, better would have been, ultimately, to be working somewhere else. (Though for a time, the relative autonomy and one very decent direct manager were rather nice.)
Some of the improvement I provided was correcting the outputs of a longstanding legacy system that routinely borked a "random" subset of its data. People had ostensibly looked and been unable to correct this in the original code, and at the time management felt it had no budget to work on this further, at the mainframe level.
So, I guess.... to some extent, it's not the system, it's what you do with it! Old, big dollar legacy project fucked up, and we ended up fixing it on the PC. LOL's aplenty.
I teach high school, and one regular time-sink is going over transcripts with students. Transcripts are textual documents that list the classes students have taken. To figure out which classes students need to take in order to graduate, we need to sit with each student and find the "holes" in their transcript.
I started writing a python script that would analyze a transcript for each student, and generate a visual representation for each student showing which classes they have taken and which they need. I started writing the script, but switched to Excel for maintenance reasons alone. I know that I could write a good script, and would enjoy maintaining it. But I want my contributions to live beyond my time at any one school, so I want new staff and students to be able to use and maintain the tools I create.
I made an Excel document where students enter their classes on one worksheet, and another worksheet generates a visual transcript. [0] It has already improved students' understanding of where they stand academically. I trust that staff, and even some students, can work with this document and keep it part of our school's culture when I move on.
I have new respect for the role that Excel can play in making some organizations more efficient.
Excel is the non-programmer's perl, basically. Simple, gets shit done, woefully, terribly ugly at times, but it doesn't really matter. I'd love to see the non-programmer's Python.
Python: "There should be one-- and preferably only one --obvious way to do it."
Perhaps Excel/Perl*Python wouldn't have any of Excel's organization tools exactly but would instead have only independent tables that could be made larger and smaller as needed.
Edit after reading some comment about databases:
Perhaps this program would be a front-end of sorts for databases or could act as one. That might be unrelated to the Perl/Python difference, but it could be a useful feature.
I feel something like subtextual.org may fit this bill. There's some interesting stuff going on there incorporating the text/cell nature of Excel, but with some more intuition.
Changing the programming language of Excel from VBA to Python will not do anything except increase the number of articles with Python coding on http://thedailywtf.com
Came to the comments page to make sure this sentiment got mentioned.
Excel is one of the nicest, best pieces of Microsoft software that I'm aware of. Well, Excel on Windows, that is. From what I've heard, on Mac it's always been kind of crappy.
There was a time (early on, as I understand more from reading than personal experience), when others exceeded Excel, including in some elegance and capabilities Excel never fully embraced.
But currently, in its domain, it is "the right tool for the job".
What will cause me to move away from it, I speculate, is Microsoft's apparent push to move it and Office to a subscription model. And my speculation about how that will effect one's ability to run it under emulation and so, more or less, perpetually (for continued access to data and models it contains).
I don't want to worry that, in 3 or 5 years or whatever, I will no longer be able to access my workbooks. Or that I will no longer be able to access them without paying perpetually, regardless of whether I want to use Excel for new work.
I have 20+ year old programs that still run fine under emulation. And reasons to return to them and the data they generated. Without paying X dollars/year forever, for the privilege.
Excel is a tool.
The major problem is the dimwits (mainly in suits) that use Excel instead of thinking.
Anyone can make a decision if $A$1 > $B$1 + $C$1.
When an Excel formula is doing the thinking for you, this is a bad sign.
For you as a person.
And for your job that will be shipped some day.
To India (no offense to the nice chaps over there).
Or to some ERP module (no offense to SAP).
Agree on India, but SAP? Nahh, the time they want to do that the guy that put the Excel file in place in the late ninties already had his job shipped to india a couple of years ago. And now there's nobody left who understands what this excel file is doing.
And putting that into SAP is way to expensive, as long as you can still port the Excel file to Office 2010, that is. If you can't, you'd have to ship the company to India...
The simplicity and flexibility of Excel make it great for one-offs and random hack jobs that don't require a new application or full-fledged report. At the same time, it has some minor idiosyncrasies that can be infuriating and make me wary of trusting it in the hands of average end-users who need to enter/analyze important data.
Automatic type conversion is my favorite. I can't even count the number of times I've received Excel spreadsheets where data was completely lost because of it. Leading zeroes at the beginning of your account number? Excel will gladly chop those off for you. Order number looks like a date because they used the year as a prefix? No worries, Excel will change that to a standard date and completely forget the original format.
Maybe I'm just crazy, but I don't think a business-oriented application should favor convenience that much more than data integrity.
This is why I'm continually amazed that Excel is used in business at all. I use a command line program to record my hours and can report the time I've worked on a project to the second. Unfortunately, my Excel-based timesheet sucks at math, and shaves off time when it converts my HH:MM:SS totals to the decimal figures (HH.x) required by payroll. It's so bad, sometimes the values displayed in the columns don't add up to the displayed total if you enter them in a calculator. No sweat, I only get paid for the displayed total, and at the end of the year I compute the difference and add it to my last timesheet.
I can't imagine trusting my business finances to a program that can't deliver a reasonable amount of precision. I wonder how many fortunes are made by people well-placed enough to exploit Excel's weaknesses ("Hmmm, Excel shows that we made only $10,000,000 at our bake sale. What should I do with this leftover $700,000?" or "I can use Excel to show you that I owe you less money than I actually do.").
Let's be clear, this is a type conversion issue for seconds, Excel's not going to start adding up normal numbers strangely.
And the point of this article and the point he's trying to drive home is that it lets non-programmers actually use their computers for computing.
For most businesses a home-brewed excel spreadsheet vs a $50,000 custom program that any of us here wrote?
The harsh reality is that the Excel version written by Jane from accounting who's the Excel whizz or even the smart college temp will probably be better and cheaper than anything we could ever give them.
I do understand what you're saying though, there is a time, place, and trade offs for everything related to computing.
(imho) Most 'businessy' systems can and should start out as spreadsheets, it lets the business people solve the problems with process design, and prove its real-world value without bringing a costly developer in. The developers (like myself) should be there to take the codified mess that results, clean it up, improve usability, bring in more stability, and accountability. - Sadly, that is rarely how things go.
The problem is that by the time they've modified the spreadsheet app to solve issues with process design, workflow, edge cases, it's an unholy mess that nobody wants to touch.
"The harsh reality is that the Excel version written by Jane from accounting who's the Excel whizz or even the smart college temp will probably be better and cheaper than anything we could ever give them."
Too true, but I've always wondered if this is true because visual programming languages have never really been accepted as part of development or because visual programming languages were touted as a replacement for "normal" programming? Maybe it is the bad taste CASE tools left.
Let me restate my thesis because I sense I wasn't clear given your response.
I believe that programmers rejection of visual programming tools has lead to a situation where programmers haven't been able to easily integrate Excel into programmer's workflow in a meaningful way or evolve tools as easy to use as Excel that translate directly to code.
Actually, Excel does add up normal numbers strangely, in that like (that many computer programs) it takes user input in decimal format but stores and operates on it in binary floating point.
Its hardly as if systems that do arithmetic on exact arbitrary-precision decimal (or rational) values and only resort to approximating with binary floating point when an operation requires it haven't been around for decades.
If you pre-format the cell as text, any value input will be stored in its entirety. For one offs, prepending any string with ' will force Excel to store as text.
For date, it typically does not truncate seconds, so not sure what's happening there.
I've run into those type conversion problems before. The data is usually still there, it's normally that Excel has decided to use a certain display format for the cell (such as a date). Simply set the formatting back to "General" and all is fine. Usually.
It's possible I've missed some setting somewhere in the tangled web of menus, but I'm fairly certain the data is lost in the case of both those conversions.
For example: Type 09-2012 into a cell and hit Enter. It will probably turn into Sep-12 (or some other variation depending on your default date settings). If you convert that to Text or General, it turns into 41153 -- the number of days since 1/1/1900.
Excel recognizes all dates from 1/1/1900 through 12/31/9999, so it happily converts anything from 01-1900 to 12-9999 into a date. You can create a function to convert them back, but it would have to be based on the assumption that the format was originally ##-#### (which you may or may not know for certain depending on where the spreadsheet is coming to you from).
Amen to that!! Yes, "Clippy" (Excel) gets a little too enthusiastic at times, but the missing piece here is metadata. So Clippy does what Clippy can without metadata, and the results can be laughable at times.
My fix for that, _if I'm sourcing the data_, is to source it as tab-delimited (not CSV-delimited) from the clipboard into an empty sheet area twice:
With the first paste, I 'visually' correct the columns' incorrectly guessed data types, but Clippy oddly doesn't attempt to reformat the data. No worries, I just select all, hit DELETE, then do my second paste using the same starting cell as my first paste. Clippy doesn't interfere this time, and my data comes up all beautifully typed as I want it. I don't know why, but the DELETE key doesn't kill the data types for cells, and I'm glad it doesn't! Not visible, not obvious, but very useful.
Depending on how you want to use the data, you may be able to get around this problem by adding a ' infront of the number. It won't show on the cell display and acts as an instruction to hold leading zeros.
I absolutely hate Excel. It's not the number crunching that's the problem. Database -> Excel, Excel -> Database issues are almost unavoidable. In 2 months of working with SSIS packages, I encountered every one of these problems (I'm not the original author of this rant):
"Anyone who has worked with a database in a professional capacity for more than 20 minutes should have a list of at least 10 reasons why Excel is a monster. These probably include:
1. The way it butchers postal codes that start with a leading zero, like the town I grew up in (Granby, MA 01033 USA)
2. Dates of any kind
3. Serial numbers that have leading 0's (see #1)
4. The JET database driver for Excel. One large WTF.
5. SQL Server Integration Services Excel datasource. WTF squared.
6. The f-ing "just put an apostrophe" workaround. WTF.
6. a. The equally effective "format as text before you paste" workaround. Gives the illusion of working, only to break later.
7. Save as CSV, then reopen the CSV in Excel. Lots of magical things happen there.
8. While on the topic, CSV files, which are a whole WTF on their own.
The root of all this: Excel makes things that look like tables, and tables are useful for data. There is no other program that is as widespread AND makes things that look like tables, so people use Excel to make tables of data. And it's in fact really, really bad at that. It was designed for ad-hoc numerical analysis and got appropriated as a database loading and reporting tool.
I think it's actually damaged the GNP of whole nations, this Excel program. It'd be interesting to know how badly."
Really, the main problem is that Excel (which has data types) is trying to support CSV (which has no data types) as a pseudo-native format. If Excel forced CSV files through the import wizard, and you could override a data type for each column, it would solve most of the issues. Instead, each column is implicitly treated as Auto and that fails in a lot of cases.
"The way it butchers postal codes that start with a leading zero, like the town I grew up in (Granby, MA 01033 USA)"
Excel, by default, treats any numerical object as a number. Numbers don't have significant leading zeroes. You can change the default data type, i.e. "format", to text or even postal code to preserve leading zeroes.
I find that does not always seem to work. We get CSVs with telephone numbers, and depending on the length, they end up being shown using E notation, even after selecting text formatting.
Why not run the CSV file through a custom made app that "cleans" each line in a proper format that will easily be imported into Excel, taking into account Excel's quirks. The app then spits out another CSV that Excel is able to properly import.
I find this solution a good compromise, although it may not be possible to do it.
Is this what you do? Because if someone sends me a csv file, under your workflow I'd have to open it in a text editor (or a different spreadsheet program) to figure out the data type of each column and run it through a perl script (or awk?). For a few columns, sure. For 200 columns, not ideal. Just to get around using excel's impmort directly. Plus, "taking into account Excel's quirks" is easier said than done.
That said, if you have a script you use that does this, you'd make a lot of friends if you posted a link.
I only do this for automated processes where I know the exact requirements of the CSV and the data is large. I definitely wouldn't do it for ad-hoc type of scenarios.
But for some things it just easier to put out an Excel file. In one scenario I have a set of complext spreadsheets that are updated nightly. I use EPPlus (http://epplus.codeplex.com/) a C# library which updates the appropriate data in the spreadsheet.
In an other scenario, I am taking in transactions (accounts payable/receivable, general ledger,etc.) that are in CSV format, applying some business rules and inserting them into a spreadsheet. This spreadsheet is then used by the accountants to do postbacks to the actual ERP system. I looked at doing this using the ERP's own batch interfaces and couldn't justify the time and expense as Excel was the best way to get the data in.
The above library doesn't require Excel on the machine. With Microsoft moving to an XML format for the file, it's made it much much easier to do these things. This particular library works well as long as you are doing simple data updates. There is no ambiguity in terms of the type of the data as you are able to explicitly state what is stored in the cell.
I would love to know if other library like this exist for Ruby or even Python.
If I'm going to write a custom app in some external programming language to consume and do type-aware processing on the CSV file, I might as well do whatever I analysis I need to do in that app, rather than just using it to reprocess CSV into CSV and then go back to working in Excel.
Its not like any of the languages I would use to do this (Python, Ruby, etc.) don't have easy-to-use libraries to treat the CSV as a data structure and perform analytic operations once I've spent the (minimal) effort to load it and do any basic transformations that I need to do based on the intended semantics of the data in each column.
we have to do cleanup of CSV imports in some areas of LedgerSMB and it isn't pretty.
Additionally the fact that it is even required makes you wonder what sort of monsters are lurking in the shadowy corners only to leap out later when you least expect it.
"Numbers don't have significant leading zeroes" is a rather daring statement. I guess you mean 'the western format of cardinal real numbers includes no leading zeroes,' but that assumes you are speaking of cardinal numbers, which are just some of them. 01033 is as much of a number as 1033.
Sorry for being nitpicky but do not confuse numbers with their representations. You would be amazed at the weird representations humans have used along history.
What a nightmare. Do they not realize how many DB tables start with 'ID'? And CSV is too common a format to just completely ignore. It's not always up to us.
Applies To:
- Microsoft Excel 2004 for Mac
- Microsoft Excel 2001 for Mac
- Microsoft Excel X for Mac
Sure, it sucks that it was there at all, but the most recent mentioned is a 9 year Mac version of the software (which is written by a different team than does the Windows Office anyways).
Yes! This ridiculous bug is still out there and is why I never use 'ID' when designing my schemas. Seriously, why can't reserved words be designed to be so uncommon, you'll never have a conflict? If I see another 'klass' object in Python or email broken because someone started a sentence with 'From' I'm going to cry.
Use LibreOffice calc to import the CSV then save to Excel format. No kidding, Open/LibreOffice gives you (among others) the option to chose the type of each column at import.
> 7. Save as CSV, then reopen the CSV in Excel. Lots of magical things happen there.
Indeed. My favorite one is accounting spreadsheets tending to export accounting numbers as text columns with trailing whitespace. That was off a recent version of excel for the Mac. I understand it's a cute convention for currency formatting but it makes data transformation in a database very, very annoying.
I really like excel, but the sad truth is that outside of the tech world, many many people are incredibly un-tech-savy.
I used to be a contractor for a government agency (that will go unnamed). Granted, it's the government, not the private sector but few people realize that most computers owned by the federal government still run Windows XP. Additionally, the approval process for getting new software usually takes 1-6 months. We're talking about installing something like Google Picasa. Additionally, software updates would have to go through a clearance process, leaving my computer completely vulnerable for weeks at a time while someone (maybe) scrutinized an update to Flash.
It wasn't only the equipment – the sheer lack of ability with computers surprised me. This wasn't an isolated incident – it seemed like everyone from secretaries to managers with PhDs were barely above that scene from Zoolander. Some examples:
We had one "analyst" who had never heard of pivot tables in excel. This is someone whose job it is to analyze massive budgets. They were manually selecting cells to see the count number at the bottom of the Excel window and writing coordinates down on a piece of paper.
After having transferred to Google Applications for 9 months, there were still several people who were surprised to learn that Chrome was a web browser. One asked, "but how do you Google things?"
$1500 videochat system? Forget it, nobody knew how to use it and rarely ever tried.
I think people are starting to wake up to the importance of technology, but I really feel like employers should do more to test their problem solving ability. I am by no means an expert with VB or the more advanced aspects of excel. But my ability to research quickly and solve problems put me miles ahead of everyone else.
>Granted, it's the government, not the private sector but few people realize that most computers owned by the federal government still run Windows XP. Additionally, the approval process for getting new software usually takes 1-6 months.
We're still using IE7 in my government office. Good times...
<off-topic>To all the Font wonks out there who tell me about the importance of the font to establish branding...blah,blah,blah. The font on this site, while cool looking is very hard to read.</off-topic>
<on-topic> Something I tell junior devs when making reports, if the data can't be exported to excel, then it isn't a report. It doesn't matter how cool your filter/sorting capabilities are, how good you make the data look, your charts could be beautiful to behold. If you can't export the data to excel then you haven't done anything of value as far as the custom is concerned, because they will ONLY look at the data if it is in Excel. In many companies, that is the only feature that is used (the export to excel).
That's been my experience too. It actually saves a lot of time figuring out how to make fancy graphs and sort/filter/pivot functions--I just make nice semantic HTML with <table>s of values. The users can point Excel at my URL and it'll automagically turn it into a spreadsheet they already know how to use.
One thing I haven't figured out how to do is have a link on my page that triggers an Excel HTML import operation on the current page.
The last time I've done this I just have a link that sets the content type header to something like text/excel (that's not it exactly i don't think, google for it) and on Windows at least it always worked to open the page in Excel. And if you use an HTML table, excel will just open it as a spreadsheet, including any CSS text formatting you use.
Is that what you meant, or did I answer a different question?
Unfortunately, Excel can't cope with some of the more modern authentication systems being deployed on the web, so it's no longer as easy as providing a URL that spits out an HTML table, if you have to protect sensitive data behind a login (I think it can handle HTTP-Basic, but not anything requiring redirects; this may have changed in recent versions).
ps. you should make the generated script tag do src="//r.office.microsoft.com/r/rlidExcelButton?v=1&kip=1" so the http/https issue in your FAQ goes away
Excel wasn’t the problem at JP Morgan. There was a reckless culture that thumbed its nose at rules, ignored the guidance of review committees, and tried to sweep things under the rug when they got caught. That would have happened whether the models were written in Excel or Ruby.
If anything, Excel promotes transparency in finance by allowing more people to read the "source" (even the bankers). Cutely named forgotten programmes written in J were a more fecund source of problems.
Agreed. They were doing CDS deals. Every bank I've worked at has their models in COM assemblies that can be consumed by Excel. Excel lets the traders value their instruments using the models. The inputs may have been copy-paste but I'd be hard pressed to believe that the risk numbers were generated by hand in Excel unless this was some very exotic instrument which I don't think it was.
Risk numbers aren't generated by Excel but an automated process that uses the same assemblies.
have you tried reverse engineering someone else's non-trivial spreadsheet? a programmer might build a good spreadsheet but the average person builds a nest of lookups and hacked up nested if/then statements. seriously, i think spreadsheets are un-auditable - the only way to double check something is to write your own version and check the output... now you've got 2 problems...
Financial analyst here (and front end dev geek / enthusiast). It’s been interesting to see a number of recent HN articles related to Excel. I work for a Global 500 manufacturing company and can assure that Excel supports a large number of business processes and decisions.
Raw data is often stored in proprietary OLAP data stores which are provide a single version of the “truth”. The financial data is retrieved through the vendor’s Excel add-ins. Analysts can then use Excel’s basic functionality to transform and enrich the data and finally output it in a format suitable to be presented to decision makers.
Having a decent knowledge of web technologies, I’m often frustrated not to have a shiny web app that will automagically show the data in stunning tables and graphs (e.g. d3.js bliss). For me, the main reason we don’t see proper “developer made” applications in large corporations is that they do not allow for quick and fast iteration and adaptations. Here is a very typical situation in my job : A manager bursts into my office to ask the following : “Hey, I know we usually compare our XYZ monthly performance to our prior year performance and to our last forecast. Could you compare add in a comparison between the year end run rate and forecast ? Oh, and could you also a express XYZ as a percentage of ABC, it could be insightful. Thanks ! ... don’t work too late.”
After a couple of Excel ninja moves : job done, manager happy, business decisions made. If the data is wrong, I'm responsible, not the mistyped Excel formulae.
"... I’m often frustrated not to have a shiny web app that will automagically show the data in stunning tables and graphs (e.g. d3.js bliss). " : Are you looking for a web interface for manipulating OLAP data? (something like http://www.tableausoftware.com/olap)?
I've always been a bit perplexed that there isn't a spreadsheet as good as 123 (ie, 1980s technology) as a standard part of Linux distributions. There is a massive blind spot in the open source world around spreadsheets. As far as I can tell it's down to the geeks writing open source tools not being interested in solving business problems and focusing on tools for working with code and manipulating data in text format, not cells. Which is understandable, you write tools for free for your own needs and you get paid to write tools for others. So businesses pay Microsoft for Excel.
I meant like a 123 (or even Visicalc) clone that would run in a terminal. It seems like such a basic thing that has existed on PCs for so long (in fact even predates the IBM PC) so why did it have to wait for Staroffice/Sun to make a GUI version? And there still is no decent text mode spreadsheet that can run over SSH. Even a 1:1 123 clone would be perfectly functional.
My company's part numbers are in the form 00-0000. After enough conversations about how to convert back to this format from the date Excel changes it to, we've finally just decided to change our sku format. A rolling change though, so we'll still be dealing with it till the current 5k sku's are all EOL.
I also hate the scientific notation default, in addition to the leading zero. Guess what, UPC's exist and no one wants them in scientific notation.
But when someone sends you a CSV file (a very common format for database exports and EDI), Excel does the type conversions automatically when you open it. You don't get a chance to change the cell format to Text beforehand. The ' workaround is a huge time-waster if you are dealing with a large amount of data, plus it screws up the file for use outside of Excel.
There really needs to be an option to turn off all type conversion globally for all files in Excel.
The real problem arises when you ask someone to send you data in CSV format. If, in between exporting it from their database and sending it to you, they happened to open and save it in Excel, you will get corrupted data. Usually the sender is blissfully unaware of what Excel's automatic type conversion does to their data.
CSV has been made unreliable as a format for data exchange between companies (aka EDI) largely because Microsoft decided that CSV files should always be opened in Excel by default in Windows. At the very least they should turn automatic type conversion off for CSVs.
Excel ignores double quotes around the fields, it just uses them to escape commas inside the fields.
The official way to do it is to insert a single quote at the beginning of every field that you don't want auto-converted. In practice this is a time-wasting pain in the neck and ruins your data for use outside of Excel.
The problem isn't the lack of workarounds. The problem is that we have all departments - Purchasing, Marketing, Operations/Warehouse, Sales, Finance, IT - working with sku's in spreadsheets. Every single time, every single person needs to know the workaround(s), and as other people are mentioning, you then have to hope it's not somehow saved as something else. And also as mentioned, if someone else has worked with the data before you get it, good luck restoring it to what it was before.
In an old job I had to slice and dice lots and lots and lots of spreadsheets and csv dumps -- almost all one offs, or in ways that were one offs.
Sure I could hack up some scripts to do that work, but almost everytime it was quicker and easier to just use Excel as a handy-dandy swiss army knife to do all kinds of bulk data processing.
It's a stupid good tool that gets you almost dangerously far with a modicum of effort and no additional cost.
Doing the same work any other way would have meant keeping 3 or 4 engineers on staff full-time banging out code and managing databases. I or another guy on my team were able to do everything we needed in less than an hour a day, then load the results into an appropriate analysis tool.
Quite often the appropriate analysis tool was also Excel.
The start of the article infers that the London Whale Trade was caused by manual handling of data between spreadsheets.
The real reason came down to a flaw in the formula they were using. From the JP Morgan report:
... a decision was made to stop using the Basel II.5 model and not to rely on it
for purposes of reporting CIO VaR in the Firm’s first-quarter Form 10-Q.
Following that decision, further errors were discovered in the Basel II.5 model,
including, most significantly, an operational error in the calculation of the
relative changes in hazard rates and correlation estimates.
*Specifically, after subtracting the old rate from the new rate, the spreadsheet
divided by their sum instead of their average, as the modeler had intended.*
This error likely had the effect of muting volatility by a factor of two and
of lowering the VaR.... It also remains unclear when this error was
introduced in the calculation.
The whole concept of VaR is an industry-wide consensus hallucination. Even if you do it right you're doing it wrong.
But to your point, if this formula had been in a single library procedure in version control rather than pasted and repasted into various dingy corners of various spreadsheets, this sort of error would have been less likely. Manual handling of formulas is at least as dangerous as manual handling of data.
I work for a mid-sized financial software company and am involved with dev on a mature, used in production, in-house Excel clone.
Initially I figured this was about the craziest thing possible, but over time I've come to realise the company derives genuine competitive advantage from this system because
1. it allows actuaries to program calculators in a language and environment that they are comfortable with. It is a lot easier to find finance guys that do excel than ones that can seriously program.
2. tbh, excel is often a really good tool for the job because it allows visualization of data as you work. If you work often with projections it kills the alternatives like numpy, matlab etc.
The system is pretty advanced and has been used in production for about 8 or so years. We have an interpretive runtime for use during dev and also a static compiler that generates c++ and creates a shared library per sheet.
Some interesting points about implementing excel:
* Most functional languages do lazy evaluation on the assumption that there's a fair amount of arguments that won't be evaluated. We find that in excel all arguments are almost always used, so lazy evaluation and thunks just add overhead if you use them in all cases. We just have special cases for IF and OR et al.
* Performance is all about cell caching - i.e. memoization - but you only really have performance problems if you want to do root finding monte carlo sims online (we do). We have a dependency tracking system so cached cells are selectively flushed only when a cell they depend on changes.
* the system generates very large amounts of static c++, sometimes hundreds of thousands of lines for one sheet - this can be necessary when the sheet has millions of cells, even though we scan for similar formulas and factor them into single functions to improve spatial locality. MSVC can compile a million line .cpp in about 5 minutes using about 1gb ram - gcc 4.6 would use all the memory on my 8gb machine and swap ad infinitum (but if you split the files it is fine).
I really don't mind people sending me stuff in a spreadsheet, if it's something that warrants such use.
Things like "I’ve had screenshots pasted into Excel and attached to an email. Excel is an ubiquitous file format" mentioned in the article is frustrating as well, as particularly in Japan, there's this weird practice of using it as graphing paper, by making each cell into tiny squares, and use it as free-form word processor alternative. (I'd say, PowerPoint would work better for this -- here's the thing, lowest tier of MS Office in Japan doesn't ship with PowerPoint, ugh.) Some of these misuses are actually harmful -- "graphing paper" usage of Excel causes a lot of trouble when it comes to printing, and long-term maintaining, and there's no document structures in such use.
Here's one thing I don't understand. Geeks love Smalltalk image-based persistence. Geeks hate Excel documents. [1] Don't they have the same problems? How is the program state in a Smalltalk image version-controlled? Is that even a sensible question?
> Here's one thing I don't understand. Geeks love Smalltalk image-based persistence. Geeks hate Excel documents. Don't they have the same problems?
They have neither the same benefits nor the same problems, though they have some overlap in each.
Also both the "Geeks love Smalltalk" and "Geeks hate Excel" generalizations are over-generalizations, and the set of geeks for whom the former is valid are not the same set of geeks for whom the latter is valid (though, again, there is some overlap.)
So, I don't think this even in the ballpark. There are a very high percentage of "geeks" that would have no idea what Smalltalk's image based persistence even is. Excel is like the lingua franca of the financial business world. It is used an abused. The Smalltalk persistence thing is ... it's an odd comparison, let's say. :)
I have worked in both a bioinformatics lab at University and a medical devices company. At both places, I have seen excel used for a HUGE amount of tasks by biologists, business people, software testers (from non-programming backgrounds) and programmers. It was used as a tool, along with perl, php, c/c++... depending on the level of complexity. Its just damn useful and everyone knows how to use it.
I like Excel for what it is supposed to be used for, and it remains the only MS product I use cause it does its job well. But it has limits, which are all too frequently abused.
Very interesting. I wonder what open source solutions there are that could supplant the use of excel by non-programmers (and programmers) in such a variety of fields. Something slightly less opinionated and slightly more malleable.. and just as simple to use. If there isn't something, there probably should be. I think GUI is a must if its going to overtake excel.
"Sorry?" It is wonderful. People make great things and run their businesses on Excel. It has its faults and people are not programmers, but it makes an incredible percentage of businesses run.
Excel is still used because it has High Operating Range with a Low Barrier to Entry. Someone can learn how to do simple tasks in 5 minutes, while advanced users can write macros. You can build a simple calculator or a complicated model without going through a programmers learning curve; where, you've traditionally had to learn multiple concepts before you can print "Hello World".
Sneering at Excel is a sign of someone who doesn't understand the goals of business and in many cases why they're employed. They aren't employed to code nice apps in $language, that's HOW they do their job of making the business run more smoothly.
I've built "apps" in Excel - simple stupid crap for doctors to enter hospital charges, etc. There's no database backend, lookup is "is there already a sheet for this patient?" Creating a new sheet is clicking the shortcut to the template and entering the patient's name, the hospital and the month, saving is closing and accepting the generated file name. Training was minimal, backend is office staff, and it's lasted through 2 separate billing systems. Development was simple form layout, locking cells, adding a few dropdown lists to populate cells, and setting up a couple of button/autoclose vbscript macros.
Cheap, simple, lets doctors capture charges that are worth more in one week than I was paid for the development 5+ years ago.
I came to HN as a distraction from the mind wrenching exercise of compiling 5 different copies of excel documents where everyone has butchered their version in a different, horrifying way. Luckily, this is only used to keep track of procurement data for a billion dollar engineering project... Excel is a powerful, but too easily abused, tool.
An Excel form development tip: select the entire sheet and set the column width to 1. Merge cells that users will be entering in. Lock all non-input cells. Hide empty areas beside/below. Yes, it looks like a printed form. Users understand printed forms.
I used to work for a company that used a series of excel sheets to calculate the price of investment funds. Sure, it was a primitive technique and sometimes a real pain in the ass, but it gave us the flexibility to make adjustments as needed.
While I was there they were beginning to transition to a Microsoft Dynamics based system, which turned out to be a nightmare. Maybe it was a case of bad developers, but the guys working on this system seemed oblivious to the actual mechanics of what they needed to build.
When you’re working with time sensitive data, making a few adjustments in excel rather than logging requests to have some code fixed or updated can make a lot more sense.
I don't have strong feelings about people using Excel to get things done. It's a GUI for data processing that could otherwise be done in the shell. That's fair because I wouldn't expect normal people to learn the amount of programming you'd need to match what you can do in Excel.
The problem I have with it is the same for any data in a proprietary format: It needs to be exported to something else before it can be manipulated. LibreOffice seems to do a good job of decoding .xlsx files so it's usually not too much of an issue. When there's functionality in a spreadsheet that can't be interpreted by an open source equivalent, then it becomes a problem.
I think in the other thread about this, someone posted something to the effect that the spreadsheet with the flawed formula was not questioned or tested because it provided justification for the kind of reckless, short-sighted risk-taking behavior that the decisionmakers at the bank wanted to promote anyway. Spot-on. In organizations like this, the decisionmakers will always massage models and abuse statistics to support the decision they already know they want to make anyway. That's why it's important that they are forced into a structure where testing and oversight are required.
I'm a developer at a fortune 500 company focusing on Business Intelligence. We spent 2 years pumping out report after report, dashboard after dashboard on a new BI web platform. The first question always asked was "How do I get it in Excel?" Luckily after years of hearing this, IT management acquiesced and I am now in the middle of rolling out a data virtualization platform that allows users to connect directly with Excel to "virtual" databases that insulate them from having to write complex SQL. Lesson? You must bow down to Excel.
As someone who works in the financial services industry, I think that all of the MS Office applications are great - much better than the open source alternatives (although I wish this wasn't true).
I also want to be the lone voice here in saying that I also think that Windows is excellent, and is much easier to use than OS X and Linux (and I've used them all) for everything except programming. This is a very unpopular opinion on Hacker News, but in the real world a lot of people are like me, agree with me, and it is worth bearing this in mind.
>VisiCalc inspired Lotus 123, a similar program for IBM PCs. Bosses were much more willing to order a PC for their staff than something built by California hippies. As they used to say, you’ll never get fired for buying IBM.
Is this really true? I don't remember this being a widely held viewpoint but then again I was 10 years old. Back then, normal people didn't know who Steve Jobs was, never mind his damn-hippy ways.
My impression is that Apple II/VisiCalc were used by small/personal business and Lotus 123 was for big corporations/industry. But I might be wrong since I wasn't even born on that period.
Given that perhaps 90% of Excel spreadsheets have errors I fail to see that Excel is a good or useful tool. Nicholas Taleb warns against the use of constructs (e.g., value at risk) that give false confidence to the user. Excel unfortunately appears to be such a tool. I believe most Excel users would deny that _their_ spreadsheets have significant errors notwithstanding strong evidence to the contrary.
Your comment is just darling. "Given that perhaps 90% of Excel spreadsheets have errors ..." -- a completely made up statistic leads to -- "I fail to see that Excel is a good or useful tool." -- a conclusion that doesn't even follow from your made up evidence, even if it were true.
Nicholas Taleb would be rolling over in his grave, if he was dead.
"My refutation of the VAR does not mean that I am against quantitative risk management - having spent all of my adult life as a quantitative trader, I learned the hard way the fails of such methods. I am simply against the application of unseasonned quantitative methods."
Use of a paradigm that has such a high error rate is, at the very least, an "unseasoned quantitative method".
A year ago I graduated and started working in the corporate world. Damn Windows and MS Office all around. I realize now, that at university I was living in a bubble of nice, free software. And if you had a problem you programmed your way out of it. Now I use Excel all the time.
I think the point of the parent comment was not about software being free but by being able, quote, "to program[...] your way out of it". Which I think is an important argument for or against Excel (or many other programs) depending on whether we talk about programming users or non-programming users.
Right. But probably I was just lucky enough never to encounter such kind of corporate world which uses Excel (or any other spread sheet software for that matter) in such weird fashion.
I don't think many programmers think Excel is a bad tool. Excel is a great tool for what it does. But, Excel is not a database. And when an Excel spreadsheet actually becomes your business's application, that's when you have problems (in my opinion).
Doubtless there are more sophisticated ways to do this, but for a quick one-off, manual validation test on tabular data presented by a web page, I scrape it (highlight it) off the web page, hit ^C, paste to a text editor, do some reformatting, then paste to an empty sheet.
Then I do my checksums of the page's presentation in the sheet. Find a bug, go to the SQL in the back end, fix the cause, write a little (manually run) SQL validation test test for that case, then cycle through it all again until the bugs are out and a nice little suite of validation tests in SQL. Really nice to have Excel, where I can point-and-click to write sums.
That is not spreadsheet risk, it's business process risk. You address risk by applying controls.
My old boss was a former labor statistician. His job 40 years ago was basically producing reports by having sets of data tabulated (aka "sending a job to a pool of people with big mechanical calculators") analyzing the data, and sending it somewhere else to be compiled into some report that was shipped to various places. They had people randomly sampling calculations for key or other errors. Other people were sampling the quality of his analysis and yet others were proofreading and double-checking the material prepared for print for typographical errors. The problem there was that building that process required thousands of people and a very rigid procedural setup to ensure consistency.
Computers changed all of that, and ultimately, all of that checking and re-checking was replaced by Excel. But that doesn't mean that you don't but a process around financial activities. You still need checks and balances.
All of these banks made decisions that speed to market for trading was worth the "risk" -- in this case that incompetent or malevolent traders can potentially put the bank out of business. The management accepts this risk because they don't bear ANY downside risk, as the bank is ineffectively regulated corporation. In the days when investment houses were partnerships, there were much tighter controls, as failure of the firm would bankrupt the partners.
Excel has been used and abused past it usefulness in the 'Big (Monkey) Business';
Any tailored software has to go through the constant scrutiny of IT managers and bean counters that don't understand anything and are over impatient while a donkey can 'code' a couple of function in Excel. Excel DOES NOT take into account user abuse and collaboration between them, and I don't blame it. But this is a networked world and not anymore a collection of work station where data is better passed from one user to another one via the mean of a floppy disk.....
The use of tools is reflecting the cerebral activity level of its user base.
Untouchable algorithmic integrity? History tracking? Data not easily corruptible by users? Accuracy in decimal amounts past 12 to 14 significant digits, particularly if fractionals involved? Huge data volumes? Mind-numbing repetitive tasks?
Excel for that? No. Code for that? Yes.
Agility? Quick checksums? Looking for errors? Ad-hoc analysis?
Excel for that? Yes! Code for that? Depends on whether I have to do it again, or if it gets big.
Ad-hoc correlations for equality checking? I'll take a FULL OUTER JOIN over criss-crossing cranky VLOOKUP(...)s any day.
I find I'm using Google Sheets for everything I used to use Excel for. It's easier to script, much easier to share, and if necessary it will export to Excel. I haven't run into anything I need that it won't do.
Of course it wouldn't be ideal for a business with a lot of legacy stuff in Excel.
I recently have the "opportunity" to use Windows and MS office products after 10year or more absence. And Holy Fuck does MS suck at making usable software products. Almost every single UI/UX choice they made is wrong. They ask for confirmation when it's uneeded, they blithely fuck shit up without confirmation when should ask. 4 bazillion tab bars filled with crap. Simple things are hard to find or do, every suggestion / guess is wrong. Things they should guess (such as ',' is the delmeter when importing a file ending with .csv and filled with csv data) they don't. On and on.
Really the most frustrating experience since trying to buy a Nexus 4.
The only reason most people don't suffer this is they've been slowly acclimated to this crap over many years of Office's evolution to below the bottom.
The other MS product I use regularly, xbox and mobile glass or whatever it's called, both also have fucking horrible UI. But half of that is them wanting to force thinly veiled ads and up sells at you.
As Android and iOS go to become the #1 and #2 platforms in the future, I don't see Office remaining that relevant in the future, even if they port it to those platforms.
Applying the "death of the PC" mantra to all things gets old. Things like multi-tasking, having access to a filesystem, embedding different types of documents is a "feature" that is really useful to people doing actual work.
One of my duties a couple of years ago was doing budgeting and rate-setting for a $50M IT business. A rich spreadsheet like Excel was an essential part of the that process, and there is no replacement platform out there that is going to replace that category of app. (You may be able substitute LibreOffice or something.)
"As Android and iOS go to become the #1 and #2 platforms in the future"
+1
And not only that: there's a massive shit to webapps and more and more people are using GMail. The day a (basic) Excel user discovers Google Docs spreadsheet and realize he can share a spreadsheet either read-only or read-write with another GMail user is the day he stops using Excel.
I do certainly see Google spreadsheets gaining lots of traction against SMEs and independent contractors and, horror, I do even know people making very very good looking documents using non-Excel and non-Google spreadsheets on Mac! (heresy for anyone on HN apparently).
Zero Excel spreadsheets here. Tens (if not hundreds) of Google Docs spreadsheets and most weren't created by me but shared with me.
Sorry, David Michael Ross (apparently you love your name enough to show it using big fonts on your site so I put in in full here), {iOS,Android,Linux,Java,OracleDB,...} is Everywhere.
What is TFA's point by using such a linkbait title? My home router (a "gift" from my ISP in exchange of a subscription) is running Linux. My Internet TV decoder is running Linux + Java. My phone runs iOS and my girlfriend's phone runs iOS. We have two Mac computers here (and a Linux one but that isn't common).
You can hardly make an electronic money payment without having Java involved in the process at some point (including to generate COBOL on the fly!).
Hundreds of millions of people (really ?) are using spreadsheets? So what: there are hundreds of millions of people carrying Java smartcard in their pockets daily. There are hundreds of millions of people using cellphones. There are billions of people using a browser daily.
What is the point about spreadsheet? We get it: people need to fill taxes, compute "stuff", etc.
We also understand that the corporate world (representing less than 50% of a country's GDP but being very "big-mouthed") uses Excel.
Just like the corporate world is totally and utterly dominated by solutions like SAP and its army of consultant writing ABAP and Java code to interface with SAP.
Is Microsoft is still dictating the rules of the entire IT game because Excel is a spreadsheet software?
Is that why such linkbaits are posted? Because we like to know that it's possible that companies like Apple and Google (two places where you're probably not seeing a lot of "Excel" compared to the other stuff you'll see the people there working with) can come tomorrow and change the world?
But, no, we should all be in admiration because spreadsheets are used in the "real corporate world" (and because of course we should bow in front of the corporate world, because the only business is in corporate right!?) and because Excel has a huge market share amongst the various spreadsheets software (I do certainly see Google Docs making inroads that said).
Seriously: what's the point!?
What's next!?: "Sorry, nerds, Microsoft Word is everywhere"
Or "Sorry, hackers, Internet Explorer is everywhere"
Or "Sorry, crackes, Microsoft Windows is still present on hundreds of millions of PCs".
Really? What is the point?
That we should have give up programming because every single programming need out there can be filled by a corporate user knowing how to enter an IF/ELSE in a spreadsheet!?
I'm seriously confused by these articles and the fact that people do still upvote the blatant linkbait.
It's one of the few general-purpose programs that really empowers ordinary users.
Excel is essentially a functional programming environment used by hundreds of millions of people.