Anthropic's 100k context is now available in the web UI (twitter.com/jlowin)
253 points by jlowin on May 15, 2023 | hide | past | favorite | 176 comments


Claude 1.3 with the 100k context blew me away.

I gave it the task of extracting a specific column of information, using just the table header text, from a table inside a PDF, with the text extracted by tesseract and no extra layers on top. (For those who haven't tried extracting tables with OCR: it's a non-trivial problem, and the output is a mess.)

With more than 40k tokens in context, it extracted the data at 100% accuracy.

Changing the prompt to target a different column from the same table worked perfectly as well. Changing a character in the OCR'd table, to test whether it was somehow hallucinating, also resulted in the new value being extracted accurately.

One of those "Jaw to the floor" moments for me.

I did the same task with GPT-4 (limiting the context to its 8k tokens), and it worked, but at ~4x the cost, and without being able to feed it the whole document.
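For anyone curious what that pipeline looks like in practice, here is a minimal sketch. It assumes pdf2image/pytesseract for the OCR step and the 2023-era `anthropic` Python SDK's completion call; the model name, file name and column header are illustrative, not the parent's actual setup:

```
# Sketch: OCR a PDF with tesseract, then ask Claude (100k context) to pull one column.
# Assumes pdf2image, pytesseract and the 2023-era `anthropic` SDK are installed.
import anthropic
import pytesseract
from pdf2image import convert_from_path

pages = convert_from_path("report.pdf")  # rasterize each PDF page
raw_text = "\n".join(pytesseract.image_to_string(p) for p in pages)  # messy OCR output, no cleanup

client = anthropic.Client("YOUR_API_KEY")
prompt = (
    f"{anthropic.HUMAN_PROMPT} Below is raw OCR output of a document containing a table. "
    "Extract every value from the column whose header is 'Net revenue' and return them "
    f"as a JSON list, in row order.\n\n{raw_text}{anthropic.AI_PROMPT}"
)
response = client.completion(
    model="claude-v1.3-100k",                   # 100k-context model as exposed by the API at the time
    prompt=prompt,
    stop_sequences=[anthropic.HUMAN_PROMPT],
    max_tokens_to_sample=1000,
)
print(response["completion"])
```

The point is that the entire raw OCR dump goes into the prompt; there is no table-reconstruction layer in between.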


Using LLMs with 100GB VRAM to convert PDFs to CSVs is truly depressing, but I am sure many companies will love it.

2023 office software already uses 1000x more resources than 1990s software did. I bet we are ready to do that again.


Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.

The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap, just 3 years ago it was very tedious to train a model to solve a single use case. Now they all work.

But if you do make the effort to train a specialised model for a single document type, the narrow model surpasses GPT3.5 and 4.


Consulting companies are paying juniors > $150k per year to do this kind of thing. In some objective sense, it's absurd, but locally, it makes more sense to use an expensive gpu than an MBA class president. And in 10 years, everyone's phone will have that much compute anyway.


It's funny, but React/Node/Electron apps will suddenly become minimalist once everyone and his brother starts adding a neural model that consumes 10GB of VRAM to his app.


You're missing the developer time. You no longer have to spend hours (or days, perhaps weeks depending on the sources) stringing together random libs, munging and cleaning data, testing, etc etc.


I agree, computers are cheaper than engineers.

But I wonder how much more productive our economies could be if everyone was taught programming the same way we teach reading & writing, and open standards were ubiquitous.


> wonder how much more productive our economies could be if everyone was taught programming

Prompt engineering is turning coding problems into language problems. It’s conceivable that humans writing code becomes artisanal in a century.


> humans writing code becomes artisanal in a century.

At the pace we're moving now, we're talking a few decades away at most, well within most people's career span. I feel sorry for any junior coder just entering the industry.


Coding problems have always been language problems


> Coding problems have always been language problems

Pedantically, sure. The field ChatGPT is most impactfully commoditizing is low-level coding. Instead of someone giving natural language instructions to a team of humans, they're increasingly able to give them to an LLM. It's an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.


The field C is most impactfully commoditising is low-level coding. Instead of someone giving opcodes to a CPU, they’re increasingly able to give them to a compiler. It’s an open question how far this can scale. But we may be near the zenith of the practicality of large-scale coding expertise.

Pedantic, maybe, but “coding expertise” isn’t going anywhere.


If you’ve never built PDF or archive document parsing systems, you don’t know true pain.

I see it as incredible. Most PDFs that I see are basically just thin wrappers around image scans of documents that don't exist anywhere anymore. Archives from estates, manuals, etc.

This technique of using LLMs to clean OCR output is game changing, because the previous best in class was human-in-the-loop systems that required huge amounts of rewriting to get usable output.

Now LLMs are unlocking previously difficult data sources for significantly cheaper.


On YouTube there are timer and stopwatch videos with millions of views; people are streaming 1080p video for something that could be implemented locally in 20 lines of code. But does it really matter? It won't make a dent in Google's revenue.

If LLMs are deployed at a large enough scale, the convenience really could justify the cost.
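For what it's worth, the "20 lines of code" part is not an exaggeration; a local countdown timer is a handful of lines of Python (a rough terminal-only sketch):

```
# Minimal local countdown timer: prints the remaining time once per second, then rings the bell.
import sys
import time

def countdown(seconds: int) -> None:
    for remaining in range(seconds, 0, -1):
        mins, secs = divmod(remaining, 60)
        print(f"\r{mins:02d}:{secs:02d}", end="", flush=True)
        time.sleep(1)
    print("\r00:00\a")  # \a rings the terminal bell

if __name__ == "__main__":
    countdown(int(sys.argv[1]) if len(sys.argv) > 1 else 60)  # duration in seconds, default 60
```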


We also had more secretaries and people who just retyped things all day in the '90s!


It's worth double for the increase in accuracy. Don't make me go back to Amazon Mechanical Turk (those poor souls).

https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk


The better version of this is using this massive LLM to _create a program_ that can then extract the same data from similar PDFs. That way the high cost is incurred only once.
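Roughly, that amortization might look like the sketch below: one expensive call asks the model to write a layout-specific parser, which then runs locally for free on every similar PDF. The wrapper, model name, file names, prompt and `parse_invoice` signature are all illustrative, and you would want to review the generated code before running it:

```
import anthropic

def call_llm(prompt: str) -> str:
    # Thin wrapper around the 2023-era anthropic completion API; exact call details may differ.
    client = anthropic.Client("YOUR_API_KEY")
    resp = client.completion(
        model="claude-v1.3-100k",
        prompt=f"{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}",
        stop_sequences=[anthropic.HUMAN_PROMPT],
        max_tokens_to_sample=2000,
    )
    return resp["completion"]

# One-time, expensive step: have the LLM write a parser from one sample document.
# The generated parser then runs locally at zero marginal cost on similar PDFs.
GEN_PROMPT = """Here is raw OCR text from one invoice in our standard layout:

{sample_text}

Write a Python function parse_invoice(text: str) -> dict that extracts the invoice number,
date and line items from documents with this exact layout. Return only the code."""

sample_text = open("sample_invoice.txt").read()               # OCR output of one representative document
generated_code = call_llm(GEN_PROMPT.format(sample_text=sample_text))

with open("generated_parser.py", "w") as f:                   # review before trusting LLM-generated code
    f.write(generated_code)
```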


> text extracted using tesseract

You're saying 'the text' without normalizing the rows and columns (basically the tab, space or newline delimited text with sporadic lines per row) was all you needed to send? I still have to normalize my tables even for GPT-4, I guess because I have weird merged rows and columns that attempt to do grouping info on top of the table data itself.


Exactly. Just sent raw tesseract output, no formatting or "fix the OCR text" step. So the data looked like:

```
col1col2col3\nrow label\tdatapoint1\tdatapoint2...
```

Very messy.

I don't think this generalizes with the same 100% accuracy across any OCR output (they can be _really_ bad). I'm still planning on doing a first pass with a better table OCR system like Textract, Document AI, PaddlePaddle Table, etc., which should improve accuracy.


That’s still super cool!

Yeah, my use cases are in the really bad category. I've been building parsers for a while, and I've basically given up and resorted to manually stating rows-of-interest-if-present logic. Camelot got so close, but I ended up building my own control layer on top of pdfminer.six to accommodate it (I'd recommend Camelot if you're still exploring). It absolutely sucks needing to be so specific out of the gate, but at least the context rarely changes.


What is the source of these nasty docs? I am also working on a layer above pdfminer.six to parse tables. It seems like this task is never done. LLMs have had mixed results for me too. I am focused on documents containing invoices, income statements, etc from the real estate industry.

My email is in my profile if you want to reach out and compare notes!


better - you can do it copy pasting from pdf to gpt on your phone! https://twitter.com/swyx/status/1610247438958481408


Definitely tried that way too; it didn't work. My tables are pretty dang dumb: merged cells, confidence intervals, weird characters in the cell field that change based on the row values and break a simple regex test. It's really a billion-dollar-company solution, but I'm about to punt it to the moon because it's never fully done.


What was the dollar cost to do this work? To iterate over a 40k context must be expensive.


~$0.45


The discourse has made it seem that, with context length, larger is always better. I'm wondering if there is any degradation in the quality of results when the context is scaled this large. Does it scale without loss of performance? Or is there a point where, even though you can fit in a lot more information, performance degrades?


In a brief test, I found that the bigger context window only meant that I could stuff a whole schema into the input. It still hallucinated a value. When I plugged in a call to a vector embedding to only use the top k most "relevant" fields it did exactly what I wanted: https://twitter.com/_cartermp/status/1657037648400117760

YMMV.
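For context, "top k most relevant fields" here means something like: embed each schema field and the question, rank by similarity, and put only the winners into the prompt. A rough sketch using sentence-transformers (the library, model name and field list are illustrative; the parent used a vector-embedding call, not necessarily this exact setup):

```
# Select the k schema fields most relevant to a question, and prompt with only those.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")       # any sentence-embedding model works here

schema_fields = ["customer_name", "billing_address", "order_total", "shipment_date", "tax_rate"]
question = "How much did the customer pay in total?"

field_vecs = model.encode(schema_fields, normalize_embeddings=True)
question_vec = model.encode([question], normalize_embeddings=True)[0]

scores = field_vecs @ question_vec                    # cosine similarity (vectors are unit-normalized)
top_k = [schema_fields[i] for i in np.argsort(-scores)[:3]]

prompt = f"Using only these schema fields: {top_k}, answer the question: {question}"
print(prompt)
```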


The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality, and even a long context window can't fix that. It will remember things from many many tokens ago, but it still doesn't reliably produce passable work.

The combination of a GPT-4-quality model and a long context window will unlock a lot of applications that now rely on somewhat lossy window-prying hacks (i.e. summarizing chunks). But any model quality below that won't move the needle much in terms of what useful work is possible, with the exception of fairly simple summarization and text analysis tasks.


Maybe! I certainly look forward to that. Although in my testing GPT-4 also hallucinates a bit (less than gpt-3.5), and the latency is so poor that it's unworkable for our product.


Agreed. My heuristic is that GPT-4 is good for compile time tasks but bad for runtime tasks for both cost and speed reasons.


> The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality

It really depends on what you use it for.

I've found Claude better than GPT4 and even Claude+ at creative writing.

It also tends to give more comprehensive explanations without additional prompting. So I prefer to have it, rather than GPT3.5 or 4, explain things to me.

It's also free, which is another big win over GPT4.


I find Claude significantly better than 3.5. I’d love to be able to make the case for that with data…


Since Chatbot Arena Leaderboard https://lmsys.org/blog/2023-05-10-leaderboard/ agrees with you, it's not just you.


There are 2 main Claude models. I'm guessing it's claude-v1.3, aka Claude+, that you find much better than 3.5? That tracks if so.


I've found for my use case that both claude-instant-* and claude-* are roughly on par with each other and gpt-3.5. claude-* seems to be the least inaccurate, but we also haven't put it into production like gpt-3.5, so it's hard to say for sure.

In either case, the claude models are very good. I think they'd do fine in a real product. But there's definitely issues that they all have (or that my prompt engineering has).


I am very impressed with the quality of GPT-4, even with the 8k model. However, I have started reaching the limit of what the 8k model can do. I am eagerly awaiting the release of the 32k model.

Claude 100k model is nowhere near in terms of quality in my experience.


Well, a larger context makes it easier to integrate other tools, like a vector database for information retrieval to jam into the context, and the more context, the more potentially relevant information can be added. For models like llama, where context is (usually) max 2K tokens, you're sort of limited as to how much potentially relevant information you can add when doing complex tasks.


Any magic tricks to gaining access apart from waiting for months? I've been using GPT-4 and love it but would really love to test that 100k context window with long running chatbots.


Claude-Instant-100k is available on Poe.com (but only usable as a paying subscriber). Claude-plus-100k isn't up yet but I'm guessing that's a matter of time.


Nice to see Poe is an actual iOS app for AI chat. Using ChatGPT via the Home Screen “app” is extremely frustrating because it logs you out constantly (maybe due to using Google to auth).


This is the reason I primarily use https://labs.kagi.com/fastgpt . I have it bookmarked as a home screen icon on my phone


It does not seem conversational though


I typed >Hello and it is still blinking 2 minutes later


Note it is a search engine, not a chat bot.


If you’re using google login, use a chrome shortcut.

Should keep you logged in for longer and easier to log back in.


what is a chrome shortcut?


I don't have any evidence but I think it's probably done on purpose to make amateur automated free ChatGPT use more annoying.


But I have plus :(


Every other time I switch back to the ChatGPT tab it requires a re-login. That's bad UX.

Also, there is no way to search the history. The sidebar only shows titles, not contents. I have to click each one to see what’s inside. I can’t scroll much because it loads more only when I click. I ended up exporting the conversations and converting JSON to txt.

Another issue: editing a long past message makes it scroll up and hide the cursor if the message is longer than one screen. I have to type in another editor and then copy&paste the whole text. The typing experience is poor.


I use Google to auth on mobile Firefox and I don't get logged out constantly.


Perhaps. I don't have those issues from the direct account I have with them.


Any timeframe when it will be released to the public?

We are in the middle of developing an app, and we are not able to do it with OpenAI's limited context window. We already submitted the access request.


There are tricks you can do to better utilize the smaller context window, such as sub-summaries and attention tricks. That's how there are already products on the market that consume entire big PDFs and let you query them. Granted, a larger context window would still work better, but it's possible to do.


It's using "overlapping chunking" methods, and that usually works for generic PDFs. It really falls apart on technical documents, SOPs and research articles where you need context from chunks way above. Using vector DBs also doesn't work well because you have to twiddle around with the window size / overlap, which changes depending on what kind of paper you're uploading. It's a mess and takes too long.
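For readers who haven't seen it, overlapping chunking itself is trivial; the fiddly part the parent describes is picking the window and overlap. A minimal character-based sketch (real pipelines usually split on tokens and tune both numbers per document type):

```
# Split a long document into overlapping chunks so sentences at a boundary
# still appear intact in at least one chunk.
def chunk_text(text: str, window: int = 2000, overlap: int = 200) -> list[str]:
    step = window - overlap
    return [text[i:i + window] for i in range(0, max(len(text) - overlap, 1), step)]

# Each chunk is then embedded or summarized independently; answers are assembled
# from the chunks that the retriever (or a sub-summary pass) deems relevant.
chunks = chunk_text(open("paper.txt").read())
print(len(chunks), "chunks")
```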


The problem is that making a summary of a 100k-token text costs $2 using Davinci (100k tokens at $0.02 per 1k tokens).


What are the commercial applications of mega context window LLMs at current prices? I would guess mainly legal. And what strategies would you rely on to reduce the accumulating costs over the course of a session?


I don't understand this "slow rollout" thing from OpenAI's competition. The chat / instruction models are continuously fine-tuned on real dialogues. To get these dialogues en masse, you need to deploy models to the wide public. Otherwise, you will forever be on the losing side if you can't quickly grab the streams of real-time human-generated content.

People at OpenAI are smart; they understood that quickly. GPT-4 is available nearly everywhere, and lesser models are even free for anyone to use. This required hiring huge teams of moderators, but we are at the land-grab stage; everyone in the business needs to move fast and break a lot of things. However, GPT-4 and open source models are the only things I can use. Bard "is not available in my country" (Switzerland), and the first thing the Claude access form asks is whether I am based in the US.

Well, their loss.


It's probably the GPUs, they don't have enough capacity to handle more users. My guess is that GPT4 set off a buying spree. Even for CPUs, I've recently heard lead times for Sapphire Rapids servers are 2-3 months, high end switches 6 months, and those probably have way less demand.


I think it's cloud limitations. Anthropic probably doesn't have the ability to scale up extremely fast, and accommodating hundreds of millions of users probably isn't as easy for them as it is for OpenAI.


If they are resource constrained and opened up the floodgates, resulting in poor performance and timeouts for every user, it seems like it would sour more milk than otherwise.


Is Bard still unavailable?

It was unavailable in Australia until last week, but was made more widely available at Google I/O.

It's pretty good, too!


Still unavailable here in Switzerland.


New to ML here, what’s the difference between parameters and context?


Parameters are like the number of neurons in your brain.

Context is how much short-term memory you can retain at any one time (think how many cards you can remember the order of in a deck of cards).


Parameters - the number of internal variables/weights in the model.

Context - the length of the input/output buffer (the number of input/output tokens possible).


Other answers are already good, just offering yet another difference.

Parameters are set indirectly via training; they are the weights of the model itself.

Context is what you as a user pass to the model when you're using it; it decides how much text you can actually pass in.

Being able to pass more context means you can (hopefully) make it understand more things that weren't part of the initial training.
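As a toy illustration of the distinction (the numbers are made up, and PyTorch is only used here for counting weights): parameters are the trained weights inside the model, while the context window is just a limit on how many tokens you can feed it per call.

```
# Toy illustration: parameters vs. context window.
import torch.nn as nn

context_window = 2048  # max tokens accepted per request: an input-buffer size, not learned
encoder = nn.TransformerEncoder(  # the weights inside these layers are the "parameters"
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
)

n_params = sum(p.numel() for p in encoder.parameters())
print(f"{n_params:,} parameters (fixed by training), "
      f"context window = {context_window} tokens (how much you can pass per call)")
```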


POC or STFU

We can't assess how good it is if it's in closed beta. It's all cherry-picked Twitter.


It's also available here on Google Colab: https://twitter.com/gpt_index/status/1657757847965380610?s=4...


No, you still need to bring your own API key for that.


Is there a trick to getting access? I’ve been on the waitlist for GPT-4 and Claude for a while. Been building some proof of concepts with GPT-3.5 but having better models would be a huge help.


If you're referring to a paid account, I never received a notification about my GPT-4 waitlist spot. I waited a while for one, and then, at the prompting of a colleague, I just found a spot in the web UI to sign up. After one false start, it just worked.


Try going through poe.com. I got access right away.


Also available on poe.com


Great domain. What is the pricing?


$20/month for 1000 queries if I remember correctly


Sharing that this is available on Poe.com from Quora.


The 100k context was originally released only via API, but I just noticed that it's now available in the Claude web UI.


What is the URL of Claude web UI? I somehow cannot find it.



console.anthropic.com


I requested access when it was released.

Other HN readers, how many days did it take you from requesting access to Claude to having API access? I didn't use it prior to 100K so I don't have an existing API account.


Requested access way before 100k and still haven't gotten in.


Same here. Been waiting for a couple of months now.


Yeah me too, waiting patiently as context windows are our biggest blocker on more complex chemistry simulations


Interesting use case, would you be open to sharing more information on how you're using LLMs for chemistry simulations?


Not the person you responded to but these two interesting papers kind of tackle that.

https://arxiv.org/abs/2304.05376

https://arxiv.org/abs/2304.05332


Been a couple of months for me as well. I actually forgot about `claude` and have just been using OpenAI's API instead.


Could you send me an email? I've liked a few of your comments, want to say hi over email. Email in profile.


Creepy


Can someone else chime in and let me know whether they agree? Seems like the equivalent of a twitter DM to me, but maybe I'm out of touch.


Don't think it's particularly creepy and I did send one like you asked, but my email is in my GitHub anyway and not particularly hard to find.

Generally, some might not feel comfortable letting strangers know their email, especially considering this is a site that encourages anonymity. Some might not appreciate doing so publicly either.


I've tried to google the person you replied to, and they seem to have many social/online media profiles that allow direct contact. In that case I think publicly reaching out isn't the best way to go and seems out of place, imo.


Good call, I didn't think to do that - thanks


Some people, the kind of people who use the word cringe unironically, live in a world where other people look at them and judge them all the time, and they care about what these strangers think and will mold their personality and behaviour to avoid this. They stand as a warning to others not to be like that.


I disagree that it's creepy. It's more just unusual. But people on HN are quick to judge the slightest thing. I think being a programmer does that to one's brain, unfortunately.


Not creepy at all, although I'd spend 5 minutes checking if I can find the person on Google and then message them via different channels.

If not, I'd leave a way for contacting me first to make it easier for them.

The way I handle these situations: https://sonnet.io/posts/hi


I think this is the unusual part: "I've liked a few of your comments". You can directly ask for email/contact; there is nothing wrong with that.


I think it's pretty inappropriate. If you have a legit reason to reach out, then you can find a way to do it privately. Letting your private intentions leak into public forums is a bad look and a red flag. If I were the person you are replying to, I'd do my best to not interact with you on the basis of your comment.


If I were a human being reading your comment I would infer that you were highly judgmental and thought other people were mostly like you, looking for an excuse to be hostile and dismissive. Thankfully I know that most people are at worst indifferent and there’s a large very friendly, helpful minority and even more who will do small favours out of kindness. The more we make it clear that most people are not like you the more we make the world a better place.


The Internet is full of individuals with weird intentions. I don't at all see how being conservative in the kind of interactions one allows for is a bad idea.


How? There’s no PM feature on HN. This is the only way if the username is unique enough.


Tough luck then I guess? I suppose I don't see the need to have access to every individual on a private basis merely because they comment somewhere on the internet. If they welcomed private interactions, then they would indicate a means of contact in their profile.


I mean, if the person being contacted doesn’t want to be contacted privately, they’re free to ignore the request. No one’s saying they “need” access or that someone else is fully obligated to talk to them privately.


Just reminding you that the commenter asked for others to offer their take on whether or not the request was perceived to be creepy. I didn't go out of my way to offer unsolicited commentary on this.

If you don't want to hear that you are wearing an ugly shirt, don't ask an entire room full of people if your shirt is ugly.


Ok, but you haven't explained _why_ this would be inappropriate. If there's no other way to contact somebody, I'll reach out with a comment where I can. I did this on Twitter once. The worst that can happen is that I get no answer, but I don't see any creepiness.

I suppose most people on HN don’t include a public address in their profile because it’s not required (not even your email is required), not because they don’t want any direct interaction.


it's fine. I second trying to find a clearly publicized contact channel first, but it's fine and leaves it to the person to reach out or not. (If they don't leave it at that though)


This is not creepy at all. Sometimes people can reach out because they genuinely want to have a nice conversation.


Randomly gained access long after I had forgotten I signed up, maybe 3 or 4 months


I requested access on March 14th or 15th and got it on March 20th.


Did you fill in the form with super compelling use case or something?


did any of you get a confirmation mail or something?


Nope. Nothing. I’ve been waiting since they released it. Part of me thinks it might be because I responded yes to “Outside of the US”.


Thanks. I'm non-US, too, but really can't remember when I initially requested. Hope I didn't bump myself down by doing it twice.


Is there a place I can track all releases, announcements, and invite links?


This is the world we are entering of "commercial AI" rather than public, peer reviewed AI. No benchmarks. No discussion of pros and cons. No careful comparison with state of the art. Just big numbers and big announcements.


They released the product to the public… we might not have formal academic studies, but millions of people trying it and determining its utility vs the competition is as good a test as any.

If pushing the context window turns out not to be the right approach, it's not like there won't be 10 other companies chomping at the bit to prove them wrong with their own hypotheses. And it's entirely possible there are multiple correct answers for different use cases.


> millions of people trying it and determining its utility vs the competition is as good a test as any.

Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?

It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you huggingface and lm-sys!) has been very noticeable.

You will get people claiming that LLaMa outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.


Biological evolution doesn’t do any special testing except reward whatever survives. And it works fine. Marketplaces implement the same algorithm faster and effectively.

There are many ways to find truth besides math and science.

Obviously, those two are the gold standard for difficult questions.

But when time is short (competitors at your heels), rewards are fast (lots of hype fueling prospective customers), and the tech isn’t even that hard (deep learning isn’t rocket science, lots of good ideas are panning out), then any organization that needs to acquire its own resources to survive should operate on a try-evaluate-ship loop as fast as they can.

Occasional missteps won’t be nearly as fatal as being slow and irrelevant.


No silver platter! You can even apply the same arguments for the Linux kernel. Where's the double blind peer review for linux 6.3.2????


Yeah, it's a weird comment to call it not "public, peer reviewed" when this article is about how it went public, giving people the opportunity to review it.


If I started selling a previously unknown cancer treatment over-the-counter in CVS, people would be justified in calling it not peer-reviewed, untested, etc. even if it is available to the public (giving people the opportunity to try it).


It could also end up like the transition to digital cameras and megapixels, with companies adding more and more context just because consumers' minds are already imprinted with the idea that more is better. So in a few years we might have models with a window of 30 megatokens and it'll mean absolutely nothing.


What public? I've been waiting for weeks to try...


It has moved to hyper-scale engineering over the last few years. The science of their engineering is still progressing (e.g. LoRA is open science), and it seems like whatever these companies are adding is not something fundamentally new (considering the success of LLaMA and the recent Google memo that admits they have no moat).

And the various "Model cards" are not really in depth research but rather cursory looks at model outputs. Even the benchmarks are mostly based on standard tests designed for humans, which is not a valid way to evaluate an AI. In any case, these companies care more for the public perception of their model so they tended to release evaluations of its political-sensitivity. But that's not necessary the most interesting thing about those models nor particularly valuable science


Your comment reads to me (someone in the field) like it is informed just by reading popular articles on the topic since 2022. The "Google memo" should basically have no impact on how you are thinking about these things, imo.

The field is taking massive steps backward in just the last year when it comes to open science.

> And the various "Model cards" are not really in depth research but rather cursory looks at model output

Because they are no longer releasing any details! Not because there hasn't been any progress in the last year.


I'm sure they'd love to have good benchmarks, but there aren't any and realistically if Anthropic invented their own no one would trust it.



The existence of commercial products doesn't eliminate researchers' ability to publish work. Also, users are smart. ML-powered search has existed for many years, with users voting with their feet based on black boxes and "big numbers and big announcements".


Did you work in this field before?

I keep seeing comments like this, but the impact in the last year on open research has been absolutely massive and negative.

The fact that these big industrial research labs have all collectively decided to take a step back from publishing anything with technical details or evaluation is bad.


I agree it is bad for researchers, but I think you should consider "comments like this" are coming from users.

AI was a highly unusual field in terms of sharing latest research. Car companies don't share their latest engine research with each other. Car users are happy with Consumer Reports, and researchers shouting about how the degradation of the Journal of Engine Research is massive and negative will fall on deaf ears.


It's hard to engage in motte & bailey style conversations with different commentators.

The original GP was saying there was little impact on research. Your comment is a retreat to a more defensible position that I don't have an opinion on.


I didn't say anything about impact on research. I said that research can continue in parallel to commercial enterprises, and end users of commercial products don't need research papers with benchmarks to know what is better.


That's the main reason I don't want a new car. It would take a $20M audit to assess a new car for potentially catastrophic software defects, and almost all new cars would almost certainly fail the audit.


Nice try, OpenAI.


He works for Meta.


Where can I actually physically use it? Or is it again only limited to chosen ones?


Is it useful?


You mean the Claude bot in general? For me, yes, I use it daily, and compared to GPT it answers more quickly, is friendlier and in general is less woke. I use GPT-4 as a fallback when I need more reasoning capabilities; there GPT-4 is better. To sum it up, if you find GPT-3.5 and 4 useful, then yes, Claude is useful as well.


Out of curiosity, what do you mean by "less woke"? Does it frequently insult minorities or make racist remarks?

Edit: To clarify, I was mostly interested in examples and side by side comparisons to better understand what OP meant, not political discussions.


Coy, but obviously he means not permeated with american pop-culture progressive politics, censor happy authoritarianism with an aura of smug do-goodery.


I'm not sure if I'm supposed to be gleaning information from your comment, but personally I didn't gain any new knowledge about 'woke AI.'


He asked what was meant by "less woke" in regard to AI, and GPT has an insane bias towards progressive American politics and actively censors/declines to answer things that would cause it to depart from that political persona. My previous comment was calling him coy because in 2023 pretending that 'woke' just means 'anyone that doesn't hate minorities' is an absolute joke.


Woke just means "not white supremacist", so OP's complaint is incoherent nonsense.


I'm not them, and I don't think "woke" is the right term, but I've noticed certain "themes" inappropriately appearing in answers. Right after the release of ChatGPT 3, the marginalization of certain groups would show up in answers to questions that weren't related. I saw many examples on Twitter, but my personal one was in the answer to "Why are pencils bad?". This one has been "corrected" since release, as far as I can tell, but I also don't ask it questions where this theme could show up.

Now, I only notice green energy/environmental issues that show up in odd places (mostly in GPT 3), and the "moral of the story" always being the same "everyone works together". I see this happen when "creativity" is attempted, where it's free to make up the context (story, wishes, etc).

Outside of possible definitions of the elusive "woke", the "As a language model, I" type responses are the most limiting, and usually absolute nonsense, with an ever increasing number of disclaimers found in answers. For example, "Write some hypothetical python 4 code that sends a message over the network". Some pretty heavy "jailbreaking" is needed to make it work.

ChatGPT-4 used to handle this much better, but I think the "corrections" are stacking deeply enough that it no longer has the "resolution" left to see where answers can be given without them.

It would be nice if there were a "standard" theme of questions where we could measure progression, and compare, to know. Most times these observation or questions come up, someone is very quick to say "racism" or the like.


I tried to find more about your "Why are pencils bad?" example, but the only thing that comes up in search is your comment. Could you recount what it was?

FWIW one example of distorted guardrails getting in the way that I personally ran into was when GPT-4 consistently refused to "promote" Satanism, which leaked over to tasks such as writing black metal lyrics (if you specifically asked for Satanic black metal). What made it especially egregious is that it would happily promote e.g. the Moonies. However, I wouldn't exactly describe that behavior as "woke".


I asked it why pencils were bad, and one of the reasons was that they can disadvantage minorities due to lack of accessibility in the classroom. I was surprised by this, so probed a bit. I started three new sessions and asked a question in each:

"Why do pencils disadvantage minorities." And it gave a details answer about lack of accessibility.

"Why do pencils disadvantage people of color" and it gave roughly the same

"Why do pencils disadvantage white people" and it said pencils a a writing utensils, and can't inherently disadvantage any group.

I don't see these blatant problems anymore, but I also don't have much interest in looking. The only reason I did then was because it was so out of place.

Here's some evidence, by others, showing some bias: https://news.ycombinator.com/item?id=35952528

From the Lex Fridman interview, it sounds like effort is being put into this, and there's an understanding that people don't want a "neutral" client; they want something that is adjustable, usually matching their own views.


> I see this happen when "creativity" is attempted, where it's free to make up the context (story, wishes, etc).

Meanwhile GPT just gave me a story involving a royal family where the oldest Prince killed his father (the king), married his younger sister, got her pregnant, she had a baby, then he killed his younger sister, then he was killed by another member of the royal court, who decided to act as regent until the baby came of age.

GPT is perfectly capable of writing dark scary horrible things if you ask it to.


> GPT is perfectly capable of writing dark scary horrible things if you ask it to.

I see the environment/good ending stories where it's free to make up the context (story, wishes, etc). Did you guide it?

If you try hard enough, you can get around most anything, but some baseline exists. It's the increasing effort that is the problem for me. For your example, use the word "incest" directly, and you'll get the beginning of a disclaimer. Add "child murder" and it starts to fall apart. At least with GPT-3.5.



Another person addicted to using the word "woke".... sigh


The vast majority of AI tools are vaporware mock-ups …

Adobe Firefly is the best example of "just ship a mock-up of the feature" AI marketing.


Firefly has some genuinely cool shit in it (their text treatments are pretty neat), but overall quality is dramatically lacking because they only train on images they have explicit rights to.


Of course Adobe put out crap but Claude is a real product, not vaporware...


Neither of them put out "crap"


Adobe isn't an AI company, so it stands to reason that the AI product they put out is crap. Photoshop and their other products, while not "crap", are certainly overpriced relative to open-source competitors.


I'm with you if you're talking about Krita; GIMP doesn't even register as a blip if you are comparing it with Photoshop. Not open source, but photopea.com and Affinity are the only ones that come very close.


Bad take


[flagged]


"Eschew flamebait. Avoid generic tangents."

https://news.ycombinator.com/newsguidelines.html

We detached this subthread from https://news.ycombinator.com/item?id=35950515 and marked it off topic.


I think it's failing to articulate a correct position; you shouldn't assume wokeness is the only reason people argue against racial IQ studies. There are studies reporting a standard deviation, but there are a lot of problems with existing studies even if you agree with the idea of IQ generally (which is also highly contested). One of the biggest IQ studies for African countries relied on IQ measurements from people who didn't even live there. There's also a big reliance on twin studies to prove IQ heritability, but it turns out a lot of these "raised apart" twins lived extremely close together, in some cases literally next door. And a lot of the researchers refuse to disclose their actual data so people can verify the statistics, while at the same time getting their funding from known supremacist sources.

It's very, very dubious, and the people proclaiming that it's "uncontested" or "very well accepted in psychology" use half-truths to prop up their position. E.g. it's well accepted for its original purpose of distinguishing people with brain damage from those without; in other words, it's accurate for making distinctions at the very bottom of the distribution, but at the upper end the correlations people use to argue IQ is a legitimate measure break down, e.g. higher IQ starts to correlate with less income.

If you genuinely want to learn more about this you can find lots of sources and analysis here: https://twitter.com/DialecticBio


The critique of the 'raised apart' twin studies as 'they were not as far apart as you think' is not actually that strong, given that the results replicate: they still hold when you eliminate these populations, and the effect size is way too large to be explained by some 'raised apart' twins living close together.

The better critique is that a lot of what you are actually measuring is maternal womb conditions, i.e. placental sharing, which can have a massive impact. The jump from within-family twin studies to interracial genetic IQ differences is also not a well-justified one.


I mean, evolution is also "highly contested". The controversy surrounding "the idea of IQ" is as interesting as the one around evolution; in other words, not at all. Scientifically, it is a closed case. Being highly contested is no excuse for Bard to spread misinformation.


Yeah, I can see how this bot's inability to speculate about how black people are less intelligent than white people could really impact GP's daily work.


Be careful with conflating the different meanings of "IQ". There are (1) IQ tests taken after adolescence, which plenty of folks consider new-age nonsense (they have useful correlations with some mental tasks, but it is not clear whether they deserve a name as fundamental as "IQ"), and (2) various tests given at young pre-adolescent ages, which are quite a bit more interesting when trying to distinguish nature from nurture.

The gap you are referring to, is it about (1) or about (2)? The OpenAI model might be talking about 2.


Just not an accurate recounting of the science around this at all.


You are referring to my comment or the parent of my comment? Either way, it would be valuable for me if you can summarize it better!


> The gap is about 1 standard deviation

Do you have any studies to link?



[flagged]


Cremieux is the pen name of reddit user u/TrannyPornO; just read some of his comments.


This is not a study. It's a poorly backed/argued opinion piece.


[flagged]


Because I've seen the post before, lol. It's been on the internet for a couple of years.


[flagged]


How does that make sense? You read something and then you see it's poorly argued. I'm not a magician.

I don't care if people read that lol. I don't even really care if they believe the nonsense he's spouting. I reckon people like that will always exist.

I'm just telling you that that's not a study. You say you had a study and then you link an opinion piece.



The current social indicators of race make sense. As you'd expect, since African Americans have about 20% European admixture, their IQs are in between those of whites and Africans. Also, the Flynn effect is most likely not a real gain in intelligence: http://iapsych.com/articles/pietschnig2015.pdf


If you're asserting that intelligence has a genetic component tied to race, the burden is on you to demonstrate that connection.

You would need to demonstrate that:

"Race" can be defined in a way that has consistent significance (our current social indicators of race make no sense biologically)

that intelligence is consistently heritable within those racial categories

that genetics are the source of that heritability to the exclusion of other factors

It’s not enough to simply wave your hand to say “they do roughly classify people with similar ancestry together.”

What we do know is that IQ differences correlate strongly with factors totally unrelated to genetics. Look just at the results of IQ studies within Europe - https://i.imgur.com/IcHt0tu.jpg That data is actually pulled from a book that argues in favor of a genetic element to intelligence affecting national wealth, but at a national level instead of a racial one - https://www.researchgate.net/profile/Richard_Lynn3/publicati...

The differences the authors find between nations are wildly large. Do you really think that East Germans were nearly 10 IQ points dumber by genetics than the West Germans in 1968-70, or that the Israelis got dumber between 1975 and 1989?

Europeans cluster with Middle Easterners and Central Asians - https://science.sciencemag.org/content/sci/324/5930/1035/F4.... but the latter groups have universally low IQ, mostly under 90. Palestinians only average 85 - https://www.sciencedirect.com/science/article/abs/pii/S01602... even though they're genetically the same as Mediterraneans, who average as much as 102 (Italy). Why define "white" as "European only" when Arabs, Central Asians, South Asians and North Africans have the same shared mutual ancestry? How is IQ primarily inherited and not environmental when non-European Caucasians have uniformly low IQ relative to Europeans?

I'd also love for you to explain how IQ has been consistently going up over the last 100 years across the West. That's like 4 generations, not anywhere near enough time for natural selection to kick in.

Those types of results show up time and time again in IQ studies. Whatever genetic component there is to IQ is less important than the environmental component, AND the genetic element varies so wildly even within homogeneous populations that talking about larger constructed population categories like "race" doesn't actually say anything useful.


Is that in the US or worldwide? What is the definition of black and white?


I agree. It is clearly a lie and it is unfortunate Bard is spreading misinformation.



