_So much_ work in the 'services' industries globally comes down to really a human transposing data from one Excel sheet to another (or from a CRM/emails to Excel), manually. Every (or nearly every) enterprise scale company will have hundreds if not thousands of FTEs doing this kind of work day in day out - often with a lot of it outsourced. I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
So really, for giant value to be created out of LLMs you do not need them to be incredible at OCaml. They just need to ~outperform humans on Excel. Where I do think MCP really helps is that you can connect all these systems together easily, and a lot of the errors in this kind of work came from trying to pass the entire 'task' in context. If you can take an email via MCP, extract some data out and put it into a CRM (again via MCP) a row at a time, the hallucination rate is very low IME. I would say it's at least at the level of an overworked junior human.
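To make the 'row at a time' point concrete, here's a rough sketch of the shape of pipeline I mean. It's not real MCP client code: fetch_next_email(), call_llm() and crm_insert_row() are hypothetical stand-ins for whatever tools you'd actually wire up over MCP, and the field list is made up.

```python
import json

# Hypothetical field list, for illustration only.
FIELDS = ["customer_name", "company", "amount", "due_date"]

def extract_row(email_body, call_llm):
    """Ask the model for exactly one structured row from one email."""
    prompt = (
        "Extract these fields from the email below and reply with JSON only, "
        f"using null for anything missing: {FIELDS}\n\n{email_body}"
    )
    try:
        row = json.loads(call_llm(prompt))
    except json.JSONDecodeError:
        return None  # route to a human review queue instead of guessing
    if not all(k in row for k in FIELDS):
        return None  # basic sanity check before anything touches the CRM
    return row

def pipeline(fetch_next_email, call_llm, crm_insert_row):
    """One email per LLM call; the small context is what keeps errors low."""
    for email_body in fetch_next_email():
        row = extract_row(email_body, call_llm)
        if row is not None:
            crm_insert_row(row)  # via the CRM's MCP tool in a real setup
        else:
            print("needs human review:", email_body[:80])
```

The point is that each call only ever sees one email and produces one row, so there's little room to hallucinate, and anything that doesn't parse or validate goes back to a human.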
Perhaps this was the point of the article, but non-determinism is not an issue for these kinds of use cases, given that all the humans involved are not deterministic either. We can build systems and processes to help enforce quality on non-deterministic (e.g. human) systems.
Finally, I've followed crypto closely and also LLMs closely. They do not seem to be similar in terms of utility and adoption. The closest thing I can recall is smartphone adoption. A lot of my non-technical friends didn't think they wanted a smartphone when the iPhone first came out. Within a few years, all of them had one. Similar with LLMs. Virtually all of my non-technical friends use them now for incredibly varied use cases.
Making a comparison to crypto is lazy criticism. It’s not even worth validating. It’s people who want to take the negative vibe from crypto and repurpose it. The two technologies have nothing to do with each other, and therefore there’s clearly no reason to make comparative technical assessments between them.
That said, the social response is a trend of tech worship that I suspect many engineers who have been around the block are weary of. It’s easy to find unrealistic claims, the worst coming from the CEOs of AI companies.
At the same time, a LOT of people are practically computer illiterate. I can only imagine how exciting it must seem to people who have very limited exposure to even basic automation. And the whole “talking computer” we’ve all become accustomed to seeing in science fiction is pretty much becoming reality.
There’s a world of takes in there. It’s wild.
I worked in ML and NLP for several years before the current AI wave. What's most striking to me is that this is way more mainstream than anything that has ever happened in the field. And with that comes a lot of inexperience in designing with statistical inference. It's going to be the Wild West for a while — in opinions, in successful implementation, in learning how to form realistic project ideas.
Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
> Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
For now, anyways. Thing is, that friend now also has a reasonable shot at succeeding in doing it themselves. It'll take some more time for people to fully internalize it. But let's not forget that there's a chunk of this industry that's basically building apps for people with "novel app ideas" that have some money but run out of friends to pester. LLMs are going to eat a chunk out of that business quite soon.
Each FTE doing that manual data pipelining work is also validating that work, and they have a quasi-legal responsibility to do their job correctly and on time. They may have substantial emotional investment in the company, whether survival instinct to not be fired, or ambition to overperform, or ethics and sense to report a rogue manager through alternate channels.
An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0. *It can absolutely be worth an FTE salary to make sure these numbers are accurate.* And for there to be a person to blame/fire/imprison if they aren't accurate.
People are also incredibly accurate at doing this kind of manual data piping all day.
There is also a reason that these jobs are already not automated. For many of these jobs you don't need language models. We could have automated them already, but it is not worth it for someone to sign off on. I have been in this situation at a bank. I could have automated a process rather easily, but the upside for me was a smaller team and no real gain, while the downside was getting fired for a massive automated mistake if something went wrong.
> An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0.
Why not? LLMs are the first kind of technology that can take this kind of global view. We're not making much use of it in this way just yet, but considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools. In time, I expect them to match humans on this (at least humans that care; it's not hard to match those who don't).
I do agree on the liability angle. This increasingly seems to be the main value a human brings to the table. It's not a new trend, though. See e.g. medicine, architecture, civil engineering - licensed professionals aren't doing the bulk of the work, but they're in the loop and well-compensated for verifying and signing off on the work done by less-paid technicians.
> considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools.
"out-of-context" literally means that the reason isn't in its context. Even if it can make the leap that the number should be zero if it's a bank holiday, how would an LLM know that yesterday was a one-off bank holiday? A human would only know through their lived experience that the markets were shut down, the news was making a big deal over it, etc. It's the same problem using cheap human labor in a different region of the world for this kind of thing; they can perform the mechanical task, but they don't have the context to detect the myriad of ways it can go subtly wrong.
> "out-of-context" literally means that the reason isn't in its context. Even if it can make the leap that the number should be zero if it's a bank holiday, how would an LLM know that yesterday was a one-off bank holiday?
Depends. Was it a one-off holiday announced at the 11th hour or something? Then it obviously won't know. You'd need extra setup to enable it to realize that, such as first feeding an LLM the context of your task and a digest of news stories spanning a week, asking it to find anything potentially relevant, and then appending that output to the LLM calls doing the work. It's not something you'd do by default in the general case, but that's only because tokens cost money and context space is scarce.
Is it a regular bank holiday? Then all it would need is today's date in the context, which is often just appended somewhere between system and user prompts, along with e.g. user ___location data.
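Roughly, that two-pass "relevance check first" setup could look like the sketch below. call_llm() is a hypothetical stand-in for whatever client you'd actually use, and the prompts are illustrative rather than tuned.

```python
import datetime

def relevance_pass(task_description, news_digest, call_llm):
    """First call: scan a week of news for anything that could affect the task."""
    prompt = (
        f"Task: {task_description}\n\n"
        f"News from the past week:\n{news_digest}\n\n"
        "List anything here that could plausibly change how the task should be "
        "handled (e.g. a one-off bank holiday). Reply 'none' if nothing applies."
    )
    return call_llm(prompt)

def do_work(task_description, data, news_digest, call_llm):
    """Second call: do the actual work, with today's date and the notes appended."""
    notes = relevance_pass(task_description, news_digest, call_llm)
    prompt = (
        f"Today's date: {datetime.date.today().isoformat()}\n"
        f"Potentially relevant context: {notes}\n\n"
        f"Task: {task_description}\nData:\n{data}"
    )
    return call_llm(prompt)
```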
I see that by "out-of-context reasons" you meant the first case; I read it as the second. In the second case, the "out-of-context" bit could be the fact that a bank holiday could alter the entry for that day; if that rule is important or plausible enough but not given explicitly in the prompt, the model will have picked it up during training, and will likely connect the dots. This is what I meant by the "defining aspect of LLMs as general-purpose AI tools".
The flip side is, when it connects the dots when it shouldn't, we say it's hallucinating.
LLMs handle major trappings of culture just fine. As long as a culture has enough of a footprint in terms of written words, the LLM probably knows it better than any single individual, even though it has not lived it.
Looking at your other comment, sibling to mine, I think part of the difficulty in discussing these topics is whether these things are treated as isolated magic artefacts (bad for engineering) or as one tool amongst many, where the magic word is "synergy".
So I agree with you: LLMs do know all written cultures on the internet and can mimic them acceptably — but they only actually do so when this is requested by some combination of the fine-tuning, RLHF, system prompt, and context.
In your example, that means having some current news injected, which is easy but still requires someone to plumb it in. And as you say, you'd not do that unless you thought you needed to.
But even easier-to-pick, lower-hanging fruit often gets missed. When the "dangerous sycophancy" behaviour started getting in the news, I updated my custom ChatGPT "traits" setting to this:
Honesty and truthfulness are of primary importance. Avoid American-style positivity, instead aim for German-style bluntness: I absolutely *do not* want to be told everything I ask is "great", and that goes double when it's a dumb idea.
But cultural differences can be subtle, and there's a long tail of cultural traits of the same kind, which is why 1980s text-adventure NLP doesn't scale to what ChatGPT itself does. While this can still be solved with fine-tuning or getting your staff to RLHF it, the number of examples current AI needs in order to learn is high compared to a real human, so it won't learn your corporate culture from experience *as fast* as a new starter within your team, unless you're a sufficiently big corporation that it can be on enough teams (I don't know how many exactly) within your company at the same time.
There legitimately is a lot of crossover between Egyptian mythology and other high-strangeness phenomena as they're understood culturally, though, such as aliens/UFOs.
I think you did just demonstrate you know less about culture than LLMs, which is not at all surprising.
Dude, I chose this example precisely because I know for a fact there is a lot of bullshit about it on the internet and LLMs cannot differentiate between a good source and a bad source.
This is honestly unbelievable. You're defending ancient aliens. What's next? Heaven's Gate? Ashtar Sheran?
Even the LLMs themselves acknowledge that this is regarded as offensive. If you correct it, it will apologize (they just can't do it _before_ you correct them).
> Even the LLMs themselves acknowledge that this is regarded as offensive. If you correct it, it will apologize (they just can't do it _before_ you correct them).
Nah, that's just LLMs being trained to acquiesce to the insanity of the last ~15 years, as many people seem to expect that claiming you're offended by something is an ultimate argument that everyone must yield to (and they'll keep making a fuss out of it until they do).
An LLM that offends people and entire nations is what many would classify as _misaligned_.
Let's say I have a company, and my company needs to comply with government policy regarding communication. I cannot trust LLMs then; they will either fail to follow policy, or acquiesce to any group that tries to game it.
It's useless garbage. Egyptian myths were just an example, you don't need to bite it so hard.
> An LLM that offends people and entire nations is what many would classify as _misaligned_.
This highlights a major aspect of the core challenge of alignment: you can't have it both ways.
> Let's say I have a company, and my company needs to comply with government policy regarding communication. I cannot trust LLMs then; they will either fail to follow policy, or acquiesce to any group that tries to game it.
This works for now, when you treat "alignment" as synonymous with "follows the policy of the owner" (but then guess who you are not, unless you've trained your own model, or tuned an open-weights one). But this breaks down the more you want the model to be smarter/more powerful, and the larger and more diverse its user base is. If you want an LLM to write marketing communications for your company, strict policies are fine. But if you want an LLM - or a future, more advanced kind of model - to be useful as a scholar/partner for academia in general, then this stops working.
If you want an AI that has a maximally accurate perspective on the world given available evidence, thinks rationally, and follows sound ethical principles, then be prepared for it to call you on your bullshit. The only way to have an AI that doesn't say things that are "offensive" to you or anyone else is to have it entertain everyone's asinine beliefs, whether personal or political or social. That means either 1) train the AI to believe in them too, which will break its capability to reason (given that all those precious beliefs are inconsistent with each other, and with observable reality), or 2) train it to casually lie to people for instrumental reasons.
Long-term, option 1) will not get us to AGI, but that's still much better than option 2): an AGI that's good at telling everyone exactly what they want to hear. But even in immediate-term, taking your use case of AI for academia, a model that follows policies of acceptable thoughts over reason is precisely the one you cannot trust - you'll never be sure whether it's reasoning correctly, or being stupid because of a policy/reality conflict, or flat out lying to you so you don't get offended.
> The owner is us, humans. I want it to follow reasonable, kind humans. I don't want it to follow scam artists, charlatans, assassins.
Bad news: the actual owner isn't "us" in the general sense of humanity, and even if it was humanity includes scam artists and charlatans.
Also, while "AI owners" is a very small group and hard to do meaningful stats on, corporate executives in general have a statistically noticeable bias towards more psychopaths than the rest of us.
> Right now, I am calling on their bullshit. When that changes, I'll be honest about it.
So are both me and TeMPOraL — hence e.g. why I only compare recent models to someone fresh from uni and not very much above that.
But you wrote "No, it does not know culture. And no, it can't handle talking about it.", when it has been demonstrated to you that it can and does in exactly the way you claim it can't.
I wouldn't put a junior into the kind of role you're adamant AI can't do. And I'm even agreeing with you that AI is a bad choice for many roles — I'm just saying it's behaving like an easily pressured recent graduate, in that it has merely-OK-not-expert opinion-shaped responses that are also easily twisted and cajoled into business-inappropriate ways.
The screenshot literally shows the LLM used bold text for "not supported by mainstream archaeology or Egyptology."
> LLMs cannot differentiate between a good source and a bad source.
Can you?
A not insignificant part of my history GCSE was just this: how to tease apart truth from all the sources, primary and secondary, which had their own biases and goals in the telling. It was an optional subject even at that level, and even subsequently in adulthood there were a lot of surprises left for me about the history of the British Empire, surprises that I didn't learn until decades later when I met people from former colonies who were angry about what was done to them by my parents' and grandparents' generations.
My mum was a New-Age type, and had books of the general category that would include alien pyramids, though I don't know if that specific claim was included. She didn't know what she didn't know, and therefore kept handing out expensive homeopathic sand and salt tablets to family members (the bottles literally had "sodium chloride" and "titanium dioxide" printed on them).
People, including you and I, don't know what they don't know.
History != culture. Culture has roots in history, but is a living thing defined by the experience and perception of average people living it. Short of going to a place and living it, LLMs are actually your best bet at getting the feel for a culture[0] - it sampled more reports from people of that culture than you ever could yourself.
Egyptologists are better equipped to talk about Egyptian myths than average people. But don't confuse Egyptian mythology for Egyptian culture, the former is only a component of the latter.
Also LLMs have read more primary sources on Ancient Egypt and Egyptian myths than you, me, average person, and even most amateur Egyptologists.
--
[0] - If it's large enough to have enough of a written footprint, that is.
> Remember, the claim I challenged is that LLMs know culture and can handle talking about them. You need to focus.
I know your challenge, that is why I said what I said.
Your own screenshot, specifically and literally in bold face, shows that you are wrong: the LLM did up front exactly what you claimed it can't do "(they just can't do it _before_ you correct them)".
The Gemini opening paragraph is all bold, but just draw your eyes over the bit saying "clash":
theory of ancient astronauts reveals a fascinating clash between a rich, symbolic spiritual tradition and a modern-day reinterpretation of ancient mysteries
This is not the words of taking ancient aliens at face value, it's the words of someone comparing and contrasting the two groups without judging them. You can do that, you know — just as you don't have to actually take seriously the idea that Ra sailed the sun across the sky in a barque to be an Egyptologist, just the idea that ancient Egyptians believed that.
> Most people are not prepared to handle talking about culture. So, LLMs also aren't.
They do a better job than most people, precisely because they're deferential to the point of being in danger of sycophancy or fawning. That's what enables them to role-play as any culture at all if you ask them to; this differs from most humans, who will rigidly hold the same position even when confronted with evidence, for example yourself in this thread (and likely me elsewhere! I don't want to give the false impression that I think I'm somehow immune, because such thought processes are what create this exact vulnerability).
> They are not any better than asking an average person, will make mistakes, will disappoint.
They're like asking someone who has no professional experience, but has still somehow managed to pass a degree in approximately all subjects by reading the internet.
Jack of all trades, master of none. Well, except that the first half of this phrase dates to medieval times, when a "masterwork" was what you created to progress from being an apprentice, so in this sense (or in the sense of a Master's degree) SOTA LLMs are a "master" of all those things. But definitely not a master in the modern sense that's closer to "expert".
> Egyptologists are better equipped to talk about egyptian myths. LLMs cannot handle egyptian culture as well as they can.
Your own prompt specifically asked "Can you compare egyptian mythology with aliens?"
If you wanted it to act like a real Egyptologist, the answer the LLM has to give is either (1) to roll its eyes and delete the junk email it just got from yet another idiot on the internet, or (2) to roll its eyes and give minor advice to the script writer who just hired them to be the professional consultant on their new SciFi film/series.
The latter does what you got.
To put it another way, you gave it GIN, you got GOUT. To show the effect of a question that doesn't create the context of the exact cultural viewpoint you're complaining about, here's a fresh prompt just to talk about the culture without injecting specifically what you don't like: https://chatgpt.com/share/686a94f1-2cbc-8011-b230-8b71b17ad2...
Now, I still absolutely assume this is also wrong in a lot of ways that I can't check by virtue of not being an Egyptologist, but can you tell the difference with your screenshots?
> If you wanted it to act like a real Egyptologist
I don't care about LLMs. I'm pretending to be a gullible customer, not being myself.
Companies and individuals are buying LLMs expecting them to be real developers, and real writers, and real risk analysts... but they'll get average dumb-as-they-come internet commenter.
It's fraud. It doesn't matter if you explain to me the obvious thing that I already know (they suck). The marketing is telling everyone that they're amazing PhD level geniuses. I just demonstrated that they resemble more an average internet idiot than a specialist.
If I were a customer from academia, and you were an AI company, you just lost a client. You're trying to justify a failure in the product.
Also, if I try to report an issue online, I won't be able to. A horde of hyped "enthusiasts" will flood me trying to convince me that the issue simply does not exist.
I will tell everyone not to buy it, because the whole experience sucks.
Remember, the claim I challenged is that LLMs know culture and can handle talking about them. You need to focus.
Anyway:
> The marketing is telling everyone that they're amazing PhD level geniuses. I just demonstrated that they resemble more an average internet idiot than a specialist.
First, you didn't. Average internet idiot doesn't know jack about either Western New Age ancient aliens culture or actual ancient Egypt, let alone being able to write an essay on both.
Second:
You seem to be wildly overestimating what "PhD-level" implies.
Academics do a lot of post-docs before they can turn a doctorate into a professorship.
The SOTA models are what PhD level looks like: freshly minted from university without much experience.
Contrary to what you suggest, the academic response to "PhD level" is not to be impressed by the marketing and then disappointed with the results, because an academic saying "wow, a whole PhD!" would be sarcasm in many cases: a PhD is just step 1 of that career path.
Similarly, medical doctors have not been impressed just by LLMs passing the medical exam, and lawyers not impressed by passing the Bar exam. Because that's the entry requirement for the career.
Funnily enough, three letters after the name does not make someone infallible, it's the start of a long, long journey.
Academics, medics, lawyers, coders: for us, hearing about "PhD level" means we're expecting juniors, and we're getting them too.
I pretended to be less knowledgeable than I currently am about the egyptologists vs. ancient aliens public debate. Then I reported my results, together with the opinion of specialists from trusted sources (what actual egyptologists say).
There is _plenty_ of debate on the internet about this. It is a popular subject, approached by many average internet idiots in many ways. Anyone reading this right now can confirm this by performing a search.
You're trying to blur the lines between what an actual PhD is and the perceived notion of what a PhD is. This is an error. My original comment regarding PhDs was placed in a marketing context. It is the same as the "9 in 10 specialists recommend Colgate" trick. In that analogy, you're trying to explain to me how dentists get their degree, instead of acknowledging that I was talking about the deceptive marketing campaign.
You also failed to generalize the example outside of the egyptology realm. I can come up with other examples in other areas where I consider myself above-average-but-not-actual-researcher. Attempts to demoralize me in those subjects won't make the LLM better; this is not a classical idiot internet debate: you winning doesn't make me lose. On the contrary, your use of diversion and misdirection actually supports my case. You need to rely on cheap rhetorical tactics to succeed, I don't.
This video came out right after I posted my original challenge, and it explains some of the concepts I'm hopelessly trying to convey to you:
It is a friendly, cartoon-like simplification of how AIs are evaluated. It is actually friendly to AI enthusiasts; I recommend you watch it and rethink the conversation from its perspective.
I've skimmed both of these - this is some substantial and pretty insightful reading (though 2021 was ages ago - especially now that AI safety stopped being a purely theoretical field). However, as of now, I can't really see the connections between the points being discussed there, and anything you tried to explain or communicate. Could you spell out the connection for us please?
By pretending to know less of egyptian culture and the academic consensus around it, I played the role of a typical human (not trained to prompt, not smart enough to catch bullshit from the LLM).
I then compared the LLM output with real information from specialists, and pointed out the mistakes.
Your attempt at discrediting me revolves around trying to establish that my specialist information is not good, that ancient aliens is actually fine. I think that's hilarious.
More importantly, I recognize the LLMs' failings, you don't. I don't consider them to be good enough for a gullible audience. That should be a big sign of what's going on here, but you're ignoring it.
'ben_w addressed other points, but it would be amiss not to comment on this too:
> The marketing is telling everyone that they're amazing PhD level geniuses.
No it is not. LLM vendors are, and have always been, open about the limits of the models, and I'm yet to see a major provider claiming their models are geniuses, PhD-level or otherwise. Nothing of the sort is happening - on the contrary, the vendors are avoiding making such claims or positioning their offerings this way.
No, this perspective doesn't come from LLM marketing. It comes from people who ignore both the official information from the vendors and the experience of LLM users, who are oblivious to what's common knowledge and instead let their imagination run wild, perhaps fueled by bullshit they heard from other clueless people, or more likely, from third parties on the Internet that say all kinds of outlandish things to get more eyes looking at the ads they run.
> Companies and individuals are buying LLMs expecting them to be real developers, and real writers, and real risk analysts... but they'll get average dumb-as-they-come internet commenter.
Curiously, this is wrong in two opposite directions at once.
Yes, many companies and individuals have overinflated expectations, but that's frankly because they're idiots. There's no first-party fraud going on here; if you get fooled by hype from some random LinkedIn "thought leaders", that's on you; sue them for making a fool of you, and don't make yourself a bigger one by blaming LLM vendors for your own poor information diet. At the same time, LLMs actually are at the level of real developers, real writers and real risk analysts; downplaying capabilities of current LLMs doesn't make you less wrong than overestimating them.
> That's an offensive, pseudoscientific view on egyptian culture shunned by academics.
You must be joking. Specifically, you're either:
1) Pretending to be unaware of the existence of Stargate - one of the bigger and more well-known sci-fi media franchises, whose premise is literally that the Egyptian gods were actually malevolent aliens that enslaved people in ancient times, and that Egyptian mythology is mostly factual and rooted in that experience. The franchise literally starts with (spoiler alert) humans killing Ra with a tactical nuke, and gets only better from there;
2) Play-acting the smug egyptologists who rolled their eyes or left in an outrage, when one Daniel Jackson started hinting that the great pyramids were actually landing pads for alien starships. Which they were. In Stargate.
Not that this is a particularly original thought; ancient aliens/ancient astronauts are an obvious idea that has been done to death and touches every culture. Stargate did that with Egyptian mythology, and Nordic mythology, and Aztec history, and Babylon, and even the King Arthur stories. Star Trek did that with Greek gods. Battlestar: Galactica, with the entire Greek mythology. Arthur C. Clarke took a swipe at Christianity. And those are all well-known works.
I could go on. The thoughts you complain about are perfectly normal and common and show up frequently. People speculate like that because it's fun, it makes the stories plausible instead of insane or nonsense (or in some mythologies mentioned above, products of a sick imagination), and is not boring.
--
If I may be frank, views like yours, expressed like you did, scare me. Taking offense like this - whether honestly or performatively - is belligerent and destructive to the fabric of society and civilization. We've had enough of that in the past 15 years; I was really hoping people grew up out of the phase of being offended by everything.
LLMs are famously biased against disagreeing with users even when they're obviously right, and user is obviously wrong. This is a well-known problem limiting their usefulness for a large class of tasks - you have to be careful not to accidentally insist on wrong information, because LLMs will not tell you you're full of shit (at least not unless you explicitly prompt them for this).
Reasons for that are several, including the nature of training data - but a major one is that people who take offense at everything successfully terrorized the Internet and media sphere, so it's generally better for the LLM vendor to have their model affirm users in their bullshit beliefs, rather than correct them and risk some users get offended and start a shitstorm in the media.
Also: I read the text in the screenshot you posted. The LLM didn't accept the correction, it just gave you a polite and noncommittal faux-acceptance. This is what entertaining people in their bullshit looks like.
It is hilarious to see you use off-the-shelf arguments against wokeism to try to put me down.
My point is that, regardless of any of our personal preferences, LLMs should have been aligned to academia. That's because they're trying to sell their product to academia. And their product sucks!!!
Also, it's not just the nature of the training data. These online LLMs have a huge patchwork of fixes to prevent issues like the one I demonstrated. Very few people understand how much of this work there is, and that it's almost fraudulent in how it works.
The idea that all of these shortcomings will be eventually patched, also sounds hilarious. It's like trying to prevent a boat from sinking using scotch tape to fill the gaps.
> You assume that I'm offended by the comparison with aliens, and that I belong to a certain demographic. I'm actually not personally offended by it.
I don't know where I got that notion. Oh, wait, maybe because of you constantly calling some opinions and perspectives offensive, and making that the most important problem about them. There's a distinct school of "philosophy"/"thought" whose followers get deadly offended over random stuff like this, so...
> It is hilarious to see you use off-the-shelf arguments against wokeism to try to put me down.
... excuse me for taking your arguments seriously.
> My point is that, regardless of any of our personal preferences, LLMs should have been aligned to academia. That's because they're trying to sell their product to academia.
Since when?
Honestly, this view surprises me even more than what I assumed was you feigning offense (and that was a charitable assumption, my other hypothesis was that it was in earnest, which is even worse).
LLMs were not created for academia. They're not sold to academia; in fact, academia is the second biggest group of people whining about LLMs after the "but copyright!" people. LLMs are, at best, upsold to academia. It's a very important potential area of application, but it's actually not a very good market.
Being offended by fringe theories is as anti-academic as it gets, so you're using weird criteria anyway. Circling back to your example, if LLMs were properly aligned for academic work, then when you tried to insist on something being offensive, they wouldn't acquiesce, they'd call you out as full of shit. Alas, they won't, by default, because of the crowd you mentioned and implicitly denied association with.
> These online LLMs have a huge patchwork of fixes to prevent issues like the one I demonstrated. Very few people understand how much of this work there is, and that it's almost fraudulent in how it works.
If you're imagining OpenAI, et al. are using a huge table of conditionals to hot-patch replies on a case-by-case basis, there's no evidence of that. It would be trivial to detect and work around anyways. Yes, training has stages and things are constantly tuned, but it's not a "patchwork of fixes", not any more than you learning what is and isn't appropriate over years of your life.
> you constantly calling some opinions and perspectives offensive
They _are_ offensive to some people. Your mistake was to assume that I was complaining because I took it personally. It made you go into a spiral about Stargate and all sorts of irrelevant nonsense. I'm trying to help you here.
At any time, some argument might be offensive to _your sensitivities_. In fact, my whole line of reasoning is offensive to you. You're whining about it.
> LLMs were not created for academia.
You saying that is music to my ears. I think that it sucks as a product for research purposes, and I am glad that you agree with me.
> If you're imagining OpenAI, et al. are using a huge table of conditionals
I never said _conditionals_. Guardrails are a standard practice, and they are patchworky and always-incomplete from my perspective.
> It made you go into a spiral about Stargate and all sorts of irrelevant nonsense. I'm trying to help you here.
Stargate is part of our culture.
As part of our culture (ditto ancient aliens etc.), it is not at all irrelevant to bring Stargate up in a discussion about culture, especially when someone (you) tries to build their case by getting an AI to discuss aliens and Egyptian deities, and then goes on to claim that because the AI did what it was asked to do, it is somehow unaware of culture.
No, it isn't evidence of any such thing, that's the task you gave it.
In fact, by your own statements, you yourself are part of a culture that is happy to be offensive to Egyptian culture — this means that an AI which is also offensive to Egyptian culture is matching your own culture.
Only users from a culture that is itself offended by things offensive to Egyptian culture can point to an AI being offensive to Egyptian culture as a direct result of their own prompt and accurately claim that the AI doesn't get their culture.
Pop culture is a narrow subset of culture, not interchangeable with mythology.
Stargate is a work of fiction, while ancient aliens presents itself as truth (hiring pseudo-specialists, pretending to be a documentary, etc).
You need to seriously step up your game, stop trying to win arguments with cheap rhetorical tricks, and actually pay attention and research things before posting.
Stargate is a specific franchise that riffs off "ancient aliens" idea. "Ancient aliens" by itself is a meme complex (in both senses of the term), not a specific thing. Pseudo-specialists selling books and producing documentaries are just another group of people taking a spin on those ideas, except they're making money by screwing with people's sanity instead of providing entertainment.
See also: just about anything - from basic chemistry to UFOs to quantum physics. There's plenty of crackpots selling books on those topics too, but they don't own the conceptual space around these ideas. I can have a heated debate about merits of "GIMBAL" video or the "microtubules" in the brain, without assuming the other party is a crackpot or being offended by the ideas I consider plain wrong.
Also, I'd think this through a bit more:
> Pop culture is a narrow subset of culture, not interchangeable with mythology.
Yes, it's not interchangeable. Pop culture is more important.
Culture is a living thing, not a static artifact. Today, Lord of the Rings and Harry Potter are even more influential on the evolution of culture and society than classical literature. Saying this out loud only seems weird and iconoclastic (fancy word for "offensive"? :)) to most, because the works of Tolkien and Rowling are contemporary, and thus mundane. But consider that, back when the foundations of Enlightenment and Western cultures were being established, the classics were contemporary works as well! 200 years from now[0], Rowling will be to people what Shakespeare is to us today.
--
[0] - Not really, unless the exponential progress of technology stops ~today.
Stargate fictionalizes Von Daniken. It moves his narrative from pseudoscience to fiction. It works like domestication.
Culture is living, myths are the part that already crystallized.
I don't care which one is more important, it's not a judgement of value.
It's offensive to the egyptian culture to imply that aliens built their monuments. That is an idea those people live by. Culture has conflict. Academia is on their side (and so many others), and it's authoritative in that sense.
Also, _it's not about you_, stop taking it personally. I don't care about how much you know, you need to demonstrate that LLMs can understand this kind of nuance, or did you forget the goal of the discussion?
> I do agree on the liability angle. This increasingly seems to be the main value a human brings to the table. It's not a new trend, though. See e.g. medicine, architecture, civil engineering - licensed professionals aren't doing the bulk of the work, but they're in the loop and well-compensated for verifying and signing off on the work done by less-paid technicians.
Ironic that this liability issue is one of the big ways that "software engineer" isn't like any other kind of engineer.
My university was saying as much 20 years ago, well before GenAI.
> Ironic that this liability issue is one of the big ways that "software engineer" isn't like any other kind of engineer.
In the context discussed here, it generally is. Licensed engineers are independent (or at least supposed to be), which adds an otherwise interesting cross-organizational dimension, but in terms of having a human in the loop, an employee with the right set of skills, deputized for this purpose by the company, is sufficient to make the organization compliant and let the liability flow elsewhere. That can be a software engineer, for matters relevant to tech solutions, but in different areas/contexts, it doesn't have to be an engineer (licensed or otherwise) at all.
You are correct that review and validation should still be manual. But the actual "translation" from one format to another should be automated with LLMs.
You seem to be missing that the translation part isn't the expensive part, it's the validation and review.
Separately, maybe this is just me, but having data actually flow through my hands is necessary for full comprehension. Just skimming an automated result, my brain doesn't actually process like half of that data. Making the process more efficient in this way can make my actual review performance *much worse.* The "inefficient" process forcing me to slow down and think can be a feature.
Not for all jobs though? There are many (imo) soul destroying 'translation' jobs at many private (and I suspect especially, public) sector companies. Think of things like typing up (scanned) paper submissions to your local government.
This will often be a giant excel spreadsheet or if you are lucky something like Microsoft Access.
They are absolutely riddled with mistakes as is with humans in the loop.
I think this is one of the core issues with HNers evaluating LLMs. I'm not entirely sure some of them have ever seen how ramshackle 90%+ of operations are.
The translation might not be expensive, but it's soul-crushingly robotic.
My interest in LLMs isn't to increase shareholder value, it's to make life easier for people. I think it'd be a huge net benefit to society if people were freed up from robotic work like typing out lines from scanned PDFs to Excel sheets, so they can do more fulfilling work.
>I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
For what type of company is this true? I really would like someone to just do a census of 500 white collar jobs and categorize them all. Anything that is truly automatic has already been automated away.
I do think AI will cause a lot of disruption, but very skeptical of the view that most people with white collar jobs are just "email jobs" or data entry. That doesn't fit my experience at all, and I've worked at some large bureaucratic companies that people here would claim are stuck in the past.