Why I am not impressed by A.I.

joel1974@lemmy.world · 5 months ago

Why I am not impressed by A.I.

VintageGenious@sh.itjust.works · 5 months ago

Because you’re using it wrong. It’s good for generative text and chains of thought, not symbolic calculations including math or linguistics

Grandwolf319@sh.itjust.works · 5 months ago

Because you’re using it wrong.

No, I think you mean to say it’s because you’re using it for the wrong use case.

Well this tool has been marketed as if it would handle such use cases.

I don’t think I’ve actually seen any AI marketing that was honest about what it can do.

I personally think image recognition is the best use case as it pretty much does what it promises.

scarabic@lemmy.world · 5 months ago

Really? AI has been marketed as being able to count the r’s in “strawberry?” Please link to this ad.

joel1974@lemmy.world · 5 months ago

Give me an example of how you use it.

L3s@lemmy.world · edit-2 5 months ago

Writing customer/company-wide emails is a good example. “Make this sound better: we’re aware of the outage at Site A, we are working as quick as possible to get things back online”

Dumbing down technical information “word this so a non-technical person can understand: our DHCP scope filled up and there were no more addresses available for Site A, which caused the temporary outage for some users”

Another is feeding it an article and asking for a summary, https://hackingne.ws does that for its Bsky posts.

Coding is another good example, “write me a Python script that moves all files in /mydir to /newdir”

Asking for it to summarize a theory or protocol, “explain to me why RIP was replaced with RIPv2, and what problems people have had since with RIPv2”
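For a sense of scale, the Python request quoted above ("moves all files in /mydir to /newdir") might come back as something like the sketch below. The function name and the choices to skip subdirectories and never overwrite existing files are assumptions added for illustration; the prompt does not specify them.

```python
import shutil
from pathlib import Path


def move_all_files(src_dir: str, dst_dir: str) -> list[str]:
    """Move every regular file in src_dir into dst_dir; return the moved names."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target if it is missing

    moved = []
    for entry in sorted(src.iterdir()):
        if not entry.is_file():
            continue  # assumption: subdirectories are left where they are
        target = dst / entry.name
        if target.exists():
            continue  # assumption: never overwrite an existing file
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

Calling `move_all_files("/mydir", "/newdir")` would match the prompt. This is also the kind of output the thread says to review before running, since overwrite behavior is easy to get wrong.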

Corngood@lemmy.ml · 5 months ago

Make this sound better: we’re aware of the outage at Site A, we are working as quick as possible to get things back online

How does this work in practice? I suspect you’re just going to get an email that takes longer for everyone to read, and doesn’t give any more information (or worse, gives incorrect information). Your prompt seems like what you should be sending in the email.

If the model (or context?) was good enough to actually add useful, accurate information, then maybe that would be different.

I think we’ll get to the point really quickly where a nice concise message like in your prompt will be appreciated more than the bloated, normalised version, which people will find insulting.

L3s@lemmy.world · edit-2 5 months ago

Yeah, normally my “Make this sound better” or “summarize this for me” is a longer wall of text that I want to simplify, I was trying to keep my examples short. Talking to non-technical people about a technical issue is not the easiest for me, AI has helped me dumb it down when sending an email, and helps correct my shitty grammar at times.

As for accuracy, you review what it gives you, you don’t just copy and send it without review. Also you will have to tweak some pieces that it gives out where it doesn’t make the most sense, such as if it uses wording you wouldn’t typically use. It is fairly accurate though in my use-cases.

Hallucinations are a thing, so validating what it spits out is definitely needed.

Another example: if you feel your email is too stern or gives the wrong tone, I’ve used it for that as well. “Make this sound more relaxed: well maybe if you didn’t turn off the fucking server we wouldn’t of had this outage!” (Just a silly example)

otp@sh.itjust.works · 5 months ago

As for accuracy, you review what it gives you, you don’t just copy and send it without review.

Yeah, I don’t get why so many people seem to not get that.

It’s like people who were against IntelliSense in IDEs because “What if it suggests the wrong function?”… You still need to know what the functions do. If you find something you’re unfamiliar with, you check the documentation. You don’t just blindly accept it as truth.

Just because it can’t replace a person’s job doesn’t mean it’s worthless as a tool.

Voroxpete@sh.itjust.works · 5 months ago

The issue is that AI is being invested in as if it can replace jobs. That’s not an issue for anyone who wants to use it as a spellchecker, but it is an issue for the economy, for society, and for the planet, because billions of dollars of computer hardware are being built and run on the assumption that trillions of dollars of payoff will be generated.

And correcting someone’s tone in an email is not, and will never be, a trillion dollar industry.

otp@sh.itjust.works · 5 months ago

That’s a very different problem than the one in the OP

Grandwolf319@sh.itjust.works · 5 months ago

Yeah, I don’t get why so many people seem to not get that.

The disconnect is that those people use their tools differently, they want to rely on the output, not use it as a starting point.

I’m one of those people, reviewing AI slop is much harder for me than just summarizing it myself.

I find function name suggestions useful because they’re a lookup tool. A summary tool isn’t the same: it doesn’t help me find a needle in a haystack, it just hands me a needle when I already have access to many needles. I want the good/best needle, and it can’t do that.

Voroxpete@sh.itjust.works · 5 months ago

I think these are actually valid examples, albeit ones that come with a really big caveat; you’re using AI in place of a skill that you really should be learning for yourself. As an autistic IT person, I get the struggle of communicating with non-technical and neurotypical people, especially clients who you have to be extra careful with. But the reality is, you can’t always do all your communication by email. If you always rely on the AI to correct your tone or simplify your language, you’re choosing not to build an essential skill that is every bit as important to doing your job well as it is to know how to correctly configure an ACL on a Cisco managed switch.

That said, I can also see how relying on the AI at first can be a helpful learning tool as you build those skills. There’s certainly an argument that by using tools, but paying attention to the output of those tools, you build those skills for yourself. Learning by example works. I think used in that way, there’s potentially real value there.

Which is kind of the broader story with Gen AI overall. It’s not that it can never be useful; it’s that, at best, it can only ever aspire to “useful.” No one, yet, has demonstrated any ability to make AI “essential” and the idea that we should be investing hundreds of billions of dollars into a technology that is, on its best days, mildly useful, is sheer fucking lunacy.

snooggums@lemmy.world · 5 months ago

If you always rely on the AI to correct your tone or simplify your language, you’re choosing not to build an essential skill that is every bit as important to doing your job well as it is to know how to correctly configure an ACL on a Cisco managed switch.

This is such a good example of how AI/LLMs/whatever are being used as a crutch that is far more impactful than using a spellchecker. A spell checker catches typos or helps with unfamiliar words, but doesn’t replace the underlying skill of communicating to your audience.

msage@programming.dev · 5 months ago

I have a blog for you

Voroxpete@sh.itjust.works · edit-2 5 months ago

Noted, I’ll be giving that a proper read after work. Thank you.

Edit to add: Yeah, that pretty much mirrors my own experiences of using AI as a coding aid. Even when I was learning a new language, I found that my comprehension of the material very quickly outstripped whatever ChatGPT could provide. I’d much rather understand what I’m building because I built it myself. A lot of the time, when you use a solution someone else provided you don’t find out until much later how badly that solution held you back because it wasn’t actually the best way to tackle the problem.

CarnivorousCouch@lemmy.world · 5 months ago

This was an interesting read, thanks for sharing.

earphone843@sh.itjust.works · 5 months ago

It works well. For example, we had a work exercise where we had to write a press release based on an example, then write a Shark Tank pitch to promote the product we came up with in the release.

I gave AI the link to the example and a brief description of our product, and it spit out an almost perfect press release. I only had to tweak a few words because there were specific requirements I didn’t feed the AI.

Then I told it to take the press release and write the pitch based on it.

Again, very nearly perfect with only having to change the wording in one spot.

locuester@lemmy.zip · 5 months ago

Yes, people are using it as the least efficient communication protocol ever.

One side asks an LLM to expand a summary into a fluff filled email, and the other side asks an LLM to reduce the long email to a summary.

lurch (he/him)@sh.itjust.works · 5 months ago

it’s not good for summaries. often gets important bits wrong, like embedded instructions that can’t be summarized.

L3s@lemmy.world · edit-2 5 months ago

My experience has been very different, I do have to sometimes add to what it summarized though. The Bsky account mentioned is a good example, most of the posts are very well summarized, but every now and then there will be one that isn’t as accurate.

snooggums@lemmy.world · edit-2 5 months ago

The dumbed down text is basically as long as the prompt. Plus you have to double check it to make sure it didn’t have outrage instead of outage just like if you wrote it yourself.

How do you know the answer on why RIP was replaced with RIPv2 is accurate and not just a load of bullshit like putting glue on pizza?

Are you really saving time?

L3s@lemmy.world · edit-2 5 months ago

Yes, I’m saving time. As I mentioned in my other comment:

Yeah, normally my “Make this sound better” or “summarize this for me” is a longer wall of text that I want to simplify, I was trying to keep my examples short.

And

and helps correct my shitty grammar at times.

And

Hallucinations are a thing, so validating what it spits out is definitely needed.

snooggums@lemmy.world · 5 months ago

How do you validate the accuracy of what it spits out?

Why don’t you skip the AI and just use the thing you use to validate the AI output?

L3s@lemmy.world · 5 months ago

Most of what I’m asking it are things I have a general idea of, and AI has the capability of making short explanations of complex things. So typically it’s easy to spot a hallucination, but the pieces that I don’t already know are easy to Google to verify.

Basically I can get a shorter response to get the same outcome, and validate those small pieces which saves a lot of time (I no longer have to read a 100 page white paper, instead a few paragraphs and then verify small bits)

earphone843@sh.itjust.works · 5 months ago

Dumbed down doesn’t mean shorter.

snooggums@lemmy.world · edit-2 5 months ago

If the amount of time it takes to create the prompt is the same as it would have taken to write the dumbed down text, then the only time you saved was not learning how to write dumbed down text. Plus you need to know what dumbed down text should look like to know if the output is dumbed down but still accurate.

lime!@feddit.nu · edit-2 5 months ago

i’m still not entirely sold on them but since i’m currently using one that the company subscribes to i can give a quick opinion:

i had an idea for a code snippet that could save me some headache (a mock for primitives in lua, to be specific) but i foresaw some issues with commutativity (aka how to make sure that a + b == b + a). so i asked about this, and the llm created some boilerplate to test this code. i’ve been chatting with it for about half an hour and testing the code it produces, and had it expand the idea to all possible metamethods available on primitive types, together with about 50 test cases with descriptive assertions. i’ve now run into an issue where the __eq metamethod isn’t firing correctly when one of the operands is a primitive rather than a mock, and after having the llm link me to the relevant part of the docs, that seems to be a feature of the language rather than a bug.

so in 30 minutes i’ve gone from a loose idea to a well-documented proof-of-concept to a roadblock that can’t really be overcome. complete exploration and feasibility study, fully tested, in less than an hour.
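The commutativity check described above can be sketched in Python rather than Lua. This is an analog, not the commenter's code: `MockNumber` and `check_commutative` are hypothetical names. As far as the Lua reference manual describes it, Lua only tries `__eq` when both operands are tables (or both full userdata), which would explain the roadblock. Python differs in that it falls back to the reflected operand, so the mixed comparison works here.

```python
class MockNumber:
    """Hypothetical stand-in for a numeric primitive (names are illustrative)."""

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _unwrap(other):
        # Accept either another mock or a plain number.
        return other.value if isinstance(other, MockNumber) else other

    def __add__(self, other):  # handles mock + x
        return MockNumber(self.value + self._unwrap(other))

    __radd__ = __add__  # handles x + mock; addition is commutative, so reuse

    def __eq__(self, other):  # handles mock == x, and x == mock via reflection
        return self.value == self._unwrap(other)

    def __hash__(self):
        return hash(self.value)


def check_commutative(a, b):
    """Return True when a + b and b + a compare equal."""
    return (a + b) == (b + a)
```

For instance, `check_commutative(2, MockNumber(3))` exercises the case where one operand is a primitive and the other is a mock.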

The Hobbyist@lemmy.zip · 5 months ago

One thing which I find useful is to be able to turn installation/setup instructions into Ansible roles and tasks. If you’re unfamiliar, Ansible is a tool for automated configuration of large-scale server infrastructures. In my case I only manage two servers, but it is useful to parse instructions and convert them to Ansible, helping me learn and understand Ansible at the same time.

Here is an example of instructions which I find interesting: how to set up Docker on Alpine Linux: https://wiki.alpinelinux.org/wiki/Docker

Results are actually quite good even for smaller 14B self-hosted models like the distilled versions of DeepSeek, though I’m sure there are other usable models too.

I find it helpful for programming assistance too, both for getting things done and for learning.

I would not rely on it for factual information, but usually it does a decent job at pointing in the right direction. Another use I have is helping with spell-checking in a foreign language.

chiisana@lemmy.chiisana.net · 5 months ago

Ask it for a second opinion on medical conditions.

Sounds insane, but they are leaps and bounds better than blindly Googling and self-prescribing for every condition under the sun when the symptoms only vaguely match.

Once the LLM helps you narrow in on a couple of possible conditions based on the symptoms, then you can dig deeper into those specific ones, learn more about them, and have a slightly more informed conversation with your medical practitioner.

They’re not a replacement for your actual doctor, but they can help you learn and have better discussions with your actual doctor.

noodle (he/him)@lemm.ee · 5 months ago

sounds like a perfectly sane idea https://freethoughtblogs.com/pharyngula/2025/02/05/ai-anatomy-is-weird/

Sippy Cup@lemmy.world · 5 months ago

So can WebMD. We didn’t need AI for that. Googling symptoms is a great way to just be dehydrated and suddenly think you’re in kidney failure.

chiisana@lemmy.chiisana.net · 5 months ago

We didn’t stop trying to make faster, safer and more fuel efficient cars after the Model T, even though it can get us from place A to place B just fine. We didn’t stop pushing for digital access to published content, even though we have physical libraries. Just because something satisfies a use case doesn’t mean we should stop advancing technology.

snooggums@lemmy.world · 5 months ago

AI is slower and less efficient than the older search algorithms and is less accurate.

Sippy Cup@lemmy.world · 5 months ago

We also didn’t make the Model T suggest replacing the engine when the oil light comes on. Cars, as it happens, aren’t that great at self-diagnosis, despite that technology being far simpler and further along than generative models are. I don’t trust the model to tell me what temperature to bake a cake at, I’m sure as hell not going to trust it with medical information. Googling symptoms was risky at best before. It’s a horror show now.

chaosCruiser@futurology.today · edit-2 5 months ago

Here’s a bit of code that’s supposed to do stuff. I got this error message. Any ideas what could cause this error and how to fix it? Also, add this new feature to the code.

Works reasonably well as long as you have some idea how to write the code yourself. GPT can do it in a few seconds, debugging it would take like 5-10 minutes, but that’s still faster than my best. Besides, GPT is also fairly fluent in many functions I have never used before. My approach would be clunky and convoluted, while the code generated by GPT is a lot shorter.

If you’re well familiar with the code you’re working on, GPT code will be convoluted by comparison. If so, you can ask GPT for the rough alpha version, and you can do the debugging and refining in a few minutes.

Windex007@lemmy.world · 5 months ago

That makes sense as long as you’re not writing code that needs to know how to do something as complex as …checks original post… count.

TimeSquirrel@kbin.melroy.org · 5 months ago

It can do that just fine, because it has seen enough examples of working code. It can’t directly count correctly, sure, but it can write “i++;”, incrementing a variable by one in a loop and returning the result. The computer running the generated program is going to be doing the counting.
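A minimal sketch of that kind of generated program, written in Python instead of the C-style `i++` mentioned above. The function name `count_letter` is made up for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    target = letter.lower()
    count = 0
    for ch in word.lower():
        if ch == target:
            count += 1  # the Python spelling of "i++"
    return count


print(count_letter("strawberry", "r"))  # prints 3
```

The interpreter does the counting here, not the model, which is the point being made.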

slaacaa@lemmy.world · 5 months ago

I have it write emails in German for me. I moved there not too long ago, and it works wonders for getting doctor’s appointments, car service, etc. I also have it explain the text, so I’m learning the language.

I also use it as an alternative to internet search, which is now terrible. It’s not going to help you find something super location-specific, but I can ask it to tell me something about a game/movie without spoilers, or list Metacritic scores in a table, etc.

It also works great in summarizing long texts.

An LLM is a tool; what matters is how you use it. It is stupid, it doesn’t think, and it’s mostly hype to call it AI. But it definitely has its benefits.

scarabic@lemmy.world · 5 months ago

We have one that indexes all the wikis and GDocs and such at my work and it’s incredibly useful for answering questions like “who’s in charge of project 123?” or “what’s the latest update from team XYZ?”

I even asked it to write my weekly update for MY team once and it did a fairly good job. The one thing I thought it had hallucinated turned out to be something I just hadn’t heard yet. So it was literally ahead of me at my own job.

I get really tired of all the automatic hate over stupid bullshit like this OP. These tools have their uses. It’s very popular to shit on them. So congratulations for whatever agreeable comments your post gets. Anyway.

verdigris@lemmy.ml · 5 months ago

I mean, I would argue that the answer in the OP is a good one. No human asking that question honestly wants to know the sum total of Rs in the word, they either want to know how many in “berry” or they’re trying to trip up the model.

dreadbeef@lemmy.dbzer0.com · 5 months ago

“You’re holding it wrong”

Voyajer@lemmy.world · edit-2 5 months ago

This but actually. Don’t use an LLM to do things LLMs are known to not be good at. Since they’re tools, the various companies would do well to list out specifically what they’re bad at, so that no background knowledge is required before even using them, not unlike needing to somehow know that one corner of those old iPhones was an antenna and to not bridge it.

sugar_in_your_tea@sh.itjust.works · 5 months ago

Yup, the problem with that iPhone (4?) wasn’t that it sucked, but that it had limitations. You could just put a case on it and the problem goes away.

LLMs are pretty good at a number of tasks, and they’re also pretty bad at a number of tasks. They’re pretty good at summarizing, but don’t trust the summary to be accurate, just to give you a decent idea of what something is about. They’re pretty good at generating code, just don’t trust the code to be perfect.

You wouldn’t use a chainsaw to build a table, but it’s pretty good at making big things into small things, and cleaning up the details later with a more refined tool is the way to go.

snooggums@lemmy.world · 5 months ago

They’re pretty good at summarizing, but don’t trust the summary to be accurate, just to give you a decent idea of what something is about.

That is called being terrible at summarizing.

sugar_in_your_tea@sh.itjust.works · 5 months ago

That depends on how you use it. If you need the information from an article, but don’t want to read it, I agree, an LLM is probably the wrong tool. If you have several articles and want to decide which one has the information you need, an LLM is a pretty good option.

desktop_user [they/them] @lemmy.blahaj.zone · 5 months ago

if you want to find a few articles out of a few hundred that are about the benefits of nuclear weapons or other controversial topics that have significant literature on them, it can be helpful to eliminate the 90% that probably aren’t what you’re looking for.

snooggums@lemmy.world · 5 months ago

Or you might eliminate some that are what you are looking for because the summaries are inaccurate.

Guess it depends on whether an unreliable system is still better than being overwhelmed with choices.

TheGrandNagus@lemmy.world · 5 months ago

I think there’s a fundamental difference between someone saying “you’re holding your phone wrong, of course you’re not getting a signal” to millions of people and someone saying “LLMs aren’t good at that task you’re asking it to perform, but they are good for XYZ.”

If someone is using a hammer to cut down a tree, they’re going to have a bad time. A hammer is not a useful tool for that job.

Prandom_returns@lemm.ee · 5 months ago

So for something you can’t objectively evaluate? Looking at Apple’s garbage generator, LLMs aren’t even good at summarising.

Balder@lemmy.world · edit-2 5 months ago

For reference:

AI chatbots unable to accurately summarise news, BBC finds

the BBC asked ChatGPT, Copilot, Gemini and Perplexity to summarise 100 news stories and rated each answer. […] It found 51% of all AI answers to questions about the news were judged to have significant issues of some form. […] 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.

It makes me remember I basically stopped using LLMs for any summarization after this exact thing happened to me. I realized that without reading the text, I wouldn’t be able to know whether the output has all the relevant info or if it has some made-up info.

whotookkarl@lemmy.world · 5 months ago

I’ve already had more than one conversation where people quote AI as if it were a source, like quoting Google as a source. When I showed them how it can sometimes lie and explained it’s not a primary source for anything, I just got that blank stare like I have two heads.

schnurrito@discuss.tchncs.de · 5 months ago

Me too. More than once on a language learning subreddit for my first language: “I asked ChatGPT whether this was correct grammar in German, it said no, but I read this counterexample”, then everyone correctly responded “why the fuck are you asking ChatGPT about this”.

5 months ago

I use ai like that except im not using the same shit everyone else is on. I use a dolphin fine tuned model with tool use hooked up to an embedder and searxng. Every claim it makes is sourced.

Traister101@lemmy.today · 5 months ago

Sure buddy

Grandwolf319@sh.itjust.works · edit-2 5 months ago

There is an alternative reality out there where LLMs were never marketed as AI and were marketed as random generators.

In that world, tech savvy people would embrace this tech instead of having to constantly educate people that it is in fact not intelligence.

Static_Rocket@lemmy.world · 5 months ago

That was this reality. Very briefly. Remember AI Dungeon and the other clones that were popular prior to the mass ml marketing campaigns of the last 2 years?

daniskarma@lemmy.dbzer0.com · 5 months ago

They are not random per se. They are just statistical with just some degree of randomization.

eggymachus@sh.itjust.works · 5 months ago

A guy is driving around the back woods of Montana and he sees a sign in front of a broken down shanty-style house: ‘Talking Dog For Sale.’

He rings the bell and the owner appears and tells him the dog is in the backyard.

The guy goes into the backyard and sees a nice looking Labrador Retriever sitting there.

“You talk?” he asks.

“Yep” the Lab replies.

After the guy recovers from the shock of hearing a dog talk, he says, “So, what’s your story?”

The Lab looks up and says, “Well, I discovered that I could talk when I was pretty young. I wanted to help the government, so I told the CIA. In no time at all they had me jetting from country to country, sitting in rooms with spies and world leaders, because no one figured a dog would be eavesdropping. I was one of their most valuable spies for eight years running… but the jetting around really tired me out, and I knew I wasn’t getting any younger so I decided to settle down. I signed up for a job at the airport to do some undercover security, wandering near suspicious characters and listening in. I uncovered some incredible dealings and was awarded a batch of medals. I got married, had a mess of puppies, and now I’m just retired.”

The guy is amazed. He goes back in and asks the owner what he wants for the dog.

“Ten dollars,” the owner says.

“Ten dollars? This dog is amazing! Why on Earth are you selling him so cheap?”

“Because he’s a liar. He’s never been out of the yard.”

whynot_1@lemmy.world · 5 months ago

I think I have seen this exact post word for word fifty times in the last year.

Clay_pidgin@sh.itjust.works · 5 months ago

Has the number of "r"s changed over that time?

ElectroLisa@lemmy.blahaj.zone · 5 months ago

TachyonTele@lemm.ee · 5 months ago

y do you ask?

Clay_pidgin@sh.itjust.works · 5 months ago

Just playing, friend.

TachyonTele@lemm.ee · 5 months ago

Same, i was making a pun

Clay_pidgin@sh.itjust.works · 5 months ago

Oh, I see! Apologies.

TachyonTele@lemm.ee · 5 months ago

No apologies needed. Enjoy your day and keep the good vibes up!

pulsewidth@lemmy.world · edit-2 5 months ago

And yet they apparently still can’t get an accurate result with such a basic query.

Meanwhile… https://futurism.com/openai-signs-deal-us-government-nuclear-weapon-security

gerryflap@feddit.nl · edit-2 5 months ago

These models don’t get single characters but rather tokens representing multiple characters. While I also don’t like the “AI” hype, this image is also very one-dimensional hate and misrepresents the usefulness of these models by picking one adversarial example.
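To make the token point concrete, here is a toy sketch. The vocabulary and IDs are invented for illustration, and the greedy longest-match split is a simplification; real tokenizers learn their pieces from data and may split this word differently.

```python
# Hypothetical vocabulary, invented for illustration only.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match split of text into token IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):  # try the longest piece first
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    return ids


print(toy_tokenize("strawberry", TOY_VOCAB))  # prints [101, 102, 103]
```

Under these assumptions the model receives three opaque IDs rather than ten letters, so nothing in its input says that token 103 contains two r's.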

Today ChatGPT saved me a fuckton of time by linking me to the exact issue on gitlab that discussed the issue I was having (full system freezes using Bottles installed with flatpak on Arch). This was the URL it came up with after explaining the problem and giving it the first error I found in dmesg: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/110

This issue is one day old. When I looked this shit up myself I found exactly nothing useful on either DDG or Google. After this ChatGPT also provided me with the information that the LTS kernel exists and how to install it. Obviously I verified that stuff before using it, because these LLMs have their limits. Now my system works again, and figuring this out myself would’ve cost me hours because I had no idea what broke. Was it flatpak, Nvidia, the kernel, Wayland, Bottles, some random shit I changed in a config file 2 years ago? Well thanks to ChatGPT I know.

They’re tools, and they can provide new insights that can be very useful. Just don’t expect them to always tell the truth, or to actually be human-like

lennivelkant@discuss.tchncs.de · 5 months ago

Just don’t expect them to always tell the truth, or to actually be human-like

I think the point of the post is to call out exactly that: people preaching AI as replacing humans

desktop_user [they/them] @lemmy.blahaj.zone · 5 months ago

it can, in the same way a loom did, just for more language-y tasks. a multimodal system might be better at answering that type of question by first detecting that this is a question of fact and that using a bucket sort algorithm on the word “strawberry” will answer the question better than its questionably obtained correlations.
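A rough sketch of that routing idea, under heavy assumptions: the regular expression covers only one narrow phrasing of the question, `fallback` stands in for whatever language model would handle everything else, and the "buckets" are a plain per-letter tally rather than a full bucket sort. All names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for one narrow family of questions; a real router
# would need far broader coverage.
LETTER_QUESTION = re.compile(
    r"how many (?P<letter>[a-z])'?s? (?:are )?in (?:the word )?\"?(?P<word>[a-z]+)",
    re.IGNORECASE,
)


def letter_buckets(word: str) -> Counter:
    """One bucket per distinct letter, holding how often it occurs."""
    return Counter(word.lower())


def answer(question: str, fallback):
    """Answer letter-count questions exactly; hand everything else to fallback."""
    match = LETTER_QUESTION.search(question)
    if match is None:
        return fallback(question)  # fallback stands in for the language model
    letter = match.group("letter").lower()
    word = match.group("word")
    return f'"{word}" contains {letter_buckets(word)[letter]} "{letter}"'
```

The hard part in practice is the detection step, which this sketch reduces to pattern matching.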

ClusterBomb@lemmy.blahaj.zone · 5 months ago

“My hammer is not well suited to cut vegetables” 🤷

There is so much to say about AI, can we move on from “it can’t count letters and do math” ?

ReallyActuallyFrankenstein@lemmynsfw.com · 5 months ago

I get that it’s usually just a dunk on AI, but it is also still a valid demonstration that AI has pretty severe and unpredictable gaps in functionality, in addition to failing to properly indicate confidence (or lack thereof).

People who understand that it’s a glorified autocomplete will know how to disregard or prompt around some of these gaps, but this remains a litmus test because it succinctly shows you cannot trust an LLM response even in many “easy” cases.

Strykker@programming.dev · 5 months ago

But the problem is more “my do it all tool randomly fails at arbitrary tasks in an unpredictable fashion” making it hard to trust as a tool in any circumstances.

desktop_user [they/them] @lemmy.blahaj.zone · 5 months ago

it would be like complaining that a water balloon isn’t useful because it isn’t accurate. LLMs are good at approximating language, numbers are too specific and have more objective answers.

interdimensionalmeme@lemmy.ml · 5 months ago

Answer, you’re using it wrong /stevejobs

superglue@lemmy.dbzer0.com · 5 months ago

You’re not supposed to just trust it. You’re supposed to test the solution it gives you. Yes, that makes it not useful for some things. But it’s still immensely useful for other applications, and a lot of times it gives you a really great jumping-off point for solving whatever your problem is.

daniskarma@lemmy.dbzer0.com · edit-2 5 months ago

That happens when you do not understand what an LLM is, or what its use cases are.

This is like not being impressed by a calculator because it cannot give a word synonym.

xigoi@lemmy.sdf.org · 5 months ago

Sure, maybe it’s not capable of producing the correct answer, which is fine. But it should say “As an LLM, I cannot answer questions like this” instead of just making up an answer.

daniskarma@lemmy.dbzer0.com · 5 months ago

I have thought a lot about it. The LLM per se would not know whether the question is answerable, as it doesn’t know if its output is good or bad.

So there are various approaches to this issue:

1. The classic approach, and the one used for censoring: keywords. When the LLM gets a certain keyword, or can derive one by digesting the text input, it gives back a hard-coded answer. The problem is that while censored topics are limited, hard-to-answer questions are unlimited, so it’s hard to hard-code them all.
2. Self-checked answers. For every question, the LLM could process it 10 times with different seeds, then analyze the results and see if they are equivalent. If they are not, it just answers that it’s unsure. The problems: resource usage is multiplied, and for some questions like the one in the post, it’s possible that the multiple randomized answers give equivalent results, so it would still have a decent failure rate.
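Both approaches can be sketched in a few lines. Everything here is an assumption for illustration: the keyword table and its canned reply are invented, `generate(prompt, seed)` is a hypothetical callable standing in for a model call, and "equivalent" is reduced to comparing normalized strings, which a real system would need to do far more carefully.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical keyword table; the canned reply is invented for illustration.
CANNED_REPLIES = {"count the letters": "I can't reliably count letters."}


def keyword_reply(prompt: str) -> Optional[str]:
    """Approach 1: return a hard-coded reply if a known keyword appears."""
    lowered = prompt.lower()
    for keyword, reply in CANNED_REPLIES.items():
        if keyword in lowered:
            return reply
    return None


def self_checked_answer(
    prompt: str,
    generate: Callable[[str, int], str],
    runs: int = 10,
    min_agreement: float = 0.8,
) -> str:
    """Approach 2: sample with different seeds and answer only on agreement."""
    answers = [generate(prompt, seed).strip().lower() for seed in range(runs)]
    best, votes = Counter(answers).most_common(1)[0]
    if votes / runs >= min_agreement:
        return best
    return "I'm unsure about the answer."
```

Note that agreement is not correctness: if every run returns the same wrong count, this check passes it through, which is the failure mode described above. The `runs` parameter also makes the cost multiplication explicit.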

xigoi@lemmy.sdf.org · 5 months ago

Why would it not know? It certainly “knows” that it’s an LLM and it presumably “knows” how LLMs work, so it could piece this together if it was capable of self-reflection.

Klear@lemmy.world · 5 months ago

It doesn’t know shit. It’s not a thinking entity.

alphabethunter@lemmy.world · 5 months ago

Precisely, it’s not capable of self-reflection, thinking, or anything of the sort. It doesn’t even understand the meaning of words

Strykker@programming.dev · 5 months ago

But everyone selling llms sells them as being able to solve any problem, making it hard to know when it’s going to fail and give you junk.

daniskarma@lemmy.dbzer0.com · 5 months ago

And Red Bull gives you wings.

Marketing within a capitalist market be like that for every product.

NιƙƙιDιɱҽʂ@lemmy.world · 5 months ago

Is anyone really pitching AI as being able to solve every problem though?

FourPacketsOfPeanuts@lemmy.world · 5 months ago

It’s predictive text on speed. The LLMs currently in vogue hardly qualify as A.I. tbh…

TeamAssimilation@infosec.pub · 5 months ago

Still, it’s kinda insane how two years ago we didn’t imagine we would be instructing programs like “be helpful but avoid sensitive topics”.

That was definitely a big step in AI.

Grabthar@lemmy.world · 5 months ago

Doc: That’s an interesting name, Mr…

Fletch: Babar.

Doc: Is that with one B or two?

Fletch: One. B-A-B-A-R.

Doc: That’s two.

Fletch: Yeah, but not right next to each other, that’s what I thought you meant.

Doc: Isn’t there a children’s book about an elephant named Babar.

Fletch: Ha, ha, ha. I wouldn’t know. I don’t have any.

Doc: No children?

Fletch: No elephant books.

Tgo_up@lemm.ee · 5 months ago

This is a bad example… If I ask a friend “is strawberry spelled with one or two r’s” they would think I’m asking about the last part of the word.

The question seems to be specifically made to trip up LLMs. I’ve never heard anyone ask how many of a certain letter is in a word. I’ve heard people ask how you spell a word and if it’s with one or two of a specific letter though.

If you think of LLMs as something with actual intelligence you’re going to be very unimpressed… It’s just a model to predict the next word.

renegadespork@lemmy.jelliefrontier.net · 5 months ago

If you think of LLMs as something with actual intelligence you’re going to be very unimpressed… It’s just a model to predict the next word.

This is exactly the problem, though. They don’t have “intelligence” or any actual reasoning, yet they are constantly being used in situations that require reasoning.

sugar_in_your_tea@sh.itjust.works · 5 months ago

Maybe if you focus on pro- or anti-AI sources, but if you talk to actual professionals or hobbyists solving actual problems, you’ll see very different applications. If you go into it looking for problems, you’ll find them, likewise if you go into it for use cases, you’ll find them.

renegadespork@lemmy.jelliefrontier.net · 5 months ago

Personally I have yet to find a use case. Every single time I try to use an LLM for a task (even ones they are supposedly good at), I find the results so lacking that I spend more time fixing its mistakes than I would have just doing it myself.

Scubus@sh.itjust.works · 5 months ago

So you’ve never used it as a starting point to learn about a new topic? You’ve never used it to look up a song when you can only remember a small section of lyrics? What about when you want to write a block of code that is simple but monotonous to write yourself? Or to suggest plans for how to create simple structures/inventions?

Anything with a verifiable answer that you’d ask on a forum can generally be answered by an LLM, because they’re largely trained on forums and there’s a decent chance the training data included someone asking the question you are currently asking.

Hell, ask ChatGPT what use cases it would recommend for itself, I’m sure it’ll have something interesting.

renegadespork@lemmy.jelliefrontier.net · 5 months ago

as a starting point to learn about a new topic

No. I’ve used several models to “teach” me about subjects I already know a lot about, and they all frequently get many facts wrong. Why would I then trust it to teach me about something I don’t know about?

to look up a song when you can only remember a small section of lyrics

No, because traditional search engines do that just fine.

when you want to code a block of code that is simple but monotonous to code yourself

See this comment.

suggest plans for how to create simple structures/inventions

I guess I’ve never tried this.

Anything with a verifiable answer that you’d ask on a forum can generally be answered by an LLM, because they’re largely trained on forums and there’s a decent chance the training data included someone asking the question you are currently asking.

Kind of, but here’s the thing: it’s rarely faster than just using a good traditional search, especially if you know where to look and how to use advanced filtering features. Also (and this is key), verifying the accuracy of an LLM’s answer requires about the same amount of work as just not using an LLM in the first place, so I default to skipping the middleman.

Lastly, I haven’t even touched on the privacy nightmare that these systems pose if you’re not running local models.

Tgo_up@lemm.ee · 5 months ago

What situations are you thinking of that require reasoning?

I’ve used LLMs to create software I needed but couldn’t find online.

renegadespork@lemmy.jelliefrontier.net · 5 months ago

Creating software is a great example, actually. Coding absolutely requires reasoning. I’ve tried using code-focused LLMs to write blocks of code, or even some basic YAML files, but the output is often unusable.

It rarely makes syntax errors, but it will do things like reference libraries that haven’t been imported or hallucinate functions that don’t exist. It also constantly misunderstands the assignment and creates something that technically works but doesn’t accomplish the intended task.

Tgo_up@lemm.ee · 5 months ago

I think coding is one of the areas where LLMs are most useful for private individuals at this point in time.

It’s not yet at the point where you just give it a prompt and it spits out flawless code.

For someone like me who is decent with computers but has little to no coding experience, it’s an absolutely amazing tool/teacher.

Grandwolf319@sh.itjust.works · 5 months ago

If you think of LLMs as something with actual intelligence you’re going to be very unimpressed

Artificial sugar is still sugar.

Artificial intelligence implies there is intelligence in some shape or form.

JohnEdwa@sopuli.xyz · edit-2 5 months ago

Something that pretends or looks like intelligence, but actually isn’t at all is a perfectly valid interpretation of the word artificial - fake intelligence.

corsicanguppy@lemmy.ca · 5 months ago

Artificial sugar is still sugar.

Because it contains sucrose, fructose or glucose? Because it metabolises the same and matches the glycemic index of sugar?

Because those are all wrong. What’s your criteria?

Grandwolf319@sh.itjust.works · 5 months ago

In this example a sugar is something that is sweet.

Another example is artificial flavours still being a flavour.

Or like artificial light being in fact light.

Tgo_up@lemm.ee · 5 months ago

Exactly. The naming of the technology would make you assume it’s intelligent. It’s not.

Scubus@sh.itjust.works · 5 months ago

That’s because it wasn’t originally called AI. It was called an LLM. Techbros trying to sell it and articles wanting to fan the flames started calling it AI, and eventually it became common parlance. No one in the field seriously calls it AI; they generally save that term for general AI or at least narrow AI, and an LLM is neither.

JohnEdwa@sopuli.xyz · 5 months ago

An LLM is a type of machine learning model, which is a type of artificial intelligence.

Saying LLMs aren’t AI is just the AI Effect in action.

dan1101@lemm.ee · 5 months ago

It’s like someone who has no formal education but has a high level of confidence and eavesdrops on a lot of random conversations.

zipzoopaboop@lemmynsfw.com · 5 months ago

You rang?

Allero@lemmy.today · edit-2 5 months ago

Here’s my guess, aside from highlighted token issues:

We all know LLMs train on human-generated data. And when we ask something like “how many R’s” or “how many L’s” are in a given word, we don’t mean to count them all. We normally mean something like “how many consecutive letters are there, so I can spell it right”.

Yes, the word “strawberry” has 3 R’s. But what most people are interested in is whether it is “strawberry” or “strawbery”, and their “how many R’s” refers to this exactly, not the entire word.
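
The two readings of “how many R’s” can be written out as a small sketch. The function names `total_count` and `longest_run` are just labels picked for this example:

```python
from itertools import groupby


def total_count(word: str, letter: str) -> int:
    """Literal reading: every occurrence of the letter in the whole word."""
    return word.lower().count(letter.lower())


def longest_run(word: str, letter: str) -> int:
    """Spelling reading: the longest stretch of that letter in a row."""
    letter = letter.lower()
    runs = (len(list(group)) for ch, group in groupby(word.lower()) if ch == letter)
    return max(runs, default=0)


for word in ("strawberry", "strawbery"):
    print(word, total_count(word, "r"), longest_run(word, "r"))
# strawberry 3 2
# strawbery 2 1
```

So “two” is a correct answer to the spelling question and a wrong answer to the literal counting question, for the same word.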

Opisek@lemmy.world · 5 months ago

But to be fair, as people we would not ask “how many Rs does strawberry have”, but “with how many Rs do you spell strawberry” or “do you spell strawberry with 1 R or 2 Rs”

jj4211@lemmy.world · 5 months ago

It doesn’t even see the word ‘strawberry’, it’s been tokenized in a way to no longer see the ‘text’ that was input.

It’s more like it sees a question like: How many 'r’s in 草莓?

And it spits out an answer not based on analysis of the input, but a model of what people might have said.
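
A toy sketch of what that looks like. The vocabulary and token ids below are invented for illustration, and real tokenizers (byte-pair encoding and similar) split text differently, but the effect is the same: the model receives ids, not letters.

```python
# Hypothetical vocabulary: the pieces and ids are made up for this example.
TOY_VOCAB = {"straw": 3504, "berry": 19772, "r": 81}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenizer: always take the longest known piece."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


word_ids = toy_tokenize("strawberry", TOY_VOCAB)
letter_id = toy_tokenize("r", TOY_VOCAB)[0]

print(word_ids)               # [3504, 19772]
print(letter_id in word_ids)  # False: the id for "r" never shows up in the word
```

Counting letters would require the model to have memorized how each token is spelled, since nothing in `[3504, 19772]` says there are three r’s in there.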

Zess@lemmy.world · 5 months ago

You asked a stupid question and got a stupid response, seems fine to me.

interdimensionalmeme@lemmy.ml · 5 months ago

Yes, nobody asking that question is wondering about the “straw” part of the word. They’re asking whether the “berry” part has one or two “r”s.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · edit-2 5 months ago

“strawbery” has 2 R’s in it while “strawberry” has 3.

Fucking AI can’t even count.

rumba@lemmy.zip · 5 months ago

Yeah, and you know, I always hated this: screwdrivers make really bad hammers.