Every time I check nginx logs it's more scrapers than I can count, and I could not find any good open source solutions.
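
For anyone who wants rough numbers instead of eyeballing the log, here is a minimal sketch that tallies AI traffic by User-Agent. It assumes nginx's default `combined` log format (User-Agent is the last quoted field). The token lists are user-agent substrings that the respective vendors publish, but they are illustrative and certainly not exhaustive. The function names and the category labels are made up for this example, and user agents can be spoofed, so treat the counts as a lower bound.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive User-Agent tokens (assumption: you will want
# to extend these lists from each vendor's current documentation).
BULK_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")
USER_TRIGGERED = ("ChatGPT-User", "Claude-User", "Perplexity-User")

# In the "combined" format a line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def classify(line):
    """Return a hypothetical category label for one access-log line."""
    match = UA_PATTERN.search(line)
    if match is None:
        return "unparsed"
    user_agent = match.group(1)
    # Check user-triggered fetchers first so they are never lumped in
    # with the bulk crawlers.
    if any(token in user_agent for token in USER_TRIGGERED):
        return "user_triggered"
    if any(token in user_agent for token in BULK_CRAWLERS):
        return "bulk_crawler"
    return "other"


def count_categories(lines):
    """Tally categories over any iterable of log lines."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    # Made-up sample lines; the IPs are from documentation ranges.
    sample = [
        '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET / HTTP/1.1" 200 612 '
        '"-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
        '198.51.100.7 - - [10/Oct/2025:13:55:40 +0000] "GET /post HTTP/1.1" 200 98 '
        '"-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
        '192.0.2.9 - - [10/Oct/2025:13:56:01 +0000] "GET / HTTP/1.1" 200 612 '
        '"https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
    ]
    for category, hits in count_categories(sample).items():
        print(category, hits)
```

To run it against a real log, pass an open file instead of the sample, e.g. `count_categories(open("/var/log/nginx/access.log"))`. That path is a common default, not a given, so adjust it for your setup.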

  • Fedditor385@lemmy.world
    English
    arrow-up
    7
    arrow-down
    2
    ·
    1 day ago

    I just realized an interesting thing - if I use Gemini, and tell it to do deep research, it actually goes to the websites it knows/finds, and looks up the content to provide up-to-date answers. So, some of those AI crawlers are actually not crawlers, but actual users who just use AI instead of coming directly to the site.

    Soo… blocking AI completely could also potentially reduce exposure, especially as more and more people use AI to basically do searches instead of browsing themselves. That would also explain the number of daily requests - it could simply be different users using AI to research some topic.

    Point is, you should evaluate if the AI requests are just proxies of real users, and blocking AI blocks real users from knowing your site exists.

    • daddycool@lemmy.world
      English
      arrow-up
      14
      arrow-down
      1
      ·
      1 day ago

      some of those AI crawlers are actually not crawlers, but actual users who just use AI instead of coming directly to the site. Soo… blocking AI completely could also potentially reduce exposure.

      Normally, websites want users to come to their site, instead of an AI search engine “stealing” the content and presenting it as its own. Yes, AI search engines are more convenient for the user, but in the end this will discourage website creators and thereby cut off the AI’s own “food supply”.

      • Zexks@lemmy.world
        English
        arrow-up
        1
        ·
        2 hours ago

        We all understand that. But if those users keep insisting on giving everyone their life story and current opinion on world politics before giving us the bread recipe we came for, they can fade away.

      • nfreak@lemmy.ml
        English
        arrow-up
        13
        ·
        1 day ago

        Yeah I’d consider blocking out both the bots and AI-users a win-win lmao

      • Fedditor385@lemmy.world
        English
        arrow-up
        3
        arrow-down
        2
        ·
        1 day ago

        I understand, but the shift in user behaviour is significant and I think websites are not taking it into account. If users move more and more to AI (and since Google introduced AI mode, it’s only a question of time until it becomes the default), we will see more and more of what we think are AI crawlers and fewer and fewer organic users.

        AI seems to be the new middleman between you and the user, and if you block the middleman, you block the user. For people with hobby websites or established sites it may make sense because people either know of them, or getting more exposure is not a wish or requirement, but for everyone else, it will be painful.

        • lambalicious@lemmy.sdf.org
          English
          arrow-up
          2
          ·
          16 hours ago

          So, what I’m reading is, if your “users” are bad (or bots), just get better users.

          Sounds like a net win.

        • Noja@sopuli.xyz
          English
          arrow-up
          2
          ·
          1 day ago

          I honestly don’t think most people replace search with AI. It will also slowly solve itself when Google injects ads into the output.

    • rumba@lemmy.zip
      English
      arrow-up
      5
      ·
      1 day ago

      Why not both?

      There is no functional difference between them scraping you systematically and them coming to you on behalf of a user. They’re coming to scrape you either way; being asked by someone is just going to make them do it in a smarter fashion.

      Also, even if you’re not using Gemini, damned if Google.com doesn’t search you with it anyway. They badly want these AIs trained, and sooner or later almost all searching will be done through AI. There will eventually be no option.

      You are correct that blocking all AI calls will eventually make your search results not work.

      So if you want organic traffic, you have to allow AI scraping eventually. You’re just going to get diminishing returns until a point.

      • youmaynotknow@lemmy.ml
        English
        arrow-up
        3
        ·
        20 hours ago

        That is absolutely correct. I don’t want ANY AI on my servers looking for anything, regardless of whether they are crawlers or acting on behalf of some lazy fuck.