I have a program that requires all keywords to be in a single paragraph, usually separated by commas.

For example:

I have these terms:

1-Term
1.1-Term
2-Term
3-Term
4-Term

which I collected and organized into groups and subgroups with titles and subtitles:

Title

  • 1-Term

  • 1.1-Term

  • 2-Term

    • Sub-Title
      • 3-Term
      • 4-Term

But then I want to turn them into:

1-Term, 1.1-Term, 2-Term, 3-Term, 4-Term 
 

That means removing certain marked words (titles and subtitles), any blank space, and line breaks, while adding commas between the terms. I want to keep certain dashes “-” (like the ones inside words):

1-Term,1.1-Term,2-Term,3-Term,4-Term
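
One way the steps above could be sketched in the shell. This is only a sketch under stated assumptions: the post does not say how titles are marked, so the leading `#` marker is hypothetical, as is the function name `flatten_terms`. It also joins lines with `paste -s` rather than any specific command from this thread.

```shell
# Hypothetical helper: flatten an outline on stdin into one comma-separated line.
# Assumption (not from the post): title and subtitle lines are marked with a leading '#'.
flatten_terms() {
  # 1) strip leading whitespace, 2) strip a leading bullet, 3) strip trailing whitespace
  sed -e 's/^[[:space:]]*//' \
      -e 's/^•[[:space:]]*//' \
      -e 's/[[:space:]]*$//' |
    # drop marked title lines and empty lines
    grep -v -e '^#' -e '^$' |
    # join the remaining lines with commas (no trailing comma)
    paste -sd, -
}

# Usage example: dashes inside the terms are left untouched.
printf '• # Title\n  • 1-Term\n\n  • 1.1-Term\n    • # Sub-Title\n      • 3-Term\n' | flatten_terms
```

Because the dashes are never part of any pattern being removed, terms such as 1.1-Term pass through unchanged.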

  • Cactus_Head@programming.devOP
    5 hours ago

    I think this is the solution that makes the most sense to me.

    But I don’t understand what sed does here:

    replace the trailing comma with a newline again

    Why do we replace the commas again with new lines?


    Also, I figured out a better way to group related terms:

    Stars Wars;Clone Wars;Jedi
    

    using semicolons “;”.
    I figure I can replace them with commas using the tr command:

    tr ';' ',' 
    

    But do I just pipe

    tr '\n' ','
    

    into

    tr ';' ',' 
    

    Or is there a way to combine them? I don’t see an option to do more than one operation in the tr manual.


    Lastly, I have been trying to use regex to match

    What "X" Says About
    

    in titles such as

    What The MCU Says About The Comics Industry 
    

    I just need to match the “X” there; the program takes care of the rest.

    I tried

    What \w+\s+ Says About
    

    on this website to match

    What The MCU Says About The Comics Industry

    But in the debugger, it only recognizes “The” and then stops.

    • bus_factor@lemmy.world
      2 hours ago

      Why do we replace the commas again with new lines?

      Consider this two-line output:

      $ printf 'a\nb\n'
      a
      b
      $
      

      We convert the newlines to commas. Now there is a comma after the last line as well, and because there is no trailing newline anymore, the next prompt appears right at the end of the output:

      $ printf 'a\nb\n' | tr '\n' ,
      a,b,$
      

      Substituting only the trailing comma (in the sed pattern, $ means end of line) gives us the output we expected:

      $ printf 'a\nb\n' | tr '\n' , | sed 's/,$/\n/'
      a,b
      $
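
      As a possible alternative (a different technique from the tr and sed pair above, not a replacement for the explanation), paste -s only puts the delimiter between lines, so no trailing comma appears in the first place. The function name join_lines is made up for this sketch.

      ```shell
      # Hypothetical helper: join all stdin lines with commas.
      # paste -s serializes the lines; -d, sets the delimiter; '-' reads stdin.
      join_lines() {
        paste -sd, -
      }

      # Usage example: two input lines become one comma-separated line.
      printf 'a\nb\n' | join_lines
      ```

      The output still ends with a normal newline, so the prompt lands on its own line.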
      

      Or is there a way to combine them

      These two commands have equivalent output:

      tr '\n' ',' | tr ';' ',' 
      tr '\n;' ',,'
      

      What tr does is take a list of characters in parameter 1 and convert each one to the character at the same position in parameter 2. There’s a little more to it (it supports ranges, for example), but this will do the job. To learn more you can run man tr to get the documentation for it.
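
      A small sketch of that position-by-position mapping, plus a range. The function names to_commas and to_upper are made up for illustration.

      ```shell
      # Hypothetical helper: newline (position 1) and semicolon (position 2)
      # in the first set each map to the comma at the same position in the second set.
      to_commas() {
        tr '\n;' ',,'
      }

      # Hypothetical helper: ranges work the same way, a maps to A, b to B, and so on.
      to_upper() {
        tr 'a-z' 'A-Z'
      }

      # Usage examples
      printf 'a;b\nc\n' | to_commas
      printf 'jedi\n' | to_upper
      ```

      Note that to_commas also converts the final newline, so its output ends in a comma, as in the earlier example.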

      I tried What \w+\s+ Says About

      \w+\s+ matches “at least one word character and then at least one whitespace character”, and that’s not what you want. “The MCU” is one or more word characters, then a space, and then one or more word characters again, and that second part you’re not matching at all. Because \s+ is followed by a literal space, your pattern also needs at least two whitespace characters before Says. In this case, you’re probably better off using a negated character class, which makes sure you don’t match across separators. What [^,;]+ Says About would match any run of characters that contains no comma or semicolon, for instance.
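
      A quick way to try that pattern from a shell. This is only a stand-in: it uses grep -E, not the regex engine your add-on uses, although a character class like [^,;]+ is written the same way in both. The function name match_title is made up.

      ```shell
      # Hypothetical helper: print only the part of each line that the pattern matches.
      # [^,;]+ means "one or more characters that are not a comma or semicolon".
      match_title() {
        grep -oE 'What [^,;]+ Says About'
      }

      # Usage example
      echo 'What The MCU Says About The Comics Industry' | match_title
      ```

      On that title, the match covers the text from What through Says About, with “The MCU” standing in for X.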

      The other problem with regex is that every implementation does things differently. For example, sed in its default basic-regex mode would interpret that plus as a literal +, so for sed syntax you’d need to use \+ (a GNU extension) or switch to sed -E instead. \w and \s are not part of POSIX regex either, although GNU sed accepts them as extensions, and whether to use ( or \( for a literal parenthesis also varies between implementations. I often switch to Perl if I need to do some more complex regex shenanigans.
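
      To see the difference with the plus sign, here is a small sketch comparing sed’s default basic-regex mode with extended mode (sed -E). The function names bre_sub and ere_sub are made up.

      ```shell
      # Basic regex (sed's default): a bare + is a literal plus sign,
      # so 'a+' only matches the two-character text "a+".
      bre_sub() {
        sed 's/a+/X/'
      }

      # Extended regex (-E): + means "one or more of the previous item".
      ere_sub() {
        sed -E 's/a+/X/'
      }

      # Usage examples: the same input, two different replacements.
      echo 'aaa a+' | bre_sub
      echo 'aaa a+' | ere_sub
      ```

      The first call replaces the literal “a+” at the end of the line, while the second replaces the leading run of a’s.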

      • Cactus_Head@programming.devOP
        2 hours ago

        second part you’re not matching at all.

        That’s because the program/add-on I am using only requires certain keywords to blacklist videos,

        so if it finds What "X" Says About in a video title, it doesn’t need the rest of the sentence to blacklist the video.

        The other problem with regex is that every implementation does things differently

        The developer links to Firefox’s developer regex documentation.

        Regex
        
        You can use Regex to match very specific patterns of text.
        
        /aaa+/i: will block content that include aaaAAAAAaaaaAAAaaa or aaaaaaaa
        /top \d+/: will block content that include top 10 movies, top 5 upcoming movies
        
        Supports negative too, by adding ! (exclamation mark) before the regex.
        Example: !/^a/i will block content that does not start with a 
        
        

        This is a snippet of the add-on guide. I can’t link to it because for some reason it’s only inside the extension, but here is the add-on’s page.